Intelligent Database Systems Lab
Presenter : YAN-SHOU SIE
Authors : Mohamed Ali Hadj Taieb *, Mohamed Ben Aouicha, Abdelmajid Ben Hamadou
2013. KBS
Computing semantic relatedness using Wikipedia features
Outlines
• Motivation
• Objectives
• Methodology
• Experiments
• Conclusions
• Comments
Motivation
• Measuring semantic relatedness is a critical task in many domains such as psychology, biology, linguistics, cognitive science and artificial intelligence.
Objectives
• We propose a novel system for computing semantic relatedness between words. Recent approaches have exploited Wikipedia as a large semantic resource and have shown good performance.
Methodology
• Our semantic relatedness computing system
  – Filtering Wikipedia category graph
  – Pre-processing
    • Filtering article content
    • Porter stemming
    • Weighting article stems
    • Providing a Category Semantic Depiction (CSD)
Methodology
• Different steps performed to generate the Category Semantic Depiction
  – Filtering Wikipedia category graph
Methodology
• Filtering Wikipedia category graph
  – First : clean meta-categories
    » We remove all nodes whose labels contain any of the following strings : Wikipedia, wikiproject, lists, mediawiki, template, user, portal, categories, articles, pages, stub and album
  – Second : remove orphan nodes and keep only the category Contents as root
    » Maximum depth goes from 291 to 221
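The two filtering steps can be sketched as below. This is a minimal illustration, not the authors' implementation: the dictionary form of the category graph, the function name filter_category_graph and the case-insensitive matching are all assumptions made for the example.

```python
from collections import deque

# Strings listed on the slide; matching is case-insensitive here (an assumption).
META_STRINGS = ("wikipedia", "wikiproject", "lists", "mediawiki", "template",
                "user", "portal", "categories", "articles", "pages", "stub",
                "album")


def filter_category_graph(children, root="Contents"):
    """Return the filtered graph as {category: [subcategories]}.

    `children` maps each category label to its direct subcategories
    (a hypothetical in-memory form of the Wikipedia category graph).
    """
    def is_meta(label):
        lowered = label.lower()
        return any(s in lowered for s in META_STRINGS)

    # First step: drop meta-categories and every edge pointing to them.
    cleaned = {
        cat: [sub for sub in subs if not is_meta(sub)]
        for cat, subs in children.items()
        if not is_meta(cat)
    }

    # Second step: keep only nodes still reachable from the root,
    # which discards the nodes orphaned by the first step.
    reachable = {root}
    queue = deque([root])
    while queue:
        cat = queue.popleft()
        for sub in cleaned.get(cat, []):
            if sub not in reachable:
                reachable.add(sub)
                queue.append(sub)

    return {cat: cleaned.get(cat, []) for cat in reachable}
```

A breadth-first walk from Contents is used so that any category only reachable through a removed meta-category is dropped together with it.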
Methodology
• Pre-processing
  – Filtering article content
    » Remove HTML tags, infoboxes, language translations, hyperlinks, ...
  – Porter stemming
    » A stop list is applied to eliminate words that make no contribution.
  – Weighting article stems
  – Providing a Category Semantic Depiction (CSD)
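A rough sketch of the pre-processing chain (filtering, stop list, stemming, weighting) follows. Several parts are assumptions: the stop list is a tiny illustrative one, naive_stem is a placeholder suffix stripper standing in for a real Porter stemmer, and tf-idf is used as the weighting scheme only because the slide does not name one.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop list (an assumption, not the list used in the paper).
STOP_WORDS = {"the", "a", "an", "of", "and", "is", "are", "in", "to"}


def naive_stem(word):
    """Placeholder suffix stripper; NOT the Porter algorithm."""
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def article_stems(text, stem=naive_stem):
    """Strip HTML-like tags, drop stop words, and stem what remains."""
    text = re.sub(r"<[^>]+>", " ", text)          # remove tags
    words = re.findall(r"[a-z]+", text.lower())   # crude tokenizer
    return [stem(w) for w in words if w not in STOP_WORDS]


def tfidf_weights(articles):
    """Return one {stem: weight} dict per article, weight = tf * log(N / df)."""
    stem_lists = [article_stems(a) for a in articles]
    n = len(stem_lists)
    # Document frequency: number of articles containing each stem.
    df = Counter(s for stems in stem_lists for s in set(stems))
    weights = []
    for stems in stem_lists:
        tf = Counter(stems)
        weights.append({s: tf[s] * math.log(n / df[s]) for s in tf})
    return weights
```

Passing the stemmer as a parameter makes it easy to swap in a real Porter implementation without touching the rest of the chain.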
Methodology
• Semantic relatedness computing system architecture
  – Extraction categories algorithm
    • WordNet:
    • Resolve the disambiguation pages problem:
      – Step 1 : extract all outLinks
      – Step 2 : find links containing a disambiguation tag in parentheses
      – Step 3 : extract the categories of the first two links
      – Final : take the categories of the article assigned to the first link existing in the ordered set
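One possible reading of the disambiguation steps is sketched below. It assumes that a "disambiguation tag in parentheses" means a qualifier such as "Mercury (planet)", that the ordered set is simply the page order of the links, and that categories_of is a hypothetical lookup from article title to category labels; none of these details are confirmed by the slide.

```python
import re

# Matches a trailing parenthesised tag, e.g. "Mercury (planet)".
PAREN_TAG = re.compile(r"\(([^)]+)\)\s*$")


def categories_for_ambiguous_word(out_links, categories_of):
    """Pick categories for a word whose page is a disambiguation page.

    out_links     -- link titles in page order (Step 1, already extracted)
    categories_of -- hypothetical lookup {article title: [category labels]}
    """
    # Step 2: keep only links that carry a parenthesised tag.
    tagged = [link for link in out_links if PAREN_TAG.search(link)]

    # Step 3: only the first two such links are considered.
    candidates = tagged[:2]

    # Final: use the first candidate that actually has an article.
    for link in candidates:
        if link in categories_of:
            return categories_of[link]
    return []
```

Returning an empty list when neither candidate has an article is a choice made for this sketch only.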
Methodology
• Semantic relatedness computing system architecture
  – Semantic relatedness computing
Methodology
• Evaluating semantic relatedness measures
  – Comparison with human judgments
  – Pearson product-moment correlation coefficient
  – Spearman rank order correlation coefficient
  – Datasets
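The two correlation coefficients can be computed as in the sketch below, written in plain Python for illustration. The function names are chosen for this example; Spearman is obtained as the Pearson correlation of average ranks, which is the standard way to handle ties.

```python
import math


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def average_ranks(values):
    """Rank values from 1, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank order correlation: Pearson applied to average ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

In an evaluation of this kind, x would hold the human judgments for the word pairs of a dataset and y the scores produced by the relatedness measure for the same pairs.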
Experiments
• Our semantic relatedness computing system modules using Wikipedia features
  – Basic system
  – First module
  – Second module
  – Third module
  – Fourth module
Experiments
• Basic system
Experiments
• First module: simple patterns
Experiments
• Second module: Wikipedia pages
Experiments
• Third module: enrichment using category neighbors in the WCG
Experiments
• Fourth module: categories enrichment using the WCG and redirects
Experiments
• Application of the SR measure on other datasets
  – Datasets RG-65 and MC-30
  – The verbal dataset YP-130
• Solving word choice problems
Conclusions
• Our system shows good performance and sometimes outperforms the ESA (Explicit Semantic Analysis) and TSA (Temporal Semantic Analysis) approaches.
Comments
• Advantages
  – The system uses Wikipedia to obtain a large amount of semantic relationship information, which is of great help to related work that measures semantic relations.
• Applications
  – Cognitive science
  – Artificial intelligence