Intelligent Database Systems Lab Presenter : YAN-SHOU SIE Authors Mohamed Ali Hadj Taieb *, Mohamed...


Intelligent Database Systems Lab

Presenter: YAN-SHOU SIE

Authors: Mohamed Ali Hadj Taieb*, Mohamed Ben Aouicha, Abdelmajid Ben Hamadou

2013. KBS

Computing semantic relatedness using Wikipedia features

Outline

• Motivation
• Objectives
• Methodology
• Experiments
• Conclusions
• Comments

Motivation

• Measuring semantic relatedness is a critical task in many domains, such as psychology, biology, linguistics, cognitive science and artificial intelligence.

Objectives

• We propose a novel system for computing semantic relatedness between words. Recent approaches that exploit Wikipedia as a huge semantic resource have shown good performance.

Methodology

• Our semantic relatedness computing system
  – Filtering the Wikipedia category graph
  – Pre-processing
    • Filtering article content
    • Porter stemming
    • Weighting article stems
    • Providing a Category Semantic Depiction (CSD)

Methodology

• Different steps performed to generate the Category Semantic Depiction
  – Filtering the Wikipedia category graph

Methodology

• Filtering the Wikipedia category graph
  – First: clean meta-categories
    » Remove all nodes whose labels contain any of the following strings: Wikipedia, wikiproject, lists, mediawiki, template, user, portal, categories, articles, pages, stub and album
  – Second: remove orphan nodes and keep only the category Contents as root
    » The maximum depth drops from 291 to 221
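The two filtering steps above can be sketched in Python as follows. This is a minimal sketch, not the authors' implementation. It makes three assumptions: the category graph is given as a hypothetical parent-to-children dictionary, the listed label strings are matched case-insensitively, and "orphan" nodes are categories that are no longer reachable from the root Contents once meta-categories are removed.

```python
from collections import deque

# Label substrings listed on the slide; case-insensitive matching is an assumption.
META_STRINGS = ("wikipedia", "wikiproject", "lists", "mediawiki", "template",
                "user", "portal", "categories", "articles", "pages", "stub", "album")


def is_meta(label: str) -> bool:
    """True if the category label contains any meta-category string."""
    lower = label.lower()
    return any(s in lower for s in META_STRINGS)


def filter_category_graph(edges: dict[str, list[str]],
                          root: str = "Contents") -> dict[str, list[str]]:
    """edges maps a category to its subcategories (hypothetical input format)."""
    # Step 1: drop meta-categories and every edge that touches one.
    clean = {parent: [c for c in children if not is_meta(c)]
             for parent, children in edges.items() if not is_meta(parent)}
    # Step 2: keep only categories reachable from the root; the rest are orphans.
    reachable = {root}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for child in clean.get(node, []):
            if child not in reachable:
                reachable.add(child)
                queue.append(child)
    return {p: [c for c in cs if c in reachable]
            for p, cs in clean.items() if p in reachable}
```

Substring matching is deliberately blunt. For instance, "lists" also matches a label such as "Journalists", so a real filter may need word-boundary checks.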

Methodology

• Pre-processing
  – Filtering article content
    » Remove HTML tags, infoboxes, language translations, hyperlinks...
  – Porter stemming
    » A stop list is applied to eliminate words that do not contribute any meaning
  – Weighting article stems
  – Providing a Category Semantic Depiction (CSD)
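A minimal sketch of the pre-processing and weighting pipeline, written under several assumptions:

- HTML is stripped with a simple regular expression. Infobox and translation removal are not shown.
- The stop list is a tiny illustrative one.
- The stemmer is passed in as a function. NLTK's `PorterStemmer().stem` could be supplied; the default leaves tokens unchanged.
- Stems are weighted with tf-idf, which the slides do not name explicitly.

```python
import math
import re
from collections import Counter
from typing import Callable

# Tiny illustrative stop list; the stop list actually used is not given on the slide.
STOP_WORDS = {"the", "a", "an", "of", "and", "is", "in", "to"}


def preprocess(html: str, stem: Callable[[str], str] = lambda w: w) -> list[str]:
    """Strip tags, lowercase, drop stop words, then stem each remaining token."""
    text = re.sub(r"<[^>]+>", " ", html)          # remove HTML tags (simplified)
    tokens = re.findall(r"[a-z]+", text.lower())  # alphabetic tokens only
    return [stem(t) for t in tokens if t not in STOP_WORDS]


def tfidf(docs: dict[str, list[str]]) -> dict[str, dict[str, float]]:
    """Assumed weighting: normalized term frequency times log(N / document frequency)."""
    n = len(docs)
    df = Counter(s for stems in docs.values() for s in set(stems))
    weights: dict[str, dict[str, float]] = {}
    for title, stems in docs.items():
        tf = Counter(stems)
        weights[title] = {s: (c / len(stems)) * math.log(n / df[s])
                          for s, c in tf.items()}
    return weights
```

A stem that occurs in every article gets weight zero, so it adds nothing to distinguishing one article from another.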

Methodology

• Semantic relatedness computing system architecture
  – Category extraction algorithm
    • WordNet:
    • Resolving the disambiguation pages problem:
      – Step 1: extract all outlinks
      – Step 2: find links containing a disambiguation tag in parentheses
      – Step 3: extract the categories of the first two links
      – Final: take the categories of the article assigned to the first link existing in the ordered set
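One possible reading of the disambiguation steps, sketched in Python. Several details are assumptions:

- The input is an ordered list of outlink titles plus a hypothetical `article_categories` lookup.
- A parenthesised tag is detected with a regular expression.
- The final step is read as picking, among the first two tagged links, the first one that has an article.

```python
import re

# Hypothetical pattern for a title with a disambiguation tag, e.g. "Jaguar (car)".
TAGGED = re.compile(r"^.+ \(.+\)$")


def categories_for_ambiguous_word(outlinks: list[str],
                                  article_categories: dict[str, list[str]]) -> list[str]:
    """outlinks come from the disambiguation page, in page order (Step 1)."""
    # Step 2: keep only links carrying a tag in parentheses, preserving order.
    tagged = [link for link in outlinks if TAGGED.match(link)]
    # Step 3: consider only the first two tagged links.
    candidates = tagged[:2]
    # Final: return the categories of the first candidate that has an article.
    for link in candidates:
        if link in article_categories:
            return article_categories[link]
    return []
```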

Methodology

• Semantic relatedness computing system architecture
  – Semantic relatedness computing
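The slides do not spell out the relatedness formula, so the following is only a hypothetical sketch built on three assumptions:

- A CSD is a sparse stem-to-weight vector, obtained by summing the weighted stems of a category's articles.
- Two CSDs are compared with cosine similarity.
- The relatedness of two words is the best cosine score over pairs of their categories.

The names `build_csd`, `cosine` and `relatedness` are illustrative.

```python
import math
from collections import defaultdict


def build_csd(article_weights: list[dict[str, float]]) -> dict[str, float]:
    """Hypothetical CSD: sum the stem weights of all articles in one category."""
    csd = defaultdict(float)
    for weights in article_weights:
        for stem, w in weights.items():
            csd[stem] += w
    return dict(csd)


def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    """Cosine similarity of two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(w * v.get(s, 0.0) for s, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def relatedness(csds1: list[dict[str, float]], csds2: list[dict[str, float]]) -> float:
    """Assumed aggregation: best match between any category of word 1 and of word 2."""
    return max((cosine(a, b) for a in csds1 for b in csds2), default=0.0)
```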

Methodology

• Evaluating semantic relatedness measures
  – Comparison with human judgments
    » Pearson product-moment correlation coefficient
    » Spearman rank-order correlation coefficient
  – Datasets
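Both coefficients compare the scores a measure produces with human ratings of the same word pairs. Below is a self-contained sketch (not tied to any particular dataset):

- Pearson's r is the covariance divided by the product of the standard deviations.
- Spearman's rho is Pearson's r computed on ranks, with tied values sharing their average rank.

```python
import math


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson product-moment correlation (undefined if either list is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def ranks(values: list[float]) -> list[float]:
    """1-based ranks; tied values receive the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r


def spearman(x: list[float], y: list[float]) -> float:
    """Spearman rank-order correlation: Pearson's r on the ranks."""
    return pearson(ranks(x), ranks(y))
```

Spearman only looks at ordering, so a measure that ranks word pairs like the human judges do scores well even if its raw values are on a different scale.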

Experiments

• Our semantic relatedness computing system modules using Wikipedia features
  – Basic system
  – First module
  – Second module
  – Third module
  – Fourth module

Experiments

• Basic system

Experiments

• First module: simple patterns

Experiments

• Second module: Wikipedia pages

Experiments

• Third module: enrichment using category neighbors in the WCG (Wikipedia category graph)
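A hypothetical sketch of the third module's idea: the set of categories found for a word is extended with its direct parents and children in the filtered WCG. Two choices here are assumptions made for illustration: only one hop is used, and the graph is a parent-to-children dictionary.

```python
def enrich_with_neighbors(categories: set[str],
                          children: dict[str, list[str]]) -> set[str]:
    """Add each category's direct parents and children (one hop, assumed)."""
    # Build the reverse (child -> parents) map from the parent -> children graph.
    parents: dict[str, list[str]] = {}
    for parent, kids in children.items():
        for kid in kids:
            parents.setdefault(kid, []).append(parent)
    enriched = set(categories)
    for cat in categories:
        enriched.update(children.get(cat, []))
        enriched.update(parents.get(cat, []))
    return enriched
```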

Experiments

• Fourth module: category enrichment using the WCG and redirects

Experiments

• Application of the SR measure to other datasets
  – Datasets RG-65 and MC-30
  – The verbal dataset YP-130
• Solving word choice problems
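In a word choice problem, a target word is given together with several candidate options, and the option closest in meaning must be picked. Here is a minimal sketch, assuming the answer is simply the option with the highest relatedness score to the target. `relatedness` stands for any SR measure passed in as a function, and the scores in the usage example are made up.

```python
from typing import Callable


def solve_word_choice(target: str, options: list[str],
                      relatedness: Callable[[str, str], float]) -> str:
    """Return the option the SR measure rates as most related to the target."""
    return max(options, key=lambda option: relatedness(target, option))


if __name__ == "__main__":
    # Toy scores for illustration only; a real run would call an SR measure.
    toy = {("car", "automobile"): 0.9, ("car", "banana"): 0.1, ("car", "river"): 0.2}
    print(solve_word_choice("car", ["banana", "automobile", "river"],
                            lambda a, b: toy[(a, b)]))
```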

Conclusions

• The resulting system shows good performance and sometimes outperforms the ESA (Explicit Semantic Analysis) and TSA (Temporal Semantic Analysis) approaches.

Comments

• Advantages
  – The system uses Wikipedia to obtain a large amount of semantic relationship information, which is of great help to related work on measuring semantic relations.
• Applications
  – Cognitive science
  – Artificial intelligence