Align, Disambiguate, and Walk A Unified Approach for Measuring Semantic Similarity

Align, Disambiguate, and WalkA Unified Approach for Measuring Semantic Similarity

Mohammad Taher PilehvarDavid JurgensRoberto Navigli

Semantic Similarity; how

similar are a pair of lexical items?

Semantic Similarity

Sentence Level Word Level Sense Level

Semantic Similarity

The worker was terminated

The boss fired him

Sentence level

Applications• Paraphrase recognition

(Tsatsaronis et al., 2010)

• MT evaluation (Kauchak and Barzilay, 2006)

• Question Answering (Surdeanu et al., 2011)

• Textual Entailment (Dagan et al., 2006)

Semantic Similarity


Semantic SimilarityWord level

fireplace

Applications• Lexical simplification (Biran et al., 2011)

Locuacious → Talkative

• Lexical substitution (McCarthy and Navigli, 2009)

heater

Semantic Similarity


Semantic Similarityfire sense #1 Sense level

fire sense #8

Applications• Coarsening sense inventories

(Snow et al., 2007)

• Semantic priming (Neely et al., 1989)

Exisiting Similarity Measures

Patwardan (2003)

Banerjee and Pederson (2003)

Hirst and St-Onge (1998)

Lin (1998)

Jiang and Conrath (1997)

Resnik (1995)

Sussna (1993, 1997)

Wu and Palmer (1994)

Leacock and Chodorow (1998)

Salton and McGill (1983)

Gabrilovich and Markovitch (2007)

Radinsky et al. (2011)

Ramage et al. (2009)

Yeh et al., (2009)

Turney (2007)

Landauer et al. (1998)

Allison and Dix (1986)Gusfield (1997)

Wise (1996)Keselj et al. (2003)

Sentence Word Sense

Exisiting Similarity Measures

Sense Word Sentence

Patwardan (2003)

Banerjee and Pederson (2003)

Hirst and St-Onge (1998)

Lin (1998)

Jiang and Conrath (1997)

Resnik (1995)

Sussna (1993, 1997)

Wu and Palmer (1994)

Leacock and Chodorow (1998)

Salton and McGill (1983)

Gabrilovich and Markovitch (2007)

Radinsky et al. (2011)

Ramage et al. (2009)

Yeh et al., (2009)

Turney (2007)

Landauer et al. (1998)

Allison and Dix (1986)Gusfield (1997)

Wise (1996)Keselj et al. (2003)

But

Different internal representationswhich are not comparable to each other

None directly covers all levels at the same time

Different output scales

Contribution

Sense Word sentence

State-of-the-art performance in each level Using only WordNet

A unified representationfor any lexical item

Advantage 1Unified representation

sense

word

sentencetext word

sense

All lexical itemsthis way

sense

word

Advantage 2Cross-level semantic similarity

A large and imposing house Mansion Residence#3

sentence

sentence

Advantage 3Sense-level operation

set of sensessense

word

sense

Outline

sense

word

text

Introduction Methodology Experiments

How Does it work?

lexical item 1 lexical item 2

semanticsignature 1

semanticsignature 2

Semantic Signature

sense word

Unified semantic signature

sentence

Semantic Signature

sense word


Sense or set of senses

Alignment-based disambiguation

sentence

A woman is frying food

Semantic Signature

Semantic SignatureDistributional representation

over all synsets in WordNet

Importance of this synset (syn_4) for our lexical item

syn_1syn_2

syn_3syn_4

syn_5syn_6

syn_7syn_8

syn_9syn_n

. . .

Semantic Signaturea woman is frying food

syn_1syn_2

syn_3syn_4

syn_5syn_6

syn_7syn_8

syn_9syn_n

cooking#1

table#3frying_pan#1

oil#4kitchen#3

carpet#2

sugar#2

natural_gas#2

physics#1

ship#1

Personalized PageRank

Personalized PageRank

vfry 2

nfood 1 nwoman 1

a woman is frying food{ , , }nwoman 1

nfood 1vfry 2

vfry2

nfood 1

ncooking 1

vcook 3

nfat 1

nfrench_fries 1

ndish 2nnutriment 1

nfood 2

nbeverage1

These weights form a semantic signature

Comparing Semantic Signatures

Comparing Semantic Signatures

• Parametric– Cosine

• Non-parametric– Weighted Overlap– Top-k Jaccard

Comparing Semantic SignaturesWeighted Overlap

syn_1 syn_2 syn_3 syn_4 syn_5 syn_6


Comparing Semantic SignaturesWeighted Overlap



Comparing Semantic SignaturesTop- Jaccard

syn_1 syn_2 syn_3 syn_4 syn_5 syn_6 syn_7 syn_8 syn_9 syn_10

syn_1 syn_2 syn_3 syn_4 syn_5 syn_6 syn_7 syn_8 syn_9 syn_10

𝑘=4


sense word




sentence


sense word




sentencesentence

Why is disambiguation needed?

The worker was fired

He was terminated

A manager fired the worker.An employee was terminated from work by his boss.


managern

employeen

firev workern terminatev bossnworkn


manager terminate work boss

bosswork

. . .

terminate

terminate

terminate

employee

work

. . .

nmanager 2

n1

n1

v1

v2

v3

v4

n2

n1

n3

n1

n2

fire worker

fire

fire

worker

. . .

v1

v2

v3

n1

n2

managern

employeen


. . .

n

Sentence 1 Sentence 2


manager

terminate work boss

bosswork

. . .

terminate

terminate

terminate

employee

work

. . .

nmanager 2

n1

n1

v1

v2

v3

v4

n2

n1

n3

n1

n2

n


0.50.3

Tversky (1977)Markman and Gentner (1993)

terminate work boss

bosswork

. . .

terminate

terminate

terminate

employee

work

. . .

n1

v1

v2

v3

v4

n2

n1

n3

n1

n2


0.3

nmanager 2n

managern1

0.50.3

manager terminate work boss

bosswork

. . .

terminate

terminate

terminate

employee

work

. . .

nmanager 2

n1

n1

v1

v2

v3

v4

n2

n1

n3

n1

n2

fire worker

fire

fire

worker

. . .

v1

v2

v3

n1

n2

managern

employeen


. . .

n

Sentence 1 Sentence 2

fire v4


Outline

sense

word

text

Introduction Methodology Experiments

Experiments• Sentence level– Semantic Textual Similarity (SemEval-2012)


• Word level– Synonymy recognition (TOEFL dataset)– Correlation-based (RG-65 dataset)



• Sense level– Coarsening WordNet sense inventory

Experiment 1Similarity at Sentence level

• Semantic Textual Similarity (STS-12)– 5 datasets – Three evaluation measures

• ALL, ALLnrm, and Mean

• Top-ranking systems– UKP2 (Bär et al., 2012)– TLSim and TLSyn (Šarić et al., 2012)


Features– Main features

• Cosine• Weighted Overlap• Top-k Jaccard

– String-based features• Longest common substring• Longest common subsequence• Greedy string tiling• Character/word n-grams

STS Results


ALL

ALLnrm

Mean

0.6 0.65 0.7 0.75 0.8 0.85 0.9

ADW

ADW

ADW

UKP2

UKP2

UKP2

TLsyn

TLsyn

TLsyn

TLsim

TLsim

TLsim

+0.034

+0.007

+0.043

Experiments• Sentence level– Semantic Textual Similarity (Semeval-12)



Experiment 2Similarity at Word Level

Synonymy recognition Correlating word similarity judgments

Similarity at Word LevelSynonym Recognition

• TOEFL dataset (Landauer and Dumais, 1997)

– 80 multiple choice questions– Human test takers: 64.5% only

Similarity at Word LevelSynonym Recognition

Accuracy on TOEFL dataset

PPMIC (Bullinaria and Levy, 2007)

GLSA (Matveeva et al., 2005)

LSA (Rapp, 2003)

ADW_Jaccard

ADW_WO

ADW_Cosine

PR (Turney et al., 2003)

PCCP (Bullinaria and Levy, 2012)

75 80 85 90 95 100

Similarity at Word LevelJudgment Correlation

• Dataset: RG-65 (Rubenstein and Goodenough, 1965)

– 65 word pairs • judged by 51 human subjects

– Scale of 0 → 4

Similarity at Word LevelJudgment Correlation

Spearman correlation, RG-65 dataset

ADW_Cosine

Agirre et al. (2009)

Hughes and Ramage (2007)

Zesch et al. (2008)

ADW_Jaccard

ADW_WO

0.76 0.78 0.8 0.82 0.84 0.86 0.88 0.9

+0.03

Experiments• Text level– Semantic Textual Similarity (Semeval-12)



Experiment 3Similarity at Sense Level

• Coarse-graining WordNet

Snow et al. (2007)Navigli (2006)


• Binary classification: Merged or not-merged

bass#1

bass#3

bass#2

bass#4

Senseval-2 English Lexical Sample WSD

task(Kilgarriff, 2001)

OntoNotes (Hovy et al., 2006)

about 3500 sense pairs (Noun)about 5000 sense pairs (Verb)

about 16000 sense pairs (Noun)about 31000 sense pairs (Verb)


• Binary classification: Merge or not-merged

Merged

Not-merged

Tuned using 10% of the dataset

if similarity ≥ 𝑡

h𝑜𝑡 𝑒𝑟𝑤𝑖𝑠𝑒


F-score on OntoNotes dataset

ADW_Cosine

ADW_WO

ADW_Jaccard

Snow et al. (2007)

Navigli (2006)

0 0.2 0.4 0.6

0.41

0.42

0.42

0.37

0.22

Noun

0 0.2 0.4 0.6

0.52

0.54

0.53

0.45

0.37

Verb


F-score on Senseval-2 dataset

ADW_Cosine

ADW_WO

ADW_Jaccard

Snow et al. (2007)

Navigli (2006)

0 0.2 0.4 0.6

0.44

0.47

0.47

0.42

0.37

Noun

0 0.1 0.2 0.3 0.4 0.5 0.6

0.49

0.5

0.49

0.43

0.29

Verb

Conclusions• A unified approach for computing semantic

similarity for any pair of lexical items

• Experiments with SOA performance– Sense level (sense coarsening)– Word level (synonymy recognition and judgment)– Sentence level (Semantic Textual Similarity)

Future Direction• Larger sense inventories (e.g., BabelNet)

• Cross-level semantic similarity

• Create datasets for cross-level similarity– Future Semeval task?

nlistening 1

nthank_you 1

ngratitude 1

nthanks 1

nexpression 3

vheed 1nsensing 2

vlisten 2

nauscultation 1near 2

nhearing 6

jaudio-lingual 1

vlisten 1

Thank you for listening!

Align, Disambiguate, and Walk A Unified Approach for Measuring Semantic Similarity

Documents

Transcript of Align, Disambiguate, and Walk A Unified Approach for Measuring Semantic Similarity