Building and Ordering a SenDiS Lexicon Network

Oana Adriana Şoica


- SenDiS operates on a specific lexicon network (LexNet): “sense tagged glosses” relations
- lexicon networks obtained from other semantic / lexical relations
- obtaining a SenDiS LexNet:
  - build a “sense tagged glosses” LexNet (manually annotate the lexicon with a specific tool)
  - import a “sense tagged glosses” LexNet (WordNet tagged glosses, as of 2008)
- preprocessing (ordering) the SenDiS LexNet (before WSD):
  - truncation of the LexNet
  - leveling the LexNet


Semantic/Lexical Relations

- hypernyms
- hyponyms
- similar to
- has part
- synonyms
- antonyms
- holonyms
- meronyms
- coordinate terms
- troponyms
- entailment

Semantic/Lexical relations: WordNet

[Figure: an excerpt of the WordNet semantic network*]

* Navigli, R. 2009. Word sense disambiguation: A survey. ACM Comput. Surv. 41, 2, Article 10.

SenDiS Semantic/Lexical relations: GRAALAN

Tail of relation        Head of relation          Relation type
{synonym}               {synonym}                 Bidirectional, symmetric
{antonym}               {antonym}                 Bidirectional, symmetric
{paronym}               {paronym}                 Bidirectional, symmetric
{hypernym}              {hyponym}                 Bidirectional, asymmetric
{connotation}           -                         Unidirectional
{holonym}               {meronym}                 Bidirectional, asymmetric
{homonym}               {homonym}                 Bidirectional, symmetric
{heteronym}             {heteronym}               Bidirectional, symmetric
{homophone}             {homophone}               Bidirectional, symmetric
{diminutive of}         {diminutive by}           Bidirectional, asymmetric
{augmentative of}       {augmentative by}         Bidirectional, asymmetric
{extension from}        {extension into}          Bidirectional, asymmetric
{reduction from}        {reduction into}          Bidirectional, asymmetric
{generalization from}   {generalization into}     Bidirectional, asymmetric
{specialization from}   {specialization into}     Bidirectional, asymmetric
{figurative of}         {literal for}             Bidirectional, asymmetric
{reference to}          -                         Unidirectional
{derived from}          {derived into}            Bidirectional, asymmetric
{back formatted form}   {back formats}            Bidirectional, asymmetric
{abstract for}          {concretized from}        Bidirectional, asymmetric
{with variant}          {variant for}             Bidirectional, asymmetric
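The table classifies each GRAALAN relation by direction (bidirectional or unidirectional) and symmetry. The following is a minimal Python sketch of that classification for a few rows. It assumes that a bidirectional relation is stored as a pair of inverse arcs, using the tail label in one direction and the head label in the other, and that a unidirectional relation is stored as a single arc. This storage scheme and the names `RelationType`, `RELATIONS` and `arcs_for` are hypothetical, not taken from SenDiS or GRAALAN.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Arc = Tuple[str, str, str]  # (source, relation label, target), read "source is <label> target"


@dataclass(frozen=True)
class RelationType:
    tail: str            # label on the tail side, e.g. "hypernym"
    head: Optional[str]  # inverse label on the head side; None if unidirectional

    @property
    def bidirectional(self) -> bool:
        return self.head is not None

    @property
    def symmetric(self) -> bool:
        # Symmetric relations use the same label in both directions.
        return self.head == self.tail


# A few rows of the GRAALAN table above.
RELATIONS = {
    "synonym": RelationType("synonym", "synonym"),
    "hypernym": RelationType("hypernym", "hyponym"),
    "holonym": RelationType("holonym", "meronym"),
    "diminutive of": RelationType("diminutive of", "diminutive by"),
    "connotation": RelationType("connotation", None),
    "reference to": RelationType("reference to", None),
}


def arcs_for(relation: RelationType, a: str, b: str) -> List[Arc]:
    """Arcs stored when 'a' is linked to 'b' by the relation (hypothetical scheme)."""
    arcs = [(a, relation.tail, b)]
    if relation.bidirectional:
        # The head label is used for the inverse arc, from b back to a.
        arcs.append((b, relation.head, a))
    return arcs
```

Under this scheme, `arcs_for(RELATIONS["hypernym"], "animal", "dog")` produces a "hypernym" arc from animal to dog and a "hyponym" arc from dog back to animal. A "connotation" link produces only one arc.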

Obtaining a SenDiS LexNet

- manually annotating the glosses of a lexicon (using a specific tool that can ease the process)
- importing an existing “gloss tagged” lexicon net (also obtained manually or semi-automatically); this usually translates into a dependency on a specific list of meanings/glosses

Creating the SenDiS LexNet

- implied a significant effort, usually measured in months, involving several trained linguists
- using a specialized collaborative tool (BuildLNTool – Build Lexicon Network Tool)
- enriching the “gloss tagged” relation with three relative degrees of importance (in the gloss context): weak, medium, or strong; alternatively, the gloss word can be ignored
- SenDiS objective, two LexNets:
  - a “gloss tagged” LexNet for the Romanian language
  - a “gloss tagged” LexNet for the English language

- BuildLNTool (Build Lexicon Network Tool) provides:
  - a visual and effective mechanism to manually annotate the lexicon glosses
  - a synchronized overview of the already created relations
  - a browsing mechanism for inspecting the already tagged glosses and relations


BuildLNTool – Sections

[Figure: the BuildLNTool window, divided into the sections “Lemmas & MWEs”, “Lemma/MWE Info”, “Competence & Definition Trees”, “Root & Leaf Meanings”, and messages and progress]

BuildLNTool – Sections II

- “Lemmas & MWEs”: list of lexicon entries
- “Root & Leaf Meanings”: list of roots and leaves of the lexicon network
- “Lemma/MWE Info”: the current lexicon entry being analyzed
- “Competence & Definition Trees”: spanning trees for a given meaning over the current lexicon net
- section for messages and progress
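The “Root & Leaf Meanings” and “Competence & Definition Trees” sections rely on standard graph notions. The following is a minimal sketch under several assumptions. The LexNet is treated as a set of directed arcs from a meaning to the meanings used to tag its gloss. A root is taken to be a meaning with no incoming arc, and a leaf a meaning with no outgoing arc. The tree is a plain breadth-first spanning tree, which is not necessarily how BuildLNTool builds its competence or definition trees. All names in the sketch are hypothetical.

```python
from collections import defaultdict, deque
from typing import Dict, Iterable, List, Optional, Tuple

# (meaning, meaning used to tag its gloss); the arc direction is an assumption.
Arc = Tuple[str, str]


def roots_and_leaves(meanings: Iterable[str], arcs: List[Arc]) -> Tuple[List[str], List[str]]:
    """Roots have no incoming arcs, leaves have no outgoing arcs."""
    has_in = {v for _, v in arcs}
    has_out = {u for u, _ in arcs}
    meanings = list(meanings)
    roots = sorted(m for m in meanings if m not in has_in)
    leaves = sorted(m for m in meanings if m not in has_out)
    return roots, leaves


def spanning_tree(start: str, arcs: List[Arc]) -> Dict[str, Optional[str]]:
    """Breadth-first spanning tree of the meanings reachable from start, as child -> parent."""
    successors = defaultdict(list)
    for u, v in arcs:
        successors[u].append(v)
    parent: Dict[str, Optional[str]] = {start: None}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in successors[u]:
            if v not in parent:  # the first visit fixes the tree edge; cycles are skipped
                parent[v] = u
                queue.append(v)
    return parent
```

Because a “gloss tagged” LexNet contains cycles, the `parent` check is what keeps the traversal finite. Every meaning is attached to the tree only once.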

BuildLNTool – Lemmas & MWEs

- selection of lexicon entry type
- selection of unfinished lexicon entries filter
- selection of viewing interval
- text filter
- lexicon entry text
- lexicon entry status

BuildLNTool – Selection of a current lexicon entry

[Figure: a lexicon entry is selected as the current entry by double click]

BuildLNTool – Browsing the meanings of the current lexicon entry

- lexicon entry text
- morphologic interpretation
- list of meanings filters
- meaning/gloss fully tagged
- meaning/gloss partially tagged
- meaning/gloss not tagged
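The three states shown in the list of meanings can be derived from the gloss constituents. The following is a minimal sketch of that derivation. It assumes a gloss is fully tagged when every constituent not marked as ignored has a sense, partially tagged when only some do, and not tagged otherwise. This rule and the names `Constituent` and `tagging_status` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Constituent:
    text: str
    sense: Optional[str] = None  # sense chosen by the annotator, if any
    ignored: bool = False        # constituent marked as ignored (X)


def tagging_status(gloss: List[Constituent]) -> str:
    relevant = [c for c in gloss if not c.ignored]
    tagged = sum(1 for c in relevant if c.sense is not None)
    # A gloss with no relevant constituents is reported as "not tagged" (assumption).
    if relevant and tagged == len(relevant):
        return "fully tagged"
    if tagged > 0:
        return "partially tagged"
    return "not tagged"
```

Ignored constituents are excluded before counting, so marking a word with X cannot prevent a gloss from being reported as fully tagged.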

BuildLNTool – Selection of a current meaning for tagging

[Figure: a meaning is selected for tagging by double click]

BuildLNTool – Gloss constituent without interpretations

[Figure: an unrecognized gloss constituent]

BuildLNTool – Degrees of relevance (in gloss context)

Default setting: Medium

BuildLNTool – Degrees of relevance II

- ‘Strong’ tokens
- ‘Medium’ tokens
- ‘Weak’ tokens
- Ignored (X) tokens
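The degrees of relevance form an ordered scale, with Medium as the default. The following is a minimal sketch of how a tagged gloss token might carry that degree and how relations could be filtered by it. The integer values only encode the order ignored < weak < medium < strong; they are not weights used by SenDiS. The names `Relevance`, `TaggedToken` and `relations_at_least` are hypothetical.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import List, Tuple


class Relevance(IntEnum):
    # Ordinal values only: they encode the order, not SenDiS weights.
    IGNORED = 0  # token marked with X, excluded from the relation
    WEAK = 1
    MEDIUM = 2   # BuildLNTool default setting
    STRONG = 3


@dataclass
class TaggedToken:
    text: str
    sense: str
    relevance: Relevance = Relevance.MEDIUM


def relations_at_least(tokens: List[TaggedToken], threshold: Relevance) -> List[Tuple[str, str]]:
    """(text, sense) pairs whose relevance reaches the threshold; ignored tokens never pass."""
    threshold = max(threshold, Relevance.WEAK)
    return [(t.text, t.sense) for t in tokens if t.relevance >= threshold]
```

Using `IntEnum` keeps comparisons such as `t.relevance >= threshold` readable. Clamping the threshold to `WEAK` ensures that ignored tokens are never turned into relations.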

BuildLNTool – Gloss tagging

BuildLNTool – Gloss tagging protocol

- view of meaning tagging tree
- selection of constituent / group of gloss constituents
- set / modify relevance degree
- edit text of gloss constituent
- select / modify the sense for the gloss constituent
- further annotate meaning / save annotations
- choose the next meaning
- further on
- save annotations
- current gloss constituent without sense interpretations

Built LexNets for Romanian and English

LexNets             AllTokens   OperatedTokens   OpTokensValid   OpTokensRelated   OpTokens V & R
LL_Romanian - 99%   1,528,819   1,191,942        691,010         720,420           686,210
LL_English - 2%     36,828      30,350           18,523          17,641            17,505

LexNets             Glosses   Tagged Glosses   Targeted Glosses   Tags Density
LL_Romanian - 99%   130,087   118,536          58,976             0.5757
LL_English - 2%     259,651   3,496            7,551              0.5767

Imported WordNet tagged glosses

- WordNet (3.0) is organized in synsets:
  - 117,659 synsets
  - 155,287 words (lexicon entries)
  - 206,941 word-sense pairs (gloss + usage examples)
- the synsets were split and transformed into a classical lexicon format
- the imported lexicon networks:

LexNets                   Glosses   Tagged Glosses   Targeted Glosses   Tags Density
WordNet                   206,941   206,938          59,251             0.3486
WordNet_extendedGlosses   206,941   206,941          83,174             0.3006

LexNets                   AllTokens   OperatedTokens   OpTokensValid   OpTokensRelated   OpTokens V & R
WordNet                   2,394,190   2,394,190        2,394,189       834,803           834,803
WordNet_extendedGlosses   3,114,968   3,114,968        3,114,967       936,397           936,397
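The tables do not define Tags Density. The reported values appear consistent with OpTokens V & R divided by OperatedTokens, truncated (not rounded) to four decimals. The short check below recomputes that ratio from the published counts for the four LexNets. This is an interpretation of the figures, not a definition given in the source.

```python
# (OperatedTokens, OpTokens V & R, reported Tags Density), copied from the tables above.
ROWS = {
    "LL_Romanian - 99%": (1_191_942, 686_210, "0.5757"),
    "LL_English - 2%": (30_350, 17_505, "0.5767"),
    "WordNet": (2_394_190, 834_803, "0.3486"),
    "WordNet_extendedGlosses": (3_114_968, 936_397, "0.3006"),
}


def density(operated: int, valid_and_related: int) -> str:
    """valid_and_related / operated, truncated (not rounded) to four decimals."""
    q = valid_and_related * 10_000 // operated  # integer arithmetic avoids float rounding
    return f"{q // 10_000}.{q % 10_000:04d}"


for name, (operated, vr, reported) in ROWS.items():
    print(f"{name:<25} {density(operated, vr)}  (reported {reported})")
```

With rounding instead of truncation, the English and WordNet rows would come out as 0.5768 and 0.3487. This is why the sketch truncates.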

Ordering a SenDiS LexNet

- “gloss tagged” lexicon nets are large and dense graphs:
  - between 100,000 and 200,000 vertices
  - over 1,000,000 edges/arcs
- to ease the operation with such graphs, “gloss tagged” lexicon nets can be preprocessed and optimized:
  - truncation of a lexicon net
  - leveling of a lexicon net
- aims when optimizing a lexicon net:
  - elimination of loops or strongly connected components
  - a minimum number of removed edges
  - leveling on a minimum number of levels
  - minimization/maximization of root/leaf vertices
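The slides list the aims of ordering but do not describe the algorithms Patentv1 or NT_eades. The aims are to break cycles by removing few edges and then assign vertices to levels. As a generic illustration of those aims, the sketch below uses a swapped-in technique. It applies the greedy feedback-arc-set heuristic of Eades, Lin and Smyth (1993): sinks are peeled to the end of a vertex sequence, sources to the front, and otherwise the vertex with the largest outdegree minus indegree is placed next. Edges pointing backward in the sequence are removed, and longest-path leveling is applied to the remaining acyclic graph. Whether NT_eades works this way is an assumption. The function names are hypothetical, and vertices must be sortable (sorting only makes tie-breaking deterministic).

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, List, Tuple

Edge = Tuple[Hashable, Hashable]


def eades_order(vertices: Iterable, edges: List[Edge]) -> list:
    """Greedy vertex sequence of Eades, Lin and Smyth for the feedback arc set problem."""
    succ, pred = defaultdict(set), defaultdict(set)
    for u, v in edges:
        if u != v:  # self-loops are always removed later, they do not affect the order
            succ[u].add(v)
            pred[v].add(u)
    remaining = set(vertices)
    left, right = [], []
    while remaining:
        # Repeatedly move sinks (no outgoing arcs left) to the end of the sequence.
        sinks = [v for v in sorted(remaining) if not succ[v] & remaining]
        while sinks:
            remaining.difference_update(sinks)
            right.extend(sinks)
            sinks = [v for v in sorted(remaining) if not succ[v] & remaining]
        # Repeatedly move sources (no incoming arcs left) to the front of the sequence.
        sources = [v for v in sorted(remaining) if not pred[v] & remaining]
        while sources:
            remaining.difference_update(sources)
            left.extend(sources)
            sources = [v for v in sorted(remaining) if not pred[v] & remaining]
        # Otherwise place the vertex maximizing outdegree - indegree.
        if remaining:
            best = max(sorted(remaining),
                       key=lambda v: len(succ[v] & remaining) - len(pred[v] & remaining))
            remaining.discard(best)
            left.append(best)
    return left + right[::-1]


def level(vertices: Iterable, edges: List[Edge]) -> Tuple[Dict, List[Edge]]:
    """Remove the backward edges of the greedy sequence, then assign longest-path levels."""
    order = eades_order(vertices, edges)
    pos = {v: i for i, v in enumerate(order)}
    kept = [(u, v) for u, v in edges if pos[u] < pos[v]]
    removed = [(u, v) for u, v in edges if pos[u] >= pos[v]]  # includes self-loops
    levels = {v: 0 for v in order}
    succ = defaultdict(list)
    for u, v in kept:
        succ[u].append(v)
    for u in order:  # kept edges all point forward, so `order` is a topological order
        for v in succ[u]:
            levels[v] = max(levels[v], levels[u] + 1)
    return levels, removed
```

For the cycle a → b → c → a with an extra edge c → d, the sketch removes only the edge c → a and places a, b, c and d on four consecutive levels. This trades one removed edge for an acyclic, leveled net.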

Unordered LexNet

[Figure: a minimal lexicon net in the original form, with nodes e1 to e7]

Ordered (leveled) LexNet

[Figure: the same minimal lexicon net, leveled]

Results on leveling experimental LexNets

LNs      Vertices   Edges In   OLN Algorithm   Edges Out   Edges Removed   Levels   Time (s)
wn       202,361    834,803    Patentv1        821,048     13,755          192      4.5
wn_ex    205,188    936,397    Patentv1        936,397     74,526          382      5.7
ro_48%   72,067     318,741    Patentv1        308,592     10,149          195      1.6
ro_78%   100,175    523,192    Patentv1        504,210     18,982          244      2.3
ro_99%   120,472    686,784    Patentv1        659,030     27,754          291      2.8
ro_48%   130,407    318,741    NT_eades        308,334     10,407          58       60
ro_99%   130,099    686,784    NT_eades        654,025     32,759          70       330
wn_ex    206,941    936,397    NT_eades        904,992     31,405          46       1,315