BabelNet: The automatic construction, evaluation and...

Roberto Navigli, Simone Paolo Ponzetto

BabelNet: The automatic construction, evaluation and application of a wide-coverage

multilingual semantic network

What is BabelNet �  ‘a very large, wide-coverage multilingual semantic network’ �  Built from WordNet and Wikipedia �  3 million concepts �  Mapping Accuracy of 82% �  Covers 52% of noun senses in WordNet �  Lemmas in 6 languages

�  72.3% of synsets contain all 6

�  WordNet �  Synset(s) for each word �  Labeled, Hierarchical relations Relations � Glosses

{play1n, drama1N, dramatic play1N} play1N is a dramatic composition1N Shakespeare1N instance-of dramatist1N Stage direction1N part-of play1N

�  WordNet �  Synset(s) for each word �  Labeled, Hierarchical relations Relations � Glosses

{play8N, child’s play2n}

�  Wikipedia �  Page title

�  Optional label

�  Structured Relations �  Redirect pages �  disambiguation pages �  internal links �  inter-language links �  categories

PLAY PLAY(THEATRE)

Building BabelNet 1.  Map Wikipedia to WordNet 2.  Harvest multilingual lexicalizations from Wikipedia 3.  Identify relations between synsets using WordNet and

Wikipedia articles from all relevant languages

Mapping Wikipedia to WordNet 1.  All available word senses and labeled relationships between them are

pulled from WordNet 2.  All encyclopedic entries and unspecified relationships between them

are pulled from English Wikipedia 3.  Combine them, and merge any intersecting senses

1.  If a sense exists in Wikipedia, but not in WordNet, the sense gets a null mapping

2.  If there is only one sense in Wikipedia and one sense in WordNet, create a one-to-one mapping

3.  For all remaining Wikipedia senses, map using the mapping algorithm:

argmaxs∈SensesWN (w )

p(sw) = argmaxs

p(s,w)

p(s,w) =score(s,w)

score(s',w')s'∈SensesWN (w ),w'∈SensesWiki (w )

Mapping Wikipedia to WordNet

p(s,w) =score(s,w)

score(s',w')s'∈SensesWN (w ),w'∈SensesWiki (w )

score(s,w) = Ctx(s)∩Ctx(w) +1

score(s,w) = e−(length( p )−1)p∈pathsWN (s,s' )∑

s'∈SensesWN (cw )∑

cw∈Ctx(w )∑

Bag-of-words Graph-based

•  Simple and fast •  Exploits structural information available in WordNet and Wikipedia • Provides mappings even when the intersection between WordNet and Wikipedia disambiguation contexts is empty.

Disambiguation Contexts

WordNet Wikipedia

�  Synonymy �  Hypernymy/hyponymy �  Gloss (lemmatized, non-

stop words)

�  Sense labels �  Links �  Redirections �  Categories

Translating Synsets Synsets included 1.  The word 2.  The WordNet synset 3.  Wikipedia redirections to the word 4.  Other language Wikipedia articles

1.  Inter-language links 2.  Redirections to the inter-language links in the target language

5.  Machine Translation using Google Translate 1.  WordNet contexts using SemCor contexts 2.  Wikipedia contexts from inter-wiki links to the page (BabelCor)

Translating Synsets �  Machine translation was not executed on named entities

from Wikipedia � These were identified by titles containing at least 2 words with

initial uppercase letters �  94% validation on a sample of 100 pages

�  Contextless translation was performed on monosemous words from WordNet.

Finding Semantic Relations �  Wikipedia:

� Harvesting relations from hyperlinks within all Wikipedias in the relevant languages �  Ex. Play (theatre) links to Acting through the German Wikipedia, but not the

English Wikipedia

�  Evaluate the frequency of co-occurrence of word1 and word2 (fw.w’) in a context.

2 × fw.w'fw + f w'

Finding Semantic Relations �  WordNet:

� Gather lemmatized, non-stop word bags of words from: �  Synonyms, gloss words, and directly linked synsets

� Weighting relations using the Dice coefficient to determine their strength

2 × S∩ S'S + S'

Evaluation: Mapping �  Gold standard of 1,000 sample wikipages

�  505 non-empty mappings

�  Experienced annotator provided the correct WordNet sense for each page �  2nd annotator sense tagged subset of 200 pages to evaluate

annotation accuracy �  Inter annotator agreement of 0.9

Evaluation: Mapping �  Bag-of-words:

�  F1 = 65.1 (just gloss) �  F1= 75.0 (using taxonomy and gloss)

�  Graph-based: �  F1 = 77.7 (depth 2, using taxonomy and gloss)

�  Baseline: �  F1 = 33.5 (most frequent sense) �  F1 = 31.9 (random selection)

Evaluation: Translation �  Automatic

� Compared coverage of BabelNet with gold standard corpora � Catalan & Spanish: Multilingual Central Repository �  French: Wordnet Libre du Français � German: GermaNet subset of EuroWordNet for German �  Italian: MultiWordNet

Evaluation: Translation

SynsetCov(B,F) =δ(SB ,SF )SF F

∑SF ∈ F{ }

WordCov(B,F) =δ'(SB ,SF )sF ∈SF

∑SF ∈F

∑sF ∈ SF : SF ∈ F{ }

WordExtraCov(B,F) =δ'(SB ,Sε )sε ∈Sε

∑Sε ∈ε \F

∑sF ∈ SF : SF ∈ F{ }€

WordExtraCov(B,F) =δ(SB ,Sε )Sε ∈ε \F

∑SF ∈ F{ }

Evaluation: Automatic Translation

Evaluation: Manual Translation �  Manual Evaluation

�  Evaluates the novel synsets � Random set of 600 Babel synsets � Manually evaluated by expert annotators � Wordnet only synsets: 72% � Wikipedia only synsets: 95% � WordNet and Wikipedia synsets: 80%

How this relates to our project �  It really doesn’t anymore �  We’re now working on topic identification and classification

of journal articles, using attributes derived from the article context.

�  BabelNet can be used for topic identification � … so they’re kind of related

BabelNet: The automatic construction, evaluation and...

Documents

Transcript of BabelNet: The automatic construction, evaluation and...

Aktionsarten, aspectual class States, Activities ...verbs.colorado.edu/~mpalmer/Ling7800/Dowty_Effects.pdfIn accord with the observations I have cited above about the role of ... of

Tiziano Flati and Roberto Navigli Three birds (in the LLOD cloud) with one stone: BabelNet, Babelfy and the Wikipedia Bitaxonomy.

BabelNet, Babelfy and Beyondaiia2014.di.unipi.it/BabelAIIA2014.pdf · ctx(Balloon (aircraft)) = { aircraft, aerostat, buoyancy, airship, …, gondola, ballooning, hydrogen, aeronautics

Word Senses, WordNet and the Ontologies Groupingsverbs.colorado.edu/~mpalmer/Ling7800/WordSenses.Sep4.pdf · WordNet – ITA 73% fine grained distinctions, 64% Tagging w/groups, ITA

ELEXIS project overview · • Now it can be linked with other dictionaries via shared concepts • LINKING DIRECTLY OR VIA BABELNET • Shared concepts end up in MATRIX dictionary

Language-Independent Concept Representations from a ...If E is a sense embedding space we map the ... Using SEW to build vector representations for BabelNet senses and Wikipedia pages:

Improving Machine Translation through Linked Data · 2.3. Linked Data Resources For the MT improvement, we are going to use three linked data resources: DB-pedia, BabelNet, and JRC-names.

Constituent structure and structural ambiguity - Verbs …verbs.colorado.edu/mpalmer/ling5200/syntax1.pdf•Intuitions and tests for constituent structure •Representing constituent

BabelNet, the LLOD cloud and the · PDF filelexinfo:translation , ... BabelNet, the LLOD cloud and the Industry 03/03/2016 39 Roberto Navigli

Bilingual Dictionary Drafting: Bootstrapping WordNet and BabelNet · 2017-09-25 · fritz.kliche@uni-hildesheim.de Motivation 400+ languages with 1 million L1-speakers or more Availability

BabelNet Workshop 2016 - Making sense of building data and building product data

BabelNet and Friends: A manifesto for multilingual ...navigli/pubs/IA_2013_Navigli.pdf · most popular collaborative and multilingual resource of world and linguistic knowledge. Resources

Thematic Roles in Linguistics - Verbs Indexverbs.colorado.edu/~mpalmer/Ling7800/ThematicRoles... · Thematic Roles in Linguistics Martha Palmer University of Colorado LING 7800/CSCI

BabelNet: The automatic construction, evaluation and ...navigli/pubs/AIJ_2012_Navigli_Ponzetto.pdf · BabelNet aims at providing an “encyclopedic dictionary” by merging WordNet

What is Machine Learning? An Introduction to ... - Verbs …verbs.colorado.edu/~mpalmer/Ling7800/ML_presentation.pdf · –The goal in machine learning is to select the model that

Bilingual Dictionary Drafting: Bootstrapping …...Bilingual Dictionary Drafting: Bootstrapping WordNet and BabelNet David Lindemann1,2, Fritz Kliche2 1 The Bilingual Mind, UPV/EHU

Statistical Dependency Parsing Korean - Verbs Indexverbs.colorado.edu/~mpalmer/Ling7800/LeePaper.pdfStatistical Dependency Parsing Korean LINGUISTICS ... High LS implies that models

YAGO2:’A’spaally’and’temporally’ enhanced’knowledge’base ...verbs.colorado.edu/~mpalmer/Ling7800/YAGO2.pdfThe’(original)’YAGO’knowledge’base’ • introduced’in’2007’

Generative Lexicon Theory: Theoretical and Empirical ...verbs.colorado.edu/~mpalmer/Ling7800/GL.pdfGenerative Lexicon Theory: Theoretical and Empirical Foundations James Pustejovsky

Outline From Tuesday Shallow Semantics: Semantic Role ...verbs.colorado.edu/~mpalmer/Ling7800/LSA-SemanticRoleLabeling.pdf · 1 Shallow Semantics: Semantic Role Labelling, and VerbNet