UWN: A Large Multilingual Lexical Knowledge Base
Gerard de Melo and Gerhard Weikum
ICSI Berkeley / Max Planck Institute for Informatics

Better NLP Features using Lexical Semantics
• Goal: Richer, Less Sparse Features
• How: Model Synonymy, Polysemy, Semantic Relatedness, and Taxonomy (within and across languages)

More Information: www.lexvo.org/gdm/
• Downloadable API available
• Web User Interface

UWN's Multilingual Graph
• Over 16 million words and names in over 200 languages semantically connected
• Ambiguity and synonymy captured

[Figure: Excerpt of UWN's multilingual graph. Concept nodes such as Entity, Institution, Educational Institution, University, University of California, Berkeley, City, and Geopolitical Entity carry words in many languages, e.g. por "entidade", heb "ישות", ara "وجود، كينونة" (entity); cmn "制度", tha "สถาบัน" (institution); deu "Bildungseinrichtung", fin "oppilaitos" (educational institution); fin "yliopisto", srp "универзитете" (university); eng "UC Berkeley", eng "Cal", cmn "柏克萊加州大學" (UC Berkeley). The ambiguous eng "Berkeley" is linked to University of California, Berkeley; Berkeley, CA; and George Berkeley. "School" is split into school (institution); school (building), with deu "Schulgebäude" and "Schulhaus"; and school (group of fish), with deu "Fischschwarm", ces "hejno", and fra "banc". Further words for school include jpn "学校", kor "학교", lao "ໂຮງຮຽນ", kat "სკოლა", and chv "шкул". Lexvo.org language descriptions (languages, scripts, characters, countries) contribute nodes such as Chuvash, Georgian, Cyrillic (script), and Russia (country).]

UWN: Meaning Distinctions
• Ontological Taxonomy
• Encyclopedic Knowledge, Pictures, Video, Sounds, Maps
• Etymological and Other Word Relationships
• Millions of Named Entities (People, Places, Proteins, Asteroids, Companies, etc.)
• 200+ Languages

Step 1: Link Prediction
• Link multilingual words to WordNet
• Connect Wikipedia with WordNet (equivalence and taxonomic links)

Step 2: Entity Integration
• LP (linear programming) for constraint-based computation of equivalence classes of entities
• Region-growing approximation algorithm

[Figure: Multilingual entries about television, including en "Television", en "Television set", en "TV set", en "T.V.", es "Televisión", es "Televisor", ru "Телевизор", hi "दूरदर्शन", ja "テレビ", ja "テレビ受像機", and zh "电视机"; two graph nodes are labeled V1,u and V1,v.]

Step 3: Taxonomy Induction
• Markov Chain to rank taxonomic parents
• 270 Wikipedia taxonomies integrated with WordNet's hypernym hierarchy

Extras
• FrameNet Linking
• Common-Sense Knowledge Extraction
• Multilingual Roget's Thesaurus
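The graph bullets above ("ambiguity and synonymy captured") can be made concrete with a toy word-sense graph. This is a minimal sketch under stated assumptions: the class `LexicalGraph`, its methods, and the sense IDs are hypothetical and are not UWN's downloadable API, whose interface the poster does not describe. Words are `(language, lemma)` pairs; polysemy is a word with several senses, and synonymy or translation is two words sharing a sense.

```python
from collections import defaultdict


class LexicalGraph:
    """Hypothetical toy word-sense graph (not UWN's actual data format or API)."""

    def __init__(self):
        self.word_to_senses = defaultdict(set)  # (lang, lemma) -> sense IDs
        self.sense_to_words = defaultdict(set)  # sense ID -> (lang, lemma) words

    def add(self, lang, lemma, sense):
        """Record that the word (lang, lemma) can express `sense`."""
        word = (lang, lemma)
        self.word_to_senses[word].add(sense)
        self.sense_to_words[sense].add(word)

    def senses(self, lang, lemma):
        """Polysemy: every sense the word can express (empty set if unknown)."""
        return set(self.word_to_senses.get((lang, lemma), set()))

    def synonyms(self, lang, lemma, target_lang=None):
        """Synonyms/translations: other words sharing at least one sense."""
        word = (lang, lemma)
        result = set()
        for sense in self.word_to_senses.get(word, set()):
            for other in self.sense_to_words[sense]:
                if other != word and (target_lang is None or other[0] == target_lang):
                    result.add(other)
        return result


if __name__ == "__main__":
    g = LexicalGraph()
    for sense in ("school(institution)", "school(building)", "school(group of fish)"):
        g.add("eng", "school", sense)
    g.add("deu", "Schulhaus", "school(building)")
    g.add("deu", "Fischschwarm", "school(group of fish)")
    g.add("fra", "banc", "school(group of fish)")
    print(sorted(g.senses("eng", "school")))
    # ['school(building)', 'school(group of fish)', 'school(institution)']
    print(sorted(g.synonyms("eng", "school", target_lang="deu")))
    # [('deu', 'Fischschwarm'), ('deu', 'Schulhaus')]
```

Keeping both directions (word to senses and sense to words) makes both queries a couple of dictionary lookups; a real resource of 16 million words would need a persistent store rather than in-memory dicts.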
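Step 1 links multilingual words to WordNet, but the poster does not say how candidate links are scored. Purely as an illustration, the hypothetical `rank_senses` function below uses a simple translation-voting heuristic, which is an assumption and not the authors' model: each distinct English translation of a non-English word votes for every WordNet-style sense it can express, so a second translation can disambiguate the first.

```python
from collections import Counter


def rank_senses(translations, english_senses):
    """Rank candidate WordNet-style senses for a non-English word.

    translations: English translations of the word (hypothetical input,
        e.g. from a bilingual dictionary).
    english_senses: {english_lemma: set of sense IDs} (hypothetical input).
    Each distinct translation votes once for every sense it can express; the
    score is the fraction of translations voting for that sense.
    Returns (sense, score) pairs, best first; ties are broken by sense ID.
    """
    unique = set(translations)
    votes = Counter()
    for lemma in unique:
        for sense in english_senses.get(lemma, ()):
            votes[sense] += 1
    n = len(unique) or 1  # avoid dividing by zero for empty input
    return sorted(((s, c / n) for s, c in votes.items()),
                  key=lambda pair: (-pair[1], pair[0]))


if __name__ == "__main__":
    # Toy sense inventory; the sense IDs are made up for this example.
    english_senses = {
        "school": {"school(institution)", "school(building)", "school(group of fish)"},
        "shoal": {"school(group of fish)", "shoal(shallow water)"},
    }
    # deu "Fischschwarm" translates as both "school" and "shoal".
    print(rank_senses(["school", "shoal"], english_senses)[0])
    # ('school(group of fish)', 1.0)
```

A plain vote ignores how reliable each translation is; a learned scorer could weight such evidence, but that is beyond this sketch.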
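Step 2 computes equivalence classes of entities under constraints using an LP and a region-growing approximation. Solving an LP does not fit a short standard-library example, so the hypothetical `constrained_merge` below swaps in a simpler greedy union-find heuristic. It is explicitly not the poster's algorithm; it only shows the shape of the problem: merge entries along trusted cross-lingual links while never joining a pair that must stay distinct (for example, two articles from the same Wikipedia edition). The `lang:Title` node names are assumptions.

```python
def constrained_merge(nodes, links, distinct):
    """Greedily group nodes into equivalence classes under "must differ" constraints.

    nodes: hashable IDs, e.g. "en:Television" (hypothetical naming scheme).
    links: (weight, u, v) triples; higher weight = more trusted link.
    distinct: (u, v) pairs that must never end up in the same class.
    Returns a list of sets, one per equivalence class.
    """
    parent = {n: n for n in nodes}
    members = {n: {n} for n in parent}       # class root -> its members
    forbidden = {n: set() for n in parent}   # class root -> nodes it must avoid
    for u, v in distinct:
        forbidden[u].add(v)
        forbidden[v].add(u)

    def find(x):
        # Follow parent pointers to the class root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Consider the most trusted links first.
    for _, u, v in sorted(links, key=lambda link: -link[0]):
        ru, rv = find(u), find(v)
        if ru == rv:
            continue
        # Forbidden sets are kept symmetric, so one check covers both sides.
        if forbidden[ru] & members[rv]:
            continue
        parent[rv] = ru
        members[ru] |= members.pop(rv)
        forbidden[ru] |= forbidden.pop(rv)
    return list(members.values())


if __name__ == "__main__":
    nodes = ["en:Television", "en:Television set", "es:Televisión",
             "es:Televisor", "ja:テレビ"]
    links = [(0.9, "es:Televisión", "en:Television"),
             (0.8, "es:Televisor", "en:Television set"),
             (0.7, "ja:テレビ", "en:Television"),
             (0.3, "es:Televisor", "en:Television")]  # weak link, gets rejected
    distinct = [("en:Television", "en:Television set")]
    for cls in constrained_merge(nodes, links, distinct):
        print(sorted(cls))
```

Unlike an LP-based method, a greedy pass can lock in an early wrong merge; it serves only to show how equivalence classes and distinctness constraints interact.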
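Step 3 uses a Markov chain to rank taxonomic parents. The poster gives no details, so the sketch below assumes a PageRank-style random walk: from each node the walker moves to a candidate parent with probability proportional to an edge weight, occasionally jumping to a random node, and power iteration approximates the stationary distribution. The functions `rank_parents` and `best_parent`, the damping value, and the category names in the example are all hypothetical.

```python
def rank_parents(edges, damping=0.85, iters=100):
    """Score taxonomy nodes by the stationary distribution of a random walk.

    edges: {node: {candidate_parent: weight}} (must not be empty). From a
    node, the walker moves to a candidate parent with probability
    proportional to its weight; with probability 1 - damping it instead
    jumps to a uniformly random node. Nodes without outgoing edges spread
    their mass uniformly. Returns (node, score) pairs, highest score first.
    """
    nodes = set(edges)
    for parents in edges.values():
        nodes.update(parents)
    nodes = sorted(nodes)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):  # power iteration
        new = {v: (1.0 - damping) / n for v in nodes}
        for u in nodes:
            out = edges.get(u, {})
            total = sum(out.values())
            if total == 0:
                for v in nodes:
                    new[v] += damping * rank[u] / n
            else:
                for v, w in out.items():
                    new[v] += damping * rank[u] * w / total
        rank = new
    return sorted(rank.items(), key=lambda kv: (-kv[1], kv[0]))


def best_parent(edges, node, **kwargs):
    """Pick the highest-scoring candidate parent of `node`."""
    scores = dict(rank_parents(edges, **kwargs))
    return max(edges[node], key=scores.__getitem__)


if __name__ == "__main__":
    # Hypothetical candidate parents drawn from several category systems.
    edges = {
        "en:Television set": {"Electronic device": 2.0, "Broadcasting": 1.0},
        "es:Televisor": {"Electronic device": 1.0},
        "ja:テレビ受像機": {"Electronic device": 1.0, "Home appliance": 1.0},
        "Electronic device": {"Device": 1.0},
    }
    print(best_parent(edges, "en:Television set"))  # Electronic device
```

Because probability mass from many language editions flows into shared candidates, a parent supported across editions outranks one proposed by a single source; comparing only a node's own candidates avoids simply picking the taxonomy root.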