UWN: A Large Multilingual Lexical Knowledge Base

Posted on 19-Jun-2015


We present UWN, a large multilingual lexical knowledge base that describes the meanings and relationships of words in over 200 languages. This paper explains how link prediction, information integration and taxonomy induction methods have been used to build UWN based on WordNet and extend it with millions of named entities from Wikipedia. We additionally introduce extensions to cover lexical relationships, frame-semantic knowledge, and language data. An online interface provides human access to the data, while a software API enables applications to look up over 16 million words and names.
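Link prediction here means attaching words from other languages to WordNet meanings. The sketch below scores candidate meanings for a German word by how many of its English translations can express each meaning. Everything in it is hypothetical: the toy dictionaries `ENGLISH_SENSES` and `TRANSLATIONS`, the sense labels, and the `min_support` threshold. It is a simple voting heuristic and should not be read as UWN's actual link-prediction model.

```python
from collections import Counter

# Hypothetical toy data: English words and the meanings (senses) they can express.
# The labels mirror the poster's "school" example; they are not real WordNet IDs.
ENGLISH_SENSES = {
    "school": ["school (institution)", "school (building)", "school (group of fish)"],
    "schoolhouse": ["school (building)"],
    "shoal": ["school (group of fish)"],
}

# Hypothetical bilingual dictionary: (language, word) -> English translations.
TRANSLATIONS = {
    ("deu", "Schulhaus"): ["school", "schoolhouse"],
    ("deu", "Fischschwarm"): ["school", "shoal"],
}


def predict_links(lang, word, min_support=2):
    """Return senses supported by at least `min_support` English translations."""
    votes = Counter()
    for english in TRANSLATIONS.get((lang, word), []):
        # Each translation casts one vote for every sense it can express.
        for sense in set(ENGLISH_SENSES.get(english, [])):
            votes[sense] += 1
    return sorted(sense for sense, count in votes.items() if count >= min_support)


print(predict_links("deu", "Schulhaus"))     # ['school (building)']
print(predict_links("deu", "Fischschwarm"))  # ['school (group of fish)']
```

An ambiguous translation such as "school" spreads its vote over all its senses, so a second, less ambiguous translation is what decides the link; a learned scoring model could replace the fixed threshold.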

Transcript of UWN: A Large Multilingual Lexical Knowledge Base

Step 1: Link Prediction

UWN's Multilingual Graph

• Goal: Richer, less sparse features
• How: Model synonymy, polysemy, semantic relatedness, and taxonomy (within and across languages)
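A bag-of-words feature for Finnish "yliopisto" never matches English "university", but features derived from a shared meaning and its taxonomic ancestors do. A minimal sketch of that idea, assuming a hypothetical in-memory lexicon (`LEXICON`, `HYPERNYMS`) that stands in for UWN data; it does not use the actual UWN API.

```python
# Hypothetical lexicon: (language, word) -> meaning IDs. Words that are
# synonyms or translations of each other share a meaning ID.
LEXICON = {
    ("eng", "university"): ["university"],
    ("fin", "yliopisto"): ["university"],
    ("deu", "Bildungseinrichtung"): ["educational institution"],
    ("fin", "oppilaitos"): ["educational institution"],
}

# Hypothetical hypernym (parent) links between meanings.
HYPERNYMS = {
    "university": "educational institution",
    "educational institution": "institution",
}


def sense_features(lang, tokens, max_depth=2):
    """Expand surface tokens with meaning and hypernym features."""
    feats = set()
    for tok in tokens:
        feats.add(f"word:{lang}:{tok}")  # sparse, language-specific feature
        for sense in LEXICON.get((lang, tok), []):
            feats.add(f"sense:{sense}")  # shared across synonyms and languages
            parent, depth = HYPERNYMS.get(sense), 0
            while parent is not None and depth < max_depth:
                feats.add(f"hyper:{parent}")  # generalizes to broader classes
                parent, depth = HYPERNYMS.get(parent), depth + 1
    return feats


eng = sense_features("eng", ["university"])
fin = sense_features("fin", ["yliopisto"])
print(sorted(eng & fin))
# ['hyper:educational institution', 'hyper:institution', 'sense:university']
```

The surface features stay language-specific, while the meaning and hypernym features overlap across languages, which is what makes them less sparse.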


Gerard de Melo and Gerhard Weikum
ICSI Berkeley / Max Planck Institute for Informatics

Better NLP Features using Lexical Semantics

More information: www.lexvo.org/gdm/

• Downloadable API available

• Web User Interface

[Figure: excerpt of UWN's multilingual graph. A taxonomic path runs from Entity (por "entidade", heb "ישות", ara "وجود، كينونة") through Institution (cmn "制度", tha "สถาบัน"), Educational institution (deu "Bildungseinrichtung", fin "oppilaitos"), and University (srp "универзитете", fin "yliopisto") to University of California, Berkeley (eng "UC Berkeley", eng "Cal", eng "Berkeley", cmn "柏克萊加州大學"). The name "Berkeley" also refers to Berkeley, CA (linked to City and Geopolitical Entity) and to George Berkeley. Three meanings of "school" are distinguished: school (institution) (jpn "学校", kor "학교", lao "ໂຮງຮຽນ", kat "სკოლა", chv "шкул"), school (building) (deu "Schulgebäude", deu "Schulhaus"), and school (group of fish) (deu "Fischschwarm", ces "hejno", fra "banc"). Language nodes such as Chuvash and Georgian connect to Lexvo.org language descriptions covering languages, scripts, characters, and countries, e.g. Cyrillic (script) and Russia (country).]

• Over 16 million words and names in over 200 languages, semantically connected
• Ambiguity and synonymy captured
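The graph above can be modeled as lexicalization edges between (language, word) pairs and meaning nodes: one word with several meanings captures ambiguity, and several words sharing a meaning capture synonymy and translation. A minimal sketch; the class `LexicalGraph` and its methods are hypothetical and are not the downloadable UWN API.

```python
from collections import defaultdict


class LexicalGraph:
    """Toy multilingual graph linking words to meanings (hypothetical design)."""

    def __init__(self):
        self.meanings_of = defaultdict(set)  # (lang, word) -> meaning IDs
        self.words_of = defaultdict(set)     # meaning ID -> {(lang, word)}

    def add_lexicalization(self, lang, word, meaning):
        self.meanings_of[(lang, word)].add(meaning)
        self.words_of[meaning].add((lang, word))

    def meanings(self, lang, word):
        # More than one result means the word is ambiguous.
        return sorted(self.meanings_of.get((lang, word), ()))

    def synonyms(self, lang, word, target_lang=None):
        # Other words (optionally in one language) sharing any meaning.
        result = set()
        for meaning in self.meanings_of.get((lang, word), ()):
            for other in self.words_of[meaning]:
                if other != (lang, word) and target_lang in (None, other[0]):
                    result.add(other)
        return sorted(result)


g = LexicalGraph()
for lang, word, meaning in [
    ("eng", "school", "school (institution)"),
    ("eng", "school", "school (building)"),
    ("eng", "school", "school (group of fish)"),
    ("jpn", "学校", "school (institution)"),
    ("kor", "학교", "school (institution)"),
    ("deu", "Schulhaus", "school (building)"),
    ("deu", "Fischschwarm", "school (group of fish)"),
    ("fra", "banc", "school (group of fish)"),
]:
    g.add_lexicalization(lang, word, meaning)

print(g.meanings("eng", "school"))
# ['school (building)', 'school (group of fish)', 'school (institution)']
print(g.synonyms("deu", "Fischschwarm"))
# [('eng', 'school'), ('fra', 'banc')]
```

Keeping both directions of the mapping makes "which meanings does this word have?" and "which words express this meaning?" single dictionary lookups.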

UWN: Meaning Distinctions
• Ontological Taxonomy
• Encyclopedic Knowledge, Pictures, Video, Sounds, Maps
• Etymological and other word relationships
• Millions of Named Entities (People, Places, Proteins, Asteroids, Companies, etc.)
• 200+ languages

Link prediction tasks (Step 1):
• Link multilingual words to WordNet
• Connect Wikipedia with WordNet (equivalence and taxonomic links)

Step 2: Entity Integration
• LP (linear programming) for constraint-based computation of equivalence classes of entities
• Region-growing approximation algorithm

[Figure: entity-integration example with multilingual labels related to television: es "Televisión", es "Televisor", es "Televisores", ru "Телевизор", hi "दूरदर्शन", ja "テレビ", ja "テレビ受像機", zh "电视机", en "Television", en "Television set", en "TV set", and en "T.V."; two nodes are annotated V1,u and V1,v.]

Step 3: Taxonomy Induction
• Markov chain to rank taxonomic parents
• 270 Wikipedia taxonomies integrated with WordNet's hypernym hierarchy

Extras
• FrameNet linking
• Common-sense knowledge extraction
• Multilingual Roget's Thesaurus
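Step 2 computes entity equivalence classes with an LP and a region-growing approximation. Reproducing that needs an LP solver, so the sketch below swaps in a much simpler greedy heuristic to convey the idea: merge linked nodes in order of decreasing link weight (union-find), but veto any merge that would put two nodes declared distinct into the same class. The node names, link weights, and the distinctness rule (two different articles from the same language edition must stay apart) are assumptions for illustration, not the poster's data or algorithm.

```python
class ConstrainedUnionFind:
    """Union-find whose merges can be vetoed by pairwise 'distinct' constraints."""

    def __init__(self, nodes):
        self.parent = {n: n for n in nodes}
        self.members = {n: {n} for n in nodes}  # root -> all nodes in its class

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def try_union(self, a, b, distinct):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return True
        # Veto the merge if any node pair across the two classes must stay apart.
        if any(frozenset((x, y)) in distinct
               for x in self.members[ra] for y in self.members[rb]):
            return False
        self.parent[rb] = ra
        self.members[ra] |= self.members.pop(rb)
        return True


def integrate(nodes, weighted_links, distinct):
    """Greedily merge linked nodes, strongest links first, respecting `distinct`."""
    uf = ConstrainedUnionFind(nodes)
    for a, b, _weight in sorted(weighted_links, key=lambda link: -link[2]):
        uf.try_union(a, b, distinct)
    classes = {}
    for n in nodes:
        classes.setdefault(uf.find(n), set()).add(n)
    return sorted(sorted(c) for c in classes.values())


# Hypothetical cross-lingual links; the weights are made up for illustration.
nodes = ["en:Television", "en:Television set", "es:Televisión", "es:Televisor", "ru:Телевизор"]
links = [
    ("es:Televisión", "en:Television", 0.9),
    ("es:Televisor", "en:Television set", 0.9),
    ("ru:Телевизор", "en:Television set", 0.8),
    ("ru:Телевизор", "en:Television", 0.3),  # noisy link that would conflate two entities
]
# Assumed constraint: different articles from the same language edition are distinct.
distinct = {
    frozenset(("en:Television", "en:Television set")),
    frozenset(("es:Televisión", "es:Televisor")),
}
print(integrate(nodes, links, distinct))
# [['en:Television', 'es:Televisión'], ['en:Television set', 'es:Televisor', 'ru:Телевизор']]
```

Rejecting the weak 0.3 link is what keeps "Television" and "Television set" apart here; the greedy order makes this decision locally, whereas an LP formulation weighs all links and constraints jointly.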
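Step 3 ranks taxonomic parents with a Markov chain. The poster does not give the transition model, so this sketch assumes a PageRank-style random walk with restart: from each node the walker follows a weighted candidate "is-a" edge, or with probability `1 - damping` jumps back to the entity being classified, and nodes without parents also send their mass back to it. Stationary probabilities from power iteration then rank the candidates. The graph, weights, and `damping` value are hypothetical.

```python
def rank_parents(edges, start, damping=0.85, iterations=100):
    """Rank nodes reachable from `start` by a random walk with restart.

    edges: dict mapping a node to a list of (candidate parent, weight) pairs.
    """
    nodes = {start} | set(edges) | {p for outs in edges.values() for p, _ in outs}
    prob = {n: 0.0 for n in nodes}
    prob[start] = 1.0
    for _ in range(iterations):
        nxt = {n: 0.0 for n in nodes}
        nxt[start] += 1.0 - damping  # restart: jump back to the entity
        for node, p in prob.items():
            outs = edges.get(node, [])
            total = sum(w for _, w in outs)
            if total == 0:
                nxt[start] += damping * p  # no parents (e.g. a root): restart
                continue
            for parent, w in outs:
                nxt[parent] += damping * p * w / total
        prob = nxt
    ranked = sorted((n for n in nodes if n != start), key=lambda n: -prob[n])
    return [(n, round(prob[n], 3)) for n in ranked]


# Hypothetical candidate 'is-a' links for one Wikipedia entity; weights stand
# in for how much evidence supports each link.
edges = {
    "UC Berkeley": [("university", 2.0), ("city", 1.0)],
    "university": [("educational institution", 1.0)],
    "city": [("geopolitical entity", 1.0)],
}
ranking = rank_parents(edges, "UC Berkeley")
print([name for name, _ in ranking])
# ['university', 'educational institution', 'city', 'geopolitical entity']
```

The walk also scores ancestors further up; to choose a direct parent, one would compare only the immediate candidates, here "university" over "city".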