Named Entity Disambiguation on an Ontology Enriched by Wikipedia
Hien Thanh Nguyen¹, Tru Hoang Cao²
¹ Ton Duc Thang University, Vietnam
² Ho Chi Minh City University of Technology, Vietnam
International IEEE Conference - RIVF’08
Outline
• Introduction
• Background
• Approach
• Evaluation
• Conclusion
Introduction
• Most Web pages present no explicit semantic information about data and objects
• The Semantic Web aims to solve this problem by making semantic metadata available in web page content
– Ex: the entity “John McCarthy” pointing to the homepage of the inventor of the Lisp programming language
– Entity disambiguation
Introduction - Entity disambiguation
• Entity disambiguation is the process of identifying when different references correspond to the same real-world entity (Jorge Cardoso and Amit Sheth)
• Our work aims at detecting named entities in a text and linking them to a given ontology
Introduction - What are Named Entities?
• Named Entities (NEs) include people, organizations, locations, dates, times, money, measures, percentages, etc.
• Example
“Ms. Washington's candidacy is being championed by several powerful lawmakers including her boss, Chairman John Dingell (D., Mich.) of the House Energy and Commerce Committee.”
Introduction – Basic problem in NE
• Many NEs share the same name
– Ambiguity of NE types: John Smith (company vs. person), May (person vs. month), Washington (person vs. location), etc.
– Ambiguity of referent (e.g. Paris may be the capital of France or a small town in Texas)
Introduction - Our contributions are two-fold
• Utilizing ontological concepts and properties of instances in a specific KB to automatically generate a corpus of labeled training data
• Exploiting Wikipedia to enrich the training data with new and informative features
• Exploring a range of features extracted from texts, a KB, and Wikipedia
Background - Ontology
• An ontology schema defines a taxonomy of classes and properties (relations and attributes)
• A knowledge base contains semantic descriptions, including attributes and relations, of named entities in the real world
Background - Wikipedia
• Each article defines an entity or a concept
• Four sources of information
– Title
– Redirect titles
– Categories
– Hyperlinks
• Outlinks vs. Inlinks
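The outlink/inlink distinction can be illustrated with a small sketch: an article's outlinks are the articles it links to, and its inlinks are the articles that link to it, so inlinks can be derived by inverting the outlink map. The function name `invert_links`, the in-memory dictionary layout, and the sample titles are assumptions made for illustration; they are not taken from the slides.

```python
from collections import defaultdict

def invert_links(outlinks):
    """Derive inlinks (articles linking to me) from outlinks (articles I link to)."""
    inlinks = defaultdict(set)
    for source, targets in outlinks.items():
        inlinks[source]  # make sure every article appears, even with no inlinks
        for target in targets:
            inlinks[target].add(source)
    return dict(inlinks)

# Hypothetical toy link graph (illustrative titles only).
outlinks = {
    "John McCarthy (computer scientist)": {
        "Lisp (programming language)",
        "Stanford University",
    },
    "Lisp (programming language)": {"John McCarthy (computer scientist)"},
    "Stanford University": set(),
}

inlinks = invert_links(outlinks)
```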
Approach
• Exploiting terms (i.e. base noun phrases) and named entities co-occurring with an ambiguous name for disambiguation
• Casting the problem as a ranking problem
– Using TFIDF to calculate similarity and choose the candidate with the highest score
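A minimal sketch of the ranking idea follows. The slides only say that TFIDF is used to calculate similarity; the cosine measure, the smoothed IDF formula, the choice to compute IDF over the context plus all candidate snippets, and every name and token list below are assumptions made for illustration.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Turn tokenized documents into sparse TF-IDF vectors (term -> weight)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    # Smoothed IDF (an assumption): shared terms keep a small positive weight.
    idf = {t: math.log((1 + n) / (1 + d)) + 1.0 for t, d in df.items()}
    return [{t: c * idf[t] for t, c in Counter(doc).items()} for doc in docs]

def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def rank_candidates(context, candidates):
    """Return (candidate_id, score) pairs sorted by descending similarity."""
    ids = list(candidates)
    vectors = tfidf_vectors([context] + [candidates[i] for i in ids])
    query, rest = vectors[0], vectors[1:]
    scored = [(i, cosine(query, v)) for i, v in zip(ids, rest)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Hypothetical context terms and candidate snippets (already tokenized).
context = ["lisp", "artificial", "intelligence", "stanford"]
candidates = {
    "McCarthy_computer_scientist": ["lisp", "stanford", "artificial", "intelligence", "professor"],
    "McCarthy_referee": ["football", "referee", "league"],
}

ranking = rank_candidates(context, candidates)
best = ranking[0][0]  # the candidate with the highest score is chosen
```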
Approach
• Constructing the corpus
– Utilizing classes and properties to generate a snippet for each instance in an ontology
– Generating features to enrich the representation of those instances
• Analyzing a text for disambiguation and identification of NEs occurring therein
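The corpus construction step can be sketched roughly as follows, assuming each ontology instance is available as a dictionary with a class, aliases, and property values. That layout, the property names `locatedIn` and `subRegionOf`, and the helper names `make_snippet` and `enrich_snippet` are hypothetical; the slides do not specify how snippets are actually built.

```python
def make_snippet(instance):
    """Flatten an instance's class, aliases, and property values into tokens."""
    parts = [instance["class"]]
    parts.extend(instance.get("aliases", []))
    for values in instance.get("properties", {}).values():
        parts.extend(values)
    return " ".join(parts).lower().split()

def enrich_snippet(snippet, extra_features):
    """Append tokens from extra features (e.g. Wikipedia categories) to a snippet."""
    return snippet + " ".join(extra_features).lower().split()

# Hypothetical instance; the class and property names are illustrative only.
paris = {
    "class": "City",
    "aliases": ["Paris"],
    "properties": {"locatedIn": ["France"], "subRegionOf": ["Europe"]},
}

snippet = make_snippet(paris)
enriched = enrich_snippet(snippet, ["Capitals in Europe"])
```

Keeping the snippet as a plain token list means it can be fed directly into a bag-of-words similarity measure.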
Approach - Construct corpus
Approach – Disambiguation process
• For each ambiguous name
– Looking up candidates
– Extracting base noun phrases in the same sentence and in the headline
– Extracting named entities in the whole text
– Using TFIDF to rank and choose the candidate with the highest score
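The first three steps above might look like the sketch below. It assumes that base noun phrases and named entities have already been extracted by some external NLP tool and are passed in as plain lists; the alias index, the function names, and the sample data are all hypothetical. The resulting context is what a similarity measure would then compare against each candidate.

```python
from collections import defaultdict

def build_name_index(instances):
    """Map each lowercased alias to the ids of the instances that carry it."""
    index = defaultdict(list)
    for inst_id, aliases in instances.items():
        for alias in aliases:
            index[alias.lower()].append(inst_id)
    return index

def build_context(sentence_nps, headline_nps, text_nes, ambiguous_name):
    """Collect noun phrases from the sentence and headline plus NEs from the
    whole text, leaving out the ambiguous name itself."""
    terms = list(sentence_nps) + list(headline_nps) + list(text_nes)
    name = ambiguous_name.lower()
    return [t.lower() for t in terms if t.lower() != name]

# Hypothetical knowledge-base aliases.
instances = {
    "Georgia_US_state": ["Georgia", "State of Georgia"],
    "Georgia_country": ["Georgia", "Republic of Georgia"],
    "Atlanta": ["Atlanta"],
}

index = build_name_index(instances)
candidates = index["georgia"]  # candidate lookup for the ambiguous name
context = build_context(
    sentence_nps=["the governor"],
    headline_nps=["state budget"],
    text_nes=["Atlanta", "Georgia"],
    ambiguous_name="Georgia",
)
```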
Approach – An example
Evaluation
• Using the KIM Ontology
• 140 news articles from several news agencies
• Focusing on four names: John McCarthy, John Williams, Georgia, and Columbia
• Accuracy is measured as the number of correct assignments of NEs (in text) to ontology instances divided by the total number of assignments
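The accuracy measure can be written out as a short function. The mention identifiers and labels below are made-up toy data for illustration only; they are not results from the evaluation.

```python
def accuracy(assignments, gold):
    """Correct NE-to-instance assignments divided by all assignments made."""
    if not assignments:
        return 0.0
    correct = sum(
        1 for mention, instance in assignments.items()
        if gold.get(mention) == instance
    )
    return correct / len(assignments)

# Toy data (hypothetical): mention id -> ontology instance id.
assignments = {
    "doc1:Georgia": "Georgia_US_state",
    "doc2:Georgia": "Georgia_country",
    "doc3:Columbia": "Columbia_University",
    "doc4:Columbia": "Columbia_University",
}
gold = {
    "doc1:Georgia": "Georgia_US_state",
    "doc2:Georgia": "Georgia_country",
    "doc3:Columbia": "Columbia_University",
    "doc4:Columbia": "Columbia_South_Carolina",
}

score = accuracy(assignments, gold)  # 3 correct out of 4 assignments
```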
Conclusion
• Our approach is natural and resembles how humans disambiguate, relying on co-occurring NEs and terms to resolve other ambiguous entities in a given context
• Wikipedia editions are currently available for approximately 200 languages, so our method can be used to build NE disambiguation systems for a large number of languages
• Features from Wikipedia and NEs in the whole text are meaningful evidence for disambiguation
• Future work: detecting NEs that are not in the ontology, and investigating other similarity metrics
Thanks for your attention!