Wikipedia Knowledge Extraction
◦ Pronoun Resolution module
◦ Infobox extraction
◦ SRL parsing
◦ Improved refinement
◦ Clustering
◦ Hadoop compatibility
Pronoun Resolution module
“His mother wanted him to get a good education so she sent him to live with his grandparents in Honolulu, HI” (Barack Obama)
Current solution: replace pronouns with the article title (very primitive)
Target solution:
◦ Pronoun resolution is still an open research problem
◦ Use an existing system that is usually correct?
◦ Simple rules for common patterns?
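The current solution can be sketched as follows. This is a minimal illustration, not the project's code: the function name `replace_pronouns` and the two pronoun lists are assumptions, and "her" is treated as possessive only, which is itself a simplification.

```python
import re

# Hypothetical pronoun lists; a real module would need a fuller inventory.
PERSONAL = {"he", "she", "him", "it", "they", "them"}
POSSESSIVE = {"his", "her", "its", "their"}  # "her" is ambiguous; treated as possessive here


def replace_pronouns(sentence, title):
    """Replace every pronoun with the article title, regardless of its referent."""
    def swap(match):
        word = match.group(0).lower()
        if word in POSSESSIVE:
            return title + "'s"
        if word in PERSONAL:
            return title
        return match.group(0)

    return re.sub(r"\b[A-Za-z]+\b", swap, sentence)


sentence = ("His mother wanted him to get a good education so she sent him "
            "to live with his grandparents in Honolulu, HI")
print(replace_pronouns(sentence, "Barack Obama"))
```

On the example sentence, "she" is also replaced with "Barack Obama" even though it refers to his mother, which shows why this approach is called very primitive.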
Infobox extraction
Convert information into simple sentences:
◦ Joe Biden is Barack Obama’s Vice President
◦ Barack Obama is preceded by George W. Bush
Use the type of phrase (Noun Phrase, Verb Phrase) to determine which sentence to form.
Read papers from the Turing Center (University of Washington)
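One way to choose the sentence form from the phrase type is sketched below. It assumes infobox fields arrive as a label-to-value mapping, and it guesses "verb phrase" from a trailing preposition rather than from a real parser. The names `is_verb_phrase`, `infobox_to_sentences`, and `VERB_PHRASE_ENDINGS` are hypothetical.

```python
# Hypothetical heuristic: a label ending in a preposition ("preceded by")
# is treated as a verb phrase; anything else is treated as a noun phrase.
VERB_PHRASE_ENDINGS = {"by", "in", "at", "to", "from", "with"}


def is_verb_phrase(label):
    tokens = label.lower().split()
    return bool(tokens) and tokens[-1] in VERB_PHRASE_ENDINGS


def infobox_to_sentences(title, infobox):
    """Turn each infobox field into one simple sentence."""
    sentences = []
    for label, value in infobox.items():
        if is_verb_phrase(label):
            # Verb phrase: "<title> is <label> <value>."
            sentences.append(f"{title} is {label} {value}.")
        else:
            # Noun phrase: "<value> is <title>'s <label>."
            sentences.append(f"{value} is {title}'s {label}.")
    return sentences


fields = {"Vice President": "Joe Biden", "preceded by": "George W. Bush"}
for sentence in infobox_to_sentences("Barack Obama", fields):
    print(sentence)
# Joe Biden is Barack Obama's Vice President.
# Barack Obama is preceded by George W. Bush.
```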
SRL parsing
Semantic role labeling (SRL) performs a deep analysis of each sentence.
E.g. “Yoshi has a long tongue which he uses to grab enemies and eat them.”
◦ has (A0: Yoshi, A1: long tongue)
◦ use (A0: Yoshi, A1: long tongue, A2: grab enemies and eat them)
Use SRL parsing to improve the quality and representation of knowledge.
Problem: speed and complexity
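The frames in the Yoshi example could be held in a small data structure and flattened into (subject, verb, object) tuples. This is a sketch only: the frames are typed in by hand from the example rather than produced by an SRL parser, and `Frame` and `frame_to_tuples` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Frame:
    """One predicate with its labelled arguments (A0, A1, A2, ...)."""
    predicate: str
    roles: dict


def frame_to_tuples(frame):
    """Pair the A0 argument with every other argument of the predicate."""
    subject = frame.roles.get("A0")
    if subject is None:
        return []
    return [
        (subject, frame.predicate, argument)
        for label, argument in sorted(frame.roles.items())
        if label != "A0"
    ]


# Frames copied by hand from the example sentence above.
frames = [
    Frame("has", {"A0": "Yoshi", "A1": "long tongue"}),
    Frame("use", {"A0": "Yoshi", "A1": "long tongue",
                  "A2": "grab enemies and eat them"}),
]
for frame in frames:
    print(frame_to_tuples(frame))
```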
Improved refinement
Current system has Subject, Object, Verb tuples
Problem: hard to define which words to incorporate in each phrase
E.g. ‘The dog ( Canis lupus familiaris )’ ‘is’ ‘a mammal from the family Canidae’
◦ The dog? dog? The dog ( Canis lupus familiaris )?
◦ a mammal? a mammal from the family Canidae?
Possible solutions:
◦ Different levels of information?
◦ Simple rules based on part-of-speech tags?
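The "different levels of information" idea might look like the sketch below, which derives progressively shorter versions of a phrase. Small word lists stand in for part-of-speech tags here, which is an assumption made to keep the example self-contained. `phrase_levels`, `DETERMINERS`, and `PREPOSITIONS` are hypothetical names.

```python
import re

# Stand-ins for part-of-speech tags; a real system would use a tagger.
DETERMINERS = {"the", "a", "an"}
PREPOSITIONS = {"from", "of", "in", "on", "with", "for", "to", "by", "at"}


def phrase_levels(phrase):
    """Return versions of a phrase from most detailed to least detailed."""
    full = phrase.strip()
    # Level 2: drop parenthetical material such as "( Canis lupus familiaris )".
    no_paren = re.sub(r"\s*\([^)]*\)", "", full).strip()
    # Level 3: cut the phrase at its first preposition.
    head = []
    for token in no_paren.split():
        if token.lower() in PREPOSITIONS and head:
            break
        head.append(token)
    no_pp = " ".join(head)
    # Level 4: drop a leading determiner.
    if len(head) > 1 and head[0].lower() in DETERMINERS:
        bare = " ".join(head[1:])
    else:
        bare = no_pp
    levels = [full]
    for candidate in (no_paren, no_pp, bare):
        if candidate and candidate not in levels:
            levels.append(candidate)
    return levels


print(phrase_levels("The dog ( Canis lupus familiaris )"))
# ['The dog ( Canis lupus familiaris )', 'The dog', 'dog']
print(phrase_levels("a mammal from the family Canidae"))
# ['a mammal from the family Canidae', 'a mammal', 'mammal']
```

Keeping every level, rather than picking one, leaves the choice of granularity to whatever consumes the tuples.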
Clustering
Idea: Determine whether two separate mentions point to the same concept
◦ ‘The dog’, ‘a dog’, ‘dogs’
◦ ‘Cats’, ‘C.A.T.S’, ‘CAT Scan’
◦ ‘President Obama’, ‘President Barack Obama’
Possible solutions:
◦ Feature-based classification
◦ Self-organizing map
◦ Associated terms
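A feature-based approach could start from surface features of a mention pair, as in this sketch. The features, the naive plural stripping, and the 0.6 threshold are all illustrative assumptions, not a trained classifier. `normalize`, `features`, and `same_concept` are hypothetical names.

```python
DETERMINERS = {"the", "a", "an"}


def normalize(mention):
    """Lowercase, drop periods and determiners, strip a naive plural 's'."""
    tokens = mention.lower().replace(".", "").split()
    tokens = [t for t in tokens if t not in DETERMINERS]
    return [t[:-1] if len(t) > 3 and t.endswith("s") else t for t in tokens]


def features(a, b):
    """Surface features for a pair of mentions."""
    ta, tb = set(normalize(a)), set(normalize(b))
    union = ta | tb
    jaccard = len(ta & tb) / len(union) if union else 0.0
    return {
        "jaccard": jaccard,                   # token overlap
        "subset": ta <= tb or tb <= ta,       # one mention contains the other
        "exact": ta == tb,                    # identical after normalization
    }


def same_concept(a, b, threshold=0.6):
    """Hypothetical decision rule: enough token overlap means same concept."""
    return features(a, b)["jaccard"] >= threshold


print(same_concept("The dog", "dogs"))                                  # True
print(same_concept("President Obama", "President Barack Obama"))        # True
print(same_concept("Cats", "CAT Scan"))                                 # False
```

Surface features alone also merge ‘Cats’ and ‘C.A.T.S’, since both normalize to the same token. Pairs like this would need additional evidence, such as associated terms.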
Hadoop compatibility
Need to ensure scaling is possible for the move to regular Wikipedia
Hadoop is an open-source implementation of the MapReduce programming model
MapReduce parallelizes a computation by splitting the input across several machines (map) and then combining the intermediate results by key (reduce)
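The map and reduce steps can be sketched in a single process as below. This only simulates the model locally and does not use the Hadoop API. The job shown (counting extracted tuples per subject) and the names `run_job`, `map_subject`, and `reduce_count` are assumptions for illustration.

```python
from collections import defaultdict
from itertools import chain


def map_subject(record):
    """Map step: emit (subject, 1) for one (subject, verb, object) tuple."""
    subject, _verb, _obj = record
    yield subject, 1


def reduce_count(key, values):
    """Reduce step: sum the counts emitted for one key."""
    return key, sum(values)


def run_job(records, mapper, reducer):
    """Local stand-in for a MapReduce job: map, group by key, reduce."""
    # On a cluster, each machine would run the mapper on its own input split.
    mapped = chain.from_iterable(mapper(record) for record in records)
    grouped = defaultdict(list)
    for key, value in mapped:
        grouped[key].append(value)  # the "shuffle": collect values per key
    return dict(reducer(key, values) for key, values in grouped.items())


tuples = [
    ("Yoshi", "has", "long tongue"),
    ("Yoshi", "use", "long tongue"),
    ("The dog", "is", "a mammal"),
]
print(run_job(tuples, map_subject, reduce_count))
# {'Yoshi': 2, 'The dog': 1}
```

Because the mapper looks at one record at a time and the reducer at one key at a time, the same two functions could in principle be distributed without sharing state.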