![Page 1: TEXTRUNNER Turing Center Computer Science and Engineering University of Washington Reporter: Yi-Ting Huang Date: 2009/9/4 1.Banko, M., Cafarella, M. J.,](https://reader036.fdocuments.us/reader036/viewer/2022070411/56649c7d5503460f94932bf4/html5/thumbnails/1.jpg)
TEXTRUNNER
Turing Center
Computer Science and Engineering
University of Washington
Reporter: Yi-Ting Huang
Date: 2009/9/4
1. Banko, M., Cafarella, M. J., Soderland, S., Broadhead, M., & Etzioni, O. (2007). Open Information Extraction from the Web. Proceedings of the 20th International Joint Conference on Artificial Intelligence (IJCAI 2007).
2. Cafarella, M. J., Banko, M., & Etzioni, O. (2006). Relational Web Search. UW CSE Tech Report 2006-04-02.
3. Yates, A., & Etzioni, O. (2007). Unsupervised Resolution of Objects and Relations on the Web. Proceedings of NAACL HLT 2007.
![Page 2: TEXTRUNNER Turing Center Computer Science and Engineering University of Washington Reporter: Yi-Ting Huang Date: 2009/9/4 1.Banko, M., Cafarella, M. J.,](https://reader036.fdocuments.us/reader036/viewer/2022070411/56649c7d5503460f94932bf4/html5/thumbnails/2.jpg)
2
PART 1. Query
• Relationship Queries
• Factoid Queries
• Qualified List Queries
• Unnamed-Item Queries
![Page 3: TEXTRUNNER Turing Center Computer Science and Engineering University of Washington Reporter: Yi-Ting Huang Date: 2009/9/4 1.Banko, M., Cafarella, M. J.,](https://reader036.fdocuments.us/reader036/viewer/2022070411/56649c7d5503460f94932bf4/html5/thumbnails/3.jpg)
3
Relationship Queries
![Page 4: TEXTRUNNER Turing Center Computer Science and Engineering University of Washington Reporter: Yi-Ting Huang Date: 2009/9/4 1.Banko, M., Cafarella, M. J.,](https://reader036.fdocuments.us/reader036/viewer/2022070411/56649c7d5503460f94932bf4/html5/thumbnails/4.jpg)
4
Factoid Queries
![Page 5: TEXTRUNNER Turing Center Computer Science and Engineering University of Washington Reporter: Yi-Ting Huang Date: 2009/9/4 1.Banko, M., Cafarella, M. J.,](https://reader036.fdocuments.us/reader036/viewer/2022070411/56649c7d5503460f94932bf4/html5/thumbnails/5.jpg)
5
Qualified List Queries
![Page 6: TEXTRUNNER Turing Center Computer Science and Engineering University of Washington Reporter: Yi-Ting Huang Date: 2009/9/4 1.Banko, M., Cafarella, M. J.,](https://reader036.fdocuments.us/reader036/viewer/2022070411/56649c7d5503460f94932bf4/html5/thumbnails/6.jpg)
6
PART 2. Retrieval
Tn = (ei, r, ej)
![Page 7: TEXTRUNNER Turing Center Computer Science and Engineering University of Washington Reporter: Yi-Ting Huang Date: 2009/9/4 1.Banko, M., Cafarella, M. J.,](https://reader036.fdocuments.us/reader036/viewer/2022070411/56649c7d5503460f94932bf4/html5/thumbnails/7.jpg)
7
PART 3. clustering
![Page 8: TEXTRUNNER Turing Center Computer Science and Engineering University of Washington Reporter: Yi-Ting Huang Date: 2009/9/4 1.Banko, M., Cafarella, M. J.,](https://reader036.fdocuments.us/reader036/viewer/2022070411/56649c7d5503460f94932bf4/html5/thumbnails/8.jpg)
[Diagram: system overview]
PART 1. Query Processing: the input is a query (Relationship, Factoid, Qualified List, or Unnamed-Item Queries); an inverted index is built; the output is a subset of extractions.
PART 2. Learner and Extractor: the input is the corpus; the output is a set of raw triples.
PART 3. Assessor: the output is a structured set of extractions.
![Page 9: TEXTRUNNER Turing Center Computer Science and Engineering University of Washington Reporter: Yi-Ting Huang Date: 2009/9/4 1.Banko, M., Cafarella, M. J.,](https://reader036.fdocuments.us/reader036/viewer/2022070411/56649c7d5503460f94932bf4/html5/thumbnails/9.jpg)
9
Spreading Activation Search
• Spreading activation is a technique that has been used to perform associative retrieval of nodes in a graph.
[Figure: activation spreads outward from node A (100%) to B and C (80%), D and E (50%), and F and G (20%), weakened at each step by a decay factor.]
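The idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the graph, the function name `spread_activation`, the single multiplicative `decay` per hop, and the cut-off `threshold` are all hypothetical, since the slide does not give the exact update rule (the percentages in the figure do not follow one uniform factor).

```python
from collections import deque

def spread_activation(graph, source, decay=0.8, threshold=0.1):
    """Breadth-first spreading activation; every hop multiplies activation by `decay`."""
    activation = {source: 1.0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        passed_on = activation[node] * decay
        if passed_on < threshold:
            continue  # too weak to spread any further
        for neighbor in graph.get(node, ()):
            # Keep only the strongest activation that reaches each node.
            if passed_on > activation.get(neighbor, 0.0):
                activation[neighbor] = passed_on
                queue.append(neighbor)
    return activation

# Hypothetical graph shaped like the slide's figure (A is the start node).
graph = {"A": ["B", "C"], "B": ["D"], "C": ["E"], "D": ["F"], "E": ["G"]}
levels = spread_activation(graph, "A", decay=0.5)
```

Nodes closer to the start node end up with higher activation, which is what makes the technique usable for ranking associated nodes.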
![Page 10: TEXTRUNNER Turing Center Computer Science and Engineering University of Washington Reporter: Yi-Ting Huang Date: 2009/9/4 1.Banko, M., Cafarella, M. J.,](https://reader036.fdocuments.us/reader036/viewer/2022070411/56649c7d5503460f94932bf4/html5/thumbnails/10.jpg)
10
PART 1: Scoring based on Spreading Activation Search
• Search query terms: Q = {q0, q1, …, qn-1}, e.g. for "king of pop", Q = {q0 = king, q1 = of, q2 = pop}
• TextHit(ni; qj) = 1 if node ni contains query term qj; TextHit(ni; qj) = 0 otherwise.
• TextHit(e; qj) = 1 if edge e contains query term qj; TextHit(e; qj) = 0 otherwise.
• Decay factor: between 0 and 1
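A minimal sketch of the TextHit indicator, assuming that "contains" means a case-insensitive whole-token match (the slide does not say how matching is done). The node and edge labels below are made up for illustration.

```python
def text_hit(element_text, query_term):
    """Return 1 if the node or edge text contains the query term as a token, else 0."""
    tokens = element_text.lower().split()
    return 1 if query_term.lower() in tokens else 0

query = ["king", "of", "pop"]
# Hypothetical node and edge labels from an extraction graph.
node_hits = [text_hit("Michael Jackson", q) for q in query]
edge_hits = [text_hit("is the king of", q) for q in query]
```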
![Page 11: TEXTRUNNER Turing Center Computer Science and Engineering University of Washington Reporter: Yi-Ting Huang Date: 2009/9/4 1.Banko, M., Cafarella, M. J.,](https://reader036.fdocuments.us/reader036/viewer/2022070411/56649c7d5503460f94932bf4/html5/thumbnails/11.jpg)
11
Q = {q0, q1, …, qn-1}
A ranked list:
T1 = {q0 q1 …, e1, n12}
T2 = {q0 q1 …, e2, n22}
T3 = {q0 q1 …, e3, n32}
….
![Page 12: TEXTRUNNER Turing Center Computer Science and Engineering University of Washington Reporter: Yi-Ting Huang Date: 2009/9/4 1.Banko, M., Cafarella, M. J.,](https://reader036.fdocuments.us/reader036/viewer/2022070411/56649c7d5503460f94932bf4/html5/thumbnails/12.jpg)
12
PART 2. Input & Output
• Input (corpus):
– a corpus of 9 million Web pages
– containing 133 million sentences
• Output:
– an extracted set of 60.5 million tuples
– an extraction rate of 2.2 tuples per sentence
![Page 13: TEXTRUNNER Turing Center Computer Science and Engineering University of Washington Reporter: Yi-Ting Huang Date: 2009/9/4 1.Banko, M., Cafarella, M. J.,](https://reader036.fdocuments.us/reader036/viewer/2022070411/56649c7d5503460f94932bf4/html5/thumbnails/13.jpg)
13
Learner
[Diagram: training examples are labeled by a dependency parser and heuristic rules as positive or false examples, which are then passed to the Learner.]
![Page 14: TEXTRUNNER Turing Center Computer Science and Engineering University of Washington Reporter: Yi-Ting Huang Date: 2009/9/4 1.Banko, M., Cafarella, M. J.,](https://reader036.fdocuments.us/reader036/viewer/2022070411/56649c7d5503460f94932bf4/html5/thumbnails/14.jpg)
14
Dependency parser
• Dependency parsers locate instances of semantic relationships between words, forming a directed graph that connects all words in the input:
– the subject relation (John ← hit),
– the object relation (hit → ball),
– phrasal modification (hit → with → bat).
![Page 15: TEXTRUNNER Turing Center Computer Science and Engineering University of Washington Reporter: Yi-Ting Huang Date: 2009/9/4 1.Banko, M., Cafarella, M. J.,](https://reader036.fdocuments.us/reader036/viewer/2022070411/56649c7d5503460f94932bf4/html5/thumbnails/15.jpg)
15
S = (Mary is a good student who lives in Taipei.)
T = (Mary/NP1 is/VB a good student/NP2 who/PP lives/VB in/PP Taipei/NP3.)
e1 is NP, e.g. Mary
e.g. Taipei
e.g. who lives in
e.g. positive / negative
e.g. |R| = 3 < M
e.g. lives/VB
e.g. (Mary, Taipei, T)
![Page 16: TEXTRUNNER Turing Center Computer Science and Engineering University of Washington Reporter: Yi-Ting Huang Date: 2009/9/4 1.Banko, M., Cafarella, M. J.,](https://reader036.fdocuments.us/reader036/viewer/2022070411/56649c7d5503460f94932bf4/html5/thumbnails/16.jpg)
16
S = (Mary is a good student who lives in Taipei.)
T = (Mary/NP1 is/VB a good student/NP2 who/PP lives/VB in/PP Taipei/NP3.)
A = (Mary/NP1 is/VB a good student/NP2 who/PP lives/VB in/PP Taipei/NP3.)
e.g. Mary is subject
e.g. Mary is head
e.g. (Mary, Taipei, T)
e.g. if “Taipei” is object of PP, then if “Taipei” is valid semantic role then positive else negative, else positive
R = normalize(R), e.g. lives → live
![Page 17: TEXTRUNNER Turing Center Computer Science and Engineering University of Washington Reporter: Yi-Ting Huang Date: 2009/9/4 1.Banko, M., Cafarella, M. J.,](https://reader036.fdocuments.us/reader036/viewer/2022070411/56649c7d5503460f94932bf4/html5/thumbnails/17.jpg)
17
Learner
• Naive Bayes classifier
• T(ei, ri,j, ej)
Features include:
– the presence of part-of-speech tag sequences in the relation ri,j,
– the number of tokens in ri,j,
– the number of stopwords in ri,j,
– whether or not an object e is found to be a proper noun,
– the part-of-speech tag to the left of ei,
– the part-of-speech tag to the right of ej.
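The features listed above could be computed roughly as follows. This is a sketch, not the TEXTRUNNER implementation: the function name `tuple_features`, the stopword list, the Penn Treebank style tags (`NNP`, `VBZ`, `IN`), and the single-token entities are all assumptions made for the example.

```python
STOPWORDS = {"a", "an", "the", "of", "in", "who", "is"}  # hypothetical stopword list

def tuple_features(tagged, ei_index, ej_index):
    """Features for a candidate tuple (e_i, r_ij, e_j).

    tagged: list of (token, pos_tag) pairs for one sentence.
    ei_index, ej_index: positions of the two entity tokens, ei_index < ej_index.
    """
    relation = tagged[ei_index + 1:ej_index]  # tokens between the two entities
    return {
        "relation_pos_sequence": " ".join(tag for _, tag in relation),
        "relation_token_count": len(relation),
        "relation_stopword_count": sum(tok.lower() in STOPWORDS for tok, _ in relation),
        "ej_is_proper_noun": tagged[ej_index][1] == "NNP",
        "pos_left_of_ei": tagged[ei_index - 1][1] if ei_index > 0 else "<START>",
        "pos_right_of_ej": tagged[ej_index + 1][1] if ej_index + 1 < len(tagged) else "<END>",
    }

sentence = [("Mary", "NNP"), ("lives", "VBZ"), ("in", "IN"), ("Taipei", "NNP"), (".", ".")]
features = tuple_features(sentence, 0, 3)
```

A feature dictionary of this kind is what a Naive Bayes classifier would take as input when deciding whether a candidate tuple is trustworthy.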
![Page 18: TEXTRUNNER Turing Center Computer Science and Engineering University of Washington Reporter: Yi-Ting Huang Date: 2009/9/4 1.Banko, M., Cafarella, M. J.,](https://reader036.fdocuments.us/reader036/viewer/2022070411/56649c7d5503460f94932bf4/html5/thumbnails/18.jpg)
Unsupervised Resolution of Objects and Relations on the Web
Alexander Yates, Oren Etzioni
Turing Center
Computer Science and Engineering
University of Washington
Proceedings of NAACL HLT 2007
![Page 19: TEXTRUNNER Turing Center Computer Science and Engineering University of Washington Reporter: Yi-Ting Huang Date: 2009/9/4 1.Banko, M., Cafarella, M. J.,](https://reader036.fdocuments.us/reader036/viewer/2022070411/56649c7d5503460f94932bf4/html5/thumbnails/19.jpg)
19
Research Motivation
• Web Information Extraction (WIE) systems extract assertions that describe a relation and its arguments from Web text.
– (is capital of, D.C., United States)
– (is capital city of, Washington, U.S.), which describes the same relationship as above but contains a different name for the relation and each argument.
• We refer to the problem of identifying synonymous object and relation names as Synonym Resolution (SR).
![Page 20: TEXTRUNNER Turing Center Computer Science and Engineering University of Washington Reporter: Yi-Ting Huang Date: 2009/9/4 1.Banko, M., Cafarella, M. J.,](https://reader036.fdocuments.us/reader036/viewer/2022070411/56649c7d5503460f94932bf4/html5/thumbnails/20.jpg)
20
Research purpose
• We present RESOLVER, a novel, domain-independent, unsupervised synonym resolution system that applies to both objects and relations.
• RESOLVER clusters co-referential names together using a probabilistic model informed by string similarity and the similarity of the assertions containing the names.
![Page 21: TEXTRUNNER Turing Center Computer Science and Engineering University of Washington Reporter: Yi-Ting Huang Date: 2009/9/4 1.Banko, M., Cafarella, M. J.,](https://reader036.fdocuments.us/reader036/viewer/2022070411/56649c7d5503460f94932bf4/html5/thumbnails/21.jpg)
21
Assessor
[Diagram: input: a set of extractions → SSM and ESP → Combine Evidence → Clustering → output: a structured subset of extractions]
![Page 22: TEXTRUNNER Turing Center Computer Science and Engineering University of Washington Reporter: Yi-Ting Huang Date: 2009/9/4 1.Banko, M., Cafarella, M. J.,](https://reader036.fdocuments.us/reader036/viewer/2022070411/56649c7d5503460f94932bf4/html5/thumbnails/22.jpg)
22
String Similarity Model (SSM)
• T = (s, r, o):
– s and o are object strings;
– r is a relation string;
– (r, o) is a property of s;
– (s, o) is an instance of r.
• $T_i = (s_i, r, o)$; $T_j = (s_j, r, o)$
• $R_{i,j}$ is either $R^t_{i,j}$ ($s_i$ and $s_j$ co-refer) or $R^f_{i,j}$ (they do not).
• If s1 and s2 are object strings, sim(s1, s2) is based on Monge-Elkan string similarity.
• If s1 and s2 are relation strings, sim(s1, s2) is based on Levenshtein string distance.
![Page 23: TEXTRUNNER Turing Center Computer Science and Engineering University of Washington Reporter: Yi-Ting Huang Date: 2009/9/4 1.Banko, M., Cafarella, M. J.,](https://reader036.fdocuments.us/reader036/viewer/2022070411/56649c7d5503460f94932bf4/html5/thumbnails/23.jpg)
23
Levenshtein string distance
• Food → Good
• God → Good
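Both pairs on this slide are one edit apart: Food becomes Good by one substitution, and God becomes Good by one insertion. A standard dynamic-programming sketch of the distance (the function name is our own choice):

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions, and substitutions."""
    previous = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # delete ch_a
                               current[j - 1] + 1,       # insert ch_b
                               previous[j - 1] + cost))  # substitute or match
        previous = current
    return previous[-1]

food_good = levenshtein("Food", "Good")  # substitution F -> G
god_good = levenshtein("God", "Good")    # insertion of one "o"
```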
![Page 24: TEXTRUNNER Turing Center Computer Science and Engineering University of Washington Reporter: Yi-Ting Huang Date: 2009/9/4 1.Banko, M., Cafarella, M. J.,](https://reader036.fdocuments.us/reader036/viewer/2022070411/56649c7d5503460f94932bf4/html5/thumbnails/24.jpg)
24
Extracted Shared Property Model (ESP)
• T = (s, r, o):
– s and o are object strings;
– r is a relation string;
– (r, o) is a property of s.
• (si, sj): si = Mars; sj = Red Planet
• (Mars, lacks, ozone layer): Mars appears in 659 extractions
• (Red Planet, lacks, ozone layer): Red Planet appears in 26 extractions
• They share four properties: k = 4
[Figure labels: Mars, |Ei| = ni = 659, |Ui| = Pi = 3500; Red Planet, |Ej| = nj = 26; k = 4]
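The quantities ni, nj, and k can be read directly off a set of extractions. The sketch below assumes that a property of a string s is the (r, o) pair that accompanies it in a triple (s, r, o); the helper name and the two extra extractions are hypothetical, so the counts differ from the slide's 659, 26, and 4.

```python
from collections import defaultdict

def properties_by_string(extractions):
    """Map each subject string s to its set of properties (r, o)."""
    props = defaultdict(set)
    for s, r, o in extractions:
        props[s].add((r, o))
    return props

extractions = [
    ("Mars", "lacks", "ozone layer"),
    ("Red Planet", "lacks", "ozone layer"),
    ("Mars", "has", "two moons"),           # hypothetical extraction
    ("Red Planet", "is", "fourth planet"),  # hypothetical extraction
]
props = properties_by_string(extractions)
n_i = len(props["Mars"])         # number of extracted properties of s_i
n_j = len(props["Red Planet"])   # number of extracted properties of s_j
shared = props["Mars"] & props["Red Planet"]
k = len(shared)                  # number of shared properties
```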
![Page 25: TEXTRUNNER Turing Center Computer Science and Engineering University of Washington Reporter: Yi-Ting Huang Date: 2009/9/4 1.Banko, M., Cafarella, M. J.,](https://reader036.fdocuments.us/reader036/viewer/2022070411/56649c7d5503460f94932bf4/html5/thumbnails/25.jpg)
25
• Ball and urns abstraction
• ESP uses a pair of urns, containing Pi and Pj balls respectively, for the two strings si and sj. Some subset of the Pi balls have the exact same labels as an equal-sized subset of the Pj balls. Let the size of this subset be Si,j.
• $S_{i,j} = \min(P_i, P_j)$ if $R^t_{i,j}$
• $S_{i,j} < \min(P_i, P_j)$ if $R^f_{i,j}$
![Page 26: TEXTRUNNER Turing Center Computer Science and Engineering University of Washington Reporter: Yi-Ting Huang Date: 2009/9/4 1.Banko, M., Cafarella, M. J.,](https://reader036.fdocuments.us/reader036/viewer/2022070411/56649c7d5503460f94932bf4/html5/thumbnails/26.jpg)
26
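[Figure: urns $U_i$ and $U_j$ with $|U_i| = P_i$ and $|U_j| = P_j$; drawn sets $E_i$ and $E_j$ with $|E_i| = n_i$ and $|E_j| = n_j$; labels $S_i$, $S_j$ and shared subset $S_{ij}$; $k$ shared draws; $|F_i| = r$, $|F_j| = s$]
$S_{ij} = U_i \cap U_j$
$|K| = k$
$K = E_i \cap E_j$
$F_i = (E_i \cap S_{ij}) \setminus K$
$F_j = (E_j \cap S_{ij}) \setminus K$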
![Page 27: TEXTRUNNER Turing Center Computer Science and Engineering University of Washington Reporter: Yi-Ting Huang Date: 2009/9/4 1.Banko, M., Cafarella, M. J.,](https://reader036.fdocuments.us/reader036/viewer/2022070411/56649c7d5503460f94932bf4/html5/thumbnails/27.jpg)
27
Ball and urns abstraction
![Page 28: TEXTRUNNER Turing Center Computer Science and Engineering University of Washington Reporter: Yi-Ting Huang Date: 2009/9/4 1.Banko, M., Cafarella, M. J.,](https://reader036.fdocuments.us/reader036/viewer/2022070411/56649c7d5503460f94932bf4/html5/thumbnails/28.jpg)
28
[Figure labels: Sij, Pi, ni]
![Page 29: TEXTRUNNER Turing Center Computer Science and Engineering University of Washington Reporter: Yi-Ting Huang Date: 2009/9/4 1.Banko, M., Cafarella, M. J.,](https://reader036.fdocuments.us/reader036/viewer/2022070411/56649c7d5503460f94932bf4/html5/thumbnails/29.jpg)
29
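[Figure: urns Ui and Uj with |Ui| = Pi and |Uj| = Pj; drawn sets with |Ei| = ni and |Ej| = nj; labels Si, Sj and shared subset Sij; k shared draws; |Fi| = r, |Fj| = s]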
![Page 30: TEXTRUNNER Turing Center Computer Science and Engineering University of Washington Reporter: Yi-Ting Huang Date: 2009/9/4 1.Banko, M., Cafarella, M. J.,](https://reader036.fdocuments.us/reader036/viewer/2022070411/56649c7d5503460f94932bf4/html5/thumbnails/30.jpg)
30
Combine Evidence
![Page 31: TEXTRUNNER Turing Center Computer Science and Engineering University of Washington Reporter: Yi-Ting Huang Date: 2009/9/4 1.Banko, M., Cafarella, M. J.,](https://reader036.fdocuments.us/reader036/viewer/2022070411/56649c7d5503460f94932bf4/html5/thumbnails/31.jpg)
31
e1=(dog, live, house); e2=(puppy, live, house); e3=(cat, live, house); e4=(cat, live, home); e5=(kitty, live, home)
Elements[dog]=1; Elements[live]=2; Elements[house]=3; Elements[puppy]=4; Elements[cat]=5; Elements[home]=6; Elements[kitty]=7
Elements[1]=dog; Elements[2]=live; Elements[3]=house; Elements[4]=puppy; Elements[5]=cat; Elements[6]=home; Elements[7]=kitty
Round 1: Index[live house]=(1,4,5, live house); Index[live home]=(5,7, live home); Index[cat live]=(3,6, cat live)
Max=50
Sim(1,4); Sim(1,5); Sim(4,5); Sim(5,7); Sim(3,6)
![Page 32: TEXTRUNNER Turing Center Computer Science and Engineering University of Washington Reporter: Yi-Ting Huang Date: 2009/9/4 1.Banko, M., Cafarella, M. J.,](https://reader036.fdocuments.us/reader036/viewer/2022070411/56649c7d5503460f94932bf4/html5/thumbnails/32.jpg)
32
e1=(dog, live, house); e2=(puppy, live, house); e3=(cat, live, house); e4=(cat, live, home); e5=(kitty, live, home)
Elements[dog]=1; Elements[live]=2; Elements[house]=3; Elements[puppy]=1; Elements[cat]=5; Elements[home]=6; Elements[kitty]=7
Elements[1]=dog+puppy; Elements[2]=live; Elements[3]=house; Elements[4]=puppy; Elements[5]=cat; Elements[6]=home; Elements[7]=kitty
Round 1: Index[live house]=(1,4,5, live house); Index[live home]=(5,7, live home); Index[cat live]=(3,6, cat live)
Max=50
Sim(1,4); Sim(1,5); Sim(4,5); Sim(5,7); Sim(3,6)
UsedCluster={}
![Page 33: TEXTRUNNER Turing Center Computer Science and Engineering University of Washington Reporter: Yi-Ting Huang Date: 2009/9/4 1.Banko, M., Cafarella, M. J.,](https://reader036.fdocuments.us/reader036/viewer/2022070411/56649c7d5503460f94932bf4/html5/thumbnails/33.jpg)
33
e1=(dog, live, house); e2=(puppy, live, house); e3=(cat, live, house); e4=(cat, live, home); e5=(kitty, live, home)
Elements[dog]=1; Elements[live]=2; Elements[house]=3; Elements[puppy]=1; Elements[cat]=5; Elements[home]=3; Elements[kitty]=5
Elements[1]=dog+puppy; Elements[2]=live; Elements[3]=house+home; Elements[4]=puppy; Elements[5]=cat+kitty; Elements[6]=home; Elements[7]=kitty
Round 1: Index[live house]=(1,4,5, live house); Index[live home]=(5,7, live home); Index[cat live]=(3,6, cat live)
Max=50
Sim(1,4); Sim(1,5); Sim(4,5); Sim(5,7); Sim(3,6)
UsedCluster={(1,4), (5,7), (3,6)}
![Page 34: TEXTRUNNER Turing Center Computer Science and Engineering University of Washington Reporter: Yi-Ting Huang Date: 2009/9/4 1.Banko, M., Cafarella, M. J.,](https://reader036.fdocuments.us/reader036/viewer/2022070411/56649c7d5503460f94932bf4/html5/thumbnails/34.jpg)
34
e1=(dog, live, house); e2=(puppy, live, house); e3=(cat, live, house); e4=(cat, live, home); e5=(kitty, live, home)
Elements[dog]=1; Elements[live]=2; Elements[house]=3; Elements[puppy]=1; Elements[cat]=5; Elements[home]=3; Elements[kitty]=5
Elements[1]=dog+puppy; Elements[2]=live; Elements[3]=house+home; Elements[4]=puppy; Elements[5]=cat+kitty; Elements[6]=home; Elements[7]=kitty
Round 2: Index[live house]=(1,1,5, live house); Index[live home]=(5,5, live home); Index[cat live]=(3,3, cat live)
Max=50
UsedCluster={}
![Page 35: TEXTRUNNER Turing Center Computer Science and Engineering University of Washington Reporter: Yi-Ting Huang Date: 2009/9/4 1.Banko, M., Cafarella, M. J.,](https://reader036.fdocuments.us/reader036/viewer/2022070411/56649c7d5503460f94932bf4/html5/thumbnails/35.jpg)
35
e1=(dog, live, house); e2=(puppy, live, house); e3=(cat, live, house); e4=(cat, live, home); e5=(kitty, live, home)
Elements[dog]=1; Elements[live]=2; Elements[house]=3; Elements[puppy]=1; Elements[cat]=1; Elements[home]=3; Elements[kitty]=5
Elements[1]=dog+puppy+cat; Elements[2]=live; Elements[3]=house+home; Elements[4]=puppy; Elements[5]=cat+kitty; Elements[6]=home; Elements[7]=kitty
Round 2: Index[live house]=(1,5, live house); Index[live home]=(5, live home); Index[cat live]=(3, cat live)
Max=50
Sim(1,5)
UsedCluster={(1,5)}
![Page 36: TEXTRUNNER Turing Center Computer Science and Engineering University of Washington Reporter: Yi-Ting Huang Date: 2009/9/4 1.Banko, M., Cafarella, M. J.,](https://reader036.fdocuments.us/reader036/viewer/2022070411/56649c7d5503460f94932bf4/html5/thumbnails/36.jpg)
36
e1=(dog, live, house); e2=(puppy, live, house); e3=(cat, live, house); e4=(cat, live, home); e5=(kitty, live, home)
Elements[dog]=1; Elements[live]=2; Elements[house]=3; Elements[puppy]=1; Elements[cat]=1; Elements[home]=3; Elements[kitty]=5
Elements[1]=dog+puppy+cat; Elements[2]=live; Elements[3]=house+home; Elements[4]=puppy; Elements[5]=cat+kitty; Elements[6]=home; Elements[7]=kitty
Round 2: Index[live house]=(1,5, live house); Index[live home]=(5, live home); Index[cat live]=(3, cat live)
Max=50
Sim(1,5)
UsedCluster={(1,5)}
Result:
(dog, puppy, cat) (live, house)
(cat, kitty) (live, home)
(cat, live) (house, home)
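The round-based procedure walked through on the preceding slides can be approximated in Python. This is a simplified sketch, not RESOLVER's algorithm: the similarity score is just the number of shared properties (a stand-in for the combined ESP and SSM evidence), every string belongs to exactly one cluster, and the names `cluster_strings`, `threshold`, and `max_per_property` are our own. Because of these simplifications, the result differs from the slide's final clusters.

```python
from collections import defaultdict
from itertools import combinations

def cluster_strings(extractions, threshold=1, max_per_property=50):
    """Greedy, round-based merging of object strings that share properties."""
    cluster_of = {}  # string -> representative string of its cluster
    for s, _, o in extractions:
        cluster_of.setdefault(s, s)
        cluster_of.setdefault(o, o)
    while True:
        # Index: property -> clusters having it, written in terms of current clusters.
        index = defaultdict(set)
        for s, r, o in extractions:
            cs, co = cluster_of[s], cluster_of[o]
            index[("subject-of", r, co)].add(cs)  # cs has property (r, co)
            index[("object-of", cs, r)].add(co)   # co has property (cs, r)
        # Score candidate pairs; properties shared by too many clusters are skipped.
        scores = defaultdict(int)
        for clusters in index.values():
            if len(clusters) > max_per_property:
                continue
            for pair in combinations(sorted(clusters), 2):
                scores[pair] += 1
        # Merge the best pairs; each cluster is used at most once per round.
        used = set()
        for (a, b), score in sorted(scores.items(), key=lambda kv: (-kv[1], kv[0])):
            if score >= threshold and a not in used and b not in used:
                used.update((a, b))
                for name, c in list(cluster_of.items()):
                    if c == b:
                        cluster_of[name] = a
        if not used:
            break  # no merge happened in this round
    groups = defaultdict(set)
    for name, c in cluster_of.items():
        groups[c].add(name)
    return sorted(sorted(g) for g in groups.values())

extractions = [("dog", "live", "house"), ("puppy", "live", "house"),
               ("cat", "live", "house"), ("cat", "live", "home"),
               ("kitty", "live", "home")]
clusters = cluster_strings(extractions)
```

The `used` set plays the role of UsedCluster on the slides, and `max_per_property` corresponds to Max=50. Rebuilding the index from cluster representatives in each round mirrors how the Round 2 index is rewritten with merged cluster ids.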
![Page 37: TEXTRUNNER Turing Center Computer Science and Engineering University of Washington Reporter: Yi-Ting Huang Date: 2009/9/4 1.Banko, M., Cafarella, M. J.,](https://reader036.fdocuments.us/reader036/viewer/2022070411/56649c7d5503460f94932bf4/html5/thumbnails/37.jpg)
37
Experiment
• Dataset:
– 9797 distinct object strings
– 10151 distinct relation strings
• Metric:
– Measure the precision by manually labeling all of the clusters.
– Measure the recall.
• The top 200 object strings formed 51 clusters, with an average cluster size of 2.9.
• The relation strings formed 110 clusters, with an average cluster size of 4.9.
![Page 38: TEXTRUNNER Turing Center Computer Science and Engineering University of Washington Reporter: Yi-Ting Huang Date: 2009/9/4 1.Banko, M., Cafarella, M. J.,](https://reader036.fdocuments.us/reader036/viewer/2022070411/56649c7d5503460f94932bf4/html5/thumbnails/38.jpg)
38
Result
• CSM had particular trouble with lower-frequency strings, judging far too many of them to be co-referential on too little evidence.
• Extraction errors
• Multiple word senses
![Page 39: TEXTRUNNER Turing Center Computer Science and Engineering University of Washington Reporter: Yi-Ting Huang Date: 2009/9/4 1.Banko, M., Cafarella, M. J.,](https://reader036.fdocuments.us/reader036/viewer/2022070411/56649c7d5503460f94932bf4/html5/thumbnails/39.jpg)
39
Function Filtering
• (Virginia, capital of, Richmond)
• (West Virginia, capital of, Charleston)
• Two strings y1 and y2 match if sim(y1, y2) > thd. If there exist a function f and extractions f(x1, y1) and f(x2, y2) in which y1 and y2 do not match, then x1 and x2 are not merged (and likewise in the other direction for one-to-one relations).
• It requires as input the set of functional and one-to-one relations in the data.
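A sketch of the filter, assuming triples of the form (x, f, y) meaning f(x) = y. The similarity function here is Python's `difflib.SequenceMatcher` ratio, used only as a stand-in for the string similarity on the slides; the function names and the threshold value 0.9 are assumptions.

```python
from difflib import SequenceMatcher

def sim(a, b):
    """Stand-in string similarity in [0, 1]; not the measure used by RESOLVER."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def function_filter_blocks(x1, x2, extractions, functional_relations, thd=0.9):
    """True if some functional relation maps x1 and x2 to values that do not match."""
    for f in functional_relations:
        ys1 = [y for x, r, y in extractions if x == x1 and r == f]
        ys2 = [y for x, r, y in extractions if x == x2 and r == f]
        # Block the merge when both strings have a value for f but no pair of values matches.
        if ys1 and ys2 and not any(sim(y1, y2) > thd for y1 in ys1 for y2 in ys2):
            return True
    return False

extractions = [
    ("Virginia", "capital of", "Richmond"),
    ("West Virginia", "capital of", "Charleston"),
]
blocked = function_filter_blocks("Virginia", "West Virginia", extractions, {"capital of"})
```

Here the merge of Virginia and West Virginia is blocked because their capitals differ, even though the two names are similar as strings.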
![Page 40: TEXTRUNNER Turing Center Computer Science and Engineering University of Washington Reporter: Yi-Ting Huang Date: 2009/9/4 1.Banko, M., Cafarella, M. J.,](https://reader036.fdocuments.us/reader036/viewer/2022070411/56649c7d5503460f94932bf4/html5/thumbnails/40.jpg)
40
Web Hitcounts
• While names for two similar objects may often appear together in the same sentence, it is relatively rare for two different names of the same object to appear in the same sentence.
• Coordination-Phrase Filter searches
![Page 41: TEXTRUNNER Turing Center Computer Science and Engineering University of Washington Reporter: Yi-Ting Huang Date: 2009/9/4 1.Banko, M., Cafarella, M. J.,](https://reader036.fdocuments.us/reader036/viewer/2022070411/56649c7d5503460f94932bf4/html5/thumbnails/41.jpg)
41
Experiment
![Page 42: TEXTRUNNER Turing Center Computer Science and Engineering University of Washington Reporter: Yi-Ting Huang Date: 2009/9/4 1.Banko, M., Cafarella, M. J.,](https://reader036.fdocuments.us/reader036/viewer/2022070411/56649c7d5503460f94932bf4/html5/thumbnails/42.jpg)
42
Conclusion
• The study showed how TEXTRUNNER automatically extracts information from the Web, and how the RESOLVER system finds clusters of co-referential object names in the extracted relations with a precision of 78% and a recall of 68% with the aid of the Coordination-Phrase Filter (CPF).
![Page 43: TEXTRUNNER Turing Center Computer Science and Engineering University of Washington Reporter: Yi-Ting Huang Date: 2009/9/4 1.Banko, M., Cafarella, M. J.,](https://reader036.fdocuments.us/reader036/viewer/2022070411/56649c7d5503460f94932bf4/html5/thumbnails/43.jpg)
43
Comments
• The assumption of the ESP model
• How to use TextRunner in my research:
– Find relations
– Use TextRunner for query expansion