
TEXTRUNNER

Turing Center, Computer Science and Engineering

University of Washington

Reporter: Yi-Ting Huang
Date: 2009/9/4

1. Banko, M., Cafarella, M. J., Soderland, S., Broadhead, M., & Etzioni, O. (2007). Open Information Extraction from the Web. Proceedings of the 20th International Joint Conference on Artificial Intelligence (IJCAI 2007).

2. Cafarella, M. J., Banko, M., & Etzioni, O. (2006). Relational Web Search. UW CSE Tech Report 2006-04-02.

3. Yates, A., & Etzioni, O. (2007). Unsupervised Resolution of Objects and Relations on the Web. NAACL-HLT 2007.

2

Relationship Queries

Factoid Queries

Qualified List Queries

Unnamed-Item Queries

PART 1. Query

3

Relationship Queries

4

Factoid Queries

5

Qualified List Queries

6

PART 2. Retrieval: Tn = (ei, r, ej)

7

PART 3. Clustering

Query Processing (input → output):
Relationship Queries, Factoid Queries, Qualified List Queries, Unnamed-Item Queries

8

[Architecture figure: corpus → Learner → Extractor → a set of raw triples → Assessor → a subset of extractions (PART 2: retrieval); build an inverted index over the extractions for query processing (PART 1: query); clustering produces a structured set of extractions (PART 3)]
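As a rough illustration of the "build inverted index" step, here is a minimal Python sketch (not TextRunner's actual implementation); the toy triples and the whitespace tokenization are assumptions.

```python
from collections import defaultdict

# Toy extractions: (entity_i, relation, entity_j) triples (made up for illustration).
triples = [
    ("Michael Jackson", "is known as", "king of pop"),
    ("Edison", "invented", "light bulbs"),
]

def build_inverted_index(triples):
    """Map each lowercased token to the ids of the triples containing it."""
    index = defaultdict(set)
    for tid, triple in enumerate(triples):
        for field in triple:
            for token in field.lower().split():
                index[token].add(tid)
    return index

index = build_inverted_index(triples)
print(sorted(index["pop"]))  # -> [0]
```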

9

Spreading Activation Search

• Spreading activation is a technique that has been used to perform associative retrieval of nodes in a graph.

[Figure: activation spreads from node A (100%) to B and C (80%), then to D and E (50%), then to F and G (20%), attenuated at each hop by a decay factor.]
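A minimal sketch of spreading activation over a graph given as adjacency lists; the uniform decay factor, the stopping threshold, and the toy graph are assumptions for illustration (the figure's per-node percentages differ).

```python
from collections import deque

def spread_activation(graph, source, decay=0.5, threshold=0.1):
    """Propagate activation outward from `source`, attenuating it by `decay`
    at each hop and stopping once the activation to pass on falls below `threshold`."""
    activation = {source: 1.0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        spread = activation[node] * decay
        if spread < threshold:
            continue
        for neighbor in graph.get(node, []):
            if spread > activation.get(neighbor, 0.0):
                activation[neighbor] = spread
                queue.append(neighbor)
    return activation

# Toy graph shaped like the figure: A fans out to B and C, which lead on to D..G.
graph = {"A": ["B", "C"], "B": ["D"], "C": ["E"], "D": ["F"], "E": ["G"]}
print(spread_activation(graph, "A"))
# -> {'A': 1.0, 'B': 0.5, 'C': 0.5, 'D': 0.25, 'E': 0.25, 'F': 0.125, 'G': 0.125}
```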

10

PART 1: Scoring based on Spreading Activation Search

• Search query terms Q = {q0, q1, …, qn-1}, e.g. "king of pop": Q = {q0 = king, q1 = of, q2 = pop}

• TextHit(ni, qj) = 1 if a node ni contains the query term qj; TextHit(ni, qj) = 0 otherwise.

• TextHit(e, qj) = 1 if an edge e contains the query term qj; TextHit(e, qj) = 0 otherwise.

• Decay factor: a value between 0 and 1.

11

Q = {q0, q1, …, qn-1}

A ranked list:
T1 = {q0, q1, …, e1, n12}
T2 = {q0, q1, …, e2, n22}
T3 = {q0, q1, …, e3, n32}
…
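To make the TextHit scoring concrete, here is a small hedged sketch that scores toy tuples Tn = (ei, r, ej) against the query terms; the tuples, the purely additive score, and the absence of the decay-weighted graph walk are simplifications, not the system's exact ranking function.

```python
def text_hit(text, term):
    """TextHit = 1 if the node/edge string contains the query term, else 0."""
    return 1 if term.lower() in text.lower().split() else 0

def score_tuple(tup, query_terms):
    """Sum TextHit over the two nodes (e_i, e_j) and the edge r of one tuple."""
    e_i, r, e_j = tup
    return sum(text_hit(part, q) for part in (e_i, r, e_j) for q in query_terms)

tuples = [
    ("Michael Jackson", "is known as", "king of pop"),
    ("Elvis Presley", "is", "king of rock"),
    ("Edison", "invented", "light bulbs"),
]
query = ["king", "of", "pop"]
for t in sorted(tuples, key=lambda t: score_tuple(t, query), reverse=True):
    print(score_tuple(t, query), t)   # ranked list: highest-scoring tuple first
```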

12

PART 2. Input & Output

• Input (corpus):
  – a corpus of 9 million Web pages
  – containing 133 million sentences
• Output:
  – a set of 60.5 million extracted tuples
  – an extraction rate of 2.2 tuples per sentence

13

Learner

[Figure: corpus sentences are parsed with a dependency parser; heuristic rules label candidate tuples as positive or negative training examples, which are used to train the Learner.]

14

Dependency parser

• Dependency parsers locate instances of semantic relationships between words, forming a directed graph that connects all words in the input:
  – the subject relation (John ← hit)
  – the object relation (hit → ball)
  – phrasal modification (hit → with → bat)
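To see these dependency relations concretely, here is a short sketch using spaCy; spaCy and its en_core_web_sm model are assumptions for illustration, not the parser used in the papers.

```python
import spacy

# Assumes: pip install spacy && python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")
doc = nlp("John hit the ball with a bat.")

# Print each word, its dependency label, and the head it attaches to,
# e.g. the subject relation (John <- hit) and the object relation (hit -> ball).
for token in doc:
    print(f"{token.text:>5} --{token.dep_}--> {token.head.text}")
```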

15

S = (Mary is a good student who lives in Taipei.)
T = (Mary/NP1 is/VB a good student/NP2 who/PP lives/VB in/PP Taipei/NP3.)

– entities are noun phrases, e.g. e1 = Mary, e2 = Taipei
– candidate relation string between them, e.g. "who lives in", containing a verb, e.g. lives/VB
– relation length constraint, e.g. |R| = 3 < M
– candidate tuple, e.g. (Mary, Taipei, T), labeled positive or negative

16

S = (Mary is a good student who lives in Taipei.)
T = (Mary/NP1 is/VB a good student/NP2 who/PP lives/VB in/PP Taipei/NP3.)
A = (Mary/NP1 is/VB a good student/NP2 who/PP lives/VB in/PP Taipei/NP3.)

– check the syntactic role of each entity in the parse, e.g. Mary is the subject, Mary is the head of its phrase
– candidate tuple, e.g. (Mary, Taipei, T)
– e.g. if "Taipei" is the object of a PP, then: if "Taipei" fills a valid semantic role, label the tuple positive, else negative; otherwise label it positive
– normalize the relation, R = normalize(R), e.g. lives → live
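A hedged sketch of the normalize(R) step (e.g. lives → live): lemmatize the tokens of the relation string. NLTK's WordNet lemmatizer is an assumption here, not the normalization used in the paper.

```python
from nltk.stem import WordNetLemmatizer

# Assumes: pip install nltk, plus nltk.download("wordnet") for the WordNet data.
lemmatizer = WordNetLemmatizer()

def normalize_relation(relation):
    """Lemmatize each token as a verb where possible; unknown words pass through unchanged."""
    return " ".join(lemmatizer.lemmatize(tok.lower(), pos="v") for tok in relation.split())

print(normalize_relation("who lives in"))  # -> "who live in"
```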

17

Learner

• Naive Bayes classifier over candidate tuples T = (ei, ri,j, ej)
• Features include:
  – the presence of part-of-speech tag sequences in the relation ri,j
  – the number of tokens in ri,j
  – the number of stopwords in ri,j
  – whether or not an object e is found to be a proper noun
  – the part-of-speech tag to the left of ei
  – the part-of-speech tag to the right of ej
(a small sketch of these features follows below)
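A hedged sketch of the feature extraction for a candidate tuple t = (ei, ri,j, ej); the stopword list, the argument names, and the toy values are assumptions for illustration, not the exact feature set used by TextRunner.

```python
STOPWORDS = {"a", "an", "the", "of", "in", "who", "is"}  # toy stopword list

def tuple_features(relation_tokens, relation_pos, ej_is_proper_noun,
                   pos_left_of_ei, pos_right_of_ej):
    """Build a feature dict roughly matching the list above."""
    return {
        "rel_pos_sequence": " ".join(relation_pos),
        "rel_num_tokens": len(relation_tokens),
        "rel_num_stopwords": sum(t.lower() in STOPWORDS for t in relation_tokens),
        "ej_is_proper_noun": ej_is_proper_noun,
        "pos_left_of_ei": pos_left_of_ei,
        "pos_right_of_ej": pos_right_of_ej,
    }

# Toy values for the tuple (Mary, who lives in, Taipei).
features = tuple_features(
    relation_tokens=["who", "lives", "in"],
    relation_pos=["WP", "VBZ", "IN"],
    ej_is_proper_noun=True,
    pos_left_of_ei="",     # nothing to the left of "Mary"
    pos_right_of_ej=".",   # a period follows "Taipei"
)
print(features)
# Such dicts can be vectorized (e.g. with sklearn's DictVectorizer) and used to
# train a Naive Bayes classifier that labels candidate tuples as trustworthy or not.
```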

Unsupervised Resolution of Objects and Relations on the Web

Alexander Yates, Oren Etzioni
Turing Center, Computer Science and Engineering, University of Washington

Proceedings of NAACL-HLT 2007

19

Research Motivation

• Web Information Extraction (WIE) systems extract assertions that describe a relation and its arguments from Web text, e.g.
  – (is capital of, D.C., United States)
  – (is capital city of, Washington, U.S.)
  The second assertion describes the same relationship as the first but contains a different name for the relation and for each argument.

• We refer to the problem of identifying synonymous object and relation names as Synonym Resolution (SR).

20

Research purpose

• we present RESOLVER, a novel, domain-independent, unsupervised synonym resolution system that applies to both objects and relations.

• RESOLVER Elements co-referential names together using a probabilistic model informed by string similarity and the similarity of the assertions containing the names.

21

Assessor

[Figure: input = a set of extractions → SSM and ESP → combine evidence → clustering → output = a structured subset of extractions]

22

String Similarity Model (SSM)

• T = (s, r, o)
  – s and o are object strings
  – r is a relation string
  – (r, o) is a property of s
  – (s, o) is an instance of r

• If s1 and s2 are object strings, sim(s1, s2) is based on Monge-Elkan string similarity.

• If s1 and s2 are relation strings, sim(s1, s2) is based on Levenshtein string distance.

• For a pair of triples Ti = (si, ri, oi) and Tj = (sj, rj, oj), write Rti,j if the two strings co-refer and Rfi,j if they do not.

23

Levenshtein string distance

• Food → Good: one substitution, distance 1
• God → Good: one insertion, distance 1
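A minimal sketch of the two string measures named above: Levenshtein edit distance (used for relation strings) and a simple Monge-Elkan score for multi-token object strings; the normalized inner similarity used here is an assumption.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance (insert / delete / substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def token_sim(a, b):
    """Normalized similarity in [0, 1] derived from the edit distance."""
    return 1.0 - levenshtein(a, b) / max(len(a), len(b), 1)

def monge_elkan(s1, s2):
    """Average, over tokens of s1, of the best-matching token similarity in s2."""
    t1, t2 = s1.lower().split(), s2.lower().split()
    return sum(max(token_sim(a, b) for b in t2) for a in t1) / len(t1)

print(levenshtein("Food", "Good"))   # 1: one substitution
print(levenshtein("God", "Good"))    # 1: one insertion
print(monge_elkan("red planet", "the red planet"))  # 1.0: every token has an exact match
```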

24

Extracted Shared Property Model (ESP)

• T = (s, r, o)
  – s and o are object strings
  – r is a relation string
  – (r, o) is a property of s

• Example pair (si, sj): si = Mars, sj = red planet
  – (Mars, lacks, ozone layer): 659
  – (red planet, lacks, ozone layer): 26
  – they share four properties: k = 4

[Figure: urn model for the pair, with |Ei| = ni, |Ej| = nj extracted properties and |Ui| = Pi potential properties]

25

• Ball-and-urn abstraction (see the simulation sketch below)
• ESP uses a pair of urns, containing Pi and Pj balls respectively, for the two strings si and sj. Some subset of the Pi balls have the exact same labels as an equal-sized subset of the Pj balls. Let the size of this subset be Si,j.
• Si,j = min(Pi, Pj) if Rti,j; Si,j < min(Pi, Pj) if Rfi,j
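A hedged sketch of the ball-and-urn abstraction: it does not reproduce ESP's closed-form probability, but it simulates the urn model to show how likely k shared extracted properties are for a given Pi, Pj, Si,j, ni, nj (all concrete numbers below are made up).

```python
import random

def simulate_shared(P_i, P_j, S_ij, n_i, n_j, trials=20000, seed=0):
    """Draw n_i balls from urn i and n_j from urn j without replacement and count
    how many drawn labels come from the shared subset (labels 0 .. S_ij - 1 are
    shared between the two urns). Returns the empirical distribution of k."""
    rng = random.Random(seed)
    counts = {}
    for _ in range(trials):
        draw_i = set(rng.sample(range(P_i), n_i))
        draw_j = set(rng.sample(range(P_j), n_j))
        k = len({x for x in draw_i if x < S_ij} & draw_j)
        counts[k] = counts.get(k, 0) + 1
    return {k: c / trials for k, c in sorted(counts.items())}

# Co-referential case: S_ij = min(P_i, P_j); large k is common.
print(simulate_shared(P_i=30, P_j=30, S_ij=30, n_i=10, n_j=10))
# Non-co-referential case: few shared balls; observing k = 4 becomes very unlikely.
print(simulate_shared(P_i=30, P_j=30, S_ij=5, n_i=10, n_j=10))
```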

26

[Figure: urn model for a pair of strings: potential properties |Ui| = Pi, |Uj| = Pj with Si,j shared balls; extracted properties |Ei| = ni, |Ej| = nj with K = Ei ∩ Ej, |K| = k shared; unshared extractions Fi = Ei \ K, Fj = Ej \ K with |Fi| = r, |Fj| = s]

Ball and urns abstraction

28


29


30

Combine Evidence

31

e1 = (dog, live, house)
e2 = (puppy, live, house)
e3 = (cat, live, house)
e4 = (cat, live, home)
e5 = (kitty, live, home)

Elements[dog] = 1, Elements[live] = 2, Elements[house] = 3, Elements[puppy] = 4, Elements[cat] = 5, Elements[home] = 6, Elements[kitty] = 7
Elements[1] = dog, Elements[2] = live, Elements[3] = house, Elements[4] = puppy, Elements[5] = cat, Elements[6] = home, Elements[7] = kitty

Round 1:
Index[live house] = (1, 4, 5, live house)
Index[live home] = (5, 7, live home)
Index[cat live] = (3, 6, cat live)
Max = 50
Sim(1,4), Sim(1,5), Sim(4,5), Sim(5,7), Sim(3,6)

32

e1 = (dog, live, house)
e2 = (puppy, live, house)
e3 = (cat, live, house)
e4 = (cat, live, home)
e5 = (kitty, live, home)

Elements[dog] = 1, Elements[live] = 2, Elements[house] = 3, Elements[puppy] = 1, Elements[cat] = 5, Elements[home] = 6, Elements[kitty] = 7
Elements[1] = dog+puppy, Elements[2] = live, Elements[3] = house, Elements[4] = puppy, Elements[5] = cat, Elements[6] = home, Elements[7] = kitty

Round 1:
Index[live house] = (1, 4, 5, live house)
Index[live home] = (5, 7, live home)
Index[cat live] = (3, 6, cat live)
Max = 50
Sim(1,4), Sim(1,5), Sim(4,5), Sim(5,7), Sim(3,6)
UsedCluster = {}

33

e1 = (dog, live, house)
e2 = (puppy, live, house)
e3 = (cat, live, house)
e4 = (cat, live, home)
e5 = (kitty, live, home)

Elements[dog] = 1, Elements[live] = 2, Elements[house] = 3, Elements[puppy] = 1, Elements[cat] = 5, Elements[home] = 3, Elements[kitty] = 5
Elements[1] = dog+puppy, Elements[2] = live, Elements[3] = house+home, Elements[4] = puppy, Elements[5] = cat+kitty, Elements[6] = home, Elements[7] = kitty

Round 1:
Index[live house] = (1, 4, 5, live house)
Index[live home] = (5, 7, live home)
Index[cat live] = (3, 6, cat live)
Max = 50
Sim(1,4), Sim(1,5), Sim(4,5), Sim(5,7), Sim(3,6)
UsedCluster = {(1,4), (5,7), (3,6)}

34

e1 = (dog, live, house)
e2 = (puppy, live, house)
e3 = (cat, live, house)
e4 = (cat, live, home)
e5 = (kitty, live, home)

Elements[dog] = 1, Elements[live] = 2, Elements[house] = 3, Elements[puppy] = 1, Elements[cat] = 5, Elements[home] = 3, Elements[kitty] = 5
Elements[1] = dog+puppy, Elements[2] = live, Elements[3] = house+home, Elements[4] = puppy, Elements[5] = cat+kitty, Elements[6] = home, Elements[7] = kitty

Round 2:
Index[live house] = (1, 1, 5, live house)
Index[live home] = (5, 5, live home)
Index[cat live] = (3, 3, cat live)
Max = 50
UsedCluster = {}

35

e1 = (dog, live, house)
e2 = (puppy, live, house)
e3 = (cat, live, house)
e4 = (cat, live, home)
e5 = (kitty, live, home)

Elements[dog] = 1, Elements[live] = 2, Elements[house] = 3, Elements[puppy] = 1, Elements[cat] = 1, Elements[home] = 3, Elements[kitty] = 5
Elements[1] = dog+puppy+cat, Elements[2] = live, Elements[3] = house+home, Elements[4] = puppy, Elements[5] = cat+kitty, Elements[6] = home, Elements[7] = kitty

Round 2:
Index[live house] = (1, 5, live house)
Index[live home] = (5, live home)
Index[cat live] = (3, cat live)
Max = 50
Sim(1,5), UsedCluster = {(1,5)}

36

e1 = (dog, live, house)
e2 = (puppy, live, house)
e3 = (cat, live, house)
e4 = (cat, live, home)
e5 = (kitty, live, home)

Elements[dog] = 1, Elements[live] = 2, Elements[house] = 3, Elements[puppy] = 1, Elements[cat] = 1, Elements[home] = 3, Elements[kitty] = 5
Elements[1] = dog+puppy+cat, Elements[2] = live, Elements[3] = house+home, Elements[4] = puppy, Elements[5] = cat+kitty, Elements[6] = home, Elements[7] = kitty

Round 2:
Index[live house] = (1, 5, live house)
Index[live home] = (5, live home)
Index[cat live] = (3, cat live)
Max = 50
Sim(1,5), UsedCluster = {(1,5)}

Final merges:
(dog, puppy, cat): share (live, house)
(cat, kitty): share (live, home)
(house, home): share (cat, live)
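To tie the walkthrough together, here is a hedged, heavily simplified sketch of the merge loop: index clusters by shared properties, score candidate pairs, greedily merge non-conflicting pairs each round, and repeat until nothing changes. It clusters only the subject strings, and the property-overlap similarity stands in for RESOLVER's actual SSM/ESP model.

```python
from collections import defaultdict

extractions = [
    ("dog", "live", "house"), ("puppy", "live", "house"),
    ("cat", "live", "house"), ("cat", "live", "home"),
    ("kitty", "live", "home"),
]

# Each subject string starts out in its own singleton cluster.
clusters = {frozenset([s]) for s, _, _ in extractions}

def props(cluster):
    """The (relation, object) properties attached to any member of the cluster."""
    return {(r, o) for s, r, o in extractions if s in cluster}

def similarity(c1, c2):
    """Toy evidence: the number of shared properties."""
    return len(props(c1) & props(c2))

merged = True
while merged:
    merged = False
    # Index clusters by property so only clusters sharing a property are compared.
    index = defaultdict(set)
    for c in clusters:
        for p in props(c):
            index[p].add(c)
    # Candidate pairs, deduplicated; each cluster takes part in at most one merge per round.
    pairs = {tuple(sorted((c1, c2), key=sorted))
             for cs in index.values() for c1 in cs for c2 in cs if c1 != c2}
    used = set()
    for c1, c2 in sorted(pairs, key=lambda p: -similarity(*p)):
        if similarity(c1, c2) > 0 and c1 not in used and c2 not in used:
            clusters -= {c1, c2}
            clusters.add(c1 | c2)
            used |= {c1, c2}
            merged = True

print([sorted(c) for c in clusters])  # -> [['cat', 'dog', 'kitty', 'puppy']]
```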

37

Experiment

• Dataset:
  – 9,797 distinct object strings
  – 10,151 distinct relation strings

• Metric:
  – precision, measured by manually labeling all of the clusters
  – recall

• The top 200 object strings formed 51 clusters, with an average cluster size of 2.9.

• The relation strings formed 110 clusters, with an average cluster size of 4.9.

38

Result

• CSM had particular trouble with lower-frequency strings, judging far too many of them to be co-referential on too little evidence.
• Extraction errors
• Multiple word senses

39

Function Filtering

• Example: (Virginia, capital of, Richmond) and (West Virginia, capital of, Charleston): the two names are similar, but the merge should be blocked.

• If there exists a functional (one-to-one) relation f with extractions f(x1, y1) and f(x2, y2), and y1 and y2 do not match (e.g. sim(y1, y2) is below a threshold), then x1 and x2 are not merged.

• It requires as input the set of functional and one-to-one relations in the data.
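A hedged sketch of the function-filtering check: given a candidate merge (x1, x2) and a set of relations known to be functional, block the merge when the two strings take values that never match for the same functional relation. Exact string comparison stands in for the sim(y1, y2) threshold.

```python
def function_filter(x1, x2, extractions, functional_relations):
    """Return False (block the merge) if x1 and x2 disagree on a functional relation."""
    for rel in functional_relations:
        vals1 = {o for s, r, o in extractions if s == x1 and r == rel}
        vals2 = {o for s, r, o in extractions if s == x2 and r == rel}
        # Both strings have values for the functional relation, but none of them match.
        if vals1 and vals2 and not (vals1 & vals2):
            return False
    return True

extractions = [
    ("Virginia", "capital of", "Richmond"),
    ("West Virginia", "capital of", "Charleston"),
]
print(function_filter("Virginia", "West Virginia", extractions,
                      functional_relations={"capital of"}))  # -> False: do not merge
```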

40

Web Hitcounts

• While names for two similar objects may often appear together in the same sentence, it is relatively rare for two different names of the same object to appear in the same sentence.

• The Coordination-Phrase Filter searches the Web for coordination phrases such as "x1 and x2" and uses the hit counts to filter out merges of names that likely refer to different objects.

41

Experiment

42

Conclusion

• This study showed how TEXTRUNNER automatically extracts information from the Web, and how the RESOLVER system finds clusters of co-referential object names with a precision of 78% and a recall of 68% with the aid of CPF.

43

Comments

• The assumptions of the ESP model

• How to use TextRunner in my research:
  – finding relations
  – using TextRunner for query expansion