Joint Information Extraction
Ph.D. Thesis Defense
Qi Li
Advisor: Heng Ji
Computer Science Department
Rensselaer Polytechnic Institute
April 7th, 2015
Doctoral Committee
Dr. Heng Ji, Chair, RPI
Dr. James Hendler, RPI
Dr. Peter Fox, RPI
Dr. Dan Roth, UIUC
Dr. Daniel Bikel, Google
…, dozens of Israeli tanks advanced into the northern Gaza Strip backed by helicopters which fired at least three rockets in the Jabaliya area, Palestinian security sources said. …
(AFP 2003/03/05)
Triggers: "advanced" (Event: Transport), "fired" (Event: Attack)
Entity mentions: Israeli (GPE), tanks (VEHICLE), Gaza Strip (LOCATION), helicopters (VEHICLE), Jabaliya area (LOCATION), rockets (WEAPON)
Argument roles: Owner, Vehicle, Destination, Instrument, Place
Relations also link entity mentions.
Background
Entity Mention: a mention of an entity in the world
Relation: a semantic relationship between two entity mentions
ACE Relation Types and examples:
• Physical: a town (GPE) some 50 miles south of Salzburg (GPE)
• Person-Social: relatives (PER) of the dead (PER)
• EMP-ORG: the tire maker (ORG) still employs 1,400 (PER)
• Agent-Artifact: Rubin Military Design, the makers (ORG) of the Kursk (VEH)
• PER/ORG Affiliation: Republican (ORG) senators (PER)
• GPE-Affiliation: Salzburg (GPE) Red Cross officials (PER)
• ACE: Automatic Content Extraction
Background
Event Mention: an occurrence of an event with a particular type and subtype
Event Mention Trigger (anchor): the word or phrase that most clearly expresses the event mention
Event Argument: an entity mention that serves as a participant or attribute of the event
Example: "In Baghdad, a cameraman died when an American tank fired on the Palestine Hotel."
• Trigger: fired
• ACE Type: Attack
• Place: Baghdad
• Target: Palestine Hotel
• Instrument: tank
• ACE: Automatic Content Extraction
Outline
• Background
• Overview
• Joint Extraction Framework
  o Leverage Cross-component Dependencies
• Cross-document Inference
  o Leverage Cross-doc Dependencies
• Bilingual Name Tagging
  o Leverage Cross-lingual Dependencies
• Conclusions & Future Directions
• Related Publications
IE Hill-climbing
Texts → Entity Mentions (~90% F1) [McCallum et al., 2003; Finkel et al., 2005; Florian et al., 2006; Ji & Grishman, 2006]
→ Relations (~50%) [Sun 2011; Jiang 2007; Bunescu 2005; Zhao 2005; Qian 2010; Chan & Roth 2011; Plank 2013]
→ Events (~40%) [Ji & Grishman 2008; Liao 2010; Hong 2011]
Traditional approach:
• Focused on each individual subtask
• Lacks global inference over the entire result
• End-to-end performance is limited:
  o Relation: 40.8%
  o Event Argument: 36.6%
Overview
This thesis investigates cross-component, cross-document, and cross-lingual dependencies to improve Information Extraction.
Cross-component Dependencies
Different components exhibit various dependencies:
• Long-distance dependencies
• Dependencies among multiple subtasks
Cross-document Dependencies
"Since the June 4 summit in Jordan between Abbas, Sharon and George W. Bush, Hamas has been a thorn in the side of Abbas ..."
• Member-of(George W. Bush, Republican Party): confidence 0.91
• Member-of(George W. Bush, Hamas): confidence 0.47
"The list included Sheik Ahmed Yassin, Hamas' founder and spiritual leader, senior Hamas official Abdel Aziz Rantisi"
Cross-lingual Dependencies
Different languages in parallel corpora are complementary:
• Resources
• Patterns, features, and language phenomena
Related Joint Modeling Methods
• Constrained Conditional Models, ILP Inference [Roth 2004; Punyakanok 2005; Roth 2007; Chang 2012; Yang 2013; Jindal & Roth 2013; Cheng & Roth 2013]
  o Our method is a single unified joint model for both learning and inference
• Re-ranking Methods [Ji 2005; Huang 2002; Chen 2010; McClosky 2011]
  o Their models were learned separately
  o Re-ranking needs additional training data
• Probabilistic Graphical Models [Sutton 2004; Poon 2007; Poon 2010; Kiddon 2012; Wick 2012; Singh 2013]
  o Computationally expensive
  o Our method uses beam search, and thus can explore global features at low cost
Contributions
I. Leverage Cross-component Dependencies
  • Proposed a new representation: information networks
  • The first attempt to extract entity mentions, relations, and events in a single joint model
  • Achieved state-of-the-art performance on each subtask
II. Leverage Cross-doc Dependencies
  • Our method can effectively remove incorrect IE results without using any additional training data
III. Leverage Cross-lingual Dependencies
  • Modeled name tagging in parallel corpora as a single structured prediction problem
Outline
• Background
• Overview
I. Joint Extraction Framework
  o Leverage Cross-component Dependencies
II. Cross-document Inference
  o Leverage Cross-doc Dependencies
III. Bilingual Name Tagging
  o Leverage Cross-lingual Dependencies
• Conclusions & Future Directions
• Related Publications
Input: "Asif Mohammed Hanif detonated explosives in Tel Aviv"
• Typical pipelined architecture: Entity Mention Boundaries + Types → Relation Extraction → Event Extraction
Entity mentions:
• PER: Asif Mohammed Hanif
• WEA: explosives
• GPE: Tel Aviv
Relation: Physical (arg-1: PER Asif Mohammed Hanif; arg-2: GPE Tel Aviv)
Event: Attack (trigger: detonated; attacker: PER Asif Mohammed Hanif; place: GPE Tel Aviv; instrument: WEA explosives)

Motivation
• A closer look at the event pipeline:
  ☹ Components do not talk to each other; errors propagate without feedback
  ☹ Incapable of dealing with global dependencies
Event extraction pipeline [Ji and Grishman 2008; Liao and Grishman 2010; Hong et al. 2011]:
1. Trigger Detection: pattern matching / classifier
2. Argument Classification: MaxEnt / SVM binary classifier
3. Argument Role Classification: MaxEnt / SVM multiclass classifier
4. Reportability Classification: MaxEnt / SVM binary classifier

Interactions among Multiple Components
• "fired" is ambiguous: Attack, End-Position, NIL, ...
• Argument labeling can benefit event trigger decisions:
1) "In Baghdad, a cameraman died when an American tank fired on the Palestine Hotel." (cameraman: Victim; Palestine Hotel: Target; American tank: Instrument → Attack)
2) "He has fired his air defense chief." (air defense chief: Position → End-Position)
Interactions among Multiple Components
• Make use of arbitrary global features. For example, an Attack event usually has one Place:
  ✗ "In Baghdad, a cameraman died when a tank fired on a hotel." (place: Baghdad; place: hotel) → penalize assignments with more than one Place
  ✓ "In Baghdad, a cameraman died when a tank fired on a hotel." (place: Baghdad; target: hotel)
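The Place-count constraint above can be written as a global feature function over a whole assignment; a minimal sketch (the event encoding and the weight's sign are illustrative, not the thesis's actual feature templates):

```python
# Sketch of a global feature that fires when an Attack event has more
# than one Place argument. The event encoding (a type plus a list of
# (role, mention) pairs) is a hypothetical simplification.
def place_count_feature(event):
    """Return 1 if an Attack event has more than one Place argument."""
    if event["type"] != "Attack":
        return 0
    places = [m for role, m in event["args"] if role == "Place"]
    return 1 if len(places) > 1 else 0

# A learned negative weight on this feature makes the decoder prefer
# assignments with at most one Place per Attack event.
bad = {"type": "Attack", "args": [("Place", "Baghdad"), ("Place", "hotel")]}
good = {"type": "Attack", "args": [("Place", "Baghdad"), ("Target", "hotel")]}
```

During learning, the weight attached to such a feature is estimated like any other feature weight, so the "soft constraint" is learned rather than hand-tuned.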
Joint Extraction w/ Inexact Search
Propose a novel representation: information networks
• Joint extraction of entity mentions, relations, and events: jointly construct information networks
  o Nodes: entity mentions, event triggers
  o Edges: relations, event argument links
Example: "Kiichiro Toyoda founded the automaker"
1. Joint search algorithm: beam search
2. Evaluate candidates: local and global features
3. Estimate feature weights: structured perceptron

Joint Extraction w/ Inexact Search
(The search space is exponentially large; the beam keeps only the top-scoring partial assignments.)
Search Algorithms
• Joint search algorithm for multiple IE components
  o Beam search
    • Node-step: extract triggers, entity mentions, etc.
    • Edge-step: extract relation links and event-argument links
  o Token-based vs. segment-based decoding
Token-based (Florian et al., 2006; Ratinov & Roth, 2009):
  The tire maker still employs 1,400
  O B-ORG L-ORG O O U-PER
vs. Segment-based (Sarawagi & Cohen, 2004; Zhang & Clark, 2008):
  [The] [tire maker](ORG) still employs [1,400](PER)
BILOU scheme: B-X: beginning of X; I-X: inside X; L-X: last token of X; U-X: single-token X; O: no type
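The BILOU encoding of typed segments into token tags can be sketched as follows (a minimal illustration of the scheme above; the span format is an assumption):

```python
def bilou_encode(tokens, segments):
    """Encode typed segments as BILOU tags over a token sequence.

    `segments` is a list of (start, end, type) spans with inclusive
    token indices, e.g. (1, 2, "ORG") for "tire maker".
    """
    tags = ["O"] * len(tokens)
    for start, end, label in segments:
        if start == end:
            tags[start] = "U-" + label          # single-token segment
        else:
            tags[start] = "B-" + label          # beginning
            for i in range(start + 1, end):
                tags[i] = "I-" + label          # inside
            tags[end] = "L-" + label            # last token
    return tags

tokens = ["The", "tire", "maker", "still", "employs", "1,400"]
tags = bilou_encode(tokens, [(1, 2, "ORG"), (5, 5, "PER")])
# tags == ["O", "B-ORG", "L-ORG", "O", "O", "U-PER"]
```

A token-based decoder predicts one such tag per token, while a segment-based decoder proposes the (start, end, type) spans directly.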
Parameter Estimation
• Structured perceptron with beam search
  o Update weights via:
    • Perceptron update
    • K-best MIRA (Margin Infused Relaxed Algorithm) [McDonald et al., 2005]
Standard-update vs. Early-update [Collins and Roark 2004; Huang et al. 2012]
• Standard update: compare the global 1-best prediction z against the correct solution y; if the ground-truth prefix has already fallen off the beam, the update is invalid.
• Early update: as soon as the ground-truth prefix falls off the beam, update the weights using the current 1-best prefix z.
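The early-update rule can be sketched abstractly. The toy beam decoder and feature map below are hypothetical stand-ins for the joint decoder and its feature templates; only the update logic mirrors the method described above:

```python
# Sketch of structured perceptron training with early update.
def feats(x, y):
    """Toy feature map: counts of (input token, output label) pairs."""
    f = {}
    for xi, yi in zip(x, y):
        f[(xi, yi)] = f.get((xi, yi), 0) + 1
    return f

def score(w, f):
    return sum(w.get(k, 0.0) * v for k, v in f.items())

def decode_prefixes(x, w, labels, beam_size):
    """Yield the beam of partial outputs after each decoding step."""
    beam = [()]
    for i in range(len(x)):
        cands = [p + (t,) for p in beam for t in labels]
        cands.sort(key=lambda p: score(w, feats(x[:i + 1], p)), reverse=True)
        beam = cands[:beam_size]
        yield beam

def perceptron_early_update(x, gold, w, labels, beam_size=2):
    for step, beam in enumerate(decode_prefixes(x, w, labels, beam_size), 1):
        prefix = gold[:step]
        if prefix not in beam:              # ground-truth prefix fell off beam:
            z = beam[0]                     # update against the 1-best prefix
            for k, v in feats(x[:step], prefix).items():
                w[k] = w.get(k, 0.0) + v
            for k, v in feats(x[:step], z).items():
                w[k] = w.get(k, 0.0) - v
            return w                        # stop early (early update)
    z = beam[0]                             # full length reached: standard update
    if z != gold:
        for k, v in feats(x, gold).items():
            w[k] = w.get(k, 0.0) + v
        for k, v in feats(x, z).items():
            w[k] = w.get(k, 0.0) - v
    return w
```

After a few passes over a sentence, the gold structure becomes the 1-best output of the toy decoder.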
Token-based Search Algorithm
• Assume argument candidates are given
• Decoding example (beam size = 1):
"In Baghdad, a cameraman died when an American tank fired on the Palestine Hotel."
(Entity types: Baghdad: LOC; cameraman: PER; tank: VEH; Palestine Hotel: FAC. Triggers: died: Die; fired: Attack. Argument links: place, victim, target, instrument.)
Segment-based Search Algorithm
• Limitations of the token-based decoder:
  o Unfair to compare nodes with different boundaries; the completion of a mention is biased by the model
  o Difficult to synchronize edge steps: (New B-FAC York I-FAC) is not yet a complete mention, so no link can be made at this step
Segment-based Search Algorithm
• Node-step (search for entity mentions and event triggers):
  o Propose various nodes at the current token (e.g., ORG, PER, O, ...)
  o Append to previous assignments
  o Evaluate and rank new assignments
"Asif Mohammed Hanif detonated explosives in Tel Aviv"
Segment-based Search Algorithm
• Node-step (continued): evaluate candidate segments with segment-based features
• Example: "Asif Mohammed Hanif" × PER, with context features: noun phrase; person gazetteer; previous word "the"; ...
"Asif Mohammed Hanif detonated explosives in Tel Aviv"
Segment-based Search Algorithm
• Node-step (continued): propose event trigger candidates (e.g., Attack, Injure) at "detonated"
"Asif Mohammed Hanif detonated explosives in Tel Aviv"
Segment-based Search Algorithm
• Node-step (continued): append each candidate (e.g., "detonated" × Attack) to the previous prefixes in the buffer at "Hanif" (PER, ORG, O, ...)
"Asif Mohammed Hanif detonated explosives in Tel Aviv"
Segment-based Search Algorithm
• Edge-step (search for relation/argument links):
  o At each sub-step, connect each new node with a previous one by a typed edge, or NIL
"Asif Mohammed Hanif detonated explosives in Tel Aviv"
• Attack("detonated"): Attacker = Asif Mohammed Hanif (PER); Instrument = explosives (WEAPON)
• Relation-event feature: an Agent-Artifact relation between the Attacker and the Instrument
Segment-based Search Algorithm
• Return the candidate with the highest model score as the final prediction
"Asif Mohammed Hanif detonated explosives in Tel Aviv"
• Final network: Attack("detonated"); attacker = Asif Mohammed Hanif (PER); instrument = explosives (WEA); place = Tel Aviv (GPE); relations: Physical(Hanif, Tel Aviv), Agent-Artifact(Hanif, explosives)
• The maximal length of each node type is bounded
  o ORG example: "Pearl River Hang Cheong Real Estate Consultants Ltd"
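The node-step's segment proposal can be sketched as follows. The per-type maximal lengths and type inventory here are hypothetical placeholders (the thesis estimates the real bounds from training data):

```python
# Sketch of the node-step: at token position i, propose typed segments
# ending at i, bounded by a per-type maximal length.
MAX_LEN = {"PER": 3, "ORG": 7, "GPE": 2, "O": 1}  # illustrative bounds

def propose_segments(tokens, i):
    """Yield (start, i, type) candidates for segments ending at token i."""
    for label, max_len in MAX_LEN.items():
        for length in range(1, max_len + 1):
            start = i - length + 1
            if start >= 0:
                yield (start, i, label)

tokens = "Asif Mohammed Hanif detonated explosives in Tel Aviv".split()
cands = list(propose_segments(tokens, 2))
# (0, 2, "PER") -- "Asif Mohammed Hanif" as a three-token PER -- is proposed
```

Each proposed segment is then appended to the beam's prefixes and scored, so whole mentions compete directly rather than token by token.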
Local Features
• Local features:
  o Similar to the features in pipelined approaches
  o Only consider local decisions
"In Baghdad, a cameraman died when an American tank fired on the Palestine Hotel"
(place: Baghdad; victim: cameraman; target: Palestine Hotel; instrument: tank)
1. Trigger word: "fired"
2. Trigger POS: VBD
3. Argument word: "Baghdad"
4. Dependency path: argument <--prep_in-- died --advcl--> trigger
Global Features
• Global features:
  o Involve a wider range of the output structure
  o Can ask arbitrary questions about the entire structure
"In Baghdad, a cameraman died when an American tank fired on the Palestine Hotel"
1. Does "fired" have only one Place?
2. Is "Baghdad" an argument of "died"?
3. ...
Global Trigger Feature
"a cameraman died when an American tank fired on ..."
• Dependency link feature: Die --advcl--> Attack
• Context word feature: Die ... "when" ... Attack
(advcl: adverbial clause modifier)
Global Argument Feature
• Two triggers share the same mention as an argument:
"a cameraman died when an American tank fired on ..."
Die("died") --Victim--> Entity("cameraman") <--Target-- Attack("fired"), with Die --advcl--> Attack
Global Entity Mention Features
• Neighboring entity mentions should have coherent types
"Barbara Starr was reporting from the Pentagon"
• Positive feature: PER --prep_from--> FAC
• Negative feature: PER --prep_from--> PER
(prep_from: prepositional modifier "from")
Global Relation Features
• Dependency compatibility: two conjoined mentions should have compatible relations
"U.S. forces in Somalia, Haiti and Kosovo"
PHYS(PER "forces", GPE "Somalia"); GPE "Somalia" --conj_and--> GPE "Kosovo", so PHYS(PER "forces", GPE "Kosovo") should also hold
(conj_and: conjunction by "and")
Experiments
• Data sets:
  o ACE'05 corpus, excluding the informal genres cts and un
  o ACE'04 corpus: bnews and nwire subsets
• Evaluate each subtask and the end-to-end system with F1 measure

Data Set | # sents | # mentions | # relations | # triggers | # args
ACE'05 Train | 7.2k | 26.4k | 4.7k | 2.8k | 4.5k
ACE'05 Dev | 1.7k | 6.4k | 1.1k | 0.7k | 1.1k
ACE'05 Test | 1.5k | 5.4k | 1.1k | 0.6k | 1.0k
ACE'04 | 6.7k | 22.7k | 4.3k | N/A | N/A
Experiments
• Results on ACE'05 with gold-standard entity mentions, values, and timex (token-based decoder) [Q. Li, H. Ji, L. Huang. ACL 2013]
Experiments
• End-to-end relation extraction results on ACE'05 [Q. Li, H. Ji. ACL 2014]
• Three types of loss functions in K-best MIRA:
  o F1 loss
  o 0-1 loss
  o A third loss similar to F1 loss, but sensitive to the size of structures
Example structure: "Asif Mohammed Hanif detonated explosives in Tel Aviv" with Injure(detonated), Victim = Asif Mohammed Hanif (PER)
Experiments: Complete Model (Entity Mention, Relation, Event)
• Overall performance:

Approach | Entity Mention | Relation | Event Trigger | Event Argument
Preliminary results:
Pipelined Baseline | 79.5 | 51.6 | 64.4 | 35.7
Pipelined + Token-based | | | 64.5 | 43.1
Li and Ji (2014) | 80.8 | 52.1 | |
Complete joint model:
Joint w/ Avg. Perceptron | 81.0 | 52.0 | 65.3 | 45.6
Joint w/ MIRA w/ F1 Loss | 79.0 | 49.2 | 61.5 | 47.4
Joint w/ MIRA w/ 0-1 Loss | 80.0 | 51.0 | 63.2 | 47.9
Joint w/ MIRA w/ Loss 3 | 80.7 | 52.8 | 65.2 | 46.8

[Q. Li, H. Ji, Y. Hong, S. Li. EMNLP 2014]
Remaining Challenges
• Capture world knowledge
  o Williams picked up the child and this time, threw (Attack) her out the window.
  o We believe that the likelihood of them using (Attack) those weapons goes up.
• Disambiguate physical and non-physical events
  o Sam Brownback vowed Monday to defend Kansas' ban of ...
  o It still hurts me to read this. ("hurts" is not an Attack here)
• Pronoun resolution
  o It's important that people know that we don't believe in the war (Attack).
  o Nobody questions whether this (Attack) is right or not.
• Semantic inference
  o Negotiations between Washington and Pyongyang on their nuclear dispute have been set for April 23 in Beijing and are widely seen here as a blow to Moscow's efforts to stamp authority on the region by organizing such a meeting.
This work:
• Provided a novel view of the whole task
• Significantly improved end-to-end performance
• Is limited to a single sentence and a single language
Can we go beyond sentence boundaries and break the barrier between languages?
Next: we study cross-document and cross-lingual dependencies.
Outline
• Background
• Overview
I. Joint Extraction Framework
  o Leverage Cross-component Dependencies
II. Cross-document Inference
  o Leverage Cross-doc Dependencies
III. Bilingual Name Tagging
  o Leverage Cross-lingual Dependencies
• Conclusions & Future Directions
• Related Publications
Joint Inference for Cross-doc IE
• Input: a large collection of documents, processed by a sentence-level IE system into an information graph
• Output: a cleaned graph, produced by joint inference

Joint Inference for Cross-doc IE
• Estimate constraints from the ACE 2005 corpus using point-wise mutual information (PMI):
  o f(Li): frequency of relation/event Li
  o f(Li, Lj): co-occurrence frequency of Li and Lj, for any two entities
  o If PMI < threshold, the pair is treated as incompatible and used as a hard constraint
• Examples of hard constraints (34 pairwise and 16 triangle):
  o Pairwise (Li, Lj): Person A founded Organization B / Organization B hired Person A; Person A has a Business relation with Person B / Person A has a Person-Social relation (e.g., family) with Person B
  o Triangle Entity (Li, Lj): Person A and Person B are involved in Meet events / Person A and Person B are members of the same Organization C (e.g., the Democratic Party)
  o Triangle Link (Li, Lj, Lk): Entity A is involved in a Transport event originating from Location B / Person C is affiliated with or a member of Entity A / Person C is located in Location B
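The PMI-based constraint estimation above can be sketched like this. The corpus representation (one label set per entity) and the threshold value are assumptions of this sketch, not the thesis's exact formulation:

```python
import math
from collections import Counter

def estimate_incompatible(fact_sets, threshold=-2.0):
    """Estimate incompatible label pairs via point-wise mutual information.

    `fact_sets` is a list of label sets, one per entity: the relation/event
    labels observed for it across the corpus. Pairs of labels whose PMI
    falls below the threshold become hard constraints.
    """
    freq, pair_freq = Counter(), Counter()
    n = len(fact_sets)
    for labels in fact_sets:
        for a in labels:
            freq[a] += 1
        for a in labels:
            for b in labels:
                if a < b:
                    pair_freq[(a, b)] += 1
    incompatible = set()
    for a in freq:
        for b in freq:
            if a < b:
                # smoothed PMI: log P(a, b) / (P(a) * P(b))
                p_ab = (pair_freq[(a, b)] + 1e-9) / n
                pmi = math.log(p_ab / ((freq[a] / n) * (freq[b] / n)))
                if pmi < threshold:
                    incompatible.add((a, b))
    return incompatible
```

Label pairs that never (or rarely) co-occur relative to their individual frequencies get a strongly negative PMI and are flagged as incompatible.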
Joint Inference for Cross-doc IE
• ILP inference framework:
  o Binary variable xi = 1 if the i-th fact is selected, 0 if discarded
  o p(i, j): local confidence of the j-th occurrence of the i-th fact
  o Objective: maximize the total confidence of the selected facts, subject to the hard constraints
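The selection problem can be illustrated with a brute-force stand-in for the ILP solver (the exact objective in the thesis may differ; this version maximizes net confidence, and the input format is an assumption):

```python
from itertools import product

def select_facts(confidences, incompatible):
    """Brute-force stand-in for the ILP: choose x_i in {0, 1} per fact to
    maximize summed confidence of selected facts, subject to hard
    constraints forbidding incompatible pairs from both being selected.

    `confidences[i]` lists the local confidences p(i, j) over the
    occurrences of fact i; `incompatible` holds index pairs (i, k).
    """
    n = len(confidences)
    best, best_score = None, float("-inf")
    for x in product([0, 1], repeat=n):
        if any(x[i] and x[k] for i, k in incompatible):
            continue  # violates a hard constraint
        # selected facts contribute their confidence mass; discarded
        # facts cost theirs, so high-confidence facts tend to be kept
        score = sum(sum(ps) * (1 if xi else -1)
                    for xi, ps in zip(x, confidences))
        if score > best_score:
            best, best_score = x, score
    return best
```

A real ILP solver handles the same 0-1 program at scale; brute force here only serves to make the objective and constraints concrete.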
Joint Inference for Cross-doc IE
• Data set: DARPA GALE distillation task data, 381,588 newswire documents
• Scoring metric: Browsing Cost(i) = the number of incorrect or redundant facts that a user must examine before finding i correct facts
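The browsing-cost metric can be made concrete with a short sketch (the three-way fact labeling is an assumption about how ranked output is annotated):

```python
def browsing_cost(ranked_facts, i):
    """Browsing Cost(i): number of incorrect or redundant facts a user
    must examine before finding i correct facts in the ranked list.

    Each entry of `ranked_facts` is "correct", "incorrect", or
    "redundant". Returns None if fewer than i correct facts exist.
    """
    found = cost = 0
    for status in ranked_facts:
        if status == "correct":
            found += 1
            if found == i:
                return cost
        else:
            cost += 1
    return None

ranked = ["incorrect", "correct", "redundant", "correct", "correct"]
# browsing_cost(ranked, 2) == 2: one incorrect and one redundant fact
# are examined before the second correct fact
```

Lower is better: removing incorrect or redundant facts via joint inference directly lowers this cost.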
(Figure: number of incorrect relations and number of removals vs. number of correct relations in the objective function)
Limitations:
• Only took 1-best results as input
• Cross-document co-reference was accomplished by string matching
[Qi Li et al. CIKM 2011]
Outline
• Background
• Overview
I. Joint Extraction Framework
  o Leverage Cross-component Dependencies
II. Cross-document Inference
  o Leverage Cross-doc Dependencies
III. Bilingual Name Tagging
  o Leverage Cross-lingual Dependencies
• Conclusions & Future Work
• Related Publications
Bilingual Name Tagging
• Monolingual name tagging for parallel corpora:
  ☹ lacks cross-lingual consistency
  ☺ features from the two languages are complementary:
    o the Chinese side is ambiguous between a Person name and an Organization name
    o the English side has more clues for Organization: capitalization, the keyword "Bank", and "the"
BIO scheme: B-X: beginning of X; I-X: within X; O: no type
Bilingual Name Tagging
• Linear-chain CRFs with cross-lingual features:
  o Two separate classifiers
  o Propagate features from one language to the other using word alignment
  o Implicitly enforce bilingual consistency
English features for "亚行": first word = "the"; word = "Asian"; word = "Development"; last word = "Bank"; Capitalized = True; ...
Bilingual Name Tagging
• Joint bilingual CRFs (factor graph representation):
  o Monolingual bigram factors: linear chains on each sentence
  o Bilingual factors: based on word alignment; explicitly model the cross-language dependency
  o Inference: loopy belief propagation for approximate inference (Wainwright et al., 2001; Sutton et al., 2007)
Bilingual Name Tagging
• Training/test data:
  o 288 Chinese-English documents from the Parallel Treebank
  o 230 documents for training; 58 documents for test
• Evaluation metric: precision/recall/F-measure on bilingual name pairs

Type | English | Chinese | Bilingual Pairs
GPE (Geo-political entity) | 4.0k | 4.0k | 4.0k
Person | 1.0k | 1.0k | 1.0k
Organization | 1.5k | 1.5k | 1.5k
All | 6.6k | 6.6k | 6.6k
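The bilingual name pair metric can be sketched as set-based precision/recall/F1 (exact matching of both names and the type is an assumption of this sketch):

```python
def bilingual_pair_f1(gold_pairs, predicted_pairs):
    """Precision/recall/F1 over bilingual name pairs.

    Each pair is ((english_name, chinese_name), type); a prediction
    counts as correct only if both names and the type match a gold pair.
    """
    gold, pred = set(gold_pairs), set(predicted_pairs)
    correct = len(gold & pred)
    p = correct / len(pred) if pred else 0.0
    r = correct / len(gold) if gold else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1

gold = {(("Asian Development Bank", "亚行"), "ORG"),
        (("Tel Aviv", "特拉维夫"), "GPE")}
pred = {(("Asian Development Bank", "亚行"), "ORG"),
        (("Tel Aviv", "特拉维夫"), "PER")}
p, r, f1 = bilingual_pair_f1(gold, pred)
# the mistyped Tel Aviv pair is wrong, so p == r == f1 == 0.5
```

Scoring pairs rather than each side separately rewards taggers whose two monolingual outputs are consistent with each other.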
Bilingual Name Tagging
• Overall performance [Q. Li et al. CIKM 2012]
Bilingual Name Tagging
• Bilingual name tagging improves name-aware machine translation and word alignment [H. Li et al. ACL 2013]
  o Baseline: hierarchical phrase-based machine translation (Zheng et al., 2009)

Task | Metric | Baseline MT | Name-aware MT
Name Translation | Weak Accuracy | 66.5% | 72.9%
Overall MT | BLEU | 35.8% | 36.3%
Overall MT | Name-aware BLEU | 36.1% | 39.4%
Name Alignment | F-measure | 46.0% | 50.3%
Conclusions
• We investigated cross-component, cross-document, and cross-lingual dependencies to improve IE performance:
1. For the first time, we formulated IE as the task of constructing information networks. We showed that structured learning with global features is feasible and very useful for this task. Our joint framework achieved state-of-the-art results on each subtask.
2. Beyond the sentence level, our cross-document inference method effectively removes conflicting results.
3. Our bilingual name tagger significantly outperforms the traditional monolingual method, and it improves name-aware machine translation.
Future Directions
• Expand information types
  o "A Germanwings flight 9525 crashed in the French Alps"
    • Germanwings: Commercial ORG
    • Flight 9525: Air Vehicle
    • New event types: Accident, Rescue, Evacuation, etc.
• Knowledge acquisition for IE
  o Use world knowledge to guide IE (Chan & Roth 2010, etc.)
Related Publications
• Constructing Information Networks Using One Single Model. Qi Li, Heng Ji, Yu Hong, Sujian Li. EMNLP 2014
• Incremental Joint Extraction of Entities and Relations. Qi Li, Heng Ji. ACL 2014
• Joint Event Extraction via Structured Prediction with Global Features. Qi Li, Heng Ji, Liang Huang. ACL 2013
• Joint Bilingual Name Tagging for Parallel Corpora. Qi Li, Haibo Li, Heng Ji, Wen Wang, Jing Zheng, Fei Huang. CIKM 2012
• Name-aware Machine Translation. Haibo Li, Jing Zheng, Heng Ji, Qi Li, Wen Wang. ACL 2013
• Joint Inference for Cross-document Information Extraction. Qi Li, Sam Anzaroot, Wen-Pin Lin, Xiang Li, and Heng Ji. CIKM 2011
36 citations
18 citations
19 citations
Results in International Evaluations

Task | Year | Ranking
KBP Temporal Slot Filling | 2011 | 1st
KBP Event Argument Extraction | 2014 | 3rd
Event Nugget Evaluation | 2014 | 1st
Thank You
Thanks to my committee, Blender members, friends, and family for their advice and support!