Improving Information Extraction by Acquiring External...
Improving Information Extraction by Acquiring External Evidence with Reinforcement Learning
Karthik Narasimhan, Adam Yala, Regina Barzilay
CSAIL, MIT

Information Extraction: State of the Art
• Dependence on large training sets
  ACE: 300K words; Freebase: 24M relations
  Not available for many domains (e.g., medicine, crime)
• Even large corpora do not guarantee high performance
  ~75% F1 on relation extraction (ACE)
  ~58% F1 on event extraction (ACE)

A hard reading task for you
Task: Identify food carcinogens
"Coffee significantly reduced ER and cyclin D1 abundance in ER(+) cells … Coffee reduced the pAkt levels in both ER(+) and ER(-) cells."
Is coffee a carcinogen?

A hard reading task for machines: IE
"A 2 year old girl and four other people were wounded in a shooting in West Englewood Thursday night, police said"
Extraction (NumWounded): four

A hard reading task: IE (not always!)
"A 2 year old girl and four other people were wounded in a shooting in West Englewood Thursday night, police said"
Extraction (NumWounded): four
"The last shooting left five people wounded."
Extraction (NumWounded): five

Incorporate External Evidence
Traditional formulation: extract + reason
Our approach: extra articles → extract → agg.

Challenges
1. Event Coreference: Several irrelevant articles!
2. Reconciling Predictions: Inconsistent extractions
  Shooter: Scott Westerhuis, NumKilled: 4, Location: S.D
  Shooter: Scott Westerhuis, NumKilled: 6, Location: Platte

Learning through Reinforcement
Start with traditional extraction system
  Original → extract → Shooter: Scott Westerhuis, NumKilled: 4, Location: S.D
Perform a query and extract from a new article
  query → search → extract → Shooter: Scott Westerhuis, NumKilled: 6, Location: Platte
State
  Current: Shooter: Scott Westerhuis, NumKilled: 4, Location: S.D
  New: Shooter: Scott Westerhuis, NumKilled: 6, Location: Platte

RL: State
Curr: Shooter: Scott Westerhuis (conf 0.3), NumKilled: 4 (conf 0.2), Location: S.D (conf 0.1)
New: Shooter: Scott Westerhuis (conf 0.4), NumKilled: 6 (conf 0.6), Location: Platte (conf 0.3)
State:
  currentConf: 0.3 0.2 0.1
  newConf: 0.4 0.6 0.3
  matches: 1 0 0
  docSim: 0.65
  context: 1 0 .. 0 0
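The state on this slide can be read as a concatenation of the five feature groups. The sketch below is only an illustration of that reading: the function name `build_state` and the four-element `context` vector are assumptions (the slide elides the context entries with "..").

```python
def build_state(cur_conf, new_conf, cur_vals, new_vals, doc_sim, context):
    """Concatenate the feature groups named on the slide into one flat vector."""
    # matches: 1 where the current and new extracted values agree, else 0
    matches = [1.0 if c == n else 0.0 for c, n in zip(cur_vals, new_vals)]
    return list(cur_conf) + list(new_conf) + matches + [doc_sim] + list(context)


cur_vals = ["Scott Westerhuis", "4", "S.D"]
new_vals = ["Scott Westerhuis", "6", "Platte"]
# The context vector here is a hypothetical placeholder of length 4.
state = build_state([0.3, 0.2, 0.1], [0.4, 0.6, 0.3],
                    cur_vals, new_vals, 0.65, [1.0, 0.0, 0.0, 0.0])
```

With the slide's values, the `matches` part of the vector comes out as 1 0 0, since only the shooter name agrees between the two extractions.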

RL: Actions
1. Reconcile (d) old values and new values. Pick a single value, all values or no value from new set
State 1
  Curr: Shooter: Scott Westerhuis, NumKilled: 4, Location: S.D
  New: Shooter: Scott Westerhuis, NumKilled: 6, Location: Platte
  reconcile → Shooter: Scott Westerhuis, NumKilled: 6, Location: S.D
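A minimal sketch of the reconcile action, assuming extractions are stored as dictionaries and the decision is encoded as `"all"`, `"none"`, or the name of a single entity. That encoding and the function name `reconcile` are illustrative choices, not taken from the slides.

```python
def reconcile(cur, new, decision):
    """Return the values kept after a reconciliation decision.

    decision: "all" (take every new value), "none" (keep current values),
    or the name of one entity whose new value replaces the current one.
    """
    merged = dict(cur)
    if decision == "all":
        merged.update(new)
    elif decision != "none":
        merged[decision] = new[decision]
    return merged


cur = {"Shooter": "Scott Westerhuis", "NumKilled": "4", "Location": "S.D"}
new = {"Shooter": "Scott Westerhuis", "NumKilled": "6", "Location": "Platte"}
result = reconcile(cur, new, "NumKilled")
```

Picking only `NumKilled` from the new set reproduces the reconciled values on the slide (NumKilled: 6, Location: S.D).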

RL: Actions
2. Decide how to proceed:
  • Stop
    Final: Shooter: Scott Westerhuis, NumKilled: 6, Location: S.D
  • Select next query (q)
    select q → search → extract → State 2
State 2
  Curr: Shooter: Scott Westerhuis, NumKilled: 6, Location: S.D
  New: Shooter: Westerhuis, NumKilled: 4, Location: Platte

Queries
Query templates are induced automatically
• Title of original article
• Content words having high mutual information with gold values
<title>
<title> + ( suspect | shooter | said | men | arrested | … )
<title> + ( injured | wounded | victims | shot | … )
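As a rough illustration of how such templates could be instantiated, the sketch below expands a title into one query per template. The helper name `expand_queries`, the example title, and the way the word group is appended to the title are assumptions; the word lists contain only the words visible on the slide, which truncates them with "…".

```python
def expand_queries(title, word_groups):
    """Build one query from the bare title plus one per word group."""
    queries = [title]
    for words in word_groups:
        queries.append(title + " (" + " | ".join(words) + ")")
    return queries


# Only the words shown on the slide; the full lists are not given there.
SUSPECT_WORDS = ["suspect", "shooter", "said", "men", "arrested"]
VICTIM_WORDS = ["injured", "wounded", "victims", "shot"]

# "Example article title" is a hypothetical placeholder.
queries = expand_queries("Example article title", [SUSPECT_WORDS, VICTIM_WORDS])
```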

Rewards
• Change in accuracy
• Small penalty for each transition
$$R(s, a) = \sum_{\text{entity } j} \mathrm{Acc}(e^{j}_{cur}) - \mathrm{Acc}(e^{j}_{prev})$$
Example, where R(s, a) = 1:
  Previous Values: Shooter: Scott Westerhuis, NumKilled: 6, NumWounded: 1, Location: Platte
  Current Values: Shooter: Scott Westerhuis, NumKilled: 6, NumWounded: 0, Location: Platte
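The reward can be sketched in a few lines, assuming Acc is 1 when an entity's value equals the gold value and 0 otherwise. The gold values below are hypothetical, chosen so that the slide's example yields a reward of 1, and `step_penalty` is a placeholder because the slides do not give the size of the per-transition penalty.

```python
def reward(prev, cur, gold, step_penalty=0.0):
    """Change in number of correct entities, minus a per-transition penalty."""
    def acc(values):
        return sum(1 for entity in gold if values.get(entity) == gold[entity])
    return acc(cur) - acc(prev) - step_penalty


prev = {"Shooter": "Scott Westerhuis", "NumKilled": "6",
        "NumWounded": "1", "Location": "Platte"}
cur = {"Shooter": "Scott Westerhuis", "NumKilled": "6",
       "NumWounded": "0", "Location": "Platte"}
# Assumed gold record: NumWounded is 0, so only that entity changes correctness.
gold = dict(cur)
r = reward(prev, cur, gold)
```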

Deep Q-Network
State space is continuous: requires function approximation
$$Q(s, a) \approx Q(s, a; \theta)$$
Trained to maximize cumulative reward
Actions: (reconcile), (query)
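For reference, the standard deep Q-learning objective is written out here. The slide only states the approximation Q(s, a; θ), so the discount factor γ, the target-network parameters θ⁻, and the squared-error form are assumptions taken from the usual DQN formulation rather than from this deck.

```latex
L(\theta) = \mathbb{E}_{(s, a, r, s')}\left[ \left( r + \gamma \max_{a'} Q(s', a'; \theta^{-}) - Q(s, a; \theta) \right)^{2} \right]
```

Here an action a would pair a reconciliation decision with a query choice, matching the two action types on the slide.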

Acquiring External Evidence
1. Select a query to search for articles on the same event
2. Use base extractor to obtain values for entities of interest
3. Reconcile old and new extractions
Example extractions:
  Shooter: Scott Westerhuis, NumKilled: 4, Location: S.D
  Shooter: Scott Westerhuis, NumKilled: 6, Location: Platte
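The three steps can be put together as a loop. This is a sketch under stated assumptions: `search`, `extract`, and `policy` are hypothetical stand-ins for the search engine, the base extractor, and the trained agent, and a query of `None` is used here to represent the stop action.

```python
def reconcile(cur, new, decision):
    """Keep current values, take all new ones, or take one named entity."""
    merged = dict(cur)
    if decision == "all":
        merged.update(new)
    elif decision != "none":
        merged[decision] = new[decision]
    return merged


def acquire_evidence(source, first_query, search, extract, policy, max_steps=5):
    """Query, extract from the retrieved article, reconcile, and repeat."""
    current = extract(source)
    query, steps = first_query, 0
    while query is not None and steps < max_steps:
        new = extract(search(query))            # steps 1 and 2
        decision, query = policy(current, new)  # agent picks d and the next q
        current = reconcile(current, new, decision)  # step 3
        steps += 1
    return current


# Stubs standing in for real components (all hypothetical).
EXTRACTIONS = {
    "original": {"Shooter": "Scott Westerhuis", "NumKilled": "4", "Location": "S.D"},
    "extra": {"Shooter": "Scott Westerhuis", "NumKilled": "6", "Location": "Platte"},
}

def search(query):
    return "extra"

def extract(article):
    return EXTRACTIONS[article]

def policy(current, new):
    return ("NumKilled", None)  # take the new NumKilled, then stop

final = acquire_evidence("original", "<title>", search, extract, policy)
```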

Related Work
• Open Information Extraction (Etzioni et al., 2011; Fader et al., 2011; Wu and Weld, 2010)
• Slot filling (Surdeanu et al., 2010; Ji and Grishman, 2011)
• Searching for additional sources on the web (Banko et al., 2002; West et al., 2014; Kanani and McCallum, 2012)

Datasets
1. Mass shootings in the United States

| | Train | Test | Dev |
|---|---|---|---|
| Source | 306 | 292 | 66 |
| Downloaded | 8k | 7.9k | 1.6k |

Entities: Shooter Name, Num Killed, Num Wounded, City

Datasets
2. Adulteration events from Foodshield EMA

| | Train | Test | Dev |
|---|---|---|---|
| Source | 292 | 148 | 42 |
| Downloaded | 7.6k | 5.3k | 1.5k |

Entities: Food, Adulterant, Location

Base Extraction Model
Indirect supervision: Project database values onto articles
Maximum entropy model with contextual features (Chieu and Ng, 2002; Bunescu et al., 2005)
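One simple way to picture "projecting database values onto articles" is to tag every token that matches a known database value for the event. The sketch below assumes case-insensitive token matching and a function name `project_labels`. Both are illustrative simplifications, not the authors' procedure.

```python
def project_labels(tokens, record):
    """Tag each token with the entity whose database value contains it, else "O"."""
    labels = []
    for token in tokens:
        tag = "O"
        for entity, value in record.items():
            if token.lower() in value.lower().split():
                tag = entity
                break
        labels.append(tag)
    return labels


tokens = "The last shooting left five people wounded .".split()
record = {"NumWounded": "five"}  # database entry for this event
labels = project_labels(tokens, record)
```

The resulting token labels could then serve as training data for a classifier such as the maximum entropy model named on the slide.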

Baselines (1)
Simple Aggregation systems:
• Confidence-based: Choose entity value with highest confidence (Skounakis and Craven, 2003)
Original: Shooter: Scott Westerhuis, NumKilled: 4, Location: S.D
Extra: Shooter: Scott Westerhuis, NumKilled: 6, Location: S.D
Extra: Shooter: Scott Westerhuis, NumKilled: 6, Location: Platte
Final: Shooter: Scott Westerhuis, NumKilled: 6, Location: Platte
Confidences shown for the three articles: (0.3, 0.2, 0.1), (0.4, 0.6, 0.3), (0.7, 0.2, 0.1)
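A minimal sketch of confidence-based aggregation: for each entity, keep the value with the highest confidence across all articles. The pairing of confidence triples with articles below is an assumption made for illustration, since the extracted slide text does not show which numbers belong to which article.

```python
def aggregate_by_confidence(extractions):
    """extractions: list of {entity: (value, confidence)} dicts, one per article."""
    best = {}
    for extraction in extractions:
        for entity, (value, conf) in extraction.items():
            if entity not in best or conf > best[entity][1]:
                best[entity] = (value, conf)
    return {entity: value for entity, (value, _) in best.items()}


# Assumed assignment of the slide's confidences to articles.
articles = [
    {"Shooter": ("Scott Westerhuis", 0.3), "NumKilled": ("4", 0.2), "Location": ("S.D", 0.1)},
    {"Shooter": ("Scott Westerhuis", 0.4), "NumKilled": ("6", 0.6), "Location": ("Platte", 0.3)},
    {"Shooter": ("Scott Westerhuis", 0.7), "NumKilled": ("6", 0.2), "Location": ("S.D", 0.1)},
]
final = aggregate_by_confidence(articles)
```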

Baselines (1)
Simple Aggregation systems:
• Majority-based: Choose entity value extracted the most from all articles on the event (Skounakis and Craven, 2003)
Original: Shooter: Scott Westerhuis, NumKilled: 4, Location: S.D
Extra: Shooter: Scott Westerhuis, NumKilled: 6, Location: S.D
Extra: Shooter: Scott Westerhuis, NumKilled: 6, Location: Platte
Final: Shooter: Scott Westerhuis, NumKilled: 6, Location: S.D
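Majority-based aggregation can be sketched with a counter per entity. The function name and the dictionary representation are assumptions. Ties are broken by whichever value was seen first, which is a choice of this sketch.

```python
from collections import Counter


def aggregate_by_majority(extractions):
    """Pick, for each entity, the value extracted most often across articles."""
    entities = []
    for extraction in extractions:
        for entity in extraction:
            if entity not in entities:
                entities.append(entity)
    final = {}
    for entity in entities:
        counts = Counter(ex[entity] for ex in extractions if entity in ex)
        final[entity] = counts.most_common(1)[0][0]
    return final


articles = [
    {"Shooter": "Scott Westerhuis", "NumKilled": "4", "Location": "S.D"},
    {"Shooter": "Scott Westerhuis", "NumKilled": "6", "Location": "S.D"},
    {"Shooter": "Scott Westerhuis", "NumKilled": "6", "Location": "Platte"},
]
final = aggregate_by_majority(articles)
```

On the slide's three articles this gives NumKilled: 6 and Location: S.D, each of which appears twice.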

Baselines (2)
Meta-classifier:
• Same input space S and set of reconciliation decisions as RL agent.
Original + Extra → Reconciled
  (Scott Westerhuis, 4, S.D) + (Scott Westerhuis, 6, Platte) → (Scott Westerhuis, 6, Platte)
  (Scott Westerhuis, 4, S.D) + (Westerhuis, 0, Platte) → (Westerhuis, 4, Platte)
  (Scott Westerhuis, 4, S.D) + (Scott, 2, S.D) → (Scott Westerhuis, 2, S.D)
Reconciled → Confidence agg. → Final
  Final: Shooter: Scott Westerhuis, NumKilled: 6, Location: Platte

Accuracy (Shootings)
[Bar chart: accuracy on NumKilled]
  Maxent: 69.7
  Confidence Agg.: 70.3
  Meta-Classifier: 70.7
  RL-Extract: 77.6

Accuracy (Adulterations)
[Bar chart: accuracy on Food]
  Maxent: 56.0
  Majority Agg.: 56.7
  Meta-Classifier: 55.4
  RL-Extract: 59.6

Oracle
• Given:
  • Same base extractor
  • Same set of queries
• Agent performing perfect reconciliation and querying decisions.
• Upper-bound on performance of any system given these extra articles on each event.
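One way to read this upper bound: with perfect decisions, an entity counts as correct for an event whenever the gold value appears in at least one extraction from the available articles. The sketch below computes accuracy under that reading. The data layout and the toy events are hypothetical and are not drawn from the datasets.

```python
def oracle_accuracy(events, entity):
    """Fraction of events where some article yields the gold value for `entity`."""
    hits = 0
    for event in events:
        gold_value = event["gold"][entity]
        if any(ex.get(entity) == gold_value for ex in event["extractions"]):
            hits += 1
    return hits / len(events)


# Toy events for illustration only.
events = [
    {"gold": {"NumKilled": "6"},
     "extractions": [{"NumKilled": "4"}, {"NumKilled": "6"}]},
    {"gold": {"NumKilled": "1"},
     "extractions": [{"NumKilled": "0"}, {"NumKilled": "2"}]},
]
accuracy = oracle_accuracy(events, "NumKilled")
```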

Accuracy (Shootings)
[Bar chart: accuracy on NumKilled]
  Maxent: 69.7
  RL-Extract: 77.6
  Oracle: 86.4

RL-Extract
Both reconciliation and querying
State 1 (Old, New) → reconcile → select q → search → extract → State 2

RL-Basic
Documents are presented in round robin order from different query lists
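Round robin ordering over several query result lists can be sketched as follows: take the first document from each list, then the second from each, and so on, skipping lists that have run out. The function name and the document identifiers are illustrative.

```python
from itertools import zip_longest


def round_robin(result_lists):
    """Interleave documents from each query's result list, one at a time."""
    ordered = []
    for group in zip_longest(*result_lists):
        ordered.extend(doc for doc in group if doc is not None)
    return ordered


# Hypothetical document ids for three queries.
docs = round_robin([["a1", "a2", "a3"], ["b1"], ["c1", "c2"]])
```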

RL-Query
Reconciliation is confidence-based

RL Models
[Bar chart: accuracy on NumKilled]
  RL-Basic: 71.2
  RL-Query: 66.6
  RL-Extract: 77.6
Both reconciliation and querying are important and inter-linked

Evolution of Test Accuracy
[Plot: Avg. Reward and test accuracy for ShooterName, NumKilled, NumWounded, City]
Agent learns to balance all entity choices simultaneously

Examples (ShooterName)
Basic Extractor: "A source tells Channel 2 Action News that Thomas Lee has been arrested in Mississippi ... Sgt. Stewart Smith, with the Troup County Sheriff's office, said." → Stewart
RL-Extract: "Lee is accused of killing his wife, Christie; …" → Lee

Examples (NumKilled)
Basic Extractor: "Shooting leaves 25 year old Pittsfield man dead, 4 injured" → 0
RL-Extract: "One man is dead after a shooting Saturday night at the intersection of Dewey Avenue and Linden Street." → 1
Our system finds alternative sources of information for reliable extraction

Adulteration Detection

Conclusion
‣ Alternative paradigm to improve Information Extraction, especially for low-resource domains.
‣ Use of Reinforcement Learning to find and incorporate external information.
Code and data available at: http://people.csail.mit.edu/karthikn/rl-ie/