Entity-oriented filtering of large streams
description
Transcript of Entity-oriented filtering of large streams
![Page 1: Entity-oriented filtering of large streams](https://reader035.fdocuments.us/reader035/viewer/2022062222/56814b85550346895db8697d/html5/thumbnails/1.jpg)
Entity-oriented filtering of large streams
John R. [email protected]
Dan A. [email protected]
http://trec-kba.org
![Page 2: Entity-oriented filtering of large streams](https://reader035.fdocuments.us/reader035/viewer/2022062222/56814b85550346895db8697d/html5/thumbnails/2.jpg)
Date: Tue, 13 Mar 2012 02:45:40 +0000
From: Google Alerts <[email protected]>
Subject: Google Alert - "John R. Frank"
=== Web - 2 new results for ["John R. Frank"] ===
John R. Frank
SPOKANE, Wash. - John R. Frank, 55, died March 4, 2012, in Coeur d' Alene,
Idaho. Survivors include: his wife, Miki; daughter, Patricia Frank; ...
<http://www.hutchnews.com/obituaries/Frank--John-CP>
In Memory of John R Frank
Biography. John R. Frank, age 55, passed away at Sacred Heart Medical
Center in Spokane, WA, on March 4, 2012. John was born in Hutchison, KS, ...
<http://www.englishfuneralchapel.com/sitemaker/sites/Englis1/obit.cgi?user=583335Frank>
![Page 3: Entity-oriented filtering of large streams](https://reader035.fdocuments.us/reader035/viewer/2022062222/56814b85550346895db8697d/html5/thumbnails/3.jpg)
2012 Task:Filtering to Recommend Citations
1) Initialize with a target WP entity• state of WP from Jan 2012
2) Iterate over stream of text items• Oct-Dec 2011: train on labels
3) For each, output confidence between 0, 1• Jan-Apr 2012: labels hidden
Content Stream•462M texts, 40% English•4,973 hourly chunks of a 105 docs/hour•News, blogs, forums, and link shortening
Content Stream•462M texts, 40% English•4,973 hourly chunks of a 105 docs/hour•News, blogs, forums, and link shortening
Your KBA System
Entities in Wikipedia or another Knowledge Base
Automatically recommend
new editsDiffeo
Sponsors:
![Page 4: Entity-oriented filtering of large streams](https://reader035.fdocuments.us/reader035/viewer/2022062222/56814b85550346895db8697d/html5/thumbnails/4.jpg)
s3://aws-publicdatasets/trec/kba/kba-stream-corpus-2012/
![Page 5: Entity-oriented filtering of large streams](https://reader035.fdocuments.us/reader035/viewer/2022062222/56814b85550346895db8697d/html5/thumbnails/5.jpg)
![Page 6: Entity-oriented filtering of large streams](https://reader035.fdocuments.us/reader035/viewer/2022062222/56814b85550346895db8697d/html5/thumbnails/6.jpg)
Accelerate?
rate of assimilation << stream size
# editors << # entities << # mentions
(definition of a “large” KB)
![Page 7: Entity-oriented filtering of large streams](https://reader035.fdocuments.us/reader035/viewer/2022062222/56814b85550346895db8697d/html5/thumbnails/7.jpg)
How many days must a news article wait before being cited in Wikipedia?
![Page 8: Entity-oriented filtering of large streams](https://reader035.fdocuments.us/reader035/viewer/2022062222/56814b85550346895db8697d/html5/thumbnails/8.jpg)
Complex entity with many relationships and attributes.
![Page 9: Entity-oriented filtering of large streams](https://reader035.fdocuments.us/reader035/viewer/2022062222/56814b85550346895db8697d/html5/thumbnails/9.jpg)
Has many interests, including trying to takeover UK soccer teams.
His empire includes many entities…
Note: Usmanov not mentioned in this text!
Citation #18
Elaborate link trails…
![Page 10: Entity-oriented filtering of large streams](https://reader035.fdocuments.us/reader035/viewer/2022062222/56814b85550346895db8697d/html5/thumbnails/10.jpg)
Example KBA Rating Task
Published: March 31, 2012Impact of Thoughts on WaterBy Denis Gorce-Bourge
Water covers 70% of our Blue planet and our body is made of about 70% water.
Masaru Emoto is a Japanese Photographer and scientist. He is known over the world for his remarkable work on water and its deep connection with individual and collective consciousness.
For decades, Masaru took pictures of frozen crystals of water and tested the direct influence of the environment on the quality of those crystals.
Pollution has a direct impact on the beauty of a frozen crystal but as well words, music and thoughts. He tested the quality of water crystals by exposing it to various conditions: to written words like hate and violence and Love and gratitude. The results were just astonishing. The crystal exposed to Love and gratitude was beautiful and perfectly formed where the other one was severely degraded. He demonstrated as well the impact of Heavy Metal music versus Mozart or Beethoven and how the vibration of music impacts water.
The very shape of water crystals is modified by violence, aggression, and negative words.
![Page 11: Entity-oriented filtering of large streams](https://reader035.fdocuments.us/reader035/viewer/2022062222/56814b85550346895db8697d/html5/thumbnails/11.jpg)
Example KBA Rating Task
Published: March 31, 2012Impact of Thoughts on WaterBy Denis Gorce-Bourge
Water covers 70% of our Blue planet and our body is made of about 70% water.
Masaru Emoto is a Japanese Photographer and scientist. He is known over the world for his remarkable work on water and its deep connection with individual and collective consciousness.
For decades, Masaru took pictures of frozen crystals of water and tested the direct influence of the environment on the quality of those crystals.
Pollution has a direct impact on the beauty of a frozen crystal but as well words, music and thoughts. He tested the quality of water crystals by exposing it to various conditions: to written words like hate and violence and Love and gratitude. The results were just astonishing. The crystal exposed to Love and gratitude was beautiful and perfectly formed where the other one was severely degraded. He demonstrated as well the impact of Heavy Metal music versus Mozart or Beethoven and how the vibration of music impacts water.
The very shape of water crystals is modified by violence, aggression, and negative words.
![Page 12: Entity-oriented filtering of large streams](https://reader035.fdocuments.us/reader035/viewer/2022062222/56814b85550346895db8697d/html5/thumbnails/12.jpg)
![Page 13: Entity-oriented filtering of large streams](https://reader035.fdocuments.us/reader035/viewer/2022062222/56814b85550346895db8697d/html5/thumbnails/13.jpg)
97.6% +/- 1.4% (N=5365) coref
69.5% +/- 2.7% (N=1352) central
70.9% +/- 2.0% (N=2403) relevant
58.4% +/- 3.4% (N=884) neutral
84.9% +/- 2.0% (N=2599) garbage
82.6% +/- 1.8% (N=3200) central relevant
89.0% +/- 1.7% (N=3551) central relevant neutral
Interannotator Agreement
![Page 14: Entity-oriented filtering of large streams](https://reader035.fdocuments.us/reader035/viewer/2022062222/56814b85550346895db8697d/html5/thumbnails/14.jpg)
IR:•User task centric•Variation in interpretation•Scores cascading lists•Constructionist, emergence
NLP:•Data parsing centric•Universal annotation•Scores probabilities•Reductionist
TRECing the continental divide between NLP and IR
![Page 15: Entity-oriented filtering of large streams](https://reader035.fdocuments.us/reader035/viewer/2022062222/56814b85550346895db8697d/html5/thumbnails/15.jpg)
![Page 16: Entity-oriented filtering of large streams](https://reader035.fdocuments.us/reader035/viewer/2022062222/56814b85550346895db8697d/html5/thumbnails/16.jpg)
string matching
task generator 91% recall15% precision26% F1
![Page 17: Entity-oriented filtering of large streams](https://reader035.fdocuments.us/reader035/viewer/2022062222/56814b85550346895db8697d/html5/thumbnails/17.jpg)
![Page 18: Entity-oriented filtering of large streams](https://reader035.fdocuments.us/reader035/viewer/2022062222/56814b85550346895db8697d/html5/thumbnails/18.jpg)
KBA 2013More entity types with an emphasis on temporality in the stream.
Target Entities KB Centrally Relevant
Training Data Annotation
People and Organizations
Wikipedia ormaybe Freebase
Citation worthy Judgments from early stream
High recall on all mentioning docs.
Pharmaceutical Compounds
Merck KB? Reporting of Adverse Drug Reaction (ADR)(an event)
(same) Focus recall on first person reporting & negative reactions?
Event-type Entities
Defined by a cluster of entities and possibly a Type-of-Event from a taxonomy
WP/FB for cluster of entities, possibly also event itself.
Provides causality info
Judgments on docs for that Type-of-Event but different specific event.
Find training data from TDT?Use citations in Category:Current_events?
Judge post-hoc?
![Page 19: Entity-oriented filtering of large streams](https://reader035.fdocuments.us/reader035/viewer/2022062222/56814b85550346895db8697d/html5/thumbnails/19.jpg)
KBXCold Start queries
focused on:nil entities
related to target clusterand/or
causality of event
Pool top-K filtered docs, or use each KBA run as separate KBP input.
(1000x filter)
Must coordinate choice of KBA target entities with
desired content of KBs for Cold Start queries.
KBA Stream Corpus 2012 (or the new Stream Corpus 2013)•462M texts, 40% English•4,973 hourly chunks of a 105 docs/hour•News, blogs, forums, and link shortening
KBA Stream Corpus 2012 (or the new Stream Corpus 2013)•462M texts, 40% English•4,973 hourly chunks of a 105 docs/hour•News, blogs, forums, and link shortening
Output KB
Output KB
Clusters of related entities
and/orevent-type
entities
Clusters of related entities
and/orevent-type
entities
KBP
KBA
![Page 20: Entity-oriented filtering of large streams](https://reader035.fdocuments.us/reader035/viewer/2022062222/56814b85550346895db8697d/html5/thumbnails/20.jpg)
Sponsors: Thank You.
Diffeo