Document Level Models (1) 1em Coreference Resolution
Transcript of Document Level Models (1) 1em Coreference Resolution
![Page 1: Document Level Models (1) 1em Coreference Resolution](https://reader033.fdocuments.us/reader033/viewer/2022060918/6299ac29cb16852af45cf049/html5/thumbnails/1.jpg)
Document Level Models (1)
Coreference Resolution
Yoav Goldberg
Bar Ilan University
1 / 27
![Page 2: Document Level Models (1) 1em Coreference Resolution](https://reader033.fdocuments.us/reader033/viewer/2022060918/6299ac29cb16852af45cf049/html5/thumbnails/2.jpg)
Beyond sentences
Until nowI Mostly low-level components (building blocks)I Working at sentence level – analyzing each sentence
individually
Today (++)
I Looking at the document and the corpus level.I Still focusing on building-blocks.
2 / 27
![Page 3: Document Level Models (1) 1em Coreference Resolution](https://reader033.fdocuments.us/reader033/viewer/2022060918/6299ac29cb16852af45cf049/html5/thumbnails/3.jpg)
Coreference Resolution
(Which entities are the same?)
3 / 27
![Page 4: Document Level Models (1) 1em Coreference Resolution](https://reader033.fdocuments.us/reader033/viewer/2022060918/6299ac29cb16852af45cf049/html5/thumbnails/4.jpg)
Coreference Resolution
(Which entities are the same?)
3 / 27
![Page 5: Document Level Models (1) 1em Coreference Resolution](https://reader033.fdocuments.us/reader033/viewer/2022060918/6299ac29cb16852af45cf049/html5/thumbnails/5.jpg)
Coreference Resolution
(Which entities are the same?)
3 / 27
![Page 6: Document Level Models (1) 1em Coreference Resolution](https://reader033.fdocuments.us/reader033/viewer/2022060918/6299ac29cb16852af45cf049/html5/thumbnails/6.jpg)
Coreference Resolution Revisited
Mihai Surdeanu [email protected]
www.surdeanu.info/mihai
April 19th, 2013
![Page 7: Document Level Models (1) 1em Coreference Resolution](https://reader033.fdocuments.us/reader033/viewer/2022060918/6299ac29cb16852af45cf049/html5/thumbnails/7.jpg)
What is it?
Coreference resolution = the task of clustering together of expressions that refer to the same entity/concept.
Michelle LaVaughn Robinson Obama
is an American lawyer and writer. She
is the wife of the 44th and current
President of the United States, Barack
Obama, and the first African-American
First Lady of the United States.
Michelle LaVaughn Robinson Obama
is an American lawyer and writer. She
is the wife of the 44th and current
President of the United States, Barack
Obama, and the first African-American
First Lady of the United States.
/event
![Page 8: Document Level Models (1) 1em Coreference Resolution](https://reader033.fdocuments.us/reader033/viewer/2022060918/6299ac29cb16852af45cf049/html5/thumbnails/8.jpg)
Why is it important?
• Question answering • “Who is Barack Obama’s spouse?”
• Information extraction • “Find all per:spouse relations between all named
entities in this large corpus.” • News aggregation
• “What are recent events involving Michelle Obama?”
• Requires cross-document coreference resolution. More on this soon.
![Page 9: Document Level Models (1) 1em Coreference Resolution](https://reader033.fdocuments.us/reader033/viewer/2022060918/6299ac29cb16852af45cf049/html5/thumbnails/9.jpg)
Why is it important?
• Performance doubles for these applications when coreference resolution is used.
• See: R. Gabbard, M. Freedman, and R. Weischedel, "Coreference for Learning to Extract Relations: Yes, Virginia, Coreference Matters," ACL 2011.
![Page 10: Document Level Models (1) 1em Coreference Resolution](https://reader033.fdocuments.us/reader033/viewer/2022060918/6299ac29cb16852af45cf049/html5/thumbnails/10.jpg)
Real-world example
![Page 11: Document Level Models (1) 1em Coreference Resolution](https://reader033.fdocuments.us/reader033/viewer/2022060918/6299ac29cb16852af45cf049/html5/thumbnails/11.jpg)
Chengdu, China (CNN) – The researcher dressed in blueplastic smock, slippers and gloves is having a tough timegetting his work done.Every time Zhang Zhen sets up his camera on a tripod in aneffort to document the behavior of one of [[the panda cubs]scattered on [a grassy hillside]], one particularly frisky babypanda comes wobbling towards him, interrupting his shoot.“Mumu!” he yells in frustration, as the four-month old cub rearsup on [her] hind legs, lunging towards him. He picks Mumu upand deposits her at [the opposite end] of [the enclosure].“I’m not sure why she’s been all over me like this. I think she’sexcited today,” Zhang says.Mumu is the oldest of fourteen baby pandas that were born lastsummer here at the Research Base of [[Giant Panda] Breeding]in [Chengdu, China].CNN.com, Dec 2013
5 / 27
![Page 12: Document Level Models (1) 1em Coreference Resolution](https://reader033.fdocuments.us/reader033/viewer/2022060918/6299ac29cb16852af45cf049/html5/thumbnails/12.jpg)
Stages towards understanding
1. (Pre-processing – sentence boundary, tagging, parsing,. . . )
2. Entity Extraction3. Coreference Resolution4. Entity Linking (link to an ontology / database / wikipedia)
6 / 27
- Entity linking is a different related task, partly overlapping coref
![Page 13: Document Level Models (1) 1em Coreference Resolution](https://reader033.fdocuments.us/reader033/viewer/2022060918/6299ac29cb16852af45cf049/html5/thumbnails/13.jpg)
Stages towards understanding
1. (Pre-processing – sentence boundary, tagging, parsing,. . . )
2. Entity Extraction3. Coreference Resolution4. Entity Linking (link to an ontology / database / wikipedia)
6 / 27
![Page 14: Document Level Models (1) 1em Coreference Resolution](https://reader033.fdocuments.us/reader033/viewer/2022060918/6299ac29cb16852af45cf049/html5/thumbnails/14.jpg)
Chengdu, China (CNN) – The researcher dressed in blueplastic smock, slippers and gloves is having a tough timegetting his work done.Every time Zhang Zhen sets up his camera on a tripod in aneffort to document the behavior of one of [[the panda cubs]scattered on [a grassy hillside]], one particularly frisky babypanda comes wobbling towards him, interrupting his shoot.“Mumu!” he yells in frustration, as the four-month old cub rearsup on [her] hind legs, lunging towards him. He picks Mumu upand deposits her at [the opposite end] of [the enclosure].“I’m not sure why she’s been all over me like this. I think she’sexcited today,” Zhang says.Mumu is the oldest of fourteen baby pandas that were born lastsummer here at the Research Base of [[Giant Panda] Breeding]in [Chengdu, China].CNN.com, Dec 2013
7 / 27
![Page 15: Document Level Models (1) 1em Coreference Resolution](https://reader033.fdocuments.us/reader033/viewer/2022060918/6299ac29cb16852af45cf049/html5/thumbnails/15.jpg)
Chengdu, China (CNN) – The researcher dressed in blueplastic smock, slippers and gloves is having a tough timegetting his work done.Every time Zhang Zhen sets up his camera on a tripod in aneffort to document the behavior of one of [[the panda cubs]scattered on [a grassy hillside]], one particularly frisky babypanda comes wobbling towards him, interrupting his shoot.“Mumu!” he yells in frustration, as the four-month old cub rearsup on [her] hind legs, lunging towards him. He picks Mumu upand deposits her at [the opposite end] of [the enclosure].“I’m not sure why she’s been all over me like this. I think she’sexcited today,” Zhang says.Mumu is the oldest of fourteen baby pandas that were born lastsummer here at the Research Base of [[Giant Panda] Breeding]in [Chengdu, China].CNN.com, Dec 2013
7 / 27
![Page 16: Document Level Models (1) 1em Coreference Resolution](https://reader033.fdocuments.us/reader033/viewer/2022060918/6299ac29cb16852af45cf049/html5/thumbnails/16.jpg)
I Chengdu, ChinaI CNNI The researcherI blue plastic smock, slippers and glovesI though timeI [his] workI Zhang ZhenI [his] cameraI a tripodI an effortI the behaviorI one of the panda cubs scattered on a grassy
hillsideI the panda cubs scattered on . . .I a grassy hillsideI One particularly frisky baby pandaI himI [his] shootI MumuI he
I the four-month old cubI [her] hind legsI himI HeI MumuI herI the opposite end of [the enclosure]I II sheI meI II sheI todayI ZhangI MumuI fourteen baby pandasI last summerI hereI the Research Base of [[Giant Panda] Breeding] in
[Changdu, China]
8 / 27
![Page 17: Document Level Models (1) 1em Coreference Resolution](https://reader033.fdocuments.us/reader033/viewer/2022060918/6299ac29cb16852af45cf049/html5/thumbnails/17.jpg)
Stages towards understanding
1. (Pre-processing – sentence boundary, tagging, parsing,. . . )
2. Entity Extraction3. Coreference Resolution4. Entity Linking (link to an ontology / database / wikipedia)
9 / 27
![Page 18: Document Level Models (1) 1em Coreference Resolution](https://reader033.fdocuments.us/reader033/viewer/2022060918/6299ac29cb16852af45cf049/html5/thumbnails/18.jpg)
Stages towards understanding
1. (Pre-processing – sentence boundary, tagging, parsing,. . . )
2. Entity Extraction3. Coreference Resolution4. Entity Linking (link to an ontology / database / wikipedia)
9 / 27
![Page 19: Document Level Models (1) 1em Coreference Resolution](https://reader033.fdocuments.us/reader033/viewer/2022060918/6299ac29cb16852af45cf049/html5/thumbnails/19.jpg)
I Chengdu, ChinaI CNNI The researcherI blue plastic smock, slippers and glovesI though timeI [his] workI Zhang ZhenI [his] cameraI a tripodI an effortI the behaviorI one of the panda cubs scattered on a grassy
hillsideI the panda cubs scattered on . . .I a grassy hillsideI One particularly frisky baby pandaI himI [his] shootI MumuI he
I the four-month old cubI [her] hind legsI himI HeI MumuI herI the opposite end of [the enclosure]I II sheI meI II sheI todayI ZhangI MumuI fourteen baby pandasI last summerI hereI the Research Base of [[Giant Panda] Breeding] in
[Changdu, China]
10 / 27
![Page 20: Document Level Models (1) 1em Coreference Resolution](https://reader033.fdocuments.us/reader033/viewer/2022060918/6299ac29cb16852af45cf049/html5/thumbnails/20.jpg)
I Chengdu, ChinaI CNNI The researcherI blue plastic smock, slippers and glovesI though timeI [his] workI Zhang ZhenI [his] cameraI a tripodI an effortI the behaviorI one of the panda cubs scattered on a grassy
hillsideI the panda cubs scattered on . . .I a grassy hillsideI One particularly frisky baby pandaI himI [his] shootI MumuI he
I the four-month old cubI [her] hind legsI himI HeI MumuI herI the opposite end of [the enclosure]I II sheI meI II sheI todayI ZhangI MumuI fourteen baby pandasI last summerI hereI the Research Base of [[Giant Panda] Breeding] in
[Changdu, China]
10 / 27
![Page 21: Document Level Models (1) 1em Coreference Resolution](https://reader033.fdocuments.us/reader033/viewer/2022060918/6299ac29cb16852af45cf049/html5/thumbnails/21.jpg)
I Chengdu, ChinaI CNNI The researcherI blue plastic smock, slippers and glovesI though timeI [his] workI Zhang ZhenI [his] cameraI a tripodI an effortI the behaviorI one of the panda cubs scattered on a grassy
hillsideI the panda cubs scattered on . . .I a grassy hillsideI One particularly frisky baby pandaI himI [his] shootI MumuI he
I the four-month old cubI [her] hind legsI himI HeI MumuI herI the opposite end of [the enclosure]I II sheI meI II sheI todayI ZhangI MumuI fourteen baby pandasI last summerI hereI the Research Base of [[Giant Panda] Breeding] in
[Changdu, China]
10 / 27
![Page 22: Document Level Models (1) 1em Coreference Resolution](https://reader033.fdocuments.us/reader033/viewer/2022060918/6299ac29cb16852af45cf049/html5/thumbnails/22.jpg)
I Chengdu, ChinaI CNNI The researcherI blue plastic smock, slippers and glovesI though timeI [his] workI Zhang ZhenI [his] cameraI a tripodI an effortI the behaviorI one of the panda cubs scattered on a grassy
hillsideI the panda cubs scattered on . . .I a grassy hillsideI One particularly frisky baby pandaI himI [his] shootI MumuI he
I the four-month old cubI [her] hind legsI himI HeI MumuI herI the opposite end of [the enclosure]I II sheI meI II sheI todayI ZhangI MumuI fourteen baby pandasI last summerI hereI the Research Base of [[Giant Panda] Breeding] in
[Changdu, China]
10 / 27
![Page 23: Document Level Models (1) 1em Coreference Resolution](https://reader033.fdocuments.us/reader033/viewer/2022060918/6299ac29cb16852af45cf049/html5/thumbnails/23.jpg)
I Chengdu, ChinaI CNNI The researcherI blue plastic smock, slippers and glovesI though timeI [his] workI Zhang ZhenI [his] cameraI a tripodI an effortI the behaviorI one of the panda cubs scattered on a grassy
hillsideI the panda cubs scattered on . . .I a grassy hillsideI One particularly frisky baby pandaI himI [his] shootI MumuI he
I the four-month old cubI [her] hind legsI himI HeI MumuI herI the opposite end of [the enclosure]I II sheI meI II sheI todayI ZhangI MumuI fourteen baby pandasI last summerI hereI the Research Base of [[Giant Panda] Breeding] in
[Changdu, China]
10 / 27
![Page 24: Document Level Models (1) 1em Coreference Resolution](https://reader033.fdocuments.us/reader033/viewer/2022060918/6299ac29cb16852af45cf049/html5/thumbnails/24.jpg)
I Chengdu, ChinaI CNNI The researcherI blue plastic smock, slippers and glovesI though timeI [his] workI Zhang ZhenI [his] cameraI a tripodI an effortI the behaviorI one of the panda cubs scattered on a grassy
hillsideI the panda cubs scattered on . . .I a grassy hillsideI One particularly frisky baby pandaI himI [his] shootI MumuI he
I the four-month old cubI [her] hind legsI himI HeI MumuI herI the opposite end of [the enclosure]I II sheI meI II sheI todayI ZhangI MumuI fourteen baby pandasI last summerI hereI the Research Base of [[Giant Panda] Breeding] in
[Changdu, China]
10 / 27
![Page 25: Document Level Models (1) 1em Coreference Resolution](https://reader033.fdocuments.us/reader033/viewer/2022060918/6299ac29cb16852af45cf049/html5/thumbnails/25.jpg)
Coreferene is a clustering task
I Decide on number of clustersI Assign entity mentions to clusters
11 / 27
![Page 26: Document Level Models (1) 1em Coreference Resolution](https://reader033.fdocuments.us/reader033/viewer/2022060918/6299ac29cb16852af45cf049/html5/thumbnails/26.jpg)
I Chengdu, ChinaI CNNI The researcherI blue plastic smock, slippers and glovesI though timeI [his] workI Zhang ZhenI [his] cameraI a tripodI an effortI the behaviorI one of the panda cubs scattered on a grassy
hillsideI the panda cubs scattered on . . .I a grassy hillsideI One particularly frisky baby pandaI himI [his] shootI MumuI he
I the four-month old cubI [her] hind legsI himI HeI MumuI herI the opposite end of [the enclosure]I II sheI meI II sheI todayI ZhangI MumuI fourteen baby pandasI last summerI hereI the Research Base of [[Giant Panda] Breeding] in
[Changdu, China]
12 / 27
![Page 27: Document Level Models (1) 1em Coreference Resolution](https://reader033.fdocuments.us/reader033/viewer/2022060918/6299ac29cb16852af45cf049/html5/thumbnails/27.jpg)
Evaluation
How to evaluate?
I Types of mistakes:I Splitting a clusterI Merging of clusterI Incorrect assignment
I Which are more important?I How do we design a metric to capture these?
13 / 27
![Page 28: Document Level Models (1) 1em Coreference Resolution](https://reader033.fdocuments.us/reader033/viewer/2022060918/6299ac29cb16852af45cf049/html5/thumbnails/28.jpg)
Evaluation
How to evaluate?I Types of mistakes:
I Splitting a clusterI Merging of clusterI Incorrect assignment
I Which are more important?I How do we design a metric to capture these?
13 / 27
![Page 29: Document Level Models (1) 1em Coreference Resolution](https://reader033.fdocuments.us/reader033/viewer/2022060918/6299ac29cb16852af45cf049/html5/thumbnails/29.jpg)
Evaluation
How to evaluate?I Types of mistakes:
I Splitting a clusterI Merging of clusterI Incorrect assignment
I Which are more important?I How do we design a metric to capture these?
13 / 27
![Page 30: Document Level Models (1) 1em Coreference Resolution](https://reader033.fdocuments.us/reader033/viewer/2022060918/6299ac29cb16852af45cf049/html5/thumbnails/30.jpg)
I Chengdu, ChinaI CNNI The researcherI blue plastic smock, slippers and glovesI though timeI [his] workI Zhang ZhenI [his] cameraI a tripodI an effortI the behaviorI one of the panda cubs scattered on a grassy
hillsideI the panda cubs scattered on . . .I a grassy hillsideI One particularly frisky baby pandaI himI [his] shootI MumuI he
I the four-month old cubI [her] hind legsI himI HeI MumuI herI the opposite end of [the enclosure]I II sheI meI II sheI todayI ZhangI MumuI fourteen baby pandasI last summerI hereI the Research Base of [[Giant Panda] Breeding] in
[Changdu, China]
14 / 27
![Page 31: Document Level Models (1) 1em Coreference Resolution](https://reader033.fdocuments.us/reader033/viewer/2022060918/6299ac29cb16852af45cf049/html5/thumbnails/31.jpg)
EvaluationThe B3 F-Score Metric
P =1|docs|
∑doc∈docs
∑m∈doc
|gm ∩ pm||pm|
R =1|docs|
∑doc∈docs
∑m∈doc
|gm ∩ pm||gm|
F1 = 2PR
P + Rgm gold cluster containing m
pm predicted cluster containing m
|x| size of cluster x
Eval is open for debate
I B3 is good, but not perfect.I Other variants exist.
15 / 27
![Page 32: Document Level Models (1) 1em Coreference Resolution](https://reader033.fdocuments.us/reader033/viewer/2022060918/6299ac29cb16852af45cf049/html5/thumbnails/32.jpg)
EvaluationThe B3 F-Score Metric
P =1|docs|
∑doc∈docs
∑m∈doc
|gm ∩ pm||pm|
R =1|docs|
∑doc∈docs
∑m∈doc
|gm ∩ pm||gm|
F1 = 2PR
P + Rgm gold cluster containing m
pm predicted cluster containing m
|x| size of cluster x
Eval is open for debate
I B3 is good, but not perfect.I Other variants exist.
15 / 27Common CoNLL F1 - averaging 3 measures
![Page 33: Document Level Models (1) 1em Coreference Resolution](https://reader033.fdocuments.us/reader033/viewer/2022060918/6299ac29cb16852af45cf049/html5/thumbnails/33.jpg)
How to solve?
16 / 27
![Page 34: Document Level Models (1) 1em Coreference Resolution](https://reader033.fdocuments.us/reader033/viewer/2022060918/6299ac29cb16852af45cf049/html5/thumbnails/34.jpg)
I Chengdu, ChinaI CNNI The researcherI blue plastic smock, slippers and glovesI though timeI [his] workI Zhang ZhenI [his] cameraI a tripodI an effortI the behaviorI one of the panda cubs scattered on a grassy
hillsideI the panda cubs scattered on . . .I a grassy hillsideI One particularly frisky baby pandaI himI [his] shootI MumuI he
I the four-month old cubI [her] hind legsI himI HeI MumuI herI the opposite end of [the enclosure]I II sheI meI II sheI todayI ZhangI MumuI fourteen baby pandasI last summerI hereI the Research Base of [[Giant Panda] Breeding] in
[Changdu, China]
17 / 27
![Page 35: Document Level Models (1) 1em Coreference Resolution](https://reader033.fdocuments.us/reader033/viewer/2022060918/6299ac29cb16852af45cf049/html5/thumbnails/35.jpg)
I Chengdu, ChinaI CNNI The researcherI blue plastic smock, slippers and glovesI though timeI [his] workI Zhang ZhenI [his] cameraI a tripodI an effortI the behaviorI one of the panda cubs scattered on a grassy
hillsideI the panda cubs scattered on . . .I a grassy hillsideI One particularly frisky baby pandaI himI [his] shootI MumuI he
I the four-month old cubI [her] hind legsI himI HeI MumuI herI the opposite end of [the enclosure]I II sheI meI II sheI todayI ZhangI MumuI fourteen baby pandasI last summerI hereI the Research Base of [[Giant Panda] Breeding] in
[Changdu, China]
17 / 27
![Page 36: Document Level Models (1) 1em Coreference Resolution](https://reader033.fdocuments.us/reader033/viewer/2022060918/6299ac29cb16852af45cf049/html5/thumbnails/36.jpg)
A typical algorithm
Iden%fy all men%ons
![Page 37: Document Level Models (1) 1em Coreference Resolution](https://reader033.fdocuments.us/reader033/viewer/2022060918/6299ac29cb16852af45cf049/html5/thumbnails/37.jpg)
A typical algorithm
Iden%fy all men%ons Compute link scores between all pairs
![Page 38: Document Level Models (1) 1em Coreference Resolution](https://reader033.fdocuments.us/reader033/viewer/2022060918/6299ac29cb16852af45cf049/html5/thumbnails/38.jpg)
A typical algorithm
Iden%fy all men%ons Compute link scores between all pairs Par%%on this graph into en%ty clusters
![Page 39: Document Level Models (1) 1em Coreference Resolution](https://reader033.fdocuments.us/reader033/viewer/2022060918/6299ac29cb16852af45cf049/html5/thumbnails/39.jpg)
Pairwise Approaches
I How to score each pair?I Classifier.I But how to train? (what are the examples? what are the
features?)I How to choose best partition given pair scores?
18 / 27
![Page 40: Document Level Models (1) 1em Coreference Resolution](https://reader033.fdocuments.us/reader033/viewer/2022060918/6299ac29cb16852af45cf049/html5/thumbnails/40.jpg)
Choosing a partition
I Choosing a globally optimal clustering under reasonableobjectives is NP-hard.
I Resort to heuristics or approximations.
A possible algorithm 1
I For each mention in order of appearanceI Compute scores to previous mentionsI Decide if starting a new cluster or linking to existing cluster
A possible algorithm 2
I Assume that each mention has at most one antecedentI Mentions with 0 antecedents start a new clusterI Now, search for trees instead of clusters
I (trees are easy..)
19 / 27
![Page 41: Document Level Models (1) 1em Coreference Resolution](https://reader033.fdocuments.us/reader033/viewer/2022060918/6299ac29cb16852af45cf049/html5/thumbnails/41.jpg)
Choosing a partition
I Choosing a globally optimal clustering under reasonableobjectives is NP-hard.
I Resort to heuristics or approximations.
A possible algorithm 1
I For each mention in order of appearanceI Compute scores to previous mentionsI Decide if starting a new cluster or linking to existing cluster
A possible algorithm 2
I Assume that each mention has at most one antecedentI Mentions with 0 antecedents start a new clusterI Now, search for trees instead of clusters
I (trees are easy..)
19 / 27
![Page 42: Document Level Models (1) 1em Coreference Resolution](https://reader033.fdocuments.us/reader033/viewer/2022060918/6299ac29cb16852af45cf049/html5/thumbnails/42.jpg)
Choosing a partition
I Choosing a globally optimal clustering under reasonableobjectives is NP-hard.
I Resort to heuristics or approximations.
A possible algorithm 1
I For each mention in order of appearanceI Compute scores to previous mentionsI Decide if starting a new cluster or linking to existing cluster
A possible algorithm 2
I Assume that each mention has at most one antecedentI Mentions with 0 antecedents start a new clusterI Now, search for trees instead of clusters
I (trees are easy..)
19 / 27
![Page 43: Document Level Models (1) 1em Coreference Resolution](https://reader033.fdocuments.us/reader033/viewer/2022060918/6299ac29cb16852af45cf049/html5/thumbnails/43.jpg)
A possible algorithm 1
I For each mention in order of appearanceI Compute scores to previous mentionsI Decide if starting a new cluster or linking to existing cluster
Discuss features.
Discuss training examples.
20 / 27
![Page 44: Document Level Models (1) 1em Coreference Resolution](https://reader033.fdocuments.us/reader033/viewer/2022060918/6299ac29cb16852af45cf049/html5/thumbnails/44.jpg)
Discuss potential problems with the pairwise approach.
21 / 27
![Page 45: Document Level Models (1) 1em Coreference Resolution](https://reader033.fdocuments.us/reader033/viewer/2022060918/6299ac29cb16852af45cf049/html5/thumbnails/45.jpg)
I Chengdu, ChinaI CNNI The researcherI blue plastic smock, slippers and glovesI though timeI [his] workI Zhang ZhenI [his] cameraI a tripodI an effortI the behaviorI one of the panda cubs scattered on a grassy
hillsideI the panda cubs scattered on . . .I a grassy hillsideI One particularly frisky baby pandaI himI [his] shootI MumuI he
I the four-month old cubI [her] hind legsI himI HeI MumuI herI the opposite end of [the enclosure]I II sheI meI II sheI todayI ZhangI MumuI fourteen baby pandasI last summerI hereI the Research Base of [[Giant Panda] Breeding] in
[Changdu, China]
22 / 27
![Page 46: Document Level Models (1) 1em Coreference Resolution](https://reader033.fdocuments.us/reader033/viewer/2022060918/6299ac29cb16852af45cf049/html5/thumbnails/46.jpg)
Some insights learned
• Most algorithms focus on step 2: computing mention-pair scores using machine learning, which is a local operation • Poor representation of context: only two mentions considered
• Recent work showed that it is important to address coreference resolution as a global task, where all mentions are modeled jointly • This is hard to model using machine learning
• ML models generalize poorly to new words, domains, and languages • Annotating coreference is expensive
![Page 47: Document Level Models (1) 1em Coreference Resolution](https://reader033.fdocuments.us/reader033/viewer/2022060918/6299ac29cb16852af45cf049/html5/thumbnails/47.jpg)
Idea I
✖ machine learning
✔ deterministic, rule-based model
✔ “baby steps” approach ✔ global model
![Page 48: Document Level Models (1) 1em Coreference Resolution](https://reader033.fdocuments.us/reader033/viewer/2022060918/6299ac29cb16852af45cf049/html5/thumbnails/48.jpg)
Entity coreference resolution model
• Novel architecture for coreference resolu%on: • “Baby steps” – accurate things first • Global – aDribute sharing in clusters • Determinis4c – rule-‐based model
• Top ranked system at CoNLL-‐2011 Shared Task:
• 58.3% (open), 57.8% (closed)
![Page 49: Document Level Models (1) 1em Coreference Resolution](https://reader033.fdocuments.us/reader033/viewer/2022060918/6299ac29cb16852af45cf049/html5/thumbnails/49.jpg)
• Multiple passes (or “sieves”) over text
• Precision of each pass is smaller than preceding ones
• Recall keeps increasing with each pass
• Decisions once made cannot be modified by later passes
• Modular architecture
Baby-steps approach Increasing R
ecall
Pass 1
Pass 2
Pass 3 Incr
easi
ng P
reci
sion
![Page 50: Document Level Models (1) 1em Coreference Resolution](https://reader033.fdocuments.us/reader033/viewer/2022060918/6299ac29cb16852af45cf049/html5/thumbnails/50.jpg)
Men%on Detec%on
sieve 2 3
4 5
6
7
8
9
10
11
Focus on high recall
Recall increases (precision decreases) as more sieves added
More global
decisions
Post processing
![Page 51: Document Level Models (1) 1em Coreference Resolution](https://reader033.fdocuments.us/reader033/viewer/2022060918/6299ac29cb16852af45cf049/html5/thumbnails/51.jpg)
Why multiple sieves?
The second attack occurred after some rocket firings aimed, apparently, toward the israelis, apparently in retaliation. we’re checking our facts on that one. ... the strike will undermine efforts by palestinian authorities to bring an end to terrorist attacks and does not contribute to the security of israel.
number: plural animacy: unknown
number: singular animacy: inanimate
number: plural animacy: animate
![Page 52: Document Level Models (1) 1em Coreference Resolution](https://reader033.fdocuments.us/reader033/viewer/2022060918/6299ac29cb16852af45cf049/html5/thumbnails/52.jpg)
Why multiple sieves?
The second attack occurred after some rocket firings aimed, apparently, toward the israelis, apparently in retaliation. we’re checking our facts on that one. ... the strike will undermine efforts by palestinian authorities to bring an end to terrorist attacks and does not contribute to the security of israel.
number: plural animacy: unknown
number: singular animacy: inanimate
number: plural animacy: animate
![Page 53: Document Level Models (1) 1em Coreference Resolution](https://reader033.fdocuments.us/reader033/viewer/2022060918/6299ac29cb16852af45cf049/html5/thumbnails/53.jpg)
Why multiple sieves?
The second attack occurred after some rocket firings aimed, apparently, toward the israelis, apparently in retaliation. we’re checking our facts on that one. ... the strike will undermine efforts by palestinian authorities to bring an end to terrorist attacks and does not contribute to the security of israel.
number: plural animacy: unknown
number: singular animacy: inanimate
number: plural animacy: animate
![Page 54: Document Level Models (1) 1em Coreference Resolution](https://reader033.fdocuments.us/reader033/viewer/2022060918/6299ac29cb16852af45cf049/html5/thumbnails/54.jpg)
Why multiple sieves?
The second attack occurred after some rocket firings aimed, apparently, toward the israelis, apparently in retaliation. we’re checking our facts on that one. ... the strike will undermine efforts by palestinian authorities to bring an end to terrorist attacks and does not contribute to the security of israel.
number: plural animacy: unknown
number: singular animacy: inanimate
number: plural animacy: animate
number: plural, singular animacy: inanimate
number: plural, singular animacy: inanimate
![Page 55: Document Level Models (1) 1em Coreference Resolution](https://reader033.fdocuments.us/reader033/viewer/2022060918/6299ac29cb16852af45cf049/html5/thumbnails/55.jpg)
Pass 1 – Mention detection
• Extract all noun phrases (NP) plus pronouns and named en%%es even in modifier posi%on
• Remove non-‐referring expressions, e.g., generic “it”, with manually wriDen paDerns • E.g., It is possible that...
![Page 56: Document Level Models (1) 1em Coreference Resolution](https://reader033.fdocuments.us/reader033/viewer/2022060918/6299ac29cb16852af45cf049/html5/thumbnails/56.jpg)
Pass 2 – Speaker identification
• Extract speakers and use the info for resolu%on • “….”, she said.
• Posi%ve and nega%ve constraints for following sieves: “I voted for Nader because he was most aligned with my values,” she said.
![Page 57: Document Level Models (1) 1em Coreference Resolution](https://reader033.fdocuments.us/reader033/viewer/2022060918/6299ac29cb16852af45cf049/html5/thumbnails/57.jpg)
Pass 3 – Exact string match
Exactly the same text: …TWA 's bid for USAir skeptically , seeing it as a ploy to pressure USAir into buying TWA. The Shahab 3 ground-ground missile: the new addition to Iran’s military capabilities … developed the Shahab 3 ground-ground missile for defense purposes with capabilities ranging from …
![Page 58: Document Level Models (1) 1em Coreference Resolution](https://reader033.fdocuments.us/reader033/viewer/2022060918/6299ac29cb16852af45cf049/html5/thumbnails/58.jpg)
Pass 4 – Relaxed string match
String match a^er dropping the text following the head word:
…Clinton... Clinton, whose term ends in January...
![Page 59: Document Level Models (1) 1em Coreference Resolution](https://reader033.fdocuments.us/reader033/viewer/2022060918/6299ac29cb16852af45cf049/html5/thumbnails/59.jpg)
Pass 5 – Precise constructs
Role apposi%ves:
Apposi%ves: … but Bob Gerson, video editor of This Week in Consumer Electronics, says Sony conceives …
Started three years ago, Chemical's interest-rate options group was a leading force in the field.
… [[actress] Rebecca Schaeffer] … … [[painter] Pablo Picasso] …
Predicate nomina%ves:
![Page 60: Document Level Models (1) 1em Coreference Resolution](https://reader033.fdocuments.us/reader033/viewer/2022060918/6299ac29cb16852af45cf049/html5/thumbnails/60.jpg)
Pass 5 – Precise constructs
Demonyms/Gen%lics:
Acronyms:
Rela%ve pronouns: … [the finance street [which] has already formed in the Waitan district] …
Agence France Presse … AFP
Israel… Israeli
![Page 61: Document Level Models (1) 1em Coreference Resolution](https://reader033.fdocuments.us/reader033/viewer/2022060918/6299ac29cb16852af45cf049/html5/thumbnails/61.jpg)
Passes 6 – 9: Strict head match
• Coupled with various constraints: • No new informa%on in men%ons to be resolved • No loca%on mismatch, “Lebanon” != “southern Lebanon” • No numeric mismatch, “people” != “around 200 people” • No i-‐within-‐i, e.g., [[Sony Corpora%on] of America]
The Japanese company already has 12% of the total camcorder market, ranking it third behind the RCA and Panasonic brands … The company also plans to aggressively start marketing … The electronics company…
✕
![Page 62: Document Level Models (1) 1em Coreference Resolution](https://reader033.fdocuments.us/reader033/viewer/2022060918/6299ac29cb16852af45cf049/html5/thumbnails/62.jpg)
Pass 10 – Relaxed head match
• Same constraints as above but anaphora head can match any word in the candidate cluster
“Sanders” is compa%ble with the cluster: {Sauls, the judge, Circuit Judge N. Sanders Sauls}
![Page 63: Document Level Models (1) 1em Coreference Resolution](https://reader033.fdocuments.us/reader033/viewer/2022060918/6299ac29cb16852af45cf049/html5/thumbnails/63.jpg)
Pass 11 – Pronoun resolution
• ADributes must agree • Number • Gender • Person • Animacy
• Assigned using POS tags, NER labels, sta%c list of assignments for pronouns
• Improved further using gender and animacy dic%onaries of Bergsma and Lin (2006), and Ji and Lin (2009)
![Page 64: Document Level Models (1) 1em Coreference Resolution](https://reader033.fdocuments.us/reader033/viewer/2022060918/6299ac29cb16852af45cf049/html5/thumbnails/64.jpg)
Post processing
• Discard singleton clusters • This is why we could maximize recall in mention detection!
• Discard shorter mentions in appositive patterns • Discard mentions that appear later in copulative
relations
• Implemented to comply with OntoNotes annotations
![Page 65: Document Level Models (1) 1em Coreference Resolution](https://reader033.fdocuments.us/reader033/viewer/2022060918/6299ac29cb16852af45cf049/html5/thumbnails/65.jpg)
A run-through example
John is a musician. He played a new song. A girl was listening to the song. “It is my favorite,” John said to her.
![Page 66: Document Level Models (1) 1em Coreference Resolution](https://reader033.fdocuments.us/reader033/viewer/2022060918/6299ac29cb16852af45cf049/html5/thumbnails/66.jpg)
A run-through example
John is a musician. He played a new song. A girl was listening to the song. “It is my favorite,” John said to her.
Men%on detec%on
![Page 67: Document Level Models (1) 1em Coreference Resolution](https://reader033.fdocuments.us/reader033/viewer/2022060918/6299ac29cb16852af45cf049/html5/thumbnails/67.jpg)
A run-through example
John is a musician. He played a new song. A girl was listening to the song. “It is my favorite,” John said to her.
Speaker iden%fica%on
![Page 68: Document Level Models (1) 1em Coreference Resolution](https://reader033.fdocuments.us/reader033/viewer/2022060918/6299ac29cb16852af45cf049/html5/thumbnails/68.jpg)
A run-through example
John is a musician. He played a new song. A girl was listening to the song. “It is my favorite,” John said to her.
String match
![Page 69: Document Level Models (1) 1em Coreference Resolution](https://reader033.fdocuments.us/reader033/viewer/2022060918/6299ac29cb16852af45cf049/html5/thumbnails/69.jpg)
A run-through example
John is a musician. He played a new song. A girl was listening to the song. “It is my favorite,” John said to her.
Precise constructs
![Page 70: Document Level Models (1) 1em Coreference Resolution](https://reader033.fdocuments.us/reader033/viewer/2022060918/6299ac29cb16852af45cf049/html5/thumbnails/70.jpg)
A run-through example
John is a musician. He played a new song. A girl was listening to the song. “It is my favorite,” John said to her.
Strict head match
![Page 71: Document Level Models (1) 1em Coreference Resolution](https://reader033.fdocuments.us/reader033/viewer/2022060918/6299ac29cb16852af45cf049/html5/thumbnails/71.jpg)
A run-through example
John is a musician. He played a new song. A girl was listening to the song. “It is my favorite,” John said to her.
Pronoun resolu%on
![Page 72: Document Level Models (1) 1em Coreference Resolution](https://reader033.fdocuments.us/reader033/viewer/2022060918/6299ac29cb16852af45cf049/html5/thumbnails/72.jpg)
Mention selection in a given sieve
• In each sieve, we consider for resolution only mentions that are currently first in textual order in their cluster.
• Most informative!
textual order
✔ ✔
![Page 73: Document Level Models (1) 1em Coreference Resolution](https://reader033.fdocuments.us/reader033/viewer/2022060918/6299ac29cb16852af45cf049/html5/thumbnails/73.jpg)
Features are shared within clusters
• Within a cluster: • Union of all modifiers • Union of all head words • Union of all attributes: number, gender, animacy
• Robustness to missing/incorrect attributes
“a group of students” “five students”
number: singular number: plural
number: singular, plural
![Page 74: Document Level Models (1) 1em Coreference Resolution](https://reader033.fdocuments.us/reader033/viewer/2022060918/6299ac29cb16852af45cf049/html5/thumbnails/74.jpg)
EXPERIMENTS
![Page 75: Document Level Models (1) 1em Coreference Resolution](https://reader033.fdocuments.us/reader033/viewer/2022060918/6299ac29cb16852af45cf049/html5/thumbnails/75.jpg)
Results on older corpora
UNSUPERVISED ACE 2004 Test ACE NWIRE MUC6
This work 81 80.2 74.4
Haghighi and Klein (2009) 79.0 76.9 75.0
SUPERVISED Ace 2004 Test ACE NWIRE MUC6
Culotta et al. (2007) 79.3 - - Bengston and Roth (2008) 80.8 - - Finkel and Manning (2008) +G - 74.5 64.3
B3 F1 scores of different systems on standard corpora
![Page 76: Document Level Models (1) 1em Coreference Resolution](https://reader033.fdocuments.us/reader033/viewer/2022060918/6299ac29cb16852af45cf049/html5/thumbnails/76.jpg)
Results: CoNLL-2011closed track
0
10
20
30
40
50
60
CoNLL score = (MUC F1 + B3 F1 + CEAF F1) / 3
![Page 77: Document Level Models (1) 1em Coreference Resolution](https://reader033.fdocuments.us/reader033/viewer/2022060918/6299ac29cb16852af45cf049/html5/thumbnails/77.jpg)
Results: CoNLL-2011open track
0
10
20
30
40
50
60
Our work Heidelberg Institute of Technology
University of Trento/IIT/Essex
University of Zurich Nara Institute of Science and Technology
CoNLL score = (MUC F1 + B3 F1 + CEAF F1) / 3
![Page 78: Document Level Models (1) 1em Coreference Resolution](https://reader033.fdocuments.us/reader033/viewer/2022060918/6299ac29cb16852af45cf049/html5/thumbnails/78.jpg)
CoNLL-2012 shared task
• Mul%lingual unrestricted coreference resolu%on in OntoNotes • English, Chinese, Arabic
• Higher barrier of entry • 16 submissions vs. 23 submissions in 2011
• But there was significant progress • Best score for English increased from 58.3 to 63.4
![Page 79: Document Level Models (1) 1em Coreference Resolution](https://reader033.fdocuments.us/reader033/viewer/2022060918/6299ac29cb16852af45cf049/html5/thumbnails/79.jpg)
CoNLL-2012 shared task
• Two out of the top three systems used our system • Fernandes et al., PUC/IBM Brazil
• Adapted our system to Chinese and Arabic • Reranked the output of our system • Best system overall
• Chen and Ng, UT Dallas • Adapted our system to Chinese and Arabic • Added two ML-‐based sieves to our system • Best for Chinese, top 3 overall
• Proof that our approach is mul%lingual
![Page 80: Document Level Models (1) 1em Coreference Resolution](https://reader033.fdocuments.us/reader033/viewer/2022060918/6299ac29cb16852af45cf049/html5/thumbnails/80.jpg)
Analysis: Importance of sharing features
En%ty-‐centric model 59.3
Men%on-‐pair model 55.9
CoNLL F1 in OntoNotes Dev
![Page 81: Document Level Models (1) 1em Coreference Resolution](https://reader033.fdocuments.us/reader033/viewer/2022060918/6299ac29cb16852af45cf049/html5/thumbnails/81.jpg)
Analysis: Importance of multiple sieves
Mul%-‐pass model 59.3
Single-‐pass model 53
CoNLL F1 in OntoNotes Dev
![Page 82: Document Level Models (1) 1em Coreference Resolution](https://reader033.fdocuments.us/reader033/viewer/2022060918/6299ac29cb16852af45cf049/html5/thumbnails/82.jpg)
Analysis: Importance of features
CoNLL F1 in OntoNotes Dev
Complete 59.3 wo/ Number 56.7 -‐ 2.6 wo/ Gender 58.9 -‐ 0.4 wo/ Animacy 58.3 -‐ 1.0 wo/ NE 58.8 -‐ 0.5
![Page 83: Document Level Models (1) 1em Coreference Resolution](https://reader033.fdocuments.us/reader033/viewer/2022060918/6299ac29cb16852af45cf049/html5/thumbnails/83.jpg)
Idea I: Conclusions
• Novel architecture for coreference resolu%on • “Baby steps” • Global • Determinis%c
• State of the art results (in mul%ple languages) • Best at CoNLL-‐2011 • Two of the top 3 systems at CoNLL-‐2012 used it
![Page 84: Document Level Models (1) 1em Coreference Resolution](https://reader033.fdocuments.us/reader033/viewer/2022060918/6299ac29cb16852af45cf049/html5/thumbnails/84.jpg)
Big-picture conclusions
• Understanding the problem is more important than machine learning
• Model things jointly when you can
![Page 85: Document Level Models (1) 1em Coreference Resolution](https://reader033.fdocuments.us/reader033/viewer/2022060918/6299ac29cb16852af45cf049/html5/thumbnails/85.jpg)
Recent Improvements
23 / 27
![Page 86: Document Level Models (1) 1em Coreference Resolution](https://reader033.fdocuments.us/reader033/viewer/2022060918/6299ac29cb16852af45cf049/html5/thumbnails/86.jpg)
Taking Coreference Resolu2on
beyond the 60% Performance Barrier
Marta Recasens Google Research
(Joint work with Ma5hew Can, Marie-‐Catherine de Marneffe,
Chris Po5s, Dan Jurafsky, Eduard Hovy, and M. Antònia MarF)
April 26, 2013 ·∙ Carnegie Mellon University
![Page 87: Document Level Models (1) 1em Coreference Resolution](https://reader033.fdocuments.us/reader033/viewer/2022060918/6299ac29cb16852af45cf049/html5/thumbnails/87.jpg)
Life and Death of DEs 73
Nestle USA issued a voluntary recall of its Nesquik chocolate powder aler being Ipped off by an ingredient supplier of possible salmonella contaminaIon.
The Glendale-‐based company said it was calling back canisters of the product, which is mixed with milk to create a sweet drink, that were made in October and sold naIonwide.
Consumers should look for containers bearing an expiraIon date of October 2014.
Nestle decided to recall the power
![Page 88: Document Level Models (1) 1em Coreference Resolution](https://reader033.fdocuments.us/reader033/viewer/2022060918/6299ac29cb16852af45cf049/html5/thumbnails/88.jpg)
Life and Death of DEs 74
Nestle USA issued a voluntary recall of its Nesquik chocolate powder aler being Ipped off by an ingredient supplier of possible salmonella contaminaIon.
The Glendale-‐based company said it was calling back canisters of the product, which is mixed with milk to create a sweet drink, that were made in October and sold naIonwide.
Consumers should look for containers bearing an expiraIon date of October 2014.
Nestle decided to recall the powder
![Page 89: Document Level Models (1) 1em Coreference Resolution](https://reader033.fdocuments.us/reader033/viewer/2022060918/6299ac29cb16852af45cf049/html5/thumbnails/89.jpg)
Life and Death of DEs 75
Nestle USA issued a voluntary recall of its Nesquik chocolate powder aler being Ipped off by an ingredient supplier of possible salmonella contamina2on.
The Glendale-‐based company said it was calling back canisters of the product, which is mixed with milk to create a sweet drink, that were made in October and sold naIonwide.
Consumers should look for containers bearing an expira2on date of October 2014.
Nestle decided to recall the powder
![Page 90: Document Level Models (1) 1em Coreference Resolution](https://reader033.fdocuments.us/reader033/viewer/2022060918/6299ac29cb16852af45cf049/html5/thumbnails/90.jpg)
Life and Death of DEs 76
Nestle USA issued a voluntary recall of its Nesquik chocolate powder aler being Ipped off by an ingredient supplier of possible salmonella contamina2on.
The Glendale-‐based company said it was calling back canisters of the product, which is mixed with milk to create a sweet drink, that were made in October and sold naIonwide.
Consumers should look for containers bearing an expira2on date of October 2014.
Nestle decided to recall the powder
![Page 91: Document Level Models (1) 1em Coreference Resolution](https://reader033.fdocuments.us/reader033/viewer/2022060918/6299ac29cb16852af45cf049/html5/thumbnails/91.jpg)
Singletons
I Singleton mentions are hard and common.I Design a classifier specifically for predicting them.I Use a different set of linguistically motivated features.
24 / 27
![Page 92: Document Level Models (1) 1em Coreference Resolution](https://reader033.fdocuments.us/reader033/viewer/2022060918/6299ac29cb16852af45cf049/html5/thumbnails/92.jpg)
What’s hard? Why only ∼60% accuracy?
25 / 27
![Page 93: Document Level Models (1) 1em Coreference Resolution](https://reader033.fdocuments.us/reader033/viewer/2022060918/6299ac29cb16852af45cf049/html5/thumbnails/93.jpg)
The flaw was first reported by a security researcher David Emery, who posted his findings to the Cryptome mailing list. [...] The bug has not been corrected by any subsequent updates .
The soTware is used to turn 2D photos into 3D models; in reality, a person uploads photos taken or stored on an iPad to the Autodesk Cloud, where the actual conversion happens. [...] The app is free, but requires an iPad 2 or be5er running iOS 5.x.
37
The unsolved problem of coreference resoluIon
![Page 94: Document Level Models (1) 1em Coreference Resolution](https://reader033.fdocuments.us/reader033/viewer/2022060918/6299ac29cb16852af45cf049/html5/thumbnails/94.jpg)
Autodesk's had its 123D Catch iPad applicaIon in the works for quite some Ime now, but starIng today, you'll finally be able to use that Cuper2no slate to turn those beauIful snaps into three-‐dee creaIons.
Now you can keep up with all of the people you follow with a “best-‐of” weekly email from TwiUer. [...] The micro-‐blogging service will now be sending out weekly email digests that will feature a summary of your Twi5er stream.
38
The unsolved problem of coreference resoluIon
![Page 95: Document Level Models (1) 1em Coreference Resolution](https://reader033.fdocuments.us/reader033/viewer/2022060918/6299ac29cb16852af45cf049/html5/thumbnails/95.jpg)
• Surface features – String/head match – Sentence/token distance
• Morphological features – MenIon is a pronoun/definite/demonstraIve/proper noun
• SyntacIc features – Gender/number agreement
– GrammaIcal role • SemanIc features
– NE type – WordNet
– Wikipedia – Others: Yago, lexico-‐semanIc pa5erns, etc.
39
Features (Soon et al. 2001, Ng & Cardie 2002)
![Page 96: Document Level Models (1) 1em Coreference Resolution](https://reader033.fdocuments.us/reader033/viewer/2022060918/6299ac29cb16852af45cf049/html5/thumbnails/96.jpg)
locaIon
WordNet 40
female
person
male organizaIon
object
date Ime
money
percent
SemanIc class match (Soon et al. 01)
chairman
IBM ✗ Mr. Lim ✓
![Page 97: Document Level Models (1) 1em Coreference Resolution](https://reader033.fdocuments.us/reader033/viewer/2022060918/6299ac29cb16852af45cf049/html5/thumbnails/97.jpg)
WordNet 41
WordNet paths (Harabagiu et al. 01, Ng & Cardie 02, Poesio et al. 04, Ponze5o & Strube 06)
IN-GLOSS
SYNONYM
![Page 98: Document Level Models (1) 1em Coreference Resolution](https://reader033.fdocuments.us/reader033/viewer/2022060918/6299ac29cb16852af45cf049/html5/thumbnails/98.jpg)
WordNet 42
WordNet paths (Harabagiu et al. 01, Ng & Cardie 02, Poesio et al. 04, Ponze5o & Strube 06)
terrain
piece of land
site
IS-A RIS-A
engine
car
HAS-PART
window
car
HAS-PART
neck
body part
lip
IS-A RIS-A
![Page 99: Document Level Models (1) 1em Coreference Resolution](https://reader033.fdocuments.us/reader033/viewer/2022060918/6299ac29cb16852af45cf049/html5/thumbnails/99.jpg)
scienGsts
people voters
people
authoriGes
government
arm
leg
fire
cause
government
chairman
energy sources
gas supplies
company
allies
SemanIc similarity is not coreference 43
related by WordNet coreferent
![Page 100: Document Level Models (1) 1em Coreference Resolution](https://reader033.fdocuments.us/reader033/viewer/2022060918/6299ac29cb16852af45cf049/html5/thumbnails/100.jpg)
DistribuIonal similarity 44
DistribuIonal hypothesis (Harris 1954): words that occur in the same contexts tend to have similar meanings.
aardvark computer data pinch result sugar …apricot 0 0 0 1 0 1pineapple 0 0 0 1 0 1digital 0 2 1 0 1 0information 0 1 6 0 4 0
![Page 101: Document Level Models (1) 1em Coreference Resolution](https://reader033.fdocuments.us/reader033/viewer/2022060918/6299ac29cb16852af45cf049/html5/thumbnails/101.jpg)
Apple
phone
device
Europe
European Union informaGon
technology
marriage
divorce
DistribuIonal similarity is sIll not coreference 45
distribuIonally similar
coreferent
![Page 102: Document Level Models (1) 1em Coreference Resolution](https://reader033.fdocuments.us/reader033/viewer/2022060918/6299ac29cb16852af45cf049/html5/thumbnails/102.jpg)
IntuiIon of our soluIon: 46
MicrosoT has released a new feature.
The search giant has released a new feature.
![Page 103: Document Level Models (1) 1em Coreference Resolution](https://reader033.fdocuments.us/reader033/viewer/2022060918/6299ac29cb16852af45cf049/html5/thumbnails/103.jpg)
IntuiIon of our soluIon: Restricted distribuIonal similarity
47
Google has acquired the company.
The search giant has acquired the company.
AcquisiIon of Nik Solware
![Page 104: Document Level Models (1) 1em Coreference Resolution](https://reader033.fdocuments.us/reader033/viewer/2022060918/6299ac29cb16852af45cf049/html5/thumbnails/104.jpg)
Sprint blocks out vacaIon days for a major phone announcement.
According to SprintFeed, the carrier is blocking out vacaIon days for employees.
Story: Sprint-blocks-out-employees-vacations
Restricted distribuIonal semanIcs 48
block out release launch prevent ...
![Page 105: Document Level Models (1) 1em Coreference Resolution](https://reader033.fdocuments.us/reader033/viewer/2022060918/6299ac29cb16852af45cf049/html5/thumbnails/105.jpg)
Techmeme (www.techmeme.com)
Comparable corpus 49
![Page 106: Document Level Models (1) 1em Coreference Resolution](https://reader033.fdocuments.us/reader033/viewer/2022060918/6299ac29cb16852af45cf049/html5/thumbnails/106.jpg)
Comparable corpus 50
![Page 107: Document Level Models (1) 1em Coreference Resolution](https://reader033.fdocuments.us/reader033/viewer/2022060918/6299ac29cb16852af45cf049/html5/thumbnails/107.jpg)
2 years worth of Techmeme 160 million words 374,547 documents 24,612 stories
Comparable corpus 51
![Page 108: Document Level Models (1) 1em Coreference Resolution](https://reader033.fdocuments.us/reader033/viewer/2022060918/6299ac29cb16852af45cf049/html5/thumbnails/108.jpg)
• Stanford sentence spli5er, tagger, NER
• MaltParser (linear Ime)
• Top 10 z*idf ranked verbs for each story
– Phrasal verbs (give up vs. give away) – Excluding: light verbs (do, have, give...)
report verbs (say, tell...) copular verbs (seem, become...)
– WordNet synonyms are included (release, publish...)
!!
€
tf(v, !s) * idf(v, !S) = tf(v, !s) * log | S ||{s∈ S : v ∈ s} |
ExtracIon 52
![Page 109: Document Level Models (1) 1em Coreference Resolution](https://reader033.fdocuments.us/reader033/viewer/2022060918/6299ac29cb16852af45cf049/html5/thumbnails/109.jpg)
• AssumpIon: In a story, the same verb refers to the same event
• Subjects and objects are clustered, repecIvely – Passive construcIons (X compromised Y Y has been compromised)
– ErgaIve verbs (X scaPered Y Y scaPered)
– NominalizaIons from NomBank (acquire Google’s acquisi>on of Sparrow)
• Exclude same-‐head NPs and pronouns
the Internet giant
the search giant
the company
crawl
their missing phones
lost or stolen smartphones
your device
your lost iPhone
locate
ExtracIon 53
![Page 110: Document Level Models (1) 1em Coreference Resolution](https://reader033.fdocuments.us/reader033/viewer/2022060918/6299ac29cb16852af45cf049/html5/thumbnails/110.jpg)
Coreference rela2ons Android phones products
pictures shots
Mark Zuckerberg the hoodie-‐wearing Facebook co-‐founder
Bad rela2ons
① Parsing errors
[aPacks against Chrome]S exploit ... [the full details on the]S exploit
② Algorithm violaIons (one verb ≠ one event) Remove [spam from the emails]O ... Remove [the test accounts]O
③ Text extracIon errors </li> <li id="gadgets"> <a href=hPp://www.thetechherald.com/> Networking </a>
54
ExtracIon
![Page 111: Document Level Models (1) 1em Coreference Resolution](https://reader033.fdocuments.us/reader033/viewer/2022060918/6299ac29cb16852af45cf049/html5/thumbnails/111.jpg)
Filters for:
• Parsing errors – Non-‐nominal head shopping ( ah
• Algorithm violaIons – NE – NE Yahoo, Google – NegaIon [But the operators aren’t mandaGng] plans
– EnumeraIon [1. Remove] spam from the emails
– Numbers 40,000 per year
– Temporals 6:00 PM Pacific Gme
• Text extracIon errors – MenIon length charges that Google unfairly ranks compeGtors in
its search results, penalizing them with lower rankings
– Sentence length – Ill-‐formed sentence </li> <li id="gadgets”>
Filtering 55
![Page 112: Document Level Models (1) 1em Coreference Resolution](https://reader033.fdocuments.us/reader033/viewer/2022060918/6299ac29cb16852af45cf049/html5/thumbnails/112.jpg)
Bad relations
Filtering 56
![Page 113: Document Level Models (1) 1em Coreference Resolution](https://reader033.fdocuments.us/reader033/viewer/2022060918/6299ac29cb16852af45cf049/html5/thumbnails/113.jpg)
• Remove
– determiners
the promoGon >> promoGon
– relaIve, -‐ing, -‐ed clauses the device available online from Google >> device
• Keep adjecIves and preposiIonal modifiers online piracy
distribuGon of pirated material
• Generalize NE to types
Cook’s departure >> PERSON’s departure • Lemmas
data >> datum
GeneralizaIon 57
128,492 coreferent pairs
![Page 114: Document Level Models (1) 1em Coreference Resolution](https://reader033.fdocuments.us/reader033/viewer/2022060918/6299ac29cb16852af45cf049/html5/thumbnails/114.jpg)
• Frequency counts
(rule, limitaIon) 5 (company, HP) 35
(phone, experience) 1 (company, price) 12
(FBI, agent) 20
• Normalized PMI (Bouma 2009) [–1, 1]
(rule, limitaIon) 0.417 (company, HP) 0.203
(phone, experience) –0.152 (company, price) –0.053
(FBI, agent) 0.566
GeneralizaIon 58
![Page 115: Document Level Models (1) 1em Coreference Resolution](https://reader033.fdocuments.us/reader033/viewer/2022060918/6299ac29cb16852af45cf049/html5/thumbnails/115.jpg)
offering, IPO
password, login informaIon
user, consumer
firm, company phone, device
Apple, company
iPad, slate
Android, plazorm
site, company app, solware
agreement, wording
plazorm, code
filing, complaint
search, search result
update, change
bug, issue
Google, search giant search algorithm, search engine
hardware key, digital lock
content, photo
rule, limitaIon
coupon, sale medical record, medical file
device, developer
version, handset
Groupon, company
DicIonary snapshot 59
![Page 116: Document Level Models (1) 1em Coreference Resolution](https://reader033.fdocuments.us/reader033/viewer/2022060918/6299ac29cb16852af45cf049/html5/thumbnails/116.jpg)
• Synonymy user, consumer
• Hypernymy
Google, company
DicIonary snapshot 60
• Metonymy cloud, users
• General nouns
bug, issue
• World knowledge Google, search giant
![Page 117: Document Level Models (1) 1em Coreference Resolution](https://reader033.fdocuments.us/reader033/viewer/2022060918/6299ac29cb16852af45cf049/html5/thumbnails/117.jpg)
Dictionary sieves
Stanford coreference system (Lee et al. 2011)
Sieve Dict 4 Sieve Dict 3 Sieve Dict 2 Sieve Dict 1
61
![Page 118: Document Level Models (1) 1em Coreference Resolution](https://reader033.fdocuments.us/reader033/viewer/2022060918/6299ac29cb16852af45cf049/html5/thumbnails/118.jpg)
Existing Tools
Two good tools (available for download) are:
I Stanford Coreference SystemI Sieve + singletons
I Illinois Coreference PackageI Pairwise classification with strong features
26 / 27
![Page 119: Document Level Models (1) 1em Coreference Resolution](https://reader033.fdocuments.us/reader033/viewer/2022060918/6299ac29cb16852af45cf049/html5/thumbnails/119.jpg)
Summary
The coreference resolution taskI DefinitionI EvaluationI Pairwise / machine learning approach
I FeaturesI Constructing training examples
I Rule-based systems (Sieve, baby-steps)I Many smart decisionsI Global constraints
I Targetting specific problemsI A method for detecting singletonsI “semantic” knowledge acquisition from large corpus
27 / 27