Data-Driven Computational Pragmatics
Shlomo Argamon and Jonathan Dunn
Schedule
1:10 – 1:20 Introductions
1:20 – 1:30 Structure of Class
1:30 – 2:10 Intro to Data-Driven Computational Pragmatics
2:10 – 2:50 Co-Reference Resolution
2:50 – 3:00 Conclusion / Flex Time
Structure of Class
• Class 1:
• Overview of Data-Driven Computational Pragmatics
• Co-Reference Resolution
• Class 2:
• Hypothesis Testing with Data-Driven Models
• Inferences and Reasoning
Structure of Class
• Class 3:
• Metaphor and Figurative Language
• Class 4:
• Social Meaning from Stylistics and Linguistic Variations
Structure of Class
• Readings and Notes Available Online: www.jdunn.name
• Assignments:
• Reading Questions (60%)
• Short Paper (500 - 1,000 words; 40%)
• “Consider how you could use these sorts of computational models to address or provide evidence for a hypothesis from linguistic theory.”
• Due M, 7/20
Overview
• Pragmatics
• Meaning-in-Language: Cognitive, Contextual, Social
• (cf. Linguistic vs. Non-Linguistic Meaning)
• Subjective vs. Objective Phenomena
• Epistemic Objectivity
• Ontological Objectivity
• Meaning-in-Language: Epistemically Objective
• Can be modeled
• Can make testable predictions
Overview (2)
• Computational Pragmatics
• Models do not have direct access to introspection / intuition
• Models entirely syntactic in nature (e.g., symbol manipulation)
• Data-Driven
• Learn models directly from linguistic data
• Limited rule-based heuristics
• Corpus-based view of language: Language as a produced, observable phenomenon
Overview (3)
Computational Pragmatics: A Spectrum
Computers Doing Pragmatics (e.g., Siri in the future)
AI as an Engineering Task
Computational Modelling for Pragmatics (e.g., computer-assisted corpus linguistics)
AI as Cognitive Science
Sources of Evidence
Introspection
Hand-crafted Datasets
Huge “Natural” Datasets
But also depend on introspection: Gold-Standard Annotations
Overview (5)
Computational pragmatics is tricky
SYNTAX
SEMANTICS
PRAGMATICS
Overview (5)
Data-Driven Computational Pragmatics
Data (Corpora)
Features (Representations)
Algorithms (Model Building)
Overview (7)
Data-Driven =?= Model-Independent
Overview (10)
• Sample features for the co-reference task, for a given pair of candidate entities:
• Binary Feature: Do these co-reference candidates share number information?
• Categorical Feature: What semantic role does entity A have? Entity B?
• Frequency Feature: Frequency of represented verb-argument pair in reference corpus (Could model verb selection preferences: does the verb accept the candidate?)
• Ratio Feature: Minimum Edit Distance for Candidates / Total String Length
• Measurement Feature: Difference between abstractness ratings for candidate and its verb (Could model verb selection preferences)
Bob bought a new car. Then he drove it for hours.
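The feature types above can be collected into a single vector per candidate pair. A minimal sketch for the pair (“a new car”, “it”) from the example sentence; all values (roles, the corpus count) are hand-filled and hypothetical, except the edit distance, which is 9 because the two strings share no characters.

```python
# Illustrative feature vector for one co-reference candidate pair from
# "Bob bought a new car. Then he drove it for hours."
# Roles and the corpus count are hypothetical placeholders.

candidate_a = "a new car"
candidate_b = "it"

features = {
    # Binary: both mentions are singular
    "same_number": True,
    # Categorical: semantic role of each mention
    "role_a": "patient",   # object of "bought"
    "role_b": "patient",   # object of "drove"
    # Frequency: count of the ("drive", "car") pair in a reference corpus
    "verb_arg_freq": 1520,  # hypothetical count
    # Ratio: minimum edit distance / total string length
    # MED("a new car", "it") = 9 (no shared characters), total length 11
    "edit_ratio": 9 / (len(candidate_a) + len(candidate_b)),
}
```

A trained classifier would consume such vectors, one per candidate pair.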
Co-Reference Resolution (3)
• Co-Reference Resolution:
• What words refer to the same entity?
North Korea opened its doors to the U.S. today, welcoming
Secretary of State Madeline Albright. She says her visit is a
good start. The U.S. remains concerned about North Korea’s
missile development program and its exports of missiles to
Iran.
Fernandes, et al. (2014). “Latent Trees for Coreference Resolution.” Computational Linguistics, 40(4)
Co-Reference Resolution (2)
• Many different kinds of linguistic phenomena:
• Proper names (“George”)
• Aliases (“LSI”)
• Definite NPs (“the Linguistic Summer Institute”)
• Pronouns (“it”, “they”)
• Appositives (“the first institute to be...”)
• Bridging References (“the cabinet was wood, but the top granite”)
Co-Reference Resolution Data
• Annotated corpora
• Each mention annotated with an ID of the unique entity it refers to
• Can extract pairwise relations between mentions
• Genre of the text affects the kinds of co-reference phenomena
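The pairwise extraction step can be sketched directly: given mentions annotated with entity IDs (the toy data below is hypothetical), every mention pair becomes a labeled training example, positive when the IDs match.

```python
# Sketch: deriving pairwise training examples from an annotated corpus.
# Each mention carries the ID of the unique entity it refers to.

mentions = [
    ("North Korea", 1), ("its", 1), ("U.S.", 2),
    ("Madeline Albright", 3), ("She", 3), ("her", 3),
]

# Every mention pair is a labeled example: True iff same entity ID.
pairs = [
    (a_text, b_text, a_id == b_id)
    for i, (a_text, a_id) in enumerate(mentions)
    for b_text, b_id in mentions[i + 1:]
]

positives = [(a, b) for a, b, y in pairs if y]
# e.g. ("North Korea", "its"), ("Madeline Albright", "She"), ...
```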
Co-Reference Resolution Features
• Agreement: number, person, case, etc.
• Syntactic restrictions
• Semantic selectional preferences
• Syntactic/semantic role preferences
• Saliency: recency, repetition
• Causal coherence
Co-Reference Resolution Algorithms
• Three basic dichotomies:
• Relationship between linguistic units or between entities
• Pairwise relationships or larger clusters
• Whole text at once or processing sequentially item by item
Co-Reference Resolution (5)
• Co-Reference Resolution: The Mention-Pair Model
North Korea opened its doors to the U.S. today,
welcoming Secretary of State Madeline Albright.
She says her visit is a good start. The U.S. remains
concerned about North Korea’s missile
development program and its exports of missiles
to Iran.
1. Find all entities (assumed)
2. Classify each possible pair (within defined window)
3. Cluster identified co-referencing pairs into chains
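Steps 1–2 can be sketched as follows. The classifier stub and the window size are assumptions standing in for a trained model and a tuned hyperparameter:

```python
# Sketch of the mention-pair pipeline: entities are assumed given; every
# pair within a window is classified. The classifier is a stub.

def classify(a: str, b: str) -> bool:
    # Stand-in: a real system would apply a trained classifier to features.
    gold = {("Madeline Albright", "She"), ("She", "her")}
    return (a, b) in gold

mentions = ["North Korea", "its", "U.S.", "Madeline Albright", "She", "her"]
WINDOW = 3  # only consider pairs at most 3 mentions apart (assumed)

links = [
    (mentions[i], mentions[j])
    for i in range(len(mentions))
    for j in range(i + 1, min(i + 1 + WINDOW, len(mentions)))
    if classify(mentions[i], mentions[j])
]
# links: [("Madeline Albright", "She"), ("She", "her")]
```

Step 3 then chains the positive links together (see the clustering slides that follow).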
Co-Reference Resolution (6)
• Co-Reference Resolution: The Mention-Pair Model
North Korea opened its doors to the U.S. today,
welcoming Secretary of State Madeline Albright.
She says her visit is a good start. The U.S. remains
concerned about North Korea’s missile
development program and its exports of missiles
to Iran.
Classifier: Are Two NPs Co-Referencing? (Binary Category: Yes/No)
Pairs To Classify:
(North Korea, Madeline Albright) → No
(its, Iran) → No
(U.S., North Korea) → No
(She, Her) → Yes
(Her, its) → No
(its, North Korea) → Yes
(Iran, its) → No
Co-Reference Resolution (7)
• Co-Reference Resolution: The Mention-Pair Model
North Korea opened its doors to the U.S. today,
welcoming Secretary of State Madeline Albright.
She says her visit is a good start. The U.S. remains
concerned about North Korea’s missile
development program and its exports of missiles
to Iran.
Clustering: Which Co-Referencing NPs Belong to the Same Chain?
• If “Madeline Albright” and “she” co-reference,
• And if “she” and “her” co-reference,
• Then “Madeline Albright” and “her” must also co-reference
Works great, but only if there are no classifier errors.
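The transitive-closure clustering above can be sketched with union-find. The mentions and links are taken from the example; a single misclassified link would merge two unrelated chains, which is exactly the fragility noted above.

```python
# Sketch: clustering pairwise co-reference links into chains by
# transitive closure, using union-find.

parent = {}

def find(x):
    parent.setdefault(x, x)
    while parent[x] != x:
        parent[x] = parent[parent[x]]  # path halving
        x = parent[x]
    return x

def union(a, b):
    parent[find(a)] = find(b)

links = [("Madeline Albright", "she"), ("she", "her")]
for a, b in links:
    union(a, b)

chains = {}
for m in ["Madeline Albright", "she", "her"]:
    chains.setdefault(find(m), []).append(m)
# All three mentions end up in one chain.
```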
Co-Reference Resolution (8)
• Co-Reference Resolution: The Entity-Mention Model
North Korea opened its doors to the U.S. today,
welcoming Secretary of State Madeline Albright.
She says her visit is a good start. The U.S. remains
concerned about North Korea’s missile
development program and its exports of missiles
to Iran.
Classifier: Are an NP and a Preceding Cluster Co-Referencing?
Pairs To Classify:
[North Korea + its] vs. Madeline Albright → No
[North Korea + its] vs. She → No
[Madeline Albright + She] vs. its → No
[Madeline Albright + She] vs. her → Yes
[She + her] vs. its → No
[North Korea + its] vs. North Korea’s → Yes
[North Korea’s + its] vs. Iran → No
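The incremental loop behind the entity-mention model can be sketched as follows. The compatibility test is a stub (a lookup table of gold links) standing in for a learned classifier with cluster-level features:

```python
# Sketch of the entity-mention model: each new mention is compared against
# existing clusters rather than against single mentions.

def compatible(cluster: list, mention: str) -> bool:
    # Stand-in for a learned classifier with cluster-level features,
    # e.g. "no cluster member conflicts with the mention in gender/number".
    gold = {
        ("North Korea", "its"),
        ("Madeline Albright", "She"), ("She", "her"),
        ("Madeline Albright", "her"),
    }
    return all((m, mention) in gold or (mention, m) in gold for m in cluster)

clusters = []
for mention in ["North Korea", "its", "Madeline Albright", "She", "her"]:
    for cluster in clusters:
        if compatible(cluster, mention):
            cluster.append(mention)  # mention joins an existing entity
            break
    else:
        clusters.append([mention])   # mention starts a new entity
```

Because the test consults every member of a cluster, decisions stay consistent across the whole chain, which is the advantage noted on the next slide.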
Co-Reference Resolution (9)
• Co-Reference Resolution: The Entity-Mention Model
North Korea opened its doors to the U.S. today,
welcoming Secretary of State Madeline Albright.
She says her visit is a good start. The U.S. remains
concerned about North Korea’s missile
development program and its exports of missiles
to Iran.
Classifier: Are an NP and a Preceding Cluster Co-Referencing?
• Allows cluster-level features
• Maintains consistency within large clusters
Co-Reference Resolution (10)
• Co-Reference Resolution: Remaining Problems
North Korea opened its doors to the U.S. today,
welcoming Secretary of State Madeline Albright.
She says her visit is a good start. The U.S. remains
concerned about North Korea’s missile
development program and its exports of missiles
to Iran.
• Often very many candidates
• Ranking models help choose best
• Many entities are not anaphoric
• “Singletons”
• More than choice of cluster
• Co-reference can depend on lexical relations
• “Corsica is …. The island is …..”
Co-Reference Resolution (11)
• Features
• First, what are the cues for / properties of co-referenced words?
• Second, how can we model those cues?
• e.g., Automatically extract features representing those cues
• (1) How to represent the language (e.g., WordNet synsets, parse tree)
• (2) How to measure the property in that representation
Co-Reference Resolution (12)
• Hard Constraints
• Number agreement
• “John has an Acura. It is red.”
• Person and case agreement
• “*John and Mary have Acuras. We love them.” (where We=John and Mary)
• Gender agreement
• “John has an Acura. He/it/she is attractive.”
Binary Features: For each candidate pair, is the constraint satisfied? (Yes/No)
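Turning these hard constraints into binary features is mechanical once number and gender values are available. A minimal sketch with a hand-filled toy lexicon (real systems would use taggers and lexical resources):

```python
# Sketch: hard agreement constraints as binary features for one candidate
# pair. The tiny lexicon below is hypothetical.

NUMBER = {"John": "sg", "it": "sg", "they": "pl", "an Acura": "sg"}
GENDER = {"John": "masc", "it": "neut", "they": None, "an Acura": "neut"}

def agreement_features(a: str, b: str) -> dict:
    return {
        "number_match": NUMBER.get(a) == NUMBER.get(b),
        "gender_match": GENDER.get(a) == GENDER.get(b),
    }

agreement_features("an Acura", "it")
# {"number_match": True, "gender_match": True}
agreement_features("John", "it")
# {"number_match": True, "gender_match": False}
```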
Co-Reference Resolution (13)
• Hard Constraints
• Number agreement
• “John has an Acura. It is red.”
• Person and case agreement
• “*John and Mary have Acuras. We love them.” (where We=John and Mary)
• Gender agreement
• “John has an Acura. He/it/she is attractive.”
Categorical Features: A feature for each candidate’s number, person, gender, etc.
Co-Reference Resolution (14)
• Syntactic constraints
• “John bought himself a new Acura.” (himself=John)
• “John bought him a new Acura.” (him = not John)
• Required representation: parse tree / dependency relations
• Binary Feature: Does the necessary relation obtain?
Co-Reference Resolution (15)
• Selectional Restrictions
• “John parked his Acura in the garage. He had driven it around for hours.”
• “it” must refer to something that can be driven.
• Knowledge-based Approach:
• VerbNet for verb properties
• WordNet for synset membership
Co-Reference Resolution (16)
• Selectional Restrictions
• “John parked his Acura in the garage. He had driven it around for hours.”
• “it” must refer to something that can be driven.
• Distributional Approach:
• Cluster nouns and verbs into classes
• What is the probability noun A will occur with verb B?
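The distributional idea can be sketched with simple conditional probabilities estimated from counts. The counts below are hypothetical; real systems would use large parsed corpora and noun/verb clusters:

```python
# Sketch: estimating a selectional preference P(noun | verb) from corpus
# counts. The observation counts are hypothetical.

from collections import Counter

# (verb, direct-object noun) pairs observed in a reference corpus
observations = (
    [("drive", "car")] * 50 + [("drive", "truck")] * 30
    + [("drive", "garage")] * 1 + [("park", "car")] * 40
)

verb_obj = Counter(observations)
verb_total = Counter(v for v, _ in observations)

def pref(verb: str, noun: str) -> float:
    return verb_obj[(verb, noun)] / verb_total[verb]

pref("drive", "car")     # ≈ 0.62: "car" is a plausible object of "drive"
pref("drive", "garage")  # ≈ 0.012: "garage" is an implausible object
```

A high preference for (“drive”, “car”) is evidence that “it” in the example can co-refer with “his Acura”.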
Co-Reference Resolution (17)
• Recency / Salience
• “John has an Integra. Bill has a Legend. Mary likes to drive it.”
• Syntactic Operationalization
• How far removed in the parse tree are A and B?
• Semantic Operationalization
• How prominent is the semantic role of A (e.g., agent vs. patient)?
Co-Reference Resolution (18)
• Grammatical Role: Subject preference
• “John went to the Acura dealership with Bill. He bought an Integra.”
• “Bill went to the Acura dealership with John. He bought an Integra.”
• “(?) John and Bill went to the Acura dealership. He bought an Integra.”
• Categorical Feature:
• Grammatical Role or Semantic Role of each candidate as features
Co-Reference Resolution (19)
• Repeated Mentions Preference
• “John needed a car to get to his new job. He decided that he wanted something sporty. Bill went to the Acura dealership with him. He bought an Integra.”
• Not as relevant in pairwise classification
• In entity-mention model, feature of candidate cluster size to favor larger clusters
Co-Reference Resolution (20)
• Verb Semantics Preferences
• “John telephoned Bill. He lost the pamphlet on Acuras.”
• “John criticized Bill. He lost the pamphlet on Acuras.”
• Implicit causality
• Implicit cause of criticizing is object.
• Implicit cause of telephoning is subject.
Co-Reference Resolution (21)
• Co-Reference Resolution: Features
North Korea opened its doors to the U.S. today,
welcoming Secretary of State Madeline Albright.
She says her visit is a good start. The U.S. remains
concerned about North Korea’s missile
development program and its exports of missiles
to Iran.
String-Matching:
“U.S.” and “U.S.” = Identical
“North Korea” and “North Korea’s” = Similar
Minimum Edit Distance
How many operations are required to convert the first string into the second?
“U.S.” and “U.S.” = 0; “North Korea” and “North Korea’s” = 2
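The standard dynamic-programming computation of minimum edit distance (unit-cost insert, delete, substitute) reproduces the values above:

```python
# Sketch: minimum edit distance with unit costs.

def edit_distance(a: str, b: str) -> int:
    m, n = len(a), len(b)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # delete
                          d[i][j - 1] + 1,          # insert
                          d[i - 1][j - 1] + cost)   # substitute / match
    return d[m][n]

edit_distance("U.S.", "U.S.")                   # 0 (identical)
edit_distance("North Korea", "North Korea's")   # 2 (insert "'" and "s")
```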
Co-Reference Resolution (22)
• Assume we have all the features for a candidate pair:
“North Korea” and “its”
• The next step is to use the features to predict whether they co-reference
Sample Approach 1:
Lappin & Leass. (1994). “An Algorithm for Pronominal Anaphora Resolution.” Computational Linguistics.
Sample Approach 2:
McCarthy & Lehnert. (1995). “Using Decision Trees for Coreference Resolution.” Proceedings of IJCAI.
Co-Reference Resolution (23)
• Lappin and Leass (1994): Given he/she/it, assign antecedent
• (1) Discourse model update
• When a new noun phrase is encountered:
(a) Add a representation to discourse model with a salience value
(b) Modify saliences
• (2) Pronoun resolution
• (a) Choose the most salient antecedent
Co-Reference Resolution (24)
• Pre-defined Weights:
• Weights are cut in half after each sentence is processed
• This, and a sentence recency weight (100 for new sentences, cut in half each time), captures the recency preferences
Head noun emphasis: 80
Non-adverbial emphasis: 50
Ind. obj. and oblique emphasis: 40
Accusative (direct object) emphasis: 50
Existential emphasis: 70
Subject emphasis: 80
Subject recency: 100
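As a minimal sketch of the salience bookkeeping above (factor names are ours; weights are taken from the slide, with the halving-per-sentence decay applied as a division), the computation might look like:

```python
# Sketch of Lappin & Leass-style salience scoring (illustrative, not the
# original implementation). Factor names below are invented labels for the
# weights listed on the slide.
WEIGHTS = {
    "subject_recency": 100,
    "subject": 80,
    "existential": 70,
    "accusative": 50,
    "indirect_object_oblique": 40,
    "head_noun": 80,
    "non_adverbial": 50,
}

def salience(factors, sentences_ago):
    """Sum the weights of the factors that apply, halved once per
    intervening sentence to capture the recency preference."""
    total = sum(WEIGHTS[f] for f in factors)
    return total / (2 ** sentences_ago)

# A subject head noun in the current sentence outranks the same
# configuration two sentences back:
current = salience({"subject", "head_noun", "non_adverbial"}, 0)
older = salience({"subject", "head_noun", "non_adverbial"}, 2)
```

The decay is what makes the model prefer recent antecedents even when an older mention matched more factors.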
Co-Reference Resolution (25)
• Algorithm:
• (1) Collect the potential referents (up to 4 sentences back)
• (2) Remove potential referents that do not agree in number or gender with the pronoun
• (3) Remove potential referents that do not pass syntactic coreference constraints
• (4) Compute total salience value of referent from all factors
• (5) Select referent with highest salience value. In case of tie, select closest.
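The five steps above can be sketched as a filter-then-rank loop (the candidate representation and agreement fields here are hypothetical inputs, not the paper's actual feature extraction; the syntactic-constraint filter is omitted):

```python
# Illustrative sketch of the Lappin & Leass resolution steps.
def resolve_pronoun(pronoun, candidates, max_distance=4):
    # (1) Keep referents from at most 4 sentences back.
    pool = [c for c in candidates if c["sentences_ago"] <= max_distance]
    # (2) Filter on number/gender agreement.
    pool = [c for c in pool
            if c["number"] == pronoun["number"]
            and c["gender"] == pronoun["gender"]]
    # (3) Syntactic coreference constraints would filter further here (omitted).
    # (4)-(5) Highest total salience wins; ties broken by closeness.
    return max(pool, key=lambda c: (c["salience"], -c["sentences_ago"]),
               default=None)

he = {"number": "sg", "gender": "m"}
cands = [
    {"name": "John", "number": "sg", "gender": "m", "salience": 310, "sentences_ago": 0},
    {"name": "Mary", "number": "sg", "gender": "f", "salience": 280, "sentences_ago": 0},
    {"name": "Bill", "number": "sg", "gender": "m", "salience": 155, "sentences_ago": 1},
]
best = resolve_pronoun(he, cands)  # John agrees and is most salient
```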
Co-Reference Resolution (26)
• Problems:
• Limited features
• Feature weight assumed in advance
• Hard constraints mixed with imperfect feature extraction
• Limited coverage (e.g., pronouns)
• Hand-crafted rules are very language dependent
Co-Reference Resolution (27)
• McCarthy & Lehnert (1995): Given two entities, should they be linked?
• (1) Create training data by manually annotating gold-standard links
• Every possible pair is a training example
• Positive: Co-Referenced pairs (very small number)
• Negative: Not Co-Referenced pairs (majority of examples)
• (2) Extract features for each possible pair
• (3) Use learning algorithm to assign weights to features
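Step (1) can be sketched as follows (mention strings and gold chain IDs are invented annotations for illustration):

```python
from itertools import combinations

# Sketch of mention-pair training-data creation. A pair is positive iff
# both mentions belong to the same gold co-reference chain.
def make_pairs(mentions, gold_chain):
    examples = []
    for a, b in combinations(mentions, 2):
        label = gold_chain[a] == gold_chain[b]
        examples.append((a, b, label))
    return examples

mentions = ["IBM", "the company", "Apple", "it"]
gold = {"IBM": 1, "the company": 1, "Apple": 2, "it": 2}
pairs = make_pairs(mentions, gold)
positives = [p for p in pairs if p[2]]
# 6 pairs in total but only 2 positive, illustrating the class imbalance
# noted above (negatives are the majority of examples).
```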
Co-Reference Resolution (27)
• Note: Different Uses of Introspection
• (1) To create rules and feature weights in advance
• (2) To annotate gold-standard training set
• But introspection is involved in both methods
Co-Reference Resolution (28)
Features (one for each candidate entity in the pair; features were manually annotated)
NAME-{1,2}: Does reference include a name?
JV-CHILD-{1,2}: Does reference refer to part of a joint venture?
ALIAS: Does one reference contain an alias for the other?
BOTH-JV-CHILD: Do both refer to part of a joint venture?
COMMON-NP: Do both contain a common NP?
SAME-SENTENCE: Are both in the same sentence?
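Although the original features were manually annotated, two of them can be roughly approximated automatically; this sketch uses an invented mention representation, and the substring-based alias check is a crude stand-in:

```python
# Rough automatic approximations of the ALIAS and SAME-SENTENCE features.
def extract_features(m1, m2):
    return {
        "SAME-SENTENCE": m1["sent_id"] == m2["sent_id"],
        # Crude alias check: one mention string contains the other.
        "ALIAS": (m1["text"].lower() in m2["text"].lower()
                  or m2["text"].lower() in m1["text"].lower()),
    }

a = {"text": "International Business Machines", "sent_id": 0}
b = {"text": "Business Machines", "sent_id": 1}
feats = extract_features(a, b)
```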
Co-Reference Resolution (29)
Algorithm: C4.5 (builds a decision tree using training data)
(1) Incrementally build decision-tree from labeled training examples
(2) At each stage choose “best” attribute to split dataset
e.g., use info-gain to compare features
(3) After building complete tree, prune the leaves to prevent overfitting
e.g., remove branches based on useless features
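The "best attribute" choice in step (2) can be sketched with information gain over boolean features (toy data invented for illustration; C4.5 itself uses the gain ratio and adds the pruning step, both omitted here):

```python
from math import log2
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def info_gain(examples, feature):
    """Entropy reduction from splitting (features, label) pairs on a
    boolean feature."""
    labels = [lab for _, lab in examples]
    gain = entropy(labels)
    for value in (True, False):
        subset = [lab for feats, lab in examples if feats[feature] == value]
        if subset:
            gain -= (len(subset) / len(examples)) * entropy(subset)
    return gain

# Toy mention pairs: ALIAS perfectly predicts the coreference label here,
# SAME-SENTENCE does not, so ALIAS is chosen as the split attribute.
data = [
    ({"ALIAS": True, "SAME-SENTENCE": True}, True),
    ({"ALIAS": True, "SAME-SENTENCE": False}, True),
    ({"ALIAS": False, "SAME-SENTENCE": True}, False),
    ({"ALIAS": False, "SAME-SENTENCE": False}, False),
]
best = max(["ALIAS", "SAME-SENTENCE"], key=lambda f: info_gain(data, f))
```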
Co-Reference Resolution (30)
Algorithm: Building Co-Reference Chains
(1) If A and B co-reference
(2) And if B and C co-reference
(3) A-B-C all co-reference
The Mention-Pair model
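The chain-building step above is a transitive closure over the classifier's pairwise decisions; a standard way to compute it is union-find (mention names here are invented):

```python
# Sketch of chain-building by transitive closure using union-find.
def build_chains(mentions, links):
    parent = {m: m for m in mentions}

    def find(m):
        while parent[m] != m:
            parent[m] = parent[parent[m]]  # path halving
            m = parent[m]
        return m

    for a, b in links:               # each pair the classifier linked
        parent[find(a)] = find(b)    # merge their chains

    chains = {}
    for m in mentions:
        chains.setdefault(find(m), set()).add(m)
    return list(chains.values())

# A-B and B-C linked => {A, B, C} is one chain, even though A-C was never
# classified as coreferent directly.
chains = build_chains(["A", "B", "C", "D"], [("A", "B"), ("B", "C")])
```

Note that this closure can also propagate errors: one wrong positive link merges two whole chains.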
Co-Reference Resolution (31)
• Problems:
• Limited features (and manually annotated)
• Features are domain and possibly language-dependent
• Advantages:
• Feature weights are not assumed in advance, so less language-dependent
• Learning algorithm can give less weight to less accurate features
• Negative as well as Positive information
Schedule
1:10 – 1:20 Introductions
1:20 – 1:30 Structure of Class
1:30 – 2:10 Intro to Data-Driven Computational Pragmatics
2:10 – 2:50 Co-Reference Resolution
2:50 – 3:00 Conclusion / Flex Time
Concluding
• Referencing Relations vs. Lexical Relations
• Co-Referencing: Relations between referents
• Lexical Relations (e.g., Entailment): Relations between senses
Concluding (2)
Relations Between Referents (specific to a particular use: Local)
• Mary brought her bike.
• The U.S. is a large country.
• The judges heard it and they are angry.
• He is certainly the man I saw.
Relations Between Senses (specific to senses of words: Global)
• House ENTAILS => Building
• Run ENTAILS => Move
• Car ENTAILS => Automobile
• Corsica ENTAILS => Island
Concluding (3)
• Reasoning and Inferences
• Requires Co-Referencing
• Requires Lexical Relations
• Thursday’s Focus: Learning Lexical Relations for Reasoning