
Data-Driven Computational Pragmatics

Shlomo Argamon and Jonathan Dunn

Schedule

1:10 – 1:20 Introductions

1:20 – 1:30 Structure of Class

1:30 – 2:10 Intro to Data-Driven Computational Pragmatics

2:10 – 2:50 Co-Reference Resolution

2:50 – 3:00 Conclusion / Flex Time

Structure of Class

• Class 1:

• Overview of Data-Driven Computational Pragmatics

• Co-Reference Resolution

• Class 2:

• Hypothesis Testing with Data-Driven Models

• Inferences and Reasoning

Structure of Class

• Class 3:

• Metaphor and Figurative Language

• Class 4:

• Social Meaning from Stylistics and Linguistic Variations

Structure of Class

• Readings and Notes Available Online: www.jdunn.name

• Assignments:

• Reading Questions (60%)

• Short Paper (500 - 1,000 words; 40%)

• “Consider how you could use these sorts of computational models to address or provide evidence for a hypothesis from linguistic theory.”

• Due M, 7/20

Overview

• Pragmatics

• Meaning-in-Language: Cognitive, Contextual, Social

• (cf. Linguistic vs. Non-Linguistic Meaning)

• Subjective vs. Objective Phenomena

• Epistemic Objectivity

• Ontological Objectivity

• Meaning-in-Language: Epistemically Objective

• Can be modeled

• Can make testable predictions

Overview (2)

• Computational Pragmatics

• Models do not have direct access to introspection / intuition

• Models entirely syntactic in nature (e.g., symbol manipulation)

• Data-Driven

• Learn models directly from linguistic data

• Limited rule-based heuristics

• Corpus-based view of language: Language as a produced, observable phenomenon

Overview (3)

Computational Pragmatics: A Spectrum

Computers Doing Pragmatics (e.g., Siri in the future)

AI as an Engineering Task

Computational Modelling for Pragmatics (e.g., computer-assisted corpus linguistics)

AI as Cognitive Science

Sources of Evidence

Introspection

Hand-crafted Datasets

Huge “Natural” Datasets

But also depend on introspection: Gold-Standard Annotations

Overview (5)

Computational pragmatics is tricky

SYNTAX

SEMANTICS

PRAGMATICS

Overview (5)

Data-Driven Computational Pragmatics

Data

(Corpora)

Features

(Representations)

Algorithms

(Model Building)

Overview (7)

Data-Driven =?= Model-Independent

Overview (10)

• Sample features for the co-reference task, for a given pair of candidate entities (two of these are sketched in code below):

• Binary Feature: Do these co-reference candidates share number information?

• Categorical Feature: What semantic role does entity A have? Entity B?

• Frequency Feature: Frequency of represented verb-argument pair in reference corpus (Could model verb selection preferences: does the verb accept the candidate?)

• Ratio Feature: Minimum Edit Distance for Candidates / Total String Length

• Measurement Feature: Difference between abstractness ratings for candidate and its verb (Could model verb selection preferences)

Bob bought a new car. Then he drove it for hours.
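
A minimal sketch (not from the original slides) of how two of these features could be computed for a single candidate pair, using only the Python standard library; the plural heuristic and the use of difflib's SequenceMatcher in place of true edit distance are simplifying assumptions:

    # Sketch: two pairwise co-reference features (illustrative only).
    from difflib import SequenceMatcher

    SINGULAR_PRONOUNS = {"it", "its", "he", "him", "his", "she", "her", "hers"}
    PLURAL_PRONOUNS = {"they", "them", "their", "theirs", "we", "us", "our"}

    def is_plural(mention):
        m = mention.lower()
        if m in SINGULAR_PRONOUNS:
            return False
        if m in PLURAL_PRONOUNS:
            return True
        return m.endswith("s")          # crude heuristic for common nouns

    def share_number(a, b):
        """Binary feature: do the two candidates agree in number?"""
        return is_plural(a) == is_plural(b)

    def edit_distance_ratio(a, b):
        """Ratio feature: string dissimilarity scaled to [0, 1]."""
        return 1.0 - SequenceMatcher(None, a.lower(), b.lower()).ratio()

    pair = ("North Korea", "its")
    print({"number_agreement": share_number(*pair),
           "edit_distance_ratio": round(edit_distance_ratio(*pair), 2)})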

Co-Reference Resolution (3)

• Co-Reference Resolution:

• What words refer to the same entity?

North Korea opened its doors to the U.S. today, welcoming Secretary of State Madeline Albright. She says her visit is a good start. The U.S. remains concerned about North Korea’s missile development program and its exports of missiles to Iran.

Fernandes, et al. (2014). “Latent Trees for Coreference Resolution.” Computational Linguistics, 40(4)

Co-Reference Resolution (2)

• Many different kinds of linguistic phenomena:

• Proper names (“George”)

• Aliases (“LSI”)

• Definite NPs (“the Linguistic Summer Institute”)

• Pronouns (“it”, “they”)

• Appositives (“the first institute to be...”)

• Bridging References (“the cabinet was wood, but the top granite”)

Co-Reference Resolution Data

• Annotated corpora

• Each mention annotated with an ID of the unique entity it refers to

• Can extract pairwise relations between mentions

• Genre of the text influences the kinds of co-reference phenomena
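
A minimal sketch of how pairwise training examples could be read off such annotations; the (mention, entity ID) layout below is invented for illustration:

    # Sketch: turn entity-ID annotations into pairwise training examples.
    from itertools import combinations

    # Assumed annotation format: (mention text, annotated entity ID), in document order.
    mentions = [
        ("North Korea", "E1"), ("its", "E1"), ("the U.S.", "E2"),
        ("Madeline Albright", "E3"), ("She", "E3"), ("her", "E3"),
    ]

    # Every mention pair becomes an example; the label is positive only when
    # both mentions carry the same entity ID.
    pairs = [((a_text, b_text), a_id == b_id)
             for (a_text, a_id), (b_text, b_id) in combinations(mentions, 2)]

    positives = [p for p, label in pairs if label]
    print(len(pairs), "pairs,", len(positives), "positive")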

Co-Reference Resolution Features

• Agreement: number, person, case, etc.

• Syntactic restrictions

• Semantic selectional preferences

• Syntactic/semantic role preferences

• Saliency: recency, repetition

• Causal coherence

Co-Reference Resolution Algorithms

• Three basic dichotomies:

• Relationship between linguistic units or between entities

• Pairwise relationships or larger clusters

• Whole text at once or processing sequentially item by item

Co-Reference Resolution (5)

• Co-Reference Resolution: The Mention-Pair Model

North Korea opened its doors to the U.S. today, welcoming Secretary of State Madeline Albright. She says her visit is a good start. The U.S. remains concerned about North Korea’s missile development program and its exports of missiles to Iran.

1. Find all entities (assumed)

2. Classify each possible pair (within defined window)

3. Cluster identified co-referencing pairs into chains
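
A rough sketch of that three-step pipeline; the pair classifier is a placeholder stub, and clustering is done by taking the transitive closure of the predicted links:

    # Sketch of the mention-pair model: classify every pair, then cluster by
    # transitive closure (a simple union-find). The classifier is a stub.
    from itertools import combinations

    def classify_pair(a, b):
        """Placeholder: a learned classifier would use features of (a, b)."""
        gold_links = {("Madeline Albright", "She"), ("North Korea", "its")}
        return (a, b) in gold_links or (b, a) in gold_links

    def cluster(mentions, links):
        parent = {m: m for m in mentions}
        def find(m):
            while parent[m] != m:
                m = parent[m]
            return m
        for a, b in links:                      # union the two chains
            parent[find(a)] = find(b)
        chains = {}
        for m in mentions:
            chains.setdefault(find(m), []).append(m)
        return list(chains.values())

    mentions = ["North Korea", "its", "the U.S.", "Madeline Albright", "She"]
    links = [(a, b) for a, b in combinations(mentions, 2) if classify_pair(a, b)]
    print(cluster(mentions, links))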

Co-Reference Resolution (6)

• Co-Reference Resolution: The Mention-Pair Model

North Korea opened its doors to the U.S. today, welcoming Secretary of State Madeline Albright. She says her visit is a good start. The U.S. remains concerned about North Korea’s missile development program and its exports of missiles to Iran.

Classifier: Are Two NPs Co-Referencing? (Binary Category: Yes/No)

Pairs To Classify:

[North Korea + Madeline Albright]: No

[its + Iran]: No

[U.S. + North Korea]: No

[She + Her]: Yes

[Her + its]: No

[its + North Korea]: Yes

[Iran + its]: No

Co-Reference Resolution (7)

• Co-Reference Resolution: The Mention-Pair Model

North Korea opened its doors to the U.S. today, welcoming Secretary of State Madeline Albright. She says her visit is a good start. The U.S. remains concerned about North Korea’s missile development program and its exports of missiles to Iran.

Clustering: Which Co-Referencing NPs Belong to the Same Chain?

• If “Madeline Albright” and “she” co-reference,

• And if “she” and “her” co-reference,

• Then “Madeline Albright” and “her” must also co-reference

Works great, but only if there are no classifier errors.

Co-Reference Resolution (8)

• Co-Reference Resolution: The Entity-Mention Model

North Korea opened its doors to the U.S. today, welcoming Secretary of State Madeline Albright. She says her visit is a good start. The U.S. remains concerned about North Korea’s missile development program and its exports of missiles to Iran.

Classifier: Are an NP and a Preceding Cluster Co-Referencing?

Pairs To Classify:

[North Korea + its] vs. Madeline Albright: No

[North Korea + its] vs. She: No

[Madeline Albright + She] vs. its: No

[Madeline Albright + She] vs. her: Yes

[She + her] vs. its: No

[North Korea + its] vs. North Korea’s: Yes

[North Korea’s + its] vs. Iran: No

Co-Reference Resolution (9)

• Co-Reference Resolution: The Entity-Mention Model

North Korea opened its doors to the U.S. today, welcoming Secretary of State Madeline Albright. She says her visit is a good start. The U.S. remains concerned about North Korea’s missile development program and its exports of missiles to Iran.

Classifier: Are an NP and a Preceding Cluster Co-Referencing?

• Allows cluster-level features

• Maintains consistency within large clusters
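
A rough sketch of the entity-mention strategy: mentions are processed left to right and each new mention is compared against whole preceding clusters; the compatibility test below is a stand-in for a learned classifier over cluster-level features:

    # Sketch of the entity-mention model: each new mention is compared against
    # the clusters built so far, not against individual earlier mentions.
    def compatible_with_cluster(mention, cluster):
        """Placeholder for a learned cluster-level classifier, e.g. 'does the
        mention agree with every member of the cluster?'"""
        return any(mention.lower() in m.lower() or m.lower() in mention.lower()
                   for m in cluster)

    def resolve(mentions):
        clusters = []
        for mention in mentions:
            for cluster in clusters:
                if compatible_with_cluster(mention, cluster):
                    cluster.append(mention)      # attach to an existing entity
                    break
            else:
                clusters.append([mention])       # start a new entity (singleton)
        return clusters

    print(resolve(["North Korea", "the U.S.", "North Korea's", "Iran"]))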

Co-Reference Resolution (10)

• Co-Reference Resolution: Remaining Problems

North Korea opened its doors to the U.S. today, welcoming Secretary of State Madeline Albright. She says her visit is a good start. The U.S. remains concerned about North Korea’s missile development program and its exports of missiles to Iran.

• Often very many candidates

• Ranking models help choose best

• Many entities are not anaphoric

• “Singletons”

• More than choice of cluster

• Co-reference can depend on lexical relations

• “Corsica is …. The island is …..”

Co-Reference Resolution (11)

• Features

• First, what are the cues for / properties of co-referenced words?

• Second, how can we model those cues?

• e.g., Automatically extract features representing those cues

• (1) How to represent the language (e.g., WordNet synsets, parse tree)

• (2) How to measure the property in that representation

Co-Reference Resolution (12)

• Hard Constraints

• Number agreement

• “John has an Acura. It is red.”

• Person and case agreement

• “*John and Mary have Acuras. We love them.” (where We=John and Mary)

• Gender agreement

• “John has an Acura. He/it/she is attractive.”

Binary Features: For each candidate pair, is the constraint satisfied? (Yes/No)

Co-Reference Resolution (13)

• Hard Constraints

• Number agreement

• “John has an Acura. It is red.”

• Person and case agreement

• “*John and Mary have Acuras. We love them.” (where We=John and Mary)

• Gender agreement

• “John has an Acura. He/it/she is attractive.”

Categorical Features: A feature for each candidate’s number, person, gender, etc.

Co-Reference Resolution (14)

• Syntactic constraints

• “John bought himself a new Acura.” (himself=John)

• “John bought him a new Acura.” (him = not John)

• Required representation: parse tree / dependency relations

• Binary Feature: Does the necessary relation obtain?

Co-Reference Resolution (15)

• Selectional Restrictions

• “John parked his Acura in the garage. He had driven it around for hours.”

• “it” must refer to something that can be driven.

• Knowledge-based Approach:

• VerbNet for verb properties

• WordNet for synset membership

Co-Reference Resolution (16)

• Selectional Restrictions

• “John parked his Acura in the garage. He had driven it around for hours.”

• “it” must refer to something that can be driven.

• Distributional Approach:

• Cluster nouns and verbs into classes

• What is the probability noun A will occur with verb B?
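
One way the distributional approach could be operationalized (a sketch; the verb-object counts below are invented and would normally come from a parsed reference corpus):

    # Sketch: selectional preference as a conditional probability estimated
    # from verb-object co-occurrence counts (counts here are made up).
    from collections import Counter

    verb_object_counts = Counter({
        ("drive", "car"): 120, ("drive", "acura"): 15,
        ("drive", "garage"): 1, ("drive", "hour"): 0,
    })

    def p_object_given_verb(verb, noun):
        verb_total = sum(c for (v, _), c in verb_object_counts.items() if v == verb)
        if verb_total == 0:
            return 0.0
        return verb_object_counts[(verb, noun)] / verb_total

    # "it" in "He had driven it" is more plausibly the Acura than the garage.
    for candidate in ["acura", "garage"]:
        print(candidate, round(p_object_given_verb("drive", candidate), 3))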

Co-Reference Resolution (17)

• Recency / Salience

• “John has an Integra. Bill has a Legend. Mary likes to drive it.”

• Syntactic Operationalization

• How far removed in the parse tree are A and B?

• Semantic Operationalization

• How prominent is the semantic role of A (e.g., agent vs. patient)?

Co-Reference Resolution (18)

• Grammatical Role: Subject preference

• “John went to the Acura dealership with Bill. He bought an Integra.”

• “Bill went to the Acura dealership with John. He bought an Integra.”

• “(?) John and Bill went to the Acura dealership. He bought an Integra.”

• Categorical Feature:

• Grammatical Role or Semantic Role of each candidate as features

Co-Reference Resolution (19)

• Repeated Mentions Preference

• “John needed a car to get to his new job. He decided that he wanted something sporty. Bill went to the Acura dealership with him. He bought an Integra.”

• Not as relevant in pairwise classification

• In the entity-mention model, a feature for candidate cluster size can favor larger clusters

Co-Reference Resolution (20)

• Verb Semantics Preferences

• “John telephoned Bill. He lost the pamphlet on Acuras.”

• “John criticized Bill. He lost the pamphlet on Acuras.”

• Implicit causality

• Implicit cause of criticizing is object.

• Implicit cause of telephoning is subject.

Co-Reference Resolution (21)

• Co-Reference Resolution: Features

North Korea opened its doors to the U.S. today, welcoming Secretary of State Madeline Albright. She says her visit is a good start. The U.S. remains concerned about North Korea’s missile development program and its exports of missiles to Iran.

String-Matching:

“U.S.” and “U.S.” = Identical

“North Korea” and “North Korea’s” = Similar

Minimum Edit Distance

How many operations are required to convert the first string into the second?

“U.S.” and “U.S.” = 0

“North Korea” and “North Korea’s” = 2
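
A standard dynamic-programming implementation of minimum edit distance, shown as a sketch; it reproduces the two values above, and dividing by total string length gives the ratio feature mentioned earlier:

    # Minimum edit distance (insertions, deletions, substitutions all cost 1).
    def edit_distance(a, b):
        prev = list(range(len(b) + 1))
        for i, ca in enumerate(a, start=1):
            curr = [i]
            for j, cb in enumerate(b, start=1):
                curr.append(min(prev[j] + 1,                 # deletion
                                curr[j - 1] + 1,             # insertion
                                prev[j - 1] + (ca != cb)))   # substitution
            prev = curr
        return prev[-1]

    print(edit_distance("U.S.", "U.S."))                     # 0
    print(edit_distance("North Korea", "North Korea's"))     # 2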

Co-Reference Resolution (22)

• Assume we have all the features for a candidate pair:

“North Korea” and “its”

• The next step is to use the features to predict whether they co-reference

Sample Approach 1:

Lappin & Leass. (1994). “An Algorithm for Pronominal Anaphora Resolution.” Computational Linguistics.

Sample Approach 2:

McCarthy & Lehnert. (1995). “Using Decision Trees for Coreference Resolution.” Proceedings of IJCAI.

Co-Reference Resolution (23)

• Lappin and Leass (1994): Given he/she/it, assign antecedent

• (1) Discourse model update

• When a new noun phrase is encountered:

(a) Add a representation to discourse model with a salience value

(b) Modify saliences

• (2) Pronoun resolution

• (a) Choose the most salient antecedent

Co-Reference Resolution (24)

• Pre-defined Weights:

• Weights are cut in half after eachsentence is processed

• This, and a sentence recency weight (100 for new sentences, cut in half each time), captures the recency preferences

Sentence recency: 100

Subject emphasis: 80

Head noun emphasis: 80

Existential emphasis: 70

Accusative (direct object) emphasis: 50

Non-adverbial emphasis: 50

Indirect object and oblique emphasis: 40

Co-Reference Resolution (25)

• Algorithm:

• (1) Collect the potential referents (up to 4 sentences back)

• (2) Remove potential referents that do not agree in number or gender with the pronoun

• (3) Remove potential referents that do not pass syntactic coreference constraints

• (4) Compute total salience value of referent from all factors

• (5) Select referent with highest salience value. In case of tie, select closest.
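
A highly simplified sketch of this procedure: each surviving candidate accumulates the weights of the salience factors it satisfies, scores are halved per intervening sentence, and the highest-scoring candidate wins. Factor detection would come from a parser; here the factor assignments are supplied by hand:

    # Sketch of Lappin & Leass-style salience scoring (illustrative only).
    WEIGHTS = {
        "sentence_recency": 100, "subject": 80, "head_noun": 80,
        "existential": 70, "direct_object": 50, "non_adverbial": 50,
        "indirect_object_or_oblique": 40,
    }

    def salience(factors, sentences_back):
        score = sum(WEIGHTS[f] for f in factors)
        return score / (2 ** sentences_back)   # halve once per sentence boundary

    # Resolving "it" in: "John parked his Acura in the garage. He had driven it ..."
    # ("John" is assumed to have been removed already by the agreement filter.)
    candidates = {
        "Acura":  ["sentence_recency", "direct_object", "non_adverbial", "head_noun"],
        "garage": ["sentence_recency", "indirect_object_or_oblique", "head_noun"],
    }
    scores = {c: salience(f, sentences_back=1) for c, f in candidates.items()}
    print(max(scores, key=scores.get), scores)  # "Acura" should win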

Co-Reference Resolution (26)

• Problems:

• Limited features

• Feature weights assumed in advance

• Hard constraints mixed with imperfect feature extraction

• Limited coverage (e.g., pronouns)

• Hand-crafted rules are very language dependent

Co-Reference Resolution (27)

• McCarthy & Lehnert (1995): Given two entities, should they be linked?

• (1) Create training data by manually annotating gold-standard links

• Every possible pair is a training example

• Positive: Co-Referenced pairs (very small number)

• Negative: Not Co-Referenced pairs (majority of examples)

• (2) Extract features for each possible pair

• (3) Use learning algorithm to assign weights to features

Co-Reference Resolution (27)

• Note: Different Uses of Introspection

• (1) To create rules and feature weights in advance

• (2) To annotate gold-standard training set

• But introspection is involved in both methods

Co-Reference Resolution (28)

Features (one for each candidate entity in pair) (features were manually annotated)

NAME-{1,2}: Does reference include a name?

JV-CHILD-{1,2}: Does reference refer to part of a joint venture?

ALIAS: Does one reference contain an alias for the other?

BOTH-JV-CHILD: Do both refer to part of a joint venture?

COMMON-NP: Do both contain a common NP?

SAME-SENTENCE: Are both in the same sentence?

Co-Reference Resolution (29)

Algorithm: C4.5 (Builds a Decision Tree Using Training Data)

(1) Incrementally build decision-tree from labeled training examples

(2) At each stage choose “best” attribute to split dataset

e.g., use info-gain to compare features

(3) After building complete tree, prune the leaves to prevent overfitting

e.g., remove branches based on useless features
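
A sketch of the same workflow using scikit-learn's DecisionTreeClassifier as a stand-in for C4.5 (it implements CART with an entropy criterion rather than C4.5 proper); the feature vectors and labels below are toy values shaped like the McCarthy & Lehnert features:

    # Sketch: train a decision tree on pairwise co-reference examples.
    # criterion="entropy" mirrors the information-gain splitting criterion.
    from sklearn.tree import DecisionTreeClassifier

    # Toy feature vectors: [NAME-1, NAME-2, ALIAS, COMMON-NP, SAME-SENTENCE]
    X = [
        [1, 1, 1, 0, 0],   # "LSI" / "Linguistic Summer Institute"
        [1, 0, 0, 1, 0],   # a name and a definite NP sharing a head noun
        [1, 1, 0, 0, 1],   # two different names in one sentence
        [0, 0, 0, 0, 1],   # two unrelated NPs in one sentence
    ]
    y = [1, 1, 0, 0]        # 1 = co-referent, 0 = not

    clf = DecisionTreeClassifier(criterion="entropy", max_depth=3)
    clf.fit(X, y)

    print(clf.predict([[1, 1, 1, 0, 0]]))   # expect a positive prediction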

Co-Reference Resolution (30)

Algorithm: Building Co-Reference Chains

(1) If A and B co-reference

(2) And if B and C co-reference

(3) A-B-C all co-reference

The Mention-Pair model

Co-Reference Resolution (31)

• Problems:

• Limited features (and manually annotated)

• Features are domain and possibly language-dependent

• Advantages:

• Feature weights not assumed, so less language-dependent

• Learning algorithm can give less weight to more inaccurate features

• Negative as well as Positive information

Concluding

• Referencing Relations vs. Lexical Relations

• Co-Referencing: Relations between referents

• Lexical Relations (e.g., Entailment): Relations between senses

Concluding (2)

Relations Between Referents (specific to a particular use: Local)

Mary brought her bike.

The U.S. is a large country.

The judges heard it and they are angry.

He is certainly the man I saw.

Relations Between Senses (specific to senses of words: Global)

House ENTAILS => Building

Run ENTAILS => Move

Car ENTAILS => Automobile

Corsica ENTAILS => Island

Concluding (3)

• Reasoning and Inferences

• Requires Co-Referencing

• Requires Lexical Relations

• Thursday’s Focus: Learning Lexical Relations for Reasoning