Data-Driven Computational Pragmatics
Shlomo Argamon and Jonathan Dunn
Schedule
1:10 – 1:20 Introductions
1:20 – 1:30 Structure of Class
1:30 – 2:10 Intro to Data-Driven Computational Pragmatics
2:10 – 2:50 Co-Reference Resolution
2:50 – 3:00 Conclusion / Flex Time
Structure of Class
• Class 1:
• Overview of Data-Driven Computational Pragmatics
• Co-Reference Resolution
• Class 2:
• Hypothesis Testing with Data-Driven Models
• Inferences and Reasoning
Structure of Class
• Class 3:
• Metaphor and Figurative Language
• Class 4:
• Social Meaning from Stylistics and Linguistic Variations
Structure of Class
• Readings and Notes Available Online: www.jdunn.name
• Assignments:
• Reading Questions (60%)
• Short Paper (500 - 1,000 words; 40%)
• “Consider how you could use these sorts of computational models to address or provide evidence for a hypothesis from linguistic theory.”
• Due M, 7/20
Overview
• Pragmatics
• Meaning-in-Language: Cognitive, Contextual, Social
• (cf. Linguistic vs. Non-Linguistic Meaning)
• Subjective vs. Objective Phenomena
• Epistemic Objectivity
• Ontological Objectivity
• Meaning-in-Language: Epistemically Objective
• Can be modeled
• Can make testable predictions
Overview (2)
• Computational Pragmatics
• Models do not have direct access to introspection / intuition
• Models entirely syntactic in nature (e.g., symbol manipulation)
• Data-Driven
• Learn models directly from linguistic data
• Limited rule-based heuristics
• Corpus-based view of language: Language as a produced, observable phenomenon
Overview (3)
Computational Pragmatics: A Spectrum
Computers Doing Pragmatics (e.g., Siri in the future)
AI as an Engineering Task
Computational Modelling for Pragmatics (e.g., computer-assisted corpus linguistics)
AI as Cognitive Science
Sources of Evidence
Introspection
Hand-crafted Datasets
Huge “Natural” Datasets
But also depend on introspection: Gold-Standard Annotations
Overview (5)
Computational pragmatics is tricky
SYNTAX
SEMANTICS
PRAGMATICS
Overview (5)
Data-Driven Computational Pragmatics
Data (Corpora)
Features (Representations)
Algorithms (Model Building)
Overview (7)
Data-Driven =?= Model-Independent
Overview (10)
• Sample features for the co-reference task, for a given pair of candidate entities:
• Binary Feature: Do these co-reference candidates share number information?
• Categorical Feature: What semantic role does entity A have? Entity B?
• Frequency Feature: Frequency of represented verb-argument pair in reference corpus (Could model verb selection preferences: does the verb accept the candidate?)
• Ratio Feature: Minimum Edit Distance for Candidates / Total String Length
• Measurement Feature: Difference between abstractness ratings for candidate and its verb (Could model verb selection preferences)
Bob bought a new car. Then he drove it for hours.
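The feature types above can be collected into a single vector per candidate pair. A minimal sketch for the pair (“a new car”, “it”) from the example sentence; all values (roles, the corpus count) are hand-filled and hypothetical, except the edit distance, which is 9 because the two strings share no characters.

```python
# Illustrative feature vector for one co-reference candidate pair from
# "Bob bought a new car. Then he drove it for hours."
# Roles and the corpus count are hypothetical placeholders.

candidate_a = "a new car"
candidate_b = "it"

features = {
    # Binary: both mentions are singular
    "same_number": True,
    # Categorical: semantic role of each mention
    "role_a": "patient",   # object of "bought"
    "role_b": "patient",   # object of "drove"
    # Frequency: count of the ("drive", "car") pair in a reference corpus
    "verb_arg_freq": 1520,  # hypothetical count
    # Ratio: minimum edit distance / total string length
    # MED("a new car", "it") = 9 (no shared characters), total length 11
    "edit_ratio": 9 / (len(candidate_a) + len(candidate_b)),
}
```

A trained classifier would consume such vectors, one per candidate pair.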
Co-Reference Resolution (3)
• Co-Reference Resolution:
• What words refer to the same entity?
North Korea opened its doors to the U.S. today, welcoming
Secretary of State Madeline Albright. She says her visit is a
good start. The U.S. remains concerned about North Korea’s
missile development program and its exports of missiles to
Iran.
Fernandes, et al. (2014). “Latent Trees for Coreference Resolution.” Computational Linguistics, 40(4)
Co-Reference Resolution (2)
• Many different kinds of linguistic phenomena:
• Proper names (“George”)
• Aliases (“LSI”)
• Definite NPs (“the Linguistic Summer Institute”)
• Pronouns (“it”, “they”)
• Appositives (“the first institute to be...”)
• Bridging References (“the cabinet was wood, but the top granite”)
Co-Reference Resolution Data
• Annotated corpora
• Each mention annotated with an ID of the unique entity it refers to
• Can extract pairwise relations between mentions
• Genre of the text affects the kinds of co-reference phenomena
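The pairwise extraction step can be sketched directly: given mentions annotated with entity IDs (the toy data below is hypothetical), every mention pair becomes a labeled training example, positive when the IDs match.

```python
# Sketch: deriving pairwise training examples from an annotated corpus.
# Each mention carries the ID of the unique entity it refers to.

mentions = [
    ("North Korea", 1), ("its", 1), ("U.S.", 2),
    ("Madeline Albright", 3), ("She", 3), ("her", 3),
]

# Every mention pair is a labeled example: True iff same entity ID.
pairs = [
    (a_text, b_text, a_id == b_id)
    for i, (a_text, a_id) in enumerate(mentions)
    for b_text, b_id in mentions[i + 1:]
]

positives = [(a, b) for a, b, y in pairs if y]
# e.g. ("North Korea", "its"), ("Madeline Albright", "She"), ...
```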
Co-Reference Resolution Features
• Agreement: number, person, case, etc.
• Syntactic restrictions
• Semantic selectional preferences
• Syntactic/semantic role preferences
• Saliency: recency, repetition
• Causal coherence
Co-Reference Resolution Algorithms
• Three basic dichotomies:
• Relationship between linguistic units or between entities
• Pairwise relationships or larger clusters
• Whole text at once or processing sequentially item by item
Co-Reference Resolution (5)
• Co-Reference Resolution: The Mention-Pair Model
North Korea opened its doors to the U.S. today,
welcoming Secretary of State Madeline Albright.
She says her visit is a good start. The U.S. remains
concerned about North Korea’s missile
development program and its exports of missiles
to Iran.
1. Find all entities (assumed)
2. Classify each possible pair (within defined window)
3. Cluster identified co-referencing pairs into chains
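Steps 1–2 can be sketched as follows. The classifier stub and the window size are assumptions standing in for a trained model and a tuned hyperparameter:

```python
# Sketch of the mention-pair pipeline: entities are assumed given; every
# pair within a window is classified. The classifier is a stub.

def classify(a: str, b: str) -> bool:
    # Stand-in: a real system would apply a trained classifier to features.
    gold = {("Madeline Albright", "She"), ("She", "her")}
    return (a, b) in gold

mentions = ["North Korea", "its", "U.S.", "Madeline Albright", "She", "her"]
WINDOW = 3  # only consider pairs at most 3 mentions apart (assumed)

links = [
    (mentions[i], mentions[j])
    for i in range(len(mentions))
    for j in range(i + 1, min(i + 1 + WINDOW, len(mentions)))
    if classify(mentions[i], mentions[j])
]
# links: [("Madeline Albright", "She"), ("She", "her")]
```

Step 3 then chains the positive links together (see the clustering slides that follow).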
Co-Reference Resolution (6)
• Co-Reference Resolution: The Mention-Pair Model
North Korea opened its doors to the U.S. today,
welcoming Secretary of State Madeline Albright.
She says her visit is a good start. The U.S. remains
concerned about North Korea’s missile
development program and its exports of missiles
to Iran.
Classifier: Are Two NPs Co-Referencing? (Binary Category: Yes/No)
Pairs To Classify:
(North Korea, Madeline Albright) → No
(its, Iran) → No
(U.S., North Korea) → No
(She, Her) → Yes
(Her, its) → No
(its, North Korea) → Yes
(Iran, its) → No
Co-Reference Resolution (7)
• Co-Reference Resolution: The Mention-Pair Model
North Korea opened its doors to the U.S. today,
welcoming Secretary of State Madeline Albright.
She says her visit is a good start. The U.S. remains
concerned about North Korea’s missile
development program and its exports of missiles
to Iran.
Clustering: Which Co-Referencing NPs Belong to the Same Chain?
• If “Madeline Albright” and “she” co-reference,
• And if “she” and “her” co-reference,
• Then “Madeline Albright” and “her” must also co-reference
Works great, but only if there are no classifier errors.
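The transitive-closure clustering above can be sketched with union-find. The mentions and links are taken from the example; a single misclassified link would merge two unrelated chains, which is exactly the fragility noted above.

```python
# Sketch: clustering pairwise co-reference links into chains by
# transitive closure, using union-find.

parent = {}

def find(x):
    parent.setdefault(x, x)
    while parent[x] != x:
        parent[x] = parent[parent[x]]  # path halving
        x = parent[x]
    return x

def union(a, b):
    parent[find(a)] = find(b)

links = [("Madeline Albright", "she"), ("she", "her")]
for a, b in links:
    union(a, b)

chains = {}
for m in ["Madeline Albright", "she", "her"]:
    chains.setdefault(find(m), []).append(m)
# All three mentions end up in one chain.
```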
Co-Reference Resolution (8)
• Co-Reference Resolution: The Entity-Mention Model
North Korea opened its doors to the U.S. today,
welcoming Secretary of State Madeline Albright.
She says her visit is a good start. The U.S. remains
concerned about North Korea’s missile
development program and its exports of missiles
to Iran.
Classifier: Are an NP and a Preceding Cluster Co-Referencing?
Pairs To Classify:
[North Korea + its] vs. Madeline Albright → No
[North Korea + its] vs. She → No
[Madeline Albright + She] vs. its → No
[Madeline Albright + She] vs. her → Yes
[She + her] vs. its → No
[North Korea + its] vs. North Korea’s → Yes
[North Korea’s + its] vs. Iran → No
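The incremental loop behind the entity-mention model can be sketched as follows. The compatibility test is a stub (a lookup table of gold links) standing in for a learned classifier with cluster-level features:

```python
# Sketch of the entity-mention model: each new mention is compared against
# existing clusters rather than against single mentions.

def compatible(cluster: list, mention: str) -> bool:
    # Stand-in for a learned classifier with cluster-level features,
    # e.g. "no cluster member conflicts with the mention in gender/number".
    gold = {
        ("North Korea", "its"),
        ("Madeline Albright", "She"), ("She", "her"),
        ("Madeline Albright", "her"),
    }
    return all((m, mention) in gold or (mention, m) in gold for m in cluster)

clusters = []
for mention in ["North Korea", "its", "Madeline Albright", "She", "her"]:
    for cluster in clusters:
        if compatible(cluster, mention):
            cluster.append(mention)  # mention joins an existing entity
            break
    else:
        clusters.append([mention])   # mention starts a new entity
```

Because the test consults every member of a cluster, decisions stay consistent across the whole chain, which is the advantage noted on the next slide.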
Co-Reference Resolution (9)
• Co-Reference Resolution: The Entity-Mention Model
North Korea opened its doors to the U.S. today,
welcoming Secretary of State Madeline Albright.
She says her visit is a good start. The U.S. remains
concerned about North Korea’s missile
development program and its exports of missiles
to Iran.
Classifier: Are an NP and a Preceding Cluster Co-Referencing?
• Allows cluster-level features
• Maintains consistency within large clusters
Co-Reference Resolution (10)
• Co-Reference Resolution: Remaining Problems
North Korea opened its doors to the U.S. today,
welcoming Secretary of State Madeline Albright.
She says her visit is a good start. The U.S. remains
concerned about North Korea’s missile
development program and its exports of missiles
to Iran.
• Often very many candidates
• Ranking models help choose best
• Many entities are not anaphoric
• “Singletons”
• More than choice of cluster
• Co-reference can depend on lexical relations
• “Corsica is …. The island is …..”
Co-Reference Resolution (11)
• Features
• First, what are the cues for / properties of co-referenced words?
• Second, how can we model those cues?
• e.g., Automatically extract features representing those cues
• (1) How to represent the language (e.g., WordNet synsets, parse tree)
• (2) How to measure the property in that representation
Co-Reference Resolution (12)
• Hard Constraints
• Number agreement
• “John has an Acura. It is red.”
• Person and case agreement
• “*John and Mary have Acuras. We love them.” (where We=John and Mary)
• Gender agreement
• “John has an Acura. He/it/she is attractive.”
Binary Features: For each candidate pair, is the constraint satisfied? (Yes/No)
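Turning these hard constraints into binary features is mechanical once number and gender values are available. A minimal sketch with a hand-filled toy lexicon (real systems would use taggers and lexical resources):

```python
# Sketch: hard agreement constraints as binary features for one candidate
# pair. The tiny lexicon below is hypothetical.

NUMBER = {"John": "sg", "it": "sg", "they": "pl", "an Acura": "sg"}
GENDER = {"John": "masc", "it": "neut", "they": None, "an Acura": "neut"}

def agreement_features(a: str, b: str) -> dict:
    return {
        "number_match": NUMBER.get(a) == NUMBER.get(b),
        "gender_match": GENDER.get(a) == GENDER.get(b),
    }

agreement_features("an Acura", "it")
# {"number_match": True, "gender_match": True}
agreement_features("John", "it")
# {"number_match": True, "gender_match": False}
```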
Co-Reference Resolution (13)
• Hard Constraints
• Number agreement
• “John has an Acura. It is red.”
• Person and case agreement
• “*John and Mary have Acuras. We love them.” (where We=John and Mary)
• Gender agreement
• “John has an Acura. He/it/she is attractive.”
Categorical Features: A feature for each candidate’s number, person, gender, etc.
Co-Reference Resolution (14)
• Syntactic constraints
• “John bought himself a new Acura.” (himself=John)
• “John bought him a new Acura.” (him = not John)
• Required representation: parse tree / dependency relations
• Binary Feature: Does the necessary relation obtain?
Co-Reference Resolution (15)
• Selectional Restrictions
• “John parked his Acura in the garage. He had driven it around for hours.”
• “it” must refer to something that can be driven.
• Knowledge-based Approach:
• VerbNet for verb properties
• WordNet for synset membership
Co-Reference Resolution (16)
• Selectional Restrictions
• “John parked his Acura in the garage. He had driven it around for hours.”
• “it” must refer to something that can be driven.
• Distributional Approach:
• Cluster nouns and verbs into classes
• What is the probability noun A will occur with verb B?
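The distributional idea can be sketched with simple conditional probabilities estimated from counts. The counts below are hypothetical; real systems would use large parsed corpora and noun/verb clusters:

```python
# Sketch: estimating a selectional preference P(noun | verb) from corpus
# counts. The observation counts are hypothetical.

from collections import Counter

# (verb, direct-object noun) pairs observed in a reference corpus
observations = (
    [("drive", "car")] * 50 + [("drive", "truck")] * 30
    + [("drive", "garage")] * 1 + [("park", "car")] * 40
)

verb_obj = Counter(observations)
verb_total = Counter(v for v, _ in observations)

def pref(verb: str, noun: str) -> float:
    return verb_obj[(verb, noun)] / verb_total[verb]

pref("drive", "car")     # ≈ 0.62: "car" is a plausible object of "drive"
pref("drive", "garage")  # ≈ 0.012: "garage" is an implausible object
```

A high preference for (“drive”, “car”) is evidence that “it” in the example can co-refer with “his Acura”.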
Co-Reference Resolution (17)
• Recency / Salience
• “John has an Integra. Bill has a Legend. Mary likes to drive it.”
• Syntactic Operationalization
• How far removed in the parse tree are A and B?
• Semantic Operationalization
• How prominent is the semantic role of A (e.g., agent vs. patient)?
Co-Reference Resolution (18)
• Grammatical Role: Subject preference
• “John went to the Acura dealership with Bill. He bought an Integra.”
• “Bill went to the Acura dealership with John. He bought an Integra.”
• “(?) John and Bill went to the Acura dealership. He bought an Integra.”
• Categorical Feature:
• Grammatical Role or Semantic Role of each candidate as features
Co-Reference Resolution (19)
• Repeated Mentions Preference
• “John needed a car to get to his new job. He decided that he wanted something sporty. Bill went to the Acura dealership with him. He bought an Integra.”
• Not as relevant in pairwise classification
• In entity-mention model, feature of candidate cluster size to favor larger clusters
Co-Reference Resolution (20)
• Verb Semantics Preferences
• “John telephoned Bill. He lost the pamphlet on Acuras.”
• “John criticized Bill. He lost the pamphlet on Acuras.”
• Implicit causality
• Implicit cause of criticizing is object.
• Implicit cause of telephoning is subject.
Co-Reference Resolution (21)
• Co-Reference Resolution: Features
North Korea opened its doors to the U.S. today,
welcoming Secretary of State Madeline Albright.
She says her visit is a good start. The U.S. remains
concerned about North Korea’s missile
development program and its exports of missiles
to Iran.
String-Matching:
“U.S.” and “U.S.” = Identical
“North Korea” and “North Korea’s” = Similar
Minimum Edit Distance
How many operations are required to convert the first string into the second?
“U.S.” and “U.S.” = 0; “North Korea” and “North Korea’s” = 2
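The standard dynamic-programming computation of minimum edit distance (unit-cost insert, delete, substitute) reproduces the values above:

```python
# Sketch: minimum edit distance with unit costs.

def edit_distance(a: str, b: str) -> int:
    m, n = len(a), len(b)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # delete
                          d[i][j - 1] + 1,          # insert
                          d[i - 1][j - 1] + cost)   # substitute / match
    return d[m][n]

edit_distance("U.S.", "U.S.")                   # 0 (identical)
edit_distance("North Korea", "North Korea's")   # 2 (insert "'" and "s")
```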
Co-Reference Resolution (22)
• Assume we have all the features for a candidate pair:
“North Korea” and “its”
• The next step is to use the features to predict whether they co-reference
Sample Approach 1:
Lappin & Leass. (1994). “An Algorithm for Pronominal Anaphora Resolution.” Computational Linguistics.
Sample Approach 2:
McCarthy & Lehnert. (1995). “Using Decision Trees for Coreference Resolution.” Proceedings of IJCAI.
Co-Reference Resolution (23)
• Lappin and Leass (1994): Given he/she/it, assign antecedent
• (1) Discourse model update
• When a new noun phrase is encountered:
(a) Add a representation to discourse model with a salience value
(b) Modify saliences
• (2) Pronoun resolution
• (a) Choose the most salient antecedent
Co-Reference Resolution (24)
• Pre-defined Weights:
• Weights are cut in half after each sentence is processed
• This, and a sentence recency weight (100 for new sentences, cut in half each time), captures the recency preferences
Head noun emphasis: 80
Non-adverbial emphasis: 50
Ind. obj. and oblique emphasis: 40
Accusative (direct object) emphasis: 50
Existential emphasis: 70
Subject emphasis: 80
Subject recency: 100
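As a minimal sketch of the salience bookkeeping above (factor names are ours; weights are taken from the slide, with the halving-per-sentence decay applied as a division), the computation might look like:

```python
# Sketch of Lappin & Leass-style salience scoring (illustrative, not the
# original implementation). Factor names below are invented labels for the
# weights listed on the slide.
WEIGHTS = {
    "subject_recency": 100,
    "subject": 80,
    "existential": 70,
    "accusative": 50,
    "indirect_object_oblique": 40,
    "head_noun": 80,
    "non_adverbial": 50,
}

def salience(factors, sentences_ago):
    """Sum the weights of the factors that apply, halved once per
    intervening sentence to capture the recency preference."""
    total = sum(WEIGHTS[f] for f in factors)
    return total / (2 ** sentences_ago)

# A subject head noun in the current sentence outranks the same
# configuration two sentences back:
current = salience({"subject", "head_noun", "non_adverbial"}, 0)
older = salience({"subject", "head_noun", "non_adverbial"}, 2)
```

The decay is what makes the model prefer recent antecedents even when an older mention matched more factors.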
Co-Reference Resolution (25)
• Algorithm:
• (1) Collect the potential referents (up to 4 sentences back)
• (2) Remove potential referents that do not agree in number or gender with the pronoun
• (3) Remove potential referents that do not pass syntactic coreference constraints
• (4) Compute total salience value of referent from all factors
• (5) Select referent with highest salience value. In case of tie, select closest.
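The five steps above can be sketched as a filter-then-rank loop (the candidate representation and agreement fields here are hypothetical inputs, not the paper's actual feature extraction; the syntactic-constraint filter is omitted):

```python
# Illustrative sketch of the Lappin & Leass resolution steps.
def resolve_pronoun(pronoun, candidates, max_distance=4):
    # (1) Keep referents from at most 4 sentences back.
    pool = [c for c in candidates if c["sentences_ago"] <= max_distance]
    # (2) Filter on number/gender agreement.
    pool = [c for c in pool
            if c["number"] == pronoun["number"]
            and c["gender"] == pronoun["gender"]]
    # (3) Syntactic coreference constraints would filter further here (omitted).
    # (4)-(5) Highest total salience wins; ties broken by closeness.
    return max(pool, key=lambda c: (c["salience"], -c["sentences_ago"]),
               default=None)

he = {"number": "sg", "gender": "m"}
cands = [
    {"name": "John", "number": "sg", "gender": "m", "salience": 310, "sentences_ago": 0},
    {"name": "Mary", "number": "sg", "gender": "f", "salience": 280, "sentences_ago": 0},
    {"name": "Bill", "number": "sg", "gender": "m", "salience": 155, "sentences_ago": 1},
]
best = resolve_pronoun(he, cands)  # John agrees and is most salient
```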
Co-Reference Resolution (26)
• Problems:
• Limited features
• Feature weight assumed in advance
• Hard constraints mixed with imperfect feature extraction
• Limited coverage (e.g., pronouns)
• Hand-crafted rules are very language dependent
Co-Reference Resolution (27)
• McCarthy & Lehnert (1995): Given two entities, should they be linked?
• (1) Create training data by manually annotating gold-standard links
• Every possible pair is a training example
• Positive: Co-Referenced pairs (very small number)
• Negative: Not Co-Referenced pairs (majority of examples)
• (2) Extract features for each possible pair
• (3) Use learning algorithm to assign weights to features
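Step (1) can be sketched as follows (mention strings and gold chain IDs are invented annotations for illustration):

```python
from itertools import combinations

# Sketch of mention-pair training-data creation. A pair is positive iff
# both mentions belong to the same gold co-reference chain.
def make_pairs(mentions, gold_chain):
    examples = []
    for a, b in combinations(mentions, 2):
        label = gold_chain[a] == gold_chain[b]
        examples.append((a, b, label))
    return examples

mentions = ["IBM", "the company", "Apple", "it"]
gold = {"IBM": 1, "the company": 1, "Apple": 2, "it": 2}
pairs = make_pairs(mentions, gold)
positives = [p for p in pairs if p[2]]
# 6 pairs in total but only 2 positive, illustrating the class imbalance
# noted above (negatives are the majority of examples).
```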
Co-Reference Resolution (27)
• Note: Different Uses of Introspection
• (1) To create rules and feature weights in advance
• (2) To annotate gold-standard training set
• But introspection is involved in both methods
Co-Reference Resolution (28)
Features (one for each candidate entity in the pair; features were manually annotated)
NAME-{1,2}: Does reference include a name?
JV-CHILD-{1,2}: Does reference refer to part of a joint venture?
ALIAS: Does one reference contain an alias for the other?
BOTH-JV-CHILD: Do both refer to part of a joint venture?
COMMON-NP: Do both contain a common NP?
SAME-SENTENCE: Are both in the same sentence?
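Although the original features were manually annotated, two of them can be roughly approximated automatically; this sketch uses an invented mention representation, and the substring-based alias check is a crude stand-in:

```python
# Rough automatic approximations of the ALIAS and SAME-SENTENCE features.
def extract_features(m1, m2):
    return {
        "SAME-SENTENCE": m1["sent_id"] == m2["sent_id"],
        # Crude alias check: one mention string contains the other.
        "ALIAS": (m1["text"].lower() in m2["text"].lower()
                  or m2["text"].lower() in m1["text"].lower()),
    }

a = {"text": "International Business Machines", "sent_id": 0}
b = {"text": "Business Machines", "sent_id": 1}
feats = extract_features(a, b)
```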
Co-Reference Resolution (29)
Algorithm: C4.5 (builds a decision tree using training data)
(1) Incrementally build decision-tree from labeled training examples
(2) At each stage choose “best” attribute to split dataset
e.g., use info-gain to compare features
(3) After building complete tree, prune the leaves to prevent overfitting
e.g., remove branches based on useless features
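The "best attribute" choice in step (2) can be sketched with information gain over boolean features (toy data invented for illustration; C4.5 itself uses the gain ratio and adds the pruning step, both omitted here):

```python
from math import log2
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def info_gain(examples, feature):
    """Entropy reduction from splitting (features, label) pairs on a
    boolean feature."""
    labels = [lab for _, lab in examples]
    gain = entropy(labels)
    for value in (True, False):
        subset = [lab for feats, lab in examples if feats[feature] == value]
        if subset:
            gain -= (len(subset) / len(examples)) * entropy(subset)
    return gain

# Toy mention pairs: ALIAS perfectly predicts the coreference label here,
# SAME-SENTENCE does not, so ALIAS is chosen as the split attribute.
data = [
    ({"ALIAS": True, "SAME-SENTENCE": True}, True),
    ({"ALIAS": True, "SAME-SENTENCE": False}, True),
    ({"ALIAS": False, "SAME-SENTENCE": True}, False),
    ({"ALIAS": False, "SAME-SENTENCE": False}, False),
]
best = max(["ALIAS", "SAME-SENTENCE"], key=lambda f: info_gain(data, f))
```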
Co-Reference Resolution (30)
Algorithm: Building Co-Reference Chains
(1) If A and B co-reference
(2) And if B and C co-reference
(3) A-B-C all co-reference
The Mention-Pair model
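The chain-building step above is a transitive closure over the classifier's pairwise decisions; a standard way to compute it is union-find (mention names here are invented):

```python
# Sketch of chain-building by transitive closure using union-find.
def build_chains(mentions, links):
    parent = {m: m for m in mentions}

    def find(m):
        while parent[m] != m:
            parent[m] = parent[parent[m]]  # path halving
            m = parent[m]
        return m

    for a, b in links:               # each pair the classifier linked
        parent[find(a)] = find(b)    # merge their chains

    chains = {}
    for m in mentions:
        chains.setdefault(find(m), set()).add(m)
    return list(chains.values())

# A-B and B-C linked => {A, B, C} is one chain, even though A-C was never
# classified as coreferent directly.
chains = build_chains(["A", "B", "C", "D"], [("A", "B"), ("B", "C")])
```

Note that this closure can also propagate errors: one wrong positive link merges two whole chains.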
Co-Reference Resolution (31)
• Problems:
• Limited features (and manually annotated)
• Features are domain and possibly language-dependent
• Advantages:
• Feature weights are not assumed in advance, so less language-dependent
• Learning algorithm can give less weight to less accurate features
• Negative as well as Positive information
Schedule
1:10 – 1:20 Introductions
1:20 – 1:30 Structure of Class
1:30 – 2:10 Intro to Data-Driven Computational Pragmatics
2:10 – 2:50 Co-Reference Resolution
2:50 – 3:00 Conclusion / Flex Time
Concluding
• Referencing Relations vs. Lexical Relations
• Co-Referencing: Relations between referents
• Lexical Relations (e.g., Entailment): Relations between senses
Concluding (2)
Relations Between Referents (specific to a particular use: Local)
• Mary brought her bike.
• The U.S. is a large country.
• The judges heard it and they are angry.
• He is certainly the man I saw.
Relations Between Senses (specific to senses of words: Global)
• House ENTAILS => Building
• Run ENTAILS => Move
• Car ENTAILS => Automobile
• Corsica ENTAILS => Island
Concluding (3)
• Reasoning and Inferences
• Requires Co-Referencing
• Requires Lexical Relations
• Thursday’s Focus: Learning Lexical Relations for Reasoning