Discourse Analysis
David M. Cassel
Natural Language Processing
Villanova University
April 21st, 2005
April 2005, Discourse Analysis
David M. Cassel
Discourse Analysis
Discourse: collocated, related groups of sentences (from book)
Discourse Analysis
Discourse Model -- a model to represent the entities mentioned in the discourse
Coreference or Anaphora Resolution -- determining which entity a referring expression refers to
Coherence -- modeling the logical flow of the discourse
The book also discusses Psycholinguistic Studies of Reference and Coherence
Anaphora Resolution
Before the game, manager Charlie Manuel said Gavin Floyd's performance would not affect whether he remains with the team when Vicente Padilla comes off the disabled list Tuesday.
Then Floyd went out and had a nightmarish first inning: four walks, one wild pitch, one hit, four runs.
After the game, Manuel said Floyd's disastrous outing had not changed his mind. The righthander will remain with the club and be used in relief.
"The pitcher we saw in St. Louis is a pitcher who has the ability to be a very good major-league pitcher," he said. "He didn't have command of his fastball and couldn't get his breaking ball over tonight... . Maybe the cold was affecting his breaking ball, because he was bouncing a lot of them."
-- Sam Carchidi, Philadelphia Inquirer, 4/16/05
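Discourse Model
[Figure: discourse model for the article excerpt]
Entities: Gavin Floyd, Charlie Manuel, Vicente Padilla
Referring expressions for Gavin Floyd: he, Floyd, The righthander, The pitcher we saw in St. Louis, his
Relations shown: evoke (introduce), refer, corefer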
Adapted from Figure 18.1, Speech & Language Processing
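The figure can be read as a small data structure. The following is a minimal sketch, not the book's implementation: the class and method names (`Entity`, `DiscourseModel`, `evoke`, `refer`, `corefer`) are hypothetical and chosen only to mirror the three relation labels in the figure.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    """One entity in the discourse model plus every expression that refers to it."""
    name: str
    mentions: list[str] = field(default_factory=list)


class DiscourseModel:
    """Hypothetical container mapping entity names to Entity records."""

    def __init__(self) -> None:
        self.entities: dict[str, Entity] = {}

    def evoke(self, name: str) -> Entity:
        """First mention: introduce a new entity into the model."""
        entity = Entity(name, [name])
        self.entities[name] = entity
        return entity

    def refer(self, expression: str, name: str) -> Entity:
        """Later mention: attach a referring expression to an existing entity."""
        entity = self.entities[name]
        entity.mentions.append(expression)
        return entity

    def corefer(self, a: str, b: str) -> bool:
        """Two expressions corefer when both are attached to the same entity."""
        return any(a in e.mentions and b in e.mentions for e in self.entities.values())


model = DiscourseModel()
for person in ("Gavin Floyd", "Charlie Manuel", "Vicente Padilla"):
    model.evoke(person)

# Referring expressions grouped under Gavin Floyd, as in the figure.
for expression in ("he", "Floyd", "The righthander",
                   "The pitcher we saw in St. Louis", "his"):
    model.refer(expression, "Gavin Floyd")

print(model.corefer("he", "The righthander"))
print(model.corefer("he", "Charlie Manuel"))
```

Keying mentions by surface string is a simplification for illustration; a real system would key them by position in the text, since the same pronoun can refer to different entities at different points.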
Types of Anaphoric References
Indefinite noun phrases: A baseball player like that should do well.
Definite noun phrases: The righthander will remain with the club.
Pronouns: He had a bad game.
Demonstratives: This player has a bright future.
One-anaphora: I saw no less than 6 Acura Integras today. Now I want one. (from book)
Reference Constraints
Number Agreement: Floyd pitched 6 innings. They went well.
Person and Case: He didn’t have command of his fastball.
Gender Agreement: Floyd took his glove with him. It fit well.
Syntactic Constraints: Floyd threw him the ball.
Selectional Restrictions: Floyd stepped onto the mound with the ball. He threw it really fast.
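Number and gender agreement are the easiest of these constraints to express as a filter. The sketch below assumes each candidate already carries hand-assigned features; the `Mention` class, the `PRONOUN_FEATURES` table, and the feature labels are illustrative assumptions, not part of any particular library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mention:
    text: str
    number: str  # "sg" or "pl"
    gender: str  # "masc", "fem", or "neut"


# Hypothetical feature table: (number, gender); None means "no gender constraint".
PRONOUN_FEATURES = {
    "he": ("sg", "masc"),
    "him": ("sg", "masc"),
    "she": ("sg", "fem"),
    "her": ("sg", "fem"),
    "it": ("sg", "neut"),
    "they": ("pl", None),
    "them": ("pl", None),
}


def agrees(pronoun: str, candidate: Mention) -> bool:
    """True when the candidate matches the pronoun in number and gender."""
    number, gender = PRONOUN_FEATURES[pronoun.lower()]
    if candidate.number != number:
        return False
    return gender is None or candidate.gender == gender


def filter_candidates(pronoun: str, candidates: list[Mention]) -> list[Mention]:
    """Drop every candidate that fails agreement with the pronoun."""
    return [c for c in candidates if agrees(pronoun, c)]


candidates = [
    Mention("Floyd", "sg", "masc"),
    Mention("6 innings", "pl", "neut"),
    Mention("his glove", "sg", "neut"),
]

for pronoun in ("They", "It", "He"):
    print(pronoun, [c.text for c in filter_candidates(pronoun, candidates)])
```

Syntactic constraints and selectional restrictions need a parse and lexical knowledge respectively, so they are left out of this sketch.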
Preferences
Recency: Floyd threw the ball. Lieberthal picked it up. He put the ball in his pocket.
Grammatical Role: Floyd threw the ball to Lieberthal. His arm was getting tired.
Repeated Mention: (See article)
Parallelism: Floyd threw a ball to Lieberthal. Wagner threw a ball to him, too.
Verb Semantics:
  John telephoned Bill. He lost the pamphlet on Acuras.
  John criticized Bill. He lost the pamphlet on Acuras.
Pronoun Resolution Algorithms
Traditional
  Carter: shallow parsing
  Rich, LuperFoy: distributed architecture
  Carbonell, Brown: multi-strategy
  Rico Pérez: scalar product
  Mitkov: combination of linguistic, statistical (high 80s)
  Lappin, Leass: syntax-based (86%)
  Hobbs: Tree Search Algorithm (91.7%)
  Grosz, Joshi, Weinstein: Centering Algorithm (77.6%)
  Hobbs: Coherence
Alternative
  Nasukawa: knowledge-independent (93.8%)
  Dagan, Itai: statistical, corpus processing (87% for “genuine” it)
  Connolly, Burger, Day: machine learning
  Aone, Bennett: machine learning (“close to 90%”)
  Mitkov: uncertainty reasoning
  Mitkov: 2-engine (~90%)
  Tin, Akman: situational semantics
  Say, Vakman
Lappin & Leass
The book presents a slightly modified algorithm for nonreflexive, 3rd person pronouns. Two parts:
  Update discourse model with salience value
  Resolve pronouns
Let’s apply this to some text:
In the afternoon, Gavin Floyd played baseball at the park. Then he went to a bar with Mike Lieberthal. He enjoyed a beer.
Salience Factors
Factor                                          Weight
Sentence recency                                   100
Subject emphasis                                    80
Existential emphasis                                70
Accusative (direct object) emphasis                 50
Indirect object, oblique complement emphasis        40
Non-adverbial emphasis                              50
Head noun emphasis                                  80
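An entity's salience for a sentence is the sum of the weights for the factors that apply to it. The sketch below encodes the table as a dictionary; the key names and the `salience` helper are hypothetical, and the factor sets are the ones the worked example in this deck credits to "Gavin Floyd", "the afternoon", and "a bar".

```python
# Weights copied from the Salience Factors table; key names are illustrative.
SALIENCE_WEIGHTS = {
    "recency": 100,
    "subject": 80,
    "existential": 70,
    "direct_object": 50,
    "indirect_object_or_oblique": 40,
    "non_adverbial": 50,
    "head_noun": 80,
}


def salience(factors: list[str]) -> int:
    """Sum the weights of every factor that applies to one mention."""
    return sum(SALIENCE_WEIGHTS[factor] for factor in factors)


# "Gavin Floyd" in sentence 1: recent, subject, non-adverbial, head noun.
gavin_floyd = salience(["recency", "subject", "non_adverbial", "head_noun"])

# "the afternoon" sits in an adverbial phrase, so it only gets recency and head noun.
the_afternoon = salience(["recency", "head_noun"])

# "a bar" in sentence 2: recent, non-adverbial, head noun.
a_bar = salience(["recency", "non_adverbial", "head_noun"])

print(gavin_floyd, the_afternoon, a_bar)  # 310 180 230
```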
Pronoun Salience
Factor                Weight
Role parallelism          35
Cataphora               -175
L&L Algorithm
Collect the potential referents (up to four sentences back).
Remove potential referents that do not agree in number or gender with the pronoun.
Remove potential referents that do not pass intrasentential syntactic coreference constraints.
Compute the total salience value of each remaining referent by adding any applicable pronoun-specific values (role parallelism, cataphora) to its existing salience value.
Select the referent with the highest salience value. In case of ties, select the closest referent in terms of string position.
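The steps above can be sketched as a single function. This is a simplified illustration under several assumptions, not the Lappin & Leass implementation: collecting candidates is left to the caller, the agreement and syntactic checks are reduced to precomputed boolean flags, and `position` is a hypothetical token offset. Only the two pronoun-specific weights come from the deck.

```python
from dataclasses import dataclass
from typing import Optional

ROLE_PARALLELISM = 35  # weight from the Pronoun Salience table
CATAPHORA = -175       # weight from the Pronoun Salience table


@dataclass
class Candidate:
    text: str
    salience: float          # value already stored in the discourse model
    position: int            # hypothetical token offset of the mention
    agrees: bool = True      # passes number and gender agreement
    syntax_ok: bool = True   # passes intrasentential syntactic constraints
    same_role: bool = False  # fills the same grammatical role as the pronoun


def total_salience(c: Candidate, pronoun_position: int) -> float:
    """Add the pronoun-specific weights to the stored salience value."""
    score = c.salience
    if c.same_role:
        score += ROLE_PARALLELISM
    if c.position > pronoun_position:  # the referent follows the pronoun
        score += CATAPHORA
    return score


def resolve(pronoun_position: int, candidates: list[Candidate]) -> Optional[Candidate]:
    """Filter the candidates, then pick the highest total; ties go to the closest."""
    viable = [c for c in candidates if c.agrees and c.syntax_ok]
    if not viable:
        return None
    return max(
        viable,
        key=lambda c: (total_salience(c, pronoun_position),
                       -abs(pronoun_position - c.position)),
    )


# Made-up candidates chosen to exercise each rule.
candidates = [
    Candidate("A", salience=100, position=3, same_role=True),  # 100 + 35 = 135
    Candidate("B", salience=135, position=8),                  # 135, closer to the pronoun
    Candidate("C", salience=300, position=12),                 # cataphoric: 300 - 175 = 125
    Candidate("D", salience=500, position=5, agrees=False),    # removed by agreement
]
winner = resolve(pronoun_position=10, candidates=candidates)
print(winner.text)  # B
```

Sorting on the pair (total, negative distance) keeps the tie-break inside one `max` call, so no second pass over the candidates is needed.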
Example
In the afternoon, Gavin Floyd played baseball at the park. Then he went to a bar with Mike Lieberthal. He enjoyed a beer.
Entity            Rec  Subj  Exist  Obj  Ind-Obj  Non-Adv  Head Noun  Total
the afternoon     100     -      -    -        -        -         80    180
Gavin Floyd       100    80      -    -        -       50         80    310
baseball          100     -      -   50        -       50         50    250
the park          100     -      -    -        -       50          -    150
Example
In the afternoon, Gavin Floyd played baseball at the park. Then he went to a bar with Mike Lieberthal. He enjoyed a beer.
Entity            Carry  Rec  Subj  Exist  Obj  Ind-Obj  Non-Adv  Head Noun  Total
the afternoon        90    -     -      -    -        -        -          -      -
Gavin Floyd         155    -     -      -    -        -        -          -      -
baseball            125    -     -      -    -        -        -          -      -
the park             75    -     -      -    -        -        -          -      -
a bar                 -  100     -      -    -        -       50         80    230
Mike Lieberthal       -  100     -      -    -        -       50          -    150
Example
In the afternoon, Gavin Floyd played baseball at the park. Then he went to a bar with Mike Lieberthal. He enjoyed a beer.
Entity              Carry  Rec  Subj  Exist  Obj  Ind-Obj  Non-Adv  Head Noun  Total
the afternoon          90    -     -      -    -        -        -          -      -
{Gavin Floyd, he}     155  100    80      -    -        -       50         80    465
baseball              125    -     -      -    -        -        -          -      -
the park               75    -     -      -    -        -        -          -      -
a bar                   -  100     -      -    -        -       50         80    230
Mike Lieberthal         -  100     -      -    -        -       50          -    150
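The Carry column is each entity's previous total cut in half, and a resolved pronoun adds its own salience to the entity it refers to. The sketch below reproduces that bookkeeping for the first two sentences, using the totals listed in the example tables as inputs; the function names `degrade` and `add_salience` are hypothetical.

```python
def degrade(model: dict[str, float]) -> dict[str, float]:
    """Halve every salience value when a new sentence begins."""
    return {entity: value / 2 for entity, value in model.items()}


def add_salience(model: dict[str, float], new_values: dict[str, float]) -> dict[str, float]:
    """Add the salience earned in the current sentence to any carried value."""
    updated = dict(model)
    for entity, value in new_values.items():
        updated[entity] = updated.get(entity, 0) + value
    return updated


# Totals after sentence 1, as listed in the first example table.
after_sentence_1 = {
    "the afternoon": 180,
    "Gavin Floyd": 310,
    "baseball": 250,
    "the park": 150,
}

# Moving to sentence 2 halves everything: these are the Carry values.
carried = degrade(after_sentence_1)

# In sentence 2, "he" resolves to Gavin Floyd, so its 310 points
# (recency, subject, non-adverbial, head noun) are added to that entity.
after_sentence_2 = add_salience(
    carried,
    {"Gavin Floyd": 310, "a bar": 230, "Mike Lieberthal": 150},
)

print(carried["Gavin Floyd"])           # 155.0
print(after_sentence_2["Gavin Floyd"])  # 465.0
```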
Example
In the afternoon, Gavin Floyd played baseball at the park. Then he went to a bar with Mike Lieberthal. He enjoyed a beer.
Entity              Carry
the afternoon          45
{Gavin Floyd, he}     230
baseball               62
the park               37
a bar                 115
Mike Lieberthal        75
a beer                280
Only Gavin Floyd and Mike Lieberthal agree with He in number and gender, so the other entities are removed as potential referents.
Gavin Floyd gets 35 points for Role Parallelism. Mike Lieberthal does not.
Floyd => 265 points
Lieberthal => 75 points
We pick Floyd as the antecedent of He.
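The final step can be checked with a few lines of code. This sketch takes the salience values exactly as listed in the table above and applies the agreement filter before scoring, which is why "a beer" cannot win despite having the highest raw value. The two sets `agrees_with_he` and `subjects` are assumptions standing in for real agreement and grammatical-role analysis.

```python
ROLE_PARALLELISM = 35  # weight from the Pronoun Salience table

# Salience values as listed in the final example table.
salience = {
    "the afternoon": 45,
    "Gavin Floyd": 230,
    "baseball": 62,
    "the park": 37,
    "a bar": 115,
    "Mike Lieberthal": 75,
    "a beer": 280,
}

# Assumption: only these entities agree with "He" in number and gender.
agrees_with_he = {"Gavin Floyd", "Mike Lieberthal"}

# Assumption: only Gavin Floyd has appeared in subject position, like "He".
subjects = {"Gavin Floyd"}

totals = {
    name: value + (ROLE_PARALLELISM if name in subjects else 0)
    for name, value in salience.items()
    if name in agrees_with_he
}
antecedent = max(totals, key=totals.get)

print(totals)      # {'Gavin Floyd': 265, 'Mike Lieberthal': 75}
print(antecedent)  # Gavin Floyd
```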
Summary
Discourse Analysis requires processing more text than POS tagging or finding entities.
Part of tracing the flow of discourse is resolving anaphora.
That resolution lets us capture more relationships and other information than we could otherwise.