
Page 1: Generative Models of Discourse. Eugene Charniak, Brown Laboratory for Linguistic Information Processing (BLLIP).

Generative Models of Discourse

Eugene Charniak

Brown Laboratory for Linguistic Information Processing

BLLIP

Page 2:

Joint Work With

• Micha Elsner

(PhD student, Brown)

• Joseph Osterwile

(former undergraduate, Brown)

Page 3:

Abstract

Discourse, the study of how the meaning of a document is built out of the meanings of its sentences, is the inter-sentential analogue of semantics.  In this talk we consider the following abstract problem in discourse.  Given a document, randomly permute the order of the sentences and then attempt to distinguish the original from the permuted version.  We present a sequence of generative models that can handle the problem with increasing accuracy.  Each model accounts for some aspect of the document, and assigns a probability to the document's contents.  In the standard generative way the subsequent models simply multiply individual probabilities to get their results. We also discuss the linkage of this abstract task to more realistic ones such as essay grading, document summarization and document generation.

Page 4:

Revised Abstract

We present a sequence of generative models that can handle the problem with increasing accuracy. Each model accounts for some aspect of the document, and assigns a probability to the document's contents. Given a document, randomly permute the order of its sentences and then attempt to distinguish the original from the permuted version. In the standard generative way the subsequent models simply multiply individual probabilities to get their results. In this talk we consider the following abstract problem in discourse. We also discuss the linkage of this abstract task to more realistic ones such as essay grading, document summarization and document generation. Discourse, the study of how the meaning of a document is built out of the meanings of its sentences, is the inter-sentential analogue of semantics.

NOTICE! This example is doctored to illustrate the program. You can ask me about the real randomized abstract if you like.

Page 5:

A Note on “Generative”

When we talk about a “generative” model

we do NOT mean a model that actually generates language. (If we do mean that, we will say "literally generate".) Rather, "generative" is used in machine learning to describe a model that assigns a probability to its input. So "generate" = "assign a probability to".

Page 6:

Our Three Models

• Each of our three models assigns a probability to some aspect of the input (head nouns, pronouns, and noun-phrase syntax, respectively).

• The idea is that the probability assigned to the original document should be higher than that assigned to the random one.

• One advantage of such generative models is that if done correctly, they can be combined by just multiplying their probabilities together. This is, in fact, exactly what we do.
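Concretely, the combination step can be sketched as follows (a minimal illustration; the numbers and function names here are hypothetical, not from the talk). Multiplying probabilities is summing log-probabilities, and the combined model prefers whichever ordering gets the higher total.

```python
def combined_logprob(model_logprobs):
    """Combine independent generative models by multiplying their
    probabilities, i.e. summing their log-probabilities."""
    return sum(model_logprobs)

def prefer_original(original_scores, permuted_scores):
    """True when the combined model ranks the original document
    above its permutation."""
    return combined_logprob(original_scores) > combined_logprob(permuted_scores)

# Hypothetical per-model log-probabilities
# (head-noun, pronoun, and NP-syntax models, in that order):
original = [-120.5, -48.2, -210.9]
permuted = [-131.7, -49.0, -215.3]
print(prefer_original(original, permuted))  # True
```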

Page 7:

More Formally

$P(D) = \prod_i P(S_i \mid S_{1,i-1})$

$P(S_i \mid S_{1,i-1}) = P_N(S_i \mid S_{1,i-1})\, P_P(S_i \mid S_{1,i-1})\, P_S(S_i \mid S_{1,i-1})$

We generate each sentence conditioned on the previous sentences

For each sentence we compute three probabilities, head-nouns, pronouns, and NP syntax.
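This decomposition can be sketched in code (an illustrative skeleton, with a made-up stand-in model; none of these names come from the talk): iterate over sentences, and for each one add the log-probabilities assigned by the component models given the history.

```python
def document_logprob(sentences, models):
    """log P(D) = sum_i log P(S_i | S_1..i-1); each sentence factor is
    the product of the component models (so their logs add)."""
    total = 0.0
    for i, sentence in enumerate(sentences):
        history = sentences[:i]
        for model in models:  # head-noun, pronoun, and NP-syntax models
            total += model(sentence, history)
    return total

# A toy stand-in model (purely illustrative): penalize each word.
def toy_model(sentence, history):
    return -float(len(sentence.split()))

print(document_logprob(["a b", "c"], [toy_model]))  # -3.0
```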

Page 8:

Generative Models of Discourse

I Introduction

II Model 1 – Head Nouns (Entity Grids)

III Model 2 - Pronominal Reference

IV Model 3 – Noun-Phrase Syntax

V Real Problems (Future Work)

Page 9:

Nouns Tend to Repeat

Discourse, the study of how the meaning of a document is built out of the meanings of its sentences, is the inter-sentential analogue of semantics.  In this talk we consider the following abstract problem in discourse.  Given a document, randomly permute the order of the sentences and then attempt to distinguish the original from the permuted version.  We present a sequence of generative models that can handle the problem with increasing accuracy.  Each model accounts for some aspect of the document, and assigns a probability to the document's contents.  In the standard generative way the subsequent models simply multiply individual probabilities to get their results. We also discuss the linkage of this abstract task to more realistic ones such as essay grading, document summarization and document generation.

Page 10:

Entity Grids

• Following Barzilay, Lapata, and Lee, an entity grid is an array with the "entities" (really just the head nouns) of the document on one axis and the sentence ordering on the other; each cell records the role the entity plays in that sentence. As in previous work we limit the roles to subject (S), object (O), other (X), and not mentioned (-).
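Building such a grid is straightforward. Here is a minimal sketch, assuming the parser's output has already been reduced to per-sentence head-noun/role pairs (the input format and names are my own, not from the talk):

```python
def entity_grid(doc):
    """Build an entity grid from a list of per-sentence dictionaries
    mapping head noun -> role ('S', 'O', or 'X').  Nouns absent from
    a sentence get '-'."""
    nouns = []
    for sentence in doc:
        for noun in sentence:
            if noun not in nouns:
                nouns.append(noun)
    return {n: [sentence.get(n, "-") for sentence in doc] for n in nouns}

# Two sentences of a toy document (roles are illustrative):
doc = [{"discourse": "S", "meaning": "X"},
       {"discourse": "X", "problem": "O"}]
grid = entity_grid(doc)
print(grid["discourse"])  # ['S', 'X']
print(grid["problem"])    # ['-', 'O']
```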

Page 11:

A (Partial) Entity Grid

Discourse S X - - - - -

Meaning X - - - - - -

Document X - X - X - -

Sentences X - X - - - -

Talk - X - - - - -

Problem - O - O - - -

Order - - O - - - -

Original - - X - - - -

Version - - X - - - -

Models - - - X - S -

Page 12:

The Grid for the Randomized Document

Discourse - - - - X - S

Meaning - - - - - - X

Document - X X - - - X

Sentences - - X - - - X

Talk - - - - X - -

Problem O - - - O - -

Order - - O - - - -

Original - - X - - - -

Version - - X - - - -

Models X - - S - - -

Page 13:

The Basic E-grid Probability

For head-noun probabilities we look at each head noun's probability given its two-sentence history (the roles (S, O, X, -) it filled in the two previous sentences).

$P_N(S_i \mid S_{1,i-1}) = \prod_{n \in S_i} P\big(n \mid r_{i-1}(n),\, r_{i-2}(n)\big)$

The product ranges over each noun n in the sentence; $r_{i-1}(n)$ is the role n plays in the (i-1)th sentence.
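Given a grid and a table of role-transition probabilities, this quantity is a short loop. A minimal sketch (the probability table here is hypothetical; a real model estimates it from counts over a parsed training corpus):

```python
import math

def egrid_logprob(grid, probs):
    """log P_N of a document: for each noun, multiply the probability
    of its role in each sentence given its roles in the two previous
    sentences.  `probs` maps (r_{i-2}, r_{i-1}, r_i) -> probability."""
    total = 0.0
    for roles in grid.values():
        padded = ["-", "-"] + roles  # no history before the document starts
        for i in range(2, len(padded)):
            total += math.log(probs[(padded[i - 2], padded[i - 1], padded[i])])
    return total

# Hypothetical transition probabilities:
probs = {("-", "-", "S"): 0.1, ("-", "S", "X"): 0.3,
         ("-", "-", "X"): 0.05, ("-", "X", "-"): 0.5}
lp = egrid_logprob({"discourse": ["S", "X"], "meaning": ["X", "-"]}, probs)
```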

Page 14:

Model 1 Results

Baseline 50%

Model 1 82.2%

Trained on 10,000 automatically parsed documents from the NTC corpus, tested on 1323 other documents from the same corpus.

Page 15:

Generative Models of Discourse

I Introduction

II Model 1 - Entity Grids

III Model 2 - Pronominal Reference

IV Model 3 – Noun-Phrase Syntax

V Real Problems (Future Work)

Page 16:

Can Pronouns Help?

• In our abstract the only important pronouns have intra-sentential antecedents.

• Furthermore, when the document is out of order, there will almost always be something for the pronoun to point back to.

• As we will see, pronouns are the weakest of our models, but they do help.

Page 17:

Adding Pronouns to the Mix

To handle pronouns we need to consider the various pronoun-resolution possibilities:

$P_p(D) = \sum_a P(A = a, D)$

Unfortunately this sum is intractable, so we approximate it with

$P_p(D) \approx \max_a P(A = a, D)$

This is reasonable because most documents have only one set of reference assignments that makes sense.
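As a small sketch of the max approximation (with made-up candidate scores, and under the extra simplifying assumption that pronouns are resolved independently, so the max over joint assignments factors into per-pronoun maxima):

```python
def pronoun_logprob(candidates):
    """Approximate log sum_a P(A=a, D) by log max_a P(A=a, D).
    `candidates` maps each pronoun to a dict
    {antecedent: log P(A=a, pronoun)}.  Assuming pronouns are resolved
    independently, the joint max is the sum of per-pronoun maxima."""
    return sum(max(scores.values()) for scores in candidates.values())

# Hypothetical candidate antecedents and scores for two pronouns:
candidates = {"it":   {"document": -1.0, "talk": -2.5},
              "they": {"models": -0.7, "sentences": -3.0}}
print(pronoun_logprob(candidates))  # -1.7
```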

Page 18:

The Probability of an Antecedent, and of the Pronoun Given the Antecedent

$P_p(A = a, S_i \mid S_{i-1}, S_{i-2}) = P(A = a \mid \mathrm{dist}(a), \mathrm{mentions}(a))\; P(\mathrm{gender}(\mathit{pronoun}) \mid a)\; P(\mathrm{number}(\mathit{pronoun}) \mid a)$

Probability that the antecedent is a given how far away a is, and how often it has been mentioned

Probability of the pronoun gender given the antecedent.

Probability of the pronoun number given the antecedent.
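A toy sketch of this three-way factorization (the lookup tables and names are mine; the 0.25 and 0.998 entries echo the example probabilities the talk gives, the rest are made up):

```python
def antecedent_prob(a, pronoun, tables):
    """P(A=a, pronoun) = P(a | dist(a), mentions(a))
                         * P(gender(pronoun) | a)
                         * P(number(pronoun) | a)."""
    return (tables["position"][(a["dist"], a["mentions"])]
            * tables["gender"][(pronoun["gender"], a["head"])]
            * tables["number"][(pronoun["number"], a["head"])])

tables = {"position": {(1, 1): 0.25},
          "gender":   {("neuter", "asbestos"): 0.998},
          "number":   {("singular", "asbestos"): 0.96}}
a = {"head": "asbestos", "dist": 1, "mentions": 1}
p = {"gender": "neuter", "number": "singular"}
print(round(antecedent_prob(a, p, tables), 3))  # 0.24
```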

Page 19:

Example Pronoun Probabilities

P(ref = x | x is 1 back and appeared 1 time) = 0.25

If x is 1 back and appeared > 4 times, the probability is 0.86

P("asbestos" is neuter) = 0.998

P("alice" is feminine) = 0.84

P("it" has a plural antecedent) = 0.04

Page 20:

Model 2 Results

Model 1 82.2%

Model 2 71.3%

Model 1+2 85.3%

Page 21:

Pronoun Reference vs. Discourse Modeling

                              Best Gender Model   Weak Gender Model
Model 2 discourse accuracy         71.3%               66.7%
Pronoun-reference accuracy         79.1%               75.5%

Page 22:

Generative Models of Discourse

I Introduction

II Model 1 - Entity Grids

III Model 2 - Pronominal Reference

IV Model 3 – Noun-Phrase Syntax

V Real Problems (Future Work)

Page 23:

Abstract

Discourse, the study of how the meaning of a document is built out of the meanings of its sentences, is the inter-sentential analogue of semantics.  In this talk we consider the following abstract problem in discourse.  Given a document, randomly permute the order of the sentences and then attempt to distinguish the original from the permuted version.  We present a sequence of generative models that can handle the problem with increasing accuracy.  Each model accounts for some aspect of the document, and assigns a probability to the document's contents.  In the standard generative way the subsequent models simply multiply individual probabilities to get their results. We also discuss the linkage of this abstract task to more realistic ones such as essay grading, document summarization and document generation.

Page 24:

Distinctions Between First and Non-First Mentions

• The first mention of an entity tends to have more deeply embedded syntax,

• is longer at every level of embedding,

• uses the determiner "a" more often,

• and uses certain key words at different rates. E.g., most newspapers seem to follow the convention that "John Doe" will subsequently be referred to as "Mr. Doe".

Page 25:

Using This Information

• We assume that the first time a particular head noun occurs is the first mention, and all subsequent uses are non-first.

• We have a generative model of the noun-phrase syntax/key-words that should pick out the correct ordering.
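The first-mention assumption above reduces to a one-pass labeling of the document's head nouns, which can be sketched as:

```python
def label_mentions(head_nouns):
    """Label each head-noun mention 'first' the first time that noun
    occurs in the document and 'notfirst' on every later occurrence."""
    seen = set()
    labels = []
    for noun in head_nouns:
        labels.append("first" if noun not in seen else "notfirst")
        seen.add(noun)
    return labels

print(label_mentions(["document", "talk", "document"]))
# ['first', 'first', 'notfirst']
```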

Page 26:

Generative NP Syntax

$P_S(S_i \mid S_{1,i-1}) = \prod_{np \in S_i} P(np \mid l)$

$P(np \mid l) = P(h \mid l) \prod_{k=1}^{h} P(\mathrm{syntax}_k \mid k, l)$

$P(\mathrm{syntax}_k \mid k, l) = \prod_i P(s_i \mid s_{i-1}, k, l)$

l ∈ {first, notfirst}

h = height; the probability of larger h is higher for l = first

s is either a non-terminal or a key word

Page 27:

A Simple Example

NP

DET NOUN

the document

P(h = 1 | l) is high for l = notfirst

P(the | start, h = 1, l) is high for l = notfirst
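The example above can be sketched numerically (a one-level toy version of the NP model; the table entries are made up, chosen only so that non-first mentions prefer short, definite NPs like "the document"):

```python
import math

def np_logprob(np_, l, tables):
    """log P(np | l) = log P(h | l) + sum over symbols of
    log P(s | prev, h, l), for a single-level NP."""
    logp = math.log(tables["height"][(np_["h"], l)])
    prev = "start"
    for s in np_["symbols"]:
        logp += math.log(tables["emit"][(s, prev, np_["h"], l)])
        prev = s
    return logp

tables = {"height": {(1, "first"): 0.3, (1, "notfirst"): 0.7},
          "emit": {("the", "start", 1, "first"): 0.2,
                   ("the", "start", 1, "notfirst"): 0.5,
                   ("document", "the", 1, "first"): 0.1,
                   ("document", "the", 1, "notfirst"): 0.1}}
np_ = {"h": 1, "symbols": ["the", "document"]}
print(np_logprob(np_, "notfirst", tables) > np_logprob(np_, "first", tables))  # True
```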

Page 28:

Model 3 Results

Model 1 82.2%

Model 1+2 85.3%

Model 3 86.2%

Model 1+2+3 90.3%

Model 1+3 89.1%

Page 29:

Generative Models of Discourse

I Introduction

II Model 1 - Entity Grids

III Model 2 - Pronominal Reference

IV Model 3 – Noun-Phrase Syntax

V Real Problems (Future Work)

Page 30:

Future Models

• Next week: Probabilistic choice of pronoun/full-NP.

• Next month: Insert quotations. (Almost) never in first sentence. Usually clustered together.

• Next year: Temporal relations between sentences, relations between verbs, different kinds of descriptions.

Page 31:

Real Problems

• Given an abstract representation of what we know about the entities in the document, (really) generate the words for those entities

• Given the sentences of two documents, and the first sentence of one of them, pick out the rest of the sentences of that document.

• The same, but with 10 documents on (roughly) the same topic.


Page 32: