
Preposition Phrase Attachment

• To what previous verb or noun phrase does a prepositional phrase (PP) attach?

  [Diagram: "The woman ... saw ... a man ... with a poodle ... in the park ... with a telescope ... on Tuesday ... on his bicycle", with arrows from each PP to its candidate attachment sites]

A Simplified Version

• Assume ambiguity only between preceding base NP and preceding base VP:

  The woman had seen the man with the telescope.

Q: Does the PP attach to the NP or the VP?

• Assumption: Consider only NP/VP head and the preposition

Simple Formulation

• Determine attachment based on log-likelihood ratio:

LLR(v, n, p) = log P(p | v) - log P(p | n)

If LLR > 0, attach to the verb;
if LLR < 0, attach to the noun.
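A minimal sketch of this decision rule in Python, assuming the conditional probabilities P(p | v) and P(p | n) have already been estimated (the numbers below are made up for illustration):

import math

def llr_attachment(p_prep_given_verb, p_prep_given_noun):
    # LLR(v, n, p) = log P(p | v) - log P(p | n)
    llr = math.log(p_prep_given_verb) - math.log(p_prep_given_noun)
    return "verb" if llr > 0 else "noun"

# Hypothetical estimates, for illustration only
print(llr_attachment(0.03, 0.12))   # -> 'noun'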

Issues

• Multiple attachment:
  – Attachment lines cannot cross

• Proximity:
  – Preference for attaching to closer structures, all else being equal

Chrysler will end its troubled venture with Maserati.

  P(with | end) = 0.118
  P(with | venture) = 0.107

  The probabilities alone slightly favor verb attachment, even though the PP belongs with the closer noun 'venture'.

Hindle & Rooth (1993)

• Consider just sentences with a transitive verb and PP, i.e., of the form:

... bVP bNP PP ...

Q: Where does the first PP attach (NP or VP)?

Indicator variables (0 or 1):

  VAp: Is there a PP headed by p after v attached to v?
  NAp: Is there a PP headed by p after n attached to n?

NB: Both variables can be 1 in a sentence

Attachment Probabilities

• P(attach(p) = n | v, n) = P(NAp = 1 | n)
  – Verb attachment is irrelevant; if the PP attaches to the noun it cannot attach to the verb

• P(attach(p) = v | v, n)
    = P(VAp = 1, NAp = 0 | v, n)
    = P(VAp = 1 | v) P(NAp = 0 | n)
  – Noun attachment is relevant, since the noun 'shadows' the verb (by the proximity principle)

Estimating Parameters

• MLE:
  P(VAp = 1 | v) = C(v, p) / C(v)
  P(NAp = 1 | n) = C(n, p) / C(n)

• Using an unlabeled corpus:
  – Bootstrap from unambiguous cases:

The road from Chicago to New York is long.

She went from Albany towards Buffalo.
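A minimal sketch of the MLE estimates and the resulting attachment probabilities in Python, with hypothetical count tables (the counts and words are made up; only the formulas follow the slides):

from collections import Counter

# Hypothetical counts gathered from unambiguous attachments
verb_prep = Counter({("end", "with"): 12})      # C(v, p)
verb_count = Counter({"end": 100})              # C(v)
noun_prep = Counter({("venture", "with"): 9})   # C(n, p)
noun_count = Counter({"venture": 80})           # C(n)

def p_va(v, p):                     # P(VAp = 1 | v) = C(v, p) / C(v)
    return verb_prep[(v, p)] / verb_count[v]

def p_na(n, p):                     # P(NAp = 1 | n) = C(n, p) / C(n)
    return noun_prep[(n, p)] / noun_count[n]

def p_attach_noun(v, n, p):         # P(attach(p) = n | v, n)
    return p_na(n, p)

def p_attach_verb(v, n, p):         # P(attach(p) = v | v, n); the noun 'shadows' the verb
    return p_va(v, p) * (1 - p_na(n, p))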

Unsupervised Training

1. Build initial model using only unambiguous attachments

2. Apply initial model and assign attachments where LLR is above a threshold

3. Split remaining ambiguous cases as 0.5 counts for each possibility

Use of EM as principled method?
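A rough sketch of the three steps above, assuming hypothetical helper functions llr (computes the ratio from the current counts) and add_count (adds a possibly fractional count to the model); the threshold value is also an assumption:

def train_unsupervised(unambiguous, ambiguous, llr, add_count, threshold=2.0):
    # 1. Initial model from unambiguous attachments only
    for site, head, p in unambiguous:          # site is 'verb' or 'noun'
        add_count(site, head, p, 1.0)

    # 2. Assign ambiguous (v, n, p) cases whose |LLR| clears the threshold
    undecided = []
    for v, n, p in ambiguous:
        score = llr(v, n, p)
        if score > threshold:
            add_count("verb", v, p, 1.0)
        elif score < -threshold:
            add_count("noun", n, p, 1.0)
        else:
            undecided.append((v, n, p))

    # 3. Split the remaining ambiguous cases as 0.5 counts for each possibility
    for v, n, p in undecided:
        add_count("verb", v, p, 0.5)
        add_count("noun", n, p, 0.5)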

Limitations

• Semantic issues:

I examined the man with a stethoscope.

I examined the man with a broken leg.

• Other contextual features:
  – Superlative adjectives (biggest) indicate NP attachment

• More complex sentences:

The board approved its acquisition by BigCo of Milwaukee

for $32 a share at its meeting on Tuesday.

Memory-Based Formulation

• Each example has four components:

  V        N1    P     N2
  examine  man   with  stethoscope     Class = V

• Similarity based on information gain weighting for matching components

• Need 'semantic' similarity measure for words:
  stethoscope ~ thermometer,  kidney ~ leg

MVDM Word Similarity

• Idea: Words are similar to the extent that they predict similar class distributions

  Δ(w1, w2) = Σ_{c ∈ C} |P(c | w1) − P(c | w2)|

• Data sparseness is a serious problem, though!

• Extend idea to task independent similarity metric...
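A minimal sketch of the MVDM distance itself, assuming each word's class distribution is given as a dict from class label to probability:

def mvdm_distance(dist1, dist2):
    # Δ(w1, w2) = Σ_c |P(c | w1) - P(c | w2)|
    classes = set(dist1) | set(dist2)
    return sum(abs(dist1.get(c, 0.0) - dist2.get(c, 0.0)) for c in classes)

# Hypothetical class distributions over the attachment classes V and N
print(mvdm_distance({"V": 0.7, "N": 0.3}, {"V": 0.65, "N": 0.35}))   # small value -> similar words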

Lexical Space

• Represent 'semantics' of a word by frequencies of words which coöccur with it, instead of relative frequencies of classes

• Each word has 4 vectors of frequencies for words 2 before, 1 before, 1 after, and 2 after

  IN:     for (0.05), since (0.10), at (0.11), after (0.11), under (0.11)

  GROUP:  network (0.08), farm (0.11), measure (0.11), package (0.11), chain (0.11), club (0.11), bill (0.11)

  JAPAN:  china (0.16), france (0.16), britain (0.19), canada (0.19), mexico (0.19), india (0.19), australia (0.19), korea (0.22)
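A rough sketch of building these position-specific co-occurrence vectors from a tokenized corpus; the data structures are illustrative, not the original implementation:

from collections import Counter, defaultdict

OFFSETS = (-2, -1, 1, 2)   # 2 before, 1 before, 1 after, 2 after

def lexical_space(sentences):
    # Map each word to four Counters of co-occurring words, one per offset
    vectors = defaultdict(lambda: {o: Counter() for o in OFFSETS})
    for tokens in sentences:
        for i, w in enumerate(tokens):
            for o in OFFSETS:
                j = i + o
                if 0 <= j < len(tokens):
                    vectors[w][o][tokens[j]] += 1
    return vectors

vecs = lexical_space([["she", "went", "from", "albany", "towards", "buffalo"]])
print(vecs["from"][1])   # words seen one position after 'from'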

Results

• Baseline comparisons:
  – Humans (4-tuple): 88.2%
  – Humans (full sentence): 93.2%
  – Noun always: 59.0%
  – Most likely for prep: 72.2%

• Without Info Gain: 83.7%
• With Info Gain: 84.1%

Using Many Features

• Use many features of an example together

• Consider interaction between features during learning

• Each example represented as a feature vector:

x = (f1,f2,...,fn)

Geometric Interpretation

[Figure: examples as points in feature space, classified by kNN vs. by linear separator learning]

Linear Separators

• Linear separator model is a vector of weights: w = (w1, w2, ..., wn)

• Binary classification: Is wTx > 0?
  – 'Positive' and 'Negative' classes

A threshold other than 0 is possible by adding a dummy element of "1" to all vectors – the threshold is just the weight for that element

Error-Based Learning

1. Initialize w to be all 1’s

2. Cycle x through examples repeatedly (random order):

• If wTx > 0 but x is really negative, then decrease w’s elements

• If wTx < 0 but x is really positive, then increase w's elements
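A perceptron-style sketch of this error-driven procedure, using the dummy "1" element from the previous slide for the threshold; the step size is an assumption, not part of the slides:

import random

def train_additive(examples, epochs=10, step=0.1):
    # examples: list of (x, label) with x a list of feature values and label +1 or -1
    dim = len(examples[0][0]) + 1
    w = [1.0] * dim                                  # initialize w to all 1's
    for _ in range(epochs):
        random.shuffle(examples)                     # random order
        for x, label in examples:
            x = x + [1.0]                            # dummy element for the threshold
            score = sum(wi * xi for wi, xi in zip(w, x))
            if score > 0 and label < 0:              # predicted positive, really negative
                w = [wi - step * xi for wi, xi in zip(w, x)]
            elif score < 0 and label > 0:            # predicted negative, really positive
                w = [wi + step * xi for wi, xi in zip(w, x)]
    return w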

Winnow

1. Initialize w to be all 1’s

2. Cycle x through examples repeatedly (random order):

   a) If wTx < 0 but x is really positive, then promote:
      w_i ← w_i · (1 + ε)^x_i

   b) If wTx > 0 but x is really negative, then demote:
      w_i ← w_i · (1 − ε)^x_i
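A rough sketch of Winnow's multiplicative updates for binary features. Standard Winnow keeps all weights nonnegative, so a score can never drop below 0; the sketch therefore uses a fixed positive threshold θ (defaulting to the number of features) in place of the slide's 0, and ε is an assumed update rate:

import random

def train_winnow(examples, epochs=10, eps=0.5, theta=None):
    # examples: list of (x, label) with x a list of 0/1 features and label +1 or -1
    dim = len(examples[0][0])
    theta = float(dim) if theta is None else theta
    w = [1.0] * dim                                  # initialize w to all 1's
    for _ in range(epochs):
        random.shuffle(examples)                     # random order
        for x, label in examples:
            score = sum(wi * xi for wi, xi in zip(w, x))
            if score < theta and label > 0:          # promotion: w_i <- w_i * (1+eps)^x_i
                w = [wi * (1 + eps) ** xi for wi, xi in zip(w, x)]
            elif score >= theta and label < 0:       # demotion:  w_i <- w_i * (1-eps)^x_i
                w = [wi * (1 - eps) ** xi for wi, xi in zip(w, x)]
    return w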

Issues

• No negative weights possible!

– Balanced Winnow:

Formulate weights as the difference of 2 weight vectors: w = w+ - w-

Learn each vector separately, w+ regularly, and w- with polarity reversed

• Multiple classes:
  – Learn one weight vector for each class (learning X vs. not-X)
  – Choose the highest-value result for each example

PP Attachment Features

• Words in each position

• Subsets of the above, e.g: <v=run,p=with>

• Word classes at various levels of generality:
  stethoscope → medical instrument → instrument → device → instrumentation → artifact → object → physical thing

– Derived from WordNet – handmade lexicon

• 15 basic features plus word-class features
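One plausible reading of the "15 basic features" is all non-empty word subsets of the (V, N1, P, N2) tuple (4 singletons + 6 pairs + 4 triples + 1 quadruple); a minimal sketch under that assumption, with an illustrative feature encoding:

from itertools import combinations

def pp_features(v, n1, p, n2):
    # Generate one feature string per non-empty subset of the four components
    parts = [("v", v), ("n1", n1), ("p", p), ("n2", n2)]
    feats = []
    for r in range(1, len(parts) + 1):
        for combo in combinations(parts, r):
            feats.append(",".join(f"{k}={w}" for k, w in combo))
    return feats

feats = pp_features("examine", "man", "with", "stethoscope")
print(len(feats))    # 15
print(feats[4])      # 'v=examine,n1=man'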

Results

• Results without preposition of:

  Base   Word   +1     +5     +10    +15
  58.1   77.4   77.2   79.1   78.5   78.6

• Results including preposition of:

  Winnow     84.8
  MBL        84.4
  Backoff    84.5
  Transform  81.9