Prepositional Phrase Attachment
• To what previous verb or noun phrase does a prepositional phrase (PP) attach?
[Diagram: the sentence "The woman saw a man" followed by the PPs "with a poodle", "in the park", "with a telescope", "on Tuesday", and "on his bicycle", with arrows marking each PP's candidate attachment sites]
A Simplified Version
• Assume ambiguity only between preceding base NP and preceding base VP:
The woman had seen the man with the telescope.
Q: Does the PP attach to the NP or the VP?
• Assumption: Consider only NP/VP head and the preposition
Simple Formulation
• Determine attachment based on the log-likelihood ratio (sketched below):
LLR(v, n, p) = log P(p | v) - log P(p | n)
If LLR > 0, attach to the verb;
if LLR < 0, attach to the noun.
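A minimal Python sketch of this decision rule; the count tables are hypothetical (chosen to match the probabilities quoted on the "Issues" slide below), and real use would need smoothing for unseen (head, preposition) pairs:

```python
import math

# Hypothetical count tables; real use needs smoothing for unseen pairs.
verb_prep_counts = {("end", "with"): 118}
verb_counts = {"end": 1000}
noun_prep_counts = {("venture", "with"): 107}
noun_counts = {"venture": 1000}

def llr(v, n, p):
    """LLR(v, n, p) = log P(p | v) - log P(p | n), estimated by MLE."""
    p_given_v = verb_prep_counts.get((v, p), 0) / verb_counts[v]
    p_given_n = noun_prep_counts.get((n, p), 0) / noun_counts[n]
    return math.log(p_given_v) - math.log(p_given_n)

def attach(v, n, p):
    """Attach to the verb iff LLR > 0, else to the noun."""
    return "verb" if llr(v, n, p) > 0 else "noun"

print(attach("end", "venture", "with"))  # "verb", since 0.118 > 0.107
```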
Issues
• Multiple attachment:
– Attachment lines cannot cross
• Proximity:
– Preference for attaching to closer structures, all else being equal
Chrysler will end its troubled venture with Maserati.
P(with | end) = 0.118
P(with | venture) = 0.107
Here LLR > 0, so the rule wrongly attaches the PP to the verb rather than to venture!
Hindle & Rooth (1993)
• Consider just sentences with a transitive verb and PP, i.e., of the form:
... bVP bNP PP ...
Q: Where does the first PP attach (NP or VP)?
Indicator variables (0 or 1):
VAp: Is there a PP headed by p after v attached to v?
NAp: Is there a PP headed by p after n attached to n?
NB: Both variables can be 1 in the same sentence
Attachment Probabilities
• P(attach(p) = n | v, n) = P(NAp = 1 | n)
– Verb attachment is irrelevant; if it attaches to the noun it cannot attach to the verb
• P(attach(p) = v | v, n)
= P(VAp = 1, NAp = 0 | v, n)
= P(VAp = 1 | v) P(NAp = 0 | n)
– Noun attachment is relevant, since the noun ‘shadows’ the verb (by the proximity principle)
Estimating Parameters
• MLE:
P(VAp = 1 | v) = C(v, p) / C(v)
P(NAp = 1 | n) = C(n, p) / C(n)
• Using an unlabeled corpus:
– Bootstrap from unambiguous cases:
The road from Chicago to New York is long.
She went from Albany towards Buffalo.
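Putting the last two slides together, a minimal sketch of the MLE estimates and the resulting attachment probabilities; the count tables c_vp, c_v, c_np, c_n are assumed to be built elsewhere (e.g., by the bootstrap on the next slide):

```python
def p_va(v, p, c_vp, c_v):
    """MLE estimate of P(VAp = 1 | v) = C(v, p) / C(v)."""
    return c_vp.get((v, p), 0) / c_v[v]

def p_na(n, p, c_np, c_n):
    """MLE estimate of P(NAp = 1 | n) = C(n, p) / C(n)."""
    return c_np.get((n, p), 0) / c_n[n]

def attachment_probs(v, n, p, c_vp, c_v, c_np, c_n):
    """P(attach = n) = P(NAp = 1 | n); P(attach = v) = P(VAp = 1 | v) *
    P(NAp = 0 | n), since the noun 'shadows' the verb."""
    p_noun = p_na(n, p, c_np, c_n)
    p_verb = p_va(v, p, c_vp, c_v) * (1 - p_noun)
    return p_verb, p_noun
```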
Unsupervised Training
1. Build initial model using only unambiguous attachments
2. Apply the initial model and assign attachments where |LLR| is above a threshold
3. Divide remaining ambiguous cases as 0.5 counts for each possibility
Use of EM as principled method?
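A rough single-pass sketch of steps 1–3, reusing attachment_probs from the previous sketch; a principled EM version would iterate, re-estimating the fractional counts each round:

```python
import math
from collections import Counter

def bootstrap(unambiguous, ambiguous, threshold=2.0):
    c_vp, c_v, c_np, c_n = Counter(), Counter(), Counter(), Counter()

    # Step 1: counts from unambiguous attachments; site is "v" or "n".
    for v, n, p, site in unambiguous:
        if site == "v":
            c_vp[v, p] += 1
        else:
            c_np[n, p] += 1
        c_v[v] += 1
        c_n[n] += 1

    # Steps 2-3: score ambiguous (v, n, p) triples with the initial model.
    for v, n, p in ambiguous:
        p_verb, p_noun = attachment_probs(v, n, p, c_vp, c_v, c_np, c_n)
        score = math.log(p_verb) - math.log(p_noun) if p_verb and p_noun else 0.0
        if score > threshold:
            c_vp[v, p] += 1      # confident verb attachment
        elif score < -threshold:
            c_np[n, p] += 1      # confident noun attachment
        else:
            c_vp[v, p] += 0.5    # undecided: split as 0.5 counts each
            c_np[n, p] += 0.5
        c_v[v] += 1
        c_n[n] += 1
    return c_vp, c_v, c_np, c_n
```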
Limitations
• Semantic issues:
I examined the man with a stethoscope.
I examined the man with a broken leg.
• Other contextual features:
– Superlative adjectives (biggest) indicate NP attachment
• More complex sentences:
The board approved its acquisition by BigCo of Milwaukee
for $32 a share at its meeting on Tuesday.
Memory-Based Formulation
• Each example has four components:
V       N1   P     N2
examine man  with  stethoscope
Class = V
• Similarity based on information gain weighting for matching components
• Need ‘semantic’ similarity measure for words:
stethoscope ~ thermometer, kidney ~ leg
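A minimal sketch of this memory-based classifier with made-up information-gain weights for the four slots (real weights are estimated from training data); note how exact matching fails on N2, which is exactly what the next slides address:

```python
def similarity(x, y, weights):
    """Weighted overlap: add a slot's info-gain weight when values match."""
    return sum(w for xi, yi, w in zip(x, y, weights) if xi == yi)

def classify(query, memory, weights):
    """Return the class of the most similar stored example (1-NN)."""
    example, label = max(memory, key=lambda m: similarity(query, m[0], weights))
    return label

# Hypothetical info-gain weights for the slots (V, N1, P, N2).
weights = [0.2, 0.1, 0.6, 0.1]
memory = [(("examine", "man", "with", "stethoscope"), "V"),
          (("examine", "man", "with", "leg"), "N")]

query = ("examine", "man", "with", "thermometer")
# N2 matches neither stored example exactly, so both score 0.9 and the tie
# is broken arbitrarily, which is the sparseness problem that motivates a
# 'semantic' word similarity (next slides).
print(classify(query, memory, weights))
```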
MVDM Word Similarity
• Idea: Words are similar to the extent that they predict similar class distributions
d(w1, w2) = Σ_{c ∈ C} |P(c | w1) - P(c | w2)|
• Data sparseness is a serious problem, though!
• Extend idea to task independent similarity metric...
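A sketch of the metric, with hypothetical class distributions P(class | word) for the two-way attachment task:

```python
def mvdm(w1, w2, class_dist):
    """d(w1, w2) = sum over classes c of |P(c | w1) - P(c | w2)|;
    0 means the two words predict identical class distributions."""
    classes = set(class_dist[w1]) | set(class_dist[w2])
    return sum(abs(class_dist[w1].get(c, 0.0) - class_dist[w2].get(c, 0.0))
               for c in classes)

# Hypothetical P(class | word) tables, estimated from attachment decisions.
class_dist = {
    "stethoscope": {"V": 0.90, "N": 0.10},
    "thermometer": {"V": 0.85, "N": 0.15},
    "leg":         {"V": 0.05, "N": 0.95},
}
print(mvdm("stethoscope", "thermometer", class_dist))  # 0.10 -> similar
print(mvdm("stethoscope", "leg", class_dist))          # 1.70 -> dissimilar
```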
Lexical Space
• Represent ‘semantics’ of a word by frequencies of words which coöccur with it, instead of relative frequencies of classes
• Each word has 4 vectors of frequencies for words 2 before, 1 before, 1 after, and 2 after
IN: for (0.05), since (0.10), at (0.11), after (0.11), under (0.11)
GROUP: network (0.08), farm (0.11), measure (0.11), package (0.11), chain (0.11), club (0.11), bill (0.11)
JAPAN: china (0.16), france (0.16), britain (0.19), canada (0.19), mexico (0.19), india (0.19), australia (0.19), korea (0.22)
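A rough sketch of building the four positional co-occurrence vectors; normalizing them and the distance computation over word pairs are omitted:

```python
from collections import Counter, defaultdict

def lexical_space(corpus, offsets=(-2, -1, 1, 2)):
    """For each word, one frequency vector per offset: how often each
    context word appears 2 before, 1 before, 1 after, and 2 after it."""
    vectors = defaultdict(Counter)
    for sentence in corpus:
        for i, word in enumerate(sentence):
            for off in offsets:
                j = i + off
                if 0 <= j < len(sentence):
                    vectors[word, off][sentence[j]] += 1
    return vectors

vecs = lexical_space([["she", "went", "from", "albany", "towards", "buffalo"]])
print(vecs["from", 1])  # Counter({'albany': 1}): words right after "from"
```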
Results
• Baseline comparisons:
– Humans (4-tuple): 88.2%
– Humans (full sentence): 93.2%
– Noun always: 59.0%
– Most likely for prep: 72.2%
• Without Info Gain: 83.7%
• With Info Gain: 84.1%
Using Many Features
• Use many features of an example together
• Consider interaction between features during learning
• Each example represented as a feature vector:
x = (f1,f2,...,fn)
Geometric Interpretation
[Figure: two panels, "kNN" and "Linear Separator Learning", contrasting their decision boundaries]
Linear Separators
• Linear separator model is a vector of weights: w = (w1, w2, ..., wn)
• Binary classification: Is wTx > 0?
– ‘Positive’ and ‘Negative’ classes
• A threshold other than 0 is possible by adding a dummy element of “1” to all vectors – the threshold is just the weight for that element
Error-Based Learning
1. Initialize w to be all 1’s
2. Cycle x through examples repeatedly (random order):
• If wTx > 0 but x is really negative, then decrease w’s elements
• If wTx < 0 but x is really positive, then increase w’s elements
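A sketch of this error-driven loop with additive (perceptron-style) updates; the learning rate and epoch count are arbitrary choices:

```python
import random

def train_additive(examples, n, epochs=10, lr=0.1):
    """examples: list of (x, label) with x a length-n vector, label +1/-1."""
    w = [1.0] * n                       # step 1: all-ones weights
    for _ in range(epochs):
        random.shuffle(examples)        # step 2: random order each pass
        for x, label in examples:
            score = sum(wi * xi for wi, xi in zip(w, x))
            if score > 0 and label < 0:     # false positive: decrease
                w = [wi - lr * xi for wi, xi in zip(w, x)]
            elif score <= 0 and label > 0:  # false negative: increase
                w = [wi + lr * xi for wi, xi in zip(w, x)]
    return w
```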
Winnow
1. Initialize w to be all 1’s
2. Cycle x through examples repeatedly (random order):
a) If wTx < 0 but x is really positive, then promote: wi ← wi · (1 + ε)^xi
b) If wTx > 0 but x is really negative, then demote: wi ← wi · (1 − ε)^xi
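The same loop with Winnow's multiplicative updates; the threshold θ is made explicit here (classically θ = n) rather than folded into a dummy feature, and ε is an arbitrary choice:

```python
import random

def train_winnow(examples, n, epochs=10, eps=0.5, theta=None):
    """examples: list of (x, label) with binary x_i in {0, 1}, label +1/-1."""
    theta = n if theta is None else theta
    w = [1.0] * n
    for _ in range(epochs):
        random.shuffle(examples)
        for x, label in examples:
            pred_pos = sum(wi * xi for wi, xi in zip(w, x)) > theta
            if not pred_pos and label > 0:   # promote active features
                w = [wi * (1 + eps) ** xi for wi, xi in zip(w, x)]
            elif pred_pos and label < 0:     # demote active features
                w = [wi * (1 - eps) ** xi for wi, xi in zip(w, x)]
    return w
```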
Issues
• No negative weights possible!
– Balanced Winnow:
Formulate weights as the difference of 2 weight vectors: w = w+ - w-
Learn each vector separately, w+ regularly and w- with polarity reversed
• Multiple classes:
– Learn one weight vector for each class (learning X vs. not-X)
– Choose the highest-value result for the example
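A sketch of both fixes: Balanced Winnow scoring with w = w+ - w-, and one-vs-rest prediction over per-class models:

```python
def balanced_score(x, w_pos, w_neg):
    """Effective weights w = w+ - w-, so negative weights become possible."""
    return sum((wp - wn) * xi for wp, wn, xi in zip(w_pos, w_neg, x))

def predict(x, models):
    """models: class -> (w_pos, w_neg); choose the highest-scoring class."""
    return max(models, key=lambda cls: balanced_score(x, *models[cls]))
```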
PP Attachment Features
• Words in each position
• Subsets of the above, e.g.: <v=run, p=with>
• Word classes at various levels of generality:
stethoscope → medical instrument → instrument → device → instrumentation → artifact → object → physical thing
– Derived from WordNet and a hand-made lexicon
• 15 basic features plus word-class features
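A sketch of assembling such features for one example; the word-pair subsets and the hypernyms table here are illustrative, not the exact 15-feature inventory:

```python
def features(v, n1, p, n2, hypernyms):
    """Word features, a couple of word-pair features, and class features at
    every level of generality up the (WordNet-style) hypernym chain."""
    feats = {f"v={v}", f"n1={n1}", f"p={p}", f"n2={n2}",
             f"v={v},p={p}", f"p={p},n2={n2}"}
    for slot, word in (("n1", n1), ("n2", n2)):
        for level, cls in enumerate(hypernyms.get(word, [])):
            feats.add(f"{slot}-class{level}={cls}")
    return feats

# Hypothetical hypernym chain, as on the slide.
chains = {"stethoscope": ["medical instrument", "instrument", "device",
                          "instrumentation", "artifact", "object",
                          "physical thing"]}
print(sorted(features("examine", "man", "with", "stethoscope", chains)))
```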
Results
• Results without preposition of:
Base: 58.1   Word: 77.4   +1: 77.2   +5: 79.1   +10: 78.5   +15: 78.6
• Results including preposition of:
Winnow: 84.8   MBL: 84.4   Backoff: 84.5   Transform: 81.9