Probabilistic and Lexicalized Parsing


Page 1: Probabilistic and Lexicalized Parsing


Page 2: Probabilistic CFGs

• Weighted CFGs
  – Attach weights to the rules of a CFG
  – Compute weights of derivations
  – Use weights to pick preferred parses
• Utility: pruning and ordering the search space, disambiguation, language model for ASR
• Parsing with weighted grammars (like weighted FAs): T* = argmax_T W(T, S)
• Probabilistic CFGs are one form of weighted CFG.

Page 3: Probability Model

• Rule probability:
  – Attach probabilities to grammar rules
  – Expansions for a given non-terminal sum to 1:
      R1: VP → V         .55
      R2: VP → V NP      .40
      R3: VP → V NP NP   .05
  – Estimate the probabilities from annotated corpora: P(R1) = count(R1) / count(VP)
• Derivation probability (sketched in code below):
  – Derivation T = {R1, …, Rn}
  – Probability of a derivation: P(T) = ∏_{i=1..n} P(R_i)
  – Most likely parse: T* = argmax_T P(T)
  – Probability of a sentence: P(S) = Σ_T P(T, S), summing over all possible derivations of the sentence
• Note the independence assumption: a rule's probability does not change based on where in the derivation the rule is expanded.
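A minimal sketch of this model in Python (toy grammar; the probabilities are the R1-R3 values above, everything else is a hypothetical encoding):

    import math

    # P(R) = count(R) / count(LHS), here hard-coded with the R1-R3 values above.
    rule_prob = {
        ("VP", ("V",)): 0.55,             # R1: VP -> V
        ("VP", ("V", "NP")): 0.40,        # R2: VP -> V NP
        ("VP", ("V", "NP", "NP")): 0.05,  # R3: VP -> V NP NP
    }

    def derivation_log_prob(rules):
        # P(T) = product of P(R_i); summing logs avoids underflow on long derivations.
        return sum(math.log(rule_prob[r]) for r in rules)

    def best_parse(candidate_derivations):
        # T* = argmax_T P(T)
        return max(candidate_derivations, key=derivation_log_prob)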

Page 4: Structural Ambiguity

• S → NP VP
• VP → V NP
• NP → NP PP
• VP → VP PP
• PP → P NP
• NP → John | Mary | Denver
• V → called
• P → from

John called Mary from Denver

Two parses, shown as trees on the slide (and scored below):
  PP attached to the VP: [S [NP John] [VP [VP [V called] [NP Mary]] [PP [P from] [NP Denver]]]]
  PP attached to the NP: [S [NP John] [VP [V called] [NP [NP Mary] [PP [P from] [NP Denver]]]]]
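Connecting this to the probability model of page 3: with invented probabilities for these rules (the lexical rules are shared by both parses and omitted), the two attachments score differently:

    # Hypothetical probabilities for the page-4 rules, invented for illustration.
    P = {"S -> NP VP": 1.0, "VP -> V NP": 0.6, "VP -> VP PP": 0.3,
         "NP -> NP PP": 0.2, "PP -> P NP": 1.0}

    vp_attach = ["S -> NP VP", "VP -> VP PP", "VP -> V NP", "PP -> P NP"]
    np_attach = ["S -> NP VP", "VP -> V NP", "NP -> NP PP", "PP -> P NP"]

    def prob(rules):
        out = 1.0
        for r in rules:
            out *= P[r]
        return out

    print(prob(vp_attach), prob(np_attach))  # 0.18 vs. 0.12: VP attachment wins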

Page 5: Cocke-Younger-Kasami Parser

• Bottom-up parser with top-down filtering
• Start state(s): (A, i, i+1) for each A → w_{i+1}
• End state: (S, 0, n), where n is the input size
• Next-state rule: (B, i, k), (C, k, j) ⇒ (A, i, j) if A → B C (see the recognizer sketch below)
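A compact recognizer built from these states (a sketch; the grammar is assumed to be in Chomsky normal form, and the lexicon/binary_rules encodings are my own):

    from collections import defaultdict

    def cky_recognize(words, lexicon, binary_rules):
        # lexicon: word -> set of A with A -> word   (start states (A, i, i+1))
        # binary_rules: (B, C) -> set of A with A -> B C
        n = len(words)
        chart = defaultdict(set)              # (i, j) -> non-terminals over words[i:j]
        for i, w in enumerate(words):
            chart[i, i + 1] |= lexicon[w]
        for span in range(2, n + 1):
            for i in range(n - span + 1):
                j = i + span
                for k in range(i + 1, j):     # (B, i, k), (C, k, j) => (A, i, j)
                    for B in chart[i, k]:
                        for C in chart[k, j]:
                            chart[i, j] |= binary_rules.get((B, C), set())
        return "S" in chart[0, n]             # end state (S, 0, n)

Encoded this way, the page-4 grammar accepts "John called Mary from Denver".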

Page 6: Example

John called Mary from Denver

Page 7: Base Case: A → w

[Chart diagonal filled from the lexicon: NP (John), V (called), NP (Mary), P (from), NP (Denver).]

Page 8: Recursive Cases: A → B C

[Pages 8 through 20 fill the chart one span at a time; X marks a span for which no constituent can be built:
  "John called" (NP V): no rule, X
  "called Mary" (V NP): VP
  "Mary from": no rule, X
  "from Denver" (P NP): PP
  "John called Mary" (NP VP): S
  "called Mary from": no rule, X
  "Mary from Denver" (NP PP): NP
  "John called Mary from": no rule, X
  "called Mary from Denver": two entries, VP1 (VP → VP PP) and VP2 (VP → V NP)
  "John called Mary from Denver" (NP VP): S, reachable through either VP1 or VP2, giving the two parses of page 4.]

Page 21: Probabilistic CKY

• Assign probabilities to constituents as they are completed and placed in the table
• Computing the probability:

  P(A → BC, i, j) = P(B, i, k) · P(C, k, j) · P(A → BC)
  P(A, i, j) = max over rules A → BC and split points k of P(A → BC, i, j)

• Since we are interested in the max P(S, 0, n), use the max probability for each constituent
• Maintain back-pointers to recover the parse (see the sketch below)
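The recurrence drops into the recognizer from page 5 almost unchanged; a sketch, with the same assumed encodings but now carrying probabilities:

    from collections import defaultdict

    def pcky(words, lexicon, binary_rules):
        # lexicon: word -> {A: P(A -> word)}
        # binary_rules: (B, C) -> {A: P(A -> B C)}
        n = len(words)
        prob = defaultdict(dict)   # (i, j) -> {A: max probability over derivations}
        back = defaultdict(dict)   # (i, j) -> {A: (k, B, C)} back-pointers
        for i, w in enumerate(words):
            prob[i, i + 1] = dict(lexicon[w])
        for span in range(2, n + 1):
            for i in range(n - span + 1):
                j = i + span
                for k in range(i + 1, j):
                    for B, pb in prob[i, k].items():
                        for C, pc in prob[k, j].items():
                            for A, pr in binary_rules.get((B, C), {}).items():
                                p = pb * pc * pr  # P(B,i,k) * P(C,k,j) * P(A -> BC)
                                if p > prob[i, j].get(A, 0.0):
                                    prob[i, j][A] = p
                                    back[i, j][A] = (k, B, C)
        return prob[0, n].get("S", 0.0), back

Following the back-pointers down from ("S", 0, n) recovers the most probable parse.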

Page 22: Problems with PCFGs

• The probability model we're using is based only on the rules in the derivation.
• Lexical insensitivity:
  – Doesn't use the words in any real way
  – But structural disambiguation is lexically driven
    • PP attachment often depends on the verb, its object, and the preposition
    • I ate pickles with a fork.
    • I ate pickles with relish.
• Context insensitivity of the derivation:
  – Doesn't take into account where in the derivation a rule is used
    • Pronouns are more often subjects than objects
    • She hates Mary.
    • Mary hates her.
• Solution: lexicalization
  – Add lexical information to each rule

Page 23: An Example of Lexical Information: Heads

• Make use of the notion of the head of a phrase (a toy head-finder is sketched below)
  – The head of an NP is its noun
  – The head of a VP is its main verb
  – The head of a PP is its preposition
• Each LHS of a rule in the PCFG carries a lexical item
• Each RHS non-terminal carries a lexical item
  – One of the lexical items is shared with the LHS
• If |R| is the number of binary-branching rules in the CFG, the lexicalized CFG has O(2 · |Σ| · |R|) rules, where Σ is the vocabulary
• Unary rules: O(|Σ| · |R|)
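A toy head-finder consistent with the table above (the tree encoding and the HEAD_CHILD table are assumptions made for illustration):

    # Head child per phrase type, per the slide: the head of an NP is a noun,
    # of a VP the main verb, of a PP its preposition.
    HEAD_CHILD = {"NP": "N", "VP": "V", "PP": "P"}

    def head_word(tree):
        # A tree is (label, word) at a leaf and (label, [children]) elsewhere.
        label, rest = tree
        if isinstance(rest, str):                  # leaf: the word itself
            return rest
        for child in rest:
            if child[0] == HEAD_CHILD.get(label):  # descend into the head child
                return head_word(child)
        return head_word(rest[0])                  # fallback: first child

    # head_word(("VP", [("V", "called"), ("NP", [("N", "Mary")])])) == "called"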

Page 24: Example (Correct Parse)

[Lexicalized parse tree, annotated attribute-grammar style, shown on the slide.]

Page 25: Example (Less Preferred)

[The competing lexicalized parse tree shown on the slide.]

Page 26: Computing Lexicalized Rule Probabilities

• We started with rule probabilities:
  – VP → V NP PP        P(rule | VP)
    • E.g., the count of this rule divided by the number of VPs in a treebank
• Now we want lexicalized probabilities:
  – VP(dumped) → V(dumped) NP(sacks) PP(in)
  – P(rule | VP ∧ dumped is the verb ∧ sacks is the head of the NP ∧ in is the head of the PP)
  – Not likely to have significant counts in any treebank (spelled out below)
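To see why the counts vanish, here is the maximum-likelihood estimate written out (counts is a hypothetical treebank count table; the key layout is my own):

    def lexicalized_rule_prob(lhs, rhs, heads, counts):
        # MLE: count(rule with these head words) / count(LHS with its head word).
        # heads = (lhs_head, child_1_head, ...), e.g. ("dumped", "dumped", "sacks", "in").
        num = counts.get((lhs, rhs, heads), 0)
        den = counts.get((lhs, heads[0]), 0)
        # With full lexicalization, num is almost always 0 or 1 in any treebank.
        return num / den if den else 0.0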

Page 27: Another Example

• Consider the VPs:
  – ate spaghetti with gusto
  – ate spaghetti with marinara
• The dependency that disambiguates is not between mother and child.

[Two lexicalized trees shown on the slide:
  PP attached to the VP: [VP(ate) [VP(ate) [V ate] [NP spaghetti]] [PP(with) with gusto]]
  PP attached to the NP: [VP(ate) [V ate] [NP(spag) [NP spaghetti] [PP(with) with marinara]]]]

Page 28: Log-linear Models for Parsing

• Why restrict the conditioning to the elements of a rule?
  – Use an even larger context
  – Word sequence, word types, sub-tree context, etc.
• In general, compute P(y|x), where the f_i(x, y) test properties of the context and λ_i is the weight of feature i:

  P(y|x) = exp(Σ_i λ_i · f_i(x, y)) / Σ_{y'∈Y} exp(Σ_i λ_i · f_i(x, y'))

• Use these as scores in the CKY algorithm to find the best-scoring parse (transcribed in code below).
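A direct transcription of the formula (a sketch; the feature functions, weights, and candidate set Y are whatever you plug in):

    import math

    def log_linear_prob(x, y, Y, features, weights):
        # P(y|x) = exp(sum_i w_i * f_i(x, y)), normalized over all y' in Y.
        def score(cand):
            return math.exp(sum(w * f(x, cand) for f, w in zip(features, weights)))
        return score(y) / sum(score(cand) for cand in Y)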

Page 29: Supertagging: Almost Parsing

Poachers now control the underground trade

[The slide pairs each word with its set of candidate supertags, elementary trees that encode the word's local syntactic context: noun and clause trees for "poachers" and "trade", S- and VP-modifying adverb trees for "now", transitive-clause trees for "control", a determiner tree for "the", and adjective-modifier trees for "underground". Choosing the right supertag for each word resolves most of the parse, hence "almost parsing".]

Page 30: Summary

• Parsing context-free grammars
  – Top-down and bottom-up parsers
  – Mixed approaches (CKY, Earley parsers)
• Preferences over parses using probabilities
  – Parsing with the PCFG and probabilistic CKY algorithms
• Enriching the probability model
  – Lexicalization
  – Log-linear models for parsing