
Transcript of Statistical NLP Winter 2009

Page 1: Statistical NLP Winter 2009

Statistical NLP Winter 2009

Lecture 10: Parsing I

Roger Levy

Thanks to Jason Eisner & Dan Klein for slides

Page 2: Statistical NLP Winter 2009

Why is natural language parsing hard?

• As language structure gets more abstract, computing it gets harder

• Document classification
  • finite number of classes
  • fast computation at test time

• Part-of-speech tagging (recovering label sequences)
  • Exponentially many possible tag sequences
  • But exact computation possible in O(n)

• Parsing (recovering labeled trees)
  • Exponentially many, or even infinite, possible trees
  • Exact inference worse than tagging, but still within reach

Page 3: Statistical NLP Winter 2009

Why is parsing harder than tagging

• How many trees are there for a given string?
• Imagine a rule VP → VP

• …∞!

• This is not a problem for inferring availability of structures (why?)

• Nor is this a problem for inferring the most probable structure in a PCFG (why?)

Page 4: Statistical NLP Winter 2009

Why parsing is harder than tagging II

• Ingredient 1: syntactic category ambiguity
  • Exponentially many category sequences, like tagging

• Ingredient 2: attachment ambiguity
  • Classic case: prepositional-phrase (PP) attachment
  • 1 PP: no ambiguity
  • 2 PPs: some ambiguity

Page 5: Statistical NLP Winter 2009

Why parsing is harder than tagging III

• 3 PPs: much more attachment ambiguity!

• 5 PPs: 14 trees, 6 PPs: 42 trees, 7 PPs: 132 trees…

Page 6: Statistical NLP Winter 2009

Why parsing is harder than tagging IV

• Tree-structure ambiguity grows like the Catalan numbers (Knuth, 1975; Church & Patil, 1982)

• This combinatorial (Catalan-number) growth comes on top of the exponential growth associated with sequence-label ambiguity
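For concreteness, here is a quick sketch (not from the slides) of the standard closed form for the Catalan numbers, which count binary-branching tree shapes and so track how attachment ambiguity explodes:

from math import comb

def catalan(n: int) -> int:
    # C_n = (2n choose n) / (n + 1): the number of binary tree shapes over n+1 leaves
    return comb(2 * n, n) // (n + 1)

print([catalan(n) for n in range(1, 8)])   # [1, 2, 5, 14, 42, 132, 429]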

Page 7: Statistical NLP Winter 2009

Why parsing is still tractable

• This all makes parsing look really bad
• But there’s still hope
• Those exponentially many parses are different combinations of common subparts

Page 8: Statistical NLP Winter 2009

How to parse tractably

• Recall that we did HMM part-of-speech tagging by storing partial results in a trellis

• An HMM is a special type of grammar with essentially two types of rules:
  • “Category Y can follow category X (with cost π)”
  • “Category X can be realized as word w (with cost η)”

• The trellis is a graph whose structure reflects its rules
  • Edges between all sequentially adjacent category pairs
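As a reminder of what "storing partial results in a trellis" looks like in code, here is a minimal Viterbi sketch for exactly this kind of two-rule grammar; the names pi (transition costs), eta (emission costs), and START are illustrative assumptions, not the lecture's notation:

def viterbi(words, tags, pi, eta, START="<s>"):
    # best[i][t] = cost of the cheapest tag sequence for words[:i+1] that ends in tag t
    best = [{t: pi[(START, t)] + eta[(t, words[0])] for t in tags}]
    back = [{}]
    for i in range(1, len(words)):
        col, ptr = {}, {}
        for t in tags:
            prev = min(tags, key=lambda s: best[i - 1][s] + pi[(s, t)])
            col[t] = best[i - 1][prev] + pi[(prev, t)] + eta[(t, words[i])]
            ptr[t] = prev
        best.append(col)
        back.append(ptr)
    t = min(tags, key=lambda s: best[-1][s])      # cheapest final tag
    seq = [t]
    for i in range(len(words) - 1, 0, -1):        # follow backpointers
        t = back[i][t]
        seq.append(t)
    return list(reversed(seq))

Each column of best is one slice of the trellis; the whole computation is linear in sentence length, as the previous slide noted.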

Page 9: Statistical NLP Winter 2009

How to parse tractably II

• But a (weighted) CFG has more complicated rules:
  1. “Category X can rewrite as categories α (with cost π)”
  2. “Preterminal X can be realized as word w (with cost η)”

• (2 is really a special case of 1)
• A graph is not rich enough to reflect CFG/tree structure
• Phrases need to be stored as partial results
• We also need rule combination structure

• We’ll do this with hypergraphs

Page 10: Statistical NLP Winter 2009

How to parse tractably III

• Hypergraphs are like graphs, but have hyper-edges instead of edges

• “We observe a DT as word 1 and an NN as word 2.”
• “Together, these let us infer an NP spanning words 1-2.”
• (In the figure: the start state allows us to infer each of the word observations, and both of them together are needed to infer the NP.)
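One simple way to represent such a hyperedge in code (a sketch, not the lecture's data structure): a head node, a list of tail nodes, and a cost. The DT + NN → NP inference above, with an assumed cost of 1, would be:

from collections import namedtuple

# a hyperedge: several tail nodes jointly license one head node, at some cost
Hyperedge = namedtuple("Hyperedge", ["head", "tails", "cost"])

e = Hyperedge(head=("NP", 1, 2),                   # an NP spanning words 1-2
              tails=[("DT", 1, 1), ("NN", 2, 2)],  # needs both of these
              cost=1)                              # assumed rule cost

Ordinary graph edges are just the special case with a single tail node.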

Page 11: Statistical NLP Winter 2009

How to parse tractably IV

• Hypergraph for “Bird shot flies” (only partial)
• Grammar: S → NP VP; VP → V NP; VP → V; NP → N; NP → N N
• [Figure: hypergraph nodes for constituents spanning words 1-2, 2-3, and 1-3, connected by hyperedges leading to the Goal node.]

Page 12: Statistical NLP Winter 2009

How to parse tractably V

• The nodes in the hypergraph can be thought of as being arranged in a triangle

• For a sentence of length N, this is the upper right triangle of an N×N matrix

• This matrix is called the parse chart
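A minimal sketch of the chart as a data structure (the indexing convention is an assumption: cell (i, j) covers words i+1 through j, so only the upper triangle i < j is ever used):

def empty_chart(n):
    # one cell per span; each cell maps a category to its best weight so far
    return {(i, j): {} for i in range(n) for j in range(i + 1, n + 1)}

chart = empty_chart(5)        # for a 5-word sentence
chart[(3, 5)]["NP"] = 10      # e.g. an NP of weight 10 over words 4-5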

Page 13: Statistical NLP Winter 2009

How to parse tractably VI

• Before we study examples of parsing, let’s linger on the hypergraph for a moment

• The goal of parsing is to fully interconnect all the evidence (words) and the goal

• This could be done from the bottom up…

• …or from the top down & left to right
• These correspond to different parse strategies
• Today: bottom-up (later: top-down)

Page 14: Statistical NLP Winter 2009

Bottom-up (CKY) parsing

• Bottom-up is the most straightforward efficient parsing algorithm to implement

• Known as the Cocke-Kasami-Younger (CKY) algorithm
• We’ll illustrate it for the weighted-CFG case
• Each rule has a weight (a negative log-probability) associated with it
• We’re looking for the “lightest” (lowest-weight or, equivalently, highest-probability) tree T for sentence S
• Implicitly this is Bayes’ rule!
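Here is a minimal Python sketch of weighted (min-cost) CKY in this spirit. The rule encoding, names, and chart layout are assumptions for illustration, not the lecture's code, and the grammar is assumed to be already binary (right-hand sides of length at most 2; see the end of the lecture):

def cky(words, binary_rules, lexical_rules):
    # binary_rules: (cost, X, Y, Z) meaning X -> Y Z; lexical_rules: (cost, X, w) meaning X -> w
    n = len(words)
    chart = {(i, j): {} for i in range(n) for j in range(i + 1, n + 1)}
    back = {}                                      # backpointers for recovering the best tree
    for i, w in enumerate(words):                  # width-1 spans: lexical rules
        for cost, X, word in lexical_rules:
            if word == w and cost < chart[(i, i + 1)].get(X, float("inf")):
                chart[(i, i + 1)][X] = cost
                back[(i, i + 1, X)] = word
    for width in range(2, n + 1):                  # wider spans, narrowest first
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):              # split point
                for cost, X, Y, Z in binary_rules:
                    if Y in chart[(i, k)] and Z in chart[(k, j)]:
                        total = cost + chart[(i, k)][Y] + chart[(k, j)][Z]
                        if total < chart[(i, j)].get(X, float("inf")):
                            chart[(i, j)][X] = total
                            back[(i, j, X)] = (k, Y, Z)
    return chart, back

Keeping only the cheapest entry per category in each cell is exactly the "keep only best-in-class" pruning the chart walkthrough below arrives at.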

Page 15: Statistical NLP Winter 2009

CKY parsing II

• Here’s the (partial) grammar we’ll use:

  1  S   → NP VP
  6  S   → Vst NP
  2  S   → S PP
  1  VP  → V NP
  2  VP  → VP PP
  1  NP  → Det N
  2  NP  → NP PP
  3  NP  → NP NP
  0  PP  → P NP

  Lexicon:
  3  NP  → time
  3  Vst → time   (imperative verb: “Do the dishes!”)
  4  NP  → flies
  4  VP  → flies
  2  P   → like
  5  V   → like
  1  Det → an
  8  N   → arrow

• The sentence we’ll parse (see the ambiguity?):

  Time flies like an arrow
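For illustration, this grammar and lexicon can be fed directly to the cky() sketch from the previous page (the tuple encoding is that sketch's assumption, not the lecture's notation):

binary_rules = [
    (1, "S", "NP", "VP"), (6, "S", "Vst", "NP"), (2, "S", "S", "PP"),
    (1, "VP", "V", "NP"), (2, "VP", "VP", "PP"),
    (1, "NP", "Det", "N"), (2, "NP", "NP", "PP"), (3, "NP", "NP", "NP"),
    (0, "PP", "P", "NP"),
]
lexical_rules = [
    (3, "NP", "time"), (3, "Vst", "time"), (4, "NP", "flies"), (4, "VP", "flies"),
    (2, "P", "like"), (5, "V", "like"), (1, "Det", "an"), (8, "N", "arrow"),
]
chart, back = cky("time flies like an arrow".split(), binary_rules, lexical_rules)
print(chart[(0, 5)]["S"])    # 22: the lightest S over the whole sentence

The value 22 matches the best S entry that the chart walkthrough below arrives at.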

Page 16: Statistical NLP Winter 2009

The chart slides that follow all show the same layout: the parse chart for "Time flies like an arrow" (one cell per span of words) with the grammar/lexicon box from Page 15 alongside. The grammar box is omitted from the reconstructed frames below.

Initial chart: only the width-1 (lexical) cells are filled:

  words 1-1 "time":  NP 3, Vst 3
  words 2-2 "flies": NP 4, VP 4
  words 3-3 "like":  P 2, V 5
  words 4-4 "an":    Det 1
  words 5-5 "arrow": N 8

Page 17: Statistical NLP Winter 2009

Chart as on the previous slide: only the lexical cells are filled so far.

Page 18: Statistical NLP Winter 2009

New chart entry: NP, weight 10, over "time flies" (words 1-2), via NP → NP NP (rule 3 + 3 + 4).

Page 19: Statistical NLP Winter 2009

New chart entry: S, weight 8, over words 1-2, via S → NP VP (rule 1 + 3 + 4).

Page 20: Statistical NLP Winter 2009

New chart entry: S, weight 13, over words 1-2, via S → Vst NP (rule 6 + 3 + 4).

Page 21: Statistical NLP Winter 2009

Chart unchanged from the previous slide.

Page 22: Statistical NLP Winter 2009

New chart entry: NP, weight 10, over "an arrow" (words 4-5), via NP → Det N (rule 1 + 1 + 8).

Page 23: Statistical NLP Winter 2009

Chart unchanged from the previous slide.

Page 24: Statistical NLP Winter 2009

New chart entry: PP, weight 12, over "like an arrow" (words 3-5), via PP → P NP (rule 0 + 2 + 10).

Page 25: Statistical NLP Winter 2009

New chart entry: VP, weight 16, over words 3-5, via VP → V NP (rule 1 + 5 + 10).

Page 26: Statistical NLP Winter 2009

Chart unchanged from the previous slide.

Page 27: Statistical NLP Winter 2009

New chart entry: NP, weight 18, over "flies like an arrow" (words 2-5), via NP → NP PP (rule 2 + 4 + 12).

Page 28: Statistical NLP Winter 2009

New chart entry: S, weight 21, over words 2-5, via S → NP VP (rule 1 + 4 + 16).

Page 29: Statistical NLP Winter 2009

New chart entry: VP, weight 18, over words 2-5, via VP → VP PP (rule 2 + 4 + 12).

Page 30: Statistical NLP Winter 2009

Chart unchanged from the previous slide.

Page 31: Statistical NLP Winter 2009

New chart entry: NP, weight 24, over the whole sentence (words 1-5).

Page 32: Statistical NLP Winter 2009

New chart entry: S, weight 22, over words 1-5.

Page 33: Statistical NLP Winter 2009

New chart entry: S, weight 27, over words 1-5.

Page 34: Statistical NLP Winter 2009

Chart unchanged from the previous slide.

Page 35: Statistical NLP Winter 2009

New chart entry: a second derivation of NP, weight 24, over words 1-5.

Page 36: Statistical NLP Winter 2009

New chart entry: a second derivation of S, weight 27, over words 1-5.

Page 37: Statistical NLP Winter 2009

New chart entry: a second derivation of S, weight 22, over words 1-5.

Page 38: Statistical NLP Winter 2009

New chart entry: a third derivation of S, weight 27, over words 1-5. The top cell now holds NP 24, S 22, S 27, NP 24, S 27, S 22, S 27.

Page 39: Statistical NLP Winter 2009

The chart is complete. Starting from the best S over the whole sentence (weight 22), follow backpointers …

Page 40: Statistical NLP Winter 2009

… the backpointer for that S is the rule S → NP VP, combining NP 3 ("time") with the VP of weight 18 over words 2-5 …

Page 41: Statistical NLP Winter 2009

… that VP came from VP → VP PP, combining VP 4 ("flies") with the PP of weight 12 over words 3-5 …

Page 42: Statistical NLP Winter 2009

… the PP came from PP → P NP, combining P 2 ("like") with the NP of weight 10 over words 4-5 …

Page 43: Statistical NLP Winter 2009

… and that NP came from NP → Det N ("an" + "arrow"). The recovered best parse, weight 22, is (S (NP time) (VP (VP flies) (PP (P like) (NP (Det an) (N arrow))))).

Page 44: Statistical NLP Winter 2009

Looking back at the completed chart: which entries do we actually need?

Page 45: Statistical NLP Winter 2009

Which entries do we need? (continued)

Page 46: Statistical NLP Winter 2009

An entry that is costlier than another entry of the same category in the same cell is not worth keeping …

Page 47: Statistical NLP Winter 2009

… since it just breeds worse options.

Page 48: Statistical NLP Winter 2009

Keep only best-in-class! The costlier duplicates are “inferior stock”.

Page 49: Statistical NLP Winter 2009

Pruned chart, keeping only the best entry per category in each cell (plus backpointers so you can recover the parse):

  words 1-1 "time":  NP 3, Vst 3
  words 2-2 "flies": NP 4, VP 4
  words 3-3 "like":  P 2, V 5
  words 4-4 "an":    Det 1
  words 5-5 "arrow": N 8
  words 1-2:         NP 10, S 8
  words 4-5:         NP 10
  words 3-5:         PP 12, VP 16
  words 2-5:         NP 18, S 21, VP 18
  words 1-5:         NP 24, S 22

Keep only best-in-class! (and backpointers so you can recover the parse)

Page 50: Statistical NLP Winter 2009

Computational complexity of parsing

• This approach has good space complexity
  • O(G·N²), where G is the number of categories in the grammar

• What is the time complexity of the algorithm?
  • It’s cubic in N… why?

• What about time complexity in G?
  • First, a clarification is in order
  • CFG rules can have right-hand sides of arbitrary length: X → α
  • But CKY works only with right-hand sides of length at most 2

• So we need to convert the CFG for use with CKY

Page 51: Statistical NLP Winter 2009

Computational complexity II

• Any CFG can be transformed into a new CFG whose rules are at most binary-branching (|α| ≤ 2)
  • (Look up Chomsky normal form in the book for an example)

• This transformation is reversible with no loss of information
• It’s also possible to similarly transform weighted CFGs
• This makes CKY possible, and it is cubic in G
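A minimal sketch of such a binarization (the naming scheme for the fresh intermediate categories is an assumption; this only splits long right-hand sides, which is what CKY needs, rather than performing the full Chomsky-normal-form construction):

def binarize(rules):
    # rules: list of (weight, lhs, rhs_tuple); returns an equivalent list with |rhs| <= 2
    out, n = [], 0
    for w, lhs, rhs in rules:
        cur, rest, cost = lhs, list(rhs), w
        while len(rest) > 2:
            n += 1
            new = f"{lhs}~{n}"                      # fresh intermediate category
            out.append((cost, cur, (rest[0], new)))
            cur, rest, cost = new, rest[1:], 0      # later steps carry zero weight
        out.append((cost, cur, tuple(rest)))
    return out

print(binarize([(2, "VP", ("V", "NP", "PP", "PP"))]))
# [(2, 'VP', ('V', 'VP~1')), (0, 'VP~1', ('NP', 'VP~2')), (0, 'VP~2', ('PP', 'PP'))]

Because the intermediate categories record where they came from, the transformation can be undone, and the weight of each original rule is preserved along its chain of binary rules.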