Algorithms for NLP - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/11711fa18/slides/FA18 11-711...

114
Parsing III Maria Ryskina – CMU Slides adapted from: Dan Klein – UC Berkeley Taylor Berg-Kirkpatrick, Yulia Tsvetkov – CMU Algorithms for NLP

Transcript of Algorithms for NLP - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/11711fa18/slides/FA18 11-711...

Page 1: Algorithms for NLP - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/11711fa18/slides/FA18 11-711 Lecture 15 - parsing III with...Algorithms for NLP. Learning PCFGs. Treebank PCFGs

Parsing IIIMaria Ryskina – CMU

Slides adapted from: Dan Klein – UC Berkeley

Taylor Berg-Kirkpatrick, Yulia Tsvetkov – CMU

Algorithms for NLP

Page 2: Algorithms for NLP - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/11711fa18/slides/FA18 11-711 Lecture 15 - parsing III with...Algorithms for NLP. Learning PCFGs. Treebank PCFGs

Learning PCFGs

Page 3: Algorithms for NLP - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/11711fa18/slides/FA18 11-711 Lecture 15 - parsing III with...Algorithms for NLP. Learning PCFGs. Treebank PCFGs

Treebank PCFGs

▪ Use PCFGs for broad coverage parsing ▪ Can take a grammar right off the trees (doesn’t work well):

ROOT → S 1

S → NP VP . 1

NP → PRP 1

VP → VBD ADJP 1

…..

Model F1Baseline 72.0

[Charniak 96]

Page 4: Algorithms for NLP - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/11711fa18/slides/FA18 11-711 Lecture 15 - parsing III with...Algorithms for NLP. Learning PCFGs. Treebank PCFGs

Conditional Independence?

▪ Not every NP expansion can fill every NP slot ▪ A grammar with symbols like “NP” won’t be context-free ▪ Statistically, conditional independence too strong

Page 5: Algorithms for NLP - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/11711fa18/slides/FA18 11-711 Lecture 15 - parsing III with...Algorithms for NLP. Learning PCFGs. Treebank PCFGs

Non-Independence

▪ Independence assumptions are often too strong.

▪ Example: the expansion of an NP is highly dependent on the parent of the NP (i.e., subjects vs. objects).

▪ Also: the subject and object expansions are correlated!

NP PP DT NN PRP

6%9%11%

NP PP DT NN PRP

21%

9%9%

NP PP DT NN PRP

4%7%

23%All NPs NPs under S NPs under VP

Page 6: Algorithms for NLP - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/11711fa18/slides/FA18 11-711 Lecture 15 - parsing III with...Algorithms for NLP. Learning PCFGs. Treebank PCFGs

Grammar Refinement

▪ Example: PP attachment

Page 7: Algorithms for NLP - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/11711fa18/slides/FA18 11-711 Lecture 15 - parsing III with...Algorithms for NLP. Learning PCFGs. Treebank PCFGs

Grammar Refinement

▪ Example: PP attachment

Page 8: Algorithms for NLP - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/11711fa18/slides/FA18 11-711 Lecture 15 - parsing III with...Algorithms for NLP. Learning PCFGs. Treebank PCFGs

Grammar Refinement

▪ Example: PP attachment

Page 9: Algorithms for NLP - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/11711fa18/slides/FA18 11-711 Lecture 15 - parsing III with...Algorithms for NLP. Learning PCFGs. Treebank PCFGs

Grammar Refinement

▪ Example: PP attachment

Page 10: Algorithms for NLP - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/11711fa18/slides/FA18 11-711 Lecture 15 - parsing III with...Algorithms for NLP. Learning PCFGs. Treebank PCFGs

Grammar Refinement

▪ Example: PP attachment

Page 11: Algorithms for NLP - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/11711fa18/slides/FA18 11-711 Lecture 15 - parsing III with...Algorithms for NLP. Learning PCFGs. Treebank PCFGs

Grammar Refinement

▪ Example: PP attachment

Page 12: Algorithms for NLP - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/11711fa18/slides/FA18 11-711 Lecture 15 - parsing III with...Algorithms for NLP. Learning PCFGs. Treebank PCFGs

Grammar Refinement

▪ Example: PP attachment

Page 13: Algorithms for NLP - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/11711fa18/slides/FA18 11-711 Lecture 15 - parsing III with...Algorithms for NLP. Learning PCFGs. Treebank PCFGs

Grammar Refinement

Page 14: Algorithms for NLP - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/11711fa18/slides/FA18 11-711 Lecture 15 - parsing III with...Algorithms for NLP. Learning PCFGs. Treebank PCFGs

Grammar Refinement

▪ Structural Annotation [Johnson ’98, Klein&Manning ’03]

Page 15: Algorithms for NLP - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/11711fa18/slides/FA18 11-711 Lecture 15 - parsing III with...Algorithms for NLP. Learning PCFGs. Treebank PCFGs

Grammar Refinement

▪ Structural Annotation [Johnson ’98, Klein&Manning ’03]▪ Lexicalization [Collins ’99, Charniak ’00]

Page 16: Algorithms for NLP - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/11711fa18/slides/FA18 11-711 Lecture 15 - parsing III with...Algorithms for NLP. Learning PCFGs. Treebank PCFGs

Grammar Refinement

▪ Structural Annotation [Johnson ’98, Klein&Manning ’03]▪ Lexicalization [Collins ’99, Charniak ’00]▪ Latent Variables [Matsuzaki et al. ’05, Petrov et al. ’06]

Page 17: Algorithms for NLP - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/11711fa18/slides/FA18 11-711 Lecture 15 - parsing III with...Algorithms for NLP. Learning PCFGs. Treebank PCFGs

Structural Annotation

Page 18: Algorithms for NLP - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/11711fa18/slides/FA18 11-711 Lecture 15 - parsing III with...Algorithms for NLP. Learning PCFGs. Treebank PCFGs

▪ Annotation refines base treebank symbols to improve statistical fit of the grammar ▪ Structural annotation

The Game of Designing a Grammar

Page 19: Algorithms for NLP - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/11711fa18/slides/FA18 11-711 Lecture 15 - parsing III with...Algorithms for NLP. Learning PCFGs. Treebank PCFGs

Typical Experimental Setup

▪ Corpus: Penn Treebank, WSJ

▪ Accuracy – F1: harmonic mean of per-node labeled precision and recall.

▪ Here: also size – number of symbols in grammar.

Training: sections 02-21Development: section 22 (here, first 20 files)Test: section 23

Page 20: Algorithms for NLP - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/11711fa18/slides/FA18 11-711 Lecture 15 - parsing III with...Algorithms for NLP. Learning PCFGs. Treebank PCFGs

Vertical Markovization

▪ Vertical Markov order: rewrites depend on past k ancestor nodes.

(cf. parent annotation)

Order 1 Order 2

72%

74%

76%

78%

Vertical Markov Order

1 2 3

Sym

bols

0

6250

12500

18750

25000

Vertical Markov Order

1 2 3

Page 21: Algorithms for NLP - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/11711fa18/slides/FA18 11-711 Lecture 15 - parsing III with...Algorithms for NLP. Learning PCFGs. Treebank PCFGs

Horizontal Markovization

71%

72%

72%

73%

73%

Horizontal Markov Order

0 1 2 inf

Sym

bols

0

3000

6000

9000

12000

Horizontal Markov Order

0 1 2 inf

Order 1 Order ∞

Page 22: Algorithms for NLP - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/11711fa18/slides/FA18 11-711 Lecture 15 - parsing III with...Algorithms for NLP. Learning PCFGs. Treebank PCFGs

Binarization / MarkovizationNP

DT JJNN NN

Page 23: Algorithms for NLP - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/11711fa18/slides/FA18 11-711 Lecture 15 - parsing III with...Algorithms for NLP. Learning PCFGs. Treebank PCFGs

Binarization / MarkovizationNP

DT JJNN NN

v=1,h=∞

Page 24: Algorithms for NLP - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/11711fa18/slides/FA18 11-711 Lecture 15 - parsing III with...Algorithms for NLP. Learning PCFGs. Treebank PCFGs

Binarization / MarkovizationNP

DT JJNN NN

v=1,h=∞

NP

Page 25: Algorithms for NLP - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/11711fa18/slides/FA18 11-711 Lecture 15 - parsing III with...Algorithms for NLP. Learning PCFGs. Treebank PCFGs

Binarization / MarkovizationNP

DT JJNN NN

v=1,h=∞

DT

NP

Page 26: Algorithms for NLP - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/11711fa18/slides/FA18 11-711 Lecture 15 - parsing III with...Algorithms for NLP. Learning PCFGs. Treebank PCFGs

Binarization / MarkovizationNP

DT JJNN NN

v=1,h=∞

DT

NP

@NP[DT]

Page 27: Algorithms for NLP - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/11711fa18/slides/FA18 11-711 Lecture 15 - parsing III with...Algorithms for NLP. Learning PCFGs. Treebank PCFGs

Binarization / MarkovizationNP

DT JJNN NN

v=1,h=∞

DT

NP

@NP[DT]

JJ @NP[DT,JJ]

Page 28: Algorithms for NLP - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/11711fa18/slides/FA18 11-711 Lecture 15 - parsing III with...Algorithms for NLP. Learning PCFGs. Treebank PCFGs

Binarization / MarkovizationNP

DT JJNN NN

v=1,h=∞

DT

NP

@NP[DT]

@NP[DT,JJ,NN] NN

JJ @NP[DT,JJ]

Page 29: Algorithms for NLP - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/11711fa18/slides/FA18 11-711 Lecture 15 - parsing III with...Algorithms for NLP. Learning PCFGs. Treebank PCFGs

Binarization / MarkovizationNP

DT JJNN NN

v=1,h=∞

DT

NP

@NP[DT]

@NP[DT,JJ,NN] NN

JJ @NP[DT,JJ]

NN

Page 30: Algorithms for NLP - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/11711fa18/slides/FA18 11-711 Lecture 15 - parsing III with...Algorithms for NLP. Learning PCFGs. Treebank PCFGs

Binarization / MarkovizationNP

DT JJNN NN

v=1,h=∞

DT

NP

@NP[DT]

@NP[DT,JJ,NN] NN

JJ @NP[DT,JJ]

NN

v=1,h=1

DT

NP

JJ

@NP[DT]

@NP[…,NN] NN

@NP[…,JJ]

NN

Page 31: Algorithms for NLP - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/11711fa18/slides/FA18 11-711 Lecture 15 - parsing III with...Algorithms for NLP. Learning PCFGs. Treebank PCFGs

Binarization / MarkovizationNP

DT JJNN NN

v=1,h=∞

DT

NP

@NP[DT]

@NP[DT,JJ,NN] NN

JJ @NP[DT,JJ]

NN

v=1,h=0

DT

NP

JJ

@NP

@NPNN

@NP

NN

v=1,h=1

DT

NP

JJ

@NP[DT]

@NP[…,NN] NN

@NP[…,JJ]

NN

Page 32: Algorithms for NLP - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/11711fa18/slides/FA18 11-711 Lecture 15 - parsing III with...Algorithms for NLP. Learning PCFGs. Treebank PCFGs

Binarization / MarkovizationNP

DT JJNN NN

v=2,h=∞

DT^NP

NP^VP

JJ^NP

@NP^VP[DT]

@NP^VP[DT,JJ,NN] NN^NP

@NP^VP[DT,JJ]

NN^NP

v=2,h=0

DT^NP

NP^VP

JJ^NP

@NP

@NPNN^NP

@NP

NN^NP

v=2,h=1

DT^NP

NP^VP

JJ^NP

@NP^VP[DT]

@NP^VP[…,NN] NN^NP

@NP^VP[…,JJ]

NN^NP

Page 33: Algorithms for NLP - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/11711fa18/slides/FA18 11-711 Lecture 15 - parsing III with...Algorithms for NLP. Learning PCFGs. Treebank PCFGs

Unary Splits

▪ Problem: unary rewrites used to transmute categories so a high-probability rule can be used.

Annotation F1 SizeBase 77.8 7.5KUNARY 78.3 8.0K

Page 34: Algorithms for NLP - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/11711fa18/slides/FA18 11-711 Lecture 15 - parsing III with...Algorithms for NLP. Learning PCFGs. Treebank PCFGs

Unary Splits

▪ Problem: unary rewrites used to transmute categories so a high-probability rule can be used.

Annotation F1 SizeBase 77.8 7.5KUNARY 78.3 8.0K

■ Solution: Mark unary rewrite sites with -U

Page 35: Algorithms for NLP - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/11711fa18/slides/FA18 11-711 Lecture 15 - parsing III with...Algorithms for NLP. Learning PCFGs. Treebank PCFGs

Unary Splits

▪ Problem: unary rewrites used to transmute categories so a high-probability rule can be used.

Annotation F1 SizeBase 77.8 7.5KUNARY 78.3 8.0K

■ Solution: Mark unary rewrite sites with -U

Page 36: Algorithms for NLP - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/11711fa18/slides/FA18 11-711 Lecture 15 - parsing III with...Algorithms for NLP. Learning PCFGs. Treebank PCFGs

Tag Splits

▪ Problem: Treebank tags are too coarse.

▪ Example: Sentential, PP, and other prepositions are all marked IN.

▪ Partial Solution: ▪ Subdivide the IN tag.

Annotation F1 SizePrevious 78.3 8.0KSPLIT-IN 80.3 8.1K

Page 37: Algorithms for NLP - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/11711fa18/slides/FA18 11-711 Lecture 15 - parsing III with...Algorithms for NLP. Learning PCFGs. Treebank PCFGs

Tag Splits

▪ Problem: Treebank tags are too coarse.

▪ Example: Sentential, PP, and other prepositions are all marked IN.

▪ Partial Solution: ▪ Subdivide the IN tag.

Annotation F1 SizePrevious 78.3 8.0KSPLIT-IN 80.3 8.1K

Page 38: Algorithms for NLP - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/11711fa18/slides/FA18 11-711 Lecture 15 - parsing III with...Algorithms for NLP. Learning PCFGs. Treebank PCFGs

Tag Splits

▪ Problem: Treebank tags are too coarse.

▪ Example: Sentential, PP, and other prepositions are all marked IN.

▪ Partial Solution: ▪ Subdivide the IN tag.

Annotation F1 SizePrevious 78.3 8.0KSPLIT-IN 80.3 8.1K

Page 39: Algorithms for NLP - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/11711fa18/slides/FA18 11-711 Lecture 15 - parsing III with...Algorithms for NLP. Learning PCFGs. Treebank PCFGs

A Fully Annotated (Unlex) Tree

Page 40: Algorithms for NLP - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/11711fa18/slides/FA18 11-711 Lecture 15 - parsing III with...Algorithms for NLP. Learning PCFGs. Treebank PCFGs

Some Test Set Results

▪ Beats “first generation” lexicalized parsers. ▪ Lots of room to improve – more complex models next.

Parser LP LR F1 CB 0 CB

Magerman 95 84.9 84.6 84.7 1.26 56.6

Collins 96 86.3 85.8 86.0 1.14 59.9

Unlexicalized 86.9 85.7 86.3 1.10 60.3

Charniak 97 87.4 87.5 87.4 1.00 62.1

Collins 99 88.7 88.6 88.6 0.90 67.1

Page 41: Algorithms for NLP - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/11711fa18/slides/FA18 11-711 Lecture 15 - parsing III with...Algorithms for NLP. Learning PCFGs. Treebank PCFGs

Efficient Parsing forStructural Annotation

Page 42: Algorithms for NLP - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/11711fa18/slides/FA18 11-711 Lecture 15 - parsing III with...Algorithms for NLP. Learning PCFGs. Treebank PCFGs

Grammar ProjectionsCoarse Grammar Fine Grammar

DT

NP

JJ

@NP

@NPNN

@NP

NN

DT^NP

NP^VP

JJ^NP

@NP^VP[DT]

@NP^VP[…,NN] NN^NP

@NP^VP[…,JJ]

NN^NP

Page 43: Algorithms for NLP - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/11711fa18/slides/FA18 11-711 Lecture 15 - parsing III with...Algorithms for NLP. Learning PCFGs. Treebank PCFGs

Grammar Projections

NP → DT @NP

Coarse Grammar Fine Grammar

DT

NP

JJ

@NP

@NPNN

@NP

NN

DT^NP

NP^VP

JJ^NP

@NP^VP[DT]

@NP^VP[…,NN] NN^NP

@NP^VP[…,JJ]

NN^NP

Page 44: Algorithms for NLP - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/11711fa18/slides/FA18 11-711 Lecture 15 - parsing III with...Algorithms for NLP. Learning PCFGs. Treebank PCFGs

Grammar Projections

NP → DT @NP

Coarse Grammar Fine Grammar

DT

NP

JJ

@NP

@NPNN

@NP

NN

DT^NP

NP^VP

JJ^NP

@NP^VP[DT]

@NP^VP[…,NN] NN^NP

@NP^VP[…,JJ]

NN^NP

NP^VP → DT^NP @NP^VP[DT]

Page 45: Algorithms for NLP - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/11711fa18/slides/FA18 11-711 Lecture 15 - parsing III with...Algorithms for NLP. Learning PCFGs. Treebank PCFGs

Grammar Projections

NP → DT @NP

Coarse Grammar Fine Grammar

DT

NP

JJ

@NP

@NPNN

@NP

NN

DT^NP

NP^VP

JJ^NP

@NP^VP[DT]

@NP^VP[…,NN] NN^NP

@NP^VP[…,JJ]

NN^NP

NP^VP → DT^NP @NP^VP[DT]

Note: X-Bar Grammars are projections with rules like XP → Y @X or XP → @X Y or @X → X

Page 46: Algorithms for NLP - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/11711fa18/slides/FA18 11-711 Lecture 15 - parsing III with...Algorithms for NLP. Learning PCFGs. Treebank PCFGs

Grammar Projections

NP

Coarse Symbols Fine Symbols

DT

@NP

NP^VPNP^S@NP^VP[DT]@NP^S[DT]@NP^VP[…,JJ]@NP^S[…,JJ]DT^NP

Page 47: Algorithms for NLP - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/11711fa18/slides/FA18 11-711 Lecture 15 - parsing III with...Algorithms for NLP. Learning PCFGs. Treebank PCFGs

Coarse-to-Fine Pruning

For each coarse chart item X[i,j], compute posterior probability:

… QP NP VP …coarse:

E.g. consider the span 5 to 12:

Page 48: Algorithms for NLP - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/11711fa18/slides/FA18 11-711 Lecture 15 - parsing III with...Algorithms for NLP. Learning PCFGs. Treebank PCFGs

Coarse-to-Fine Pruning

For each coarse chart item X[i,j], compute posterior probability:

… QP NP VP …coarse:

E.g. consider the span 5 to 12:

< threshold

Page 49: Algorithms for NLP - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/11711fa18/slides/FA18 11-711 Lecture 15 - parsing III with...Algorithms for NLP. Learning PCFGs. Treebank PCFGs

Coarse-to-Fine Pruning

For each coarse chart item X[i,j], compute posterior probability:

… QP NP VP …

coarse:

fine:

E.g. consider the span 5 to 12:

< threshold

Page 50: Algorithms for NLP - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/11711fa18/slides/FA18 11-711 Lecture 15 - parsing III with...Algorithms for NLP. Learning PCFGs. Treebank PCFGs

Coarse-to-Fine Pruning

For each coarse chart item X[i,j], compute posterior probability:

… QP NP VP …

coarse:

fine:

E.g. consider the span 5 to 12:

< threshold

Page 51: Algorithms for NLP - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/11711fa18/slides/FA18 11-711 Lecture 15 - parsing III with...Algorithms for NLP. Learning PCFGs. Treebank PCFGs

Coarse-to-Fine Pruning

For each coarse chart item X[i,j], compute posterior probability:

… QP NP VP …

coarse:

fine:

E.g. consider the span 5 to 12:

< threshold

Page 52: Algorithms for NLP - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/11711fa18/slides/FA18 11-711 Lecture 15 - parsing III with...Algorithms for NLP. Learning PCFGs. Treebank PCFGs

Inside probability: example

Page 53: Algorithms for NLP - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/11711fa18/slides/FA18 11-711 Lecture 15 - parsing III with...Algorithms for NLP. Learning PCFGs. Treebank PCFGs

▶The joint probability corresponding to the yellow, red and blue areas, assuming was the L child of some non-terminal:

Calculating outside probability

Page 54: Algorithms for NLP - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/11711fa18/slides/FA18 11-711 Lecture 15 - parsing III with...Algorithms for NLP. Learning PCFGs. Treebank PCFGs

▶The joint probability corresponding to the yellow, red and blue areas, assuming was the R child of some non-terminal:

Calculating outside probability

Page 55: Algorithms for NLP - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/11711fa18/slides/FA18 11-711 Lecture 15 - parsing III with...Algorithms for NLP. Learning PCFGs. Treebank PCFGs

▶The joint final joint probability (the sum over the L and R cases):

Calculating outside probability

Page 56: Algorithms for NLP - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/11711fa18/slides/FA18 11-711 Lecture 15 - parsing III with...Algorithms for NLP. Learning PCFGs. Treebank PCFGs

Lexicalization

Page 57: Algorithms for NLP - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/11711fa18/slides/FA18 11-711 Lecture 15 - parsing III with...Algorithms for NLP. Learning PCFGs. Treebank PCFGs

▪ Annotation refines base treebank symbols to improve statistical fit of the grammar ▪ Structural annotation [Johnson ’98, Klein and Manning

03] ▪ Head lexicalization [Collins ’99, Charniak ’00]

The Game of Designing a Grammar

Page 58: Algorithms for NLP - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/11711fa18/slides/FA18 11-711 Lecture 15 - parsing III with...Algorithms for NLP. Learning PCFGs. Treebank PCFGs

Problems with PCFGs

▪ If we do no annotation, these trees differ only in one rule: ▪ VP → VP PP ▪ NP → NP PP

▪ Parse will go one way or the other, regardless of words ▪ We addressed this in one way with unlexicalized grammars (how?) ▪ Lexicalization allows us to be sensitive to specific words

Page 59: Algorithms for NLP - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/11711fa18/slides/FA18 11-711 Lecture 15 - parsing III with...Algorithms for NLP. Learning PCFGs. Treebank PCFGs

Problems with PCFGs

▪ What’s different between basic PCFG scores here? ▪ What (lexical) correlations need to be scored?

Page 60: Algorithms for NLP - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/11711fa18/slides/FA18 11-711 Lecture 15 - parsing III with...Algorithms for NLP. Learning PCFGs. Treebank PCFGs

Lexicalized Trees

▪ Add “head words” to each phrasal node ▪ Syntactic vs. semantic

heads ▪ Headship not in (most)

treebanks ▪ Usually use head rules,

e.g.: ▪ NP:

▪ Take leftmost NP ▪ Take rightmost N* ▪ Take rightmost JJ ▪ Take right child

▪ VP: ▪ Take leftmost VB* ▪ Take leftmost VP ▪ Take left child

Page 61: Algorithms for NLP - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/11711fa18/slides/FA18 11-711 Lecture 15 - parsing III with...Algorithms for NLP. Learning PCFGs. Treebank PCFGs

Lexicalized PCFGs?

▪ Problem: we now have to estimate probabilities like

▪ Never going to get these atomically off of a treebank

▪ Solution: break up derivation into smaller steps

Page 62: Algorithms for NLP - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/11711fa18/slides/FA18 11-711 Lecture 15 - parsing III with...Algorithms for NLP. Learning PCFGs. Treebank PCFGs

Lexical Derivation Steps

▪ A derivation of a local tree [Collins 99]

Choose a head tag and word

Choose a complement bag

Generate children (incl. adjuncts)

Recursively derive children

Page 63: Algorithms for NLP - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/11711fa18/slides/FA18 11-711 Lecture 15 - parsing III with...Algorithms for NLP. Learning PCFGs. Treebank PCFGs

Lexicalized CKY

bestScore(X,i,j,h) if (j = i+1) return tagScore(X,s[i]) else return max max score(X[h]->Y[h] Z[h’]) * bestScore(Y,i,k,h) * bestScore(Z,k,j,h’) max score(X[h]->Y[h’] Z[h]) * bestScore(Y,i,k,h’) * bestScore(Z,k,j,h)

Y[h] Z[h’]

X[h]

i h k h’ j

k,h’,X->YZ

(VP->VBD •)[saw] NP[her]

(VP->VBD...NP •)[saw]

k,h’,X->YZ

Page 64: Algorithms for NLP - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/11711fa18/slides/FA18 11-711 Lecture 15 - parsing III with...Algorithms for NLP. Learning PCFGs. Treebank PCFGs

Efficient Parsing forLexical Grammars

Page 65: Algorithms for NLP - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/11711fa18/slides/FA18 11-711 Lecture 15 - parsing III with...Algorithms for NLP. Learning PCFGs. Treebank PCFGs

Quartic Parsing▪ Turns out, you can do (a little) better [Eisner 99]

▪ Gives an O(n4) algorithm ▪ Still prohibitive in practice if not pruned

Y[h] Z[h’]

X[h]

i h k h’ j

Y[h] Z

X[h]

i h k j

Page 66: Algorithms for NLP - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/11711fa18/slides/FA18 11-711 Lecture 15 - parsing III with...Algorithms for NLP. Learning PCFGs. Treebank PCFGs

Pruning with Beams

▪ The Collins parser prunes with per-cell beams [Collins 99] ▪ Essentially, run the O(n5) CKY ▪ Remember only a few hypotheses for

each span <i,j>. ▪ If we keep K hypotheses at each

span, then we do at most O(nK2) work per span (why?)

▪ Keeps things more or less cubic (and in practice is more like linear!)

▪ Also: certain spans are forbidden entirely on the basis of punctuation (crucial for speed)

Y[h] Z[h’]

X[h]

i h k h’ j

Page 67: Algorithms for NLP - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/11711fa18/slides/FA18 11-711 Lecture 15 - parsing III with...Algorithms for NLP. Learning PCFGs. Treebank PCFGs

Pruning with a PCFG

▪ The Charniak parser prunes using a two-pass, coarse-to-fine approach [Charniak 97+] ▪ First, parse with the base grammar ▪ For each X:[i,j] calculate P(X|i,j,s)

▪ This isn’t trivial, and there are clever speed ups ▪ Second, do the full O(n5) CKY

▪ Skip any X :[i,j] which had low (say, < 0.0001) posterior ▪ Avoids almost all work in the second phase!

▪ Charniak et al 06: can use more passes ▪ Petrov et al 07: can use many more passes

Page 68: Algorithms for NLP - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/11711fa18/slides/FA18 11-711 Lecture 15 - parsing III with...Algorithms for NLP. Learning PCFGs. Treebank PCFGs

Results

▪ Some results ▪ Collins 99 – 88.6 F1 (generative lexical) ▪ Charniak and Johnson 05 – 89.7 / 91.3 F1 (generative

lexical / reranked) ▪ Petrov et al 06 – 90.7 F1 (generative unlexical) ▪ McClosky et al 06 – 92.1 F1 (gen + rerank + self-train)

▪ However ▪ Bilexical counts rarely make a difference (why?) ▪ Gildea 01 – Removing bilexical counts costs < 0.5 F1

Page 69: Algorithms for NLP - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/11711fa18/slides/FA18 11-711 Lecture 15 - parsing III with...Algorithms for NLP. Learning PCFGs. Treebank PCFGs

Latent Variable PCFGs

Page 70: Algorithms for NLP - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/11711fa18/slides/FA18 11-711 Lecture 15 - parsing III with...Algorithms for NLP. Learning PCFGs. Treebank PCFGs

▪ Annotation refines base treebank symbols to improve statistical fit of the grammar ▪ Parent annotation [Johnson ’98] ▪ Head lexicalization [Collins ’99, Charniak ’00] ▪ Automatic clustering?

The Game of Designing a Grammar

Page 71: Algorithms for NLP - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/11711fa18/slides/FA18 11-711 Lecture 15 - parsing III with...Algorithms for NLP. Learning PCFGs. Treebank PCFGs

Latent Variable Grammars

Parse Tree Sentence

Page 72: Algorithms for NLP - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/11711fa18/slides/FA18 11-711 Lecture 15 - parsing III with...Algorithms for NLP. Learning PCFGs. Treebank PCFGs

Latent Variable Grammars

Parse Tree Sentence

...

Derivations

Page 73: Algorithms for NLP - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/11711fa18/slides/FA18 11-711 Lecture 15 - parsing III with...Algorithms for NLP. Learning PCFGs. Treebank PCFGs

Latent Variable Grammars

Parse Tree Sentence Parameters

...

Derivations

Page 74: Algorithms for NLP - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/11711fa18/slides/FA18 11-711 Lecture 15 - parsing III with...Algorithms for NLP. Learning PCFGs. Treebank PCFGs

Learning Latent Annotations

EM algorithm:

Page 75: Algorithms for NLP - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/11711fa18/slides/FA18 11-711 Lecture 15 - parsing III with...Algorithms for NLP. Learning PCFGs. Treebank PCFGs

Learning Latent Annotations

EM algorithm: ▪ Brackets are known ▪ Base categories are known ▪ Only induce subcategories

Page 76: Algorithms for NLP - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/11711fa18/slides/FA18 11-711 Lecture 15 - parsing III with...Algorithms for NLP. Learning PCFGs. Treebank PCFGs

Learning Latent Annotations

EM algorithm:

X1

X2 X7X4

X5 X6X3

He was right

.

▪ Brackets are known ▪ Base categories are known ▪ Only induce subcategories

Page 77: Algorithms for NLP - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/11711fa18/slides/FA18 11-711 Lecture 15 - parsing III with...Algorithms for NLP. Learning PCFGs. Treebank PCFGs

Forward

Learning Latent Annotations

EM algorithm:

X1

X2 X7X4

X5 X6X3

He was right

.

▪ Brackets are known ▪ Base categories are known ▪ Only induce subcategories

Just like Forward-Backward for HMMs. Backward

Page 78: Algorithms for NLP - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/11711fa18/slides/FA18 11-711 Lecture 15 - parsing III with...Algorithms for NLP. Learning PCFGs. Treebank PCFGs

Refinement of the DT tag

DT

Page 79: Algorithms for NLP - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/11711fa18/slides/FA18 11-711 Lecture 15 - parsing III with...Algorithms for NLP. Learning PCFGs. Treebank PCFGs

Refinement of the DT tag

DT

DT-1 DT-2 DT-3 DT-4

Page 80: Algorithms for NLP - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/11711fa18/slides/FA18 11-711 Lecture 15 - parsing III with...Algorithms for NLP. Learning PCFGs. Treebank PCFGs

Hierarchical refinement

Page 81: Algorithms for NLP - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/11711fa18/slides/FA18 11-711 Lecture 15 - parsing III with...Algorithms for NLP. Learning PCFGs. Treebank PCFGs

Hierarchical refinement

Page 82: Algorithms for NLP - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/11711fa18/slides/FA18 11-711 Lecture 15 - parsing III with...Algorithms for NLP. Learning PCFGs. Treebank PCFGs

Hierarchical refinement

Page 83: Algorithms for NLP - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/11711fa18/slides/FA18 11-711 Lecture 15 - parsing III with...Algorithms for NLP. Learning PCFGs. Treebank PCFGs

Hierarchical Estimation ResultsP

arsi

ng a

ccur

acy

(F1)

74

78.25

82.5

86.75

91

Total Number of grammar symbols

100 525 950 1375 1800Model F1Flat Training 87.3Hierarchical Training 88.4

Page 84: Algorithms for NLP - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/11711fa18/slides/FA18 11-711 Lecture 15 - parsing III with...Algorithms for NLP. Learning PCFGs. Treebank PCFGs

Refinement of the , tag

▪ Splitting all categories equally is wasteful:

Page 85: Algorithms for NLP - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/11711fa18/slides/FA18 11-711 Lecture 15 - parsing III with...Algorithms for NLP. Learning PCFGs. Treebank PCFGs

Refinement of the , tag

▪ Splitting all categories equally is wasteful:

Page 86: Algorithms for NLP - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/11711fa18/slides/FA18 11-711 Lecture 15 - parsing III with...Algorithms for NLP. Learning PCFGs. Treebank PCFGs

Adaptive Splitting

▪ Want to split complex categories more ▪ Idea: split everything, roll back splits which

were least useful

Page 87: Algorithms for NLP - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/11711fa18/slides/FA18 11-711 Lecture 15 - parsing III with...Algorithms for NLP. Learning PCFGs. Treebank PCFGs

Adaptive Splitting ResultsP

arsi

ng a

ccur

acy

(F1)

74

78.25

82.5

86.75

91

Total Number of grammar symbols100 500 900 1300 1700

50% Merging Hierarchical TrainingFlat Training

Page 88: Algorithms for NLP - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/11711fa18/slides/FA18 11-711 Lecture 15 - parsing III with...Algorithms for NLP. Learning PCFGs. Treebank PCFGs

Adaptive Splitting ResultsP

arsi

ng a

ccur

acy

(F1)

74

78.25

82.5

86.75

91

Total Number of grammar symbols100 500 900 1300 1700

50% Merging Hierarchical TrainingFlat Training

Page 89: Algorithms for NLP - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/11711fa18/slides/FA18 11-711 Lecture 15 - parsing III with...Algorithms for NLP. Learning PCFGs. Treebank PCFGs

Adaptive Splitting ResultsP

arsi

ng a

ccur

acy

(F1)

74

78.25

82.5

86.75

91

Total Number of grammar symbols100 500 900 1300 1700

50% Merging Hierarchical TrainingFlat Training

Model F1Previous 88.4With 50% Merging 89.5

Page 90: Algorithms for NLP - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/11711fa18/slides/FA18 11-711 Lecture 15 - parsing III with...Algorithms for NLP. Learning PCFGs. Treebank PCFGs

0

10

20

30

40

NP

VP

PP

AD

VP S

AD

JP

SB

AR

QP

WH

NP

PR

N

NX

SIN

V

PR

T

WH

PP

SQ

CO

NJP

FRA

G

NA

C

UC

P

WH

AD

VP

INTJ

SB

AR

Q

RR

C

WH

AD

JP X

RO

OT

LST

Number of Phrasal Subcategories

Page 91: Algorithms for NLP - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/11711fa18/slides/FA18 11-711 Lecture 15 - parsing III with...Algorithms for NLP. Learning PCFGs. Treebank PCFGs

Number of Lexical Subcategories

0

18

35

53

70

NN

P JJ

NN

S

NN

VB

N

RB

VB

G VB

VB

D

CD IN

VB

Z

VB

P

DT

NN

PS

CC

JJR

JJS :

PR

P

PR

P$

MD

RB

R

WP

PO

S

PD

T

WR

B

-LR

B- .

EX

WP

$

WD

T

-RR

B- ''

FW RB

S

TO

$

UH , ``

SY

M RP LS

#

Page 92: Algorithms for NLP - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/11711fa18/slides/FA18 11-711 Lecture 15 - parsing III with...Algorithms for NLP. Learning PCFGs. Treebank PCFGs

Learned Splits

▪ Proper Nouns (NNP):

▪ Personal pronouns (PRP):

NNP-14 Oct. Nov. Sept.NNP-12 John Robert JamesNNP-2 J. E. L.NNP-1 Bush Noriega PetersNNP-15 New San WallNNP-3 York Francisco Street

PRP-0 It He IPRP-1 it he theyPRP-2 it them him

Page 93: Algorithms for NLP - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/11711fa18/slides/FA18 11-711 Lecture 15 - parsing III with...Algorithms for NLP. Learning PCFGs. Treebank PCFGs

▪ Relative adverbs (RBR):

▪ Cardinal Numbers (CD):

RBR-0 further lower higherRBR-1 more less MoreRBR-2 earlier Earlier later

CD-7 one two ThreeCD-4 1989 1990 1988CD-11 million billion trillionCD-0 1 50 100CD-3 1 30 31CD-9 78 58 34

Learned Splits

Page 94: Algorithms for NLP - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/11711fa18/slides/FA18 11-711 Lecture 15 - parsing III with...Algorithms for NLP. Learning PCFGs. Treebank PCFGs

Final Results (Accuracy)

≤ 40 words F1

all F1

ENG

Charniak&Johnson ‘05 (generative) 90.1 89.6

Split / Merge 90.6 90.1

GER

Dubey ‘05 76.3 -

Split / Merge 80.8 80.1

CHN

Chiang et al. ‘02 80.0 76.6

Split / Merge 86.3 83.4

Still higher numbers from reranking / self-training methods

Page 95: Algorithms for NLP - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/11711fa18/slides/FA18 11-711 Lecture 15 - parsing III with...Algorithms for NLP. Learning PCFGs. Treebank PCFGs

Efficient Parsing forHierarchical Grammars

Page 96: Algorithms for NLP - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/11711fa18/slides/FA18 11-711 Lecture 15 - parsing III with...Algorithms for NLP. Learning PCFGs. Treebank PCFGs

Coarse-to-Fine Inference

▪ Example: PP attachment

?????????

Page 97: Algorithms for NLP - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/11711fa18/slides/FA18 11-711 Lecture 15 - parsing III with...Algorithms for NLP. Learning PCFGs. Treebank PCFGs

Hierarchical Pruning

Page 98: Algorithms for NLP - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/11711fa18/slides/FA18 11-711 Lecture 15 - parsing III with...Algorithms for NLP. Learning PCFGs. Treebank PCFGs

Hierarchical Pruning

… QP NP VP …coarse:

Page 99: Algorithms for NLP - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/11711fa18/slides/FA18 11-711 Lecture 15 - parsing III with...Algorithms for NLP. Learning PCFGs. Treebank PCFGs

Hierarchical Pruning

… QP NP VP …coarse:

Page 100: Algorithms for NLP - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/11711fa18/slides/FA18 11-711 Lecture 15 - parsing III with...Algorithms for NLP. Learning PCFGs. Treebank PCFGs

Hierarchical Pruning

… QP NP VP …coarse:

split in two: … QP1

QP2

NP1

NP2 VP1 VP2 …

Page 101: Algorithms for NLP - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/11711fa18/slides/FA18 11-711 Lecture 15 - parsing III with...Algorithms for NLP. Learning PCFGs. Treebank PCFGs

Hierarchical Pruning

… QP NP VP …coarse:

split in two: … QP1

QP2

NP1

NP2 VP1 VP2 …

Page 102: Algorithms for NLP - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/11711fa18/slides/FA18 11-711 Lecture 15 - parsing III with...Algorithms for NLP. Learning PCFGs. Treebank PCFGs

Hierarchical Pruning

… QP NP VP …coarse:

split in two: … QP1

QP2

NP1

NP2 VP1 VP2 …

… QP1

QP1

QP3

QP4

NP1

NP2 NP3 NP4 VP1 VP2 VP3 VP4 …split in four:

Page 103: Algorithms for NLP - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/11711fa18/slides/FA18 11-711 Lecture 15 - parsing III with...Algorithms for NLP. Learning PCFGs. Treebank PCFGs

Hierarchical Pruning

… QP NP VP …coarse:

split in two: … QP1

QP2

NP1

NP2 VP1 VP2 …

… QP1

QP1

QP3

QP4

NP1

NP2 NP3 NP4 VP1 VP2 VP3 VP4 …split in four:

Page 104: Algorithms for NLP - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/11711fa18/slides/FA18 11-711 Lecture 15 - parsing III with...Algorithms for NLP. Learning PCFGs. Treebank PCFGs

Hierarchical Pruning

… QP NP VP …coarse:

split in two: … QP1

QP2

NP1

NP2 VP1 VP2 …

… QP1

QP1

QP3

QP4

NP1

NP2 NP3 NP4 VP1 VP2 VP3 VP4 …split in four:

split in eight: … … … … … … … … … … … … … … … … …

Page 105: Algorithms for NLP - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/11711fa18/slides/FA18 11-711 Lecture 15 - parsing III with...Algorithms for NLP. Learning PCFGs. Treebank PCFGs

Bracket Posteriors

Page 106: Algorithms for NLP - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/11711fa18/slides/FA18 11-711 Lecture 15 - parsing III with...Algorithms for NLP. Learning PCFGs. Treebank PCFGs

Bracket Posteriors

Page 107: Algorithms for NLP - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/11711fa18/slides/FA18 11-711 Lecture 15 - parsing III with...Algorithms for NLP. Learning PCFGs. Treebank PCFGs

Bracket Posteriors

Page 108: Algorithms for NLP - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/11711fa18/slides/FA18 11-711 Lecture 15 - parsing III with...Algorithms for NLP. Learning PCFGs. Treebank PCFGs

Bracket Posteriors

Page 109: Algorithms for NLP - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/11711fa18/slides/FA18 11-711 Lecture 15 - parsing III with...Algorithms for NLP. Learning PCFGs. Treebank PCFGs

Bracket Posteriors

Page 110: Algorithms for NLP - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/11711fa18/slides/FA18 11-711 Lecture 15 - parsing III with...Algorithms for NLP. Learning PCFGs. Treebank PCFGs

Bracket Posteriors

Page 111: Algorithms for NLP - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/11711fa18/slides/FA18 11-711 Lecture 15 - parsing III with...Algorithms for NLP. Learning PCFGs. Treebank PCFGs

Bracket Posteriors

Page 112: Algorithms for NLP - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/11711fa18/slides/FA18 11-711 Lecture 15 - parsing III with...Algorithms for NLP. Learning PCFGs. Treebank PCFGs

Bracket Posteriors

Page 113: Algorithms for NLP - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/11711fa18/slides/FA18 11-711 Lecture 15 - parsing III with...Algorithms for NLP. Learning PCFGs. Treebank PCFGs

Bracket Posteriors

Page 114: Algorithms for NLP - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/11711fa18/slides/FA18 11-711 Lecture 15 - parsing III with...Algorithms for NLP. Learning PCFGs. Treebank PCFGs

1621 min 111 min 35 min

15 min (no search error)