Beam-Width Prediction for Efficient Context-Free Parsing

Nathan Bodenstab, Aaron Dunlop, Keith Hall, Brian Roark

June 2011

OHSU Beam-Search Parser (BUBS)

• Standard bottom-up CYK
• Beam-search per chart cell
• Only “best” are retained

Ranking, Prioritization, and FOMs

• f() = g() + h()
• Figure of Merit
– Caraballo and Charniak (1997)
• A* search
– Klein and Manning (2003)
– Pauls and Klein (2010)
• Other
– Turian (2007)
– Huang (2008)
• Apply to beam-search

Beam-Width Prediction

• Traditional beam-search uses constant beam-width
• Two definitions of beam-width:
– Number of local competitors to retain (n-best)
– Score difference from best entry
• Advantages
– Heavy pruning compared to CYK
– Minimal sorting compared to global agenda
• Disadvantages
– No global pruning – all chart cells treated equal
– Conservative to keep outliers within beam
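The two definitions of beam-width can be sketched as two pruning functions applied to a single chart cell. This is a minimal illustration, not the BUBS implementation: the `Edge` tuple, the function names, and the toy scores are assumptions made for the example.

```python
from typing import List, Tuple

# Hypothetical edge representation: (nonterminal label, log-score).
Edge = Tuple[str, float]


def prune_n_best(edges: List[Edge], n: int) -> List[Edge]:
    """Definition 1: keep the n highest-scoring local competitors."""
    return sorted(edges, key=lambda e: e[1], reverse=True)[:n]


def prune_by_delta(edges: List[Edge], delta: float) -> List[Edge]:
    """Definition 2: keep edges whose score is within delta of the best entry."""
    if not edges:
        return []
    best = max(score for _, score in edges)
    return [e for e in edges if best - e[1] <= delta]


# Toy cell contents, invented for illustration only.
cell = [("NP", -2.0), ("VP", -2.5), ("S", -6.0), ("PP", -9.0)]
kept_n_best = prune_n_best(cell, 2)
kept_delta = prune_by_delta(cell, 4.0)
```

With a constant beam-width, the same `n` or `delta` is applied to every cell, which is the limitation the following slides address.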

• How often is gold edge ranked in top N per chart cell
– Exhaustively parse section 22 + Berkeley latent variable grammar

[Figure: gold rank <= N]
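The statistic plotted here can be computed from a list of per-cell gold ranks. The sketch below assumes such a list is already available from an exhaustive parse; the function name and the toy ranks are hypothetical and are not the Section 22 data.

```python
from typing import Dict, List


def gold_rank_coverage(gold_ranks: List[int], max_n: int) -> Dict[int, float]:
    """Fraction of cells whose gold edge has rank <= N, for N = 1..max_n."""
    total = len(gold_ranks)
    if total == 0:
        return {n: 0.0 for n in range(1, max_n + 1)}
    return {
        n: sum(1 for r in gold_ranks if r <= n) / total
        for n in range(1, max_n + 1)
    }


# Toy ranks (1 = gold edge ranked first in its cell), invented for illustration.
toy_ranks = [1, 2, 1, 4, 1, 1, 3, 1]
coverage = gold_rank_coverage(toy_ranks, 4)
```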


• Beam-search + C&C Boundary ranking:
– How often is gold edge ranked in top N per chart cell:

[Figure: gold rank <= N]

To maintain baseline accuracy, beam-width must be set to 15 with C&C Boundary ranking (and 50 using only inside score)


• Over 70% of gold edges are already ranked first in the local agenda

• 14 of 15 edges in these cells are unnecessary

• We can do much better than a constant beam-width

• Method: Train an averaged perceptron (Collins, 2002) to predict the optimal beam-width per chart cell
• Map each chart cell in sentence S spanning words w_i … w_j to a feature vector representation:
• x: Lexical and POS unigrams and bigrams, relative and absolute span
• y: 1 if gold rank > k, 0 otherwise (a cell with no gold edge is assigned rank -1)

• Minimize the loss:

• H is the unit step function
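The feature mapping x can be illustrated with a small extractor. The slide names only the feature families (lexical and POS unigrams and bigrams, relative and absolute span), so the specific templates below, the boundary padding symbols, and the function name `cell_features` are assumptions chosen for the sketch.

```python
from typing import List


def cell_features(words: List[str], tags: List[str], i: int, j: int) -> List[str]:
    """Sparse binary feature names for the cell spanning words[i..j], inclusive."""
    n = len(words)
    # Pad so that boundary bigrams exist at the sentence edges (assumed symbols).
    w = ["<s>"] + list(words) + ["</s>"]
    t = ["<s>"] + list(tags) + ["</s>"]
    # After padding, original index k is stored at position k + 1.
    left_out, left_in, right_in, right_out = i, i + 1, j + 1, j + 2
    span = j - i + 1
    return [
        f"w_first={w[left_in]}",
        f"w_last={w[right_in]}",
        f"t_first={t[left_in]}",
        f"t_last={t[right_in]}",
        f"w_left_bigram={w[left_out]}_{w[left_in]}",
        f"w_right_bigram={w[right_in]}_{w[right_out]}",
        f"t_left_bigram={t[left_out]}_{t[left_in]}",
        f"t_right_bigram={t[right_in]}_{t[right_out]}",
        f"span_abs={span}",
        f"span_rel={round(span / n, 1)}",
    ]


feats = cell_features(["The", "dog", "barks"], ["DT", "NN", "VBZ"], 0, 1)
```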

• Method: Use a discriminative classifier to predict the optimal beam-width per chart cell
• Minimize the loss:
• L is the asymmetric loss function:
• If beam-width is too large, tolerable efficiency loss
• If beam-width is too small, high risk to accuracy
• Lambda set to 10^2 in all experiments
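One way to read the asymmetric loss is sketched below. The exact loss and update formulas are not shown on the slide, so the penalty values (1 for a beam that is too large, lambda for a beam that is too small), the cost-scaled perceptron update, and all function names are assumptions inferred from the bullets above, not the authors' definition.

```python
from typing import Dict, List, Tuple


def unit_step(z: float) -> int:
    """H: 1 if the score is positive, 0 otherwise."""
    return 1 if z > 0 else 0


def asymmetric_loss(predicted: int, gold: int, lam: float) -> float:
    """Assumed form: predicting 0 when gold is 1 (beam too small) costs lam;
    predicting 1 when gold is 0 (beam too large) costs 1."""
    if predicted == gold:
        return 0.0
    return lam if gold == 1 else 1.0


def train_averaged_perceptron(
    data: List[Tuple[List[str], int]], lam: float, epochs: int = 5
) -> Dict[str, float]:
    """Averaged perceptron with updates scaled by the asymmetric cost (an assumption)."""
    weights: Dict[str, float] = {}
    totals: Dict[str, float] = {}
    steps = 0
    for _ in range(epochs):
        for feats, gold in data:
            score = sum(weights.get(f, 0.0) for f in feats)
            cost = asymmetric_loss(unit_step(score), gold, lam)
            if cost > 0:
                direction = 1.0 if gold == 1 else -1.0
                for f in feats:
                    weights[f] = weights.get(f, 0.0) + direction * cost
            # Accumulate the current weights for averaging.
            for f, v in weights.items():
                totals[f] = totals.get(f, 0.0) + v
            steps += 1
    return {f: v / steps for f, v in totals.items()}


# Toy training set, invented for illustration: feature "a" signals gold rank > k.
toy_data = [(["a"], 1), (["b"], 0)]
model = train_averaged_perceptron(toy_data, lam=100.0)
```

A large lambda biases the classifier toward predicting that a wider beam is needed, trading some speed for a lower risk of pruning the gold edge.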

Special case: Predict if chart cell is open or closed to multi-word constituents

• A “closed” chart cell may need to be partially open
• Binarized or dotted-rule parsing creates new “factored” productions:

Method 1: Constituent Closure

• Constituent Closure is a per-cell generalization of Roark & Hollingshead (2008)
– O(n^2) classifications instead of O(n)

Method 2: Complete Closure

Method 3: Beam-Width Prediction

• Use multiple binary classifiers instead of regression (better performance)

• Local beam-width taken from classifier with smallest beam-width prediction

• Best performance with four binary classifiers: 0, 1, 2, 4
– 97% of positive examples have beam-width <= 4
– Don’t need a classifier for every possible beam-width value between 0 and global maximum (15 in our case)
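The combination of binary classifiers can be sketched as follows, assuming each classifier for threshold k returns 1 when it predicts gold rank > k and 0 otherwise, as in the label definition earlier. The function name and the stub classifiers are hypothetical; a beam-width of 0 corresponds to closing the cell.

```python
from typing import Callable, Dict, Sequence

Classifier = Callable[[Sequence[str]], int]


def predict_beam_width(
    features: Sequence[str],
    classifiers: Dict[int, Classifier],
    global_max: int = 15,
) -> int:
    """Return the smallest k whose classifier predicts gold rank <= k,
    falling back to the global maximum beam-width."""
    for k in sorted(classifiers):
        if classifiers[k](features) == 0:
            return k
    return global_max


# Stub classifiers for thresholds 0, 1, 2, 4, invented for illustration.
stubs: Dict[int, Classifier] = {
    0: lambda f: 1,
    1: lambda f: 1,
    2: lambda f: 0,
    4: lambda f: 0,
}
width = predict_beam_width(["span_abs=2"], stubs)
```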

• Section 22 development set results

• Decoding time is seconds per sentence averaged over all sentences in Section 22

• Parsing with Berkeley latent variable grammar (4.3 million productions)

Parser                      Secs/Sent  Speedup  F1
CYK                         70.383              89.4
CYK + Constituent Closure   47.870     1.5x     89.3
CYK + Complete Closure      32.619     2.2x     89.3
Beam + Inside FOM (BI)       3.977              89.2
BI + Constituent Closure     2.033     2.0x     89.2
BI + Complete Closure        1.575     2.5x     89.3
BI + Beam-Predict            1.180     3.4x     89.3
Beam + Boundary FOM (BB)     0.326              89.2
BB + Constituent Closure     0.279     1.2x     89.2
BB + Complete Closure        0.199     1.6x     89.3
BB + Beam-Predict            0.143     2.3x     89.3
Most recent numbers          0.053     6.2x     89.x
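The Speedup column appears to be computed relative to the first row of each group (CYK, BI, or BB) rather than to exhaustive CYK throughout. A short check using the times from the table, with a hypothetical helper name:

```python
def speedup(baseline_secs: float, secs: float) -> float:
    """Speedup of a pruned parser over its own group's baseline, to one decimal."""
    return round(baseline_secs / secs, 1)


# Times copied from the table above.
cyk_constituent = speedup(70.383, 47.870)   # CYK + Constituent Closure vs. CYK
bi_beam_predict = speedup(3.977, 1.180)     # BI + Beam-Predict vs. BI
bb_beam_predict = speedup(0.326, 0.143)     # BB + Beam-Predict vs. BB
most_recent = speedup(0.326, 0.053)         # Most recent numbers vs. BB
```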

• Section 23 test results
• Only MaxRule is marginalizing over latent variables and performing non-Viterbi decoding

Parser                                                        Secs/Sent  F1
CYK                                                           64.610     88.7
Berkeley CTF MaxRule, Petrov and Klein (2007)                  0.213     90.2
Berkeley CTF Viterbi                                           0.208     88.8
Beam + Boundary FOM (BB), Caraballo and Charniak (1998)        0.334     88.6
BB + Chart Constraints, Roark and Hollingshead (2008; 2009)    0.244     88.7
BB + Beam-Prediction                                           0.125     88.7

Thanks.

FOM Details

• C&C FOM Details
– FOM(NT) = Outside_left * Inside * Outside_right
– Inside = Accumulated grammar score
– Outside_left = Max_POS [ POS forward prob * POS-to-NT transition prob ]
– Outside_right = Max_POS [ NT-to-POS transition prob * POS bkwd prob ]
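The three-factor product can be written out directly. The sketch below uses plain probabilities for readability (a real parser would likely work in log space), and the dictionary layouts, function name, and toy values are assumptions for the example.

```python
from typing import Dict, Tuple


def boundary_fom(
    nt: str,
    inside: float,
    pos_forward: Dict[str, float],
    pos_to_nt: Dict[Tuple[str, str], float],
    nt_to_pos: Dict[Tuple[str, str], float],
    pos_backward: Dict[str, float],
) -> float:
    """FOM(NT) = Outside_left * Inside * Outside_right."""
    # Best POS tag to the left of the span: forward prob * POS-to-NT transition.
    outside_left = max(
        (pos_forward[p] * pos_to_nt.get((p, nt), 0.0) for p in pos_forward),
        default=0.0,
    )
    # Best POS tag to the right of the span: NT-to-POS transition * backward prob.
    outside_right = max(
        (nt_to_pos.get((nt, p), 0.0) * pos_backward[p] for p in pos_backward),
        default=0.0,
    )
    return outside_left * inside * outside_right


# Toy probabilities, invented for illustration only.
fom = boundary_fom(
    "NP",
    inside=0.5,
    pos_forward={"DT": 0.5, "NN": 0.25},
    pos_to_nt={("DT", "NP"): 0.5, ("NN", "NP"): 0.5},
    nt_to_pos={("NP", "VBZ"): 0.5},
    pos_backward={"VBZ": 0.5, "IN": 0.25},
)
```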
