Improved Inference for Unlexicalized Parsing
Slav Petrov and Dan Klein
Unlexicalized Parsing
Hierarchical, adaptive refinement [Petrov et al. '06]:
  1,140 nonterminal symbols
  531,200 rewrites
  91.2 F1 score on dev set (1,600 sentences)
  1621 min parsing time
[Figure: hierarchical refinement of the DT tag: DT splits into DT1/DT2, then DT1-DT4, then DT1-DT8]
Coarse-to-Fine Parsing [Goodman '97, Charniak & Johnson '05]
Coarse grammar: NP ... VP
Refined grammar: NP-dog, NP-cat, NP-apple, VP-run, NP-eat, ...
Refined grammar: NP-17, NP-12, NP-1, VP-6, VP-31, ...
Pipeline: parse with the coarse (treebank) grammar, prune the chart, then parse with the refined grammar.
But how should we prune?
For each chart item X[i,j], compute its posterior probability under the coarse grammar and prune it (together with every refined item that projects to it) when the posterior falls below a threshold:
\[
P(X \text{ spans } [i,j] \mid w) \;=\; \frac{\mathrm{outside}(X,i,j)\cdot\mathrm{inside}(X,i,j)}{P(w)} \;<\; \text{threshold}
\]
E.g. consider the span 5 to 12: the coarse items QP[5,12], NP[5,12], VP[5,12], ... are scored, and the low-posterior ones are removed before the refined pass.
Result: 1621 min → 111 min (no search error)
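A minimal Java sketch of this thresholding step; the class name, array layout, and threshold value are illustrative assumptions, not the Berkeley Parser's actual API:

```java
/** Illustrative sketch of coarse-pass posterior pruning; not the actual
 *  Berkeley Parser code. Assumes inside/outside scores over the coarse
 *  grammar, stored as [start][end][symbol] arrays of size n x (n+1) x S. */
public class PosteriorPruner {
  static final double PRUNE_THRESHOLD = 1e-4; // illustrative value only

  /** keep[i][j][x] == false means every refined item projecting to
   *  coarse symbol x over span [i,j] is skipped in the next pass. */
  public static boolean[][][] prune(double[][][] inside, double[][][] outside,
                                    double sentenceProb) {
    int n = inside.length;
    int numSymbols = inside[0][0].length;
    boolean[][][] keep = new boolean[n][n + 1][numSymbols];
    for (int i = 0; i < n; i++) {
      for (int j = i + 1; j <= n; j++) {
        for (int x = 0; x < numSymbols; x++) {
          // posterior P(x spans [i,j] | sentence) = outside * inside / P(sentence)
          double posterior = inside[i][j][x] * outside[i][j][x] / sentenceProb;
          keep[i][j][x] = posterior >= PRUNE_THRESHOLD;
        }
      }
    }
    return keep;
  }
}
```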
Multilevel Coarse-to-Fine Parsing [Charniak et al. '06]
Add more rounds of pre-parsing.
But what do grammars coarser than X-bar look like?
[Figure: a single coarse symbol X covering refined symbols A, B, ...]
Hierarchical Pruning
Consider again the span 5 to 12:
  coarse:         ... QP  NP  VP ...
  split in two:   ... QP1 QP2 NP1 NP2 VP1 VP2 ...
  split in four:  ... QP1 QP2 QP3 QP4 NP1 NP2 NP3 NP4 VP1 VP2 VP3 VP4 ...
  split in eight: ... (and so on, pruning at each level)
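To make the pruning cascade concrete, here is a self-contained Java sketch of how a keep-mask from one level can be propagated to the next; the assumption that coarse symbol x splits into refined symbols 2x and 2x+1 is for illustration only:

```java
/** Sketch: propagate a coarse keep-mask to the next refinement level.
 *  Assumes (for illustration) refined symbols 2x and 2x+1 are the two
 *  splits of coarse symbol x. */
public class MaskProjection {
  public static boolean[][][] refineMask(boolean[][][] coarseKeep) {
    int n = coarseKeep.length;
    int numCoarse = coarseKeep[0][0].length;
    boolean[][][] refinedKeep = new boolean[n][][];
    for (int i = 0; i < n; i++) {
      refinedKeep[i] = new boolean[coarseKeep[i].length][];
      for (int j = 0; j < coarseKeep[i].length; j++) {
        refinedKeep[i][j] = new boolean[2 * numCoarse];
        for (int x = 0; x < numCoarse; x++) {
          // a refined item is considered only if its coarse projection survived
          refinedKeep[i][j][2 * x] = coarseKeep[i][j][x];
          refinedKeep[i][j][2 * x + 1] = coarseKeep[i][j][x];
        }
      }
    }
    return refinedKeep;
  }
}
```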
Intermediate Grammars
Learning produces a sequence of grammars: X-Bar = G0, G1, G2, G3, G4, G5, G6 = G.
[Figure: the DT hierarchy again: DT splits into DT1/DT2, then DT1-DT4, then DT1-DT8]
Pruning with all intermediate grammars: 1621 min → 111 min → 35 min (no search error)
State Drift (DT tag)
[Figure: the words most associated with each DT substate (That, this, some, the, these, that, ...) as EM refines G1 through G6. The substates drift during learning, so the substates of one grammar are not exact refinements of the previous grammar's substates.]
Projected Grammars
Instead of the intermediate grammars from learning, derive coarse grammars from the final grammar G by projection:
X-Bar = G0, ..., G = G6
Projections πi(G): π0(G), π1(G), π2(G), π3(G), π4(G), π5(G), and G itself.
Estimating Projected Grammars
Nonterminals?
Nonterminals in G: S0, S1, NP0, NP1, VP0, VP1
Nonterminals in π(G): S, NP, VP
The nonterminal projection is easy: strip the subcategory annotations.
Rules in G → Rules in π(G)?
Estimating Projected Grammars
Rules?
Rules in G:
  S1 → NP1 VP1  0.20
  S1 → NP1 VP2  0.12
  S1 → NP2 VP1  0.02
  S1 → NP2 VP2  0.03
  S2 → NP1 VP1  0.11
  S2 → NP1 VP2  0.05
  S2 → NP2 VP1  0.08
  S2 → NP2 VP2  0.12
Rule in π(G): S → NP VP, but with what probability? Re-estimating from the treebank is one option.
Estimating Projected Grammars [Corazza & Satta '06]
Better: set the probability of S → NP VP in π(G) to its expected relative frequency in the infinite tree distribution defined by G; for the rules above this gives 0.56.
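In symbols, a sketch of this estimate (notation mine), where c(X) is the expected count of refined symbol X in a tree drawn from G:

```latex
\[
P_{\pi(G)}(A \to B\,C)
  = \frac{\sum_{X \to Y Z \,:\, \pi(X)=A,\ \pi(Y)=B,\ \pi(Z)=C}
          c(X)\, P_G(X \to Y Z)}
         {\sum_{X \,:\, \pi(X)=A} c(X)}
\]
```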
Estimating Projected Grammars: Calculating Expectations
Nonterminals: ck(X) = expected count of symbol X in trees up to depth k (recurrence sketched below). Converges within 25 iterations (a few seconds).
Rules: the expected count of a rule is the expected count of its parent symbol times the rule probability.
Pruning with the projected grammars: 1621 min → 111 min → 35 min → 15 min (no search error)
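A hedged reconstruction of the depth-k recurrence (the slide's formula image is lost; this is the standard expected-count recursion for a PCFG, notation mine):

```latex
% c_k(X): expected occurrences of symbol X in a tree drawn from G,
% truncated at depth k. R(G) is the rule set of G and #(X, \alpha)
% counts occurrences of X on the right-hand side \alpha.
\[
c_0(X) = \delta(X = \mathrm{ROOT}), \qquad
c_{k+1}(X) = c_0(X) + \sum_{(Y \to \alpha) \in R(G)}
             c_k(Y)\, P_G(Y \to \alpha)\, \#(X, \alpha)
\]
% The rule expectations used above are then c(X) \, P_G(X \to Y Z).
```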
Parsing times
Share of total parsing time spent at each level:
  X-Bar = G0: 60 %
  G1: 12 %
  G2: 7 %
  G3: 6 %
  G4: 6 %
  G5: 5 %
  G6 = G: 4 %
Most of the time goes to the coarsest (X-bar) pass, where nothing has been pruned yet.
Bracket Posteriors (after G0)
Bracket Posteriors (after G1)
Bracket Posteriors (Final Chart, movie)
Bracket Posteriors (Best Tree)
Parse Selection
Computing the most likely unsplit tree is NP-hard. Options:
  - Settle for the best derivation.
  - Rerank an n-best list.
  - Use an alternative objective function.
A parse's probability is the sum of the probabilities of all derivations that project to it, so the best derivation need not belong to the best parse (identity below).
[Figure: derivations and their probabilities, grouped into the parses they project to]
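The relationship in symbols (standard for latent-variable grammars; notation mine):

```latex
% A parse T marginalizes over all derivations t (refined trees)
% whose projection to unsplit symbols equals T.
\[
P(T) = \sum_{t \,:\, \pi(t) = T} P(t)
\]
% Maximizing P(T) exactly is NP-hard; the Viterbi derivation
% maximizes P(t) instead.
```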
Parse Risk Minimization
Expected loss according to our beliefs:
\[
T_P^{*} = \operatorname*{argmin}_{T_P} \sum_{T_T} L(T_P, T_T)\, P(T_T \mid w)
\]
where T_T is the true tree, T_P the predicted tree, and L a loss function (0/1, precision, recall, F1).
Use an n-best candidate list and approximate the expectation with samples [Titov & Henderson '06] (sketch below).
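A self-contained Java sketch of the sampled risk-minimizing selection; here `candidates` would come from the n-best list and `samples` from trees drawn from the parser's posterior (the class and method names are placeholders, not the actual implementation):

```java
import java.util.List;
import java.util.function.BiFunction;

/** Illustrative minimum-risk selection over an n-best list, where the
 *  expectation over true trees is approximated with sampled trees. */
public class RiskReranker {
  /** Returns the candidate with the lowest average loss against the samples. */
  public static <T> T selectMinRisk(List<T> candidates, List<T> samples,
                                    BiFunction<T, T, Double> loss) {
    T best = null;
    double bestRisk = Double.POSITIVE_INFINITY;
    for (T candidate : candidates) {
      double risk = 0.0;
      for (T sample : samples) {          // Monte Carlo estimate of E[L]
        risk += loss.apply(candidate, sample);
      }
      risk /= samples.size();
      if (risk < bestRisk) { bestRisk = risk; best = candidate; }
    }
    return best;
  }
}
```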
Reranking Results
Objective            Precision  Recall  F1    Exact
BEST DERIVATION
Viterbi Derivation   89.6       89.4    89.5  37.4
Exact (non-sampled)  90.8       90.8    90.8  41.7
Exact/F1 (oracle)    95.3       94.4    95.0  63.9
RERANKING
Precision (sampled)  91.1       88.1    89.6  21.4
Recall (sampled)     88.2       91.3    89.7  21.5
F1 (sampled)         90.2       89.3    89.8  27.2
Exact (sampled)      89.5       89.5    89.5  25.8
Dynamic Programming
Two dynamic-programming alternatives (both built from rule posteriors, sketched below):
  - Approximate the posterior parse distribution [Matsuzaki et al. '05] (Variational).
  - Maximize the number of expected correct rules, à la [Goodman '98] (Max-Rule-Sum / Max-Rule-Product).
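A hedged sketch of the rule posterior these objectives use, computable from refined inside (I) and outside (O) scores by summing out the subcategories x, y, z (notation mine):

```latex
\[
r(A \to B\,C,\, i, k, j) =
  \frac{\sum_{x,y,z} O(A_x, i, j)\, P(A_x \to B_y C_z)\,
        I(B_y, i, k)\, I(C_z, k, j)}
       {P(w)}
\]
% Max-Rule-Product searches for the tree maximizing the product of r
% over its rules; Max-Rule-Sum maximizes the sum.
```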
Dynamic Programming Results
Objective            Precision  Recall  F1    Exact
BEST DERIVATION
Viterbi Derivation   89.6       89.4    89.5  37.4
DYNAMIC PROGRAMMING
Variational          90.7       90.9    90.8  41.4
Max-Rule-Sum         90.5       91.3    90.9  40.4
Max-Rule-Product     91.2       91.1    91.2  41.4
Final Results (Efficiency)
Berkeley Parser: 15 min, 91.2 F-score (implemented in Java)
Charniak & Johnson '05 parser: 19 min, 90.7 F-score (implemented in C)
Final Results (Accuracy)
                                          ≤ 40 words  all
                                          F1          F1
ENG  Charniak & Johnson '05 (generative)  90.1        89.6
     This Work                            90.6        90.1
     Charniak & Johnson '05 (reranked)    92.0        91.4
GER  Dubey '05                            76.3        -
     This Work                            80.8        80.1
CHN  Chiang et al. '02                    80.0        76.6
     This Work                            86.3        83.4
Conclusions
Hierarchical coarse-to-fine inference
  - Projections
  - Marginalization
Multi-lingual unlexicalized parsing
Thank You!
Parser available at
http://nlp.cs.berkeley.edu