Transcript of slides: PCFG: Probabilistic Context Free Grammars
Presenter: Ba Dat Nguyen
Advisor: Dr. Martin Theobald
Max-Planck-Institut für Informatik
Saarbrücken, Germany
PCFG: Probabilistic Context Free Grammars
Outline
• Introduction
• Probabilistic Context Free Grammars Parsing
  Context Free Grammars
  Probabilistic Context Free Grammars
  Inside-Outside Algorithm
• Extensions
  Distance
  Complement/adjunct distinction
  Traces and Wh-movement
The world is full of ambiguity.
Solution
PCFGs are a good way to resolve ambiguity problems in syntactic structure.
Language and Grammar
• Language
  Structural
  Ambiguous
• Grammar
  Generalization of regularities in language structures
  Morphology and syntax
Parsing
• The process of working out the grammatical structure of sentences.
• Basic parsing algorithms
  Parsing strategies
  CYK algorithm
  Earley algorithm
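The CYK algorithm mentioned above works bottom-up over spans of increasing length. Below is a minimal recognizer sketch; the rule encoding as Python tuples is an assumption of this illustration, and the toy grammar (in Chomsky normal form) mirrors the "astronomers" example used later in these slides:

```python
# A minimal CYK recognizer sketch for a CFG in Chomsky normal form.
# Rule format (an assumption for illustration): binary rules (A, B, C)
# meaning A -> B C, and lexical rules (A, word) meaning A -> word.

def cyk_recognize(words, binary_rules, lexical_rules):
    """Return True iff the start symbol 'S' derives the word sequence."""
    n = len(words)
    # chart[i][j] = set of nonterminals that can span words[i..j]
    chart = [[set() for _ in range(n)] for _ in range(n)]
    for i, w in enumerate(words):
        chart[i][i] = {a for (a, word) in lexical_rules if word == w}
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span - 1
            for d in range(i, j):                    # split point
                for (a, b, c) in binary_rules:
                    if b in chart[i][d] and c in chart[d + 1][j]:
                        chart[i][j].add(a)
    return "S" in chart[0][n - 1]

binary = [("S", "NP", "VP"), ("VP", "V", "NP"), ("VP", "VP", "PP"),
          ("PP", "P", "NP"), ("NP", "NP", "PP")]
lexical = [("NP", "astronomers"), ("NP", "ears"), ("NP", "saw"),
           ("NP", "stars"), ("NP", "telescopes"),
           ("V", "saw"), ("P", "with")]
print(cyk_recognize("astronomers saw stars with ears".split(), binary, lexical))  # True
```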
Example of parsing
• “She is a nice girl”

  (S (NP (PRP She))
     (VP (VBZ is)
         (NP (DT a) (JJ nice) (NN girl))))
Chomsky hierarchy
• Context-free rules have the form A -> γ
• Regular rules have the form A -> aB or A -> a
where A, B are nonterminals, a is a terminal, and α, β, γ are strings of terminals and nonterminals.

Context Free Grammars (CFG)
• A Context Free Grammar consists of
  A set of terminals {w^k}, k = 1,...,V
  A set of nonterminals {N^i}, i = 1,...,n
  A designated start symbol, N^1
  A set of rules {N^i -> ζ^j}, where ζ^j is a sequence of terminals and nonterminals
Example of CFG

  S  -> NP VP       NP -> NP PP
  PP -> P NP        NP -> astronomers
  VP -> V NP        NP -> ears
  VP -> VP PP       NP -> saw
  P  -> with        NP -> stars
  V  -> saw         NP -> telescopes
Ambiguous sentences
• “astronomers saw stars with ears”

  Tree 1: (S (NP astronomers) (VP (V saw) (NP (NP stars) (PP (P with) (NP ears)))))
  Tree 2: (S (NP astronomers) (VP (VP (V saw) (NP stars)) (PP (P with) (NP ears))))

  Which one is better?
Probabilistic CFG
• A Probabilistic Context Free Grammar (PCFG) consists of
  A CFG
  A corresponding set of probabilities on rules such that, for every i:
    Σ_j P(N^i -> ζ^j) = 1
Example of PCFG

  S  -> NP VP   1.0     NP -> NP PP        0.4
  PP -> P NP    1.0     NP -> astronomers  0.1
  VP -> V NP    0.7     NP -> ears         0.18
  VP -> VP PP   0.3     NP -> saw          0.04
  P  -> with    1.0     NP -> stars        0.18
  V  -> saw     1.0     NP -> telescopes   0.1
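As a quick sanity check, the example grammar is a proper PCFG: for each nonterminal, the probabilities of its rules sum to one. A minimal sketch, with the grammar encoded as a Python dict (an assumption of this illustration):

```python
# Check that the example PCFG is proper: for every nonterminal,
# the probabilities of its rewrite rules sum to 1.
from collections import defaultdict

RULES = {("S", ("NP", "VP")): 1.0, ("PP", ("P", "NP")): 1.0,
         ("VP", ("V", "NP")): 0.7, ("VP", ("VP", "PP")): 0.3,
         ("P", ("with",)): 1.0, ("V", ("saw",)): 1.0,
         ("NP", ("NP", "PP")): 0.4, ("NP", ("astronomers",)): 0.1,
         ("NP", ("ears",)): 0.18, ("NP", ("saw",)): 0.04,
         ("NP", ("stars",)): 0.18, ("NP", ("telescopes",)): 0.1}

totals = defaultdict(float)
for (lhs, _), p in RULES.items():
    totals[lhs] += p
for lhs, total in sorted(totals.items()):
    print(lhs, round(total, 10))   # each nonterminal sums to 1.0
```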
Probability of a tree
• Subtree: (NP (NP stars) (PP (P with) (NP ears)))
• By the chain rule:
  P(NP -> NP PP, NP -> stars, PP -> P NP, P -> with, NP -> ears)
  = P(NP -> NP PP)
  × P(NP -> stars | NP -> NP PP)
  × P(PP -> P NP | NP -> NP PP, NP -> stars)
  × P(P -> with | NP -> NP PP, NP -> stars, PP -> P NP)
  × P(NP -> ears | NP -> NP PP, NP -> stars, PP -> P NP, P -> with)
Assumptions
• Place invariance: for all k, P(N^j_{k(k+c)} -> ζ) is the same
• Context-free: P(N^j_{kl} -> ζ | anything outside k through l) = P(N^j_{kl} -> ζ)
• Ancestor-free: P(N^j_{kl} -> ζ | any ancestor nodes outside N^j_{kl}) = P(N^j_{kl} -> ζ)
Probability of a tree
• Subtree: (NP (NP stars) (PP (P with) (NP ears)))
• With these assumptions, the conditionals collapse into a product of rule probabilities:
  P(NP -> NP PP, NP -> stars, PP -> P NP, P -> with, NP -> ears)
  = P(NP -> NP PP) P(NP -> stars) P(PP -> P NP) P(P -> with) P(NP -> ears)
Ambiguity
• Tree 1: (S_1.0 (NP_0.1 astronomers) (VP_0.7 (V_1.0 saw) (NP_0.4 (NP_0.18 stars) (PP_1.0 (P_1.0 with) (NP_0.18 ears)))))
  P(t1) = 1.0 × 0.1 × 0.7 × 1.0 × 0.4 × 0.18 × 1.0 × 1.0 × 0.18 = 0.0009072
• Tree 2: (S_1.0 (NP_0.1 astronomers) (VP_0.3 (VP_0.7 (V_1.0 saw) (NP_0.18 stars)) (PP_1.0 (P_1.0 with) (NP_0.18 ears))))
  P(t2) = 1.0 × 0.1 × 0.3 × 0.7 × 1.0 × 0.18 × 1.0 × 1.0 × 0.18 = 0.0006804
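The two probabilities above can be reproduced by multiplying out the rule probabilities of each tree. A minimal sketch, with trees encoded as nested (label, children) tuples, an encoding assumed here only for illustration:

```python
# Sketch: scoring the two parses of "astronomers saw stars with ears"
# under the example PCFG (rule probabilities from the slide).
RULE_PROBS = {
    ("S", ("NP", "VP")): 1.0, ("PP", ("P", "NP")): 1.0,
    ("VP", ("V", "NP")): 0.7, ("VP", ("VP", "PP")): 0.3,
    ("P", ("with",)): 1.0, ("V", ("saw",)): 1.0,
    ("NP", ("NP", "PP")): 0.4, ("NP", ("astronomers",)): 0.1,
    ("NP", ("ears",)): 0.18, ("NP", ("saw",)): 0.04,
    ("NP", ("stars",)): 0.18, ("NP", ("telescopes",)): 0.1,
}

def tree_prob(tree):
    """P(tree) = product of the probabilities of the rules it uses."""
    label, children = tree
    if all(isinstance(c, str) for c in children):      # preterminal rule
        return RULE_PROBS[(label, tuple(children))]
    p = RULE_PROBS[(label, tuple(c[0] for c in children))]
    for child in children:
        p *= tree_prob(child)
    return p

# Tree 1: saw [stars with ears]
t1 = ("S", [("NP", ["astronomers"]),
            ("VP", [("V", ["saw"]),
                    ("NP", [("NP", ["stars"]),
                            ("PP", [("P", ["with"]), ("NP", ["ears"])])])])])
# Tree 2: [saw stars] with ears
t2 = ("S", [("NP", ["astronomers"]),
            ("VP", [("VP", [("V", ["saw"]), ("NP", ["stars"])]),
                    ("PP", [("P", ["with"]), ("NP", ["ears"])])])])

print(round(tree_prob(t1), 7))  # 0.0009072
print(round(tree_prob(t2), 7))  # 0.0006804
```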
Probability of a rule
• Given a training set of annotated sentences:
    P(N^j -> ξ) = C(N^j -> ξ) / Σ_γ C(N^j -> γ)
  where C(·) is the number of times a particular rule is used.
• But how can we calculate this if there is no annotated data?
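With annotated data, the counting estimate above takes only a few lines. A sketch on a tiny made-up treebank; the (label, children) tree encoding and the example trees are assumptions of this illustration:

```python
# Sketch: maximum-likelihood rule probabilities from annotated trees,
# following P(N -> ξ) = C(N -> ξ) / Σ_γ C(N -> γ).
from collections import Counter, defaultdict

def count_rules(tree, counts):
    """Recursively count the rewrite rules used in one parse tree."""
    label, children = tree
    if all(isinstance(c, str) for c in children):       # preterminal rule
        counts[(label, tuple(children))] += 1
        return
    counts[(label, tuple(c[0] for c in children))] += 1
    for child in children:
        count_rules(child, counts)

def mle_probabilities(treebank):
    counts = Counter()
    for tree in treebank:
        count_rules(tree, counts)
    totals = defaultdict(int)                           # Σ_γ C(N -> γ) per LHS
    for (lhs, _), c in counts.items():
        totals[lhs] += c
    return {rule: c / totals[rule[0]] for rule, c in counts.items()}

# Tiny hypothetical treebank: two trees sharing the S -> NP VP rule.
treebank = [
    ("S", [("NP", ["she"]), ("VP", [("V", ["slept"])])]),
    ("S", [("NP", ["astronomers"]), ("VP", [("V", ["saw"]), ("NP", ["stars"])])]),
]
probs = mle_probabilities(treebank)
print(probs[("VP", ("V",))])       # 0.5: VP -> V used once out of two VP rewrites
print(probs[("S", ("NP", "VP"))])  # 1.0
```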
Maximum Likelihood Estimation
• Goal: argmax_μ P(O_training | μ), where μ is the set of parameters of the current grammar.
• There is no known analytic method to choose µ to maximize P(O | µ).
• Instead, locally maximize P(O | µ) by iterative hill-climbing: a special case of the Expectation Maximization (EM) method.
• The Inside-Outside algorithm is a form of EM using the inside and outside probabilities estimated from the training set.
Training a PCFG
• We are given
  A set of training sentences
  A set of terminals
  A set of nonterminals
• Initial rule probabilities are estimated (perhaps chosen randomly)
• The Inside-Outside algorithm is then used to train the grammar
Inside-Outside probabilities
• For a sentence w_1 ... w_m with start symbol N^1, consider a nonterminal N^j spanning the words w_p ... w_q:
  Outside probability: α_j(p,q)
  Inside probability: β_j(p,q)
Inside probabilities
• The inside probability β_j(p,q) is the probability of the sequence w_p ... w_q being generated from a tree rooted at node N^j:
    β_j(p,q) = P(w_pq | N^j_pq)
• Base case: β_j(k,k) = P(N^j -> w_k)
• Recursion:
    β_j(p,q) = Σ_{r,s} Σ_{d=p}^{q-1} P(N^j -> N^r N^s) β_r(p,d) β_s(d+1,q)
• The calculation can be carried out bottom-up.
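The bottom-up recursion can be sketched directly for the example grammar. Summing over both parses, the inside probability of the whole sentence equals 0.0009072 + 0.0006804. The dictionary encoding of rules below is an assumption of this illustration:

```python
# Sketch of the bottom-up inside computation β_j(p,q) for the example
# PCFG; β_S(1,m) is the total probability of the sentence.
from collections import defaultdict

BINARY = {("S", ("NP", "VP")): 1.0, ("PP", ("P", "NP")): 1.0,
          ("VP", ("V", "NP")): 0.7, ("VP", ("VP", "PP")): 0.3,
          ("NP", ("NP", "PP")): 0.4}
LEXICAL = {("P", "with"): 1.0, ("V", "saw"): 1.0,
           ("NP", "astronomers"): 0.1, ("NP", "ears"): 0.18,
           ("NP", "saw"): 0.04, ("NP", "stars"): 0.18,
           ("NP", "telescopes"): 0.1}

def inside(words):
    n = len(words)
    beta = defaultdict(float)               # beta[(j, p, q)], 0-based spans
    for k, w in enumerate(words):           # base: β_j(k,k) = P(N^j -> w_k)
        for (j, word), prob in LEXICAL.items():
            if word == w:
                beta[(j, k, k)] += prob
    for span in range(2, n + 1):            # recursion over growing spans
        for p in range(n - span + 1):
            q = p + span - 1
            for (j, (r, s)), prob in BINARY.items():
                for d in range(p, q):       # split point
                    beta[(j, p, q)] += prob * beta[(r, p, d)] * beta[(s, d + 1, q)]
    return beta[("S", 0, n - 1)]

print(round(inside("astronomers saw stars with ears".split()), 7))
# 0.0015876 = 0.0009072 + 0.0006804
```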
Outside probabilities
• The outside probability α_j(p,q) is the total probability of beginning with the start symbol N^1 and generating N^j_pq together with all the words outside w_p ... w_q:
    α_j(p,q) = P(w_1(p-1), N^j_pq, w_(q+1)m)
• Base case: α_1(1,m) = 1, and α_j(1,m) = 0 for j ≠ 1
• Recursion:
    α_j(p,q) = Σ_{f,g} Σ_{e=q+1}^{m} α_f(p,e) P(N^f -> N^j N^g) β_g(q+1,e)
             + Σ_{f,g} Σ_{e=1}^{p-1} α_f(e,q) P(N^f -> N^g N^j) β_g(e,p-1)
Inside-Outside Algorithm
We have:
    α_j(p,q) β_j(p,q) = P(N^1 ⇒* w_1m, N^j ⇒* w_pq)
Writing P(w_1m) for P(N^1 ⇒* w_1m), it follows that
    P(N^j ⇒* w_pq | N^1 ⇒* w_1m) = α_j(p,q) β_j(p,q) / P(w_1m)
Inside-Outside Algorithm
Expected counts:
    E(N^j is used) = Σ_{p=1}^{m} Σ_{q=p}^{m} α_j(p,q) β_j(p,q) / P(w_1m)
    E(N^j -> N^r N^s is used)
      = Σ_{p=1}^{m-1} Σ_{q=p+1}^{m} Σ_{d=p}^{q-1} α_j(p,q) P(N^j -> N^r N^s) β_r(p,d) β_s(d+1,q) / P(w_1m)
Inside-Outside Algorithm
Therefore:
    P̂(N^j -> N^r N^s)
      = E(N^j -> N^r N^s is used) / E(N^j is used)
      = Σ_{p=1}^{m-1} Σ_{q=p+1}^{m} Σ_{d=p}^{q-1} α_j(p,q) P(N^j -> N^r N^s) β_r(p,d) β_s(d+1,q)
        / Σ_{p=1}^{m} Σ_{q=p}^{m} α_j(p,q) β_j(p,q)

    P̂(N^j -> w^k)
      = Σ_{h=1}^{m} α_j(h,h) P(w_h = w^k) β_j(h,h)
        / Σ_{p=1}^{m} Σ_{q=p}^{m} α_j(p,q) β_j(p,q)
Inside-Outside Algorithm
For each sentence W_i in the corpus, define:
    f_i(p,q,j,r,s) = Σ_{d=p}^{q-1} α_j(p,q) P(N^j -> N^r N^s) β_r(p,d) β_s(d+1,q) / P(N^1 ⇒* W_i)
    g_i(h,j,k) = α_j(h,h) P(w_h = w^k) β_j(h,h) / P(N^1 ⇒* W_i)
    h_i(p,q,j) = α_j(p,q) β_j(p,q) / P(N^1 ⇒* W_i)
Inside-Outside Algorithm
We then have, for a corpus of l sentences (m being the length of sentence W_i):
    P̂(N^j -> N^r N^s) = Σ_{i=1}^{l} Σ_{p=1}^{m} Σ_{q=p}^{m} f_i(p,q,j,r,s)
                        / Σ_{i=1}^{l} Σ_{p=1}^{m} Σ_{q=p}^{m} h_i(p,q,j)
    P̂(N^j -> w^k) = Σ_{i=1}^{l} Σ_{h=1}^{m} g_i(h,j,k)
                    / Σ_{i=1}^{l} Σ_{p=1}^{m} Σ_{q=p}^{m} h_i(p,q,j)
Discussion
• The Inside-Outside algorithm is quite slow: O(m^3 n^3) for each sentence, where m is the length of the sentence and n is the number of nonterminals.
• The algorithm is very sensitive to the initialization of the parameters.
• In practice, a PCFG is a worse language model for English than an n-gram model (n > 1).
More features
• Adding lexical features: words and POS tags inside the non-terminals.
• Each phrase has a head child, from which it inherits its head word.

  TOP -> S(bought)
  S(bought) -> NP(week) NP(Marks) VP(bought)
  NP(week) -> JJ(Last) NN(week)
  NP(Marks) -> NNP(Marks)
  VP(bought) -> VB(bought) NP(Brooks)
  NP(Brooks) -> NNP(Brooks)
Example

  (TOP (S(bought)
         (NP(week) (JJ Last) (NN week))
         (NP(Marks) (NNP Marks))
         (VP(bought) (VB bought)
                     (NP(Brooks) (NNP Brooks)))))
Using Distance
• The rule probability is decomposed into:
  The head constituent label of the phrase:
      P_h(H | P, h)
  The modifiers to the right of the head:
      Π_{i=1..m+1} P_R(R_i(r_i) | P, H, h, R_1(r_1) ... R_{i-1}(r_{i-1}))
  The modifiers to the left of the head:
      Π_{i=1..n+1} P_L(L_i(l_i) | P, H, h, L_1(l_1) ... L_{i-1}(l_{i-1}))
Using Distance
• For example:
    P(S(bought) -> NP(week) NP(Marks) VP(bought))
    = P_h(VP | S, bought)
    × P_L(NP(Marks) | S, VP, bought)
    × P_L(NP(week) | S, VP, bought, NP(Marks))
• With distance 0:
    P(S(bought) -> NP(week) NP(Marks) VP(bought))
    = P_h(VP | S, bought)
    × P_L(NP(Marks) | S, VP, bought)
    × P_L(NP(week) | S, VP, bought)
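The distance-0 factorization above can be illustrated by assembling the product from probability tables. All numbers, the table layout, and the STOP symbol terminating each modifier sequence are assumptions made only for this sketch, not values from the paper:

```python
# Hypothetical sketch of the head/modifier factorization: the rule
# probability is the product P_h × Π P_L × Π P_R, each modifier
# conditioned only on (P, H, h) in the distance-0 version.
# All probability values below are made up for illustration.
P_h = {("VP", ("S", "bought")): 0.6}
P_L = {(("NP", "Marks"), ("S", "VP", "bought")): 0.2,
       (("NP", "week"),  ("S", "VP", "bought")): 0.1,
       (("STOP", None),  ("S", "VP", "bought")): 0.3}
P_R = {(("STOP", None),  ("S", "VP", "bought")): 0.9}

def rule_prob(parent, head_word, head, left_mods, right_mods):
    """Assemble P(rule) for the distance-0 head-driven decomposition."""
    ctx = (parent, head, head_word)
    p = P_h[(head, (parent, head_word))]
    for mod in left_mods + [("STOP", None)]:    # left modifiers, then STOP
        p *= P_L[(mod, ctx)]
    for mod in right_mods + [("STOP", None)]:   # right modifiers, then STOP
        p *= P_R[(mod, ctx)]
    return p

p = rule_prob("S", "bought", "VP", [("NP", "Marks"), ("NP", "week")], [])
print(round(p, 5))   # 0.00324 = 0.6 × 0.2 × 0.1 × 0.3 × 0.9
```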
Adding the complement/adjunct distinction

  (TOP (S(bought)
         (NP(week) (JJ Last) (NN week))
         (NP-C(Marks) (NNP Marks))
         (VP(bought) (VB bought)
                     (NP-C(Brooks) (NNP Brooks)))))

It would be useful to identify “Marks” as a subject and “Last week” as an adjunct!
Adding the complement/adjunct distinction
• Add a “-C” suffix to every non-terminal in the training data that satisfies both conditions:
  The non-terminal must be: an NP, SBAR, or S whose parent is an S; an NP, SBAR, S, or VP whose parent is a VP; or an S whose parent is an SBAR.
  The non-terminal must not have one of the following semantic tags: ADV, VOC, BNF, DIR, EXT, LOC, MNR, TMP, CLR or PRP.
Adding the complement/adjunct distinction
• The probabilities now include:
  The head constituent label of the phrase:
      P_h(H | P, h)
  The left and right subcat frames:
      P_lc(LC | P, H, h) and P_rc(RC | P, H, h)
  The modifiers to the right of the head:
      P_R(R_i(r_i) | P, H, h, distance_R(i-1), RC)
  The modifiers to the left of the head:
      P_L(L_i(l_i) | P, H, h, distance_L(i-1), LC)
Adding the complement/adjunct distinction
• For example:
    P(S(bought) -> NP(week) NP-C(Marks) VP(bought))
    = P_h(VP | S, bought)
    × P_lc({NP-C} | S, VP, bought)
    × P_L(NP-C(Marks) | S, VP, bought, {NP-C})
    × P_L(NP(week) | S, VP, bought, {})
Traces and Wh-movement
• Add a “gap” feature to each non-terminal in the tree and propagate gaps through the tree until they are finally discharged as a trace complement.
• For example:
    (1) NP -> NP SBAR(+gap)
    (2) SBAR(+gap) -> WHNP S-C(+gap)
    (3) S(+gap) -> NP-C VP(+gap)
    (4) VP(+gap) -> VB TRACE NP
Traces and Wh-movement

  (NP(store)
     (NP(store) The store)
     (SBAR(that)(+gap)
        (WHNP(that) (WDT that))
        (S(bought)(+gap)
           (NP-C(Marks) Marks)
           (VP(bought)(+gap) (VBD bought) TRACE (NP(week) last week)))))
Traces and Wh-movement
• For example:
    P(VP(bought)(+gap) -> VB(bought) TRACE NP(week))
    = P_h(VB | VP, bought)
    × P_G(Right | VP, bought, VB)
    × P_RC({NP-C} | VP, bought, VB)
    × P_R(TRACE | VP, bought, VB, {NP-C, +gap})
    × P_R(NP(week) | VP, bought, VB)
Experiment
• Models are trained on sections 02-21 of the Wall Street Journal portion of the Penn Treebank and tested on section 23.
• Labelled Precision = number of correct constituents in proposed parse / number of constituents in proposed parse
• Labelled Recall = number of correct constituents in proposed parse / number of constituents in treebank parse
• Crossing Brackets = number of constituents which violate constituent boundaries with a constituent in the treebank parse
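The metric definitions above translate directly into code. A minimal sketch; the constituent sets below are made-up toy examples, not Treebank output:

```python
# Sketch: labelled precision/recall and crossing brackets over
# constituents encoded as (label, start, end) with inclusive spans.

def labelled_prf(proposed, gold):
    """Labelled precision and recall of proposed vs treebank constituents."""
    proposed, gold = set(proposed), set(gold)
    correct = len(proposed & gold)
    return correct / len(proposed), correct / len(gold)

def crossing_brackets(proposed, gold):
    """Count proposed constituents whose span crosses a treebank span."""
    gold_spans = {(s, e) for (_, s, e) in gold}
    def crosses(a, b):
        (s1, e1), (s2, e2) = a, b
        return s1 < s2 <= e1 < e2 or s2 < s1 <= e2 < e1
    return sum(1 for (_, s, e) in proposed
               if any(crosses((s, e), g) for g in gold_spans))

# Hypothetical gold vs proposed analyses of a 5-word sentence.
gold = {("S", 0, 4), ("NP", 0, 0), ("VP", 1, 4), ("NP", 2, 4), ("PP", 3, 4)}
proposed = {("S", 0, 4), ("NP", 0, 0), ("VP", 1, 4), ("NP", 2, 2), ("PP", 3, 4)}
precision, recall = labelled_prf(proposed, gold)
print(precision, recall)                 # 0.8 0.8
print(crossing_brackets(proposed, gold)) # 0
```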
References
1. C. Manning and H. Schütze, Foundations of Statistical Natural Language Processing
2. Stanford Parser, http://nlp.stanford.edu:8080/parser/
3. M. Collins, Three Generative, Lexicalised Models for Statistical Parsing, ACL 1997
Thanks!