Machine Translation
CL1, 4 Dec 2014
Slide credits: Chris Callison-Burch (UPenn)
Philipp Koehn (JHU)
Translational Equivalence
He insisted on the test, but just barely.
He passed the test, but just barely.
Er hat die Prüfung bestanden, jedoch nur knapp
How do lexical translation models deal with contextual information?
Translational Equivalence
Ma insisted on the test, but just barely.
Ma passed the test, but just barely.
Ma hat die Prüfung bestanden, jedoch nur knapp
F          E         log prob
bestanden  insisted  -1.18
bestanden  were      -1.18
bestanden  existed   -1.36
bestanden  was       -1.39
bestanden  been      -1.43
bestanden  passed    -1.52
bestanden  consist   -1.87
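The failure mode in this table can be reproduced with a context-free argmax over the lexical translation table (the log probabilities are the illustrative values from the slide):

```python
# Illustrative lexical translation table for German "bestanden"
# (log probabilities taken from the slide).
log_probs = {
    "insisted": -1.18, "were": -1.18, "existed": -1.36,
    "was": -1.39, "been": -1.43, "passed": -1.52, "consist": -1.87,
}

def best_translation(table):
    """Pick the highest-probability translation, ignoring all context."""
    return max(table, key=table.get)

print(best_translation(log_probs))  # "insisted" beats the correct "passed"
```

With no context, the model cannot know that "die Prüfung bestanden" means "passed the test".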
Translational Equivalence
He insisted on the test, but just barely.
He passed the test, but just barely.
Er hat die Prüfung bestanden, jedoch nur knapp
What is wrong with this?
How can we improve this?
Lexical Translation
Translation model
• What are the atomic units of translation?
  • Lexical translation: words
  • Phrase-based translation: phrases
• Benefits
  • many-to-many translation
  • use of local context in translation
• Downsides
  • Where do phrases come from?
• Standard model used by Google, Microsoft, ...
Translation model
• With a latent variable, we introduce a decomposition into phrases which translate independently:

  p(f, a | e) = p(a) ∏_{⟨ē, f̄⟩ ∈ a} p(f̄ | ē)

• We can then marginalize to get p(f | e):

  p(f | e) = Σ_{a ∈ A} p(a) ∏_{⟨ē, f̄⟩ ∈ a} p(f̄ | ē)

f = Morgen | fliege | ich | nach Baltimore | zur Konferenz
e = Tomorrow | will fly | I | in Baltimore | to the Konferenz
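Under this model, the probability of one particular segmentation is just p(a) times the product of its phrase translation probabilities. A sketch with hypothetical probabilities (only the value for Morgen/Tomorrow comes from the later phrase-table slide; the rest are made up):

```python
# Hypothetical phrase translation probabilities for the example sentence.
phrase_table = {
    ("Morgen", "Tomorrow"): 0.9,
    ("fliege", "will fly"): 0.17,
    ("ich", "I"): 0.8,
    ("nach Baltimore", "in Baltimore"): 0.6,
    ("zur Konferenz", "to the Konferenz"): 0.4,
}

def alignment_prob(alignment, p_a=1.0):
    """p(f, a | e) = p(a) * product over phrase pairs of p(f_phrase | e_phrase)."""
    p = p_a
    for f_phrase, e_phrase in alignment:
        p *= phrase_table[(f_phrase, e_phrase)]
    return p

a = [("Morgen", "Tomorrow"), ("fliege", "will fly"), ("ich", "I"),
     ("nach Baltimore", "in Baltimore"), ("zur Konferenz", "to the Konferenz")]
print(alignment_prob(a))
```

Summing this quantity over all segmentations and alignments a gives p(f | e).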
Translation model
• Walking through one alignment a, phrase by phrase, the probability accumulates as a product:

f = Morgen | fliege | ich | nach Baltimore | zur Konferenz
e = Tomorrow | will fly | I | in Baltimore | to the Konferenz
a = { Morgen / Tomorrow, fliege / will fly, ich / I, nach Baltimore / in Baltimore, zur Konferenz / to the Konferenz }

p(Morgen | Tomorrow) × p(fliege | will fly) × p(ich | I) × p(nach Baltimore | in Baltimore) × ...

• Marginalize to get p(f | e):

  p(f | e) = Σ_{a ∈ A} p(a) ∏_{⟨ē, f̄⟩ ∈ a} p(f̄ | ē)
Phrases
• Contiguous strings of words
• Phrases are not necessarily syntactic constituents
• Usually have maximum limits
• Phrases subsume words (individual words are phrases of length 1)
Linguistic Phrases
• Model is not limited to linguistic phrases (NPs, VPs, PPs, CPs, ...)
• Non-constituent phrases are useful
• Is a “good” phrase more likely to be [P NP] or [governor P]? Why? How would you figure this out?
es gibt → there is | there are
Phrase Tables

f           e            p(f | e)
das Thema   the issue    0.41
das Thema   the point    0.72
das Thema   the subject  0.47
das Thema   the thema    0.99
es gibt     there is     0.96
es gibt     there are    0.72
morgen      tomorrow     0.9
fliege ich  will I fly   0.63
fliege ich  will fly     0.17
fliege ich  I will fly   0.13
Reordering Model: p(a)
• Two responsibilities
  • Divide the source sentence into phrases
    • Standard approach: uniform distribution over all possible segmentations
    • How many segmentations are there?
  • Reorder the phrases
    • Standard approach: Markov model on phrases (parameterized with a log-linear model)
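The answer to "how many segmentations": a sentence of n words has 2^(n-1) contiguous segmentations, since each of the n-1 word boundaries is independently either a phrase break or not. A quick check by enumeration:

```python
from itertools import combinations

def segmentations(words):
    """Enumerate all ways to split a word sequence into contiguous phrases."""
    n = len(words)
    result = []
    for k in range(n):  # choose which k of the n-1 boundaries are breaks
        for breaks in combinations(range(1, n), k):
            cuts = [0, *breaks, n]
            result.append([words[i:j] for i, j in zip(cuts, cuts[1:])])
    return result

words = "Morgen fliege ich nach".split()
print(len(segmentations(words)))  # 2**(4-1) = 8
```

This exponential count is one reason summing over all segmentations is intractable in general.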
Learning Phrases
• Latent segmentation variable
• Latent phrasal inventory
• Parallel data
• EM?
Computational problem: summing over all segmentationsand alignments is #P-complete
Modeling problem: MLE has a degenerate solution.
Learning Phrases
• Three stages
• word alignment
• extraction of phrases
• estimation of phrase probabilities
Consistent Phrases
Phrase Extraction
[Alignment grid: watashi wa hako wo akemasu ↔ I open the box, with links watashi–I, wa–I, hako–box, wo–box, akemasu–open]

Candidate phrase pairs, checked for consistency with the alignment:
• akemasu / open
• watashi wa / I
• watashi / I ✘ (wa is also aligned to I)
• hako wo / box
• hako wo / the box
• hako wo / open the box ✘ (open is aligned to akemasu, outside the source span)
• hako wo akemasu / open the box
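The consistency check illustrated above (a phrase pair is extractable only if no alignment link crosses its border) can be sketched directly. This is a minimal variant that extracts only the tightest target span per source span, so it does not produce the unaligned-word extensions like "hako wo / the box":

```python
def extract_phrases(alignment, src_len, max_len=4):
    """Extract phrase pairs consistent with a word alignment.

    alignment: set of (source_index, target_index) links.
    A pair of spans is consistent if every link touching either
    span stays inside the pair.
    """
    pairs = set()
    for i1 in range(src_len):
        for i2 in range(i1, min(src_len, i1 + max_len)):
            # target positions linked to the source span [i1, i2]
            tgt = [t for s, t in alignment if i1 <= s <= i2]
            if not tgt:
                continue
            j1, j2 = min(tgt), max(tgt)
            # reject if a link points from outside the source span into [j1, j2]
            if any(j1 <= t <= j2 and not (i1 <= s <= i2) for s, t in alignment):
                continue
            pairs.add(((i1, i2), (j1, j2)))
    return pairs

# Japanese example: watashi(0) wa(1) hako(2) wo(3) akemasu(4)
#                   I(0) open(1) the(2) box(3)
pairs = extract_phrases([(0, 0), (1, 0), (2, 3), (3, 3), (4, 1)], 5)
print(sorted(pairs))
```

Note that ((0, 0), (0, 0)), i.e. watashi / I, is correctly rejected because wa also links to I.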
[Figure: translation-options lattice for “Maria no dio una bofetada a la bruja verde”, listing English options per source span: Mary; not / no / did not; give / did not give; a slap / slap; to / to the / by; the; witch / hag / bawdy; green witch / the witch]
Decoding algorithm
• Translation as a search problem
• Partial hypothesis keeps track of
  • which source words have been translated (coverage vector)
  • n-1 most recent words of English (for the LM!)
  • a back pointer list to the previous hypothesis + the (e,f) phrase pair used
  • the (partial) translation probability
  • the estimated probability of translating the remaining words (precomputed, a function of the coverage vector)
• Start state: no translated words, E=<s>, bp=nil
• Goal state: all words translated
Decoding algorithm
• Q[0] ← Start state
• for i = 0 to |f|-1
  • Keep the b best hypotheses at Q[i]
  • for each hypothesis h in Q[i]
    • for each untranslated span in h.c for which there is a translation <e,f> in the phrase table
      • h' = h extended by <e,f>
      • Is there an item in Q[|h'.c|] with the same LM state?
        • yes: update that item's bp list and probability
        • no: Q[|h'.c|] ← h'
• Find the best hypothesis in Q[|f|]; reconstruct the translation by following back pointers
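A heavily simplified sketch of the stack decoder above: monotone (no reordering), no language model, no future cost, and a tiny hypothetical phrase table; real decoders add LM state, recombination, and future-cost pruning:

```python
def decode(src, phrase_table, beam=5):
    """Monotone beam decoder.

    stacks[i] holds the best hypotheses covering the first i source
    words; each hypothesis is (probability, english_phrases).
    """
    n = len(src)
    stacks = [[] for _ in range(n + 1)]
    stacks[0] = [(1.0, [])]
    for i in range(n):
        for prob, eng in stacks[i][:beam]:
            for j in range(i + 1, n + 1):       # try every phrase starting at i
                f = tuple(src[i:j])
                for e, p in phrase_table.get(f, []):
                    stacks[j].append((prob * p, eng + [e]))
        for k in range(i + 1, n + 1):            # keep each stack sorted and pruned
            stacks[k] = sorted(stacks[k], reverse=True)[:beam]
    best_prob, best_eng = stacks[n][0]
    return " ".join(best_eng), best_prob

# Toy phrase table (made-up probabilities).
table = {
    ("Maria",): [("Mary", 0.9)],
    ("no",): [("not", 0.4)],
    ("Maria", "no"): [("Mary did not", 0.3)],
    ("dio",): [("gave", 0.5)],
}
print(decode(["Maria", "no", "dio"], table))
```

Without an LM, the decoder happily outputs "Mary not gave" (0.9 × 0.4 × 0.5 = 0.18) over "Mary did not gave" (0.3 × 0.5 = 0.15); scoring with an n-gram LM is what penalizes such ungrammatical strings.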
f: Maria no dio una bofetada a la bruja verde

Q[0]: e: <s>        c: ---------   p: 1.0
Q[1]: e: <s> Mary   c: *--------   p: 0.9    (Maria → Mary)
      e: <s> Maria  c: *--------   p: 0.3    (Maria → Maria)
Q[2]: e: did not    c: **-------   p: 0.45   (two paths reach the same LM state “did not”; recombination keeps one item and updates its bp list and probability)
      e: Mary not   c: **-------   p: 0.1    (no → not)
Q[5]: e: not slap   c: *****----   p: 0.316  (dio una bofetada → slap)
Reordering
• Languages express words in different orders
  • bruja verde vs. green witch
• Phrase pairs can “memorize” some of these
• More general: in decoding, “skip ahead”
• Problem:
  • Won't the “easy parts” of the sentence be translated first?
• Solution:
  • Future cost estimate
  • For every coverage vector, estimate what it will cost to translate the remaining untranslated words
  • When pruning, use p × future cost!
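The future cost table can be precomputed with a small dynamic program over source spans: take the best phrase translation for each span (ignoring the LM and reordering, an optimistic estimate), then combine adjacent spans. A sketch, reusing the hypothetical toy phrase table format from before:

```python
def future_costs(src, phrase_table):
    """best[i][j] = highest achievable probability for translating src[i:j],
    taking the better of a direct phrase translation and any split point."""
    n = len(src)
    best = [[0.0] * (n + 1) for _ in range(n + 1)]
    for length in range(1, n + 1):
        for i in range(n - length + 1):
            j = i + length
            f = tuple(src[i:j])
            if f in phrase_table:
                best[i][j] = max(p for _, p in phrase_table[f])
            for k in range(i + 1, j):
                best[i][j] = max(best[i][j], best[i][k] * best[k][j])
    return best

# Toy phrase table (made-up probabilities).
table = {("Maria",): [("Mary", 0.9)], ("no",): [("not", 0.4)],
         ("Maria", "no"): [("Mary did not", 0.3)]}
fc = future_costs(["Maria", "no"], table)
print(fc[0][2])  # max(0.3, 0.9 * 0.4) = 0.36
```

At decoding time, a hypothesis with coverage vector c is scored by multiplying its partial probability with the best[i][j] entries for its untranslated spans.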
f: Maria no dio una bofetada a la bruja verde

Q[0]: e: <s>        c: ---------   p: 1.0   fc: 1.5e-9
Q[1]: e: <s> Mary   c: *--------   p: 0.9   fc: 8.6e-9
      e: <s> Maria  c: *--------   p: 0.3   fc: 8.6e-9
      e: <s> Not    c: -*-------   p: 0.4   fc: 1.0e-9

Future costs make these hypotheses comparable.
Decoding summary
• Finding the best hypothesis is NP-hard
• Even with no language model, there are an exponential number of states!
• Solution 1: limit reordering
• Solution 2: (lossy) pruning
Goals
• Revisit why people thought syntax would not help machine translation
• Learn about Synchronous Context Free Grammars
  • Introduce notation, and basic algorithm
• Understand how we learn SCFGs from bitexts
• Get a sense of the different flavors of SCFGs
  – Hiero
  – SAMT
The Syntax Bet
• Longstanding debate about whether linguistic information can help statistical translation
• Two camps:
  • “Syntax will improve translation”
  • “Simpler data-driven models will always win” (“Every time I fire a linguist my performance goes up”)
Syntax is bad for translation
• The IBM Models were the dominant approach to SMT from the '90s until the mid-2000s
  – Eschewed linguistic information
• A number of studies cast doubt on whether linguistic info could help SMT
  – Fox (2002) showed that “phrasal cohesion” was less common than assumed across even related languages
  – Koehn et al (2003) empirically demonstrated that syntactically motivated phrases made PBMT worse
Phrases aren't coherent in bitexts
[Figure: word-aligned parse trees for “Elle aura de les effets plus destructifs que positifs” and “There will be more divisiveness than positive effects”, with alignment links crossing constituent boundaries (NP, PP, ADJP, VP, S)]
Gloss: It will have effects more destructive than positive
Fox (2002)
Ouch! Syntax hurts!
[Figure: BLEU score vs. training corpus size, Koehn et al (2003); approximate values read from the chart]

Corpus size                10k  20k  40k  80k  160k  320k
IBM Model 4                18   20   23   24   25    26
PBMT                       21   24   25   26   27    28
PBMT w/syntactic phrases   18   20   22   22   23    25
Extracting phrase pairs
[Alignment grid: 澳洲 是 与 北 韩 有 邦交 的 少数 国家 之一 ↔ Australia is one of the few countries that have diplomatic relations with North Korea]

Extracted phrase pairs (built up one at a time on the slides):
澳洲 / Australia
是 / is
之一 / one of
少数 / few
国家 / countries
有 / have
邦交 / diplomatic relations
与 / with
北 / North
韩 / Korea
澳洲 是 / Australia is
少数 国家 / few countries
有 邦交 / have diplomatic relations
与 北 / with North
北 韩 / North Korea
的 少数 国家 / the few countries that
与 北 韩 / with North Korea
之一 的 少数 国家 / one of the few countries that
与 北 韩 有 邦交 / have diplomatic relations with North Korea
有 邦交 的 少数 国家 / the few countries that have diplomatic relations
Why does it hurt to limit to constituents?
• Massively reduces the inventory of phrases that can be used as translation units
• Eliminates non-constituent phrases, many of which are quite useful
  – there are
  – note that
  – according to
So, what should we do?
• Drop syntax from statistical machine translation, since syntax is a bad fit for the data
• Abandon conventional English syntax and move towards more robust grammars that adapt to the parallel training corpus
• Maintain English syntax but design different syntactic models
Synchronous Context Free Grammars
• A common way of representing syntax in NLP is through context free grammars
• Synchronous context free grammars generate pairs of corresponding strings
• Can be used to describe translation and reordering between languages
• SCFGs translate sentences by parsing them
Example SCFG for Urdu

      Urdu                English
S   → NP① VP②             NP① VP②
VP  → PP① VP②             VP② PP①
VP  → V① AUX②             AUX② V①
PP  → NP① P②              P② NP①
NP  → hamd ansary         Hamid Ansari
NP  → na}b sdr            Vice President
V   → namzd               nominated
P   → kylye               for
AUX → taa                 was
Derivation for: hamd ansary na}b sdr kylye namzd taa
(built up rule by rule on the slides; the indices link the Urdu and English sides)

NP❶ = hamd ansary / Hamid Ansari
NP❷ = na}b sdr / Vice President
P❸ = kylye / for
V❹ = namzd / nominated
AUX❺ = taa / was
PP❻ = NP❷ P❸ / P❸ NP❷  →  na}b sdr kylye / for Vice President
VP❼ = V❹ AUX❺ / AUX❺ V❹  →  namzd taa / was nominated
VP❽ = PP❻ VP❼ / VP❼ PP❻  →  na}b sdr kylye namzd taa / was nominated for Vice President
S❾ = NP❶ VP❽ / NP❶ VP❽  →  hamd ansary na}b sdr kylye namzd taa / Hamid Ansari was nominated for Vice President
Discussion: Do you like SCFGs?
• In what ways are SCFGs better for describing reordering than what we saw before?
• Is this a good model of how languages relate?
• What do you think of the synchronous requirement?
(Discuss with your neighbor)
Sometimes languages are mismatched
English: (S (NP❶ Leila) (VP (V❷ misses) (NP❸ Fry)))
French:  (S (NP❶ Leila) (VP (V❷ manque) (PP à (NP❸ Fry))))
Spanish motion verb
English: (S❶ (NP❷ We) (VP❸ (V Drove) (P away)))
Spanish: (S❶ (NP❷ Nos “we”) (VP❸ (V fuimos “went”) (PP (P en “by”) (NP coche “car”))))
Spanish motion verb, pro-drop
English: (S (NP He) (VP (V swam) (PP (P to) (NP Ibiza))))
Spanish: (S (VP (V Fue “He+went”) (PP (P a “to”) (NP Ibiza “Ibiza”)) (VBG nadando “swimming”)))
We are going to use them anyway
• SCFGs are mismatched with some linguistic phenomena
• But they have nice formal properties and well-defined algorithms
Formal definition of SCFGs
• Aho and Ullman worked all of this out in the '60s and '70s
• Compiler theory
Formal definition of SCFGs
• A synchronous context free grammar is formally defined by a tuple

  G = <N, TS, TT, R, S>

• Where
  – N is a shared set of non-terminal symbols (e.g. S, NP, VP, PP, P, V, AUX)
  – TS is the set of source language terminals (e.g. hamd ansary, na}b sdr, namzd, kylye, taa)
  – TT is the set of target language terminals (e.g. for, Hamid Ansari, nominated, Vice President, was)
  – R is a set of production rules
  – S ∈ N, designated as the goal state
Formal definition of SCFGs
• Each production rule has the form X → ⟨α, β, ∼, w⟩
• Where
  – X ∈ N
  – α ∈ (N ∪ TS)*
  – β ∈ (N ∪ TT)*
  – ∼ is a one-to-one correspondence between the non-terminals in α and β
  – w is a weight assigned to the rule
Algorithms for SCFGs
• Translation with SCFGs is done via parsing
• How do we write an algorithm for parsing?
• One way to do it is as a deductive proof system
The CKY Parsing Algorithm

Axiom (for all rules (A → α) ∈ R whose right-hand side α matches the input word spanning positions i to i+1):

  ─────────────
  [A, i, i+1]

Inference rule (for all binary rules (A → B C) ∈ R):

  [B, i, j]   [C, j, k]
  ─────────────────────
  [A, i, k]

Goal: [S, 0, n]
S  → NP VP        NP → hamd ansary     V   → namzd
VP → PP VP        NP → na}b sdr        P   → kylye
VP → V AUX                             AUX → taa
PP → NP P

Input:  hamd ansary | na}b sdr | kylye | namzd | taa
        0           1          2       3       4     5

Goal: [S, 0, 5]

Axioms (one per lexical rule matching the input):
  NP → hamd ansary   ⇒ [NP, 0, 1]
  NP → na}b sdr      ⇒ [NP, 1, 2]
  P → kylye          ⇒ [P, 2, 3]
  V → namzd          ⇒ [V, 3, 4]
  AUX → taa          ⇒ [AUX, 4, 5]

Inference rules used:
  [NP, 1, 2]  [P, 2, 3]     PP → NP P    ⇒ [PP, 1, 3]
  [V, 3, 4]   [AUX, 4, 5]   VP → V AUX   ⇒ [VP, 3, 5]
  [PP, 1, 3]  [VP, 3, 5]    VP → PP VP   ⇒ [VP, 1, 5]
  [NP, 0, 1]  [VP, 1, 5]    S → NP VP    ⇒ [S, 0, 5]   (goal reached)
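The deduction trace above can be reproduced with a few lines of chart parsing. As in the slides, the multiword units hamd ansary and na}b sdr are treated as single tokens:

```python
def cky(tokens, lexical, binary, goal="S"):
    """CKY recognizer: the chart is the set of items [A, i, j] proved so far."""
    n = len(tokens)
    chart = set()
    for i, w in enumerate(tokens):               # axioms: lexical rules
        for lhs, rhs in lexical:
            if rhs == w:
                chart.add((lhs, i, i + 1))
    for length in range(2, n + 1):               # inference: [B,i,j] [C,j,k] => [A,i,k]
        for i in range(n - length + 1):
            k = i + length
            for j in range(i + 1, k):
                for lhs, (b, c) in binary:
                    if (b, i, j) in chart and (c, j, k) in chart:
                        chart.add((lhs, i, k))
    return (goal, 0, n) in chart

lexical = [("NP", "hamd ansary"), ("NP", "na}b sdr"),
           ("V", "namzd"), ("P", "kylye"), ("AUX", "taa")]
binary = [("S", ("NP", "VP")), ("VP", ("PP", "VP")),
          ("VP", ("V", "AUX")), ("PP", ("NP", "P"))]
tokens = ["hamd ansary", "na}b sdr", "kylye", "namzd", "taa"]
print(cky(tokens, lexical, binary))  # True: [S, 0, 5] is derivable
```

Extending this to translation only requires storing, with each chart item, the target side of the rule that produced it and backpointers to its children.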
The CKY Translation Algorithm

Axiom (for all rules (A → α, β) ∈ R whose source side α matches the input word spanning positions i to i+1):

  ─────────────
  [A, i, i+1]

Inference rule (for all binary rules (A → B C) ∈ R):

  [B, i, j]   [C, j, k]
  ─────────────────────
  [A, i, k]

Goal: [S, 0, n]
Where do grammars come from?
• Great! We now have
  – a formalism for describing the relationship between two languages,
  – an algorithm for producing translations
• All we need now is a synchronous grammar
• Where do grammars come from?
• Well, when two languages love each other very much...
Data-driven grammar extraction
• Grammar rules are not written by hand; they are extracted from bilingual parallel corpora
English-French:
  L' Espagne a refusé de confirmer que l' Espagne avait refusé d' aider le Maroc. ↔ Spain declined to confirm that Spain declined to aid Morocco.
  Nous voyons que le gouvernement français a envoyé un médiateur. ↔ We see that the French government has sent a mediator.
  Force est de constater que la situation évolue chaque jour. ↔ We note that the situation is changing every day.
  . . .

English-Arabic:
  [Arabic source sentences, garbled in extraction] ↔ Torture is still being practised on a wide scale. / Arrest and detention without cause take place routinely. / This is a time for vision and political courage.
  . . .

English-Chinese:
  我国 能源 原材料 工业 生产 大幅度 增长 . ↔ China's energy and raw materials production up.
  非国大 要求 阻止 更 多 被 拘留 人员 死亡 . ↔ ANC calls for steps to prevent deaths in police custody .
  . . .
Hiero-style SCFG rules
• The most common type of SCFG in SMT is Hiero, which has rules with one non-terminal symbol, X
• Not as nice as linguistically motivated rules; does not capture the reordering in Urdu

Example rule applied to the Chinese sentence:
  X → 与 X1 有 X2, have X2 with X1
  with X1 = 北 韩 / North Korea and X2 = 邦交 / diplomatic relations
Extracting Hiero rules
[Word-alignment grid: 澳洲 是 与 北 韩 有 邦交 的 少数 国家 之一 ↔ Australia is one of the few countries that have diplomatic relations with North Korea]
X → 与 北 韩 有 邦交, have diplomatic relations with North Korea
X → 邦交, diplomatic relations
X → 北 韩, North Korea
X → 与 X1 有 X2, have X2 with X1
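Hiero rules like these are extracted by taking a large consistent phrase pair and "subtracting" smaller phrase pairs from inside it, leaving linked nonterminal gaps. A toy sketch of that subtraction step, using the phrase pairs from the slide (the helper names are mine, not Hiero's):

```python
# Sketch of Hiero rule extraction by phrase subtraction: replace an inner
# phrase pair with a linked nonterminal slot on both sides.

def subtract(big, small, index):
    """Replace `small` (a sub-phrase pair of `big`) with slot X<index>
    on both sides, yielding a gapped Hiero rule."""
    slot = f"X{index}"
    def cut(seq, sub):
        for i in range(len(seq) - len(sub) + 1):
            if seq[i:i + len(sub)] == sub:
                return seq[:i] + [slot] + seq[i + len(sub):]
        raise ValueError("sub-phrase not found")
    return cut(big[0], small[0]), cut(big[1], small[1])

# Phrase pairs extracted from the alignment grid on the slide.
big = (["与", "北", "韩", "有", "邦交"],
       ["have", "diplomatic", "relations", "with", "North", "Korea"])
north_korea = (["北", "韩"], ["North", "Korea"])
relations = (["邦交"], ["diplomatic", "relations"])

rule = subtract(big, north_korea, 1)  # X → 与 X1 有 邦交, have diplomatic relations with X1
rule = subtract(rule, relations, 2)   # X → 与 X1 有 X2, have X2 with X1
print(rule)
```

The result is exactly the reordering rule above: (["与", "X1", "有", "X2"], ["have", "X2", "with", "X1"]).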
Discussion: what do you think of Hiero?
• So, we now have a way of extracting SCFGs from bitexts. Great! So what?
• Is this any better than the phrase-based model? How?
• Do you feel that it is lacking anything?
(Discuss with your neighbor)
Extracting Syntactic Rules
[Word-aligned sentence pair with parse trees over both sides: 澳洲 是 与 北 韩 有 邦交 的 少数 国家 之一 ↔ Australia is one of the few countries that have diplomatic relations with North Korea]
VP → 与 北 韩 有 邦交, have diplomatic relations with North Korea
NP → 与 北 韩 有 邦交 的 少数 国家, the few countries that have diplomatic relations with North Korea
NP → VP 的 少数 国家, the few countries that VP
NP → VP 的 NP, the NP that VP
Wait a minute...
• Didn't we see this earlier in Koehn's paper?
• Aren't we giving up a ton of rules that you said were valuable?
• Something about a reduced inventory because we got rid of non-constituent phrases?
Extracting Syntactic Rules
[Same word-aligned sentence pair with parse trees as before]
VP → 与 北 韩 有 邦交, have diplomatic relations with North Korea
NP → 与 北 韩 有 邦交 的 少数 国家, the few countries that have diplomatic relations with North Korea
??? → 的 少数 国家, the few countries that
??? → 澳洲 是, Australia is
With CCG-style slashed categories for the non-constituent spans:
NP/VP → 的 少数 国家, the few countries that
S/NP → 澳洲 是, Australia is
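These slashed labels work like CCG categories: NP/VP reads as "an NP missing a VP on the right". A small sketch of how such a label could be assigned to a non-constituent span; the token spans index the English sentence on the slide, but the lookup heuristic is illustrative, not the full SAMT labeling procedure:

```python
# SAMT-style labeling sketch: a span that is not a constituent can still
# get a CCG-like label C1/C2, meaning "a C1 missing a C2 on the right".
# Spans index the tokens of: "Australia is one of the few countries that
# have diplomatic relations with North Korea" (14 tokens, end-exclusive).

# (start, end) -> label; a small illustrative subset of the parse.
constituents = {
    (0, 14): "S",    # whole sentence
    (2, 14): "NP",   # one of the few countries ... North Korea
    (4, 14): "NP",   # the few countries that have ... North Korea
    (8, 14): "VP",   # have diplomatic relations with North Korea
}

def label(span):
    """Plain label if the span is a constituent; else a slashed C1/C2
    label if one can be formed; else the Hiero-style fallback X."""
    if span in constituents:
        return constituents[span]
    start, end = span
    for (s, e), c1 in constituents.items():
        # A constituent that starts where we start but extends further,
        # whose right remainder is itself a constituent:
        if s == start and e > end:
            c2 = constituents.get((end, e))
            if c2:
                return f"{c1}/{c2}"  # c1 missing a c2 to the right
    return "X"

print(label((4, 8)))   # "the few countries that" -> NP/VP
print(label((0, 2)))   # "Australia is"           -> S/NP
```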
Discussion: Is this better?
• What do you think of this flavor of SCFGs?
• What are its limitations?
• Do you think that it is better or worse than Hiero?
• How would you prove it?
(Discuss with your neighbors)
New training paradigm
• Training data: word-aligned bilingual parallel corpus, with parse trees
– No need to parse the Urdu, just parse the English
– Method is therefore transferable to other resource-poor languages
• Extract SCFG rules with syntactic nonterminals
• For non-constituent phrases use CCG-style nonterminals
• Same coverage as the Hiero model
Does it work?
• Tested for Urdu-English MT
• 1.5 million word parallel corpus
• Two contrastive systems, with different grammar extraction mechanisms
– Hiero
– Syntax-augmented grammars
• Used the same decoder in both cases
• Tested results on a blind test set administered by the National Institute of Standards and Technology
Syntax v. no Syntax
[Bar chart: BLEU score on the blind NIST Urdu-English test set — No Syntax (Hiero): 25.0; Syntax (SAMT): 31.0; Best system: 31.2]
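These bars report BLEU: the geometric mean of clipped n-gram precisions, scaled by a brevity penalty. A minimal single-reference sketch just to make the metric concrete (real evaluations use NIST's mteval or similar tooling, with multiple references and different smoothing):

```python
# Minimal single-reference BLEU sketch: clipped n-gram precisions for
# n = 1..4, geometric mean, brevity penalty. Illustrative only.
import math
from collections import Counter

def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(hyp, ref, max_n=4):
    precisions = []
    for n in range(1, max_n + 1):
        h, r = ngrams(hyp, n), ngrams(ref, n)
        overlap = sum(min(c, r[g]) for g, c in h.items())  # clipped counts
        total = max(1, sum(h.values()))
        precisions.append(max(overlap, 1e-9) / total)      # avoid log(0)
    bp = min(1.0, math.exp(1 - len(ref) / len(hyp)))       # brevity penalty
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)

hyp = "have diplomatic relations with North Korea".split()
ref = "have diplomatic relations with North Korea".split()
print(round(100 * bleu(hyp, ref), 1))  # 100.0 for an exact match
```

A hypothesis that drops or reorders content words loses matched n-grams and its score drops accordingly, which is what separates the 25-BLEU and 31-BLEU systems above.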
State of the Art Urdu Results
[Bar chart: all system scores on the NIST09 Urdu-English constrained task — non-syntax systems, including PBMT (Moses) and the Hiero baseline (25), score in the 20–25 BLEU range; the syntax-based systems score 31]
Translation improvements
'first nuclear experiment in 1990 was' Thomas red Unilever National Laboratory of the United States in وویيپنن designer, are already working on the book of Los اایيلمووسس National Laboratory ڈڈیينی, former director of the technical اانٹڻیيلجنسس written with the cooperation of سٹڻلمیينن. This book 'nuclear express: political history and the expansion of bomb' has been written, and the two writers have also claimed that the country has made nuclear bomb is he or any other country's nuclear secrets to or that of any other nuclear چرراایيٴے power cooperation is achieved.
The First Nuclear Test Was in 1990. Thomas red of the United States, the National Laboratory in designer are already working on the book of Los Alamos National Laboratory, former director of the technical intelligence, with the cooperation of Diana steelman wrote. This book under the title of the spread of nuclear expressway: the political history of the bomb and this has been written and the two writers have claimed that the country also has made nuclear bomb or any other country, Korea nuclear secrets, or any of the other nuclear power cooperation.
First nuclear test conducted in 1990 Thomas Reed, who has worked as a weapons designer at Livermore National Laboratory in the United States, has written a book in collaboration with Danny Stillman, former director of the technical intelligence division at Los Alamos National Laboratory. In their book, 'The Nuclear Express: A Political History of the Bomb and its Proliferation,' Reed and Stillman claim that every country that has ever produced a nuclear bomb has been able to do so because it stole the nuclear secrets of another country or enjoyed the cooperation of another nuclear power.
Who did what to whom?
Baseline:
Thomas was red when this question why China has provided the nuclear technology to Pakistan, In response, He said as China and India was joint enemy of Pakistan.
He said that China, North Korea, Iran, Syria, Pakistan, through Egypt, Libya and Yemen is to provide nuclear technology.
SCALE final system:
Thomas red when was this question why China has provided to Pakistan nuclear technology, he said in response to China, Pakistan and India as a common enemy.
He said that China would provide nuclear technology to North Korea, Iran, Syria, Pakistan, Egypt, Libya and Yemen.
Syntax captures Urdu reordering
Why did this work?
• Using syntax-based translation models resulted in huge improvements in quality
• Previous work on syntax had not shown significant gains, so why did it work here?
• Urdu is an ideal language to show off the advantages of syntax
– Very small amount of training data
– Very different word order than English
• Can't simply memorize translations of phrases
• Must generalize
Training data for MT Research
[Bar chart of parallel corpus sizes, in words:]
Urdu: 1.5M
European Parliament: 50M
Arabic and Chinese (DARPA GALE): 200M
French-English 10^9-word webcrawl: 1000M
Distribution of Word Orders
[Pie charts comparing word-order frequencies. All languages: SOV 40%, SVO 36%, no dominant order 14%, VSO 7% (VOS, OVS, OSV make up the remainder). Languages studied in SMT: 61%, 22%, 13%, 4% over the same categories]
Joshua Decoder
• An open-source decoder
• Uses synchronous context-free grammars to translate
• Implements all algorithms needed for translating with SCFGs
– grammar extraction (Thrax!)
– chart parsing
– n-gram language model integration
– pruning and k-best extraction