Natural Language Models and Interfaces, Lecture 11
Ivan Titov
Institute for Logic, Language and Computation
Today
} Machine translation:
} IBM 1 and HMM models
} Unsupervised Induction of Semantic Representations
} Topics for the exam
IBM Candide Project [Brown et al 93]
[Noisy-channel diagram: French → Broken English → English. The translation model is estimated by statistical analysis of French/English bilingual text; the language model by statistical analysis of English text.]
J’ ai si faim
Candidates: What hunger have I / Hungry I am so / I am so hungry / Have me that hunger …
Chosen: I am so hungry
Mathematical Formulation
J’ ai si faim I am so hungry
Translation Model P(f | e)
Language Model P(e)
Decoding algorithm: argmax_e P(e) · P(f | e)

Given source sentence f:
argmax_e P(e | f)
= argmax_e P(f | e) · P(e) / P(f)    (by Bayes’ rule)
= argmax_e P(f | e) · P(e)           (P(f) is the same for all e)
The Classic Translation Model: Word Substitution/Permutation [Brown et al., 1993]
Mary did not slap the green witch
→ Mary not slap slap slap the green witch           (fertility)
→ Mary not slap slap slap NULL the green witch      (NULL insertion)
→ Maria no dió una bofetada a la verde bruja        (word translation)
→ Maria no dió una bofetada a la bruja verde        (distortion / reordering)
Trainable process model of translation, with parameter tables:
- fertility n(3 | slap): 50k entries
- distortion d(j | i): 2500 entries
- NULL-insertion probability P-Null: 1 entry
- word translation t(la | the): 25m entries
Let's do it more formally
Unsupervised EM Training
… la maison … la maison bleue … la fleur … … the house … the blue house … the flower …
All P(french-word | english-word) equally likely
Unsupervised EM Training
… la maison … la maison bleue … la fleur … … the house … the blue house … the flower …
“la” and “the” observed to co-occur frequently, so P(la | the) is increased.
Unsupervised EM Training
… la maison … la maison bleue … la fleur … … the house … the blue house … the flower …
“maison” co-occurs with both “the” and “house”, but P(maison | house) can be raised without limit, to 1.0,
while P(maison | the) is limited because of “la”
(pigeonhole principle)
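This intuition is exactly what EM training of IBM Model 1 implements. A minimal sketch on the toy corpus from the slides (variable names are my own; a real implementation adds the NULL word and runs over a large corpus):

```python
from collections import defaultdict

# Toy parallel corpus from the slides: (French words, English words).
corpus = [
    (["la", "maison"], ["the", "house"]),
    (["la", "maison", "bleue"], ["the", "blue", "house"]),
    (["la", "fleur"], ["the", "flower"]),
]

# Start with all P(french-word | english-word) equally likely.
f_vocab = {f for fs, _ in corpus for f in fs}
t = defaultdict(lambda: 1.0 / len(f_vocab))

for _ in range(10):                       # EM iterations
    count = defaultdict(float)            # expected counts c(f, e)
    total = defaultdict(float)            # expected counts c(e)
    for fs, es in corpus:
        for f in fs:
            # E-step: posterior over which English word generated f
            z = sum(t[(f, e)] for e in es)
            for e in es:
                p = t[(f, e)] / z
                count[(f, e)] += p
                total[e] += p
    # M-step: re-normalize expected counts into probabilities
    for (f, e), c in count.items():
        t[(f, e)] = c / total[e]

# "la" wins the probability mass of "the"; "maison" wins that of "house".
assert t[("la", "the")] > t[("maison", "the")]
assert t[("maison", "house")] > t[("la", "house")]
```

After a few iterations the "explaining away" effect described above kicks in: because "la" is strongly explained by "the" (it also co-occurs with "the" in sentences without "house"), the expected count of (la, house) shrinks and P(maison | house) grows.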
e          f : P(f | e)
national   nationale 0.47, national 0.42, nationaux 0.05, nationales 0.03
the        le 0.50, la 0.21, les 0.16, l’ 0.09, ce 0.02, cette 0.01
farmers    agriculteurs 0.44, les 0.42, cultivateurs 0.05, producteurs 0.02
For a new French sentence f, each potential translation e is scored by P(f | e) · P(e): the translation model combined with the language model.

Language model bigram table, P(w2 | w1):
w1 = of:   the 0.13, a 0.09, another 0.01, some 0.01
w1 = hong: kong 0.98, said 0.01, stated 0.01
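Putting the two tables together, decoding scores each candidate e by P(f | e) · P(e). A toy sketch in log-space (all probabilities are made up for illustration; a real model sums over alignments rather than assuming a one-to-one word translation):

```python
import math

# Illustrative probabilities only, in the spirit of the tables above.
t_table = {("la", "the"): 0.21, ("maison", "house"): 0.8,
           ("maison", "home"): 0.1}
bigram = {("<s>", "the"): 0.3, ("the", "house"): 0.13,
          ("the", "home"): 0.02, ("house", "</s>"): 0.2,
          ("home", "</s>"): 0.2}

def score(f_words, e_words):
    """log P(f | e) + log P(e), with a word-for-word translation
    model and a bigram language model."""
    tm = sum(math.log(t_table[(f, e)]) for f, e in zip(f_words, e_words))
    padded = ["<s>"] + e_words + ["</s>"]
    lm = sum(math.log(bigram[w1, w2]) for w1, w2 in zip(padded, padded[1:]))
    return tm + lm

candidates = [["the", "house"], ["the", "home"]]
best = max(candidates, key=lambda e: score(["la", "maison"], e))
assert best == ["the", "house"]
```

The language model prefers "the house" (a much more likely bigram than "the home" in this toy table) and the translation model agrees, so their product picks it.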
Beam Search Decoding [Brown et al US Patent #5,477,451]
[Search lattice: from “start”, hypotheses are extended by choosing the 1st, 2nd, 3rd, 4th, … English word, until all source words are covered (“end”).]

Each partial translation hypothesis contains:
- the last English word chosen, plus the source words covered by it
- the next-to-last English word chosen
- the entire coverage vector (so far) of the source sentence
- the language model and translation model scores (so far)
[Jelinek 69; Och, Ueffing, and Ney, 01]
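The hypothesis record described above can be sketched directly; a minimal illustration (field names are mine) that ignores recombination, pruning, and future-cost estimation:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Hyp:
    last_word: str        # last English word chosen (bigram LM state)
    prev_word: str        # next-to-last English word chosen
    coverage: frozenset   # source positions covered so far
    score: float          # LM + TM log-score so far

def expand(hyp, src_pos, e_word, logprob):
    """Extend a hypothesis by covering one more source word."""
    assert src_pos not in hyp.coverage
    return Hyp(e_word, hyp.last_word,
               hyp.coverage | {src_pos}, hyp.score + logprob)

start = Hyp("<s>", None, frozenset(), 0.0)
h1 = expand(start, 0, "I", -1.0)
h2 = expand(h1, 3, "hungry", -2.5)   # reordering: jump to source word 3
assert h2.coverage == {0, 3} and h2.prev_word == "I"
```

Beam search keeps, for each number of covered source words, only the highest-scoring hypotheses, and stops when a hypothesis covers all source words.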
Phrases

[The MT pyramid: SOURCE and TARGET sides connected, bottom-up, at the level of words, phrases, syntax, logical form, and finally a shared interlingua.]

How do you translate “real estate” into French? Compare: real estate, real number, dance number, dance card, memory card, memory stick, …
Phrase-Based Statistical MT
• Foreign input is segmented into phrases – a “phrase” here just means a word sequence
• Each phrase is probabilistically translated into English – P(to the conference | zur Konferenz) – P(into the meeting | zur Konferenz)
• Phrases are probabilistically re-ordered
See [Koehn et al, 2003] for an overview. More details now.

Morgen fliege ich nach Kanada zur Konferenz
Tomorrow I will fly to the conference in Canada
How to Learn the Phrase Translation Table?
• One method: “alignment templates” [Och et al 99]
• Start with word alignment
• Collect all phrase pairs that are consistent with the word alignment
How do we compute word alignments?
Mary did not slap the green witch
Maria no dió una bofetada a la bruja verde
Word Alignment Induced Phrases
Mary did not slap the green witch
Maria no dió una bofetada a la bruja verde
(Maria, Mary), (no, did not), (slap, dió una bofetada), (la, the), (bruja, witch), (verde, green), (a la, the), (dió una bofetada a, slap), (Maria no, Mary did not), (no dió una bofetada, did not slap), (dió una bofetada a la, slap the), (bruja verde, green witch), (Maria no dió una bofetada, Mary did not slap), (a la bruja verde, the green witch), …, (Maria no dió una bofetada a la bruja verde, Mary did not slap the green witch)
Phrase Pair Probabilities
• A certain phrase pair (f-f-f, e-e-e) may appear many times across the bilingual corpus.
• No EM training
• Just relative frequency:
P(f-f-f | e-e-e) = count(f-f-f, e-e-e) / count(e-e-e)
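Extraction can be sketched on the slide’s example. The criterion is the standard one (a phrase pair is consistent if no alignment link crosses the box); the alignment links below are my rendering of the example, and the unaligned-word extension is omitted:

```python
src = "Maria no dió una bofetada a la bruja verde".split()
tgt = "Mary did not slap the green witch".split()
# (source index, target index) alignment links; "a" (index 5) is unaligned.
align = {(0, 0), (1, 1), (1, 2), (2, 3), (3, 3), (4, 3),
         (6, 4), (7, 6), (8, 5)}

def extract_phrases(align, n_src, max_len=4):
    """All phrase pairs consistent with the word alignment."""
    pairs = set()
    for i1 in range(n_src):
        for i2 in range(i1, min(i1 + max_len, n_src)):
            # target positions linked to the source span [i1, i2]
            linked = [t for s, t in align if i1 <= s <= i2]
            if not linked:
                continue
            j1, j2 = min(linked), max(linked)
            if j2 - j1 >= max_len:
                continue
            # consistent: every target word in [j1, j2] links only inside [i1, i2]
            if all(i1 <= s <= i2 for s, t in align if j1 <= t <= j2):
                pairs.add((" ".join(src[i1:i2 + 1]),
                           " ".join(tgt[j1:j2 + 1])))
    return pairs

phrases = extract_phrases(align, len(src))
assert ("dió una bofetada", "slap") in phrases
assert ("no", "did not") in phrases
assert ("dió", "slap") not in phrases   # links (3,3), (4,3) cross the box
```

Counting the pairs extracted over a full corpus and normalizing by count(e-e-e) then gives exactly the relative-frequency estimate above.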
Phrase-Based MT
• It was the best way to do statistical MT until very recently
• Now syntax is starting to play a role
Synchronous CFGs [Chiang, 2005]
• Like CFGs, but productions have two right-hand sides – Source side – Target side – Related through linked non-terminal symbols
• E.g. VP → <V[1] NP[2], NP[2] V[1]>
• One-to-one correspondence between linked non-terminals
• Productions are applied in parallel to both sides, to linked non-terminals
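The parallel application can be illustrated with a toy grammar built around the VP rule above (the ":1"/":2" link-index encoding and the lexical rules are mine, not Chiang’s notation):

```python
# Each rule maps a non-terminal to a (source RHS, target RHS) pair;
# matching ":k" indices mark linked non-terminals.
rules = {
    "VP": (["V:1", "NP:2"], ["NP:2", "V:1"]),  # VP -> <V[1] NP[2], NP[2] V[1]>
    "V":  (["saw"], ["vio"]),
    "NP": (["the", "house"], ["la", "casa"]),
}

def derive(symbol):
    """Expand `symbol` on both sides in parallel: each linked pair of
    non-terminals is rewritten by the same sub-derivation."""
    src_rhs, tgt_rhs = rules[symbol]
    subs = {}
    for item in src_rhs:                      # one sub-derivation per link
        name, _, idx = item.partition(":")
        if name in rules:
            subs[idx] = derive(name)
    def realize(rhs, side):
        out = []
        for item in rhs:
            name, _, idx = item.partition(":")
            out += subs[idx][side] if idx in subs else [name]
        return out
    return realize(src_rhs, 0), realize(tgt_rhs, 1)

src, tgt = derive("VP")
assert src == ["saw", "the", "house"]   # V NP order on the source side
assert tgt == ["la", "casa", "vio"]     # NP V order on the target side
```

One derivation yields both strings, with the reordering determined by where the linked non-terminals sit in each right-hand side.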
Today
} Machine translation:
} IBM 1 and HMM models
} Unsupervised Induction of Semantic Representations
} Topics for the exam
Representing events and their participants
} A semantic frame [Fillmore 1968] is a conceptual structure describing a situation, object, or event along with associated properties and participants
} Example: OPENING frame
Jack opened the lock with a paper clip
Semantic Roles (aka Frame Elements):
OPENER – an initiator/doer in the event [Who?]
OPENED - an affected entity [to Whom / to What?]
INSTRUMENT – the entity manipulated to accomplish the goal
Other roles for CLOSURE/OPENING frame: BENEFICIARY, FASTENER, DEGREE,
CIRCUMSTANCES, MANIPULATOR, PORTAL, …
Can we induce this without labeled data?
Lansky left Australia to study the piano at the Royal College of Music
Semantic frames and roles
} A semantic frame [Fillmore 1968] is a conceptual structure describing a situation, object, or event along with associated properties and participants
} Example: OPENING frame
Jack opened the lock with a paper clip
Semantic Roles (aka Frame Elements):
AGENT – an initiator/doer in the event [Who?]
PATIENT – an affected entity [to Whom / to What?]
INSTRUMENT – the entity manipulated to accomplish the goal
Other roles for the CLOSURE/OPENING frame: BENEFICIARY, FASTENER, DEGREE, CIRCUMSTANCES, MANIPULATOR, PORTAL, …

Lansky left Australia to study the piano at the Royal College of Music
[Annotation: the DEPARTING frame (“left”) with roles Subject, Source (Australia), Purpose (to study …); the EDUCATION frame (“study”) with roles Student (Lansky), Object (the piano), Institution (the Royal College of Music).]
Can we induce this without labeled data?
Recap: last time

[Generative frame model of Thompson et al., 2003: a frame (e.g. LIKELIHOOD) generates the predicate (i.e. its lemma, e.g. “tend”) and its roles (e.g. effect, affected); each role is realized as a (non-terminal, head word) pair, e.g. (NP, investments) for the role “affected”.]
Induction of Frame-Semantic Information

} The semantic induction task involves 3 sub-tasks:
} Construction of a transformed syntactic dependency graph (~ argument identification)
} Induction of frames (and clusters of arguments)
} Role induction

[Example dependency graph for “Peter the Great gave an order to build a wooden fortified castle”, with arguments labeled by induced frames (Person, Request, Construction, Being_Protected, Buildings, Material) and roles (Speaker, Message, Created Entity, Material, Type).]
We model these sub-tasks jointly within a probabilistic model
We will refer to the labels as semantic classes (covering both frames and clusters of arguments)
The (Simplified) Model

For each sentence, draw a semantic class for the root:
    c_root ~ theta_root
    GenSemClass(c_root)

GenSemClass(c):
    s ~ psi_c                            # draw the synt/lex realization (like in the syntactic DOP model)
    for each role t = 1, ..., T:
        if [n ~ beta_{c,t}] = 1:         # at least one argument for this role?
            GenArgument(c, t)            # draw the first argument
            while [n ~ beta+_{c,t}] = 1: # continue generation?
                GenArgument(c, t)        # draw more arguments

GenArgument(c, t):
    a_{c,t} ~ theta_{c,t}                # draw the argument key: a syntactic property of the argument, e.g. ACTIVE:LEFT:SBJ
    c'_{c,t} ~ phi_{c,t}                 # draw the semantic class for the argument
    GenSemClass(c'_{c,t})                # recurse

[Running example built up on the slides: the root class Request is realized as “gave an order”; its Speaker role (key ACTIVE:LEFT:SBJ) is filled by a Person class realized as “Peter the Great”; its Message role (key ACTIVE:RIGHT:OBJ) is filled by a Construction class realized as “build”, which in turn generates a Created Entity argument of class Buildings realized as “castle” (key ACTIVE:RIGHT:OBJ), with a Type argument “fortified” (key -:LEFT:NMOD), and so on.]
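The generative story can be run as a toy sampler. Everything below is illustrative: the classes, realizations, and parameters are hand-set (with degenerate 0/1 probabilities so the run is deterministic); in the actual model these are multinomial parameters that are themselves drawn from priors.

```python
import random

T = 2  # roles per class (toy)
real = {"Request": "gave an order", "Person": "Peter the Great",
        "Construction": "build", "Buildings": "castle"}
# phi_first[c][t]: P(role t has at least one argument); phi_more: P(continue)
phi_first = {"Request": [1.0, 1.0], "Person": [0.0, 0.0],
             "Construction": [1.0, 0.0], "Buildings": [0.0, 0.0]}
phi_more = {c: [0.0, 0.0] for c in real}
# argument key and child class per (class, role): deterministic toys
key = {"Request": ["ACTIVE:LEFT:SBJ", "ACTIVE:RIGHT:OBJ"],
       "Construction": ["ACTIVE:RIGHT:OBJ", ""]}
child = {"Request": ["Person", "Construction"],
         "Construction": ["Buildings", ""]}

def gen_sem_class(c, out):
    out.append((c, real[c]))                    # draw synt/lex realization
    for t in range(T):
        if random.random() < phi_first[c][t]:   # at least one argument?
            gen_argument(c, t, out)             # draw first argument
            while random.random() < phi_more[c][t]:
                gen_argument(c, t, out)         # draw more arguments

def gen_argument(c, t, out):
    out.append(("key", key[c][t]))              # draw argument key
    gen_sem_class(child[c][t], out)             # draw class for arg, recurse

out = []
gen_sem_class("Request", out)                   # c_root fixed for the demo
assert [x for x in out if x[0] != "key"] == [
    ("Request", "gave an order"), ("Person", "Peter the Great"),
    ("Construction", "build"), ("Buildings", "castle")]
```

The recursion mirrors the slides’ example: Request generates a Person argument and a Construction argument, and Construction recursively generates a Buildings argument.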
Inducing Semantics

} Given a (large) collection of sentences {x_i}, i = 1..n, annotated with (transformed) syntactic dependencies
} We want to induce their semantics {m_i}, i = 1..n, i.e. a segmentation and clustering

[Examples from the slides: “Peter the Great gave an order to build a wooden fortified castle”, annotated with labels such as Person, Request, Construction, Buildings, Being_Protected, Material, Speaker, Message, Created Entity, Type; “Mary wore an evening dress from Cardin”, annotated with labels such as Person, Wearing, Garment, Brand, Wearer, Clothing, Creator, Style, Occasion. Internally, the induced classes are just cluster ids (integers).]

Can be induced with a form of the EM algorithm (though this is not quite what we have done)
Semantic classes (frames)
From a collection of biomedical texts:

Class  Variations
1      motif, sequence, regulatory element, response element, element, dna sequence
2      donor, individual, subject
3      important, essential, critical
4      dose, concentration
5      activation, transcriptional activation, transactivation
6      b cell, t lymphocyte, thymocyte, b lymphocyte, t cell, t-cell line, human lymphocyte, t-lymphocyte  [blood cells]
7      indicate, reveal, document, suggest, demonstrate
8      augment, abolish, inhibit, convert, cause, abrogate, modulate, block, decrease, reduce, diminish, suppress, up-regulate, impair, reverse, enhance  [the “cause change of position on a scale” frame]
9      confirm, assess, examine, study, evaluate, test, resolve, determine, investigate
10     nf-kappab, nf-kappa b, nfkappab, nf-kb
Today
} Machine translation:
} IBM 1 and HMM models
} Unsupervised Induction of Semantic Representations
} Topics for the exam
Overview
} Noisy channel paradigm (and applications)
} Language models:
} handling rare words and noise
} Sequence labeling problems:
} POS-tagging, spell-checking applications and semantic role labeling (SRL)
} Estimation algorithms
} Decoding (labeling) methods: Viterbi (dynamic programming)
} Parsing problems (trees)
} Syntactic parsing; we also touched on semantics (probabilistic CCGs)
} Modeling: PCFGs and grammar transforms (including DOP)
} Decoding (parsing) algorithms: CKY (dynamic programming)
} Sequence-to-sequence problems
} Machine translation: word, phrase and syntactic models, …
Material for the Exam (March 24th)
} CFG and PCFGs } incl. tree probability; estimation from a treebank
} recap smoothing
} CNF form, binarization (CNF transform)
} CKY algorithm for CFG and PCFG
} Incl. understanding the data structure (the chart), the unary closures in both the probabilistic and non-probabilistic case, and the time complexity of the algorithm
} Make sure that given a grammar and a sentence, you can fill a chart
} Ideas about weaknesses of PCFGs
} Remember the PP-attachment and coordination examples
Material for the Exam (March 24th)
} Structural annotation: vertical and horizontal Markovization } Effects on grammar size, on parsing speed, accuracy, …
} Lexicalization: how to get rules, basic understanding of estimation, time complexity
} Computing the probability of a word sequence with PCFGs
} "Sum product" – inside probabilities
Material for the Exam (March 24th)
} DOP estimation (recall the basic corpus example)
} HMMs and PCFGs as general tool:
} Semantic role labeling problem: the basic sequence model
} CCGs (only the application combinator, no composition and no type raising): an idea how CKY can be applied to the semantic parsing problem
Material for the Exam (March 24th)
} Machine translation
} Word models: IBM Model 1, HMM model (from today)
} An idea about higher order IBM model (see slides from lecture 9 – modeling valency)
} The basic phrase-based model: how do we obtain phrases? How do we define the probabilistic model? (including from today)
} Intuition about estimation (an idea of the EM algorithm but not formal derivation)
} Decoding: what does a state in the search lattice represent?
} An idea about synchronous CFGs
Material for the Exam (March 24th)
} Not at the exam:
} Semantics
} Type-raising and composition for CCGs
} unsupervised induction of SRL (i.e. today)
} Cognitive aspects
} History