
Natural Language Models and Interfaces: lecture 11

Ivan Titov

Institute for Logic, Language and Computation

Today


}  Machine translation:

}  IBM 1 and HMM models

}  Unsupervised Induction of Semantic Representations

}  Topics for the exam

IBM Candide Project [Brown et al 93]

[Noisy-channel diagram: French is viewed as "broken English". Statistical analysis of French/English bilingual text yields the translation model; statistical analysis of English text yields the language model.]

J’ ai si faim

What hunger have I, Hungry I am so, I am so hungry, Have me that hunger …

I am so hungry

Mathematical Formulation

J’ ai si faim I am so hungry

Translation Model P(f | e)

Language Model P(e)

Decoding algorithm: argmax_e P(e) · P(f | e)

Given source sentence f:

argmax_e P(e | f)
  = argmax_e P(f | e) · P(e) / P(f)      (by Bayes' rule)
  = argmax_e P(f | e) · P(e)             (P(f) is the same for all e)
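To make the decomposition concrete, here is a minimal Python sketch of this decoding objective over a handful of candidate translations; the probability values are toy numbers invented for the illustration, not estimates from any corpus.

```python
import math

# Toy language-model and translation-model tables (values purely illustrative).
lm = {"I am so hungry": 1e-4,
      "Hungry I am so": 1e-7,
      "What hunger have I": 1e-8}
tm = {("J' ai si faim", "I am so hungry"): 0.01,
      ("J' ai si faim", "Hungry I am so"): 0.02,
      ("J' ai si faim", "What hunger have I"): 0.03}

def decode(f, candidates):
    """Return argmax_e P(f | e) * P(e); P(f) is constant and can be dropped."""
    return max(candidates, key=lambda e: math.log(tm[(f, e)]) + math.log(lm[e]))

print(decode("J' ai si faim", list(lm)))   # -> I am so hungry
```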


The Classic Translation Model: Word Substitution/Permutation [Brown et al., 1993]

Mary did not slap the green witch

(fertility)       Mary not slap slap slap the green witch

(NULL insertion)  Mary not slap slap slap NULL the green witch

(translation)     Maria no dió una bofetada a la verde bruja

(distortion)      Maria no dió una bofetada a la bruja verde

Trainable process model of translation:

fertility       n(3 | slap)   50k entries
distortion      d(j | i)      2500 entries
NULL insertion  p_Null        1 entry
translation     t(la | the)   25m entries

Let's do it more formally

Unsupervised EM Training

… la maison … la maison bleue … la fleur …
… the house … the blue house … the flower …

Initially, all P(french-word | english-word) are equally likely.

“la” and “the” are observed to co-occur frequently, so P(la | the) is increased.

“maison” co-occurs with both “the” and “house”, but P(maison | house) can be raised without limit, to 1.0, while P(maison | the) is limited because of “la” (pigeonhole principle).
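The intuition above is exactly what EM for IBM Model 1 computes. The following is a minimal sketch of that EM loop on the toy bitext from the slide; NULL words and other details of the full model are left out, and the variable names are mine.

```python
from collections import defaultdict

# Toy bitext from the slide.
bitext = [("la maison".split(), "the house".split()),
          ("la maison bleue".split(), "the blue house".split()),
          ("la fleur".split(), "the flower".split())]

f_vocab = {f for fs, _ in bitext for f in fs}
t = defaultdict(lambda: 1.0 / len(f_vocab))    # all P(f | e) start out equally likely

for _ in range(10):                            # EM iterations
    count, total = defaultdict(float), defaultdict(float)
    for fs, es in bitext:
        for f in fs:                           # E-step: expected alignment counts
            z = sum(t[(f, e)] for e in es)
            for e in es:
                count[(f, e)] += t[(f, e)] / z
                total[e] += t[(f, e)] / z
    for (f, e), c in count.items():            # M-step: renormalise per English word
        t[(f, e)] = c / total[e]

print(round(t[("la", "the")], 2), round(t[("maison", "house")], 2))
```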

Translation Model

  e          f              P(f | e)
  national   nationale      0.47
             national       0.42
             nationaux      0.05
             nationales     0.03
  the        le             0.50
             la             0.21
             les            0.16
             l’             0.09
             ce             0.02
             cette          0.01
  farmers    agriculteurs   0.44
             les            0.42
             cultivateurs   0.05
             producteurs    0.02

Language Model

  w1      w2         P(w2 | w1)
  of      the        0.13
          a          0.09
          another    0.01
          some       0.01
  hong    kong       0.98
          said       0.01
          stated     0.01

For a new French sentence f, each potential translation e is scored by P(f | e) · P(e).

Beam Search Decoding [Brown et al US Patent #5,477,451]

[Search graph: hypotheses are extended left to right, choosing the 1st, 2nd, 3rd, 4th, … English word, from start to end, until all source words are covered.]

Each partial translation hypothesis contains:
- the last English word chosen, plus the source words covered by it
- the next-to-last English word chosen
- the entire coverage vector (so far) of the source sentence
- the language model and translation model scores (so far)

[Jelinek 69; Och, Ueffing, and Ney, 01]
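As a sketch of the state such a decoder keeps per hypothesis; the class, field and method names here are mine, not taken from the patent or the lecture.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional

@dataclass(frozen=True)
class Hypothesis:
    last_word: Optional[str]    # last English word chosen
    prev_word: Optional[str]    # next-to-last English word (enough for a bigram LM)
    covered: FrozenSet[int]     # coverage vector: source positions translated so far
    score: float                # accumulated LM + TM log-probability

    def extend(self, word, new_positions, delta_logprob):
        """Append one English word that covers some new source positions."""
        return Hypothesis(word, self.last_word,
                          self.covered | frozenset(new_positions),
                          self.score + delta_logprob)

start = Hypothesis(None, None, frozenset(), 0.0)
h = start.extend("tomorrow", {0}, -2.3).extend("i", {2}, -1.1)
print(sorted(h.covered), round(h.score, 1))   # [0, 2] -3.4
```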

Phrases

[Vauquois-style diagram: source and target can be related at different levels, words ↔ words, phrases ↔ phrases, syntax ↔ syntax, logical form ↔ logical form, up to an interlingua.]

How do you translate “real estate” into French? (Compare the chain real estate, real number, dance number, dance card, memory card, memory stick, …: word-by-word translation does not carry the meaning of such phrases.)

Phrase-Based Statistical MT

•  Foreign input segmented into phrases –  “phrase” just means “word sequence”

•  Each phrase is probabilistically translated into English –  P(to the conference | zur Konferenz) –  P(into the meeting | zur Konferenz)

•  Phrases are probabilistically re-ordered. See [Koehn et al., 2003] for an overview; more details now.

Morgen fliege ich nach Kanada zur Konferenz
Tomorrow I will fly to the conference in Canada
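A hedged sketch of how one segmented-and-reordered derivation of this sentence could be scored; the phrase table entries, the lowercased toy phrases and all log-probabilities are invented for the example, and the reordering cost is collapsed into a single term.

```python
import math

# Toy phrase table (probabilities purely illustrative).
phrase_table = {("morgen", "tomorrow"): 0.7,
                ("fliege ich", "I will fly"): 0.3,
                ("zur konferenz", "to the conference"): 0.4,
                ("nach kanada", "in canada"): 0.6}

def derivation_score(phrase_pairs, lm_logprob, reordering_logprob=0.0):
    """log-score of one derivation: phrase-translation scores + LM + reordering."""
    tm = sum(math.log(phrase_table[(f, e)]) for f, e in phrase_pairs)
    return tm + lm_logprob + reordering_logprob

pairs = [("morgen", "tomorrow"), ("fliege ich", "I will fly"),
         ("zur konferenz", "to the conference"), ("nach kanada", "in canada")]
print(round(derivation_score(pairs, lm_logprob=-12.0, reordering_logprob=-1.0), 2))
```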

How to Learn the Phrase Translation Table?

•  One method: “alignment templates” [Och et al 99]

•  Start with word alignment

•  Collect all phrase pairs that are consistent with the word alignment

How do we compute word alignments?

Mary did not slap the green witch

Maria no dió una bofetada a la bruja verde

Word Alignment Induced Phrases

Mary did not slap the green witch
Maria no dió una bofetada a la bruja verde

Phrase pairs consistent with the word alignment (from small to large):

(Maria, Mary) (no, did not) (dió una bofetada, slap) (la, the) (bruja, witch) (verde, green)
(a la, the) (dió una bofetada a, slap) (bruja verde, green witch)
(Maria no, Mary did not) (no dió una bofetada, did not slap) (dió una bofetada a la, slap the) (Maria no dió una bofetada, Mary did not slap) (a la bruja verde, the green witch)
…
(Maria no dió una bofetada a la bruja verde, Mary did not slap the green witch)
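A simplified sketch of this extraction step: collect every (source span, target span) pair such that no alignment link leaves the pair. The standard algorithm also extends phrases over unaligned boundary words, which is omitted here, and the alignment links below are my rendering of the lecture's example.

```python
def extract_phrases(f_words, e_words, links, max_len=4):
    """Phrase pairs consistent with a word alignment; links are (e_idx, f_idx)."""
    pairs = set()
    for e_start in range(len(e_words)):
        for e_end in range(e_start, min(e_start + max_len, len(e_words))):
            f_pos = [f for e, f in links if e_start <= e <= e_end]
            if not f_pos:
                continue
            f_start, f_end = min(f_pos), max(f_pos)
            # consistency: nothing inside the source span may link outside the English span
            if any(f_start <= f <= f_end and not (e_start <= e <= e_end)
                   for e, f in links):
                continue
            pairs.add((" ".join(f_words[f_start:f_end + 1]),
                       " ".join(e_words[e_start:e_end + 1])))
    return pairs

e = "Mary did not slap the green witch".split()
f = "Maria no dió una bofetada a la bruja verde".split()
links = {(0, 0), (1, 1), (2, 1), (3, 2), (3, 3), (3, 4), (4, 6), (5, 8), (6, 7)}
print(("bruja verde", "green witch") in extract_phrases(f, e, links))   # True
```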

Phrase Pair Probabilities

•  A given phrase pair (f-f-f, e-e-e), i.e. a multi-word French phrase together with its English counterpart, may appear many times across the bilingual corpus.

•  No EM training

•  Just relative frequency:

   P(f-f-f | e-e-e) = count(f-f-f, e-e-e) / count(e-e-e)
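A few lines of Python make the relative-frequency estimate explicit; the extracted pairs and their counts below are toy data.

```python
from collections import Counter

# Toy multiset of extracted phrase pairs (French phrase, English phrase).
pairs = ([("bruja verde", "green witch")] * 8
         + [("verde bruja", "green witch")] * 2)

pair_counts = Counter(pairs)
e_counts = Counter(e for _, e in pairs)

def phrase_prob(f, e):
    """P(f | e) = count(f, e) / count(e): plain relative frequency, no EM."""
    return pair_counts[(f, e)] / e_counts[e]

print(phrase_prob("bruja verde", "green witch"))   # 0.8
```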

Phrase-Based MT

•  It was the best way to do statistical MT until very recently
•  Now syntax-based models are starting to play a role

Synchronous CFGs [Chiang, 2005]

•  Like CFGs, but each production has two right-hand sides
   – a source side
   – a target side
   – related through linked non-terminal symbols
•  E.g. VP → <V[1] NP[2], NP[2] V[1]>
•  One-to-one correspondence between linked non-terminals
•  Productions are applied in parallel to both sides, to linked non-terminals
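A toy sketch of how a synchronous production expands both sides in parallel; the grammar, the "#k" link notation and the example words are my own illustration, not Chiang's formalism or data.

```python
# Each production has a source RHS and a target RHS; "#k" marks linked non-terminals.
scfg = {
    "VP": [(["V#1", "NP#2"], ["NP#2", "V#1"])],   # VP -> <V[1] NP[2], NP[2] V[1]>
    "V":  [(["saw"], ["mita"])],
    "NP": [(["Mary"], ["Mary-o"])],
}

def expand(symbol):
    """Expand one non-terminal in parallel on both sides, respecting the links."""
    if symbol not in scfg:                        # terminal symbol
        return [symbol], [symbol]
    src_rhs, tgt_rhs = scfg[symbol][0]            # take the first production (no search)
    linked, src_out = {}, []
    for sym in src_rhs:
        name, _, idx = sym.partition("#")
        words = expand(name)
        if idx:
            linked[idx] = words
        src_out += words[0]
    tgt_out = []
    for sym in tgt_rhs:
        name, _, idx = sym.partition("#")
        tgt_out += linked[idx][1] if idx else expand(name)[1]
    return src_out, tgt_out

print(expand("VP"))   # (['saw', 'Mary'], ['Mary-o', 'mita'])
```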

Today


}  Machine translation:

}  IBM 1 and HMM models

}  Unsupervised Induction of Semantic Representations

}  Topics for the exam

Representing events and their participants


}  A semantic frame [Fillmore 1968] is a conceptual structure describing a situation, object, or event along with associated properties and participants

}  Example: OPENING frame

Jack opened the lock with a paper clip

Semantic Roles (aka Frame Elements):

OPENER – an initiator/doer in the event [Who?]

OPENED - an affected entity [to Whom / to What?]

INSTRUMENT – the entity manipulated to accomplish the goal

Other roles for CLOSURE/OPENING frame: BENEFICIARY, FASTENER, DEGREE, CIRCUMSTANCES, MANIPULATOR, PORTAL, …

Can we induce this without labeled data?

Lansky left Australia to study the piano at the Royal College of Music
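For concreteness, a minimal sketch of how such a frame instance could be represented as a data structure; the class and field names are hypothetical, not an existing annotation scheme.

```python
from dataclasses import dataclass, field
from typing import Dict

@dataclass
class FrameInstance:
    frame: str                                            # e.g. "OPENING"
    trigger: str                                          # predicate evoking the frame
    roles: Dict[str, str] = field(default_factory=dict)   # role name -> filler text

jack = FrameInstance(frame="OPENING", trigger="opened",
                     roles={"OPENER": "Jack",
                            "OPENED": "the lock",
                            "INSTRUMENT": "a paper clip"})
print(jack.roles["INSTRUMENT"])   # a paper clip
```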

Semantic frames and roles

}  A semantic frame [Fillmore 1968] is a conceptual structure describing a situation, object, or event along with associated properties and participants

}  Example: OPENING frame

Jack opened the lock with a paper clip

Semantic Roles (aka Frame Elements):

AGENT – an initiator/doer in the event [Who?]

PATIENT – an affected entity [to Whom / to What?]

INSTRUMENT – the entity manipulated to accomplish the goal

Other roles for CLOSURE/OPENING frame: BENEFICIARY, FASTENER, DEGREE, CIRCUMSTANCES, MANIPULATOR, PORTAL, …

Can we induce this without labeled data?

Lansky left Australia to study the piano at the Royal College of Music

[Annotation on the slide: two frames are evoked, Departing (“left”, with roles Object, Source and Purpose) and Education (“study”, with roles Student, Subject and Institution).]

Recap: last time

[Figure, after Thompson et al., 2003: a generative frame model in which a frame (here “likelihood”) generates the predicate (i.e. its lemma, here “tend”) and, for each role (e.g. “effect”, “affected”), a (non-terminal, head word) pair, such as (NP, investments) for the “affected” role.]

Induction of Frame-Semantic Information

}  The semantic induction task involves 3 sub-tasks:

}  Construction of a transformed syntactic dependency graph (~ argument identification)

}  Induction of frames (and clusters of arguments)

}  Role induction

[Running example on the slides: “Peter the Great gave an order to build a wooden fortified castle”, shown first as a transformed dependency graph over the words, then with induced classes (Person, Request, Construction, Buildings, Being_Protected, Material), and finally with induced roles (Speaker, Message, Created Entity, Type, Material).]

We model these sub-tasks jointly within a probabilistic model.

We will refer to the labels as semantic classes (covering both frames and clusters of arguments).

The (Simplified) Model

For each sentence:

    GenSemClass(c_root),   c_root ∼ θ_root       [draw semantic class for the root]

GenSemClass(c):

    s ∼ φ_c                                      [draw the synt/lex realization, like in the syntactic DOP model]
    for each role t = 1, …, T:
        if [n ∼ ψ_c,t] = 1:                      [at least one argument?]
            GenArgument(c, t)                    [draw first argument]
            while [n ∼ ψ+_c,t] = 1:              [continue generation: draw more arguments]
                GenArgument(c, t)

GenArgument(c, t):

    a_c,t ∼ χ_c,t                                [draw the argument key, a syntactic property of the argument such as ACTIVE:LEFT:SBJ]
    c′_c,t ∼ ω_c,t                               [draw the semantic class for the argument]
    GenSemClass(c′_c,t)                          [recurse]

(θ_root, φ_c, ψ_c,t, ψ+_c,t, χ_c,t and ω_c,t denote the model’s class- and role-specific multinomial parameters.)

Running example built up on the slides: the root class Request is realized as “gave an order”; its Speaker argument (key ACTIVE:LEFT:SBJ) has class Person, “Peter the Great”; its Message argument (key ACTIVE:RIGHT:OBJ) has class Construction, “build”; Construction in turn has a Created Entity argument (key ACTIVE:RIGHT:OBJ) of class Buildings, “castle”, whose Type argument (key -:LEFT:NMOD) has class Being_Protected, “fortified”.
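To illustrate the generative story (and nothing more), here is a toy sampler that follows the same recursion; the class inventories, realizations, argument keys and probabilities are hand-picked placeholders of mine, whereas in the actual model they are learnt multinomial parameters.

```python
import random

random.seed(0)

# Hand-set placeholder distributions (illustrative only).
realizations = {"Request": ["gave an order", "ordered"],
                "Person": ["Peter the Great", "he"],
                "Construction": ["build"],
                "Buildings": ["castle"]}
arg_keys = {"Request": ["ACTIVE:LEFT:SBJ", "ACTIVE:RIGHT:OBJ"],
            "Construction": ["ACTIVE:RIGHT:OBJ"]}
arg_classes = {"Request": ["Person", "Construction"],
               "Construction": ["Buildings"]}

def gen_sem_class(c, depth=0):
    s = random.choice(realizations.get(c, [c.lower()]))   # draw synt/lex realization
    node = {"class": c, "realization": s, "args": []}
    for t in range(2):                                    # for each role t = 1..T (T=2 here)
        if depth < 2 and random.random() < 0.5:           # at least one argument?
            node["args"].append(gen_argument(c, depth))
            while random.random() < 0.2:                  # continue generation?
                node["args"].append(gen_argument(c, depth))
    return node

def gen_argument(c, depth):
    key = random.choice(arg_keys.get(c, ["-:LEFT:NMOD"]))     # draw argument key
    child = random.choice(arg_classes.get(c, ["Person"]))     # draw class for the argument
    return {"key": key, **gen_sem_class(child, depth + 1)}    # recurse

print(gen_sem_class("Request"))   # one sampled semantic analysis for a sentence root
```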

Inducing Semantics

}  Given a (large) collection of sentences {x_i}_{i=1}^n annotated with (transformed) syntactic dependencies, e.g.

    “Peter the Great gave an order to build a wooden fortified castle”
    “Mary wore an evening dress from Cardin”

}  We want to induce their semantics {m_i}_{i=1}^n, i.e. the segmentation and clustering

[Example annotations on the slides: the first sentence is labelled with classes Person, Request, Construction, Buildings, Being_Protected, Material and roles Speaker, Message, Created Entity, Type, Material; the second with classes Person, Wearing, Garment, Occasion, Brand and roles Wearer, Clothing, Style, Creator. In the model’s actual output the labels are just numeric cluster IDs.]

Can be induced with a form of EM algorithm (though this is not quite what we have done)

Semantic classes (frames)


From a collection of biomedical texts:

Class   Variations
1       motif, sequence, regulatory element, response element, element, dna sequence
2       donor, individual, subject
3       important, essential, critical
4       dose, concentration
5       activation, transcriptional activation, transactivation
6       b cell, t lymphocyte, thymocyte, b lymphocyte, t cell, t-cell line, human lymphocyte, t-lymphocyte   (blood cells)
7       indicate, reveal, document, suggest, demonstrate
8       augment, abolish, inhibit, convert, cause, abrogate, modulate, block, decrease, reduce, diminish, suppress, up-regulate, impair, reverse, enhance   (the “cause change position on a scale” frame)
9       confirm, assess, examine, study, evaluate, test, resolve, determine, investigate
10      nf-kappab, nf-kappa b, nfkappab, nf-kb

Today

}  Machine translation:

}  IBM 1 and HMM models

}  Unsupervised Induction of Semantic Representations

}  Topics for the exam

Overview


}  Noisy channel paradigm (and applications)

}  Language models:

}  handling rare words and noise

}  Sequence labeling problems:

}  POS-tagging, spell-checking applications and semantic role labeling (SRL)

}  Estimation algorithms

}  Decoding (labeling) methods: Viterbi (dynamic programming)

}  Parsing problems (trees)

}  Syntactic parsing, with a brief look at semantics (probabilistic CCGs)

}  Modeling: PCFGs and grammar transforms (including DOP)

}  Decoding (parsing) algorithms: CKY (dynamic programming)

}  Sequence-to-sequence problems

}  Machine translation: word, phrase and syntactic models, …

Material for the Exam (March 24th)


}  CFG and PCFGs
}  incl. tree probability; estimation from a treebank

}  recap smoothing

}  CNF form, binarization (CNF transform)

}  CKY algorithm for CFG and PCFG

}  Incl. understanding the data structure (the chart), the unary closure for both the probabilistic and non-probabilistic case, and the time complexity of the algorithm

}  Make sure that given a grammar and a sentence, you can fill a chart

}  Ideas about weaknesses of PCFGs

}  Remember the PP-attachment and coordination examples

Material for the Exam (March 24th)


}  Structural annotation: vertical and horizontal Markovization
}  Effects on grammar size, on parsing speed, accuracy, …

}  Lexicalization: how to get rules, basic understanding of estimation, time complexity

}  Computing the probability of a word sequence with PCFGs

}  "Sum product" – inside probabilities

Material for the Exam (March 24th)


}  DOP estimation (recall the basic corpus example)

}  HMMs and PCFGs as general tool:

}  Semantic role labeling problem: the basic sequence model

}  CCGs (only the application combinator, no composition and no type raising): an idea of how CKY can be applied to the semantic parsing problem

Material for the Exam (March 24th)


}  Machine translation

}  Word models: IBM Model 1, HMM model (from today)

}  An idea about higher order IBM model (see slides from lecture 9 – modeling valency)

}  The basic phrase-based model: how do we obtain phrases? How do we define the probabilistic model? (including from today)

}  Intuition about estimation (an idea of the EM algorithm but not formal derivation)

}  Decoding: what does a state in the lattice represent?

}  An idea about synchronous CFGs

Material for the Exam (March 24th)


}  Not at the exam:

}  Semantics

}  Type-raising and composition for CCGs

}  unsupervised induction of SRL (i.e. today)

}  Cognitive aspects

}  History