
Context-Free Translation Models

Adam Lopez
University of Edinburgh → Johns Hopkins University


Finite-State Models

•Very simple models get us pretty far!

•There’s no data like more data.

•Word-based models follow intuitions, but not all.

•Phrase-based models are similar, but more effective.

•All of these models are weighted regular languages.

•Need dynamic programming with approximations.

•Is this the best we can do?

Overview

[Pipeline diagram: training data (parallel text) → learner → model → decoder, illustrated with Chinese and English example sentences: 联合国 安全 理事会 的 五个 常任 理事 国都 / However , the sky remained clear under the strong north wind .]

Two Problems

•Exact decoding requires exponential time.

•This is a consequence of arbitrary permutation.

•But in translation reordering is not arbitrary!

•Parameterization of reordering is weak.

•No generalization!

Garcia and associates . / Garcia y asociados .
Carlos Garcia has three associates . / Carlos Garcia tiene tres asociados .
his associates are not strong . / sus asociados no son fuertes .
Garcia has a company also . / Garcia tambien tiene una empresa .
its clients are angry . / sus clientes estan enfadados .
the associates are also angry . / los asociados tambien estan enfadados .
the company has strong enemies in Europe . / la empresa tiene enemigos fuertes en Europa .
the clients and the associates are enemies . / los clientes y los asociados son enemigos .
the company has three groups . / la empresa tiene tres grupos .
its groups are in Europe . / sus grupos estan en Europa .
the modern groups sell strong pharmaceuticals . / los grupos modernos venden medicinas fuertes .
the groups do not sell zanzanine . / los grupos no venden zanzanina .
the small groups are not modern . / los grupos pequenos no son modernos .

Same pattern: NN JJ → JJ NN

Finite-state models do not capture this generalization.

Context-Free Grammar

S → NP VP
NP → watashi wa
NP → hako wo
VP → NP V
V → akemasu

Derivation, step by step:

S
⇒ NP VP
⇒ watashi wa VP
⇒ watashi wa NP V
⇒ watashi wa hako wo V
⇒ watashi wa hako wo akemasu

Synchronous Context-Free Grammar

Pair the Japanese grammar above with an English grammar:

S → NP VP
NP → I
NP → the box
VP → V NP
V → open

Linking the two with coindexed nonterminals gives a synchronous context-free grammar:

S → NP1 VP2 / NP1 VP2
NP → watashi wa / I
NP → hako wo / the box
VP → NP1 V2 / V1 NP2
V → akemasu / open

Expanding coindexed nonterminals in parallel builds a pair of trees, one per language, whose yields are:

watashi wa hako wo akemasu / I open the box
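The parallel expansion above can be made concrete in code. Below is a minimal, illustrative Python sketch (not from the slides): each rule stores a source and a target right-hand side with coindexed nonterminals, and a derivation tree is read off once per language. The names and data-structure choices are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple, Union

Sym = Union[str, Tuple[str, int]]          # terminal, or (nonterminal, index)

@dataclass
class Rule:
    lhs: str
    src: List[Sym]                         # source right-hand side
    tgt: List[Sym]                         # target right-hand side

@dataclass
class Node:
    rule: Rule
    children: Dict[int, "Node"]            # coindex -> sub-derivation

def read_off(node: Node, side: str) -> List[str]:
    """Read the yield of one side of a synchronous derivation."""
    rhs = node.rule.src if side == "src" else node.rule.tgt
    out = []
    for item in rhs:
        if isinstance(item, tuple):        # coindexed nonterminal: recurse
            out.extend(read_off(node.children[item[1]], side))
        else:                              # terminal
            out.append(item)
    return out

# The derivation from the slides:
r_s   = Rule("S",  [("NP", 1), ("VP", 2)], [("NP", 1), ("VP", 2)])
r_np1 = Rule("NP", ["watashi", "wa"], ["I"])
r_np2 = Rule("NP", ["hako", "wo"], ["the", "box"])
r_vp  = Rule("VP", [("NP", 1), ("V", 2)], [("V", 2), ("NP", 1)])
r_v   = Rule("V",  ["akemasu"], ["open"])

deriv = Node(r_s, {1: Node(r_np1, {}),
                   2: Node(r_vp, {1: Node(r_np2, {}), 2: Node(r_v, {})})})

print(" ".join(read_off(deriv, "src")))    # watashi wa hako wo akemasu
print(" ".join(read_off(deriv, "tgt")))    # I open the box
```

Note how the VP rule lists its children in opposite orders on the two sides, which is exactly the NP V / V NP reordering in the grammar.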

Translation as Parsing

watashi wa hako wo akemasu

Parsing the input with the source side of the synchronous grammar yields the source tree:

(S (NP watashi wa) (VP (NP hako wo) (V akemasu)))

The target sides of the same rules then determine the corresponding English tree:

(S (NP I) (VP (V open) (NP the box)))

watashi wa hako wo akemasu / I open the box

Decoding

•In general, there are exponentially many possible parse trees for a sentence.

•Dynamic programming to the rescue!

Parsing

NN → duck
PRP → I
VBD → saw
PRP$ → her
NP → PRP$ NN
VP → VBD NP
S → PRP VP
PRP → her
VB → duck
SBAR → PRP VB
VP → VBD SBAR

I1 saw2 her3 duck4

Deduction rules:

Xi,i+1 ← (wi+1 = w) ∧ (X → w)
Xi,j ← Yi,k ∧ Zk,j ∧ (X → Y Z)

For example: PRP0,1 ← (w1 = I) ∧ (PRP → I), and NP2,4 ← PRP$2,3 ∧ NN3,4 ∧ (NP → PRP$ NN).

Items derived for this sentence: PRP0,1, VBD1,2, PRP$2,3, PRP2,3, NN3,4, VB3,4, NP2,4, SBAR2,4, VP1,4, S0,4.

The chart compactly represents both readings of the ambiguous sentence:

(S (PRP I) (VP (VBD saw) (NP (PRP$ her) (NN duck))))
(S (PRP I) (VP (VBD saw) (SBAR (PRP her) (VB duck))))

Analysis

O(Nn²) nodes, O(Gn³) edges (n = sentence length, N = number of nonterminals, G = grammar size).
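As a concrete illustration of the two deduction rules, here is a minimal bottom-up recognizer sketch in Python for the toy grammar above. It is illustrative only: the names, the CNF encoding of the grammar, and the chart representation are assumptions, and no weights or back-pointers are kept.

```python
from collections import defaultdict

# Toy grammar from the slides: unary (preterminal) and binary rules.
UNARY = [("NN", "duck"), ("PRP", "I"), ("VBD", "saw"), ("PRP$", "her"),
         ("PRP", "her"), ("VB", "duck")]
BINARY = [("NP", "PRP$", "NN"), ("VP", "VBD", "NP"), ("S", "PRP", "VP"),
          ("SBAR", "PRP", "VB"), ("VP", "VBD", "SBAR")]

def cky_recognize(words, start="S"):
    n = len(words)
    chart = defaultdict(set)              # chart[(i, j)] = categories X such that X_{i,j} holds
    # Rule 1: X_{i,i+1} <- (w_{i+1} = w) and (X -> w)
    for i, w in enumerate(words):
        for x, term in UNARY:
            if term == w:
                chart[(i, i + 1)].add(x)
    # Rule 2: X_{i,j} <- Y_{i,k} and Z_{k,j} and (X -> Y Z)
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for x, y, z in BINARY:
                    if y in chart[(i, k)] and z in chart[(k, j)]:
                        chart[(i, j)].add(x)
    return start in chart[(0, n)], chart

ok, chart = cky_recognize("I saw her duck".split())
print(ok)                                 # True: the sentence is in the language
print(sorted(chart[(2, 4)]))              # ['NP', 'SBAR']: "her duck" is ambiguous
```

Replacing the sets with best weights and back-pointers turns this recognizer into a Viterbi parser, which is the shape of the dynamic program used for decoding.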

Language Models Again

•Language models are finite-state (i.e. regular).

•Our translation model is context-free.

•We can again compute the full model via intersection.

•The result is also context-free.

•Bad news for context-free language models and context-free translation models...

•Context-free languages are not closed under intersection.

•Computation is in PSPACE!

•Basic DP strategy: nodes include category, span, and left and right language model context (sketched below).

•While polynomial, this still tends to be too slow to do exactly.

•Various forms of pruning are generally used.

•Finding efficient algorithms is currently an area of very active research.
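To make the "category, span, and language model context" item concrete, here is a minimal sketch of what such a decoding item and its combination step might look like with a bigram language model. It is purely illustrative: the Item fields, the combine function, and the stub bigram_logprob are assumptions, not the slides' or any system's actual data structures.

```python
from typing import NamedTuple, Tuple

class Item(NamedTuple):
    cat: str                   # grammar category, e.g. "VP"
    i: int                     # source span start
    j: int                     # source span end
    left: Tuple[str, ...]      # leftmost target word(s), not yet scored on their left
    right: Tuple[str, ...]     # rightmost target word(s), context for whatever follows

def combine(a: Item, b: Item, cat: str, bigram_logprob) -> Tuple[Item, float]:
    """Join two items whose source spans are adjacent. a and b are given in
    target order (the SCFG rule may swap them); the bigram revealed at the seam
    between their target strings is scored. With a bigram LM, one boundary word
    per side is enough context to keep in the item."""
    seam = bigram_logprob(a.right[-1], b.left[0])
    i, j = min(a.i, b.i), max(a.j, b.j)
    return Item(cat, i, j, a.left, b.right), seam

# Toy usage: VP -> NP1 V2 / V1 NP2 over "hako wo akemasu" / "open the box".
lm = lambda u, v: -1.0                                 # stub bigram log-probability
v_item  = Item("V",  4, 5, ("open",), ("open",))
np_item = Item("NP", 2, 4, ("the",),  ("box",))
vp_item, score = combine(v_item, np_item, "VP", lm)    # target order: V then NP
print(vp_item, score)
```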

The Big Question

Where do the categories come from?

Answer #1: there are no categories!

Keep order: X → X1 X2 / X1 X2
Swap order: X → X1 X2 / X2 X1
Translate words or phrases:
X → watashi wa / I
X → hako wo / the box
X → akemasu / open

A derivation then uses only the single category X:

(X (X watashi wa) (X (X hako wo) (X akemasu))) / (X (X I) (X (X open) (X the box)))

Inversion Transduction Grammar

Parsing is polynomial. We must be giving up something in order to achieve polynomial complexity.

A B C D
B D A C

ITG cannot produce this kind of reordering.

Does this matter? Do such reorderings occur in real data?

[Word alignment example: je pense que , independamment de notre parti , nous trouvons tous cela inacceptable / that , I believe , we all find unacceptable , regardless of political party]

YES! (but they're very rare)

Hierarchical Phrase-Based Translation

[Paired derivation trees using only the category X]

X → X1 hako wo X2 / X1 open X2
X → hako wo / the box
X → akemasu / open

The Big Question

Where do the categories come from?

Answer #2: from a parser.

S → NP1 VP2 / NP1 VP2
NP → watashi wa / I
NP → hako wo / the box
VP → NP1 V2 / V1 NP2
V → akemasu / open

Syntax-based Translation

Are reorderings in real data consistent with isomorphisms on linguistic parse trees?

I saw her duck / yo la vi agacharse

(S (PRP I) (VP (VBD saw) (SBAR (PRP her) (VB duck))))

Of course not.

Tree substitution grammar: use larger tree fragments, then flatten them into a weakly equivalent SCFG:

S → PRP1 VP2 / PRP1 VP2
PRP → I / yo
VP → VBD1 her VB2 / la VBD1 VB2
VBD → saw / vi
VB → duck / agacharse

Problem: we need a parser!

The Big Question

Where do the categories come from?

Answer #3: they are automatically induced!

This is an area of active research.
www.clsp.jhu.edu/workshops/ws10/groups/msgismt/


Another Big Question...

Where do the grammars come from?

Recap: Expectation Maximization

•Arbitrarily select a set of parameters (say, uniform).

•Calculate expected counts of the unseen events.

•Choose new parameters to maximize likelihood, using expected counts as a proxy for observed counts.

•Iterate.

•Guaranteed that likelihood is monotonically nondecreasing.

Can we apply it to other models?

•Sure, why not?

•The derivation structure of each model is simply a latent variable.

•We simply apply EM to each model structure (a minimal sketch follows below).

BAD: the objective function is highly non-convex.

WORSE: computing expectations from a phrase-based model, given a sentence pair, is NP-complete (by reduction from SAT; DeNero & Klein, 2008).

Computing expectations from an SCFG model, given a sentence pair, is at least O(n⁶).
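For reference, here is what the EM recipe above looks like when instantiated for the simplest case, a word-to-word translation table (IBM-Model-1 style), where the expectations are tractable. This is an illustrative sketch only; the function name, the toy data, and the uniform initialization are assumptions.

```python
from collections import defaultdict

def em_word_translation(pairs, iterations=10):
    """pairs: list of (source_words, target_words) sentence pairs.
    Returns t[(f, e)] = p(f | e), learned by EM as in IBM Model 1."""
    f_vocab = {f for fs, _ in pairs for f in fs}
    t = defaultdict(lambda: 1.0 / len(f_vocab))           # arbitrary (uniform) start
    for _ in range(iterations):
        count = defaultdict(float)                         # expected counts
        total = defaultdict(float)
        for fs, es in pairs:
            for f in fs:
                z = sum(t[(f, e)] for e in es)             # normalizer over alignments
                for e in es:
                    p = t[(f, e)] / z                      # posterior that e generated f
                    count[(f, e)] += p
                    total[e] += p
        for (f, e) in count:                               # M-step: relative frequencies
            t[(f, e)] = count[(f, e)] / total[e]
    return t

# Tiny toy corpus (assumed for illustration).
pairs = [("la casa".split(), "the house".split()),
         ("la".split(), "the".split())]
t = em_word_translation(pairs)
print(round(t[("la", "the")], 2), round(t[("casa", "house")], 2))
```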

Now What?

•Option #1: approximate the expectations.

•Restrict computation to some tractable subset of the alignment space (arbitrarily biased).

•Markov chain Monte Carlo (very slow).

•Option #2: change the problem definition.

•We already know how to learn word-to-word translation models efficiently.

•Idea: learn word-to-word alignments, extract the most probable alignment, then treat it as observed.

•Learn phrase translations consistent with the word alignments.

•This decouples alignment from model learning -- is this a good thing?

Phrase Extraction

[Word alignment grid between watashi wa hako wo akemasu and I open the box]

Phrase pairs consistent with the alignment include:

watashi wa / I
hako wo / the box
hako wo akemasu / open the box
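The extraction rule being illustrated (keep every phrase pair such that no alignment link leaves the pair) can be written down in a few lines. The sketch below is illustrative only; the function name, the max_len limit, and the toy alignment links are assumptions.

```python
def extract_phrases(src, tgt, alignment, max_len=4):
    """Return all phrase pairs consistent with the word alignment: every link
    touching the pair must lie inside it, and the pair must contain a link.
    (The usual extension over unaligned boundary words is omitted for brevity.)"""
    pairs = []
    n = len(src)
    for i1 in range(n):
        for i2 in range(i1, min(n, i1 + max_len)):
            linked = [t for (s, t) in alignment if i1 <= s <= i2]
            if not linked:
                continue
            j1, j2 = min(linked), max(linked)
            if j2 - j1 + 1 > max_len:
                continue
            # Consistency: nothing in the target span may align outside the source span.
            if any(j1 <= t <= j2 and not (i1 <= s <= i2) for (s, t) in alignment):
                continue
            pairs.append((" ".join(src[i1:i2 + 1]), " ".join(tgt[j1:j2 + 1])))
    return pairs

# Toy example mirroring the slides; the alignment links are assumptions.
src = "watashi wa hako wo akemasu".split()
tgt = "I open the box".split()
alignment = [(0, 0), (1, 0), (2, 3), (3, 2), (4, 1)]     # (source index, target index)
for s, t in extract_phrases(src, tgt, alignment):
    print(s, "/", t)
```

On this toy alignment it prints the three pairs highlighted above, along with every other consistent pair up to the length limit.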

Hierarchical Phrase Extraction

[Same word alignment grid, with a sub-phrase highlighted inside a larger phrase]

Subtracting an extracted sub-phrase pair (hako wo / the box) from a larger pair (hako wo akemasu / open the box) leaves a rule with a coindexed gap:

X1 akemasu / open X1
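That subtraction step is mechanical enough to show directly; the helper below is a toy sketch (its name and the single-gap restriction are assumptions for illustration).

```python
def make_gapped_rule(phrase_pair, sub_pair, gap="X1"):
    """Replace one occurrence of a sub-phrase pair inside a larger phrase pair
    with a coindexed gap, giving a hierarchical rule (one gap only, for brevity)."""
    (src, tgt), (sub_src, sub_tgt) = phrase_pair, sub_pair
    return src.replace(sub_src, gap, 1), tgt.replace(sub_tgt, gap, 1)

print(make_gapped_rule(("hako wo akemasu", "open the box"), ("hako wo", "the box")))
# -> ('X1 akemasu', 'open X1')
```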

Syntactic Phrase Extraction

[Word alignment grid between watashi wa hako wo akemasu and I open the box, with the English side parsed: (S (NP I) (VP (V open) (NP the box)))]

Extracted rules are labeled with the constituents they correspond to:

VP → hako wo akemasu / open the box
VP → NP1 akemasu / open NP1


Summary

•Unsupervised learning over intractable models turns out to be a hard problem.

•Heuristic methods are widely used, but they offer no useful guarantees and are highly biased.

•Finding more elegant approximations is a topic of ongoing research.


Implementations

•Synchronous context-free translation models

•Moses -- www.statmt.org/moses

•cdec -- www.cdec-decoder.org

•Joshua -- www.cs.jhu.edu/~ccb/joshua


Datasets

•Proceedings of the European Parliament

•www.statmt.org/europarl

•Linguistic Data Consortium

•www.ldc.upenn.edu

Summary

•Many probabilistic translation models can be thought of in terms of weighted (formal) languages.

•Dynamic programming is a common (though not universal!) decoding strategy.

•With these concepts in mind, you might be able to define models that capture other translation phenomena (e.g. morphosyntactics, semantics).


Recap


The Tower of Babel

Pieter Brueghel the Elder (1563)


2756 language pairs!

Statistical Machine Translation

Develop a statistical model of translation that can be learned from data and used to predict the correct English translation of new Chinese sentences.

Statistical Machine Translation

[Diagram: statistical machine translation draws on linguistics, algorithms, machine learning, formal language theory, and information theory: regular & context-free languages; Bayes’ rule, maximum likelihood, expectation maximization; the noisy channel model; syntax-based models; dynamic programming, graphs & hypergraphs.]


The Data Deluge

•We are overwhelmed with data, but we can harness it to solve real problems.

•Formal tools help us model the data.

•Probabilistic tools help us learn models and make predictions.

•Algorithmic optimization methods make it all run.

•Tradeoffs: model expressivity vs. tractability.

We aren’t there yet!

•We still need:

•Better models of translation

•Based on linguistic insights

•Better approximations

•Better algorithms

Fred Jelinek
18 November 1932 – 14 September 2010

Research in both ASR and MT continues. The statistical approach is clearly dominant. The knowledge of linguists is added wherever it fits. And although we have made significant progress, we are very far from solving the problems.