Page 1:

Dual Decomposition Inference for Graphical Models over Strings

Nanyun (Violet) Peng, Ryan Cotterell, Jason Eisner

Johns Hopkins University

1

Page 2:

Attention!

• Don’t care about phonology?
• Listen anyway. This is a general method for inferring strings from other strings (if you have a probability model).
• So if you haven’t yet observed all the words of your noisy or complex language, try it!

Page 3:

A Phonological Exercise

(rows: Verbs; columns: Tenses; “?” marks unobserved cells)

Verb    1P Pres. Sg.   3P Pres. Sg.   Past Tense   Past Part.
TALK    [tɔk]          [tɔks]         [tɔkt]       [tɔkt]
THANK   [θeɪŋk]        [θeɪŋks]       [θeɪŋkt]     [θeɪŋkt]
HACK    [hæk]          [hæks]         [hækt]       [hækt]
CRACK   ?              [kɹæks]        [kɹækt]      ?
SLAP    [slæp]         ?              [slæpt]      ?

Page 4:

Matrix Completion: Collaborative Filtering

(rows: Users; columns: Movies; “?” marks unobserved ratings)

-37   29   19   29
-36   67   77   22
-24   61   74   12
  ?  -79    ?  -41
-52    ?  -39    ?

Page 5:

Matrix Completion: Collaborative Filtering

Movie vectors (columns):   [-6 -3  2]  [ 9 -2  1]  [ 9 -7  2]  [ 4  3 -2]
User vectors (rows):
[ 4  1 -5]    -37   29   19   29
[ 7 -2  0]    -36   67   77   22
[ 6 -2  3]    -24   61   74   12
[-9  1  4]      ?  -79    ?  -41
[ 3  8 -5]    -52    ?  -39    ?

Page 6:

Matrix Completion: Collaborative Filtering

Prediction! (the missing cells are filled in by dot products)

Movie vectors (columns):   [-6 -3  2]  [ 9 -2  1]  [ 9 -7  2]  [ 4  3 -2]
User vectors (rows):
[ 4  1 -5]    -37   29   19   29
[ 7 -2  0]    -36   67   77   22
[ 6 -2  3]    -24   61   74   12
[-9  1  4]     59  -79  -80  -41
[ 3  8 -5]    -52    6  -39   46

Page 7:

Matrix Completion: Collaborative Filtering

[1,-4,3] · [-5,2,1]  =  -10   (dot product)
observed rating:        -11   (dot product plus Gaussian noise)
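The arithmetic on these slides is easy to verify. A minimal sketch in plain Python, using the latent user and movie vectors shown on Pages 5 and 6:

```python
# Latent 3-dimensional vectors from the slides: one per user (row of the
# rating matrix) and one per movie (column).
users = [[4, 1, -5], [7, -2, 0], [6, -2, 3], [-9, 1, 4], [3, 8, -5]]
movies = [[-6, -3, 2], [9, -2, 1], [9, -7, 2], [4, 3, -2]]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# Every cell of the matrix, observed or missing, is modeled as a dot
# product (plus Gaussian noise, ignored in this noiseless sketch).
ratings = [[dot(u, m) for m in movies] for u in users]

print(ratings[0])     # first user's row: [-37, 29, 19, 29]
print(ratings[3][0])  # a previously missing cell, now predicted: 59
```

The four cells that were missing on Page 4 come out as 59, -80, 6, and 46, matching the “Prediction!” slide.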

Page 8:

A Phonological Exercise

(rows: Verbs; columns: Tenses; “?” marks unobserved cells)

Verb    1P Pres. Sg.   3P Pres. Sg.   Past Tense   Past Part.
TALK    [tɔk]          [tɔks]         [tɔkt]       [tɔkt]
THANK   [θeɪŋk]        [θeɪŋks]       [θeɪŋkt]     [θeɪŋkt]
HACK    [hæk]          [hæks]         [hækt]       [hækt]
CRACK   ?              [kɹæks]        [kɹækt]      ?
SLAP    [slæp]         ?              [slæpt]      ?

Page 9:

A Phonological Exercise

(rows: Stems; columns: Suffixes)

                  1P Pres. Sg.  3P Pres. Sg.  Past Tense  Past Part.
Suffix:           /Ø/           /s/           /t/         /t/
/tɔk/    TALK     [tɔk]         [tɔks]        [tɔkt]      [tɔkt]
/θeɪŋk/  THANK    [θeɪŋk]       [θeɪŋks]      [θeɪŋkt]    [θeɪŋkt]
/hæk/    HACK     [hæk]         [hæks]        [hækt]      [hækt]
/kɹæk/   CRACK    ?             [kɹæks]       [kɹækt]     ?
/slæp/   SLAP     [slæp]        ?             [slæpt]     ?

Page 10:

A Phonological Exercise

(same stem-and-suffix table as Page 9)

Page 11:

A Phonological Exercise

Prediction!

                  1P Pres. Sg.  3P Pres. Sg.  Past Tense  Past Part.
Suffix:           /Ø/           /s/           /t/         /t/
/tɔk/    TALK     [tɔk]         [tɔks]        [tɔkt]      [tɔkt]
/θeɪŋk/  THANK    [θeɪŋk]       [θeɪŋks]      [θeɪŋkt]    [θeɪŋkt]
/hæk/    HACK     [hæk]         [hæks]        [hækt]      [hækt]
/kɹæk/   CRACK    [kɹæk]        [kɹæks]       [kɹækt]     [kɹækt]
/slæp/   SLAP     [slæp]        [slæps]       [slæpt]     [slæpt]

Page 12:

A Model of Phonology

tɔk + s  →(Concatenate)→  tɔks   “talks”

Page 13:

A Phonological Exercise

(two more verbs added)

                  1P Pres. Sg.  3P Pres. Sg.  Past Tense  Past Part.
Suffix:           /Ø/           /s/           /t/         /t/
/tɔk/    TALK     [tɔk]         [tɔks]        [tɔkt]      [tɔkt]
/θeɪŋk/  THANK    [θeɪŋk]       [θeɪŋks]      [θeɪŋkt]    [θeɪŋkt]
/hæk/    HACK     [hæk]         [hæks]        [hækt]      [hækt]
/kɹæk/   CRACK    ?             [kɹæks]       [kɹækt]     ?
/slæp/   SLAP     [slæp]        ?             [slæpt]     ?
/koʊd/   CODE     ?             [koʊdz]       [koʊdɪt]    ?
/bæt/    BAT      [bæt]         ?             [bætɪt]     ?

Page 14:

A Phonological Exercise

                  1P Pres. Sg.  3P Pres. Sg.  Past Tense  Past Part.
Suffix:           /Ø/           /s/           /t/         /t/
/tɔk/    TALK     [tɔk]         [tɔks]        [tɔkt]      [tɔkt]
/θeɪŋk/  THANK    [θeɪŋk]       [θeɪŋks]      [θeɪŋkt]    [θeɪŋkt]
/hæk/    HACK     [hæk]         [hæks]        [hækt]      [hækt]
/kɹæk/   CRACK    ?             [kɹæks]       [kɹækt]     ?
/slæp/   SLAP     [slæp]        ?             [slæpt]     ?
/koʊd/   CODE     ?             [koʊdz]       [koʊdɪt]    ?
/bæt/    BAT      [bæt]         ?             [bætɪt]     ?

z instead of s; ɪt instead of t

Page 15:

A Phonological Exercise

                  1P Pres. Sg.  3P Pres. Sg.  Past Tense  Past Part.
Suffix:           /Ø/           /s/           /t/         /t/
/tɔk/    TALK     [tɔk]         [tɔks]        [tɔkt]      [tɔkt]
/θeɪŋk/  THANK    [θeɪŋk]       [θeɪŋks]      [θeɪŋkt]    [θeɪŋkt]
/hæk/    HACK     [hæk]         [hæks]        [hækt]      [hækt]
/kɹæk/   CRACK    ?             [kɹæks]       [kɹækt]     ?
/slæp/   SLAP     [slæp]        ?             [slæpt]     ?
/koʊd/   CODE     ?             [koʊdz]       [koʊdɪt]    ?
/bæt/    BAT      [bæt]         ?             [bætɪt]     ?
/it/     EAT      [it]          ?             [eɪt]       [itən]

eɪt instead of it + ɪt

Page 16:

A Model of Phonology

koʊd + s  →(Concatenate)→  koʊd#s  →(Phonology, stochastic)→  koʊdz   “codes”

Modeling word forms using latent underlying morphs and phonology. Cotterell et al., TACL 2015.
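The two-stage pipeline can be sketched in a few lines. In the paper the phonology step is a learned probabilistic FST; the single hard-coded voicing rule below is only an illustrative stand-in, and the ASCII string `koud` stands in for IPA /koʊd/:

```python
def concatenate(morphs):
    # Underlying representation of a word: its morphs joined by '#'.
    return "#".join(morphs)

def phonology(ur):
    # Illustrative stand-in for the stochastic phonology step: drop the
    # morpheme boundary and voice a suffixal 's' to 'z' after a voiced
    # segment. (The real model is a probabilistic FST over edits.)
    stem, suffix = ur.rsplit("#", 1)
    if suffix == "s" and stem[-1] in "bdgvzmnlrwaeiou":
        suffix = "z"
    return stem + suffix

ur = concatenate(["koud", "s"])  # ASCII stand-in for /koʊd/ + /s/
print(ur)             # koud#s
print(phonology(ur))  # koudz  ("codes")
```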

Page 17:

A Model of Phonology

rizaign + ation  →(Concatenate)→  rizaign#ation  →(Phonology, stochastic)→  rεzɪgneɪʃn   “resignation”

Page 18:

Fragment of Our Graph for English

1) Morphemes:        rizaign    eɪʃən    z    dæmn
        │ Concatenation
2) Underlying words: rεzɪgn#eɪʃən    rizajn#z    dæmn#eɪʃən    dæmn#z
        │ Phonology
3) Surface words:    r,εzɪgn’eɪʃn    riz’ajnz    d,æmn’eɪʃn    d’æmz
                     “resignation”   “resigns”   “damnation”   “damns”

3rd-person singular suffix: very common!

Page 19:

Limited to concatenation? No, could extend to templatic morphology …

19

Page 20:

Outline

● A motivating example: phonology
● General framework:
  o Graphical models over strings
  o Inference on graphical models over strings
● Dual decomposition inference
  o The general idea
  o Substring features and active set
● Experiments and results

Page 21:

Graphical Models over Strings?

● Joint distribution over many strings
● Variables: range over Σ*, the infinite set of all strings
● Relations among variables: usually specified by (multi-tape) FSTs

A probabilistic approach to language change. Bouchard-Côté et al., NIPS 2008.
Graphical models over multiple strings. Dreyer and Eisner, EMNLP 2009.
Large-scale cognate recovery. Hall and Klein, EMNLP 2011.

Page 22:

Graphical Models over Strings?

● Strings are the basic units in natural languages.
● Use:
  o Orthographic (spelling)
  o Phonological (pronunciation)
  o Latent (intermediate steps not observed directly)
● Size:
  o Morphemes (meaningful subword units)
  o Words
  o Multi-word phrases, including “named entities”
  o URLs

Page 23:

What relationships could you model?

● spelling ↔ pronunciation
● word ↔ noisy word (e.g., with a typo)
● word ↔ related word in another language (loanwords, language evolution, cognates)
● singular ↔ plural (for example)
● root ↔ word
● underlying form ↔ surface form

Page 24:

Factor Graph for Phonology

1) Morpheme URs:  rizajgn    eɪʃən    z    dæmn
        │ Concatenation (e.g.)
2) Word URs:      rεzɪgn#eɪʃən    rizajn#z    dæmn#eɪʃən    dæmn#z
        │ Phonology (PFST)
3) Word SRs:      r,εzɪgn’eɪʃn    riz’ajnz    d,æmn’eɪʃn    d’æmz

Any assignment to the variables has a log-probability. Let’s maximize it!

Page 25:

Contextual Stochastic Edit Process

Stochastic contextual edit distance and probabilistic FSTs. Cotterell et al., ACL 2014.

Page 26:

Inference on a Factor Graph

1) Morpheme URs:  ?    ?    ?    ?
2) Word URs:      ?    ?    ?
3) Word SRs:      r,εzɪgn’eɪʃn    riz’ajnz    riz’ajnd

(three surface words observed; everything above them is unknown)

Page 27:

Inference on a Factor Graph

1) Morpheme URs:  bar    foo    s    da      (a first wild guess)
2) Word URs:      ?    ?    ?
3) Word SRs:      r,εzɪgn’eɪʃn    riz’ajnz    riz’ajnd

Page 28:

Inference on a Factor Graph

1) Morpheme URs:  bar    foo    s    da
2) Word URs:      bar#foo    bar#s    bar#da
3) Word SRs:      r,εzɪgn’eɪʃn    riz’ajnz    riz’ajnd

Page 29:

Inference on a Factor Graph

1) Morpheme URs:  bar    foo    s    da
2) Word URs:      bar#foo    bar#s    bar#da
3) Word SRs:      r,εzɪgn’eɪʃn    riz’ajnz    riz’ajnd

factor scores: 8e-3, 0.01, 0.05, 0.02

Page 30:

Inference on a Factor Graph

1) Morpheme URs:  bar    foo    s    da
2) Word URs:      bar#foo    bar#s    bar#da
3) Word SRs:      r,εzɪgn’eɪʃn    riz’ajnz    riz’ajnd

factor scores: 8e-3, 0.01, 0.05, 0.02
phonology factor scores: 6e-1200, 2e-1300, 7e-1100

Page 31:

Inference on a Factor Graph

(same configuration and scores as Page 30)

Page 32:

Inference on a Factor Graph

1) Morpheme URs:  far    foo    s    da
2) Word URs:      far#foo    far#s    far#da
3) Word SRs:      r,εzɪgn’eɪʃn    riz’ajnz    riz’ajnd

?

Page 33:

Inference on a Factor Graph

1) Morpheme URs:  size    foo    s    da
2) Word URs:      size#foo    size#s    size#da
3) Word SRs:      r,εzɪgn’eɪʃn    riz’ajnz    riz’ajnd

?

Page 34:

Inference on a Factor Graph

1) Morpheme URs:  …    foo    s    da
2) Word URs:      …#foo    …#s    …#da
3) Word SRs:      r,εzɪgn’eɪʃn    riz’ajnz    riz’ajnd

?

Page 35:

Inference on a Factor Graph

1) Morpheme URs:  rizajn    foo    s    da
2) Word URs:      rizajn#foo    rizajn#s    rizajn#da
3) Word SRs:      r,εzɪgn’eɪʃn    riz’ajnz    riz’ajnd

Page 36:

Inference on a Factor Graph

1) Morpheme URs:  rizajn    foo    s    da
2) Word URs:      rizajn#foo    rizajn#s    rizajn#da
3) Word SRs:      r,εzɪgn’eɪʃn    riz’ajnz    riz’ajnd

phonology factor scores: 0.01, 2e-5, 0.008

Page 37:

Inference on a Factor Graph

1) Morpheme URs:  rizajn    eɪʃn    s    d
2) Word URs:      rizajn#eɪʃn    rizajn#s    rizajn#d
3) Word SRs:      r,εzɪgn’eɪʃn    riz’ajnz    riz’ajnd

phonology factor scores: 0.01, 0.001, 0.015

Page 38:

Inference on a Factor Graph

1) Morpheme URs:  rizajgn    eɪʃn    s    d
2) Word URs:      rizajgn#eɪʃn    rizajgn#s    rizajgn#d
3) Word SRs:      r,εzɪgn’eɪʃn    riz’ajnz    riz’ajnd

phonology factor scores: 0.008, 0.008, 0.013

Page 39:

Inference on a Factor Graph

(same configuration and scores as Page 38)

Page 40:

Challenges in Inference

• Global discrete optimization problem.
• Variables range over an infinite set … cannot be solved by ILP or even brute force. Undecidable!
• Our previous papers used approximate algorithms: loopy belief propagation or expectation propagation.

Q: Can we do exact inference?
A: If we can live with 1-best rather than marginal inference, then we can use Dual Decomposition … which is exact
(if it terminates! the problem is undecidable in general …)

Page 41:

Outline

● A motivating example: phonology
● General framework:
  o Graphical models over strings
  o Inference on graphical models over strings
● Dual decomposition inference
  o The general idea
  o Substring features and active set
● Experiments and results

Page 42:

Graphical Model for Phonology

Jointly decide the values of the inter-dependent latent variables, which range over an infinite set.

1) Morpheme URs:  rizajgn    eɪʃən    z    dæmn
        │ Concatenation (e.g.)
2) Word URs:      rεzɪgn#eɪʃən    rizajn#z    dæmn#eɪʃən    dæmn#z
        │ Phonology (PFST)
3) Word SRs:      r,εzɪgn’eɪʃn    riz’ajnz    d,æmn’eɪʃn    d’æmz

(competing candidate values shown: rεzign, eɪʃən)

Page 43:

General Idea of Dual Decomp

1) Morpheme URs:  rizajgn    eɪʃən    z    dæmn    (competing candidates: rεzign, eɪʃən)
2) Word URs:      rεzɪgn#eɪʃən    rizajn#z    dæmn#eɪʃən    dæmn#z
3) Word SRs:      r,εzɪgn’eɪʃn    riz’ajnz    d,æmn’eɪʃn    d’æmz

Page 44:

General Idea of Dual Decomp

One subproblem per observed word; each gets its own copies of the morphemes it uses:

Subproblem 1:  rεzɪgn + eɪʃən  →  rεzɪgn#eɪʃən  →  r,εzɪgn’eɪʃn
Subproblem 2:  rizajn + z      →  rizajn#z      →  riz’ajnz
Subproblem 3:  dæmn + eɪʃən    →  dæmn#eɪʃən    →  d,æmn’eɪʃn
Subproblem 4:  dæmn + z        →  dæmn#z        →  d’æmz

Page 45: Dual Decomposition Inference for Graphical Models over Strings Nanyun (Violet) Peng Ryan Cotterell Jason Eisner Johns Hopkins University 1.

I preferrεzɪgn

I preferrizajn

General Idea of Dual Decomp

zrizajneɪʃən dæmn eɪʃən zdæmnrεzɪgn

rεzɪgn#eɪʃən rizajn#z dæmn#eɪʃən dæmn#z

r,εzɪgn’eɪʃn riz’ajnz d,æmn’eɪʃn d’æmz

Subproblem 1 Subproblem 2 Subproblem 3 Subproblem 4

47

Page 46:

Outline

● A motivating example: phonology
● General framework:
  o Graphical models over strings
  o Inference on graphical models over strings
● Dual decomposition inference
  o The general idea
  o Substring features and active set
● Experiments and results

Page 47:

Subproblem 1:  rεzɪgn + eɪʃən  →  rεzɪgn#eɪʃən  →  r,εzɪgn’eɪʃn
Subproblem 2:  rizajn + z      →  rizajn#z      →  riz’ajnz
Subproblem 3:  dæmn + eɪʃən    →  dæmn#eɪʃən    →  d,æmn’eɪʃn
Subproblem 4:  dæmn + z        →  dæmn#z        →  d’æmz

Page 48:

Substring Features and Active Set

Subproblem 1:  rεzɪgn + eɪʃən  →  rεzɪgn#eɪʃən  →  r,εzɪgn’eɪʃn
Subproblem 2:  rizajn + z      →  rizajn#z      →  riz’ajnz
Subproblem 3:  dæmn + eɪʃən    →  dæmn#eɪʃən    →  d,æmn’eɪʃn
Subproblem 4:  dæmn + z        →  dæmn#z        →  d’æmz

Subproblem 1: “I prefer rεzɪgn.”  Less ε, ɪ, g; more i, a, j (to match the others).
Subproblem 2: “I prefer rizajn.”  Less i, a, j; more ε, ɪ, g (to match the others).

Page 49:

Features: “Active set” method

• How many features? Infinitely many possible n-grams!
• Trick: gradually increase the feature set as needed
  – like Paul & Eisner (2012), Cotterell & Eisner (2015):
  1. Only add features on which strings disagree.
  2. Only add abcd once abc and bcd already agree.
  – Exception: add unigrams and bigrams for free.
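The growth rule can be sketched directly. This is only a sketch: it assumes “agree” means equal n-gram counts across all copies of a variable, and it marks word boundaries with `^` and `$` so that features like `s$` (word-final s) can be expressed. On copies like gris vs. griz, the Catalan example on the following slides, it yields exactly the small set {s, z, is, iz, s$, z$}:

```python
from collections import Counter

def ngram_counts(s, n):
    s = "^" + s + "$"  # boundary markers, so 's$' is a possible feature
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))

def disagreements(strings, n):
    # n-grams whose counts differ across the copies of a variable.
    counts = [ngram_counts(s, n) for s in strings]
    grams = set().union(*counts)
    return {g for g in grams if len({c[g] for c in counts}) > 1}

def grow_active_set(strings, active):
    # Unigrams and bigrams are added for free whenever copies disagree.
    for n in (1, 2):
        active |= disagreements(strings, n)
    # A longer n-gram is added only if the copies disagree on it AND
    # both of its (n-1)-gram halves are already in agreement.
    for n in range(3, max(map(len, strings)) + 2):
        shorter = disagreements(strings, n - 1)
        for g in disagreements(strings, n):
            if g[:-1] not in shorter and g[1:] not in shorter:
                active.add(g)
    return active

print(sorted(grow_active_set(["gris", "griz", "griz"], set())))
# ['is', 'iz', 's', 's$', 'z', 'z$']
```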

Page 50:

Fragment of Our Graph for Catalan

Stem of “grey”: ?
Observed surface words: gris, grizos, grize, grizes (all other variables: ?)

Separate these 4 words into 4 subproblems as before …

Page 51:

Observed words: gris, grizos, grize, grizes (other variables: ?)

Redraw the graph to focus on the stem …

Page 52:

Separate into 4 subproblems; each gets its own copy of the stem:

gris: ?    grizos: ?    grize: ?    grizes: ?

Page 53:

Iteration: 1
Observed words: gris, grizos, grize, grizes
Stem copies: ε, ε, ε, ε
nonzero features: { }

Page 54:

Iteration: 3
Observed words: gris, grizos, grize, grizes
Stem copies: g, g, g, g
nonzero features: { }

Page 55:

Iteration: 4
Observed words: gris, grizos, grize, grizes
Stem copies: gris, griz, griz, griz
nonzero features: {s, z, is, iz, s$, z$}
Feature weights (dual variable)

Page 56:

Iteration: 5
Observed words: gris, grizos, grize, grizes
Stem copies: gris, griz, grizo, griz
nonzero features: {s, z, is, iz, s$, z$, o, zo, o$}
Feature weights (dual variable)

Page 57:

Iterations: 6 … 13
Observed words: gris, grizos, grize, grizes
Stem copies: gris, griz, grizo, griz
nonzero features: {s, z, is, iz, s$, z$, o, zo, o$}
Feature weights (dual variable)

Page 58:

Iteration: 14
Observed words: gris, grizos, grize, grizes
Stem copies: griz, griz, grizo, griz
nonzero features: {s, z, is, iz, s$, z$, o, zo, o$}
Feature weights (dual variable)

Page 59:

Iteration: 17
Observed words: gris, grizos, grize, grizes
Stem copies: griz, griz, griz, griz
nonzero features: {s, z, is, iz, s$, z$, o, zo, o$}
Feature weights (dual variable)

Page 60:

Iteration: 18
Observed words: gris, grizos, grize, grizes
Stem copies: griz, griz, grize, griz
nonzero features: {s, z, is, iz, s$, z$, o, zo, o$, e, ze, e$}
Feature weights (dual variable)

Page 61:

Iterations: 19 … 29
Observed words: gris, grizos, grize, grizes
Stem copies: griz, griz, grize, griz
nonzero features: {s, z, is, iz, s$, z$, o, zo, o$, e, ze, e$}
Feature weights (dual variable)

Page 62:

Iteration: 30
Observed words: gris, grizos, grize, grizes
Stem copies: griz, griz, griz, griz
nonzero features: {s, z, is, iz, s$, z$, o, zo, o$, e, ze, e$}
Feature weights (dual variable)

Page 63:

Iteration: 30
Observed words: gris, grizos, grize, grizes
Stem copies: griz, griz, griz, griz
nonzero features: {s, z, is, iz, s$, z$, o, zo, o$, e, ze, e$}
Converged!

Page 64:

Why n-gram features?

• Positional features don’t understand insertion:

  giz vs. griz: “I’ll try to arrange for r not i at position 2, i not z at position 3, z (not nothing) at position 4.”

• In contrast, our “z” feature counts the number of “z” phonemes, without regard to position:

  giz vs. griz: “I need more r’s.” These solutions already agree on the “g”, “i”, “z” counts … they’re only negotiating over the “r” count.

Page 65:

Why n-gram features?

• Adjust weights λ until the “r” counts match:
  giz vs. griz: “I need more r’s … somewhere.”
• Next iteration agrees on all our unigram features:
  girz vs. griz: “I need more gr, ri, iz; less gi, ir, rz.”
  – Oops! Features matched only counts, not positions.
  – But bigram counts are still wrong … so bigram features get activated to save the day.
  – If that’s not enough, add even longer substrings …
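Order-free count features make this negotiation concrete. `count_diff` below is a hypothetical helper (not from the paper) that reports which n-gram counts two candidate strings disagree on:

```python
from collections import Counter

def ngrams(s, n):
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))

def count_diff(x, y, n):
    # Count features are order-free: an insertion changes a handful of
    # counts instead of misaligning every later position.
    cx, cy = ngrams(x, n), ngrams(y, n)
    return {g: cx[g] - cy[g] for g in set(cx) | set(cy) if cx[g] != cy[g]}

# Unigrams: 'giz' and 'griz' disagree only on the 'r' count …
print(count_diff("giz", "griz", 1))   # {'r': -1}

# … and once the unigram counts match ('girz' vs. 'griz'), the bigram
# counts still disagree, so bigram features get activated:
print(sorted(count_diff("girz", "griz", 2).items()))
# [('gi', 1), ('gr', -1), ('ir', 1), ('iz', -1), ('ri', -1), ('rz', 1)]
```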

Page 66:

Outline

● A motivating example: phonology
● General framework:
  o Graphical models over strings
  o Inference on graphical models over strings
● Dual decomposition inference
  o The general idea
  o Substring features and active set
● Experiments and results

Page 67:

7 Inference Problems (graphs)

EXERCISE (small)
  o 4 languages: Catalan, English, Maori, Tangale
  o 16 to 55 underlying morphemes (= # variables: unknown strings)
  o 55 to 106 surface words (= # subproblems)

CELEX (large)
  o 3 languages: English, German, Dutch
  o 341 to 381 underlying morphemes (= # variables)
  o 1000 surface words for each language (= # subproblems)

Page 68:

Experimental Questions

o Is exact inference by DD practical?
o Does it converge?
o Does it get better results than approximate inference methods?
o Does exact inference help EM?

Page 69:

● DD seeks the best λ via a subgradient algorithm:
  reduce the dual objective → tighten the upper bound on the primal objective.
● If λ gets all subproblems to agree (x1 = … = xK):
  constraints satisfied → the dual value is also the value of a primal solution → which must be the max primal! (and the min dual)

primal: a function of the strings x
dual: a function of the weights λ
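A toy instance shows the whole loop. Two subproblems each pick a string from a tiny hand-made candidate list (the local scores here are invented for illustration; real subproblems are solved with finite-state machinery), coupled through unigram-count features whose weights λ are adjusted by subgradient steps until the copies agree:

```python
from collections import Counter

def feats(s):
    # Coupling features: unigram counts of the string.
    return Counter(s)

def dd_solve(cands1, cands2, steps=50, eta=0.5):
    lam = Counter()  # dual weights, one per unigram feature
    for _ in range(steps):
        # Solve each subproblem independently; the weights reward
        # subproblem 1 and penalize subproblem 2 for the same features.
        x1 = max(cands1, key=lambda s: cands1[s] + sum(lam[c] * n for c, n in feats(s).items()))
        x2 = max(cands2, key=lambda s: cands2[s] - sum(lam[c] * n for c, n in feats(s).items()))
        if x1 == x2:
            return x1, x2  # agreement: a certificate of exact MAP
        # Subgradient step on the disagreement f(x1) - f(x2).
        g = feats(x1)
        g.subtract(feats(x2))
        for c, v in g.items():
            lam[c] -= eta * v
    return x1, x2  # hit the iteration cap without agreeing

# Subproblem 1 mildly prefers 'gris'; subproblem 2 strongly prefers 'griz'.
print(dd_solve({"gris": 0.0, "griz": -0.5}, {"gris": -2.0, "griz": 0.0}))
# ('griz', 'griz')
```

After the first disagreement the update lowers λ('s') and raises λ('z'); at the next iteration both copies pick 'griz', which is indeed the jointly best assignment (-0.5 beats -2.0).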

Page 70:

Convergence behavior (full graph)

[Plots for Catalan, English, Maori, Tangale: the dual curve tightens the upper bound while the primal curve improves the strings; where the two meet, the solution is optimal.]

Page 71:

Comparisons

● Compare DD with two types of Belief Propagation (BP) inference:
  o Approximate MAP inference (max-product BP): the baseline, a Viterbi approximation
  o Approximate marginal inference (sum-product BP): TACL 2015, a variational approximation
  o Exact MAP inference (dual decomposition): this paper
  o Exact marginal inference: we don’t know how!

Page 72:

Inference accuracy

Model 1: trivial phonology.  Model 2S: oracle phonology.  Model 2E: learned phonology (inference used within EM).

                                           Model 1,   Model 1,  Model 2S,  Model 2E,
                                           EXERCISE   CELEX     CELEX      EXERCISE
Approximate MAP (max-product BP, baseline)    90%        84%       99%        91%
Approximate marginal (sum-product BP,
  TACL 2015)                                  95%        86%       96%        95%
Exact MAP (dual decomposition, this paper)    97%        90%       99%        98%

Sum-product BP improves on the baseline (though it is worse on Model 2S, CELEX); DD improves more!

Page 73:

Conclusion

• A general DD algorithm for MAP inference on graphical models over strings.
• On the phonology problem, it terminates in practice, guaranteeing the exact MAP solution.
• Improved inference for the supervised model; improved EM training for the unsupervised model.
• Try it for your own problems of generalizing to new strings!