Extended Baum-Welch algorithm


Presented by Shih-Hung Liu, 2006-01-21

NTNU Speech Lab. 2

References

• A generalization of the Baum algorithm to rational objective functions － [Gopalakrishnan et al.] IEEE ICASSP 1989

• An inequality for rational functions with applications to some statistical estimation problems － [Gopalakrishnan et al.] IEEE Transactions on Information Theory 1991

• HMMs, MMIE, and the Speech Recognition Problem － [Normandin 1991] PhD dissertation

• Function maximization － [Povey 2004] PhD thesis, chapter 4.5

Outline

• Introduction

• Extended Baum-Welch algorithm [Gopalakrishnan et al.]

• EBW from discrete to continuous [Normandin]

• EBW for discrete HMMs [Povey]

• Example of function optimization [Gopalakrishnan et al.]

• Conclusion

Introduction

• The well-known Baum-Eagon inequality provides an effective iterative scheme for finding a local maximum of homogeneous polynomials with positive coefficients over a domain of probability values

• However, we are interested in maximizing a general rational function, so we extend the Baum-Eagon inequality to rational functions

Extended Baum-Welch algorithm (1/6)

• $P(X) = P(\{X_{ij}\})$: an arbitrary homogeneous polynomial of degree $d$ with nonnegative coefficients in the variables $X_{ij}$, $i = 1,\dots,p$, $j = 1,\dots,q_i$

• Assuming that this polynomial is defined over a domain of probability values

$$D : \; x_{ij} \ge 0, \quad \sum_{j=1}^{q_i} x_{ij} = 1,$$

Baum and Eagon show how to construct a transformation $T : U \to D$ for some $U \subset D$ such that the following property holds:

Property A: for any $x \in U$ and $y = T(x)$, $P(y) > P(x)$ unless $y = x$

[Gopalakrishnan 1989]


Extended Baum-Welch algorithm (2/6)

• $R(X) = S_1(X)/S_2(X)$ is a ratio of two polynomials in the variables $X = \{X_{ij}\}$, $i = 1,\dots,p$, $j = 1,\dots,q_i$, with $S_2(x) > 0$ on the domain

$$D : \; x_{ij} \ge 0, \quad \sum_{j=1}^{q_i} x_{ij} = 1$$

• We are looking for a growth transformation $T : D \to D$ such that for any $x \in D$ and $y = T(x)$, $R(y) > R(x)$ unless $y = x$

• Reduction of the rational-function case to the polynomial case: we reduce the problem of finding a growth transformation for a rational function to that of finding one for a specially formed polynomial

• Reduce to a non-homogeneous polynomial with nonnegative coefficients

• Extend the Baum-Eagon inequality to non-homogeneous polynomials with nonnegative coefficients


Extended Baum-Welch algorithm (3/6)

• Step 1: for any $x \in D$ there exists a polynomial $P_x(X)$ such that if $P_x(y) > P_x(x)$, then $R(y) > R(x)$

For this it is enough to set

$$P_x(X) = S_1(X) - R(x)\,S_2(X)$$

Indeed, it is easy to see that $P_x(x) = 0$, and therefore if $P_x(y) > P_x(x) = 0$ then $S_1(y)/S_2(y) > R(x)$, i.e. $R(y) > R(x)$

• Suppose now that for each polynomial $P_x(X)$, $x \in D$, we could construct a growth transformation $T_x$ of $D$ such that $P_x(T_x(y)) > P_x(y)$ for any $y \in D$ unless $T_x(y) = y$. Then we could define a growth transformation $T$ of $D$ for $R$ as follows:

$$T(y) = T_y(y)$$
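The reduction in Step 1 can be checked numerically. The following is a minimal sketch, assuming the ratio $R = x^2/(x^2+y^2+z^2)$ as an illustrative choice of $S_1$ and $S_2$; the function names and the test points are hypothetical and not taken from the slides.

```python
def s1(v):
    """Numerator polynomial S1 (illustrative choice)."""
    x, y, z = v
    return x * x


def s2(v):
    """Denominator polynomial S2 (illustrative choice, positive on the simplex)."""
    x, y, z = v
    return x * x + y * y + z * z


def ratio(v):
    """R(v) = S1(v) / S2(v)."""
    return s1(v) / s2(v)


def p_x(v, ref):
    """P_x(v) = S1(v) - R(ref) * S2(v), with the reference point ref held fixed."""
    return s1(v) - ratio(ref) * s2(v)


x_ref = (0.2, 0.5, 0.3)   # current point on the simplex
y_up = (0.4, 0.4, 0.2)    # candidate with P_x > 0
y_down = (0.1, 0.6, 0.3)  # candidate with P_x < 0

for cand in (y_up, y_down):
    # The sign of P_x(cand) matches the sign of R(cand) - R(x_ref).
    print(p_x(cand, x_ref) > 0, ratio(cand) > ratio(x_ref))
```

Because $S_2 > 0$, the sign of $P_x(y)$ always agrees with the sign of $R(y) - R(x)$, which is why increasing $P_x$ from its value of zero at $x$ is enough to increase $R$.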


Extended Baum-Welch algorithm (4/6)

• Step 2:

Lemma: Let $P(X) = P(\{X_{ij}\})$ be a polynomial with real coefficients in the variables $X_{ij}$, $i = 1,\dots,p$, $j = 1,\dots,q_i$, and let the domain be

$$D : \; x_{ij} \ge 0, \quad \sum_{j=1}^{q_i} x_{ij} = 1$$

(a) There exists a polynomial $C(X)$ such that the polynomial $P'(X) = P(X) + C(X)$ has only nonnegative coefficients and such that the value $C(x)$ at any $x \in D$ is a constant

(b) The set of growth transformations of $D$ for $P'(X)$ coincides with the set of growth transformations of $D$ for $P(X)$


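Extended Baum-Welch algorithm (5/6)

• Step 3: finding a growth transformation for a polynomial with nonnegative coefficients can be reduced to the same problem for a homogeneous polynomial with nonnegative coefficients

Consider the homogeneous polynomial

$$P''(Y) = P''(\{Y_{lm}\}) = Y_{p+1,1}^{\,d}\; P'\!\left(\{Y_{ij}/Y_{p+1,1}\}\right)$$

in the variables $Y_{lm}$, $l = 1,\dots,p+1$, $m = 1,\dots,q_l$, where $q_{p+1} = 1$, over the domain

$$D'' : \; y_{ij} \ge 0, \quad \sum_{j=1}^{q_i} y_{ij} = 1, \quad i = 1,\dots,p+1, \; j = 1,\dots,q_i$$

There is a bijection $f : D \to D''$, $(P'(Y), D) \to (P''(Y), D'')$, mapping $\{x_{ij}\}$ into $\{y_{ij}\}$ such that $y_{ij} = x_{ij}$ for $(i,j) \ne (p+1,1)$ and $y_{p+1,1} = 1$, and such that for any $x \in D$, $P'(x) = P''(f(x))$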


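Extended Baum-Welch algorithm (6/6)

• Baum-Eagon inequality: for a homogeneous polynomial $P$ with nonnegative coefficients, provided that

$$\sum_{j=1}^{q_i} x_{ij}\,\frac{\partial P}{\partial x_{ij}}(x) \ne 0 \quad \text{for all } i,$$

the transformation

$$y_{ij} = \frac{x_{ij}\,\dfrac{\partial P}{\partial x_{ij}}(x)}{\sum_{j=1}^{q_i} x_{ij}\,\dfrac{\partial P}{\partial x_{ij}}(x)}$$

is a growth transformation for $P$

• Combining Steps 1 to 3, with a constant $C$ added to each partial derivative:

$$\left(T^{C}(x)\right)_{ij} = \frac{x_{ij}\left(\dfrac{\partial P}{\partial x_{ij}}(x) + C\right)}{\sum_{j=1}^{q_i} x_{ij}\left(\dfrac{\partial P}{\partial x_{ij}}(x) + C\right)}$$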
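The growth transformation can be sketched in a few lines. This is a minimal illustration, assuming the update multiplies each $x_{ij}$ by $\partial P/\partial x_{ij} + C$ and renormalises each row; the polynomial `poly`, its gradient `poly_grad`, and the function names are hypothetical examples, not part of the slides.

```python
def growth_transform(x, grad, c=0.0):
    """Apply y_ij = x_ij * (dP/dx_ij + c) / sum_j x_ij * (dP/dx_ij + c) row by row."""
    g = grad(x)
    y = []
    for row, grow in zip(x, g):
        weights = [xij * (gij + c) for xij, gij in zip(row, grow)]
        total = sum(weights)
        if total == 0:
            raise ValueError("row normaliser is zero; the transformation is undefined")
        y.append([w / total for w in weights])
    return y


def poly(x):
    """Example P(X) = X11*X21 + 2*X12*X22: homogeneous of degree 2, nonnegative coefficients."""
    return x[0][0] * x[1][0] + 2 * x[0][1] * x[1][1]


def poly_grad(x):
    """Partial derivatives of the example polynomial, in the same row layout as x."""
    return [[x[1][0], 2 * x[1][1]], [x[0][0], 2 * x[0][1]]]


x0 = [[0.5, 0.5], [0.5, 0.5]]          # each row sums to one
y0 = growth_transform(x0, poly_grad)    # plain Baum-Eagon step (c = 0)
print(poly(x0), poly(y0))               # the second value should be larger
```

Passing a positive `c` gives a smaller, more conservative step; in the rational-function setting it is what keeps the effective coefficients nonnegative.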

EBW for CDHMM – from discrete to continuous (1/3)

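• Discrete case for emission probability update

EBW for $b_j(k)$:

$$\hat b_j(k) = \frac{\Gamma(j,k) + C\,b_j(k)}{\sum_{k=1}^{K}\left[\Gamma(j,k) + C\,b_j(k)\right]}, \quad \text{such that } \sum_{k=1}^{K} \hat b_j(k) = 1$$

$\Gamma(j,k)$: the statistic of state $j$ accumulated over the frames $t = 1,\dots,T$ whose observation $o_t$ is the codebook symbol $v_k$

$K$: the number of symbols in the codebook

[ Normandin 1991 ]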
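A minimal sketch of the discrete emission update for a single state, assuming the update has the form $\hat b_j(k) \propto \Gamma(j,k) + C\,b_j(k)$, where $\Gamma(j,k)$ stands for the statistic accumulated for state $j$ and symbol $k$ (it may be negative in discriminative training). The function name, argument names, and numbers are illustrative assumptions.

```python
def ebw_discrete_update(b, gamma_stat, c):
    """Return b_hat(k) = (Gamma(k) + c*b(k)) / sum_k (Gamma(k) + c*b(k)) for one state.

    b          -- current emission probabilities over the K codebook symbols
    gamma_stat -- accumulated statistic Gamma(j, k) per symbol (assumed given)
    c          -- smoothing constant; must be large enough to keep every term nonnegative
    """
    numerators = [g + c * bk for g, bk in zip(gamma_stat, b)]
    if min(numerators) < 0:
        raise ValueError("c is too small: the update would give a negative probability")
    total = sum(numerators)
    return [n / total for n in numerators]


b_old = [0.5, 0.3, 0.2]
stats = [1.0, -0.4, -0.1]               # hypothetical statistics for one state
b_new = ebw_discrete_update(b_old, stats, c=4.0)
print(b_new, sum(b_new))
```

A larger `c` keeps the new distribution closer to the old one, which trades convergence speed for stability.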

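EBW for CDHMM – from discrete to continuous (2/3) [ Normandin 1991 ]

[Figure: the Gaussian density $N(x_k \mid \mu_j, \sigma_j)$ of state $j$ plotted over $x$, with the $x$ axis partitioned into subintervals]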

M subintervals Ik of width Mj /2

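For the subintervals $I_1, I_2, I_3, \dots$, the discrete emission probability is

$$b_j(k) = \frac{N(x_k \mid \mu_j, \sigma_j)}{\sum_{k=1}^{K} N(x_k \mid \mu_j, \sigma_j)}$$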

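EBW for CDHMM – from discrete to continuous (3/3) [ Normandin 1991 ]

The discrete EBW update is

$$\hat b_j(k) = \frac{\Gamma(j,k) + C\,b_j(k)}{\sum_{k=1}^{K}\left[\Gamma(j,k) + C\,b_j(k)\right]}$$

All limits below are taken as the subinterval width goes to zero, so that $\mu_j = \lim \sum_{k=1}^{K} b_j(k)\,x_k$.

Mean:

$$\hat\mu_j = \lim \sum_{k=1}^{K} \hat b_j(k)\,x_k
= \lim \frac{\sum_{k=1}^{K}\left[\Gamma(j,k) + C\,b_j(k)\right]x_k}{\sum_{k=1}^{K}\left[\Gamma(j,k) + C\,b_j(k)\right]}
= \lim \frac{\sum_{k=1}^{K}\Gamma(j,k)\,x_k + C\sum_{k=1}^{K} b_j(k)\,x_k}{\sum_{k=1}^{K}\Gamma(j,k) + C}
= \frac{\sum_{k}\Gamma(j,k)\,x_k + C\,\mu_j}{\sum_{k}\Gamma(j,k) + C}$$

Variance:

$$\hat\sigma_j^2 = \lim \sum_{k=1}^{K} \hat b_j(k)\,(x_k - \hat\mu_j)^2
= \lim \frac{\sum_{k=1}^{K}\left[\Gamma(j,k) + C\,b_j(k)\right](x_k - \hat\mu_j)^2}{\sum_{k=1}^{K}\Gamma(j,k) + C}
= \frac{\sum_{k}\Gamma(j,k)\,x_k^2 + C\left(\sigma_j^2 + \mu_j^2\right)}{\sum_{k}\Gamma(j,k) + C} - \hat\mu_j^2$$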
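The continuous-density update can be sketched as follows, assuming the mean and variance take the form $\hat\mu = (s_1 + C\mu)/(s_0 + C)$ and $\hat\sigma^2 = (s_2 + C(\sigma^2 + \mu^2))/(s_0 + C) - \hat\mu^2$, where $s_0$, $s_1$, $s_2$ are the zeroth, first, and second order accumulated statistics. The function and argument names are hypothetical.

```python
def ebw_gaussian_update(mu, var, s0, s1, s2, c):
    """EBW-style update of a scalar Gaussian.

    mu, var    -- current mean and variance
    s0, s1, s2 -- accumulated statistics: sum of weights, of weight*x, of weight*x^2
    c          -- smoothing constant (assumed large enough that s0 + c > 0)
    """
    denom = s0 + c
    if denom <= 0:
        raise ValueError("c is too small: s0 + c must be positive")
    new_mu = (s1 + c * mu) / denom
    new_var = (s2 + c * (var + mu * mu)) / denom - new_mu * new_mu
    return new_mu, new_var


# Hypothetical statistics for one Gaussian.
new_mu, new_var = ebw_gaussian_update(mu=0.0, var=1.0, s0=2.0, s1=1.0, s2=3.0, c=2.0)
print(new_mu, new_var)
```

With all statistics equal to zero the function returns the old parameters unchanged, and as `c` grows the update moves less, which mirrors the role of $C$ in the discrete case.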

EBW for discrete HMMs (1/6)

• The Baum-Eagon inequality is formulated for the case where there are variables $x_{ij}$ in a matrix $X$ whose rows are subject to a sum-to-one constraint $\sum_j x_{ij} = 1$, and we are maximizing a sum of polynomial terms in $x_{ij}$ with nonnegative coefficients

• For ML training, we can find an auxiliary function and optimize it

• Finding the maximum of the auxiliary function (e.g. using Lagrange multipliers) leads to the following update, which is a growth transformation for the polynomial:

$$\hat x_{ij} = \frac{x_{ij}\,\left.\dfrac{\partial F}{\partial x_{ij}}\right|_{X = X'}}{\sum_k x_{ik}\,\left.\dfrac{\partial F}{\partial x_{ik}}\right|_{X = X'}}$$

where $X'$ denotes the current parameter values

[Povey 2004]

EBW for discrete HMMs (2/6)

• The Baum-Welch update is an update procedure for HMMs which uses this growth transformation together with the forward-backward algorithm, which finds the relevant differentials efficiently

[Povey 2004]


EBW for discrete HMMs (3/6)

• An update rule as convenient and provably correct as the Baum-Welch update is not available for discriminative training of HMMs, which is a harder optimization problem

• The Extended Baum-Welch update equation, as originally derived, is applicable to rational functions of parameters which are subject to sum-to-one constraints

• The MMI objective function for discrete-probability HMMs is an example of such a function:

$$F_{MMI} = \log \frac{p(O \mid w)}{p(O)}$$

[Povey 2004]


EBW for discrete HMMs (4/6)

Two essential points used to derive the EBW update for MMI:

1. Instead of maximizing $f(x) = a(x)/b(x)$ for positive $a(x)$ and $b(x)$, we can instead maximize $g(x) = a(x) - k\,b(x)$, where $k = a(x')/b(x')$ and $x'$ are the values from the previous iteration; increasing $g(x)$ will cause $f(x)$ to increase. This is because $g(x)$ is a strong-sense auxiliary function for $f(x)$ around $x'$ (indeed $g(x') = 0$, so $g(x) > 0$ gives $a(x)/b(x) > k = f(x')$)

2. If some terms in the resulting polynomial are negative, we can add to the expression a constant $C$ times a further polynomial which is constrained to be a constant (e.g. $C \sum_i \sum_j x_{ij}$), so as to ensure that no product of terms in the final expression has a negative coefficient

[Povey 2004]

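EBW for discrete HMMs (5/6) [Povey 2004]

The growth transformation

$$\hat x_{ij} = \frac{x_{ij}\,\left.\dfrac{\partial F}{\partial x_{ij}}\right|_{X = X'}}{\sum_k x_{ik}\,\left.\dfrac{\partial F}{\partial x_{ik}}\right|_{X = X'}}$$

can be written in terms of derivatives with respect to $\log x_{ij}$, since

$$\frac{\partial F}{\partial x_{ij}} = \frac{\partial F}{\partial \log x_{ij}}\,\frac{\partial \log x_{ij}}{\partial x_{ij}} = \frac{1}{x_{ij}}\,\frac{\partial F}{\partial \log x_{ij}}$$

By applying these two ideas:

$$\hat x_{ij} = \frac{\left.\dfrac{\partial F}{\partial \log x_{ij}}\right|_{X = X'} + C\,x_{ij}}{\sum_k \left(\left.\dfrac{\partial F}{\partial \log x_{ik}}\right|_{X = X'} + C\,x_{ik}\right)}$$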

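EBW equivalent smooth function (6/6) [Povey 2004]

• EBW can be regarded as adding a smoothing function

$$g_{sm}(\lambda, \lambda') = -\frac{D_j}{2}\left[\log(2\pi\sigma_j^2) + \frac{(\mu_j - \mu_j')^2 + \sigma_j'^2}{\sigma_j^2}\right]$$

into the objective function, where $\mu_j', \sigma_j'^2$ are the current parameter values

• We can check that its derivatives vanish at $\lambda = \lambda'$:

$$\left.\frac{\partial g_{sm}(\lambda, \lambda')}{\partial \mu_j}\right|_{\lambda = \lambda'} = \left.-\frac{D_j}{2}\cdot\frac{2(\mu_j - \mu_j')}{\sigma_j^2}\right|_{\lambda = \lambda'} = 0$$

$$\left.\frac{\partial g_{sm}(\lambda, \lambda')}{\partial \sigma_j^2}\right|_{\lambda = \lambda'} = \left.-\frac{D_j}{2}\left[\frac{1}{\sigma_j^2} - \frac{(\mu_j - \mu_j')^2 + \sigma_j'^2}{\sigma_j^4}\right]\right|_{\lambda = \lambda'} = 0$$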
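The zero-gradient property of the smoothing function can be checked with finite differences. This is a minimal sketch, assuming the scalar form $g_{sm} = -\frac{D}{2}\left[\log(2\pi\sigma^2) + ((\mu - \mu')^2 + \sigma'^2)/\sigma^2\right]$; the function names, step size, and parameter values are illustrative assumptions.

```python
import math


def g_sm(mu, var, mu_old, var_old, d):
    """Scalar smoothing function; (mu_old, var_old) are the current parameters."""
    return -0.5 * d * (math.log(2 * math.pi * var) + ((mu - mu_old) ** 2 + var_old) / var)


def central_diff(f, x, h=1e-6):
    """Central finite-difference estimate of df/dx at x."""
    return (f(x + h) - f(x - h)) / (2 * h)


mu_old, var_old, d = 1.5, 0.8, 3.0

# Derivatives with respect to the mean and the variance, evaluated at the current parameters.
grad_mu = central_diff(lambda m: g_sm(m, var_old, mu_old, var_old, d), mu_old)
grad_var = central_diff(lambda v: g_sm(mu_old, v, mu_old, var_old, d), var_old)
print(grad_mu, grad_var)  # both should be close to zero
```

Because both derivatives vanish at the current parameters, adding this term does not change the gradient of the objective there; it only penalises moving away from the current values.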

Example

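• Consider

$$R(x, y, z) = \frac{x^2}{x^2 + y^2 + z^2}, \quad x, y, z \ge 0, \quad x + y + z = 1$$

1. Start from some $x_0, y_0, z_0$ such that $x_0, y_0, z_0 \ge 0$ and $x_0 + y_0 + z_0 = 1$; set the iteration index $i = 0$

2. $k = R(x_i, y_i, z_i)$

3. Obtain a polynomial with nonnegative coefficients:

$$P(x, y, z) = x^2 - k\,(x^2 + y^2 + z^2) + k\,(x + y + z)^2$$

4. Using the update formula, let

$$D = x_i\,\frac{\partial P}{\partial x}(x_i, y_i, z_i) + y_i\,\frac{\partial P}{\partial y}(x_i, y_i, z_i) + z_i\,\frac{\partial P}{\partial z}(x_i, y_i, z_i) = 2x_i^2 + 4k\,x_i y_i + 4k\,x_i z_i + 4k\,y_i z_i$$

$$x_{i+1} = \frac{x_i\,\dfrac{\partial P}{\partial x}(x_i, y_i, z_i)}{D}, \quad
y_{i+1} = \frac{y_i\,\dfrac{\partial P}{\partial y}(x_i, y_i, z_i)}{D}, \quad
z_{i+1} = \frac{z_i\,\dfrac{\partial P}{\partial z}(x_i, y_i, z_i)}{D}$$

5. Increment the iteration index $i$ by 1 and go to 2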
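The five steps of the example can be run directly. This is a minimal sketch, assuming $P(x,y,z) = x^2 - k(x^2+y^2+z^2) + k(x+y+z)^2$, which expands to $x^2 + 2k(xy + xz + yz)$; the function names, starting point, and iteration count are illustrative choices.

```python
def ratio(x, y, z):
    """Objective R(x, y, z) = x^2 / (x^2 + y^2 + z^2)."""
    return x * x / (x * x + y * y + z * z)


def ebw_step(x, y, z):
    """One growth-transformation step (steps 2 to 4 of the example)."""
    k = ratio(x, y, z)
    # Partial derivatives of P = x^2 + 2k(xy + xz + yz), with k held fixed.
    px = 2 * x + 2 * k * (y + z)
    py = 2 * k * (x + z)
    pz = 2 * k * (x + y)
    d = x * px + y * py + z * pz  # normaliser D
    return x * px / d, y * py / d, z * pz / d


def run(x, y, z, iters=200):
    """Iterate the update and record the objective after every step."""
    history = [ratio(x, y, z)]
    for _ in range(iters):
        x, y, z = ebw_step(x, y, z)
        history.append(ratio(x, y, z))
    return (x, y, z), history


point, history = run(0.2, 0.5, 0.3)
print(point, history[0], history[-1])
```

The objective never decreases from one iteration to the next, and the iterates drift toward the vertex $(1, 0, 0)$, where $R$ attains its maximum value of 1 on the simplex.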


Conclusion

• We presented an algorithm for maximizing certain rational functions defined over a domain of probability values

• This algorithm is very useful in practice for training HMM parameters

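MPE: Final Auxiliary Function

• Weak-sense auxiliary function:

$$H_{MPE}(\lambda, \lambda') = \sum_r \sum_{q \in W^r_{lat}} \left.\frac{\partial F_{MPE}}{\partial \log p(O_r \mid q)}\right|_{\lambda'} \log p(O_r \mid q)$$

• Replacing $\log p(O_r \mid q)$ by its strong-sense auxiliary function gives a weak-sense auxiliary function:

$$g_{MPE}(\lambda, \lambda') = \sum_r \sum_{q \in W^r_{lat}} \sum_{t = s_q}^{e_q} \sum_m \gamma^{r\,MPE}_{q}\,\gamma^{r}_{qm}(t)\,\log N(o_r(t), \mu_m, \Sigma_m)$$

• With the smoothing function involved (still a weak-sense auxiliary function):

$$g_{MPE}(\lambda, \lambda') = \sum_r \sum_{q \in W^r_{lat}} \sum_{t = s_q}^{e_q} \sum_m \gamma^{r\,MPE}_{q}\,\gamma^{r}_{qm}(t)\,\log N(o_r(t), \mu_m, \Sigma_m)
+ \sum_m -\frac{D_m}{2}\left[\log|\Sigma_m| + (\mu_m - \mu_m')^T \Sigma_m^{-1} (\mu_m - \mu_m') + \mathrm{tr}\!\left(\Sigma_m' \Sigma_m^{-1}\right)\right]$$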

EBW derived from auxiliary function

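• Smoothing function, scalar form:

$$g_{sm}(\lambda, \lambda') = \sum_m -\frac{D_m}{2}\left[\log(2\pi\sigma_m^2) + \frac{(\mu_m - \mu_m')^2 + \sigma_m'^2}{\sigma_m^2}\right]$$

and matrix form:

$$g_{sm}(\lambda, \lambda') = \sum_m -\frac{D_m}{2}\left[\log|\Sigma_m| + (\mu_m - \mu_m')^T \Sigma_m^{-1} (\mu_m - \mu_m') + \mathrm{tr}\!\left(\Sigma_m' \Sigma_m^{-1}\right)\right]$$

• Its derivative with respect to the mean:

$$\frac{\partial g_{sm}(\lambda, \lambda')}{\partial \mu_m} = -\frac{D_m}{2}\cdot\frac{2(\mu_m - \mu_m')}{\sigma_m^2}$$

• Auxiliary function with smoothing:

$$g_{MPE}(\lambda, \lambda') = \sum_r \sum_{q \in W^r_{lat}} \sum_{t = s_q}^{e_q} \sum_m \gamma^{r\,MPE}_{q}\,\gamma^{r}_{qm}(t)\,\log N(o_r(t), \mu_m, \sigma_m^2)
+ \sum_m -\frac{D_m}{2}\left[\log(2\pi\sigma_m^2) + \frac{(\mu_m - \mu_m')^2 + \sigma_m'^2}{\sigma_m^2}\right]$$

where

$$N(o_r(t), \mu_m, \sigma_m^2) = \frac{1}{\sqrt{2\pi\sigma_m^2}}\; e^{-\frac{(o_r(t) - \mu_m)^2}{2\sigma_m^2}}$$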

EBW derived from auxiliary function

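• Expanding the Gaussian log likelihood:

$$g_{MPE}(\lambda, \lambda') = \sum_r \sum_{q \in W^r_{lat}} \sum_{t = s_q}^{e_q} \sum_m \gamma^{r\,MPE}_{q}\,\gamma^{r}_{qm}(t)\left[-\frac{1}{2}\log(2\pi\sigma_m^2) - \frac{(o_r(t) - \mu_m)^2}{2\sigma_m^2}\right]
+ \sum_m -\frac{D_m}{2}\left[\log(2\pi\sigma_m^2) + \frac{(\mu_m - \mu_m')^2 + \sigma_m'^2}{\sigma_m^2}\right]$$

• Setting the derivative with respect to $\mu_m$ to zero, using $\dfrac{\partial g_{sm}}{\partial \mu_m} = -\dfrac{D_m}{2}\cdot\dfrac{2(\mu_m - \mu_m')}{\sigma_m^2}$:

$$\frac{\partial g_{MPE}(\lambda, \lambda')}{\partial \mu_m} = \sum_r \sum_{q \in W^r_{lat}} \sum_{t = s_q}^{e_q} \gamma^{r\,MPE}_{q}\,\gamma^{r}_{qm}(t)\left(-\frac{1}{2\sigma_m^2}\right)(-2)\,(o_r(t) - \mu_m) - \frac{D_m}{2}\cdot\frac{2(\mu_m - \mu_m')}{\sigma_m^2} = 0$$

$$\Rightarrow \sum_r \sum_{q \in W^r_{lat}} \sum_{t = s_q}^{e_q} \gamma^{r\,MPE}_{q}\,\gamma^{r}_{qm}(t)\,(o_r(t) - \mu_m) - D_m(\mu_m - \mu_m') = 0$$

$$\Rightarrow \sum_r \sum_{q \in W^r_{lat}} \sum_{t = s_q}^{e_q} \gamma^{r\,MPE}_{q}\,\gamma^{r}_{qm}(t)\,o_r(t) + D_m\,\mu_m' = \mu_m\left(\sum_r \sum_{q \in W^r_{lat}} \sum_{t = s_q}^{e_q} \gamma^{r\,MPE}_{q}\,\gamma^{r}_{qm}(t) + D_m\right)$$

$$\Rightarrow \mu_m = \frac{\sum_r \sum_{q \in W^r_{lat}} \sum_{t = s_q}^{e_q} \gamma^{r\,MPE}_{q}\,\gamma^{r}_{qm}(t)\,o_r(t) + D_m\,\mu_m'}{\sum_r \sum_{q \in W^r_{lat}} \sum_{t = s_q}^{e_q} \gamma^{r\,MPE}_{q}\,\gamma^{r}_{qm}(t) + D_m}$$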
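The closed-form mean can be checked against the gradient of the smoothed auxiliary function. The following is a minimal sketch, assuming each frame contributes a single combined weight (the product of the arc-level and frame-level occupancies, possibly negative) and a scalar observation; the function names and the numbers are hypothetical.

```python
def ebw_mean(obs, weights, mu_old, d):
    """Mean update: (sum_t w_t * o_t + d * mu_old) / (sum_t w_t + d)."""
    numerator = sum(w * o for w, o in zip(weights, obs)) + d * mu_old
    denominator = sum(weights) + d
    if denominator <= 0:
        raise ValueError("d is too small: the denominator must be positive")
    return numerator / denominator


def aux_grad_mu(mu, obs, weights, mu_old, var, d):
    """Derivative of the smoothed auxiliary function with respect to the mean."""
    data_term = sum(w * (o - mu) for w, o in zip(weights, obs)) / var
    smooth_term = -d * (mu - mu_old) / var
    return data_term + smooth_term


obs = [1.0, 2.0, 4.0]
weights = [0.5, -0.2, 0.3]   # discriminative weights may be negative
mu_new = ebw_mean(obs, weights, mu_old=1.5, d=2.0)
print(mu_new, aux_grad_mu(mu_new, obs, weights, mu_old=1.5, var=0.8, d=2.0))
```

The gradient evaluated at the updated mean is zero up to rounding, which confirms that the closed form is the stationary point of the auxiliary function under these assumptions.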
