
The Hidden Vector State Language Model

Vidura Senevitratne, Steve Young

Cambridge University Engineering Department


Reference

• Young, S. J., "The Hidden Vector State language model", Tech. Report CUED/F-INFENG/TR.467, Cambridge University Engineering Department, 2003.

• He, Y. and Young, S. J., "Hidden Vector State Model for hierarchical semantic parsing", In Proc. of the ICASSP, Hong Kong, 2003.

• Fine, S., Singer, Y., and Tishby, N., "The Hierarchical Hidden Markov Model: Analysis and applications", Machine Learning 32(1): 41-62, 1998.


Outline

• Introduction

• HVS Model

• Experiments

• Conclusion


Introduction

• Language model:

• Issues: data sparseness, inability to capture long-distance dependencies, and inability to model nested structural information

• Class-based language model – POS tag information

• Structured language model – syntactic information

P(W) = P(w_1) ∏_{k=2}^{M} P(w_k | W_1^{k−1})

P(w_i | W_1^{i−1}) ≈ P(w_i | Φ(W_1^{i−1})) = P(w_i | W_{i−n+1}^{i−1})
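The n-gram truncation of the chain rule above can be sketched numerically. In this sketch the bigram table, start symbol, and probability values are hypothetical illustrations, not values from the slides:

```python
import math

# Hypothetical bigram probabilities (n = 2) for a toy vocabulary;
# the values are illustrative only.
bigram = {
    ("<s>", "show"): 0.4,
    ("show", "me"): 0.6,
    ("me", "flights"): 0.3,
}

def sentence_logprob(words, model, n=2):
    """Chain rule with the n-gram truncation
    P(w_k | W_1^{k-1}) ~ P(w_k | W_{k-n+1}^{k-1})."""
    logp = 0.0
    history = ["<s>"]
    for w in words:
        context = tuple(history[-(n - 1):])   # keep only the last n-1 words
        logp += math.log(model[context + (w,)])
        history.append(w)
    return logp

lp = sentence_logprob(["show", "me", "flights"], bigram)
```

With the toy table, exp(lp) is simply the product 0.4 · 0.6 · 0.3.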


Hierarchical Hidden Markov Model

• HHMM is a structured multi-level stochastic process.
  – Each state is itself an HHMM.
  – Internal state: a hidden state that does not emit observable symbols directly.
  – Production state: a leaf state that emits observations.

• The states of a flat HMM correspond to the production states of an HHMM.


HHMM (cont.)

• Parameters of HHMM:

λ = { {A^{q^d}}_{d=1,…,D}, {Π^{q^d}}_{d=1,…,D−1}, {B^{q^D}} }

Σ : finite alphabet

Σ* : the set of all possible strings over Σ

O = o_1 o_2 … o_T : observation sequence

q_i^d : state of an HHMM, where i is the state index and d is the hierarchy index


HHMM (cont.)

• Transition probability (horizontal):

A^{q^d} = {a_{ij}^{q^d}},  a_{ij}^{q^d} = P(q_j^{d+1} | q_i^{d+1})

• Initial probability (vertical):

Π^{q^d} = {π^{q^d}(q_i^{d+1})} = {P(q_i^{d+1} | q^d)}

• Observation probability:

B^{q^D} = {b^{q^D}(k)} = {P(σ_k | q^D)}


HHMM (cont.)

• Current node is the root:
  – Choose a child according to the initial (vertical) probability.

• If the child is a production state:
  – Produce an observation.
  – Transition within the same level.
  – On reaching the end-state, return control to the parent of the end-state.

• If the child is an internal state:
  – Choose one of its children.
  – Wait until control returns from the children.
  – Transition within the same level.
  – On reaching the end-state, return control to the parent of the end-state.
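The recursive activation scheme described above can be sketched in code. The two-level model, its state names, and its (deterministic) parameters here are hypothetical, chosen so the behavior is easy to trace:

```python
import random

# Toy HHMM sketch (hypothetical structure). Internal states carry vertical
# initial probabilities and horizontal transition probabilities over their
# children; production (leaf) states emit a symbol; "end" returns control
# to the parent.
hhmm = {
    "root": {"init": {"A": 1.0}, "trans": {"A": {"end": 1.0}}},
    "A":    {"init": {"x": 1.0}, "trans": {"x": {"y": 1.0},
                                           "y": {"end": 1.0}}},
}
production = {"x": "o1", "y": "o2"}   # leaf states and their emissions

def weighted_choice(dist):
    """Sample a key from a discrete distribution {item: prob}."""
    r, acc = random.random(), 0.0
    for item, p in dist.items():
        acc += p
        if r <= acc:
            return item
    return item

def sample(state, out):
    """Pick a child vertically, then move horizontally among siblings
    until the end-state is reached; recurse into internal children."""
    spec = hhmm[state]
    child = weighted_choice(spec["init"])
    while child != "end":
        if child in production:        # production state: emit
            out.append(production[child])
        else:                          # internal state: recurse
            sample(child, out)
        child = weighted_choice(spec["trans"][child])
    return out

seq = sample("root", [])   # deterministic toy parameters: ["o1", "o2"]
```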


HHMM (cont.)


HHMM (cont.)

• Other application: modeling stock price trends (IDEAL 2004)


Hidden Vector State Model


Hidden Vector State Model (cont.)

The semantic information relating to any single word can be stored as a vector of semantic tag names


Hidden Vector State Model (cont.)

• If state transitions were unconstrained, the model would be a full HHMM.

• Transitions between states can be factored into a stack shift with two stages: pop and push.

• The stack size is limited, and the number of new concepts pushed is limited to one – this makes the model more efficient.


Hidden Vector State Model (cont.)

• The joint probability is defined as P(W, C, N | λ), where

N : sequence of stack pop operations

W : word sequence

C : concept vector sequence

P(W, C, N) = ∏_{t=1}^{T} P(n_t | W_1^{t−1}, C_1^{t−1}) · P(c_t[1] | W_1^{t−1}, n_t) · P(w_t | W_1^{t−1}, C_1^{t})


Hidden Vector State Model (cont.)

• Approximations (assumptions):

P(n_t | W_1^{t−1}, C_1^{t−1}) ≈ P(n_t | c_{t−1})

P(c_t[1] | W_1^{t−1}, n_t) ≈ P(c_t[1] | c_t[2…D_t])

P(w_t | W_1^{t−1}, C_1^{t}) ≈ P(w_t | c_t)

• So,

P(W, C, N) = ∏_{t=1}^{T} P(n_t | c_{t−1}) · P(c_t[1] | c_t[2…D_t]) · P(w_t | c_t)


Hidden Vector State Model (cont.)

• The generative process associated with this constrained version of the HVS model consists of three steps for each position t:

1. Choose a value for n_t.

2. Select the preterminal concept tag c_t[1].

3. Select a word w_t.
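The three steps above can be sketched as one generation step over the approximated distributions P(n | c_{t−1}), P(c[1] | c[2…D]), and P(w | c). All probability tables and concept labels in this sketch are hypothetical placeholders:

```python
import random

def draw(dist):
    """Sample an item from a discrete distribution {item: prob}."""
    r, acc = random.random(), 0.0
    for item, p in dist.items():
        acc += p
        if r <= acc:
            return item
    return item

def hvs_step(stack, p_pop, p_push, p_word):
    """One constrained HVS generation step at position t."""
    # 1. choose how many labels to pop, conditioned on the previous state
    n = draw(p_pop[tuple(stack)])
    stack = stack[n:]
    # 2. select the new preterminal concept tag, conditioned on what remains
    c = draw(p_push[tuple(stack)])
    stack = [c] + stack
    # 3. select a word conditioned on the resulting vector state
    w = draw(p_word[tuple(stack)])
    return stack, w
```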


Hidden Vector State Model (cont.)

• It is reasonable to ask an application designer to provide examples of utterances which would yield each type of semantic schema.

• It is not reasonable to require utterances with manually transcribed parse trees.

• Assume abstract semantic annotations and the availability of a set of domain-specific lexical classes.


Hidden Vector State Model (cont.)

Abstract semantic annotations:

• Show me flights arriving in X at T.
• List flights arriving around T in X.
• Which flight reaches X before T.

= FLIGHT(TOLOC(CITY(X), TIME_RELATIVE(TIME(T))))

Class set:

CITY: Boston, New York, Denver, …


Experiments

Experimental setup:

• Training set: ATIS-2, ATIS-3
• Test set: ATIS-3 NOV93, DEC94
• Baseline: FST (Finite Semantic Tagger)
• Smoothing: Good-Turing for FST, Witten-Bell for HVS

Example: "Show me flights from Boston to New York"

Goal: FLIGHT
Slots: FROMLOC.CITY = Boston
       TOLOC.CITY = New York


Experiments

F-measure: F = 2PR / (P + R)
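As a quick sketch, the F-measure (harmonic mean of precision P and recall R) in code:

```python
def f_measure(p, r):
    """F = 2PR / (P + R); defined as 0 when both P and R are 0."""
    return 2 * p * r / (p + r) if (p + r) > 0 else 0.0
```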


Experiments

Dashed line: goal detection accuracy; solid line: F-measure.


Conclusion

• The key features of the HVS model:
  – its ability to represent hierarchical information in a constrained way;
  – its capability of being trained directly from target semantics without explicit word-level annotation.


HVS Language Model

• The basic HVS model is a regular HMM in which each state encodes history in a fixed-dimension stack-like structure.

• Each state consists of a stack where each element is a label chosen from a finite set of cardinality M+1: C = {c1, …, cM, c#}.

• A depth-D HVS model state can be characterized by a vector of dimension D, with the most recently pushed element at index 1 and the oldest at index D.
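A minimal sketch of this fixed-dimension vector state, assuming depth D = 4 and assuming the label c# from the slide is used to pad unused stack positions; the class and variable names are hypothetical:

```python
D = 4        # stack depth (assumed for this sketch)
PAD = "c#"   # padding label for unused slots (assumption)

class VectorState:
    """Depth-D vector state: index 0 holds the most recently pushed
    label, index D-1 the oldest."""

    def __init__(self, labels=()):
        # pad with c# and truncate to a fixed dimension D
        self.v = (list(labels) + [PAD] * D)[:D]

    def shift(self, n, c):
        """One constrained transition: pop n labels, push exactly one."""
        remaining = self.v[n:]
        return VectorState([c] + remaining)

s = VectorState(["CITY", "TOLOC", "FLIGHT"])
s2 = s.shift(1, "TIME")    # pop CITY, push TIME
# s2.v == ["TIME", "TOLOC", "FLIGHT", "c#"]
```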


HVS Language Model (cont.)


HVS Language Model (cont.)

• Each HVS model state transition is restricted:

(i) exactly n_t class labels are popped off the stack;

(ii) exactly one new class label c_t is pushed onto the stack.

• The number of elements to pop, n_t, and the choice of the new class label to push, c_t, are determined probabilistically:


HVS Language Model (cont.)

P(s_t | s_{t−1}) = P(n_t | s_{t−1}) · P(c_t | s_t^{n_t})

where

s_t^{n_t} : intermediate state, s_{t−1} with n_t class labels popped off

s_t = [c_t, φ_1^{D−1}(s_t^{n_t})]

φ_i^j : stack access function such that φ_i^j(s) denotes the sequence of class labels from position i to position j of a stack of depth D, so that s = φ_1^D(s)


HVS Language Model (cont.)

• n_t is conditioned on all the class labels in the stack at t−1, but c_t is conditioned only on the class labels that remain on the stack after the pop operation.

• The former distribution can encode embedding, whereas the latter focuses on modeling long-range dependencies.

P(s_t | s_{t−1}) = P(n_t | s_{t−1}) · P(c_t | s_t^{n_t})


HVS Language Model (cont.)

• Joint probability:

P(W, S) = ∏_{t=1}^{T} P(n_t | W_1^{t−1}, S_1^{t−1}) · P(c_t | W_1^{t−1}, S_1^{t−1}, n_t) · P(w_t | W_1^{t−1}, S_1^{t})

• Assumption:

P(W, S) = ∏_{t=1}^{T} P(n_t | s_{t−1}) · P(c_t | s_t^{n_t}) · P(w_t | s_t)
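Under the assumption above, the joint probability factors into three table lookups per position. A sketch, where the component distribution tables and the per-step tuple layout are hypothetical:

```python
import math

def joint_logprob(steps, p_pop, p_push, p_word):
    """log P(W, S) as a sum over the three factored terms.

    steps: list of (prev_state, n, popped_state, c, new_state, w)
    tuples, one per position t (a hypothetical encoding for this sketch).
    """
    logp = 0.0
    for prev, n, popped, c, new, w in steps:
        logp += math.log(p_pop[prev][n])     # P(n_t | s_{t-1})
        logp += math.log(p_push[popped][c])  # P(c_t | popped stack)
        logp += math.log(p_word[new][w])     # P(w_t | s_t)
    return logp
```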


HVS Language Model (cont.)

• Training: EM algorithm – C, N: latent data; W: observed data

• E-step:

L(λ) = log P(W, C, N | λ) = log [P(W | λ) P(C, N | W, λ)]
     = log P(W | λ) + log P(C, N | W, λ)

so

log P(W | λ) = log P(W, C, N | λ) − log P(C, N | W, λ)

Taking the expectation over C, N (E[·] with respect to P(C, N | W, λ̂)):

log P(W | λ) = Σ_{C,N} P(C, N | W, λ̂) log P(W, C, N | λ) − Σ_{C,N} P(C, N | W, λ̂) log P(C, N | W, λ)


HVS Language Model (cont.)

• M-step – Q function (auxiliary):

Q(λ, λ̂) = Σ_{C,N} P(C, N | W, λ̂) log P(W, C, N | λ)
        = Σ_{C,N} [P(W, C, N | λ̂) / P(W | λ̂)] log P(W, C, N | λ)

• Substituting P(W, C, N | λ):

P(W, S) = P(C, N, W) = ∏_{t=1}^{T} P(n_t | s_{t−1}) · P(c_t | s_t^{n_t}) · P(w_t | s_t)


HVS Language Model (cont.)

• The three probability distributions are calculated separately:

P̂(n | s) = Σ_t P(n_t = n, s_{t−1} = s | W, λ) / Σ_t P(s_{t−1} = s | W, λ)

P̂(c | s) = Σ_t P(s_t = [c, φ_1^{D−1}(s)] | W, λ) / Σ_t P(s_t^{n_t} = s | W, λ)

P̂(w | s) = Σ_t P(s_t = s, w_t = w | W, λ) / Σ_t P(s_t = s | W, λ)


HVS Language Model (cont.)

• State space S, if fully populated, has |S| = M^D states; for M ≥ 100 and D = 3 to 4 this is far too large to estimate directly.

• Due to data sparseness, backoff is needed.

P_BO(w | φ_1^d(s)) = B(φ_1^d(s)) · P_BO(w | φ_1^{d−1}(s))
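The backoff recursion can be sketched as a lookup that retries with a shallower stack context whenever the word is unseen at the current level. The tables of stored discounted probabilities (`seen`), backoff weights (`bo_weight`), and the unigram fallback are hypothetical placeholders:

```python
def p_backoff(w, state, seen, bo_weight, unigram):
    """Recursive backoff over stack contexts: if w has no stored
    probability for the context `state`, multiply by the backoff weight
    and retry with the oldest stack label dropped."""
    if not state:                       # stack exhausted: unigram fallback
        return unigram.get(w, 1e-9)
    key = tuple(state)
    if key in seen and w in seen[key]:  # discounted probability was stored
        return seen[key][w]
    return bo_weight.get(key, 1.0) * p_backoff(w, state[:-1], seen, bo_weight, unigram)
```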


HVS Language Model (cont.)

• Modified version of absolute discounting: each count r is discounted as

d(r) = (r − a) / r

where the discount constant a is estimated from the counts-of-counts n_1 (> 0) and n_2 (> 0), e.g. a = n_1 / (n_1 + 2·n_2), together with a count cutoff k.

• Backoff weight:

B(φ_1^d(s)) = [1 − Σ_{w ∈ W(φ_1^d(s))} d(r_w) P(w | φ_1^d(s))] / [1 − Σ_{w ∈ W(φ_1^d(s))} P(w | φ_1^{d−1}(s))]

where W(φ_1^d(s)) denotes the set of words observed in the context φ_1^d(s).


Experiments

• Training set: ATIS-3, 276K words, 23K sentences.

• Development set: ATIS-3 Nov93.

• Test set: ATIS-3 Dec94, 10K words, 1K sentences.

• OOV words were removed.

• k = 850


Experiments (cont.)


Experiments (cont.)


Conclusion

• The HVS language model is able to make better use of context than standard class n-gram models.

• The HVS model is trainable using EM.


Class tree for implementation


Iteration number vs. perplexity