“k hypotheses + other” belief updating in spoken dialog systems Dialogs on Dialogs Talk, March 2006 Dan Bohus Computer Science Department www.cs.cmu.edu/~dbohus Carnegie Mellon University [email protected] Pittsburgh, PA 15213


“k hypotheses + other” belief updating in spoken dialog systems

Dialogs on Dialogs Talk, March 2006

Dan Bohus
Computer Science Department
www.cs.cmu.edu/~dbohus
Carnegie Mellon University
[email protected]
Pittsburgh, PA 15213


2

problem

spoken language interfaces lack robustness when faced with understanding errors

errors stem mostly from speech recognition
  typical word error rates: 20-30%
  significant negative impact on interactions


3

guarding against understanding errors

use confidence scores
  machine learning approaches for detecting misunderstandings [Walker, Litman, San-Segundo, Wright, and others]

engage in confirmation actions
  explicit confirmation
    did you say you wanted to fly to Seoul?
    yes → trust hypothesis
    no → delete hypothesis
    “other” → non-understanding
  implicit confirmation
    traveling to Seoul … what day did you need to travel?
    rely on new values overwriting old values
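The two confirmation strategies above boil down to simple update rules. A minimal sketch, assuming a belief is a dict from concept value to confidence and that the answer to an explicit confirmation has already been classified as "yes", "no", or anything else; the function names are hypothetical, and reading "trust" as setting the confidence to 1.0 is an assumption.

```python
from typing import Optional

# hypothetical belief representation: concept value -> confidence
Belief = dict[str, float]


def explicit_confirm_update(belief: Belief, hypothesis: str, answer: str) -> Belief:
    """Heuristic update after 'did you say ...?' asked about `hypothesis`."""
    if answer == "yes":
        # trust the hypothesis (assumption: trusting means confidence 1.0)
        return {hypothesis: 1.0}
    if answer == "no":
        # delete the hypothesis, keep any other values
        return {v: c for v, c in belief.items() if v != hypothesis}
    # any other answer is treated as a non-understanding: belief unchanged
    return dict(belief)


def implicit_confirm_update(belief: Belief, new_value: Optional[tuple[str, float]]) -> Belief:
    """Heuristic update after an implicit confirmation: new values overwrite old ones."""
    if new_value is None:
        # nothing new heard for this concept: keep the old belief
        return dict(belief)
    value, confidence = new_value
    return {value: confidence}
```

If the misrecognized "no no I'm traveling to Birmingham" were parsed as a new destination value "berlin", `implicit_confirm_update({"seoul": 0.65}, ("berlin", 0.60))` would simply replace Seoul with Berlin, which is the kind of failure the multi-turn belief updating in this talk targets.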



4

today’s talk …
construct accurate beliefs by integrating information over multiple turns in a conversation

S: Where would you like to go?
U: Huntsville
   [SEOUL / 0.65]
   destination = {seoul/0.65}
S: traveling to Seoul. What day did you need to travel?
U: no no I’m traveling to Birmingham
   [THE TRAVELING TO BERLIN P_M / 0.60]
   destination = {?}


5

belief updating: problem statement

S: traveling to Seoul. What day did you need to travel?

destination = {seoul/0.65}

destination = {?}

[THE TRAVELING TO BERLIN P_M / 0.60]

given
  an initial belief Binitial(C) over concept C
  a system action SA
  a user response R
construct an updated belief
  Bupdated(C) ← f(Binitial(C), SA, R)


6

outline

proposed approach

data

experiments and results

effect on dialog performance

conclusion



7

belief updating: problem statement

S: traveling to Seoul. What day did you need to travel?

destination = {seoul/0.65}

destination = {?}

[THE TRAVELING TO BERLIN P_M / 0.60]

given
  an initial belief Binitial(C) over concept C
  a system action SA(C)
  a user response R
construct an updated belief
  Bupdated(C) ← f(Binitial(C), SA(C), R)
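The problem statement fixes the inputs and the output of f. A minimal typed sketch of that interface, assuming Python dataclasses for the belief and the user response; the names `Belief`, `UserResponse`, `SystemAction`, and `update_belief` are hypothetical. The action list follows the talk's system-action slide, and the dispatch table mirrors its choice of one model per system action; the models themselves are left as plain callables.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class SystemAction(Enum):
    # action types listed on the "system action" slide
    REQUEST = "request"
    EXPLICIT_CONFIRMATION = "explicit confirmation"
    IMPLICIT_CONFIRMATION = "implicit confirmation"
    UNPLANNED_IMPLICIT_CONFIRMATION = "unplanned implicit confirmation"
    NO_ACTION = "no action / unexpected update"


@dataclass(frozen=True)
class Belief:
    """Belief over one concept C: confidences for tracked values, plus 'other'."""
    hypotheses: dict[str, float]
    other: float


@dataclass(frozen=True)
class UserResponse:
    """Hypothetical container for R: recognized value, its confidence, extra features."""
    value: str
    confidence: float
    features: dict[str, float]


# an update model maps (Binitial(C), R) to Bupdated(C) for a single system action
UpdateModel = Callable[[Belief, UserResponse], Belief]


def update_belief(initial: Belief, action: SystemAction, response: UserResponse,
                  models: dict[SystemAction, UpdateModel]) -> Belief:
    """Bupdated(C) <- f(Binitial(C), SA(C), R), with f selected by the system action."""
    return models[action](initial, response)
```

Keeping SA(C) as a dispatch key rather than an ordinary input feature is a design choice: each action type can then use its own feature set and its own fitted model.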



8

belief representation
Bupdated(C) ← f(Binitial(C), SA(C), R)

most accurate representation
  probability distribution over the set of possible values
however
  the system will “hear” only a small number of conflicting values for a concept within a dialog session
  in our data: at most 3 conflicting values heard; more than 1 value heard in only 6.9% of cases



9

compressed belief representation

k hypotheses + other
  at each turn, the system retains the top m initial hypotheses and adds n new hypotheses from the input (m+n=k)

belief representation
Bupdated(C) ← f(Binitial(C), SA(C), R)
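The compression rule can be sketched in a few lines. This is a minimal sketch under stated assumptions: beliefs are dicts from value to confidence, a value heard again in the new input is not counted twice (the slide does not say how duplicates are handled), and the helper names `select_hypotheses` and `compress` are hypothetical.

```python
def select_hypotheses(initial: dict[str, float], new: list[tuple[str, float]],
                      m: int, n: int) -> list[str]:
    """Choose the k = m + n values tracked for a concept after this turn."""
    # top-m values already in the belief, most confident first
    kept = [v for v, _ in sorted(initial.items(), key=lambda vc: vc[1], reverse=True)[:m]]
    # top-n values heard in this turn; skipping values already kept is an assumption
    fresh = [v for v, _ in sorted(new, key=lambda vc: vc[1], reverse=True) if v not in kept][:n]
    return kept + fresh


def compress(belief: dict[str, float], k: int) -> tuple[dict[str, float], float]:
    """Keep the k most confident values and fold the remaining mass into 'other'."""
    top = dict(sorted(belief.items(), key=lambda vc: vc[1], reverse=True)[:k])
    # the belief is assumed to sum to 1, so everything outside the top k becomes 'other'
    return top, 1.0 - sum(top.values())
```

With k=2 split as m=1, n=1 (the split is an assumption), the Seoul example gives `select_hypotheses({"seoul": 0.65}, [("berlin", 0.60)], m=1, n=1)` → `["seoul", "berlin"]`. The updated confidences for these slots, and for "other", are then estimated by the update model rather than copied from the recognizer.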



10

B(C) modeled as a multinomial variable {h1, h2, … hk, other}

$B(C) = \langle c_{h_1}, c_{h_2}, \ldots, c_{h_k}, c_{\text{other}} \rangle$, where $c_{h_1} + c_{h_2} + \cdots + c_{h_k} + c_{\text{other}} = 1$

belief updating can be cast as a multinomial regression problem:
Bupdated(C) ← Binitial(C) + SA(C) + R

belief representation
Bupdated(C) ← f(Binitial(C), SA(C), R)
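One way to read the multinomial regression is as a multinomial logit (softmax) model, which always returns a valid distribution over {h1, …, hk, other}. A minimal sketch, assuming the standard reference-class parametrization with "other" as the reference; the feature layout and weights are hypothetical, and the talk's fitted GLMs may differ in link or parametrization.

```python
import math


def multinomial_logit(features: list[float], weights: list[list[float]],
                      biases: list[float]) -> list[float]:
    """Map a feature vector to a belief <c_h1, ..., c_hk, c_other> that sums to 1.

    `weights` and `biases` hold one row per non-reference class (h1..hk);
    'other' is the reference class, whose score is fixed at 0.
    """
    # linear score for each hypothesis slot
    scores = [b + sum(w * x for w, x in zip(row, features))
              for row, b in zip(weights, biases)]
    scores.append(0.0)  # reference class 'other'
    top = max(scores)   # subtract the max before exponentiating, for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# hypothetical example for k=2: features = [initial c_h1, initial c_h2, barge-in flag]
belief = multinomial_logit([0.65, 0.0, 1.0],
                           weights=[[3.0, -1.0, -0.5], [-1.0, 3.0, 0.2]],
                           biases=[0.1, -0.4])
```

Because the outputs come from a softmax, the constraint $c_{h_1} + \cdots + c_{h_k} + c_{\text{other}} = 1$ holds by construction, with no separate renormalization step.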



11

request
S: For when do you want the room?
U: Friday
   [FRIDAY / 0.65]

explicit confirmation
S: Did you say you wanted a room for Friday?
U: Yes
   [GUEST / 0.30]

implicit confirmation
S: a room for Friday … starting at what time?
U: starting at ten a.m.
   [STARTING AT TEN A_M / 0.86]

unplanned implicit confirmation
S: I found 5 rooms available Friday from 10 until noon. Would you like a small or a large room?
U: not Friday, Thursday
   [FRIDAY THURSDAY / 0.25]

no action / unexpected update
S: okay. I will complete the reservation. Please tell me your name or say ‘guest user’ if you are not a registered user.
U: guest user
   [THIS TUESDAY / 0.55]

system action
Bupdated(C) ← f(Binitial(C), SA(C), R)



12

user response
Bupdated(C) ← f(Binitial(C), SA(C), R)

acoustic / prosodic: acoustic and language scores, duration, pitch (min, max, mean, range, std. dev., min and max slope, plus normalized versions), voiced-to-unvoiced ratio, speech rate, initial pause, etc.

lexical: number of words; lexical terms highly correlated with corrections or acknowledgements (selected via mutual information computation)

grammatical: number of slots (new and repeated), parse fragmentation, parse gaps, etc.

dialog: dialog state, turn number, expectation match, new value for concept, timeout, barge-in, concept identity

priors: priors for concept values (manually constructed by a domain expert for 3 of 29 concepts: date, start_time, end_time; uniform assumed otherwise)

confusability: empirically derived confusability scores
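The lexical features are chosen by mutual information between a term and the response type. A minimal sketch, assuming whitespace tokenization, a binary "term present" indicator, and a binary correction label per turn; the talk does not give these details, and the function names and toy data are hypothetical.

```python
import math
from collections import Counter


def mutual_information(has_term: list[bool], is_correction: list[bool]) -> float:
    """I(T; Y) in bits between a term-presence indicator T and a binary label Y."""
    n = len(has_term)
    joint = Counter(zip(has_term, is_correction))
    p_t = Counter(has_term)
    p_y = Counter(is_correction)
    mi = 0.0
    for (t, y), count in joint.items():
        p_ty = count / n
        # only observed (t, y) pairs contribute; unobserved pairs have p_ty = 0
        mi += p_ty * math.log2(p_ty / ((p_t[t] / n) * (p_y[y] / n)))
    return mi


def rank_terms(responses: list[str], is_correction: list[bool],
               vocabulary: list[str]) -> list[tuple[str, float]]:
    """Score each candidate term by MI with the label, most informative first."""
    scores = []
    for term in vocabulary:
        has = [term in r.lower().split() for r in responses]
        scores.append((term, mutual_information(has, is_correction)))
    return sorted(scores, key=lambda ts: ts[1], reverse=True)


# toy turns (hypothetical labels): which words signal a correction?
ranked = rank_terms(["no no I'm traveling to Birmingham", "yes", "not Friday Thursday", "guest user"],
                    [True, False, True, False],
                    ["no", "not", "yes", "guest"])
```

The top-scoring terms would then enter the model as binary features; on a real corpus the cutoff (how many terms to keep) is a further choice the slide does not specify.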



13

approach

problem
$\langle uc_{h_1}, \ldots, uc_{h_k}, uc_{\text{oth}} \rangle \leftarrow f(\langle ic_{h_1}, \ldots, ic_{h_k}, ic_{\text{oth}} \rangle, SA(C), R)$

approach: multinomial generalized linear model
  regression model, multinomial dependent variable
  sample efficient
stepwise approach
  feature selection
  BIC to control over-fitting
one model for each system action
  $\langle uc_{h_1}, \ldots, uc_{h_k}, uc_{\text{oth}} \rangle \leftarrow f_{SA(C)}(\langle ic_{h_1}, \ldots, ic_{h_k}, ic_{\text{oth}} \rangle, R)$

Bupdated(C) ← f(Binitial(C), SA(C), R)
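The stepwise feature selection with BIC can be sketched as greedy forward selection. Start from the intercept-only model, repeatedly add the candidate that lowers BIC = -2·log L + p·ln n the most, and stop when no candidate lowers it. `fit_loglik` is a hypothetical callback standing in for fitting a multinomial model on the annotated corpus. The toy log-likelihood gains below are made up for illustration, and the talk does not say whether its stepwise procedure ran forward, backward, or in both directions.

```python
import math
from typing import Callable, Sequence


def bic(log_likelihood: float, num_params: int, num_samples: int) -> float:
    """Bayesian Information Criterion; lower is better."""
    return -2.0 * log_likelihood + num_params * math.log(num_samples)


def forward_stepwise(candidates: Sequence[str],
                     fit_loglik: Callable[[list[str]], float],
                     num_classes: int, num_samples: int) -> list[str]:
    """Greedily add the feature that lowers BIC most; stop when none does."""

    def n_params(feats: list[str]) -> int:
        # multinomial logit: (num_classes - 1) equations, each with an intercept
        # plus one weight per selected feature
        return (num_classes - 1) * (len(feats) + 1)

    selected: list[str] = []
    best = bic(fit_loglik(selected), n_params(selected), num_samples)
    remaining = list(candidates)
    while remaining:
        scored = [(bic(fit_loglik(selected + [f]), n_params(selected + [f]), num_samples), f)
                  for f in remaining]
        score, feat = min(scored)
        if score >= best:
            break  # no candidate improves BIC: stop
        best = score
        selected.append(feat)
        remaining.remove(feat)
    return selected


# toy example: k=2 hypotheses + other gives 3 classes; additive log-likelihood gains are invented
gains = {"initial_confidence": 120.0, "barge_in": 15.0, "turn_number": 2.0}
toy_fit = lambda feats: -1000.0 + sum(gains[f] for f in feats)
selected = forward_stepwise(list(gains), toy_fit, num_classes=3, num_samples=1000)
```

Each added feature costs (num_classes - 1)·ln n in BIC, so with 1000 samples a feature must raise the log-likelihood by about 13.8 to be kept. In the toy run, `turn_number` falls short and is left out. Running this once per system action would produce the "one model for each system action" set.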



14

outline

proposed approach

data

experiments and results

effect on dialog performance

conclusion



15

data

collected with RoomLine
  a phone-based mixed-initiative spoken dialog system
  conference room reservation
  explicit and implicit confirmations
  simple heuristic rules for belief updating
    explicit confirm: yes / no
    implicit confirm: new values overwrite old ones



16

corpus

user study
  46 participants (naïve users)
  10 scenario-based interactions each
  compensated per task success

corpus
  449 sessions, 8848 user turns
  orthographically transcribed
  manually annotated: misunderstandings, corrections, correct concept values



17

outline

proposed approach

data

experiments and results

effect on dialog performance

conclusion



18

baselines

initial baseline: accuracy of system beliefs before the update

heuristic baseline: accuracy of the heuristic update rules used by the system

oracle baseline: accuracy if we knew exactly when the user corrects



19

k=2 hypotheses + other

Informative features
  priors and confusability
  initial confidence score
  concept identity
  barge-in
  expectation match
  repeated grammar slots



20

outline

proposed approach

data

experiments and results

effect on dialog performance

conclusion



21

a question remains …

… does this really matter?

what is the effect on global dialog performance?



22

let’s run an experiment

guinea pigs from Speech Lab for exp: $0

getting change from guys in the lab: $2/$3/$5

real subjects for the experiment: $25

picture with advisor of the VERY last exp at CMU: priceless!!!!

[courtesy of Mohit Kumar]


23

a new user study …

implemented models in RavenClaw, performed a new user study
  40 participants, first-time users
  10 scenario-driven interactions each
  non-native speakers of North-American English
    improvements more likely at higher WER
    supported by empirical evidence
  between-subjects; 2 gender-balanced groups
    control: RoomLine using heuristic update rules
    treatment: RoomLine using runtime models



24

effect on task success

task success
  control: 73.6%
  treatment: 81.3%

even though average user WER was
  control: 21.9%
  treatment: 24.2%


25

effect on task success … a closer look

Task Success ← 2.09 - 0.05∙WER + 0.69∙Condition

[Figure: probability of task success (0-100%) vs. word error rate (0-100%), one curve for treatment and one for control]

at 30% WER: control 64%, treatment 78% (p=0.001)
control reaches 78% at 16% WER
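The plotted percentages can be recovered from the regression line if it is read as a logistic model. A minimal sketch under three assumptions that the slide does not state but that reproduce the 64% / 78% values: the formula gives the log-odds of task success, WER is in percentage points, and Condition is 1 for treatment and 0 for control.

```python
import math


def logistic(z: float) -> float:
    """Convert log-odds to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def p_task_success(wer: float, treatment: bool) -> float:
    """Slide coefficients, read as log-odds (assumption); WER in percentage points."""
    return logistic(2.09 - 0.05 * wer + 0.69 * (1 if treatment else 0))


control_30 = p_task_success(30, treatment=False)    # ≈ 0.643
treatment_30 = p_task_success(30, treatment=True)   # ≈ 0.782
control_16 = p_task_success(16, treatment=False)    # ≈ 0.784

# WER reduction that buys the same gain as switching to the treatment
wer_equivalent = 0.69 / 0.05                        # ≈ 13.8 percentage points
```

This is why the treatment at 30% WER performs about like the control at 16% WER: the Condition coefficient is worth roughly 14 points of WER under this model.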


26

improvements at different WER

[Figure: absolute improvement in task success (y-axis, about 0.02 to 0.18) vs. word-error-rate (x-axis, 0 to 100)]
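Under the same reading of the task-success model (log-odds, WER in percentage points, Condition in {0, 1}; assumptions, not stated on the slide), the improvement curve is the gap between the treatment and control logistic curves. A minimal sketch that recomputes it over WER from 0 to 100:

```python
import math


def logistic(z: float) -> float:
    """Convert log-odds to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def improvement(wer: float) -> float:
    """Absolute gain in P(task success), treatment minus control (assumed logit model)."""
    base = 2.09 - 0.05 * wer           # control log-odds
    return logistic(base + 0.69) - logistic(base)


curve = [(wer, improvement(wer)) for wer in range(0, 101)]
peak_wer, peak_gain = max(curve, key=lambda point: point[1])
```

With these coefficients the gain is largest, about 17 points, near 49% WER. It shrinks toward both ends, to roughly 5 points at 0% and 100% WER, which stays inside the plotted axis range. A fixed shift in log-odds has its biggest effect on probability where the control curve is near 50%.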


27

effect on task duration (for successful tasks)

ANOVA on task duration for successful tasks
Duration ← -0.21 + 0.013∙WER - 0.106∙Condition

significant improvement, equivalent to a 7.9% absolute reduction in WER
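The "equivalent WER reduction" can be read as the ratio of the Condition coefficient to the WER coefficient: how many points of WER would change the outcome as much as switching to the treatment. A small sketch of that arithmetic. With the rounded coefficients shown, the ratio comes out near 8.2 rather than 7.9, so the slide's figure presumably comes from unrounded estimates (an assumption). The units of Duration are not given on the slide.

```python
def wer_equivalent(condition_coef: float, wer_coef: float) -> float:
    """Absolute WER change (percentage points) with the same effect as the treatment."""
    return abs(condition_coef / wer_coef)


# duration model from the slide: Duration <- -0.21 + 0.013*WER - 0.106*Condition
duration_equiv = wer_equivalent(-0.106, 0.013)   # ≈ 8.2 with rounded coefficients

# same reading applied to the task-success model: 2.09 - 0.05*WER + 0.69*Condition
success_equiv = wer_equivalent(0.69, -0.05)      # ≈ 13.8
```

Taking the absolute value keeps the result a reduction in WER regardless of whether the outcome is something to minimize (duration) or maximize (task success).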



28

outline

proposed approach

data

experiments and results

effect on dialog performance

conclusion



29

summary

data-driven approach for constructing accurate system beliefs
  integrate information across multiple turns
  bridge together detection of misunderstandings and corrections
  significantly outperforms current heuristics
  significantly improves effectiveness (task success) and efficiency (task duration)


30

other advantages
  sample efficient
    performs a local one-turn optimization
    good local performance leads to good global performance
  scalable
    works independently on concepts
    29 concepts, varying cardinalities
  portable
    decoupled from dialog task specification
    doesn’t make strong assumptions about dialog management technology


31

thank you! questions …


32

user study

10 scenarios, fixed order presented graphically (explained during briefing)

participants compensated per task success