Spoken Dialogue Systems

CS 4705

Talking to a Machine… and (Often) Getting an Answer

• Today’s spoken dialogue systems make it possible to accomplish real tasks without talking to a person

– Could Eliza do this?

– What do today’s systems do better?

– Do they actually embody human intelligence?

• Key advances

– Stick to goal-directed interactions in a limited domain

– Prime users to adopt the vocabulary you can recognize

– Partition the interaction into manageable stages

– Judicious use of system vs. mixed initiative

Dialogue vs. Monologue

• Monologue and dialogue both involve interpreting

– Information status

– Coherence issues

– Reference resolution

– Speech acts, implicature, intentionality

• Dialogue involves managing

– Turn-taking

– Grounding and repairing misunderstandings

– Initiative and confirmation strategies

Segmenting Speech into Utterances

• What is an ‘utterance’?

– Why is end-of-utterance (EOU) detection harder than end-of-sentence (EOS) detection?

– How does speech differ from text?

– A single syntactic sentence may span several turns

A: We've got you on USAir flight 99

B: Yep

A: leaving on December 1.

– Multiple syntactic sentences may occur in a single turn

A: We've got you on USAir flight 99 leaving on December 1. Do you need a rental car?

– Intonational definitions: intonational phrase, breath group, intonation unit
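A common naive baseline for end-of-utterance detection is to cut the speech stream wherever the silence between words exceeds a fixed threshold. The sketch below assumes word-level timestamps from a recognizer (the `Word` type is hypothetical) and an arbitrary 0.5 s threshold. The USAir examples show why this is not enough: a pause can fall inside one syntactic sentence, and none may separate two.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Word:
    text: str
    start: float  # seconds
    end: float    # seconds


def segment_by_pause(words: List[Word], max_gap: float = 0.5) -> List[List[str]]:
    """Group time-stamped words into 'utterances' separated by long silences.

    max_gap is a hypothetical threshold in seconds; real systems tune it
    and combine silence with prosodic and lexical cues.
    """
    utterances: List[List[str]] = []
    current: List[str] = []
    prev_end = None
    for w in words:
        # Start a new utterance when the silence before this word is too long.
        if prev_end is not None and w.start - prev_end > max_gap:
            utterances.append(current)
            current = []
        current.append(w.text)
        prev_end = w.end
    if current:
        utterances.append(current)
    return utterances
```

If "December 1." and "Do you need a rental car?" are spoken without a pause, this rule returns them as a single segment, even though they are two sentences.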

Turns and Utterances

• Dialogue is characterized by turn-taking: who should talk next, and when they should talk

• How do we identify turns in recorded speech?

– Little speaker overlap (around 5% in English, although this depends on the domain)

– But little silence between turns either

• How do we know when a speaker is giving up or taking a turn? Holding the floor? How do we know when a speaker is interruptible?
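One way to quantify the "little overlap" observation is to measure what share of speaking time has both speakers talking at once. A minimal sketch, assuming each speaker's speech has already been reduced to a sorted list of non-overlapping (start, end) intervals in seconds, for example by a voice activity detector (not shown). This is one possible definition of overlap, not necessarily the one behind the 5% figure.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (start, end) in seconds


def total_duration(intervals: List[Interval]) -> float:
    return sum(end - start for start, end in intervals)


def overlap_duration(a: List[Interval], b: List[Interval]) -> float:
    """Total time during which both speakers are talking.

    Assumes each list is sorted and internally non-overlapping.
    """
    i = j = 0
    overlap = 0.0
    while i < len(a) and j < len(b):
        start = max(a[i][0], b[j][0])
        end = min(a[i][1], b[j][1])
        if end > start:
            overlap += end - start
        # Advance whichever interval finishes first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return overlap


def overlap_fraction(a: List[Interval], b: List[Interval]) -> float:
    """Overlapped time as a share of the time in which anyone is speaking."""
    both = overlap_duration(a, b)
    anyone = total_duration(a) + total_duration(b) - both
    return both / anyone if anyone > 0 else 0.0
```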

Simplified Turn-Taking Rule (Sacks et al.)

• At each transition-relevance place (TRP) of each turn:

– If current speaker has selected A as next speaker, then A must speak next

– If current speaker does not select next speaker, any other speaker may take next turn

– If no one else takes next turn, the current speaker may take next turn

• TRPs are where the structure of the language allows speaker shifts to occur

• Adjacency pairs set up next speaker expectations

– GREETING/GREETING

– QUESTION/ANSWER

– COMPLIMENT/DOWNPLAYER

– REQUEST/GRANT

• ‘Significant silence’ is dispreferred

A: Is there something bothering you or not? (1.0s)

A: Yes or no? (1.5s)

A: Eh?

B: No.
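The simplified rule can be read as an ordered decision procedure applied at each TRP. The sketch below is a hypothetical encoding of that order, not a model of real conversation: the arguments stand in for whom the current speaker selected and who, if anyone, starts talking. Resolving competing self-selectors by "first to start" is an assumption made here for concreteness.

```python
from typing import Optional, Sequence


def next_speaker(current: str,
                 selected: Optional[str],
                 self_selectors: Sequence[str]) -> str:
    """Apply the simplified turn-taking rule at one transition-relevance place.

    current: participant who holds the current turn
    selected: participant the current speaker selected (for example by
              addressing a question to them), or None
    self_selectors: other participants who start talking, in order of onset
    """
    # Rule 1: a selected next speaker must speak next.
    if selected is not None:
        return selected
    # Rule 2: otherwise any other participant may take the turn;
    # here we assume the first to start speaking gets it.
    for speaker in self_selectors:
        if speaker != current:
            return speaker
    # Rule 3: if no one else takes the turn, the current speaker may continue.
    return current
```

In the ‘significant silence’ example, A's question selects B, so by Rule 1 the turn belongs to B; when B stays silent, A falls back on Rule 3 and speaks again ("Yes or no?").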

Intonational Cues to Turntaking

• Continuation rise (L-H%) holds the floor

• H-H% requests a response

– L* H-H% (yes-no question contour)

– H* H-H% (high-rise question contour)

• Intonational contours signal dialogue acts in adjacency pairs
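These contours can be treated as a lookup from a phrase-final ToBI label to a turn-taking cue. The following is a toy sketch: the label strings and the "no cue" fallback are assumptions for illustration, and a real system would first have to detect the tones automatically from the f0 contour.

```python
def turn_cue(contour: str) -> str:
    """Map a ToBI contour such as 'L* H-H%' to a turn-taking cue.

    Only the final phrase accent plus boundary tone is inspected.
    """
    final = contour.replace(" ", "")
    if final.endswith("H-H%"):
        # Yes-no (L* H-H%) or high-rise (H* H-H%) question: requests a response.
        return "request response"
    if final.endswith("L-H%"):
        # Continuation rise: the speaker holds the floor.
        return "hold floor"
    # Hypothetical fallback for contours the slide does not discuss, e.g. L-L%.
    return "no cue"
```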

Timing and Turntaking

• How should we time responses in an SDS?

– Japanese studies of aizuchi (backchannels) in natural speech (Koiso et al. ’98, Takeuchi et al. ’02)

– Lexical information: the particles ne and ka ending the preceding turn or, in telephone shopping, product names

– Length of the preceding utterance, f0, loudness, and the pause after it are even more important in predicting turntaking
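A rule-based sketch of how the lexical cue might be combined with timing: respond when the preceding utterance ends in ne or ka and a minimum pause has elapsed. The threshold and the way the two cues are combined are hypothetical. Since the studies found prosodic and timing features to be stronger predictors, a real system would more likely feed all of these features to a trained classifier than use a hand-written rule.

```python
from typing import List

FINAL_PARTICLES = {"ne", "ka"}  # sentence-final particles cited above


def backchannel_opportunity(tokens: List[str], pause_after: float,
                            min_pause: float = 0.2) -> bool:
    """Guess whether a listener response (aizuchi) is appropriate here.

    tokens: romanized words of the preceding utterance
    pause_after: silence in seconds following the utterance so far
    min_pause: hypothetical threshold; a real system would learn it
    """
    if not tokens:
        return False
    lexical_cue = tokens[-1].lower() in FINAL_PARTICLES
    return lexical_cue and pause_after >= min_pause
```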

Turntaking and Initiative Strategies

• System Initiative

S: Please give me your arrival city name.

U: Baltimore.

S: Please give me your departure city name….

• User Initiative

S: How may I help you?

U: I want to go from Boston to Baltimore on November 8.

• ‘Mixed’ initiative

S: How may I help you?

U: I want to go to Boston.

S: What day do you want to go to Boston?
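One common way to implement these strategies is a frame of slots to fill. A system-initiative manager asks for one slot at a time in a fixed order; the mixed-initiative variant sketched below also accepts whatever slots the user volunteers and then asks only for what is still missing. The three-slot travel frame and the prompts are hypothetical, and the sketch assumes a recognizer and parser have already turned the user's words into slot values.

```python
from typing import Dict, Optional

SLOT_ORDER = ["destination", "date", "origin"]  # hypothetical travel frame

PROMPTS = {
    "destination": "Where do you want to go?",
    "date": "What day do you want to go{dest}?",
    "origin": "Where are you leaving from?",
}


def update_frame(frame: Dict[str, Optional[str]], parsed: Dict[str, str]) -> None:
    """Accept every slot the user volunteered, in any order; ignore unknown slots."""
    for slot, value in parsed.items():
        if slot in frame:
            frame[slot] = value


def next_prompt(frame: Dict[str, Optional[str]]) -> Optional[str]:
    """Ask for the first unfilled slot; return None when the frame is complete."""
    for slot in SLOT_ORDER:
        if frame.get(slot) is None:
            dest = frame.get("destination")
            # Prompts without a {dest} field simply ignore this argument.
            return PROMPTS[slot].format(dest=f" to {dest}" if dest else "")
    return None
```

After the user says "I want to go to Boston", `update_frame` fills only the destination, and `next_prompt` produces "What day do you want to go to Boston?", as in the mixed-initiative example. A user-initiative request that supplies every slot leaves nothing to ask.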

Grounding (Clark & Schaefer ’89)

• Conversational participants don’t just take turns speaking… they try to establish common ground (or mutual belief)

• The hearer (H) must ground the speaker’s (S’s) utterances by making it clear whether or not understanding has occurred

• How do hearers do this?

S: I can upgrade you to an SUV at that rate.

– Continued attention

(U gazes appreciatively at S)

– Relevant next contribution

U: Do you have a RAV4 available?

– Acknowledgement/backchannel

U: Ok/Mhmmm/Great!

– Demonstration/paraphrase

U: An SUV.

– Display/repetition

U: You can upgrade me to an SUV at the same rate?

– Request for repair

U: I beg your pardon?
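A dialogue system has to judge which kind of grounding evidence a user turn provides. The sketch below is a deliberately crude, hypothetical heuristic over transcribed text: the cue-word lists are assumptions, continued attention (gaze) cannot be observed from text at all, and demonstration/paraphrase and display/repetition are merged into one label, approximated by word overlap with the system's previous utterance.

```python
import re
from typing import Set

ACKS = {"ok", "okay", "mhmm", "uh-huh", "great", "yes", "right"}          # hypothetical
REPAIR_CUES = ("pardon", "what did you say", "sorry?", "say that again")  # hypothetical


def words(text: str) -> Set[str]:
    return set(re.findall(r"[a-z0-9'-]+", text.lower()))


def grounding_evidence(system_utt: str, user_utt: str) -> str:
    """Label the user's reply with a (rough) type of grounding evidence."""
    if any(cue in user_utt.lower() for cue in REPAIR_CUES):
        return "request for repair"
    uw = words(user_utt)
    if uw and uw <= ACKS:
        return "acknowledgement"
    shared = uw & words(system_utt)
    if uw and len(shared) / len(uw) >= 0.5:
        # Mostly reuses the system's own words: repetition or paraphrase.
        return "display/demonstration"
    return "relevant next contribution"
```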

Detecting Grounding Behavior

• Evidence of system misconceptions reflected in user responses (Krahmer et al. ’99, ’00)

– Responses to incorrect verifications

• contain more words (or are empty)

• show marked word order (especially after implicit verifications)

• contain more disconfirmations, more repeated/corrected info

– ‘No’ after incorrect verifications vs. after other yes-no questions

• has a higher boundary tone

• a wider pitch range

• a longer duration

• longer pauses before and after

• more additional words after it

• User information state reflected in response (Shimojima et al. ’99, ’01)

– Echoic responses repeat prior information, as acknowledgment or request for confirmation

S1: Then go to Keage station.

S2: Keage.

– Experiment:

• Identify ‘degree of integration’ and prosodic features (boundary tone, pitch range, tempo, initial pause)

• Perception studies to elicit ‘integration’ effect

– Results: fast tempo, little pause and low pitch signal high integration
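The Krahmer et al. findings suggest simple features a system could compute on the user's reply to a verification in order to flag a probable misunderstanding. The sketch below computes a few of the textual cues (length, emptiness, disconfirmations, new material, ‘no’ followed by more words) and combines them with an arbitrary, hypothetical rule. The cue list and threshold are assumptions, and the prosodic cues (boundary tone, pitch range, duration, pauses) would need acoustic features not shown here.

```python
import re
from typing import Dict, List

DISCONFIRMATIONS = {"no", "not", "wrong", "incorrect"}  # hypothetical cue list


def tokens(text: str) -> List[str]:
    return re.findall(r"[a-z0-9']+", text.lower())


def reply_features(verification: str, reply: str) -> Dict[str, float]:
    """Textual cues from a user's reply to a system verification prompt."""
    toks = tokens(reply)
    ver = set(tokens(verification))
    return {
        "n_words": len(toks),
        "empty": float(len(toks) == 0),
        "n_disconfirm": sum(t in DISCONFIRMATIONS for t in toks),
        # Words the system did not say, e.g. corrected information.
        "n_new_words": sum(t not in ver for t in toks),
        "no_plus_more": float(bool(toks) and toks[0] == "no" and len(toks) > 1),
    }


def probable_error(verification: str, reply: str, long_reply: int = 4) -> bool:
    """Hypothetical rule: flag replies that are empty, disconfirm, or run long."""
    f = reply_features(verification, reply)
    return bool(f["empty"] or f["n_disconfirm"] > 0 or f["n_words"] >= long_reply)
```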

Grounding and Confirmation Strategies

U: I want to go to Baltimore.

• Explicit

S: Did you say you want to go to Baltimore?

• Implicit

S: Baltimore. (H* L- L%)

S: Baltimore? (L* H- H%)

S: What time do you want to leave Baltimore?

• No confirmation
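The strategies differ only in what the system says after recognizing a value. A minimal sketch that builds each kind of turn for a recognized destination, plus a chooser driven by the recognizer's confidence score. Choosing the strategy by confidence and the specific thresholds are assumptions, not something the slide specifies. The tune of the implicit prompt (H* L- L% vs. L* H- H%) would have to be passed to a speech synthesizer and is not represented.

```python
def choose_strategy(confidence: float, high: float = 0.9, mid: float = 0.6) -> str:
    """Pick a confirmation strategy from an ASR confidence score in [0, 1].

    The thresholds are hypothetical; deployed systems tune them on data.
    """
    if confidence >= high:
        return "none"      # trust the recognizer and move on
    if confidence >= mid:
        return "implicit"  # echo the value while asking the next question
    return "explicit"      # ask the user to confirm before proceeding


def confirmation_prompt(strategy: str, city: str) -> str:
    """System turn after recognizing `city` as the destination."""
    next_question = "What time do you want to leave?"
    if strategy == "explicit":
        return f"Did you say you want to go to {city}?"
    if strategy == "implicit":
        return f"{city}. {next_question}"
    if strategy == "none":
        return next_question
    raise ValueError(f"unknown strategy: {strategy!r}")
```

The trade-off is the usual one: explicit confirmation catches more errors but costs an extra turn, while no confirmation is fastest but leaves errors to surface later.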

How do we evaluate Dialogue Systems?

• PARADISE framework (Walker et al. ’00)

• “Performance” of a dialogue system is affected both by what gets accomplished by the user and the dialogue agent and how it gets accomplished

[Figure: Maximize Task Success; Minimize Costs (Efficiency Measures, Qualitative Measures)]
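As formulated by Walker et al., PARADISE combines these goals into a single performance function. Task success is measured with the κ (kappa) statistic, each c_i is a cost measure (efficiency or qualitative), α and the w_i are weights, and N is Z-score normalization, which puts measures with different units on a common scale:

```latex
\mathrm{Performance} = \alpha \cdot \mathcal{N}(\kappa) - \sum_{i=1}^{n} w_i \cdot \mathcal{N}(c_i),
\qquad
\mathcal{N}(x) = \frac{x - \bar{x}}{\sigma_x}
```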

04/19/23 17

What metrics should we use?

• Efficiency of the Interaction: User Turns, System Turns, Elapsed Time

• Quality of the Interaction: ASR rejections, Time Out Prompts, Help Requests, Barge-Ins, Mean Recognition Score (concept accuracy), Cancellation Requests

• User Satisfaction

• Task Success: perceived completion, information extracted
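Most of the efficiency and quality measures are simple counts over a logged dialogue. A sketch assuming a hypothetical log format in which each entry is one turn with a speaker, a timestamp, and an optional tag for what happened in it ("rejection", "timeout", "help", "barge-in", "cancel"). The tag names are illustrative, not a standard, and measures such as mean recognition score would need recognizer output not modeled here.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Optional

COST_TAGS = ("rejection", "timeout", "help", "barge-in", "cancel")  # hypothetical


@dataclass
class Turn:
    speaker: str               # "user" or "system"
    time: float                # seconds from the start of the call
    tag: Optional[str] = None  # hypothetical event tag, e.g. "rejection"


def dialogue_costs(log: List[Turn]) -> Dict[str, float]:
    """Efficiency and quality cost measures for one logged dialogue."""
    turns = Counter(t.speaker for t in log)
    tags = Counter(t.tag for t in log if t.tag is not None)
    costs: Dict[str, float] = {
        "user_turns": turns["user"],
        "system_turns": turns["system"],
        "elapsed_time": log[-1].time - log[0].time if log else 0.0,
    }
    for tag in COST_TAGS:
        costs[tag] = tags[tag]
    return costs
```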


User Satisfaction: Sum of Many Measures

• Was Annie easy to understand in this conversation? (TTS Performance)

• In this conversation, did Annie understand what you said? (ASR Performance)

• In this conversation, was it easy to find the message you wanted? (Task Ease)

• Was the pace of interaction with Annie appropriate in this conversation? (Interaction Pace)

• In this conversation, did you know what you could say at each point of the dialog? (User Expertise)

• How often was Annie sluggish and slow to reply to you in this conversation? (System Response)

• Did Annie work the way you expected her to in this conversation? (Expected Behavior)

• From your current experience with using Annie to get your email, do you think you'd use Annie regularly to access your mail when you are away from your desk? (Future Use)

Performance Model

• Weights trained for each independent factor via multiple regression modeling: how much does each contribute to User Satisfaction?

• Result useful for system development

– Making predictions about system modifications

– Distinguishing ‘good’ dialogues from ‘bad’ dialogues
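A minimal sketch of the performance-model step: user satisfaction is the sum of the survey answers (assumed here to be scored 1 to 5 per question), each predictor is Z-score normalized, and ordinary least-squares multiple regression with an intercept gives one weight per factor. Everything is plain Python to stay self-contained; the data in the usage check are invented, and a real analysis would also test which factors are significant before keeping them.

```python
from statistics import mean, pstdev
from typing import Dict, List, Sequence


def user_satisfaction(answers: Sequence[int]) -> int:
    """Sum of survey responses (assumed 1-5 scores, one per question)."""
    return sum(answers)


def zscore(xs: Sequence[float]) -> List[float]:
    m, s = mean(xs), pstdev(xs)
    return [(x - m) / s if s > 0 else 0.0 for x in xs]


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_weights(factors: Dict[str, Sequence[float]],
                satisfaction: Sequence[float]) -> Dict[str, float]:
    """Least-squares weight for each normalized factor predicting satisfaction.

    Positive weights raise satisfaction; negative weights act as costs.
    """
    names = list(factors)
    cols = [zscore(factors[k]) for k in names]
    # Design matrix: intercept column followed by the normalized factors.
    rows = [[1.0] + [col[i] for col in cols] for i in range(len(satisfaction))]
    k = len(names) + 1
    # Normal equations: (X^T X) w = X^T y.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, satisfaction)) for i in range(k)]
    w = solve(xtx, xty)
    return dict(zip(names, w[1:]))
```

Because the factors are normalized, the fitted weights can be compared directly: a larger magnitude means that factor matters more to user satisfaction.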

• But… can we also tell online when a dialogue is ‘going wrong’?

Identifying Misrecognitions, Awares and User Corrections Automatically (Hirschberg, Litman & Swerts)

• Collect corpus from interactive voice response system

• Identify speaker ‘turns’ that are

– incorrectly recognized

– where speakers first become aware of an error

– that correct misrecognitions

• Identify prosodic features of turns in each category and compare to other turns

• Use Machine Learning techniques to train a classifier to make these distinctions automatically
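The study trained classifiers over many prosodic and other features. As a much simpler stand-in (a different technique, named plainly here), the sketch below learns a one-feature threshold rule, a decision stump, from labeled turns. The feature names and values in the usage check are hypothetical, and at least one training turn is required.

```python
from typing import Dict, Sequence, Tuple

Turn = Dict[str, float]  # hypothetical prosodic features, e.g. {"duration": 3.0}
Rule = Tuple[str, float, bool]


def train_stump(turns: Sequence[Turn], labels: Sequence[bool]) -> Rule:
    """Find the (feature, threshold, above) rule with the fewest training errors.

    The rule predicts True (e.g. 'misrecognized') when feature > threshold
    if above is True, or when feature <= threshold if above is False.
    """
    best = None
    best_err = len(labels) + 1
    for feat in turns[0]:
        for thr in sorted({t[feat] for t in turns}):
            for above in (True, False):
                err = sum(((t[feat] > thr) == above) != y for t, y in zip(turns, labels))
                if err < best_err:
                    best, best_err = (feat, thr, above), err
    return best


def predict(rule: Rule, turn: Turn) -> bool:
    feat, thr, above = rule
    return (turn[feat] > thr) == above
```

A single threshold is easy to inspect, which mirrors the appeal of rule-based classifiers for this kind of analysis, but it will not match the accuracy of a classifier that combines many features.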

Turn Types

TOOT: Hi. This is AT&T Amtrak Schedule System. This is TOOT. How may I help you?

User: Hello. I would like trains from Philadelphia to New York leaving on Sunday at ten thirty in the evening.

TOOT: Which city do you want to go to?

User: New York.

Turn types: misrecognition, correction, aware site

Results

• Reduced error in predicting misrecognized turns to 8.64%

• Error in predicting ‘awares’ (12%)

• Error in predicting corrections (18-21%)

Conclusions

• Spoken dialogue systems present new problems, but also new possibilities

– Recognizing speech introduces a new source of errors

– Additional information in the speech stream offers clues to users’ intended meanings and emotional state (grounding of information, speech acts, reaction to system errors)

• Why spoken dialogue systems rather than web-based interfaces?
