Spoken Dialog Systems


Lecture 3: Spoken Language Processing

Prof. Andrew Rosenberg


Overview
• Conversational Agents
  – Automatic Speech Recognition (ASR)
  – Natural Language Understanding (NLU)
  – Generation
• Cooperative Question-Answering
• Dialog Representations


Spoken Dialog Systems
• Other terms:
  – Interactive Voice Response (IVR) Systems
  – Conversational Agents
  – Dialog Systems
• Applications
  – Travel Arrangements (Amtrak, United Airlines)
  – Telephone call routing
  – Tutoring
  – Communicating with Robots
  – Anything with limited screen or keyboard


Travel Assistant: Communicator


Call Routing: AT&T HMIHY (“How May I Help You?”)


Tutorial: ITSPOKE


Conversational Structure
• Telephone Conversation
  – Stage 1: Enter a conversation
  – Stage 2: Identification
  – Stage 3: Establish a joint willingness to converse
  – Stage 4: First topic is raised, often by the caller.


Conversational confusion
• Customer: (rings)
• Operator: Directory Enquiries, for which town please?
• Customer: Could you give me the phone number of um: Mrs. um: Smithson?
• Operator: Yes, which town is this at please?
• Customer: Huddleston.
• Operator: Yes. And the name again?
• Customer: Mrs. Smithson


Customer confusion
• Agent: And, what day in May did you want to travel?
• Client: OK, uh, I need to be there for a meeting that’s from the 12th to the 15th.
• The client didn’t answer the question.
• Meaning of the sentence
  – Meeting
    • start-of-meeting: 12th
    • end-of-meeting: 15th
  – Doesn’t say anything about flying
• How does the agent infer that the client is informing him/her of travel dates or requirements?


Customer Confusion
• Agent: …there’s 3 non-stops today
• Literally true even if there are, in fact, more than 3 non-stops today.
• The intention is that there are exactly 3.
• How can the client infer that the agent means “only 3”?


Grice: Conversational Implicature

• Implicature: a particular class of licensed inferences.

• Grice (1975): how do conversational participants draw inferences beyond what is actually asserted?

• Cooperative Principle
  – “Make your conversational contribution such as is required, at the stage at which it occurs, by the accepted purpose or direction of the talk exchange in which you are engaged.”
  – A tacit agreement by speakers and listeners to cooperate in communication


Four Gricean Maxims
• Relevance: Be relevant
• Quantity: Do not make your contribution more or less informative than required
• Quality: Try to make your contribution one that is true – don’t say things that are false, or for which you do not have evidence
• Manner: Avoid ambiguity and obscurity. Be brief and orderly


Maxim of Relevance
• A: Is Regina here?
• B: Her car is outside.
• Implicature: yes
  – Hearer thinks: Why would B mention the car? It must be relevant. How is it relevant? If her car is here, then she probably is too.


Maxim of Relevance
• Client: I need to be there for a meeting that’s from the 12th to the 15th.
  – Hearer thinks: The speaker is following conversational maxims, so this must be relevant. How would a meeting be relevant? It is relevant if the client means that he or she must arrive at the destination in time for the meeting and will not leave until after the meeting is over.


Maxim of Quantity
• A: How much money do you have on you?
• B: I have 5 dollars.
  – Implicature: 5 and only 5 dollars, not 6 dollars.
• Similarly, “3 non-stops” implies “only 3”
  – Hearer thinks: If the speaker meant 7 non-stops, he or she would have said 7 non-stops.
• A: Did you do the reading for today’s class?
• B: I’m going to at lunch.
  – Implicature: No.
  – B’s answer would also be true if B planned on doing the reading at lunch AND had already done the reading, but this would violate the Maxim of Quantity.


Dialog System Architecture


Speech Recognition
• Input: acoustic waveform
• Output: string of words
• Basic Components:
  – A phone recognizer
    • acoustic model
    • small sound units [k], [ae], etc.
  – A pronunciation dictionary
    • pronunciation model
    • cat = [k ae t]
  – A grammar telling us what words are likely to follow what words
    • language model
  – A search algorithm to find the best string of words
    • decoder
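To make the division of labor concrete, here is a toy Python sketch of the last three components under strong simplifying assumptions: the acoustic model is skipped entirely (we pretend the phone recognizer has already produced one clean phone string), and the lexicon, bigram probabilities, and names (`LEXICON`, `BIGRAMS`, `decode`) are hypothetical. The decoder is a small dynamic-programming search that uses the pronunciation dictionary to segment phones into words and the bigram language model to choose among homophones.

```python
import math

# Hypothetical pronunciation dictionary (the "pronunciation model"): word -> phones.
# "i"/"eye" and "to"/"two" are homophones, so the phones alone cannot pick the words.
LEXICON = {
    "i": ("ay",),
    "eye": ("ay",),
    "want": ("w", "aa", "n", "t"),
    "to": ("t", "uw"),
    "two": ("t", "uw"),
    "fly": ("f", "l", "ay"),
}

# Hypothetical bigram language model: P(word | previous word); "<s>" = sentence start.
BIGRAMS = {
    ("<s>", "i"): 0.2,
    ("<s>", "eye"): 0.01,
    ("i", "want"): 0.3,
    ("want", "to"): 0.5,
    ("want", "two"): 0.05,
    ("to", "fly"): 0.2,
}
UNSEEN = 1e-6  # crude stand-in for real smoothing of unseen bigrams


def lm_logprob(prev, word):
    return math.log(BIGRAMS.get((prev, word), UNSEEN))


def decode(phones):
    """Find the word string that spells out `phones` with the best LM score.

    Dynamic programming over phone positions; each hypothesis remembers its
    last word so the bigram model can score the next one. Returns None if the
    phones cannot be segmented into dictionary words.
    """
    # best[i][last_word] = (log probability, words) for hypotheses covering phones[:i]
    best = [{} for _ in range(len(phones) + 1)]
    best[0]["<s>"] = (0.0, [])
    for i in range(len(phones)):
        for prev, (score, words) in best[i].items():
            for word, pron in LEXICON.items():
                j = i + len(pron)
                if tuple(phones[i:j]) != pron:
                    continue  # this word's pronunciation does not match here
                new_score = score + lm_logprob(prev, word)
                if word not in best[j] or new_score > best[j][word][0]:
                    best[j][word] = (new_score, words + [word])
    if not best[-1]:
        return None
    return max(best[-1].values(), key=lambda hyp: hyp[0])[1]


phones = ["ay", "w", "aa", "n", "t", "t", "uw", "f", "l", "ay"]
print(decode(phones))  # ['i', 'want', 'to', 'fly']
```

Because “to” and “two” share the pronunciation [t uw], only the language model separates them in this sketch; a real recognizer would also weigh acoustic-model scores over competing phone hypotheses instead of trusting a single phone string.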


Natural Language Understanding

• “NLU”
• “Computational Semantics”
• There are many ways to represent the meaning of a sentence or group of sentences
• For spoken dialog systems, the most common is Frame and Slot Semantics


A Sample Frame
• “Show me morning flights from Boston to SF on Tuesday”

SHOW:
  FLIGHTS:
    ORIGIN:
      CITY: Boston
      DATE: Tuesday
      TIME: morning
    DEST:
      CITY: San Francisco
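A frame like this maps naturally onto nested dictionaries. The short Python sketch below is one possible encoding, not a standard format: the helper names `render` and `get_slot` are hypothetical. It stores the sample frame, prints it in the indented layout above, and reads a slot by following its path.

```python
def render(frame, indent=0):
    """Return the frame as indented 'SLOT: value' lines, two spaces per nesting level."""
    lines = []
    for slot, value in frame.items():
        pad = "  " * indent
        if isinstance(value, dict):
            lines.append(f"{pad}{slot}:")
            lines.extend(render(value, indent + 1))
        else:
            lines.append(f"{pad}{slot}: {value}")
    return lines


def get_slot(frame, *path):
    """Follow a path of slot names down the nested frame."""
    for slot in path:
        frame = frame[slot]
    return frame


# The sample frame for "Show me morning flights from Boston to SF on Tuesday".
frame = {
    "SHOW": {
        "FLIGHTS": {
            "ORIGIN": {"CITY": "Boston", "DATE": "Tuesday", "TIME": "morning"},
            "DEST": {"CITY": "San Francisco"},
        }
    }
}

print("\n".join(render(frame)))
print(get_slot(frame, "SHOW", "FLIGHTS", "DEST", "CITY"))  # San Francisco
```

Note that the sketch relies on Python dictionaries preserving insertion order (guaranteed since Python 3.7), so the printed slots appear in the same order as in the frame above.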


Semantics for a sentence
• “Show me flights from Boston to San Francisco on Tuesday morning”
  – LIST: Show me
  – FLIGHTS: flights
  – ORIGIN: from Boston
  – DESTINATION: to San Francisco
  – DEPARTDATE: on Tuesday
  – DEPARTTIME: morning
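Labels like LIST, ORIGIN, and DEPARTTIME can be produced by a semantic grammar whose rules are named after slots rather than syntactic categories. The Python sketch below approximates such a grammar with regular expressions. That is a deliberate simplification (real semantic grammars are usually context-free), and the rule list, city and day vocabularies, and `parse` function are all hypothetical.

```python
import re

# Hypothetical semantic grammar written as regular expressions. Each rule pairs a
# semantic label with a surface pattern; the first capture group (if any) is the slot value.
CITY = r"(Boston|San Francisco|Denver)"
DAY = r"(Monday|Tuesday|Wednesday|Thursday|Friday|Saturday|Sunday)"
RULES = [
    ("LIST", r"show me|list"),
    ("FLIGHTS", r"flights?"),
    ("ORIGIN", r"from " + CITY),
    ("DESTINATION", r"to " + CITY),
    ("DEPARTDATE", r"on " + DAY),
    ("DEPARTTIME", r"(morning|afternoon|evening)"),
]


def parse(utterance):
    """Return (label, matched text, slot value) triples in the order they occur."""
    found = []
    for label, pattern in RULES:
        # \b keeps "to" inside "Boston" from matching; (?:...) leaves group 1 intact.
        for m in re.finditer(r"\b(?:" + pattern + r")\b", utterance, flags=re.IGNORECASE):
            value = m.group(1) if m.lastindex else m.group(0)
            found.append((m.start(), label, m.group(0), value))
    return [(label, text, value) for _, label, text, value in sorted(found)]


for label, text, value in parse("Show me flights from Boston to San Francisco on Tuesday morning"):
    print(f"{label:12} {text!r:22} -> {value}")
```

Sorting the matches by position recovers the left-to-right labeling shown above; the slot values can then be copied into a frame.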


Generation and TTS
• Generation component
  – Chooses concepts to express to the user
  – Plans how to express these concepts in words
  – Assigns appropriate prosody to the words
• Text-to-Speech (TTS) component
  – Input: words and prosodic annotations
  – Output: Synthesized audio waveform


Generation Component
• Content Planner
  – Decides what information to express to the user
    • ask a question, present an answer, etc.
  – Frequently merged with a dialog manager
• Language Generation
  – Chooses syntactic structures and words to express meaning.
  – Simplest method: Template-Based Generation
    • Pre-specify basic sentences and fill in the blanks as appropriate
    • Can have variables:
      – What time do you want to leave CITY-ORIG?
      – Will you return to CITY-ORIG from CITY-DEST?
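Template-based generation needs little more than string substitution. The Python sketch below is illustrative only: the dialog-act keys and the `fill_template` name are hypothetical, and CITY-ORIG / CITY-DEST are written CITY_ORIG / CITY_DEST so they read as ordinary identifiers.

```python
# Hypothetical templates keyed by the question the content planner wants to ask.
TEMPLATES = {
    "ask_depart_time": "What time do you want to leave {CITY_ORIG}?",
    "ask_return": "Will you return to {CITY_ORIG} from {CITY_DEST}?",
}


def fill_template(act, slots):
    """Fill in the blanks of the template for `act`.

    Extra slots are ignored; a missing slot raises KeyError, which a dialog
    manager could treat as a signal to ask for that value first.
    """
    return TEMPLATES[act].format(**slots)


slots = {"CITY_ORIG": "Boston", "CITY_DEST": "San Francisco"}
print(fill_template("ask_depart_time", slots))  # What time do you want to leave Boston?
print(fill_template("ask_return", slots))       # Will you return to Boston from San Francisco?
```

The appeal of this approach is predictability: every possible output sentence can be reviewed in advance. The cost is that each new phrasing needs its own hand-written template.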


More sophisticated language generation
• Natural Language Generation
• Approach
  – Dialog manager builds a representation of the meaning of the utterance to be expressed
    • Possibly a Slot-Frame representation
  – Passes this to a “generator”
  – Generators have three components
    • Sentence planner
    • Surface realizer
    • Prosody assigner
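The three generator stages can be chained as plain functions. The Python sketch below is a toy stand-in, not the Walker & Rambow architecture: the meaning representation, the aggregation rule in the sentence planner, and the accent heuristic (mark every word that comes from a slot value with the ToBI pitch-accent label H*, treating it as new information) are all assumptions made for illustration.

```python
def sentence_planner(meaning):
    """Choose a clause structure; here origin and destination are aggregated into one sentence."""
    return {
        "subject": f"{meaning['count']} non-stop flights",
        "modifiers": [f"from {meaning['origin']}", f"to {meaning['dest']}"],
    }


def surface_realizer(plan):
    """Turn the sentence plan into a word string."""
    return "There are " + " ".join([plan["subject"], *plan["modifiers"]]) + "."


def prosody_assigner(sentence, focus_words):
    """Pair each word with a pitch-accent label (H*) if it carries new information, else None."""
    return [(w, "H*" if w in focus_words else None) for w in sentence.rstrip(".").split()]


def run_generator(meaning):
    plan = sentence_planner(meaning)
    text = surface_realizer(plan)
    # Hypothetical heuristic: every word of a slot value counts as new information.
    focus = {w for key in ("count", "origin", "dest") for w in str(meaning[key]).split()}
    return text, prosody_assigner(text, focus)


meaning = {"act": "inform", "count": 3, "origin": "Boston", "dest": "San Francisco"}
text, prosody = run_generator(meaning)
print(text)                                    # There are 3 non-stop flights from Boston to San Francisco.
print([w for w, accent in prosody if accent])  # ['3', 'Boston', 'San', 'Francisco']
```

The (word, accent) pairs correspond to the “words and prosodic annotations” that a TTS component takes as input.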


Architecture of a generator
• Based on Walker & Rambow 2002


HCI Constraints on Generation
• Human Computer Interaction (HCI)
• Discourse markers and pronouns
• “Coherence”
  – Without discourse markers:
    Please say the date…
    Please say the start time…
    Please say the duration…
    Please say the subject…
  – With discourse markers:
    First, tell me the date…
    Next, I’ll need the time it starts…
    Thanks. <pause> Now, how long is it supposed to last?
    Just one more thing. I need a brief description…
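One cheap way to get the more coherent style is to choose a discourse marker from each request’s position in the sequence. The Python sketch below borrows its markers from the coherent prompt sequence on this slide; the position rules and the `coherent_prompts` name are hypothetical.

```python
def coherent_prompts(requests):
    """Prefix each request with a discourse marker chosen by its position in the sequence."""
    prompts = []
    last = len(requests) - 1
    for i, request in enumerate(requests):
        if i == 0:
            marker = "First,"
        elif i == last:
            marker = "Just one more thing."
        elif i == 1:
            marker = "Next,"
        else:
            marker = "Thanks. Now,"
        prompts.append(f"{marker} {request}")
    return prompts


requests = [
    "tell me the date.",
    "I'll need the time it starts.",
    "how long is it supposed to last?",
    "I need a brief description.",
]
for prompt in coherent_prompts(requests):
    print(prompt)
```

A production system would likely also vary wording and use pronouns for entities already mentioned. A fixed position-to-marker table like this one only handles the ordering cue.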


HCI Constraints
• “Tapered Prompts”
• Prompts get incrementally smaller
• System: Now, what’s the first company you’d like to add to your watch list?
• Caller: Cisco
• System: What’s the next company name? Or you can say “Finished”
• Caller: IBM
• System: Tell me the next company name or say “Finished”
• Caller: Intel
• System: Next one?
• Caller: General Motors
• System: Next?
• Caller: …
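Tapering can be implemented as an ordered list of progressively shorter prompts indexed by turn. The Python sketch below takes its wording from the example above. The names `TAPERED_PROMPTS`, `prompt_for`, and `collect_watch_list` are hypothetical, and the caller’s answers are supplied as a fixed list standing in for recognized speech.

```python
# Prompt wording taken from the example above, ordered from longest to shortest.
TAPERED_PROMPTS = [
    "Now, what's the first company you'd like to add to your watch list?",
    "What's the next company name? Or you can say \"Finished\"",
    "Tell me the next company name or say \"Finished\"",
    "Next one?",
    "Next?",
]


def prompt_for(turn):
    """Prompt for a 0-based turn; once the list runs out, keep reusing the shortest prompt."""
    return TAPERED_PROMPTS[min(turn, len(TAPERED_PROMPTS) - 1)]


def collect_watch_list(answers):
    """Simulate the dialog over a fixed list of caller answers (a stand-in for ASR output)."""
    watch_list, transcript = [], []
    for turn, answer in enumerate(answers):
        transcript.append(("System", prompt_for(turn)))
        transcript.append(("Caller", answer))
        if answer.lower() == "finished":
            break
        watch_list.append(answer)
    return watch_list, transcript


companies, transcript = collect_watch_list(["Cisco", "IBM", "Intel", "General Motors", "Finished"])
for speaker, utterance in transcript:
    print(f"{speaker}: {utterance}")
print(companies)  # ['Cisco', 'IBM', 'Intel', 'General Motors']
```

The early, longer prompts teach the caller what to say, including the “Finished” escape. The later, shorter ones assume the caller has learned the pattern. A real system might reset to a longer prompt after a recognition error.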


Issues in Building SDS
• What architecture should you use?
• How much discourse history should you collect and maintain?
• What type of dialog strategy will you use?
• What type of confirmation strategy will you use?
• What help will be provided?
• How can you recover from errors?


Next Class
• Acoustics of Speech
• Reading: J&M 7.4
