Spoken Dialog Systems
Lecture 3
Spoken Language Processing
Prof. Andrew Rosenberg
2
Overview
• Conversational Agents
  – Automatic Speech Recognition (ASR)
  – Natural Language Understanding (NLU)
  – Generation
• Cooperative Question-Answering
• Dialog Representations
3
Spoken Dialog Systems
• Other terms:
  – Interactive Voice Response (IVR) Systems
  – Conversational Agents
  – Dialog Systems
• Applications
  – Travel Arrangements (Amtrak, United Airlines)
  – Telephone call routing
  – Tutoring
  – Communicating with Robots
  – Anything with limited screen or keyboard
4
Travel Assistant: Communicator
5
Call Routing: AT&T HMIHY (How May I Help You?)
6
Tutorial: ITSPOKE
7
Conversational Structure
• Telephone Conversation
  – Stage 1: Enter a conversation
  – Stage 2: Identification
  – Stage 3: Establish a joint willingness to converse
  – Stage 4: First topic is raised, often by the caller.
8
Conversational confusion
• Customer: (rings)
• Operator: Directory Enquiries, for which town please?
• Customer: Could you give me the phone number of um: Mrs. um: Smithson?
• Operator: Yes, which town is this at please?
• Customer: Huddleston.
• Operator: Yes. And the name again?
• Customer: Mrs. Smithson
9
Customer confusion
• Agent: And, what day in May did you want to travel?
• Client: OK, uh, I need to be there for a meeting that’s from the 12th to the 15th.
• The client didn’t answer the question.
• Meaning of the sentence
  – Meeting
    • start-of-meeting: 12th
    • end-of-meeting: 15th
  – Doesn’t say anything about flying
• How does the agent infer that the client is informing him/her of travel dates, or requirements?
10
Customer Confusion
• Agent: ….there’s 3 non-stops today
  – Literally true even if there are, in fact, more than 3 non-stops today.
  – The intention is that there are exactly 3.
• How can the client infer that the agent means “only 3”?
11
Grice: Conversational Implicature
• Implicature: a particular class of licensed inferences.
• Grice (1975): how do conversational participants draw inferences beyond what is actually asserted?
• Cooperative Principle
  – “Make your conversational contribution such as is required, at the stage at which it occurs, by the accepted purpose or direction of the talk exchange in which you are engaged.”
  – A tacit agreement by speakers and listeners to cooperate in communication
12
Four Gricean Maxims
• Relevance: Be relevant
• Quantity: Do not make your contribution more or less informative than required
• Quality: Try to make your contribution one that is true: don’t say things that are false, or for which you do not have evidence
• Manner: Avoid ambiguity and obscurity. Be brief and orderly
13
Maxim of Relevance
• A: Is Regina here?
• B: Her car is outside.
• Implicature: yes
  – Hearer thinks: Why would B mention the car? It must be relevant. How is it relevant? If her car is here, then she probably is too.
14
Maxim of Relevance
• Client: I need to be there for a meeting that’s from the 12th to the 15th.
  – Hearer thinks: The speaker is following conversational maxims, so this must be relevant. How would a meeting be relevant? It is relevant if the client means that he or she must arrive at the destination in time for the meeting and will not leave until after the meeting is over.
15
Maxim of Quantity
• A: How much money do you have on you?
• B: I have 5 dollars.
  – Implicature: 5 and only 5 dollars, not 6 dollars.
• Similarly, 3 non-stops implies “only 3”
  – Hearer thinks: If the speaker meant 7 non-stops, he or she would have said 7 non-stops
• A: Did you do the reading for today’s class?
• B: I’m going to at lunch.
  – Implicature: No.
  – B’s answer would be true if B planned on doing the reading at lunch AND had already done the reading, but this would violate the Maxim of Quantity
16
Dialog System Architecture
17
Speech Recognition
• Input: acoustic waveform
• Output: string of words
• Basic Components:
  – A phone recognizer
    • acoustic model
    • small sound units [k][ae], etc.
  – A pronunciation dictionary
    • pronunciation model
    • cat = [k ae t]
  – A grammar telling us what words are likely to follow what words
    • language model
  – A search algorithm to find the best string of words
    • decoder
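How the components fit together can be sketched with a toy decoder. This is only an illustration under strong assumptions: the acoustic model is skipped (the input is already a clean phone sequence), and the dictionary entries, bigram probabilities, and the names PRONUNCIATIONS, BIGRAMS, UNSEEN, and decode are all invented for the example. Real recognizers search over noisy acoustic scores rather than exact phone matches.

```python
import math
from functools import lru_cache

# Hypothetical pronunciation dictionary (pronunciation model): word -> phones.
PRONUNCIATIONS = {
    "cat": ("k", "ae", "t"),
    "to": ("t", "uw"),
    "two": ("t", "uw"),
    "boston": ("b", "aa", "s", "t", "ax", "n"),
}

# Hypothetical bigram language model: P(word | previous word).
# "<s>" marks the start of the utterance; all probabilities are invented.
BIGRAMS = {
    ("<s>", "to"): 0.3,
    ("<s>", "two"): 0.2,
    ("to", "boston"): 0.4,
    ("two", "cat"): 0.1,
}
UNSEEN = 1e-4  # floor probability for bigrams missing from the table


def decode(phones):
    """Return the most probable word list covering all phones, or None."""
    phones = tuple(phones)

    @lru_cache(maxsize=None)
    def best_from(start, prev_word):
        # Best (log probability, words) for phones[start:], given the previous word.
        if start == len(phones):
            return 0.0, ()
        best = None
        for word, pron in PRONUNCIATIONS.items():
            end = start + len(pron)
            if phones[start:end] != pron:
                continue  # this word's pronunciation does not match here
            rest = best_from(end, word)
            if rest is None:
                continue  # the remaining phones cannot be covered
            score = math.log(BIGRAMS.get((prev_word, word), UNSEEN)) + rest[0]
            if best is None or score > best[0]:
                best = (score, (word,) + rest[1])
        return best

    result = best_from(0, "<s>")
    return None if result is None else list(result[1])
```

In this sketch "to" and "two" share the pronunciation [t uw], so only the language model can choose between them: decode prefers "to" before "boston" and "two" before "cat".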
18
Natural Language Understanding
• “NLU”
• “Computational Semantics”
• There are many ways to represent the meaning of a sentence, or group of sentences
• For spoken dialog systems, the most common is Frame and Slot Semantics
19
A Sample Frame
• “Show me morning flights from Boston to SF on Tuesday”

SHOW:
  FLIGHTS:
    ORIGIN:
      CITY: Boston
      DATE: Tuesday
      TIME: morning
    DEST:
      CITY: San Francisco
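One way to hold such a frame in a program is a nested dictionary. The sketch below assumes the nesting SHOW > FLIGHTS > ORIGIN (CITY, DATE, TIME) and DEST (CITY); the helper names get_slot and missing_slots and the colon-separated path notation are illustrative choices, not part of the lecture.

```python
# The sample frame as a nested dictionary; slot names follow the slide.
frame = {
    "SHOW": {
        "FLIGHTS": {
            "ORIGIN": {"CITY": "Boston", "DATE": "Tuesday", "TIME": "morning"},
            "DEST": {"CITY": "San Francisco"},
        }
    }
}


def get_slot(frame, path):
    """Follow a colon-separated slot path; return None if a slot is unfilled."""
    node = frame
    for key in path.split(":"):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def missing_slots(frame, required):
    """List the required slot paths that have no value yet."""
    return [path for path in required if get_slot(frame, path) is None]
```

A dialog manager could call missing_slots to decide which question to ask next, for example asking for a date when "SHOW:FLIGHTS:DEST:DATE" is required but unfilled.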
20
How do we generate these semantics?
• Many Methods:
  – Simplest: “semantic grammars”
  – E.g. a simple Context Free Grammar
    • Left-hand sides of rules are semantic categories
      – LIST -> show me | I want | can I see | …
      – DEPARTURETIME -> (after|around|before) HOUR | morning | afternoon | evening
      – HOUR -> one | two | three | … | twelve (am|pm)
      – FLIGHTS -> (a) flight | flights
      – ORIGIN -> from CITY
      – DESTINATION -> to CITY
      – CITY -> Boston | San Francisco | Denver | Washington
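The rules above can be approximated with regular expressions, one pattern per semantic category. This is a rough sketch, not a real context-free parser: it only spots each category somewhere in the utterance. The names RULES and parse are invented here, HOUR lists only the hours the slide spells out (the rest are elided there), and treating (am|pm) as optional after any hour is an assumption.

```python
import re

CITY = r"(?:Boston|San Francisco|Denver|Washington)"
# Only the hours named on the slide; the slide elides the others.
HOUR = r"(?:one|two|three|twelve)(?: (?:am|pm))?"

# One pattern per semantic category (the left-hand sides of the rules).
RULES = {
    "LIST": r"\b(?:show me|I want|can I see)\b",
    "FLIGHTS": r"\b(?:a )?flights?\b",
    "ORIGIN": rf"\bfrom ({CITY})\b",
    "DESTINATION": rf"\bto ({CITY})\b",
    "DEPARTURETIME": rf"\b(?:(?:after|around|before) {HOUR}|morning|afternoon|evening)\b",
}


def parse(utterance):
    """Return a dict mapping each category found to the text that matched it."""
    slots = {}
    for category, pattern in RULES.items():
        match = re.search(pattern, utterance, flags=re.IGNORECASE)
        if match:
            # Keep the captured CITY when the rule has one, else the whole match.
            slots[category] = match.group(1) if match.groups() else match.group(0)
    return slots


print(parse("Show me flights from Boston to San Francisco on Tuesday morning"))
```

Categories with no match are simply absent from the result, which gives a later stage an easy way to see which slots are still unfilled.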
21
Semantics for a sentence
LIST      FLIGHTS   ORIGIN        DESTINATION        DEPARTDATE   DEPARTTIME
Show me   flights   from Boston   to San Francisco   on Tuesday   morning
22
Generation and TTS
• Generation component
  – Chooses concepts to express to the user
  – Plans how to express these concepts in words
  – Assigns appropriate prosody to the words
• Text-to-Speech (TTS) component
  – Input: words and prosodic annotations
  – Output: Synthesized audio waveform
23
Generation Component
• Content Planner
  – Decides what information to express to user
    • ask a question, present an answer, etc.
  – Frequently merged with a dialog manager
• Language Generation
  – Chooses syntactic structures and words to express meaning.
  – Simplest method: Template Based Generation
    • Pre-specify basic sentences and fill in the blanks as appropriate
    • Can have variables:
      – What time do you want to leave CITY-ORIG?
      – Will you return to CITY-ORIG from CITY-DEST?
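Template-based generation can be sketched in a few lines. The two templates are the ones from the slide; the template keys, the TEMPLATES and VARIABLE names, and the realize function are hypothetical, and the sketch assumes variables are always written as two upper-case words joined by a hyphen.

```python
import re

# Pre-specified sentences with variables, as on the slide.
TEMPLATES = {
    "ask_departure_time": "What time do you want to leave CITY-ORIG?",
    "confirm_return": "Will you return to CITY-ORIG from CITY-DEST?",
}

# Assumed variable syntax: two upper-case words joined by a hyphen.
VARIABLE = re.compile(r"\b[A-Z]+-[A-Z]+\b")


def realize(template_name, slots):
    """Fill every variable in the named template from the slots dict."""

    def fill(match):
        name = match.group(0)
        if name not in slots:
            raise KeyError(f"no value for template variable {name}")
        return slots[name]

    return VARIABLE.sub(fill, TEMPLATES[template_name])


print(realize("confirm_return", {"CITY-ORIG": "Boston", "CITY-DEST": "San Francisco"}))
```

Raising an error on a missing variable, rather than leaving the placeholder in the output, keeps the system from reading a raw variable name aloud to the caller.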
24
More sophisticated language generation
• Natural Language Generation
• Approach
  – Dialog manager builds representation of the meaning of the utterance to be expressed
    • Possibly Slot-Frame representation
  – Passes this to a “generator”
  – Generators have three components
    • Sentence planner
    • Surface realizer
    • Prosody assigner
25
Architecture of a generator
• Based on Walker & Rambow 2002
26
HCI Constraints on Generation
• Human Computer Interaction (HCI)
• Discourse markers and pronouns
• “Coherence”

Please say the date…
Please say the start time…
Please say the duration…
Please say the subject…

First, tell me the date…
Next, I’ll need the time it starts…
Thanks. <pause> Now, how long is it supposed to last?
Just one more thing. I need a brief description
27
HCI Constraints
• “Tapered Prompts”
• Prompts get incrementally smaller
• System: Now, what’s the first company you’d like to add to your watch list?
• Caller: Cisco
• System: What’s the next company name? Or you can say “Finished”
• Caller: IBM
• System: Tell me the next company name or say “Finished”
• Caller: Intel
• System: Next One?
• Caller: General Motors
• System: Next?
• Caller: …
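Tapering can be sketched as a list of prompts indexed by turn. The prompt texts are taken from the dialog above; the names TAPERED_PROMPTS, prompt_for_turn, and collect_companies are invented, and staying on the shortest prompt once the list runs out is an assumption.

```python
# Prompts from the example dialog, longest first.
TAPERED_PROMPTS = [
    "Now, what's the first company you'd like to add to your watch list?",
    'What\'s the next company name? Or you can say "Finished"',
    'Tell me the next company name or say "Finished"',
    "Next One?",
    "Next?",
]


def prompt_for_turn(turn):
    """Return the prompt for a 0-based turn; reuse the shortest one afterwards."""
    return TAPERED_PROMPTS[min(turn, len(TAPERED_PROMPTS) - 1)]


def collect_companies(caller_replies):
    """Simulate the dialog over scripted replies and return the watch list."""
    companies = []
    for turn, reply in enumerate(caller_replies):
        print("System:", prompt_for_turn(turn))
        print("Caller:", reply)
        if reply.strip().lower() == "finished":
            break
        companies.append(reply)
    return companies


collect_companies(["Cisco", "IBM", "Intel", "General Motors", "Finished"])
```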
28
Issues in Building SDS
• What architecture should you use?
• How much discourse history should you collect and maintain?
• What type of dialog strategy will you use?
• What type of confirmation strategy will you use?
• What help will be provided?
• How can you recover from errors?
29
Next Class
• Acoustics of Speech
• Reading: J&M 7.4