Post on 19-Dec-2015

HIGGINS: A spoken dialogue system for investigating error handling techniques

Jens Edlund, Gabriel Skantze and Rolf Carlson

Scenario

User: I want to go to the closest subway station.

System: Ok, to the closest subway station. Can you describe where you are now?

User: I have an ATM to my left and a pedestrian crossing in front of me.

System: Can you see some trees to your right and a white building in front of you?

User: Yes

System: Ok, take left after the large building which you have on your left and follow the street until you reach a crossing.

User: Ok, there is a bus station here.

System: That’s right. Take left again after the bus station.

Centre for Speech Technology

[Figure: error handling flow. Labels: User utterance; User reaction/repair; Assume understanding; Non-understanding; No recovery.]

Architecture

The Higgins Project
• Instigated in 2003
• Theoretical goal: Investigate error handling techniques for collaborative dialogue systems
• Practical goal: Build a system in which these can be tested empirically
• This poster presents the current stage of the project.

Error recovery (Non-understanding)

Error recovery
• Map-task-like studies on human-human conversation using ASR in one direction.
• Results show that humans tend not to signal non-understanding.
• This leads to
  • Increased experience of task success
  • Faster recovery from non-understanding
• Skantze, G. (2003). Exploring human error handling strategies: implications for spoken dialogue systems.

Early error detection

Grounding

Late error detection

Error recovery (Misunderstanding)

Late error detection
The need for late error detection is task dependent:
• Sometimes not necessary
• Sometimes reference handling is sufficient
• For slots with multiple possible values, late error detection is necessary
(These also exemplify misunderstanding error recovery.)
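The difference between single-valued and multi-valued slots can be illustrated with a minimal Python sketch based on the poster's Boston and large-building exchanges. The `DialogueState` class, its method names and the slot layout are hypothetical and are not taken from the Higgins implementation. The sketch only shows why a user correction repairs a single-valued slot by itself, while a multi-valued slot keeps the misrecognised value until it is detected and retracted.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class DialogueState:
    """Hypothetical dialogue state, not the Higgins data model."""
    destination: Optional[str] = None                               # single-valued slot
    landmarks: List[Tuple[str, str]] = field(default_factory=list)  # multi-valued slot

    def set_destination(self, city: str) -> None:
        # A new value replaces the old one, so a correction repairs the slot
        # without any explicit error detection.
        self.destination = city

    def add_landmark(self, name: str, position: str) -> None:
        # Values accumulate, so a misrecognised entry stays in the state.
        self.landmarks.append((name, position))

    def retract_landmark(self, name: str, position: str) -> bool:
        # Late error detection: remove an entry that turned out to be wrong.
        try:
            self.landmarks.remove((name, position))
            return True
        except ValueError:
            return False


state = DialogueState()
state.set_destination("London")   # misrecognition of "Boston"
state.set_destination("Boston")   # the user's correction overwrites it

state.add_landmark("large building", "right")      # misrecognition of "left"
state.add_landmark("large building", "left")       # the correction adds a second entry
state.retract_landmark("large building", "right")  # the late repair step
```

Without the final `retract_landmark` call, the state would still claim a large building on the user's right, which is the situation late error detection is meant to catch.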

Grounding
• The amount of feedback from the system should at least depend on
  • Confidence of understanding
  • Consequence of misunderstanding
• The discourse modeller
  • Unifies assertions and tracks referents
  • Resolves ellipses
  • Resolves anaphora
  • Keeps track of who contributed which information
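One way to make feedback depend on both confidence and consequence is sketched below in Python. The function name `choose_feedback`, the risk formula and the two thresholds are illustrative assumptions and are not part of the Higgins system. The poster only states that the amount of feedback should depend on these two factors.

```python
def choose_feedback(confidence: float, consequence: float) -> str:
    """Pick a feedback level. Formula and thresholds are illustrative assumptions."""
    if not 0.0 <= confidence <= 1.0 or not 0.0 <= consequence <= 1.0:
        raise ValueError("confidence and consequence must lie in [0, 1]")
    # Assumed estimate of the cost of silently accepting a wrong interpretation.
    risk = (1.0 - confidence) * consequence
    if risk < 0.1:
        return "accept"    # continue without explicit feedback
    if risk < 0.4:
        return "implicit"  # e.g. "Ok, to the closest subway station."
    return "explicit"      # ask the user to confirm before continuing


low_risk = choose_feedback(confidence=0.95, consequence=0.5)
medium_risk = choose_feedback(confidence=0.6, consequence=0.5)
high_risk = choose_feedback(confidence=0.3, consequence=0.9)
```

Under these assumed thresholds the three calls select no feedback, implicit feedback and an explicit confirmation request, respectively.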

Early error detection

KTH LVCSR

Large-Vocabulary Probabilistic ASR

Machine-learned error detection

Rule-driven semantic/syntactic error detection

Rule-driven discourse error detection

• Which features could be used for detecting word-level errors?
• How are they operationalised?
• Initial tests with memory-based and transformation-based learning suggest:
  • Utterance context
  • Lexical information
  • Word confidences
  • Discourse history
• Skantze, G. & Edlund, J. (2004). Early error detection on word level.
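The four feature groups listed above could be operationalised roughly as in the following Python sketch. Everything in it is an assumption made for illustration: the feature encoding, the toy function-word lexicon, the confidence threshold, the stored instances, and the use of a k-nearest-neighbour classifier with an overlap metric as a stand-in for memory-based learning. It does not reproduce the setup of Skantze & Edlund (2004).

```python
from collections import Counter

FUNCTION_WORDS = {"a", "the", "to", "on", "my", "i", "is", "in"}  # toy lexicon


def word_features(words, confidences, i, previously_mentioned):
    """Build a symbolic feature vector for the word at position i."""
    return (
        words[i - 1] if i > 0 else "<s>",                         # utterance context (left)
        words[i + 1] if i + 1 < len(words) else "</s>",           # utterance context (right)
        "function" if words[i] in FUNCTION_WORDS else "content",  # lexical information
        "high" if confidences[i] >= 0.5 else "low",               # binned word confidence
        "old" if words[i] in previously_mentioned else "new",     # discourse history
    )


def overlap_distance(a, b):
    """Number of feature positions whose values differ."""
    return sum(x != y for x, y in zip(a, b))


def classify(instance, memory, k=3):
    """Majority label among the k stored instances closest to the new one."""
    nearest = sorted(memory, key=lambda item: overlap_distance(instance, item[0]))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


# Hand-made training instances (hypothetical, not experimental data).
memory = [
    (("<s>", "want", "function", "high", "new"), "correct"),
    (("to", "to", "content", "high", "old"), "correct"),
    (("go", "the", "function", "high", "new"), "correct"),
    (("yes", "address", "content", "low", "new"), "error"),
    (("crossing", "now", "content", "low", "new"), "error"),
]

words = ["yes", "crossing", "address", "now"]
confidences = [0.9, 0.3, 0.2, 0.6]  # made-up confidence scores
instance = word_features(words, confidences, 2, previously_mentioned=set())
label = classify(instance, memory)
```

Here the low-confidence content word "address" is labelled as a likely error because its two closest stored instances are errors.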

ASR post-processing

PICKERING: Robust interpretation
• Rule-based semantic parsing
• Finds partial results with largest coverage
• Allows insertions inside phrases
• Allows non-agreement if necessary
• Evaluation results show robustness against inserted content words
• Skantze, G. & Edlund, J. (2004). Robust interpretation in the Higgins spoken dialogue system.
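Two of the properties listed above, insertions inside phrases and selection of the partial results with the largest coverage, can be sketched in Python as follows. This is not the PICKERING algorithm: the toy grammar in `RULES`, the output labels, the function names and the `max_insertions` limit are all hypothetical, and non-agreement is not modelled.

```python
RULES = {  # hypothetical toy grammar: pattern words -> semantic label
    ("large", "building"): "object(building, size=large)",
    ("on", "my", "left"): "position(left)",
    ("on", "my", "right"): "position(right)",
}


def match_rule(words, start, pattern, max_insertions=1):
    """Match pattern starting at words[start], skipping at most max_insertions
    inserted words in total. Returns the end index (exclusive) or None."""
    if words[start] != pattern[0]:
        return None
    pos, skipped = start + 1, 0
    for target in pattern[1:]:
        while pos < len(words) and words[pos] != target:
            pos += 1
            skipped += 1
            if skipped > max_insertions:
                return None
        if pos == len(words):
            return None
        pos += 1
    return pos


def interpret(words, max_insertions=1):
    """Return the non-overlapping partial results that cover the most words."""
    n = len(words)
    # best[i] = (number of covered words, labels) for the suffix words[i:]
    best = [(0, [])] * (n + 1)
    for i in range(n - 1, -1, -1):
        best[i] = best[i + 1]  # option: leave words[i] uncovered
        for pattern, label in RULES.items():
            end = match_rule(words, i, pattern, max_insertions)
            if end is not None:
                covered = len(pattern) + best[end][0]
                if covered > best[i][0]:
                    best[i] = (covered, [label] + best[end][1])
    return best[0][1]


utterance = "i have a large red building on my left".split()
result = interpret(utterance)
strict = interpret(utterance, max_insertions=0)
```

With one insertion allowed, the inserted word "red" does not block the "large building" phrase, so `result` contains both partial results. With `max_insertions=0` only the position phrase is found.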

[Figure: system architecture. Modules: ASR; Utterance interpretation; Discourse modelling; Decision making; Generation; TTS.]

• Distributed modular system
• Goals:
  • A module for every task that is reasonably well-defined
  • Separation of the domain specific (XML) and the domain independent (module code)
• Incremental processing allows for:
  • Rapid feedback
  • Flexible turn-taking
  • Faster processing
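Incremental processing between modules can be sketched with Python generators, where each module passes on partial results as soon as they are available. The module functions, the `CONCEPTS` table and the feedback strings below are hypothetical stand-ins and do not describe the Higgins modules or their message format. In Higgins the domain-specific part is kept in XML, whereas this sketch uses a plain dictionary for brevity.

```python
from typing import Dict, Iterable, Iterator, List

# Domain-specific knowledge, kept apart from the module code (assumed content).
CONCEPTS: Dict[str, str] = {
    "subway": "destination=subway_station",
    "atm": "landmark=atm",
}

consumed: List[str] = []  # records how much input has been read so far


def asr(audio_chunks: Iterable[str]) -> Iterator[str]:
    """Stand-in recogniser: emits one word hypothesis per incoming chunk."""
    for chunk in audio_chunks:
        consumed.append(chunk)
        yield chunk.lower()


def interpreter(words: Iterable[str]) -> Iterator[str]:
    """Emits a partial interpretation as soon as a known concept is heard."""
    for word in words:
        if word in CONCEPTS:
            yield CONCEPTS[word]


def feedback(interpretations: Iterable[str]) -> Iterator[str]:
    """Produces feedback per partial result, without waiting for the utterance end."""
    for concept in interpretations:
        yield f"Ok, {concept}"


utterance = ["I", "see", "an", "ATM", "near", "the", "subway"]
pipeline = feedback(interpreter(asr(utterance)))

first = next(pipeline)            # available before the utterance has ended
words_read_so_far = len(consumed)
remaining = list(pipeline)        # the rest is processed as it arrives
```

The first feedback item is produced after only four of the seven words have been read, which is the kind of rapid feedback that incremental processing makes possible.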

U1: I want to go to Boston
S1: To London...
U2: No, to Boston!

U1: How much is the big apartment?
S1: The small apartment is […]
U2a: No, the big apartment!
U2b: And the big apartment?

U1: I have a large building on my left
S1: A large building on your right
U2a: No, on my left!
U2b: And on my left

Misunderstanding

U1: There is a large red building
S2: What material is the large building made of?

O1: Do you see a wooden house in front of you?
U1: YES CROSSING ADDRESS NOW (I pass the wooden house now)
O2: Can you see a restaurant sign?

[Figure: experimental setup. Labels: User; Operator; Listens; Speaks; Reads; ASR; Vocoder.]

GALATEA: Discourse modelling