Non-Native Users in the Let’s Go!! Spoken Dialogue System: Dealing with Linguistic Mismatch
![Page 1: Non-Native Users in the Let ’ s Go!! Spoken Dialogue System: Dealing with Linguistic Mismatch](https://reader035.fdocuments.us/reader035/viewer/2022081520/568167f1550346895ddd63bb/html5/thumbnails/1.jpg)
Non-Native Users in the Let’s Go!! Spoken Dialogue System:
Dealing with Linguistic Mismatch
Antoine Raux & Maxine Eskenazi
Language Technologies Institute
Carnegie Mellon University
![Page 2: Non-Native Users in the Let ’ s Go!! Spoken Dialogue System: Dealing with Linguistic Mismatch](https://reader035.fdocuments.us/reader035/viewer/2022081520/568167f1550346895ddd63bb/html5/thumbnails/2.jpg)
Background
• Speech-enabled systems use models of the user’s language
• Such models are tailored for native speech
• Great loss of performance for non-native users who don’t follow typical native patterns
![Page 3: Non-Native Users in the Let ’ s Go!! Spoken Dialogue System: Dealing with Linguistic Mismatch](https://reader035.fdocuments.us/reader035/viewer/2022081520/568167f1550346895ddd63bb/html5/thumbnails/3.jpg)
Previous Work on Non-Native Speech Recognition
• Assumes knowledge about/data from a specific non-native population
• Often based on read speech
• Focuses on acoustic mismatch:
  • Acoustic adaptation
  • Multilingual acoustic models
![Page 4: Non-Native Users in the Let ’ s Go!! Spoken Dialogue System: Dealing with Linguistic Mismatch](https://reader035.fdocuments.us/reader035/viewer/2022081520/568167f1550346895ddd63bb/html5/thumbnails/4.jpg)
Linguistic Particularities of Non-Native Speakers
• Non-native speakers might use different lexical and syntactic constructs
• Non-native speakers are in a dynamic process of L2 acquisition
![Page 5: Non-Native Users in the Let ’ s Go!! Spoken Dialogue System: Dealing with Linguistic Mismatch](https://reader035.fdocuments.us/reader035/viewer/2022081520/568167f1550346895ddd63bb/html5/thumbnails/5.jpg)
Outline of the Talk
• Baseline system and data collection
• Study of non-native/native mismatch and effect of additional non-native data
• Adaptive lexical entrainment
![Page 6: Non-Native Users in the Let ’ s Go!! Spoken Dialogue System: Dealing with Linguistic Mismatch](https://reader035.fdocuments.us/reader035/viewer/2022081520/568167f1550346895ddd63bb/html5/thumbnails/6.jpg)
The CMU Let’s Go!! System: Bus Schedule Information for the Pittsburgh Area
• ASR: Sphinx II
• Parsing: Phoenix
• Dialogue Management: RavenClaw
• Speech Synthesis: Festival
• Hub: Galaxy
• NLG: Rosetta
![Page 7: Non-Native Users in the Let ’ s Go!! Spoken Dialogue System: Dealing with Linguistic Mismatch](https://reader035.fdocuments.us/reader035/viewer/2022081520/568167f1550346895ddd63bb/html5/thumbnails/7.jpg)
Data Collection
• Baseline system accessible since February 2003
• Experiments with scenarios
• Publicized the phone number inside CMU in Fall 2003
![Page 8: Non-Native Users in the Let ’ s Go!! Spoken Dialogue System: Dealing with Linguistic Mismatch](https://reader035.fdocuments.us/reader035/viewer/2022081520/568167f1550346895ddd63bb/html5/thumbnails/8.jpg)
Data Collection Web Page
![Page 9: Non-Native Users in the Let ’ s Go!! Spoken Dialogue System: Dealing with Linguistic Mismatch](https://reader035.fdocuments.us/reader035/viewer/2022081520/568167f1550346895ddd63bb/html5/thumbnails/9.jpg)
Data
• Directed experiments: 134 calls
  • 17 non-native speakers (5 from India, 7 from Japan, 5 others)
• Spontaneous: 30 calls
• Total: 1768 utterances
• Evaluation Data:
  • Non-Native: 449 utterances
  • Native: 452 utterances
![Page 10: Non-Native Users in the Let ’ s Go!! Spoken Dialogue System: Dealing with Linguistic Mismatch](https://reader035.fdocuments.us/reader035/viewer/2022081520/568167f1550346895ddd63bb/html5/thumbnails/10.jpg)
Speech Recognition Baseline
• Acoustic Models:
  • semi-continuous HMMs (codebook size: 256)
  • 4000 tied states
  • trained on CMU Communicator data
• Language Model:
  • class-based backoff 3-gram
  • trained on 3074 utterances from native calls
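To illustrate what "class-based" means for the 3-gram language model, here is a minimal sketch that maps some words to class tokens before counting trigrams. The classes and their members (`WORD_CLASSES`) are hypothetical; the slides do not list the classes actually used, and backoff smoothing is left out.

```python
from collections import Counter

# Hypothetical word classes for illustration only; the slides do not
# say which classes the Let's Go!! language model actually used.
WORD_CLASSES = {
    "61c": "[ROUTE]",
    "28x": "[ROUTE]",
    "airport": "[PLACE]",
    "downtown": "[PLACE]",
}


def to_class_tokens(utterance):
    """Lowercase, split on whitespace, and replace class members by their class token."""
    return [WORD_CLASSES.get(word, word) for word in utterance.lower().split()]


def trigram_counts(utterances):
    """Count trigrams over class tokens, with sentence boundary padding."""
    counts = Counter()
    for utterance in utterances:
        tokens = ["<s>", "<s>"] + to_class_tokens(utterance) + ["</s>"]
        for i in range(len(tokens) - 2):
            counts[tuple(tokens[i:i + 3])] += 1
    return counts
```

Because both route names collapse to `[ROUTE]`, a trigram seen with one route contributes evidence for every other route in the class, which helps when training data is limited.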
![Page 11: Non-Native Users in the Let ’ s Go!! Spoken Dialogue System: Dealing with Linguistic Mismatch](https://reader035.fdocuments.us/reader035/viewer/2022081520/568167f1550346895ddd63bb/html5/thumbnails/11.jpg)
Speech Recognition Results
Word Error Rate:
• Native: 20.4%
• Non-Native: 52.0%
Causes of discrepancy:
• Acoustic mismatch (accent)
• Linguistic mismatch (word choice, syntax)
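For reference, word error rate is conventionally computed as the word-level edit distance between the reference transcript and the recognizer output, divided by the number of reference words. The sketch below shows that standard definition for a single utterance; the function name is illustrative, and it is not the scoring tool used for the figures above.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)
```

Over a test set, the usual practice is to sum the edit distances and the reference lengths across utterances before dividing, rather than averaging per-utterance rates.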
![Page 12: Non-Native Users in the Let ’ s Go!! Spoken Dialogue System: Dealing with Linguistic Mismatch](https://reader035.fdocuments.us/reader035/viewer/2022081520/568167f1550346895ddd63bb/html5/thumbnails/12.jpg)
Language Model Performance
[Figure: three bar charts comparing the Native and Non-Native sets: Perplexity; OOV Rate (% tokens); Rate of utterances with OOV (% utterances)]
Evaluation on transcripts. Initial model: 3074 native utterances
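The two OOV metrics plotted here can be computed directly from transcripts and the recognizer vocabulary. A minimal sketch, assuming whitespace tokenization and case-insensitive matching (both are assumptions, as is the function name):

```python
def oov_stats(vocabulary, utterances):
    """Return (% of tokens that are OOV, % of utterances containing an OOV token)."""
    vocab = {word.lower() for word in vocabulary}
    total_tokens = 0
    oov_tokens = 0
    utterances_with_oov = 0
    for utterance in utterances:
        tokens = utterance.lower().split()
        missing = [token for token in tokens if token not in vocab]
        total_tokens += len(tokens)
        oov_tokens += len(missing)
        if missing:
            utterances_with_oov += 1
    token_rate = 100.0 * oov_tokens / total_tokens if total_tokens else 0.0
    utterance_rate = 100.0 * utterances_with_oov / len(utterances) if utterances else 0.0
    return token_rate, utterance_rate
```

The utterance-level rate is the stricter of the two: a single unknown word is enough to count the whole utterance.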
![Page 13: Non-Native Users in the Let ’ s Go!! Spoken Dialogue System: Dealing with Linguistic Mismatch](https://reader035.fdocuments.us/reader035/viewer/2022081520/568167f1550346895ddd63bb/html5/thumbnails/13.jpg)
Language Model Performance
Adding non-native data: 3074 native + 1308 non-native utterances
[Figure: three bar charts comparing the initial (native) model and the mixed model on the Native and Non-Native sets: Perplexity; OOV Rate (% tokens); Rate of utterances with OOV (% utterances)]
![Page 14: Non-Native Users in the Let ’ s Go!! Spoken Dialogue System: Dealing with Linguistic Mismatch](https://reader035.fdocuments.us/reader035/viewer/2022081520/568167f1550346895ddd63bb/html5/thumbnails/14.jpg)
Natural Language Understanding
• Grammar manually written incrementally, as the system was being developed
• Initially built with native speakers in mind
• Phoenix: robust parser (less sensitive to non-standard expressions)
![Page 15: Non-Native Users in the Let ’ s Go!! Spoken Dialogue System: Dealing with Linguistic Mismatch](https://reader035.fdocuments.us/reader035/viewer/2022081520/568167f1550346895ddd63bb/html5/thumbnails/15.jpg)
Grammar Coverage
[Figure: two bar charts comparing the Native and Non-Native sets: Parse Word Coverage (% words not covered by parse); Parse Utterance Coverage (% utterances not fully parsed)]
Initial grammar:
• Manually written for native utterances
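The two coverage metrics in the charts can be sketched as follows, assuming the parser output has already been reduced to one boolean per word saying whether that word was covered by the parse. That input format and the function name are assumptions made for illustration; they are not Phoenix's actual output.

```python
def coverage_stats(parsed_utterances):
    """Return (% words not covered by the parse, % utterances not fully parsed).

    Each utterance is a list of booleans, one per word, True when the
    parser covered that word.
    """
    total_words = sum(len(flags) for flags in parsed_utterances)
    uncovered_words = sum(flags.count(False) for flags in parsed_utterances)
    partial_utterances = sum(1 for flags in parsed_utterances if not all(flags))
    pct_words = 100.0 * uncovered_words / total_words if total_words else 0.0
    pct_utterances = (100.0 * partial_utterances / len(parsed_utterances)
                      if parsed_utterances else 0.0)
    return pct_words, pct_utterances
```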
![Page 16: Non-Native Users in the Let ’ s Go!! Spoken Dialogue System: Dealing with Linguistic Mismatch](https://reader035.fdocuments.us/reader035/viewer/2022081520/568167f1550346895ddd63bb/html5/thumbnails/16.jpg)
Grammar Coverage
[Figure: two bar charts comparing the Native and Non-Native sets: Parse Word Coverage (% words not covered by parse); Parse Utterance Coverage (% utterances not fully parsed)]
Grammar designed to accept some non-native patterns:
• “reach” = “arrive”
• “What is the next bus?” = “When is the next bus?”
![Page 17: Non-Native Users in the Let ’ s Go!! Spoken Dialogue System: Dealing with Linguistic Mismatch](https://reader035.fdocuments.us/reader035/viewer/2022081520/568167f1550346895ddd63bb/html5/thumbnails/17.jpg)
Relative Improvement due to Additional Data
[Figure: bar chart of % improvement on the Native Set and Non-Native Set for % OOV, % utterances with OOV, Perplexity, Word Coverage, and Utterance Coverage]
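All five metrics here are "lower is better" as plotted earlier (the coverage charts show the share not covered), so relative improvement is presumably the reduction expressed as a percentage of the baseline value. A one-function sketch under that assumption; the slides do not state the formula.

```python
def relative_improvement(baseline, improved):
    """Percentage reduction of a lower-is-better metric relative to its baseline."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return 100.0 * (baseline - improved) / baseline
```

For example, with made-up values (not taken from the slides), a metric dropping from 40.0 to 30.0 is a 25% relative improvement.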
![Page 18: Non-Native Users in the Let ’ s Go!! Spoken Dialogue System: Dealing with Linguistic Mismatch](https://reader035.fdocuments.us/reader035/viewer/2022081520/568167f1550346895ddd63bb/html5/thumbnails/18.jpg)
Effect of Additional Data on Speech Recognition
[Figure: bar chart of Word Error Rate (%) on the Native Set and Non-Native Set for the Native Model and the Mixed Model]
![Page 19: Non-Native Users in the Let ’ s Go!! Spoken Dialogue System: Dealing with Linguistic Mismatch](https://reader035.fdocuments.us/reader035/viewer/2022081520/568167f1550346895ddd63bb/html5/thumbnails/19.jpg)
Adaptive Lexical Entrainment
• “If you can’t adapt the system, adapt the user”
• System should use the same expressions it expects from the user
• But non-native speakers might not master all target expressions
• Use expressions that are close to the non-native speaker’s language
• Use prosody to stress incorrect words
![Page 20: Non-Native Users in the Let ’ s Go!! Spoken Dialogue System: Dealing with Linguistic Mismatch](https://reader035.fdocuments.us/reader035/viewer/2022081520/568167f1550346895ddd63bb/html5/thumbnails/20.jpg)
Adaptive Lexical Entrainment: Example
U: I want to go the airport
S: Did you mean: I want to go TO the airport?
![Page 21: Non-Native Users in the Let ’ s Go!! Spoken Dialogue System: Dealing with Linguistic Mismatch](https://reader035.fdocuments.us/reader035/viewer/2022081520/568167f1550346895ddd63bb/html5/thumbnails/21.jpg)
Adaptive Lexical Entrainment: Algorithm
Pipeline: ASR Hypothesis + Target Prompts → DP-based Alignment → Prompt Selection → Emphasis → Confirmation Prompt
• ASR hypothesis: “I want to go the airport”
![Page 26: Non-Native Users in the Let ’ s Go!! Spoken Dialogue System: Dealing with Linguistic Mismatch](https://reader035.fdocuments.us/reader035/viewer/2022081520/568167f1550346895ddd63bb/html5/thumbnails/26.jpg)
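Adaptive Lexical Entrainment: Algorithm
Pipeline: ASR Hypothesis + Target Prompts → DP-based Alignment → Prompt Selection → Emphasis → Confirmation Prompt
• ASR hypothesis: “I want to go the airport”
• Target prompts: “I’d like to go to the airport”, “I want to go to the airport”
• Confirmation prompt: “Did you mean: I want to go to the airport?”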
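The pipeline above can be sketched in a few dozen lines. This is a minimal illustration, assuming word-level Levenshtein alignment as the DP step, minimum edit distance as the selection criterion, and upper-casing as a stand-in for prosodic emphasis; the slides do not specify the cost function or how emphasis is passed to the synthesizer, and the function names are hypothetical.

```python
def align(hyp, target):
    """Word-level Levenshtein alignment (case-insensitive).

    Returns (distance, ops) where each op is (tag, hyp_word, target_word)
    and tag is "match", "sub", "ins" (target word missing from the
    hypothesis) or "del" (extra hypothesis word).
    """
    def same(a, b):
        return a.lower() == b.lower()

    n, m = len(hyp), len(target)
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dist[i][0] = i
    for j in range(1, m + 1):
        dist[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if same(hyp[i - 1], target[j - 1]) else 1
            dist[i][j] = min(dist[i - 1][j - 1] + cost,
                             dist[i - 1][j] + 1,
                             dist[i][j - 1] + 1)
    # Backtrace from the bottom-right corner to recover the operations.
    ops = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            cost = 0 if same(hyp[i - 1], target[j - 1]) else 1
            if dist[i][j] == dist[i - 1][j - 1] + cost:
                ops.append(("match" if cost == 0 else "sub", hyp[i - 1], target[j - 1]))
                i, j = i - 1, j - 1
                continue
        if j > 0 and dist[i][j] == dist[i][j - 1] + 1:
            ops.append(("ins", None, target[j - 1]))
            j -= 1
        else:
            ops.append(("del", hyp[i - 1], None))
            i -= 1
    ops.reverse()
    return dist[n][m], ops


def confirmation_prompt(asr_hypothesis, target_prompts):
    """Pick the closest target prompt and emphasize the words the user got wrong.

    Returns None when the best target matches the hypothesis exactly,
    i.e. when no entrainment prompt is triggered.
    """
    hyp = asr_hypothesis.split()
    best_distance, best_ops = min(
        (align(hyp, target.split()) for target in target_prompts),
        key=lambda result: result[0],
    )
    if best_distance == 0:
        return None
    # Upper-case stands in for prosodic stress on substituted/inserted words.
    words = [target_word.upper() if tag in ("sub", "ins") else target_word
             for tag, _, target_word in best_ops if tag != "del"]
    return "Did you mean: " + " ".join(words) + "?"
```

With the example from the slides, `confirmation_prompt("I want to go the airport", ["I'd like to go to the airport", "I want to go to the airport"])` selects the second target (one edit away, versus three for the first) and stresses the missing word.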
![Page 27: Non-Native Users in the Let ’ s Go!! Spoken Dialogue System: Dealing with Linguistic Mismatch](https://reader035.fdocuments.us/reader035/viewer/2022081520/568167f1550346895ddd63bb/html5/thumbnails/27.jpg)
Adaptive Lexical Entrainment: Study
• 24 dialogues, 12.4 turns/dialogue, 2.5 entrainment prompts/dialogue
• Trigger entrainment whenever the best target does not perfectly match the ASR hypothesis
  • Too simple? 40.7% false positives
• 64.4% of prompts are triggered by misrecognitions
![Page 28: Non-Native Users in the Let ’ s Go!! Spoken Dialogue System: Dealing with Linguistic Mismatch](https://reader035.fdocuments.us/reader035/viewer/2022081520/568167f1550346895ddd63bb/html5/thumbnails/28.jpg)
Conclusion
• LM and Grammar perform significantly worse on non-native speech
• Including non-native data reduces the gap in linguistic metrics but not WER
• Future work:
  • Improve Speech Recognition
  • More data for LM
  • Acoustic/Lexical adaptation
  • Integrate adaptive lexical entrainment in the dialog management framework
![Page 29: Non-Native Users in the Let ’ s Go!! Spoken Dialogue System: Dealing with Linguistic Mismatch](https://reader035.fdocuments.us/reader035/viewer/2022081520/568167f1550346895ddd63bb/html5/thumbnails/29.jpg)
Example Dialogue
S: Welcome to the Let’s Go!! bus information system. What can I do for you?

U: Excuse me. I’m in the place which uh where the Forbes Avenue and Murray Avenue are crossing and I want to go to the place mm Forbes where the Forbes Avenue and Craig Street are crossing.
S: I heard “SIX”. Is that correct?
U: Which bus uh should I take? Uh sorry uh maybe 61C bus I have… What time the next 61C bus will come?
S: The time is currently 5:37 PM. Where are you leaving from?
…
![Page 30: Non-Native Users in the Let ’ s Go!! Spoken Dialogue System: Dealing with Linguistic Mismatch](https://reader035.fdocuments.us/reader035/viewer/2022081520/568167f1550346895ddd63bb/html5/thumbnails/30.jpg)
Example Dialogue
S: Welcome to the Let’s Go!! bus information system. What can I do for you?
U: I want to go to downtown now I’m at Fifth and Bigelow.
S: Sorry, did you say “I want to go to downtown”?
U: Which number bus can I take and when will the next bus come here?
S: ... What can I do for you?
…