Speech Recognition Challenges
Presenter: Alexandru Chica
Contents
Speech User Interface basic concepts
• Speech recognition
• Speech synthesis
Speech Recognition Challenges
• Accuracy
• User responsiveness
• Performance
• Reliability
• Fault tolerance
Speech User Interface basic concepts
Speech Recognition
• The translation of spoken words into written text
• Diagram: phonetic representation of speech ("#'spit&S#") → algorithm → "speech"
• Algorithms:
  • Statistical processing
  • Hidden Markov Models
  • Dynamic Time Warping
• Types of speech recognition:
  • Command and control
  • Dictation
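Dynamic Time Warping, one of the algorithms named on this slide, can be illustrated with a minimal sketch. It assumes one-dimensional numeric features and an absolute-difference cost; real recognizers compare multi-dimensional feature vectors, and the function and variable names here are illustrative only.

```python
def dtw_distance(a, b):
    """Return the minimal alignment cost between two numeric sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = minimal cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of: step in a, step in b, step in both.
            cost[i][j] = local + min(cost[i - 1][j],
                                     cost[i][j - 1],
                                     cost[i - 1][j - 1])
    return cost[n][m]


# Assumed toy data: the same contour spoken at normal and at half speed.
template = [1, 2, 3, 2, 1]
slow = [1, 1, 2, 2, 3, 3, 2, 2, 1, 1]
other = [3, 3, 1, 1, 3]

same_word_cost = dtw_distance(template, slow)
different_word_cost = dtw_distance(template, other)
```

The slower utterance aligns with the template at zero cost even though it is twice as long, which is the property that makes the technique useful for matching speech spoken at different rates.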
Speech User Interface basic concepts
Speech Recognition Components
• Audio input (front end)
• Grammars – contain commands that can be spoken by the user
• Acoustic models – language dependent, used to “define” the language features
• Recognition algorithms (back end)
Diagram: audio input → front end (feature extraction) → back end (acoustic models, grammars, recognition algorithms) → result
Speech User Interface basic concepts
Speech Recognition APIs
Microsoft SAPI
IBM: Embedded ViaVoice
Nuance: VoCon
VoiceBox Speech Recognition
Speech User Interface basic concepts
Speech Synthesis
• The translation of written text into spoken text
• Diagram: "speech" → g2p (grapheme-to-phoneme conversion) → phonetic representation ("#'spit&S#")
Speech User Interface basic concepts
Speech Synthesis APIs
Microsoft SAPI
Nuance: Vocalizer
SoftVoice TTS
SVOX TTS
Apple PlainTalk
eSpeak
Speech User Interface basic concepts - Usage
In car:
• Control media player / radio stations
• Control navigation
• Control phone book and phone activities
• Find POI locations (POI: point of interest)
• E-mail/SMS reading
On the web:
• HTML 5 speech input
• Google Search with voice input
• Reading of web page content
Speech Recognition Challenges – Accuracy
Audio Input
Problem: Audio signal quality
Impact: loss of recognition accuracy
Solution 1: Echo cancellation
Solution 2: Beamforming
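As an illustration of Solution 2, here is a minimal delay-and-sum beamformer. This is a sketch under simplifying assumptions: two microphones, delays that are whole numbers of samples and already known, and no noise. The function name and signal values are made up for the example.

```python
def delay_and_sum(channels, delays):
    """Average microphone channels after advancing each by its delay (in samples)."""
    length = min(len(ch) - d for ch, d in zip(channels, delays))
    return [
        sum(ch[i + d] for ch, d in zip(channels, delays)) / len(channels)
        for i in range(length)
    ]


# Assumed toy signal: the speaker's wavefront reaches mic2 one sample after mic1.
speech = [0.0, 1.0, 0.0, -1.0, 0.0, 1.0, 0.0, -1.0]
mic1 = speech + [0.0]
mic2 = [0.0] + speech

aligned = delay_and_sum([mic1, mic2], delays=[0, 1])    # steered at the speaker
unaligned = delay_and_sum([mic1, mic2], delays=[0, 0])  # not steered
```

With the correct delays the channels add coherently and the speech is recovered at full amplitude; without them the same signal is attenuated. Sound arriving from other directions would be misaligned in the same way, which is how steering improves the signal quality seen by the recognizer.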
Speech Recognition Challenges – Accuracy
Audio Input
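Problem: Talk-over (the user speaks while the TTS prompt is still playing)
Impact: loss of recognition accuracy
Solution: Barge-in (the user's speech interrupts the TTS prompt)
Diagram: TTS output and user speech overlapping in time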
Speech Recognition Challenges – User responsiveness
Speech Recognition
Problem: resources are not ready when the user starts to speak the command
Solution: Delayed speech recognition
Diagram (Delayed Speech Recognition): resource loading / front-end processing → back-end processing
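One way to picture delayed speech recognition is to buffer the audio captured while resources are still loading and hand it to the back end once they are ready. The sketch below assumes exactly that; the class and method names are hypothetical, and it is single-threaded for brevity (a real system would need locking around the buffer).

```python
from collections import deque


class DelayedRecognizer:
    """Buffers audio frames until the back-end resources are loaded (hypothetical API)."""

    def __init__(self, backend):
        self._backend = backend   # callable that consumes one audio frame
        self._pending = deque()   # frames captured before the back end was ready
        self._ready = False

    def on_audio_frame(self, frame):
        if self._ready:
            self._backend(frame)
        else:
            self._pending.append(frame)   # keep the early speech instead of dropping it

    def on_resources_loaded(self):
        self._ready = True
        while self._pending:              # replay buffered frames in arrival order
            self._backend(self._pending.popleft())


processed = []
recognizer = DelayedRecognizer(processed.append)
recognizer.on_audio_frame("frame-1")      # user starts speaking too early
recognizer.on_audio_frame("frame-2")
processed_before_ready = list(processed)
recognizer.on_resources_loaded()
recognizer.on_audio_frame("frame-3")
```

Nothing reaches the back end before the resources are loaded, and afterwards all three frames are processed in the order they were spoken.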
Speech Recognition Challenges – User responsiveness
Speech Recognition
Problem: synchronization with multiple applications (media, phone, navigation)
Solution: apply concurrent design patterns
• Active Object
• Monitor
• Double-checked locking
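The Active Object pattern from the list above can be sketched as follows: client applications only enqueue requests, and a single worker thread owned by the object executes them one at a time, so the clients never touch shared speech resources directly. The class name, the request format, and the "announce" command are assumptions made for the example.

```python
import queue
import threading


class SpeechServiceActiveObject:
    """Serializes requests from several applications on one worker thread."""

    def __init__(self):
        self._requests = queue.Queue()
        self.log = []   # only ever modified by the worker thread
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def _run(self):
        while True:
            item = self._requests.get()
            if item is None:          # sentinel: stop the worker
                break
            source, command = item
            self.log.append(f"{source}:{command}")

    def submit(self, source, command):
        """Called from any thread; returns immediately."""
        self._requests.put((source, command))

    def shutdown(self):
        self._requests.put(None)
        self._worker.join()


service = SpeechServiceActiveObject()
callers = [
    threading.Thread(target=service.submit, args=(name, "announce"))
    for name in ("media", "phone", "navigation")
]
for caller in callers:
    caller.start()
for caller in callers:
    caller.join()
service.shutdown()
```

The order in which the three requests are handled depends on thread scheduling, but each one is handled exactly once and never concurrently with another.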
Speech Recognition Challenges – Performance
Grammars
Use cases:
• Command & Control grammars: 200 – 500 commands
• Navigation grammars: 100k+ static data
• Music grammars: 10k+ dynamic data
Speech Recognition Challenges – Performance
Grammars (1)
Problem: Grammar size too big
Impact:
• increased loading times of files from disk to memory
Solution: Grammar optimization
• merging of similar command tokens
Speech Recognition Challenges – Performance
Grammars (2)
• removal / replacement of recursion rules
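One possible reading of "merging of similar command tokens" is that commands sharing leading words are stored once as a common prefix. The sketch below assumes that interpretation and uses a plain prefix tree; the commands are invented examples, and real grammar formats and compilers differ.

```python
def build_prefix_tree(commands):
    """Merge commands that share leading tokens into one nested-dict tree."""
    tree = {}
    for command in commands:
        node = tree
        for token in command.split():
            node = node.setdefault(token, {})
        node["<end>"] = {}   # marks the end of a complete command
    return tree


def count_nodes(tree):
    """Count all nodes in the tree, including end markers."""
    return sum(1 + count_nodes(child) for child in tree.values())


commands = ["play artist", "play album", "play song",
            "call contact", "call number"]
tree = build_prefix_tree(commands)

flat_tokens = sum(len(c.split()) for c in commands)   # tokens if stored separately
merged_tokens = count_nodes(tree) - len(commands)     # tokens after merging prefixes
```

For these five commands the merged form stores 7 tokens instead of 10; the saving grows with the number of commands that share a prefix.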
Speech Recognition Challenges – Performance
Grammars (3)
Problem: Grammar token collisions
Impact:
• loss of recognition accuracy
Solution:
• replacement of collision-prone tokens with synonyms
• adding special pronunciation tokens to collision words
Examples:
sum – sun – sung
bet – bed
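Collision-prone tokens such as the examples above could be flagged automatically before a grammar is deployed. The sketch below uses edit distance on spelling as a rough stand-in for acoustic similarity; that is an assumption made to keep the example short, and a real check would compare phoneme transcriptions instead.

```python
from itertools import combinations


def edit_distance(a, b):
    """Levenshtein distance between two strings."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution (free if equal)
            ))
        previous = current
    return previous[-1]


def collision_candidates(tokens, max_distance=1):
    """Return token pairs that differ by at most max_distance edits."""
    return [(a, b) for a, b in combinations(tokens, 2)
            if edit_distance(a, b) <= max_distance]


pairs = collision_candidates(["sum", "sun", "sung", "bet", "bed", "navigate"])
```

The flagged pairs are the candidates for replacement with synonyms or for special pronunciation tokens.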
Speech Recognition Challenges – Performance
Dynamic Grammars
Problem: synchronization with USB devices, phones, navigation databases takes too much time
Solution 1: implementation of a caching mechanism
Speech Recognition Challenges – Performance
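Diagram (dynamic grammar with a phoneme cache):
• Use an ID3 parser to read titles, artists, composers, genres, albums, etc. from MP3 files
• Example metadata: Title: One, Artist: U2, Album: Achtung Baby, Genre: rock
• Transcriptions for the metadata are kept in the phoneme cache
• Add the entries to the dynamic grammar slots: title <DYN_TITLE>, artist <DYN_ARTIST>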
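The caching idea can be sketched as a lookup table in front of the grapheme-to-phoneme step, so that a string transcribed once (for example an artist shared by many tracks) is not transcribed again on the next synchronization. Everything named here is hypothetical: `toy_g2p` is a placeholder for a real transcription engine, and the second track title is made-up sample data.

```python
class PhonemeCache:
    """Caches transcriptions so repeated metadata is only transcribed once."""

    def __init__(self, g2p):
        self._g2p = g2p      # the expensive transcription function
        self._cache = {}
        self.misses = 0      # number of times the g2p step actually ran

    def transcribe(self, text):
        key = text.casefold()
        if key not in self._cache:
            self.misses += 1
            self._cache[key] = self._g2p(key)
        return self._cache[key]


def toy_g2p(text):
    """Placeholder for a real, slow grapheme-to-phoneme engine."""
    return " ".join(text.replace(" ", ""))


cache = PhonemeCache(toy_g2p)
tracks = [
    {"title": "One", "artist": "U2"},
    {"title": "Another Title", "artist": "U2"},   # made-up second entry
]
slots = {
    "DYN_TITLE": [cache.transcribe(t["title"]) for t in tracks],
    "DYN_ARTIST": [cache.transcribe(t["artist"]) for t in tracks],
}
```

Four lookups result in only three transcriptions because the artist is shared. A persistent cache would extend the same saving across device reconnects.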
Speech Recognition Challenges – Performance
Dynamic Grammars
Solution 2: split the processing in two, and dispatch part of the work to a different processor
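Diagram (processing split across two processors):
• Use an ID3 parser to read titles, artists, composers, genres, albums, etc. from MP3 files
• Example metadata: Title: One, Artist: U2, Album: Achtung Baby, Genre: rock
• Add the entries to the dynamic grammar slots: title <DYN_TITLE>, artist <DYN_ARTIST>
• The processing, including a preprocessing step, is divided between CPU1 and CPU2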
Speech Recognition Challenges – Reliability
Reliability: the ability of the system to keep operating over time
Problem: the system has to operate correctly over long periods of time
Solution 1: automated tests
Solution 2: drive tests
Speech Recognition Challenges – Fault tolerance
Problem: Recovery from system failures must be possible
Solution:
• the system is modeled in a modular manner, with components that communicate via the internal car area network
• individual components can be restarted without affecting other system components
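The restart behaviour described above can be pictured with a small supervisor that checks each component and restarts only those that have failed. This is a sketch with hypothetical names; a real system would detect failures through heartbeats or process monitoring over the in-car network rather than a boolean flag.

```python
class Component:
    """A stand-in for one independently restartable system module."""

    def __init__(self, name):
        self.name = name
        self.running = True
        self.restarts = 0

    def restart(self):
        self.running = True
        self.restarts += 1


class Supervisor:
    """Restarts failed components without touching healthy ones."""

    def __init__(self, components):
        self._components = list(components)

    def check(self):
        restarted = []
        for component in self._components:
            if not component.running:
                component.restart()
                restarted.append(component.name)
        return restarted


asr, tts, media = Component("asr"), Component("tts"), Component("media")
supervisor = Supervisor([asr, tts, media])

asr.running = False              # simulate a crash of the recognizer only
restarted = supervisor.check()
```

Only the failed recognizer is restarted; the TTS and media components keep running with their state untouched.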
Speech Recognition Challenges
TTS & ASR Demo
Questions?
Thank You