Speech recognition challenges

Speech Recognition Challenges Presenter: Alexandru Chica


Page 1: Speech recognition challenges

Speech Recognition Challenges

Presenter: Alexandru Chica

Page 2: Speech recognition challenges

Contents

Speech User Interface basic concepts

• Speech recognition

• Speech synthesis

Speech Recognition Challenges

• Accuracy

• User responsiveness

• Performance

• Reliability

• Fault tolerance

Page 3: Speech recognition challenges

Speech User Interface basic concepts

Speech Recognition

• The translation of spoken text into written text

[Diagram: phonetic representation of speech "#'spit&S#" → algorithm → "speech"]

Algorithms:

• Statistical processing

• Hidden Markov Models

• Dynamic Time Warping

Types of speech recognition:

• Command and control

• Dictation
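
As an illustration of one of the algorithms listed above, here is a minimal sketch of Dynamic Time Warping. It assumes each utterance has already been reduced to a sequence of single numbers (real systems compare feature vectors); the sample sequences and the function name are hypothetical.

```python
def dtw_distance(a, b):
    """Return the dynamic time warping cost between two numeric sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of: skip in a, skip in b, or advance both.
            cost[i][j] = step + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


# Hypothetical data: a stored template, the same word spoken slowly, another word.
template = [1.0, 2.0, 3.0, 2.0, 1.0]
slow_utterance = [1.0, 1.0, 2.0, 2.0, 3.0, 3.0, 2.0, 2.0, 1.0, 1.0]
other_word = [3.0, 1.0, 3.0, 1.0, 3.0]

print(dtw_distance(template, slow_utterance))
print(dtw_distance(template, other_word))
```

The slow utterance aligns with the template at zero cost even though it is twice as long, which is the property that makes the technique tolerant of speaking rate.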

Page 4: Speech recognition challenges

Speech User Interface basic concepts

Speech Recognition Components

• Audio input (front-end)

• Grammars – contain the commands that can be spoken by the user

• Acoustic models – language dependent, used to “define” the language features

• Recognition algorithms (back-end)

[Diagram: Audio input / Front end → feature extraction → Back end (acoustic models, grammars, recognition algorithms) → result]

Page 5: Speech recognition challenges

Speech User Interface basic concepts

Speech Recognition APIs

Microsoft SAPI

IBM: Embedded ViaVoice

Nuance: VoCon

VoiceBox Speech Recognition

Page 6: Speech recognition challenges

Speech User Interface basic concepts

Speech Synthesis

• The translation of written text into spoken text

[Diagram: "speech" → g2p (grapheme-to-phoneme) → "#'spit&S#"]

Page 7: Speech recognition challenges

Speech User Interface basic concepts

Speech Synthesis APIs

Microsoft SAPI

Nuance: Vocalizer

SoftVoice TTS

SVOX TTS

Apple PlainTalk

eSpeak

Page 8: Speech recognition challenges

Speech User Interface basic concepts - Usage

In car:

• Control media player / radio stations

• Control navigation

• Control phone book and phone activities

• Find POI locations (POI: point of interest)

• E-mail/SMS reading

On the web:

• HTML 5 speech input

• Google Search with voice input

• Reading of web page content

Page 9: Speech recognition challenges

Speech Recognition Challenges – Accuracy

Audio Input

Problem: poor audio signal quality

Impact: loss of recognition accuracy

Solution 1: Echo cancellation

Solution 2: Beamforming
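
To make the beamforming idea concrete, here is a minimal delay-and-sum sketch. It assumes the per-microphone delays are already known and are whole numbers of samples; the signal values and names are hypothetical, and production beamformers are considerably more elaborate.

```python
def delay_and_sum(channels, delays):
    """Align each microphone channel by its sample delay, then average them."""
    # Only keep samples for which every delayed channel still has data.
    length = min(len(channel) - delay for channel, delay in zip(channels, delays))
    output = []
    for t in range(length):
        aligned = [channel[t + delay] for channel, delay in zip(channels, delays)]
        output.append(sum(aligned) / len(channels))
    return output


# Hypothetical setup: the same source reaches mic 2 two samples later than mic 1.
source = [0.0, 1.0, 0.0, -1.0, 0.0, 1.0, 0.0, -1.0]
mic1 = source + [0.0, 0.0]
mic2 = [0.0, 0.0] + source

print(delay_and_sum([mic1, mic2], [0, 2]))
```

After alignment the speech adds up coherently, while noise that differs between microphones is averaged down.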

Page 10: Speech recognition challenges

Speech Recognition Challenges – Accuracy

Audio Input

Problem: talk-over (the user speaks while the system prompt is still playing)

Impact: loss of recognition accuracy

Solution: Barge-In

[Diagram: timelines of the TTS prompt and the user's speech]
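
A minimal sketch of barge-in, assuming a simple energy threshold stands in for real speech detection: the TTS prompt is played frame by frame and is cut off as soon as the microphone picks up speech. The threshold, frame contents and function names are hypothetical, and a real system would first remove the prompt's own echo from the microphone signal.

```python
def frame_energy(frame):
    """Mean squared amplitude of one audio frame."""
    return sum(sample * sample for sample in frame) / len(frame)


def play_prompt_with_barge_in(prompt_frames, mic_frames, threshold=0.01):
    """Return (number of prompt frames played, whether the user barged in)."""
    played = 0
    for _prompt_frame, mic_frame in zip(prompt_frames, mic_frames):
        if frame_energy(mic_frame) > threshold:
            # User speech detected: stop the prompt and hand over to recognition.
            return played, True
        played += 1
    return played, False


silence = [0.0, 0.0, 0.0, 0.0]
speech = [0.5, -0.5, 0.5, -0.5]
prompt = [[0.1, 0.1, 0.1, 0.1]] * 5

print(play_prompt_with_barge_in(prompt, [silence, silence, speech, speech, silence]))
```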

Page 11: Speech recognition challenges

Speech Recognition Challenges – User responsiveness

Speech Recognition

Problem: resources are not ready when the user starts to speak the command

Solution: delayed speech recognition

[Diagram: Delayed speech recognition: resource loading / front-end processing, followed by back-end processing]
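
One way to read delayed speech recognition, sketched under assumptions: the front end keeps processing and buffering audio while resources load, and the back end consumes the buffer once loading finishes, so nothing the user said is lost. The class, its method names and the energy "feature" are hypothetical placeholders.

```python
from collections import deque


class DelayedRecognizer:
    def __init__(self):
        self.pending = deque()        # features waiting for the back end
        self.resources_ready = False  # grammars / acoustic models loaded?
        self.decoded = []             # what the back end has consumed so far

    def extract_features(self, frame):
        # Placeholder front end: reduce the frame to its energy.
        return sum(sample * sample for sample in frame)

    def decode(self, features):
        # Placeholder back end: just record what it was given.
        self.decoded.append(features)

    def on_audio_frame(self, frame):
        features = self.extract_features(frame)
        if self.resources_ready:
            self.decode(features)
        else:
            self.pending.append(features)

    def on_resources_loaded(self):
        self.resources_ready = True
        while self.pending:
            self.decode(self.pending.popleft())


recognizer = DelayedRecognizer()
recognizer.on_audio_frame([1, 1])   # user starts speaking early
recognizer.on_audio_frame([2])
recognizer.on_resources_loaded()    # buffered frames are decoded now
recognizer.on_audio_frame([3])
print(recognizer.decoded)
```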

Page 12: Speech recognition challenges

Speech Recognition Challenges – User responsiveness

Speech Recognition

Problem: synchronization with multiple applications (media, phone, navigation)

Solution: apply concurrent design patterns

• Active Object

• Monitor

• Double-checked locking
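
As an example of the first pattern in the list, here is a minimal Active Object sketch. Applications (media, phone, navigation) submit requests without blocking, and a single worker thread executes them in order, so the recognizer state is only ever touched by one thread. The class and the request format are hypothetical.

```python
import queue
import threading


class RecognizerActiveObject:
    def __init__(self):
        self._requests = queue.Queue()
        self.log = []  # only written by the worker thread
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def submit(self, source, command):
        """Called from any application thread; returns immediately."""
        self._requests.put((source, command))

    def _run(self):
        while True:
            item = self._requests.get()
            if item is None:  # shutdown marker
                break
            source, command = item
            self.log.append(f"{source}:{command}")

    def shutdown(self):
        """Process everything already queued, then stop the worker."""
        self._requests.put(None)
        self._worker.join()


active = RecognizerActiveObject()
active.submit("media", "pause")
active.submit("phone", "answer call")
active.shutdown()
print(active.log)
```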

Page 13: Speech recognition challenges

Speech Recognition Challenges – Performance

Grammars

Use cases:

• Command & Control grammars: 200 – 500 commands

• Navigation grammars: 100k+ static data

• Music grammars: 10k+ dynamic data

Page 14: Speech recognition challenges

Speech Recognition Challenges – Performance

Grammars (1)

Problem: grammar size too big

Impact:

• increased loading times of files from disk to memory

Solution: grammar optimization

• merging of similar command tokens
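
The slides do not say how tokens are merged. One plausible reading, sketched here as an assumption, is that commands sharing leading words are stored as a prefix tree so that each shared token appears only once. The commands and function names are hypothetical.

```python
def build_prefix_tree(commands):
    """Store commands as nested dicts so shared leading tokens are merged."""
    root = {}
    for command in commands:
        node = root
        for token in command.split():
            node = node.setdefault(token, {})
        node["<end>"] = {}  # marks that a complete command ends here
    return root


def count_tokens(tree):
    """Count the token nodes in the tree, ignoring end markers."""
    return sum(1 + count_tokens(child) for token, child in tree.items() if token != "<end>")


commands = ["play artist", "play album", "play song", "call contact", "call number"]
flat_size = sum(len(command.split()) for command in commands)
merged_size = count_tokens(build_prefix_tree(commands))
print(flat_size, merged_size)
```

In this toy grammar the merged form holds fewer tokens than the flat list, which is the kind of reduction that shortens loading times.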

Page 15: Speech recognition challenges

Speech Recognition Challenges – Performance

Grammars (2)

• removal / replacement of recursive rules
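
One possible way to replace a recursive rule, shown as a sketch rather than the presenter's method: a rule of the form X ::= symbol | symbol X is unrolled into a bounded list of explicit repetitions. The rule name and the repetition limit are hypothetical.

```python
def unroll_recursive_rule(symbol, max_repeats):
    """Replace `X ::= symbol | symbol X` with 1..max_repeats explicit repetitions."""
    return [" ".join([symbol] * count) for count in range(1, max_repeats + 1)]


alternatives = unroll_recursive_rule("<digit>", 4)
print("<digits> ::= " + " | ".join(alternatives))
```

The unrolled rule accepts at most four digits, trading unlimited length for a grammar without recursion.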

Page 16: Speech recognition challenges

Speech Recognition Challenges – Performance

Grammars (3)

Problem: grammar token collisions

Impact:

• loss of recognition accuracy

Solution:

• replacement of collision-prone tokens with synonyms

• adding special pronunciation tokens to collision words

Examples:

sum – sun – sung

bet – bed
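
Collision-prone tokens such as the examples above can be flagged automatically. The sketch below assumes that spelling is a good enough stand-in for pronunciation and reports token pairs within a small edit distance; a real check would compare phoneme transcriptions instead. The token list and threshold are hypothetical.

```python
from itertools import combinations


def edit_distance(a, b):
    """Levenshtein distance between two strings."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                       # deletion
                current[j - 1] + 1,                    # insertion
                previous[j - 1] + (char_a != char_b),  # substitution
            ))
        previous = current
    return previous[-1]


def collision_pairs(tokens, max_distance=1):
    """Return token pairs that are close enough to be confused."""
    return [(a, b) for a, b in combinations(tokens, 2)
            if edit_distance(a, b) <= max_distance]


tokens = ["sum", "sun", "sung", "bet", "bed", "navigate"]
print(collision_pairs(tokens))
```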

Page 17: Speech recognition challenges

Speech Recognition Challenges – Performance

Dynamic Grammars

Problem: synchronization with USB devices, phones, navigation databases takes too much time

Solution 1: implementation of a caching mechanism

Page 18: Speech recognition challenges

Speech Recognition Challenges – Performance

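[Diagram: filling the dynamic grammar with the help of a phoneme cache]

• An ID3 parser reads titles, artists, composers, genres, albums, etc. from the MP3 files.

• Example entry: Title: One, Artist: U2, Album: Achtung Baby, Genre: rock

• Transcriptions of the entries are kept in the phoneme cache.

• The entries are added to the dynamic grammar slots: title <DYN_TITLE>, artist <DYN_ARTIST>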
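
A minimal sketch of the caching idea, assuming the expensive step is generating a phonetic transcription for each title or artist: a transcription is computed once and reused on later synchronizations. The cache class and the fake_g2p function are hypothetical stand-ins, not part of any real speech API.

```python
class PhonemeCache:
    def __init__(self, transcribe):
        self._transcribe = transcribe  # expensive text-to-phoneme function
        self._cache = {}
        self.misses = 0                # how often the expensive step ran

    def get(self, text):
        key = text.strip().lower()
        if key not in self._cache:
            self.misses += 1
            self._cache[key] = self._transcribe(key)
        return self._cache[key]


def fake_g2p(text):
    # Hypothetical stand-in for a real grapheme-to-phoneme engine.
    return " ".join(text.replace(" ", ""))


cache = PhonemeCache(fake_g2p)
for entry in ["U2", "One", "Achtung Baby", "U2", "one"]:
    cache.get(entry)
print(cache.misses)
```

Repeated entries, which are common in music metadata (the same artist on every track), hit the cache instead of being transcribed again.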

Page 19: Speech recognition challenges

Speech Recognition Challenges – Performance

Dynamic Grammars

Solution 2: split the processing in two, and dispatch part of the work to a different processor

[Diagram: the work is split between CPU1 and CPU2, with a preprocessing step]

• An ID3 parser reads titles, artists, composers, genres, albums, etc. from the MP3 files.

• Example entry: Title: One, Artist: U2, Album: Achtung Baby, Genre: rock

• The entries are added to the dynamic grammar slots: title <DYN_TITLE>, artist <DYN_ARTIST>

Page 20: Speech recognition challenges

Speech Recognition Challenges – Reliability

Reliability - the ability of the system to keep operating over time

Problem: the system has to operate correctly over long periods of time

Solution 1: automated tests

Solution 2: drive tests

Page 21: Speech recognition challenges

Speech Recognition Challenges – Fault tolerance

Problem: Recovery from system failures must be possible

Solution:

• system is modeled in a modular manner, with components that communicate via internal car area network.

• individual components can be restarted without affecting other system components

Page 22: Speech recognition challenges

Speech Recognition Challenges

TTS & ASR Demo

Page 23: Speech recognition challenges

Speech Recognition Challenges

Questions?

Page 24: Speech recognition challenges

Speech Recognition Challenges

Thank You