Speech Recognition Challenges

Presenter: Alexandru Chica

Contents

Speech User Interface basic concepts

• Speech recognition

• Speech synthesis

Speech Recognition Challenges

• Accuracy

• User responsiveness

• Performance

• Reliability

• Fault tolerance


Speech Recognition

• The translation of spoken words into written text

Diagram: phonetic representation of speech "#'spit&S#" → algorithm → "speech"

Algorithms:

• Statistical processing

• Hidden Markov Models

• Dynamic Time Warping
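Dynamic Time Warping, named above, aligns two sequences that are spoken at different speeds. The following is a minimal sketch, assuming one-dimensional feature sequences and absolute difference as the local cost; a real recognizer would compare multi-dimensional acoustic feature vectors. The function name `dtw_distance` is illustrative only.

```python
def dtw_distance(a, b):
    """Return the DTW alignment cost between two numeric sequences."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = best cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of: step in a only, step in b only, step in both
            cost[i][j] = local + min(cost[i - 1][j],
                                     cost[i][j - 1],
                                     cost[i - 1][j - 1])
    return cost[n][m]


# The same pattern spoken more slowly still aligns at zero cost.
print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))
```

Because the middle value may be repeated without penalty, a stretched utterance matches its template, which is the property that makes the technique useful for speech.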

Types of speech recognition:

• Command and control

• Dictation


Speech Recognition Components

• Audio input (front-end)

• Grammars – contain the commands that can be spoken by the user

• Acoustic models – language dependent, used to “define” the language features

• Recognition algorithms (back-end)

Diagram: Audio input / Front end (feature extraction) → Back end (acoustic models, grammars, recognition algorithms) → result


Speech Recognition APIs

Microsoft SAPI

IBM: Embedded ViaVoice

Nuance: VoCon

VoiceBox Speech Recognition


Speech Synthesis

• The translation of written text into spoken words

Diagram: "speech" → g2p (grapheme-to-phoneme conversion) → "#'spit&S#"


Speech Synthesis APIs

Microsoft SAPI

Nuance: Vocalizer

SoftVoice TTS

SVOX TTS

Apple PlainTalk

eSpeak

Speech User Interface basic concepts - Usage

In car:

• Control media player / radio stations

• Control navigation

• Control phone book and phone activities

• Find POI locations (POI: point of interest)

• E-mail/SMS reading

On the web:

• HTML5 speech input

• Google Search with voice input

• Reading of web page content

Speech Recognition Challenges – Accuracy

Audio Input

Problem: Audio signal quality

Impact: loss of recognition accuracy

Solution 1: Echo cancellation

Solution 2: Beamforming
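One common form of beamforming is delay-and-sum: each microphone channel is shifted so that sound from the speaker's direction lines up, then the channels are averaged. The sketch below is illustrative only and assumes the per-channel delays are already known as whole numbers of samples; the function name `delay_and_sum` is not taken from any of the products mentioned in this talk.

```python
def delay_and_sum(channels, delays):
    """Average equal-length channels after advancing each by its delay (in samples)."""
    n = len(channels[0])
    out = []
    for t in range(n):
        acc = 0.0
        for samples, delay in zip(channels, delays):
            idx = t + delay
            if 0 <= idx < n:          # samples outside the buffer count as silence
                acc += samples[idx]
        out.append(acc / len(channels))
    return out


source = [0, 1, 0, -1, 0, 0]
mic1 = source                     # reaches microphone 1 first
mic2 = [0, 0, 1, 0, -1, 0]        # same signal, one sample later
aligned = delay_and_sum([mic1, mic2], delays=[0, 1])
```

Signals from the chosen direction add up coherently, while noise arriving from other directions is misaligned and partly averages out.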

Speech Recognition Challenges – Accuracy

Audio Input

Problem: Talk-over (the user starts speaking while the TTS prompt is still playing)

Impact: loss of recognition accuracy

Solution: Barge-In

Diagram labels: TTS, User
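A barge-in detector can be sketched as a simple energy check on the microphone signal while the prompt plays. This is a minimal sketch under stated assumptions: the threshold, the `min_frames` debounce value and both function names are hypothetical, and a production system would also need echo cancellation so that the prompt itself does not trigger the detector.

```python
def frame_energy(frame):
    """Mean squared amplitude of one audio frame."""
    return sum(s * s for s in frame) / len(frame)


def barge_in_index(mic_frames, threshold, min_frames=3):
    """Return the index of the frame at which the prompt should stop, or None.

    Speech is assumed once `min_frames` consecutive frames exceed `threshold`.
    """
    run = 0
    for i, frame in enumerate(mic_frames):
        if frame_energy(frame) > threshold:
            run += 1
            if run >= min_frames:
                return i
        else:
            run = 0               # a quiet frame resets the count
    return None


quiet, loud = [0.0] * 4, [0.5] * 4
frames = [quiet, loud, quiet, loud, loud, loud, quiet]
stop_at = barge_in_index(frames, threshold=0.1)
```

Requiring several consecutive loud frames keeps a single click or door slam from cutting the prompt off.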

Speech Recognition Challenges – User responsiveness

Speech Recognition

Problem: resources are not ready when the user starts to speak the command

Solution: Delayed speech recognition

Diagram (Delayed Speech Recognition): resource loading and front-end processing, followed by back-end processing
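One way to read the delayed recognition idea: the front end keeps accepting audio while back-end resources are still loading, and the buffered frames are handed to the back end once it is ready. The class and method names below are hypothetical, and the "processing" is a placeholder list append.

```python
from collections import deque


class DelayedRecognizer:
    """Buffers front-end frames until the back end reports it is ready."""

    def __init__(self):
        self.ready = False
        self.pending = deque()    # frames captured before resources were loaded
        self.processed = []       # stands in for real back-end processing

    def on_audio(self, frame):
        if self.ready:
            self._process(frame)
        else:
            self.pending.append(frame)

    def on_resources_ready(self):
        self.ready = True
        while self.pending:       # drain the backlog in arrival order
            self._process(self.pending.popleft())

    def _process(self, frame):
        self.processed.append(frame)


rec = DelayedRecognizer()
rec.on_audio("frame-1")           # user speaks before loading has finished
rec.on_audio("frame-2")
rec.on_resources_ready()
rec.on_audio("frame-3")
```

The user never has to wait for a "ready" signal; the cost is a short delay before the first result.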

Speech Recognition Challenges – User responsiveness

Speech Recognition

Problem: synchronization with multiple applications (media, phone, navigation)

Solution: apply concurrency design patterns

• Active Object

• Monitor

• Double-checked locking
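As an illustration of the first pattern, an Active Object decouples callers from execution: media, phone and navigation code enqueue requests, and a single worker thread serves them in order. This is a sketch only; `ActiveRecognizer` and its methods are hypothetical names, and the request handling is a placeholder.

```python
import queue
import threading
from concurrent.futures import Future


class ActiveRecognizer:
    """Active Object: requests are queued and executed on one private thread."""

    def __init__(self):
        self._requests = queue.Queue()
        self.log = []
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def submit(self, source, command):
        """Called from any thread; returns immediately with a Future."""
        result = Future()
        self._requests.put((source, command, result))
        return result

    def stop(self):
        self._requests.put(None)  # sentinel: shut the worker down
        self._thread.join()

    def _run(self):
        while True:
            item = self._requests.get()
            if item is None:
                break
            source, command, result = item
            self.log.append((source, command))   # only this thread touches log
            result.set_result(f"{source}:{command}")


recognizer = ActiveRecognizer()
first = recognizer.submit("media", "play")
second = recognizer.submit("phone", "call")
answers = [first.result(timeout=2), second.result(timeout=2)]
recognizer.stop()
```

Because only the worker thread touches the recognizer state, the callers need no locks of their own.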

Speech Recognition Challenges – Performance

Grammars

Use cases:

• Command & Control grammars: 200 – 500 commands

• Navigation grammars: 100k+ static data

• Music grammars: 10k+ dynamic data


Grammars (1)

Problem: Grammar size too big

Impact:

• increased loading times of files from disk to memory

Solution: Grammar optimization

• merging of similar command tokens
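Merging similar command tokens can be pictured as factoring out a shared leading word, so that the word is stored once instead of once per command. The sketch below is one possible interpretation, not the optimization used by any particular engine; the rule notation with parentheses and "|" and the function name `merge_by_prefix` are assumptions.

```python
def merge_by_prefix(commands):
    """Group commands by their first word and emit one rule per group."""
    groups = {}
    for command in commands:
        head, _, tail = command.partition(" ")
        groups.setdefault(head, []).append(tail)

    rules = []
    for head, tails in groups.items():
        if len(tails) == 1:
            rules.append(f"{head} {tails[0]}".strip())
        else:
            rules.append(f"{head} ( " + " | ".join(tails) + " )")
    return rules


rules = merge_by_prefix(["play song", "play album", "call contact", "play artist"])
```

Here three "play" commands collapse into one rule with three alternatives, which shrinks the grammar file that has to be loaded.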


Grammars (2)

Solution: Grammar optimization (continued)

• removal / replacement of recursive rules


Grammars (3)

Problem: Grammar token collisions

Impact:

• loss of recognition accuracy

Solution:

• replacement of collision-prone tokens with synonyms

• adding special pronunciation tokens to collision words

Examples:

sum – sun – sung

bet – bed
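Collision-prone tokens like these can be flagged automatically by measuring how similar two tokens are. The sketch below uses edit distance on the spelling as a stand-in; this is an assumption for illustration, since a real check would compare phonetic transcriptions rather than letters. The function names are hypothetical.

```python
from itertools import combinations


def edit_distance(a, b):
    """Levenshtein distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]


def collision_pairs(tokens, max_distance=1):
    """Return token pairs that are close enough to be confused."""
    return [(a, b) for a, b in combinations(tokens, 2)
            if edit_distance(a, b) <= max_distance]


pairs = collision_pairs(["sum", "sun", "sung", "bet", "bed", "navigate"])
```

Each flagged pair is a candidate for a synonym or a special pronunciation token.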


Dynamic Grammars

Problem: synchronization with USB devices, phones, navigation databases takes too much time

Solution 1: implementation of a caching mechanism


Diagram:

• Use an ID3 parser to read titles, artists, composers, genre, album, etc. from MP3 files (Title: One, Artist: U2, Album: Achtung Baby, Genre: rock, ...)

• Transcriptions are kept in the phoneme cache

• Add to slot in the dynamic grammar: title <DYN_TITLE>, artist <DYN_ARTIST>
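The caching idea can be sketched as a lookup table in front of the expensive transcription step, so a title or artist that has already been transcribed is not processed again on the next synchronization. `PhonemeCache` and the `transcribe` callback are hypothetical; the callback stands in for whatever grapheme-to-phoneme function the engine provides.

```python
class PhonemeCache:
    """Remembers transcriptions so each distinct text is transcribed once."""

    def __init__(self, transcribe):
        self._transcribe = transcribe   # expensive text -> phonemes function
        self._cache = {}
        self.misses = 0

    def get(self, text):
        key = text.lower()              # assumption: case does not affect pronunciation
        if key not in self._cache:
            self.misses += 1
            self._cache[key] = self._transcribe(key)
        return self._cache[key]


calls = []

def fake_transcribe(text):
    """Placeholder transcriber that records how often it is invoked."""
    calls.append(text)
    return f"/{text}/"


cache = PhonemeCache(fake_transcribe)
for tag in ["U2", "One", "u2", "Achtung Baby", "One"]:
    cache.get(tag)
```

Five lookups trigger only three transcriptions here; on a large music collection with repeated artists and albums the saving grows accordingly.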


Dynamic Grammars

Solution 2: split the processing in two, and dispatch part of the work to a different processor

Diagram:

• Use an ID3 parser to read titles, artists, composers, genre, album, etc. from MP3 files (Title: One, Artist: U2, Album: Achtung Baby, Genre: rock, ...)

• Preprocessing step

• Add to slot in the dynamic grammar: title <DYN_TITLE>, artist <DYN_ARTIST>

• The steps are distributed between CPU1 and CPU2

Speech Recognition Challenges – Reliability

Reliability - the ability of the system to keep operating over time

Problem: the system has to operate correctly over long periods of time

Solution 1: automated tests

Solution 2: drive tests

Speech Recognition Challenges – Fault tolerance

Problem: Recovery from system failures must be possible

Solution:

• the system is modeled in a modular manner, with components that communicate via the car's internal network

• individual components can be restarted without affecting other system components
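The restart behaviour described above can be sketched as a supervisor that periodically checks each component and restarts only the ones that have failed. All names here (`Component`, `Supervisor`, `check`) are hypothetical, and the health check is reduced to a boolean flag; in the real system it would be a message over the in-car network.

```python
class Component:
    """Stand-in for an independently restartable module (ASR, TTS, ...)."""

    def __init__(self, name):
        self.name = name
        self.alive = True
        self.restarts = 0

    def crash(self):
        self.alive = False

    def restart(self):
        self.alive = True
        self.restarts += 1


class Supervisor:
    """Restarts failed components without touching the healthy ones."""

    def __init__(self, components):
        self.components = {c.name: c for c in components}

    def check(self):
        restarted = []
        for component in self.components.values():
            if not component.alive:
                component.restart()
                restarted.append(component.name)
        return restarted


asr, tts, nav = Component("asr"), Component("tts"), Component("navigation")
supervisor = Supervisor([asr, tts, nav])
asr.crash()
restarted = supervisor.check()
```

Only the failed recognizer is restarted; speech synthesis and navigation keep their state and keep running.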


TTS & ASR Demo


Questions?


Thank You
