Real-Time Speech Recognition

Real-Time Speech Recognition Thang Pham Advisor: Shane Cotter


Transcript of Real-Time Speech Recognition

Page 1: Real-Time Speech Recognition

Real-Time Speech Recognition

Thang Pham

Advisor: Shane Cotter

Page 2: Real-Time Speech Recognition

Background

Types of speech recognition systems: Word recognition, Connected speech recognition, Speech understanding systems

Simplest: user-dependent limited vocabulary

Hard to design any system:

Variations of speech, e.g. amplitude, duration, and signal-to-noise ratio

Background noise

Reverberation noise

Implemented in banking, telephone services, etc. Example product: IBM ViaVoice

Page 3: Real-Time Speech Recognition

Project Outline

Design a user-dependent speech recognition system to control the movement of a small remote control car

Limited in vocabulary: Backward, Forward, Left, and Right

Trained to my voice

Different speech recognition algorithms were examined to understand the advantages and disadvantages of each system

Linear Predictive Coding

Cepstrum Coefficients

Mel-frequency Cepstrum Coefficients

Page 4: Real-Time Speech Recognition

System Design

Microphone

TI 6713 DSP Board

Sample word at 8 kHz

Segment word into time frames

Find Mel-Cepstrum coefficients for each frame

Compare input word to a codebook of defined words using dynamic time warping

Recognized word
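The "segment word into time frames" step can be sketched as follows. This is only an illustration: the slides give the 8 kHz sampling rate but not the frame size, so the 256-sample frame (32 ms at 8 kHz), the 128-sample hop, and the function name `split_frames` are all assumptions.

```python
SAMPLE_RATE_HZ = 8000  # sampling rate stated on the slide


def split_frames(samples, frame_len=256, hop=128):
    """Cut a sampled word into overlapping, fixed-length time frames.

    frame_len and hop are assumed values (32 ms frames, 50% overlap);
    the slides do not state the ones actually used.
    Trailing samples that do not fill a whole frame are dropped.
    """
    frames = []
    start = 0
    while start + frame_len <= len(samples):
        frames.append(samples[start:start + frame_len])
        start += hop
    return frames


# Usage: a 1000-sample recording is cut into 256-sample frames.
word = [0.0] * 1000
frames = split_frames(word)
print(len(frames), "frames of", len(frames[0]), "samples")
```

Overlapping frames are a common choice because speech changes gradually, so neighbouring frames should share some samples.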

Page 5: Real-Time Speech Recognition

Components List

Texas Instruments TMS320C6713 DSP Board

Audio Technica Omnidirectional Microphone ATR35S

Two stepper motors

Page 6: Real-Time Speech Recognition

Linear Predictive Coding

Provides a good model of the speech signal.

Can approximate a speech sample at time n from past samples:

$$s(n) \approx a_1 s(n-1) + a_2 s(n-2) + \cdots + a_p s(n-p)$$

where $a_1, a_2, \ldots, a_p$ are coefficients that weight each sample.
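One common way to estimate the weights a1, ..., ap is the autocorrelation method solved with the Levinson-Durbin recursion. The slides do not say which method the project used, so the sketch below is an assumption, and the function names `autocorrelation` and `lpc_coefficients` are illustrative.

```python
def autocorrelation(x, max_lag):
    """Autocorrelation r[0..max_lag] of one frame of samples."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(max_lag + 1)]


def lpc_coefficients(x, p):
    """Estimate a_1..a_p so that s(n) is approximated by sum_j a_j * s(n - j).

    Uses the Levinson-Durbin recursion on the frame's autocorrelation.
    """
    r = autocorrelation(x, p)
    a = [0.0] * (p + 1)          # a[0] is unused; a[1..p] hold the weights
    err = r[0]                   # prediction error energy
    for i in range(1, p + 1):
        if err == 0.0:           # silent frame: nothing left to predict
            break
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        updated = a[:]
        updated[i] = k
        for j in range(1, i):
            updated[j] = a[j] - k * a[i - j]
        a = updated
        err *= (1.0 - k * k)
    return a[1:]


# Usage: a decaying signal with s(n) = 0.5 * s(n-1) should give a_1 close to 0.5.
signal = [0.5 ** n for n in range(50)]
print(lpc_coefficients(signal, 2))
```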

Page 7: Real-Time Speech Recognition

Mel-frequency Cepstrum Coefficients

Research has shown mel-frequency cepstrum coefficients to give better recognition performance than cepstrum coefficients and LPC

Modeled around human auditory system (ear)

$$c_n = \sum_{k=1}^{M} \log(S_k) \cos\left[ n \left(k - 0.5\right) \frac{\pi}{M} \right]$$

where $c_n$ is the nth order mel-frequency cepstrum, $S_k$ is the power of the kth mel filter, and $M$ is the number of mel filters.

12 mel-frequency cepstrum coefficients characterize each time frame
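The cosine-transform step, c_n = sum over k = 1..M of log(S_k) * cos[n (k - 0.5) pi / M], can be sketched as below. The mel filter powers are taken as already computed. The natural logarithm, the 20-filter example, and the function name `mel_cepstrum` are assumptions; only the count of 12 coefficients per frame comes from the slide.

```python
import math


def mel_cepstrum(filter_powers, num_coeffs=12):
    """Mel-frequency cepstrum coefficients c_1..c_num_coeffs for one frame.

    filter_powers holds S_1..S_M, the (positive) power in each mel filter.
    The natural log is an assumption; the slide only says "Log".
    """
    m = len(filter_powers)
    log_powers = [math.log(s) for s in filter_powers]
    coeffs = []
    for n in range(1, num_coeffs + 1):
        c_n = sum(
            log_powers[k - 1] * math.cos(n * (k - 0.5) * math.pi / m)
            for k in range(1, m + 1)
        )
        coeffs.append(c_n)
    return coeffs


# Usage: 20 hypothetical filter powers give a 12-element feature vector.
powers = [1.0 + 0.1 * k for k in range(20)]
features = mel_cepstrum(powers)
print(len(features))
```

A flat filter-bank output (all S_k equal) gives coefficients that are all approximately zero, since the coefficients for n >= 1 capture the shape of the log spectrum and not its overall level.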

Page 8: Real-Time Speech Recognition

Dynamic Time Warping

Arranged mel-frequency coefficients into vectors

Use dynamic time warping to find best match

Compares words that are uttered over different lengths of time.

You have a reference word that you are listening for

You have a sampled word

Want to compare both words, sampled and reference, and see if they match

Compare mel-frequency cepstrum coefficients for each frame of speech
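A minimal sketch of the matching step, assuming each word is a list of per-frame coefficient vectors and using a textbook dynamic time warping recursion with Euclidean frame distance. The slides do not give the local distance or path constraints used in the project, so those choices, and the names `dtw_distance` and `recognize`, are assumptions.

```python
import math


def frame_distance(u, v):
    """Euclidean distance between two coefficient vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def dtw_distance(sample, reference):
    """Accumulated cost of the best warping path between two frame sequences."""
    n, m = len(sample), len(reference)
    inf = float("inf")
    # cost[i][j]: best cost of aligning sample[:i] with reference[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_distance(sample[i - 1], reference[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # sample frame repeats a reference frame
                                 cost[i][j - 1],      # reference frame repeats a sample frame
                                 cost[i - 1][j - 1])  # both advance together
    return cost[n][m]


def recognize(sample, codebook):
    """Return the codebook word whose template has the lowest DTW cost."""
    return min(codebook, key=lambda word: dtw_distance(sample, codebook[word]))


# Usage with toy one-dimensional "coefficient vectors":
codebook = {"left": [[0.0], [1.0], [2.0]], "right": [[5.0], [5.0], [5.0]]}
slow_left = [[0.0], [0.0], [1.0], [1.0], [2.0]]  # same word, spoken more slowly
print(recognize(slow_left, codebook))
```

Because the path may repeat frames, a slowly spoken word still matches its shorter template at low cost, which is the property that makes DTW suitable for comparing utterances of different lengths.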

Page 9: Real-Time Speech Recognition

Dynamic Time Warping

Example of DTW:

Page 10: Real-Time Speech Recognition

Dynamic Time Warping

Solution:

Page 11: Real-Time Speech Recognition

Results

Word        Recognition Rate
Backward    50%
Forward     70%
Left        90%
Right       40%

Sources of error:

1. Noise, e.g. computer fan, fluorescent light.

2. Voice changes, e.g. a word spoken on one day might not sound the same on the next day.

3. The system was trained on only one template per word.

Page 12: Real-Time Speech Recognition

Problems Encountered

Warping the frequency domain onto the mel-frequency scale, i.e. the Log10 mapping:

$$F_{mel} = 2595 \log_{10}\left(1 + \frac{F_{Hz}}{700}\right)$$

Translation of MATLAB code into C, e.g. dynamic arrays, debugging process

Dynamic time warping, e.g. theory, algorithm
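The mel warping F_mel = 2595 * Log10(1 + F_Hz / 700) and its inverse can be sketched in a few lines. The helper `mel_center_frequencies`, the choice of 20 filters, and the even spacing of filter centres on the mel scale are assumptions for illustration; the 4000 Hz upper limit is the Nyquist frequency for the 8 kHz sampling rate given in the slides.

```python
import math


def hz_to_mel(f_hz):
    """Map a frequency in Hz to the mel scale."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(f_mel):
    """Inverse mapping, from mel back to Hz."""
    return 700.0 * (10.0 ** (f_mel / 2595.0) - 1.0)


def mel_center_frequencies(num_filters, f_max_hz):
    """Centre frequencies (Hz) of filters spaced evenly on the mel scale.

    Hypothetical helper: the slides do not describe the filter bank layout.
    """
    step = hz_to_mel(f_max_hz) / (num_filters + 1)
    return [mel_to_hz(step * (i + 1)) for i in range(num_filters)]


# Usage: 1000 Hz lands close to 1000 mel; centres crowd together at low frequencies.
print(round(hz_to_mel(1000.0), 1))
print([round(f) for f in mel_center_frequencies(20, 4000.0)][:5])
```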

Page 13: Real-Time Speech Recognition

Future Work

• The C implementation of this system is being developed. The implementation will be uploaded onto the TI 6713 DSP Board once it is completed.

• The code will be modified to allow the recognition system to operate in real-time.

• More comprehensive testing of the system will be performed under a variety of noise conditions.

Page 14: Real-Time Speech Recognition

That is all.