Speech Recognition

Created By : Kanjariya Hardik G. Roll No : 17


Introduction

Speech recognition technology has recently reached a higher level of performance and robustness, allowing a user to communicate with a computer or with another user by talking.

Speech recognition is the process of decoding an acoustic speech signal, captured by a microphone or telephone, into a set of words.

With the help of this decoding, the whole utterance is recognized word by word.

Types of SR

There are two main types of speaker models: speaker independent and speaker dependent.

Speaker independent models recognize the speech patterns of a large group of people.

Speaker dependent models recognize speech patterns from only one person. Both models use mathematical and statistical formulas to yield the best word match for speech. A third variation of speaker models is now emerging, called speaker adaptive.

Speaker adaptive systems usually begin with a speaker independent model and adjust it more closely to each individual during a brief training period.
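One common way to picture speaker adaptation is a MAP-style update that pulls the mean of a speaker independent model toward the feature frames collected from one speaker. The sketch below is only an illustration under that assumption; the function name adapt_mean, the weight tau and the toy feature values are hypothetical and are not taken from any particular engine.

```python
import numpy as np

def adapt_mean(prior_mean, speaker_frames, tau=10.0):
    """Blend a speaker independent mean with one speaker's data.

    tau (an assumed tuning weight) says how many frames the prior is
    'worth': with few frames the result stays near the prior, with many
    frames it moves toward the speaker's own average.
    """
    prior_mean = np.asarray(prior_mean, dtype=float)
    frames = np.asarray(speaker_frames, dtype=float)
    n = frames.shape[0]
    return (tau * prior_mean + frames.sum(axis=0)) / (tau + n)

# Toy example: a 2-dimensional feature mean and 10 training frames.
prior = np.array([0.0, 0.0])
frames = np.array([[1.0, 2.0]] * 10)
adapted = adapt_mean(prior, frames, tau=10.0)
print(adapted)
```

With ten frames and tau = 10, the adapted mean lands halfway between the prior and the speaker's average, which mirrors the idea of a brief training period nudging the model rather than replacing it.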

Speech produces a sound pressure wave which forms an acoustic signal.

The microphone receives the acoustic signal and converts it to an analogue signal.

To store the analogue signal, it must be converted to a digital signal.

A speech recognizer tries to transform a digitally encoded acoustic signal of speech in a natural language into text in that language.
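The analogue-to-digital step described above can be sketched as sampling followed by quantization. This is a minimal illustration, not how any particular sound card works; the 8000 Hz sample rate, the 16-bit depth and the names digitize and tone are assumptions chosen for the example.

```python
import numpy as np

def digitize(analog_fn, duration_s, sample_rate_hz=8000):
    """Sample a continuous signal and quantize it to 16-bit integers.

    analog_fn maps time in seconds to an amplitude in [-1.0, 1.0].
    """
    n_samples = int(round(duration_s * sample_rate_hz))
    t = np.arange(n_samples) / sample_rate_hz      # sampling instants
    x = np.clip(analog_fn(t), -1.0, 1.0)           # guard against overload
    return np.round(x * 32767).astype(np.int16)    # 16-bit quantization

def tone(t):
    """Stand-in for the microphone output: a 440 Hz tone at half scale."""
    return 0.5 * np.sin(2 * np.pi * 440 * t)

samples = digitize(tone, duration_s=0.01)
print(len(samples), samples.dtype)
```

A higher sample rate captures higher frequencies, and more bits per sample reduce quantization noise; both choices increase the amount of data to store.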

How does it work?

Speech Waveform/Spectrogram

The spectrogram is an alternative way to characterize speech.

In the waveform, the louder the sound, the greater the amplitude on the y-axis.

[Figure: waveform and spectrogram of the words "speech lab"; axes: frequency (Hz) and time (s)]
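A spectrogram of the kind mentioned above can be approximated with a short-time Fourier transform: the samples are cut into overlapping frames, each frame is windowed, and the magnitude of its spectrum is kept. The sketch below assumes a frame length of 256 samples, a hop of 128 and a synthetic 1000 Hz tone as input; these values are illustrative choices, not taken from the slides.

```python
import numpy as np

def spectrogram(samples, sample_rate_hz, frame_len=256, hop=128):
    """Return (frequencies in Hz, magnitude matrix of shape frames x bins)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(samples) - frame_len) // hop
    frames = np.stack([
        samples[i * hop : i * hop + frame_len] * window
        for i in range(n_frames)
    ])
    magnitudes = np.abs(np.fft.rfft(frames, axis=1))
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / sample_rate_hz)
    return freqs, magnitudes

# One second of a pure 1000 Hz tone sampled at 8000 Hz.
rate = 8000
t = np.arange(rate) / rate
signal = np.sin(2 * np.pi * 1000 * t)

freqs, mags = spectrogram(signal, rate)
peak_hz = freqs[mags.mean(axis=0).argmax()]
print(mags.shape, peak_hz)
```

Each row of the matrix is one time slice and each column one frequency band, so plotting it with time on the x-axis and frequency on the y-axis gives the usual spectrogram picture.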

Speech Recognition Process Flow

Audio input

Grammar

Acoustic Model

Recognized text

The major components

It is important to understand that the incoming audio stream is rarely pristine.

It contains not only the speech data (what was said) but also background noise.

This noise can interfere with the recognition process, and the speech engine must handle (and possibly even adapt to) the environment within which the audio is spoken.

Audio I/O

Once the speech data is in the proper format, the engine searches for the best match.

It does this by taking into consideration the words and phrases it knows about (the active grammars), along with its knowledge of the environment in which it is operating.

The knowledge of the environment is provided in the form of an acoustic model.

Once it identifies the most likely match for what was said, it returns what it recognized as a text string.
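The search described above can be pictured as picking, among the phrases allowed by the active grammar, the one the acoustic model scores highest. The sketch below assumes the acoustic scores are already available as log-likelihoods; the function best_match, the phrases and the score values are all made up for illustration, and a real engine searches a far larger space.

```python
def best_match(acoustic_scores, active_grammar):
    """Return the in-grammar phrase with the highest acoustic score.

    acoustic_scores: dict mapping candidate phrase -> log-likelihood
                     (hypothetical values standing in for an acoustic model).
    active_grammar:  set of phrases the application currently accepts.
    """
    candidates = {
        phrase: score
        for phrase, score in acoustic_scores.items()
        if phrase in active_grammar
    }
    if not candidates:
        return None  # nothing the engine heard fits the grammar
    return max(candidates, key=candidates.get)

scores = {"call home": -12.0, "all foam": -11.5, "call Rome": -14.2}
grammar = {"call home", "call Rome", "hang up"}

result = best_match(scores, grammar)
print(result)
```

Here "all foam" has the best raw acoustic score, but it is not in the active grammar, so the engine returns "call home" as the recognized text string.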

Acoustic + Grammar

About SR Engine

SR requires a software application "engine" with logic built in to decipher and act on the spoken word.

Sound card – converts the analogue signal to a digital signal.

Function of SR engine – converts the digital signal to phonemes, and the phonemes to words.

Different SR engines

CMU Sphinx

Microsoft SAPI

IBM ViaVoice

Decoding process.

Recognition Process Flow Summary

Step 1: User Input. The system captures the user's voice in the form of an analogue acoustic signal.

Step 2: Digitization. Digitize the analogue acoustic signal.

Step 3: Phonetic Breakdown. Break the signal into phonemes.

Step 4: Statistical Modeling. Map phonemes to their phonetic representation using a statistical model.

Step 5: Matching. According to the grammar, the phonetic representation and the dictionary, the system returns an n-best list (i.e. words, each with a confidence score).

Grammar: the union of words or phrases that constrains the range of input or output in the voice application.

Dictionary: the mapping table between phonetic representations and words (e.g. thu, thee → the).
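The matching step can be sketched as a dictionary lookup filtered by the grammar and sorted by confidence. This is a toy illustration: the table PRONUNCIATIONS, the function n_best and the confidence values are hypothetical, with only the "thu"/"thee" spellings of "the" following the example above.

```python
# Hypothetical dictionary: phonetic representation -> word.
PRONUNCIATIONS = {"thu": "the", "thee": "the", "kat": "cat", "kot": "cot"}

def n_best(phonetic_scores, grammar, n=3):
    """Build an n-best list of (word, confidence) pairs.

    phonetic_scores: dict mapping a phonetic representation to a
                     confidence in [0, 1] (assumed to come from the
                     statistical modeling step).
    grammar:         set of words the application accepts.
    """
    best = {}
    for phones, confidence in phonetic_scores.items():
        word = PRONUNCIATIONS.get(phones)
        if word is None or word not in grammar:
            continue  # unknown pronunciation or word outside the grammar
        # Keep the highest confidence seen for each word.
        best[word] = max(confidence, best.get(word, 0.0))
    ranked = sorted(best.items(), key=lambda item: item[1], reverse=True)
    return ranked[:n]

hypotheses = {"thu": 0.62, "thee": 0.30, "kat": 0.05, "zzz": 0.03}
result = n_best(hypotheses, grammar={"the", "cat"})
print(result)
```

Both pronunciations of "the" collapse into a single entry, and the unknown sequence "zzz" is dropped because the dictionary has no word for it.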

REPRESENTATION OF SOFTWARE


Challenges and Difficulties of SR

Speech recognition is still a very difficult problem. The main problems are the following.

Speaker Variability: two speakers, or even the same speaker, will pronounce the same word differently.

Channel Variability: the quality and position of the microphone and the background environment will affect the output.

Current Software Options for PC

Dragon Systems – NaturallySpeaking

Philips – FreeSpeech

IBM – ViaVoice

Lernout & Hauspie – Voice Xpress