Automatic Speech Recognition
PROBLEM STATEMENT
TO IMPLEMENT AND CONSTRUCT A SPOKEN DIALOGUE SYSTEM (SDS) THAT TAKES SPEECH AS INPUT, STORES IT ACCORDING TO THE SPEAKER'S IDENTITY, AND FINALLY PERFORMS SPOKEN LANGUAGE GENERATION.
YoGiV 2
Assumptions
We assume that the speech given to our system contains no noise other than what comes from the user.
In certain places we use a stored database that is generated once training on the training sets is complete.
11/14/2012
SPOKEN DIALOGUE SYSTEM
To implement the above system we use three subsystems:
1. ASR (Automatic Speech Recognition)
2. DIALOGUE MANAGEMENT
3. SPOKEN LANGUAGE GENERATION
AUTOMATIC SPEECH RECOGNITION
This is the first subsystem of the SDS. It takes voice as input, converts it into grammatically correct speech, and stores the result in the system. Its main job is to turn the raw voice (including noise) into definite speech that the next subsystem can use. This is our main area of focus.
DIALOGUE MANAGEMENT
This subsystem manages the output produced by the ASR according to each individual's identity and stores it in the system for use by the next subsystem.
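Storing ASR output by identity could be sketched as follows. This is only a minimal illustration: the `DialogueStore` class, its method names, and the speaker labels are hypothetical and do not come from the slides.

```python
from collections import defaultdict


class DialogueStore:
    """Keeps recognized utterances grouped by speaker identity (hypothetical helper)."""

    def __init__(self):
        # speaker id -> list of recognized utterances, in arrival order
        self._by_speaker = defaultdict(list)

    def add(self, speaker_id, text):
        self._by_speaker[speaker_id].append(text)

    def utterances(self, speaker_id):
        # Return a copy so callers cannot mutate the stored history.
        return list(self._by_speaker.get(speaker_id, []))


store = DialogueStore()
store.add("speaker_1", "hello")
store.add("speaker_2", "good morning")
store.add("speaker_1", "how are you")
print(store.utterances("speaker_1"))  # ['hello', 'how are you']
```

Keying the store on the speaker keeps each person's speech separate, which is what the next subsystem needs when it generates spoken language.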
SPOKEN LANGUAGE GENERATION
This subsystem uses the stored speech and generates spoken language (English, in our case).
![Page 7: Automatic Speech Recognition](https://reader034.fdocuments.us/reader034/viewer/2022051411/5454b739af79590c338b49bd/html5/thumbnails/7.jpg)
Main Flow Diagram:
In our case we deal with ASR (Automatic Speech Recognition).
AUTOMATIC SPEECH RECOGNITION
ASR takes voice as input and converts it into understandable speech.
This raises some questions:
● How can the system distinguish between different speakers?
● How can the system distinguish between ambient noise and someone speaking?
● How can the system derive meaning from what was said?
To answer these questions we start by describing our most important part: “Speech”.
SPEECH
Some factors to keep in mind when taking speech as input:
a) Biological Factors
b) Phonology
c) Frequency of Sounds
d) Timing
Biological Factors
1. The way our mouth moves to produce certain sounds affects the features of the sound itself.
2. The structure of the mouth produces multiple waves in certain patterns.
3. When we shape our mouths to make certain letters, say 't', we push out more air at once, making a higher-frequency sound.
So one thing we have to take care of is the frequency of speech, and along with frequency we take amplitude and pitch into consideration.
Phonology
Phonology describes how we use sound to convey meaning in a language.
In English it describes the characteristics of sounds such as vowels and consonants.
A phoneme is the smallest segmental unit of sound in a language. Each phoneme has sound features that distinguish it from other phonemes. Phonemes combine to represent words and sentences. English has about 40-50 phonemes. So we use phonemes to help remove noise from the sound.
Frequency of Sounds
Different vowels have different characteristic pitches (their resonant frequencies); in this respect they are similar to musical notes. For example, 'i' is the highest and 'u' is the lowest.
Consonant phonemes have more waves, produced by the oscillation of different parts of the mouth.
So, according to their different frequencies, the system can store words with their different phonemes.
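To make the idea of telling sounds apart by frequency concrete, here is a minimal sketch that finds the strongest frequency in a short signal using a plain discrete Fourier transform. The function name `dominant_frequency`, the 8000 Hz sample rate, and the two synthetic tones are assumptions chosen for illustration; real speech contains many frequencies at once.

```python
import cmath
import math


def dominant_frequency(samples, sample_rate):
    """Return the frequency (Hz) of the strongest DFT bin below Nyquist."""
    n = len(samples)
    best_bin, best_mag = 0, 0.0
    # Skip bin 0 (the constant offset) and stop before the Nyquist bin.
    for k in range(1, n // 2):
        coeff = sum(
            samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)
        )
        if abs(coeff) > best_mag:
            best_bin, best_mag = k, abs(coeff)
    return best_bin * sample_rate / n


SAMPLE_RATE = 8000  # Hz, assumed
# Two synthetic 0.05 s "sounds": a low tone and a high tone (made-up values).
low = [math.sin(2 * math.pi * 300 * t / SAMPLE_RATE) for t in range(400)]
high = [math.sin(2 * math.pi * 2300 * t / SAMPLE_RATE) for t in range(400)]
print(dominant_frequency(low, SAMPLE_RATE), dominant_frequency(high, SAMPLE_RATE))
```

The direct DFT is quadratic in the number of samples; it is used here only because it needs no external library. A practical system would use a fast Fourier transform.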
Timing
There is a lot of information in timing. Breaks between words and breaks between one sentence and the next all have to be considered in order to distinguish between different words. According to research, vowels last longer than consonants.
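The breaks described above can be located, in a very rough way, by looking for stretches where the signal is quiet. The sketch below assumes a hypothetical helper `split_on_silence`, a fixed frame size, and a hand-picked loudness threshold; none of these come from the slides.

```python
def split_on_silence(samples, frame_size, threshold):
    """Return (start, end) sample index pairs for stretches louder than threshold."""
    segments = []
    start = None
    for i in range(0, len(samples), frame_size):
        frame = samples[i:i + frame_size]
        # Mean absolute amplitude as a crude loudness measure for this frame.
        loudness = sum(abs(s) for s in frame) / len(frame)
        if loudness >= threshold and start is None:
            start = i                      # a sound segment begins
        elif loudness < threshold and start is not None:
            segments.append((start, i))    # a break closes the segment
            start = None
    if start is not None:
        segments.append((start, len(samples)))
    return segments


# Toy signal: sound, break, longer sound (values are made up).
signal = [0.8] * 30 + [0.0] * 20 + [0.6] * 50
print(split_on_silence(signal, frame_size=10, threshold=0.1))  # [(0, 30), (50, 100)]
```

The length of each returned segment also gives its duration, which is the kind of timing information that separates long vowels from short consonants.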
Given the factors above, we have to:
● Translate from frequencies to a representation of a phoneme.
● Discard useless information such as noise.
● Make sure the sentence created makes sense.
For the above problems we use two models and one database:
● Acoustic Model
● Dictionary
● Language Model
Speech to Features
Based on all the features of a sound wave:
● Frequency
● Pitch
● Amplitude
● Time information
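As a rough illustration of turning a piece of sound into features, the sketch below computes an amplitude, a frequency estimate, and a duration for one frame of samples. The function `frame_features`, the zero-crossing estimate (used here as a simple stand-in for frequency and pitch), and the test tone are assumptions for illustration only.

```python
import math


def frame_features(frame, sample_rate):
    """Return simple time-domain features for one frame of samples."""
    # Amplitude: root-mean-square level of the frame.
    rms = math.sqrt(sum(s * s for s in frame) / len(frame))
    # Frequency estimate: a pure tone crosses zero twice per cycle.
    crossings = sum(
        1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0)
    )
    duration = len(frame) / sample_rate
    return {
        "amplitude": rms,
        "frequency": crossings / (2 * duration),
        "duration": duration,
    }


SAMPLE_RATE = 8000  # Hz, assumed
# 0.1 s of a 200 Hz tone with peak amplitude 0.5 (made-up test signal).
tone = [
    0.5 * math.sin(2 * math.pi * 200 * t / SAMPLE_RATE + 0.5) for t in range(800)
]
features = frame_features(tone, SAMPLE_RATE)
print(features)
```

Zero-crossing counts are only reliable for clean, single-tone signals; they are shown here because they need nothing beyond the samples themselves.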
Acoustic Model
● The Acoustic Model is the statistical mapping from the units of speech to all the features of speech.
● It converts speech sound to phonemes, and then to words.
● It is statistical: it can learn from a training set.
● It carries information about the phonology of the language.
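A toy version of a statistical mapping that learns from a training set might look like the sketch below. It is not the acoustic model of any real recognizer: it simply learns the average feature vector of each phoneme and picks the closest one. The functions `train` and `classify`, the two-number feature vectors, and all training values are made up for illustration.

```python
from collections import defaultdict


def train(examples):
    """Learn the mean feature vector of each phoneme from labelled examples."""
    sums = {}
    counts = defaultdict(int)
    for phoneme, features in examples:
        if phoneme not in sums:
            sums[phoneme] = [0.0] * len(features)
        sums[phoneme] = [s + f for s, f in zip(sums[phoneme], features)]
        counts[phoneme] += 1
    return {p: [s / counts[p] for s in total] for p, total in sums.items()}


def classify(model, features):
    """Pick the phoneme whose mean is closest (squared Euclidean distance)."""
    def distance(mean):
        return sum((m - f) ** 2 for m, f in zip(mean, features))
    return min(model, key=lambda p: distance(model[p]))


# Made-up training set: (phoneme label, [frequency in Hz, duration in s]).
training_set = [
    ("i", [2300.0, 0.12]), ("i", [2250.0, 0.10]),
    ("u", [800.0, 0.11]), ("u", [850.0, 0.13]),
]
model = train(training_set)
print(classify(model, [2200.0, 0.11]))  # closest to the "i" mean
```

In this toy the frequency value dominates the distance because it is numerically much larger than the duration; a more careful sketch would scale the features first.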
Dictionary
The dictionary lists each word broken down into the phoneme sounds it is typically made of, so that a sequence of phonemes can be checked against known words.
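A dictionary of this kind can be pictured as a simple lookup table. In the sketch below, the three entries and the phoneme symbols are illustrative only, and the names `PRONUNCIATIONS` and `lookup` are hypothetical.

```python
# Hypothetical pronunciation dictionary: word -> typical phoneme sequence.
PRONUNCIATIONS = {
    "cat": ("k", "ae", "t"),
    "bat": ("b", "ae", "t"),
    "speech": ("s", "p", "iy", "ch"),
}

# Reverse index so a recognized phoneme sequence can be looked up directly.
WORDS_BY_PHONEMES = {}
for word, phonemes in PRONUNCIATIONS.items():
    WORDS_BY_PHONEMES.setdefault(phonemes, []).append(word)


def lookup(phonemes):
    """Return every dictionary word made of exactly this phoneme sequence."""
    return WORDS_BY_PHONEMES.get(tuple(phonemes), [])


print(lookup(["k", "ae", "t"]))  # ['cat']
print(lookup(["z", "z"]))        # []
```

The reverse index returns a list because different words can share the same phoneme sequence; choosing between them is left to the language model.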
Language Model
● Provides word-level structure for a language.
● Uses formal grammar rules to form sentences.
We use context to place a particular word at a particular position. To implement this context matching in the system we use probability: we calculate the probability of the next word from the probabilities of the previous words.
The probability of a word is based on the last N-1 words:
$$P(Y) = \sum_{X} P(Y \mid X)\, P(X)$$
where the sum runs over X, P(X) is the probability of the words already in the sentence, and P(Y) is the probability of observing the sequence.
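For N = 2, "the last N-1 words" is just the one previous word. The sketch below estimates such next-word probabilities by counting word pairs in a tiny corpus. The corpus, the `<s>` sentence-start marker, and the function names are assumptions for illustration, and no smoothing is applied.

```python
from collections import Counter, defaultdict


def train_bigrams(sentences):
    """Count how often each word follows each previous word."""
    follow = defaultdict(Counter)
    for sentence in sentences:
        words = ["<s>"] + sentence.lower().split()
        for prev, word in zip(words, words[1:]):
            follow[prev][word] += 1
    return follow


def next_word_probability(follow, prev, word):
    """Estimate P(word | prev) as a relative frequency (no smoothing)."""
    total = sum(follow[prev].values())
    return follow[prev][word] / total if total else 0.0


corpus = ["I want to speak", "I want to listen", "I like to speak"]  # made up
model = train_bigrams(corpus)
print(next_word_probability(model, "i", "want"))    # 2 of the 3 words after "i"
print(next_word_probability(model, "to", "speak"))  # 2 of the 3 words after "to"
```

With counts like these, the system can prefer the candidate word that is most likely to follow the words already recognized.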
![Page 20: Automatic Speech Recognition](https://reader034.fdocuments.us/reader034/viewer/2022051411/5454b739af79590c338b49bd/html5/thumbnails/20.jpg)
FLOW DIAGRAM OF ASR
Yogesh Vijay
B. Tech, Computer Science
JIIT, Noida