Speech Recognition

Speech Recognition in Konkani

Nilkanth Shet Shirodkar

What is Speech Recognition?

Speech recognition, also known as automatic speech recognition (ASR) or computer speech recognition, means that a computer understands spoken input and performs the required task.

Where can it be used?

- System control / controlling devices

- Commercial / industrial applications

- Voice dialing

Recognition

[Diagram: recognition pipeline with blocks labelled Voice Input, Analog to Digital, Acoustic Model, Language Model, Speech Engine, Display]

Speech Recognition

1. Voice recording
2. Word boundary detection
3. Feature extraction
4. Recognition with the help of language models

Components of the recognition system

① Sound recording and word detection component: takes the input from the audio recorder, preferably a microphone, and identifies the word in the input signal. Word detection is usually done using the energy and the zero crossing rate of the signal. The output of this component is then sent to the feature extractor module.

② Feature extractor: generates the feature vectors for the audio signals passed to it from the word detection component. It generates the MFCCs (Mel Frequency Cepstral Coefficients), which are used later to identify the audio signal.

③ Recognition system: an HMM-based (Hidden Markov Model) component that takes as input the feature vectors generated by the feature extractor and finds the best, or most suitable, match from the knowledge model.

④ Knowledge model: the language dictionary used to identify the sound signal.
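The word detection step described above, based on energy and zero crossing rate, can be sketched roughly as follows. This is a minimal illustration rather than the method used in the talk: the frame length, hop size, both thresholds and the function names are assumptions chosen for a synthetic 8 kHz signal, and a real system would tune them on recorded speech.

```python
import math

def frame_signal(samples, frame_len, hop):
    """Split a list of samples into overlapping frames."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]

def short_time_energy(frame):
    """Mean squared amplitude of one frame."""
    return sum(s * s for s in frame) / len(frame)

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:])
                    if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)

def detect_word(samples, frame_len=80, hop=40,
                energy_thresh=0.01, zcr_thresh=0.3):
    """Return (start_sample, end_sample) of the detected word, or None.

    A frame counts as speech if its energy is high, or if its zero
    crossing rate is high while some energy is present (a rough proxy
    for unvoiced sounds). All thresholds are illustrative assumptions.
    """
    speech_frames = []
    for index, frame in enumerate(frame_signal(samples, frame_len, hop)):
        energy = short_time_energy(frame)
        zcr = zero_crossing_rate(frame)
        loud = energy > energy_thresh
        noisy = zcr > zcr_thresh and energy > 0.1 * energy_thresh
        if loud or noisy:
            speech_frames.append(index)
    if not speech_frames:
        return None
    start = speech_frames[0] * hop
    end = speech_frames[-1] * hop + frame_len
    return start, end

# Synthetic input: silence, a 200 Hz tone standing in for a word, silence.
RATE = 8000
tone = [0.5 * math.sin(2 * math.pi * 200 * n / RATE) for n in range(800)]
signal = [0.0] * 400 + tone + [0.0] * 400
print(detect_word(signal))
```

The returned sample range brackets the tone with up to one frame of margin on each side. The samples inside that range are what would be handed to the feature extractor.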

Speech Recognition system

Acoustic Model

• Features extracted from the input sound by the extraction module have to be compared with a predefined model to identify the spoken word

• Word Model
• Phone Model

• Phone Model - Only parts of words, called phones, are modelled instead of modelling the word as a whole. Instead of matching the sound against each whole word, we match the sound against the phones and recognise the word from its parts

• Word Model - The words are modelled as a whole. During recognition the input sound is matched against each word present in the model, and the best match is taken to be the spoken word

o Phone Set - A phoneme is the basic, or smallest, unit of sound
  Examples: aa, a, iy

o Dictionary
  • A dictionary, also known as the pronunciation lexicon, specifies the pronunciation of each word as a linear sequence of phonemes
  • the: dh ax
  • on: aa n
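A pronunciation dictionary of this kind can be represented as a simple mapping from words to phone sequences. The sketch below uses only the two entries shown on the slide ("the" and "on"). The names `LEXICON` and `words_to_phones` are hypothetical and chosen for illustration.

```python
# Toy pronunciation lexicon: word -> linear sequence of phones.
# Only the two entries from the slide are included; a real lexicon
# would cover the whole vocabulary of the recognizer.
LEXICON = {
    "the": ["dh", "ax"],
    "on": ["aa", "n"],
}

def words_to_phones(words, lexicon=LEXICON):
    """Expand a list of words into one flat list of phones."""
    phones = []
    for word in words:
        key = word.lower()
        if key not in lexicon:
            # Out-of-vocabulary words have no pronunciation to look up.
            raise KeyError(f"no pronunciation for {word!r}")
        phones.extend(lexicon[key])
    return phones

print(words_to_phones(["on", "the"]))
```

With a phone model, the recognizer only needs acoustic models for the phones in the phone set. The lexicon then tells it which phone sequences form valid words.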

Language Model

• Gives the speech recognition system a fair idea of the context and of the words that can occur in that context. It also indicates which words are possible in the language and the sequence in which these words may occur

HMM for ASR

• Build an HMM for each phone

• Combine the phone models, based on the pronunciation model, to create word-level models

• Combine the word-level models based on the language model
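The step of combining phone models into a word-level model can be illustrated with a small sketch. It assumes a left-to-right topology with three states per phone and a fixed self-loop probability. Neither is stated in the slides; both are common textbook choices used here only for illustration, and emission probabilities are left out entirely.

```python
STATES_PER_PHONE = 3  # assumption: a common choice, not given in the slides

def word_hmm(phones, self_loop=0.6):
    """Concatenate left-to-right phone HMMs into one word-level HMM.

    Returns (state_names, transitions). transitions[i][j] is the
    probability of moving from state i to state j; the extra last column
    is the exit transition out of the word. Emission models are omitted.
    """
    states = [f"{phone}_{k}"
              for phone in phones
              for k in range(STATES_PER_PHONE)]
    n = len(states)
    transitions = [[0.0] * (n + 1) for _ in range(n)]
    for i in range(n):
        transitions[i][i] = self_loop            # stay in the same state
        transitions[i][i + 1] = 1.0 - self_loop  # advance to the next state
    return states, transitions

# Word-level model for "on", using the phone sequence "aa n".
states, transitions = word_hmm(["aa", "n"])
print(states)
```

The last state of one phone leads directly into the first state of the next, which is what turns separate phone models into a single word model. Word models would be chained in the same way, with the language model weighting the transitions between words.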

How Language Models work

• Hard to compute directly: P("And nothing but the truth")

• Decompose the probability: P("And nothing but the truth") = P("And") × P("nothing" | "and") × P("but" | "and nothing") × P("the" | "and nothing but") × P("truth" | "and nothing but the")
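The decomposition above can be tried out on a toy corpus. As a simplifying assumption not made in the slides, the sketch conditions each word only on the single previous word (a bigram approximation) instead of the full history, and it estimates the probabilities by plain counting without smoothing. The three-sentence corpus and the function names are invented for illustration.

```python
from collections import Counter

def train_bigram(sentences):
    """Count histories and bigrams; "<s>" marks the sentence start."""
    contexts, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.lower().split()
        contexts.update(tokens[:-1])             # every token used as a history
        bigrams.update(zip(tokens, tokens[1:]))  # (previous word, word) pairs
    return contexts, bigrams

def sentence_probability(sentence, contexts, bigrams):
    """Multiply P(word | previous word) over the sentence (chain rule)."""
    tokens = ["<s>"] + sentence.lower().split()
    prob = 1.0
    for prev, word in zip(tokens, tokens[1:]):
        if contexts[prev] == 0:
            return 0.0  # unseen history: no estimate without smoothing
        prob *= bigrams[(prev, word)] / contexts[prev]
    return prob

# Toy corpus, invented for this example.
corpus = [
    "and nothing but the truth",
    "and nothing else",
    "the truth and nothing but the truth",
]
contexts, bigrams = train_bigram(corpus)
print(sentence_probability("And nothing but the truth", contexts, bigrams))
```

Any word sequence with an unseen bigram gets probability zero here. Practical language models apply smoothing so that unseen but plausible sequences keep a small non-zero probability.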

CMUSphinx

• Sphinx3 is the speech recognizer (decoder)

• SphinxTrain is a set of tools for acoustic modeling

• SphinxBase is a common set of libraries used in CMU Sphinx

Jasper

• Jasper is an open source platform for developing voice-controlled applications

• Uses voice to ask for information

• Jasper runs on Raspberry Pi

• Configure Jasper to make your own personal assistant

Resources

• List of publications: http://cmusphinx.sourceforge.net/wiki/research

• Speech Recognition With CMU Sphinx [blog by N. Shmyrev, Sphinx developer]

• Speech recognition seminars at Leiden Institute for Advanced Computer Science, Netherlands:
  http://www.liacs.nl/~erwin/speechrecognition.html
  http://www.liacs.nl/~erwin/SR2003
  http://www.liacs.nl/~erwin/SR2005
  http://www.liacs.nl/~erwin/SR2006
  http://www.liacs.nl/~erwin/SR2009




