Speech Recognition

Speech Recognition in Konkani

Nilkanth Shet Shirodkar

What is Speech Recognition?

Speech recognition, also known as automatic speech recognition (ASR) or computer speech recognition, is the process by which a computer interprets spoken input and carries out the required task.

Where can it be used?

- System control / controlling devices

- Commercial / industrial applications

- Voice dialing

Recognition

[Block diagram of the recognition process. Labelled blocks: Voice Input, Analog to Digital, Acoustic Model, Language Model, Speech Engine, Display]

Speech Recognition

1. Voice recording

2. Word boundary detection

3. Feature extraction

4. Recognition with the help of language models

Components of the recognition system

1. Sound recording and word detection component: takes the input from the audio recorder, preferably a microphone, and identifies the word in the input signal. Word detection is usually done using the energy and the zero-crossing rate of the signal. The output of this component is sent to the feature extractor module.

2. Feature extractor: generates the feature vectors for the audio signals it receives from the word detection component. It produces the MFCCs (Mel-Frequency Cepstral Coefficients), which are used later to identify the audio signal.

3. Recognition system: an HMM-based (Hidden Markov Model) component that takes the feature vectors generated by the feature extractor as input and finds the best, or most suitable, match in the knowledge model.

4. Knowledge model: the language dictionary that is used to identify the sound signal.
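The word detection step in component 1 can be illustrated with a minimal sketch. It assumes the audio is already a list of samples in the range -1 to 1; the function names, frame sizes and thresholds are hypothetical choices for illustration, not values taken from the slides or from any particular toolkit.

```python
import math

def frame_signal(samples, frame_len, hop):
    """Split the sample list into overlapping fixed-length frames."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]

def short_time_energy(frame):
    """Mean squared amplitude of one frame."""
    return sum(x * x for x in frame) / len(frame)

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)

def detect_word(samples, frame_len=200, hop=100,
                energy_thresh=0.01, zcr_thresh=0.3):
    """Return (start_sample, end_sample) of the detected speech, or None.

    A frame counts as speech if its energy is high, or if its energy is
    moderate and its zero-crossing rate is high (typical of unvoiced sounds).
    The thresholds are illustrative assumptions and would need tuning.
    """
    frames = frame_signal(samples, frame_len, hop)
    active = []
    for i, frame in enumerate(frames):
        energy = short_time_energy(frame)
        zcr = zero_crossing_rate(frame)
        if energy > energy_thresh or (zcr > zcr_thresh and energy > energy_thresh / 10):
            active.append(i)
    if not active:
        return None
    return active[0] * hop, active[-1] * hop + frame_len

# Demo signal at 8 kHz: 0.1 s silence, 0.2 s of a 440 Hz tone, 0.1 s silence.
RATE = 8000
tone = [0.5 * math.sin(2 * math.pi * 440 * n / RATE) for n in range(1600)]
signal = [0.0] * 800 + tone + [0.0] * 800
print(detect_word(signal))
```

The returned sample range is what would be handed on to the feature extractor; a real recorder would add smoothing so that short pauses inside a word do not split it.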

Speech Recognition system

Acoustic Model

- The features extracted from the input sound by the extraction module have to be compared with a predefined model to identify the spoken word.

- Two kinds of model are used: the word model and the phone model.

- Phone model: only parts of words, called phones, are modelled instead of modelling the word as a whole. Instead of matching the sound against each whole word, we match it against the phones and recognise the word from its parts.

- Word model: the words are modelled as a whole. During recognition the input sound is matched against each word present in the model, and the best possible match is taken to be the spoken word.

- Phone set: a phoneme is the basic, or smallest, unit of sound. Examples: aa, a, iy.

- Dictionary: the dictionary, also known as the pronunciation lexicon, specifies the pronunciation of each word as a linear sequence of phonemes. For example:

  the    dh ax
  on     aa n
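To make the role of the pronunciation lexicon concrete, here is a small sketch that parses dictionary lines of the form "word phone phone ..." and expands a word sequence into phones. Only the two entries shown on the slide are used; the function names and the one-entry-per-line layout are assumptions for illustration.

```python
# The two entries from the slide, written one per line as "word phone phone ...".
LEXICON_TEXT = """
the dh ax
on aa n
"""

def parse_lexicon(text):
    """Build a {word: [phones]} mapping from whitespace-separated lines."""
    lexicon = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        lexicon[fields[0].lower()] = fields[1:]
    return lexicon

def words_to_phones(words, lexicon):
    """Expand a list of words into one linear phone sequence."""
    phones = []
    for word in words:
        key = word.lower()
        if key not in lexicon:
            raise KeyError(f"word not in lexicon: {word!r}")
        phones.extend(lexicon[key])
    return phones

lexicon = parse_lexicon(LEXICON_TEXT)
print(words_to_phones(["on", "the"], lexicon))
```

A word that is missing from the lexicon cannot be recognised by a phone-based system, which is why the sketch raises an error instead of guessing a pronunciation.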

Language Model

- The language model gives the speech recognition system a fair idea of the context and of the words that can occur in that context. It also describes which words are possible in the language and the sequences in which those words may occur.

HMM for ASR

- Build an HMM for each phone.

- Combine the phone models, based on the pronunciation model, to create word-level models.

- Combine the word-level models based on the language model.
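The first two steps above can be sketched with a toy example. It assumes one HMM state per phone, a fixed self-loop probability, and discrete acoustic symbols "A" to "D" standing in for MFCC vectors; all of these numbers are made up for illustration. Each word-level model is a left-to-right chain of its phone states, scored with the Viterbi algorithm. A language-model score could be added to each word score in the final step, which is omitted here.

```python
import math

# Hypothetical one-state phone models: probability of each acoustic symbol.
PHONE_EMISSIONS = {
    "dh": {"A": 0.7, "B": 0.1, "C": 0.1, "D": 0.1},
    "ax": {"A": 0.1, "B": 0.7, "C": 0.1, "D": 0.1},
    "aa": {"A": 0.1, "B": 0.1, "C": 0.7, "D": 0.1},
    "n":  {"A": 0.1, "B": 0.1, "C": 0.1, "D": 0.7},
}
SELF_LOOP = 0.5  # assumed probability of staying in the same phone state

# Pronunciation model: the two entries shown on the dictionary slide.
LEXICON = {"the": ["dh", "ax"], "on": ["aa", "n"]}

def word_log_likelihood(phones, observations):
    """Viterbi log score of the observations under a chain of phone states.

    The path must start in the first phone and end in the last one.
    """
    neg_inf = float("-inf")
    stay, move = math.log(SELF_LOOP), math.log(1 - SELF_LOOP)
    scores = [neg_inf] * len(phones)
    scores[0] = math.log(PHONE_EMISSIONS[phones[0]][observations[0]])
    for obs in observations[1:]:
        new_scores = [neg_inf] * len(phones)
        for s, phone in enumerate(phones):
            best = scores[s] + stay               # stay in the same phone
            if s > 0:
                best = max(best, scores[s - 1] + move)  # advance to next phone
            if best > neg_inf:
                new_scores[s] = best + math.log(PHONE_EMISSIONS[phone][obs])
        scores = new_scores
    return scores[-1]

def recognise(observations):
    """Return the lexicon word whose word-level model scores highest."""
    return max(LEXICON, key=lambda w: word_log_likelihood(LEXICON[w], observations))

print(recognise(["A", "A", "B"]))
print(recognise(["C", "D", "D"]))
```

Because each word model is built only from shared phone models, adding a new word needs just a new lexicon entry, not new acoustic training data for that word.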

How Language Models work

- Hard to compute directly: P("and nothing but the truth")

- Decompose the probability with the chain rule:

  P("and nothing but the truth") = P("and") × P("nothing" | "and") × P("but" | "and nothing") × P("the" | "and nothing but") × P("truth" | "and nothing but the")
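The decomposition can be tried out on a toy corpus. The sketch below makes one simplification that the slide does not state: each word is conditioned only on the previous word (a bigram approximation) instead of on the full history, because full-history counts are too sparse to estimate. The three training sentences, the "<s>" and "</s>" boundary markers and the function names are assumptions for illustration, and no smoothing is applied.

```python
from collections import Counter

def train_bigram(sentences):
    """Count bigrams and their left contexts, with sentence boundary markers."""
    contexts, bigrams = Counter(), Counter()
    for sentence in sentences:
        words = ["<s>"] + sentence.lower().split() + ["</s>"]
        contexts.update(words[:-1])           # every word that has a successor
        bigrams.update(zip(words, words[1:]))
    return contexts, bigrams

def sentence_probability(sentence, contexts, bigrams):
    """Product of P(word | previous word) over the sentence (no smoothing)."""
    words = ["<s>"] + sentence.lower().split() + ["</s>"]
    prob = 1.0
    for prev, word in zip(words, words[1:]):
        if contexts[prev] == 0:
            return 0.0  # unseen context: unsmoothed estimate is zero
        prob *= bigrams[(prev, word)] / contexts[prev]
    return prob

# Hypothetical three-sentence training corpus.
corpus = [
    "and nothing but the truth",
    "and nothing else",
    "nothing but trouble",
]
contexts, bigrams = train_bigram(corpus)
print(sentence_probability("and nothing but the truth", contexts, bigrams))
print(sentence_probability("truth and", contexts, bigrams))
```

A word sequence never seen in training gets probability zero here; practical language models add smoothing so that unseen but plausible sequences still receive a small probability.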

CMUSphinx

- Sphinx3 is the speech recognizer (decoder).

- SphinxTrain is a set of tools for acoustic modeling.

- SphinxBase is a common set of libraries used in CMU Sphinx.

Jasper

- Jasper is an open source platform for developing voice-controlled applications.

- It uses voice to ask for information.

- Jasper runs on the Raspberry Pi.

- Jasper can be configured to make your own personal assistant.

Resources

- List of publications: http://cmusphinx.sourceforge.net/wiki/research

- Speech Recognition With CMU Sphinx [blog by N. Shmyrev, Sphinx developer]

- Speech recognition seminars at the Leiden Institute for Advanced Computer Science, Netherlands:

  http://www.liacs.nl/~erwin/speechrecognition.html
  http://www.liacs.nl/~erwin/SR2003
  http://www.liacs.nl/~erwin/SR2005
  http://www.liacs.nl/~erwin/SR2006
  http://www.liacs.nl/~erwin/SR2009


Page 8: Speech Recognition

bull 3 Recognition System ndash HMM (Hidden Markov Model-based) component

which takes as input the feature vectors generated from the feature extractor component and then finds the best or most suitable match from the knowledge model

bull 4 Knowledge Model ndash language dictionary which is used to identify the

sound signal

Speech Recognition system

Acoustic Model

• The features extracted from the input sound by the extraction module have to be compared with a predefined model to identify the spoken word

• Word Model

• Phone Model

• Phone Model - Only parts of words, called phones, are modelled instead of modelling the word as a whole. Instead of matching the sound against each whole word, we match it against the phones and recognise the word from its parts

• Word Model - The words are modelled as a whole. During recognition the input sound is matched against each word present in the model, and the best match is taken to be the spoken word

o Phone Set - A phoneme is the basic, or smallest, unit of sound

o aa a iy

o Dictionary

• A dictionary, also known as the pronunciation lexicon, specifies the pronunciation of each word as a linear sequence of phonemes

• the dh ax

• on aa n
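The dictionary idea above can be sketched in a few lines of Python. This is only an illustration: the two entries come from the slide, while the names `LEXICON` and `words_to_phones` are hypothetical and do not belong to any Sphinx tool.

```python
# Toy pronunciation lexicon: each word maps to a linear sequence of phones.
# Only "the" and "on" come from the slide; everything else is illustrative.
LEXICON = {
    "the": ["dh", "ax"],
    "on": ["aa", "n"],
}


def words_to_phones(words):
    """Expand a word sequence into its phone sequence using the lexicon."""
    phones = []
    for word in words:
        key = word.lower()
        if key not in LEXICON:
            raise KeyError(f"word not in lexicon: {word}")
        phones.extend(LEXICON[key])
    return phones


print(words_to_phones(["on", "the"]))
```

A phone-based recognizer only needs models for the phones in the phone set. Adding a new word then means adding one lexicon entry rather than training a new whole-word model.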

Language Model

• Gives the speech recognition system a fair idea of the context and of the words that can occur in that context. It also indicates which words are possible in the language and the sequence in which these words may occur
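One common way to capture "which words may follow which" is to count word pairs in example text. The sketch below assumes a bigram approximation and a tiny invented corpus. Neither is taken from the slides, and real systems estimate such models from far larger text collections.

```python
from collections import Counter, defaultdict

# Toy corpus, invented for illustration only.
CORPUS = [
    "turn on the light",
    "turn off the light",
    "turn on the fan",
]


def bigram_model(sentences):
    """Estimate P(next word | previous word) from bigram counts."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.split()
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return {
        prev: {w: c / sum(followers.values()) for w, c in followers.items()}
        for prev, followers in counts.items()
    }


model = bigram_model(CORPUS)
print(model["turn"])  # words seen after "turn", with relative frequencies
print(model["the"])   # words seen after "the", with relative frequencies
```

Words that never follow a given word in the corpus simply do not appear in its table. This is how such a model narrows down the sequences the recognizer needs to consider.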

HMM for ASR

• Build an HMM for each phone

• Combine the phone models based on the pronunciation model to create word-level models

• Word-level models are combined based on the language model
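The first two steps can be sketched as follows. The sketch assumes each phone is a small left-to-right HMM with three states, which is a common choice but is not stated in the slides. It builds only the topology (states and arcs), with no probabilities, and the function names are hypothetical.

```python
# Hypothetical sketch: each phone gets a small left-to-right HMM, and a word
# model is the concatenation of its phones' HMMs, following the lexicon.
STATES_PER_PHONE = 3  # assumption, not stated in the slides

LEXICON = {"the": ["dh", "ax"], "on": ["aa", "n"]}


def phone_hmm(phone):
    """State names for one phone, e.g. dh_0, dh_1, dh_2."""
    return [f"{phone}_{i}" for i in range(STATES_PER_PHONE)]


def word_hmm(word):
    """Concatenate phone HMMs into a word-level chain of states and arcs."""
    states = [s for phone in LEXICON[word] for s in phone_hmm(phone)]
    # Left-to-right topology: every state has a self-loop and every state
    # except the last has an arc to the next state.
    arcs = [(s, s) for s in states] + list(zip(states, states[1:]))
    return states, arcs


states, arcs = word_hmm("on")
print(states)
print(len(arcs))
```

The arc from the last state of one phone to the first state of the next is what joins the phone models into a word model. Word models can then be chained in the same way, with the language model weighting the word-to-word transitions.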

How Language Models work

• Hard to compute – P(“And nothing but the truth”)

• Decompose probability – P(“And nothing but the truth”) = P(“And”) P(“nothing” | “and”) P(“but” | “and nothing”) P(“the” | “and nothing but”) P(“truth” | “and nothing but the”)
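The decomposition above is the chain rule of probability. A minimal sketch of the arithmetic follows. The conditional probabilities in `COND_PROB` are made-up placeholder values, not estimates from any corpus, and the function name is hypothetical.

```python
import math

# Hypothetical conditional probabilities P(word | history). The numbers are
# invented for illustration and are not estimated from any corpus.
COND_PROB = {
    ((), "and"): 0.1,
    (("and",), "nothing"): 0.2,
    (("and", "nothing"), "but"): 0.5,
    (("and", "nothing", "but"), "the"): 0.5,
    (("and", "nothing", "but", "the"), "truth"): 0.4,
}


def sentence_probability(words):
    """Chain rule: multiply P(w_i | w_1 .. w_{i-1}) over all positions."""
    log_prob = 0.0
    for i, word in enumerate(words):
        history = tuple(words[:i])
        log_prob += math.log(COND_PROB[(history, word)])
    return math.exp(log_prob)


p = sentence_probability(["and", "nothing", "but", "the", "truth"])
print(p)
```

Summing logarithms instead of multiplying the raw probabilities avoids numerical underflow on long sentences. In practice the full history is usually truncated to the last one or two words, because long histories are rarely seen in training text.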

CMUSphinx

Sphinx3 is the speech recognizer (decoder)

SphinxTrain is a set of tools for acoustic modeling

SphinxBase is a common set of libraries used in CMU Sphinx

Jasper

• Jasper is an open source platform for developing voice-controlled applications

• Uses voice to ask for information

• Jasper runs on Raspberry Pi

• Configure Jasper to make your own personal assistant

Resources

• List of publications: http://cmusphinx.sourceforge.net/wiki/research

• Speech Recognition With CMU Sphinx [Blog by N. Shmyrev, Sphinx developer]

• Speech recognition seminars at Leiden Institute for Advanced Computer Science, Netherlands

• http://www.liacs.nl/~erwin/speechrecognition.html

http://www.liacs.nl/~erwin/SR2003

http://www.liacs.nl/~erwin/SR2005

http://www.liacs.nl/~erwin/SR2006

http://www.liacs.nl/~erwin/SR2009

