Indian Language Speech Processing · types (e.g., decoding ... Malayalam Very rich in morphology...

Dr. G. Bharadwaja Kumar Indian Language Speech Processing


Dr. G. Bharadwaja Kumar

Indian Language Speech Processing


Speech recognition

Speech recognition is the process of converting an acoustic signal, captured by a microphone or a telephone, into the corresponding

orthographic representation.


Speech recognition systems can be characterized by many parameters:

Parameter                     Range
Speaking mode                 Isolated words to continuous speech
Speaking style                Read speech to spontaneous speech
Enrollment                    Speaker-dependent to speaker-independent
Vocabulary                    Small (<20 words) to large (>20,000 words)
Language model                Finite-state to context-sensitive
Perplexity                    Small (<10) to large (>100)
Signal-to-noise ratio (SNR)   High (>30 dB) to low (<10 dB)
Transducer                    Noise-cancelling microphone to telephone
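
The SNR range above is expressed in decibels. As a rough illustration, SNR in dB can be estimated as 10·log10 of the ratio of average signal power to average noise power. The function name `snr_db` and the toy sample values below are assumptions made for this sketch, not part of the original slides.

```python
import math

def snr_db(signal, noise):
    """Estimate SNR in dB as 10*log10(mean signal power / mean noise power)."""
    p_signal = sum(s * s for s in signal) / len(signal)
    p_noise = sum(n * n for n in noise) / len(noise)
    return 10.0 * math.log10(p_signal / p_noise)

# Toy example: noise amplitude is one tenth of the signal amplitude,
# so the power ratio is about 100, i.e. roughly 20 dB.
clean = [1.0, -1.0, 1.0, -1.0]
hiss = [0.1, -0.1, 0.1, -0.1]
print(round(snr_db(clean, hiss), 2))
```

On the scale in the table, such a recording would fall between the "high" and "low" SNR extremes.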


Speaking Style

Read speech

– Planned or read speech may not contain disfluencies

– News

Spontaneous speech

– extemporaneously generated speech

– Disfluencies (hesitations and fillers)

– ‘less-well-formed’ (or un-grammatical).


Vocabulary size

As the vocabulary increases, the number of input-template comparisons that must be made before a best match can be determined also increases.


Enrollment

Some systems require speaker enrollment -- a user must provide samples of his or her speech before using them -- whereas other systems are said to be speaker-independent, in that no enrollment is necessary.


External parameters

In addition, there are some external parameters that can affect speech recognition system performance, including the characteristics of the environmental noise and the type and the placement of the microphone.


The modifications to pronunciation once isolated words are embedded in continuous speech include

Assimilation

Elision

Vowel reduction

Strong and weak forms

Liaison

Contractions

Juncture

Ref: http://cristiancuesta512.blogspot.in/


Source Channel Model

If A represents the acoustic feature sequence extracted from a speech sample, the speech recognition system should yield the optimal word sequence $\hat{W}$ that best matches A:

$$\hat{W} = \arg\max_{W} P(W \mid A)$$


Using Bayes' rule, we can rewrite this as

$$P(W \mid A) = \frac{P(A \mid W)\, P(W)}{P(A)}$$

Here, P(A|W) is the likelihood of the feature sequence A given the acoustic model of word sequence W.
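
Because P(A) is the same for every candidate word sequence, the decoder only needs to compare the product P(A|W)·P(W), where P(W) comes from the language model. The sketch below illustrates this with two competing hypotheses. The function name `decode` and all probability values are hypothetical, chosen only to show how a language model prior can outweigh a slightly better acoustic score.

```python
import math

def decode(hypotheses):
    """Return the word sequence maximizing log P(A|W) + log P(W).

    hypotheses maps a word sequence to a pair
    (acoustic likelihood P(A|W), language model prior P(W)).
    P(A) is omitted because it is constant across hypotheses.
    """
    def score(item):
        _, (p_a_given_w, p_w) = item
        return math.log(p_a_given_w) + math.log(p_w)

    best_words, _ = max(hypotheses.items(), key=score)
    return best_words

# Hypothetical scores for two acoustically similar hypotheses.
candidates = {
    "recognize speech": (0.20, 0.010),
    "wreck a nice beach": (0.25, 0.0001),
}
print(decode(candidates))
```

Working in log space avoids numerical underflow when many small probabilities are multiplied, which is why practical decoders usually add log scores rather than multiply probabilities.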


Pronunciation Lexicon

Provides pronunciations of words, so the decoder knows which HMMs to use for a given word.

Also provides a list of words that limits the language model complexity and the decoder’s search space.

As a result, an ASR system can recognize only the limited set of words present in the dictionary; this is normally known as closed-vocabulary speech recognition.


Out-of-vocabulary (OOV)

Words that appear in test data but are missing from the lexicon, so their phonetic sequence is unknown.

They cannot be recognized, and they also affect the recognition accuracy of their surrounding in-vocabulary (IV) words.

Four challenges with OOV:

Detecting the presence of the word

Determining its location within the utterance

Recognizing the underlying phonetic sequence

Identifying the spelling of the word
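
How severe the OOV problem is for a given lexicon can be gauged by the OOV rate: the fraction of test tokens that are absent from the vocabulary. The following is a minimal sketch; the function name `oov_rate` and the toy data are assumptions for illustration only.

```python
def oov_rate(test_tokens, vocabulary):
    """Return (fraction of test tokens not in the vocabulary, list of those tokens)."""
    vocab = set(vocabulary)
    oov_tokens = [tok for tok in test_tokens if tok not in vocab]
    return len(oov_tokens) / len(test_tokens), oov_tokens

# Toy lexicon and test utterance; "zyx" stands in for an unseen word.
lexicon = ["the", "cat", "sat", "on"]
utterance = ["the", "cat", "sat", "on", "the", "zyx"]
rate, missing = oov_rate(utterance, lexicon)
print(rate, missing)
```

This measures only how often OOV words occur; it does not address the four challenges listed above, which concern detecting and recovering such words inside the recognizer.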


Acoustic Modeling


Sampling: measuring the amplitude of the signal at time t

Microphone (“wideband”): 16,000 Hz (samples/sec)

Telephone: 8,000 Hz (samples/sec)

Why?

– Need at least 2 samples per cycle

– Max measurable frequency is half the sampling rate

– Human speech < 10,000 Hz, so need at most 20 kHz

– Telephone speech is filtered at 4 kHz, so 8 kHz is enough
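
The reasoning above is the Nyquist criterion. The sketch below restates it as three small helpers; the function names are assumptions made for this illustration. The last helper shows what happens when the criterion is violated: a tone above half the sampling rate folds back (aliases) to a lower apparent frequency.

```python
def nyquist_frequency(sample_rate_hz):
    """Highest frequency that can be represented at a given sampling rate."""
    return sample_rate_hz / 2.0

def min_sample_rate(max_signal_hz):
    """Lowest sampling rate that captures frequencies up to max_signal_hz."""
    return 2.0 * max_signal_hz

def aliased_frequency(tone_hz, sample_rate_hz):
    """Apparent frequency of a pure tone after sampling, folded into [0, fs/2]."""
    folded = tone_hz % sample_rate_hz
    return min(folded, sample_rate_hz - folded)

print(nyquist_frequency(8000))        # telephone: content up to 4 kHz
print(min_sample_rate(10000))         # speech below 10 kHz needs at most 20 kHz
print(aliased_frequency(5000, 8000))  # a 5 kHz tone sampled at 8 kHz aliases
```

This is why telephone speech is band-limited before sampling: content above 4 kHz would otherwise alias into the band that carries the speech.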


Why Frequency Domain

The frequency content of a sound is one of its most important physical properties.

It can be easily observed by converting the signal from the time domain to the frequency domain using the FFT.

Cepstral coefficients are typically used in speech recognition to characterize spectral envelopes, capturing primarily the formants of speech.
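
As a small illustration of the time-to-frequency conversion, the sketch below computes a naive discrete Fourier transform (the quantity that the FFT computes efficiently) and reads off the strongest frequency of a synthetic tone. The function names and the 1 kHz test tone are assumptions for this example; a real front end would use an FFT library on windowed frames.

```python
import cmath
import math

def dft_magnitudes(samples):
    """Magnitudes of DFT bins 0..N/2 of a real signal (naive O(N^2) version)."""
    n = len(samples)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                  for t, x in enumerate(samples))
        mags.append(abs(acc))
    return mags

def dominant_frequency(samples, sample_rate_hz):
    """Frequency in Hz of the strongest non-DC bin."""
    mags = dft_magnitudes(samples)
    k = max(range(1, len(mags)), key=mags.__getitem__)
    return k * sample_rate_hz / len(samples)

# Synthetic 1 kHz tone sampled at 8 kHz for 80 samples (bin spacing 100 Hz).
sample_rate = 8000
tone = [math.sin(2 * math.pi * 1000 * t / sample_rate) for t in range(80)]
print(dominant_frequency(tone, sample_rate))
```

The periodicity is hard to quantify from the raw samples, but it appears as a single clear peak in the frequency domain.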


Mobile Recorded Speech


Mel-Frequency Cepstral Coefficient (MFCC)

Most widely used spectral representation in ASR


Why is MFCC so popular?

Efficient to compute

Incorporates a perceptual Mel frequency scale

Separates the source and filter

IDFT (DCT) decorrelates the features

– Improves the diagonal covariance assumption in HMM modeling
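
The perceptual Mel scale mentioned above can be sketched with one commonly used conversion formula, mel = 2595·log10(1 + f/700). This is an assumed variant (toolkits differ in the exact constants), and the helper names below are hypothetical. The sketch covers only the frequency warping, not the full MFCC pipeline.

```python
import math

def hz_to_mel(f_hz):
    """Convert Hz to mel using one common formula (an assumed variant)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_filter_centers(low_hz, high_hz, n_filters):
    """Center frequencies (Hz) of filters spaced evenly on the mel scale."""
    low, high = hz_to_mel(low_hz), hz_to_mel(high_hz)
    step = (high - low) / (n_filters + 1)
    return [mel_to_hz(low + step * i) for i in range(1, n_filters + 1)]

centers = mel_filter_centers(0.0, 4000.0, 5)
print([round(c) for c in centers])
```

Filters spaced evenly in mel are dense at low frequencies and sparse at high frequencies, mirroring the finer frequency resolution of human hearing at the low end.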


Acoustic Modeling

- Mporas, Iosif, et al. "Comparison of speech features on the speech recognition task." Journal of Computer Science 3.8 (2007): 608-616.


HMM/GMM Models


Approaches to Speaker Adaptation


Language models

Help a speech recognizer estimate how likely a word sequence is, independent of the acoustics.

Play a paramount role in guiding and constraining the search among a large number of alternative word hypotheses in continuous speech recognition.


Continuous speech recognition suffers from difficulties such as variation due to sentence structure (prosody), interaction between adjacent words (cross-word co-articulation), and the absence of clear acoustic markers to delineate word boundaries.

Language models play a vital role in resolving acoustic confusions that arise from co-articulation, assimilation, and homophones during decoding.


Perplexity can be roughly interpreted as the average branching factor the language model faces on the test data.

Lower perplexity correlates with better recognition performance, because the speech recognizer has fewer branches to consider during decoding.
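
The branching-factor reading can be checked numerically. Perplexity is the exponential of the average negative log probability that the model assigns to each test word. The sketch below assumes the per-word probabilities are already available; the function name `perplexity` is chosen for this illustration.

```python
import math

def perplexity(word_probs):
    """Perplexity from the model probability assigned to each test word."""
    log_sum = sum(math.log(p) for p in word_probs)
    return math.exp(-log_sum / len(word_probs))

# A model that always spreads its mass uniformly over 10 words
# behaves like a 10-way branch at every position.
print(perplexity([0.1] * 5))
```

A model that assigns every word probability 1/10 has a perplexity of about 10, which is the sense in which perplexity measures the average number of equally likely choices per word.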


N-Gram Language Models

The intuition of the N-gram model is that instead of computing the probability of a word given its entire history, we can approximate the history by just the last few words
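
For N = 2 the history is approximated by the single preceding word, and the probability can be estimated from counts as count(prev, word) / count(prev). The following is a minimal maximum-likelihood sketch on a toy corpus; the function names, the sentence boundary markers `<s>` and `</s>`, and the corpus itself are assumptions for illustration.

```python
from collections import Counter

def train_bigram(sentences):
    """Collect history counts and bigram counts from whitespace-tokenized sentences."""
    histories, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        histories.update(tokens[:-1])            # every token that has a successor
        bigrams.update(zip(tokens, tokens[1:]))  # adjacent word pairs
    return histories, bigrams

def bigram_prob(histories, bigrams, prev, word):
    """Maximum-likelihood estimate of P(word | prev); 0.0 for an unseen history."""
    if histories[prev] == 0:
        return 0.0
    return bigrams[(prev, word)] / histories[prev]

corpus = ["i like tea", "i like coffee", "you like tea"]
histories, bigrams = train_bigram(corpus)
print(bigram_prob(histories, bigrams, "like", "tea"))
```

Any bigram not seen in training gets probability zero under this estimate, which is the problem that smoothing techniques address.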


Smoothing Techniques

Add-1 smoothing or Good-Turing: OK for text categorization

For language modeling, the most commonly used method: Extended Interpolated Kneser-Ney

For very large N-grams like the Web: backoff
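
Of the techniques listed, add-1 (Laplace) smoothing is the simplest to show: one is added to every bigram count and the vocabulary size to the denominator, so unseen bigrams receive a small non-zero probability. The sketch below is illustrative only; the function name and toy text are assumptions, and Kneser-Ney smoothing is considerably more involved.

```python
from collections import Counter

def add_one_bigram_prob(tokens, prev, word):
    """Add-1 smoothed estimate of P(word | prev) from a single token sequence."""
    vocab = set(tokens)
    histories = Counter(tokens[:-1])         # tokens that have a successor
    pairs = Counter(zip(tokens, tokens[1:]))
    return (pairs[(prev, word)] + 1) / (histories[prev] + len(vocab))

text = "the cat sat on the mat".split()
print(add_one_bigram_prob(text, "the", "cat"))  # seen bigram
print(add_one_bigram_prob(text, "the", "sat"))  # unseen bigram, still non-zero
```

The smoothed probabilities for a fixed history still sum to one over the vocabulary, but a noticeable share of the mass moves to unseen events, which is one reason this method is considered too crude for language modeling.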


Domain Sensitivity

Language models are extremely sensitive to changes in the style, topic or genre of the text on which they are trained.

A language model trained on Dow-Jones newswire text will see its perplexity doubled when applied to the very similar Associated Press newswire text from the same time period

Rosenfeld, Ronald. "Two decades of statistical language modeling: Where do we go from here?." Proceedings of the IEEE 88.8 (2000): 1270-1278.


Given a background model P_B(w|h) and a topic-based model P_T(w|h), it is possible to obtain a final model P_I(w|h), to be used in the second decoding pass, as follows:
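
The combination formula itself did not survive in this transcript. A common way to combine two such models is linear interpolation, P_I(w|h) = λ·P_T(w|h) + (1 − λ)·P_B(w|h), and the sketch below assumes that form. The weight `lam`, the function name, and the probability values are hypothetical and may differ from what the original slide showed.

```python
def interpolate(p_background, p_topic, lam):
    """Linear interpolation of two model probabilities (assumed combination rule).

    lam is the weight on the topic model and must lie in [0, 1].
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must be between 0 and 1")
    return lam * p_topic + (1.0 - lam) * p_background

# Hypothetical probabilities of the same word under each model.
print(interpolate(p_background=0.01, p_topic=0.05, lam=0.5))
```

In practice the weight would be tuned on held-out data from the target domain, for example by minimizing perplexity.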


Complexity

ASR systems often have complexity that is linear in the number of tokens and polynomial in the number of types (e.g., decoding using a trigram language model with a size-N vocabulary has, in the worst case, a complexity of at least O(N^3)).

-- Lin, Hui, and Jeff Bilmes. "Optimal selection of limited vocabulary speech corpora." Twelfth Annual Conference of the International Speech Communication Association. 2011.


Notable speech recognition software engines

Ref- https://en.wikipedia.org/wiki/List_of_speech_recognition_software

System Name   Open Source   Acoustic Modeling
CMU           Yes           GMM/HMM
HTK           No            GMM/HMM
RWTH          Yes           LSTM
Kaldi         Yes           Deep Neural Network
Julius        Yes           GMM/HMM


Challenges in

Indian Language Speech Processing


Dravidian Languages

Major: Telugu, Tamil, Kannada, Malayalam

Very rich in morphology and complex Sandhi rules

Relatively free word order languages


Pronunciation Lexicon

Most Indian languages are phonetic in nature, i.e., there exists a one-to-one correspondence between orthography and pronunciation in these languages.

For Telugu, a simple rule-based G2P is enough.

Tamil script does not distinguish between voiced and voiceless stops and lacks symbols for voiced and aspirated stops.


Morph Based Language Models

In two Finnish recognition tasks, relative error rate reductions between 12% and 31% are obtained.

Word fragments obtained using grammatical rules do not outperform the fragments discovered from text.

Hirsimäki, Teemu, et al. "Unlimited vocabulary speech recognition with morph language models applied to Finnish." Computer Speech & Language 20.4 (2006): 515-541.


Phoneme list


Tamil Phonetic Mapping


Grapheme-to-Phoneme Mapping Software

Sequitur G2P

https://www-i6.informatik.rwth-aachen.de/web/Software/g2p.html

Sequence-to-Sequence G2P toolkit

https://github.com/cmusphinx/g2p-seq2seq

Phonetisaurus G2P

https://github.com/AdolfVonKleist/Phonetisaurus


Morphology

Application of extensive Sandhi changes sometimes results in telescoping of several words into long strings.

English Sentence: ‘Do you say that there is no hot water?’

Telugu Sentence: vEdinILLu levu aNtavu A?

After Sandhi: vENNILLEvaNtAvA (one word)

– Reference: P. Bhaskara Rao, “Telugu” , Concise Encyclopedia of Languages of the world, Elsevier, pp 1055-1060.


Type-Token Analysis

G. Bharadwaja Kumar et al., “Statistical Analyses of Telugu Text Corpora”, IJDL, Vol. 36, No. 2 (2007)
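
A type-token analysis tracks how the number of distinct word forms (types) grows as more running words (tokens) are read. The sketch below shows one way such a curve could be computed; the function name, the `step` parameter, and the toy input are assumptions, and the code does not reproduce the method or results of the cited study.

```python
def type_token_curve(tokens, step):
    """Return (tokens read, distinct types seen) every `step` tokens and at the end."""
    seen = set()
    curve = []
    for i, tok in enumerate(tokens, start=1):
        seen.add(tok)
        if i % step == 0 or i == len(tokens):
            curve.append((i, len(seen)))
    return curve

sample = "a b a c b d".split()
print(type_token_curve(sample, step=2))
```

For a morphologically rich language the curve is expected to keep rising more steeply than for English, since inflection and Sandhi produce many distinct surface forms.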


BNC Corpus for English (100 Million Word Corpus)


UoH Corpus for Telugu (40 Million Word Corpus)


One of the main problems with LVCSR systems is that the words spoken may not always exist within the system’s vocabulary.

These are called out-of-vocabulary (OOV) words.

This is a predominant problem for morphologically rich and complex languages such as the Dravidian languages.


Thank You

Presentation is available at

http://bharadwajakumar.wordpress.com