Unit Selection Synthesis in Indian Languages

(Workshop Talk at IIT Kharagpur, Mar 4-5, 2009)

Kishore Prahallad
Email: kishore@iiit.ac.in

International Institute of Information Technology (IIIT) Hyderabad, India
&
Language Technologies Institute, Carnegie Mellon University

Building an Unrestricted Voice

• Build language-specific knowledge
  – Define phone set
  – Define stress and syllabification rules
  – Define letter-to-sound rules
• Optimal text collection
• Recording of speech
• Speech labeling
• Unit clustering
• This session will be a live demo of running Festvox scripts to build a Hindi voice

Creation of Unit Speech Database

• Text selection:
  – A large corpus might be costly to record and hand-label
• Optimal text selection approaches
  – Use a large text corpus
  – Extract a set of sentences that has the best unit (phone/diphone/triphone/syllable) coverage
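
One common way to approximate "best unit coverage" is a greedy set-cover pass over the text corpus. The sketch below is illustrative only: it assumes each sentence is already converted to a phone sequence, uses diphones as the unit, and the names `diphones` and `greedy_select` are hypothetical rather than part of Festvox.

```python
from typing import Dict, List, Set, Tuple


def diphones(phones: List[str]) -> Set[Tuple[str, str]]:
    """Return the set of adjacent phone pairs (diphones) in one sentence."""
    return set(zip(phones, phones[1:]))


def greedy_select(corpus: Dict[str, List[str]], max_sentences: int) -> List[str]:
    """Greedily pick sentences that add the most not-yet-covered diphones."""
    units = {sid: diphones(ph) for sid, ph in corpus.items()}
    covered: Set[Tuple[str, str]] = set()
    selected: List[str] = []
    while len(selected) < max_sentences:
        # Score each remaining sentence by the number of new units it adds.
        best = max(
            (sid for sid in units if sid not in selected),
            key=lambda sid: len(units[sid] - covered),
            default=None,
        )
        if best is None or not units[best] - covered:
            break  # nothing left that improves coverage
        selected.append(best)
        covered |= units[best]
    return selected


# Toy corpus: sentence id -> phone sequence (made-up data).
corpus = {
    "s1": ["k", "a", "m", "a", "l"],
    "s2": ["k", "a", "m"],
    "s3": ["l", "a", "k"],
}
print(greedy_select(corpus, max_sentences=2))
```

Swapping `diphones` for a function that returns phones, triphones, or syllables changes the unit without touching the selection loop.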

Recording of speech data

• Ideal conditions
  – Anechoic chamber
  – Studio recording
  – Professional speaker
• Practical conditions
  – Lab environments
  – Good voices
  – Need repetition of steps to create a good unit selection voice

Labeling of Speech Data

• Automatic labeling
  – Use dynamic time warping (DTW) techniques, if duration models are available
  – Use HMMs / neural nets for automatic segmentation of the data
• Semi-automatic labeling
  – Machine labeling + hand correction
  – Tools such as Emulabel (www.festvox.org/emu) and Wavesurfer are useful
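
As a rough illustration of the dynamic-warping idea, the sketch below aligns a reference utterance with known label boundaries to a new recording and carries a boundary across. It makes several assumptions that are not in the slides: one scalar feature per frame instead of cepstral vectors, a reference whose boundaries are already known (for example from a synthesized prompt), and the hypothetical helper names `dtw_path` and `map_boundary`.

```python
from typing import List, Sequence, Tuple


def dtw_path(ref: Sequence[float], rec: Sequence[float]) -> List[Tuple[int, int]]:
    """Align two 1-D feature sequences with dynamic time warping; return the frame path."""
    n, m = len(ref), len(rec)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(ref[i - 1] - rec[j - 1])
            cost[i][j] = d + min(cost[i - 1][j - 1], cost[i - 1][j], cost[i][j - 1])
    # Backtrack from the last frame pair to recover the alignment.
    path: List[Tuple[int, int]] = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = [
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        ]
        _, i, j = min(steps)
    path.reverse()
    return path


def map_boundary(path: List[Tuple[int, int]], ref_frame: int) -> int:
    """Map a label boundary at ref_frame to the first aligned frame in the recording."""
    return min(j for i, j in path if i == ref_frame)


# Made-up data: the second phone starts at frame 2 in the reference.
ref = [0, 0, 1, 1]
rec = [0, 0, 0, 1, 1, 1]
path = dtw_path(ref, rec)
print(map_boundary(path, 2))
```

The mapped boundaries are only a first pass; the hand-correction step listed above still applies.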

Building Databases (Training Phase)

• Get the phonemic features for each unit, along with previous and next unit information
  – Previous, next unit
  – Consonant/vowel
  – Vowel length
  – Vowel height
  – Vowel frontness
  – Consonant voicing
  – Consonant place of articulation (POA) and manner of articulation (MOA)
  – Position in the syllable and word
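
A feature record of this kind might be assembled as in the sketch below. Everything here is hypothetical: the four-phone table, the feature names, and the `unit_features` helper. It also treats the phone list as a single word and leaves out position in the syllable for brevity.

```python
from typing import Dict, List

# Hypothetical, tiny phone table; a real phone set defines these for every phone.
PHONE_FEATURES: Dict[str, Dict[str, str]] = {
    "k": {"type": "consonant", "voicing": "unvoiced", "poa": "velar", "moa": "stop"},
    "m": {"type": "consonant", "voicing": "voiced", "poa": "labial", "moa": "nasal"},
    "a": {"type": "vowel", "length": "short", "height": "low", "frontness": "central"},
    "ii": {"type": "vowel", "length": "long", "height": "high", "frontness": "front"},
}
PAUSE = "pau"  # stands in for a missing neighbour at the word edge


def unit_features(phones: List[str], index: int) -> Dict[str, str]:
    """Build the feature record for phones[index], including its neighbours."""
    feats = {"name": phones[index]}
    feats.update(PHONE_FEATURES[phones[index]])
    prev_p = phones[index - 1] if index > 0 else PAUSE
    next_p = phones[index + 1] if index + 1 < len(phones) else PAUSE
    for prefix, neighbour in (("prev", prev_p), ("next", next_p)):
        feats[f"{prefix}.name"] = neighbour
        # A pause has no entry in the table, so it contributes only its name.
        for key, value in PHONE_FEATURES.get(neighbour, {}).items():
            feats[f"{prefix}.{key}"] = value
    if index == 0:
        feats["position_in_word"] = "initial"
    elif index == len(phones) - 1:
        feats["position_in_word"] = "final"
    else:
        feats["position_in_word"] = "medial"
    return feats


print(unit_features(["k", "a", "m"], 1))
```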

Clustering the Units (Training Phase)

• For each unit type, create a decision tree
• Select a feature as the root of the tree, such that it minimizes the acoustic distance among the units within each of its child nodes
  – How to measure acoustic distance between two sound units of varying length?
  – Use simple linear alignment, or dynamic programming, to compute the acoustic distance measure (ADM)
• Repeat the process with each child node until you have 10-30 units left in that cluster
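
A minimal sketch of one split decision follows, under stated assumptions: frames are single numbers rather than cepstral vectors, the distance uses simple linear alignment, cluster impurity is the mean pairwise distance, and the names `linear_distance`, `impurity`, and `best_question` are illustrative, not the Festival implementation.

```python
from statistics import mean
from typing import Dict, List, Optional, Sequence, Tuple

Unit = Tuple[Dict[str, str], Sequence[float]]  # (context features, acoustic frames)


def linear_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Distance between units of different length using simple linear alignment."""
    n = max(len(a), len(b))
    total = 0.0
    for t in range(n):
        # Map step t proportionally onto each unit's own frame index.
        total += abs(a[t * len(a) // n] - b[t * len(b) // n])
    return total / n


def impurity(units: List[Unit]) -> float:
    """Mean pairwise acoustic distance inside a cluster (0 for fewer than two units)."""
    pairs = [(u[1], v[1]) for i, u in enumerate(units) for v in units[i + 1:]]
    return mean(linear_distance(a, b) for a, b in pairs) if pairs else 0.0


def best_question(
    units: List[Unit], questions: List[Tuple[str, str]]
) -> Optional[Tuple[float, Tuple[str, str]]]:
    """Pick the (feature, value) question whose yes/no split has the lowest weighted impurity."""
    best = None
    for feat, value in questions:
        yes = [u for u in units if u[0].get(feat) == value]
        no = [u for u in units if u[0].get(feat) != value]
        if not yes or not no:
            continue  # this question does not split the cluster
        score = (len(yes) * impurity(yes) + len(no) * impurity(no)) / len(units)
        if best is None or score < best[0]:
            best = (score, (feat, value))
    return best


# Made-up units of one phone type: acoustics depend on the previous phone only.
units = [
    ({"prev.type": "consonant", "next.type": "vowel"}, [1.0, 1.0, 1.0]),
    ({"prev.type": "consonant", "next.type": "consonant"}, [1.0, 1.0]),
    ({"prev.type": "vowel", "next.type": "vowel"}, [5.0, 5.0, 5.0, 5.0]),
    ({"prev.type": "vowel", "next.type": "consonant"}, [5.0, 5.0]),
]
questions = [("prev.type", "consonant"), ("next.type", "vowel")]
print(best_question(units, questions))
```

Applying `best_question` recursively to the yes and no subsets, and stopping once a subset is down to the 10-30 units mentioned above, yields the full tree.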

Indexing / Clustering using Decision Trees

[Figure: decision tree; node label: Linguistic / Contextual Questions]

Synthesis (Testing Phase)

• Given the sequence of phones
• For each phone, create a set of phonemic features (the feature set is the same as in the training phase)
• Traverse the tree and arrive at a leaf node
• The leaf node contains a set of target units
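
The lookup step can be sketched as a walk down a binary tree of yes/no questions. The `Node` layout, the feature names, and the unit identifiers below are all hypothetical; the actual tree format used by Festival differs.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    """A decision-tree node: internal nodes ask a question, leaves hold unit ids."""
    feature: Optional[str] = None
    value: Optional[str] = None
    yes: Optional["Node"] = None
    no: Optional["Node"] = None
    units: List[str] = field(default_factory=list)


def find_candidates(tree: Node, feats: Dict[str, str]) -> List[str]:
    """Follow the questions from the root until a leaf is reached."""
    node = tree
    while node.feature is not None:
        node = node.yes if feats.get(node.feature) == node.value else node.no
    return node.units


# Hypothetical tree for the phone "a" with made-up unit ids.
tree_a = Node(
    feature="prev.type", value="consonant",
    yes=Node(units=["a_012", "a_047"]),
    no=Node(
        feature="next.moa", value="nasal",
        yes=Node(units=["a_101"]),
        no=Node(units=["a_033", "a_090"]),
    ),
)
print(find_candidates(tree_a, {"prev.type": "vowel", "next.moa": "nasal"}))
```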

Synthesis (Testing Phase)

• Given dh, ax and c, ae, t …., a sequence of phones to be synthesized

• Using decision trees: For the given sequence arrive at T_1, T_2 and T_3, where T_i is the set of target units for phone i.

• Use Viterbi search to choose the sequence of units that minimizes the concatenation cost
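
The search over candidate sets can be sketched as a standard Viterbi pass. This version scores only the join (concatenation) cost between neighbouring units, as in the bullet above; a target cost could be added per candidate. The unit representation (start F0, end F0) and the `viterbi_select` name are assumptions made for the example.

```python
from typing import Callable, List, Sequence, TypeVar

U = TypeVar("U")


def viterbi_select(
    candidates: Sequence[Sequence[U]], join_cost: Callable[[U, U], float]
) -> List[U]:
    """Choose one unit per position so that the summed join cost is minimal."""
    # best[t][k]: cheapest cost of any path ending in candidate k at position t.
    best = [[0.0] * len(candidates[0])]
    back: List[List[int]] = []
    for t in range(1, len(candidates)):
        row, ptr = [], []
        for cur in candidates[t]:
            costs = [
                best[-1][p] + join_cost(prev, cur)
                for p, prev in enumerate(candidates[t - 1])
            ]
            p_best = min(range(len(costs)), key=costs.__getitem__)
            row.append(costs[p_best])
            ptr.append(p_best)
        best.append(row)
        back.append(ptr)
    # Trace back from the cheapest final candidate.
    k = min(range(len(best[-1])), key=best[-1].__getitem__)
    path = [k]
    for ptr in reversed(back):
        k = ptr[k]
        path.append(k)
    path.reverse()
    return [candidates[t][k] for t, k in enumerate(path)]


def f0_join_cost(prev, cur):
    """Toy join cost: F0 mismatch between the end of prev and the start of cur."""
    return abs(prev[1] - cur[0])


# Made-up candidate sets T_1, T_2, T_3; each unit is (start F0, end F0).
T1 = [(100, 120), (100, 150)]
T2 = [(150, 140), (118, 130)]
T3 = [(160, 150), (131, 120)]
print(viterbi_select([T1, T2, T3], f0_join_cost))
```

Note that the locally cheapest first join (cost 0 from the second unit of T1) is not on the best path; the dynamic programming pass finds the sequence with the lowest total.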

Target + Join Cost

Source: CSTR, UK

Smoothing or Joining

• Where to join the two units
  – Optimal coupling
  – Flexible joining point
  – Select the joining point which has minimal distance
  – Select the last N frames of unit U(i-1) and the first K frames of unit U(i), and perform N*K distance measures
  – Find the pair of frames which has the least distance
• What is the measure of joining?
  – F0, power
  – Cepstral features
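
The N*K search for a join point can be sketched as a small exhaustive loop. The frame layout (a short list of numbers standing in for F0, power, and cepstral coefficients), the Euclidean distance, and the `optimal_coupling` name are assumptions for illustration.

```python
from typing import List, Sequence, Tuple

Frame = Sequence[float]  # e.g. [f0, power, cepstral coefficients...]


def frame_distance(a: Frame, b: Frame) -> float:
    """Euclidean distance between two feature frames."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def optimal_coupling(
    prev_unit: List[Frame], next_unit: List[Frame], n: int, k: int
) -> Tuple[int, int, float]:
    """Return (frame index in prev_unit, frame index in next_unit, distance) of the best join."""
    if not prev_unit or not next_unit or n < 1 or k < 1:
        raise ValueError("both units need at least one frame, and n and k must be >= 1")
    tail_start = max(0, len(prev_unit) - n)
    best = None
    # Compare each of the last n frames of prev_unit with each of the first k of next_unit.
    for i in range(tail_start, len(prev_unit)):
        for j in range(min(k, len(next_unit))):
            d = frame_distance(prev_unit[i], next_unit[j])
            if best is None or d < best[2]:
                best = (i, j, d)
    return best


# Made-up frames: [f0, power].
prev_unit = [[100, 1], [110, 1], [120, 1], [130, 1]]
next_unit = [[112, 1], [125, 1], [140, 1]]
print(optimal_coupling(prev_unit, next_unit, n=2, k=2))
```

The returned indices say where to cut: the previous unit is kept up to frame `i`, and the next unit starts from frame `j`.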


Building an Indian language Voice

$FESTVOXDIR/src/festvox/src/clunits/setup_clunits iiit hin pra

Incorporate the language knowledge

1. festvox/*.phoneset.scm

2. festvox/*.durdata.scm

3. festvox/*.lexicon.scm

Scripts of Indian Languages

The basic units of the writing system are characters

Characters are close to syllables: CV, CVC, CCV, VC, C, V units (C is consonant, V is vowel)

क ख ग घ ङ

/ka/ /kha/ /ga/ /gha/ /ng-a/

C V

Universal phone set – About 35 consonants, 18 vowels

Almost one-to-one correspondence between what you write and what you speak
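
Because of this near one-to-one correspondence, a first-pass letter-to-sound step can be little more than a table lookup. The sketch below covers only the five consonants shown above plus a few vowels; the phone names follow the transcription on this slide, and the `to_phones` helper is hypothetical. It does not attempt schwa deletion or text normalization.

```python
from typing import List

# Phone names follow the slide's transcription; the tables are deliberately tiny.
CONSONANTS = {"क": "k", "ख": "kh", "ग": "g", "घ": "gh", "ङ": "ng"}
INDEPENDENT_VOWELS = {
    "\u0905": "a",   # DEVANAGARI LETTER A
    "\u0906": "aa",  # DEVANAGARI LETTER AA
    "\u0907": "i",   # DEVANAGARI LETTER I
    "\u0908": "ii",  # DEVANAGARI LETTER II
}
VOWEL_SIGNS = {
    "\u093e": "aa",  # DEVANAGARI VOWEL SIGN AA
    "\u093f": "i",   # DEVANAGARI VOWEL SIGN I
    "\u0940": "ii",  # DEVANAGARI VOWEL SIGN II
}
VIRAMA = "\u094d"  # suppresses the inherent vowel of the preceding consonant
INHERENT_VOWEL = "a"


def to_phones(text: str) -> List[str]:
    """Convert a Devanagari string (restricted to the tables above) to phones."""
    phones: List[str] = []
    i = 0
    while i < len(text):
        ch = text[i]
        nxt = text[i + 1] if i + 1 < len(text) else ""
        if ch in CONSONANTS:
            phones.append(CONSONANTS[ch])
            if nxt in VOWEL_SIGNS:
                phones.append(VOWEL_SIGNS[nxt])  # explicit vowel replaces the inherent one
                i += 1
            elif nxt == VIRAMA:
                i += 1  # bare consonant, no vowel
            else:
                phones.append(INHERENT_VOWEL)
        elif ch in INDEPENDENT_VOWELS:
            phones.append(INDEPENDENT_VOWELS[ch])
        else:
            raise ValueError(f"character not in this sketch's tables: {ch!r}")
        i += 1
    return phones


print(to_phones("कख"))
```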

Issues: Relevant to Indic Scripts

Input text: ISCII, UNICODE, and other font encodings

Occurrence of English words in Indic scripts

- phonetic coverage, LTS rules etc.

Text normalization: non-standard words

Phonetic nature?

- schwa deletion in Hindi and Bengali

Syllabification rules

Stress information

Syllable as unit size for Indian language TTS

Various suggestions: Phones, Diphones, Half phones, Syllable like units

What we have done: Built different synthesizers for different unit sizes and compared the alternatives

Found the syllable to be a better unit for synthesis in Indian languages

Coverage of syllables for unrestricted TTS is a major issue of concern

Visit the demo at http://speech.iiit.ac.in

References

• http://festvox.org
• 11-752 CMU course slides
  – http://festvox.org/festtut/
• 11-752 CMU course lecture notes
  – http://festvox.org/festtut/notes/festtut_toc.html
• Building Synthetic Voices
  – http://www.festvox.org/bsv/
• The Festival Speech Synthesis System
  – http://www.festvox.org/docs/manual-1.4.3/festival_toc.html
• http://www.cstr.ed.ac.uk/emasters/summer_school_2005/tutorial3/session2.pdf
• S. P. Kishore, Alan W Black, Rohit Kumar and Rajeev Sangal, "Experiments with Unit Selection Speech Databases for Indian Languages", in Proceedings of National Seminar on Language Technology Tools: Implementations of Telugu, Hyderabad, India, 2003.
• S. P. Kishore and Alan W Black, "Unit Size in Unit Selection Speech Synthesis", in Proceedings of Eurospeech, Geneva, Switzerland, 2003.
• E. Veera Raghavendra, Srinivas Desai, B Yegnanarayana, Alan W Black, Kishore Prahallad, "Global Syllable Set for Building Speech Synthesis in Indian Languages", in Proceedings of IEEE workshop on Spoken Language Technologies, Goa, India, December 2008.
• E. Veera Raghavendra, B Yegnanarayana, Kishore Prahallad, "Speech Synthesis Using Approximate Matching of Syllables", in Proceedings of IEEE workshop on Spoken Language Technologies, Goa, India, December 2008.