Design of Tree-based Context Clustering for an HMM-based Thai Speech Synthesis System

Outline

• Objectives
• Study of Thai tones
  - Characteristics of Thai tones
  - Categorizations of Thai tones
• Tree-based context clustering
  - Construction of contextual factors
  - Design of decision-tree structures
  - Design of context clustering styles
• Experiments
  - Evaluation of overall tone correctness
  - Evaluation of tone correctness for each tone type
  - Evaluation of syllable duration distortion
• Conclusions

Objectives

To implement an HMM-based speech synthesis system for the Thai language that achieves the highest possible tone correctness.

Study of Thai tones

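Characteristics of Thai tones: syllable structure [Nakasakul2002]

Thai is a tonal language. A Thai syllable has the structure /C_i(C_i)V(V)(C_f)T/, where C_i is an initial consonant, V a vowel element, C_f a final consonant, and T the tone; parenthesized elements are optional.

• รัก r-a-k^-3 (love): C_iVC_fT
• เรื่อย r-va-j^-2 (always): C_iVVC_fT
• เคร่ง khr-e-ng^-2 (strict): C_iC_iVC_fT
• เครียด khr-ia-t^-2 (stress): C_iC_iVVC_fT
• และ l-x-3 (and): C_iVT
• เพลีย phl-iia-0 (exhausted): C_iC_iVVT
• เสีย s-iia-4 (spoil): C_iVVT
• ปริ pr-i-1 (break): C_iC_iVT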
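The romanized transcriptions above appear to follow an initial-vowel-final^-tone layout. The following is a minimal sketch that splits such a transcription into its parts, assuming that fields are separated by "-", that the final consonant (if any) carries a trailing "^", and that the last field is the tone number (0 middle, 1 low, 2 falling, 3 high, 4 rising). The names `parse_syllable` and `Syllable` are hypothetical.

```python
from typing import NamedTuple, Optional

# Tone numbers as used on these slides.
TONE_NAMES = {0: "middle", 1: "low", 2: "falling", 3: "high", 4: "rising"}


class Syllable(NamedTuple):
    initial: str          # initial consonant or cluster, e.g. "khr"
    vowel: str            # vowel symbol, e.g. "iia"
    final: Optional[str]  # final consonant, or None for an open syllable
    tone: int             # tone number 0-4


def parse_syllable(transcription: str) -> Syllable:
    """Split a transcription such as 'r-a-k^-3' (assumed format)."""
    parts = transcription.split("-")
    if len(parts) == 3:            # open syllable: initial-vowel-tone
        initial, vowel, tone = parts
        final = None
    elif len(parts) == 4:          # closed syllable: initial-vowel-final^-tone
        initial, vowel, final_mark, tone = parts
        if not final_mark.endswith("^"):
            raise ValueError(f"final consonant should end with '^': {transcription}")
        final = final_mark[:-1]
    else:
        raise ValueError(f"unexpected transcription: {transcription}")
    tone_num = int(tone)
    if tone_num not in TONE_NAMES:
        raise ValueError(f"unknown tone number: {tone_num}")
    return Syllable(initial, vowel, final, tone_num)


if __name__ == "__main__":
    for t in ["r-a-k^-3", "khr-ia-t^-2", "phl-iia-0", "l-x-3"]:
        s = parse_syllable(t)
        print(t, s, TONE_NAMES[s.tone])
```

Deciding whether an initial is a cluster (C_iC_i) or whether a vowel is VV would need a phone inventory, which this sketch deliberately does not assume.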


Characteristics of Thai tones: F0 contours of standard Thai tones (normalized duration) [Luksaneeyanawin1992]

[Figure: F0 (Hz) against normalized duration (0%, 50%, 100%) for the five tones: middle (0), low (1), falling (2), high (3), and rising (4).]

Thai tone names: สามัญ Middle (0), เอก Low (1), โท Falling (2), ตรี High (3), จัตวา Rising (4)
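Because the contours above are plotted against normalized duration, syllables of different lengths can be compared on one axis. The following is a minimal pure-Python sketch of one way to do this: it resamples a frame-level F0 sequence onto a fixed number of evenly spaced points from 0% to 100% by linear interpolation. The function name and the use of linear interpolation are assumptions, not necessarily the procedure behind [Luksaneeyanawin1992].

```python
from typing import List, Sequence


def normalize_duration(f0: Sequence[float], num_points: int = 11) -> List[float]:
    """Resample an F0 contour (Hz) onto num_points evenly spaced positions
    between 0% and 100% of its duration, using linear interpolation."""
    if len(f0) < 2 or num_points < 2:
        raise ValueError("need at least two input frames and two output points")
    last = len(f0) - 1
    out = []
    for k in range(num_points):
        pos = k * last / (num_points - 1)   # fractional frame index
        i = min(int(pos), last - 1)          # index of the left neighbour
        frac = pos - i                       # weight of the right neighbour
        out.append(f0[i] * (1.0 - frac) + f0[i + 1] * frac)
    return out


if __name__ == "__main__":
    short = [180.0, 170.0, 150.0, 130.0, 120.0]          # 5 frames
    long = [180.0 - 7.5 * k for k in range(9)]           # 9 frames
    print(normalize_duration(short, 5))
    print(normalize_duration(long, 5))
```

Both contours come out with the same number of points, so they can be averaged or overlaid per tone.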


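Categorizations of Thai tones

• Abramson divided the tones into two groups: a static group (tones 0, 1, 3) and a dynamic group (tones 2, 4).
• According to the final trend of their contours, the tones can also be divided into an upward-trend group (tones 3, 4) and a downward-trend group (tones 0, 1, 2).

[Figure: F0 (Hz) against normalized duration (0%, 50%, 100%) for the middle (0), low (1), falling (2), high (3), and rising (4) tones.]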
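The two categorizations can be written as lookup tables. This is a minimal sketch that returns the constancy group and the trend group of a tone number. The group memberships are taken from these slides; the function name `tone_groups` is hypothetical.

```python
from typing import Tuple

# Group memberships as given on the slides.
STATIC_TONES = frozenset({0, 1, 3})
DYNAMIC_TONES = frozenset({2, 4})
UPWARD_TONES = frozenset({3, 4})
DOWNWARD_TONES = frozenset({0, 1, 2})


def tone_groups(tone: int) -> Tuple[str, str]:
    """Return (constancy group, trend group) for a tone number 0-4."""
    if tone not in STATIC_TONES | DYNAMIC_TONES:
        raise ValueError(f"unknown tone: {tone}")
    constancy = "static" if tone in STATIC_TONES else "dynamic"
    trend = "upward" if tone in UPWARD_TONES else "downward"
    return constancy, trend


if __name__ == "__main__":
    for t in range(5):
        print(t, tone_groups(t))
```

Each grouping partitions the five tones, so every tone falls into exactly one group of each kind.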

HMM-based speech synthesizer

• Phoneme-based speech-unit modeling

• Flexible models with efficient adaptation, e.g.:
  - speaker adaptation
  - speaking-style conversion

1994: K. Tokuda et al. proposed an HMM-based speech synthesizer for Japanese.

[Figure: HMM-based speech synthesis system. Training stage: speech signals from the speech database → excitation parameter extraction and spectral parameter extraction → training of HMMs → context-dependent HMMs. Synthesis stage: text analysis → parameter generation from the HMMs (excitation and spectral parameters) → excitation generation → synthesis filter → synthetic speech.]

Phrase level

• current word position in current phrase

• the number of syllables in {preceding, current, succeeding} phrase

Utterance level

• current phrase position in current sentence

• the number of syllables in current sentence

• the number of words in current sentence

Phoneme level

• {preceding, current, succeeding} phonetic type

• {preceding, current, succeeding} part of syllable structure

Syllable level

• {preceding, current, succeeding} tone type

• the number of phones in {preceding, current, succeeding} syllable

• current phone position in current syllable

Word level

• current syllable position in current word

• part of speech

• the number of syllables in {preceding, current, succeeding} word
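To make the factor list concrete, the following is a minimal sketch that stores a subset of the factors above for one phone and serializes them into a single context label string. The selection of fields, the field names, and the label layout are all hypothetical; the system's actual full-context label format is not given here.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class PhoneContext:
    # Phoneme level: (preceding, current, succeeding) phones
    phones: Tuple[str, str, str]
    # Syllable level: (preceding, current, succeeding) tone types, 0-4
    tones: Tuple[int, int, int]
    phone_pos_in_syllable: int    # 1-based position of the current phone
    # Word level
    syllable_pos_in_word: int
    part_of_speech: str
    # Phrase and utterance level
    word_pos_in_phrase: int
    phrase_pos_in_sentence: int

    def to_label(self) -> str:
        """Serialize into a hypothetical label layout (not the system's)."""
        p, c, n = self.phones
        tp, tc, tn = self.tones
        return (f"{p}-{c}+{n}"
                f"/T:{tp}_{tc}_{tn}"
                f"/P:{self.phone_pos_in_syllable}"
                f"/W:{self.syllable_pos_in_word}_{self.part_of_speech}"
                f"/PH:{self.word_pos_in_phrase}"
                f"/U:{self.phrase_pos_in_sentence}")


if __name__ == "__main__":
    ctx = PhoneContext(("r", "a", "k"), (0, 3, 2), 2, 1, "V", 1, 1)
    print(ctx.to_label())
```

Every distinct combination of such fields is a distinct context, which is why the number of possible labels grows far beyond what a training corpus can cover.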

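Construction of contextual factors

Context clustering addresses the problem of limited training data: the number of possible combinations of contextual factors is far larger than any training corpus can cover, so models of contexts with similar acoustic characteristics are clustered and share parameters.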
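Context clustering is commonly done by growing a binary decision tree greedily. At each node, the question that most increases the training-data likelihood is chosen. The following is a minimal sketch of that node-splitting step. It assumes each sample is a 1-D value modeled by a single maximum-likelihood Gaussian. Real systems work with multivariate state-output distributions and usually stop splitting with a criterion such as minimum description length (MDL). The contexts, question names, and function names are illustrative assumptions.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Context = Dict[str, int]              # e.g. {"tone": 3}
Question = Callable[[Context], bool]


def gaussian_loglik(xs: Sequence[float], var_floor: float = 1e-3) -> float:
    """Log-likelihood of xs under the ML single Gaussian fitted to xs."""
    n = len(xs)
    if n == 0:
        return 0.0
    mean = sum(xs) / n
    var = max(sum((x - mean) ** 2 for x in xs) / n, var_floor)
    return -0.5 * n * (math.log(2 * math.pi * var) + 1.0)


def best_question(data: List[Tuple[Context, float]],
                  questions: Dict[str, Question]) -> Tuple[str, float]:
    """Return (question name, log-likelihood gain) of the best yes/no split."""
    parent = gaussian_loglik([x for _, x in data])
    best_name, best_gain = "", float("-inf")
    for name, q in questions.items():
        yes = [x for ctx, x in data if q(ctx)]
        no = [x for ctx, x in data if not q(ctx)]
        if not yes or not no:          # the question does not split this node
            continue
        gain = gaussian_loglik(yes) + gaussian_loglik(no) - parent
        if gain > best_gain:
            best_name, best_gain = name, gain
    return best_name, best_gain


if __name__ == "__main__":
    # Toy log-F0 values in which upward-trend tones (3, 4) end higher.
    samples = [({"tone": 0}, 5.0), ({"tone": 1}, 4.9), ({"tone": 2}, 5.1),
               ({"tone": 3}, 5.6), ({"tone": 4}, 5.7), ({"tone": 0}, 5.05)]
    qs = {
        "C-Tone==3": lambda c: c["tone"] == 3,
        "C-Upward_Trend": lambda c: c["tone"] in (3, 4),
        "C-Static_Tone": lambda c: c["tone"] in (0, 1, 3),
    }
    print(best_question(samples, qs))
```

Applying the same step recursively to the yes and no subsets, until the gain falls below a threshold, yields the clustered tree.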

Problem of misshaped F0 contours

F0 contours of (a) synthesized speech from the clustering style of a single binary tree without tone-type questions and (b) natural speech. [Figure: F0 (Hz) against time (s), 5.0 s to 6.4 s.]

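Design of decision-tree structures

[Figure: tone-separated tree structures. Simple tone-separated: one tree each for Tone 0, Tone 1, Tone 2, Tone 3, and Tone 4. Constancy-based tone-separated: Static Tone (Tone 0, 1, 3) and Dynamic Tone (Tone 2, 4), with nodes for Tone 2 and Tone 4. Trend-based tone-separated: Upward Trend (Tone 3, 4) and Downward Trend (Tone 0, 1, 2), with nodes for Tone 3 and Tone 4.]

Design of 8 context clustering styles (a)-(h)

Styles (a)-(d) use four tree structures (a single binary tree and the three tone-separated structures) without tone-type questions. Styles (e)-(h) use the same structures with tone-type questions added, so (e), (f), (g), and (h) correspond to (a), (b), (c), and (d), respectively.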
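The eight styles amount to four tree structures, each used with or without tone-type questions. The following is a minimal sketch of that bookkeeping: it maps a syllable's tone to the top-level tree in which its models are clustered. The structure names follow the conclusions. Representing each structure only by its top-level tone partition, and leaving any finer split such as Tone 2 vs. Tone 4 to the tree-growing step, is a simplifying assumption. The names `ClusteringStyle` and `tree_for` are hypothetical.

```python
from typing import Dict, FrozenSet, List, NamedTuple

ALL_TONES = frozenset(range(5))

# Top-level tone partition of each tree structure, as drawn on the slides.
TREE_STRUCTURES: Dict[str, Dict[str, FrozenSet[int]]] = {
    "single binary": {"all": ALL_TONES},
    "simple tone-separated": {f"tone{t}": frozenset({t}) for t in range(5)},
    "constancy-based": {"static": frozenset({0, 1, 3}),
                        "dynamic": frozenset({2, 4})},
    "trend-based": {"upward": frozenset({3, 4}),
                    "downward": frozenset({0, 1, 2})},
}


class ClusteringStyle(NamedTuple):
    structure: str        # key of TREE_STRUCTURES
    tone_questions: bool  # True: tone-type questions are in the question set


# Four structures x {without, with} tone-type questions = 8 styles.
ALL_STYLES: List[ClusteringStyle] = [
    ClusteringStyle(s, q) for q in (False, True) for s in TREE_STRUCTURES
]


def tree_for(style: ClusteringStyle, tone: int) -> str:
    """Name of the tree in which models of syllables with this tone are clustered."""
    for name, tones in TREE_STRUCTURES[style.structure].items():
        if tone in tones:
            return name
    raise ValueError(f"unknown tone: {tone}")


if __name__ == "__main__":
    for style in ALL_STYLES:
        print(style, [tree_for(style, t) for t in range(5)])
```

Separating trees by tone prevents contexts with different tones from being merged into one leaf, which is the failure mode behind the misshaped F0 contours of the single binary tree.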

System preparations

1. Sentence structure analysis
2. Word structure analysis
3. Full-context labeling
4. Construction of the question set for context clustering
5. Feature extraction

[Figure: data flow. Inputs: the VAJA speech corpus (wav files, label files) and the ORCHID text corpus. Intermediate files: XML files, full-context label files (.lab), and parameter files (.cmp). Steps: full-context labeling, feature extraction (mcep, F0), and HMM training and synthesis, which outputs synthetic speech.]
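Step 4 builds the set of yes/no questions the tree-growing algorithm may ask. The following is a minimal sketch that emits questions about the current syllable's tone type and tone groups, in the `QS "name" {pattern,...}` form used by HTK/HTS question files. The `/T:<prev>_<cur>_<next>/` label field that these patterns match is a hypothetical layout, not the system's actual label format.

```python
from typing import Dict, List, Sequence

# Tone types and tone groups from the slides
# (0 middle, 1 low, 2 falling, 3 high, 4 rising).
TONE_CLASSES: Dict[str, Sequence[int]] = {
    "Middle": [0], "Low": [1], "Falling": [2], "High": [3], "Rising": [4],
    "Static": [0, 1, 3], "Dynamic": [2, 4],
    "Upward": [3, 4], "Downward": [0, 1, 2],
}


def current_tone_questions(classes: Dict[str, Sequence[int]]) -> List[str]:
    """Build QS lines about the current syllable's tone.

    Assumes a hypothetical label field '/T:<prev>_<cur>_<next>/' with
    single-character values, so '?' (exactly one character) and '*'
    (any string) are enough to pin down the current tone.
    """
    lines = []
    for name, tones in classes.items():
        patterns = ",".join(f"*/T:?_{t}_?/*" for t in tones)
        lines.append(f'QS "C-Tone_{name}" {{{patterns}}}')
    return lines


if __name__ == "__main__":
    print("\n".join(current_tone_questions(TONE_CLASSES)))
```

Questions about the preceding and succeeding tones would follow the same pattern with the `?` moved to the other positions.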

Experiments: evaluation of overall tone correctness

[Figure 5 plot: F0 (Hz, 150 to 250) against time (s, 5.0 to 6.4).]

Figure 5: F0 contours of synthesized speech from 8 different clustering styles, and the F0 contour of natural speech.

Figure 6: Tone error percentages of synthesized speech from 4 different clustering styles, (a)-(d), against the number of training utterances (100, 200, 300, 400, 500, 1000, 1500, 2000, 2500).

Figure 7: Tone error percentages of synthesized speech from 8 different clustering styles, (a)-(h), against the number of training utterances (100 to 2500).
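A tone error percentage can be read as the share of synthesized syllables whose tone is not perceived as the intended one. The following is a minimal sketch of that calculation. It assumes each test syllable yields an (intended tone, perceived tone) pair, for example from a listening test; how the perceived tones were obtained is not stated on these slides.

```python
from typing import Sequence, Tuple


def tone_error_percentage(pairs: Sequence[Tuple[int, int]]) -> float:
    """Percentage of (intended, perceived) tone pairs that differ."""
    if not pairs:
        raise ValueError("no syllables to score")
    errors = sum(1 for intended, perceived in pairs if intended != perceived)
    return 100.0 * errors / len(pairs)


if __name__ == "__main__":
    # One of four syllables (an intended falling tone heard as middle) is wrong.
    print(tone_error_percentage([(0, 0), (3, 3), (2, 0), (4, 4)]))
```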

Proportion of each tone type: tone 0: 38%, tone 1: 22%, tone 2: 17%, tone 3: 15%, tone 4: 8%

Experiments: evaluation of tone correctness for each tone type

Figure 8: Tone error percentages of synthesized speech from 8 different clustering styles, categorized by tone type (tone 0 to tone 4), for 100, 300, 500, 1500, and 2500 training utterances.
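Breaking the error rate down by tone type only changes the denominator: errors are counted separately for each intended tone. This is a minimal sketch under the same assumption of (intended tone, perceived tone) pairs per test syllable. The function name is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Sequence, Tuple


def tone_error_by_type(pairs: Sequence[Tuple[int, int]]) -> Dict[int, float]:
    """Tone error percentage for each intended tone type."""
    totals: Dict[int, int] = defaultdict(int)
    errors: Dict[int, int] = defaultdict(int)
    for intended, perceived in pairs:
        totals[intended] += 1
        if intended != perceived:
            errors[intended] += 1
    return {t: 100.0 * errors[t] / totals[t] for t in sorted(totals)}


if __name__ == "__main__":
    pairs = [(2, 0), (2, 2), (4, 4), (4, 3), (4, 4), (4, 4)]
    print(tone_error_by_type(pairs))
```

Because tone types occur with very different frequencies in the data, a per-type breakdown can reveal weaknesses on rare tones that the overall percentage hides.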

Experiments: evaluation of syllable duration distortion

Figure 9: Scores of a paired-comparison test for natural duration among 4 different clustering styles, (e)-(h), for 100, 300, 500, 1500, and 2500 training utterances.
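The following is a minimal sketch of one common way to turn paired-comparison judgments into per-style scores. It assumes each trial presents two styles and records which one the listener judged more natural in duration. Under that assumption, a style's score is the percentage of its trials in which it was preferred. The exact scoring used for Figure 9 is not specified on these slides.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def preference_scores(trials: Iterable[Tuple[str, str, str]]) -> Dict[str, float]:
    """trials: (style_x, style_y, preferred). Returns % of trials won per style."""
    shown: Counter = Counter()
    won: Counter = Counter()
    for x, y, preferred in trials:
        if preferred not in (x, y):
            raise ValueError(f"preferred style {preferred!r} was not in the pair")
        shown[x] += 1
        shown[y] += 1
        won[preferred] += 1
    return {s: 100.0 * won[s] / shown[s] for s in sorted(shown)}


if __name__ == "__main__":
    trials = [("e", "f", "e"), ("e", "g", "g"), ("f", "g", "g"), ("e", "h", "e")]
    print(preference_scores(trials))
```

With this scoring, a style that is indistinguishable from its competitors settles near 50, and scores above 50 indicate a preference.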

Examples of synthesized speech (female speaker)

[Audio examples 1 and 2 for: VAJA (unit selection), analysis-synthesis speech, and HMM-based synthesis at several corpus sizes (number of training utterances).]

Clustering styles by tree structure, without / with the tone question set: (a) / (e), (b) / (f), (c) / (g), (d) / (h)

Conclusions

This paper analyzed tree-based context clustering for an HMM-based Thai speech synthesis system. Four decision-tree structures were designed according to tone groups and tone types to obtain higher tone correctness in the synthesized speech. The results show that the tone-separated tree structures significantly reduce the tone error percentage of the synthesized speech compared with the single binary tree structure. Using contextual tone information at the syllable level improves tone correctness for all decision-tree structures. Some syllable-duration distortion appears when the simple tone-separated tree is used for context clustering with a small amount of training data; however, it can be relieved by using the constancy-based or trend-based tone-separated tree. Analysis of the tone correctness of the average-voice-based speech model and intonation analysis are left for future work.
