ACOUSTIC-PHONETIC UNIT SIMILARITIES FOR CONTEXT DEPENDENT ACOUSTIC MODEL PORTABILITY Viet Bac Le*,...

ACOUSTIC-PHONETIC UNIT SIMILARITIES FOR CONTEXT DEPENDENT ACOUSTIC MODEL PORTABILITY

Viet Bac Le, Laurent Besacier, Tanja Schultz

CLIPS-IMAG Laboratory, UMR CNRS 5524, BP 53, 38041 Grenoble Cedex 9, FRANCE
Interactive Systems Laboratories, Carnegie Mellon University, Pittsburgh, PA, USA

Presenter: Hsu-Ting Wei

Outline

• 1. Introduction
• 2. Acoustic-phonetic unit similarities
  - 2.1 Phoneme Similarity
  - 2.2 Polyphone Similarity
  - 2.3 Clustered Polyphonic Model Similarity
• 3. Experiments and results
• 4. Conclusion

1. Introduction

• There are at least 6,900 languages in the world.
• We are interested in new techniques and tools for rapid portability of speech recognition systems (e.g., ASR) when only limited resources are available.

1. Introduction (cont.)

• In crosslingual acoustic modeling, previous approaches have been limited to context-independent models:
  - Monophonic acoustic models in the target language were initialized using seed models from the source language.
  - These initial models could then be rebuilt or adapted using training data from the target language.
• The portability and adaptation of cross-lingual context-dependent acoustic models can be investigated.

• 1996, J. Kohler
  - J. Kohler used HMM distances to calculate the similarity between two monophonic models.
  - Method
    • To measure the distance or similarity between two phoneme models, a relative entropy-based distance metric is used.
    • During training, the parameters of the mixture Laplacian density phoneme models are estimated.
    • Further, for each phoneme, a set of phoneme tokens X is extracted from a test or development corpus.

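    • Given two phoneme models $\lambda_i$ and $\lambda_j$, and the sets of phoneme tokens (observations) $X_i$ and $X_j$, the distance between models $\lambda_i$ and $\lambda_j$ is defined by

$$d(\lambda_i, \lambda_j) = \log p(X_i \mid \lambda_i) - \log p(X_i \mid \lambda_j)$$

$$d(\lambda_j, \lambda_i) = \log p(X_j \mid \lambda_j) - \log p(X_j \mid \lambda_i)$$

    • The symmetric distance is the average of the two:

$$d_{\mathrm{sym}}(\lambda_i, \lambda_j) = \tfrac{1}{2}\left[\, d(\lambda_i, \lambda_j) + d(\lambda_j, \lambda_i) \,\right]$$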
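A minimal sketch of a relative entropy-based model distance of this kind, assuming the directed distance is the log-likelihood of a model's own tokens minus their log-likelihood under the other model, and that the symmetric distance averages both directions. Each phoneme model is simplified here to a single 1-D Laplacian `(mu, b)` rather than a mixture inside an HMM; the function names and the toy tokens are hypothetical.

```python
import math

def laplace_logpdf(x, mu, b):
    """Log density of a 1-D Laplacian with location mu and scale b."""
    return -math.log(2.0 * b) - abs(x - mu) / b

def log_likelihood(tokens, model):
    """Total log-likelihood of scalar tokens under a (mu, b) model."""
    mu, b = model
    return sum(laplace_logpdf(x, mu, b) for x in tokens)

def directed_distance(tokens_i, model_i, model_j):
    """log p(X_i | model_i) - log p(X_i | model_j)."""
    return log_likelihood(tokens_i, model_i) - log_likelihood(tokens_i, model_j)

def symmetric_distance(tokens_i, model_i, tokens_j, model_j):
    """Average of the two directed distances."""
    d_ij = directed_distance(tokens_i, model_i, model_j)
    d_ji = directed_distance(tokens_j, model_j, model_i)
    return 0.5 * (d_ij + d_ji)

# Hypothetical toy data: scalar "tokens" for two phoneme models.
model_a, tokens_a = (0.0, 1.0), [-0.5, 0.0, 0.4]
model_b, tokens_b = (3.0, 1.0), [2.6, 3.0, 3.3]

print(symmetric_distance(tokens_a, model_a, tokens_b, model_b))
```

The distance is zero when both models are identical and grows as each model explains the other model's tokens less well.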

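• 2000, B. Imperl
  - Proposed a triphone similarity estimation method based on phoneme distances, and used an agglomerative (bottom-up) clustering process to define a multilingual set of triphones.
  - Method
    • The similarity of two triphones $l_i\text{-}c_i\text{-}r_i$ and $l_j\text{-}c_j\text{-}r_j$ (where $l$, $r$ and $c$ denote the left context phoneme, the right context phoneme and the central phoneme) is a weighted sum of phoneme similarities:

$$S(l_i\text{-}c_i\text{-}r_i,\; l_j\text{-}c_j\text{-}r_j) = W_l\,S(l_i, l_j) + W_c\,S(c_i, c_j) + W_r\,S(r_i, r_j), \quad \text{where } W_l + W_c + W_r = 1$$

    • The phoneme similarities $S(\cdot,\cdot)$ are estimated from a confusion matrix.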
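A minimal sketch of a triphone similarity computed as a weighted sum of the similarities of the left-context, central and right-context phonemes. The weights `(0.25, 0.5, 0.25)`, the similarity table `SIM` and the function names are hypothetical; in the method summarized here the phoneme similarities would come from a confusion matrix.

```python
def triphone_similarity(tri_a, tri_b, phoneme_sim, weights=(0.25, 0.5, 0.25)):
    """Weighted sum of left, central and right phoneme similarities.

    tri_a, tri_b: (left, central, right) phoneme tuples.
    weights: (W_left, W_central, W_right); hypothetical values, must sum to 1.
    """
    w_l, w_c, w_r = weights
    if abs(w_l + w_c + w_r - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    (l_a, c_a, r_a), (l_b, c_b, r_b) = tri_a, tri_b
    return (w_l * phoneme_sim(l_a, l_b)
            + w_c * phoneme_sim(c_a, c_b)
            + w_r * phoneme_sim(r_a, r_b))

# Hypothetical phoneme similarities, standing in for confusion-matrix estimates.
SIM = {frozenset(("p", "b")): 0.6, frozenset(("a", "e")): 0.5}

def toy_phoneme_sim(x, y):
    """1.0 for identical phonemes, table lookup otherwise, 0.0 if unknown."""
    if x == y:
        return 1.0
    return SIM.get(frozenset((x, y)), 0.0)

print(triphone_similarity(("p", "a", "t"), ("b", "a", "t"), toy_phoneme_sim))
```

A pairwise similarity of this form is what an agglomerative clustering step would consume when merging the closest triphones across languages.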

• 2001, T. Schultz
  - Proposed PDTS (Polyphone Decision Tree Specialization) to overcome the problem that the context mismatch across languages increases dramatically for wider contexts.
  - PDTS is a data-driven method (↓):
    • In PDTS, the clustered multilingual polyphone decision tree is adapted to the target language by restarting the decision tree growing process according to the limited adaptation data in the target language.

• In this paper, we investigate a new method for this crosslingual transfer process.
• We do not use the existing decision tree in the source language, but build a new decision tree with just a small amount of data from the target language.
• Then, based on the acoustic-phonetic unit similarities, crosslingual transfer and adaptation processes are applied.

2. Acoustic-phonetic unit similarities

• 2.1 Phoneme Similarity
  - 2.1.1 Data-driven methods
  - 2.1.2 Proposed knowledge-based method

• 2.1.1 Data-driven methods (↓)
  - The acoustic similarity between two phonemes can be obtained automatically by calculating the distance between two acoustic models:
    • HMM distance (entropy)
    • Kullback-Leibler distance
    • Bhattacharyya distance
    • Euclidean distance
    • Confusion matrix
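To make three of the listed measures concrete, here is a minimal sketch using the standard closed forms for univariate Gaussians. Representing each phoneme model by a single 1-D Gaussian (mean, variance) is an assumption made for illustration only; real acoustic models are HMMs with mixture densities over feature vectors. All function names are hypothetical.

```python
import math

def kl_divergence(mu1, var1, mu2, var2):
    """KL(N(mu1, var1) || N(mu2, var2)) for univariate Gaussians."""
    return 0.5 * (math.log(var2 / var1) + (var1 + (mu1 - mu2) ** 2) / var2 - 1.0)

def symmetric_kl(mu1, var1, mu2, var2):
    """Symmetrized KL: sum of both directions."""
    return kl_divergence(mu1, var1, mu2, var2) + kl_divergence(mu2, var2, mu1, var1)

def bhattacharyya(mu1, var1, mu2, var2):
    """Bhattacharyya distance between two univariate Gaussians."""
    avg_var = 0.5 * (var1 + var2)
    return (0.125 * (mu1 - mu2) ** 2 / avg_var
            + 0.5 * math.log(avg_var / math.sqrt(var1 * var2)))

def euclidean(mean1, mean2):
    """Euclidean distance between two mean vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(mean1, mean2)))

print(symmetric_kl(0.0, 1.0, 1.0, 1.0))
print(bhattacharyya(0.0, 1.0, 2.0, 1.0))
print(euclidean((0.0, 0.0), (3.0, 4.0)))
```

The Kullback-Leibler divergence is not symmetric, which is why a symmetrized variant is included; the Bhattacharyya and Euclidean distances are symmetric by construction.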

2. Acoustic-phonetic unit similarities (cont.)

• 2.1.2 Proposed knowledge-based method (↑)
  - Step 1: Top-down classification
  - Step 2: Bottom-up estimation

• Step 1: Top-down classification
  - Figure 1 shows a hierarchical graph in which each node is assigned to a layer.
    • k = number of layers
    • G_i = user-defined similarity value for layer i (i = 0, ..., k-1)
  - In our work, we investigated several settings of k and G_i, and set G = {0.9, 0.45, 0.25, 0.1, 0.0} with k = 5, based on a cross-evaluation in crosslingual acoustic modeling experiments.

[Figure 1: hierarchical graph with layer similarity values G0 = 0.9, G1 = 0.45, G2 = 0.25, G3 = 0.1, G4 = 0.0]

• Step 2: Bottom-up estimation
  - d([i], [u]) = G2 (= 0.25 in this experiment)

[Figure 1, repeated: layer similarity values G0 = 0.9, G1 = 0.45, G2 = 0.25, G3 = 0.1, G4 = 0.0]
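The two steps can be sketched as follows, under the assumption that the bottom-up estimate for two phonemes is the G value of the lowest layer at which their paths through the hierarchy meet. The layer values are those given in the slides (G = 0.9, 0.45, 0.25, 0.1, 0.0); the hierarchy in `PATHS` and its node names are hypothetical stand-ins for Figure 1, chosen so that [i] and [u] meet at layer 2.

```python
# Layer similarity values G_0 .. G_4 (k = 5), as given in the slides.
G = [0.9, 0.45, 0.25, 0.1, 0.0]

# Hypothetical hierarchy (stand-in for Figure 1): for each phoneme, the nodes
# on its path from layer 0 (the phoneme itself) up to layer 4 (the root).
PATHS = {
    "i": ["i", "close-front", "close-vowel", "vowel", "root"],
    "y": ["y", "close-front", "close-vowel", "vowel", "root"],
    "u": ["u", "close-back", "close-vowel", "vowel", "root"],
    "a": ["a", "open-front", "open-vowel", "vowel", "root"],
    "p": ["p", "bilabial-plosive", "plosive", "consonant", "root"],
}

def similarity(p, q):
    """Walk both paths bottom-up and return G at the first shared node."""
    for layer, (node_p, node_q) in enumerate(zip(PATHS[p], PATHS[q])):
        if node_p == node_q:
            return G[layer]
    raise ValueError("phonemes do not share a root node")

for pair in [("i", "i"), ("i", "y"), ("i", "u"), ("i", "a"), ("i", "p")]:
    print(pair, similarity(*pair))
```

With this layout, `similarity("i", "u")` evaluates to G2, matching the d([i], [u]) example on the slide; every other pairing depends on the assumed hierarchy.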

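• 2.2 Polyphone Similarity
  - Let S be the phoneme set in the source language and T be the phoneme set in the target language.
  - [Equation: distance d(P_s, P_t) between a source polyphone P_s and a target polyphone P_t, summed over the context positions k]
  - Finally, find the nearest polyphone P_s.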

• 2.3 Clustered Polyphonic Model Similarity
  - Since the number of polyphones in a language is very large, a limited training corpus usually does not cover enough occurrences of every polyphone.
  - A decision tree-based clustering (Figure 3) or an agglomerative clustering procedure (Figure 1) is needed to cluster and model similar polyphones in a clustered polyphonic model.

• 2.3 Clustered Polyphonic Model Similarity (cont.)

3. Experiments and results

• Syllable-based ASR system
• Data used to build a polyphonic decision tree and to adapt the crosslingual acoustic models:
  - Baseline
    • 2.25 hours of data spoken by 8 speakers were used (Vietnamese speech).
  - Crosslingual methods
    • Speech data from six languages were used to build these models: Arabic, Chinese, English, German, Japanese and Spanish.
    • The models were adapted with Vietnamese speech.

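3. Experiments and results (cont.)

Initial model

Vietnamese speakers speaking Vietnamese (2.5 hours)

Speakers of six languages speaking their own native languages (perhaps 100 hours)

Adaptation by adding Vietnamese speakers speaking Vietnamese (2.5 hours)

New models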

• As the number of clustered sub-models increases, the SER (syllable error rate) of the baseline system increases proportionally, since the amount of data per model decreases due to the limited training data.
• However, the crosslingual system is able to overcome this problem by indirectly using data in other languages.

• The influence of adaptation data size and the number of speakers is compared for the baseline system and two methods of phoneme similarity estimation:
  - knowledge-based
  - data-driven, using a confusion matrix
• We find that the knowledge-based method outperforms the data-driven method.

4. Conclusion

• This paper presents different methods of estimating the similarities between two acoustic-phonetic units.
• The potential of our method is demonstrated, even though the use of trigrams in syllable-based language modeling might be insufficient to obtain acceptable error rates.
