University of Alberta Letter-to-phoneme conversion Sittichai Jiampojamarn sj@cs.ualberta.ca CMPUT...

University of Alberta

Letter-to-phoneme conversion

Sittichai Jiampojamarn
sj@cs.ualberta.ca

CMPUT 500 / HUCO 612
September 26, 2007

Outline

• Part I
– Introduction to letter-to-phoneme conversion

• Part II
– Applying many-to-many alignments and Hidden Markov Models to letter-to-phoneme conversion, NAACL 2007

• Part III
– Ongoing work: discriminative approaches for letter-to-phoneme conversion

• Part IV
– Possible term projects for CMPUT 500 / HUCO 612

The task

• Converting words to their pronunciations
– study -> [ s t ʌ d I ]
– band -> [ b æ n d ]
– phoenix -> [ f i n I k s ]
– king -> [ k I ŋ ]

• Words are sequences of letters.
• Pronunciations are sequences of phonemes.
– Ignoring syllabification and stress.

Why is it important?

• Major component in speech synthesis systems

• Word similarity based on pronunciation
– Spelling correction (Toutanova and Moore, 2001)

• Linguistic interest of relationships between letters and phonemes.

• Not a trivial task, but tractable.

Trivial solutions ?

• Dictionary: looking up answers in a database
– Constructing such a large lexicon database takes great effort.
– Can’t handle new words and misspellings.

• Rule-based approaches
– Work well on non-complex languages
– Fail on complex languages
  • Each word ends up creating its own rule, so the system ends up memorizing word-phoneme pairs.

John Kominek and Alan W. Black, “Learning Pronunciation Dictionaries: Language Complexity and Word Selection Strategies”, In Proceedings of HLT-NAACL 2006, June 4-9, pp. 232-239

Learning-based approaches

• Training data
– Examples of words and their phonemes.

• Hidden structure
– band [ b æ n d ]
  • b [b], a [æ], n [n], d [d]
– abode [ ə b o d ]
  • a [ə], b [b], o [o], d [d], e [ _ ]

Alignments

• To train L2P, we need alignments between letters and phonemes

a -> [ə]
b -> [b]
o -> [o]
d -> [d]
e -> [_]

a b o d e

ə b o d _

Overview of the standard process

Training data -> 1-1 aligner -> Aligned data -> Phoneme prediction -> Pronunciation

Letter-to-phoneme alignments

• Previous work assumed one-to-one alignment for simplicity (Daelemans and Bosch, 1997; Black et al., 1998; Damper et al., 2005).

• Expectation-Maximization (EM) algorithms are used to optimize the alignment parameters.

• Matching all possible letters and phonemes iteratively until the parameters converge.

1-to-1 alignments

• Initially, the alignment parameters can start from a uniform distribution, or from counting all possible letter-phoneme mappings. Ex. abode [ə b o d]

[Figure: the five possible 1-to-1 alignments of a b o d e with ə b o d, one for each position of the null phoneme _, ending with:]

a b o d e

ə b o d _

P(a, ə) = 4/5
P(b, b) = 3/5
…
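The counting initialization can be sketched in a few lines of Python. This is a minimal illustration, not the aligner used in the talk: the function names are hypothetical, and each parameter is taken to be the fraction of possible alignments that contain the letter-phoneme pair, which reproduces the 4/5 and 3/5 values on the slide.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

NULL = "_"

def one_to_one_alignments(letters, phonemes):
    """Yield every 1-to-1 alignment, padding the phoneme side with nulls."""
    n_nulls = len(letters) - len(phonemes)
    if n_nulls < 0:
        return  # more phonemes than letters: no 1-to-1 alignment exists
    for null_positions in combinations(range(len(letters)), n_nulls):
        remaining = iter(phonemes)
        yield [(letter, NULL if i in null_positions else next(remaining))
               for i, letter in enumerate(letters)]

def initial_parameters(letters, phonemes):
    """Fraction of the possible alignments in which each pair occurs."""
    alignments = list(one_to_one_alignments(letters, phonemes))
    counts = Counter(pair for alignment in alignments for pair in set(alignment))
    return {pair: Fraction(c, len(alignments)) for pair, c in counts.items()}

params = initial_parameters(list("abode"), ["ə", "b", "o", "d"])
print(params[("a", "ə")], params[("b", "b")])
```

These values are starting points only; they are not normalized into a probability distribution, and EM would re-estimate them.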

1-to-1 alignments

• Find the best possible alignments based on the current alignment parameters.

a b o d e

ə b o d _

• Based on the alignments found, update the parameters.

Finding the best possible alignments

• Dynamic programming:
– In the style of the standard weighted minimum edit distance algorithm.
– Treat the alignment parameter P(l, p) as the score of mapping letter l to phoneme p.
– Find the alignment that gives the maximum score.
– Allow null phonemes but not null letters.
  • It is hard to incorporate null letters in the testing data.
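The search described above can be sketched as a small Viterbi-style table. This is an illustrative sketch under assumptions: scores are log-probabilities, unseen pairs get a small floor value, and the parameter values in `toy_params` are made up for the example rather than taken from any trained model.

```python
import math

NULL = "_"

def best_alignment(letters, phonemes, score):
    """Best 1-to-1 alignment: each letter takes one phoneme or the null phoneme."""
    n, m = len(letters), len(phonemes)
    # best[i][j]: best score for aligning the first i letters with the first j phonemes
    best = [[-math.inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    best[0][0] = 0.0
    for i in range(1, n + 1):
        letter = letters[i - 1]
        for j in range(0, min(i, m) + 1):
            # Option 1: the letter maps to the null phoneme.
            cand = best[i - 1][j] + score(letter, NULL)
            if cand > best[i][j]:
                best[i][j], back[i][j] = cand, (i - 1, j, NULL)
            # Option 2: the letter maps to the next phoneme.
            if j > 0:
                cand = best[i - 1][j - 1] + score(letter, phonemes[j - 1])
                if cand > best[i][j]:
                    best[i][j], back[i][j] = cand, (i - 1, j - 1, phonemes[j - 1])
    if best[n][m] == -math.inf:
        return None  # null letters are not allowed, so no alignment exists
    pairs, i, j = [], n, m
    while i > 0:
        pi, pj, phoneme = back[i][j]
        pairs.append((letters[i - 1], phoneme))
        i, j = pi, pj
    return pairs[::-1]

def make_score(params, floor=1e-6):
    """Log of P(l, p), with a floor for pairs that have no parameter yet."""
    return lambda l, p: math.log(params.get((l, p), floor))

# Hypothetical parameter values, chosen only for the example.
toy_params = {("a", "ə"): 0.8, ("b", "b"): 0.9, ("o", "o"): 0.9,
              ("d", "d"): 0.9, ("e", NULL): 0.7}
score = make_score(toy_params)
print(best_alignment(list("abode"), ["ə", "b", "o", "d"], score))
```

Because only null phonemes are allowed, a word with more phonemes than letters has no alignment at all and the function returns `None`.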

Visualization

[Figure: animation of the dynamic-programming table for aligning the letters # a b o d e with the phonemes ə b o d, with _ as the null phoneme.]

Problems with 1-to-1 alignments

• Double letters: two letters map to one phoneme. (e.g. ng [ŋ], sh [ʃ], ph [f])

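k i n g

k i ŋ _

[Figure: alternative 1-to-1 alignments of king with [ k I ŋ ]; whichever of n and g takes ŋ, the other must take the null phoneme.]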

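Problems with 1-to-1 alignments

• Double phonemes: one letter maps to two phonemes. (e.g. x [k s], u [j u])

f u m e

f j u m _

[Figure: alignment attempts for fume with [ f j u m ]; the letter u has to cover both j and u, which a 1-to-1 alignment cannot express.]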

Previous solutions for double phonemes

• Preprocess using a fixed list of double phonemes.
– [k s] -> [X]
– [j u] -> [U]

f u m e

f j u m

f u m e

f U m _
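The preprocessing step amounts to a left-to-right rewrite of the phoneme sequence. The sketch below is a minimal illustration with hypothetical names; the two table entries are the ones listed on the slide.

```python
# Fixed list of double phonemes and the single symbols that replace them.
DOUBLE_PHONEMES = {("k", "s"): "X", ("j", "u"): "U"}

def merge_double_phonemes(phonemes, table=DOUBLE_PHONEMES):
    """Replace each listed phoneme pair with its single merged symbol."""
    merged, i = [], 0
    while i < len(phonemes):
        pair = tuple(phonemes[i:i + 2])
        if pair in table:
            merged.append(table[pair])
            i += 2
        else:
            merged.append(phonemes[i])
            i += 1
    return merged

print(merge_double_phonemes(["f", "j", "u", "m"]))
```

After merging, fume has three phonemes for four letters, so a 1-to-1 alignment with one null phoneme becomes possible. Note that the rewrite applies to every occurrence of a listed pair, whether or not a single letter produced it.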

Applying many-to-many alignments and Hidden Markov Models to Letter-to-Phoneme conversion

Sittichai Jiampojamarn, Grzegorz Kondrak and Tarek Sherif

Proceedings of the Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL-HLT 2007), Rochester, NY, April 2007, pp. 372-379.

Overview system

[Diagram. Alignment process: training data, 1-1 aligner, M-M aligner, aligned data. Prediction process: chunking prediction, local prediction, phoneme prediction, pronunciation.]

Many-to-many alignments

• EM-based method.

• Extended from the forward-backward training of a one-to-one stochastic transducer (Ristad and Yianilos, 1998).

• Allow one or two letters to map to null, one, or two phonemes.

p h o e n i x

f i n ɪ k s
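To make the search space concrete, the sketch below enumerates every alignment allowed by the constraint above (one or two letters mapping to zero, one, or two phonemes). It only enumerates; the EM training that weights these alignments is not shown. The function name and the `target` alignment are illustrative choices, not taken from the slides.

```python
def many_to_many_alignments(letters, phonemes, max_l=2, max_p=2):
    """Yield alignments where each unit maps 1..max_l letters to 0..max_p phonemes."""
    if not letters:
        if not phonemes:
            yield []  # both sides consumed: one complete alignment
        return
    for nl in range(1, min(max_l, len(letters)) + 1):
        for np_ in range(0, min(max_p, len(phonemes)) + 1):
            head = (tuple(letters[:nl]), tuple(phonemes[:np_]))
            for rest in many_to_many_alignments(letters[nl:], phonemes[np_:], max_l, max_p):
                yield [head] + rest

letters = list("phoenix")
phonemes = ["f", "i", "n", "ɪ", "k", "s"]
candidates = list(many_to_many_alignments(letters, phonemes))

# One plausible alignment, used here only to check that it is in the search space.
target = [(("p", "h"), ("f",)), (("o", "e"), ("i",)), (("n",), ("n",)),
          (("i",), ("ɪ",)), (("x",), ("k", "s"))]
print(target in candidates)
```

In a trained aligner each unit would carry a probability, and forward-backward would sum over these candidates instead of listing them.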

Prediction problem

• Should the prediction model generate phonemes from one or two letters ?

– gash [ g æ ʃ ], gasholder [ g æ s h o l d ə r ]

g a sh

g æ ʃ

g a s h o l d e r

g æ s h o l d ə r

Letter chunking

• A bigram letter chunking predictor automatically discovers double letters.

Ex. longs

l o ng s

l ɒ ŋ z

Overview system

[Diagram. Alignment process: training data, 1-1 aligner, M-M aligner, aligned data. Prediction process: chunking prediction, local prediction, phoneme prediction, pronunciation.]

Phoneme prediction

• Once the training examples are aligned, we need a phoneme prediction model.

• “Classification task” or “sequence prediction”?

[Figure: phonemes P1 P2 P3 above letters L1 L2 L3, with letter context windows L0 L1 L2 and L1 L2 L3.]

Instance based learning

• Store the training examples.

• The predicted class is assigned by searching for the “most similar” training instance.

• Similarity functions:
– Hamming distance, Euclidean distance, etc.

Who do I look like most?
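A minimal nearest-neighbour sketch of this idea follows. It assumes 1-to-1 aligned training words, a context window of one letter on each side, Hamming distance as the similarity function, and ties broken by order in memory. All names are hypothetical, and this is not the TiMBL implementation.

```python
def windows(letters, size=1, pad="#"):
    """One context window per letter: `size` letters on each side, padded at the edges."""
    padded = [pad] * size + list(letters) + [pad] * size
    return [tuple(padded[i:i + 2 * size + 1]) for i in range(len(letters))]

def hamming(a, b):
    """Number of positions at which two equal-length windows differ."""
    return sum(x != y for x, y in zip(a, b))

def train(aligned_words):
    """Store every (window, phoneme) instance; letters and phonemes are 1-to-1 aligned."""
    memory = []
    for letters, phonemes in aligned_words:
        memory.extend(zip(windows(letters), phonemes))
    return memory

def predict(memory, letters):
    """Give each letter the phoneme of its most similar stored window."""
    return [min(memory, key=lambda ex: hamming(ex[0], w))[1] for w in windows(letters)]

memory = train([("band", ["b", "æ", "n", "d"]),
                ("king", ["k", "I", "ŋ", "_"])])
print(predict(memory, "ban"))
```

Each letter is classified independently here, which is the "classification task" view; nothing ties neighbouring predictions together.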

Basic HMMs

• A basic sequence-based prediction method.

• In L2P,
– letters are observations
– phonemes are states

• Output phoneme sequences depend on both emission and transition probabilities.

Applying HMM

• Use instance-based learning to produce a list of candidate phones with confidence values conf(phone_i) for each letter_i (emission probability).

• Use a language model of the phoneme sequences in the training data to obtain the transition probability P(phone_i | phone_{i-1}, …, phone_{i-n}).
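The combination of the two scores can be sketched as a Viterbi search. This is a simplified illustration with a bigram language model and hypothetical transition values; only the two confidence values for the letter i (0.714 and 0.286) come from the talk, and the null phoneme is treated as an ordinary symbol for simplicity.

```python
import math

def viterbi(candidates, transition, start="<s>"):
    """candidates: one {phoneme: conf} dict per letter; transition(prev, cur) -> probability."""
    # paths[p]: best (log-score, phoneme sequence) among sequences ending in phoneme p
    paths = {start: (0.0, [])}
    for options in candidates:
        new_paths = {}
        for phoneme, conf in options.items():
            new_paths[phoneme] = max(
                ((score + math.log(transition(prev, phoneme)) + math.log(conf),
                  seq + [phoneme])
                 for prev, (score, seq) in paths.items()),
                key=lambda t: t[0])
        paths = new_paths
    return max(paths.values(), key=lambda t: t[0])

# Hypothetical bigram probabilities; every unlisted bigram gets the same default.
BIGRAMS = {("r", "aI"): 0.001, ("r", "I"): 0.2}

def transition(prev, cur):
    return BIGRAMS.get((prev, cur), 0.05)

# Candidate phones per letter of "buried".
buried = [{"b": 1.0}, {"E": 1.0}, {"r": 1.0},
          {"aI": 0.714, "I": 0.286}, {"_": 1.0}, {"d": 1.0}]
log_score, phones = viterbi(buried, transition)
print(phones)
```

With these numbers the local classifier prefers aI for the letter i, but the transition score after r outweighs it and the search picks I.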

Visualization

[Figure: candidate lattice for "buried": b / b, u / E, r / r, i / aI or i / I, e / _, d / d; values shown on the lattice: 0.048, 0.067, 0.003]

Conf( i / aI ) = 0.714

Conf( i / I ) = 0.286

Buried -> [ b E r aI d ] = 2.38 x 10^-8
Buried -> [ b E r I d ] = 2.23 x 10^-6

Evaluation

• Data sets
– English: CMUDict (112K), Celex (65K).
– Dutch: Celex (116K).
– German: Celex (49K).
– French: Brulex (27K).

• The IB1 algorithm, as implemented in the TiMBL package, is used as the classifier (Daelemans et al., 2004).

• Results are reported as word accuracy, based on 10-fold cross-validation.
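For reference, word accuracy counts a word as correct only if its whole phoneme sequence matches. The sketch below shows that metric together with a simple 10-fold split; the interleaved fold assignment is an assumption made for the example, since the slides do not say how the folds were drawn.

```python
def word_accuracy(predicted, gold):
    """Fraction of words whose predicted phoneme sequence matches the gold one exactly."""
    correct = sum(p == g for p, g in zip(predicted, gold))
    return correct / len(gold)

def k_folds(items, k=10):
    """Yield (train, test) splits; fold i holds out every k-th item starting at index i."""
    for i in range(k):
        test = items[i::k]
        train = [x for j, x in enumerate(items) if j % k != i]
        yield train, test

gold = [["b", "æ", "n", "d"], ["k", "I", "ŋ"]]
predicted = [["b", "æ", "n", "d"], ["k", "I", "n"]]
print(word_accuracy(predicted, gold))
```

One wrong phoneme makes the whole word count as an error, so word accuracy is a stricter measure than phoneme accuracy.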

[Figure: chart of word accuracy on CMUDict, English Celex, Dutch Celex, German Celex, and French Brulex, comparing 1-1 alignments, 1-1 alignments + HMM, and M-M alignments.]

Messages

• Many-to-many alignments show significant improvements over traditional one-to-one alignments.

• The HMM-like approach helps when the local classifier has difficulty predicting phonemes.

Criticism

• Joint models
– Alignments, chunking, prediction, and HMM.

• Error propagation
– Errors pass from one model to the next, and later models are unlikely to correct them.

• Can we combine and optimize everything at once? Or at least allow the system to correct past errors?

On-going work

Discriminative approach for letter-to-phoneme conversion

Online discriminative learning

• Let x be an input word and y an output phoneme sequence.

• Φ(x, y) represents features describing x and y.

• w is a weight vector for Φ(x, y).

Online training algorithm

1. Initially, w = 0.

2. For k iterations
   1. For all letter-phoneme sequence pairs (x, y)
   2. update the weights w according to Φ(x, y) and Φ(x, ŷ), where ŷ is the current prediction for x

Perceptron update (Collins, 2002)

• Simple update training method.

• When the prediction is wrong, move the weights in the direction of the correct answer.
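A minimal sketch of the perceptron update follows. It makes several simplifying assumptions that are not from the talk: the search over outputs is replaced by a fixed candidate list per word, the only features are counts of aligned (letter, phoneme) pairs, and the toy data is invented for the example.

```python
from collections import Counter

def phi(letters, phonemes):
    """Feature counts: one feature per aligned (letter, phoneme) pair."""
    return Counter(zip(letters, phonemes))

def score(w, feats):
    """Dot product of the weight vector with a sparse feature vector."""
    return sum(w.get(f, 0.0) * v for f, v in feats.items())

def predict(w, letters, candidates):
    """Highest-scoring candidate output under the current weights."""
    return max(candidates, key=lambda y: score(w, phi(letters, y)))

def perceptron_train(data, iterations=5):
    """data: list of (letters, gold, candidates); the candidates include the gold output."""
    w = {}
    for _ in range(iterations):
        for letters, gold, candidates in data:
            guess = predict(w, letters, candidates)
            if guess != gold:
                # Move toward the correct answer and away from the wrong prediction.
                for f, v in phi(letters, gold).items():
                    w[f] = w.get(f, 0.0) + v
                for f, v in phi(letters, guess).items():
                    w[f] = w.get(f, 0.0) - v
    return w

data = [
    (tuple("king"), ("k", "I", "ŋ", "_"),
     [("k", "aI", "n", "g"), ("k", "I", "ŋ", "_")]),
    (tuple("band"), ("b", "æ", "n", "d"),
     [("b", "eI", "n", "d"), ("b", "æ", "n", "d")]),
]
w = perceptron_train(data)
print([predict(w, letters, cands) == gold for letters, gold, cands in data])
```

The weights change only on mistakes, so once every training word is predicted correctly the remaining iterations leave them untouched.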

Examples

• Separable case

Adapted from Dan Klein’s tutorial slides at NAACL 2007.

Examples

• Non-separable case

Adapted from Dan Klein’s tutorial slides at NAACL 2007.

Issues with Perceptron

• Overtraining: test / held-out accuracy usually rises, then falls.

• Regularization:
– If the data isn’t separable, weights often thrash around.
– Finds a “barely” separating solution.

Taken from Dan Klein’s tutorial slides at NAACL 2007.

Margin Infused Relaxed Algorithm (MIRA) (Crammer and Singer, 2003)

• Use an n-best list to update the weights.

• Separate the correct output from each candidate in the list by a margin at least as large as the loss function,

• while keeping the weight change as small as possible.
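For a single incorrect candidate, the margin constraint has a closed-form solution, sketched below. This is the 1-best simplification, not the n-best update described above, which would solve for several constraints at once. Feature vectors are plain dictionaries and all names and values are hypothetical.

```python
def dot(w, feats):
    """Dot product of the weight vector with a sparse feature vector."""
    return sum(w.get(f, 0.0) * v for f, v in feats.items())

def mira_update(w, feats_gold, feats_guess, loss):
    """Smallest change to w that makes score(gold) - score(guess) at least `loss`."""
    delta = dict(feats_gold)
    for f, v in feats_guess.items():
        delta[f] = delta.get(f, 0.0) - v
    norm_sq = sum(v * v for v in delta.values())
    if norm_sq == 0:
        return w  # identical feature vectors: nothing to separate
    margin = dot(w, delta)
    tau = max(0.0, (loss - margin) / norm_sq)  # step size; zero if the margin already holds
    for f, v in delta.items():
        w[f] = w.get(f, 0.0) + tau * v
    return w

feats_gold = {("i", "I"): 1.0, ("n", "ŋ"): 1.0}
feats_guess = {("i", "aI"): 1.0, ("n", "n"): 1.0}
w = mira_update({}, feats_gold, feats_guess, loss=1.0)
print(dot(w, feats_gold) - dot(w, feats_guess))
```

Unlike the perceptron, the step size depends on how badly the constraint is violated, and a repeated update on an already separated pair changes nothing.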

Loss function in letter-to-phoneme

• Describes the loss of an incorrect prediction compared to the correct one.

• Word error (0/1), phoneme error, or combination.
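The three options can be sketched as follows. The slides do not define the phoneme error or the combination precisely, so this sketch assumes edit distance between phoneme sequences for the former and a plain sum for the latter.

```python
def word_loss(gold, guess):
    """0/1 loss: 1 if the predicted sequence differs from the gold one at all."""
    return 0 if list(gold) == list(guess) else 1

def phoneme_loss(gold, guess):
    """Edit (Levenshtein) distance between the two phoneme sequences."""
    prev = list(range(len(guess) + 1))
    for i, g in enumerate(gold, 1):
        cur = [i]
        for j, h in enumerate(guess, 1):
            cur.append(min(prev[j] + 1,              # delete a gold phoneme
                           cur[j - 1] + 1,           # insert a guessed phoneme
                           prev[j - 1] + (g != h)))  # match or substitute
        prev = cur
    return prev[-1]

def combined_loss(gold, guess):
    """Assumed combination: word loss plus phoneme loss."""
    return word_loss(gold, guess) + phoneme_loss(gold, guess)

gold = ["b", "E", "r", "I", "d"]
guess = ["b", "E", "r", "aI", "d"]
print(word_loss(gold, guess), phoneme_loss(gold, guess), combined_loss(gold, guess))
```

The 0/1 loss treats every wrong word alike, while the phoneme loss grows with the number of wrong phonemes.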

Results

• Incomplete !!!
– MIRA outperforms Perceptron.
– The 0/1 loss and the combination loss are better than the phoneme loss function alone.
– Overall, results show better performance than previous work.

Possible term projects

1. Explore more linguistic features.

2. Explore machine translation systems for letter-to-phoneme conversion.

3. Unsupervised approaches for letter-to-phoneme conversion.

4. Other cool ideas to improve on a partial system
– Data for evaluation are provided.
– Alignments are provided.
– L2P models are provided.

Linguistic features

• Looking for linguistic features to help L2P
– Most systems incorporate letter n-gram features in some way.

• The new features must be obtained using word information only.

• Work already done
– Syllabification: Susan’s thesis
  • Finds syllable breaks between letters using an SVM approach.

Machine translation approach

• L2P problem can be seen as a (simple) machine translation problem.

• That is, we’d like to translate letters to phonemes.
– Consider the mapping from L2P to MT:
  • Letters -> words
  • Words -> sentences
  • Phonemes -> target sentences

• Moses: a baseline SMT system, ACL 2007
– http://www.statmt.org/wmt07/baseline.html
– May need to also look at GIZA++, Pharaoh, Carmel, etc.
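Under this mapping, preparing data for an SMT system mostly means writing each word as a space-separated "sentence" of letters and each pronunciation as a "sentence" of phonemes. The sketch below shows only that formatting step; the function name is hypothetical and no toolkit-specific options are assumed.

```python
def to_parallel_lines(pairs):
    """Turn (word, phonemes) pairs into aligned lists of source and target lines."""
    source = [" ".join(word) for word, _ in pairs]
    target = [" ".join(phonemes) for _, phonemes in pairs]
    return source, target

pairs = [("phoenix", ["f", "i", "n", "I", "k", "s"]),
         ("king", ["k", "I", "ŋ"])]
source, target = to_parallel_lines(pairs)
print(source[0], "|||", target[0])
```

Line i of the source list and line i of the target list form one training pair, which is the usual layout of a parallel corpus.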

Unsupervised approaches

• Assume we have no examples of word-phoneme pairs to train a model.

• We can start from a list of possible letter-phoneme mappings.

• Or assume we have only a small set of example pairs (~100 pairs).

• Don’t expect to outperform the supervised approach, but take advantage of the method being unsupervised.

References

• Michael Collins. 2002. “Discriminative training methods for hidden Markov models: theory and experiments with perceptron algorithms.” In Proceedings of the ACL-02 Conference on Empirical Methods in Natural Language Processing, Volume 10, pages 1-8. Association for Computational Linguistics, Morristown, NJ.

• Koby Crammer and Yoram Singer. 2003. “Ultraconservative online algorithms for multiclass problems.” Journal of Machine Learning Research, 3:951-991.

• Kristina Toutanova and Robert C. Moore. 2001. “Pronunciation modeling for improved spelling correction.” In ACL’02, pages 144-151.

• John Kominek and Alan W Black. 2006. “Learning Pronunciation Dictionaries: Language Complexity and Word Selection Strategies.” In Proceedings of HLT-NAACL 2006, pages 232-239.

• Walter M. P. Daelemans and Antal P. J. van den Bosch. 1997. “Language-independent data-oriented grapheme-to-phoneme conversion.” In Progress in Speech Synthesis, pages 77-89. Springer, New York.

• Alan W Black, Kevin Lenzo, and Vincent Pagel. 1998. “Issues in building general letter to sound rules.” In The Third ESCA Workshop in Speech Synthesis, pages 77-80.

• Robert I. Damper, Yannick Marchand, John D. S. Marsters, and Alexander I. Bazin. 2005. “Aligning text and phonemes for speech technology applications using an EM-like algorithm.” International Journal of Speech Technology, 8(2):147-160.

• Eric Sven Ristad and Peter N. Yianilos. 1998. “Learning string-edit distance.” IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(5):522-532.

• Walter Daelemans, Jakub Zavrel, Ko van der Sloot, and Antal van den Bosch. 2004. “TiMBL: Tilburg Memory Based Learner, version 5.1, reference guide.” ILK Technical Report Series 04-02.
