Automatic Speech Recognition for Dialectal ArabicAutomatic Speech Recognition for Dialectal Arabic...

27
Automatic Speech Recognition for Dialectal Arabic PhD Examination Mohamed Ahmed Elmahdy August 26, 2011 Dialogue Systems Group Institute of Information Technology University of Ulm, Germany

Transcript of Automatic Speech Recognition for Dialectal ArabicAutomatic Speech Recognition for Dialectal Arabic...

Page 1: Automatic Speech Recognition for Dialectal ArabicAutomatic Speech Recognition for Dialectal Arabic PhD Examination Mohamed Ahmed Elmahdy August 26, 2011 Dialogue Systems Group Institute

Automatic Speech Recognition for

Dialectal Arabic

PhD Examination

Mohamed Ahmed ElmahdyAugust 26, 2011

Dialogue Systems Group

Institute of Information Technology

University of Ulm, Germany

Page 2: Automatic Speech Recognition for Dialectal ArabicAutomatic Speech Recognition for Dialectal Arabic PhD Examination Mohamed Ahmed Elmahdy August 26, 2011 Dialogue Systems Group Institute

Page 2

Automatic Speech Recognition for Dialectal Arabic | Mohamed Elmahdy | Germany | Ulm | August 26, 2011

Largest still living Semitic language

250+ million native speakers

Arabic

Formal Dialectal

Modern Standard Arabic (MSA)

Standardized

A lot of ASR research

Not used in everyday life

Used in everyday life

Not standardized (mainly spoken)

Many different dialects

Very few ASR research

Significant differences between MSA and Dialectal Arabic

Considered as completely different languages

Introduction | Previous work | Approaches | Experiments and results | Conclusions and future directions

Arabic Language

Page 3: Automatic Speech Recognition for Dialectal ArabicAutomatic Speech Recognition for Dialectal Arabic PhD Examination Mohamed Ahmed Elmahdy August 26, 2011 Dialogue Systems Group Institute

Page 3

Automatic Speech Recognition for Dialectal Arabic | Mohamed Elmahdy | Germany | Ulm | August 26, 2011

MSA Versus Dialectal Arabic

Egyptian Colloquial Arabic (ECA) has been chosen as a typical Arabic dialect

Phonological

/t/, /s/ in ECA instead of /T/ in MSAe.g. /tala:tah/ (three) in ECA versus /Tala:Tah/ in MSA

Lexical

/t‘ArAbE:zA/ (table) in ECA versus /t‘awila/ in MSA

Syntactic

SVO in ECA versus VSO in MSA

Page 4: Automatic Speech Recognition for Dialectal ArabicAutomatic Speech Recognition for Dialectal Arabic PhD Examination Mohamed Ahmed Elmahdy August 26, 2011 Dialogue Systems Group Institute

Page 4

Automatic Speech Recognition for Dialectal Arabic | Mohamed Elmahdy | Germany | Ulm | August 26, 2011

Automatic Speech Recognition

High level diagram for a state-of-the-art ASR system

FeatureExtraction

Speech

DecoderFeatures

Words

SpeechCorpus

Acoustic Model

LanguageModel

Introduction | Previous work | Approaches | Experiments and results | Conclusions and future directions

For dialectal Arabic, sparse and low quality speech corpora are available

)()|(maxarg^

WPWOPWLW

)|( WOP )(WP

^

W

Training

O

Page 5: Automatic Speech Recognition for Dialectal ArabicAutomatic Speech Recognition for Dialectal Arabic PhD Examination Mohamed Ahmed Elmahdy August 26, 2011 Dialogue Systems Group Institute

Page 5

Automatic Speech Recognition for Dialectal Arabic | Mohamed Elmahdy | Germany | Ulm | August 26, 2011

Thesis Objectives

Acoustic modeling for dialectal Arabic where little speech data exists

To benefit from existing MSA speech data to improve dialectal Arabic speech

recognition

Acoustic modeling for dialectal Arabic where phonetic transcription is not

possible

Introduction | Previous work | Approaches | Experiments and results | Conclusions and future directions

Page 6: Automatic Speech Recognition for Dialectal ArabicAutomatic Speech Recognition for Dialectal Arabic PhD Examination Mohamed Ahmed Elmahdy August 26, 2011 Dialogue Systems Group Institute

Page 6

Automatic Speech Recognition for Dialectal Arabic | Mohamed Elmahdy | Germany | Ulm | August 26, 2011

Outline

Introduction

Previous work

Approaches

Experiments and results

Conclusions and future directions

Introduction | Previous work | Approaches | Experiments and results | Conclusions and future directions

Page 7: Automatic Speech Recognition for Dialectal ArabicAutomatic Speech Recognition for Dialectal Arabic PhD Examination Mohamed Ahmed Elmahdy August 26, 2011 Dialogue Systems Group Institute

Page 7

Automatic Speech Recognition for Dialectal Arabic | Mohamed Elmahdy | Germany | Ulm | August 26, 2011

Previous Work

Data Pooling Acoustic Modeling

A speech data pool of MSA and Egyptian Colloquial Arabic (ECA) is used to train the acoustic model

By adding more MSA data, less contribution of ECA data

Introduction | Previous work | Approaches | Experiments and results | Conclusions and future directions

Data Pool

ECAcorpus

MSAcorpus

Training

Training

Kirchhoff, K., and Vergyri, D. (2005) Cross-Dialectal Data Sharing For Acoustic Modeling in Arabic Speech Recognition. Speech Communication, vol. 46(1), pp. 37-51

~3% Relative reduction in WER

Baselineacousticmodel

Finalacousticmodel

Page 8: Automatic Speech Recognition for Dialectal ArabicAutomatic Speech Recognition for Dialectal Arabic PhD Examination Mohamed Ahmed Elmahdy August 26, 2011 Dialogue Systems Group Institute

Page 8

Automatic Speech Recognition for Dialectal Arabic | Mohamed Elmahdy | Germany | Ulm | August 26, 2011

Outline

Introduction

Previous work

Approaches

Experiments and results

Conclusions and future directions

Introduction | Previous work | Approaches | Experiments and results | Conclusions and future directions

Page 9: Automatic Speech Recognition for Dialectal ArabicAutomatic Speech Recognition for Dialectal Arabic PhD Examination Mohamed Ahmed Elmahdy August 26, 2011 Dialogue Systems Group Institute

Page 9

Automatic Speech Recognition for Dialectal Arabic | Mohamed Elmahdy | Germany | Ulm | August 26, 2011

Proposed Approaches for Dialectal Arabic ASR

Phonemic acoustic modeling

→ Dialectal speech data where phonetic transcription is available

Graphemic acoustic modeling

Unsupervised acoustic modeling

Arabic Chat Alphabet-based acoustic modeling

Introduction | Previous work | Approaches | Experiments and results | Conclusions and future directions

Page 10: Automatic Speech Recognition for Dialectal ArabicAutomatic Speech Recognition for Dialectal Arabic PhD Examination Mohamed Ahmed Elmahdy August 26, 2011 Dialogue Systems Group Institute

Page 10

Automatic Speech Recognition for Dialectal Arabic | Mohamed Elmahdy | Germany | Ulm | August 26, 2011

Phonemic Cross-Lingual Acoustic Modeling

Benefit from existing large MSA speech corpora

Assumptions:

MSA is always a 2nd language for any Arabic speaker

Large amount of MSA speech data (large number of speakers) implicitly cover all the acoustic features of the different Arabic dialects

Approach:

Train an acoustic model using a large amount of MSA speech data

Adaptation of the MSA acoustic models with a little amount of dialectal speech data

Introduction | Previous work | Approaches | Experiments and results | Conclusions and future directions

Page 11: Automatic Speech Recognition for Dialectal ArabicAutomatic Speech Recognition for Dialectal Arabic PhD Examination Mohamed Ahmed Elmahdy August 26, 2011 Dialogue Systems Group Institute

Page 11

Automatic Speech Recognition for Dialectal Arabic | Mohamed Elmahdy | Germany | Ulm | August 26, 2011

Phonemic Cross-Lingual Acoustic Modeling (cont.)

State-of-the-art AM adaptation techniques include:

Maximum Likelihood Linear Regression (MLLR)

Maximum A-Posteriori (MAP)

Requirement: adaptation data and the AM have to share the same

language and phoneme set

Egyptian Colloquial Arabic (ECA) is chosen as a typical dialect

INITIALLY: MSA and ECA do not share the same phoneme inventory

Introduction | Previous work | Approaches | Experiments and results | Conclusions and future directions

MSA ECA

)()|(maxarg

POPMAP

bAMLLR

Acoustic model adaptation is not possible

Page 12: Automatic Speech Recognition for Dialectal ArabicAutomatic Speech Recognition for Dialectal Arabic PhD Examination Mohamed Ahmed Elmahdy August 26, 2011 Dialogue Systems Group Institute

Page 12

Automatic Speech Recognition for Dialectal Arabic | Mohamed Elmahdy | Germany | Ulm | August 26, 2011

Phonemic Cross-Lingual Acoustic Modeling (cont.)

SOLUTION: Phoneme sets normalization

AM adaptation is possible

Phoneme sets normalization

Several phone mapping rules are applied

Map ECA phonemes to their origins in MSA

(even if they are acoustically different)

Introduction | Previous work | Approaches | Experiments and results | Conclusions and future directions

MSA

MSA

ECA

ECA

Normalizationphone

mapping rulesECA

MSA

/b/ /g/ /j/ /e/ /i/ /o/ /u/ /t/ …….

/b/ /dZ/ /i/ /u/ /t/ ………

جزر

(carrot) /g/ /A/ /z/ /A/ /r/ /dZ/ /a/ /z/ /a/ /r/

Page 13: Automatic Speech Recognition for Dialectal ArabicAutomatic Speech Recognition for Dialectal Arabic PhD Examination Mohamed Ahmed Elmahdy August 26, 2011 Dialogue Systems Group Institute

Page 13

Automatic Speech Recognition for Dialectal Arabic | Mohamed Elmahdy | Germany | Ulm | August 26, 2011

Phonemic Cross-Lingual Acoustic Modeling (cont.)

Block diagram for the proposed approach

The adapted ECA AM is evaluated against the ECA baseline AM

Introduction | Previous work | Approaches | Experiments and results | Conclusions and future directions

ECA corpus

MSAcorpus

PhonemesNormalization

PhonemesNormalization

NormalizedMSA

corpus

NormalizedECA

corpus

Training MSAacousticmodel

ECA baselinemodel

Training

MLLR adaptation

MAP adaptation

ECA final

model

Page 14: Automatic Speech Recognition for Dialectal ArabicAutomatic Speech Recognition for Dialectal Arabic PhD Examination Mohamed Ahmed Elmahdy August 26, 2011 Dialogue Systems Group Institute

Page 14

Automatic Speech Recognition for Dialectal Arabic | Mohamed Elmahdy | Germany | Ulm | August 26, 2011

Proposed Approaches for Dialectal Arabic ASR

Introduction | Previous work | Approaches | Experiments and results | Conclusions and future directions

Phonemic acoustic modeling

→ Dialectal speech data where phonetic transcription is available

Graphemic acoustic modeling

→ Phonetic transcription is not possible/difficult

→ Short vowels are missing

→ Phonetic transcription is approximated to be word letters

Unsupervised acoustic modeling

→ Transcriptions are not available at all

→ Dialectal speech was automatically transcribed using a MSA model

Arabic Chat Alphabet-based acoustic modeling

→ Latin letters are used instead of Arabic ones

→ Include short vowels that are missing in traditional Arabic orthography

Page 15: Automatic Speech Recognition for Dialectal ArabicAutomatic Speech Recognition for Dialectal Arabic PhD Examination Mohamed Ahmed Elmahdy August 26, 2011 Dialogue Systems Group Institute

Page 15

Automatic Speech Recognition for Dialectal Arabic | Mohamed Elmahdy | Germany | Ulm | August 26, 2011

Outline

Introduction

Previous work

Approaches

Experiments and results

Conclusions and future directions

Introduction | Previous work | Approaches | Experiments and results | Conclusions and future directions

Page 16: Automatic Speech Recognition for Dialectal ArabicAutomatic Speech Recognition for Dialectal Arabic PhD Examination Mohamed Ahmed Elmahdy August 26, 2011 Dialogue Systems Group Institute

Page 16

Automatic Speech Recognition for Dialectal Arabic | Mohamed Elmahdy | Germany | Ulm | August 26, 2011

Speech Corpora

Egyptian Colloquial Arabic (ECA) corpus

→ High quality read speech

→ Accurate phonetic transcription

→ Different speech domains to ensure

good acoustic features coverage:

time/date, spelling, restaurant,

numbers, reservation, etc

Levantine Colloquial Arabic (LCA) corpus

→ Spontaneous speech

→ No phonetic transcription

Introduction | Previous work | Approaches | Experiments and results | Conclusions and future directions

Modern Standard Arabic (MSA) corpus

→ News broadcast speech

→ Accurate phonetic transcription

Page 17: Automatic Speech Recognition for Dialectal ArabicAutomatic Speech Recognition for Dialectal Arabic PhD Examination Mohamed Ahmed Elmahdy August 26, 2011 Dialogue Systems Group Institute

Page 17

Automatic Speech Recognition for Dialectal Arabic | Mohamed Elmahdy | Germany | Ulm | August 26, 2011

Phonemic Cross-Lingual Adaptation Results

ECA corpus:

→ 65% for training/adaptation

→ 35% for testing

Introduction | Previous work | Approaches | Experiments and results | Conclusions and future directions

Word Error Rate (WER)

N

DelInsSubWER

0.00

5.00

10.00

15.00

20.00

25.00

30.00

ECA Phonemic AM

ECA baseline

MSA only

MSA+ECA data pooling

MSA+ECA adaptationWE

R (

%)

41.8% Relative reduction in WER

Page 18: Automatic Speech Recognition for Dialectal ArabicAutomatic Speech Recognition for Dialectal Arabic PhD Examination Mohamed Ahmed Elmahdy August 26, 2011 Dialogue Systems Group Institute

Page 18

Automatic Speech Recognition for Dialectal Arabic | Mohamed Elmahdy | Germany | Ulm | August 26, 2011

Effect of MSA Speech Data Amount

Varying the amount of MSA speech data

Effect on phonemic cross-lingual adaptation

0

2

4

6

8

10

12

14

16

18

0.5 1 2 4 8 16 32

MSA+ECA adaptation

MSA speech amount (hours)

WE

R (

%)

Introduction | Previous work | Approaches | Experiments and results | Conclusions and future directions

Consistent decrease in WER

Page 19: Automatic Speech Recognition for Dialectal ArabicAutomatic Speech Recognition for Dialectal Arabic PhD Examination Mohamed Ahmed Elmahdy August 26, 2011 Dialogue Systems Group Institute

Page 19

Automatic Speech Recognition for Dialectal Arabic | Mohamed Elmahdy | Germany | Ulm | August 26, 2011

Graphemic Cross-Lingual Adaptation Results

Tested on two Arabic dialects:

→ Egyptian Colloquial Arabic (ECA)

→ Levantine Colloquial Arabic (LCA)

Introduction | Previous work | Approaches | Experiments and results | Conclusions and future directions

0.00

10.00

20.00

30.00

40.00

50.00

60.00

70.00

ECA graphemic AM LCA graphemic AM

Dialect baseline

MSA only

MSA+dialect adaptation

WE

R (

%)

Relative reduction in WER

22.5% for ECA

15.9% for LCA

Page 20: Automatic Speech Recognition for Dialectal ArabicAutomatic Speech Recognition for Dialectal Arabic PhD Examination Mohamed Ahmed Elmahdy August 26, 2011 Dialogue Systems Group Institute

Page 20

Automatic Speech Recognition for Dialectal Arabic | Mohamed Elmahdy | Germany | Ulm | August 26, 2011

Outline

Introduction

Previous work

Approaches

Experiments and results

Conclusions and future directions

Introduction | Previous work | Approaches | Experiments and results | Conclusions and future directions

Page 21: Automatic Speech Recognition for Dialectal ArabicAutomatic Speech Recognition for Dialectal Arabic PhD Examination Mohamed Ahmed Elmahdy August 26, 2011 Dialogue Systems Group Institute

Page 21

Automatic Speech Recognition for Dialectal Arabic | Mohamed Elmahdy | Germany | Ulm | August 26, 2011

Conclusions and Future Directions

Conclusions

→ Supervised/unsupervised cross-lingual acoustic modeling for dialectal

Arabic

→ Improvements are observed in both phonemic and graphemic modeling

→ Consistent reduction in WER by adding more MSA data

→ The approach can be extended to other dialects e.g. Levantine

→ Novel technique using the Arabic Chat Alphabet for phonetic

transcription

Future directions

→ Extension to all the Arabic dialects

→ Cross-lingual language modeling for dialectal Arabic using existing

MSA text resources

→ Automatic dialect recognition

Introduction | Previous work | Approaches | Experiments and results | Conclusions and future directions

Page 22: Automatic Speech Recognition for Dialectal ArabicAutomatic Speech Recognition for Dialectal Arabic PhD Examination Mohamed Ahmed Elmahdy August 26, 2011 Dialogue Systems Group Institute

Page 22

Automatic Speech Recognition for Dialectal Arabic | Mohamed Elmahdy | Germany | Ulm | August 26, 2011

Selected Publications

1. M. Elmahdy, R. Gruhn, S. Abdennadher, and W. Minker. Rapid Phonetic Transcription

using Everyday Life Natural Chat Alphabet Orthography for Dialectal Arabic Speech

Recognition. IEEE International Conference on Acoustics, Speech, and Signal Processing

(ICASSP), Prague, Czech republic, 2011

2. M. Elmahdy, R. Gruhn, W. Minker, and S. Abdennadher. Cross-Lingual Acoustic Modeling

for Dialectal Arabic Speech Recognition. International Conference on Speech and

Language Processing (Interspeech), Makuhari, Japan, 2010

3. M. Elmahdy, R. Gruhn, W. Minker, and S. Abdennadher. Effect of Gaussian Densities and

Amount of Training Data on Grapheme-Based Acoustic Modeling for Arabic. IEEE

International Conference on Natural Language Processing and Knowledge Engineering

(IEEE NLP-KE), Dalian, China, 2009

4. M. Elmahdy, R. Gruhn, W. Minker, and S. Abdennadher. Modern Standard Arabic Based

Multilingual Approach for Dialectal Arabic Speech Recognition. International Symposium on

Natural Language Processing (SNLP), Bankok, Thailand, 2009

5. M. Elmahdy, R. Gruhn, W. Minker, and S. Abdennadher. Survey on common Arabic

language forms from a speech recognition point of view. International Conference on

Acoustics (NAG-DAGA), Netherlands, 2009

Introduction | Previous work | Approaches | Experiments and results | Conclusions and future directions

Page 23: Automatic Speech Recognition for Dialectal ArabicAutomatic Speech Recognition for Dialectal Arabic PhD Examination Mohamed Ahmed Elmahdy August 26, 2011 Dialogue Systems Group Institute

Page 23

Automatic Speech Recognition for Dialectal Arabic | Mohamed Elmahdy | Germany | Ulm | August 26, 2011

Conclusions and Future Directions

Conclusions

→ Supervised/unsupervised cross-lingual acoustic modeling for dialectal

Arabic

→ Improvements are observed in both phonemic and graphemic modeling

→ Consistent reduction in WER by adding more MSA data

→ The approach can be extended to other dialects e.g. Levantine

→ Novel technique using the Arabic Chat Alphabet for phonetic

transcription

Future directions

→ Extension to all the Arabic dialects

→ Cross-lingual language modeling for dialectal Arabic using existing

MSA text resources

→ Automatic dialect recognition

Thank you for your attention

Introduction | Previous work | Approaches | Experiments and results | Conclusions and future directions

Page 24: Automatic Speech Recognition for Dialectal ArabicAutomatic Speech Recognition for Dialectal Arabic PhD Examination Mohamed Ahmed Elmahdy August 26, 2011 Dialogue Systems Group Institute

Page 24

Automatic Speech Recognition for Dialectal Arabic | Mohamed Elmahdy | Germany | Ulm | August 26, 2011

Arabic Dialects Classification

Backup slides

Page 25: Automatic Speech Recognition for Dialectal ArabicAutomatic Speech Recognition for Dialectal Arabic PhD Examination Mohamed Ahmed Elmahdy August 26, 2011 Dialogue Systems Group Institute

Page 25

Automatic Speech Recognition for Dialectal Arabic | Mohamed Elmahdy | Germany | Ulm | August 26, 2011

Phonetic Transcription in English

Backup slides

Page 26: Automatic Speech Recognition for Dialectal ArabicAutomatic Speech Recognition for Dialectal Arabic PhD Examination Mohamed Ahmed Elmahdy August 26, 2011 Dialogue Systems Group Institute

Page 26

Automatic Speech Recognition for Dialectal Arabic | Mohamed Elmahdy | Germany | Ulm | August 26, 2011

Phonetic Transcription in Arabic

Diacritics (short vowels) are not written (ambiguity)

High OOV rate due to morphological complexity

Backup slides

Page 27: Automatic Speech Recognition for Dialectal ArabicAutomatic Speech Recognition for Dialectal Arabic PhD Examination Mohamed Ahmed Elmahdy August 26, 2011 Dialogue Systems Group Institute

Page 27

Automatic Speech Recognition for Dialectal Arabic | Mohamed Elmahdy | Germany | Ulm | August 26, 2011

Unsupervised Acoustic Modeling

Backup slides