
Using the HTK speech recogniser to analyse prosody in a corpus of German spoken learners’ English

Toshifumi Oba, Eric Atwell

University of Leeds,

School of Computing

[email protected]

[email protected]


Outline

• Introduction
  - Intonation and Speech Recognition
  - Tendency of Speech Recognition Research
  - ISLE Speech Corpus
  - HTK Hidden Markov Model Toolkit
• Prosodic Annotation
• Human Evaluation of Intonation Abilities
• Grouping of German Speakers by Intonation Ability
• HTK speech recognition experiments
• Conclusions
• Q & A


Intonation and Speech Recognition

• Intonation is important in human communication.
  - It conveys the meaning and attitude of the speaker.

• Intonation is important for speech recognition.
  - Acoustic models (duration, F0, intensity)
  - Language models (identifying the dialogue type)


Tendency of Speech Recognition Research

• Intonation << Pronunciation
• Non-native speaker << Native speaker

→ Speech recognition research on non-native speakers’ intonation is unique.

Also,
• Intonation receives less attention than pronunciation in CALL.


Features of Various Speech Recognition Research

Research Reference        Non-native  German  Intonation  G/P  HTK
(Taylor, 1998)            N           N       Y           N    Y
(Uebler, 1998)            Y           N       N           N    N
(Stemmer, 2001)           Y           Y       N           N    N
(Teixeira, 1996)          Y           Y       N           N    Y
(Hansen, 1995)            Y           Y       Y           Y    N
(Yan and Vaseghi, 2002)   N           N       Y           N    Y
(Jurafsky et al, 1994)    Y           Y       N           N    N
(Berkling et al, 1998)    Y           N       N           N    Y
(Oba and Atwell, 2003)    Y           Y       Y           Y    Y


Objectives

• Analysis of non-native speakers’ English intonation.
  - Is HTK able to distinguish ‘good’ from ‘poor’ intonation?
  - Is it possible to train distinct models for different intonation ability groups?

• Prosodic annotation of written English text to produce ‘model’ intonation patterns.

• Human evaluation to group German speakers by English intonation ability.


ISLE Speech Corpus (1)

• Re-use of the speech corpus collected in the ISLE (Interactive Spoken Language Education) project.
  Project partners: Leeds University, Universität Hamburg, Università di Milano-Bicocca, Entropic Ltd., Ernst Klett Verlag GmbH, and Dida*El S.R.L.

• Time-aligned audio recordings of English spoken by 23 German and 23 Italian learners, plus 2 native English speakers.


ISLE Speech Corpus (2)

• Speaker adaptation
  - 82 sentences edited from ‘The Ascent of Everest’
    e.g. ‘It is in fact a story of many years, in which men tried to climb that mountain.’

• Typical EFL exercises
  - Minimal pairs and polysyllabic words
    e.g. ‘I said bad not bed.’ ‘He's a photographer.’


ISLE Speech Corpus (3)

• Annotated corpus
  - Pronunciation errors at word and phone levels
  - Stress errors at word level

In our research, prosodic annotation was added to a written transcription of the speech corpus.


HTK Hidden Markov Model Toolkit

• Developed at Cambridge University Engineering Department (CUED).

• Free toolkit for building Hidden Markov Models (HMMs).

• Modules can be called from the command line or from a script file.

• Used in speech recognition research and other pattern recognition research,
  e.g. handwriting recognition and facial recognition.


Prosodic Annotation

• Purpose: Predict ‘model’ intonation patterns to be compared against German learners’ spoken English.

• Instructions: ‘From text structure to prosodic structure’ (Knowles, 1996)

• Environment: Microsoft Excel on Windows

• Amount: First 27 sentences from ‘The Ascent of Everest’


Result of Prosodic Annotation (1)

• 27 sentences, consisting of 429 words, were divided into 84 tone groups (prosodic ‘phrases’).

→ 1 ‘low rise’, 3 ‘high rise’, 52 ‘fall-rise’ and 28 ‘fall’ patterns.

• The first 10 sentences were modified according to the native speakers’ recordings.

→ 15 ‘fall-rise’ and 10 ‘fall’ patterns; 1 ‘low rise’, 2 ‘high rise’ and 4 ‘fall-rise’ were deleted.


Result of Prosodic Annotation (2)

(A_01) This is the story <HR> of how two men <FR> reached the top of Everest <FR> on the twenty-ninth of May nineteen fifty-three <FR> and came back safely <HR> to their friends below <F>.

(A_02) Yet this will not be the whole story <F>.

(A_03) The ascent of Everest <FR> was not the work of one day <FR>, nor even of those few unforgettable weeks <FR> in which we prepared and climbed that summer <F>.
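
The tagged sentences above can be processed mechanically. The following is a minimal sketch, not the authors' tooling (the annotation itself was done in Excel): it assumes the tags expand as HR = high rise, FR = fall-rise and F = fall, and it includes a hypothetical `<LR>` tag for low rise, which does not appear in the three sentences shown. The function names are illustrative.

```python
import re
from collections import Counter

# Assumed tag expansions; <LR> is a guess and is absent from the slide's examples.
TONE_NAMES = {"HR": "high rise", "LR": "low rise", "FR": "fall-rise", "F": "fall"}
TONE_TAG = re.compile(r"<(HR|LR|FR|F)>")

SENTENCES = {
    "A_01": "This is the story <HR> of how two men <FR> reached the top of Everest <FR> "
            "on the twenty-ninth of May nineteen fifty-three <FR> and came back safely <HR> "
            "to their friends below <F>.",
    "A_02": "Yet this will not be the whole story <F>.",
    "A_03": "The ascent of Everest <FR> was not the work of one day <FR>, nor even of those "
            "few unforgettable weeks <FR> in which we prepared and climbed that summer <F>.",
}


def tone_sequence(annotated):
    """Return the tone tags of a sentence in order, one per tone group."""
    return TONE_TAG.findall(annotated)


def tone_groups(annotated):
    """Split a sentence into (phrase, tone) pairs, one pair per tone group."""
    # re.split with a single capturing group yields: text, tag, text, tag, ..., tail
    parts = TONE_TAG.split(annotated)
    return [(text.strip(" ,."), tag) for text, tag in zip(parts[0::2], parts[1::2])]


# Tally tone types over all three example sentences.
tone_counts = Counter(tag for s in SENTENCES.values() for tag in tone_sequence(s))

for phrase, tag in tone_groups(SENTENCES["A_03"]):
    print(f"{TONE_NAMES[tag]:>10}: {phrase}")
```

Each tag closes one tone group, so the length of `tone_sequence` gives the number of tone groups in a sentence.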


Human Evaluation of German Spoken Learners’ English Intonation Abilities

• Purpose: Group German speakers into ‘good’ and ‘poor’ intonation groups.

• Evaluator I: Computational linguistics researcher
• Evaluator II: English language teaching researcher

• Quantity: First 10 utterances from each speaker.
  - If all the tone types of an utterance matched the model pattern, the utterance was judged correct; otherwise incorrect.
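
The all-or-nothing judgement rule can be written down directly. This is a small sketch under assumptions: tone patterns are represented as lists of tag strings such as `"FR"` and `"F"`, an utterance with a different number of tone groups from the model counts as incorrect, and the learner data is invented for illustration.

```python
def utterance_is_correct(observed_tones, model_tones):
    """An utterance is correct only if every tone type matches the model pattern."""
    # Assumption: a different number of tone groups also counts as a mismatch.
    return list(observed_tones) == list(model_tones)


def count_errors(observed_utterances, model_utterances):
    """Number of utterances judged incorrect for one speaker."""
    return sum(
        not utterance_is_correct(observed, model)
        for observed, model in zip(observed_utterances, model_utterances)
    )


# Hypothetical data: model patterns for two utterances and one learner's tones.
model_patterns = [["FR", "FR", "F"], ["F"]]
learner_tones = [["FR", "F", "F"], ["F"]]  # first utterance has one wrong tone

errors = count_errors(learner_tones, model_patterns)
print(f"{errors} of {len(model_patterns)} utterances judged incorrect")
```

A single mismatched tone makes the whole utterance incorrect, which is why the rule is strict at the utterance level.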


Grouping of 23 German Speakers

Grouping I: based on Evaluator I (Computational linguistics researcher)

Grouping II: based on Evaluator II (English language teaching (ELT) researcher)

Grouping III: agreement of Evaluator I and II.

23 speakers
→ 3 exceptionally poor pronunciation speakers (excluded)
→ 8 ‘good’, 4 ‘intermediate’ and 8 ‘poor’ intonation speakers


Result of Human Evaluation and Grouping

• The two evaluators agreed on about 63% of utterances (144 out of 230).

• Evaluator II marked 109 errors, while Evaluator I marked 78 errors.

However,
• 7 ‘poor’ and 5 ‘good’ speakers were the same in Grouping I and Grouping II.

→ 2 speakers were added to the ‘good’ intonation group in Grouping III.
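
The agreement figure is simple percent agreement over utterance-level judgements. The sketch below recomputes it from the numbers on the slide (23 speakers with 10 utterances each gives 230 judgements, of which 144 agreed). The helper `percent_agreement` is illustrative and assumes each evaluator's judgements are stored as a list of booleans in the same utterance order.

```python
def percent_agreement(judgements_a, judgements_b):
    """Percentage of items on which two evaluators gave the same judgement."""
    if len(judgements_a) != len(judgements_b):
        raise ValueError("both evaluators must judge the same utterances")
    agreed = sum(a == b for a, b in zip(judgements_a, judgements_b))
    return 100.0 * agreed / len(judgements_a)


# Figures taken from the slide.
total_utterances = 23 * 10
agreed_utterances = 144
reported_agreement = 100.0 * agreed_utterances / total_utterances

print(f"{reported_agreement:.1f}% agreement over {total_utterances} utterances")
```

Percent agreement does not correct for chance agreement, so it is best read together with the grouping-level overlap reported on the slide.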


Conditions of HTK Speech Recognition Experiments

• Monophone and triphone HMMs were trained.

• No language models were used.

• A Perl script and a configuration file were used for module calls.

• Number of training speakers: 6 speakers from the same intonation group.

• Number of test speakers: 2 (1 for Grouping III) speakers from each group.
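
To illustrate what scripted module calls look like, here is a hedged sketch in Python rather than the Perl used in the experiments. It only builds the argument list for one `HERest` re-estimation pass, following the command layout of the HTK Book tutorial; every file and directory name is hypothetical, and the pruning thresholds are the tutorial's values, not ones reported on these slides.

```python
import shlex


def herest_command(config, label_mlf, train_scp, src_dir, dst_dir, hmm_list):
    """Build the argument list for one HERest embedded re-estimation pass."""
    return [
        "HERest",
        "-C", config,                     # configuration file
        "-I", label_mlf,                  # master label file with transcriptions
        "-t", "250.0", "150.0", "1000.0", # pruning thresholds from the HTK Book tutorial
        "-S", train_scp,                  # script file listing the training utterances
        "-H", f"{src_dir}/macros",
        "-H", f"{src_dir}/hmmdefs",
        "-M", dst_dir,                    # output directory for the updated models
        hmm_list,
    ]


# One model set per intonation group; all paths below are made up for illustration.
commands = {
    group: herest_command(
        config="config",
        label_mlf="phones.mlf",
        train_scp=f"train_{group}.scp",
        src_dir=f"hmm0_{group}",
        dst_dir=f"hmm1_{group}",
        hmm_list="monophones",
    )
    for group in ("good", "poor")
}

for group, command in commands.items():
    print(f"{group}: {shlex.join(command)}")
```

With HTK installed, each list could be passed to `subprocess.run(command, check=True)`. Keeping the training list (`-S`) as the only per-group difference mirrors the idea of training separate models for each intonation group.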


Results of HTK experiments

• Recognition accuracy was generally higher when the test and training speakers’ intonation abilities were the same.

• Improvement was higher with triphone HMMs.

• Improvement was most significant in Experiment II.

• One ‘poor’ intonation speaker showed negative improvement in all three experiments. Another ‘poor’ speaker also showed negative improvement in Experiment I.


Average Recognition Accuracies of Good Intonation Speakers

(Parentheses show results against monophone HMMs)

                  Trained Models
                  Good                 Poor                 Improvement
Experiment I      50.26 % (33.31 %)    40.13 % (20.11 %)    10.13 % (13.20 %)
Experiment II     54.62 % (34.84 %)    35.41 % (19.41 %)    19.20 % (15.43 %)
Experiment III    52.24 % (34.50 %)    35.13 % (18.09 %)    17.11 % (16.41 %)


Average Recognition Accuracies of Poor Intonation Speakers

(Parentheses show results against monophone HMMs)

                  Trained Models
                  Good                 Poor                 Improvement
Experiment I      38.25 % (22.88 %)    45.01 % (20.03 %)    6.76 % (-2.85 %)
Experiment II     30.84 % (34.84 %)    46.34 % (19.41 %)    15.50 % (1.67 %)
Experiment III    31.25 % (20.60 %)    44.40 % (20.12 %)    13.14 % (-0.48 %)
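
The Improvement column in both tables appears to be the accuracy obtained with models trained on the test speakers' own intonation group minus the accuracy obtained with the other group's models. The sketch below recomputes it from the triphone figures only. The dictionary layout and function name are illustrative, and the recomputed values agree with the slide to within 0.01, which is consistent with the slide values having been derived from unrounded accuracies.

```python
# Triphone accuracies (%) from the two tables: (good-trained models, poor-trained models).
TRIPHONE_ACCURACY = {
    "good": {"I": (50.26, 40.13), "II": (54.62, 35.41), "III": (52.24, 35.13)},
    "poor": {"I": (38.25, 45.01), "II": (30.84, 46.34), "III": (31.25, 44.40)},
}

# Improvement figures as printed on the slides, for comparison.
REPORTED_IMPROVEMENT = {
    "good": {"I": 10.13, "II": 19.20, "III": 17.11},
    "poor": {"I": 6.76, "II": 15.50, "III": 13.14},
}


def improvement(test_group, acc_good_models, acc_poor_models):
    """Accuracy with matched models minus accuracy with mismatched models."""
    if test_group == "good":
        matched, mismatched = acc_good_models, acc_poor_models
    else:
        matched, mismatched = acc_poor_models, acc_good_models
    return round(matched - mismatched, 2)


for group, experiments in TRIPHONE_ACCURACY.items():
    for name, (acc_good, acc_poor) in experiments.items():
        value = improvement(group, acc_good, acc_poor)
        print(f"{group} speakers, Experiment {name}: {value:.2f} "
              f"(slide: {REPORTED_IMPROVEMENT[group][name]:.2f})")
```

A positive value means the matched models recognised the test speakers better, which is the pattern the triphone results show for both groups.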


Prosodic Keywords

• Tone type is decided by the last accented syllable. (Knowles, 1996)

→ We called the word containing the last accented syllable of each tone group the ‘prosodic keyword’.

→ Recognition accuracy among ‘prosodic keywords’ was counted for the triphone cases of Experiment II.

• Improvement of recognition accuracy among prosodic keywords was higher than the overall improvement.
  - Good test speakers: 26.00% (overall 19.20%)
  - Poor test speakers: 24.50% (overall 15.50%)
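
Restricting the score to prosodic keywords amounts to filtering the word-level results before counting. The following is a simplified sketch with invented data: it measures the fraction of words recognised correctly, whereas HTK's accuracy figure also penalises insertions, and the keyword marking here is hypothetical because the slides do not list the keywords.

```python
from dataclasses import dataclass


@dataclass
class WordResult:
    word: str
    recognised: bool   # True if the recogniser output this word correctly
    is_keyword: bool   # True if the word holds the last accented syllable of its tone group


def percent_correct(results):
    """Percentage of words recognised correctly."""
    results = list(results)
    if not results:
        raise ValueError("no words to score")
    return 100.0 * sum(r.recognised for r in results) / len(results)


def keyword_percent_correct(results):
    """Same measure, restricted to prosodic keywords."""
    return percent_correct(r for r in results if r.is_keyword)


# Toy example: recognition flags and keyword marks are made up.
toy_results = [
    WordResult("yet", False, False),
    WordResult("this", True, False),
    WordResult("whole", False, False),
    WordResult("story", True, True),
]

print(f"overall:  {percent_correct(toy_results):.1f}%")
print(f"keywords: {keyword_percent_correct(toy_results):.1f}%")
```

Computing both figures once per trained model set and subtracting would give keyword-level and overall improvements comparable to those on the slide.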


Irrelevance of Pronunciation Abilities

• Good intonation speakers tended to have slightly better pronunciation ability than poor intonation speakers, although the 3 exceptionally poor pronunciation speakers had been excluded.

→ Additional experiments were run taking the 2 ‘best’ and 2 ‘worst’ pronunciation speakers from the poor and good intonation groups, respectively.

→ Similar improvement was observed in this experiment too.


Conclusions

• Matching of test and training speakers’ intonation abilities brought about higher recognition accuracy.

• HTK was able to distinguish ‘good’ and ‘poor’ intonation.

• Confirmed that German speakers’ main weakness in English intonation was generally the ‘fall-rise’ pattern.

• Human evaluation was successful enough to group speakers by intonation ability.


Future Work

• Expand the tone types (beyond the ‘fall-rise’ and ‘fall’ patterns).

• Apply the approach to other languages and to different native-speaker groups.

• Use of results in practical language-teaching systems.