Advances in WP2

Advances in WP2 Chania Meeting – May 2007 www.loquendo.com


Page 1: Advances in WP2

Advances in WP2

Chania Meeting – May 2007

www.loquendo.com

Page 2: Advances in WP2


Summary

• Unsupervised Adaptation

• Adaptation on Hiwire DB

Page 3: Advances in WP2

Supervised vs Unsupervised Adaptation

Page 4: Advances in WP2

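Supervised Adaptation

[Diagram: the speech parameters of the adaptation set and their transcriptions go through ASR forced segmentation; the resulting forced segmentations, together with the general (Gen.) models, feed the Adaptation Module, which produces the adapted models.]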

Page 5: Advances in WP2

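Unsupervised Adaptation

[Diagram: the speech parameters are first recognized by the ASR; confidence-based selection of the recognized utterances defines the adaptation set; the ASR outputs serve as transcriptions for ASR forced segmentation; the resulting segmentations, together with the general (Gen.) models, feed the Adaptation Module, which produces the adapted models.]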

Page 6: Advances in WP2

Adaptation on HIWIRE DB


Page 7: Advances in WP2


Kinds of Adaptation

Two kinds of adaptation were performed:

• Multi-Condition: the adaptation data of all the speakers and all noise conditions are pooled. The models are adapted to the channel, the noise conditions, and the aspects common to non-native speakers.

• Speaker-Dependent: adaptation and tests are performed for each speaker separately, and the results are then averaged over all speakers. The models are adapted mainly to the speaker's voice, but also to the channel and noise conditions.
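
The difference between the two set-ups is essentially in how the adaptation data are grouped and how the results are reported. The following is a minimal sketch of that grouping only; the record fields (`speaker`, `noise`, `file`), the function names, and all values are hypothetical and are not taken from the slides.

```python
from collections import defaultdict


def pool_multi_condition(utterances):
    """Multi-condition: one adaptation set with every speaker and noise condition."""
    return list(utterances)


def split_by_speaker(utterances):
    """Speaker-dependent: one adaptation set per speaker."""
    per_speaker = defaultdict(list)
    for utt in utterances:
        per_speaker[utt["speaker"]].append(utt)
    return dict(per_speaker)


def average_over_speakers(per_speaker_accuracy):
    """Final speaker-dependent score: mean of the per-speaker results."""
    return sum(per_speaker_accuracy.values()) / len(per_speaker_accuracy)


# Made-up utterance records, for illustration only.
utterances = [
    {"speaker": "spk1", "noise": "Clean", "file": "a.wav"},
    {"speaker": "spk1", "noise": "HN", "file": "b.wav"},
    {"speaker": "spk2", "noise": "LN", "file": "c.wav"},
]

pooled = pool_multi_condition(utterances)
by_speaker = split_by_speaker(utterances)
print(len(pooled), sorted(by_speaker))
# Made-up per-speaker accuracies, only to show the final averaging step.
print(average_over_speakers({"spk1": 80.0, "spk2": 70.0}))
```

In the multi-condition case a single model is adapted on `pooled`; in the speaker-dependent case one model would be adapted and tested per entry of `by_speaker`.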

Page 8: Advances in WP2


Adaptation Types

Two types of adaptation were tested:

• Supervised: the transcriptions of the sentences available in the Hiwire DB (HDB) are used to perform forced segmentation of the adaptation utterances, providing the labels needed by the adaptation process, which is intrinsically supervised.

• Unsupervised: the transcriptions of the sentences are not used, to simulate adaptation in the field, and are approximated by the ASR outputs. Only the adaptation utterances recognized with a certain degree of confidence are used in the adaptation process, to avoid divergence due to incorrectly labeled data.
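
The confidence-based selection used in the unsupervised case can be sketched as a simple filter over the recognition results. This is only an illustration: the slides do not specify the confidence measure or the threshold, so the `Hypothesis` type, the score range [0, 1], the default threshold of 0.8, and the example sentences are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    utterance_id: str   # hypothetical identifier of an adaptation utterance
    text: str           # ASR output, used in place of the true transcription
    confidence: float   # ASR confidence score, assumed to lie in [0, 1]


def select_adaptation_set(hypotheses, threshold=0.8):
    """Keep only the utterances whose ASR confidence reaches the threshold."""
    return [h for h in hypotheses if h.confidence >= threshold]


# Made-up recognition results, for illustration only.
recognized = [
    Hypothesis("utt1", "select range ten", 0.93),
    Hypothesis("utt2", "heading two five zero", 0.41),
    Hypothesis("utt3", "climb flight level eight", 0.85),
]

adaptation_set = select_adaptation_set(recognized)
print([h.utterance_id for h in adaptation_set])
```

Only the selected utterances, with their ASR outputs as labels, would then be passed on to forced segmentation and adaptation. A higher threshold trades adaptation data for cleaner labels.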

Page 9: Advances in WP2

Multi-Condition Adaptation

Word accuracy (%) by noise condition; E.R. = error reduction (%)

Adaptation  Adaptation  Denoising
method      type        method     Clean  LN    MN    HN    AVG   E.R. %
No          -           No         90.5   49.1  27.5   5.0  43.0  -
LHN cons    Supv        No         97.5   81.1  59.2  13.4  62.8  34.7
LHN spec    Supv        No         98.2   90.9  79.6  34.8  75.9  57.7
No          -           EM         90.2   71.9  55.0  16.6  58.4  27.0
LHN cons    Supv        EM         90.6   97.1  79.3  31.1  74.5  55.3
LHN spec    Supv        EM         98.0   93.2  83.7  35.5  77.6  60.7
LHN cons    Unsupv      EM         94.3   87.2  76.8  31.5  72.5  51.7
LHN spec    Unsupv      EM         93.7   85.5  73.7  27.1  70.0  47.4

• Adaptation is done with all the speakers and noise conditions together

• It adapts to channel, noise conditions, and non-native common aspects
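
The AVG and E.R. columns of the table can be reproduced from the per-condition figures. The sketch below assumes, since the slide does not state it, that AVG is the unweighted mean over the four noise conditions and that E.R. is the relative reduction of the average word error rate with respect to the row with no adaptation and no denoising. Under these assumptions the results agree with the table to within about 0.1 (the slide presumably used unrounded figures). The function names are illustrative.

```python
def average_accuracy(accuracies):
    """Unweighted mean of per-condition word accuracies (in %)."""
    return sum(accuracies) / len(accuracies)


def error_reduction(avg_accuracy, baseline_avg_accuracy):
    """Relative reduction (in %) of the average word error rate vs. the baseline."""
    baseline_error = 100.0 - baseline_avg_accuracy
    error = 100.0 - avg_accuracy
    return 100.0 * (baseline_error - error) / baseline_error


# Word accuracy % for Clean, LN, MN, HN, copied from the table.
baseline = [90.5, 49.1, 27.5, 5.0]           # no adaptation, no denoising
lhn_spec_supv_em = [98.0, 93.2, 83.7, 35.5]  # LHN spec, supervised, EM denoising

baseline_avg = average_accuracy(baseline)
adapted_avg = average_accuracy(lhn_spec_supv_em)
adapted_er = error_reduction(adapted_avg, baseline_avg)

print(f"baseline AVG = {baseline_avg:.1f}")
print(f"adapted AVG  = {adapted_avg:.1f}")
print(f"adapted E.R. = {adapted_er:.1f}")
```

For this row the sketch gives an AVG of 77.6 and an E.R. of 60.7, as in the table.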

Page 10: Advances in WP2

Multi-Condition Adaptation

[Bar chart: word accuracy % (axis from 0 to 120) for Clean, LN, MN, HN, and AVG; series: No-Adapt No-Den, No-Adapt EM-Den, Supv-Adapt No-Den, Supv-Adapt EM-Den, Unsupv-Adapt EM-Den]

• Adaptation is done with all the speakers and noise conditions together

• It adapts to channel, noise conditions, and non-native common aspects

Page 11: Advances in WP2


Comments

• Supervised multi-condition adaptation improves performance considerably. It works well even without denoising (average word accuracy rises from 43.0% to as much as 75.9%), since it incorporates information about the channel, noise, and non-native accents into the models.

• The best average results are obtained with supervised adaptation combined with denoising (60.7% E.R.).

• As expected, unsupervised adaptation is inferior to supervised adaptation (51.7% vs. 60.7% E.R.), but it proves to be an effective technique for adaptation in real-life applications, where transcriptions of the speech material are not available.

Page 12: Advances in WP2

Speaker Adaptation

• Adaptation is done speaker by speaker

• Starting models: Microphone 16 kHz

• Denoising method is SNR-dependent Ephraim-Malah spectral attenuation

[Bar chart: word accuracy % (axis from 0 to 120) for Clean, LN, MN, HN, and AVG; series: No Adapt, Adapt Supv, Adapt Unsupv]

Page 13: Advances in WP2


Comments

• Speaker adaptation is very effective on the Hiwire DB (HDB). The error reduction achieved by supervised adaptation plus Ephraim-Malah noise reduction is quite large.

• The main improvements are in noisy conditions.

• As expected, unsupervised adaptation is inferior to supervised adaptation, because of the errors introduced by the ASR transcriptions, but its improvement is still significant.

Page 14: Advances in WP2


Workplan

• Selection of suitable benchmark databases (m6)

• Baseline set-up for the selected databases (m8)

• LIN adaptation method implemented and tested on the benchmarks (m12)

• Experimental results on Hiwire database with LIN (m18)

• Innovative NN adaptation methods and algorithms for acoustic modeling and experimental results (m21)

• Further advances on new adaptation methods (m24)

• Unsupervised Adaptation: algorithms and experimentation (m33)