Advances in WP2


Chania Meeting – May 2007

www.loquendo.com


Summary

• Unsupervised Adaptation

• Adaptation on Hiwire DB

Supervised vs Unsupervised Adaptation



Supervised Adaptation

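[Block diagram: the speech parameters of the adaptation set and their transcriptions are passed to the ASR, which performs forced segmentation with the general models. The forced segmentations feed the Adaptation Module, which produces the adapted models.]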


Unsupervised Adaptation

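[Block diagram: the ASR first recognizes the speech parameters with the general models. Confidence-based selection of the recognized utterances yields the adaptation set and its transcriptions. The ASR then performs forced segmentation on this set, and the resulting segmentations feed the Adaptation Module, which produces the adapted models.]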

Adaptation on HIWIRE DB



Kinds of Adaptation

Two kinds of adaptation were performed:

• Multi-Condition: the adaptation data of all speakers and all noise conditions are pooled. The models are adapted to the channel, the noise conditions, and the aspects common to non-native speakers.

• Speaker-Dependent: adaptation and tests are performed for each speaker separately, and the results are then averaged. The models are adapted mainly to the speaker's voice, but also to the channel and noise conditions.
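The difference between the two set-ups can be sketched as a difference in how the adaptation data are grouped. The following is a minimal illustration only: the record layout, speaker identifiers, and function names are hypothetical and are not taken from the Hiwire experiments.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical utterance records: (speaker_id, noise_condition, utterance_id).
utterances = [
    ("spk01", "Clean", "u1"),
    ("spk01", "HN", "u2"),
    ("spk02", "LN", "u3"),
    ("spk02", "MN", "u4"),
]

def multi_condition_sets(utts):
    """Pool all speakers and all noise conditions into one adaptation set."""
    return {"all": list(utts)}

def speaker_dependent_sets(utts):
    """Build one adaptation set per speaker (all noise conditions kept)."""
    sets = defaultdict(list)
    for utt in utts:
        sets[utt[0]].append(utt)
    return dict(sets)

def average_accuracy(per_speaker_accuracy):
    """Average the per-speaker word accuracies into a single figure."""
    return mean(per_speaker_accuracy.values())
```

In the speaker-dependent case one model set would be adapted and tested per key of `speaker_dependent_sets(utterances)`, and the per-speaker scores combined with `average_accuracy`.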


Adaptation Types

Two types of adaptation were tested:

• Supervised: the transcriptions of the sentences available in HDB are used to perform forced segmentation of the adaptation utterances, providing the labels needed by the adaptation process, which is intrinsically supervised.

• Unsupervised: the transcriptions of the sentences are not used, to simulate adaptation in the field, and are approximated by the ASR outputs. Only the adaptation utterances recognized with a certain degree of confidence are used in the adaptation process, to avoid divergence due to incorrectly labeled data.
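The confidence-based selection used in the unsupervised case can be sketched as a simple filter over the recognizer outputs. This is an illustration under assumptions: the `Hypothesis` record, the [0, 1] confidence scale, and the 0.8 threshold are hypothetical, since the slides do not state how confidence is computed or which threshold is applied.

```python
from dataclasses import dataclass

@dataclass
class Hypothesis:
    utterance_id: str
    text: str          # ASR output used in place of the true transcription
    confidence: float  # assumed to lie in [0, 1]; the scale is hypothetical

def select_adaptation_set(hypotheses, threshold=0.8):
    """Keep only the utterances recognized with enough confidence.

    The threshold value is an illustrative assumption, not the one
    used in the experiments.
    """
    return [h for h in hypotheses if h.confidence >= threshold]

hypotheses = [
    Hypothesis("u1", "descend flight level two", 0.93),
    Hypothesis("u2", "select heading three", 0.41),
    Hypothesis("u3", "range eighty", 0.85),
]
adaptation_set = select_adaptation_set(hypotheses)
```

The selected hypotheses then play the role of the transcriptions in the forced segmentation step, so low-confidence (and likely mislabeled) utterances never reach the adaptation module.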


Multi-Condition Adaptation

Adaptation          Denoising   Noise condition            AVG    E.R. %
Method    Type      method      Clean  LN     MN     HN

No        -         No          90.5   49.1   27.5    5.0  43.0   -
LHN cons  Supv      No          97.5   81.1   59.2   13.4  62.8   34.7
LHN spec  Supv      No          98.2   90.9   79.6   34.8  75.9   57.7
No        -         EM          90.2   71.9   55.0   16.6  58.4   27.0
LHN cons  Supv      EM          90.6   97.1   79.3   31.1  74.5   55.3
LHN spec  Supv      EM          98.0   93.2   83.7   35.5  77.6   60.7
LHN cons  Unsupv    EM          94.3   87.2   76.8   31.5  72.5   51.7
LHN spec  Unsupv    EM          93.7   85.5   73.7   27.1  70.0   47.4

• Adaptation is done with all the speakers and noise conditions together

• It adapts to channel, noise conditions, and non-native common aspects
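The slides do not define the E.R. column. The sketch below assumes it is the relative reduction in word error with respect to the baseline without adaptation and without denoising (43.0% average accuracy). This reading reproduces the tabulated values up to rounding, but it remains an assumption; the function name is illustrative.

```python
def error_reduction(baseline_accuracy, accuracy):
    """Relative word-error reduction (%) with respect to a baseline.

    Both arguments are word accuracies in percent. The definition is an
    assumption inferred from the table, not stated in the slides.
    """
    baseline_error = 100.0 - baseline_accuracy
    error = 100.0 - accuracy
    return 100.0 * (baseline_error - error) / baseline_error

# Average accuracy of the unadapted, undenoised system (first table row).
BASELINE_AVG = 43.0

print(round(error_reduction(BASELINE_AVG, 62.8), 1))  # LHN cons, Supv, no denoising
print(round(error_reduction(BASELINE_AVG, 77.6), 1))  # LHN spec, Supv, EM denoising
```

Because the tabulated averages are themselves rounded, a recomputed value can differ from the table in the last digit.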


Multi-Condition Adaptation

[Bar chart: word accuracy (%) for each noise condition (Clean, LN, MN, HN, AVG). Series: No-Adapt No-Den, No-Adapt EM-Den, Supv-Adapt No-Den, Supv-Adapt EM-Den, Unsupv-Adapt EM-Den.]

• Adaptation is done with all the speakers and noise conditions together

• It adapts to channel, noise conditions, and non-native common aspects


Comments

• Supervised multi-condition adaptation improves performance considerably (34.7% to 57.7% E.R. without denoising). It works well even without denoising, since it incorporates information about the channel, noise, and non-native accents into the models.

• The best average results are obtained with supervised adaptation combined with denoising (60.7% E.R.).

• As expected, unsupervised adaptation is inferior to supervised adaptation (51.7% vs. 60.7% E.R.), but it proves to be an effective technique for real-life applications, where transcriptions of the speech material are not available.


Speaker Adaptation

• Adaptation is done speaker by speaker

• Starting Models: Microphone 16kHz

• Denoising method is SNR-dependent Ephraim-Malah spectral attenuation

[Bar chart: word accuracy (%) for each noise condition (Clean, LN, MN, HN, AVG). Series: No Adapt, Adapt Supv, Adapt Unsupv.]


Comments

• Speaker adaptation is very effective on HDB. The error reduction achieved by supervised adaptation plus Ephraim-Malah noise reduction is quite large.

• The main improvements are in noisy conditions.

• As expected, unsupervised adaptation is inferior to supervised adaptation, due to the errors introduced by the ASR transcriptions, but its improvement is still substantial.


Workplan

• Selection of suitable benchmark databases (m6)

• Baseline set-up for the selected databases (m8)

• LIN adaptation method implemented and tested on the benchmarks (m12)

• Experimental results on Hiwire database with LIN (m18)

• Innovative NN adaptation methods and algorithms for acoustic modeling and experimental results (m21)

• Further advances on new adaptation methods (m24)

• Unsupervised Adaptation: algorithms and experimentation (m33)
