An#Investigation#of#Adaptation#Techniques#for Building# ... · CAPT*system •...

19
An Investigation of Adaptation Techniques for Building Acoustic Models for Hearingimpaired Children in a CAPT Application Yingke Zhu Brian Mak {yzhuav, mak} @cse.ust.hk ISCSLP 2016@Tianjin

Transcript of An#Investigation#of#Adaptation#Techniques#for Building# ... · CAPT*system •...

Page 1: An#Investigation#of#Adaptation#Techniques#for Building# ... · CAPT*system • Android5basedcomputer5assisted&pronunciation&training& application&developed&for&the&localhearing5impaired&(HI)&

An  Investigation  of  Adaptation  Techniques  for

Building  Acoustic  Models  for  Hearing-­impaired  Children  

in  a  CAPT  Application

Yingke Zhu                        Brian  Mak{yzhuav,  mak}  @cse.ust.hk

ISCSLP  2016@Tianjin

Page 2: An#Investigation#of#Adaptation#Techniques#for Building# ... · CAPT*system • Android5basedcomputer5assisted&pronunciation&training& application&developed&for&the&localhearing5impaired&(HI)&

Outline§ Computer  assisted  pronunciation  training  system

§ Adaptation  techniques§ KL  divergence  regularization§ Linear  input  network§ Learning  hidden-­unit  contributions  (LHUC)

§ Evaluation  in  the  real  system

§ Task  description

ISCSLP  2016@Tianjin

Page 3: An#Investigation#of#Adaptation#Techniques#for Building# ... · CAPT*system • Android5basedcomputer5assisted&pronunciation&training& application&developed&for&the&localhearing5impaired&(HI)&

§ CAPT  system

• Android-­‐based  computer-­‐assisted  pronunciation  training  application  developed  for  the  local  hearing-­‐impaired  (HI)  children.

• Cantonese

Each  character  is  a  syllable.

19  initial  consonants

initial finalnull a  vowel  and  an  optional  ending  consonant

18  vowels  and  6  ending  consonants

• Contains  listening  and  speaking  exercises  of  around  400  Cantonese  words.

ISCSLP  2016@Tianjin

Page 4: An#Investigation#of#Adaptation#Techniques#for Building# ... · CAPT*system • Android5basedcomputer5assisted&pronunciation&training& application&developed&for&the&localhearing5impaired&(HI)&

§ CAPT  system

Exercise  for  19  initial  consonants AssessmentPhoneme  verification  problem

• ExerciseTell  the  difference  between  two  very  similar  words  that  differ  only  in  their  initial  consonants.

ISCSLP  2016@Tianjin

Page 5: An#Investigation#of#Adaptation#Techniques#for Building# ... · CAPT*system • Android5basedcomputer5assisted&pronunciation&training& application&developed&for&the&localhearing5impaired&(HI)&

§ Task  Description

• Aim

Score  the  pronunciation  of  the  initial  phone  in  Cantonese  words.

• Acoustic  model

Modeling  Cantonese  monophones

• Performance  metrics

PER:      overall  phone  error  rateICER:    initial  consonant  error  rate

ISCSLP  2016@Tianjin

Page 6: An#Investigation#of#Adaptation#Techniques#for Building# ... · CAPT*system • Android5basedcomputer5assisted&pronunciation&training& application&developed&for&the&localhearing5impaired&(HI)&

§ Acoustic  Model

Target  userHearing  impaired(HI)  children  in  Hong  Kong

• Aged  between  6  to  12

Lack  of  data

• 1  hour  of  Cantonese  speech  from  36  HI  children

• Sufficient  normal  hearing  adults  speech  data

Problem

Strategy

Training

Group  Adaptation• HI  children  speech  data

NH  adults  acoustic  model

HI  children  acoustic  model

ISCSLP  2016@Tianjin

Page 7: An#Investigation#of#Adaptation#Techniques#for Building# ... · CAPT*system • Android5basedcomputer5assisted&pronunciation&training& application&developed&for&the&localhearing5impaired&(HI)&

§ Acoustic  Model

h1

h2

h3

h4

Softmax:  132  monophone states

Input:  39-­MFCC  × 11  context  frames

4×  2048hidden  layers

Training:  NH  adult  model

• 35  hours  speech  data  from  166  normal  hearing  adults

Adaptation:  HI  children  model

• 1  hours  speech  data  from  36  hearing  impaired  children

Data set #  Speakers Amount

Adaptation 18 0.51  h

Dev 9 0.22  h

Test 9 0.27  h

ISCSLP  2016@Tianjin

Page 8: An#Investigation#of#Adaptation#Techniques#for Building# ... · CAPT*system • Android5basedcomputer5assisted&pronunciation&training& application&developed&for&the&localhearing5impaired&(HI)&

§ NH  Adult  Acoustic  Model

• Results  on  two  test  sets

• The  performance  of  NH  adult  acoustic  model  on  HI  children  test  set  has  a  significant  drop.

• There’s  a  mismatch  between  two  speech  corpus.

Adults – Children  

Normal  hearing  – Hearing  impaired

ISCSLP  2016@Tianjin

Page 9: An#Investigation#of#Adaptation#Techniques#for Building# ... · CAPT*system • Android5basedcomputer5assisted&pronunciation&training& application&developed&for&the&localhearing5impaired&(HI)&

Adaptation  techniques

§ KL  divergence  regularization

§ Linear   input  network

§ Learning  hidden-­unit   contributions

ISCSLP  2016@Tianjin

Page 10: An#Investigation#of#Adaptation#Techniques#for Building# ... · CAPT*system • Android5basedcomputer5assisted&pronunciation&training& application&developed&for&the&localhearing5impaired&(HI)&

§ KL  divergence  regularization

• DNN  training  – optimization  criterion

• KLD  adaptation  – regularized  optimization  criterion

• Implementation

Conventional  BP  algorithm

ISCSLP  2016@Tianjin

Page 11: An#Investigation#of#Adaptation#Techniques#for Building# ... · CAPT*system • Android5basedcomputer5assisted&pronunciation&training& application&developed&for&the&localhearing5impaired&(HI)&

§ KL  divergence  regularization

ISCSLP  2016@Tianjin

Page 12: An#Investigation#of#Adaptation#Techniques#for Building# ... · CAPT*system • Android5basedcomputer5assisted&pronunciation&training& application&developed&for&the&localhearing5impaired&(HI)&

§ Linear  input  network

h1

h2

h3

h4

Softmax

D-­dimension  feature  ×N  context  frames

Added   linear  layer

• LIN

• LIN-­Nblock

#Parameters:  ND× (ND+1)

#Parameters:  ND× (D+1)

intra-­‐frame  relation

intra-­frame  and    inter-­frame  relations

ISCSLP  2016@Tianjin

Page 13: An#Investigation#of#Adaptation#Techniques#for Building# ... · CAPT*system • Android5basedcomputer5assisted&pronunciation&training& application&developed&for&the&localhearing5impaired&(HI)&

§ Linear  input  network

• Results

ISCSLP  2016@Tianjin

Page 14: An#Investigation#of#Adaptation#Techniques#for Building# ... · CAPT*system • Android5basedcomputer5assisted&pronunciation&training& application&developed&for&the&localhearing5impaired&(HI)&

… …

LHUC  adaptation:

Speaker  independent  DNN:

§ Learning  hidden-­unit  contributions  (LHUC)

Choice  of  scaling  factor:

×

h1

h2

h3

h4

Softmax

Input #Parameters: #adapted  hidden  layers

#hidden  nodes

ISCSLP  2016@Tianjin

Page 15: An#Investigation#of#Adaptation#Techniques#for Building# ... · CAPT*system • Android5basedcomputer5assisted&pronunciation&training& application&developed&for&the&localhearing5impaired&(HI)&

§ Learning  hidden-­unit  contributions  (LHUC)

ISCSLP  2016@Tianjin

Page 16: An#Investigation#of#Adaptation#Techniques#for Building# ... · CAPT*system • Android5basedcomputer5assisted&pronunciation&training& application&developed&for&the&localhearing5impaired&(HI)&

§ Adaptation  results  summarization

ISCSLP  2016@Tianjin

Page 17: An#Investigation#of#Adaptation#Techniques#for Building# ... · CAPT*system • Android5basedcomputer5assisted&pronunciation&training& application&developed&for&the&localhearing5impaired&(HI)&

§ Evaluation  in  the  real  system• Assessment  performance  is  reported    in  terms  of  equal  error  rate  (EER).  

System  operation  point

EER=0.27EER=0.22

0.420.35

ISCSLP  2016@Tianjin

Page 18: An#Investigation#of#Adaptation#Techniques#for Building# ... · CAPT*system • Android5basedcomputer5assisted&pronunciation&training& application&developed&for&the&localhearing5impaired&(HI)&

§ Conclusion• We  investigated  various  speaker  adaptation  techniques  for  group  adaptation:  adapting  an  NH  adults  acoustic  model  to  work  force  HI  children  in  a  mobile  CAPT  application.

• The  major  challenges  are:

§ the  acoustic  characteristics  of  HI  children  speech  are  very  different  from  those  of  NH  speakers  in  the  original  model

§ the  amount  of  adaptation  data  is  very  limited

• We  investigated  KLD  regularization,  LHUC,  LIN,  and  their  combinations.  Among  the  three  methods,  if  they  were  applied  alone,  KLD  regularization  gave  the  best  performance.  

• Further  improvement  could  be  achieved  from  the  joint  adaptation  of  KLD  and  LIN-­Nblock,  reducing  PER  and  ICER  by  a  relative  11%  and  16%  respectively.

ISCSLP  2016@Tianjin

Page 19: An#Investigation#of#Adaptation#Techniques#for Building# ... · CAPT*system • Android5basedcomputer5assisted&pronunciation&training& application&developed&for&the&localhearing5impaired&(HI)&

Q  &  A

ISCSLP  2016@Tianjin