Towards a Cohort-Selective Frequency-Compression Hearing Aid
Marie Roch, Richard R. Hurtig, Jing Lui, and Tong Huang
2
Sensorineural Hearing loss
• Most common type of hearing loss
• Affects > 20 million in the US alone
• Caused by physiological problems in the cochlea
3
Traditional Hearing Aids
• Amplification of frequency bands
• Amplitude compression
• Works best in situations with high SNR
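As a rough illustration of the amplitude compression bullet, the sketch below applies a static compression curve: levels below a threshold pass unchanged, and levels above it grow more slowly. The function names, the -40 dB threshold, and the 3:1 ratio are hypothetical choices for the example, not values from the talk.

```python
import math

def compress_gain_db(level_db, threshold_db=-40.0, ratio=3.0):
    """Gain in dB that a static compressor applies at a given input level."""
    if level_db <= threshold_db:
        return 0.0  # below threshold: no gain change
    # Above threshold the output rises 1 dB for every `ratio` dB of input.
    output_db = threshold_db + (level_db - threshold_db) / ratio
    return output_db - level_db

def compress_sample(x, threshold_db=-40.0, ratio=3.0):
    """Apply the static curve to one sample (full scale assumed to be 1.0)."""
    level_db = 20.0 * math.log10(max(abs(x), 1e-10))
    gain = 10.0 ** (compress_gain_db(level_db, threshold_db, ratio) / 20.0)
    return x * gain
```

A real hearing aid would apply this per frequency band with attack and release smoothing; the sketch omits both.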
4
Problems With Traditional Methods
• Simple amplification insufficient
• Individuals with severe hearing loss cannot perceive formants
“Where were you while we were away” (Harrington and Cassidy 1999, p. 110)
5
Preserving the formants
• Frequency domain compression [Turner & Hurtig 1999] permits preservation of formants
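One way to see why compression in the frequency domain can keep formant structure: if every frequency is scaled by the same factor, the ratios between formants do not change. The sketch below assumes such a proportional mapping, which the slide does not spell out; the function names and the formant values are hypothetical.

```python
def compress_frequencies(freqs_hz, factor):
    """Scale every frequency by the same factor (0 < factor <= 1)."""
    if not 0.0 < factor <= 1.0:
        raise ValueError("factor must be in (0, 1]")
    return [f * factor for f in freqs_hz]

def ratios(freqs_hz):
    """Ratio of each frequency to the first one in the list."""
    return [f / freqs_hz[0] for f in freqs_hz]

formants = [500.0, 1500.0, 2500.0]  # illustrative values, not from the study
compressed = compress_frequencies(formants, 0.5)
```

Here `ratios(formants)` and `ratios(compressed)` are identical, while every formant has moved to a lower frequency.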
6
Effectiveness
• Clinical study of 15 hearing-impaired listeners showed improvement when listening to different groups
– female talkers: 45% improvement
– male talkers: 20% improvement
Female talker: uncompressed
Female talker: compressed
7
Challenges
• Not all voices require the same level of compression
• Single setting leads to inappropriate levels of compression
8
Adaptive thresholds
• Decision-based control mechanism
• Establish cohorts and compress according to cohort class.
• Some possible cohorts:
– Phonological units
– Pitch
– Speaker “gender”
9
Gender-based classifier
• Selected “gender” for first study.
– Female, Male, Child
– Classifier output more stable than with phonological approaches.
– Broad support in the literature for the ability of both humans and machines to do this.
10
Classifier
• Gaussian mixture models
• Features extracted from 25 ms windows shifted every 10 ms
– Energy
– 12 Mel-filtered cepstral coefficients (MFCC)
– Time-derivatives of Energy & MFCC
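The framing step can be sketched as follows. This is a minimal illustration of 25 ms windows shifted every 10 ms, with log energy and a simple time derivative. MFCC computation is left out, and the function names and the central-difference derivative are assumptions, not details from the talk.

```python
import math

def frame_signal(samples, sample_rate, win_ms=25.0, shift_ms=10.0):
    """Split a signal into overlapping frames (only full frames are kept)."""
    win = int(round(sample_rate * win_ms / 1000.0))
    shift = int(round(sample_rate * shift_ms / 1000.0))
    frames = []
    start = 0
    while start + win <= len(samples):
        frames.append(samples[start:start + win])
        start += shift
    return frames

def log_energy(frame, floor=1e-10):
    """Natural log of the frame energy, floored to avoid log(0)."""
    return math.log(max(sum(s * s for s in frame), floor))

def deltas(values):
    """First-order time derivative by central difference, edges replicated."""
    n = len(values)
    out = []
    for t in range(n):
        prev = values[max(t - 1, 0)]
        nxt = values[min(t + 1, n - 1)]
        out.append((nxt - prev) / 2.0)
    return out
```

At 8 kHz this gives 200-sample windows shifted by 80 samples, so one second of audio yields 98 full frames.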
11
Control system architecture
[Block diagram, in signal-flow order]
• speech x[t] -> Feature extraction -> f[t]
• Likelihood projection: female model and male model give log Pr(f[t]|female) and log Pr(f[t]|male)
• Moving average (increases F-ratio when all frames are from the same class)
• Decision logic -> cohort
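The pipeline in the diagram can be sketched in a few functions. This is a minimal illustration under stated assumptions: diagonal-covariance mixtures, a trailing moving average applied to the per-frame log-likelihood difference, and a decision by sign. The talk does not specify these details, and all function names are hypothetical.

```python
import math

def gmm_log_likelihood(x, weights, means, variances):
    """log Pr(x | model) for a diagonal-covariance Gaussian mixture."""
    comp_logs = []
    for w, mu, var in zip(weights, means, variances):
        ll = math.log(w)
        for xi, m, v in zip(x, mu, var):
            ll += -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        comp_logs.append(ll)
    peak = max(comp_logs)  # log-sum-exp for numerical stability
    return peak + math.log(sum(math.exp(c - peak) for c in comp_logs))

def moving_average(values, width):
    """Trailing mean over at most the `width` most recent values."""
    out = []
    total = 0.0
    for t, v in enumerate(values):
        total += v
        if t >= width:
            total -= values[t - width]
        out.append(total / min(t + 1, width))
    return out

def classify_frames(frames, female, male, width):
    """Label each frame from the smoothed log-likelihood difference."""
    llr = [gmm_log_likelihood(f, *female) - gmm_log_likelihood(f, *male)
           for f in frames]
    smoothed = moving_average(llr, width)
    return ["female" if s > 0 else "male" for s in smoothed]
```

Because averaging is linear, smoothing the difference is the same as smoothing each model's log-likelihood and then subtracting. With a 10 ms frame shift, a 0.5 s averaging window corresponds to `width=50`.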
12
LDC SPIDRE Corpus
• Conversational telephone speech
– Band-limited 8 kHz
– Mu-law encoded
• Endpointed with the NIST/Kubala endpointer
• Train
– Single sides of same-gender phone calls
– 25 male & female
• Test
– 87 annotated cross-gender phone calls
– About 7 hours of calls (~5 min. each)
13
SPIDRE Classification Results
[Plot: error rate (0.23 to 0.30) vs. averaging window (0.2 to 1 secs.) for 32, 64, 128, and 256 mixtures]
14
Error analysis
• Many errors occurred in fricatives, which have high-frequency energy
[Plot: level in dB (-90 to -20) vs. frequency in Hz (0 to 12000), with the telephone bandwidth marked]
15
Evaluation on TIMIT
• 630 speakers, clean speech 16 kHz corpus
• Train: 25 male, 25 female. Test: 413 male, 167 female.
[Two plots, labeled TIMIT and SPIDRE: error rate vs. averaging window (0.2 to 1 secs.)]
16
Median Smoothing (SPIDRE)
[Plot: error rate (0.23 to 0.30) vs. averaging window (0.2 to 1 secs.) for 32 and 256 mixtures, median smoothed]
[Plot: Pr(Segment Length N Secs.) vs. Secs. (0 to 0.7) for Median 0 and Median 15]
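Median smoothing of a decision sequence can be sketched as below. The sketch assumes the filter runs over per-frame binary decisions (0 or 1) with an odd window such as 15; the talk does not say what exactly is smoothed, and both function names are hypothetical. A second helper measures segment lengths, the quantity shown in the second plot.

```python
def median_smooth(labels, width):
    """Sliding median over 0/1 labels; windows shrink at the edges."""
    if width <= 1:
        return list(labels)
    half = width // 2
    out = []
    for t in range(len(labels)):
        window = sorted(labels[max(0, t - half):t + half + 1])
        # Upper median: for binary labels this is a majority vote,
        # with ties (possible only in shrunken edge windows) going to 1.
        out.append(window[len(window) // 2])
    return out

def segment_lengths(labels):
    """Lengths of consecutive runs of identical labels."""
    runs = []
    for t, lab in enumerate(labels):
        if t > 0 and lab == labels[t - 1]:
            runs[-1] += 1
        else:
            runs.append(1)
    return runs
```

On a toy sequence such as `[0, 0, 1, 0, 0, 1, 1, 1, 0, 1, 1]`, a width-3 median removes the isolated one-frame flips, so the six short segments collapse into two longer ones.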
17
Conclusions & Future Work
• Classifier-based control systems
– feasible
– can be applied to other signal enhancement algorithms
– need not be limited to the cohorts presented today (e.g. auditory scene analysis)