Towards a Cohort-Selective Frequency-Compression Hearing Aid

Marie Roch, Richard R. Hurtig, Jing Lui, and Tong Huang

Sensorineural Hearing Loss

• Most common type of hearing loss

• Affects > 20 million in the US alone

• Caused by physiological problems in the cochlea

Traditional Hearing Aids

• Amplification of frequency bands

• Amplitude compression

• Works best in situations with high SNR

Problems With Traditional Methods

• Simple amplification insufficient

• Individuals with severe hearing loss cannot perceive formants

“Where were you while we were away” (Harrington and Cassidy 1999, p. 110)

Preserving the formants

• Frequency domain compression [Turner & Hurtig 1999] permits preservation of formants
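
Why compression can preserve formant structure is easiest to see on the formant frequencies themselves. The sketch below assumes the compression is proportional, meaning every frequency is scaled by the same factor; that is one reading of the cited method, not a statement of its implementation. The function names and the formant values are hypothetical and chosen only for illustration.

```python
def compress_formants(formants_hz, factor):
    """Scale every formant frequency by the same factor (0 < factor <= 1)."""
    if not 0.0 < factor <= 1.0:
        raise ValueError("factor must be in (0, 1]")
    return [f * factor for f in formants_hz]


def formant_ratios(formants_hz):
    """Ratio of each higher formant to the first formant."""
    first = formants_hz[0]
    return [f / first for f in formants_hz[1:]]


# Hypothetical formant values in Hz, for illustration only.
original = [500.0, 1500.0, 2500.0]
compressed = compress_formants(original, 0.5)

print(compressed)                  # formants moved to lower frequencies
print(formant_ratios(original))    # ratios before compression
print(formant_ratios(compressed))  # same ratios after compression
```

Under this assumption the formants move down in frequency, but their ratios to one another are unchanged.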

Effectiveness

• Clinical study of 15 hearing-impaired listeners showed improvement when listening to different groups

– female talkers: 45% improvement

– male talkers: 20% improvement

Female Talker - Uncompressed

Female Talker - Compressed

Challenges

• Not all voices require the same level of compression

• Single setting leads to inappropriate levels of compression

Adaptive thresholds

• Decision-based control mechanism

• Establish cohorts and compress according to cohort class.

• Some possible cohorts:

– Phonological units

– Pitch

– Speaker “gender”

Gender-based classifier

• Selected “gender” for first study.

– Female, Male, Child

– Classifier output more stable than with phonological approaches.

– Broad support in the literature for the ability of both humans and machines to do this.

Classifier

• Gaussian mixture models

• Features extracted from 25 ms windows shifted every 10 ms

– Energy

– 12 Mel-filtered cepstral coefficients (MFCC)

– Time-derivatives of Energy & MFCC
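
The framing described above can be sketched as follows. This is a minimal illustration using NumPy, not the authors' front end: it covers the 25 ms / 10 ms framing, the energy feature, and a time derivative, and it leaves out the MFCC computation. The function names, the log-energy floor, and the use of central differences for the derivative are assumptions.

```python
import numpy as np


def frame_signal(x, sample_rate, win_ms=25.0, shift_ms=10.0):
    """Slice a 1-D signal into overlapping frames, one frame per row."""
    win = int(round(sample_rate * win_ms / 1000.0))
    shift = int(round(sample_rate * shift_ms / 1000.0))
    x = np.asarray(x, dtype=float)
    if len(x) < win:
        return np.empty((0, win))
    n_frames = 1 + (len(x) - win) // shift
    # Row t holds samples t*shift .. t*shift + win - 1.
    idx = np.arange(win)[None, :] + shift * np.arange(n_frames)[:, None]
    return x[idx]


def log_energy(frames, floor=1e-10):
    """Log of the summed squared samples in each frame (floored to avoid log 0)."""
    return np.log(np.maximum(np.sum(frames ** 2, axis=1), floor))


def deltas(features):
    """Time derivative along the frame axis (needs at least two frames)."""
    return np.gradient(features, axis=0)


# One second of a synthetic tone at 8 kHz stands in for speech.
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
x = np.sin(2.0 * np.pi * 440.0 * t)

frames = frame_signal(x, sample_rate)
energy = log_energy(frames)
d_energy = deltas(energy)
print(frames.shape, energy.shape, d_energy.shape)
```

At 8 kHz a 25 ms window is 200 samples and a 10 ms shift is 80 samples, so one second of signal yields 98 full frames.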

Control system architecture

[Diagram: speech x[t] → Feature Extraction → f[t] → female model and male model → log Pr(f[t]|female), log Pr(f[t]|male) → likelihood projection → Moving Average → Decision logic → cohort. Annotation: increases F-ratio when all frames are from the same class.]
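
The scoring and decision path in this architecture can be sketched as below. It is a simplified illustration, not the system described on the slide: it assumes diagonal-covariance Gaussian mixtures, a causal moving average applied to the difference of the two log-likelihoods, and a zero threshold. It omits the likelihood projection stage, and all function and parameter names are hypothetical.

```python
import numpy as np


def gmm_log_likelihood(frames, weights, means, variances):
    """Per-frame log-likelihood under a diagonal-covariance GMM.

    frames: (T, D); weights: (M,); means and variances: (M, D).
    """
    diff = frames[:, None, :] - means[None, :, :]                      # (T, M, D)
    log_norm = -0.5 * np.sum(np.log(2.0 * np.pi * variances), axis=1)  # (M,)
    log_comp = log_norm - 0.5 * np.sum(diff ** 2 / variances, axis=2)  # (T, M)
    log_comp = log_comp + np.log(weights)
    # Log-sum-exp over mixture components, shifted for numerical stability.
    peak = np.max(log_comp, axis=1, keepdims=True)
    return (peak + np.log(np.sum(np.exp(log_comp - peak), axis=1, keepdims=True)))[:, 0]


def moving_average(values, window):
    """Causal average of the current value and up to window - 1 earlier ones."""
    csum = np.cumsum(np.insert(np.asarray(values, dtype=float), 0, 0.0))
    out = np.empty(len(values))
    for t in range(len(values)):
        start = max(0, t - window + 1)
        out[t] = (csum[t + 1] - csum[start]) / (t + 1 - start)
    return out


def classify(ll_female, ll_male, window):
    """Label each frame from the smoothed log-likelihood difference."""
    score = moving_average(np.asarray(ll_female) - np.asarray(ll_male), window)
    return np.where(score > 0.0, "female", "male")


# Toy per-frame scores with one isolated frame that favours the male model.
ll_f = np.array([2.0, -1.0, 2.0, 2.0])
ll_m = np.zeros(4)
unsmoothed = classify(ll_f, ll_m, window=1)
smoothed = classify(ll_f, ll_m, window=3)
print(unsmoothed, smoothed)
```

With frames shifted every 10 ms, an averaging window of 0.5 s corresponds to 50 frames in this sketch.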

LDC SPIDRE Corpus

• Conversational telephone speech

– Band-limited 8 kHz

– Mu-law encoded

• Endpointed with the NIST/Kubala endpointer

• Train

– Single sides of same-gender phone calls

– 25 male & female

• Test

– 87 annotated cross-gender phone calls

– About 7 hours of calls (~5 min. each)
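
For readers unfamiliar with mu-law encoding, the continuous companding formula with mu = 255 is sketched below. This is the textbook formula on values normalized to [-1, 1], not the byte-level telephone codec tables, and it says nothing about how the corpus was actually decoded. The function names are hypothetical.

```python
import numpy as np

MU = 255.0  # value used for 8-bit telephone mu-law


def mulaw_compress(x):
    """Map linear amplitudes in [-1, 1] to companded values in [-1, 1]."""
    x = np.asarray(x, dtype=float)
    return np.sign(x) * np.log1p(MU * np.abs(x)) / np.log1p(MU)


def mulaw_expand(y):
    """Inverse mapping: companded values in [-1, 1] back to linear amplitudes."""
    y = np.asarray(y, dtype=float)
    return np.sign(y) * ((1.0 + MU) ** np.abs(y) - 1.0) / MU


samples = np.array([-1.0, -0.1, 0.0, 0.01, 0.5, 1.0])
round_trip = mulaw_expand(mulaw_compress(samples))
print(round_trip)
```

Small amplitudes are spread over a larger share of the companded range than large ones.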

SPIDRE Classification Results

[Figure: error rate (0.23 to 0.3) versus averaging window in seconds (0.2 to 1), with curves for 32, 64, 128, and 256 mixtures]

Error analysis

• Many errors occurred in fricatives, which have high-frequency energy

[Figure: level in dB (-90 to -20) versus frequency in Hz (0 to 12000), with the telephone bandwidth marked]

Evaluation on TIMIT

• 630 speakers, clean speech 16 kHz corpus

• Train: 25 male, 25 female. Test: 413 male, 167 female.

[Figure: two error-rate plots, one for TIMIT and one for SPIDRE, each with a horizontal axis running from 0.2 to 1]

Median Smoothing (SPIDRE)

[Figure: error rate (0.23 to 0.3) over a horizontal axis running from 0.2 to 1, with curves for 32 and 256 mixtures]

[Figure: Pr(segment length N secs.) versus secs. (0 to 0.7), with curves for Median 0 and Median 15; labelled median smoothed]
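
Median smoothing of a frame-level decision sequence, and its effect on segment lengths, can be sketched as follows. The sketch assumes the smoothing is a sliding median over binary class labels with an odd width measured in frames; the slide does not state the units of the median settings. The function names and the toy sequence are hypothetical.

```python
import numpy as np


def median_smooth(labels, width):
    """Sliding median over a 0/1 label sequence; width must be odd."""
    if width % 2 != 1:
        raise ValueError("width must be odd")
    labels = np.asarray(labels)
    half = width // 2
    # Repeat the edge values so the output has the same length as the input.
    padded = np.pad(labels, half, mode="edge")
    out = [np.median(padded[i:i + width]) for i in range(len(labels))]
    return np.array(out).astype(int)


def segment_lengths(labels):
    """Lengths, in frames, of consecutive runs of identical labels."""
    labels = np.asarray(labels)
    if labels.size == 0:
        return np.array([], dtype=int)
    change = np.flatnonzero(labels[1:] != labels[:-1]) + 1
    bounds = np.concatenate(([0], change, [labels.size]))
    return np.diff(bounds)


# Toy decision sequence with one isolated single-frame flip.
raw = np.array([0, 0, 0, 1, 0, 0, 0, 1, 1, 1, 1, 1])
smoothed = median_smooth(raw, 3)
print(segment_lengths(raw))       # run lengths before smoothing
print(segment_lengths(smoothed))  # run lengths after smoothing
```

The isolated flip is absorbed into its neighbours, so the very short segment disappears from the segment-length distribution.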

Conclusions & Future Work

• Classifier-based control systems

– feasible

– can be applied to other signal enhancement algorithms

– need not be limited to the cohorts presented today (e.g. auditory scene analysis)
