American Auditory Society Annual Meeting, March 8-10, 2012
Nonlinear Frequency Compression: Balancing Start Frequency and Compression Ratio
Joshua M. Alexander
Department of Speech, Language, and Hearing Sciences
Purdue University, West Lafayette, IN 47907
http://www.TinyURL.com/PurdueEar
Research Question
Listeners with hearing aids often have limited access to important high-frequency speech information.
For moderately impaired listeners, this can occur because the miniature receivers are unable to provide
sufficient high-frequency amplification or cannot do so without audible whistling and overtones caused
by feedback. For more severely impaired listeners, the inner hair cells that code these frequencies may
be absent or non-functioning. Frequency lowering techniques, including nonlinear frequency
compression (NFC), have been suggested as a means of re-introducing high-frequency speech cues to
these listeners.
Compared to other methods, NFC is unique in that the low-frequency spectrum below a programmable
start frequency is unaltered to help preserve signal quality. The high-frequency spectrum is compressed
toward the start frequency by an amount determined by the compression ratio (CR). CR corresponds
very closely with bandwidth reduction (i.e., reduction in spectral resolution) on a normal-hearing
Equivalent Rectangular Bandwidth (ERBN) scale (Moore, 2003). When implementing any frequency-
lowering algorithm, the upper frequency limit of aided audibility (the “max output frequency”) is critical
because it indicates both the frequency range that should be targeted for lowering and where that
range can be moved. Because the NFC start frequency and CR both influence how frequencies are
remapped, there are infinitely many ways the unaidable high-frequency spectrum can be repackaged into
the audible range of the listener.
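To make the relationship between start frequency, CR, and spectral resolution concrete, the following is a minimal sketch, not the simulator used in this study. It assumes the log-frequency input-output rule commonly associated with Simpson et al. (2005), in which frequencies above the start are compressed on a logarithmic axis by 1/CR, and it uses the standard ERBN-number formula 21.4 log10(4.37F/1000 + 1). The function names and the example values (2000-Hz start, 8000-Hz max input, CR of 2.0) are hypothetical.

```python
import math


def erb_number(freq_hz):
    """ERB_N-number (in Cams) for a frequency in Hz."""
    return 21.4 * math.log10(4.37 * freq_hz / 1000.0 + 1.0)


def nfc_map(freq_hz, start_hz, cr):
    """Assumed NFC rule: identity at or below the start frequency,
    log-domain compression by 1/cr above it."""
    if freq_hz <= start_hz:
        return freq_hz
    return start_hz * (freq_hz / start_hz) ** (1.0 / cr)


def erb_span_ratio(start_hz, max_in_hz, cr):
    """Ratio of the input span to the output span of the lowered band,
    with both spans measured in ERB_N-number units."""
    max_out_hz = nfc_map(max_in_hz, start_hz, cr)
    span_in = erb_number(max_in_hz) - erb_number(start_hz)
    span_out = erb_number(max_out_hz) - erb_number(start_hz)
    return span_in / span_out


# Illustrative values only; these are not conditions from the study.
ratio = erb_span_ratio(start_hz=2000.0, max_in_hz=8000.0, cr=2.0)
print(f"ERB_N span ratio for CR = 2.0: {ratio:.2f}")
```

Under these assumptions the ERBN span ratio comes out close to the nominal CR, which is one way to read the statement that CR tracks bandwidth reduction on the ERBN scale.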
This project examines the perceptual tradeoffs that occur when trying to optimize the choice of NFC
start frequency and CR to fit a moderately-severe to profound high-frequency hearing loss and a mild to
moderate high-frequency loss. On the one hand, lower start frequencies might be detrimental for
phonemes that rely heavily on formant frequency, especially vowels. On the other hand, lower start
frequencies could be beneficial because a) they allow a greater amount of high-frequency information
to be lowered if CR is fixed, or b) they allow for lower CRs (less reduction in spectral resolution) if the
input bandwidth (the “max input” frequency) is fixed. Similarly, it is uncertain whether CR should be
kept low to maintain spectral resolution or should be increased so that a greater amount of high-
frequency information can be lowered into the range of audibility and whether this effect depends on
start frequency (e.g., lower CRs might be best for low start frequencies, but less important for high start
frequencies where formant frequencies are less critical).
Supported by NIDCD grant 1RC1DC010601-01
Alexander American Auditory Society Annual Meeting, March 8-10, 2012
Listeners
Group 1: Simulation of moderately-severe to profound high-frequency loss
14 (6 male, 8 female) listeners with sensorineural loss, ages 47-83 years (median = 70 years)
Average Thresholds
Freq. (Hz)    250    500   1000   2000   3000   4000   6000   8000
dB HL        17.1   22.5   28.9   37.5   47.1   55.7   68.2   68.6
Group 2: Mild to moderate high-frequency loss
13 (6 male, 7 female) listeners with sensorineural loss, ages 27-82 years (median = 62 years)
Average Thresholds
Freq. (Hz)    250    500   1000   2000   3000   4000   6000   8000
dB HL        22.3   24.2   28.5   40.4   44.6   48.8   52.3   50.4
Hearing Aid Simulator
Nonlinear Frequency Compression (NFC)
Only the part of the input spectrum above the start frequency was subjected to NFC. The upper limit of
the input band used for NFC (denoted as “max input” and “BW” for input bandwidth) varied by
condition. The compression ratio (“CR”) was precisely set so that the max input frequency for NFC was
lowered to a max output frequency of 3273 Hz (Group 1) or 4996 Hz (Group 2).
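As a rough illustration of how a CR could be chosen so that a given max input frequency lands exactly on the max output frequency, the sketch below solves for CR under the same assumed log-frequency rule (output = start × (input / start)^(1/CR) above the start). This is an assumption about the form of the mapping, not a description of the study's code. The function name `solve_cr` is hypothetical, and the start and max input values in the usage lines are illustrative pairings; only the 3273-Hz max output is taken from the Group 1 description above.

```python
import math


def solve_cr(start_hz, max_in_hz, max_out_hz):
    """CR that maps max_in_hz onto max_out_hz, assuming log-domain
    compression above start_hz (hypothetical helper)."""
    if not start_hz < max_out_hz <= max_in_hz:
        raise ValueError("expected start < max output <= max input")
    return math.log(max_in_hz / start_hz) / math.log(max_out_hz / start_hz)


# Illustrative pairing of start and max input with the Group 1 max output.
start, max_in, max_out = 1550.0, 7063.0, 3273.0
cr = solve_cr(start, max_in, max_out)

# Check: applying the assumed mapping to max_in should return max_out.
lowered = start * (max_in / start) ** (1.0 / cr)
print(f"CR = {cr:.3f}, max input maps to {lowered:.1f} Hz")
```

With the max output fixed, raising the start frequency or widening the input bandwidth both push the required CR upward, which is the tradeoff the conditions were designed to probe.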
NFC was carried out in MATLAB using techniques described by Simpson et al. (2005). Short-time fast
Fourier transform segments (5.8 ms) were used to compute the instantaneous frequencies of the input.
Input frequencies targeted for frequency remapping were synthesized at lower output frequencies using
phase vocoding, with overlap-and-add (Allen, 1977) used for signal reconstruction. The processed
signal was recombined, after an appropriate delay, with the unprocessed signal (the input signal low-pass
filtered at the start frequency). The combined signal was then subjected to wide dynamic range
(amplitude) compression.
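The instantaneous-frequency step can be sketched as follows. This is a generic phase-vocoder estimate written in plain Python for illustration, not the MATLAB code used in the study. The sampling rate (22050 Hz), the 128-sample segment (about 5.8 ms at that rate), the 32-sample hop, and the Hann window are all assumptions, and a real implementation would use an FFT routine rather than the direct DFT shown here.

```python
import cmath
import math


def hann(n):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / n) for i in range(n)]


def dft_bin(frame, k):
    """Direct DFT of one bin (slow, but dependency-free)."""
    n = len(frame)
    return sum(x * cmath.exp(-2j * math.pi * k * i / n)
               for i, x in enumerate(frame))


def instantaneous_frequencies(signal, fs, n_fft=128, hop=32):
    """Return (frequency in Hz, magnitude) per bin, estimated from the
    phase advance between two frames that are `hop` samples apart."""
    window = hann(n_fft)
    frame0 = [window[i] * signal[i] for i in range(n_fft)]
    frame1 = [window[i] * signal[hop + i] for i in range(n_fft)]
    estimates = []
    for k in range(n_fft // 2 + 1):
        a, b = dft_bin(frame0, k), dft_bin(frame1, k)
        expected = 2.0 * math.pi * k * hop / n_fft  # advance at bin centre
        deviation = cmath.phase(b) - cmath.phase(a) - expected
        deviation = (deviation + math.pi) % (2.0 * math.pi) - math.pi  # wrap
        freq = (k / n_fft + deviation / (2.0 * math.pi * hop)) * fs
        estimates.append((freq, abs(b)))
    return estimates


# Usage: a 1000-Hz tone should be recovered at its spectral peak.
fs = 22050
tone = [math.sin(2.0 * math.pi * 1000.0 * n / fs) for n in range(160)]
peak_freq, _ = max(instantaneous_frequencies(tone, fs), key=lambda e: e[1])
print(f"Estimated frequency at spectral peak: {peak_freq:.1f} Hz")
```

In a frequency-lowering scheme of this kind, each estimated frequency above the start would then be remapped and resynthesized at its lower target frequency before overlap-and-add reconstruction.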
Wide Dynamic Range Compression
To control output levels, wide dynamic range compression was simulated in MATLAB. The amplified
speech was presented monaurally via circumaural BeyerDynamic DT150 headphones. Using a transfer
function obtained on KEMAR, listeners’ audiometric thresholds were converted to estimated dB SPL at
the tympanic membrane. These values were entered in the DSL m(I/O) v5.0a algorithm for adults which
generated individualized prescriptive values for compression threshold and compression ratio for each
channel as well as target values for the real-ear aided responses. Gain was automatically tuned to
targets using the ‘carrot passage’ from Audioscan®.
Stimuli were scaled to 60 dB SPL, band pass filtered into 8 channels, and then processed with wide
dynamic range compression. Center and crossover frequencies were based on the recommendations of
the DSL algorithm: 315, 500, 800, 1250, 2000, 3150, 5000, and 8000 Hz. Channels beyond the max output
frequency were not amplified.
Output compression limiting was used to keep the output from exceeding recommended broadband
output limiting targets (BOLT) or 105 dB SPL, whichever was lower. Signals were summed across channels
and subjected to a final stage of output compression limiting to control the final presentation level and
prevent peak clipping.
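The static behavior of one compression channel followed by output limiting can be sketched as below. This is a level-domain simplification with no attack or release dynamics, and the gain, compression threshold, and ratio in the usage lines are made-up numbers, not DSL m(I/O) prescriptions. Only the 105 dB SPL ceiling comes from the description above.

```python
def wdrc_output_level(input_db, gain_db, threshold_db, ratio, limit_db=105.0):
    """Static input-output function for one channel (all levels in dB SPL).

    Below the compression threshold the channel is linear; above it,
    each `ratio` dB of input growth yields 1 dB of output growth.
    The result is capped at `limit_db` to mimic output limiting.
    """
    if input_db <= threshold_db:
        output_db = input_db + gain_db
    else:
        output_db = threshold_db + gain_db + (input_db - threshold_db) / ratio
    return min(output_db, limit_db)


# Hypothetical channel: 20 dB gain, 50 dB SPL threshold, 2:1 compression.
for level in (40.0, 60.0, 80.0):
    print(level, "->", wdrc_output_level(level, 20.0, 50.0, 2.0))
```

In the simulator described here, a function of this kind would apply separately in each of the 8 channels, with a second limiting stage after the channels are summed.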
Test Stimuli
Practice blocks using different talkers preceded each test block. Practice blocks had half the number of
talkers as the test blocks and included feedback about the correct response (test blocks did not).
• Consonants
– 240 nonsense syllables (vCv) presented in speech-shaped noise at 10 dB SNR
• 20 consonants x 3 vowel contexts (/a/, /i/, /u/) x 4 adult talkers (2 males, 2 females)
• Vowels
– 144 nonsense syllables (/hVd/) presented in speech-shaped noise at 5 dB SNR (Hillenbrand et
al., 1995)
• 12 vowels x 12 talkers (4 adult males, 4 adult females, 2 boys, 2 girls)
• Fricatives and Affricates
– 108 nonsense syllables (/iC/) presented in speech-shaped noise at 10 dB SNR
• 9 fricatives and affricates x 3 adult female talkers x 4 renditions
Conditions
There was 1 control condition with no NFC and 6 experimental conditions with NFC. All test conditions
were low-pass filtered at max output. A within-subjects, Latin square design was used. To help
orient listeners to the tasks, all listeners had exposure to one session without any filtering or
processing (wideband) before beginning the randomized test sequence.
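For readers unfamiliar with Latin square ordering, here is a minimal sketch for 7 conditions (1 control plus 6 NFC). The handout does not say how the square was constructed, so the cyclic construction and the condition labels below are assumptions for illustration only.

```python
def cyclic_latin_square(conditions):
    """Build an n x n Latin square by cyclically shifting the condition list.

    Each condition appears exactly once in every row (one listener's
    order) and once in every column (one serial position).
    """
    n = len(conditions)
    return [[conditions[(row + col) % n] for col in range(n)]
            for row in range(n)]


# Hypothetical labels: one control plus six NFC settings.
conditions = ["control"] + [f"NFC{i}" for i in range(1, 7)]
orders = cyclic_latin_square(conditions)

for listener, order in enumerate(orders, start=1):
    print(f"Listener {listener}: {', '.join(order)}")
```

With more listeners than conditions, rows would be reused across listeners. A cyclic square balances serial position but not which condition precedes which.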
[Condition tables, one per group: Moderately-severe to profound; Mild to moderate]
Results
For each of the figures below, proportion correct for each condition is assessed against performance for
the low-pass filtered controls (purple dotted line). Error bars indicate the 95% confidence interval of the
difference, after Bonferroni correction (* for p ≤ 0.05, ** for p ≤ 0.01, *** for p ≤ 0.001). “Best NFC
Setting” corresponds to the highest performance across the NFC conditions for each listener.
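The Bonferroni adjustment and the per-listener "Best NFC Setting" score can be sketched as follows. The number of comparisons, the p-values, and the scores are invented for illustration and are not results from this study. The star thresholds follow the legend above.

```python
def bonferroni_adjust(p_values):
    """Multiply each p-value by the number of comparisons, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def significance_stars(p):
    """Map an adjusted p-value to the star notation used in the figures."""
    if p <= 0.001:
        return "***"
    if p <= 0.01:
        return "**"
    if p <= 0.05:
        return "*"
    return ""


def best_nfc_score(scores_by_condition):
    """Highest proportion correct across one listener's NFC conditions."""
    return max(scores_by_condition.values())


# Made-up raw p-values for three comparisons against the control.
raw = [0.001, 0.02, 0.5]
for p_raw, p_adj in zip(raw, bonferroni_adjust(raw)):
    print(f"raw {p_raw:.3f} -> adjusted {p_adj:.3f} {significance_stars(p_adj)}")

# Made-up scores for one listener (hypothetical condition labels).
print(best_nfc_score({"NFC1": 0.62, "NFC2": 0.71, "NFC3": 0.68}))
```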
[Figure panels for each group: Moderately-Severe to Profound; Mild to Moderate]
Within-subjects ANOVA indicated that start and BW and the
interaction were statistically significant. The effect of BW
depended on the start. There was no effect of BW (CR) for
the 2239-Hz start, but for the 1550-Hz start, performance for
the two larger input BWs (also, higher CRs) was significantly
worse than for the 4996-Hz input BW.
Within-subjects ANOVA indicated that start and BW were
both statistically significant. Performance for the smaller BW
(lower CRs) was significantly better than the larger BW
(higher CRs). Performance for the lowest start was
significantly worse than for the other two.
Within-subjects ANOVA indicated that start and BW and the
interaction were statistically significant. The effect of BW
depended on the start. There was no effect of BW (CR) for
the 2239-Hz start, but for the 1550-Hz start, performance for
the 9130-Hz input BW was significantly worse than for the
7063-Hz input BW.
Within-subjects ANOVA indicated that only start was
statistically significant. Performance for the lowest start was
significantly worse than for the other two.
Within-subjects ANOVA indicated that there were no statistically
significant main effects or interaction.
Within-subjects ANOVA indicated that only start was
statistically significant. Performance for the 1550-Hz start
was significantly worse than for the 2756-Hz start.
Feature Analysis (VCV)
Plotted below are the differences in relative information transmitted for each condition and feature
compared to the low-pass filtered controls.
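Relative information transmitted is conventionally computed from a confusion matrix collapsed by feature category, in the style of Miller and Nicely (1955): the mutual information between stimulus and response categories divided by the entropy of the stimulus categories. The sketch below assumes that convention, which the handout does not spell out, and the confusion counts are invented.

```python
import math


def relative_information_transmitted(confusions):
    """Mutual information between stimulus (rows) and response (columns),
    divided by stimulus entropy. `confusions` is a list of count rows."""
    total = sum(sum(row) for row in confusions)
    row_p = [sum(row) / total for row in confusions]
    col_p = [sum(row[j] for row in confusions) / total
             for j in range(len(confusions[0]))]

    transmitted = 0.0
    for i, row in enumerate(confusions):
        for j, count in enumerate(row):
            if count > 0:
                p = count / total
                transmitted += p * math.log2(p / (row_p[i] * col_p[j]))

    stimulus_entropy = -sum(p * math.log2(p) for p in row_p if p > 0)
    return transmitted / stimulus_entropy


# Invented voicing matrix: rows = presented (voiced, voiceless),
# columns = responded (voiced, voiceless).
voicing = [[90, 10],
           [20, 80]]
print(f"{relative_information_transmitted(voicing):.3f}")
```

A value of 1 means the feature was transmitted perfectly and 0 means responses carried no information about it. The plotted quantity would then be the difference between each NFC condition and the control.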
[Figure panels for each group: Moderately-Severe to Profound; Mild to Moderate]
A lower start substantially increased errors for place of
articulation and nasality, especially with larger BW (higher CRs).
Interestingly, place of articulation and voicing were better for
the two higher starts despite no significant difference in
overall performance compared to the control. This indicates
that errors made with NFC were more systematic, while
errors for the control were more random.
/ʃ/ for /s/ Confusions
[Figure panels for each group: Moderately-Severe to Profound; Mild to Moderate]
Compared to the low-pass filtered control, /ʃ/ for /s/ errors
were substantially increased for all conditions, especially with
the lower start and the larger BWs (higher CRs).
/ʃ/ for /s/ errors are comparable to the low-pass filtered
control, except when the start is low and BW (CR) is large.
Personal Best NFC Settings
Plotted for each set of test stimuli are the number of listeners who had their best performance at each
of the NFC conditions. Listeners participated in an additional 1-hour task that involved /s/-/ʃ/
discrimination for each of the NFC conditions. The conditions that yielded best performance on this task
did NOT predict the conditions that yielded best performance in the main experiment.
[Figure panels for each group: Moderately-Severe to Profound; Mild to Moderate]
Discussion
Overall, the results demonstrate that improvements in fricative/affricate identification should be
expected when using NFC for a variety of hearing losses. However, in some cases this might come at the
expense of a decrease in vowel and non-fricative consonant identification. The results also indicate that
low start frequencies should be avoided and that, in cases where the bandwidth of audibility is
restricted, it is better to accept an increase in CR (which reduces spectral resolution of the lowered
signal) in exchange for a higher start frequency. When this happens, a slightly lower CR can be maintained by
bringing less high-frequency information down into the range of audibility. This strategy can help
preserve vowel and non-fricative consonant identification. However, if the reduction in high-frequency
information is too great, fricative/affricate identification might not be optimized. In cases where the
bandwidth of audibility is less restricted, attempts should be made to keep the start frequency above
the range of most second formants. If this is done and if a sufficient amount of high-frequency
information is brought down, CR seems to be less important. Finally, attempts to identify the best NFC
setting on an individual basis using an /s/-/ʃ/ discrimination task or rules that simply maximize input
bandwidth are limited. Recommendations also need to consider the start frequency, the CR, and the
interaction between the two.