Real-time Implementation of Multi-band Frequency Compression for Listeners with Moderate...

Real-time Implementation of Multi-band Frequency Compression for Listeners with Moderate Sensorineural Impairment [Ref.: N. Tiwari, P. C. Pandey, P. N. Kulkarni, Proc. Interspeech 2012, Portland, Oregon, 9-13 Sept 2012, Paper 689]

Transcript of Real-time Implementation of Multi-band Frequency Compression for Listeners with Moderate...

  • Slide 1
  • Slide 2
  • Real-time Implementation of Multi-band Frequency Compression for Listeners with Moderate Sensorineural Impairment [Ref.: N. Tiwari, P. C. Pandey, P. N. Kulkarni, Proc. Interspeech 2012, Portland, Oregon, 9-13 Sept 2012, Paper 689]
  • Slide 3
  • Outline: 1. Introduction; 2. Signal processing; 3. Real-time implementation; 4. Results; 5. Conclusion
  • Slide 4
  • 1. Introduction: signal processing for reducing the effects of increased intra-speech spectral masking
      - Binaural hearing (moderate bilateral loss): binaural dichotic presentation (Lunner et al. 1993, Kulkarni et al. 2012)
      - Monaural hearing: spectral contrast enhancement (Yang et al. 2003), where errors in identifying peaks and increased dynamic range lead to degraded speech perception; multi-band frequency compression (Arai et al. 2004, Kulkarni et al. 2012)
      - Multi-band frequency compression: presentation of speech energy in relatively narrow bands to avoid masking by adjacent spectral components, without causing perceptible distortions
  • Slide 5
  • Research objective: implementation of multi-band frequency compression for real-time operation, for use in hearing aids
      - Reduction in computational requirement for implementation on a DSP chip, without speech quality degradation
      - Low algorithmic and computational delay for real-time implementation on a DSP board with a 16-bit fixed-point processor
  • Slide 6
  • 2. Signal Processing: signal processing for multi-band frequency compression
      - Processing steps: segmentation and spectral analysis; spectral modification; resynthesis
      - Multi-band frequency compression (Arai et al. 2004): processing using auditory critical bandwidth; compressed magnitude and original phase spectrum combined into the complex spectrum
      - Multi-band frequency compression (Kulkarni et al. 2009, 2012): investigations using different bandwidths, segmentation, and frequency mappings; processing using the complex spectrum; reduced computation and processing artifacts
  • Slide 7
  • Multi-band frequency compression (Kulkarni et al. 2009, 2012): signal processing
      - Segmentation using an analysis window with 50 % overlap: fixed-frame (FF) segmentation with window length L = 20 ms; pitch-synchronous (PS) segmentation with L = 2 pitch periods (voiced) or the previous L (unvoiced)
      - Zero padding to an N-point frame; complex spectra by N-point DFT
      - Analysis bands for compression: fixed bandwidth; one-third octave; auditory critical band (ACB)
      - Mappings for spectral modification: sample-to-sample; spectral sample superposition; spectral segment
      - Resynthesis by the overlap-add method
      - Figure: frequency mapping with compression factor = 0.6, as shown by Kulkarni et al. 2012
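The frequency-mapping figure is not reproduced in this transcript, so the following is only a minimal sketch of what a per-band mapping with compression factor 0.6 could look like. It assumes that each analysis band is compressed linearly towards its arithmetic centre; the function name `compress_frequency` and the example band edges are hypothetical and are not taken from the slides.

```python
def compress_frequency(f_hz, band_lo_hz, band_hi_hz, c=0.6):
    """Map a frequency inside one analysis band towards the band centre.

    Assumption: linear compression about the arithmetic band centre,
    f' = fc + c * (f - fc), with compression factor c in (0, 1].
    """
    if not band_lo_hz <= f_hz <= band_hi_hz:
        raise ValueError("frequency lies outside the analysis band")
    centre = 0.5 * (band_lo_hz + band_hi_hz)
    return centre + c * (f_hz - centre)


# Hypothetical band from 1000 Hz to 2000 Hz (centre 1500 Hz).
low_edge = compress_frequency(1000.0, 1000.0, 2000.0)
high_edge = compress_frequency(2000.0, 1000.0, 2000.0)
```

Under this assumption the band's content ends up occupying a fraction c of the original bandwidth, which matches the stated aim of presenting speech energy in relatively narrow bands.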
  • Slide 8
  • Details of signal processing
      - Analysis bands for compression: 18 bands based on ACB
      - Mapping for spectral modification: spectral segment mapping (no irregular variations in the spectrum)
      - Resynthesis: N-point IDFT, followed by reconstruction by the overlap-add method
      - Figure: spectral segment mapping, as shown by Kulkarni et al. 2012
      - Figure: (a) unprocessed and (b) processed (compression factor = 0.6) spectra of the vowel /i/ and broadband noise
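Since the spectral segment mapping figure is missing here, the sketch below shows one way such a mapping might be realized on the DFT bins of a single band. It is an illustration under stated assumptions, not the authors' code: each output bin is taken to collect the input spectral segment that maps onto it, with edge bins weighted by their fractional overlap, and compression is assumed to be towards the band centre. The name `segment_map_band` and the 10-bin test band are hypothetical.

```python
def segment_map_band(X, k_start, k_end, c=0.6):
    """Compress bins k_start..k_end (inclusive) of the complex spectrum X
    towards the band centre and return the new values for those bins.

    Assumption: output bin k' covers [k' - 0.5, k' + 0.5]; its source
    segment is that interval expanded by 1/c about the band centre,
    clipped to the band, and input bins contribute in proportion to
    their overlap with the segment.
    """
    kc = 0.5 * (k_start + k_end)                 # band centre (may be fractional)
    lo_edge, hi_edge = k_start - 0.5, k_end + 0.5
    out = []
    for kp in range(k_start, k_end + 1):
        a = max(kc + (kp - 0.5 - kc) / c, lo_edge)   # segment start
        b = min(kc + (kp + 0.5 - kc) / c, hi_edge)   # segment end
        acc = 0j
        if b > a:
            for k in range(k_start, k_end + 1):
                overlap = min(b, k + 0.5) - max(a, k - 0.5)
                if overlap > 0:
                    acc += overlap * X[k]
        out.append(acc)
    return out


# Flat spectrum over a hypothetical 10-bin band.
flat = [1 + 0j] * 10
mapped = segment_map_band(flat, 0, 9, c=0.6)
```

In this sketch the source segments tile the band without gaps, so the sum of the complex samples in the band is preserved while the outer bins become zero. Because every output bin integrates a contiguous stretch of the input spectrum, a smooth input yields a smooth output, which is consistent with the slide's remark about no irregular variations.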
  • Slide 9
  • Evaluation and results: Modified Rhyme Test (MRT)
      - Best results with PS segmentation, ACB, spectral segment mapping, and a compression factor of 0.6
      - Normal-hearing subjects with loss simulated by additive noise: 17 % improvement in recognition scores; 0.88 s decrease in response time
      - Subjects with moderate sensorineural loss: 16.5 % improvement in recognition scores; 0.89 s decrease in response time
  • Slide 10
  • Real-time implementation requirements: comparison of processing schemes with different segmentation
      - FF processing (fixed-frame segmentation): discontinuities masked by 50 % overlap-add; perceptible distortions in the form of another superimposed pitch related to the shift duration
      - PS processing (pitch-synchronous segmentation): less perceptible distortions; requires pitch estimation (large computational requirement); not suitable for music
      - Modified segmentation scheme for real-time implementation: lower perceptual distortions; lower computational requirement
  • Slide 11
  • Modified multi-band frequency compression (LSEE processing)
      - Griffin and Lim's method of signal estimation from the modified STFT: minimize the mean square error between the STFT of the estimated signal and the modified STFT; multiplication by the analysis window before overlap-add
      - Window requirement: for partial overlap, only a few windows meet the requirement
      - Fixed window length L and shift S, with S = L/4 (75 % overlap)
      - Modified Hamming window, where p = 0.54 and q = -0.46
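The window equation on this slide did not survive in the transcript. The sketch below assumes the modified Hamming form associated with Griffin and Lim [15], w(n) = (2√S / √((4p² + 2q²)L)) · [p + q cos(2π(n + 0.5)/L)] for 0 ≤ n < L; treat that form, and the helper names `modified_hamming` and `lsee_overlap_add`, as assumptions rather than the authors' definitions. With S = L/4 the squared, shifted windows sum to one, so the least-squares estimate reduces to multiplying each resynthesized frame by the analysis window and overlap-adding.

```python
import math


def modified_hamming(L, S, p=0.54, q=-0.46):
    """Modified Hamming window (assumed form) scaled so that the squared
    windows, shifted by S = L/4, sum to one at every sample."""
    scale = 2.0 * math.sqrt(S) / math.sqrt((4 * p * p + 2 * q * q) * L)
    return [scale * (p + q * math.cos(2 * math.pi * (n + 0.5) / L))
            for n in range(L)]


def lsee_overlap_add(frames, window, S):
    """Multiply each time-domain frame by the analysis window again and
    overlap-add with shift S. No normalization is applied because the
    window is assumed to satisfy the sum-of-squares condition."""
    L = len(window)
    out = [0.0] * (S * (len(frames) - 1) + L)
    for m, frame in enumerate(frames):
        for n in range(L):
            out[m * S + n] += window[n] * frame[n]
    return out


# Round trip without spectral modification: windowed analysis frames
# followed by LSEE synthesis should return the input signal wherever
# four frames overlap.
L, S = 260, 65
w = modified_hamming(L, S)
x = [math.sin(0.05 * n) for n in range(L + 8 * S)]
num_frames = (len(x) - L) // S + 1
frames = [[w[n] * x[m * S + n] for n in range(L)] for m in range(num_frames)]
y = lsee_overlap_add(frames, w, S)
```

The first and last 3S samples are covered by fewer than four frames, so only the interior of `y` matches `x` in this round trip.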
  • Slide 12
  • Investigations using offline implementation: comparison of analysis-synthesis techniques for multi-band frequency compression
      - Techniques compared: pitch-synchronous implementation; LSEE-method-based implementation
      - Evaluation: informal listening; objective evaluation using the PESQ measure (0 to 4.5)
      - Test material: the sentence "Where were you a year ago?" from a male speaker; vowels /a/-/i/-/u/; music
      - Results: PESQ score (compression factor range 1 to 0.6): 4.3 to 3.7 (sentence / vowels)
      - Figure: comparison of analysis-synthesis methods: (a) unprocessed, (b) FF, (c) PS, and (d) LSEE. Processing with spectral segment mapping, auditory critical bandwidth, and compression factor = 0.6
  • Slide 13
  • Processed outputs from offline implementation (table of audio clips): input signals (vowels /a/-/i/-/u/; "Where were you a year ago?"; music clip) processed by fixed-frame MFC, pitch-synchronous, and fixed-frame LSEE methods, each with compression factor = 1, 0.8, and 0.6. Conclusion: LSEE processing suited for non-speech and audio applications
  • Slide 14
  • 3. Real-time Implementation
      - 16-bit fixed-point DSP: TI/TMS320C5515
      - 16 MB memory space: 320 KB on-chip RAM with 64 KB DARAM; 128 KB on-chip ROM
      - Three 32-bit programmable timers; 4 DMA controllers, each with 4 channels
      - FFT hardware accelerator (8- to 1024-point FFT)
      - Maximum clock speed: 120 MHz
      - DSP board: eZdsp, with 4 MB on-board NOR flash for user program
      - Codec TLV320AIC3204: stereo ADC and DAC; 16/20/24/32-bit quantization; 8 to 192 kHz sampling
      - Development environment for C: TI's CCStudio, ver. 4.0
  • Slide 15
  • Implementation details: signal acquisition and processing
      - ADC and DAC quantization: 16-bit
      - Sampling rate: 10 kHz
      - FFT length N: 1024 / 512
      - Analysis-synthesis: LSEE-based fixed frame, 260-sample (26 ms) window, 75 % overlap
      - Mapping for spectral modification: spectral segment mapping with auditory critical bands
      - Input samples, spectral values, and processed output: 32 bits each, with 16-bit real and imaginary parts
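The 32-bit complex format mentioned above can be illustrated with a short sketch. The slides state only that each value has 16-bit real and imaginary parts; the Q15 scaling, the choice of placing the real part in the upper half-word, and the helper names `to_q15`, `pack_complex`, and `unpack_complex` are all assumptions made for illustration.

```python
def to_q15(x):
    """Convert a float in [-1, 1) to a saturated 16-bit Q15 integer
    (assumed number format; not stated on the slides)."""
    v = int(round(x * 32768))
    return max(-32768, min(32767, v))


def pack_complex(re_q15, im_q15):
    """Pack two signed 16-bit values into one 32-bit word.
    Assumption: real part in the upper 16 bits, imaginary in the lower."""
    return ((re_q15 & 0xFFFF) << 16) | (im_q15 & 0xFFFF)


def unpack_complex(word):
    """Recover the signed 16-bit real and imaginary parts."""
    def s16(v):
        return v - 0x10000 if v & 0x8000 else v
    return s16((word >> 16) & 0xFFFF), s16(word & 0xFFFF)


word = pack_complex(to_q15(-0.25), to_q15(0.5))
re_part, im_part = unpack_complex(word)
```

A real input sample would simply carry a zero imaginary part in such a layout.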
  • Slide 16
  • Data transfer and buffering operations (S = L/4)
      - DMA cyclic buffers: 5-block input buffer and 2-block output buffer (each block of S samples)
      - Four pointers: current input block, current output block, just-filled input block, write-to output block (pointers incremented cyclically on DMA interrupt)
      - Efficient realization of 75 % overlap and zero padding
      - Total delay: algorithmic delay L = 4S samples (26 ms); computational delay L/4 = S samples (6.5 ms)
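The buffering scheme can be simulated with a small sketch. This is one plausible reading of the slide, not the board firmware: the class `BlockBuffers`, its method names, and the exact pointer update order are hypothetical. Four of the five input blocks hold the most recent L = 4S samples while the fifth is being filled, and the analysis frame is assembled from those four blocks and zero-padded to N.

```python
class BlockBuffers:
    """Cyclic block buffers: 5 input blocks and 2 output blocks of S samples."""

    def __init__(self, S, n_in=5, n_out=2):
        self.S = S
        self.inp = [[0] * S for _ in range(n_in)]
        self.out = [[0] * S for _ in range(n_out)]
        self.cur_in = 0    # input block currently being filled
        self.cur_out = 0   # output block currently being played out

    @property
    def just_filled(self):
        return (self.cur_in - 1) % len(self.inp)

    @property
    def write_to(self):
        return (self.cur_out + 1) % len(self.out)

    def on_dma_interrupt(self, new_block):
        """Simulate a DMA interrupt: the current input block is complete,
        so all pointers advance cyclically."""
        self.inp[self.cur_in] = list(new_block)
        self.cur_in = (self.cur_in + 1) % len(self.inp)
        self.cur_out = (self.cur_out + 1) % len(self.out)

    def analysis_frame(self, N):
        """Return the latest four filled blocks, oldest first, zero-padded to N."""
        frame = []
        for back in range(3, -1, -1):
            frame.extend(self.inp[(self.just_filled - back) % len(self.inp)])
        return frame + [0] * (N - len(frame))

    def write_output(self, block):
        """Store a processed block where the output DMA will read next."""
        self.out[self.write_to] = list(block)


def delay_ms(samples, fs_hz):
    """Duration of a number of samples in milliseconds."""
    return 1000.0 * samples / fs_hz


buf = BlockBuffers(S=2)
for value in range(1, 7):
    buf.on_dma_interrupt([value, value])
frame = buf.analysis_frame(N=16)
```

With the stated parameters (L = 260 samples, S = 65 samples, 10 kHz sampling), `delay_ms` gives 26 ms and 6.5 ms, which add up to 32.5 ms; the results slide reports a total processing delay of 35 ms, and the slides do not itemize the remainder.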
  • Slide 17
  • 4. Results: comparison of processed output from offline implementation and DSP board
      - Test material: the sentence "Where were you a year ago?"; vowels /a/-/i/-/u/; music
      - Evaluation methods: informal listening; PESQ measure (0 to 4.5)
      - Lowest clock for satisfactory operation: 20 MHz / 12 MHz (N = 1024 / 512)
      - No perceptible change in output with N = 512; processing capacity used is about 1/10 of the capacity at the highest clock (120 MHz)
      - Informal listening: processed output from the DSP is similar to the corresponding output from the offline implementation
      - PESQ scores (compression factor range 0.6 to 1): 2.5 to 3.4 for the sentence and 3.0 to 3.4 for vowels
      - Total processing delay: 35 ms
  • Slide 18
  • Processed outputs from DSP board (table of audio clips): input signals (vowels /a/-/i/-/u/; "Where were you a year ago?"; music clip) processed by fixed-frame LSEE MFC with compression factor = 1, 0.8, and 0.6
  • Slide 19
  • 5. Conclusion
      - Multi-band frequency compression using LSEE processing: processed speech output is perceptually similar to that from the earlier established pitch-synchronous processing; suitable for speech as well as non-speech audio signals; well suited for real-time processing due to the use of fixed-frame segmentation
      - Implementation for real-time operation using the 16-bit fixed-point processor TI/TMS320C5515: processing capacity used is about 1/10; delay = 35 ms
      - Further work: combining multi-band frequency compression with other types of processing for use in hearing aids, namely frequency-selective gain, multi-band dynamic range compression (settable gain and compression ratios), noise reduction, and CVR modification
  • Slide 20
  • THANK YOU
  • Slide 21
  • Abstract: Widening of auditory filters in persons with sensorineural hearing impairment leads to increased spectral masking and degraded speech perception. Multi-band frequency compression of the complex spectral samples using pitch-synchronous processing has been reported to increase speech perception by persons with moderate sensorineural loss. It is shown that implementation of multi-band frequency compression using fixed-frame processing along with least-squares error based signal estimation reduces the processing delay and the speech output is perceptually similar to that from pitch-synchronous processing. The processing is implemented on a DSP board based on the 16-bit fixed-point processor TMS320C5515, and real-time operation is achieved using about one-tenth of its computing capacity.
  • Slide 22
  • References
      [1] H. Levitt, J. M. Pickett, and R. A. Houde (eds.), Sensory Aids for the Hearing Impaired. New York: IEEE Press, 1980.
      [2] B. C. J. Moore, An Introduction to the Psychology of Hearing. London, UK: Academic, 1997, pp. 66-107.
      [3] J. M. Pickett, The Acoustics of Speech Communication: Fundamentals, Speech Perception Theory, and Technology. Boston, Mass.: Allyn & Bacon, 1999, pp. 289-323.
      [4] H. Dillon, Hearing Aids. New York: Thieme Medical, 2001.
      [5] T. Lunner, S. Arlinger, and J. Hellgren, "8-channel digital filter bank for hearing aid use: preliminary results in monaural, diotic, and dichotic modes," Scand. Audiol. Suppl., vol. 38, pp. 75-81, 1993.
      [6] P. N. Kulkarni, P. C. Pandey, and D. S. Jangamashetti, "Binaural dichotic presentation to reduce the effects of spectral masking in moderate bilateral sensorineural hearing loss," Int. J. Audiol., vol. 51, no. 4, pp. 334-344, 2012.
      [7] T. Baer, B. C. J. Moore, and S. Gatehouse, "Spectral contrast enhancement of speech in noise for listeners with sensorineural hearing impairment: effects on intelligibility, quality, and response times," Int. J. Rehab. Res., vol. 30, no. 1, pp. 49-72, 1993.
      [8] J. Yang, F. Luo, and A. Nehorai, "Spectral contrast enhancement: Algorithms and comparisons," Speech Commun., vol. 39, no. 1-2, pp. 33-46, 2003.
      [9] T. Arai, K. Yasu, and N. Hodoshima, "Effective speech processing for various impaired listeners," in Proc. 18th Int. Cong. Acoust., 2004, Kyoto, Japan, pp. 1389-1392.
      [10] K. Yasu, M. Hishitani, T. Arai, and Y. Murahara, "Critical-band based frequency compression for digital hearing aids," Acoustical Science and Technology, vol. 25, no. 1, pp. 61-63, 2004.
      [11] P. N. Kulkarni, P. C. Pandey, and D. S. Jangamashetti, "Multi-band frequency compression for reducing the effects of spectral masking," Int. J. Speech Tech., vol. 10, no. 4, pp. 219-227, 2009.
      [12] P. N. Kulkarni, P. C. Pandey, and D. S. Jangamashetti, "Multi-band frequency compression for improving speech perception by listeners with moderate sensorineural hearing loss," Speech Commun., vol. 54, no. 3, pp. 341-350, 2012.
      [13] E. Zwicker, "Subdivision of the audible frequency range into critical bands (Frequenzgruppen)," J. Acoust. Soc. Am., vol. 33, no. 2, pp. 248, 1961.
      [14] D. G. Childers and H. T. Hu, "Speech synthesis by glottal excited linear prediction," J. Acoust. Soc. Am., vol. 96, no. 4, pp. 2026-2036, 1994.
  • Slide 23
  • [15] D. W. Griffin and J. S. Lim, "Signal estimation from modified short-time Fourier transform," IEEE Trans. Acoustics, Speech, Signal Proc., vol. 32, no. 2, pp. 236-243, 1984.
      [16] X. Zhu, G. T. Beauregard, and L. L. Wyse, "Real-time signal estimation from modified short-time Fourier transform magnitude spectra," IEEE Trans. Audio, Speech, Language Process., vol. 15, no. 5, pp. 1645-1653, 2007.
      [17] Texas Instruments Inc., TMS320C5515 Fixed-Point Digital Signal Processor. 2011, [online] Available: http://focus.ti.com/lit/ds/symlink/tms320c5515.pdf.
      [18] Texas Instruments Inc., TLV320AIC3204 Ultra Low Power Stereo Audio Codec. 2008, [online] Available: http://focus.ti.com/lit/ds/symlink/tlv320aic3204.pdf.