MITSUBISHI ELECTRIC RESEARCH LABORATORIES
http://www.merl.com

Deep Long Short-Term Memory Adaptive Beamforming Networks for Multichannel Robust Speech Recognition

Meng, Z.; Watanabe, S.; Hershey, J.R.; Erdogan, H.

TR2017-012 March 2017

Abstract
Far-field speech recognition in noisy and reverberant conditions remains a challenging problem despite recent deep learning breakthroughs. This problem is commonly addressed by acquiring a speech signal from multiple microphones and performing beamforming over them. In this paper, we propose to use a recurrent neural network with long short-term memory (LSTM) architecture to adaptively estimate real-time beamforming filter coefficients to cope with non-stationary environmental noise and the dynamic nature of source and microphone positions, which results in a set of time-varying room impulse responses. The LSTM adaptive beamformer is jointly trained with a deep LSTM acoustic model to predict senone labels. Further, we use hidden units in the deep LSTM acoustic model to assist in predicting the beamforming filter coefficients. The proposed system achieves 7.97% absolute gain over baseline systems with no beamforming on the CHiME-3 real evaluation set.

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)

This work may not be copied or reproduced in whole or in part for any commercial purpose. Permission to copy in whole or in part without payment of fee is granted for nonprofit educational and research purposes provided that all such whole or partial copies include the following: a notice that such copying is by permission of Mitsubishi Electric Research Laboratories, Inc.; an acknowledgment of the authors and individual contributions to the work; and all applicable portions of the copyright notice. Copying, reproduction, or republishing for any other purpose shall require a license with payment of fee to Mitsubishi Electric Research Laboratories, Inc. All rights reserved.

Copyright © Mitsubishi Electric Research Laboratories, Inc., 2017
201 Broadway, Cambridge, Massachusetts 02139
