SINGLE CHANNEL SPEECH MUSIC SEPARATION USING NONNEGATIVE MATRIX FACTORIZATION AND SPECTRAL MASKS

Jain-De Lee. Emad M. Grais, Hakan Erdogan. 17th International Conference on Digital Signal Processing, 2011


Page 1

SINGLE CHANNEL SPEECH MUSIC SEPARATION USING NONNEGATIVE MATRIX FACTORIZATION AND SPECTRAL MASKS

Jain-De Lee

Emad M. Grais, Hakan Erdogan

17th International Conference on Digital Signal Processing, 2011

Page 2

Outline

INTRODUCTION

NON-NEGATIVE MATRIX FACTORIZATION

SIGNAL SEPARATION AND MASKING

EXPERIMENTS AND DISCUSSION

CONCLUSION

Page 3

Introduction

There are two main stages in this work

– Training stage

– Separation stage

NMF is used with different types of masks to improve the separation process

– The separation process is faster

– NMF needs fewer iterations

Page 4

Introduction

Problem formulation

– We observe a signal x(t), which is the mixture of two sources s(t) and m(t)

$$ X(t,f) = S(t,f) + M(t,f) $$

$$ |X(t,f)|\, e^{j\phi_X(t,f)} = |S(t,f)|\, e^{j\phi_S(t,f)} + |M(t,f)|\, e^{j\phi_M(t,f)} $$

where X(t, f) is the STFT of x(t)

– Assume the sources have the same phase angle as the mixed signal, so that the magnitudes add

$$ |X(t,f)| = |S(t,f)| + |M(t,f)| $$

In matrix form, with X, S and M denoting the magnitude spectrograms

X = S + M
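The role of the equal-phase assumption can be checked numerically. The sketch below is illustrative only: the array sizes, the random magnitudes and the variable names are assumptions, not data from the presentation. It shows that magnitudes add exactly when both sources share one phase, and that with independent phases the relation X = S + M on magnitudes is only an approximation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical magnitudes for speech and music on a small (frequency x time) grid.
S_mag = rng.random((4, 3))
M_mag = rng.random((4, 3))

# Shared phase: both sources are assumed to have the phase of the mixture.
phase = rng.uniform(-np.pi, np.pi, size=(4, 3))
S = S_mag * np.exp(1j * phase)
M = M_mag * np.exp(1j * phase)
X = S + M

# With a common phase, |X| equals |S| + |M| up to rounding error.
same_phase_gap = np.max(np.abs(np.abs(X) - (S_mag + M_mag)))

# With independent phases, |S + M| <= |S| + |M| (triangle inequality),
# so the additive magnitude model overestimates the mixture magnitude.
M_other = M_mag * np.exp(1j * rng.uniform(-np.pi, np.pi, size=(4, 3)))
diff_phase_gap = np.max((S_mag + M_mag) - np.abs(S + M_other))
```

Here `same_phase_gap` is at the level of floating-point rounding, while `diff_phase_gap` is a visible positive gap.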

Page 5

Non-negative Matrix Factorization

Non-negative matrix factorization algorithm

$$ V \approx BW $$

Minimization problem

$$ \min_{B,W} C(V, BW) \quad \text{subject to elements of } B, W \ge 0 $$

Different cost functions C of NMF

– Euclidean distance

– KL divergence
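A minimal sketch of one common way to solve this minimization, assuming the Lee and Seung multiplicative updates for the KL divergence cost (the slide names the cost functions but not the update rule, so the update rule is an assumption). The function names `nmf_kl` and `kl_divergence`, the iteration count and the synthetic matrix are all illustrative choices.

```python
import numpy as np

def nmf_kl(V, n_basis, n_iter=200, eps=1e-12, seed=0):
    """Factorize nonnegative V (F x T) as B (F x n_basis) @ W (n_basis x T)."""
    rng = np.random.default_rng(seed)
    F, T = V.shape
    B = rng.random((F, n_basis)) + eps
    W = rng.random((n_basis, T)) + eps
    ones = np.ones_like(V)
    for _ in range(n_iter):
        # Update the weights W with the bases B held fixed.
        W *= (B.T @ (V / (B @ W + eps))) / (B.T @ ones + eps)
        # Update the bases B with the weights W held fixed.
        B *= ((V / (B @ W + eps)) @ W.T) / (ones @ W.T + eps)
    return B, W

def kl_divergence(V, V_hat, eps=1e-12):
    """Generalized KL divergence D(V || V_hat), one choice of the cost C."""
    return np.sum(V * np.log((V + eps) / (V_hat + eps)) - V + V_hat)

rng = np.random.default_rng(1)
V = rng.random((20, 5)) @ rng.random((5, 30))   # synthetic nonnegative matrix
B, W = nmf_kl(V, n_basis=5)
```

Because the updates only multiply by nonnegative ratios, B and W stay nonnegative as long as they are initialized that way, which is how the constraint is enforced without an explicit projection step.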

Page 6

Non-negative Matrix Factorization

The training magnitude spectrograms S_train and M_train are factorized by NMF

$$ S_{\text{train}} \approx B_{\text{speech}} W_{\text{speech}} $$

$$ M_{\text{train}} \approx B_{\text{music}} W_{\text{music}} $$

Larger number of basis vectors

– Lower approximation error

– Redundant set of basis vectors

– Requires more computation time

Page 7

Signal Separation and Masking

NMF is used to decompose the magnitude spectrogram matrix X

$$ X \approx [B_{\text{speech}} \;\; B_{\text{music}}]\, W $$

The initial spectrogram estimates for the speech and music signals are calculated, respectively, as follows

$$ \tilde{S} = B_{\text{speech}} W_S $$

$$ \tilde{M} = B_{\text{music}} W_M $$

where W_S and W_M are submatrices of the matrix W
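The separation step can be sketched as follows, assuming the trained bases are held fixed and only the weights W are updated with the KL multiplicative rule (the update rule and the iteration count are assumptions). The function name `separate_initial` and the random stand-in bases are hypothetical.

```python
import numpy as np

def separate_initial(X, B_speech, B_music, n_iter=100, eps=1e-12, seed=0):
    """Estimate weights W for fixed bases, then split W into W_S and W_M."""
    B = np.hstack([B_speech, B_music])           # B = [B_speech  B_music]
    rng = np.random.default_rng(seed)
    W = rng.random((B.shape[1], X.shape[1])) + eps
    ones = np.ones_like(X)
    for _ in range(n_iter):
        # KL multiplicative update for W only; the bases are not changed.
        W *= (B.T @ (X / (B @ W + eps))) / (B.T @ ones + eps)
    k = B_speech.shape[1]
    W_S, W_M = W[:k], W[k:]                      # submatrices of W
    return B_speech @ W_S, B_music @ W_M         # initial estimates of S and M

rng = np.random.default_rng(2)
B_speech = rng.random((16, 4))                   # stand-ins for trained bases
B_music = rng.random((16, 3))
X = B_speech @ rng.random((4, 10)) + B_music @ rng.random((3, 10))
S_tilde, M_tilde = separate_initial(X, B_speech, B_music)
```

The rows of W that multiply the speech bases form W_S and the remaining rows form W_M, so the split follows directly from the column order used when the two basis matrices are concatenated.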

Page 8

Signal Separation and Masking

Use the initial estimated spectrograms $\tilde{S}$ and $\tilde{M}$ to build a mask as follows

$$ H = \frac{\tilde{S}^p}{\tilde{S}^p + \tilde{M}^p} $$

Source signal reconstruction

$$ \hat{S} = H \otimes X $$

$$ \hat{M} = (1 - H) \otimes X $$

where 1 is a matrix of ones, ⊗ is element-wise multiplication, and the powers and the division in H are also element-wise
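A minimal sketch of the mask and the reconstruction, with element-wise powers, division and multiplication. The function names `spectral_mask` and `apply_mask`, the small `eps` guard against division by zero, and the random stand-in spectrograms are assumptions for illustration.

```python
import numpy as np

def spectral_mask(S_tilde, M_tilde, p, eps=1e-12):
    """H = S~^p / (S~^p + M~^p), with element-wise power and division."""
    Sp = S_tilde ** p
    Mp = M_tilde ** p
    return Sp / (Sp + Mp + eps)

def apply_mask(H, X):
    """Speech estimate H * X and music estimate (1 - H) * X, element-wise."""
    return H * X, (1.0 - H) * X

rng = np.random.default_rng(3)
S_tilde = rng.random((6, 4)) + 0.1    # stand-in initial speech estimate
M_tilde = rng.random((6, 4)) + 0.1    # stand-in initial music estimate
X = rng.random((6, 4)) + 0.1          # stand-in mixture magnitude spectrogram

H = spectral_mask(S_tilde, M_tilde, p=2)
S_hat, M_hat = apply_mask(H, X)
```

Because the two masks H and 1 - H sum to one, the two estimates always add up to the mixture spectrogram X, regardless of how well the NMF approximation fits X.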

Page 9

Signal Separation and Masking

Two specific values of p correspond to special masks

– Wiener filter (soft mask), p = 2

$$ H_{\text{Wiener}} = \frac{\tilde{S}^2}{\tilde{S}^2 + \tilde{M}^2} $$

– Hard mask, p → ∞

$$ H_{\text{hard}} = \operatorname{round}\!\left( \frac{\tilde{S}^2}{\tilde{S}^2 + \tilde{M}^2} \right) $$
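The two special cases can be compared on toy data. In this sketch the random estimates, the helper `mask` and the choice p = 200 as a "large" exponent are assumptions. It illustrates that rounding the Wiener mask selects the bins where the speech estimate dominates, and that increasing p moves every mask entry toward that hard decision.

```python
import numpy as np

rng = np.random.default_rng(4)
S_tilde = rng.random((5, 5)) + 0.05   # stand-in initial speech estimate
M_tilde = rng.random((5, 5)) + 0.05   # stand-in initial music estimate

def mask(p):
    """Element-wise mask S~^p / (S~^p + M~^p)."""
    return S_tilde ** p / (S_tilde ** p + M_tilde ** p)

H_wiener = mask(2)                    # soft mask, values strictly between 0 and 1
H_hard = np.round(H_wiener)           # hard mask, values in {0, 1}

# The hard mask keeps exactly the bins where the speech estimate is larger.
dominant = (S_tilde > M_tilde).astype(float)

# A large finite exponent (hypothetical choice) saturates toward the hard mask.
H_large_p = mask(200)
```

Every entry of `H_large_p` is at least as close to the hard decision as the corresponding entry of `H_wiener`, which is the sense in which p controls how strongly the mask saturates.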

Page 10

Signal Separation and Masking

The value of the mask versus the linear ratio for different values of p
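The shape of these curves can be written down directly, assuming "linear ratio" refers to the element-wise ratio of the two initial estimates (the symbol r is introduced here for illustration and is not from the slides):

$$ H = \frac{\tilde{S}^p}{\tilde{S}^p + \tilde{M}^p} = \frac{r^p}{1 + r^p}, \qquad r = \frac{\tilde{S}}{\tilde{M}} $$

At r = 1 the mask equals 1/2 for every p. Away from r = 1 a larger p pushes the mask toward 0 or 1 more quickly: for r = 2, the mask is 2/3 ≈ 0.667 at p = 1, 4/5 = 0.8 at p = 2, and 16/17 ≈ 0.941 at p = 4.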

Page 11

Experiments and Discussion

Simulation

– 16 kHz sampling rate

– Speech

• Training speech data: 540 short utterances

• Testing speech data: 20 utterances

– Music

• 38 pieces for training

• One piece for testing

– Hamming window: 512 points

– FFT size: 512 points
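These analysis settings determine the size of the magnitude spectrograms. The sketch below uses the sampling rate, window and FFT size listed above; the hop size of 256 samples, the function name `magnitude_spectrogram` and the 1 kHz test tone are assumptions, since the slide does not state the frame overlap.

```python
import numpy as np

FS = 16_000      # sampling rate (Hz)
N_WIN = 512      # Hamming window length (samples)
N_FFT = 512      # FFT size
HOP = 256        # assumption: 50% overlap; the hop size is not given

def magnitude_spectrogram(x):
    """Return |STFT| with shape (N_FFT // 2 + 1, n_frames)."""
    window = np.hamming(N_WIN)
    n_frames = 1 + (len(x) - N_WIN) // HOP
    frames = np.stack([x[i * HOP : i * HOP + N_WIN] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, n=N_FFT, axis=1)).T

t = np.arange(FS) / FS                    # one second of audio
x = np.sin(2 * np.pi * 1000.0 * t)        # 1 kHz test tone
X = magnitude_spectrogram(x)

frame_ms = 1000.0 * N_WIN / FS            # frame duration in milliseconds
peak_bin = int(np.argmax(X[:, 0]))        # bin spacing is FS / N_FFT = 31.25 Hz
```

With these settings each frame covers 32 ms and each column of the spectrogram has 257 non-negative-frequency bins, so every NMF basis vector is a 257-dimensional spectrum.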

Page 12

Experiments and Discussion

Page 13

Experiments and Discussion

Page 14

Experiments and Discussion

Page 15

Experiments and Discussion

Page 16

Conclusion

The family of masks has a parameter that controls the saturation level

The proposed algorithm gives better results and helps speed up the separation process