Comparison of Single Channel Blind Dereverberation Methods for Speech Signals

Deha Deniz Türköz - MSc Thesis
Thesis Supervisor: Hakan Erdoğan
Sabancı Üniversitesi, 27.06.2016


OUTLINE

1) Introduction

2) Background
a) Features of speech
b) Reverberation model
c) Room impulse response (RIR)
d) Non-negative matrix factorization (NMF)

3) Blind-Dereverberation Methods
a) Delayed linear prediction (DLP)
b) Weighted prediction error (G-WPE)
c) Laplacian based weighted prediction error (L-WPE)
d) NMF based spectral modeling (NMF+N-CTF)
e) Sparsity penalized weighted least squares method (SPWLS)

4) Experiments and Comparisons

5) Discussion and Conclusion

1. Introduction

Reverberation:

● is an effect that occurs on speech data due to reflections from walls,

● decreases speech intelligibility,

● degrades applications such as automatic speech recognition (ASR) and hands-free teleconferencing,

● can be modeled with a linear time-invariant (LTI) filter.

● If the filter h is known, the clean signal s can be recovered with a simple deconvolution operation called dereverberation.

● In most cases h and s are unknown and the reverberated signal x is the only known quantity. Estimating h and s from x is called the "blind-dereverberation problem", which is the main subject of this work.

The aim of this work is to compare the existing blind-dereverberation methods

○ DLP: delayed linear prediction,
○ G-WPE: Gaussian based weighted prediction error,
○ L-WPE: Laplacian based weighted prediction error,
○ NMF+N-CTF: NMF based spectral-temporal modeling,

and to offer a new algorithm called

○ SPWLS: sparsity penalized weighted least squares.

2. Background

a. Features of Speech

● Speech is a signal created by the human vocal system.

● The input of the vocal tract is called the glottal signal, which is modeled as:
○ white noise (unvoiced sounds), or
○ an impulse train (voiced sounds).

● The vocal tract can be modeled as an all-pole filter, which means speech production is a simple LTI filtering operation on the glottal signal.

● Speech signals are non-stationary.

● General approach: divide the signal into short time segments and assume each of them is stationary.

● To analyze speech: short-time Fourier transform (STFT).

● STFT: divides the speech signal into overlapping segments called frames by using a window function, then calculates the DFT of each frame.

Notation used in the STFT formulation:

L: frame shift,

N: frame size,

X(n,k): discrete STFT coefficient of the speech signal x[m] at frame n and frequency bin k,

W[m]: Hamming window.
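A standard form of the discrete STFT that is consistent with this notation (an assumed reconstruction, not necessarily the exact expression on the slide) is

$$X(n,k) = \sum_{m=0}^{N-1} x[m + nL]\, W[m]\, e^{-j 2\pi k m / N}$$

The sketch below computes it with NumPy. The function name `stft` and the test tone are illustrative assumptions; the frame size and shift correspond to a 30 ms window and 10 ms hop at 16 kHz.

```python
import numpy as np

def stft(x, frame_size, frame_shift):
    """Return the STFT matrix: one column of DFT coefficients per frame."""
    window = np.hamming(frame_size)                       # W[m]
    num_frames = 1 + (len(x) - frame_size) // frame_shift
    frames = np.stack(
        [x[n * frame_shift: n * frame_shift + frame_size] * window
         for n in range(num_frames)],
        axis=1,
    )
    # rfft keeps the non-negative frequency bins k = 0 .. N/2 of a real signal.
    return np.fft.rfft(frames, axis=0)

fs = 16000                                # sampling frequency in Hz
t = np.arange(fs) / fs                    # one second of samples
x = np.sin(2 * np.pi * 1000 * t)          # 1 kHz test tone (stand-in for speech)
X = stft(x, frame_size=480, frame_shift=160)   # N = 480 (30 ms), L = 160 (10 ms)
```

Each column of `X` holds the complex DFT coefficients of one frame, so the bin spacing is fs/N Hz.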

● The STFT of a signal is interpreted as a matrix whose columns hold the complex DFT coefficients of the individual frames.

● To visualize how the signal's frequency content changes over time: spectrogram.

● The spectrogram S(n,k) uses the power spectral density (PSD) values of the STFT matrix X(n,k) as intensity values in a 2D image: $S(n,k) = |X(n,k)|^2$.

b. Reverberation Model

● A reverberant environment can be modeled as an LTI filter, which is called the room impulse response (RIR).

● Reverberation model:

$$x(t) = h(t) * s(t)$$

where $*$ denotes convolution and

h(t): RIR, unknown

s(t): clean signal (anechoic signal), unknown

x(t): reverberated signal (echoed signal), known
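As a small illustration of the reverberation model, the sketch below convolves a toy clean signal with a toy RIR. The values of `s` and `h` are made up for the example; they are not taken from the thesis data.

```python
import numpy as np

s = np.array([1.0, 0.0, 0.0, 2.0, 0.0, 0.0])   # toy clean signal s(t)
h = np.array([1.0, 0.0, 0.5, 0.25])            # toy RIR: direct path plus two echoes
x = np.convolve(s, h)                          # reverberated signal x(t) = h(t) * s(t)
```

Each sample of `s` is followed in `x` by delayed, attenuated copies of itself, which is the smearing seen in reverberant spectrograms.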

[Figure: effect of reverberation on the spectrogram.]

c. Room Impulse Response (RIR)

The length of the RIR depends on

● room size,

● room temperature,

● room shape,

● the microphone's distance to the speech source,

● absorption of sound in the room.

RT60: time required for the reflected signal to drop by 60 dB.

● The RIR shows FIR filter characteristics.

Usually the RIR is divided into two parts:

1. Early reverberation

2. Late reverberation: the most detrimental part of the echo

The reverberated signal is then decomposed as

$$x(t) = d(t) + r(t) + n(t)$$

n(t): noise

d(t): early echo + clean signal (desired signal)

r(t): late echo

Lh: the length of the RIR

h(t): RIR (early echo + late echo)

Then, the early and late reverberation components are

$$d(t) = \sum_{\tau=0}^{D-1} h(\tau)\, s(t-\tau), \qquad r(t) = \sum_{\tau=D}^{L_h-1} h(\tau)\, s(t-\tau)$$

D: the length of the early reverberation
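The split into early and late reverberation can be sketched numerically. Everything in this example is an assumption made for illustration: the synthetic RIR (white noise with an exponential envelope that falls by 60 dB at RT60), the 50 ms boundary D, and the white-noise stand-in for speech.

```python
import numpy as np

fs = 16000                      # sampling frequency in Hz
rt60 = 0.5                      # assumed reverberation time in seconds
rng = np.random.default_rng(0)

# Synthetic RIR: noise shaped by an amplitude envelope that drops 60 dB at t = rt60.
t = np.arange(int(rt60 * fs)) / fs
h = rng.standard_normal(t.size) * 10.0 ** (-3.0 * t / rt60)
h[0] = 1.0                      # direct path

D = 800                         # early/late boundary: 50 ms at 16 kHz (assumption)
h_early = np.concatenate([h[:D], np.zeros(h.size - D)])
h_late = np.concatenate([np.zeros(D), h[D:]])

s = rng.standard_normal(fs)     # white-noise stand-in for clean speech
d = np.convolve(s, h_early)     # desired signal: clean speech + early echo
r = np.convolve(s, h_late)      # late echo
x = np.convolve(s, h)           # full reverberated signal (noise-free case)
```

Because convolution is linear, `x` equals `d + r` in the noise-free case, and the late part contributes nothing during the first D samples.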


d. Non-negative Matrix Factorization (NMF)

NMF: decomposition of a non-negative matrix V into the product of two matrices B and G with non-negative entries, $V \approx BG$.

B: basis or dictionary matrix, G: weight or gains matrix.

● This can be posed as an optimization problem:

$$\min_{B \ge 0,\; G \ge 0} C(V, BG)$$

where C is the cost function measuring the distance between V and BG.

● Columns of B are called basis vectors.

● The number of columns of B is kept smaller than the dimensions of V.

● Iterative algorithms are used to solve the NMF problem, since it has no closed-form solution and its solution is not unique.

● The initial B and G matrices can be random positive numbers, or supervised matrices for faster convergence.

● Popular cost functions for measuring the distance between V and BG are:
○ Euclidean distance,
○ Kullback-Leibler (KL) divergence,
○ Itakura-Saito (IS) divergence.

The Kullback-Leibler divergence between V and BG is defined as in [6], where "1" is the matrix of ones with the same size as V.
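For reference, the generalized KL divergence and the multiplicative update rules of [6] are commonly written as follows (assumed notation: $\odot$ and the fraction bars denote element-wise operations):

$$D_{KL}(V \,\|\, BG) = \sum_{i,j} \left( V_{ij} \log \frac{V_{ij}}{(BG)_{ij}} - V_{ij} + (BG)_{ij} \right)$$

$$B \leftarrow B \odot \frac{\left(V / (BG)\right) G^{T}}{\mathbf{1}\, G^{T}}, \qquad G \leftarrow G \odot \frac{B^{T} \left(V / (BG)\right)}{B^{T} \mathbf{1}}$$

A minimal sketch of these updates is given below. The function names, the random initialization, and the small constant `eps` that guards divisions are assumptions for illustration.

```python
import numpy as np

def kl_divergence(V, V_hat, eps=1e-12):
    """Generalized KL divergence between non-negative matrices V and V_hat."""
    return np.sum(V * np.log((V + eps) / (V_hat + eps)) - V + V_hat)

def nmf_kl(V, rank, num_iter=200, seed=0, eps=1e-12):
    """Multiplicative-update NMF for the KL cost (after Lee and Seung [6])."""
    rng = np.random.default_rng(seed)
    B = rng.random((V.shape[0], rank)) + eps   # random non-negative dictionary
    G = rng.random((rank, V.shape[1])) + eps   # random non-negative gains
    ones = np.ones_like(V)                     # the matrix "1", same size as V
    history = []
    for _ in range(num_iter):
        B *= ((V / (B @ G + eps)) @ G.T) / (ones @ G.T + eps)
        G *= (B.T @ (V / (B @ G + eps))) / (B.T @ ones + eps)
        history.append(kl_divergence(V, B @ G))
    return B, G, history

rng = np.random.default_rng(1)
V = rng.random((20, 4)) @ rng.random((4, 50))  # toy non-negative matrix of rank 4
B, G, history = nmf_kl(V, rank=4)
```

Since the updates only multiply by non-negative factors, B and G stay non-negative, and a different seed generally leads to a different factorization of the same V.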

● The NMF problem is non-convex and has multiple local minima. As a result, B and G can vary for the same V matrix.

● NMF is a common method in speech processing, deep learning, clustering, and computer vision.

● In speech processing, NMF has applications in audio source separation, source/filter modeling, blind-dereverberation [3][4], speech denoising, and so on.

3. Blind-Dereverberation Methods

a. Delayed linear prediction (DLP)

Time-domain signals are denoted x(t), s(t) and h(t); their STFT-domain counterparts are denoted x(n,k), s(n,k) and h(n,k), respectively.

● DLP estimates inverse filter coefficients from the reverberated signal.

● An inverse filter of length Lw can be used to approximately obtain a dereverberated signal from the reverberated one.

● Reverberation can also be formulated in matrix form.


● This means the desired signal can be estimated using only the reverberated signal and its past samples.

● In the resulting inverse filter vector, the number of zeros is equal to the delay D.

● In conclusion, the DLP algorithm is a simple technique to achieve dereverberation.

● However, it may not work well in most cases, because the inverse filter is restricted to an FIR filter.
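The idea of delayed linear prediction can be sketched in the time domain as $\hat{d}(t) = x(t) - \sum_{\tau=0}^{L_w-1} c(\tau)\, x(t - D - \tau)$, with the coefficients c chosen to minimize the energy of $\hat{d}$. The sketch below is a hedged illustration, not the thesis implementation: it solves the problem with an ordinary least-squares call instead of the Levinson-Durbin recursion, and the toy RIR and white-noise source are assumptions.

```python
import numpy as np

def delayed_linear_prediction(x, delay, order):
    """Subtract from x the part predictable from samples at least `delay` steps back."""
    T = len(x)
    # Column j holds x delayed by (delay + j) samples, zero-padded at the start.
    X_past = np.zeros((T, order))
    for j in range(order):
        shift = delay + j
        X_past[shift:, j] = x[:T - shift]
    c, *_ = np.linalg.lstsq(X_past, x, rcond=None)   # least-squares prediction filter
    return x - X_past @ c, c

rng = np.random.default_rng(0)
T = 4000
s = rng.standard_normal(T)                 # white-noise stand-in for the clean signal
h = np.zeros(11)
h[0], h[5], h[10] = 1.0, 0.6, 0.3          # toy RIR: direct path plus two late echoes
x = np.convolve(s, h)[:T]                  # reverberated signal

d_hat, c = delayed_linear_prediction(x, delay=3, order=20)
err_before = np.mean((x - s) ** 2)         # echo energy before processing
err_after = np.mean((d_hat - s) ** 2)      # residual echo energy after DLP
```

The delay D keeps the filter from predicting, and thereby removing, the short-term correlation of the source itself; only components older than D samples are subtracted.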

b. Weighted prediction error (G-WPE)

Assumption 1: the speech signal has a locally Gaussian distribution within short frames of length Lf,

Assumption 2: samples are mutually uncorrelated beyond a certain distance,

Assumption 3: the variance is constant within short-time frames of size Lf.

● Dereverberation can be done both in the time domain and in the STFT domain.

● Working in the time domain is very costly because of the large matrices involved, so the STFT domain is used.

● The desired signal in the STFT domain is modeled with a Gaussian probability density function, where n is the frame number, k is the frequency bin, and the variance is time-varying.

● Variance values change only with respect to time frames.

● Applying likelihood maximization to the Gaussian pdf gives the log-likelihood function for the dereverberation process in the STFT domain.

The parameter vector for likelihood maximization consists of the prediction filter coefficients and the variances.

Maximizing the likelihood with respect to the parameter vector cannot be done analytically, since there is no closed-form solution. Thus, an iterative algorithm is needed.

A two-step procedure has been proposed in [1] to solve the likelihood maximization problem:

1. Keep the variances constant and solve for the prediction filter that maximizes the likelihood, then obtain the desired signal estimate;

2. Keep the prediction filter constant and update the variances;

and so on until a convergence criterion is satisfied or a maximum number of iterations is completed.
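The two-step procedure can be sketched for a single frequency bin as an alternation between a variance-weighted least-squares fit and a variance update. This is a simplified illustration under stated assumptions, not the implementation evaluated in this work: the variance is taken from a single frame (Lf = 1), the function names are hypothetical, and the toy data (an exponentially decaying per-bin filter and a source with alternating loud and quiet segments) is made up.

```python
import numpy as np

def delayed_past_matrix(x, delay, order):
    """Row t holds x[t - delay], x[t - delay - 1], ..., zero-padded at the start."""
    T = len(x)
    X_past = np.zeros((T, order), dtype=x.dtype)
    for j in range(order):
        shift = delay + j
        X_past[shift:, j] = x[:T - shift]
    return X_past

def wpe_one_bin(x, delay=3, order=10, num_iter=5, min_var=1e-6):
    """Iterative variance-weighted delayed linear prediction for one frequency bin."""
    X_past = delayed_past_matrix(x, delay, order)
    d = x.copy()                                   # initial desired-signal estimate
    for _ in range(num_iter):
        # Variance update from the current estimate, floored to avoid division by zero.
        var = np.maximum(np.abs(d) ** 2, min_var)
        # Filter update: weighted least squares with weights 1 / var.
        Xw = X_past / var[:, None]
        R = Xw.conj().T @ X_past
        p = Xw.conj().T @ x
        c = np.linalg.solve(R, p)
        d = x - X_past @ c                         # new desired-signal estimate
    return d, c

rng = np.random.default_rng(0)
T, delay = 2000, 3
power = np.where(np.sin(2 * np.pi * np.arange(T) / 100) > 0, 1.0, 0.01)
s = np.sqrt(power / 2) * (rng.standard_normal(T) + 1j * rng.standard_normal(T))
h = 0.8 ** np.arange(16)                           # toy per-bin filter across frames
x = np.convolve(s, h)[:T]                          # reverberated STFT coefficients
desired = np.convolve(s, h[:delay])[:T]            # source plus early part

d, c = wpe_one_bin(x, delay=delay)
err_before = np.mean(np.abs(x - desired) ** 2)
err_after = np.mean(np.abs(d - desired) ** 2)
```

Frames with low estimated variance receive large weights, so quiet segments, where the observation is dominated by late reverberation, contribute most to the filter estimate.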


c. Laplacian based weighted prediction error (L-WPE)

L-WPE in [2] suggests that, in the STFT domain, speech can be modeled more precisely with a Laplacian model than with a Gaussian model.

● Assumption 1: the speech signal has a locally Laplacian distribution within short frames of length Lf,

● Assumption 2: the real and imaginary parts of the STFT coefficients of the desired signal are independent and have equal variance in each time-frequency bin.

As in the G-WPE method, maximum likelihood (ML) estimation is used to estimate the parameter vector, here under the Laplacian pdf.

The likelihood function has no closed-form maximizer, so it is solved numerically:

1. Keep the variances constant and solve for the prediction filter that maximizes the likelihood (or, equivalently, minimizes an l1 norm), then obtain the desired signal estimate;

2. Keep the prediction filter constant and update the variances.

Step 1: fix the variances and update the prediction filter.

The likelihood function can be rewritten in terms of the prediction filter.

The resulting problem can then be interpreted as a linear programming problem.

Step 2: fix the prediction filter and update the variances.

Maximizing the log-likelihood with respect to the variance gives a closed-form solution for the variance.

● These two steps are repeated until a convergence criterion is satisfied or the maximum number of iterations has been reached.


d. NMF based spectral modeling (NMF+N-CTF)

● The method in [3] combines the non-negative convolutive transfer function (N-CTF) model with non-negative matrix factorization (NMF).

● N-CTF model assumption: for each frequency bin, the power spectrogram of the reverberated signal is the convolution, along the frame axis, of the power spectrograms of the clean speech signal and the RIR.
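Written out, the N-CTF assumption is commonly stated as $|x(n,k)|^2 \approx \sum_{\tau=0}^{L_h-1} |h(\tau,k)|^2\, |s(n-\tau,k)|^2$ (assumed notation). The sketch below applies this forward model to a toy input; the function name `nctf_forward` and the array shapes are assumptions for illustration.

```python
import numpy as np

def nctf_forward(S, H):
    """Convolve clean power spectrogram S (K x N) with per-bin RIR power H (K x Lh)
    along the frame axis, one frequency bin at a time."""
    K, N = S.shape
    X = np.zeros((K, N))
    for k in range(K):
        X[k] = np.convolve(S[k], H[k])[:N]   # keep the first N frames
    return X

rng = np.random.default_rng(0)
K, N, Lh = 3, 10, 4
H = rng.random((K, Lh))          # toy non-negative RIR power per bin
S = np.zeros((K, N))
S[:, 0] = 1.0                    # a single burst of energy in the first frame
X = nctf_forward(S, H)           # the burst is smeared over the next Lh frames
```

With an impulse as input, each row of `X` reproduces the corresponding row of `H`, which shows how the model spreads clean-speech energy over later frames.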

Assumptions:

● Phase elements at different frames are mutually independent,

● Zero-mean random variables with Gaussian distribution,

● Clean signal and RIR spectral coefficients are mutually independent.

For simplicity, x(n,k), s(n,k) and h(n,k) are redefined for this method (a different convention from the other methods).

The Kullback-Leibler (KL) divergence is used to estimate the power spectrogram of s(n,k) from the N-CTF model: the divergence is measured between the observed and the estimated power spectrogram of the reverberated signal.

To obtain a more accurate estimate, the sparsity of the clean speech spectrogram can be added as a regularization term with a weight.

As a non-negativity constraint, the estimated quantities are required to be non-negative.

This model can be solved with an iterative learning method.

Adding the NMF approach:

The clean speech magnitude spectrogram S can be formulated as the product of a dictionary matrix B and a weight matrix G, $S \approx BG$, where

R: the number of basis vectors in the dictionary matrix B (dictionary size); R < N (the frame size of s).

After combining the N-CTF and NMF methods, the problem is defined over the three matrices B, G and H.

Approach: keep two of them fixed and update the third in turn, until a convergence criterion is met or the maximum number of iterations has been reached.


● To remove the scale ambiguity, after each iteration each column of B is normalized to sum to one.

● The columns of H are element-wise divided by the first column of H.

● By nature, the RIR consists of decaying impulses.

● A mapping coefficient matrix between the clean speech signal and the reverberated speech signal can also be formulated.

● In the online method, the basis matrix B and the weight matrix G are initialized with random non-negative numbers.

● B and G can instead be initialized with supervised matrices to increase efficiency.

● In this work, we employ the online method.

e. Sparsity penalized weighted least squares method (SPWLS)

❖ SPWLS combines the idea of variance normalization through a weight matrix with the sparsity property of speech spectrogram matrices.

❖ To promote sparsity of a variable, l1 norm regularization is generally used.

❖ With l1 regularization, the optimization problem, also known as the Lasso problem, requires an iterative algorithm to solve.

❖ Some popular algorithms for solving the Lasso problem are
➢ ISTA (iterative shrinkage-thresholding algorithm) [7],
➢ FISTA,
➢ SALSA.

The convolution equation (in the STFT domain, for a fixed frequency k) can be rewritten in matrix form as

$$x = Hs + n$$

Then, with a regularization term for sparsity, we need to solve the Lasso problem for s.

n: noise signal, s: clean speech signal, x: reverberated signal,

H: convolution matrix of the RIR.

● Add weights to the problem, as in the L-WPE and G-WPE methods.

● Add an extra regularization on the norm of the filter h to avoid a trivial solution.

● Our optimization loss function then involves:

a regularization parameter,

W: a diagonal weight matrix with 1/(std) values,

the target norm for the filter h,

k: frequency index (fixed), n: frame index.

● The problem is non-differentiable at its local minimum.

● s and h need to be calculated numerically with an iterative approach.

● Our approach requires a good initialization for s and h, which can be obtained from an earlier method such as G-WPE.

● Our approach: perform alternating updates of s and h, each minimizing the objective function with respect to the corresponding variable.

● For updating s and h, the ISTA algorithm is used.

ISTA minimizes functions of the form f(s)+g(s), where the first function is differentiable and the second is usually not differentiable, but simple.

Step 1 to update s: take a gradient descent step on the first function f(.) (i: iteration index).

The result is an intermediate solution.

● This step requires the gradient of the first function f(.).

The positive step size parameter indicates how far we move along the negative gradient.

Step 2 to update s: a proximal operator step of g(.) is performed around that intermediate solution.

The proximal step corresponds to a thresholding/shrinkage operation for the l1 norm penalty.

Basically, this step removes the components with small energy and shrinks the others (the threshold is denoted a).
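The two ISTA steps can be sketched for the plain, unweighted Lasso problem $\min_s \tfrac{1}{2}\|x - Hs\|_2^2 + \lambda \|s\|_1$. This is a generic illustration, not the weighted SPWLS objective: the step size choice `mu = 1 / ||H||^2`, the threshold `mu * lam`, and the function names are assumptions.

```python
import numpy as np

def soft_threshold(z, a):
    """Shrink magnitudes by a and zero out entries with magnitude below a.
    Works for real and complex z (the phase is preserved)."""
    mag = np.abs(z)
    return z * np.maximum(mag - a, 0.0) / np.maximum(mag, 1e-300)

def ista(H, x, lam, num_iter=500):
    """ISTA for 0.5 * ||x - H s||^2 + lam * ||s||_1."""
    mu = 1.0 / np.linalg.norm(H, 2) ** 2          # step size from the spectral norm of H
    s = np.zeros(H.shape[1], dtype=np.result_type(H, x))
    for _ in range(num_iter):
        grad = H.conj().T @ (H @ s - x)           # gradient of the smooth part f(.)
        z = s - mu * grad                         # step 1: gradient descent step
        s = soft_threshold(z, mu * lam)           # step 2: proximal (shrinkage) step
    return s

x = np.array([3.0, -0.5, -2.0])
s_hat = ista(np.eye(3), x, lam=1.0)               # with H = I the result is soft_threshold(x, 1)
```

With H equal to the identity, the small component is set to zero and the large ones are shrunk toward zero by the threshold.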

● After the update of s, we update the W matrix according to the new variance values of s.

Now we need to solve the problem for h.

● Use ISTA again:

Step 1 to update h: the minimizer of f(.), a simple least-squares problem with an exact solution.

Step 2 to update h: proximal operation step for the regularization of h.

● The step size parameter for the inner gradient descent iteration for s can be set to change at each iteration, as a function of the initial step size, hyperparameters, and the inner and outer iteration indices.

4) Experiments & Comparisons

TEST DATA

Experiment 1: 3 male and 3 female (clean) voices convolved with 6 different RIR samples, with additive noise at 30 dB and 60 dB (for the DLP, G-WPE, NMF+N-CTF and SPWLS methods).

72 different samples have been dereverberated.

Experiment 2: 1 male and 1 female (clean) voices convolved with 5 different RIR samples, with additive noise at 30 dB and 60 dB (for all methods).

20 different samples have been dereverberated.

● Test data has been taken from the "Reverb Challenge" data set.

● The sampling frequency was 16 kHz for all files.

● RIR times (RT60) were 0.17, 0.11, 0.95, 0.33, 0.54 and 0.35 s, respectively.

● The L-WPE method was not run for RT60 = 0.95 s only, due to excessive run time.

● As additive noise, cafe environment noise at 30 dB and 60 dB levels has been used.

SETUP

● The delay D was set to 3 frames for the G-WPE, L-WPE and DLP methods.

● Lf, the number of frames used for variance calculations, is set to 1 frame for the G-WPE, L-WPE and SPWLS methods.

● The number of iterations for the G-WPE, L-WPE and SPWLS methods is set to 5.

● The number of iterations for the NMF+N-CTF method is set to 100.

● STFT parameters: hop size = 10 ms, window size = 30 ms.

● Minimum variance to avoid division by zero: v = 1e-6.

● The number of STFT frames used to predict the signal varies with the RT60 estimate computed internally.

Parameters specific to the SPWLS method:

● step size = 1E-7,

● ISTA regularization parameter = 1E5,

● number of inner ISTA iterations i = 10,

● ISTA regularization parameter for the filter = 10,

● the SPWLS initialization for the RIR H is set to the output of the G-WPE method.

The NMF+N-CTF method

● has a dictionary matrix size "ndict" of 100,

● uses the online method.

COMPUTATIONAL EFFICIENCY

● All the algorithms are implemented in MATLAB on a computer with an Intel Xeon CPU at 2.5 GHz.

● The fastest one is the SPWLS method. Then G-WPE, DLP, NMF+N-CTF and L-WPE come in order.

● L-WPE is very slow due to its linear programming (LP) part, for which the CVX tool for MATLAB is used.

● Run times for the data with RT60 = 0.54 s:
○ L-WPE: ~1 day
○ NMF+N-CTF: ~1.5 hours (100 iterations, ndict = 100)
○ G-WPE: ~4 min (5 iterations)
○ SPWLS: ~2 min (5 iterations)
○ DLP: ~3 min (1 iteration), implemented with the Levinson-Durbin algorithm

Test Methods

● The accuracy of the dereverberation process is measured with the average cepstral distortion (CD) over short time frames.

● CD is a popular measure of speech quality between the clean signal and the reconstructed signal. It is computed from:

○ the clean speech signal's cepstral coefficients from order 1 to 12,
○ the estimated speech signal's cepstral coefficients from order 1 to 12,
○ the zeroth-order coefficient, which denotes the power spectrum envelope in dB.

● The CD between similar signals approaches 0.

● Our aim is to keep the CD as small as possible after the dereverberation process.
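A commonly used definition of the frame-wise cepstral distortion (an assumption here, not necessarily the exact form used in this work) is

$$CD = \frac{10}{\ln 10} \sqrt{(c_0 - \hat{c}_0)^2 + 2 \sum_{k=1}^{12} (c_k - \hat{c}_k)^2}$$

where $c_k$ and $\hat{c}_k$ are the cepstral coefficients of the clean and estimated signals. The sketch below averages this quantity over frames. The cepstra are computed as the real cepstrum of Hamming-windowed frames, which is a simplifying assumption, and the function names are hypothetical.

```python
import numpy as np

def frame_cepstra(x, frame_size=480, frame_shift=160, order=12, eps=1e-10):
    """Real cepstrum (orders 0..order) of each Hamming-windowed frame."""
    window = np.hamming(frame_size)
    num_frames = 1 + (len(x) - frame_size) // frame_shift
    ceps = np.empty((num_frames, order + 1))
    for n in range(num_frames):
        frame = x[n * frame_shift: n * frame_shift + frame_size] * window
        log_mag = np.log(np.abs(np.fft.rfft(frame)) + eps)
        ceps[n] = np.fft.irfft(log_mag)[:order + 1]
    return ceps

def cepstral_distortion(clean, estimate, **kwargs):
    """Average cepstral distortion in dB between two equal-length signals."""
    diff2 = (frame_cepstra(clean, **kwargs) - frame_cepstra(estimate, **kwargs)) ** 2
    per_frame = (10.0 / np.log(10.0)) * np.sqrt(diff2[:, 0] + 2.0 * diff2[:, 1:].sum(axis=1))
    return per_frame.mean()

rng = np.random.default_rng(0)
clean = rng.standard_normal(4000)                  # stand-in for a clean signal
noisy = clean + 0.5 * rng.standard_normal(4000)    # stand-in for a processed signal
cd_same = cepstral_distortion(clean, clean)
cd_noisy = cepstral_distortion(clean, noisy)
```

Identical signals give a distortion of zero, and the value grows as the spectral envelopes of the two signals diverge.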

● STOI, short-time objective intelligibility measure: for short-time frames, STOI compares the temporal envelopes of the clean and dereverberated speech in terms of correlation coefficients.

● PESQ, Perceptual Evaluation of Speech Quality: a common standardized test method for measuring speech quality. Three types of PESQ measure are applied.

● Signal-to-noise ratio (SNR) between the clean signal and the dereverberated signal.

● Segmental SNR (segSNR): SNR computed over short time frames.
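The SNR-based measures can be sketched as follows. This is a minimal illustration with hypothetical function names; practical segSNR implementations often also clip the per-frame values to a fixed dB range, which is omitted here.

```python
import numpy as np

def snr_db(clean, estimate, eps=1e-12):
    """SNR in dB, treating (clean - estimate) as the noise."""
    noise = clean - estimate
    return 10.0 * np.log10((np.sum(clean ** 2) + eps) / (np.sum(noise ** 2) + eps))

def seg_snr_db(clean, estimate, frame_size=480, frame_shift=160, eps=1e-12):
    """Mean of the frame-wise SNR values over short frames."""
    num_frames = 1 + (len(clean) - frame_size) // frame_shift
    values = []
    for n in range(num_frames):
        sl = slice(n * frame_shift, n * frame_shift + frame_size)
        values.append(snr_db(clean[sl], estimate[sl], eps))
    return float(np.mean(values))

t = np.arange(16000) / 16000
clean = np.sin(2 * np.pi * 440 * t)     # test tone standing in for clean speech
estimate = 1.1 * clean                  # estimate with a 10 % amplitude error
snr = snr_db(clean, estimate)           # error energy is 1 % of the signal energy
seg = seg_snr_db(clean, estimate)
```

A 10 % amplitude error corresponds to an error energy of 1 % of the signal energy, that is, an SNR of 20 dB in every frame.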

Test results - iteration# (experiment 2 - for 20 files)

[Figures: test results versus number of iterations, seven slides.]

Test results - NMF+N-CTF Method (experiment 2 - for 20 files)

[Figures: test results for the NMF+N-CTF method, seven slides.]

Spectrogram results of dereverberated signals

[Figures: spectrograms of dereverberated signals, six slides, including results for iter# = 1, iter# = 5 and iter# = 100.]

NUMERICAL RESULTS (iter# = 5)

[Slide: numerical results for iter# = 5.]

Test results - Average

[Slides: average test results, two slides.]

NUMERICAL RESULTS (for long RIR with RT60 = 0.54 s)

[Slide: numerical results for RT60 = 0.54 s.]

NUMERICAL RESULTS - NMF+N-CTF Method

ndict = dictionary matrix size, #iter = number of iterations

● NNCTF1: ndict = 100, #iter = 100,
● NNCTF2: ndict = 500, #iter = 200,
● NNCTF3: ndict = 1000, #iter = 200,
● NNCTF4: ndict = 1000, #iter = 400,
● NNCTF5: ndict = 1000, #iter = 240.

[Slide: numerical results for the NMF+N-CTF configurations.]

5) DISCUSSION & CONCLUSION

● The best test results belong to the L-WPE method.

● Considering time efficiency together with test results, G-WPE works better and could be more suitable for real-time applications.

● The L-WPE algorithm is much more complex than G-WPE because of its linear programming part. Thus, it runs very slowly.

● NMF+N-CTF results:
○ the method converges,
○ test results are not as good as those reported in the paper,
○ the method could perform better with a good initialization or a supervised dictionary matrix,
○ increasing the dictionary size improves the test results, but increasing the number of iterations does not always improve them,
○ the method uses no phase information.

● For one iteration, L-WPE was slower and G-WPE was faster than DLP.

● SPWLS could not show good performance for CD. To improve the performance, more constraints can be set for h. In SPWLS, we try to eliminate the whole echo, not only the late part as in G-WPE, L-WPE and DLP. Also, the step size might be decreased.

● SPWLS shows promise due to its time efficiency, SNR and PESQ results.

● Spectrogram results show that L-WPE and G-WPE successfully eliminate the late reverberant parts.

● DLP is used only for comparison with the L-WPE and G-WPE methods, since they are rooted in the DLP method. As expected, L-WPE and G-WPE perform better.

REFERENCES

[1] Nakatani, Tomohiro, et al. "Speech dereverberation based on variance-normalized delayed linear prediction." IEEE Transactions on Audio, Speech, and Language Processing 18.7 (2010): 1717-1731.

[2] Jukić, Ante, and Simon Doclo. "Speech dereverberation using weighted prediction error with Laplacian model of the desired signal." 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2014.

[3] Mohammadiha, Nasser, Paris Smaragdis, and Simon Doclo. "Joint acoustic and spectral modeling for speech dereverberation using non-negative representations." 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2015.

[4] Mohammadiha, Nasser, and Simon Doclo. "Speech dereverberation using non-negative convolutive transfer function and spectro-temporal modeling." IEEE/ACM Transactions on Audio, Speech, and Language Processing 24.2 (2016): 276-289.

[5] Selesnick, Ivan. "Introduction to sparsity in signal processing." Connexions (2012).

[6] Lee, Daniel D., and H. Sebastian Seung. "Algorithms for non-negative matrix factorization." Advances in Neural Information Processing Systems. 2001.

[7] Combettes, Patrick L., and Jean-Christophe Pesquet. "Proximal splitting methods in signal processing." Fixed-point algorithms for inverse problems in science and engineering. Springer New York, 2011. 185-212.