Speech Enhancement Based on Adaptive Line Enhancer

Speech Enhancement Based on Adaptive Line Enhancer. Research Thesis. Aviva Atkins. April 7th, 2020. Supervised by Prof. Israel Cohen


Page 1:

Speech Enhancement Based on Adaptive Line Enhancer

Research Thesis

Aviva Atkins

April 7th, 2020

Supervised by Prof. Israel Cohen

Page 2:

Outline

▪ Introduction

▪ The problem researched

▪ The challenges

▪ Research contributions

▪ Adaptive Line Enhancer background

▪ Conventional fixed step size

▪ Mutual Information approach

▪ Proposed method

▪ Conclusions and future research

Sound test

Page 3:

Noise is Everywhere!

Interference

Source

Reverberation

Echo

Additive noise

Page 4:

Speech Enhancement

Page 5:

Applications

Page 6:

Harmonic noise

▪ Contains deterministic sinusoidal components

Page 7:

The problem researched

Reducing nonstationary harmonic noise from a speech signal recorded with a single microphone

Source

Nonstationary harmonic additive noise

Page 8:

The challenges

▪ Single channel: only the noisy signal is available, with no access to additional reference signals and no spatial information

→ only intrinsic properties of speech or noise can be used

▪ The vast majority of methods require an estimate of the noise spectrum

▪ When the noise is stationary, it can be estimated during segments when speech is absent

▪ When the noise is nonstationary it needs to be tracked continuously

→it is more difficult to estimate nonstationary noise

▪ Trade-off between noise reduction and speech distortion

▪ The developed method needs to be relevant for real-time applications

Page 9:

Research Contributions

▪ Introduced a filtering method based on the frequency-domain Adaptive Line Enhancer that enables better reduction of nonstationary harmonic noise.

▪ Proposed the combined filter: the commonly used forward adaptive linear filter and a non-causal backward adaptive linear filter used together, increasing the reduction span of the noise transient

▪ Applied the filter based on a comparison to the noisy spectrum, reducing noise overestimation

▪ Applied the filter based on a noise presence indicator for better speech preservation

▪ Employed a set of filter lengths, to ensure the combined filter spans throughout the noise transient

Page 10:

Additional contributions

▪ Investigated a statistical model as an alternative to the Decision-Directed estimator for the a-priori SNR and showed that it can eliminate the musical noise while compromising between signal distortion and noise reduction.

▪ Introduced a beamformer that enables fine tuning of the compromise between Directivity Factor and White Noise Gain, through a simple computationally-efficient algorithm.

Page 11:

Why use Adaptive Line Enhancer?

▪ Exploits the structure of the harmonic noise

▪ Simple with low computational cost

▪ Modifies both magnitude and phase, so it has the potential to improve signal intelligibility and not just quality

Page 12:

Adaptive Noise Canceller (ANC)

[Figure: ANC block diagram. The signal source and noise source form the primary input $x(n) + v(n)$; the reference input $v_0(n)$ feeds the adaptive filter, whose output $\hat{v}(n)$ is subtracted from the primary input to give the output $e(n) = \hat{x}(n)$, which also drives the adaptive algorithm. Overlay annotations: $+d(n)$ (twice), $+x_0(n)$, a "?" next to the reference input, and "distortion".]

$\hat{x} = x + v - \hat{v}$

$\min E[\hat{x}^2] = E[x^2] + \min E[(v - \hat{v})^2]$

$\min E[(x - \hat{x})^2] = \min E[(v - \hat{v})^2]$

Ideal case: $\hat{v} = v$, $\hat{x} = x$
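The step from the output power to the noise-estimation error can be written out as a short derivation. It assumes, as in the standard ANC analysis (the slide does not state it explicitly), that the signal $x$ is uncorrelated with both the noise $v$ and its estimate $\hat{v}$:

```latex
\begin{aligned}
E[\hat{x}^2] &= E[(x + v - \hat{v})^2] \\
             &= E[x^2] + E[(v - \hat{v})^2] + 2\,E[x\,(v - \hat{v})] \\
             &= E[x^2] + E[(v - \hat{v})^2] \qquad \text{if } E[x\,(v - \hat{v})] = 0 .
\end{aligned}
```

Since $E[x^2]$ does not depend on the filter, minimizing the output power minimizes $E[(v - \hat{v})^2]$, and because $\hat{x} - x = v - \hat{v}$ this is the same as minimizing $E[(x - \hat{x})^2]$.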

Page 13:

Adaptive Line Enhancer (ALE)

[Figure: ALE block diagram. The signal source and noise source form the single input $y(n) = x(n) + v(n)$. A delayed copy of the input (delay block $z^{-\tau}$) feeds the adaptive filter, whose output is $z(n)$; the error $e(n)$ drives the adaptive algorithm. Labels: Output, Output', $x(n)$, $\hat{x}(n)$, $v(n)$. Annotations for the two outputs: "signal decorrelated, noise correlated" and "noise decorrelated, signal correlated".]

Page 14:

Adaptive Line Enhancer (ALE)

[Figure: the ALE block diagram in the time domain (TD) and in the frequency domain (FD). TD labels: input $y(n) = x(n) + v(n)$, delay $z^{-\tau}$, filter output $z(n)$, error $e(n)$. FD labels: input $Y(k,m) = X(k,m) + V(k,m)$, filter output $Z(k,m)$, error $E(k,m)$.]

Page 15:

Adaptive Line Enhancer (ALE)

[Figure: frequency-domain (FD) ALE block diagram with input $Y(k,m) = X(k,m) + V(k,m)$, delay $z^{-\tau}$, adaptive filter output $Z(k,m)$, error $E(k,m)$, and step size $\mu$.]

$Z(k,m) = \mathbf{h}^H(k,m)\,\mathbf{y}(k,m-\tau)$

$\mathbf{h}(k,m) = [H_0(k,m), \ldots, H_{L-1}(k,m)]^T$

$\mathbf{y}(k,m-\tau) = [Y(k,m-\tau), \ldots, Y(k,m-\tau-L+1)]^T$

NLMS: $\mathbf{h}(k,m+1) = \mathbf{h}(k,m) + \mu\,\dfrac{\mathbf{y}(k,m-\tau)\,E^*(k,m)}{\|\mathbf{y}(k,m-\tau)\|^2 + \epsilon}$ (the small regularization term in the denominator is written here as $\epsilon$)
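The frequency-domain ALE with an NLMS update can be sketched as below. This is a minimal illustration rather than the thesis implementation: the function name `fd_ale_nlms`, the zero filter initialisation, the pass-through of the first frames, the regularisation constant `eps`, and the default values of `L`, `tau` and `mu` are all assumptions made for the example.

```python
import numpy as np

def fd_ale_nlms(Y, L=3, tau=1, mu=0.5, eps=1e-8):
    """Frequency-domain ALE with one NLMS filter per frequency bin.

    Y: complex STFT of the noisy signal, shape (K bins, M frames).
    Returns (Z, E): filter output and error, both of shape (K, M).
    """
    Y = np.asarray(Y, dtype=complex)
    K, M = Y.shape
    h = np.zeros((K, L), dtype=complex)   # h(k, m): one length-L filter per bin
    Z = np.zeros_like(Y)
    E = Y.copy()                          # frames without enough history pass through
    for m in range(tau + L - 1, M):
        # y(k, m - tau) = [Y(k, m - tau), ..., Y(k, m - tau - L + 1)]^T
        y = Y[:, m - tau - L + 1 : m - tau + 1][:, ::-1]
        Z[:, m] = np.sum(np.conj(h) * y, axis=1)          # Z = h^H y
        E[:, m] = Y[:, m] - Z[:, m]                       # E = Y - Z
        norm = np.sum(np.abs(y) ** 2, axis=1, keepdims=True) + eps
        h = h + mu * y * np.conj(E[:, m])[:, None] / norm  # NLMS update
    return Z, E
```

For a purely harmonic input the delayed frames predict the current frame, so `Z` tracks the harmonic component and `E` is what remains after it is removed.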

Page 16:

Conventional Fixed Step Size Example

For the conventional fixed step size, it is difficult to both reduce the noise and maintain high quality of the enhanced signal

[Figure: spectrograms (frequency index vs. frame index): (a) clean signal, (b) noisy signal, (c) enhanced signal. Parameters: $L = 3$ and a second parameter equal to 1 whose symbol is not recoverable.]

Page 17:

Mutual Information Approach

Taghia, J., Martin, R., 2016, “A frequency-domain adaptive line enhancer with step-size control based on mutual information for harmonic noise reduction,” IEEE Trans. Audio Speech Lang. Process.

▪ Frequency-dependent step size, detecting harmonic noise presence per frequency

▪ Based on Mutual Information (MI)

▪ Step size: [equation garbled; recoverable elements: a step size $\mu(k)$ formed from a constant $\mu_0$ and a decision $Q$, where $Q = 1$ if an MI statistic $I_P$, computed over the frequency bins $k = 1, \ldots, K$, passes a threshold $I_{thr}$ and $Q = 0$ otherwise; $I_P(k)$ is an MI term normalized by a total MI.]

Page 18:

MI Approach Example

[Figure: (b) noisy signal spectrogram (frequency index vs. frame index); (c) MI step size $\mu$ vs. frequency [kHz]. Decision: $Q = 1$.]

Page 19:

MI Approach Example

[Figure: spectrograms (frequency index vs. frame index): (a) clean signal, (b) enhanced signal, fixed step size, (c) enhanced signal, MI.]

Page 20:

MI Approach

▪ Implemented in a block-wise manner

▪ Assumption: the noise is stationary over at least one block length

▪ The authors use a block length of 3 seconds

Taghia, J., Martin, R., 2016, “A frequency-domain adaptive line enhancer with step-size control based on mutual information for harmonic noise reduction,” IEEE Trans. Audio Speech Lang. Process.

The assumption does not hold for highly non-stationary signals, such as a heart monitor beeping

The block decision is often zero for highly non-stationary signals, such as a heart monitor beeping

[Figure: spectrogram of a 3.4 s long heart monitor beeping signal.]

Page 21:

MI Approach Example Non-stationary

[Figure: (a) noisy signal spectrogram (frequency index vs. frame index); (b) MI step size $\mu$ vs. frequency [kHz]. Decision: $Q = 0$.]

Page 22:

MI Approach Example Non-stationary

[Figure: spectrograms (frequency index vs. frame index): (a) clean signal, (b) noisy signal, (c) enhanced signal, MI. Annotation: $Q$ ignored.]

Page 23:

Non-Stationary noise: filter output estimate

[Figure: ALE block diagram with input $y(n) = x(n) + v(n)$, delay $z^{-\tau}$, adaptive filter output $z(n)$ and error $e(n)$. The filter output is annotated with a question mark, $z(n) \overset{?}{=} \hat{v}(n)$; further labels: $x(n)$, $\hat{x}(n)$.]

Page 24:

Experimental Setup

▪ Clean speech: 20 different speech signals from different speakers from the TIMIT database (half male, half female)

▪ Sampled at 16 kHz

▪ SNR range [0,20] dB

▪ STFT, overlap-add

▪ Noise: 26 different non-stationary harmonic noise signals, e.g., heart monitor beeping, train door beeping, house alarm, railroad crossing bells.

Page 25:

Correlation

$\boldsymbol{\gamma}_X(k,m,\tau) = \dfrac{E[X(k,m)\,\mathbf{x}^*(k,m-\tau)]}{E[|X(k,m)|^2]}$

$\boldsymbol{\gamma}_V(k,m,\tau) = \dfrac{E[V(k,m)\,\mathbf{v}^*(k,m-\tau)]}{E[|V(k,m)|^2]}$

[Figure: correlation as a function of $\tau$ [frames]; 1 frame = 32 ms. Annotation: $= \hat{v}(n)$.]
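A normalized correlation of this kind can be estimated from an STFT by replacing the expectations with time averages. The sketch below makes that substitution and, for simplicity, uses a single scalar lag per call rather than the vector of past frames in the definition; the function name `normalized_correlation` is hypothetical.

```python
import numpy as np

def normalized_correlation(S, tau):
    """Time-averaged estimate of E[S(k,m) S*(k,m-tau)] / E[|S(k,m)|^2] per bin.

    S: complex STFT of shape (bins, frames); tau: non-negative lag in frames.
    """
    S = np.asarray(S, dtype=complex)
    cur = S[:, tau:]                      # S(k, m)
    past = S[:, : S.shape[1] - tau]       # S(k, m - tau)
    num = np.mean(cur * np.conj(past), axis=1)
    den = np.mean(np.abs(cur) ** 2, axis=1)
    return num / den
```

Evaluating it on the clean speech and on the noise for a range of lags gives curves of the kind compared on this slide: a harmonic component keeps a magnitude close to 1 across lags, while a component that decorrelates quickly falls towards 0.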

Page 26:

Proposed Approach

▪ Combined filter (CMLNLMS):

$E_c(k,m) = \begin{cases} E_b(k,m+L), & |E_b(k,m+L)|^2 \le |E_f(k,m)|^2 \text{ and } |E_b(k,m+L)|^2 \le |Y(k,m)|^2 \\ E_f(k,m), & |E_b(k,m+L)|^2 > |E_f(k,m)|^2 \text{ and } |E_f(k,m+L)|^2 \le |Y(k,m)|^2 \\ Y(k,m), & \text{else} \end{cases}$

[Figure: panels labelled F, B, and C.]
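The selection rule of the combined filter can be sketched as an element-wise choice between three spectra. The sketch assumes that the forward and backward error spectra have already been computed and time-aligned, so that `E_b[k, m]` plays the role of $E_b(k, m+L)$, and that the forward error is compared with the noisy spectrum at the same frame; the function name `combine_errors` is hypothetical.

```python
import numpy as np

def combine_errors(E_f, E_b, Y):
    """Per-bin choice between backward error, forward error and noisy input.

    E_f, E_b, Y: complex arrays of identical shape (bins, frames).
    E_b is assumed to be pre-aligned with E_f and Y (see lead-in).
    """
    pf = np.abs(E_f) ** 2
    pb = np.abs(E_b) ** 2
    py = np.abs(Y) ** 2
    use_b = (pb <= pf) & (pb <= py)       # backward error is the smallest
    use_f = (pb > pf) & (pf <= py)        # forward error is the smallest
    # Fall back to the noisy spectrum when neither error is below it.
    return np.where(use_b, E_b, np.where(use_f, E_f, Y))
```

Because the output never exceeds the noisy spectrum in magnitude, a badly adapted filter cannot add energy to the signal, which is the comparison to the noisy spectrum listed among the contributions.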

Page 27:

Proposed Approach

▪ Harmonic noise presence detector for better speech preservation

$I(k,m) = \begin{cases} 1, & V(k,m) \in \mathcal{H}_0 \\ 0, & V(k,m) \in \mathcal{H}_1 \end{cases}$

▪ Set of filters with changing length, up to the maximal filter length $L$, based on the available amount of noise samples

[Figure: panels labelled F, B, C, and C.]

Page 28:

Performance Measures

▪ Distortion index $v_{sd}$

▪ Noise reduction factor $\xi_{nr}$

▪ Perceptual Evaluation of Speech Quality (PESQ), ITU-T P.862.2

▪ Short-Time Objective Intelligibility (STOI)

Page 29:

Transient Reduction

[Figure: NRR [dB] vs. frame index. Parameters: $L = 3$, a second parameter equal to 3 and a third equal to 0.5 (symbols not recoverable), indicator threshold $-25$ dB.]

Better noise reduction, which leads to improved $\xi_{nr}$, PESQ, and STOI levels for the combined filter

Page 30:

Step Size

▪ An appropriate selection of the step size is required

▪ Fixed step size

[Figure: $v_{sd}$ and NRR [dB] vs. frame index and vs. $\tau$ [frames]; one parameter fixed at 0.5 (symbol not recoverable). Garbled annotation: "? const, MI, max".]

Page 31:

[Figure: PESQ, STOI, $v_{sd}$ [dB], and $\xi_{nr}$ [dB] vs. $\tau$ [frames]; one parameter fixed at 0.5 (symbol not recoverable), indicator threshold $-25$ dB.]

Combined & MI-Combined show better results than MI

Page 32:

[Figure: PESQ, STOI, $v_{sd}$ [dB], and $\xi_{nr}$ [dB] vs. $\tau$ [frames]; one parameter fixed at 0.5 (symbol not recoverable), indicator threshold $-25$ dB.]

Recommendation: short $L$ and a parameter value of 1 (symbol not recoverable)

Combined & MI-Combined show better results than MI

Page 33:

Noise Presence Indicator

[Figure. Parameters: $L = 3$ and a second parameter equal to 1 (symbol not recoverable).]

Page 34:

Experimental Results Summary

[Figure. Parameters: $L = 3$, a second parameter equal to 1 (symbol not recoverable), indicator threshold $-25$ dB.]

Page 35:

[Figure: spectrograms (frequency index vs. frame index): (a) clean signal, (b) noisy signal, (c) enhanced signal, MI, (d) enhanced signal, combined.]

Page 36:

Conclusions

▪ Introduced the combined filter

▪ Parameter selection

▪ Noise presence indicator impact

▪ Improved results compared to other methods

Page 37:

Future Research

▪ Noise presence indicator implementation

▪ Residual noise at transient edges

▪ Deep Learning approach for noise reduction

Page 38:
Page 39:

Speech Enhancement Using ARCH model

▪ We investigate the use of the autoregressive conditional heteroscedasticity (ARCH) model as a replacement for the well-known Decision-Directed estimator by Ephraim and Malah

▪ We employ three sound quality measures: speech distortion, noise reduction and musical noise, and explain the effect the ARCH model parameters have on these measures.

▪ We demonstrate that the ARCH model achieves better results than the decision-directed for some of these measures, while compromising between the speech distortion and noise reduction.

Page 40:

Problem Formulation

▪ Let $Y_\ell(k) = X_\ell(k) + D_\ell(k)$ denote an observed noisy speech signal in the STFT domain.

▪ Given an error function between the clean signal and its estimate, the spectral enhancement problem can be formulated as

$\hat{X}_\ell(k) = \arg\min_{\hat{X}} E\{\, e(X_\ell(k), \hat{X}) \mid Y_0(k), \ldots, Y_{\ell'}(k) \,\}$

▪ We consider the causal case $\ell' \le \ell$ and the LSA error function

$e_{\mathrm{LSA}}(X_\ell(k), \hat{X}_\ell(k)) = \left(\log|X_\ell(k)| - \log|\hat{X}_\ell(k)|\right)^2$

Page 41:

Problem Formulation

▪ The estimate is obtained by applying a spectral gain to each noisy spectral component:

$\hat{X}_\ell(k) = G_{\mathrm{LSA}}(\xi_{\ell|\ell'}, \gamma_\ell) \cdot Y_\ell$

where the a-priori and a-posteriori SNRs are defined, respectively, by:

$\xi_{\ell|\ell'} \triangleq \dfrac{\lambda_{\ell|\ell'}}{\sigma_\ell^2}$, $\gamma_\ell \triangleq \dfrac{|Y_\ell|^2}{\sigma_\ell^2}$

$\sigma_\ell^2 = E\{|D_\ell|^2\}$ denotes the short-term spectrum of the noise, and $\lambda_{\ell|\ell'} = E\{|X_\ell|^2 \mid Y_0(k), \ldots, Y_{\ell'}(k)\}$ denotes the short-term spectrum of the speech signal.

Page 42:

Decision-Directed

Y. Ephraim and D. Malah, “Speech enhancement using a minimum mean square error short-time spectral amplitude estimator,” IEEE Transactions on Acoustics, Speech and Signal Processing, vol. ASSP-32, pp. 1109-1121, December 1984

▪ Over the past decades, the decision-directed (DD) approach has become the accepted estimation method for the a-priori SNR

$\hat{\xi}_{\ell|\ell} = \max\left\{ \alpha\,\dfrac{|\hat{X}_{\ell-1}|^2}{\sigma_\ell^2} + (1-\alpha)\,P(\gamma_\ell - 1),\ \xi_{\min} \right\}$

where $P(x) = x$ if $x \ge 0$ and $P(x) = 0$ otherwise.

▪ The decision-directed approach is not supported by a statistical model.

▪ 𝛼 and 𝜉min have to be determined by simulations.

▪ 𝛼 and 𝜉min are fixed constants and are not adapted to the speech components.
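One frame of the decision-directed estimate can be sketched as follows. The function name `decision_directed` and the default values of `alpha` and `xi_min_db` are illustrative assumptions, not values taken from the slides.

```python
import numpy as np

def decision_directed(gamma, X_prev_abs2, sigma2, alpha=0.98, xi_min_db=-15.0):
    """Decision-directed a-priori SNR estimate for one frame.

    gamma:        a-posteriori SNR of the current frame (linear scale).
    X_prev_abs2:  |X_hat|^2 of the previous frame's clean-speech estimate.
    sigma2:       noise variance estimate.
    """
    xi_min = 10.0 ** (xi_min_db / 10.0)
    ml_term = np.maximum(gamma - 1.0, 0.0)            # P(gamma - 1)
    xi = alpha * X_prev_abs2 / sigma2 + (1.0 - alpha) * ml_term
    return np.maximum(xi, xi_min)                     # floor at xi_min
```

The fixed `alpha` and `xi_min` are the two constants the bullets above refer to: they are the same for every frame and every frequency bin.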

Page 43:

ARCH Model

▪ The GARCH (generalized autoregressive conditional heteroscedasticity) model is extensively used in financial applications, where it is necessary to model time-varying volatility while taking into account heavy-tailed behavior and volatility clustering.

▪ Recently [1], it was proposed to use the GARCH model for statistically modeling speech signals in the STFT domain, as they show these two characteristics.

▪ In this work, we investigate the use of a simplified case of the GARCH, the ARCH model. We explain the effect that the ARCH model parameters have on commonly used performance measures and compare it to the decision-directed estimator.

[1] I. Cohen, “Modeling speech signals in the time frequency domain using GARCH,” Signal Processing, vol. 84 (12), pp. 2453–2459, 2004.

Page 44:

ARCH Model

We use a two-step estimator to recursively update the estimate of the conditional a-priori SNR as new data arrives.

Given an estimate $\hat{\xi}_{\ell|\ell-1}$ and a new noisy spectral component $Y_\ell$:

Update step:

$\hat{\xi}_{\ell|\ell} = E\left\{ \dfrac{|X_\ell|^2}{\sigma_\ell^2} \,\middle|\, \hat{\xi}_{\ell|\ell-1}, Y_\ell \right\}$

Using ARCH(1), propagate the a-priori SNR to obtain the one-frame-ahead a-priori SNR:

Propagation step:

$\hat{\xi}_{\ell|\ell-1} = \kappa + \mu\,\hat{\xi}_{\ell-1|\ell-1}$, $\kappa > 0$, $0 \le \mu < 1$

Page 45:

ARCH Model

▪ Solving for the update step we get: $\hat{\xi}_{\ell|\ell} = G_{SP}^2(\hat{\xi}_{\ell|\ell-1}, \gamma_\ell) \cdot \gamma_\ell$

where $G_{SP}(\xi_{\ell|\ell'}, \gamma_\ell) = \sqrt{\dfrac{\xi_{\ell|\ell'}}{\xi_{\ell|\ell'}+1}\left(\dfrac{1}{\gamma_\ell} + \dfrac{\xi_{\ell|\ell'}}{\xi_{\ell|\ell'}+1}\right)}$

▪ Employing some algebra, we can write: $\hat{\xi}_{\ell|\ell} = \alpha_\ell\,\hat{\xi}_{\ell|\ell-1} + (1-\alpha_\ell)(\gamma_\ell - 1)$

where $\alpha_\ell = 1 - \left(\dfrac{\hat{\xi}_{\ell|\ell-1}}{\hat{\xi}_{\ell|\ell-1}+1}\right)^2$, $\alpha_\ell \in (0,1]$

▪ Note the similarity of form to the decision-directed estimator, but with a time-varying, frequency-dependent weighting factor $\alpha_\ell$.
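The algebra behind the weighted form can be spelled out as follows. The shorthand $\xi$ for $\hat{\xi}_{\ell|\ell-1}$, $\gamma$ for $\gamma_\ell$ and $r = \xi/(\xi+1)$ is introduced here only for the derivation, and the squared gain is taken as $G_{SP}^2 = r\,(1/\gamma + r)$:

```latex
\begin{aligned}
G_{SP}^2(\xi,\gamma)\,\gamma &= r\left(\tfrac{1}{\gamma} + r\right)\gamma = r + r^2\gamma, \\
\alpha\,\xi + (1-\alpha)(\gamma - 1) &= (1-r^2)\,\xi + r^2\gamma - r^2, \qquad \alpha = 1 - r^2, \\
(1-r^2)\,\xi - r^2 &= (1+r)\,\xi\,(1-r) - r^2 = (1+r)\,r - r^2 = r,
  \qquad \text{since } \xi\,(1-r) = \tfrac{\xi}{\xi+1} = r .
\end{aligned}
```

Both expressions therefore equal $r + r^2\gamma$, which also shows that the updated a-priori SNR stays non-negative whenever $\xi \ge 0$.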

Page 46:

ARCH Model

▪ Since the a-priori SNR needs to be equal to $\xi_{\min}$ when speech is absent, we obtain a condition on $\kappa$: $\kappa = (1-\mu)\,\xi_{\min}$

▪ Using ARCH(1) we have two parameters, $\xi_{\min}$ and $\mu$:

Propagation step: $\hat{\xi}_{\ell|\ell-1} = (1-\mu)\,\xi_{\min} + \mu\,\hat{\xi}_{\ell-1|\ell-1}$

Update step: $\hat{\xi}_{\ell|\ell} = \alpha_\ell\,\hat{\xi}_{\ell|\ell-1} + (1-\alpha_\ell)(\gamma_\ell - 1)$, where $\alpha_\ell = 1 - \left(\dfrac{\hat{\xi}_{\ell|\ell-1}}{\hat{\xi}_{\ell|\ell-1}+1}\right)^2$, $\alpha_\ell \in (0,1]$
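The two-step recursion can be sketched for a single frequency bin as below. The function name `arch1_apriori_snr`, the default parameter values, and the choice to initialise the recursion at $\xi_{\min}$ are assumptions made for the illustration.

```python
import numpy as np

def arch1_apriori_snr(gamma, mu=0.8, xi_min_db=-15.0):
    """ARCH(1) propagation/update recursion along one frequency bin.

    gamma: 1-D array of a-posteriori SNRs (linear scale), one per frame.
    Returns (xi_pred, xi_upd): xi_{l|l-1} and xi_{l|l} for every frame.
    """
    gamma = np.asarray(gamma, dtype=float)
    xi_min = 10.0 ** (xi_min_db / 10.0)
    xi_pred = np.empty_like(gamma)
    xi_upd = np.empty_like(gamma)
    prev = xi_min                                   # assumed initial xi_{-1|-1}
    for l, g in enumerate(gamma):
        # Propagation step with kappa = (1 - mu) * xi_min
        xi_pred[l] = (1.0 - mu) * xi_min + mu * prev
        # Time-varying weight alpha_l
        alpha = 1.0 - (xi_pred[l] / (xi_pred[l] + 1.0)) ** 2
        # Update step
        xi_upd[l] = alpha * xi_pred[l] + (1.0 - alpha) * (g - 1.0)
        prev = xi_upd[l]
    return xi_pred, xi_upd
```

When the predicted SNR is low, `alpha` is close to 1 and the update leans on the prediction; when it is high, the weight shifts towards the instantaneous term $\gamma_\ell - 1$.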

Page 47:

Distortion and NRR

We employ three performance measures commonly used for the quality assessment of a speech enhancement algorithm.

The first two are easily understood when we express the estimated signal as

$\hat{X}_\ell = G(\xi_{\ell|\ell'}, \gamma_\ell)\,X_\ell + G(\xi_{\ell|\ell'}, \gamma_\ell)\,D_\ell = X_{fd} + D_{rn}$

Speech distortion:

$J_X \triangleq E\left\{ \left( \log|X_\ell(k)| - \log|G(\xi_{\ell|\ell'}, \gamma_\ell)\,X_\ell| \right)^2 \right\}$

Noise Reduction Ratio (NRR):

$\mathrm{NRR} \triangleq \dfrac{E\{|D_\ell|^2\}}{E\{|G(\xi_{\ell|\ell'}, \gamma_\ell)\,D_\ell|^2\}}$
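Both measures can be estimated from sample averages once the clean speech, the noise, and the applied gain are available separately, as they are in a simulation. The sketch below replaces the expectations with means and assumes the natural logarithm; the function names are hypothetical.

```python
import numpy as np

def speech_distortion(X, G):
    """J_X: mean squared log-magnitude difference between X and G * X."""
    X, G = np.asarray(X), np.asarray(G)
    return np.mean((np.log(np.abs(X)) - np.log(np.abs(G * X))) ** 2)

def noise_reduction_ratio(D, G):
    """NRR: noise power before filtering over noise power after filtering."""
    D, G = np.asarray(D), np.asarray(G)
    return np.mean(np.abs(D) ** 2) / np.mean(np.abs(G * D) ** 2)
```

A constant gain of 0.5, for example, gives an NRR of 4 (about 6 dB) and a distortion of $(\log 2)^2$, which illustrates why more attenuation buys noise reduction at the cost of speech distortion.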

Page 48:

Musical noise via higher order statistics

The attenuated noise will be composed of isolated spectral components, also known as tonal components.

The amount of tonal components can be quantified by the kurtosis:

$\text{kurtosis} = \mu_4 / \mu_2^2$, where $\mu_m$ is the $m$-th order moment of the signal.

As we are interested in the amount of tonal components caused by the processing, we use the ratio of the kurtosis before and after the processing:

$\mathrm{LKR} \triangleq \log_{10} \dfrac{\text{kurtosis}_{\text{proc}}}{\text{kurtosis}_{\text{org}}}$

which is evaluated on noise-only frames. The LKR increases as the musical noise increases, and the absence of musical noise corresponds to an LKR of zero or below.

Page 49:

Musical noise via higher order statistics

Analytical calculation of the kurtosis ratio requires the use of a specific noise reduction method or assumptions about the statistics of the spectral components. Here, we use the sample kurtosis:

$\text{kurtosis} = \dfrac{1}{L}\sum_{\ell=0}^{L} \dfrac{\frac{1}{N}\sum_{k=0}^{N-1}\left(|D_\ell(k)|^2 - \overline{|D_\ell(k)|^2}\right)^4}{\left[\frac{1}{N}\sum_{k=0}^{N-1}\left(|D_\ell(k)|^2 - \overline{|D_\ell(k)|^2}\right)^2\right]^2}$

where $\overline{|D_\ell(k)|^2} = \dfrac{1}{N}\sum_{k=0}^{N-1}|D_\ell(k)|^2$
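The sample kurtosis and the log kurtosis ratio can be sketched as follows. The code averages the per-frame kurtosis over whatever frames are passed in, and the function names `sample_kurtosis` and `log_kurtosis_ratio` are assumptions for the example.

```python
import numpy as np

def sample_kurtosis(D):
    """Average per-frame sample kurtosis of the power spectrum |D_l(k)|^2.

    D: complex array of shape (frames, N bins), noise-only frames.
    """
    p = np.abs(np.asarray(D)) ** 2
    c = p - p.mean(axis=1, keepdims=True)     # centre each frame over k
    m4 = np.mean(c ** 4, axis=1)              # 4th central moment per frame
    m2 = np.mean(c ** 2, axis=1)              # 2nd central moment per frame
    return np.mean(m4 / m2 ** 2)

def log_kurtosis_ratio(D_proc, D_orig):
    """LKR = log10(kurtosis after processing / kurtosis before processing)."""
    return np.log10(sample_kurtosis(D_proc) / sample_kurtosis(D_orig))
```

The kurtosis is invariant to a constant scaling of the spectrum, so a gain that only attenuates uniformly leaves the LKR at zero; only processing that changes the shape of the distribution, such as isolated surviving peaks, moves it.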

Page 50:

Experimental Setup

▪ Speech signals: 20 different utterances from 20 different speakers, sampled at 16 kHz and degraded by white Gaussian noise with SNRs in the range [0, 20] dB.

▪ The noisy signals are transformed to the time-frequency domain using STFT, with 75% overlapping Hamming analysis windows of 32ms length.

▪ The evaluation of the musical noise was done separately on a complex white Gaussian noise in the time-frequency domain, to emulate performance in noise only frames.

Page 51:

Experimental Setup

Comparison of decision-directed (solid lines) and ARCH (dashed lines) estimators for 5 dB SNR: (a) distortion, (b) NRR, and (c) LKR, with varying $\alpha$ (upper axis) and $\mu$ (lower axis), respectively, per estimator, and $\xi_{\min}$ of -20 dB (square), -15 dB (circle), and, for the decision-directed method only, $\xi_{\min} = 0$ (triangle).

Page 52:

Experimental Setup

We get the expected decision-directed behavior

Page 53:

Experimental Setup

For the ARCH estimator, increasing the value of 𝜇 decreases the distortion

Page 54:

Experimental Setup

As $\mu$ increases, the NRR also decreases.

The lower we take the noise floor 𝜉min , the more noise reduction we get.

Page 55:

Experimental Setup

The musical noise mainly depends on the noise floor 𝜉min . Lower 𝜉min means higher 𝛼ℓ , resulting in a smoother a-priori SNR around 𝜉min , thus reducing the musical noise.

Page 56:

Experimental Setup

For the decision-directed estimator we have to compromise between the amount of distortion and amount of musical noise, while for the ARCH estimator, the musical noise can be eliminated by choosing an appropriate value of 𝜉min . However, for the ARCH estimator we need to compromise between the amount of distortion and the amount of residual noise.

Page 57:

Conclusions

Results summary:

▪ We presented the use of the ARCH estimator, which is based on a statistical model.

▪ We explained the effect the ARCH model parameters have on three commonly used quality measures.

▪ We demonstrated that the ARCH model can achieve better results than the decision-directed, while compromising between the speech distortion and noise reduction.

Future work:

▪ We used the ARCH(1) model for the a-priori SNR estimator, which is a special case of the GARCH(0,1). It would be interesting to expand the model to a full GARCH(p,q) model and conduct a similar analysis, to understand whether the full general model could provide additional advantages.

Page 58:

Robust Superdirective Beamformer with Optimal Regularization

▪ We introduce an optimal beamformer design that facilitates a compromise between high directivity and low white noise amplification.

▪ The proposed beamformer involves a regularization factor, whose optimal value is determined using a simple and efficient one-dimensional search algorithm.

▪ Simulation results demonstrate controlled tuning of various gain properties of the desired beamformer, and improved performance compared to a competing method.

Page 59:

Signal Model and Array Setup

▪ We consider a plane wave, in the far field, impinging on an array at angle $\theta$

▪ Uniform linear microphone array of $M$ sensors, with distance $\delta$ between them

▪ The desired signal $X(\omega)$ propagates from $\theta = 0$ (endfire)

▪ Neglecting the propagation attenuation, the observed signal is

$\mathbf{y}(\omega) = \mathbf{d}(\omega,\theta)\,X(\omega) + \mathbf{v}(\omega)$

where $\mathbf{d}(\omega,\theta)$ is the steering vector, and $\mathbf{v}(\omega)$ is the additive noise vector.

$\mathbf{d}(\omega,\theta) = \left[1,\ e^{-j\omega\tau_0\cos\theta},\ \cdots,\ e^{-j(M-1)\omega\tau_0\cos\theta}\right]^T$, $\tau_0 = \dfrac{\delta}{c}$

Page 60:

Signal Model and Array Setup

▪ For the endfire direction, $\mathbf{d}(\omega) = \mathbf{d}(\omega, 0)$

▪ Applying a complex linear filter $\mathbf{h}(\omega)$, the estimated signal is

$Z(\omega) = \mathbf{h}^H(\omega)\,\mathbf{y}(\omega) = \mathbf{h}^H(\omega)\,\mathbf{d}(\omega)\,X(\omega) + \mathbf{h}^H(\omega)\,\mathbf{v}(\omega)$

▪ The beamformer is distortionless when $\mathbf{h}^H(\omega)\,\mathbf{d}(\omega) = 1$

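Page 61:

Performance measures

▪ Taking the first microphone as reference, we define the input and output SNR

$\mathrm{iSNR}(\omega) = \dfrac{\phi_X(\omega)}{\phi_{V_1}(\omega)}$, $\mathrm{oSNR}(\omega) = \dfrac{\phi_X(\omega)}{\phi_{V_1}(\omega)} \times \dfrac{|\mathbf{h}^H(\omega)\,\mathbf{d}(\omega)|^2}{\mathbf{h}^H(\omega)\,\boldsymbol{\Gamma}_{\mathbf{v}}(\omega)\,\mathbf{h}(\omega)}$

where $\phi_f(\omega) = E(|f(\omega)|^2)$ is the variance of $f \in \{X, V_1\}$, and $\boldsymbol{\Gamma}_{\mathbf{v}}(\omega) = E[\mathbf{v}(\omega)\,\mathbf{v}^H(\omega)] / \phi_{V_1}(\omega)$ is the pseudo-coherence matrix of the noise.

▪ We deduce the gain in SNR:

$\mathcal{G}[\mathbf{h}(\omega)] = \dfrac{\mathrm{oSNR}(\omega)}{\mathrm{iSNR}(\omega)} = \dfrac{|\mathbf{h}^H(\omega)\,\mathbf{d}(\omega)|^2}{\mathbf{h}^H(\omega)\,\boldsymbol{\Gamma}_{\mathbf{v}}(\omega)\,\mathbf{h}(\omega)}$

▪ WNG: $\boldsymbol{\Gamma}_{\mathbf{v}}(\omega) = \mathbf{I}_M$, $\mathcal{W}[\mathbf{h}(\omega)] = \dfrac{|\mathbf{h}^H(\omega)\,\mathbf{d}(\omega)|^2}{\mathbf{h}^H(\omega)\,\mathbf{h}(\omega)}$

▪ DF: $\boldsymbol{\Gamma}_{\mathbf{v}}(\omega) = \boldsymbol{\Gamma}_{\mathbf{d}}(\omega) = \dfrac{1}{2}\displaystyle\int_0^{\pi} \mathbf{d}(\omega,\theta)\,\mathbf{d}^H(\omega,\theta)\,\sin\theta\,d\theta$, $\mathcal{D}[\mathbf{h}(\omega)] = \dfrac{|\mathbf{h}^H(\omega)\,\mathbf{d}(\omega)|^2}{\mathbf{h}^H(\omega)\,\boldsymbol{\Gamma}_{\mathbf{d}}(\omega)\,\mathbf{h}(\omega)}$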
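These measures can be evaluated numerically for a uniform linear array. The sketch below uses the closed form of the diffuse-noise integral for such an array, whose $(i,j)$ entry is $\sin(\omega\tau_0(i-j)) / (\omega\tau_0(i-j))$; the function names and the speed of sound `C = 340` m/s are assumptions for the example.

```python
import numpy as np

C = 340.0  # assumed speed of sound [m/s]

def steering_vector(omega, theta, M, delta, c=C):
    """d(omega, theta) for a uniform linear array of M sensors spaced delta."""
    tau0 = delta / c
    return np.exp(-1j * np.arange(M) * omega * tau0 * np.cos(theta))

def diffuse_coherence(omega, M, delta, c=C):
    """Gamma_d for a uniform linear array (closed form of the integral)."""
    idx = np.arange(M)
    x = omega * (delta / c) * (idx[:, None] - idx[None, :])
    return np.sinc(x / np.pi)             # np.sinc(t) = sin(pi t) / (pi t)

def gain(h, d, Gamma):
    """|h^H d|^2 / (h^H Gamma h): Gamma = I gives the WNG, Gamma_d the DF."""
    return np.abs(np.vdot(h, d)) ** 2 / np.real(np.vdot(h, Gamma @ h))
```

For instance, with `d = steering_vector(omega, 0.0, M, delta)` and the filter `h = d / M`, `gain(h, d, np.eye(M))` returns `M`.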

Page 62:

Conventional Beamformers

▪ Delay-and-Sum (DS): maximizes the WNG subject to the distortionless constraint

$\mathbf{h}_{DS}(\omega,\theta) = \dfrac{\mathbf{d}(\omega,\theta)}{M}$, $\mathcal{W}[\mathbf{h}_{DS}(\omega,\theta)] = M = \mathcal{W}_{\max}$

$\mathcal{D}[\mathbf{h}_{DS}(\omega,\theta)] = \dfrac{M^2}{\mathbf{d}^H(\omega,\theta)\,\boldsymbol{\Gamma}_{\mathbf{d}}(\omega)\,\mathbf{d}(\omega,\theta)} \ge 1$

The DS maximizes the WNG and, since its DF is at least 1, never amplifies diffuse noise.

▪ Superdirective (SD): maximizes the DF subject to the distortionless constraint for the specific case of $\theta = 0$ and small $\delta$

$\mathbf{h}_{SD}(\omega,\theta) = \dfrac{\boldsymbol{\Gamma}_{\mathbf{d}}^{-1}(\omega)\,\mathbf{d}(\omega)}{\mathbf{d}^H(\omega)\,\boldsymbol{\Gamma}_{\mathbf{d}}^{-1}(\omega)\,\mathbf{d}(\omega)}$

While it maximizes the DF, $\mathbf{h}_{SD}(\omega,\theta)$ can amplify white noise, especially at low frequencies

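Page 63:

Conventional Beamformers

▪ Robust Superdirective

$\mathbf{h}_{R,\varepsilon}(\omega) = \dfrac{[\boldsymbol{\Gamma}_{\mathbf{d}}(\omega) + \varepsilon\mathbf{I}_M]^{-1}\,\mathbf{d}(\omega)}{\mathbf{d}^H(\omega)\,[\boldsymbol{\Gamma}_{\mathbf{d}}(\omega) + \varepsilon\mathbf{I}_M]^{-1}\,\mathbf{d}(\omega)}$

where $\varepsilon \ge 0$ is a Lagrange multiplier, which enables a compromise between the DF and the WNG

If we define $\boldsymbol{\Gamma}_{\varepsilon}(\omega) = \boldsymbol{\Gamma}_{\mathbf{d}}(\omega) + \varepsilon\mathbf{I}_M$, we can write

$\mathbf{h}_{R,\varepsilon}(\omega) = \dfrac{\boldsymbol{\Gamma}_{\varepsilon}^{-1}(\omega)\,\mathbf{d}(\omega)}{\mathbf{d}^H(\omega)\,\boldsymbol{\Gamma}_{\varepsilon}^{-1}(\omega)\,\mathbf{d}(\omega)}$

While the robust superdirective beamformer has control over the white noise amplification, it is not easy to find a closed-form expression for $\varepsilon$ for a desired value of the WNG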

Page 64: Speech Enhancement Based on Adaptive Line Enhancer · min. that minimizes the gain (e.g., using gradient descent) 2. Divide the range 0,1into 2 sections in which the gain is monotonic:

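Combined beamformer

R. Berkun, I. Cohen, and J. Benesty, “Combined beamformers for robust broadband regularized superdirective beamforming,” IEEE/ACM Transactions on Audio, Speech and Language Processing, vol. 23, pp. 877-886, May 2015

Berkun et al. proposed the combined beamformer:

$$\mathbf{h}_{\alpha,\varepsilon}(\omega) = \frac{\left[\boldsymbol{\Gamma}_{\varepsilon}^{-1}(\omega) + \alpha(\omega)\mathbf{I}_M\right]\mathbf{d}(\omega)}{\mathbf{d}^H(\omega)\left[\boldsymbol{\Gamma}_{\varepsilon}^{-1}(\omega) + \alpha(\omega)\mathbf{I}_M\right]\mathbf{d}(\omega)}, \quad \alpha \in \mathbb{R}$$

It can be reformulated as

$$\mathbf{h}_{\alpha,\varepsilon}(\omega) = \frac{\mathbf{h}_{R,\varepsilon}(\omega)}{1 + \alpha_{\varepsilon}(\omega)} + \frac{\mathbf{h}_{\mathrm{DS}}(\omega)}{1 + \alpha_{\varepsilon}^{-1}(\omega)}$$

where

$$\alpha_{\varepsilon}(\omega) = \alpha(\omega)\,\frac{\mathcal{W}_{\max}}{\mathcal{D}_{\max,\varepsilon}(\omega)} \quad \text{and} \quad \mathcal{D}_{\max,\varepsilon}(\omega) = \mathbf{d}^H(\omega)\,\boldsymbol{\Gamma}_{\varepsilon}^{-1}(\omega)\,\mathbf{d}(\omega)$$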

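For a fixed $\mathcal{W}\left[\mathbf{h}_{\alpha,\varepsilon}(\omega)\right] = \mathcal{W}_0 < M$ or a fixed $\mathcal{D}\left[\mathbf{h}_{\alpha,\varepsilon}(\omega)\right] = \mathcal{D}_0$, it is possible to calculate $\alpha_{\varepsilon}(\omega)$, and hence $\alpha(\omega)$, analytically.

The method gives a closed-form solution for the parameter $\alpha(\omega)$, which controls the trade-off between the WNG and the DF. However, it does not address how to choose the regularization parameter $\varepsilon$ and assumes it is set by the user.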
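The reformulation as a weighted sum of the robust superdirective and DS beamformers can be checked in a few lines. The derivation below is a reconstruction from the expressions on this slide (frequency arguments dropped for brevity), and it assumes unit-modulus steering-vector entries so that $\mathbf{d}^H\mathbf{d} = M = \mathcal{W}_{\max}$.

```latex
\begin{aligned}
\mathbf{h}_{\alpha,\varepsilon}
&= \frac{\boldsymbol{\Gamma}_{\varepsilon}^{-1}\mathbf{d} + \alpha\,\mathbf{d}}
        {\mathbf{d}^H\boldsymbol{\Gamma}_{\varepsilon}^{-1}\mathbf{d} + \alpha\,\mathbf{d}^H\mathbf{d}}
 = \frac{\mathcal{D}_{\max,\varepsilon}\,\mathbf{h}_{R,\varepsilon} + \alpha M\,\mathbf{h}_{\mathrm{DS}}}
        {\mathcal{D}_{\max,\varepsilon} + \alpha M} \\
&= \frac{\mathbf{h}_{R,\varepsilon}}{1 + \alpha M / \mathcal{D}_{\max,\varepsilon}}
 + \frac{\mathbf{h}_{\mathrm{DS}}}{1 + \mathcal{D}_{\max,\varepsilon} / (\alpha M)}
 = \frac{\mathbf{h}_{R,\varepsilon}}{1 + \alpha_{\varepsilon}}
 + \frac{\mathbf{h}_{\mathrm{DS}}}{1 + \alpha_{\varepsilon}^{-1}},
\qquad
\alpha_{\varepsilon} = \alpha\,\frac{\mathcal{W}_{\max}}{\mathcal{D}_{\max,\varepsilon}}
\end{aligned}
```

The second equality uses $\boldsymbol{\Gamma}_{\varepsilon}^{-1}\mathbf{d} = \mathcal{D}_{\max,\varepsilon}\,\mathbf{h}_{R,\varepsilon}$ and $\mathbf{d} = M\,\mathbf{h}_{\mathrm{DS}}$.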

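New Noise Field

▪ We assume the signal is corrupted by both diffuse noise and additive white noise.

▪ The input and output SNR:

$$\mathrm{iSNR}(\omega) = \frac{\mathrm{tr}\left[\phi_X(\omega)\,\mathbf{d}(\omega)\mathbf{d}^H(\omega)\right]}{\mathrm{tr}\left[\phi_d(\omega)\boldsymbol{\Gamma}_{\mathbf{d}}(\omega) + \phi_w(\omega)\mathbf{I}_M\right]} = \frac{\phi_X(\omega)}{\phi_d(\omega) + \phi_w(\omega)}$$

$$\mathrm{oSNR}(\omega) = \frac{\phi_X(\omega)\left|\mathbf{h}^H(\omega)\mathbf{d}(\omega)\right|^2}{\phi_d(\omega)\,\mathbf{h}^H(\omega)\boldsymbol{\Gamma}_{\mathbf{d}}(\omega)\mathbf{h}(\omega) + \phi_w(\omega)\,\mathbf{h}^H(\omega)\mathbf{h}(\omega)}$$

▪ The SNR gain:

$$\mathcal{G}\left[\mathbf{h}(\omega)\right] = \frac{\left|\mathbf{h}^H(\omega)\mathbf{d}(\omega)\right|^2}{\left[1 - \alpha(\omega)\right]\mathbf{h}^H(\omega)\boldsymbol{\Gamma}_{\mathbf{d}}(\omega)\mathbf{h}(\omega) + \alpha(\omega)\,\mathbf{h}^H(\omega)\mathbf{h}(\omega)}$$

where

$$\alpha(\omega) = \frac{\phi_w(\omega)}{\phi_d(\omega) + \phi_w(\omega)}, \quad 0 \leq \alpha(\omega) \leq 1$$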
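The SNR gain is the ratio of the output SNR to the input SNR. The short derivation below spells out that step and the two limiting cases; it uses only the quantities defined on this slide, with the frequency argument dropped for brevity.

```latex
\begin{aligned}
\mathcal{G}[\mathbf{h}]
&= \frac{\mathrm{oSNR}}{\mathrm{iSNR}}
 = \frac{\phi_X \left|\mathbf{h}^H\mathbf{d}\right|^2}
        {\phi_d\,\mathbf{h}^H\boldsymbol{\Gamma}_{\mathbf{d}}\mathbf{h} + \phi_w\,\mathbf{h}^H\mathbf{h}}
   \cdot \frac{\phi_d + \phi_w}{\phi_X}
 = \frac{\left|\mathbf{h}^H\mathbf{d}\right|^2}
        {\dfrac{\phi_d}{\phi_d + \phi_w}\,\mathbf{h}^H\boldsymbol{\Gamma}_{\mathbf{d}}\mathbf{h}
         + \dfrac{\phi_w}{\phi_d + \phi_w}\,\mathbf{h}^H\mathbf{h}} \\
&= \frac{\left|\mathbf{h}^H\mathbf{d}\right|^2}
        {(1 - \alpha)\,\mathbf{h}^H\boldsymbol{\Gamma}_{\mathbf{d}}\mathbf{h} + \alpha\,\mathbf{h}^H\mathbf{h}},
\qquad
\alpha = 0 \;\Rightarrow\; \mathcal{G}[\mathbf{h}] = \mathcal{D}[\mathbf{h}],
\quad
\alpha = 1 \;\Rightarrow\; \mathcal{G}[\mathbf{h}] = \mathcal{W}[\mathbf{h}]
\end{aligned}
```

So the gain reduces to the DF when only diffuse noise is present and to the WNG when only white noise is present.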

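The Optimal Beamformer

▪ The proposed beamformer, which maximizes the SNR gain, is:

$$\mathbf{h}_{\alpha}(\omega) = \frac{\boldsymbol{\Gamma}_{\mathbf{d},\alpha}^{-1}(\omega)\,\mathbf{d}(\omega)}{\mathbf{d}^H(\omega)\,\boldsymbol{\Gamma}_{\mathbf{d},\alpha}^{-1}(\omega)\,\mathbf{d}(\omega)}, \quad \text{where } \boldsymbol{\Gamma}_{\mathbf{d},\alpha}(\omega) = \left[1 - \alpha(\omega)\right]\boldsymbol{\Gamma}_{\mathbf{d}}(\omega) + \alpha(\omega)\mathbf{I}_M$$

▪ The SNR gain: $\mathcal{G}\left[\mathbf{h}_{\alpha}(\omega)\right] = \mathbf{d}^H(\omega)\,\boldsymbol{\Gamma}_{\mathbf{d},\alpha}^{-1}(\omega)\,\mathbf{d}(\omega)$

▪ The proposed beamformer is equivalent to $\mathbf{h}_{R,\varepsilon}(\omega)$ with $\varepsilon(\omega) = \dfrac{\alpha(\omega)}{1 - \alpha(\omega)}$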

▪ Problem: 𝜙𝑑 𝜔 , 𝜙𝑤(𝜔) are not known → 𝛼(𝜔) is not known.

▪ Advantage 1: 𝛼(𝜔) varies from 0 to 1.

▪ Advantage 2: the gain is continuous and has a single minimum point in this range, and the WNG and DF are both monotonic in this range.

▪ Solution: 𝛼(𝜔) is found by applying a binary-like search to each monotonic section of the gain.
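The equivalence with the robust superdirective beamformer follows from factoring a scalar out of the regularized coherence matrix. The sketch below assumes $0 \leq \alpha < 1$ and drops the frequency argument.

```latex
\begin{aligned}
\boldsymbol{\Gamma}_{\mathbf{d},\alpha}
&= (1 - \alpha)\,\boldsymbol{\Gamma}_{\mathbf{d}} + \alpha\,\mathbf{I}_M
 = (1 - \alpha)\left[\boldsymbol{\Gamma}_{\mathbf{d}} + \frac{\alpha}{1 - \alpha}\,\mathbf{I}_M\right]
 = (1 - \alpha)\,\boldsymbol{\Gamma}_{\varepsilon},
\qquad \varepsilon = \frac{\alpha}{1 - \alpha} \\
\mathbf{h}_{\alpha}
&= \frac{(1 - \alpha)^{-1}\,\boldsymbol{\Gamma}_{\varepsilon}^{-1}\mathbf{d}}
        {(1 - \alpha)^{-1}\,\mathbf{d}^H\boldsymbol{\Gamma}_{\varepsilon}^{-1}\mathbf{d}}
 = \frac{\boldsymbol{\Gamma}_{\varepsilon}^{-1}\mathbf{d}}
        {\mathbf{d}^H\boldsymbol{\Gamma}_{\varepsilon}^{-1}\mathbf{d}}
 = \mathbf{h}_{R,\varepsilon}
\end{aligned}
```

The map $\alpha \mapsto \varepsilon$ sends the bounded interval $[0, 1)$ onto $[0, \infty)$, which is why searching over $\alpha$ is more convenient than searching over $\varepsilon$. At $\alpha = 1$ the matrix reduces to $\mathbf{I}_M$ and the filter becomes $\mathbf{d}/M$, the DS beamformer.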

Algorithm 1

▪ Input: desired gain $\mathcal{G}_0$ and tolerance

▪ Output: optimal regularization $\alpha$

1. Find $\alpha_{\min}$ that minimizes the gain (e.g., using gradient descent)

2. Divide the range $[0, 1]$ into 2 sections in which the gain is monotonic: $[0, \alpha_{\min}]$ and $[\alpha_{\min}, 1]$

3. For each section, apply the following continuous binary search:

4.    Divide the section into 2 sub-sections

5.    Calculate the gain $\mathcal{G}_k$ in the middle of each sub-section

6.    Choose the gain $\mathcal{G}_k$ and its respective sub-section for which $|\mathcal{G}_k - \mathcal{G}_0|$ is minimal

7.    if $|\mathcal{G}_k - \mathcal{G}_0| \leq$ tolerance then

8.       $\alpha \leftarrow$ (middle of chosen sub-section) and stop

9.    else

10.      update the range to be the chosen sub-section and go back to 4

11.   endif

12. Compare the results from $[0, \alpha_{\min}]$ and $[\alpha_{\min}, 1]$, and choose the best result
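The steps of Algorithm 1 can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the thesis implementation: step 1 uses a golden-section search in place of gradient descent, an iteration cap (`max_iter`) is added because the slide does not say what happens when the tolerance cannot be met, and the gain is evaluated with a closed form that holds only for a hypothetical 2-microphone end-fire array with diffuse coherence sin(φ)/φ, where φ = ωδ/c. All function names are made up for the example.

```python
import math


def two_mic_gain(alpha, phi):
    """Gain d^H Gamma_{d,alpha}^{-1} d for an assumed 2-microphone end-fire array."""
    g = (1.0 - alpha) * math.sin(phi) / phi   # off-diagonal entry of Gamma_{d,alpha}
    return 2.0 * (1.0 - g * math.cos(phi)) / (1.0 - g * g)


def find_alpha_min(gain, tol=1e-9):
    """Step 1: locate the minimiser of the gain on [0, 1] by golden-section search."""
    ratio = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = 0.0, 1.0
    while b - a > tol:
        c, d = b - ratio * (b - a), a + ratio * (b - a)
        if gain(c) < gain(d):
            b = d
        else:
            a = c
    return 0.5 * (a + b)


def search_section(gain, lo, hi, target, tol, max_iter=100):
    """Steps 4-11 on one monotonic section; returns (alpha, |gain - target|)."""
    best_alpha, best_err = lo, abs(gain(lo) - target)
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        candidates = []
        for a, b in ((lo, mid), (mid, hi)):              # step 4: two sub-sections
            centre = 0.5 * (a + b)
            err = abs(gain(centre) - target)             # step 5: gain at the middle
            candidates.append((err, centre, a, b))
        err, centre, lo, hi = min(candidates)            # step 6: closest to the target
        if err < best_err:
            best_alpha, best_err = centre, err
        if err <= tol:                                   # steps 7-8: converged
            break
    return best_alpha, best_err


def find_alpha(gain, target, tol):
    """Steps 1-3 and 12: search both monotonic sections and keep the better result."""
    alpha_min = find_alpha_min(gain)
    sections = ((0.0, alpha_min), (alpha_min, 1.0))
    results = [search_section(gain, lo, hi, target, tol) for lo, hi in sections]
    return min(results, key=lambda r: r[1])


phi = 0.5
gain = lambda a: two_mic_gain(a, phi)
alpha, err = find_alpha(gain, target=3.0, tol=1e-3)
print(f"alpha = {alpha:.5f}, gain = {gain(alpha):.4f}, |error| = {err:.2e}")
```

Passing the gain as a callable keeps the search independent of the array model, so the 2-microphone closed form can be swapped for a general evaluation of $\mathbf{d}^H\boldsymbol{\Gamma}_{\mathbf{d},\alpha}^{-1}\mathbf{d}$. Note that picking the sub-section by mid-point proximity is not guaranteed to keep the solution bracketed on a strongly curved gain, which is another reason for the iteration cap.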

Experimental Results

Setup: $M = 8$ microphones, $\delta = 1$ cm spacing

Array gains for fixed SNR gain

𝛼(𝜔) is found for desired fixed SNR gain 𝒢0 using the proposed algorithm

Experimental Results

Array gains for fixed WNG

𝛼(𝜔) is found for maximal SNR gain under a constant desired WNG 𝒲0 using the proposed algorithm from step 4

→ Our proposed beamformer outperforms the combined beamformer with $\varepsilon = 10^{-4}$

Experimental Results

Array gains for fixed DF in multi-band

𝛼(𝜔) is found for maximal SNR gain under a piece-wise constant gradually increasing DF using the proposed algorithm from step 4

→ The WNG-DF trade-off can be considered at each frequency band separately!

→ Our proposed beamformer outperforms the combined beamformer with $\varepsilon = 10^{-4}$

Conclusions

Results summary:

▪ The proposed approach facilitates the design of beamformers with fixed SNR gain, beamformers with maximal SNR gain for constant WNG or DF, and multi-band fixed beamformers.

▪ It enables fine-tuning of the compromise between the DF and robustness against white noise.

Future work:

▪ Testing various angles of incidence other than the end-fire direction.

▪ Incorporating other considerations such as side-lobe requirements and performance under other types of noise fields.