Study And Evaluation of Innovative Algorithms for Voice ...

Study And Evaluation of Innovative Algorithms for Voice Quality

Enhancement in Speech Signals Encoded Using ACELP (Algebraic Code Excited Linear Prediction)

Daniele GIACOBELLO

Introduction

• VQE techniques usually operate in the waveformdomain.

• In GSM/UMTS networks, the signal coming from themobile terminals has to be decoded, enhanced and encoded again.

• These operations introduce delays and areparticularly prone to adding further quantizationnoise.

• Furthermore, they do not exploit the information already present in a packet of coded speech.Solution: VQE in the coded domain

Voice Quality EnhancementVQE processing location in the network

BSC TRAUBTS

PSTN

MSC

BSC TRAUBTS

TDM

TDM

TDM

Coded speech

Coded speech

Coded speech

ADC Speech Encoder

Channel Encoder

Channel Decoder

Speech Decoder

G711 Encoder

DAC

Mobile Equipment BTS TRAU Phone

VQE

DAC Speech Decoder

Channel Decoder

Channel Encoder

Speech Encoder

G711 Decoder

GSM

VQE performed on linear PCM samples after the speech decoder:

Voice Quality EnhancementVQE processing location in the network – Next evolution

PSTN

RNCUMTSscenario

NODE-B 3G-MSC

RNCNODE-B

ADC Speech Encoder

Channel Encoder

Channel Decoder

DAC Speech Decoder

Channel Decoder

Channel Encoder

Speech Decoder

G711 Encoder DAC

Mobile Equipment BTS 3G-MSC Phone

Coded speech

Coded speech

TDM

Moving VQE before speech decoder or transcoder…

VQE

Thesis objectives

• Statistical analysis of the ACELP- AMR (Adaptive Multi-Rate) parameters

• Voice Activity Detector in the coded domain. • Performs the discrimination exploiting the statistical behavior

of the set of parameters that characterize a segment of coded speech signal

• Acoustic Echo Cancellation in the coded domain.• Working directly on the coded parameters

Codec AMR 12.2 kbit/s

• Parameters where we work on:– 10 LPC coefficients– Pitch gain and lag (LTP order 1)– Fixed codebook gain

( )( )10

( )

1

( )

1 1 ( )

codebook

T n ipitch i

i

g n

g n z a n z− −

=

⎛ ⎞− ⋅ + ⋅⎜ ⎟

⎝ ⎠∑

codework speech

n-th subframe

Voice Activity Detection

• Discrimination between noise and voice AMR frames.

• Necessary for a good implementation of the VQE algorithms.

Voice Activity Detection discriminative measures - LSFs

1 2 1 3 2 10 9 10( , , ,..., , )lsf l l l l l l l lπ′ = − − − −

( ) ( )29 9

1 1

19n n

VarDiffLsf lsf n lsf n= =

⎡ ⎤′ ′= − −⎢ ⎥⎣ ⎦∑ ∑

( )( )

( )( )

2 29

22 29 91 1 1

logn n n

lsf n lsf nEntropy

lsf n lsf n= = =

⎡ ⎤⎛ ⎞′ ′⎢ ⎥⎜ ⎟= −⎢ ⎥⎜ ⎟′ ′⎝ ⎠⎣ ⎦

∑∑ ∑

Voice Activity Detection discriminative measures - pitch lag and gain

( ) ( )24 4

0 01 1

1var4n n

Pitch T n T n= =

⎡ ⎤= −⎢ ⎥⎣ ⎦∑ ∑ Gcodebook Gcodebook=

• pitch lag remains constant during vocalized speech

• algebraic codebook gain directly related to the energy

Voice Activity Detectionexample

discriminative measures

Weighted fuzzy decision

smoothing rule

Training phase and constant updating of thresholds. Robustness to change in environments

Acoustic Echo Cancellation

Parametersextraction

+ VAD


+ VAD

DecDec

AECProcessing

AECProcessingDecDec

CodCod

CodCod

Network

Delay


+ VAD


+ VAD

Echo detection

+DTD

Echo detection

+DTD

VQE

030 250ms msτ≤ ≤

FE

NE

( ) ( ) ( ) ( ) ( ) ( ) ( ) ( )y t s t e t n t s t x t h t n t= + + = + ⊗ +

( ) ( )0h t tα δ τ= ⋅ −

Acoustic Echo Cancellation echo detector: initial estimate of network delay

( )( )( ) ( )( )

( )( ) ( )( )22, 6,...,50

x yxy

x y

E x n y nr

E x n E y n

τ μ μτ τ

τ μ μ

⎡ ⎤+ − −⎣ ⎦= =⎡ ⎤⎡ ⎤+ − −⎢ ⎥⎣ ⎦ ⎣ ⎦

( )0̂ arg max xyrτ

τ τ=

( )( ) ( ) ( ) ( )

, , , ,

10

, , , ,1

13

x y x y pitch x pitch y fixed x fixed ylsf i lsf i T T g g g gi

xy

r r r rr

τ τ τ ττ =

+ + +=∑

NE

FE

SNR=20dBERL=10dB

SNR=3dBERL=40dB

Acoustic Echo Cancellation echo detector: updating network delay estimate

( ) ( )0̂ 1 1x yVAD n VAD nτ− = ∧ =

( ) ( ) ( )( ) ( )2 2

, , 20,..., 20xy

E x m y mcc n

E x m E y m

ττ τ

+⎡ ⎤⎣ ⎦= = −⎡ ⎤ ⎡ ⎤⎣ ⎦ ⎣ ⎦

( )( ) ( ) ( ) ( )

, , , ,

10

, , , ,1,

13

x y x y pitch x pitch y fixed x fixed ylsf i lsf i T T g g g gi

xy

r r r rcc n

τ τ τ ττ =

+ + +=∑

( ) ( )max ,xyc n cc n τ=

( ) ( )0̂ arg max ,xyn cc nτ

δτ τ=

• Once the time near-end and far-end axis are aligned, we update iteratively the delay estimate, when:

• cross-correlation calculation:

• define the maximum:

( ) 0.85c n >• If: then:

( )c n echo likelihood parameter. useful also for DTD and for the cancellation algorithms.

update

Acoustic Echo Cancellation double-talk detector

( )c n( )c nwith DT

without DT( ) ( )P D P D

( ) 0.05P D =

0.42cTH =

Acoustic Echo Cancellation AEC on the gains

( )( )10

( )

1

( )( , )

1 1 ( )

fixed

T n ipitch i

i

g nH z n

g n z a n z− −

=

=⎛ ⎞

− ⋅ + ⋅⎜ ⎟⎝ ⎠

∑

Fundamental to reduce theEnergy level

( ) ( ) ( ) ( )( ) ( ) ( ) ( ), ,y e v bn e v bng n f g n g n g n g n g n g n= ≈ + +Hyp:

Normalized Least Mean Square

( ) ( ) ( )( ) ( )( )( ) ( ) ( )

ˆˆ ˆ1 1.5 y exT

x x

g n g nh n h n c n g n

g n g n−

+ = + ⋅

( ) ( ) ( ) ( ) ( ) ( )ˆˆ Tu y e y xg n g n g n g n h n g n= − = −

Acoustic Echo Cancellation AEC on the pitch lag

The spectral behaviour is still similar to the original echo signal

Solution: modify the spectrum of the AMR subframe

Pitch lag is basically constant during voiced speech

Solution: “break” this regularity

NE

FE

{ }17,18,19,....,142,143rTΩ =

( ),pitch u rT i T=

Acoustic Echo Cancellation AEC on the LSFs

Modifying the LSFs, we can change the spectral coeherence of the signal

FE

NE

( ) ( ) ( ) ( )( ) ( ),ˆ 1u i bnoise il n c n l n c n l n= ⋅ + − ⋅

Noisy LSFs vector updated when VAD=0

White noise example:equidistant LSFs

Acoustic Echo Cancellation noise injection

Estimate noise statistics in the coded domain when VAD=0,HMM-based noisy parameters generator

Conclusions

VAD techniques have shown to be robust also with low SNR, with an accurate training phase.AEC techniques showed interesting performances, also comparable to linear-domain algorithms (20-25 dB ERLE) in average working conditions (Standard ITU)

Possible development of the algorithms in the near future

• PRO: possibility of modyfing the spectrum very easily• CON: difficult to increase ERLE in bad SNR and ERL conditions (not too relevant though)

Study And Evaluation of Innovative Algorithms for Voice ...

Documents

Transcript of Study And Evaluation of Innovative Algorithms for Voice ...