Study And Evaluation of Innovative Algorithms for Voice Quality
Enhancement in Speech Signals Encoded Using ACELP (Algebraic Code Excited Linear Prediction)
Daniele GIACOBELLO
Introduction
• VQE techniques usually operate in the waveform domain.
• In GSM/UMTS networks, the signal coming from the mobile terminals has to be decoded, enhanced and encoded again.
• These operations introduce delays and are particularly prone to adding further quantization noise.
• Furthermore, they do not exploit the information already present in a packet of coded speech.
Solution: VQE in the coded domain
Voice Quality Enhancement: VQE processing location in the network
[Figure: GSM scenario. Network elements BTS, BSC, TRAU, MSC and PSTN, connected by coded-speech and TDM links. Signal chain across Mobile Equipment, BTS, TRAU and Phone. Blocks shown: ADC, speech encoder, channel encoder, channel decoder, speech decoder, VQE, G.711 encoder, G.711 decoder, DAC.]
VQE is performed on linear PCM samples after the speech decoder.
Voice Quality Enhancement: VQE processing location in the network – Next evolution
[Figure: UMTS scenario. Network elements NODE-B, RNC, 3G-MSC and PSTN, connected by coded-speech and TDM links. Signal chain across Mobile Equipment, BTS, 3G-MSC and Phone. Blocks shown: ADC, speech encoder, channel encoder, channel decoder, speech decoder, VQE, G.711 encoder, DAC.]
Moving VQE before speech decoder or transcoder…
Thesis objectives
• Statistical analysis of the ACELP-AMR (Adaptive Multi-Rate) parameters
• Voice Activity Detector in the coded domain.
– Performs the discrimination by exploiting the statistical behavior of the set of parameters that characterize a segment of coded speech signal
• Acoustic Echo Cancellation in the coded domain.
– Works directly on the coded parameters
Codec AMR 12.2 kbit/s
• Parameters we work on:
– 10 LPC coefficients
– Pitch gain and lag (LTP order 1)
– Fixed codebook gain
Synthesis model for the n-th subframe (input: codeword, output: speech):
$$\frac{g_{codebook}(n)}{\left(1 - g_{pitch}(n)\, z^{-T(n)}\right)\left(1 + \sum_{i=1}^{10} a_i(n)\, z^{-i}\right)}$$
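A minimal sketch of how one subframe could be run through the synthesis model above: the codeword is scaled by the fixed codebook gain, passed through the order-1 long-term (pitch) predictor, then through the 10th-order LPC filter. The function name, the zero-padded histories and the argument layout are assumptions made for illustration; the sign convention follows the slide (1 + Σ a_i z^-i).

```python
def synthesize_subframe(codeword, g_codebook, g_pitch, lag, lpc,
                        exc_hist=(), out_hist=()):
    """Run one subframe through g_c / ((1 - g_p z^-T)(1 + sum a_i z^-i)).

    codeword : fixed-codebook samples of the subframe
    lag      : integer pitch lag T(n) in samples
    lpc      : coefficients a_1..a_10
    exc_hist : past excitation samples, oldest first (zero-padded if short)
    out_hist : past output samples, oldest first (zero-padded if short)
    """
    exc = [0.0] * max(0, lag - len(exc_hist)) + list(exc_hist)
    out = [0.0] * max(0, len(lpc) - len(out_hist)) + list(out_hist)
    n_exc, n_out = len(exc), len(out)
    for c in codeword:
        # Long-term predictor: u[k] = g_c * c[k] + g_p * u[k - T]
        u = g_codebook * c + g_pitch * exc[-lag]
        exc.append(u)
        # Short-term filter: s[k] = u[k] - sum_i a_i * s[k - i]
        s = u - sum(a * out[-i] for i, a in enumerate(lpc, start=1))
        out.append(s)
    return out[n_out:], exc[n_exc:]
```

With all LPC coefficients at zero, a unit pulse simply reappears every `lag` samples scaled by the pitch gain, which is a quick sanity check of the long-term predictor.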
Voice Activity Detection
• Discrimination between noise and voice AMR frames.
• Necessary for a good implementation of the VQE algorithms.
Voice Activity Detection discriminative measures - LSFs
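$$lsf' = (l_1,\; l_2 - l_1,\; l_3 - l_2,\; \ldots,\; l_{10} - l_9,\; \pi - l_{10})$$
$$VarDiffLsf = \sum_{n=1}^{9}\left[lsf'(n) - \frac{1}{9}\sum_{k=1}^{9} lsf'(k)\right]^2$$
$$Entropy = -\sum_{n=1}^{9}\left[\frac{\left(lsf'(n)\right)^2}{\sum_{k=1}^{9}\left(lsf'(k)\right)^2}\,\log_2\!\left(\frac{\left(lsf'(n)\right)^2}{\sum_{k=1}^{9}\left(lsf'(k)\right)^2}\right)\right]$$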
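The two LSF measures can be sketched as follows. This is an illustration only: the function names are hypothetical, the LSFs are assumed to be in radians on (0, π), and the sums are taken over the first nine entries of the difference vector, which is my reading of the summation limits on the slide.

```python
import math

def lsf_differences(lsf):
    """Build lsf' = (l1, l2 - l1, ..., l10 - l9, pi - l10) from 10 LSFs."""
    diffs = [lsf[0]]
    diffs += [b - a for a, b in zip(lsf, lsf[1:])]
    diffs.append(math.pi - lsf[-1])
    return diffs

def var_diff_lsf(d, count=9):
    """Sum of squared deviations of the first `count` entries from their mean."""
    x = d[:count]
    mean = sum(x) / count
    return sum((v - mean) ** 2 for v in x)

def lsf_entropy(d, count=9):
    """Base-2 entropy of the normalized squared entries."""
    sq = [v * v for v in d[:count]]
    total = sum(sq)
    return -sum((s / total) * math.log2(s / total) for s in sq if s > 0)
```

Equidistant LSFs (a flat spectrum) give a spread of zero and the maximum entropy log2(9), while LSFs clustered in pairs (spectral peaks) raise the spread and lower the entropy.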
Voice Activity Detection discriminative measures - pitch lag and gain
$$varPitch = \sum_{n=1}^{4}\left[T_0(n) - \frac{1}{4}\sum_{k=1}^{4} T_0(k)\right]^2$$
$$Gcodebook = g_{codebook}$$
• Pitch lag remains constant during voiced speech
• Algebraic codebook gain is directly related to the energy
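As a small worked example of the pitch-lag measure, the sketch below computes the spread of the four subframe lags of one AMR frame around their mean. The function name and the sample lag values are illustrative assumptions.

```python
def var_pitch(lags):
    """Sum of squared deviations of the subframe pitch lags from their mean."""
    mean = sum(lags) / len(lags)
    return sum((t - mean) ** 2 for t in lags)

# A voiced frame keeps an almost constant lag; a noise frame does not.
voiced = var_pitch([57, 57, 57, 57])      # 0.0
noisy = var_pitch([20, 143, 64, 97])      # 8110.0
```

A low value suggests voiced speech and a high value suggests noise, which is why this measure discriminates well together with the codebook gain.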
Voice Activity Detection example
Processing chain: discriminative measures, weighted fuzzy decision, smoothing rule.
A training phase and constant updating of the thresholds give robustness to changes in the environment.
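The slides name the stages of the decision but not their exact form, so the following is only a sketch under loud assumptions: each measure casts a soft vote through a sigmoid around its threshold, the votes are combined with weights, and a hangover rule smooths the frame decisions. The sigmoid membership, the 0.5 cut, the hangover length and all names are hypothetical choices, not the thesis rules.

```python
import math

def membership(value, threshold, slope):
    """Soft vote in [0, 1]; a negative slope means 'small value suggests speech'."""
    z = max(-50.0, min(50.0, slope * (value - threshold)))  # avoid overflow
    return 1.0 / (1.0 + math.exp(-z))

def fuzzy_vad(measures, thresholds, slopes, weights):
    """Weighted average of the soft votes, cut at 0.5 (assumed decision rule)."""
    score = sum(w * membership(m, t, s)
                for m, t, s, w in zip(measures, thresholds, slopes, weights))
    return score / sum(weights) > 0.5

def smooth(decisions, hangover=3):
    """Keep the speech flag on for `hangover` frames after the last speech frame."""
    out, hold = [], 0
    for d in decisions:
        if d:
            hold = hangover
            out.append(1)
        elif hold > 0:
            hold -= 1
            out.append(1)
        else:
            out.append(0)
    return out
```

In such a scheme the thresholds passed to `fuzzy_vad` would be the quantities set during training and then adapted over time.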
Acoustic Echo Cancellation
[Figure: coded-domain AEC architecture between far end (FE) and near end (NE). Blocks shown on each path: parameters extraction + VAD, echo detection + DTD, AEC processing, decoder (Dec) and coder (Cod); the network delay and the VQE unit are marked.]
$$30\ \mathrm{ms} \le \tau_0 \le 250\ \mathrm{ms}$$
$$y(t) = s(t) + e(t) + n(t) = s(t) + x(t) \otimes h(t) + n(t)$$
$$h(t) = \alpha \cdot \delta(t - \tau_0)$$
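The echo model above (near-end speech plus a delayed, attenuated copy of the far-end signal plus noise) can be simulated in discrete time as sketched below. The function names are hypothetical, the delay is expressed in samples, and the relation between the attenuation α and the echo return loss, ERL = -20·log10(α), is the usual definition for a pure-gain echo path rather than something stated on the slide.

```python
def alpha_from_erl(erl_db):
    """Echo-path gain for a given echo return loss in dB (pure-gain path)."""
    return 10.0 ** (-erl_db / 20.0)

def near_end_signal(s, x, noise, alpha, delay):
    """y[k] = s[k] + alpha * x[k - delay] + noise[k], with x[k] = 0 for k < 0."""
    y = []
    for k in range(len(s)):
        echo = alpha * x[k - delay] if k >= delay else 0.0
        y.append(s[k] + echo + noise[k])
    return y
```

For example, an ERL of 20 dB corresponds to α = 0.1, so the echo reaches the near-end microphone at one tenth of the far-end amplitude.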
Acoustic Echo Cancellation echo detector: initial estimate of network delay
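$$r_{xy}(\tau) = \frac{E\left[\left(x(n+\tau) - \mu_x\right)\left(y(n) - \mu_y\right)\right]}{\sqrt{E\left[\left(x(n+\tau) - \mu_x\right)^2\right]\,E\left[\left(y(n) - \mu_y\right)^2\right]}}, \quad \tau = 6, \ldots, 50$$
$$\hat{\tau}_0 = \arg\max_{\tau}\, r_{xy}(\tau)$$
$$r_{xy}(\tau) = \frac{1}{13}\left[\sum_{i=1}^{10} r_{lsf_{x,i},\,lsf_{y,i}}(\tau) + r_{T_x,\,T_y}(\tau) + r_{g_{pitch,x},\,g_{pitch,y}}(\tau) + r_{g_{fixed,x},\,g_{fixed,y}}(\tau)\right]$$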
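A sketch of the initial delay search: for each candidate lag the correlation coefficient is computed per parameter track (10 LSFs, pitch lag, pitch gain, fixed codebook gain), averaged, and the lag with the highest average is kept. Assumptions to note: the tracks are plain Python lists with one value per subframe, the lag range 6 to 50 is read as 5 ms subframes (30 to 250 ms), and the code pairs y[n] with x[n - τ] so that a positive τ matches the echo-path delay of the model y(t) = s(t) + x(t) ⊗ h(t) + n(t). All names are hypothetical.

```python
import math

def corr_at_lag(x, y, tau):
    """Correlation coefficient between x[n - tau] and y[n] over the overlap."""
    xs = x[:len(x) - tau]
    ys = y[tau:]
    n = min(len(xs), len(ys))
    xs, ys = xs[:n], ys[:n]
    mx, my = sum(xs) / n, sum(ys) / n
    num = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    den = math.sqrt(sum((a - mx) ** 2 for a in xs) *
                    sum((b - my) ** 2 for b in ys))
    return num / den if den > 0 else 0.0

def estimate_delay(x_tracks, y_tracks, taus=range(6, 51)):
    """Average the per-parameter correlations; return (best_tau, peak value)."""
    best_tau, best_r = None, -2.0
    for tau in taus:
        r = sum(corr_at_lag(x, y, tau)
                for x, y in zip(x_tracks, y_tracks)) / len(x_tracks)
        if r > best_r:
            best_tau, best_r = tau, r
    return best_tau, best_r
```

Averaging over all 13 tracks is what makes the peak stand out even when a single parameter correlates poorly, for instance under low SNR.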
[Figure: near-end (NE) and far-end (FE) examples for two conditions: SNR = 20 dB, ERL = 10 dB and SNR = 3 dB, ERL = 40 dB.]
Acoustic Echo Cancellation echo detector: updating network delay estimate
• Once the near-end and far-end time axes are aligned, we iteratively update the delay estimate when:
$$VAD_x(n - \hat{\tau}_0) = 1 \wedge VAD_y(n) = 1$$
• Cross-correlation calculation:
$$cc_{xy}(n,\tau) = \frac{E\left[x(m+\tau)\,y(m)\right]}{\sqrt{E\left[x(m)^2\right]\,E\left[y(m)^2\right]}}, \quad \tau = -20, \ldots, 20$$
$$cc_{xy}(n,\tau) = \frac{1}{13}\left[\sum_{i=1}^{10} cc_{lsf_{x,i},\,lsf_{y,i}}(n,\tau) + cc_{T_x,\,T_y}(n,\tau) + cc_{g_{pitch,x},\,g_{pitch,y}}(n,\tau) + cc_{g_{fixed,x},\,g_{fixed,y}}(n,\tau)\right]$$
• Define the maximum:
$$c(n) = \max_{\tau}\, cc_{xy}(n,\tau)$$
• If $c(n) > 0.85$, then update:
$$\delta\hat{\tau}_0(n) = \arg\max_{\tau}\, cc_{xy}(n,\tau)$$
• $c(n)$ is the echo likelihood parameter; it is also useful for the DTD and for the cancellation algorithms.
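The tracking step can be sketched for a single parameter track as follows. Several points are assumptions: the far-end window is taken as already shifted by the current estimate, y[m] is paired with x[m - τ] so that a positive correction means a longer delay, and the update is written as adding the correction to the current estimate, since the slide only says "update". The function names are hypothetical.

```python
import math

def cc(x, y, tau):
    """Normalized (non-centered) cross-correlation of x[m - tau] with y[m]."""
    num = sum(x[m - tau] * y[m]
              for m in range(len(y)) if 0 <= m - tau < len(x))
    den = math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))
    return num / den if den > 0 else 0.0

def track_delay(tau0, x_win, y_win, vad_x, vad_y, threshold=0.85, span=20):
    """Return (new_tau0, c_n); tau0 changes only on a confident peak."""
    if not (vad_x and vad_y):
        return tau0, 0.0            # both sides must be active
    scores = {t: cc(x_win, y_win, t) for t in range(-span, span + 1)}
    delta = max(scores, key=scores.get)
    c_n = scores[delta]             # echo likelihood parameter c(n)
    if c_n > threshold:
        tau0 += delta               # assumed form of the "update" step
    return tau0, c_n
```

The returned `c_n` is the value that the double-talk detector and the cancellation steps reuse as echo likelihood.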
Acoustic Echo Cancellation double-talk detector
[Figure: $c(n)$ with DT and without DT.]
$$P(D) = 0.05$$
$$TH_c = 0.42$$
Acoustic Echo Cancellation AEC on the gains
$$H(z,n) = \frac{g_{fixed}(n)}{\left(1 - g_{pitch}(n)\, z^{-T(n)}\right)\left(1 + \sum_{i=1}^{10} a_i(n)\, z^{-i}\right)}$$
Fundamental to reduce the energy level
Hyp: $$g_y(n) = f\left(g_e(n),\, g_v(n),\, g_{bn}(n)\right) \approx g_e(n) + g_v(n) + g_{bn}(n)$$
Normalized Least Mean Square
$$\hat{h}(n+1) = \hat{h}(n) + 1.5 \cdot c(n)\, \frac{g_y(n) - \hat{g}_e(n)}{g_x^T(n)\, g_x(n)}\, g_x(n)$$
$$g_u(n) = g_y(n) - \hat{g}_e(n) = g_y(n) - \hat{h}^T(n)\, g_x(n)$$
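One NLMS step on the gains can be sketched as below: the echo gain is predicted from a vector of far-end fixed codebook gains, subtracted from the near-end gain, and the predictor is corrected with a step of 1.5·c(n). The function name, the ordering of the far-end gain vector and the small regularization term `eps` are assumptions added for illustration.

```python
def nlms_step(h, g_x, g_y, c_n, mu=1.5, eps=1e-9):
    """One update of the echo-gain predictor.

    h   : current coefficient vector h_hat(n)
    g_x : vector of far-end fixed codebook gains (ordering assumed)
    g_y : near-end fixed codebook gain of the current subframe
    c_n : echo likelihood in [0, 1]; it scales the step size
    """
    g_e_hat = sum(hi * xi for hi, xi in zip(h, g_x))   # h_hat^T g_x
    g_u = g_y - g_e_hat                                # echo-reduced gain
    norm = sum(xi * xi for xi in g_x) + eps            # eps: my addition
    h_new = [hi + mu * c_n * g_u * xi / norm for hi, xi in zip(h, g_x)]
    return h_new, g_u
```

Because c(n) is at most 1, the effective step stays below 2, the usual stability bound for NLMS, and it shrinks automatically when echo is unlikely.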
Acoustic Echo Cancellation AEC on the pitch lag
The spectral behaviour is still similar to the original echo signal
Solution: modify the spectrum of the AMR subframe
Pitch lag is basically constant during voiced speech
Solution: “break” this regularity
[Figure: near-end (NE) and far-end (FE) panels.]
$$\Omega_{T_r} = \{17, 18, 19, \ldots, 142, 143\}$$
$$T_{pitch,u}(i) = T_r$$
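Breaking the pitch regularity can be sketched as replacing the lag of each echo subframe with a value drawn from the set {17, ..., 143}. The uniform draw, the per-subframe echo flags and the function name are assumptions; the slide only states that the new lag is taken from that set.

```python
import random

LAG_SET = range(17, 144)   # the set {17, 18, ..., 143} from the slide

def break_pitch_regularity(lags, echo_flags, rng=None):
    """Replace the lag of every subframe flagged as echo with a random one."""
    rng = rng or random.Random()
    return [rng.choice(LAG_SET) if echo else t
            for t, echo in zip(lags, echo_flags)]
```

Subframes not flagged as echo keep their original lag, so near-end voiced speech is left untouched.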
Acoustic Echo Cancellation AEC on the LSFs
Modifying the LSFs, we can change the spectral coherence of the signal.
[Figure: far-end (FE) and near-end (NE) panels.]
$$\hat{l}_{u,i}(n) = c(n) \cdot l_{bnoise,i}(n) + \left(1 - c(n)\right) \cdot l_i(n)$$
Noisy LSFs vector updated when VAD=0.
White noise example: equidistant LSFs.
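A sketch of the LSF modification: the LSFs of the echo subframe are pulled toward a background-noise LSF vector in proportion to the echo likelihood c(n), with equidistant LSFs standing in for white noise. The function names are hypothetical and the equidistant spacing iπ/11 for a 10th-order filter is my reading of "equidistant".

```python
import math

def white_noise_lsfs(order=10):
    """Equidistant LSFs on (0, pi), used here as the white-noise example."""
    return [math.pi * i / (order + 1) for i in range(1, order + 1)]

def mix_lsfs(lsf, noise_lsf, c_n):
    """l_u,i = c(n) * l_bnoise,i + (1 - c(n)) * l_i for every coefficient i."""
    return [c_n * b + (1.0 - c_n) * l for l, b in zip(lsf, noise_lsf)]
```

Since both input vectors are increasing and the weights are non-negative, the mixed vector stays increasing, so the ordering property required of LSFs is preserved.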
Acoustic Echo Cancellation noise injection
Estimate noise statistics in the coded domain when VAD=0.
HMM-based noisy parameters generator.
Conclusions
VAD techniques have proved robust even at low SNR, given an accurate training phase.
AEC techniques showed interesting performance, comparable to linear-domain algorithms (20-25 dB ERLE) in average working conditions (ITU standard).
Possible development of the algorithms in the near future
• PRO: possibility of modifying the spectrum very easily
• CON: difficult to increase ERLE in bad SNR and ERL conditions (not too relevant though)