Adaptive Filtering - Theory and Applications
Jose C. M. Bermudez
Department of Electrical Engineering, Federal University of Santa Catarina
Florianopolis – SC, Brazil
IRIT - INP-ENSEEIHT, Toulouse, May 2011
Jose Bermudez (UFSC) Adaptive Filtering IRIT - Toulouse, 2011 1 / 107
1 Introduction
2 Adaptive Filtering Applications
3 Adaptive Filtering Principles
4 Iterative Solutions for the Optimum Filtering Problem
5 Stochastic Gradient Algorithms
6 Deterministic Algorithms
7 Analysis
Estimation Techniques
Several techniques to solve estimation problems.
Classical Estimation
Maximum Likelihood (ML), Least Squares (LS), Moments, etc.
Bayesian Estimation
Minimum MSE (MMSE), Maximum A Posteriori (MAP), etc.
Linear Estimation
Frequently used in practice when there is a limitation in computational
complexity – Real-time operation
Linear Estimators
Simpler to determine: depend on the first two moments of data
Statistical Approach – Optimal Linear Filters
◮ Minimum mean square error
◮ Require second-order statistics of the signals
Deterministic Approach – Least Squares Estimators
◮ Minimum least squares error
◮ Require handling of a data observation matrix
Limitations of Optimal Filters and LS Estimators
Statistics of signals may not be available or cannot be accurately estimated
There may not be time available for statistical estimation (real-time)
Signals and systems may be non-stationary
Memory required may be prohibitive
Computational load may be prohibitive
Iterative Solutions
Search the optimal solution starting from an initial guess
Iterative algorithms are based on classical optimization algorithms
Require reduced computational effort per iteration
Need several iterations to converge to the optimal solution
These methods form the basis for the development of adaptive algorithms
Still require the knowledge of signal statistics
Adaptive Filters
Usually approximate iterative algorithms and:
Do not require previous knowledge of the signal statistics
Have a small computational complexity per iteration
Converge to a neighborhood of the optimal solution
Adaptive filters are good for:
Real-time applications, when there is no time for statistical estimation
Applications with nonstationary signals and/or systems
Properties of Adaptive Filters
They can operate satisfactorily in unknown and possibly time-varying environments without user intervention
They improve their performance during operation by learning statistical characteristics from current signal observations
They can track variations in the signal operating environment (SOE)
Adaptive Filtering Applications
Basic Classes of Adaptive Filtering Applications
System Identification
Inverse System Modeling
Signal Prediction
Interference Cancelation
System Identification
[Figure: System identification block diagram – the unknown system and the adaptive filter share the input x(n); the desired response d(n) (unknown system output plus other signals eo(n)) is compared with the filter output y(n) to form the error e(n), which drives the adaptive algorithm.]
Applications – System Identification
Channel Estimation
Communications systems
Objective: model the channel to design distortion compensation
x(n): training sequence
Plant Identification
Control systems
Objective: model the plant to design a compensator
x(n): training sequence
Echo Cancellation
Telephone systems and VoIP
Echo caused by network impedance mismatches or the acoustic environment
Objective: model the echo path impulse response
x(n): transmitted signal
d(n): echo + noise
[Figure: Network Echo Cancellation – hybrids (H) couple the Tx and Rx paths; the echo canceler (EC) is driven by x(n) and its output is subtracted from d(n) to form e(n).]
Inverse System Modeling
Adaptive filter attempts to estimate unknown system’s inverse
Adaptive filter input usually corrupted by noise
Desired response d(n) may not be available
[Figure: Inverse system modeling – the unknown system (input s(n)) feeds the adaptive filter through x(n), which may include other signals z(n); the desired response d(n) is a delayed version of s(n); the error e(n) = d(n) − y(n) drives the adaptive algorithm.]
Applications – Inverse System Modeling: Channel Equalization
[Figure: Channel equalization – the channel (input s(n)) feeds the adaptive filter through x(n), which may include noise z(n); d(n) is locally generated; the error e(n) = d(n) − y(n) drives the adaptive algorithm.]
Objective: reduce intersymbol interference
Initially – training sequence in d(n)
After training: d(n) generated from previous decisions
Signal Prediction
[Figure: Forward prediction – the adaptive filter operates on the delayed input x(n − no); the desired response is d(n) = x(n) and the error e(n) = d(n) − y(n) drives the adaptive algorithm.]
Most widely used case – forward prediction
Signal x(n) to be predicted from samples {x(n − no), x(n − no − 1), . . . , x(n − no − L)}
Application – Signal Prediction
DPCM Speech Quantizer - Linear Predictive Coding
Objective: Reduce speech transmission bandwidth
Signal transmitted all the time: quantization error
Predictor coefficients are transmitted at low rate
[Figure: DPCM quantizer – the predictor output y(n) is subtracted from the speech signal d(n); the prediction error e(n) is quantized into the DPCM signal Q[e(n)], and y(n) + Q[e(n)] ∼ d(n) is fed back to the predictor.]
Interference Cancelation
One or more sensor signals are used to remove interference and noise
Reference signals correlated with the interference should also be available
Applications:
◮ array processing for radar and communications
◮ biomedical sensing systems
◮ active noise control systems
Application – Interference Cancelation: Active Noise Control
Ref: D.G. Manolakis, V.K. Ingle and S.M. Kogon, Statistical and Adaptive Signal Processing, 2000.
Cancelation of acoustic noise using destructive interference
Secondary system between the adaptive filter and the cancelation point is unavoidable
Cancelation is performed in the acoustic environment
Active Noise Control – Block Diagram
[Figure: Active noise control block diagram – the adaptive filter w(n) operates in parallel with the primary path wo; the filter output y(n) passes through the secondary path S and a nonlinearity g(ys) before the cancelation point; signals x(n), xf(n), ys(n), yg(n), d(n), z(n), e(n).]
Adaptive Filtering Principles
Adaptive Filter Features
Adaptive filters are composed of three basic modules:
Filtering structure
◮ Determines the output of the filter given its input samples
◮ Its weights are periodically adjusted by the adaptive algorithm
◮ Can be linear or nonlinear, depending on the application
◮ Linear filters can be FIR or IIR
Performance criterion
◮ Defined according to application and mathematical tractability
◮ Is used to derive the adaptive algorithm
◮ Its value at each iteration affects the adaptive weight updates
Adaptive algorithm
◮ Uses the performance criterion value and the current signals
◮ Modifies the adaptive weights to improve performance
◮ Its form and complexity are functions of the structure and of the performance criterion
Signal Operating Environment (SOE)
Comprises all information regarding the properties of the signals and systems
Input signals
Desired signal
Unknown systems
If the SOE is nonstationary:
Acquisition or convergence mode: from start until close to best performance
Tracking mode: readjustment following the SOE’s time variations
Adaptation can be
Supervised – desired signal is available ➪ e(n) can be evaluated
Unsupervised – desired signal is unavailable
Performance Evaluation
Convergence rate
Misadjustment
Tracking
Robustness (disturbances and numerical)
Computational requirements (operations and memory)
Structure
◮ facility of implementation
◮ performance surface
◮ stability
Optimum versus Adaptive Filters in Linear Estimation
Conditions for this study
Stationary SOE
Filter structure is transversal FIR
All signals are real valued
Performance criterion: mean-square error E[e^2(n)]
The Linear Estimation Problem
[Diagram: linear FIR filter w with input x(n), output y(n), and error e(n) = d(n) − y(n).]
J_ms = E[e^2(n)]
The Linear Estimation Problem
x(n) = [x(n), x(n − 1), . . . , x(n − N + 1)]^T
y(n) = x^T(n) w
e(n) = d(n) − y(n) = d(n) − x^T(n) w
J_ms = E[e^2(n)] = σ_d^2 − 2 p^T w + w^T R_xx w
where p = E[x(n) d(n)]; R_xx = E[x(n) x^T(n)]
Normal Equations
R_xx w_o = p ➪ w_o = R_xx^{-1} p for R_xx > 0
J_ms,min = σ_d^2 − p^T R_xx^{-1} p
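As an illustration, the normal equations can be solved numerically once R_xx and p are estimated from data. This is a minimal sketch with hypothetical signals (the filter length, coefficients, and noise level are assumptions, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
N = 4                                   # filter length (assumed)
w_true = np.array([0.5, -0.3, 0.2, 0.1])

# White input; desired signal d(n) = x^T(n) w_true + measurement noise
x = rng.standard_normal(10000)
X = np.stack([np.roll(x, k) for k in range(N)])   # row k holds x(n - k)
d = w_true @ X + 0.01 * rng.standard_normal(x.size)

# Sample estimates of R_xx = E[x(n) x^T(n)] and p = E[x(n) d(n)]
R = X @ X.T / x.size
p = X @ d / x.size

w_o = np.linalg.solve(R, p)             # normal equations: R_xx w_o = p
```

With enough samples, w_o approaches the coefficients that generated d(n).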
What if d(n) is nonstationary?
x(n) = [x(n), x(n − 1), . . . , x(n − N + 1)]^T
y(n) = x^T(n) w(n)
e(n) = d(n) − y(n) = d(n) − x^T(n) w(n)
J_ms(n) = E[e^2(n)] = σ_d^2(n) − 2 p^T(n) w(n) + w^T(n) R_xx w(n)
where p(n) = E[x(n) d(n)]; R_xx = E[x(n) x^T(n)]
Normal Equations
R_xx w_o(n) = p(n) ➪ w_o(n) = R_xx^{-1} p(n) for R_xx > 0
J_ms,min(n) = σ_d^2(n) − p^T(n) R_xx^{-1} p(n)
Optimum Filters versus Adaptive Filters
Optimum Filters
Compute p(n) = E[x(n) d(n)]
Solve R_xx w_o(n) = p(n)
Filter with w_o(n) ➪ y(n) = x^T(n) w_o(n)
Nonstationary SOE: optimum filter determined for each value of n
Adaptive Filters
Filtering: y(n) = x^T(n) w(n)
Evaluate error: e(n) = d(n) − y(n)
Adaptive algorithm: w(n + 1) = w(n) + ∆w[x(n), e(n)]
∆w(n) is chosen so that w(n) is close to w_o(n) for n large
Characteristics of Adaptive Filters
Search for the optimum solution on the performance surface
Follow principles of optimization techniques
Implement a recursive optimization solution
Convergence speed may depend on initialization
Have stability regions
Steady-state solution fluctuates about the optimum
Can track time varying SOEs better than optimum filters
Performance depends on the performance surface
Iterative Solutions for the
Optimum Filtering Problem
Performance (Cost) Functions
Mean-square error ➪ E[e^2(n)] (most popular)
Adaptive algorithms: Least-Mean Square (LMS), Normalized LMS (NLMS), Affine Projection (AP), Recursive Least Squares (RLS), etc.
Regularized MSE
J_rms = E[e^2(n)] + α ‖w(n)‖^2
Adaptive algorithm: leaky least-mean square (leaky LMS)
ℓ1 norm criterion
J_ℓ1 = E[|e(n)|]
Adaptive algorithm: Sign-Error
Performance (Cost) Functions – continued
Least-mean fourth (LMF) criterion
J_LMF = E[e^4(n)]
Adaptive algorithm: Least-Mean Fourth (LMF)
Least-mean-mixed-norm (LMMN) criterion
J_LMMN = E[α e^2(n) + (1/2)(1 − α) e^4(n)]
Adaptive algorithm: Least-Mean-Mixed-Norm (LMMN)
Constant-modulus criterion
J_CM = E[(γ − |x^T(n) w(n)|^2)^2]
Adaptive algorithm: Constant-Modulus (CM)
MSE Performance Surface – Small Input Correlation
[Surface plot of J_ms over coefficients (w1, w2): a nearly symmetric paraboloid for a weakly correlated input.]
MSE Performance Surface – Large Input Correlation
[Surface plot of J_ms over coefficients (w1, w2): an elongated, narrow valley for a strongly correlated input.]
The Steepest Descent Algorithm – Stationary SOE
Cost Function
J_ms(n) = E[e^2(n)] = σ_d^2 − 2 p^T w(n) + w^T(n) R_xx w(n)
Weight Update Equation
w(n + 1) = w(n) + µ c(n)
µ: step size
c(n): correction term (determines the direction of ∆w(n))
Steepest descent adjustment:
c(n) = −∇J_ms(n) ➪ J_ms(n + 1) ≤ J_ms(n)
w(n + 1) = w(n) + µ [p − R_xx w(n)]
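The steepest descent recursion above is easy to simulate. A minimal sketch with hypothetical second-order statistics R_xx and p (assumed values, chosen so that µ < 2/λ_max):

```python
import numpy as np

# Hypothetical second-order statistics (not from the slides)
R = np.array([[1.0, 0.5],
              [0.5, 1.0]])              # R_xx, eigenvalues 0.5 and 1.5
p = np.array([1.0, 0.2])
w_o = np.linalg.solve(R, p)             # Wiener solution

mu = 0.3                                # satisfies 0 < mu < 2/lambda_max = 4/3
w = np.zeros(2)
for _ in range(200):
    w = w + mu * (p - R @ w)            # w(n+1) = w(n) + mu [p - R_xx w(n)]
```

After a few hundred iterations w is numerically indistinguishable from w_o.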
Weight Update Equation About the Optimum Weights
Weight Error Update Equation
w(n + 1) = w(n) + µ [p − R_xx w(n)]
Using p = R_xx w_o:
w(n + 1) = (I − µ R_xx) w(n) + µ R_xx w_o
Weight error vector: v(n) = w(n) − w_o
v(n + 1) = (I − µ R_xx) v(n)
Matrix I − µ R_xx must be stable for convergence (all its eigenvalues satisfy |λ_i| < 1)
Assuming convergence, lim_{n→∞} v(n) = 0
Convergence Conditions
v(n + 1) = (I − µ R_xx) v(n); R_xx positive definite
Eigen-decomposition of R_xx: R_xx = Q Λ Q^T
v(n + 1) = (I − µ Q Λ Q^T) v(n)
Q^T v(n + 1) = Q^T v(n) − µ Λ Q^T v(n)
Defining ṽ(n) = Q^T v(n):
ṽ(n + 1) = (I − µ Λ) ṽ(n)
Convergence Properties
ṽ(n + 1) = (I − µ Λ) ṽ(n)
ṽ_k(n + 1) = (1 − µ λ_k) ṽ_k(n), k = 1, . . . , N
ṽ_k(n) = (1 − µ λ_k)^n ṽ_k(0)
Convergence modes
◮ monotonic if 0 < 1 − µ λ_k < 1
◮ oscillatory if −1 < 1 − µ λ_k < 0
Convergence if |1 − µ λ_k| < 1 ➪ 0 < µ < 2/λ_max
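A quick numeric check of the per-mode factor (1 − µ λ_k), using a hypothetical eigenvalue λ_k = 1.5 (so the stability bound is 2/λ_k ≈ 1.333):

```python
lam_k = 1.5                             # hypothetical eigenvalue of R_xx

def mode_factor(mu):
    """Per-mode factor (1 - mu*lambda_k); v_k(n) = factor**n * v_k(0)."""
    return 1.0 - mu * lam_k

factors = {mu: mode_factor(mu) for mu in (0.4, 1.2, 1.5)}
# mu = 0.4 -> monotonic mode (0 < factor < 1)
# mu = 1.2 -> oscillatory mode (-1 < factor < 0), still stable
# mu = 1.5 -> |factor| >= 1: divergence, since mu > 2/lam_k
```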
Optimal Step-Size
ṽ_k(n) = (1 − µ λ_k)^n ṽ_k(0); convergence modes: (1 − µ λ_k)
max_k |1 − µ λ_k|: slowest mode
min_k |1 − µ λ_k|: fastest mode
Optimal Step-Size – continued
[Plot: |1 − µ λ_k| versus µ; the curves for λ_min and λ_max cross between 1/λ_max and 1/λ_min at the optimal step size µ_o.]
min_µ max_k |1 − µ λ_k| ➪ 1 − µ_o λ_min = −(1 − µ_o λ_max)
µ_o = 2/(λ_max + λ_min)
Optimal slowest modes: ±(ρ − 1)/(ρ + 1); ρ = λ_max/λ_min
The Learning Curve – J_ms(n)
J_ms(n) = J_ms,min + ṽ^T(n) Λ ṽ(n), where ṽ^T(n) Λ ṽ(n) is the excess MSE
J_ms(n) = J_ms,min + Σ_{k=1}^{N} λ_k ṽ_k^2(n)
Since ṽ_k(n) = (1 − µ λ_k)^n ṽ_k(0),
J_ms(n) = J_ms,min + Σ_{k=1}^{N} λ_k (1 − µ λ_k)^{2n} ṽ_k^2(0)
λ_k (1 − µ λ_k)^2 > 0 ➪ monotonic convergence
Stability limit is again 0 < µ < 2/λ_max
J_ms(n) converges faster than w(n)
Algorithm converges faster as ρ = λ_max/λ_min → 1
Simulation Results
x(n) = α x(n − 1) + v(n)
[Plots: steepest descent mean square error (dB) versus iteration, 0–200. Left: white-noise input (ρ = 1). Right: AR(1) input, α = 0.7 (ρ = 5.7).]
Linear system identification – FIR with 20 coefficients
Step size µ = 0.3
Noise power σ_v^2 = 10^{−6}
The Newton Algorithm
Steepest descent: linear approximation of J_ms about the operating point
Newton’s method: quadratic approximation of J_ms
Expanding J_ms(w) in a Taylor series about w(n),
J_ms(w) ≈ J_ms[w(n)] + ∇^T J_ms[w(n)] [w − w(n)] + (1/2) [w − w(n)]^T H(n) [w − w(n)]
Differentiating w.r.t. w and equating to zero at w = w(n + 1),
∇J_ms[w(n + 1)] = ∇J_ms[w(n)] + H[w(n)] [w(n + 1) − w(n)] = 0
w(n + 1) = w(n) − H^{-1}[w(n)] ∇J_ms[w(n)]
The Newton Algorithm – continued
∇J_ms[w(n)] = −2p + 2 R_xx w(n)
H[w(n)] = 2 R_xx
Thus, adding a step-size control,
w(n + 1) = w(n) − µ R_xx^{-1} [−p + R_xx w(n)]
Quadratic surface ➪ convergence in one iteration for µ = 1
Requires the determination of R_xx^{-1}
Can be used to derive simpler adaptive algorithms
When H(n) is close to singular ➪ regularization: H(n) = 2 R_xx + 2εI
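Since the MSE surface is quadratic, a single Newton step with µ = 1 lands exactly on w_o. A minimal numeric sketch (R_xx, p, and the starting point are hypothetical):

```python
import numpy as np

# Hypothetical positive-definite R_xx and cross-correlation p
R = np.array([[2.0, 0.7],
              [0.7, 1.0]])
p = np.array([0.3, -0.1])
w = np.array([5.0, -4.0])               # arbitrary starting point

# One Newton step with mu = 1:
# w(n+1) = w(n) - H^{-1} grad = w(n) - R_xx^{-1} (R_xx w(n) - p)
w = w - np.linalg.solve(R, R @ w - p)
```

One step suffices regardless of the starting point, at the cost of solving a system with R_xx.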
Least Mean Squares (LMS) Algorithm
Can be interpreted in different ways
Each interpretation helps understanding the algorithm behavior
Some of these interpretations are related to the steepest descent algorithm
LMS as a Stochastic Gradient Algorithm
Suppose we use the instantaneous estimate Ĵ_ms(n) = e^2(n) of E[e^2(n)]
The estimated gradient vector becomes
∇Ĵ_ms(n) = ∂e^2(n)/∂w(n) = 2 e(n) ∂e(n)/∂w(n)
Since e(n) = d(n) − x^T(n) w(n),
∇Ĵ_ms(n) = −2 e(n) x(n) (stochastic gradient)
and, using the steepest descent weight update equation,
w(n + 1) = w(n) + µ e(n) x(n) (LMS weight update)
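The LMS update above can be sketched in a few lines. A hedged example on a hypothetical system identification problem (the plant coefficients, signal lengths, and step size are assumptions for illustration):

```python
import numpy as np

def lms(x, d, N, mu):
    """LMS: w(n+1) = w(n) + mu * e(n) * x(n), x(n) = [x(n), ..., x(n-N+1)]^T."""
    w = np.zeros(N)
    for n in range(N - 1, x.size):
        xn = x[n - N + 1:n + 1][::-1]   # regressor, newest sample first
        e = d[n] - xn @ w               # a priori error e(n)
        w = w + mu * e * xn
    return w

rng = np.random.default_rng(1)
w_true = np.array([1.0, -0.5, 0.25])    # hypothetical unknown system
x = rng.standard_normal(5000)
d = np.convolve(x, w_true)[:x.size] + 1e-3 * rng.standard_normal(x.size)
w_hat = lms(x, d, N=3, mu=0.05)
```

With a white input and a small step size, w_hat fluctuates in a small neighborhood of w_true, as the slides describe.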
LMS as a Stochastic Estimation Algorithm
∇J_ms(n) = −2p + 2 R_xx w(n)
Stochastic (instantaneous) estimators:
p̂ = d(n) x(n);  R̂_xx = x(n) x^T(n)
Then,
∇Ĵ_ms(n) = −2 d(n) x(n) + 2 x(n) x^T(n) w(n)
Using ∇Ĵ_ms(n) in the steepest descent weight update,
w(n + 1) = w(n) + µ e(n) x(n)
LMS – A Solution to a Local Optimization
Error expressions
e(n) = d(n) − x^T(n) w(n) (a priori error)
ε(n) = d(n) − x^T(n) w(n + 1) (a posteriori error)
We want to maximize |ε(n) − e(n)| with |ε(n)| < |e(n)|
ε(n) − e(n) = −x^T(n) ∆w(n), ∆w(n) = w(n + 1) − w(n)
Expressing ∆w(n) as ∆w(n) = ∆w̃(n) e(n),
ε(n) − e(n) = −x^T(n) ∆w̃(n) e(n)
For max |ε(n) − e(n)| ➪ ∆w̃(n) in the direction of x(n)
➪ ∆w̃(n) = µ x(n), so that
∆w(n) = µ x(n) e(n)
and
w(n + 1) = w(n) + µ e(n) x(n)
As
ε(n) − e(n) = −x^T(n) ∆w(n) = −µ x^T(n) x(n) e(n),
|ε(n)| < |e(n)| requires |1 − µ x^T(n) x(n)| < 1, or
0 < µ < 2/‖x(n)‖^2 (stability region)
Observations - LMS Algorithm
LMS is a noisy approximation of the steepest descent algorithm
The gradient estimate is unbiased
The errors in the gradient estimate lead to J_ms,ex(∞) ≠ 0
Vector w(n) is now random
Steepest descent properties are no longer guaranteed ➪ LMS analysis required
The instantaneous estimates allow tracking without redesign
Some Research Results
J. C. M. Bermudez and N. J. Bershad, “A nonlinear analytical model for the quantized LMS algorithm - the arbitrary step size case,” IEEE Transactions on Signal Processing, vol. 44, no. 5, pp. 1175-1183, May 1996.
J. C. M. Bermudez and N. J. Bershad, “Transient and tracking performance analysis of the quantized LMS algorithm for time-varying system identification,” IEEE Transactions on Signal Processing, vol. 44, no. 8, pp. 1990-1997, August 1996.
N. J. Bershad and J. C. M. Bermudez, “A nonlinear analytical model for the quantized LMS algorithm - the power-of-two step size case,” IEEE Transactions on Signal Processing, vol. 44, no. 11, pp. 2895-2900, November 1996.
N. J. Bershad and J. C. M. Bermudez, “Sinusoidal interference rejection analysis of an LMS adaptive feedforward controller with a noisy periodic reference,” IEEE Transactions on Signal Processing, vol. 46, no. 5, pp. 1298-1313, May 1998.
J. C. M. Bermudez and N. J. Bershad, “Non-Wiener behavior of the Filtered-X LMS algorithm,” IEEE Trans. on Circuits and Systems II - Analog and Digital Signal Processing, vol. 46, no. 8, pp. 1110-1114, Aug. 1999.
Some Research Results – continued
O. J. Tobias, J. C. M. Bermudez and N. J. Bershad, “Mean weight behavior of the Filtered-X LMS algorithm,” IEEE Transactions on Signal Processing, vol. 48, no. 4, pp. 1061-1075, April 2000.
M. H. Costa, J. C. M. Bermudez and N. J. Bershad, “Stochastic analysis of the LMS algorithm with a saturation nonlinearity following the adaptive filter output,” IEEE Transactions on Signal Processing, vol. 49, no. 7, pp. 1370-1387, July 2001.
M. H. Costa, J. C. M. Bermudez and N. J. Bershad, “Stochastic analysis of the Filtered-X LMS algorithm in systems with nonlinear secondary paths,” IEEE Transactions on Signal Processing, vol. 50, no. 6, pp. 1327-1342, June 2002.
M. H. Costa, J. C. M. Bermudez and N. J. Bershad, “The performance surface in filtered nonlinear mean square estimation,” IEEE Transactions on Circuits and Systems I, vol. 50, no. 3, pp. 445-447, March 2003.
G. Barrault, J. C. M. Bermudez and A. Lenzi, “New Analytical Model for the Filtered-x Least Mean Squares Algorithm Verified Through Active Noise Control Experiment,” Mechanical Systems and Signal Processing, vol. 21, pp. 1839-1852, 2007.
Some Research Results – continued
N. J. Bershad, J. C. M. Bermudez and J.-Y. Tourneret, “Stochastic Analysis of the LMS Algorithm for System Identification with Subspace Inputs,” IEEE Trans. on Signal Process., vol. 56, pp. 1018-1027, 2008.
J. C. M. Bermudez, N. J. Bershad and J.-Y. Tourneret, “An Affine Combination of Two LMS Adaptive Filters - Transient Mean-Square Analysis,” IEEE Trans. on Signal Process., vol. 56, pp. 1853-1864, 2008.
M. H. Costa, L. R. Ximenes and J. C. M. Bermudez, “Statistical Analysis of the LMS Adaptive Algorithm Subjected to a Symmetric Dead-Zone Nonlinearity at the Adaptive Filter Output,” Signal Processing, vol. 88, pp. 1485-1495, 2008.
M. H. Costa and J. C. M. Bermudez, “A Noise Resilient Variable Step-Size LMS Algorithm,” Signal Processing, vol. 88, no. 3, pp. 733-748, March 2008.
P. Honeine, C. Richard, J. C. M. Bermudez, J. Chen and H. Snoussi, “A Decentralized Approach for Nonlinear Prediction of Time Series Data in Sensor Networks,” EURASIP Journal on Wireless Communications and Networking, vol. 2010, pp. 1-13, 2010. (KNLMS)
The Normalized LMS Algorithm – NLMS
Most employed adaptive algorithm in real-time applications
Like LMS, it has different interpretations (even more)
Alleviates a drawback of the LMS algorithm:
w(n + 1) = w(n) + µ e(n) x(n)
◮ If the amplitude of x(n) is large ➪ gradient noise amplification
◮ Sub-optimal performance when σ_x^2 varies with time (for instance, speech)
NLMS – A Solution to a Local Optimization Problem
Error expressions
e(n) = d(n) − x^T(n) w(n) (a priori error)
ε(n) = d(n) − x^T(n) w(n + 1) (a posteriori error)
We want to maximize |ε(n) − e(n)| with |ε(n)| < |e(n)|
ε(n) − e(n) = −x^T(n) ∆w(n)  (A)
∆w(n) = w(n + 1) − w(n)
For |ε(n)| < |e(n)| we impose the restriction
ε(n) = (1 − µ) e(n), |1 − µ| < 1
➪ ε(n) − e(n) = −µ e(n)  (B)
For max |ε(n) − e(n)| ➪ ∆w(n) in the direction of x(n)
➪ ∆w(n) = k x(n)  (C)
Using (A), (B) and (C),
k = µ e(n) / (x^T(n) x(n))
and
w(n + 1) = w(n) + µ e(n) x(n) / (x^T(n) x(n))  (NLMS weight update)
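The NLMS update can be sketched the same way. In this hypothetical identification example the input amplitude is deliberately large; the normalization keeps the update stable where plain LMS with the same nominal step size could suffer gradient noise amplification:

```python
import numpy as np

def nlms(x, d, N, mu, eps=1e-8):
    """NLMS: w(n+1) = w(n) + mu * e(n) * x(n) / (x^T(n) x(n) + eps)."""
    w = np.zeros(N)
    for n in range(N - 1, x.size):
        xn = x[n - N + 1:n + 1][::-1]   # regressor, newest sample first
        e = d[n] - xn @ w
        w = w + mu * e * xn / (xn @ xn + eps)
    return w

rng = np.random.default_rng(2)
w_true = np.array([0.8, 0.4, -0.2])     # hypothetical unknown system
x = 5.0 * rng.standard_normal(4000)     # large-amplitude input
d = np.convolve(x, w_true)[:x.size]     # noiseless desired signal
w_hat = nlms(x, d, N=3, mu=0.5)
```

The small eps term anticipates the ε-NLMS regularization discussed later.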
NLMS – Solution to a Constrained
Optimization Problem
Error sequence
y(n) = x^T(n) w(n) (estimate of d(n))
e(n) = d(n) − y(n) = d(n) − x^T(n) w(n) (estimation error)
Optimization (principle of minimal disturbance)
Minimize ‖∆w(n)‖^2 = ‖w(n + 1) − w(n)‖^2
subject to: x^T(n) w(n + 1) = d(n)
This problem can be solved using the method of Lagrange multipliers
Using Lagrange multipliers, we minimize
f[w(n + 1)] = ‖w(n + 1) − w(n)‖^2 + λ [d(n) − x^T(n) w(n + 1)]
Differentiating w.r.t. w(n + 1) and equating the result to zero,
w(n + 1) = w(n) + (1/2) λ x(n)  (*)
Using this result in x^T(n) w(n + 1) = d(n) yields
λ = 2 e(n) / (x^T(n) x(n))
Using this result in (*) yields
w(n + 1) = w(n) + µ e(n) x(n) / (x^T(n) x(n))
NLMS as an Orthogonalization Process
Conditions for the analysis
◮ e(n) = d(n) − x^T(n) w(n)
◮ w_o is the optimal solution (Wiener solution)
◮ d(n) = x^T(n) w_o (no noise)
◮ v(n) = w(n) − w_o (weight error vector)
Error signal
e(n) = d(n) − x^T(n) [v(n) + w_o]
     = x^T(n) w_o − x^T(n) [v(n) + w_o]
     = −x^T(n) v(n)
e(n) = −x^T(n) v(n)
Interpretation: to minimize the error, v(n) should be orthogonal to all input vectors
Restriction: we have only one vector x(n)
Iterative solution:
We can subtract from v(n) its component in the direction of x(n) at each iteration
If there are N adaptive coefficients, v(n) could be reduced to zero after N orthogonal input vectors
Iterative projection extraction ➪ Gram-Schmidt orthogonalization
Recursive orthogonalization
n = 0: v(1) = v(0) − proj. of v(0) onto x(0)
n = 1: v(2) = v(1) − proj. of v(1) onto x(1)
. . .
n: v(n + 1) = v(n) − proj. of v(n) onto x(n)
Projection of v(n) onto x(n)
P_x(n)[v(n)] = {x(n) [x^T(n) x(n)]^{-1} x^T(n)} v(n)
Weight update equation (note that [x^T(n) x(n)]^{-1} is a scalar and x^T(n) v(n) = −e(n)):
v(n + 1) = v(n) − µ [x^T(n) x(n)]^{-1} x^T(n) v(n) x(n)
         = v(n) + µ e(n) x(n) / (x^T(n) x(n))
NLMS has its own problems!
NLMS solves the LMS gradient noise amplification problem, but ...
What happens if ‖x(n)‖^2 gets too small?
One needs to add some regularization:
w(n + 1) = w(n) + µ e(n) x(n) / (x^T(n) x(n) + ε)  (ε-NLMS)
ε-NLMS – Stochastic Approximation of Regularized Newton
Regularized Newton Algorithm
w(n + 1) = w(n) + µ [εI + R_xx]^{-1} [p − R_xx w(n)]
Instantaneous estimates
p̂ = x(n) d(n)
R̂_xx = x(n) x^T(n)
Using these estimates,
w(n + 1) = w(n) + µ [εI + x(n) x^T(n)]^{-1} x(n) [d(n) − x^T(n) w(n)]
w(n + 1) = w(n) + µ [εI + x(n) x^T(n)]^{-1} x(n) e(n)
w(n + 1) = w(n) + µ [εI + x(n) x^T(n)]^{-1} x(n) e(n)
Inversion of εI + x(n) x^T(n)?
Matrix Inversion Formula
[A + BCD]^{-1} = A^{-1} − A^{-1} B [C^{-1} + D A^{-1} B]^{-1} D A^{-1}
Thus,
[εI + x(n) x^T(n)]^{-1} = ε^{-1} I − (ε^{-2} / (1 + ε^{-1} x^T(n) x(n))) x(n) x^T(n)
Post-multiplying both sides by x(n) and rearranging,
[εI + x(n) x^T(n)]^{-1} x(n) = x(n) / (ε + x^T(n) x(n))
and
w(n + 1) = w(n) + µ e(n) x(n) / (x^T(n) x(n) + ε)
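The rank-one inversion identity used above is easy to verify numerically. A minimal check with a hypothetical vector x and regularization ε:

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.standard_normal(6)              # hypothetical input vector
eps = 0.1

# [eps*I + x x^T]^{-1} x  should equal  x / (eps + x^T x)
lhs = np.linalg.solve(eps * np.eye(6) + np.outer(x, x), x)
rhs = x / (eps + x @ x)
```

This is why ε-NLMS never needs an explicit matrix inversion: the regularized Newton direction collapses to a scalar normalization.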
Some Research Results
M. H. Costa and J. C. M. Bermudez, “An improved model for the normalized LMS algorithm with Gaussian inputs and large number of coefficients,” in Proc. 2002 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2002), Orlando, Florida, pp. II-1385-1388, May 13-17, 2002.
J. C. M. Bermudez and M. H. Costa, “A Statistical Analysis of the ε-NLMS and NLMS Algorithms for Correlated Gaussian Signals,” Journal of the Brazilian Telecommunications Society, vol. 20, no. 2, pp. 7-13, 2005.
G. Barrault, M. H. Costa, J. C. M. Bermudez and A. Lenzi, “A new analytical model for the NLMS algorithm,” in Proc. 2005 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2005), Pennsylvania, USA, vol. IV, pp. 41-44, 2005.
J. C. M. Bermudez, N. J. Bershad and J.-Y. Tourneret, “An Affine Combination of Two NLMS Adaptive Filters - Transient Mean-Square Analysis,” in Proc. Forty-Second Asilomar Conference on Signals, Systems & Computers, Pacific Grove, CA, USA, 2008.
P. Honeine, C. Richard, J. C. M. Bermudez and H. Snoussi, “Distributed prediction of time series data with kernels and adaptive filtering techniques in sensor networks,” in Proc. Forty-Second Asilomar Conference on Signals, Systems & Computers, Pacific Grove, CA, USA, 2008.
The Affine Projection Algorithm
Consider the NLMS algorithm but using {x(n), x(n − 1), . . . , x(n − P)}
Minimize ‖∆w(n)‖^2 = ‖w(n + 1) − w(n)‖^2
subject to:
x^T(n) w(n + 1) = d(n)
x^T(n − 1) w(n + 1) = d(n − 1)
. . .
x^T(n − P) w(n + 1) = d(n − P)
Observation matrix
X(n) = [x(n), x(n − 1), . . . , x(n − P)]
Desired signal vector
d(n) = [d(n), d(n − 1), . . . , d(n − P)]^T
Error vector
e(n) = [e(n), e(n − 1), . . . , e(n − P)]^T = d(n) − X^T(n) w(n)
Vector of the constraint errors
ec(n) = d(n) − X^T(n) w(n + 1)
Using Lagrange multipliers, we minimize
f[w(n + 1)] = ‖w(n + 1) − w(n)‖^2 + λ^T [d(n) − X^T(n) w(n + 1)]
Differentiating w.r.t. w(n + 1) and equating the result to zero,
w(n + 1) = w(n) + (1/2) X(n) λ  (*)
Using this result in X^T(n) w(n + 1) = d(n) yields
λ = 2 [X^T(n) X(n)]^{-1} e(n)
Using this result in (*) yields
w(n + 1) = w(n) + µ X(n) [X^T(n) X(n)]^{-1} e(n)
Affine Projection
Solution to the Underdetermined Least-Squares Problem
We want to minimize
‖ec(n)‖^2 = ‖d(n) − X^T(n) w(n + 1)‖^2
where X^T(n) is (P + 1) × N with (P + 1) < N
Thus, we look for the least-squares solution of the underdetermined system
X^T(n) w(n + 1) = d(n)
The solution is
w(n + 1) = [X^T(n)]^+ d(n) = X(n) [X^T(n) X(n)]^{-1} d(n)
Using d(n) = e(n) + X^T(n) w(n) yields
w(n + 1) = w(n) + µ X(n) [X^T(n) X(n)]^{-1} e(n)
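The AP update can be sketched directly from the final expression. A hedged example on a hypothetical noiseless identification problem (filter length, order, and step size are assumptions; a small eps regularizes the inverse, as suggested below for a near-singular X^T X):

```python
import numpy as np

def ap_update(w, X, d, mu, eps=1e-8):
    """One AP step: w + mu * X [X^T X + eps I]^{-1} e, with e = d - X^T w."""
    e = d - X.T @ w
    G = X.T @ X + eps * np.eye(X.shape[1])
    return w + mu * X @ np.linalg.solve(G, e)

rng = np.random.default_rng(4)
N, P = 8, 3                              # filter length, AP order P + 1 = 4
w_true = rng.standard_normal(N)          # hypothetical unknown system
w = np.zeros(N)
x = rng.standard_normal(3000)
for n in range(N + P, x.size):
    # column k is x(n - k) = [x(n-k), ..., x(n-k-N+1)]^T, k = 0..P
    X = np.stack([x[n - k - N + 1:n - k + 1][::-1] for k in range(P + 1)], axis=1)
    d = X.T @ w_true                     # noiseless desired vector
    w = ap_update(w, X, d, mu=1.0)
```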
Observations:
Order of the AP algorithm: P + 1 (P = 0 → NLMS)
Convergence speed increases with P (but not linearly)
Computational complexity increases with P (not linearly)
If X^T(n) X(n) is close to singular, use X^T(n) X(n) + εI
The scalar error of NLMS becomes a vector error in AP (except for µ = 1)
Affine Projection – Stochastic Approximation of Regularized Newton
Regularized Newton Algorithm
w(n + 1) = w(n) + µ [εI + R_xx]^{-1} [p − R_xx w(n)]
Estimates using time-window statistics
p̂ = (1/(P + 1)) Σ_{k=n−P}^{n} x(k) d(k) = (1/(P + 1)) X(n) d(n)
R̂_xx = (1/(P + 1)) Σ_{k=n−P}^{n} x(k) x^T(k) = (1/(P + 1)) X(n) X^T(n)
Using these estimates and redefining ε as ε/(P + 1),
w(n + 1) = w(n) + µ [εI + X(n) X^T(n)]^{-1} X(n) [d(n) − X^T(n) w(n)]
w(n + 1) = w(n) + µ [εI + X(n) X^T(n)]^{-1} X(n) e(n)
w(n + 1) = w(n) + µ [εI + X(n) X^T(n)]^{-1} X(n) e(n)
Using the Matrix Inversion Formula,
[εI + X(n) X^T(n)]^{-1} X(n) = X(n) [εI + X^T(n) X(n)]^{-1}
and
w(n + 1) = w(n) + µ X(n) [εI + X^T(n) X(n)]^{-1} e(n)
Affine Projection Algorithm as a Projection onto an Affine Subspace
Conditions for the analysis
◮ e(n) = d(n) − X^T(n) w(n)
◮ d(n) = X^T(n) w_o defines the optimal solution in the least-squares sense
◮ v(n) = w(n) − w_o (weight error vector)
Error vector
e(n) = d(n) − X^T(n) [v(n) + w_o]
     = X^T(n) w_o − X^T(n) [v(n) + w_o]
     = −X^T(n) v(n)
e(n) = −X^T(n) v(n)
Interpretation: to minimize the error, v(n) should be orthogonal to all input vectors
Restriction: we are going to use only {x(n), x(n − 1), . . . , x(n − P)}
Iterative solution:
We can subtract from v(n) its projection onto the range of X(n) at each iteration
v(n + 1) = v(n) − µ P_X(n) v(n)  (projection onto an affine subspace)
Using X^T(n) v(n) = −e(n),
v(n + 1) = v(n) − µ {X(n) [X^T(n) X(n)]^{-1} X^T(n)} v(n)
         = v(n) + µ X(n) [X^T(n) X(n)]^{-1} e(n)  (AP algorithm)
Pseudo Affine Projection Algorithm
Major problem with the AP algorithm:
For µ ≠ 1, the scalar error e(n) becomes the vector error e(n) ➪ large computational complexity
The Pseudo-AP algorithm replaces the input x(n) with its P-th order autoregressive prediction
x̂(n) = Σ_{k=1}^{P} a_k x(n − k)
Using the least-squares solution for a = [a1, a2, . . . , aP]^T,
a = [X_p^T(n) X_p(n)]^{-1} X_p^T(n) x(n), X_p(n) = [x(n − 1), . . . , x(n − P)]
Now, we subtract from x(n) its projection onto the last P input vectors:
φ(n) = x(n) − X_p(n) a(n)
     = x(n) − {X_p(n) [X_p^T(n) X_p(n)]^{-1} X_p^T(n)} x(n)
     = (I − P_P) x(n)  (P_P: projection matrix)
Example – Pseudo-AP versus AP
[Plot: mean square error (dB) versus iterations (0–6000) for AP and Pseudo-AP. Setup: AR(8) input with coefficients [−0.9, 0.8, −0.7, 0.6, −0.5, 0.4, −0.3, 0.2], α = 0.7, N = 64, P = 8, SNR = 80 dB, S = Null. Steady state: AP at −72.82 dB, PAP at −75.82 dB.]
Using φ as the new input vector for the NLMS algorithm,
w(n + 1) = w(n) + µφ(n)
φT (n)φ(n)e(n)
It can be shown that Pseudo-AP is identical to AP for an AR input and µ = 1

Otherwise, Pseudo-AP differs from AP

Simpler to implement than AP for µ ≠ 1

Can even lead to better steady-state results than AP for AR inputs
For AR inputs: NLMS with input “orthogonalization”
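A minimal sketch of the resulting update; the name `pap_step` and the small regularizer `eps` are assumptions, not part of the slides:

```python
import numpy as np

def pap_step(w, x, phi, d, mu=1.0, eps=1e-8):
    """Pseudo-AP update: NLMS along the decorrelated direction phi(n).
    The error uses the true input x(n); only the step direction changes."""
    e = d - x @ w
    return w + mu * e * phi / (phi @ phi + eps), e

# Sanity check: with phi = x the step reduces to plain NLMS and, for
# mu = 1, drives the a posteriori error to zero
rng = np.random.default_rng(2)
x = rng.standard_normal(4)
w, e = pap_step(np.zeros(4), x, x, d=1.0, mu=1.0, eps=0.0)
posterior = 1.0 - x @ w
```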
Some Research Results

S. J. M. de Almeida, J. C. M. Bermudez, N. J. Bershad and M. H. Costa, “A statistical analysis of the affine projection algorithm for unity step size and autoregressive inputs,” IEEE Transactions on Circuits and Systems I, vol. 52, pp. 1394-1405, July 2005.

S. J. M. Almeida, J. C. M. Bermudez and N. J. Bershad, “A Stochastic Model for a Pseudo Affine Projection Algorithm,” IEEE Transactions on Signal Processing, vol. 57, pp. 107-118, 2009.

S. J. M. Almeida, M. H. Costa and J. C. M. Bermudez, “A Stochastic Model for the Deficient Length Pseudo Affine Projection Adaptive Algorithm,” in Proc. 17th European Signal Processing Conference (EUSIPCO), Glasgow, Scotland, Aug. 2009.

S. J. M. Almeida, M. H. Costa and J. C. M. Bermudez, “A stochastic model for the deficient order affine projection algorithm,” in Proc. ISSPA 2010, Kuala Lumpur, Malaysia, May 2010.

C. Richard, J. C. M. Bermudez and P. Honeine, “Online Prediction of Time Series Data With Kernels,” IEEE Transactions on Signal Processing, vol. 57, pp. 1058-1067, 2009.

P. Honeine, C. Richard, J. C. M. Bermudez, H. Snoussi, M. Essoloh and F. Vincent, “Functional estimation in Hilbert space for distributed learning in wireless sensor networks,” in Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Taipei, Taiwan, pp. 2861-2864, 2009.
Recursive Least Squares Algorithm (RLS)
Based on a deterministic philosophy
Designed for the present realization of the input signals
Least squares method adapted for real-time processing of time series
Convergence speed is not strongly dependent on the input statistics
RLS – Problem Definition
Define
xo(n) = [x(n), x(n−1), . . . , x(0)]^T

x(n) = [x(n), x(n−1), . . . , x(n−N+1)]^T   (input to the N-tap filter)
Desired signal d(n)
Estimator of d(n): y(n) = xT (n)w(n)
Cost function – Squared Error
J_ls(n) = Σ_{k=0}^{n} e²(k) = Σ_{k=0}^{n} [d(k) − x^T(k)w(n)]²
        = e^T(n)e(n),   e(n) = [e(n), e(n−1), . . . , e(0)]^T
In vector form
d(n) = [d(n), d(n−1), . . . , d(0)]^T
y(n) = [y(n), y(n−1), . . . , y(0)]^T,   y(k) = x^T(k)w(n)

y(n) = Σ_{k=0}^{N−1} w_k(n) xo(n−k) = Θ(n)w(n)

Θ(n) = | x(n)     x(n−1)    · · ·   x(n−N+1) |
       | x(n−1)   x(n−2)    · · ·   x(n−N)   |
       |   ⋮        ⋮                  ⋮      |
       | x(0)     x(−1)     · · ·   x(−N+1)  |

     = [xo(n), xo(n−1), . . . , xo(n−N+1)]
e(n) = d(n) − Θ(n)w(n)
e(n) = d(n) − Θ(n)w(n)
and we minimize J_ls(n) = ‖e(n)‖²
Normal Equations
Θ^T(n)Θ(n)w(n) = Θ^T(n)d(n)

R(n)w(n) = p(n)

Alternative representation for R and p

R(n) = Θ^T(n)Θ(n) = Σ_{k=0}^{n} x(k)x^T(k)

p(n) = Θ^T(n)d(n) = Σ_{k=0}^{n} x(k)d(k)
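These sums can be checked numerically. The sketch below builds R(n) and p(n) from a data matrix whose rows are x^T(k) and solves the normal equations; names, sizes, and the noiseless setup are illustrative:

```python
import numpy as np

# Batch solution of the normal equations R(n) w = p(n), with
# R(n) = sum_k x(k) x^T(k) and p(n) = sum_k x(k) d(k)
rng = np.random.default_rng(3)
N, n = 4, 50
wo = rng.standard_normal(N)           # "true" system (illustrative)
X = rng.standard_normal((n, N))       # row k is x^T(k)
d = X @ wo                            # noiseless desired signal
R = X.T @ X                           # sum of outer products x(k) x^T(k)
p = X.T @ d                           # sum of x(k) d(k)
w = np.linalg.solve(R, p)             # recovers wo in this noiseless case
```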
RLS – Observations
w(n) remains fixed for 0 ≤ k ≤ n to determine Jls(n)
Optimum vector is wo(n) = R^{-1}(n)p(n)

The algorithm has growing memory

In an adaptive implementation,

wo(n) = R^{-1}(n)p(n)

must be solved at each iteration n

Problems for an adaptive implementation
◮ Difficulties with nonstationary signals (infinite data window)
◮ Computational complexity
RLS - Forgetting the Past
Modified cost function
J_ls(n) = Σ_{k=0}^{n} λ^{n−k} e²(k) = Σ_{k=0}^{n} λ^{n−k} [d(k) − x^T(k)w(n)]²
        = e^T(n)Λe(n),   Λ = diag[1, λ, λ², . . . , λ^n]
Modified Normal Equations
Θ^T(n)ΛΘ(n)w(n) = Θ^T(n)Λd(n)

R(n)w(n) = p(n)

where

R(n) = Θ^T(n)ΛΘ(n) = Σ_{k=0}^{n} λ^{n−k} x(k)x^T(k)

p(n) = Θ^T(n)Λd(n) = Σ_{k=0}^{n} λ^{n−k} x(k)d(k)
RLS – Recursive Updating
Correlation Matrix
R(n) = Σ_{k=0}^{n} λ^{n−k} x(k)x^T(k)
     = λ Σ_{k=0}^{n−1} λ^{n−1−k} x(k)x^T(k) + x(n)x^T(n)
     = λR(n−1) + x(n)x^T(n)
Cross-correlation vector
p(n) = Σ_{k=0}^{n} λ^{n−k} x(k)d(k)
     = λ Σ_{k=0}^{n−1} λ^{n−1−k} x(k)d(k) + x(n)d(n)
     = λp(n−1) + x(n)d(n)
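A quick numerical check that the rank-one recursions reproduce the exponentially weighted sums (all names and sizes illustrative):

```python
import numpy as np

# Verify: R(n) = lam R(n-1) + x(n) x^T(n),  p(n) = lam p(n-1) + x(n) d(n)
rng = np.random.default_rng(4)
lam, N, n = 0.95, 3, 20
xs = rng.standard_normal((n, N))       # row k is x^T(k)
ds = rng.standard_normal(n)
R, p = np.zeros((N, N)), np.zeros(N)
for x, d in zip(xs, ds):
    R = lam * R + np.outer(x, x)
    p = lam * p + d * x
# Direct evaluation of the exponentially weighted sums
wts = lam ** np.arange(n - 1, -1, -1)  # lam^(n-1-k) for k = 0..n-1
R_sum = (xs * wts[:, None]).T @ xs
p_sum = xs.T @ (wts * ds)
```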
We have recursive updating expressions for R(n) and p(n)
However, we need a recursive update for R^{-1}(n), since

wo(n) = R^{-1}(n)p(n)

Applying the matrix inversion lemma to

R(n) = λR(n−1) + x(n)x^T(n)

and defining P(n) = R^{-1}(n) yields

P(n) = λ^{-1}P(n−1) − [λ^{-2}P(n−1)x(n)x^T(n)P(n−1)] / [1 + λ^{-1}x^T(n)P(n−1)x(n)]
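The lemma-based update can be verified against a direct inversion; the positive-definite R(n−1) below is fabricated just for the check:

```python
import numpy as np

# Check the matrix-inversion-lemma update for P(n) = R^{-1}(n)
rng = np.random.default_rng(5)
lam, N = 0.9, 3
A = rng.standard_normal((N, N))
R_prev = A @ A.T + N * np.eye(N)       # an illustrative positive-definite R(n-1)
P_prev = np.linalg.inv(R_prev)
x = rng.standard_normal(N)
Px = P_prev @ x
# P(n) = lam^{-1} P - lam^{-2} P x x^T P / (1 + lam^{-1} x^T P x),
# rewritten with a common factor 1 / (lam (lam + x^T P x))
P = P_prev / lam - np.outer(Px, Px) / (lam * (lam + x @ Px))
P_direct = np.linalg.inv(lam * R_prev + np.outer(x, x))
```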
The Gain Vector
Definition
k(n) = λ^{-1}P(n−1)x(n) / [1 + λ^{-1}x^T(n)P(n−1)x(n)]

Using this definition,

P(n) = λ^{-1}P(n−1) − λ^{-1}k(n)x^T(n)P(n−1)

k(n) can be written as

k(n) = [λ^{-1}P(n−1) − λ^{-1}k(n)x^T(n)P(n−1)] x(n)   (bracketed term = P(n))
     = P(n)x(n) = R^{-1}(n)x(n)

➪ k(n) is the representation of x(n) in the column space of R(n)
RLS – Recursive Weight Update
w(n + 1) = wo(n)
Using
w(n+1) = R^{-1}(n)p(n) = P(n)p(n)
p(n) = λp(n−1) + x(n)d(n)
P(n) = λ^{-1}P(n−1) − λ^{-1}k(n)x^T(n)P(n−1)
k(n) = P(n)x(n)
e(n) = d(n) − x^T(n)w(n)
yields
w(n+1) = w(n) + k(n)e(n)   (RLS weight update)
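Putting the five equations together gives the conventional RLS loop. This is a sketch; the function name, the `delta` initialization value, and the toy identification data are illustrative assumptions:

```python
import numpy as np

def rls(xs, ds, N, lam=0.99, delta=1e-2):
    """Conventional RLS.  xs: rows are x^T(n); P(0) = (delta I)^{-1}."""
    w = np.zeros(N)
    P = np.eye(N) / delta
    for x, d in zip(xs, ds):
        Px = P @ x
        k = Px / (lam + x @ Px)           # gain vector k(n)
        e = d - x @ w                     # a priori error e(n)
        w = w + k * e                     # RLS weight update
        P = (P - np.outer(k, Px)) / lam   # P(n) via the inversion lemma
    return w

# Identify a short FIR response from noiseless input/output data
rng = np.random.default_rng(6)
wo = np.array([0.5, -0.2, 0.1])
u = rng.standard_normal(402)
xs = np.stack([u[2:], u[1:-1], u[:-2]], axis=1)   # x(n) = [u(n), u(n-1), u(n-2)]
ds = xs @ wo
w = rls(xs, ds, N=3)
```

Note that k(n) = λ^{-1}P(n−1)x(n)/[1 + λ^{-1}x^T(n)P(n−1)x(n)] simplifies to Px/(λ + x^T Px), which is the form used in the loop.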
RLS – Observations
The RLS convergence speed is not affected by the eigenvalues of R(n)

Initialization: p(0) = 0, R(0) = δI, δ ≃ (1 − λ)σ_x²

This changes R(n) to λ^n δI + R(n)
➪ Biased estimate of wo(n) for small n (no problem for λ < 1 and large n)

Numerical problems in finite precision
◮ Unstable for λ = 1
◮ Loss of symmetry in P(n)
  ⋆ Evaluate only the upper (or lower) triangle and the diagonal of P(n)
  ⋆ Replace P(n) with [P(n) + P^T(n)]/2 after updating from P(n−1)

Numerical problems when x(n) → 0 and λ < 1
Some Research Results
C. Ludovico and J. C. M. Bermudez, “A recursive least squares algorithm robust to low-power excitation,” in Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2004), Montreal, Canada, vol. II, pp. 673-677, 2004.

C. Ludovico and J. C. M. Bermudez, “An improved recursive least squares algorithm robust to input power variation,” in Proc. IEEE Statistical Signal Processing Workshop, Bordeaux, France, 2005.
Performance Comparison
System identification, N=100
Input – AR(1) x(n) = 0.9x(n − 1) + v(n)
AP algorithm with P = 2
Step sizes designed for equal LMS and NLMS performances
Impulse response to be estimated
[Figure: the 100-sample impulse response, amplitudes between −0.1 and 0.4.]
Excess Mean Square Error - LMS, NLMS and AP
[Figure: EMSE (dB) versus iteration (0–6000) for LMS, NLMS and AP(2).]
Excess Mean Square Error - LMS, NLMS, AP and RLS
[Figure: EMSE (dB) versus iteration (0–6000) for LMS, NLMS, AP(2) and RLS.]
Performance Comparisons
Computational Complexity
Adaptive filter with N real coefficients and real signals
For the AP algorithm, K = P + 1
Algorithm    ×                      +                       /
LMS          2N + 1                 2N                      –
NLMS         3N + 1                 3N                      1
AP           (K² + 2K)N + K³ + K    (K² + 2K)N + K³ + K²    –
RLS          N² + 5N + 1            N² + 3N                 1
For N = 100, P = 2
Algorithm    ×         +         /     ≃ factor
LMS          201       200       –     1
NLMS         301       300       1     1.5
AP           1,530     1,536     –     7.5
RLS          10,501    10,300    1     52.5
Typical values for acoustic echo cancellation (N = 1024, P = 2)
Algorithm    ×            +            /     ≃ factor
LMS          2,049        2,048        –     1
NLMS         3,073        3,072        1     1.5
AP           15,390       15,396       –     7.5
RLS          1,053,697    1,051,648    1     514
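The table entries follow directly from the complexity formulas; a trivial check of the multiplication counts (function name illustrative):

```python
# Multiplication counts implied by the complexity formulas above
def multiplications(N, K):
    return {
        "LMS": 2 * N + 1,
        "NLMS": 3 * N + 1,
        "AP": (K**2 + 2 * K) * N + K**3 + K,
        "RLS": N**2 + 5 * N + 1,
    }

counts_100 = multiplications(N=100, K=3)    # N = 100, P = 2 (K = P + 1 = 3)
counts_aec = multiplications(N=1024, K=3)   # acoustic echo cancellation case
```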
How to Deal with Computational Complexity?
Not an easy task!!!
There are “fast” versions for some algorithms (especially RLS)

What is usually not said is that speed can bring
◮ Instability
◮ Increased need for memory
➪ Most applications rely on simple solutions
Understanding the Adaptive Filter Behavior
What are Adaptive Filters?

Adaptive filters are, by design, systems that are
Time-variant (w(n) is time-variant)
Nonlinear (y(n) is a nonlinear function of x(n))
Stochastic (w(n) is random)
LMS:

w(n+1) = w(n) + µ e(n)x(n)
       = [I − µ x(n)x^T(n)]w(n) + µ d(n)x(n)

y(n) = x^T(n)w(n)
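A minimal LMS loop matching the update above; the function name and the toy identification setup are illustrative:

```python
import numpy as np

def lms(xs, ds, N, mu=0.05):
    """LMS: w(n+1) = w(n) + mu e(n) x(n).  xs: rows are x^T(n)."""
    w = np.zeros(N)
    for x, d in zip(xs, ds):
        e = d - x @ w
        w = w + mu * e * x
    return w

# Noiseless identification of a 3-tap FIR system with white input
rng = np.random.default_rng(7)
wo = np.array([1.0, -0.5, 0.25])
u = rng.standard_normal(2002)
xs = np.stack([u[2:], u[1:-1], u[:-2]], axis=1)   # x(n) = [u(n), u(n-1), u(n-2)]
w = lms(xs, xs @ wo, N=3, mu=0.05)
```

Even this tiny loop exhibits the three properties listed above: w depends on past data (time-variant), y(n) = x^T(n)w(n) depends on x through w (nonlinear), and w inherits the randomness of x.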
They are difficult to understand
Different applications and signals require different analyses
Simplifying assumptions are necessary
➪ Good design requires good analytical models.
LMS Analysis

Basic equations:

d(n) = x^T(n)wo + z(n)
e(n) = d(n) − x^T(n)w(n)
w(n+1) = w(n) + µ e(n)x(n)
v(n) = w(n) − wo   (weight error about the optimum weight vector)

Mean Weight Behavior:
Neglecting the statistical dependence between v(n) and x(n)x^T(n),
E[v(n + 1)] = (I − µRxx)E[v(n)]
Follows the steepest descent trajectory
Convergence condition: 0 < µ < 2/λmax
lim_{n→∞} E[w(n)] = wo

However: w(n) is random
➪ We need to study its fluctuations about E[w(n)] → MSE
The Mean-Square Estimation Error
e(n) = e_o(n) − x^T(n)v(n),   e_o(n) = d(n) − x^T(n)wo
Square the error equation
Take the expected value
Neglect the statistical dependence between v(n) and x(n)x^T(n)
J_ms(n) = E[e_o²(n)] + Tr{Rxx K(n)},   K(n) = E[v(n)v^T(n)]

This expression is independent of the adaptive algorithm

Effect of the algorithm on J_ms(n) is determined by K(n)

J_ms,min = E[e_o²(n)]

J_ms,ex(n) = Tr{Rxx K(n)}
The Behavior of K(n) for LMS
Using the basic equations
v(n+1) = [I − µ x(n)x^T(n)]v(n) + µ e_o(n)x(n)

Post-multiply by v^T(n+1)

Take the expected value assuming v(n) and x(n)x^T(n) independent

Evaluate the expectations
◮ Higher order moments of x(n) require the input pdf
Assuming Gaussian inputs
K(n+1) = K(n) − µ [Rxx K(n) + K(n)Rxx]
         + µ² {Rxx Tr[Rxx K(n)] + 2 Rxx K(n) Rxx}
         + µ² Rxx J_ms,min
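The recursion can be iterated numerically. The sketch below does so for a white Gaussian input and compares the steady-state excess MSE Tr{Rxx K} with the small-step approximation (µ/2)Tr[Rxx]J_ms,min that appears in the design guidelines; all parameter values are illustrative:

```python
import numpy as np

# Iterate the Gaussian-input model for K(n) to steady state
N, mu, Jmin = 4, 0.01, 1e-3
Rxx = np.eye(N)                        # white input for simplicity
K = np.eye(N)                          # illustrative initial weight-error covariance
for _ in range(5000):
    K = (K - mu * (Rxx @ K + K @ Rxx)
         + mu**2 * (Rxx * np.trace(Rxx @ K) + 2 * Rxx @ K @ Rxx)
         + mu**2 * Rxx * Jmin)
emse = np.trace(Rxx @ K)               # model steady-state excess MSE
emse_approx = mu / 2 * np.trace(Rxx) * Jmin   # small-step approximation
```

For this small µ the iterated model and the closed-form approximation agree to within a few percent; the gap grows as µ approaches the stability limit.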
Design Guidelines Extracted from the Model
Stability
0 < µ < m / Tr[Rxx],   m = 2 or m = 2/3

J_ms(n) converges as a function of

[(1 − µλ_i)² + 2µ²λ_i²]^n,   i = 1, . . . , N

Steady-State

J_ms,ex(∞) ≃ (µ/2) Tr[Rxx] J_ms,min

M(∞) = J_ms,ex(∞) / J_ms,min   (MSE Misadjustment)
Model Evaluation
x(n) = α x(n − 1) + v(n)
[Figure: LMS excess mean square error (dB) versus iteration (0–6000), simulation (blue) versus analytical model (red). Left: white noise input. Right: AR(1) input, α = 0.8.]
Linear system identification – FIR with 100 coefficients

Step size µ = 0.05

Noise power σ_v² = 10⁻⁶