Part 5 - Prediction Error Methods
Transcript of Part 5 - Prediction Error Methods
System Identification
Control Engineering B.Sc., 3rd year
Technical University of Cluj-Napoca, Romania
Lecturer: Lucian Busoniu
ARX models Model structures General PEM Implementation & Extensions
Part V
Prediction error methods
Table of contents
1 Identification of ARX models
2 Model structures
3 General prediction error methods
4 Implementation and extensions
We stay in the single-output, single-input case for all but the last section.
Classification
Recall Types of models from Part I:
1 Mental or verbal models
2 Graphs and tables (nonparametric)
3 Mathematical models, with two subtypes:
First-principles, analytical models
Models from system identification
Prediction error methods produce parametric, polynomial models.
Table of contents
1 Identification of ARX models
Analytical development
Theoretical guarantee
Matlab example
2 Model structures
3 General prediction error methods
4 Implementation and extensions
Recall: discrete-time model
ARX model structure
We consider first the ARX model structure, where the output y(k) at the current discrete time step is computed based on previous input and output values:
y(k) + a1y(k − 1) + a2y(k − 2) + . . . + anay(k − na)
= b1u(k − 1) + b2u(k − 2) + . . . + bnbu(k − nb) + e(k)
equivalent to
y(k) = −a1y(k − 1) − a2y(k − 2) − . . . − anay(k − na)
     + b1u(k − 1) + b2u(k − 2) + . . . + bnbu(k − nb) + e(k)
e(k) is the noise at step k.
Model parameters: a1, a2, . . . , ana and b1, b2, . . . , bnb.
Name: AutoRegressive (y(k) a function of previous y values) with eXogenous input (dependence on u)
Polynomial representation
Backward shift operator q−1:
q−1z(k) = z(k − 1)
where z(k) is any discrete-time signal.
Then:
y(k) + a1y(k − 1) + a2y(k − 2) + . . . + anay(k − na)
= (1 + a1q−1 + a2q−2 + . . . + anaq−na)y(k) =: A(q−1)y(k)
and: b1u(k − 1) + b2u(k − 2) + . . . + bnbu(k − nb)
= (b1q−1 + b2q−2 + . . . + bnbq−nb)u(k) =: B(q−1)u(k)
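As a quick numeric check of the operator notation, the shifted-and-scaled sums can be computed directly. This is a Python/NumPy sketch (the course itself uses Matlab); the coefficients a1 = 0.5, a2 = −0.25 and the signal values are made up for illustration:

```python
import numpy as np

# Apply A(q^-1) = 1 + a1 q^-1 + a2 q^-2 to a signal z(k), i.e.
# A(q^-1) z(k) = z(k) + a1 z(k-1) + a2 z(k-2), with zero initial conditions.
a1, a2 = 0.5, -0.25               # hypothetical coefficients
z = np.array([1.0, 2.0, 3.0, 4.0])

Az = z.copy()
Az[1:] += a1 * z[:-1]             # a1 q^-1 z: the signal delayed by one step
Az[2:] += a2 * z[:-2]             # a2 q^-2 z: the signal delayed by two steps

print(Az)                         # [1.0, 2.5, 3.75, 5.0]
```

Each delay q−j simply shifts the signal j steps back, so the filtered signal is a weighted sum of shifted copies of z.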
ARX model in polynomial form
Therefore, the ARX model is written compactly:
A(q−1)y(k) = B(q−1)u(k) + e(k)
The symbolic representation in the figure holds because:

y(k) = (1/A(q−1)) [B(q−1)u(k) + e(k)]

Remark: The ARX model is quite general; it can describe arbitrary linear relationships between inputs and outputs. However, the noise enters the model in a restricted way, and later we introduce models that generalize this.
Linear regression model
Returning to the explicit recursive representation:
y(k) = −a1y(k − 1) − a2y(k − 2) − . . . − anay(k − na)
     + b1u(k − 1) + b2u(k − 2) + . . . + bnbu(k − nb) + e(k)
     = [−y(k − 1) · · · −y(k − na)  u(k − 1) · · · u(k − nb)] · [a1 · · · ana  b1 · · · bnb]⊤ + e(k)
     =: ϕ⊤(k)θ + e(k)

So in fact ARX obeys the standard model structure in linear regression!

Regressor vector: ϕ(k) ∈ R^(na+nb), previous output and input values.
Parameter vector: θ ∈ R^(na+nb), polynomial coefficients.
Identification problem
Consider now that we are given a vector of data u(k), y(k), k = 1, . . . , N, and we have to find the model parameters θ.

Then for any k: y(k) = ϕ⊤(k)θ + ε(k)
where ε(k) is now interpreted as an equation error.
Objective: minimize the mean squared error:

V(θ) = (1/N) ∑_{k=1}^{N} ε²(k)

Remark: When k ≤ na or k ≤ nb, zero- and negative-time values of u and y are needed to construct ϕ(k). They can be taken 0 (assuming the system is in zero initial conditions).
Linear system
y(1) = [−y(0) · · · −y(1 − na)  u(0) · · · u(1 − nb)] θ
y(2) = [−y(1) · · · −y(2 − na)  u(1) · · · u(2 − nb)] θ
· · ·
y(N) = [−y(N − 1) · · · −y(N − na)  u(N − 1) · · · u(N − nb)] θ

Matrix form: Y = Φθ, with

Y = [y(1), y(2), . . . , y(N)]⊤

Φ = [ −y(0)     · · ·  −y(1 − na)   u(0)     · · ·  u(1 − nb)
      −y(1)     · · ·  −y(2 − na)   u(1)     · · ·  u(2 − nb)
      ...
      −y(N − 1) · · ·  −y(N − na)   u(N − 1) · · ·  u(N − nb) ]

and notations Y ∈ R^N, Φ ∈ R^(N×(na+nb)).
ARX solution
From linear regression, to minimize (1/2) ∑_{k=1}^{N} ε²(k) the parameters are:

θ = (Φ⊤Φ)^(−1) Φ⊤Y

Since the new V(θ) = (1/N) ∑_{k=1}^{N} ε²(k) is proportional to the criterion above, the same solution also minimizes V(θ).

However, the form above is impractical in system identification, since the number of data points N can be very large. Better form:

Φ⊤Φ = ∑_{k=1}^{N} ϕ(k)ϕ⊤(k),   Φ⊤Y = ∑_{k=1}^{N} ϕ(k)y(k)

⇒ θ = [∑_{k=1}^{N} ϕ(k)ϕ⊤(k)]^(−1) [∑_{k=1}^{N} ϕ(k)y(k)]
ARX solution (continued)
Remaining issue: the sums of N terms can grow very large, leading to numerical problems: (matrix of very large numbers)^(−1) · (vector of very large numbers).

Solution: Normalize the element values by dividing them by N. In the equations, N cancels out, so it has no effect on the analytical development, but in practice it keeps the numbers reasonable.

θ = [(1/N) ∑_{k=1}^{N} ϕ(k)ϕ⊤(k)]^(−1) [(1/N) ∑_{k=1}^{N} ϕ(k)y(k)]
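To make the formulas concrete, here is a minimal numerical sketch of the normalized solution, in Python/NumPy rather than the course's Matlab. The 1st order system and its coefficients a1 = −0.8, b1 = 0.5 are made up for illustration:

```python
import numpy as np

# Simulate a hypothetical 1st order ARX system
# y(k) = -a1*y(k-1) + b1*u(k-1) + e(k), with theta0 = [a1, b1] = [-0.8, 0.5].
rng = np.random.default_rng(0)
N = 2000
u = rng.standard_normal(N)            # white-noise input
e = 0.1 * rng.standard_normal(N)      # white noise e(k)
y = np.zeros(N)
for k in range(1, N):
    y[k] = 0.8 * y[k - 1] + 0.5 * u[k - 1] + e[k]

# Normalized normal equations:
# theta = [ (1/N) sum phi(k) phi(k)^T ]^-1 [ (1/N) sum phi(k) y(k) ]
M = np.zeros((2, 2))
r = np.zeros(2)
for k in range(1, N):
    phi = np.array([-y[k - 1], u[k - 1]])   # regressor phi(k)
    M += np.outer(phi, phi) / N
    r += phi * y[k] / N
theta = np.linalg.solve(M, r)
print(theta)                          # close to the true [-0.8, 0.5]
```

Dividing each accumulated term by N keeps M and r at moderate magnitudes regardless of how long the data record is, exactly as argued above.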
Table of contents
1 Identification of ARX models
Analytical development
Theoretical guarantee
Matlab example
2 Model structures
3 General prediction error methods
4 Implementation and extensions
Main result
Assumptions
1 There exists a true parameter vector θ0 so that:

y(k) = ϕ⊤(k)θ0 + v(k)

with v(k) a stationary stochastic process independent of u(k).
2 E{ϕ(k)ϕ⊤(k)} is a nonsingular matrix.
3 E{ϕ(k)v(k)} = 0.

Theorem
ARX identification is consistent: the estimated parameters θ tend to the true parameters θ0 as N → ∞.
Discussion of assumptions
1 Assumption 1 is equivalent to the existence of true polynomials A0(q−1), B0(q−1) so that:

A0(q−1)y(k) = B0(q−1)u(k) + v(k)

To motivate Assumption 2, recall

θ = [(1/N) ∑_{k=1}^{N} ϕ(k)ϕ⊤(k)]^(−1) [(1/N) ∑_{k=1}^{N} ϕ(k)y(k)]

As N → ∞, (1/N) ∑_{k=1}^{N} ϕ(k)ϕ⊤(k) → E{ϕ(k)ϕ⊤(k)}.

2 E{ϕ(k)ϕ⊤(k)} is nonsingular if the data is “sufficiently informative” (e.g., u(k) should not be a simple feedback from y(k); see Soderstrom & Stoica for more discussion).

3 E{ϕ(k)v(k)} = 0 if v(k) is white noise. For more details on Assumption 3 and the role of E{ϕ(k)v(k)} = 0, see Part VI.
Table of contents
1 Identification of ARX models
Analytical development
Theoretical guarantee
Matlab example
2 Model structures
3 General prediction error methods
4 Implementation and extensions
Experimental data
Consider we are given the following separate identification and validation data sets.

plot(id); and plot(val);

Remarks: Identification input: a so-called pseudo-random binary signal, an approximation of (non-zero-mean) white noise. Validation input: a sequence of steps.
Identifying an ARX model

model = arx(id, [na, nb, nk]);

Arguments:
1 Identification data.
2 Array containing the orders of A and B and the delay nk.

The structure is slightly different from the theory: it includes the explicit minimum delay nk between inputs and outputs, useful to model systems with time delays.

y(k) + a1y(k − 1) + a2y(k − 2) + . . . + anay(k − na)
= b1u(k − nk) + b2u(k − nk − 1) + . . . + bnbu(k − nk − nb + 1) + e(k)

A(q−1)y(k) = B(q−1)u(k − nk) + e(k), where:

A(q−1) = (1 + a1q−1 + a2q−2 + . . . + anaq−na)
B(q−1) = (b1 + b2q−1 + . . . + bnbq−nb+1)

Note: we can transform this into the theoretical structure by rewriting the right-hand side using a B(q−1) of order nk + nb − 1, with nk − 1 leading zeros.
Model validation
Assuming the system is second-order with no time delay, we take na = 2, nb = 1, nk = 1. Validation: compare(model, val);

Results are quite bad.
Structure selection
Better idea: try many different structures and choose the best one.
Na = 1:15; Nb = 1:15; Nk = 1:5;
NN = struc(Na, Nb, Nk);
V = arxstruc(id, val, NN);

struc generates all combinations of the orders in Na, Nb, Nk. arxstruc identifies an ARX model for each combination (on the data in the 1st argument), simulates it (on the data in the 2nd argument), and returns all the MSEs on the first row of V (see help arxstruc for the format of V).
Structure selection (continued)
To choose the structure with the smallest MSE:
N = selstruc(V, 0);
For our data, N = [8, 7, 1].
Alternatively, graphical selection: N = selstruc(V, ’plot’);
(Later we learn other structure selection criteria than smallest MSE.)
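The order-selection loop that struc/arxstruc/selstruc perform can be sketched as follows. This is a Python/NumPy stand-in on made-up 2nd order data; note it scores candidates by one-step-ahead prediction MSE on the validation set, whereas Matlab's arxstruc simulates the models, so the numbers differ but the selection logic is the same:

```python
import numpy as np

def fit_arx(y, u, na, nb):
    """Least-squares ARX fit: returns theta = [a1..ana, b1..bnb]."""
    n0 = max(na, nb)
    Phi = np.column_stack(
        [-y[n0 - i:len(y) - i] for i in range(1, na + 1)]
        + [u[n0 - j:len(u) - j] for j in range(1, nb + 1)])
    return np.linalg.lstsq(Phi, y[n0:], rcond=None)[0]

def one_step_mse(theta, y, u, na, nb):
    """One-step-ahead prediction MSE of an ARX model on given data."""
    n0 = max(na, nb)
    Phi = np.column_stack(
        [-y[n0 - i:len(y) - i] for i in range(1, na + 1)]
        + [u[n0 - j:len(u) - j] for j in range(1, nb + 1)])
    return float(np.mean((y[n0:] - Phi @ theta) ** 2))

def simulate(N, rng):
    # Hypothetical 2nd order ARX system; true orders are na = 2, nb = 1
    u = rng.standard_normal(N)
    e = 0.1 * rng.standard_normal(N)
    y = np.zeros(N)
    for k in range(2, N):
        y[k] = 1.5 * y[k - 1] - 0.7 * y[k - 2] + 0.5 * u[k - 1] + e[k]
    return y, u

rng = np.random.default_rng(0)
y_id, u_id = simulate(1000, rng)      # identification data
y_val, u_val = simulate(1000, rng)    # separate validation data

# Try all order combinations; keep the validation MSE of each
mse = {}
for na in range(1, 4):
    for nb in range(1, 4):
        theta = fit_arx(y_id, u_id, na, nb)
        mse[(na, nb)] = one_step_mse(theta, y_val, u_val, na, nb)
best = min(mse, key=mse.get)          # structure with the smallest validation MSE
```

On this data, under-parameterized structures such as (na, nb) = (1, 1) leave a much larger validation error than the true structure (2, 1), which is what makes selection by validation MSE work.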
Validation of best ARX model
model = arx(id, N); compare(model, val);
A better fit is obtained.
Table of contents
1 Identification of ARX models
2 Model structures
3 General prediction error methods
4 Implementation and extensions
Motivation
Before deriving the general PEM, we introduce the general class of models to which it can be applied.
General model structure
y(k) = G(q−1)u(k) + H(q−1)e(k)
where G and H are discrete-time transfer functions, i.e. ratios of polynomials. The signal e(k) is zero-mean white noise.

By placing the common factors in the denominators of G and H in A(q−1), we get the more detailed form:

A(q−1)y(k) = (B(q−1)/F(q−1)) u(k) + (C(q−1)/D(q−1)) e(k)

where A, B, C, D, F are all polynomials, of orders na, nb, nc, nd, nf.
General model structure (continued)
A(q−1)y(k) = (B(q−1)/F(q−1)) u(k) + (C(q−1)/D(q−1)) e(k)

This is a very general form; all other linear forms are special cases of it. It is not meant for practical use, but to describe algorithms in a generic way.
ARMAX model structure
Setting F = D = 1 (i.e. orders nf = nd = 0), we get:
A(q−1)y(k) = B(q−1)u(k) + C(q−1)e(k)
Name: AutoRegressive, Moving Average (referring to the noise model), with eXogenous input (dependence on u)
ARMAX: explicit form
A(q−1)y(k) = B(q−1)u(k) + C(q−1)e(k)
A(q−1) = 1 + a1q−1 + . . . + anaq−na
B(q−1) = b1q−1 + . . . + bnbq−nb
C(q−1) = 1 + c1q−1 + . . . + cncq−nc
y(k) + a1y(k − 1) + . . . + anay(k − na)
= b1u(k − 1) + . . . + bnbu(k − nb)
+ e(k) + c1e(k − 1) + . . . + cnce(k − nc)
with parameter vector:
θ = [a1, . . . , ana, b1, . . . , bnb, c1, . . . , cnc]⊤ ∈ R^(na+nb+nc)

Compared to ARX, ARMAX can model more intricate relationships between the disturbance and the inputs and outputs.
Special case of ARMAX: ARX
Setting C = 1 in ARMAX (nc = 0), we get:
A(q−1)y(k) = B(q−1)u(k) + e(k)
precisely the ARX model we worked with before.
Special case of ARX: FIR
Further setting A = 1 (na = 0) in ARX, we get:
y(k) = B(q−1)u(k) + e(k) = ∑_{j=1}^{nb} bj u(k − j) + e(k) = ∑_{j=0}^{M−1} h(j)u(k − j) + e(k)

the FIR model from correlation analysis!

(where h(0), the impulse response at time 0, is assumed 0, i.e. the system does not respond instantaneously to changes in the input)
Difference between ARX and FIR
ARX: A(q−1)y(k) = B(q−1)u(k) + e(k)
FIR: y(k) = B(q−1)u(k) + e(k)
Since ARX includes recursive relationships between current and previous outputs, it is sufficient to take the orders na and nb (at most) equal to the order of the dynamical system.

FIR needs a sufficiently large order nb (or length M) to model the entire transient regime of the impulse response (in principle, we only recover the correct model as M → ∞).

⇒ more parameters ⇒ more data needed to identify them.
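The truncation issue can be quantified on a simple hypothetical example: for the 1st order system y(k) = 0.9y(k − 1) + u(k − 1) + e(k), ARX needs just two parameters, while a length-M FIR discards all of the impulse response beyond lag M (Python sketch, illustrative numbers only):

```python
# 1st order system y(k) = 0.9 y(k-1) + u(k-1): impulse response h(j) = 0.9**(j-1)
# for j >= 1. ARX(na=1, nb=1) has 2 parameters; a length-M FIR keeps only h(1..M).
# Fraction of the total impulse-response weight that a length-M FIR ignores:
total_weight = sum(0.9 ** (j - 1) for j in range(1, 1000))   # ~ 1/(1-0.9) = 10
fractions = {M: sum(0.9 ** (j - 1) for j in range(M + 1, 1000)) / total_weight
             for M in (5, 20, 50)}
print(fractions)   # roughly {5: 0.59, 20: 0.12, 50: 0.005}
```

For a geometric impulse response the ignored fraction is 0.9^M, so dozens of FIR parameters are needed to reach the accuracy that two ARX parameters give immediately.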
Overall relationship
General Form ⊃ ARMAX ⊃ ARX ⊃ FIR
ARMAX to ARX: less freedom in modeling the disturbance.
ARX to FIR: more parameters required.
Output error
Other model forms are possible that are not special cases of ARMAX, e.g. Output Error (OE):

y(k) = (B(q−1)/F(q−1)) u(k) + e(k)

obtained for na = nc = nd = 0, i.e. A = C = D = 1.

This corresponds to simple, additive measurement noise on the output (the “output error”).
Table of contents
1 Identification of ARX models
2 Model structures
3 General prediction error methods
Stepping stones: ARX and ARMAX
General case
Matlab example
Theoretical guarantees
4 Implementation and extensions
ARX reinterpreted as a PEM
1 Compute the predictions at each step, ŷ(k) = ϕ⊤(k)θ, given parameters θ.
2 Compute the prediction errors at each step, ε(k) = y(k) − ŷ(k).
3 Find a parameter vector θ minimizing the criterion V(θ) = (1/N) ∑_{k=1}^{N} ε²(k).

Prediction error methods are obtained by extending this procedure to general model structures.
ARX reinterpreted as a PEM (continued)
Remarks:
The ARX predictor ŷ(k) is chosen to achieve the error ε(k) = e(k). We will aim to achieve the same error in the general PEM, intuitively because we cannot do better.
For the mathematically inclined: such a predictor is optimal in the sense of minimizing the variance of ε(k).
The prediction error is, for ARX, just a rearrangement of the equation y(k) = ϕ⊤(k)θ + ε(k) = ŷ(k) + ε(k).
The criterion can be more general than the MSE, but we will always use the MSE in this section. For ARX, we already know how to minimize the MSE (from linear regression); for more general models, new methods will be introduced.
Intermediate step: PEM for 1st order ARMAX
To make things easier to follow, we first derive the PEM for a 1st order ARMAX.

Looking at the slide “ARMAX: explicit form”, we write the 1st order ARMAX in a slightly different way:
y(k) = −ay(k − 1) + bu(k − 1) + ce(k − 1) + e(k)
1st order ARMAX: predictor
To achieve error e(k), the predictor must be:

ŷ(k) = −ay(k − 1) + bu(k − 1) + ce(k − 1)    (5.1)

This depends on the unknown noise e(k − 1). However, we can derive a recursive predictor formula that does not:

ŷ(k − 1) = −ay(k − 2) + bu(k − 2) + ce(k − 2)    (5.2)

From Eqn. (5.1) + c · Eqn. (5.2):

ŷ(k) + cŷ(k − 1)
= −ay(k − 1) + bu(k − 1) + ce(k − 1) + c(−ay(k − 2) + bu(k − 2) + ce(k − 2))
= −ay(k − 1) + bu(k − 1) + ce(k − 1) + c(−ay(k − 2) + bu(k − 2) + ce(k − 2) + e(k − 1) − e(k − 1))
= −ay(k − 1) + bu(k − 1) + ce(k − 1) + cy(k − 1) − ce(k − 1)
= (c − a)y(k − 1) + bu(k − 1)

where in the third line we recognized the model equation at step k − 1, i.e. −ay(k − 2) + bu(k − 2) + ce(k − 2) + e(k − 1) = y(k − 1).
1st order ARMAX: predictor (continued)
Final recursion:

ŷ(k) = −cŷ(k − 1) + (c − a)y(k − 1) + bu(k − 1)

Requires initialization at ŷ(0); this initial value is usually taken 0.

This initialization needs some technical conditions to work properly; we do not go into them here.
1st order ARMAX: prediction error
Similar recursion:
ε(k) = y(k) − ŷ(k)
ε(k − 1) = y(k − 1) − ŷ(k − 1)

⇒ ε(k) + cε(k − 1) = y(k) + cy(k − 1) − (ŷ(k) + cŷ(k − 1))
                   = y(k) + cy(k − 1) − ((c − a)y(k − 1) + bu(k − 1))
                   = y(k) + ay(k − 1) − bu(k − 1)

⇒ ε(k) = −cε(k − 1) + y(k) + ay(k − 1) − bu(k − 1)

Requires initialization of ε(0), usually taken 0.
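A quick numeric sanity check of this recursion (a Python/NumPy sketch; the parameters a = −0.7, b = 1, c = 0.5 are made up): when the recursion is run with the true parameters, the computed ε(k) converges to the actual noise e(k), because the effect of the ε(0) = 0 initialization decays like c^k:

```python
import numpy as np

# Simulate a hypothetical 1st order ARMAX system
a, b, c = -0.7, 1.0, 0.5
rng = np.random.default_rng(0)
N = 500
u = rng.standard_normal(N)
e = 0.1 * rng.standard_normal(N)
y = np.zeros(N)
for k in range(1, N):
    y[k] = -a * y[k - 1] + b * u[k - 1] + c * e[k - 1] + e[k]

# Prediction-error recursion, initialized at eps(0) = 0
eps = np.zeros(N)
for k in range(1, N):
    eps[k] = -c * eps[k - 1] + y[k] + a * y[k - 1] - b * u[k - 1]

# eps(k) - e(k) satisfies d(k) = -c d(k-1), so after a short
# transient eps matches the true noise e
print(np.max(np.abs(eps[50:] - e[50:])))   # essentially zero
```

With |c| < 1 the recursion is stable, which is why the zero initialization is harmless in practice.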
Finding the parameters
Once the errors are available, the parameter vector θ is found by minimizing the criterion V(θ) = (1/N) ∑_{k=1}^{N} ε²(k).
Table of contents
1 Identification of ARX models
2 Model structures
3 General prediction error methods
Stepping stones: ARX and ARMAX
General case
Matlab example
Theoretical guarantees
4 Implementation and extensions
Recall: General model structure
y(k) = G(q−1)u(k) + H(q−1)e(k)
where G and H are discrete-time transfer functions.
Prediction error
We start by deriving e(k).
y(k) = G(q−1)u(k) + H(q−1)e(k)
⇒ e(k) = H−1(q−1)(y(k) − G(q−1)u(k))

where H−1 is the inverse of the fraction of polynomials H.

Recall that the predictor will be derived so that the prediction error ε(k) = y(k) − ŷ(k) = e(k), so the formula above can also be used to compute ε(k).
Predictor
To achieve error e(k), the predictor must be:

ŷ(k) = y(k) − e(k) = Gu(k) + (H − 1)e(k)
     = Gu(k) + (H − 1)H−1(y(k) − Gu(k))
     = Gu(k) + (1 − H−1)(y(k) − Gu(k))
     = Gu(k) + (1 − H−1)y(k) − Gu(k) + H−1Gu(k)
     = (1 − H−1)y(k) + H−1Gu(k)
     =: L1y(k) + L2u(k)

where we omitted the argument q−1 to keep the equations readable.

Remarks:
New notations: L1(q−1) = 1 − H−1(q−1), L2(q−1) = H−1(q−1)G(q−1).
In order to have a causal predictor (which only depends on past values of the output and input), we need G(0) = 0 and H(0) = 1, leading to L1(0) = L2(0) = 0.
Finding the parameters

Once the predictors and errors are available, the parameter vector θ is found by minimizing the criterion V(θ) = (1/N) ∑_{k=1}^{N} ε²(k).

Again, linear regression will not work in general. Computational methods to solve the minimization problem will be introduced in the next section.
Specializing the framework to ARX
It is instructive to see how the formulas simplify in the ARX case. Rewriting ARX in the general model template:

y(k) = Gu(k) + He(k) = (B/A)u(k) + (1/A)e(k)

We have:

L1 = 1 − H−1 = 1 − A,   L2 = H−1G = B
ŷ(k) = L1y(k) + L2u(k) = (1 − A)y(k) + Bu(k) = ϕ⊤(k)θ
ε(k) = H−1(y(k) − Gu(k)) = Ay(k) − Bu(k)

which can easily be seen to be equivalent to the ARX formulation.
Specializing the framework to 1st order ARMAX
Homework! Plug in the polynomials for the 1st order ARMAX and verify that you obtain the same result as before.
Table of contents
1 Identification of ARX models
2 Model structures
3 General prediction error methods
Stepping stones: ARX and ARMAX
General case
Matlab example
Theoretical guarantees
4 Implementation and extensions
Experimental data
Consider again the experimental data on which ARX was applied.
plot(id); and plot(val);
Recall theoretical ARMAX:
A(q−1)y(k) = B(q−1)u(k) + C(q−1)e(k)
Identifying an ARMAX model

mARMAX = armax(id, [na, nb, nc, nk]);

Arguments:
1 Identification data.
2 Array containing the orders of A, B, C and the delay nk.

Like for ARX, the structure includes the explicit minimum delay nk between inputs and outputs.

y(k) + a1y(k − 1) + a2y(k − 2) + . . . + anay(k − na)
= b1u(k − nk) + b2u(k − nk − 1) + . . . + bnbu(k − nk − nb + 1)
+ e(k) + c1e(k − 1) + c2e(k − 2) + . . . + cnce(k − nc)

A(q−1)y(k) = B(q−1)u(k − nk) + C(q−1)e(k), where:

A(q−1) = (1 + a1q−1 + a2q−2 + . . . + anaq−na)
B(q−1) = (b1 + b2q−1 + . . . + bnbq−nb+1)
C(q−1) = (1 + c1q−1 + c2q−2 + . . . + cncq−nc)

Remark: As for ARX, we can transform this into the theoretical structure by using a new B(q−1) of order nk + nb − 1, with nk − 1 leading zeros.
ARMAX model validation
Considering the system is 2nd order with no time delay, take na = 2, nb = 2, nc = 2, nk = 1. Validation: compare(val, mARMAX);

In contrast to ARX, the results are good. The flexible noise model pays off.
Identifying an OE model
Recall OE model structure:
y(k) = (B(q−1)/F(q−1)) u(k) + e(k)
Identifying an OE model (continued)
mOE = oe(id, [nb, nf, nk]);
Arguments:
1 Identification data.
2 Array containing the orders of B, F, and the delay nk.

y(k) = (B(q−1)/F(q−1)) u(k − nk) + e(k), where:

B(q−1) = (b1 + b2q−1 + . . . + bnbq−nb+1)
F(q−1) = (1 + f1q−1 + f2q−2 + . . . + fnf q−nf)

Remark: Like before, we can transform this into the theoretical structure by changing B.
OE model validation
Considering the system is second-order with no time delay, we take nb = 2, nf = 2, nk = 1. Validation: compare(val, mOE);

Results are as good as ARMAX. The system seems to obey both model structures.
Table of contents
1 Identification of ARX models
2 Model structures
3 General prediction error methods
Stepping stones: ARX and ARMAX
General case
Matlab example
Theoretical guarantees
4 Implementation and extensions
Preliminaries: Vector derivative and Hessian
Consider any function V(θ), V : R^n → R. Then:

dV/dθ = [∂V/∂θ1, ∂V/∂θ2, . . . , ∂V/∂θn]⊤

and the Hessian d²V/dθ² is the n × n matrix with entry (i, j) equal to ∂²V/(∂θi∂θj):

d²V/dθ² = [ ∂²V/∂θ1²       ∂²V/(∂θ1∂θ2)  . . .  ∂²V/(∂θ1∂θn)
            ∂²V/(∂θ2∂θ1)   ∂²V/∂θ2²      . . .  ∂²V/(∂θ2∂θn)
            ...
            ∂²V/(∂θn∂θ1)   ∂²V/(∂θn∂θ2)  . . .  ∂²V/∂θn² ]
Assumptions
Assumptions (simplified)
1 Signals u(k) and y(k) are stationary stochastic processes.
2 The input signal u(k) is “sufficiently informative”.
3 The transfer functions G(q−1) and H(q−1) depend smoothly on θ.
4 The Hessian d²V/dθ² is nonsingular.
Discussion of assumptions
Assumption 2 is informal; the formal name is ‘persistently exciting’ input (it comes later).
Recall that G and H have the parameters as coefficients; we denote them by G(q−1; θ) and H(q−1; θ) to make the dependence explicit. Assumption 3 ensures we can differentiate G and H w.r.t. θ. Note that dG/dθ and dH/dθ are vectors of transfer functions!
Recall V(θ) = (1/N) ∑_{k=1}^{N} ε²(k), the MSE. Assumption 4 ensures V is not “flat” around its minima, so the minima are uniquely identifiable.
Guarantee
Theorem 1
Define the limit V∞(θ) = lim_{N→∞} V(θ). Given Assumptions 1–4, the identification solution θ = arg minθ V(θ) converges to a minimum point θ∗ of V∞(θ) as N → ∞.

Remark: This is a type of consistency guarantee, in the limit of infinitely many data points.
Further assumptions
Assumptions (simplified)
5 The true system satisfies the chosen model structure. This means there exists at least one θ0 so that, for any input u(k) and the corresponding output y(k) of the true system, we have:

y(k) = G(q−1; θ0)u(k) + H(q−1; θ0)e(k)

with e(k) white noise.
6 The input u(k) is independent of the noise e(k) (the experiment is performed in open loop).
Additional guarantee
Theorem 2
Under Assumptions 1–6, θ converges to a true parameter vector θ0 as N → ∞.

Remark: Also a consistency guarantee. Theorem 1 guaranteed a minimum-error solution, whereas Theorem 2 additionally says this solution corresponds to the true system, if the system satisfies the model structure.
Table of contents
1 Identification of ARX models
2 Model structures
3 General prediction error methods
4 Implementation and extensions
Solving the optimization problem
Multiple inputs and outputs
MIMO ARX: Matlab example
Optimization problem
Objective of identification procedure: Minimize mean squared error
V(θ) = (1/N) ∑_{k=1}^{N} ε²(k)

where ε(k) are the prediction errors. In the general case:
ε(k) = H−1(q−1)(y(k) − G(q−1)u(k))

Solution: θ = arg minθ V(θ)

So far we took this solution for granted and investigated its properties. While in ARX linear regression could be applied to find θ, in general this does not work. Main implementation question:

How to solve the optimization problem?
Minimization via derivative root
Consider first the scalar case, θ ∈ R.
Idea: at any minimum, the derivative f(θ) = dV/dθ is zero. So, find a root of f(θ).

Remarks:
Care must be taken to find a minimum and not a maximum or inflection point. This can be checked with the second derivative: d²V/dθ² = df/dθ > 0.
We may also find a local minimum which is larger (worse) than the global one.
Newton’s method for root finding
Start from some initial point θ0. At iteration ℓ, the next point θℓ+1 is the intersection between the abscissa and the tangent to f at the current point θℓ. By geometric arguments:

θℓ+1 = θℓ − f(θℓ) / (df(θℓ)/dθ)

Remarks:
The notation df(θℓ)/dθ means the value of the derivative df/dθ at the point θℓ; it is the slope of the tangent.
θℓ+1 is the “best guess” for the root given the current point θℓ.
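For intuition, here is Newton's recursion on a simple scalar test function, f(θ) = θ² − 2 with df/dθ = 2θ (the function is made up purely for illustration); it converges to the root √2 in a handful of iterations:

```python
# Newton's method for a root of f(theta) = theta^2 - 2, starting from theta_0 = 1
theta = 1.0
for _ in range(6):
    theta = theta - (theta ** 2 - 2) / (2 * theta)   # theta - f(theta)/f'(theta)
print(theta)   # ~ 1.41421356..., i.e. sqrt(2)
```

The error roughly squares at every iteration (quadratic convergence), which is why so few iterations suffice near the root.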
Newton’s method for the identification problem
Replace f(θ) by dV/dθ to get back to the optimization problem:

θℓ+1 = θℓ − (dV(θℓ)/dθ) / (d²V(θℓ)/dθ²)

Extend to θ ∈ R^n. Recall that dV/dθ is now an n-vector, and d²V/dθ² an n × n matrix (the Hessian):

θℓ+1 = θℓ − [d²V(θℓ)/dθ²]^(−1) dV(θℓ)/dθ

Add a step size αℓ > 0:

θℓ+1 = θℓ − αℓ [d²V(θℓ)/dθ²]^(−1) dV(θℓ)/dθ

Remark:
The step size helps with the convergence of the method, e.g. because in identification V is noisy.
Stopping criterion
The algorithm can be stopped:
when the difference between consecutive parameter vectors is small, e.g. max_{i=1,...,n} |θi,ℓ+1 − θi,ℓ| smaller than some preset threshold; or
when the number of iterations ℓ exceeds a preset maximum.
Computing the derivatives
V(θ) = (1/N) ∑_{k=1}^{N} ε²(k)

Keeping in mind that ε(k) depends on θ, from matrix calculus:

dV/dθ = (2/N) ∑_{k=1}^{N} ε(k) (dε(k)/dθ)

d²V/dθ² = (2/N) ∑_{k=1}^{N} (dε(k)/dθ)(dε(k)/dθ)⊤ + (2/N) ∑_{k=1}^{N} ε(k) (d²ε(k)/dθ²)

where dε(k)/dθ is the vector derivative and d²ε(k)/dθ² the Hessian of ε(k); (dε(k)/dθ)(dε(k)/dθ)⊤ is an n × n matrix.
Gauss-Newton
Ignore the second term in the Hessian of V and just use the first term:

H = (2/N) ∑_{k=1}^{N} (dε(k)/dθ)(dε(k)/dθ)⊤

leading to the Gauss-Newton algorithm:

θℓ+1 = θℓ − αℓ H^(−1) dV(θℓ)/dθ

Motivation:
The quadratic form of H gives better algorithm behavior.
Simpler computation.
Example: 1st order ARMAX
Recall the model and prediction error for the 1st order ARMAX:

y(k) = −ay(k − 1) + bu(k − 1) + ce(k − 1) + e(k)
ε(k) = −cε(k − 1) + y(k) + ay(k − 1) − bu(k − 1)

We need dε(k)/dθ = [∂ε(k)/∂a, ∂ε(k)/∂b, ∂ε(k)/∂c]⊤. Differentiating the second equation:

∂ε(k)/∂a = −c ∂ε(k − 1)/∂a + y(k − 1)
∂ε(k)/∂b = −c ∂ε(k − 1)/∂b − u(k − 1)
∂ε(k)/∂c = −c ∂ε(k − 1)/∂c − ε(k − 1)

So ∂ε(k)/∂a, ∂ε(k)/∂b, ∂ε(k)/∂c are dynamical signals that can be computed using the recursions above, starting e.g. from 0 initial values.
Example: 1st order ARMAX (continued)
Finally, the algorithm is implemented as follows:
Given the current value of the parameter vector θℓ, apply the recursions above to find the signals dε(k)/dθ, k = 1, . . . , N.
Plug dε(k)/dθ into the equations for dV/dθ and d²V/dθ².
Apply the Newton (or Gauss-Newton) update formula to find θℓ+1.
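The whole loop can be sketched end to end in Python/NumPy (the data-generating parameters, the initial guess, and the fixed step size αℓ = 0.5 are all made-up choices for illustration, not part of the lecture's example):

```python
import numpy as np

def eps_and_gradient(theta, y, u):
    """Prediction errors and d eps(k)/d theta for 1st order ARMAX, theta = [a, b, c]."""
    a, b, c = theta
    N = len(y)
    eps = np.zeros(N)
    deps = np.zeros((N, 3))   # row k holds [d eps/d a, d eps/d b, d eps/d c]
    for k in range(1, N):
        eps[k] = -c * eps[k - 1] + y[k] + a * y[k - 1] - b * u[k - 1]
        deps[k, 0] = -c * deps[k - 1, 0] + y[k - 1]
        deps[k, 1] = -c * deps[k - 1, 1] - u[k - 1]
        deps[k, 2] = -c * deps[k - 1, 2] - eps[k - 1]
    return eps, deps

def gauss_newton(y, u, theta0, alpha=0.5, iters=100):
    theta = np.array(theta0, dtype=float)
    N = len(y)
    for _ in range(iters):
        eps, deps = eps_and_gradient(theta, y, u)
        grad = (2.0 / N) * deps.T @ eps      # dV/d theta
        H = (2.0 / N) * deps.T @ deps        # Gauss-Newton Hessian approximation
        theta = theta - alpha * np.linalg.solve(H, grad)
    return theta

# Made-up data from a known 1st order ARMAX system
rng = np.random.default_rng(0)
N = 3000
a0, b0, c0 = -0.7, 1.0, 0.5
u = rng.standard_normal(N)
e = 0.1 * rng.standard_normal(N)
y = np.zeros(N)
for k in range(1, N):
    y[k] = -a0 * y[k - 1] + b0 * u[k - 1] + c0 * e[k - 1] + e[k]

theta = gauss_newton(y, u, theta0=[-0.5, 0.5, 0.0])  # rough guess, e.g. from an ARX fit
print(theta)   # close to the true [-0.7, 1.0, 0.5]
```

In practice the initial guess matters (the criterion can have local minima); starting from an ARX estimate of a and b, as hinted in the comment, is a common choice.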
Table of contents
1 Identification of ARX models
2 Model structures
3 General prediction error methods
4 Implementation and extensions
Solving the optimization problem
Multiple inputs and outputs
MIMO ARX: Matlab example
MIMO system
So far we considered y(k) ∈ R, u(k) ∈ R, i.e. Single-Input, Single-Output (SISO) systems:

y(k) = G(q−1)u(k) + H(q−1)e(k)

Many systems are Multiple-Input, Multiple-Output (MIMO). E.g., an aircraft. Inputs: throttle, aileron, elevator, rudder. Outputs: airspeed, roll, pitch, yaw.
General MIMO model
Consider y(k) ∈ R^ny, u(k) ∈ R^nu. Model:

y(k) = G(q−1)u(k) + H(q−1)e(k)

[y1(k), y2(k), . . . , yny(k)]⊤ = G(q−1) [u1(k), u2(k), . . . , unu(k)]⊤ + H(q−1) [e1(k), e2(k), . . . , eny(k)]⊤

Different from SISO:
the noise e(k) ∈ R^ny is a random vector of the same size as y, white noise (independent, zero-mean);
G(q−1) is a matrix of size ny × nu;
H(q−1) is a matrix of size ny × ny.
The matrix elements are fractions of polynomials in q−1.
MIMO ARX
A(q−1)y(k) = B(q−1)u(k) + e(k)
A(q−1) = I + A1q−1 + . . . + Anaq−na
B(q−1) = B1q−1 + . . . + Bnbq−nb

where I is the ny × ny identity matrix, A1, . . . , Ana ∈ R^(ny×ny), B1, . . . , Bnb ∈ R^(ny×nu).
Put in the general template:
y(k) = A−1(q−1)B(q−1)u(k) + A−1(q−1)e(k)
MIMO ARX: concrete example
Take na = 1, nb = 2, ny = 2, nu = 3. Then:

A(q−1)y(k) = B(q−1)u(k) + e(k)

A(q−1) = I + A1q−1 = I + [ a11_1  a12_1 ; a21_1  a22_1 ] q−1

B(q−1) = B1q−1 + B2q−2
       = [ b11_1  b12_1  b13_1 ; b21_1  b22_1  b23_1 ] q−1 + [ b11_2  b12_2  b13_2 ; b21_2  b22_2  b23_2 ] q−2

(aij_m denotes element (i, j) of Am, bij_m element (i, j) of Bm; semicolons separate matrix rows)
MIMO ARX: concrete example (continued)
( [ 1  0 ; 0  1 ] + [ a11_1  a12_1 ; a21_1  a22_1 ] q−1 ) [ y1(k) ; y2(k) ]
= ( [ b11_1  b12_1  b13_1 ; b21_1  b22_1  b23_1 ] q−1 + [ b11_2  b12_2  b13_2 ; b21_2  b22_2  b23_2 ] q−2 ) [ u1(k) ; u2(k) ; u3(k) ] + [ e1(k) ; e2(k) ]

Explicit relationship:

y1(k) + a11_1 y1(k − 1) + a12_1 y2(k − 1)
= b11_1 u1(k − 1) + b12_1 u2(k − 1) + b13_1 u3(k − 1)
+ b11_2 u1(k − 2) + b12_2 u2(k − 2) + b13_2 u3(k − 2) + e1(k)

y2(k) + a21_1 y1(k − 1) + a22_1 y2(k − 1)
= b21_1 u1(k − 1) + b22_1 u2(k − 1) + b23_1 u3(k − 1)
+ b21_2 u1(k − 2) + b22_2 u2(k − 2) + b23_2 u3(k − 2) + e2(k)
General error and predictor
The derivation mirrors that in the SISO case:

y(k) = Gu(k) + He(k)
⇒ e(k) = ε(k) = H−1(y(k) − Gu(k))

ŷ(k) = y(k) − e(k) = Gu(k) + (H − I)e(k)
     = Gu(k) + (H − I)H−1(y(k) − Gu(k))
     = Gu(k) + (I − H−1)(y(k) − Gu(k))
     = Gu(k) + (I − H−1)y(k) − Gu(k) + H−1Gu(k)
     = (I − H−1)y(k) + H−1Gu(k)
     =: L1y(k) + L2u(k)

where technical conditions must be satisfied, e.g. to ensure that H−1 is meaningful.
PEM criterion
An important difference between SISO and MIMO is the identification criterion. The SISO criterion, the MSE:

V(θ) = (1/N) ∑_{k=1}^{N} ε²(k)

is no longer appropriate, since ε(k) is an ny-vector.

Define R(θ) = (1/N) ∑_{k=1}^{N} ε(k)ε⊤(k), an ny × ny matrix. Two options for the new criterion:

V(θ) = trace(R(θ))
V(θ) = det(R(θ))

Assuming ε(k) is zero-mean, R(θ) is an estimate of its covariance.
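The two scalarizations are easy to compute from the residual matrix; a small Python/NumPy sketch on made-up correlated residuals (the mixing matrix is hypothetical):

```python
import numpy as np

# R(theta) is the sample covariance of the residual vectors eps(k);
# trace and det aggregate it into a scalar criterion.
rng = np.random.default_rng(0)
N, ny = 1000, 2
Mmix = np.array([[0.5, 0.1], [0.0, 0.2]])   # hypothetical mixing matrix
eps = rng.standard_normal((N, ny)) @ Mmix   # rows eps(k)^T; covariance Mmix^T Mmix

R = eps.T @ eps / N                         # ny x ny matrix R(theta)
V_trace = float(np.trace(R))                # option 1: sum of per-output MSEs
V_det = float(np.linalg.det(R))             # option 2: "volume" of the residual cloud
print(V_trace, V_det)   # near trace/det of Mmix^T Mmix, i.e. about 0.30 and 0.01
```

Note the trace simply adds up the per-output MSEs, while the determinant also penalizes correlation between the output residuals.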
Table of contents
1 Identification of ARX models
2 Model structures
3 General prediction error methods
4 Implementation and extensions
Solving the optimization problem
Multiple inputs and outputs
MIMO ARX: Matlab example
Experimental data
Consider a continuous stirred-tank reactor:
Image credit: mathworks.com
Input: coolant flow Q
Outputs:
Concentration CA of substance A in the mix
Temperature T of the mix
Experimental data
Left: identification, Right: validation
MIMO ARX in Matlab
A(q−1)y(k) = B(q−1)u(k) + e(k)

A(q−1) = [ a11(q−1)   a12(q−1)   . . .  a1ny(q−1)
           a21(q−1)   a22(q−1)   . . .  a2ny(q−1)
           . . .
           any1(q−1)  any2(q−1)  . . .  anyny(q−1) ]

aij(q−1) = (1 if i = j, 0 otherwise) + aij_1 q−1 + . . . + aij_naij q−naij

B(q−1) = [ b11(q−1)   b12(q−1)   . . .  b1nu(q−1)
           b21(q−1)   b22(q−1)   . . .  b2nu(q−1)
           . . .
           bny1(q−1)  bny2(q−1)  . . .  bnynu(q−1) ]

bij(q−1) = bij_1 q−nkij + . . . + bij_nbij q−(nkij+nbij−1)
Identifying a MIMO ARX model

m = arx(id, [Na, Nb, Nk]);

Arguments:
1 Identification data.
2 Matrices with the orders of the polynomials in A, B, and the delays nk:

Na = [ na11 . . . na1ny ; . . . ; nany1 . . . nanyny ]
Nb = [ nb11 . . . nb1nu ; . . . ; nbny1 . . . nbnynu ]
Nk = [ nk11 . . . nk1nu ; . . . ; nkny1 . . . nknynu ]
MIMO ARX results
Take na = 2, nb = 2, and nk = 1 everywhere in matrix elements:
Na = [2 2; 2 2]; Nb = [2; 2]; Nk = [1; 1];
m = arx(id, [Na Nb Nk]);
compare(m, val);
Appendix: Nonlinear ARX (for project)
Nonlinear ARX structure
Recall standard ARX:
y(k) = −a1y(k − 1) − a2y(k − 2) − . . . − anay(k − na)
     + b1u(k − 1) + b2u(k − 2) + . . . + bnbu(k − nb) + e(k)

Linear dependence on the delayed outputs y(k − 1), . . . , y(k − na) and inputs u(k − 1), . . . , u(k − nb).

Nonlinear ARX (NARX) generalizes this to any nonlinear dependence:

y(k) = g(y(k − 1), y(k − 2), . . . , y(k − na), u(k − 1), u(k − 2), . . . , u(k − nb); θ) + e(k)

Function g is parameterized by θ ∈ R^n, and these parameters can be tuned to fit identification data and thereby model a particular system.
In our case, g will be a neural network.
Neural-net NARX: training data
y(k) = g(y(k − 1), y(k − 2), . . . , y(k − na), u(k − 1), u(k − 2), . . . , u(k − nb); θ) + e(k)
     = g(x(k); θ) + e(k)

with new notation for the network input:
x(k) = [y(k − 1), . . . , y(k − na), u(k − 1), . . . , u(k − nb)]⊤

Then, the training dataset consists of the input-output pairs:

(x(1), y(1)), (x(2), y(2)), . . . , (x(N), y(N))

where N is the number of time steps in the data.

Note: Negative- and zero-time y and u can be taken 0, assuming the system is in zero initial conditions.
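Constructing the training pairs is pure bookkeeping; a minimal Python/NumPy sketch (the signal values are made up purely to check the indexing):

```python
import numpy as np

def narx_dataset(y, u, na, nb):
    """Build the training pairs (x(k), y(k)), with zero initial conditions
    for the negative- and zero-time values of y and u."""
    N = len(y)
    X = np.zeros((N, na + nb))
    for k in range(N):
        for i in range(1, na + 1):                  # delayed outputs y(k-i)
            X[k, i - 1] = y[k - i] if k - i >= 0 else 0.0
        for j in range(1, nb + 1):                  # delayed inputs u(k-j)
            X[k, na + j - 1] = u[k - j] if k - j >= 0 else 0.0
    return X, np.asarray(y)

# Tiny made-up check with na = 2, nb = 1
X, t = narx_dataset(np.array([1.0, 2.0, 3.0]),
                    np.array([10.0, 20.0, 30.0]), na=2, nb=1)
print(X)   # rows: [0, 0, 0], [1, 0, 10], [2, 1, 20]
```

Each row of X is one network input x(k) and the corresponding target is y(k), exactly the pairs listed above.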
Neural-net NARX: training criterion
ŷ(k) = g(x(k); θ)
ε(k) = y(k) − ŷ(k) = y(k) − g(x(k); θ)

The criterion is V(θ) = (1/N) ∑_{k=1}^{N} ε²(k), the MSE. Training should return a parameter vector θ minimizing V(θ).

So NARX identification is in fact:
a nonlinear prediction error method;
and also a nonlinear regression problem.
Using the NARX model
One-step-ahead prediction: If the true output sequence is known, the network input x(k) is fully available:

x(k) = [y(k − 1), . . . , y(k − na), u(k − 1), . . . , u(k − nb)]⊤
ŷ(k) = g(x(k); θ)

Example: On day k − 1, predict the weather for day k.

Simulation: If the true outputs are unknown, use the previously predicted outputs to construct an approximation x̂(k):

x̂(k) = [ŷ(k − 1), . . . , ŷ(k − na), u(k − 1), . . . , u(k − nb)]⊤
ŷ(k) = g(x̂(k); θ)

(predicted outputs at negative and zero time can also be taken 0.)

Example: Simulation of an aircraft’s response to emergency pilot inputs that may be dangerous to apply to the real system.
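The mechanical difference between the two uses can be seen with a toy g, a made-up stand-in for a trained network with na = nb = 1 (not an actual identified model):

```python
import numpy as np

def g(x):
    """Toy stand-in for a trained NARX map, na = nb = 1 (illustration only)."""
    y_prev, u_prev = x
    return 0.5 * y_prev + np.tanh(u_prev)

u = np.array([1.0, -1.0, 2.0, 0.5])
y = np.array([0.8, 0.9, -0.2, 1.5])        # made-up measured outputs

# One-step-ahead prediction: x(k) uses the MEASURED output y(k-1)
yhat_pred = np.zeros(len(u))
for k in range(1, len(u)):
    yhat_pred[k] = g([y[k - 1], u[k - 1]])

# Simulation: xhat(k) uses the previously PREDICTED output instead
yhat_sim = np.zeros(len(u))
for k in range(1, len(u)):
    yhat_sim[k] = g([yhat_sim[k - 1], u[k - 1]])

print(yhat_pred)
print(yhat_sim)   # differs from yhat_pred: prediction errors are fed back
```

The only change is which past output is fed into g; in simulation, model errors accumulate through the feedback, which is why simulation is the harder (and more honest) validation test.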