Classic Estimation — users.isr.ist.utl.pt/~jsm/teaching/ec/2.pdf
© Jorge Salvador Marques, 2002
Classic Estimation
Summary
•Motivation
•Deterministic Methods
•Classic Probabilistic Methods
•Examples
Challenge

Given the data points shown in the figure we wish to approximate them by a 2nd order polynomial

y = c_2 x^2 + c_1 x + c_0

Problem: how to compute c_0, c_1, c_2?
[Figure: data points in the (x, y) plane, x ∈ [−1, 1], y ∈ [0, 3]]
Note: we assume that the x values are accurately known but the y values are corrupted by measurement noise.
Gauss (1777-1855)
Parabolic Fit

The absolute values of the errors e_i (the vertical distances between the observations y_i and the fitted curve) must be small.

Quadratic cost:

E = \sum_i e_i^2 = \langle e, e \rangle
Estimation of the coefficients
What coefficients minimize E ?
Stationarity condition:

\frac{\partial E}{\partial c_p} = \sum_i 2 e_i \frac{\partial e_i}{\partial c_p} = -2 \sum_i x_i^p (y_i - c_2 x_i^2 - c_1 x_i - c_0) = 0, \quad p = 0, 1, 2

which can be written as the system of normal equations

\begin{bmatrix} \sum_i 1 & \sum_i x_i & \sum_i x_i^2 \\ \sum_i x_i & \sum_i x_i^2 & \sum_i x_i^3 \\ \sum_i x_i^2 & \sum_i x_i^3 & \sum_i x_i^4 \end{bmatrix} \begin{bmatrix} c_0 \\ c_1 \\ c_2 \end{bmatrix} = \begin{bmatrix} \sum_i y_i \\ \sum_i x_i y_i \\ \sum_i x_i^2 y_i \end{bmatrix}

The matrix on the left contains the inner products between basis functions; the vector on the right contains the inner products between the observations and the basis functions.
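The normal equations above can be sketched numerically; a minimal NumPy example with illustrative data (not the data of the figure), built from a noiseless quadratic so the coefficients are recovered exactly:

```python
import numpy as np

# Illustrative data from y = 2x^2 + 0x + 1 (not the slides' data set).
x = np.linspace(-1.0, 1.0, 9)
y = 2.0 * x**2 + 1.0

# 3x3 normal-equation matrix: entry (j, k) is sum_i x_i^(j+k),
# i.e. the inner products between the basis functions 1, x, x^2.
A = np.array([[np.sum(x**(j + k)) for k in range(3)] for j in range(3)])
# Right-hand side: inner products between observations and basis functions.
b = np.array([np.sum(y * x**j) for j in range(3)])

c0, c1, c2 = np.linalg.solve(A, b)   # coefficients of c2*x^2 + c1*x + c0
```

With noiseless quadratic data the solve recovers c_2 = 2, c_1 = 0, c_0 = 1 up to rounding.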
Results
[Figure: data points and the fitted 2nd order polynomial, x ∈ [−1, 1], y ∈ [0, 3]]
Least Squares Method

Given a set of observations (x_i, y_i), we wish to approximate y_i by φ(x_i, θ), θ being an unknown vector of parameters. The error is

e_i = y_i - \varphi(x_i, \theta)

How to obtain θ? Least squares method:

\hat{\theta} = \arg\min_\theta \sum_i \|e_i\|^2

where ||·|| is the Euclidean norm.

Hypothesis: it is assumed that x_i is accurately known.

[Figure: the quadratic cost as a function of ||e||]
Linear Model

Model: y_i = \varphi(x_i)'\theta + e_i

LS estimate: \hat{\theta} = A^{-1} b, where

A = \sum_{i=1}^n \varphi(x_i)\varphi(x_i)', \qquad b = \sum_{i=1}^n \varphi(x_i)\, y_i
Proof

E = \sum_i (y_i - \varphi(x_i)'\theta)'(y_i - \varphi(x_i)'\theta)

Computing the derivative with respect to θ and using the appropriate properties:

\frac{dE}{d\theta} = 0 \;\Rightarrow\; -2\sum_i \varphi(x_i)(y_i - \varphi(x_i)'\theta) = 0 \;\Rightarrow\; \sum_i \varphi(x_i)\, y_i = \sum_i \varphi(x_i)\varphi(x_i)'\,\theta

so \hat{\theta} = A^{-1} b is a stationary point. If the matrix

\frac{d^2 E}{d\theta^2} = 2 \sum_i \varphi(x_i)\varphi(x_i)'

is positive definite, E is minimized by \hat{\theta} = A^{-1} b.
Geometric Interpretation

The observations y_1, ..., y_n belong to a vector space E of dimension nm, in which m is the dimension of each vector y_i.

The model sequence φ(x_1)'θ, ..., φ(x_n)'θ belongs to a subspace S ⊂ E of lower dimension.

The least squares method computes the coefficients of the orthogonal projection of y onto the subspace S, using the inner product

\langle x, y \rangle = \sum_i x_i' y_i

The projection ŷ is such that the projection error is orthogonal to all the vectors in S:

\langle e, b \rangle = 0, \quad \forall b \in S \quad \text{(orthogonality condition)}
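The orthogonality condition can be checked numerically; a small sketch with an assumed polynomial basis and synthetic data (not the slides' data):

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 20)
y = 2.0 * x**2 + 1.0 + 0.3 * rng.standard_normal(x.size)  # noisy observations

# Columns of Phi span the subspace S generated by the basis 1, x, x^2.
Phi = np.column_stack([np.ones_like(x), x, x**2])
theta, *_ = np.linalg.lstsq(Phi, y, rcond=None)

e = y - Phi @ theta      # projection error
ortho = Phi.T @ e        # inner products <e, phi_j>; zero by orthogonality
```

The residual `e` is orthogonal (up to rounding) to every basis vector of S, which is exactly the condition above.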
Geometric Interpretation

[Figure: the observation y, its projection ŷ, and the error e = y − ŷ in the space E of signals generated by the model]

If the space of signals generated by the model is a linear subspace S (linear model), ŷ is the orthogonal projection of y onto S.

Orthogonality condition: \langle e, b \rangle = 0, with b any vector of S.
Learning vs Inference

The least squares method allows the solution of several problems: learning (model identification) from training data, and inference.

However, it does not provide an uncertainty measure!
Estimation

y depends on x and on a vector θ; x is known or unknown.

General problem: how to estimate x, θ, given y?

Subproblems:
• obtain x from y, θ (inference)
• obtain θ from y, x (learning, identification)

The learning problem is often based on observations obtained at different instants of time or even in different experiments. The estimation of the unknown variable x is usually performed in each experiment.
Regression

Given (x_i, y_i), we wish to approximate y_i by f(x_i, θ) with θ unknown. The error is

e_i = y_i - f(x_i, \theta)

Least squares estimate:

\hat{\theta} = \arg\min_\theta \sum_i \|e_i\|^2

Examples: linear combination of basis functions, neural networks.

Hypothesis (asymmetric): x is accurately known.
Prediction

Given y = (y_1, ..., y_{t-1}), we wish to predict y_t using a linear combination of past observations:

y_t = a_1 y_{t-1} + ... + a_p y_{t-p} + e_t

The coefficients a_i are estimated by least squares.
Frequency Estimation

Given the model

y_i = A \cos(\omega t_i + \varphi) + e_i

we wish to estimate A, ω, φ.

Available data: (t_i, y_i), i = 1, ..., N.

Idea: choose A, ω, φ minimizing E = \sum_i e_i^2 (a nonlinear problem).
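Since E is non-convex in ω and φ, one simple way to sketch this minimization is a brute-force search over a parameter box; the true values, noise level, and grid resolution below are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
t = np.linspace(0.0, 7.0, 50)
A_true, w_true, phi_true = 1.0, 2.0, 0.5      # assumed illustrative values
y = A_true * np.cos(w_true * t + phi_true) + 0.1 * rng.standard_normal(t.size)

# E(A, w, phi) has many local minima in w, so a coarse grid search over the
# parameter box is one simple way to locate the global minimum region.
best, best_E = None, np.inf
for A in np.linspace(0.1, 5.0, 11):
    for w in np.linspace(0.1, 10.0, 100):
        for phi in np.linspace(0.0, 2 * np.pi, 25, endpoint=False):
            E = np.sum((y - A * np.cos(w * t + phi)) ** 2)
            if E < best_E:
                best, best_E = (A, w, phi), E
```

In practice the grid estimate would be refined with a local (e.g. gradient-based) optimizer started at `best`.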
Estimation of a Sinusoid

[Figure: fits obtained without noise and with noise, using 3 and 7 data points]

Optimization in the interval: (A, \omega, \varphi) \in [0, 5] \times [0, 10] \times [0, 2\pi] (white noise with σ = 0.5).
Restrictions

The least squares method has the following restrictions:

• It does not allow a statistical description of the error: nothing is known about the error we will obtain in the next experiment.
• It is not robust.
• It leads to difficult optimization problems when nonlinear models are used. In general, only local minima of E can be obtained.
Robustness

Spurious observations make the least squares method not robust (why?).

Alternatives:
• weighted LS (errors are weighted by confidence degrees)
• robust estimation methods
RANSAC

Procedure (generation of random hypotheses):
• choose a minimal set of data allowing to estimate the parameters;
• compute the number of observations well approximated by the model (its support);
• repeat the previous steps N times and, at the end, choose the estimate with the largest support.

Refinement: the estimate is refined using the support observations.

[Figure: three random hypotheses with support = 2, support = 3, support = 5]
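The procedure above can be sketched for a line fit; the data, tolerance, and number of iterations are illustrative assumptions, not values from the slides:

```python
import numpy as np

rng = np.random.default_rng(2)
# Inliers on the line y = 1.5x + 0.5 plus a few gross outliers.
x = np.linspace(0.0, 1.0, 30)
y = 1.5 * x + 0.5 + 0.02 * rng.standard_normal(30)
y[:5] += 3.0                                     # spurious observations

N_iter, tol = 100, 0.1
best_support, best_mask = -1, None
for _ in range(N_iter):
    i, j = rng.choice(30, size=2, replace=False)  # minimal set: 2 points
    a = (y[j] - y[i]) / (x[j] - x[i])             # hypothesis from minimal set
    b = y[i] - a * x[i]
    support = np.abs(y - (a * x + b)) < tol       # well-approximated points
    if support.sum() > best_support:
        best_support, best_mask = support.sum(), support

# Refinement: re-fit by least squares using only the support observations.
Phi = np.column_stack([x[best_mask], np.ones(best_mask.sum())])
a_ref, b_ref = np.linalg.lstsq(Phi, y[best_mask], rcond=None)[0]
```

The outliers never enter the refined fit, which is why the result stays close to the inlier line even though plain LS would be pulled towards them.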
Robust Methods

Robust estimator:

\hat{x} = \arg\min_x \sum_i \rho(\|e_i\|)

[Figure: a robust cost ρ grows more slowly than the quadratic cost as a function of ||e||]

• requires iterative numeric optimization

Example: LMed.
Exercises

1. Given a signal y = (y_1, ..., y_N), determine the coefficients of the linear predictor

\hat{y}_t = a_1 y_{t-1} + ... + a_p y_{t-p}, \qquad N \gg p

using the least squares method.

2. Generalize the previous problem to the case in which we want to predict the signal k steps ahead.
Work

Consider two parabolas and observations belonging to each of them. However, we do not know which parabola is close to each observation. Estimate the coefficients of the parabolas using the least squares method and robust methods. Characterize the performance of both methods.
Classic Probabilistic Methods
Classic vs Bayesian Methods

y is a realization of a random variable; x is unknown.

Estimation methods:
• classic: LS, EM
• Bayesian: MAP, MSE

Classic methods consider x as a deterministic variable. Bayesian methods consider x as a random variable characterized by an a posteriori distribution p(x/y).
Classic Estimation
• Classic estimation methods are not based on general principles.
• An estimator is a map θ(y) from the observation space into the parameter space.
• Each estimator is a random variable which can be statistically characterized. In general we wish to define estimators with a set of desired properties (e.g., unbiased and with low variance).
Maximum Likelihood Method

Maximum likelihood estimate (ML) (Fisher, 1921):

\hat{x} = \arg\max_x L(x), \qquad L(x) = p(y/x) \quad \text{(likelihood function)}

\hat{x} = \arg\max_x l(x), \qquad l(x) = \log p(y/x) \quad \text{(log likelihood function)}

If y = y_1, ..., y_N is a set of independent and identically distributed (iid) observations, then

L(x) = \prod_i p(y_i/x), \qquad l(x) = \sum_i \log p(y_i/x)
Normal Distribution

Problem: estimate the mean and covariance matrix of a normal distribution N(µ, R) from a set of n independent random variables y_1, ..., y_n.

Log likelihood function:

l(\mu, R) = K - \tfrac{n}{2}\log|R| - \tfrac{1}{2}\sum_i (y_i - \mu)' R^{-1} (y_i - \mu)

The quadratic form (y_i − µ)' R^{-1} (y_i − µ) is the Mahalanobis distance.

Optimizing l we get

\hat{\mu} = \tfrac{1}{n}\sum_{i=1}^n y_i, \qquad \hat{R} = \tfrac{1}{n}\sum_{i=1}^n (y_i - \hat{\mu})(y_i - \hat{\mu})'
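A minimal numerical check of these ML formulas, with assumed illustrative parameters:

```python
import numpy as np

rng = np.random.default_rng(3)
mu_true = np.array([1.0, -2.0])                  # illustrative parameters
R_true = np.array([[2.0, 0.5], [0.5, 1.0]])
y = rng.multivariate_normal(mu_true, R_true, size=5000)  # rows are y_i'

n = y.shape[0]
mu_hat = y.mean(axis=0)          # (1/n) sum_i y_i
d = y - mu_hat
R_hat = (d.T @ d) / n            # (1/n) sum_i (y_i - mu_hat)(y_i - mu_hat)'
```

Note the 1/n factor: the ML covariance estimate is biased (the unbiased version divides by n − 1), but the bias vanishes as n grows.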
Properties of ML Estimates

• Invariance to nonlinear transformations: if z = f(y) then \hat{z}_{ML} = f(\hat{y}_{ML}).

In the sequel we consider that we observe a sequence of n iid variables such that the density p of each variable belongs to the family of functions adopted for the model, i.e. p = p_{x_0}, where x_0 is the correct value of the parameter x.

• The ML estimator is consistent (\hat{x} converges to x_0 when n tends to infinity).
• It has an asymptotic normal distribution:

\sqrt{n}\,(\hat{x} - x_0) \xrightarrow{d} N(0, J^{-1})

where J is the Fisher information matrix:

J = E\left\{ \frac{dl}{dx}\,\frac{dl}{dx}' \right\}
Approximation

How does the ML method work if the model is wrong?

Proposition. Let y be a sequence of n iid variables with density p and {p_x} a family of functions depending on x (the model). If the ML estimates converge to some \bar{x} when n tends to infinity, then \bar{x} minimizes the Kullback-Leibler divergence between p and the model:

D[p, p_x] = \int p(y) \log \frac{p(y)}{p_x(y)} \, dy

The divergence is a non-symmetric distance.
ML vs LS

When the observations are normally distributed,

p(y/\theta) = C e^{-E(\theta)}

the maximum likelihood method is equivalent to the least squares method.
Polynomial Approximation

Data: (x_1, y_1), (x_2, y_2), ..., (x_n, y_n)

Model:

y_i = \varphi(x_i)'\theta + w_i, \qquad \varphi(x_i) = [x_i^2 \; x_i \; 1]', \qquad w_i \sim N(0, \sigma^2)

Log likelihood function:

l(\theta) = C - \tfrac{1}{2\sigma^2}\sum_i [y_i - \varphi(x_i)'\theta]^2 = C - \tfrac{1}{2\sigma^2} E(\theta)

where E is the least squares energy. The maximum likelihood estimate is equal to the least squares estimate:

\hat{\theta} = A^{-1} b, \qquad A = \sum_{i=1}^n \varphi(x_i)\varphi(x_i)', \qquad b = \sum_{i=1}^n \varphi(x_i)\, y_i

Did we gain anything? Yes: \hat{\theta} can be statistically characterized.
Example – Frequency Estimation

Observations:

y_i = \cos(\omega t_i) + w_i, \qquad w_i \sim N(0, 0.01), \qquad i = 1, ..., N, \qquad \omega = 1 \text{ rad/s}

[Figure: observed signal; energy E(ω) and likelihood function as functions of ω, for n = 1, n = 2, n = 3 and for all observations]

\hat{\omega} = 1 rad/s
Binary Detection

A binary symbol θ ∈ {−A, A} was transmitted in an analog channel in base band. At the receiver, the observed signal is

y_i = \theta + w_i, \qquad i = 1, ..., n

We wish to estimate θ assuming that the noise is uncorrelated and w_i ~ N(0, σ²).

The log likelihood function is

l(\theta) = -\tfrac{1}{2\sigma^2}\sum_i (y_i - \theta)^2 = C + \tfrac{1}{\sigma^2}\,\theta \sum_i y_i

(the term in θ² is constant because θ² = A²). This is the expression of the matched filter.
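A sketch of the resulting ML decision rule, under assumed illustrative values of A, σ and n:

```python
import numpy as np

rng = np.random.default_rng(4)
A, sigma, n = 1.0, 2.0, 200          # assumed illustrative values
theta_true = -A                      # transmitted symbol in {-A, A}
y = theta_true + sigma * rng.standard_normal(n)

# Over {-A, A} the theta^2 term of the log likelihood is constant, so the
# ML decision reduces to the sign of the correlation sum_i y_i.
theta_hat = A if np.sum(y) > 0 else -A
```

Even with noise power well above the symbol amplitude, averaging over n observations makes the sign of the correlation a reliable detector.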
Proof

Mean estimation:

\frac{\partial l}{\partial \mu} = 0 \;\Rightarrow\; \frac{\partial}{\partial \mu}\sum_i (y_i - \mu)' R^{-1}(y_i - \mu) = -2\sum_i R^{-1}(y_i - \mu) = 0

Multiplying by R we get

\hat{\mu} = \tfrac{1}{n}\sum_i y_i

Note: see the derivative rules presented in Lecture 1.

Covariance estimation:

\frac{\partial l}{\partial R} = 0 \;\Rightarrow\; -\frac{n}{2}\frac{\partial}{\partial R}\log|R| - \frac{1}{2}\frac{\partial}{\partial R}\sum_i (y_i - \mu)' R^{-1}(y_i - \mu) = 0

-n R^{-1} + R^{-1}\sum_i (y_i - \mu)(y_i - \mu)' R^{-1} = 0

Multiplying by R on the left and on the right:

\hat{R} = \tfrac{1}{n}\sum_i (y_i - \hat{\mu})(y_i - \hat{\mu})'
Linear Prediction

Prediction is a regression problem. In linear prediction the regression vector consists of the past values of the signal:

y_i = a_1 y_{i-1} + ... + a_p y_{i-p} + v_i, \qquad i = p+1, ..., n

y_i = \varphi_i'\theta + v_i, \qquad v_i \sim N(0, \sigma^2), \qquad \theta = [a_1 \ ... \ a_p]', \qquad \varphi_i = [y_{i-1} \ ... \ y_{i-p}]'

The solution of this problem is similar to the estimation of parameters with a linear model (see polynomial approximation).
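A sketch of the least squares solution for an assumed stable AR(2) signal (the coefficients and noise level are illustrative):

```python
import numpy as np

rng = np.random.default_rng(5)
n, p = 2000, 2
a_true = np.array([1.2, -0.5])       # assumed stable AR(2) coefficients
y = np.zeros(n)
for i in range(p, n):                # simulate y_i = a1*y_{i-1} + a2*y_{i-2} + v_i
    y[i] = a_true[0] * y[i - 1] + a_true[1] * y[i - 2] \
        + 0.1 * rng.standard_normal()

# Stack the regression vectors phi_i = [y_{i-1} ... y_{i-p}]' for i = p+1..n
# as rows, and solve the linear LS problem for theta = [a1 ... ap]'.
Phi = np.column_stack([y[p - 1 - k:n - 1 - k] for k in range(p)])
theta_hat = np.linalg.lstsq(Phi, y[p:], rcond=None)[0]
```

This is the same A^{-1}b solution as the linear model, with past samples playing the role of the basis functions.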
Characterization of Estimators

An estimator is a random variable which depends on x, y.

2nd order properties:

Mean: \mu = E\{\hat{x}\}

Bias:

B = E\{\hat{x}\} - x \quad (x \text{ deterministic}), \qquad B = E\{\hat{x} - x\} \quad (x \text{ random})

Covariance matrix: R = E\{(\hat{x} - \mu)(\hat{x} - \mu)'\}

Notes:
• Ideal goal: B = 0 and zero covariance. This goal is impossible in most problems.
• Expected values are computed using p(y/x) when x is deterministic and with p(x, y) when x is random.
Cramér-Rao Bound

Can the covariance matrix of an estimator be the null matrix?

Cramér-Rao theorem (x deterministic):

Cov\{\hat{x}\} \ge J^{-1}

J^{-1} is the Cramér-Rao bound; J is the Fisher information matrix

J = E\left\{ \frac{dl}{dx}\,\frac{dl}{dx}' \right\}, \qquad l \text{ the log likelihood function}

When Cov\{\hat{x}\} = J^{-1}, the estimator is called efficient.

Definition: A ≥ B if and only if A − B is positive semidefinite.
Example

In practice it is not always easy to compute the Cramér-Rao bound. One case which can be easily addressed concerns the estimation of the mean vector of a normal distribution, given N observations y_1, ..., y_N:

Cov\{\hat{\mu}\} \ge \tfrac{1}{N} R

Proof: since p(y/\mu) = N(\mu, R),

l(\mu) = C - \tfrac{1}{2}\sum_i (y_i - \mu)' R^{-1}(y_i - \mu)

\frac{dl}{d\mu} = \sum_i R^{-1}(y_i - \mu)

J = E\left\{ \frac{dl}{d\mu}\,\frac{dl}{d\mu}' \right\} = R^{-1} E\left\{ \sum_i (y_i - \mu)\sum_j (y_j - \mu)' \right\} R^{-1} = R^{-1}(N R) R^{-1} = N R^{-1}

CRB = J^{-1} = \tfrac{1}{N} R

Note: show that the ML estimator of µ is efficient.
Monte Carlo Method

Monte Carlo method: numeric evaluation of an estimator based on a large number of estimation experiments and statistical analysis of the results.
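A minimal Monte Carlo sketch for the sample-mean estimator of the previous example, in the scalar case (all parameter values are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(6)
mu_true, sigma, N, runs = 3.0, 1.0, 25, 20000   # assumed illustrative setting

# Repeat the estimation experiment many times and analyse the resulting
# estimates statistically (here: the sample mean of N scalar normal draws).
estimates = np.array([rng.normal(mu_true, sigma, N).mean()
                      for _ in range(runs)])

bias = estimates.mean() - mu_true   # sample mean is unbiased: bias ~ 0
var = estimates.var()               # approaches sigma**2 / N, the CRB
```

The empirical variance matching σ²/N illustrates numerically that the sample mean attains the Cramér-Rao bound, i.e. it is efficient.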
Exercises

1. Let x be a binary variable such that P(0) = P_0, where P_0 is an unknown parameter. Determine the ML estimate of P_0 from n independent realizations of x.

2. Determine the ML estimator of the mean and covariance matrix of a normal distribution N(µ, R), given n independent observations (see Appendix).

3. Given n samples of a random signal y described by an autoregressive model y_t = a y_{t-1} + b w_t, where w is a white process with distribution w_t ~ N(0, 1), determine the coefficients a, b by the ML method.

4. Compute the Cramér-Rao bound for the estimation of the parameter of a Rayleigh distribution, knowing N realizations y_1, ..., y_N.
Appendix – Matrices (I)

Operations:
• Transpose: C = A', c_{ij} = a_{ji}
• Sum: C = A + B, c_{ij} = a_{ij} + b_{ij}
• Multiplication: C = AB, c_{ij} = \sum_k a_{ik} b_{kj}
• Trace: c = tr(A) = \sum_i a_{ii}
• Inversion: C = A^{-1}, with A A^{-1} = A^{-1} A = I
• Eigenvalues and eigenvectors: A v_i = \lambda_i v_i

Properties:
• A matrix is non-singular if all its eigenvalues are non-zero.
• The rank of a matrix is equal to the number of eigenvalues different from zero; a non-singular matrix has full rank.
• The eigenvalues of a symmetric matrix are real.
Appendix – Matrices (II)

• The trace of a matrix is equal to the sum of all its eigenvalues.
• The determinant of a matrix is equal to the product of all its eigenvalues.
• The inverse of a matrix has the following properties:

(AB)^{-1} = B^{-1} A^{-1}

\begin{bmatrix} A & B \\ C & D \end{bmatrix}^{-1} = \begin{bmatrix} E & F \\ G & H \end{bmatrix}

where A is [n1×n1], B is [n1×n2], C is [n2×n1], D is [n2×n2], and

E = (A - B D^{-1} C)^{-1}
F = -E B D^{-1}
G = -D^{-1} C E
H = D^{-1} + D^{-1} C E B D^{-1}
Derivatives

The derivative of a scalar function f(X) with respect to a matrix X is a matrix:

\left[ \frac{df}{dX} \right]_{ij} = \frac{df}{dX_{ij}}

If f = f(X, Y) and Y = y(X),

\frac{df}{dX} = \frac{\partial f}{\partial X} + \left[ \frac{dY}{dX} \right]' \frac{\partial f}{\partial Y}

Properties

Trace:

d/dX tr{XA} = A'
d/dX tr{X'A} = A
d/dX tr{AXB} = A'B'
d/dX tr{AX'B} = BA
d/dX tr{XX'} = 2X
d/dX tr{XBX'A} = A'XB' + AXB
d/dX tr{X^{-1}A} = -(X^{-1} A X^{-1})'

Determinant:

d/dX |X| = |X| (X^{-1})'
d/dX log|X| = (X^{-1})'
d/dX |X^n| = n |X^n| (X^{-1})'
d/dX |AXB| = |AXB| (X^{-1})'

Note: extracted from Sage & Melsa.
Bibliography

Duda, Hart, Stork, Pattern Classification, Wiley, 2001.
J. Marques, Reconhecimento de Padrões: Métodos Estatísticos e Neuronais, IST Press, 1999.