EKF , UKF Pieter Abbeel UC Berkeley EECS
description
Transcript of EKF , UKF Pieter Abbeel UC Berkeley EECS
![Page 1: EKF , UKF Pieter Abbeel UC Berkeley EECS](https://reader035.fdocuments.us/reader035/viewer/2022062302/5681645f550346895dd63791/html5/thumbnails/1.jpg)
EKF, UKF
Pieter AbbeelUC Berkeley EECS
Many slides adapted from Thrun, Burgard and Fox, Probabilistic Robotics
![Page 2: EKF , UKF Pieter Abbeel UC Berkeley EECS](https://reader035.fdocuments.us/reader035/viewer/2022062302/5681645f550346895dd63791/html5/thumbnails/2.jpg)
Kalman Filter = special case of a Bayes’ filter with dynamics model and sensory model being linear Gaussian:
Kalman Filter
2 -1
![Page 3: EKF , UKF Pieter Abbeel UC Berkeley EECS](https://reader035.fdocuments.us/reader035/viewer/2022062302/5681645f550346895dd63791/html5/thumbnails/3.jpg)
At time 0: For t = 1, 2, …
Dynamics update:
Measurement update:
Kalman Filtering Algorithm
![Page 4: EKF , UKF Pieter Abbeel UC Berkeley EECS](https://reader035.fdocuments.us/reader035/viewer/2022062302/5681645f550346895dd63791/html5/thumbnails/4.jpg)
4
Nonlinear Dynamical Systems Most realistic robotic problems involve nonlinear
functions:
Versus linear setting:
![Page 5: EKF , UKF Pieter Abbeel UC Berkeley EECS](https://reader035.fdocuments.us/reader035/viewer/2022062302/5681645f550346895dd63791/html5/thumbnails/5.jpg)
5
Linearity Assumption Revisitedyy
x
xp(x)
p(y)
![Page 6: EKF , UKF Pieter Abbeel UC Berkeley EECS](https://reader035.fdocuments.us/reader035/viewer/2022062302/5681645f550346895dd63791/html5/thumbnails/6.jpg)
6
Non-linear Function
“Gaussian of p(y)” has mean and variance of y under p(y)
yy
x
xp(x)
p(y)
![Page 7: EKF , UKF Pieter Abbeel UC Berkeley EECS](https://reader035.fdocuments.us/reader035/viewer/2022062302/5681645f550346895dd63791/html5/thumbnails/7.jpg)
7
EKF Linearization (1)
![Page 8: EKF , UKF Pieter Abbeel UC Berkeley EECS](https://reader035.fdocuments.us/reader035/viewer/2022062302/5681645f550346895dd63791/html5/thumbnails/8.jpg)
8
EKF Linearization (2)
p(x) has high variance relative to region in which linearization is accurate.
![Page 9: EKF , UKF Pieter Abbeel UC Berkeley EECS](https://reader035.fdocuments.us/reader035/viewer/2022062302/5681645f550346895dd63791/html5/thumbnails/9.jpg)
9
EKF Linearization (3)
p(x) has small variance relative to region in which linearization is accurate.
![Page 10: EKF , UKF Pieter Abbeel UC Berkeley EECS](https://reader035.fdocuments.us/reader035/viewer/2022062302/5681645f550346895dd63791/html5/thumbnails/10.jpg)
10
Dynamics model: for xt “close to” ¹t we have:
Measurement model: for xt “close to” ¹t we have:
EKF Linearization: First Order Taylor Series Expansion
![Page 11: EKF , UKF Pieter Abbeel UC Berkeley EECS](https://reader035.fdocuments.us/reader035/viewer/2022062302/5681645f550346895dd63791/html5/thumbnails/11.jpg)
Numerically compute Ft column by column:
Here ei is the basis vector with all entries equal to zero, except for the i’t entry, which equals 1.
If wanting to approximate Ft as closely as possible then ² is chosen to be a small number, but not too small to avoid numerical issues
EKF Linearization: Numerical
![Page 12: EKF , UKF Pieter Abbeel UC Berkeley EECS](https://reader035.fdocuments.us/reader035/viewer/2022062302/5681645f550346895dd63791/html5/thumbnails/12.jpg)
Given: samples {(x(1), y(1)), (x(2), y(2)), …, (x(m), y(m))}
Problem: find function of the form f(x) = a0 + a1 x that fits the samples as well as possible in the following sense:
Ordinary Least Squares
![Page 13: EKF , UKF Pieter Abbeel UC Berkeley EECS](https://reader035.fdocuments.us/reader035/viewer/2022062302/5681645f550346895dd63791/html5/thumbnails/13.jpg)
Recall our objective: Let’s write this in vector notation:
, giving:
Set gradient equal to zero to find extremum:
Ordinary Least Squares
(See the Matrix Cookbook for matrix identities, including derivatives.)
![Page 14: EKF , UKF Pieter Abbeel UC Berkeley EECS](https://reader035.fdocuments.us/reader035/viewer/2022062302/5681645f550346895dd63791/html5/thumbnails/14.jpg)
For our example problem we obtain a = [4.75; 2.00]
Ordinary Least Squares
a0 + a1 x
![Page 15: EKF , UKF Pieter Abbeel UC Berkeley EECS](https://reader035.fdocuments.us/reader035/viewer/2022062302/5681645f550346895dd63791/html5/thumbnails/15.jpg)
More generally:
In vector notation: , gives:
Set gradient equal to zero to find extremum (exact same derivation as two slides back):
Ordinary Least Squares0 10 20 30 40
010
203020222426
![Page 16: EKF , UKF Pieter Abbeel UC Berkeley EECS](https://reader035.fdocuments.us/reader035/viewer/2022062302/5681645f550346895dd63791/html5/thumbnails/16.jpg)
So far have considered approximating a scalar valued function from samples {(x(1), y(1)), (x(2), y(2)), …, (x(m), y(m))} with
A vector valued function is just many scalar valued functions and we can approximate it the same way by solving an OLS problem multiple times. Concretely, let then we have:
In our vector notation:
This can be solved by solving a separate ordinary least squares problem to find each row of
Vector Valued Ordinary Least Squares Problems
![Page 17: EKF , UKF Pieter Abbeel UC Berkeley EECS](https://reader035.fdocuments.us/reader035/viewer/2022062302/5681645f550346895dd63791/html5/thumbnails/17.jpg)
Solving the OLS problem for each row gives us:
Each OLS problem has the same structure. We have
Vector Valued Ordinary Least Squares Problems
![Page 18: EKF , UKF Pieter Abbeel UC Berkeley EECS](https://reader035.fdocuments.us/reader035/viewer/2022062302/5681645f550346895dd63791/html5/thumbnails/18.jpg)
Approximate xt+1 = ft(xt, ut) with affine function a0 + Ft xt by running least squares on samples from the function: {( xt(1), y(1)=ft(xt(1),ut), ( xt(2), y(2)=ft(xt(2),ut), …, ( xt(m), y(m)=ft(xt(m),ut)}
Similarly for zt+1 = ht(xt)
Vector Valued Ordinary Least Squares and EKF Linearization
![Page 19: EKF , UKF Pieter Abbeel UC Berkeley EECS](https://reader035.fdocuments.us/reader035/viewer/2022062302/5681645f550346895dd63791/html5/thumbnails/19.jpg)
OLS vs. traditional (tangent) linearization:
OLS and EKF Linearization: Sample Point Selection
traditional (tangent)
OLS
![Page 20: EKF , UKF Pieter Abbeel UC Berkeley EECS](https://reader035.fdocuments.us/reader035/viewer/2022062302/5681645f550346895dd63791/html5/thumbnails/20.jpg)
Perhaps most natural choice:
reasonable way of trying to cover the region with reasonably high probability mass
OLS Linearization: choosing samples points
![Page 21: EKF , UKF Pieter Abbeel UC Berkeley EECS](https://reader035.fdocuments.us/reader035/viewer/2022062302/5681645f550346895dd63791/html5/thumbnails/21.jpg)
Numerical (based on least squares or finite differences) could give a more accurate “regional” approximation. Size of region determined by evaluation points.
Computational efficiency: Analytical derivatives can be cheaper or more
expensive than function evaluations Development hint:
Numerical derivatives tend to be easier to implement
If deciding to use analytical derivatives, implementing finite difference derivative and comparing with analytical results can help debugging the analytical derivatives
Analytical vs. Numerical Linearization
![Page 22: EKF , UKF Pieter Abbeel UC Berkeley EECS](https://reader035.fdocuments.us/reader035/viewer/2022062302/5681645f550346895dd63791/html5/thumbnails/22.jpg)
At time 0: For t = 1, 2, …
Dynamics update:
Measurement update:
EKF Algorithm
![Page 23: EKF , UKF Pieter Abbeel UC Berkeley EECS](https://reader035.fdocuments.us/reader035/viewer/2022062302/5681645f550346895dd63791/html5/thumbnails/23.jpg)
34
EKF Summary Highly efficient: Polynomial in measurement
dimensionality k and state dimensionality n: O(k2.376 + n2)
Not optimal! Can diverge if nonlinearities are large! Works surprisingly well even when all assumptions
are violated!
![Page 24: EKF , UKF Pieter Abbeel UC Berkeley EECS](https://reader035.fdocuments.us/reader035/viewer/2022062302/5681645f550346895dd63791/html5/thumbnails/24.jpg)
35
Linearization via Unscented Transform
EKF UKF
![Page 25: EKF , UKF Pieter Abbeel UC Berkeley EECS](https://reader035.fdocuments.us/reader035/viewer/2022062302/5681645f550346895dd63791/html5/thumbnails/25.jpg)
36
UKF Sigma-Point Estimate (2)
EKF UKF
![Page 26: EKF , UKF Pieter Abbeel UC Berkeley EECS](https://reader035.fdocuments.us/reader035/viewer/2022062302/5681645f550346895dd63791/html5/thumbnails/26.jpg)
37
UKF Sigma-Point Estimate (3)
EKF UKF
![Page 27: EKF , UKF Pieter Abbeel UC Berkeley EECS](https://reader035.fdocuments.us/reader035/viewer/2022062302/5681645f550346895dd63791/html5/thumbnails/27.jpg)
UKF Sigma-Point Estimate (4)
![Page 28: EKF , UKF Pieter Abbeel UC Berkeley EECS](https://reader035.fdocuments.us/reader035/viewer/2022062302/5681645f550346895dd63791/html5/thumbnails/28.jpg)
Assume we know the distribution over X and it has a mean \bar{x}
Y = f(X)
EKF approximates f by first order and ignores higher-order terms
UKF uses f exactly, but approximates p(x).
UKF intuition why it can perform better
[Julier and Uhlmann, 1997]
![Page 29: EKF , UKF Pieter Abbeel UC Berkeley EECS](https://reader035.fdocuments.us/reader035/viewer/2022062302/5681645f550346895dd63791/html5/thumbnails/29.jpg)
When would the UKF significantly outperform the EKF?
Analytical derivatives, finite-difference derivatives, and least squares will all end up with a horizontal linearization
they’d predict zero variance in Y = f(X)
Self-quiz
x
y
![Page 30: EKF , UKF Pieter Abbeel UC Berkeley EECS](https://reader035.fdocuments.us/reader035/viewer/2022062302/5681645f550346895dd63791/html5/thumbnails/30.jpg)
Beyond scope of course, just including for completeness. A crude preliminary investigation of whether we can get EKF
to match UKF by particular choice of points used in the least squares fitting
![Page 31: EKF , UKF Pieter Abbeel UC Berkeley EECS](https://reader035.fdocuments.us/reader035/viewer/2022062302/5681645f550346895dd63791/html5/thumbnails/31.jpg)
Picks a minimal set of sample points that match 1st, 2nd and 3rd moments of a Gaussian:
\bar{x} = mean, Pxx = covariance, i i’th column, x 2 <n
· : extra degree of freedom to fine-tune the higher order moments of the approximation; when x is Gaussian, n+· = 3 is a suggested heuristic
L = \sqrt{P_{xx}} can be chosen to be any matrix satisfying:
L LT = Pxx
Original unscented transform
[Julier and Uhlmann, 1997]
![Page 32: EKF , UKF Pieter Abbeel UC Berkeley EECS](https://reader035.fdocuments.us/reader035/viewer/2022062302/5681645f550346895dd63791/html5/thumbnails/32.jpg)
Dynamics update: Can simply use unscented transform and
estimate the mean and variance at the next time from the sample points
Observation update: Use sigma-points from unscented transform to
compute the covariance matrix between xt and zt. Then can do the standard update.
Unscented Kalman filter
![Page 33: EKF , UKF Pieter Abbeel UC Berkeley EECS](https://reader035.fdocuments.us/reader035/viewer/2022062302/5681645f550346895dd63791/html5/thumbnails/33.jpg)
[Table 3.4 in Probabilistic Robotics]
![Page 34: EKF , UKF Pieter Abbeel UC Berkeley EECS](https://reader035.fdocuments.us/reader035/viewer/2022062302/5681645f550346895dd63791/html5/thumbnails/34.jpg)
UKF Summary Highly efficient: Same complexity as EKF, with a
constant factor slower in typical practical applications
Better linearization than EKF: Accurate in first two terms of Taylor expansion (EKF only first term) + capturing more aspects of the higher order terms
Derivative-free: No Jacobians needed Still not optimal!