Transcript of Lecture 13: Visual Inertial Fusion

Page 1: Lecture 13 Visual Inertial Fusion - rpg.ifi.uzh.ch/docs/teaching/2018/13_visual_inertial_fusion_advanced.pdf

Lecture 13: Visual Inertial Fusion

Davide Scaramuzza

http://rpg.ifi.uzh.ch/

Institute of Informatics – Institute of Neuroinformatics

Page 2

Lab Exercise 9 – Today afternoon

Room ETH HG E 1.1 from 13:15 to 15:00

Work description: Bundle Adjustment

Page 3

Outline

Introduction

IMU model and Camera-IMU system

Different paradigms

Closed-form solution

Filtering approaches

Smoothing methods

- Fixed-lag Smoothing (aka sliding window estimators)

- Full smoothing methods

Camera-IMU extrinsic calibration and Synchronization

Page 4

What is an IMU?

Inertial Measurement Unit

Gyroscope: Angular velocity

Accelerometer: Linear Accelerations

Mechanical Gyroscope

Mechanical Accelerometer

Page 5

What is an IMU?

Different categories

Mechanical ($100,000-1M)

Optical ($20,000-100k)

MEMS (from $1 in phones to $1,000; the higher cost is because they include a microchip running a Kalman filter)

For small mobile robots & drones: MEMS IMU are mostly used

Cheap

Power efficient

Light weight and solid state

Page 6

MEMS Accelerometer

(Figure: seismic mass suspended by a spring inside a capacitive divider)

A spring-like structure connects the device to a seismic mass vibrating in a capacitive divider. The capacitive divider converts the displacement of the seismic mass into an electric signal. Damping is created by the gas sealed in the device.

Page 7

MEMS Gyroscopes

MEMS gyroscopes measure the Coriolis forces acting on MEMS vibrating structures (tuning forks, vibrating wheels, or resonant solids)

Their working principle is similar to the halteres of a fly.

Halteres are small structures of some two-winged insects, such as flies. They are flapped rapidly and function as gyroscopes, informing the insect about the rotation of its body during flight.

Page 8

Why IMU?

Monocular vision is scale ambiguous.

Pure vision is not robust enough

Low texture

High dynamic range

High speed motion

Robustness is a critical issue: Tesla accident

“The autopilot sensors on the Model S failed to distinguish a white tractor-trailer crossing the highway against a bright sky. ” [The Guardian]

Page 9

Why vision?

Pure IMU integration will lead to large drift (especially cheap IMUs)

We will see this mathematically later

Intuition

- Integration of angular velocity to get orientation: a bias in angular velocity causes an orientation error proportional to t

- Double integration of acceleration to get position: a bias in acceleration causes a position error proportional to t²

- Worse, the actual position error also depends on the orientation error (see later).

Errors computed assuming that the device is at rest: http://www.vectornav.com/support/library/imu-and-ins

(Figure: position drift over time for automotive, smartphone, and drone accelerometers)
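A rough sketch of the drift intuition above (the bias values are illustrative, not from any particular datasheet): a constant gyroscope bias integrated once gives an orientation error linear in t, while a constant accelerometer bias integrated twice gives a position error quadratic in t.

```python
b_acc = 0.01    # hypothetical constant accelerometer bias, m/s^2
b_gyro = 0.001  # hypothetical constant gyroscope bias, rad/s

def orientation_error(t):
    # Single integration of a constant gyro bias: error grows linearly in t.
    return b_gyro * t

def position_error(t):
    # Double integration of a constant accelerometer bias: error grows with t^2.
    return 0.5 * b_acc * t ** 2

for t in (1.0, 10.0, 100.0):
    print(t, orientation_error(t), position_error(t))
```

After 100 s the tiny 0.01 m/s² bias alone accounts for 50 m of position error, which is why pure IMU dead-reckoning with cheap sensors is unusable over long horizons.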

Page 10

Why visual inertial fusion?

IMU and vision are complementary

Cameras:
 Precise in slow motion
 Rich information for other purposes
ᵡ Limited output rate (~100 Hz)
ᵡ Scale ambiguity in monocular setup
ᵡ Lack of robustness

IMU:
 Robust
 High output rate (~1,000 Hz)
 Accurate at high acceleration
ᵡ Large relative uncertainty at low acceleration/angular velocity
ᵡ Ambiguity between gravity and acceleration

What cameras and IMU have in common: both estimate the pose incrementally (known as dead-reckoning), which suffers from drifting over time. Solution: loop detection and loop closure

Page 11

Outline

Introduction

IMU model and Camera-IMU system

Different paradigms

Closed-form solution

Filtering approaches

Smoothing methods

- Fixed-lag Smoothing (aka sliding window estimators)

- Full smoothing methods

Camera-IMU extrinsic calibration and Synchronization

Page 12

IMU model: Measurement Model

The IMU measures angular velocity and acceleration in the body frame (a tilde denotes a measurement):

Gyroscope:      B ω̃_WB(t) = B ω_WB(t) + b_g(t) + n_g(t)

Accelerometer:  B ã(t) = R_BW(t) ( W a_WB(t) − W g ) + b_a(t) + n_a(t)

where the superscript/subscript g stands for Gyroscope and a for Accelerometer; b is the bias and n the measurement noise.

Notation:
• Left subscript: reference frame in which the quantity is expressed
• Right subscripts {Q}{Frame1}{Frame2}: quantity Q of Frame2 with respect to Frame1
• Noises are all in the body frame

Page 13

IMU model: Noise Property

Additive Gaussian white noise n_g(t), n_a(t), with

E[ n(t) ] = 0,    E[ n(t1) n(t2) ] = σ² δ(t1 − t2)

Discretization:  n_d[k] = σ_d w[k],  with  σ_d = σ / √Δt  and  w[k] ~ N(0, 1)

Bias b_g(t), b_a(t), with

ḃ(t) = σ_b w_b(t)

i.e., the derivative of the bias is white Gaussian noise (a so-called random walk).

Discretization:  b_d[k] = b_d[k−1] + σ_bd w_b[k],  with  σ_bd = σ_b √Δt  and  w_b[k] ~ N(0, 1)

The biases are usually estimated together with the other states:
• they can change every time the IMU is started
• they can change due to temperature change, mechanical pressure, etc.

Trawny, Nikolas, and Stergios I. Roumeliotis. "Indirect Kalman filter for 3D attitude estimation."
https://github.com/ethz-asl/kalibr/wiki/IMU-Noise-Model
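A minimal simulation of the discrete bias random walk above, b_d[k] = b_d[k−1] + σ_b √Δt w[k]; the rate and σ_b values are illustrative, and the empirical standard deviation after time T should come out close to σ_b √T.

```python
import math
import random

random.seed(0)
dt = 0.005        # 200 Hz IMU rate, illustrative
sigma_b = 0.001   # bias diffusion, (rad/s)/sqrt(s), illustrative

def simulate_bias(n_steps):
    # Discrete random walk: b[k] = b[k-1] + sigma_b * sqrt(dt) * w[k], w ~ N(0, 1)
    b = 0.0
    for _ in range(n_steps):
        b += sigma_b * math.sqrt(dt) * random.gauss(0.0, 1.0)
    return b

# After T = n_steps * dt seconds, the bias std should be close to sigma_b * sqrt(T)
samples = [simulate_bias(2000) for _ in range(500)]   # T = 10 s
mean = sum(samples) / len(samples)
std = math.sqrt(sum((s - mean) ** 2 for s in samples) / len(samples))
print(std)   # should land near sigma_b * sqrt(10), i.e. about 3.2e-3
```

This square-root growth of the bias uncertainty is exactly why the biases must be kept in the estimator state rather than calibrated once offline.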

Page 14

IMU model: Integration

Per component ({t} stands for the {B}ody frame at time t):

p_Wt2 = p_Wt1 + v_Wt1 (t2 − t1) + ∫∫_{t1}^{t2} ( R_Wt ( ã_t − b_t^a ) + g_W ) dτ²

v_Wt2 = v_Wt1 + ∫_{t1}^{t2} ( R_Wt ( ã_t − b_t^a ) + g_W ) dτ

• The position depends on the initial position and velocity
• The rotation R(t) is computed from the gyroscope

Rotation is more involved; using quaternions as an example:

q̇_Wt = ½ Ω(ω(t)) q_Wt
q_Wt2 = Φ(t2, t1) q_Wt1

where Φ(t1, t2) is the state transition matrix.

Trawny, Nikolas, and Stergios I. Roumeliotis. "Indirect Kalman filter for 3D attitude estimation."
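A 1D sketch of the propagation equations above, ignoring rotation and gravity and assuming the bias is known; for a constant acceleration the discrete update reproduces the analytic p = ½at², v = at.

```python
dt = 0.001
a_true = 2.0   # constant acceleration, m/s^2 (illustrative)
b_a = 0.0      # assume the accelerometer bias has already been removed

p, v = 0.0, 0.0
for _ in range(1000):               # integrate for 1 second
    a = a_true - b_a                # bias-corrected acceleration measurement
    p += v * dt + 0.5 * a * dt * dt # position update
    v += a * dt                     # velocity update
print(p, v)   # analytic: p = 0.5 * a * t^2 = 1.0, v = a * t = 2.0
```

In the full 3D case each measured acceleration would additionally be rotated into the world frame with R_Wt (from the gyroscope) and have gravity added, as in the equations above.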

Page 15

IMU model: Integration

Per component ({t} stands for the {B}ody frame at time t):

p_Wt2 = p_Wt1 + v_Wt1 (t2 − t1) + ∫∫_{t1}^{t2} ( R_Wt ( ã_t − b_t^a ) + g_W ) dτ²

• The position depends on the initial position and velocity
• The rotation R(t) is computed from the gyroscope

Trawny, Nikolas, and Stergios I. Roumeliotis. "Indirect Kalman filter for 3D attitude estimation."

Page 16

Camera-IMU System

There can be multiple cameras.

Page 17

Outline

Introduction

IMU model and Camera-IMU system

Different paradigms

Closed-form solution

Filtering approaches

Smoothing methods

- Fixed-lag Smoothing (aka sliding window estimators)

- Full smoothing methods

Camera-IMU extrinsic calibration and Synchronization


Page 18

Different paradigms

Loosely coupled:

Treats VO and IMU as two separate (not coupled) black boxes

- Each black box estimates pose and velocity from visual (up to a scale) and inertial data (absolute scale)

Tightly coupled:

Makes use of the raw sensors’ measurements:

- 2D features

- IMU readings

- More accurate

- More implementation effort

In the following slides, we will only see tightly coupled approaches

Page 19

The Loosely Coupled Approach

images → Feature tracking → 2D features → VO → position (up to a scale) and orientation → Fusion

IMU measurements → IMU Integration → position, orientation, velocity → Fusion

Fusion → refined position, orientation, velocity

Page 20

The Tightly Coupled Approach

images → Feature tracking → 2D features → Fusion

IMU measurements → Fusion

Fusion → refined position, orientation, velocity

Page 21

Filtering: Visual Inertial Formulation

System states:

Tightly coupled:  X(t) = [ p_W(t); q_WB(t); v_W(t); b_a(t); b_g(t); L_W1; L_W2; ...; L_WK ]

Loosely coupled:  X(t) = [ p_W(t); q_WB(t); v_W(t); b_a(t); b_g(t) ]

Corke, An Introduction to Inertial and Visual Sensing, IJRR'07

Page 22

Outline

Introduction

IMU model and Camera-IMU system

Different paradigms

Closed-form solution

Filtering approaches

Smoothing methods

- Fixed-lag Smoothing (aka sliding window estimators)

- Full smoothing methods

Camera-IMU extrinsic calibration and Synchronization

Page 23

Closed-form Solution (1D case)

The absolute pose x is known up to a scale s, thus

x = s x̃

From the IMU:

x = x0 + v0 (t1 − t0) + ∫∫_{t0}^{t1} a(t) dt²

By equating them:

s x̃ = x0 + v0 (t1 − t0) + ∫∫_{t0}^{t1} a(t) dt²

As shown in [Martinelli'14], for 6DOF motion, both s and v0 can be determined in closed form from a single feature observation and 3 views. x0 can be set to 0.

Martinelli, Closed-form solution of visual-inertial structure from motion, International Journal of Computer Vision, 2014

Page 24

Closed-form Solution (1D case)

(Figure: a landmark L1 observed from three views at times t0, t1, t2)

s x̃1 = v0 (t1 − t0) + ∫∫_{t0}^{t1} a(t) dt²

s x̃2 = v0 (t2 − t0) + ∫∫_{t0}^{t2} a(t) dt²

In matrix form:

| x̃1  (t0 − t1) | | s  |   | ∫∫_{t0}^{t1} a(t) dt² |
| x̃2  (t0 − t2) | | v0 | = | ∫∫_{t0}^{t2} a(t) dt² |

Martinelli, Closed-form solution of visual-inertial structure from motion, International Journal of Computer Vision, 2014
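A numerical sketch of the 1D closed-form solution: we simulate the up-to-scale positions x̃ seen by vision and the doubly integrated acceleration for a known ground truth, then solve the 2x2 linear system for s and v0 with Cramer's rule. All values (s = 2, v0 = 0.5, a = 1) are illustrative.

```python
t1, t2 = 1.0, 2.0                     # t0 = 0
s_true, v0_true, a0 = 2.0, 0.5, 1.0   # illustrative ground truth

def x(t):                             # metric position, with x0 = 0
    return v0_true * t + 0.5 * a0 * t * t

x1_tilde, x2_tilde = x(t1) / s_true, x(t2) / s_true   # vision: up-to-scale positions
S1, S2 = 0.5 * a0 * t1 ** 2, 0.5 * a0 * t2 ** 2       # IMU: double integrals of a(t)

# Solve [x1~ -t1; x2~ -t2] [s; v0] = [S1; S2] with Cramer's rule
det = x1_tilde * (-t2) - (-t1) * x2_tilde
s  = (S1 * (-t2) - (-t1) * S2) / det
v0 = (x1_tilde * S2 - S1 * x2_tilde) / det
print(s, v0)   # recovers 2.0 and 0.5
```

With noiseless data the system recovers the true scale and initial velocity exactly; with real measurements the general multi-view version is solved in a least-squares sense.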

Page 25

Closed-form Solution (general case)

Considers N feature observations and 6DOF case

Can be used to initialize filters and smoothers (which always need an initialization point)

It is more complex to derive than the 1D case, but it also leads to a linear system of equations, A X = S, that can be solved using the pseudoinverse:

𝑿 is the vector of unknowns:• 3D Point distances (wrt the first camera) • Absolute scale, • Initial velocity, • Gravity vector, • Biases

𝑨 and 𝑺 contain 2D feature coordinates, acceleration, and angular velocity measurements

• Martinelli, Vision and IMU data fusion: Closed-form solutions for attitude, speed, absolute scale, and bias determination, TRO'12
• Martinelli, Closed-form solution of visual-inertial structure from motion, Int. Journal of Comp. Vision, IJCV'14
• Kaiser, Martinelli, Fontana, Scaramuzza, Simultaneous state initialization and gyroscope bias calibration in visual inertial aided navigation, IEEE RAL'17

Page 26

Outline

Introduction

IMU model and Camera-IMU system

Different paradigms

Closed-form solution

Filtering approaches

Smoothing methods

- Fixed-lag Smoothing (aka sliding window estimators)

- Full smoothing methods

Camera-IMU extrinsic calibration and Synchronization

Page 27

Different paradigms

Filtering:
• Only updates the most recent states (e.g., extended Kalman filter)
• 1 linearization
• Accumulation of linearization errors
• Gaussian approximation of marginalized states
• Fastest

Fixed-lag Smoothing:
• Optimizes a window of states
• Marginalization
• Nonlinear least squares optimization
• Re-linearizes
• Accumulation of linearization errors
• Gaussian approximation of marginalized states
• Fast

Full smoothing:
• Optimizes all states
• Nonlinear least squares optimization
• Re-linearizes
• Sparse matrices
• Highest accuracy
• Slow (but fast with GTSAM)

Page 28

Filtering: Kalman Filter in a Nutshell

Assumptions: linear system, Gaussian noise

System dynamics:

x(k) = A(k−1) x(k−1) + u(k−1) + v(k−1)
z(k) = H(k) x(k) + w(k)

x(k): state;  u(k): control input, can be 0;  z(k): measurement

x(0) ~ N(x0, P0),  v(k) ~ N(0, Q(k)),  w(k) ~ N(0, R(k))

Initialization:  x̂_m(0) = x0,  P_m(0) = P0

Prediction:

x̂_p(k) = A(k−1) x̂_m(k−1) + u(k−1)
P_p(k) = A(k−1) P_m(k−1) A(k−1)ᵀ + Q(k−1)

Measurement update:

P_m(k) = ( P_p(k)⁻¹ + H(k)ᵀ R(k)⁻¹ H(k) )⁻¹
x̂_m(k) = x̂_p(k) + P_m(k) H(k)ᵀ R(k)⁻¹ ( z(k) − H(k) x̂_p(k) )

The update weights between the model prediction and the measurement.
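A scalar sketch of the equations above, with A = H = 1 so the state is a random walk observed directly; the noise variances Q, R and the random seed are illustrative. In the scalar case the information-form update reduces to the familiar Kalman-gain form K = P_p / (P_p + R).

```python
import random

random.seed(1)
Q, R = 1e-4, 0.25            # process and measurement noise variances (illustrative)
x_true, x_hat, P = 0.0, 0.0, 1.0

for k in range(200):
    x_true += random.gauss(0.0, Q ** 0.5)      # simulate the process x(k) = x(k-1) + v
    z = x_true + random.gauss(0.0, R ** 0.5)   # simulate the measurement z(k) = x(k) + w
    # prediction
    x_p, P_p = x_hat, P + Q
    # measurement update (scalar Kalman gain)
    K = P_p / (P_p + R)
    x_hat = x_p + K * (z - x_p)
    P = (1.0 - K) * P_p

print(P, abs(x_hat - x_true))
```

Note how P converges to a small steady-state value well below R: the filter ends up trusting its accumulated estimate more than any single noisy measurement.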

Page 29

Filtering: Kalman Filter in a Nutshell

Nonlinear system: linearization (Extended Kalman Filter)

System dynamics:

x(k) = q_{k−1}( x(k−1), u(k−1), v(k−1) )
z(k) = h_k( x(k), w(k) )

Process and measurement noise and the initial state are Gaussian.

Prediction:

x̂_p(k) = q_{k−1}( x̂_m(k−1), u(k−1), 0 )
P_p(k) = A(k−1) P_m(k−1) A(k−1)ᵀ + L(k−1) Q(k−1) L(k−1)ᵀ

Measurement update:

K(k) = P_p(k) H(k)ᵀ ( H(k) P_p(k) H(k)ᵀ + M(k) R(k) M(k)ᵀ )⁻¹
x̂_m(k) = x̂_p(k) + K(k) ( z(k) − h_k( x̂_p(k), 0 ) )
P_m(k) = ( I − K(k) H(k) ) P_p(k)

Key idea:
• Linearize around the estimated states
• A(k), L(k), H(k), M(k) are partial derivatives with respect to the states and the noise

Page 30

Filtering: Visual Inertial Formulation

System states:

Tightly coupled:  X(t) = [ p_W(t); q_WB(t); v_W(t); b_a(t); b_g(t); L_W1; L_W2; ...; L_WK ]

Loosely coupled:  X(t) = [ p_W(t); q_WB(t); v_W(t); b_a(t); b_g(t) ]

Process Model: from the IMU
• Integration of the IMU states (rotation, position, velocity)
• Propagation of the IMU noise
  • needed for calculating the Kalman Filter gain

Page 31

Filtering: Visual Inertial Formulation

Measurement Model: from the camera

Pinhole projection (without distortion):

u = f_x x_C / z_C + c_x
v = f_y y_C / z_C + c_y

Jacobian of the projection (dropping the subscript C for clarity):

H_proj = | f_x/z    0      −f_x x/z² |
         |   0    f_y/z    −f_y y/z² |

Transform the landmark to the camera frame:

[x; y; z]_C = R_CB R_BW ( W L − W p ) + p_CB

Jacobian with respect to the landmark:  H_landmark = R_CB R_BW = R_CW
Jacobian with respect to the pose:  H_pose, a function of R_CB, R_BW, and the landmark position in the body frame, B L

Combined:  H_X = H_proj H_pose,   H_L = H_proj H_landmark
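A sketch of the pinhole measurement model and H_proj above, with illustrative intrinsics (fx, fy, cx, cy), plus a finite-difference check of the projection Jacobian at a sample point in the camera frame.

```python
fx, fy, cx, cy = 400.0, 400.0, 320.0, 240.0   # illustrative intrinsics

def project(p):
    # Pinhole projection (no distortion): u = fx*x/z + cx, v = fy*y/z + cy
    x, y, z = p
    return (fx * x / z + cx, fy * y / z + cy)

def H_proj(p):
    # 2x3 Jacobian of the projection w.r.t. the 3D point in the camera frame
    x, y, z = p
    return [[fx / z, 0.0, -fx * x / z ** 2],
            [0.0, fy / z, -fy * y / z ** 2]]

# Finite-difference check of the Jacobian at a sample camera-frame point
p = [0.2, -0.1, 2.0]
eps = 1e-6
H = H_proj(p)
for j in range(3):
    q = list(p)
    q[j] += eps
    du = (project(q)[0] - project(p)[0]) / eps
    dv = (project(q)[1] - project(p)[1]) / eps
    assert abs(du - H[0][j]) < 1e-3 and abs(dv - H[1][j]) < 1e-3
print("Jacobian matches finite differences")
```

In the filter, this H_proj is chained with the pose and landmark Jacobians of the frame transformation to form the full measurement Jacobians H_X and H_L.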

Page 32

Filtering: ROVIO

Bloesch, Michael, et al. "Iterated extended Kalman filter based visual-inertial odometry using direct photometric feedback“, IJRR’17

• EKF state:  X(t) = [ p_W(t); q_WB(t); v_W(t); b_a(t); b_g(t); L_W1; L_W2; ...; L_WK ]
• Minimizes the photometric error instead of the reprojection error

Page 33

Filtering: Problems

Wrong linearization point:

Linearization depends on the current estimates of states, which may be erroneous

Complexity of the EKF grows quadratically in the number of estimated landmarks,

→ a few landmarks (~20) are typically tracked to allow real-time operation

Alternative: MSCKF [Mourikis & Roumeliotis, ICRA’07]: used in Google ARCore

Keeps a window of recent states and updates them using the EKF

Incorporates visual observations without including the point positions in the state

Mourikis & Roumeliotis, A Multi-State Constraint Kalman Filter for Vision-aided Inertial Navigation, TRO’16

Li, Mingyang, and Anastasios I. Mourikis, High-precision, consistent EKF-based visual–inertial odometry, IJRR’13

Page 34

Filtering: Google ARCore

Mourikis & Roumeliotis, A Multi-State Constraint Kalman Filter for Vision-aided Inertial Navigation, TRO’16

Li, Mingyang, and Anastasios I. Mourikis, High-precision, consistent EKF-based visual–inertial odometry, IJRR’13

Page 35

Outline

Introduction

IMU model and Camera-IMU system

Different paradigms

Closed-form solution

Filtering approaches

Smoothing methods

- Fixed-lag Smoothing (aka sliding window estimators)

- Full smoothing methods

Camera-IMU extrinsic calibration and Synchronization

Page 36

Smoothing methods

VIO solved as a graph optimization problem over:

X = {x1, ..., xN}: robot states (pose, velocity, acceleration)
L = {l1, ..., lM}: 3D landmarks

{X, L} = argmin_{X, L}  Σ_{k=1}^{N} || f(x_{k−1}, u) − x_k ||²_{Λk}  +  Σ_{k=1}^{N} Σ_{i=1}^{M} || π(x_k, l_i) − z_ik ||²_{Σik}

(first sum: IMU residuals; second sum: reprojection residuals)

x_k = f(x_{k−1}, u):  f is the state transition function; u is the set of IMU measurements
z_ik = π(x_k, l_i):  π is the reprojection of landmark i in camera frame k
Λk is the covariance from the IMU integration
Σik is the covariance of the noisy 2D feature measurements

[Jung, CVPR'01] [Sterlow'04] [Bryson, ICRA'09] [Indelman, RAS'13] [Patron-Perez, IJCV'15] [Leutenegger, RSS'13-IJRR'15] [Forster, RSS'15, TRO'17]

Page 37

MAP: a nonlinear least squares problem

Bayes' Theorem:

P(A|B) = P(B|A) P(A) / P(B)

Maximum a Posteriori: given the observations, what is the optimal estimate of the states?

Applied to the state estimation problem:
• states (position, attitude, velocity, and 3D point positions)
• measurements (feature positions, IMU readings)

Gaussian property, for i.i.d. variables:

f(x1, ..., xk | μ, σ) = Π_i 1/(σ√(2π)) exp( −(x_i − μ)² / (2σ²) )

Maximizing the probability is equivalent to minimizing the sum of squared residuals.

Page 38

MAP: a nonlinear least squares problem

SLAM as a MAP problem:

P(X, L | Z) ∝ P(Z | X, L) P(X, L)
            = P(Z | X, L) P(X)                 (X and L are independent, and there is no prior information about L)
            = P(x0) Π_{i=1}^{M} P(z_i | x_ki, l_ji) Π_{k=2}^{N} P(x_k | x_{k−1})   (independent measurements; Markov process model)

X = {x1, ..., xN}: robot states
L = {l1, ..., lM}: 3D points
Z = {z1, ..., zM}: feature positions

x_k = f(x_{k−1})
z_i = h(x_ki, l_ji)

Page 39

MAP: a nonlinear least squares problem

SLAM as a least squares problem:

{X*, L*} = argmax_{X, L} P(X, L | Z)
         = argmin_{X, L}  Σ_{k=1}^{N} || f(x_{k−1}) − x_k ||²_{Λk}  +  Σ_{i=1}^{M} || h(x_ki, l_ji) − z_i ||²_{Σi}

Notes:

The residuals are normalized with the variances of the process noise and the measurement noise (the so-called Mahalanobis distance).

Without the prior, this follows by applying the property of the Gaussian distribution.

Page 40

MAP: Nonlinear optimization

Gauss-Newton method:

θ* = argmin_θ Σ_{i=1}^{M} || f_i(θ) − z_i ||²

Solve it iteratively:

ε^s = argmin_ε Σ_{i=1}^{M} || f_i(θ^s + ε) − z_i ||²,    θ^{s+1} = θ^s + ε^s

Applying a first-order approximation f_i(θ^s + ε) ≈ f_i(θ^s) + J_i ε:

ε^s = argmin_ε Σ_{i=1}^{M} || f_i(θ^s) − z_i + J_i ε ||² = argmin_ε || r(θ^s) + J ε ||²

ε* = −(Jᵀ J)⁻¹ Jᵀ r(θ^s)

with the stacked Jacobian and residual (r_i = f_i(θ^s) − z_i):

J = [J1; J2; ...; JM],   r = [r1; r2; ...; rM]
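A scalar Gauss-Newton sketch of the iteration above, with the illustrative model f_i(θ) = exp(θ t_i) and noiseless measurements generated at θ = −0.5; with one parameter the normal equations reduce to a single division.

```python
import math

# Illustrative model: f_i(theta) = exp(theta * t_i)
ts = [0.0, 0.5, 1.0, 1.5, 2.0]
theta_true = -0.5
zs = [math.exp(theta_true * t) for t in ts]   # noiseless measurements

theta = 0.0                                   # initial guess
for _ in range(20):
    r = [math.exp(theta * t) - z for t, z in zip(ts, zs)]   # residuals r_i = f_i - z_i
    J = [t * math.exp(theta * t) for t in ts]               # Jacobians d f_i / d theta
    # Gauss-Newton step: eps = -(J^T J)^{-1} J^T r
    eps = -sum(Ji * ri for Ji, ri in zip(J, r)) / sum(Ji * Ji for Ji in J)
    theta += eps

print(theta)   # converges to -0.5
```

Each iteration re-evaluates the residuals and Jacobians at the current estimate, which is exactly the re-linearization step that makes smoothing methods accurate but costly.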

Page 41

MAP: visual inertial formulation

States:

X_Rk = [ p_WB(k); q_WB(k); v_WB(k); b_a(k); b_g(k) ]
X_L = [ L_W1; L_W2; ...; L_WL ]

Combined:  X = [ X_R1, X_R2, ..., X_Rk, X_L ]

Dynamics:  x_k = f(x_{k−1})
Measurements:  z_ik = h(x_k, L_i)

Dynamics Jacobians:
• IMU integration w.r.t. x_{k−1}
• Residual w.r.t. x_k

Measurement Jacobians (same as in the filtering method):
• Feature position w.r.t. pose
• Feature position w.r.t. 3D coordinates

Page 42

Fixed-lag smoothing: Basic Idea

Recall MAP estimation:

ε* = −(Jᵀ J)⁻¹ Jᵀ r(θ)

Jᵀ J is also called the Hessian matrix.

The Hessian for full bundle adjustment is n x n, where n is the number of all the states (poses, velocities | landmarks).

If only part of the states are of interest, can we think of a way to simplify?

Page 43

Fixed-lag smoothing: Marginalization

Schur complement, for a block matrix

M = | A  B |
    | C  D |

D − C A⁻¹ B   (Schur complement of A in M)
A − B D⁻¹ C   (Schur complement of D in M)

Reduced linear system:

| A  B | | x1 |   | b1 |
| C  D | | x2 | = | b2 |

Multiplying on the left by

| I       0 |
| −C A⁻¹  I |

gives

| A       B       | | x1 |   | b1  |
| 0  D − C A⁻¹ B  | | x2 | = | b̃2 |,    with  b̃2 = b2 − C A⁻¹ b1

We can then just solve for x2, and (optionally) solve for x1 by back substitution.
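A scalar sketch of the Schur-complement reduction above, with all blocks 1x1 for readability and illustrative numbers; the recovered (x1, x2) is verified against the full 2x2 system.

```python
# Illustrative 2x2 system [[A, B], [C, D]] [x1; x2] = [b1; b2]
A, B, C, D = 4.0, 1.0, 2.0, 3.0
b1, b2 = 5.0, 4.0

D_schur = D - C * (1.0 / A) * B    # Schur complement of A in M
b2_red = b2 - C * (1.0 / A) * b1   # reduced right-hand side
x2 = b2_red / D_schur              # solve the reduced system for x2
x1 = (b1 - B * x2) / A             # optional back substitution for x1
print(x1, x2)

# Verify against the full system
assert abs(A * x1 + B * x2 - b1) < 1e-12
assert abs(C * x1 + D * x2 - b2) < 1e-12
```

In a sliding-window estimator, x1 plays the role of the old states and landmarks being marginalized out, while x2 is the window of states kept in the optimization.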

Page 44

Fixed-lag smoothing: Marginalization

Generalized Schur complement

Any principal submatrix can be marginalized: select n rows and n columns of the same indices (i.e., select any states to marginalize).

For a singular submatrix, use a generalized inverse (e.g., the Moore-Penrose pseudoinverse).

Special structure of SLAM

Marginalization causes fill-in: the sparse structure is no longer maintained.

The inverse D⁻¹ of a (block-)diagonal matrix is very efficient to calculate.

Page 45

Fixed-lag smoothing: Implementation

States and formulations are similar to MAP estimation.

Which states to marginalize?

Old states: keep a window of recent frames

Landmarks: structureless

Marginalizing states vs. dropping the states

Dropping the states: loss of information, not optimal

Marginalization: optimal if there is no linearization error, but introduces fill-in, causing performance penalty

Therefore, dropping states is also used to trade accuracy for speed.

Leutenegger, Stefan, et al. "Keyframe-based visual–inertial odometry using nonlinear optimization."

Page 46

Fixed-lag smoothing: OKVIS

Leutenegger, OKVIS: Open Keyframe-based Visual-Inertial SLAM, IJRR'15

Page 47

MAP: why it is slow

Re-linearization

Need to recalculate the Jacobian for each iteration

But it is also an important reason why MAP is accurate

The number of states is large

Remedy: fixed-lag smoothing and marginalization (see above)

Re-integration of IMU measurements

The integration from k to k+1 is related to the state estimation at time k

Preintegration:
Lupton, Todd, and Salah Sukkarieh. "Visual-inertial-aided navigation for high-dynamic motion in built environments without initial conditions."
Forster, Christian, et al. "IMU preintegration on manifold for efficient visual-inertial maximum-a-posteriori estimation."

Page 48

MAP: IMU Preintegration

Standard: evaluate the errors between the predicted and estimated states in the global frame:

e_R = R̂(ω, R_{k−1})ᵀ R_k
e_v = v̂(ω, a, v_{k−1}) − v_k
e_p = p̂(ω, a, p_{k−1}) − p_k

Repeat the integration when the previous state changes!

Preintegration: compress the IMU measurements into relative deltas,

{ω, a}  →  ΔR̃, Δṽ, Δp̃

and evaluate the relative errors:

e_R = ΔR̃ᵀ ΔR
e_v = Δṽ − Δv
e_p = Δp̃ − Δp

Preintegration of the IMU deltas is possible with no initial condition required.

Page 49

Solves the same optimization problem but:

Keeps all the frames (from the start of the trajectory)

To make the optimization efficient

it makes the graph sparser using keyframes

pre-integrates the IMU data between keyframes

Optimization solved using factor graphs (GTSAM)

Very fast because it only optimizes the poses which are affected by a new observation

Forster, Carlone, Dellaert, Scaramuzza, On-Manifold Preintegration for Real-Time Visual-Inertial Odometry,IEEE Transactions on Robotics (TRO), Feb. 2017, Best Paper Award 2018.

Full Smoothing: SVO+GTSAM & IMU Pre-integration

{X, L} = argmin_{X, L}  Σ_{k=1}^{N} || f(x_{k−1}, u) − x_k ||²_{Λk}  +  Σ_{k=1}^{N} Σ_{i=1}^{M} || π(x_k, l_i) − z_ik ||²_{Σik}

(first sum: IMU residuals; second sum: reprojection residuals)

Page 50

Forster, Carlone, Dellaert, Scaramuzza, On-Manifold Preintegration for Real-Time Visual-Inertial Odometry,IEEE Transactions on Robotics, Feb. 2017.

Full Smoothing: SVO+GTSAM & IMU Pre-integration

Page 51

SVO + IMU Preintegration

(Figure: accuracy comparison of the proposed method with Google ARCore and OKVIS)

Accuracy: 0.1% of the travel distance

Forster, Carlone, Dellaert, Scaramuzza, On-Manifold Preintegration for Real-Time Visual-Inertial Odometry,IEEE Transactions on Robotics, Feb. 2017.

Page 52

Recap

Closed form solution:

for 6DOF motion, both s and v0 can be determined from 1 feature observation and at least 3 views [Martinelli, TRO'12, IJCV'14, RAL'16]

Can be used to initialize filters and smoothers

Filters: update only last state → fast if number of features is low (~20)

[Mourikis, ICRA’07, CVPR’08], [Jones, IJRR’11] [Kottas, ISER’12][Bloesch, IROS’15] [Wu et al., RSS’15], [Hesch, IJRR’14], [Weiss, JFR’13]

Open source: ROVIO [Bloesch, IROS’15, IJRR’17], MSCKF [Mourikis, ICRA’07] (i.e., Google ARCore)

Fixed-lag smoothers: update a window of states → slower but more accurate

[Mourikis, CVPR’08] [Sibley, IJRR’10], [Dong, ICRA’11], [Leutenegger, RSS’13-IJRR’15]

Open source: OKVIS [Leutenegger, RSS’13-IJRR’15]

Full-smoothing methods: update entire history of states → slower but more accurate

[Jung, CVPR’01] [Sterlow’04] [Bryson, ICRA’09] [Indelman, RAS’13] [Patron-Perez, IJCV’15] [Forster, RSS’15, TRO’16]

Open source: SVO+IMU [Forster, TRO’17]

Page 53

Open Problem: consistency

Filters

Linearization around different values of the same variable may lead to error

Smoothing methods

May get stuck in local minima


Page 54

Outline

Introduction

IMU model and Camera-IMU system

Different paradigms

Closed-form solution

Filtering approaches

Smoothing methods

- Fixed-lag Smoothing (aka sliding window estimators)

- Full smoothing methods

Camera-IMU extrinsic calibration and Synchronization

Page 55

Camera-IMU calibration

Goal: estimate the rigid-body transformation T_BC and the delay t_d between a camera and an IMU that are rigidly attached. Assume that the camera has already been intrinsically calibrated.

Data:

Image points of detected calibration pattern (checkerboard).

IMU measurements: accelerometer {𝑎𝑘} and gyroscope {𝜔𝑘}.

Furgale et al. "Unified Temporal and Spatial Calibration for Multi-Sensor Systems“, IROS’13.

Page 56

Camera-IMU calibration - Example

Data acquisition: move the sensor in front of a static calibration pattern, exciting all degrees of freedom and trying to make smooth motions.

(Figure: recorded gyroscope and accelerometer measurements, and images with points extracted from the calibration pattern)

Page 57

Camera-IMU calibration

Approach: minimize a cost function (Furgale'13):

J(θ) := J_feat + J_acc + J_gyro + J_bias_acc + J_bias_gyro

where
J_feat: feature reprojection error
J_acc = Σ_k || a_IMU(t_k − t_d) − a_Cam(t_k) ||²
J_gyro = Σ_k || ω_IMU(t_k − t_d) − ω_Cam(t_k) ||²
J_bias_acc = ∫ || db_acc/dt (u) ||² du,   J_bias_gyro = ∫ || db_gyro/dt (u) ||² du

Unknowns: T_BC, t_d, g_W, T_WB(t), b_acc(t), b_gyro(t)

- g_W = gravity
- T_WB(t) = 6-DOF trajectory of the IMU
- b_acc(t), b_gyro(t) = 3-DOF biases of the IMU

Continuous-time modelling using splines for T_WB(t), b_acc(t), ...

Numerical solver: Levenberg-Marquardt (i.e., damped Gauss-Newton).

Furgale et al. "Unified Temporal and Spatial Calibration for Multi-Sensor Systems", IROS'13.

Page 58

Software solution: Kalibr (Furgale'13).

Generates a report after optimizing the cost function.

Furgale et al. "Unified Temporal and Spatial Calibration for Multi-Sensor Systems“, IROS’13.https://github.com/ethz-asl/kalibr/wiki/camera-imu-calibration

Camera-IMU calibration - Example

Residuals:

Reprojection error [px]: 0.0976 0.051

Gyroscope error [rad/s]: 0.0167 0.009

Accelerometer error [m/s^2]: 0.0595 0.031

Transformation T_ci: (imu to cam):

[[ 0.99995526 -0.00934911 -0.00143776 0.00008436]

[ 0.00936458 0.99989388 0.01115983 0.00197427]

[ 0.00133327 -0.0111728 0.99993669 -0.05054946]

[ 0. 0. 0. 1. ]]

Time shift (delay d)

cam0 to imu0: [s] (t_imu = t_cam + shift)

0.00270636270255

Gravity vector in target coords: [m/s^2]

[ 0.04170719 -0.01000423 -9.80645621]

(Figures: reprojection error, angular velocity error, and estimated gyro biases over time)

Page 59

Understanding Check

Are you able to answer the following questions?

Why is it recommended to use an IMU for Visual Odometry?

Why not just an IMU?

How does a MEMS IMU work?

What is the drift of an industrial IMU?

What is the IMU measurement model?

What causes the bias in an IMU?

How do we model the bias?

How do we integrate the acceleration to get the position formula?

What is the definition of loosely coupled and tightly coupled visual inertial fusion?

How can we use non-linear optimization-based approaches to solve for visual inertial fusion?