Human Pose Tracking I: Basicsfleet/courses/cifarSchool09/slidesFleet1.pdf · Bayesian tracking....

46
Human Pose Tracking I: Basics David Fleet University of Toronto CIFAR Summer School, 2009

Transcript of Human Pose Tracking I: Basicsfleet/courses/cifarSchool09/slidesFleet1.pdf · Bayesian tracking....

Page 1: Human Pose Tracking I: Basicsfleet/courses/cifarSchool09/slidesFleet1.pdf · Bayesian tracking. IEEE Trans Signal Processing 50(2), 2002. Blake, A., Visual tracking. In Mathematical

Human Pose Tracking I: BasicsDavid FleetUniversity of Toronto

CIFAR Summer School, 2009

Page 2: Human Pose Tracking I: Basicsfleet/courses/cifarSchool09/slidesFleet1.pdf · Bayesian tracking. IEEE Trans Signal Processing 50(2), 2002. Blake, A., Visual tracking. In Mathematical

Looking at People

Page 3: Human Pose Tracking I: Basicsfleet/courses/cifarSchool09/slidesFleet1.pdf · Bayesian tracking. IEEE Trans Signal Processing 50(2), 2002. Blake, A., Visual tracking. In Mathematical

Challenges: Complex pose / motion

People have many degrees of freedom, comprising an articulated

skeleton overlaid with soft tissue and deformable clothing.

Page 4: Human Pose Tracking I: Basicsfleet/courses/cifarSchool09/slidesFleet1.pdf · Bayesian tracking. IEEE Trans Signal Processing 50(2), 2002. Blake, A., Visual tracking. In Mathematical

Challenges: Complex movements

People move in complex ways, often communicating with subtle gestures

Page 5: Human Pose Tracking I: Basicsfleet/courses/cifarSchool09/slidesFleet1.pdf · Bayesian tracking. IEEE Trans Signal Processing 50(2), 2002. Blake, A., Visual tracking. In Mathematical

Challenges: Appearance, size and shape

People come in all shapes and sizes, with highly variable appearance.

Page 6: Human Pose Tracking I: Basicsfleet/courses/cifarSchool09/slidesFleet1.pdf · Bayesian tracking. IEEE Trans Signal Processing 50(2), 2002. Blake, A., Visual tracking. In Mathematical

Challenges: Appearance variability

Image appearance changes dramatically over time due to non-rigidity of body and clothing and lighting.

Page 7: Human Pose Tracking I: Basicsfleet/courses/cifarSchool09/slidesFleet1.pdf · Bayesian tracking. IEEE Trans Signal Processing 50(2), 2002. Blake, A., Visual tracking. In Mathematical

Challenges: Appearance variability

Image appearance changes dramatically over time due to non-rigidity of body and clothing and lighting.

Page 8: Human Pose Tracking I: Basicsfleet/courses/cifarSchool09/slidesFleet1.pdf · Bayesian tracking. IEEE Trans Signal Processing 50(2), 2002. Blake, A., Visual tracking. In Mathematical

Challenges: Context dependence

Perceived scene context influences object recognition.

[Courtesy of Antonio Torralba]

Page 9: Human Pose Tracking I: Basicsfleet/courses/cifarSchool09/slidesFleet1.pdf · Bayesian tracking. IEEE Trans Signal Processing 50(2), 2002. Blake, A., Visual tracking. In Mathematical

Challenges: Noisy and missing measurements

Ambiguities in pose are commonplace, due to background clutter apparent similarity of parts occlusions loose clothing …

Page 10: Human Pose Tracking I: Basicsfleet/courses/cifarSchool09/slidesFleet1.pdf · Bayesian tracking. IEEE Trans Signal Processing 50(2), 2002. Blake, A., Visual tracking. In Mathematical

Challenges: Depth and reflection ambiguities

image 3D model (camera view)

3D model (top view)

Multiple 3D poses may be consistent with a given image.

[courtesy of Cristian Sminchisescu]

Page 11: Human Pose Tracking I: Basicsfleet/courses/cifarSchool09/slidesFleet1.pdf · Bayesian tracking. IEEE Trans Signal Processing 50(2), 2002. Blake, A., Visual tracking. In Mathematical

Model-based pose tracking

Video input 3D articulated model

Page 12: Human Pose Tracking I: Basicsfleet/courses/cifarSchool09/slidesFleet1.pdf · Bayesian tracking. IEEE Trans Signal Processing 50(2), 2002. Blake, A., Visual tracking. In Mathematical

Outline

Introduction

Bayesian Filtering

Kinematic Motion Models

Discriminative Pose Estimation

Physics-Based Motion Models

Concluding remarks

Page 13: Human Pose Tracking I: Basicsfleet/courses/cifarSchool09/slidesFleet1.pdf · Bayesian tracking. IEEE Trans Signal Processing 50(2), 2002. Blake, A., Visual tracking. In Mathematical

Bayesian Filtering

State: n-vector comprising variables to be inferred:

- continuous variables [eg., position, velocity, shape, size, …]

- discrete state variables [eg., # objects, gender, activity, … ]

- state history:

Observations: data from which we estimate state:

- observation history: z1:t = (z1, ..., zt)

st

s1:t = (s1, ..., st)

zt = f(st)

Page 14: Human Pose Tracking I: Basicsfleet/courses/cifarSchool09/slidesFleet1.pdf · Bayesian tracking. IEEE Trans Signal Processing 50(2), 2002. Blake, A., Visual tracking. In Mathematical

Bayesian Filtering

Posterior distribution over states conditioned on observations

likelihood prior

independent of state

p(s1:t | z1:t)

Bayes’ rule:

p(s1:t | z1:t) =p(z1:t | s1:t) p(s1:t)

p(z1:t)

Filtering distribution: marginal posterior at current time

p(st | z1:t) =!

s1

!

st!1

p(s1:t | z1:t)

Page 15: Human Pose Tracking I: Basicsfleet/courses/cifarSchool09/slidesFleet1.pdf · Bayesian tracking. IEEE Trans Signal Processing 50(2), 2002. Blake, A., Visual tracking. In Mathematical

Simplifying model assumptions

1st-order Markov model for state dynamics:

likelihood at time τ

joint likelihood

sequence prior

one-step transition

prior

p(st | s1:t!1) = p(st | st!1)

so

p(s1:t) =

!

"t#

j=2

p(sj | sj!1)

$

% p(s1)

=t!

!=1

p(z! | s! )

Conditional independence of observations

p(z1:t | s1:t) = p(zt | st) p(z1:t!1 | s1:t!1)

Page 16: Human Pose Tracking I: Basicsfleet/courses/cifarSchool09/slidesFleet1.pdf · Bayesian tracking. IEEE Trans Signal Processing 50(2), 2002. Blake, A., Visual tracking. In Mathematical

Recursive form of filtering/posterior distribution

Filtering distribution:

likelihood

p(st | z1:t) =!

s1

· · ·!

st!1

p(s1:t | z1:t)

Prediction distribution (temporal prior):

p(st | z1:t!1) =!

st!1

p(st | st!1) p(st!1 | z1:t!1)

= c p(zt | st) p(st | z1:t!1)

Page 17: Human Pose Tracking I: Basicsfleet/courses/cifarSchool09/slidesFleet1.pdf · Bayesian tracking. IEEE Trans Signal Processing 50(2), 2002. Blake, A., Visual tracking. In Mathematical

Bayesian smoothing

Inverting the dynamics permits inference backwards in time:

Batch Algorithms (smoothing): Estimation of state sequences using the entire observation sequence (i.e., all past, present & future data):

- optimal and efficient but not always applicable

Online Algorithms (filtering): Casual estimation of occurs as observations become available, using present and past data only.

xt

p(s! | z! :t) = c p(z! | s! )!

s!+1

p(s! | s!+1) p(s!+1 | z!+1:t)

= c p(z! | s! ) p(s! | z!+1:t)

Smoothing distribution (forward-backward belief propagation):

p(s! | z1:t) =c

p(s! )p(z! | s! ) p(s! | z1:!!1) p(s! | z!+1:t)

Page 18: Human Pose Tracking I: Basicsfleet/courses/cifarSchool09/slidesFleet1.pdf · Bayesian tracking. IEEE Trans Signal Processing 50(2), 2002. Blake, A., Visual tracking. In Mathematical

Kalman filter

Assume linearity and Gaussianity for the observation and dynamical models:

st = A st!1 + !d !d ! N (0, Cd)

zt = M st + !m !m ! N (0, Cm)

Key Result: Prediction and filtering distributions are Gaussian, so they may be represented by sufficient statistics:

p(st | z1:t!1) =!

st!1

p(st | st!1) p(st!1 | z1:t!1) ! N (s!t , C!t )

p(st | z1:t) = c p(zt | st) p(st | z1:t!1) ! N (s+t , C+

t )

Page 19: Human Pose Tracking I: Basicsfleet/courses/cifarSchool09/slidesFleet1.pdf · Bayesian tracking. IEEE Trans Signal Processing 50(2), 2002. Blake, A., Visual tracking. In Mathematical

Depiction of filtering

posterior

predictionposterior

deterministic drift

stochastic diffusion

s s

ss

p(st!1 | z1:t!1)

p(st | z1:t!1)p(st | z1:t) incorporate data

zt

Page 20: Human Pose Tracking I: Basicsfleet/courses/cifarSchool09/slidesFleet1.pdf · Bayesian tracking. IEEE Trans Signal Processing 50(2), 2002. Blake, A., Visual tracking. In Mathematical

Kalman filter

First well-known uses in computer vision:

Road following by tracking lane markers [Dickmanns & Graefe, “Dynamic monocular machine vision.” Machine Vision and Applications, 1988]

Rigid structure from feature tracks under perspective projection [Broida et al., “Recursive estimation of 3D motion from monocular image sequence.” IEEE Trans. Aerosp. & Elec. Sys., 1990]

Page 21: Human Pose Tracking I: Basicsfleet/courses/cifarSchool09/slidesFleet1.pdf · Bayesian tracking. IEEE Trans Signal Processing 50(2), 2002. Blake, A., Visual tracking. In Mathematical

Multimodal likelihood functions

Measurement clutter and occlusion often cause multimodal likelihoods.

[Khan et al, CVPR ‘04]

Page 22: Human Pose Tracking I: Basicsfleet/courses/cifarSchool09/slidesFleet1.pdf · Bayesian tracking. IEEE Trans Signal Processing 50(2), 2002. Blake, A., Visual tracking. In Mathematical

Nonlinear dynamics

[Jepson et al, WSL Tracker, PAMI, 2001]

Object motion and interactions between objects often produce complex nonlinear dynamics (so Gaussianity is not preserved)

Page 23: Human Pose Tracking I: Basicsfleet/courses/cifarSchool09/slidesFleet1.pdf · Bayesian tracking. IEEE Trans Signal Processing 50(2), 2002. Blake, A., Visual tracking. In Mathematical

Approximate inference

Coping with multimodal, non-Gaussian distributions

Optimization (to find MAP solution)

- e.g., WSL tracker

Monte Carlo approximations

Page 24: Human Pose Tracking I: Basicsfleet/courses/cifarSchool09/slidesFleet1.pdf · Bayesian tracking. IEEE Trans Signal Processing 50(2), 2002. Blake, A., Visual tracking. In Mathematical

WSL tracker

Goal: Tracking with precise alignment over long times

Problem: Changing appearance and unmodeled deformations

Key: Use ‘stable’ properties of appearance for tracking

Page 25: Human Pose Tracking I: Basicsfleet/courses/cifarSchool09/slidesFleet1.pdf · Bayesian tracking. IEEE Trans Signal Processing 50(2), 2002. Blake, A., Visual tracking. In Mathematical

WSL tracker

Page 26: Human Pose Tracking I: Basicsfleet/courses/cifarSchool09/slidesFleet1.pdf · Bayesian tracking. IEEE Trans Signal Processing 50(2), 2002. Blake, A., Visual tracking. In Mathematical

Monte Carlo inference (Particle filters)

Approximate the filtering distribution using point samples:

By drawing a set of random samples from the filtering distribution, we could use samples statistics to approximate expectations

Let be a set of fair samples from distribution , then for functions

ES [f(s)] ! 1N

N!

j=1

f(s(j)) "# EP [f(s)]N!"

P(s)NS = {s(j)}f(s)

Problem: we don’t know how to draw samples from p(st | z1:t)

Page 27: Human Pose Tracking I: Basicsfleet/courses/cifarSchool09/slidesFleet1.pdf · Bayesian tracking. IEEE Trans Signal Processing 50(2), 2002. Blake, A., Visual tracking. In Mathematical

Importance sampling

weighted samples

Page 28: Human Pose Tracking I: Basicsfleet/courses/cifarSchool09/slidesFleet1.pdf · Bayesian tracking. IEEE Trans Signal Processing 50(2), 2002. Blake, A., Visual tracking. In Mathematical

Importance sampling

Weighted sample set S = {s(j), w(j)}

P(s)

Q(s) draw samples from a proposal distribution , with weights , then

s(j)

w(j) = w(s(j))

ES [f(s)] !N!

j=1

w(j)f(s(j)) "# EQ [w(s) f(s)]N!"

If then weighted sample statistics approximate expectations under , i.e.,

w(s) = P(s)/Q(s)

EQ [w(s) f(s)] =!

w(s) f(s)Q(s) ds

=!

f(s)P(s) ds

= EP [f(s)]

Page 29: Human Pose Tracking I: Basicsfleet/courses/cifarSchool09/slidesFleet1.pdf · Bayesian tracking. IEEE Trans Signal Processing 50(2), 2002. Blake, A., Visual tracking. In Mathematical

Particle filter

Simple particle filter approximates the filtering distribution by drawing samples from the prediction distribution:

p(st | z1:t) = c p(zt | st) p(st | z1:t!1)

w(s) =P(s)Q(s)

Q(s)P(s)

Page 30: Human Pose Tracking I: Basicsfleet/courses/cifarSchool09/slidesFleet1.pdf · Bayesian tracking. IEEE Trans Signal Processing 50(2), 2002. Blake, A., Visual tracking. In Mathematical

Particle filter

Simple particle filter approximates the filtering distribution by drawing samples from the prediction distribution:

[Gordon et al ’93; Isard & Blake ’98; Liu & Chen ’98, …]

p(st | z1:t) = c p(zt | st) p(st | z1:t!1)

posterior posteriortemporaldynamics

likelihood

sample sample normalize

p(zt | st)p(st | st!1) p(st | z1:t)p(st!1 | z1:t!1)

With resampling at each time step:

Page 31: Human Pose Tracking I: Basicsfleet/courses/cifarSchool09/slidesFleet1.pdf · Bayesian tracking. IEEE Trans Signal Processing 50(2), 2002. Blake, A., Visual tracking. In Mathematical

Particle filter

Given a weighted sample set , the prediction distribution is a mixture model

S = {s(j)t!1, w(j)

t!1}

p(st | z1:t!1) =N!

j=1

w(j) p(st | s(j)t!1)

To draw samples from it:- sample a component of the mixture by the treating weights as

mixing probabilities

- then sample from the associated dynamics pdf p(st | s(i)t!1)

sample1

01

Cumulative distribution of weights

u

i N

u!U(0, 1)

Page 32: Human Pose Tracking I: Basicsfleet/courses/cifarSchool09/slidesFleet1.pdf · Bayesian tracking. IEEE Trans Signal Processing 50(2), 2002. Blake, A., Visual tracking. In Mathematical

Particle filter

weighted sample set

re-sample & drift

diffuse & re-sample

compute likelihoods

weighted sample set

[Isard and Blake, IJCV ’98]

Page 33: Human Pose Tracking I: Basicsfleet/courses/cifarSchool09/slidesFleet1.pdf · Bayesian tracking. IEEE Trans Signal Processing 50(2), 2002. Blake, A., Visual tracking. In Mathematical

Lessons learned: Sampling efficiency

3D Kinematic Model(28D state: 22 joint angles, 6 global DOFs)

[Choo & Fleet, ICCV ‘01]

Page 34: Human Pose Tracking I: Basicsfleet/courses/cifarSchool09/slidesFleet1.pdf · Bayesian tracking. IEEE Trans Signal Processing 50(2), 2002. Blake, A., Visual tracking. In Mathematical

Likelihood and dynamics

Xj

sGiven the state, , and the articulated model, the 3D marker positions onto the 2D image plane:

dj(s) = Tj(Xj ; s)

Observation model:

d̂j = dj + !j , !j ! N (0;"2mI2)

where is isotropic Gaussian for translational & angular variables

Smooth dynamics:st = st!1 + !t

!t

Likelihood of observed 2D locations, :D = {d̂j}

p(D | s) ! exp

!

"" 12!2

m

#

j

|| d̂j " dj(s) ||2$

%

Page 35: Human Pose Tracking I: Basicsfleet/courses/cifarSchool09/slidesFleet1.pdf · Bayesian tracking. IEEE Trans Signal Processing 50(2), 2002. Blake, A., Visual tracking. In Mathematical

Performance

Estimator Variance: multiple runs with independent noise & sampling variance measured as MSE from ground truth (from MCMC)

Page 36: Human Pose Tracking I: Basicsfleet/courses/cifarSchool09/slidesFleet1.pdf · Bayesian tracking. IEEE Trans Signal Processing 50(2), 2002. Blake, A., Visual tracking. In Mathematical

Problem: Sampling efficiency

Number of samples needed depends on the effective volumes (entropies) of the prediction and posterior distributions.

With random sampling from the prediction density, the number of particles grow exponentially for samples to fall on states with high posterior. E.g., for D-dim spheres, with radii R and r,

effective number of ‘independent’ samples: Ne = 1/!

j

(w(j))2

Prediction

PosteriorN >

!R

r

"D

Page 37: Human Pose Tracking I: Basicsfleet/courses/cifarSchool09/slidesFleet1.pdf · Bayesian tracking. IEEE Trans Signal Processing 50(2), 2002. Blake, A., Visual tracking. In Mathematical

Hybrid Monte Carlo filter

Improved sampling through MCMC:

initialize a set of particles from a particle filter

select a subset from which to initiate MCMC with stochastic gradient search (hybrid Monte Carlo)

optimistic extrapolant

HMC filter particle filter

fixed variance

[Choo & Fleet, ICCV ‘01]

Page 38: Human Pose Tracking I: Basicsfleet/courses/cifarSchool09/slidesFleet1.pdf · Bayesian tracking. IEEE Trans Signal Processing 50(2), 2002. Blake, A., Visual tracking. In Mathematical

Mean estimates on independent trials

Black: Ground truth (at frame 10)Red: Mean state from 6 random trials

Particle Filter Hybrid MC Filter

Page 39: Human Pose Tracking I: Basicsfleet/courses/cifarSchool09/slidesFleet1.pdf · Bayesian tracking. IEEE Trans Signal Processing 50(2), 2002. Blake, A., Visual tracking. In Mathematical

Lessons learned: Effective proposals

If proposal and target distributions differ significantly, then:

most particle weights are near zero, and some modes get no samples, so the normalization constant c can be wildly wrong.

Prediction distributions make poor proposals: dynamics are often uncertain, and likelihoods are often peaked.

Q = p(st | z1:t!1)

Use the current observation to improve proposals.

Let be a continuous distribution obtained from some detector that yields target locations (e.g., a Gaussian mixture).

Then, just modify the proposal density and importance weights:

D(zt)

with w(st) =c p(zt | st)D(zt)

Q = D(zt) p(st | z1:t!1)

Page 40: Human Pose Tracking I: Basicsfleet/courses/cifarSchool09/slidesFleet1.pdf · Bayesian tracking. IEEE Trans Signal Processing 50(2), 2002. Blake, A., Visual tracking. In Mathematical

Lessons learned: Proper likelihoods

p(I | s) !!

y!Dfpf (I(y) | s)

!y!Db

pb(I(y))!

y pb(I(y))

=

!y!Df

pf (I(y) | s)!

y!Dbpb(I(y))

!y!Df

pb(I(y) | s)!

y!Dbpb(I(y))

="

y!Df

pf (I(y) | s)pb(I(y))

Do not compare states using different sets of observations. Explain the entire image or use likelihood ratios.

E.g., let pixels intensities, conditioned on state be independent where and are disjoint sets of foreground and background pixels, and and are the respective likelihood functions.

Divide by the background likelihood of all pixels (i.e., as if no target is present):

Df Db

pbpf

p(I | s)

Page 41: Human Pose Tracking I: Basicsfleet/courses/cifarSchool09/slidesFleet1.pdf · Bayesian tracking. IEEE Trans Signal Processing 50(2), 2002. Blake, A., Visual tracking. In Mathematical

Lessons Learned: Use the right state space

Despite the potential to approximate multimodal posteriors, tracking multi targets with a single target state space is unadvised.

Page 42: Human Pose Tracking I: Basicsfleet/courses/cifarSchool09/slidesFleet1.pdf · Bayesian tracking. IEEE Trans Signal Processing 50(2), 2002. Blake, A., Visual tracking. In Mathematical

Finding occlusion boundaries

Motion boundaries yield information about position and orientation of surface boundaries, and about relative surface depths

Page 43: Human Pose Tracking I: Basicsfleet/courses/cifarSchool09/slidesFleet1.pdf · Bayesian tracking. IEEE Trans Signal Processing 50(2), 2002. Blake, A., Visual tracking. In Mathematical

Finding occlusion boundaries

Estimation of smooth motion and occlusion boundaries on hybrid random fields with non-parametric Bayesian inference.

[Nestares and Fleet, CVPR ‘01]

Page 44: Human Pose Tracking I: Basicsfleet/courses/cifarSchool09/slidesFleet1.pdf · Bayesian tracking. IEEE Trans Signal Processing 50(2), 2002. Blake, A., Visual tracking. In Mathematical

Finding lips and lip reading

Probabilistic detection, tracking, and recognition of motion events in video, with learned models of image motion.

[Fleet, Black, Yacoob and Jepson, IJCV 2000]

Page 45: Human Pose Tracking I: Basicsfleet/courses/cifarSchool09/slidesFleet1.pdf · Bayesian tracking. IEEE Trans Signal Processing 50(2), 2002. Blake, A., Visual tracking. In Mathematical

Human pose tracking

Estimate the three-dimensional structure of people from video, with constraints on their shape, size, & motion.

[Sidenbladh, Black and Fleet, ECCV 2000]

Page 46: Human Pose Tracking I: Basicsfleet/courses/cifarSchool09/slidesFleet1.pdf · Bayesian tracking. IEEE Trans Signal Processing 50(2), 2002. Blake, A., Visual tracking. In Mathematical

Selected references

Arulampalam, M. et al., A tutorial on particle filters for online nonlinear/non-Gaussian

Bayesian tracking. IEEE Trans Signal Processing 50(2), 2002.

Blake, A., Visual tracking. In Mathematical Models for Computer Vision, Paragios, Chen and Faugeras (ed), Springer 2005

Choo K. and Fleet, D., People tracking using Hybrid Monte Carlo filtering. Proc IEEE ICCV, 2001

Doucet, A. et al, On Sequential Monte Carlo sampling methods for Bayesian filtering. Stats and Computing 10, 2000

Gordon N. et al., Novel approach to nonlinear/non-Gaussian Bayesian state estimation. IEE Proceedings-F, 140(2), 1993

Isard M. & Blake A., Condensation: Conditional density propagation. IJCV 29, 1998

Jepson A. et al, Robust online appearance models for visual tracking. IEEE Trans PAMI 25(10), 2003

Khan Z. et al, A Rao-Blackwellized particle filter for Eigentracking. Proc IEEE CVPR 2004

Liu J and Chen, R. Sequential Monte Carlo methods for dynamic systems. JASA 93, 1998