Online Multi-camera Tracking with a Switching State-Space Model


Online Multi-camera Tracking with a Switching State-Space Model

Wojciech Zajdel, A. Taylan Cemgil, and Ben Kröse

ICPR 2004

Yk = {Ok, Dk} …


Ok = description of the observation (a colour vector in this case) – assumed noisy

Dk = camera number and observation time – assumed noise-free

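To make the observation structure concrete, here is a minimal Python sketch of one observation Yk = {Ok, Dk}; the class and field names are illustrative, not from the paper.

from dataclasses import dataclass
import numpy as np

@dataclass
class Observation:
    o: np.ndarray   # Ok: appearance descriptor (a colour vector here), assumed noisy
    camera: int     # Dk, part 1: camera number, assumed noise-free
    time: float     # Dk, part 2: observation time, assumed noise-free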

Networks of Non-Overlapping Cameras

Appearance is a “Noisy Observation”

• Assume observed appearance is a random sample from some distribution of possible (probable) appearances of an object.

• Represent this as a latent variable with mean and covariance: Xk = {mk, Vk}

• We have a prior over X, the parameters of this model (a Normal-Inverse-Wishart distribution)
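A hedged Python sketch of this generative appearance model: draw Xk = {mk, Vk} from a Normal-Inverse-Wishart prior, then draw an observed appearance Ok around mk. The hyperparameter values below are placeholders, not the paper's.

import numpy as np
from scipy.stats import invwishart, multivariate_normal

d = 3                                 # dimension of the colour vector
mu0 = np.zeros(d)                     # prior mean for mk
kappa0, nu0 = 1.0, d + 2              # prior strength / degrees of freedom
Psi0 = np.eye(d)                      # prior scale matrix

V = invwishart.rvs(df=nu0, scale=Psi0)        # Vk ~ Inverse-Wishart(nu0, Psi0)
m = multivariate_normal.rvs(mu0, V / kappa0)  # mk ~ N(mu0, Vk / kappa0)
o = multivariate_normal.rvs(m, V)             # Ok ~ N(mk, Vk): one noisy appearance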

Appearance Model

Tracking is Just Association

• Tracking is just associating our Ds (camera, time) with a particular object, i.e.

{D1(n), D2(n), D3(n), …}

defines a sequence of observations of “person n” over time. We also represent this information (redundantly) as:

Sk, i.e. the label of the person to which observation Yk is assigned.

N.B. For K observations there is a maximum of K possible people! (i.e. we don’t know the people, but define potential people by each new observation)

But how many actual people are there?

• We’ve said before the maximum for a sequence of K observations is K people.

• Ck is the actual number of trajectories (people); Ck ≤ K

• Related concept: Zk = index of the last time person k was observed (can be NULL if this is the first time the person was observed)
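A small Python illustration of this bookkeeping; the variable names and the example label sequence are invented for illustration.

# s[k] = Sk: label of the person observation k is assigned to
# c    = Ck: number of distinct trajectories so far (c <= K)
# z[n] = Z-style pointer: index of the last observation of person n
s = [0, 1, 0, 2, 1]          # example assignment of K = 5 observations
c = len(set(s))              # 3 actual people out of at most 5
z = {}
for k, label in enumerate(s):
    prev = z.get(label)      # None the first time this person is seen (NULL)
    z[label] = k             # update the last-seen index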

Camera Network Topology

• Topology defines valid (or likely) paths through the network, defined (in a Markov-like way) as:

P(Di+1(n) | Di(n))

• i.e. the probability that observation Di+1 results from object n, given observation Di does.

• In this paper it is uniform over possible paths (and 0 for impossible paths), but others have done more complex things.
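A minimal Python sketch of such a uniform topology prior; the example camera adjacency is invented, not the network from the paper.

# Which cameras can plausibly follow which (an invented 3-camera chain).
adjacency = {0: [1], 1: [0, 2], 2: [1]}

def transition_prob(cam_from, cam_to):
    # P(next camera | current camera): uniform over reachable cameras, 0 otherwise
    reachable = adjacency[cam_from]
    return 1.0 / len(reachable) if cam_to in reachable else 0.0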

A Predictive Model

• Rather than search through the space of possible associations and optimise some fitness measure, it is sometimes easier to define a predictive model and work backwards from the observations to estimate probabilities over the association variables:

Hk = {Sk, Ck, Zk(1), …, Zk(k)}, i.e. the association variables

[Graphical model figure: association variables Hk, appearance distributions Xk, observations Yk]

Tracking as Filtering

• Once we have a predictive model we can “filter” data on a predict/observe/update cycle.

• This is (in some sense) an alternative to searching through possible latent variable values to maximise the posterior probability (e.g. MCMC as introduced by Krishna)…

• Usually only tractable under simplifying conditions, e.g.

Kalman filter: Gaussian probabilities
Particle filter: probabilities represented as a finite number of samples
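A minimal sketch of that cycle as generic Python; predict_step and update_step are placeholders for the model-specific computations, not functions from the paper.

def run_filter(observations, belief, predict_step, update_step):
    for y in observations:
        belief = predict_step(belief)     # predict: push the belief forward in time
        belief = update_step(belief, y)   # observe + update: condition on the new y
    return belief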

Predictive Model: Predict Step

Predictive density (i.e. without considering Yk, the latest observation):

• Current associations given previous: defined a priori from the topology

• Joint probability of latent variables (i.e. the unknowns)* (*possibly should be conditioned on past observations?)

• Current appearance given previous appearance and associations: defined based on the appearance of a person not changing, and on sampling new people from a prior

• From the previous iteration (N.B. t0 is easy, as there are no people)
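A plausible reconstruction of the predict-step equation these labels annotate (the standard filtering form; the paper’s exact notation may differ):

p(Xk, Hk | Y1:k-1) = Σ over Hk-1 of ∫ p(Hk | Hk-1) · p(Xk | Xk-1, Hk) · p(Xk-1, Hk-1 | Y1:k-1) dXk-1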

Filtered density

• Prediction (from the previous slide), i.e. probability of the latent variables

• Probability of the observation given the latent parameters (i.e. associations + appearances)

• Normalising factor

• Probability of the latent variables, given the current observation

N.B. Latent variables H are discrete, whereas the variables X are continuous
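A plausible reconstruction of the update step these labels annotate (a standard application of Bayes’ rule; the paper’s notation may differ):

p(Xk, Hk | Y1:k) = p(Yk | Xk, Hk) · p(Xk, Hk | Y1:k-1) / p(Yk | Y1:k-1)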

BUT: Result is a mixture of O(k!) density functions => intractable

How to filter?

• If all latent variables were discrete (which they are not) we could maintain probabilities for all combinations of latent variable values (but this might be a lot!)

• We could use something like a particle filter to approximate the densities (others have done this, but this is not what these guys have done)

Their Solution

• Reformulate the filtered density using an approximation that is more tractable.

i.e. rather than maintaining a distribution over all of H (the possible associations – potentially quite a big set), a set of simpler distributions is maintained over S, C and Z at the current step (remember: S = the label of observation k, C = no. of trajectories, Z = last observation time).

The product of these simpler distributions approximates the true filtered density

Can go back to the original problem (i.e. estimating the complete H) by finding the product of marginals (more later!)

• Labels and count at the current step (discrete)

• Appearance (continuous, but assumed to have a “simple” distribution)

• Time of last observation of k (discrete)
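A plausible form of the factorised approximation these three pieces describe (assumed-density filtering; the paper’s exact grouping of the factors may differ):

p(Hk, Xk | Y1:k) ≈ q(Sk, Ck) · Πn q(Zk(n)) · Πn q(Xk(n))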

Their Solution – Presented Differently (Technical Report Version)

ftp://ftp.wins.uva.nl/pub/computer-systems/aut-sys/reports/IAS-UVA-04-03.pdf

(same thing, slightly different notation)

N.B. Appearance is conditioned on theta (the same form as the parameters of the prior on appearance – an “Inverse Wishart density”).

An Aside: Marginals and Product of Marginals

• Imagine a joint density over two variables x and y: p(x, y)

[Figure: 2-D histogram of the joint P(X,Y) over axes X and Y]

• If variables x and y are (reasonably) independent, then we can “marginalise” over one of the variables (or the other) by summing over all values.

[Figure: separate 1-D histograms of the marginals P(X) and P(Y); the joint needs n*m bins, the two marginals only n+m bins]

=> We’ve removed the dependency and can work with them separately…

• We can then go back to the original representation by taking the product for each pair of values of x and y:

• p(x, y) ≈ p(x) · p(y) (equality holds only if x and y are truly independent)
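A short numeric illustration of this in Python (invented toy numbers, just to show the mechanics):

import numpy as np

p_xy = np.random.rand(4, 5)
p_xy /= p_xy.sum()              # a joint P(X,Y) over 4 * 5 = 20 bins

p_x = p_xy.sum(axis=1)          # marginal P(X): sum over y (4 bins)
p_y = p_xy.sum(axis=0)          # marginal P(Y): sum over x (5 bins)

p_approx = np.outer(p_x, p_y)   # product of marginals: 4 + 5 = 9 stored numbers
# p_approx equals p_xy only when X and Y are independent; otherwise it is
# exactly the kind of approximation the filter above relies on.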

Marginals and Product of Marginals

Results

• Method compared to:
i) MCMC (similar idea to Krishna’s presentation last week)
ii) Multiple Hypothesis Tracking (i.e. a hypothesis-pruning based method)

• It does better (the others over-estimate the number of trajectories)

Drawbacks … and a solution

• K grows with the number of observations, and memory usage is O(K²), although complexity is only O(K) [I think]

• Pruning is used to keep this down (removing the candidates least likely to be a trajectory end point), as sketched below
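A minimal sketch of that pruning idea in Python; the candidate list, score function and budget M are all illustrative, not from the paper.

def prune(candidates, score, M=50):
    # Keep only the M candidates most likely to be a trajectory end point;
    # everything else is dropped to bound memory.
    return sorted(candidates, key=score, reverse=True)[:M]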

Summary

There is more than one way to skin a cat:

• ADF (assumed-density filtering; this paper) – Approximating the problem, solving exactly

• MCMC – Exact problem, but approximating the solution (stochastic)

• MHT - Exact problem, but approximating the solution (via hypothesis pruning)