
The Laplacian Eigenmaps Latent Variable Model

with applications to articulated pose tracking

Miguel Á. Carreira-Perpiñán

EECS, UC Merced

http://faculty.ucmerced.edu/mcarreira-perpinan

Articulated pose tracking

We want to extract the 3D pose of a moving person (e.g. 3D positions of several markers located on body joints) from monocular video:

From the CMU motion-capture database, http://mocap.cs.cmu.edu

Idea: model patterns of human motion using motion-capture data (also useful in psychology, biomechanics, etc.)


Articulated pose tracking (cont.)

Some applications:

❖ recognising orientation (e.g. front/back), activities (running, walking. . . ), identity, sex

❖ computer graphics: rendering a graphics model of the person (from different viewpoints)

❖ entertainment: realistic animation of cartoon characters in movies and computer games

Difficult because:

❖ ambiguity of perspective projection 3D → 2D (depth loss)

❖ self-occlusion of body parts

❖ noise in image, clutter

❖ high-dimensional space of poses: this makes it hard to track (e.g. in a Bayesian framework)


Articulated pose tracking (cont.)

❖ Pose = 3D positions of 30+ markers located on body joints: vector y ∈ R^D (D ∼ 100)

❖ Intrinsic pose x ∈ R^L with L ≪ D:

✦ markers’ positions are correlated because of physical constraints (e.g. elbow and wrist are always ∼ 30 cm apart)

✦ so the poses y1, y2, . . . live in a low-dimensional manifold with dimension L ≪ D

Articulatory inversion

The problem of recovering the sequence of vocal tract shapes (lips, tongue, etc.) that produce a given acoustic utterance.

[Figure: an acoustic signal and the unknown (?) articulatory configurations that produced it]

A long-standing problem in speech research (but solved effortlessly by humans).

Articulatory inversion (cont.)

Applications:

❖ speech coding

❖ speech recognition

❖ real-time visualisation of vocal tract (e.g. for speech production studies or for language learning)

❖ speech therapy (e.g. assessment of dysarthria)

❖ etc.

Difficult because:

❖ different vocal tract shapes can produce the same acoustics

❖ high-dimensional space of vocal-tract shapes—but, again, low-dimensional intrinsic manifold because of physical constraints

Articulatory inversion (cont.)

Data collection:

❖ electromagnetic articulography (EMA) or X–ray microbeam: record 2D positions along the midsagittal plane of several pellets located on tongue, lips, velum, etc.

[Figures: X–ray microbeam database (U. Wisconsin); MOCHA database (U. Edinburgh & QMUC)]

❖ Other techniques being developed: ultrasound, MRI, etc.


Visualisation of blood test analytes

❖ One blood sample yields 20+ analytes (glucose, albumin, Na+, LDL, . . . )

❖ The 2D map places normal and abnormal samples in different regions.

❖ Extreme values of certain analytes are potentially associated with diseases (glucose: diabetes; urea nitrogen and creatinine: kidney disease; total bilirubin: liver).

[Figure: 2D maps for all data, inpatients, outpatients and normal samples, and for samples with glucose > 200, urea nitrogen > 50, creatinine > 2, total bilirubin > 5]

Visualisation of blood test analytes (cont.)

❖ The temporal trajectories (over a period of days) for different patients indicate their evolution.

❖ Also useful to identify bad samples, e.g. due to machine malfunction.

[Figure: trajectories on the 2D map for Inpatient 345325 and Outpatient 482892]

Kazmierczak, Leen, Erdogmus & Carreira-Perpiñán, Clinical Chemistry and Laboratory Medicine, 2007

Dimensionality reduction (manifold learning)

Given a high-dimensional data set Y = {y1, . . . , yN} ⊂ R^D, assumed to lie near a low-dimensional manifold of dimension L ≪ D, learn (estimate):

❖ Dimensionality reduction mapping F : y → x

❖ Reconstruction mapping f : x → y

[Figure: latent low-dimensional space R^L with coordinates (x1, x2) and observed high-dimensional space R^D with coordinates (y1, y2, y3); the data lie on a manifold, F maps a point y to F(y) in the latent space, and f maps a latent point x to f(x) on the manifold]

Dimensionality reduction (cont.)

Two large classes of nonlinear methods:

❖ Latent variable models: probabilistic, mappings, local optima, scale badly with dimension

❖ Spectral methods: not probabilistic, no mappings, global optimum, scale well with dimension

They have developed separately so far, and have complementary advantages and disadvantages.

Our new method, LELVM, shares the advantages of both.


Latent variable models (LVMs)

Probabilistic methods that learn a joint density model p(x, y) from the training data Y. This yields:

❖ Marginal densities p(y) = ∫ p(y|x) p(x) dx and p(x)

❖ Mapping F(y) = E{x|y}, the mean of p(x|y) = p(y|x) p(x) / p(y) (Bayes’ th.)

❖ Mapping f(x) = E{y|x}

Several types:

❖ Linear LVMs: probabilistic PCA, factor analysis, ICA. . .

❖ Nonlinear LVMs:

✦ Generative Topographic Mapping (GTM) (Bishop et al., NECO 1998)

✦ Generalised Elastic Net (GEN) (Carreira-Perpiñán et al., 2005)

Latent variable models (cont.)

Nonlinear LVMs are very powerful in principle:

❖ can represent nonlinear mappings

❖ can represent multimodal densities

❖ can deal with missing data

But in practice they have disadvantages:

❖ The objective function has many local optima, most of which yield very poor manifolds

❖ Computational cost grows exponentially with latent dimension L, so this limits L ≲ 3. Reason: need to discretise the latent space to compute p(y) = ∫ p(y|x) p(x) dx

This has limited their use in certain applications.


Spectral methods

❖ Very popular recently in machine learning: multidimensional scaling, Isomap, LLE, Laplacian eigenmaps, etc.

❖ Essentially, they find latent points x1, . . . , xN such that distances in X correlate well with distances in Y. Example: draw a map of the US given city-to-city distances

distance(ym, yn) =

[  0   3.0  1.2  . . . ]
[ 3.0   0   2.7  . . . ]
[ 1.2  2.7   0   . . . ]
[ . . . . . . . . . . ]

=⇒ map with embedded points x1, x2, x3, . . .

We focus on Laplacian eigenmaps.

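To make the city-to-city example concrete, here is a minimal sketch of classical multidimensional scaling, which recovers low-dimensional coordinates from a matrix of pairwise distances; the function name classical_mds and the use of NumPy are illustrative choices, not part of the original slides' material.

```python
# Minimal sketch: classical multidimensional scaling (MDS).
# Given an N x N matrix of pairwise distances, recover L-dimensional
# coordinates whose Euclidean distances approximate them.
import numpy as np

def classical_mds(dist, L=2):
    N = dist.shape[0]
    J = np.eye(N) - np.ones((N, N)) / N        # centring matrix
    B = -0.5 * J @ (dist ** 2) @ J             # double-centred squared distances
    evals, evecs = np.linalg.eigh(B)           # eigenvalues in ascending order
    idx = np.argsort(evals)[::-1][:L]          # keep the top-L eigenpairs
    return evecs[:, idx] * np.sqrt(np.maximum(evals[idx], 0))

# Example with the three pairwise distances shown above.
D = np.array([[0.0, 3.0, 1.2],
              [3.0, 0.0, 2.7],
              [1.2, 2.7, 0.0]])
X = classical_mds(D)                           # rows are x1, x2, x3
```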

Spectral methods: Laplacian eigenmaps (Belkin & Niyogi, NECO 2003)

1. Neighborhood graph on dataset y1, . . . , yN with weighted edges wmn = exp(−½ ‖(ym − yn)/σ‖²).

2. Set up the quadratic optimisation problem

   min_X tr(X L Xᵀ)   s.t.   X D Xᵀ = I,   X D 1 = 0

   with X = (x1, . . . , xN), affinity matrix W = (wmn), degree matrix D = diag(∑_{n=1}^N wnm) and graph Laplacian L = D − W.

   Intuition: tr(X L Xᵀ) = ½ ∑_{n∼m} wnm ‖xn − xm‖² =⇒ place xn, xm nearby if yn and ym are similar. The constraints fix the location and scale of X.

3. Solution: eigenvectors V = (v2, . . . , vL+1) of D^(−1/2) W D^(−1/2), which yield the low-dimensional points X = Vᵀ D^(−1/2).

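A minimal sketch of these three steps in NumPy, assuming a dense Gaussian affinity over all pairs rather than the sparse k-nearest-neighbour graph used in practice; the function name laplacian_eigenmaps is illustrative.

```python
# Minimal sketch of Laplacian eigenmaps with a full Gaussian affinity matrix.
import numpy as np

def laplacian_eigenmaps(Y, L=2, sigma=1.0):
    """Y: N x D data matrix. Returns the N x L embedding (rows are the x_n)."""
    # Affinities w_mn = exp(-||y_m - y_n||^2 / (2 sigma^2)).
    sq = ((Y[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
    W = np.exp(-sq / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    d = W.sum(axis=1)                          # degrees
    Dm12 = np.diag(1.0 / np.sqrt(d))
    M = Dm12 @ W @ Dm12                        # D^{-1/2} W D^{-1/2}
    evals, evecs = np.linalg.eigh(M)           # eigenvalues in ascending order
    V = evecs[:, -(L + 1):-1][:, ::-1]         # v_2, ..., v_{L+1} (skip the constant one)
    return Dm12 @ V                            # rows are the embedded points x_n
```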

Spectral methods: Laplacian eigenmaps (cont.)

Example: Swiss roll (from Belkin & Niyogi, NECO 2003)

[Figure: Swiss-roll data in the high-dimensional space Y = (y1, . . . , yN) and its unrolled embedding in the low-dimensional space X = (x1, . . . , xN)]

Spectral methods: Laplacian eigenmaps (cont.)

❖ Advantages:

✦ No local optima (unique solution)

✦ Yet succeeds with nonlinear, convoluted manifolds (if the neighborhood graph is good)

✦ Computational cost O(N³) or, for sparse graphs, O(N²)

✦ Can use any latent space dimension L (just use L eigenvectors)

❖ Disadvantages:

✦ No mapping for points not in Y = (y1, . . . , yN) or X = (x1, . . . , xN) (out-of-sample mapping)

✦ No density p(x, y)

☞ What should the mappings and densities be for unseen points (not in the training set)?

The Laplacian Eigenmaps Latent Variable Model

(Carreira-Perpiñán & Lu, AISTATS 2007)

Natural way to embed unseen points Yu = (yN+1, . . . , yN+M) without perturbing the points Ys = (y1, . . . , yN) previously embedded:

min_{Xu ∈ R^{L×M}}  tr( (Xs Xu) [Lss Lsu; Lus Luu] (Xs Xu)ᵀ )

That is, solve the LE problem but subject to keeping Xs fixed.

Semi-supervised learning point of view: labelled data (Xs, Ys) (real-valued labels), unlabelled data Yu, graph prior on Y = (Ys, Yu).

☞ Solution: Xu = −Xs Lsu Luu⁻¹.

In particular, to embed a single unseen point y = Yu ∈ R^D, we obtain

x = F(y) = ∑_{n=1}^N [ K((y − yn)/σ) / ∑_{n′=1}^N K((y − yn′)/σ) ] xn.

This gives an out-of-sample mapping F(y) for Laplacian eigenmaps.

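A minimal sketch of the solution Xu = −Xs Lsu Luu⁻¹, assuming the relevant blocks of the graph Laplacian over (Ys, Yu) have already been computed; the function name embed_unseen is illustrative.

```python
# Minimal sketch: embed unseen points with Xu = -Xs Lsu Luu^{-1}.
import numpy as np

def embed_unseen(Xs, Lsu, Luu):
    """Xs: L x N embedding of the training points.
    Lsu: N x M Laplacian block (seen rows, unseen columns).
    Luu: M x M Laplacian block (unseen rows and columns).
    Returns Xu: L x M embedding of the unseen points."""
    # Solve Xu Luu = -Xs Lsu instead of forming Luu^{-1} explicitly.
    return -np.linalg.solve(Luu.T, (Xs @ Lsu).T).T
```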

LELVM (cont.)

Further, we can define a joint probability model on x and y (thus a LVM) that is consistent with that mapping:

p(x, y) = (1/N) ∑_{n=1}^N Ky((y − yn)/σy) Kx((x − xn)/σx)

p(y) = (1/N) ∑_{n=1}^N Ky((y − yn)/σy)        p(x) = (1/N) ∑_{n=1}^N Kx((x − xn)/σx)

F(y) = ∑_{n=1}^N [ Ky((y − yn)/σy) / ∑_{n′=1}^N Ky((y − yn′)/σy) ] xn = ∑_{n=1}^N p(n|y) xn = E{x|y}

f(x) = ∑_{n=1}^N [ Kx((x − xn)/σx) / ∑_{n′=1}^N Kx((x − xn′)/σx) ] yn = ∑_{n=1}^N p(n|x) yn = E{y|x}

The densities are kernel density estimates, the mappings are Nadaraya-Watson estimators (all nonparametric).

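A minimal sketch of the mappings F(y) = E{x|y} and f(x) = E{y|x}, assuming Gaussian kernels Kx, Ky; the function names and array layout are illustrative assumptions.

```python
# Minimal sketch of the LELVM mappings with Gaussian kernels.
import numpy as np

def gauss_kernel(U):
    # Unnormalised isotropic Gaussian kernel evaluated on each row of U.
    return np.exp(-0.5 * (U ** 2).sum(axis=1))

def F(y, Y, X, sigma_y):
    """Dimensionality reduction y in R^D -> x in R^L.
    Y: N x D training points, X: N x L their embedding."""
    w = gauss_kernel((Y - y) / sigma_y)        # Ky((y - y_n)/sigma_y)
    w /= w.sum()                               # responsibilities p(n|y)
    return w @ X                               # E{x|y}

def f(x, X, Y, sigma_x):
    """Reconstruction x in R^L -> y in R^D."""
    w = gauss_kernel((X - x) / sigma_x)        # Kx((x - x_n)/sigma_x)
    w /= w.sum()                               # p(n|x)
    return w @ Y                               # E{y|x}
```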

LELVM (cont.)

All the user needs to do is set:

❖ the graph parameters for Laplacian eigenmaps (as usual)

❖ σx, σy to control the smoothness of mappings & densities

Advantages: those of latent variable models and spectral methods:

❖ yields mappings (nonlinear, infinitely differentiable and based on a global coordinate system)

❖ yields densities (potentially multimodal)

❖ no local optima

❖ succeeds with convoluted manifolds

❖ can use any dimension

❖ computational efficiency O(N3) or O(N2) (sparse graph)

Disadvantages: it relies on the success of Laplacian eigenmaps (which depends on the graph).

LELVM example: spiral

Dataset: spiral in 2D; reduction to 1D.

[Figure: for the spiral dataset, the latent density p(x) and reconstruction mapping f(x) for σx = 5·10⁻⁵, 1.5·10⁻⁴, 2.5·10⁻⁴, 4.5·10⁻⁴, 1.5·10⁻³, compared with GTM; and the data-space density p(y) for σy = 0.05, 0.1, 0.15, 0.2, 0.25]

LELVM example: motion-capture dataset

[Figure: 2D latent embeddings of the motion-capture dataset obtained with LELVM, GTM, GPLVM and GPLVM with back-constraints]

LELVM example: mocap dataset (cont.)

Smooth interpolation (e.g. for animation):

[Figure: an interpolating path in the 2D LELVM latent space of the mocap dataset]

LELVM example: mocap dataset (cont.)

Reconstruction of missing patterns (e.g. due to occlusion) using p(x|yobs) and the mode-finding algorithms of Carreira-Perpiñán, PAMI 2000, 2007:

[Figure: latent space and reconstructed poses; left: observed partial pose yobs and reconstructed pose y; right: yobs with two reconstructions, y (mode 1) and y (mode 2)]

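A hedged sketch of this reconstruction step, assuming Gaussian kernels: conditioning the LELVM joint density on the observed coordinates yields a Gaussian mixture over x, and a fixed-point (mean-shift) iteration finds one of its modes. The function name and the simple iteration below are illustrative, not the exact mode-finding algorithms of the PAMI papers.

```python
# Hedged sketch: reconstruct a pose from partial observations y_obs by finding
# a mode of p(x | y_obs) (a Gaussian mixture) with mean-shift, then mapping
# the mode back to data space with f(x) = E{y|x}.
import numpy as np

def reconstruct_pose(y_obs, obs_idx, Y, X, sigma_y, sigma_x, n_iter=100):
    # Mixture weights p(n | y_obs) computed from the observed coordinates only.
    w = np.exp(-0.5 * (((Y[:, obs_idx] - y_obs) / sigma_y) ** 2).sum(axis=1))
    w /= w.sum()
    x = w @ X                                  # start at the conditional mean E{x | y_obs}
    for _ in range(n_iter):                    # mean-shift (fixed-point) updates
        r = w * np.exp(-0.5 * (((X - x) / sigma_x) ** 2).sum(axis=1))
        x = (r / r.sum()) @ X
    # Full pose from the latent mode via the reconstruction mapping f(x).
    rx = np.exp(-0.5 * (((X - x) / sigma_x) ** 2).sum(axis=1))
    return (rx / rx.sum()) @ Y
```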

LELVM application: people tracking (Lu, Carreira-Perpiñán & Sminchisescu, NIPS 2007)

The probabilistic nature of LELVM allows seamless integration in a Bayesian framework for nonlinear, nongaussian tracking (particle filters):

❖ At time t: observation zt, unobserved state st = (dt, xt): rigid motion d, intrinsic pose x

❖ Prediction: p(st|z0:t−1) = ∫ p(st|st−1) p(st−1|z0:t−1) dst−1

❖ Correction: p(st|z0:t) ∝ p(zt|st) p(st|z0:t−1)

We use the Gaussian mixture Sigma-point particle filter (v.d. Merwe & Wan, ICASSP 2003).

❖ Dynamics: p(st|st−1) ∝ pd(dt|dt−1) px(xt|xt−1) p(xt), where pd(dt|dt−1) and px(xt|xt−1) are Gaussian and p(xt) is the LELVM latent density

❖ Observation model: p(zt|st) given by a 2D tracker with Gaussian noise and a mapping from state to observations: x ∈ R^L is mapped by f (LELVM) to y ∈ R^3M, combined (⊕) with the rigid motion d ∈ R^3, and projected (perspective) to z ∈ R^2M

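A minimal sketch of the prediction/correction recursion with a plain bootstrap particle filter; the slides use the Gaussian mixture Sigma-point particle filter, so this simpler variant only illustrates the structure, and sample_dynamics and observation_likelihood are placeholders for the dynamics and observation models above.

```python
# Bootstrap particle-filter sketch of one prediction/correction step.
# sample_dynamics(s, rng) draws from p(s_t | s_{t-1});
# observation_likelihood(z, s) evaluates p(z_t | s_t).
import numpy as np

def particle_filter_step(particles, weights, z_t,
                         sample_dynamics, observation_likelihood, rng):
    """particles: P x S array of states s = (d, x); weights: P normalised weights."""
    P = len(particles)
    idx = rng.choice(P, size=P, p=weights)                                 # resample
    particles = particles[idx]
    particles = np.array([sample_dynamics(s, rng) for s in particles])     # prediction
    weights = np.array([observation_likelihood(z_t, s) for s in particles])  # correction
    weights /= weights.sum()
    return particles, weights
```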

LELVM application: people tracking (cont.)

❖ 3D mocap, small training set

❖ 3D mocap, occlusion

❖ 3D mocap, front view

❖ CMU mocap database

❖ Fred turning

❖ Fred walking

LELVM: summary

❖ Probabilistic method for dimensionality reduction

❖ Natural, principled way of combining two large classes of methods (latent variable models and spectral methods), sharing the advantages of both. We think it is asymptotically consistent (N → ∞).

Same idea applicable to out-of-sample extensions for LLE, Isomap, etc.

❖ Very simple to implement in practice: training set + eigenvalue problem + kernel density estimate

❖ Useful for applications:

✦ Priors for articulated pose tracking with multiple motions (walking, dancing. . . ), multiple people

✦ Low-dim. repr. of state spaces in reinforcement learning

✦ Low-dim. repr. of degrees of freedom in humanoid robots

✦ Visualisation of high-dim. datasets, with uncertainty estimates