
Principal Manifolds and Probabilistic Subspaces for Visual Recognition

Baback Moghaddam, TPAMI, June 2002

John Galeotti, Advanced Perception

February 12, 2004

It’s all about subspaces

Traditional subspaces: PCA, ICA, Kernel PCA (& neural network NLPCA)

Probabilistic subspaces

Linear PCA

We already know this. Main properties:

Approximate reconstruction: x ≈ Φy
Orthonormality of the basis: Φᵀ Φ = I
Decorrelated principal components: E[y_i y_j] = 0 for i ≠ j
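As a reminder of what these properties look like in practice, here is a minimal numpy sketch (with hypothetical random data standing in for image vectors) that checks the reconstruction, orthonormality, and decorrelation properties; the matrix `Phi` below plays the role of the basis Φ.

```python
import numpy as np

# Hypothetical data: 500 samples in 50 dimensions (stand-in for image vectors)
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 50)) @ rng.normal(size=(50, 50))
Xc = X - X.mean(axis=0)                      # center the data

# PCA via eigendecomposition of the sample covariance
cov = Xc.T @ Xc / (len(Xc) - 1)
eigvals, eigvecs = np.linalg.eigh(cov)
order = np.argsort(eigvals)[::-1]
Phi = eigvecs[:, order[:10]]                 # top M = 10 principal eigenvectors

Y = Xc @ Phi                                 # principal components y
X_hat = Y @ Phi.T                            # approximate reconstruction x ≈ Φy

print(np.allclose(Phi.T @ Phi, np.eye(10)))  # orthonormal basis: ΦᵀΦ = I
C_y = np.cov(Y, rowvar=False)
print(np.allclose(C_y, np.diag(np.diag(C_y))))  # decorrelated: E[y_i y_j] ≈ 0, i ≠ j
```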

Linear ICA

Like PCA, but the components’ distribution is designed to be sub/super-Gaussian (statistical independence)

Main properties:

Approximate reconstruction: x ≈ Ay
Nonorthogonality of the basis A: Aᵀ A ≠ I
Near factorization of the joint distribution: P(y) ≈ ∏ p(y_i)
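For comparison, a hedged sketch using scikit-learn's FastICA (not necessarily the ICA variant used in the paper) on hypothetical mixed super-Gaussian sources; it recovers a non-orthogonal mixing matrix A with x ≈ Ay.

```python
import numpy as np
from sklearn.decomposition import FastICA

# Hypothetical non-Gaussian sources mixed linearly (stand-in for image data)
rng = np.random.default_rng(0)
S = rng.laplace(size=(1000, 3))            # super-Gaussian independent sources
A_true = rng.normal(size=(3, 3))
X = S @ A_true.T                           # observed mixtures

ica = FastICA(n_components=3, random_state=0)
Y = ica.fit_transform(X)                   # estimated independent components y
A = ica.mixing_                            # estimated (non-orthogonal) basis A

X_hat = Y @ A.T + ica.mean_                # approximate reconstruction x ≈ Ay
print(np.allclose(A.T @ A, np.eye(3)))     # generally False: AᵀA ≠ I
```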

Nonlinear PCA (NLPCA)

AKA principal curves
Essentially nonlinear regression
Finds a curved subspace passing “through the middle of the data”

Nonlinear PCA (NLPCA)

Main properties:

Nonlinear projection: y = f(x)
Approximate reconstruction: x ≈ g(y)
No prior knowledge regarding the joint distribution of the components (typical): P(y) = ?

Two main methods:
Neural network encoder
Kernel PCA (KPCA)

NLPCA neural network encoder

Trained to match the output to the input
Uses a “bottleneck” layer to force a lower-dimensional representation
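A minimal sketch of the bottleneck idea, using scikit-learn's MLPRegressor as a stand-in for the encoder network (not the specific architecture from the paper): the network is trained to reproduce its own input through a narrow hidden layer.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# Hypothetical data: 64-dim vectors generated from 5 latent degrees of freedom
rng = np.random.default_rng(0)
Z = rng.normal(size=(2000, 5))
X = np.tanh(Z @ rng.normal(size=(5, 64)))      # nonlinearly embedded in 64-D

# Encoder-bottleneck-decoder: 64 -> 32 -> 5 -> 32 -> 64, trained so output ≈ input
net = MLPRegressor(hidden_layer_sizes=(32, 5, 32), activation="tanh",
                   max_iter=2000, random_state=0)
net.fit(X, X)                                  # match the output to the input

X_hat = net.predict(X)                         # x ≈ g(f(x))
print(np.mean((X - X_hat) ** 2))               # reconstruction error
```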

KPCA

Similar to kernel-based nonlinear SVM
Maps data to a higher-dimensional space in which linear PCA is applied
Nonlinear input mapping Φ(x): ℝ^N → ℝ^L, N < L
Covariance is computed with dot products
For economy, make Φ(x) implicit: k(x_i, x_j) = ( Φ(x_i) · Φ(x_j) )
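A short sketch with scikit-learn's KernelPCA on hypothetical circular data; only the kernel value k(x_i, x_j) is ever computed, so Φ(x) stays implicit. The RBF kernel and gamma here are illustrative choices, not the paper's settings.

```python
import numpy as np
from sklearn.decomposition import KernelPCA

# Hypothetical data: points on a noisy circle, which linear PCA cannot flatten
rng = np.random.default_rng(0)
theta = rng.uniform(0, 2 * np.pi, size=500)
X = np.c_[np.cos(theta), np.sin(theta)] + 0.05 * rng.normal(size=(500, 2))

# Linear PCA in the implicit feature space Φ(x), via the kernel trick
kpca = KernelPCA(n_components=2, kernel="rbf", gamma=2.0,
                 fit_inverse_transform=True)
Y = kpca.fit_transform(X)                  # nonlinear principal components
X_hat = kpca.inverse_transform(Y)          # approximate pre-image reconstruction
```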

KPCA

Does not require nonlinear optimization
Is not subject to overfitting
Requires no prior knowledge of network architecture or number of dimensions
Requires the (unprincipled) selection of an “optimal” kernel and its parameters

Nearest-neighbor recognition

Find the labeled image most similar to the N-dim input vector using a suitable M-dim subspace

Similarity ex: S(I_1, I_2) ∝ ||∆||⁻¹, ∆ = I_1 − I_2

Observation: two types of image variation
Critical: images of different objects
Incidental: images of the same object under different lighting, surroundings, etc.

Problem: the preceding subspace projections do not help distinguish variation type when calculating similarity
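A minimal sketch of this baseline matcher (array names are hypothetical): rank labeled images by ||∆|| after an optional subspace projection, which is equivalent to ranking by S ∝ ||∆||⁻¹.

```python
import numpy as np

def nearest_neighbor_label(query, gallery, labels, basis=None):
    """Return the label of the gallery image most similar to `query`,
    where similarity S(I_1, I_2) ∝ ||∆||⁻¹ with ∆ = I_1 - I_2.
    `basis` (optional) is an N x M subspace projection matrix."""
    if basis is not None:                      # compare in the M-dim subspace
        query = query @ basis
        gallery = gallery @ basis
    deltas = gallery - query                   # ∆ for every labeled image
    dists = np.linalg.norm(deltas, axis=1)     # ||∆||; smallest ⇔ most similar
    return labels[np.argmin(dists)]
```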

Probabilistic similarity

Similarity based on the probability that ∆ is characteristic of incidental variations
∆ = image-difference vector (N-dim)
Ω_I = incidental (intrapersonal) variations
Ω_E = critical (extrapersonal) variations

S(∆) = P(Ω_I | ∆) = P(∆ | Ω_I) P(Ω_I) / [ P(∆ | Ω_I) P(Ω_I) + P(∆ | Ω_E) P(Ω_E) ]

Probabilistic similarity

Likelihoods P(∆|Ω) are estimated using subspace density estimation

Priors P(Ω) are set to reflect specific operating conditions (often uniform)

Two images are of the same object if P(Ω_I|∆) > P(Ω_E|∆), i.e., S(∆) > 0.5

S(∆) = P(Ω_I | ∆) = P(∆ | Ω_I) P(Ω_I) / [ P(∆ | Ω_I) P(Ω_I) + P(∆ | Ω_E) P(Ω_E) ]
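A small sketch of the decision rule above; `like_I` and `like_E` are placeholders for the subspace density estimates P(∆|Ω_I) and P(∆|Ω_E) described on the next slides.

```python
def probabilistic_similarity(delta, like_I, like_E, prior_I=0.5, prior_E=0.5):
    """S(∆) = P(Ω_I | ∆) via Bayes' rule.
    `like_I(delta)` and `like_E(delta)` return P(∆|Ω_I) and P(∆|Ω_E),
    e.g. the marginalized subspace densities described next."""
    num = like_I(delta) * prior_I
    den = num + like_E(delta) * prior_E
    return num / den

# Decision rule: same object if S(∆) > 0.5, i.e. P(Ω_I|∆) > P(Ω_E|∆)
```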

Subspace density estimation

Necessary for each P(∆|Ω), Ω ∈ {Ω_I, Ω_E}
Perform PCA on training sets of ∆ for each Ω
The covariance matrix ∑ will define a Gaussian

Two subspaces:
F = M-dimensional principal subspace of ∑
F̄ = non-principal subspace orthogonal to F

y_i = ∆ projected onto the principal eigenvectors
λ_i = ranked eigenvalues

Non-principal eigenvalues are typically unknown and are estimated by fitting a function of the form f⁻ⁿ to the known eigenvalues

Subspace density estimation

ε²(∆) = PCA residual (reconstruction error)
ρ = density in the non-principal subspace F̄ ≈ average of the (estimated) F̄ eigenvalues

P(∆|Ω) is marginalized into each subspace:
The marginal density is exact in F
The marginal density is approximate in F̄
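A hedged sketch of this two-part density in the spirit of Moghaddam's estimator: an exact Gaussian marginal over the y_i in F, multiplied by an isotropic Gaussian in F̄ parameterized by ε²(∆) and ρ. Variable names are mine; in practice one would work with log densities to avoid underflow.

```python
import numpy as np

def marginalized_density(delta, V, lam, rho, N):
    """Two-part estimate of P(∆|Ω): exact Gaussian marginal in the
    M-dim principal subspace F, isotropic Gaussian approximation in F̄.
    V: N x M principal eigenvectors, lam: M eigenvalues λ_i,
    rho: average of the estimated F̄ eigenvalues, N: full dimensionality."""
    M = V.shape[1]
    y = V.T @ delta                              # projection onto F
    eps2 = delta @ delta - y @ y                 # ε²(∆): PCA residual in F̄
    # Exact marginal density in F
    p_F = np.exp(-0.5 * np.sum(y**2 / lam)) / (
        (2 * np.pi) ** (M / 2) * np.prod(np.sqrt(lam)))
    # Approximate marginal density in F̄
    p_Fbar = np.exp(-eps2 / (2 * rho)) / (2 * np.pi * rho) ** ((N - M) / 2)
    return p_F * p_Fbar
```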

Efficient similarity computation

After doing PCA, use a whitening transform, y = Λ^(-1/2) Vᵀ x, to preprocess each labeled image into a single vector of whitened coefficients for each of the principal subspaces, where Λ and V are the matrices of the principal eigenvalues and eigenvectors of either ∑_I or ∑_E

At run time, apply the same whitening transform to the input image

Efficient similarity computation

The whitening transform reduces the marginal Gaussian calculations in the principal subspaces F to simple Euclidean distances

The denominators are easy to precompute
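A minimal sketch of this precomputation, assuming the whitening form y = Λ^(-1/2) Vᵀ x above: once the gallery images are whitened, the exponent Σ_i y_i²/λ_i for any ∆ reduces to a squared Euclidean distance between whitened coefficient vectors.

```python
import numpy as np

def whiten(images, V, lam):
    """Whitening transform w = Λ^(-1/2) Vᵀ x for each image (rows of `images`),
    using the principal eigenvectors V and eigenvalues lam of ∑_I or ∑_E."""
    return (images @ V) / np.sqrt(lam)           # one coefficient vector per image

# Precompute once for the labeled gallery:  W_gallery = whiten(gallery, V_I, lam_I)
# At run time, whiten the query the same way; the Mahalanobis-like exponent
#   Σ_i y_i² / λ_i   for ∆ = query - labeled
# becomes the squared Euclidean distance ||w_query - w_labeled||².
```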

Efficient similarity computation

Further speedup can be gained by using a maximum likelihood (ML) rule instead of a maximum a posteriori (MAP) rule:

Typically, ML is only a few percent less accurate than MAP, but ML is twice as fast
In general, Ω_E seems less important than Ω_I
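A sketch of the ML shortcut, under the assumption that it simply drops the extrapersonal model: matches are ranked by P(∆|Ω_I) alone, so only one density (and one whitening) is evaluated per comparison.

```python
def ml_similarity(delta, like_I):
    """Maximum-likelihood similarity: rank by P(∆|Ω_I) only,
    ignoring the extrapersonal model Ω_E (roughly half the work of MAP)."""
    return like_I(delta)
```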

Similarity Comparison

[Figure: Eigenface (PCA) similarity vs. probabilistic similarity]

Experiments

21x12 low-res faces, aligned and normalized
5-fold cross-validation
~140 unique individuals per subset
No overlap of individuals between subsets, to test generalization performance
80% of the data only determines the subspace(s)
20% of the data is divided into labeled images and query images for nearest-neighbor testing
Subspace dimensions = d = 20, chosen so PCA is ~80% accurate

Experiments

KPCA:
Empirically tweaked Gaussian, polynomial, and sigmoidal kernels
The Gaussian kernel performed the best, so it is used in the comparison

MAP:
Even split of the 20 subspace dimensions: M_E = M_I = d/2 = 10, so that M_E + M_I = 20

Results

Recognition accuracy (percent)

[Table: recognition accuracy of N-dimensional nearest neighbor (no subspace) vs. the subspace methods]

Results

Recognition accuracy vs. subspace dimensionality

Note: data split 50/50 for training/testing rather than using CV

Conclusions

Bayesian matching outperforms all other tested methods and even achieves ≈90% accuracy with only 4 projections (2 for each class of variation)

Bayesian matching is an order of magnitude faster to train than KPCA

Bayesian superiority with higher-resolution images was verified in independent US Army FERET tests

Wow! You should use this

My results

50% accuracy. Why so bad?
I implemented all suggested approximations
Poor data (hand-registered)
Too little data

Note: data split 50/50 for training/testing rather than using CV

My results

[Example face images: my data vs. his data]