Mvfl_part2 Cvpr2012 Spatio-temporal and Higher-Order Feature Learning

59
7/31/2019 Mvfl_part2 Cvpr2012 Spatio-temporal and Higher-Order Feature Learning http://slidepdf.com/reader/full/mvflpart2-cvpr2012-spatio-temporal-and-higher-order-feature-learning 1/59 Outline 1 Introduction Feature Learning Correspondence in Computer Vision Relational feature learning 2 Learning relational features Sparse Coding Review Encoding relations Inference Learning 3 Factorization, eigen-spaces and complex cells Factorization Eigen-spaces, energy models, complex cells 4 Applications Applications Conclusions Roland Memisevic (Uni Frankfurt) Multiview Feature Learning Tutorial at CVPR 2012 33 / 174

Transcript of Mvfl_part2 Cvpr2012 Spatio-temporal and Higher-Order Feature Learning

Page 1: Mvfl_part2 Cvpr2012 Spatio-temporal and Higher-Order Feature Learning

7/31/2019 Mvfl_part2 Cvpr2012 Spatio-temporal and Higher-Order Feature Learning

http://slidepdf.com/reader/full/mvflpart2-cvpr2012-spatio-temporal-and-higher-order-feature-learning 1/59

Outline

1 IntroductionFeature LearningCorrespondence in Computer VisionRelational feature learning

2 Learning relational featuresSparse Coding Review

Encoding relationsInferenceLearning

3 Factorization, eigen-spaces and complex cellsFactorizationEigen-spaces, energy models, complex cells

4 ApplicationsApplicationsConclusions

Roland Memisevic (Uni Frankfurt) Multiview Feature Learning Tutorial at CVPR 2012 33 / 174

Page 2: Mvfl_part2 Cvpr2012 Spatio-temporal and Higher-Order Feature Learning

7/31/2019 Mvfl_part2 Cvpr2012 Spatio-temporal and Higher-Order Feature Learning

http://slidepdf.com/reader/full/mvflpart2-cvpr2012-spatio-temporal-and-higher-order-feature-learning 2/59

Outline

1 IntroductionFeature LearningCorrespondence in Computer VisionRelational feature learning

2 Learning relational featuresSparse Coding Review

Encoding relationsInferenceLearning

3 Factorization, eigen-spaces and complex cellsFactorizationEigen-spaces, energy models, complex cells

4 ApplicationsApplicationsConclusions

Roland Memisevic (Uni Frankfurt) Multiview Feature Learning Tutorial at CVPR 2012 34 / 174

Page 3: Mvfl_part2 Cvpr2012 Spatio-temporal and Higher-Order Feature Learning

7/31/2019 Mvfl_part2 Cvpr2012 Spatio-temporal and Higher-Order Feature Learning

http://slidepdf.com/reader/full/mvflpart2-cvpr2012-spatio-temporal-and-higher-order-feature-learning 3/59

Modeling data with latent variables

z

yf (

z

)g(y )

Roland Memisevic (Uni Frankfurt) Multiview Feature Learning Tutorial at CVPR 2012 35 / 174

Page 4: Mvfl_part2 Cvpr2012 Spatio-temporal and Higher-Order Feature Learning

7/31/2019 Mvfl_part2 Cvpr2012 Spatio-temporal and Higher-Order Feature Learning

http://slidepdf.com/reader/full/mvflpart2-cvpr2012-spatio-temporal-and-higher-order-feature-learning 4/59

Sparse Coding Review

× z5

× z4y

× z3

× z2

× z1

Model an image-patch as the superposition of basis functions , or“lters ”:y =

k

W ·k zk , y j =k

w jk zk

Roland Memisevic (Uni Frankfurt) Multiview Feature Learning Tutorial at CVPR 2012 36 / 174

Page 5: Mvfl_part2 Cvpr2012 Spatio-temporal and Higher-Order Feature Learning

7/31/2019 Mvfl_part2 Cvpr2012 Spatio-temporal and Higher-Order Feature Learning

http://slidepdf.com/reader/full/mvflpart2-cvpr2012-spatio-temporal-and-higher-order-feature-learning 5/59

Feature Learning Graphical Model

y j

z

zk

y

w jk

yα j =

k

w jk zαk

Synthesis model

Parameters w jk connect pixels y j with code components zk

Dimensionality of z can be smaller, larger, or same as y

When the dimensionality is the same or larger, then z must beconstrained , eg. by forcing it to be sparse .

Roland Memisevic (Uni Frankfurt) Multiview Feature Learning Tutorial at CVPR 2012 37 / 174

Page 6: Mvfl_part2 Cvpr2012 Spatio-temporal and Higher-Order Feature Learning

7/31/2019 Mvfl_part2 Cvpr2012 Spatio-temporal and Higher-Order Feature Learning

http://slidepdf.com/reader/full/mvflpart2-cvpr2012-spatio-temporal-and-higher-order-feature-learning 6/59

Feature Learning Graphical Model

y j

z

zk

y

w jk

yα j =

k

w jk zαk

Learning

Given data-set y 1 , . . . , y N , adapt parameters W , inferringz 1 , . . . , z N along the way.Unsupervised learning.

Roland Memisevic (Uni Frankfurt) Multiview Feature Learning Tutorial at CVPR 2012 37 / 174

Page 7: Mvfl_part2 Cvpr2012 Spatio-temporal and Higher-Order Feature Learning

7/31/2019 Mvfl_part2 Cvpr2012 Spatio-temporal and Higher-Order Feature Learning

http://slidepdf.com/reader/full/mvflpart2-cvpr2012-spatio-temporal-and-higher-order-feature-learning 7/59

Feature Learning Graphical Model

y j

z

zk

y

w jk

yα j =

k

w jk zαk

Learning

For example

minW, z 1 ,..., z N

1N

α

y α −k

zαk W .k 2 + λ

k

|zαk |

Alternating between W and all zα

.Roland Memisevic (Uni Frankfurt) Multiview Feature Learning Tutorial at CVPR 2012 37 / 174

Page 8: Mvfl_part2 Cvpr2012 Spatio-temporal and Higher-Order Feature Learning

7/31/2019 Mvfl_part2 Cvpr2012 Spatio-temporal and Higher-Order Feature Learning

http://slidepdf.com/reader/full/mvflpart2-cvpr2012-spatio-temporal-and-higher-order-feature-learning 8/59

Feature Learning Graphical Model

y j

zzk

y

w jk

yα j =

k

w jk zαk

Inference (“Analysis”)Given new image y , compute z .This is how we do recognition.

Roland Memisevic (Uni Frankfurt) Multiview Feature Learning Tutorial at CVPR 2012 37 / 174

Page 9: Mvfl_part2 Cvpr2012 Spatio-temporal and Higher-Order Feature Learning

7/31/2019 Mvfl_part2 Cvpr2012 Spatio-temporal and Higher-Order Feature Learning

http://slidepdf.com/reader/full/mvflpart2-cvpr2012-spatio-temporal-and-higher-order-feature-learning 9/59

Feature learning models

y j

z

zk

y

w jk

y j =k

w jk zk

Many Variants

Probabilistic vs. Non-probabilistic;Directed vs. undirected;Mixture vs.factorial vs. non-symmetric

Roland Memisevic (Uni Frankfurt) Multiview Feature Learning Tutorial at CVPR 2012 38 / 174

Page 10: Mvfl_part2 Cvpr2012 Spatio-temporal and Higher-Order Feature Learning

7/31/2019 Mvfl_part2 Cvpr2012 Spatio-temporal and Higher-Order Feature Learning

http://slidepdf.com/reader/full/mvflpart2-cvpr2012-spatio-temporal-and-higher-order-feature-learning 10/59

Feature learning models

y j

w jk

zk

by j

y

z

bzk

y j =k

w jk zk + by j

In practice: add bias terms.But we drop these for now to avoid clutter.

Roland Memisevic (Uni Frankfurt) Multiview Feature Learning Tutorial at CVPR 2012 38 / 174

Page 11: Mvfl_part2 Cvpr2012 Spatio-temporal and Higher-Order Feature Learning

7/31/2019 Mvfl_part2 Cvpr2012 Spatio-temporal and Higher-Order Feature Learning

http://slidepdf.com/reader/full/mvflpart2-cvpr2012-spatio-temporal-and-higher-order-feature-learning 11/59

Feature learning models

y j

z

zk

y

w jk

y j =k

w jk zk

Some sparse coding models make inference easy:

Roland Memisevic (Uni Frankfurt) Multiview Feature Learning Tutorial at CVPR 2012 38 / 174

Page 12: Mvfl_part2 Cvpr2012 Spatio-temporal and Higher-Order Feature Learning

7/31/2019 Mvfl_part2 Cvpr2012 Spatio-temporal and Higher-Order Feature Learning

http://slidepdf.com/reader/full/mvflpart2-cvpr2012-spatio-temporal-and-higher-order-feature-learning 12/59

Feature learning models

y j

z

zk

y

w jk p(y j |z ) = sigmoidk

w jk zk

p(zk |y ) = sigmoid

j

w jk y j

Restricted Boltzmann machine (RBM)

p(y , z ) = 1Z exp jk w jk y j zk

Contrastive Divergence learning (Hebbian-style learning)

Inference: p(zk |y ) = sigmoid j w jk y j

Further advantage: Allows for stacking (d e ep le a rning).Roland Memisevic (Uni Frankfurt) Multiview Feature Learning Tutorial at CVPR 2012 38 / 174

Page 13: Mvfl_part2 Cvpr2012 Spatio-temporal and Higher-Order Feature Learning

7/31/2019 Mvfl_part2 Cvpr2012 Spatio-temporal and Higher-Order Feature Learning

http://slidepdf.com/reader/full/mvflpart2-cvpr2012-spatio-temporal-and-higher-order-feature-learning 13/59

Feature learning models

y j

z

zk

y

w jk p(y j |z ) = sigmoidk

w jk zk

p(zk |y ) = sigmoid

j

w jk y j

Restricted Boltzmann machine (RBM)

p(y , z ) = 1Z exp jk w jk y j zk

Contrastive Divergence learning (Hebbian-style learning)

Inference: p(zk |y ) = sigmoid j w jk y j

Further advantage: Allows for stacking (d e ep le a rning).Roland Memisevic (Uni Frankfurt) Multiview Feature Learning Tutorial at CVPR 2012 38 / 174

Page 14: Mvfl_part2 Cvpr2012 Spatio-temporal and Higher-Order Feature Learning

7/31/2019 Mvfl_part2 Cvpr2012 Spatio-temporal and Higher-Order Feature Learning

http://slidepdf.com/reader/full/mvflpart2-cvpr2012-spatio-temporal-and-higher-order-feature-learning 14/59

Feature learning models

zk

y j

wkj

a jk

y

y

y j

zk = sigmoid j

a jk y j

y j =k

w jk zk

AutoencoderAdd inference parameters A , and set z = sigmoid ( Ay )Learning: min W,A α y α − W sigmoid ( Ay α ) 2

Add a sparsity penalty for z , or corrupt inputs during training (Vincent et al., 2008)

Roland Memisevic (Uni Frankfurt) Multiview Feature Learning Tutorial at CVPR 2012 38 / 174

Page 15: Mvfl_part2 Cvpr2012 Spatio-temporal and Higher-Order Feature Learning

7/31/2019 Mvfl_part2 Cvpr2012 Spatio-temporal and Higher-Order Feature Learning

http://slidepdf.com/reader/full/mvflpart2-cvpr2012-spatio-temporal-and-higher-order-feature-learning 15/59

Feature learning models

y j

z

y

zk

w jk

y j =k

w jk zk

Independent Components Analysis (ICA)Learning: Make responses sparse, while keeping lters sensible

minW

W T y 1

s.t . W T W = I

Roland Memisevic (Uni Frankfurt) Multiview Feature Learning Tutorial at CVPR 2012 39 / 174

Page 16: Mvfl_part2 Cvpr2012 Spatio-temporal and Higher-Order Feature Learning

7/31/2019 Mvfl_part2 Cvpr2012 Spatio-temporal and Higher-Order Feature Learning

http://slidepdf.com/reader/full/mvflpart2-cvpr2012-spatio-temporal-and-higher-order-feature-learning 16/59

Feature learning summary

z (y ) = W T y

y (z ) = W z

y j

z

zk

y

w jk

Linear inference summaryA lot of methods dene inference through linear dependenciesbetween y and z .PCA, ICA, Restricted Boltzmann Machine, Autoencoder, Mixtureof Gaussians, KMeans, ...

Roland Memisevic (Uni Frankfurt) Multiview Feature Learning Tutorial at CVPR 2012 40 / 174

l

Page 17: Mvfl_part2 Cvpr2012 Spatio-temporal and Higher-Order Feature Learning

7/31/2019 Mvfl_part2 Cvpr2012 Spatio-temporal and Higher-Order Feature Learning

http://slidepdf.com/reader/full/mvflpart2-cvpr2012-spatio-temporal-and-higher-order-feature-learning 17/59

Feature learning summary

z (y ) = W T y

y (z ) = W z

y j

z

zk

y

w jk

Feature learning summary

Almost all methods yield Gabor lters when trained on naturalimages.Almost all based on the same rationale:Tease apart the hidden causes of variability in the data.

Roland Memisevic (Uni Frankfurt) Multiview Feature Learning Tutorial at CVPR 2012 40 / 174

O li

Page 18: Mvfl_part2 Cvpr2012 Spatio-temporal and Higher-Order Feature Learning

7/31/2019 Mvfl_part2 Cvpr2012 Spatio-temporal and Higher-Order Feature Learning

http://slidepdf.com/reader/full/mvflpart2-cvpr2012-spatio-temporal-and-higher-order-feature-learning 18/59

Outline

1 IntroductionFeature LearningCorrespondence in Computer VisionRelational feature learning

2 Learning relational featuresSparse Coding Review

Encoding relationsInferenceLearning

3 Factorization, eigen-spaces and complex cellsFactorizationEigen-spaces, energy models, complex cells

4 ApplicationsApplicationsConclusions

Roland Memisevic (Uni Frankfurt) Multiview Feature Learning Tutorial at CVPR 2012 41 / 174

S di f i i ?

Page 19: Mvfl_part2 Cvpr2012 Spatio-temporal and Higher-Order Feature Learning

7/31/2019 Mvfl_part2 Cvpr2012 Spatio-temporal and Higher-Order Feature Learning

http://slidepdf.com/reader/full/mvflpart2-cvpr2012-spatio-temporal-and-higher-order-feature-learning 19/59

Sparse coding of images pairs?

?

x y

z

How to extend sparse coding to model relations?Sparse coding on the concatenation ?

Roland Memisevic (Uni Frankfurt) Multiview Feature Learning Tutorial at CVPR 2012 42 / 174

S di f i i ?

Page 20: Mvfl_part2 Cvpr2012 Spatio-temporal and Higher-Order Feature Learning

7/31/2019 Mvfl_part2 Cvpr2012 Spatio-temporal and Higher-Order Feature Learning

http://slidepdf.com/reader/full/mvflpart2-cvpr2012-spatio-temporal-and-higher-order-feature-learning 20/59

Sparse coding of images pairs?

?

x y

z

How to extend sparse coding to model relations?Sparse coding on the concatenation ?

Roland Memisevic (Uni Frankfurt) Multiview Feature Learning Tutorial at CVPR 2012 42 / 174

S di g f i g i ?

Page 21: Mvfl_part2 Cvpr2012 Spatio-temporal and Higher-Order Feature Learning

7/31/2019 Mvfl_part2 Cvpr2012 Spatio-temporal and Higher-Order Feature Learning

http://slidepdf.com/reader/full/mvflpart2-cvpr2012-spatio-temporal-and-higher-order-feature-learning 21/59

Sparse coding of images pairs?

?

x y

z

How to extend sparse coding to model relations?Sparse coding on the concatenation ?

Roland Memisevic (Uni Frankfurt) Multiview Feature Learning Tutorial at CVPR 2012 42 / 174

Sparse coding on the concatenation ?

Page 22: Mvfl_part2 Cvpr2012 Spatio-temporal and Higher-Order Feature Learning

7/31/2019 Mvfl_part2 Cvpr2012 Spatio-temporal and Higher-Order Feature Learning

http://slidepdf.com/reader/full/mvflpart2-cvpr2012-spatio-temporal-and-higher-order-feature-learning 22/59

Sparse coding on the concatenation ?

A case study: Translations of binary, one-d images.Suppose images are random and can change in one of threeways:

Example Image x : Possible Imagey :

Roland Memisevic (Uni Frankfurt) Multiview Feature Learning Tutorial at CVPR 2012 43 / 174

Sparse coding on the concatenation ?

Page 23: Mvfl_part2 Cvpr2012 Spatio-temporal and Higher-Order Feature Learning

7/31/2019 Mvfl_part2 Cvpr2012 Spatio-temporal and Higher-Order Feature Learning

http://slidepdf.com/reader/full/mvflpart2-cvpr2012-spatio-temporal-and-higher-order-feature-learning 23/59

Sparse coding on the concatenation ?

A hidden variable that collects evidence for a shift to the right.What if the images are random or noisy?Can we pool over more than one pixel?

zk

Roland Memisevic (Uni Frankfurt) Multiview Feature Learning Tutorial at CVPR 2012 44 / 174

Sparse coding on the concatenation ?

Page 24: Mvfl_part2 Cvpr2012 Spatio-temporal and Higher-Order Feature Learning

7/31/2019 Mvfl_part2 Cvpr2012 Spatio-temporal and Higher-Order Feature Learning

http://slidepdf.com/reader/full/mvflpart2-cvpr2012-spatio-temporal-and-higher-order-feature-learning 24/59

Sparse coding on the concatenation ?

A hidden variable that collects evidence for a shift to the right.What if the images are random or noisy?Can we pool over more than one pixel?

zk

Roland Memisevic (Uni Frankfurt) Multiview Feature Learning Tutorial at CVPR 2012 44 / 174

Sparse coding on the concatenation ?

Page 25: Mvfl_part2 Cvpr2012 Spatio-temporal and Higher-Order Feature Learning

7/31/2019 Mvfl_part2 Cvpr2012 Spatio-temporal and Higher-Order Feature Learning

http://slidepdf.com/reader/full/mvflpart2-cvpr2012-spatio-temporal-and-higher-order-feature-learning 25/59

Sparse coding on the concatenation ?

A hidden variable that collects evidence for a shift to the right.What if the images are random or noisy?Can we pool over more than one pixel?

zk

?

Roland Memisevic (Uni Frankfurt) Multiview Feature Learning Tutorial at CVPR 2012 44 / 174

Sparse coding on the concatenation ?

Page 26: Mvfl_part2 Cvpr2012 Spatio-temporal and Higher-Order Feature Learning

7/31/2019 Mvfl_part2 Cvpr2012 Spatio-temporal and Higher-Order Feature Learning

http://slidepdf.com/reader/full/mvflpart2-cvpr2012-spatio-temporal-and-higher-order-feature-learning 26/59

Sparse coding on the concatenation ?

Obviously not, because now the hidden unit would get equallyhappy if it would see the non-shift (second pixel from the left).The problem: Hidden variables act like OR-gates, that accumulateevidence.

zk

?

Roland Memisevic (Uni Frankfurt) Multiview Feature Learning Tutorial at CVPR 2012 45 / 174

Cross-products

Page 27: Mvfl_part2 Cvpr2012 Spatio-temporal and Higher-Order Feature Learning

7/31/2019 Mvfl_part2 Cvpr2012 Spatio-temporal and Higher-Order Feature Learning

http://slidepdf.com/reader/full/mvflpart2-cvpr2012-spatio-temporal-and-higher-order-feature-learning 27/59

Cross products

Intuitively, it seems, what we need instead are logical ANDs, whichcan represent coincidences (eg. Zetzsche et al., 2003, 2005).This amounts to using the outer product L := outer( x , y ):

We can unroll this matrix, and let this be the data:zk

wijk

Roland Memisevic (Uni Frankfurt) Multiview Feature Learning Tutorial at CVPR 2012 46 / 174

Cross-products

Page 28: Mvfl_part2 Cvpr2012 Spatio-temporal and Higher-Order Feature Learning

7/31/2019 Mvfl_part2 Cvpr2012 Spatio-temporal and Higher-Order Feature Learning

http://slidepdf.com/reader/full/mvflpart2-cvpr2012-spatio-temporal-and-higher-order-feature-learning 28/59

Cross products

In the shift-example, every component L ij of the outer-productmatrix will constitute evidence for exactly one type of shift.Hiddens pool over products of pixels.

Roland Memisevic (Uni Frankfurt) Multiview Feature Learning Tutorial at CVPR 2012 47 / 174

Cross-products

Page 29: Mvfl_part2 Cvpr2012 Spatio-temporal and Higher-Order Feature Learning

7/31/2019 Mvfl_part2 Cvpr2012 Spatio-temporal and Higher-Order Feature Learning

http://slidepdf.com/reader/full/mvflpart2-cvpr2012-spatio-temporal-and-higher-order-feature-learning 29/59

Cross products

In the shift-example, every component L ij of the outer-productmatrix will constitute evidence for exactly one type of shift.Hiddens pool over products of pixels.

Roland Memisevic (Uni Frankfurt) Multiview Feature Learning Tutorial at CVPR 2012 47 / 174

Cross-products

Page 30: Mvfl_part2 Cvpr2012 Spatio-temporal and Higher-Order Feature Learning

7/31/2019 Mvfl_part2 Cvpr2012 Spatio-temporal and Higher-Order Feature Learning

http://slidepdf.com/reader/full/mvflpart2-cvpr2012-spatio-temporal-and-higher-order-feature-learning 30/59

Cross products

In the shift-example, every component L ij of the outer-productmatrix will constitute evidence for exactly one type of shift.Hiddens pool over products of pixels.

Roland Memisevic (Uni Frankfurt) Multiview Feature Learning Tutorial at CVPR 2012 47 / 174

A family of manifolds

Page 31: Mvfl_part2 Cvpr2012 Spatio-temporal and Higher-Order Feature Learning

7/31/2019 Mvfl_part2 Cvpr2012 Spatio-temporal and Higher-Order Feature Learning

http://slidepdf.com/reader/full/mvflpart2-cvpr2012-spatio-temporal-and-higher-order-feature-learning 31/59

y

y

x

An different perspective:

Feature learning reveals the (local) manifold structure in data.When y is a transformed version of x , we can still think of y asbeing conned to a manifold, but it will be a conditional manifold .Idea: Learn a model for y , but let parameters be a function of x .

Roland Memisevic (Uni Frankfurt) Multiview Feature Learning Tutorial at CVPR 2012 48 / 174

Three-way interactions

Page 32: Mvfl_part2 Cvpr2012 Spatio-temporal and Higher-Order Feature Learning

7/31/2019 Mvfl_part2 Cvpr2012 Spatio-temporal and Higher-Order Feature Learning

http://slidepdf.com/reader/full/mvflpart2-cvpr2012-spatio-temporal-and-higher-order-feature-learning 32/59

y

y j

x i

x y

z

zk

If we use a linear function, we have w jk (x ) = i wi jk x i .Inference turns into:

zk = j

w jk y j = j i

wijk x i y j =ij

wijk x i y j

Hidden units are a bilinear function of the two input images.

Roland Memisevic (Uni Frankfurt) Multiview Feature Learning Tutorial at CVPR 2012 49 / 174

Conditional sparse coding

Page 33: Mvfl_part2 Cvpr2012 Spatio-temporal and Higher-Order Feature Learning

7/31/2019 Mvfl_part2 Cvpr2012 Spatio-temporal and Higher-Order Feature Learning

http://slidepdf.com/reader/full/mvflpart2-cvpr2012-spatio-temporal-and-higher-order-feature-learning 33/59

p g

y j

x i

x y

z

zk

This is a feature learning model, whose parameters are

modulated by inputs.So this is a conditional feature learning model.(Tenenbaum, Freeman; 2000), (Grimes, Rao; 2005), (Ohlshausen;2007), (Memisevic, Hinton; 2007)

Roland Memisevic (Uni Frankfurt) Multiview Feature Learning Tutorial at CVPR 2012 50 / 174

An alternative visualization

Page 34: Mvfl_part2 Cvpr2012 Spatio-temporal and Higher-Order Feature Learning

7/31/2019 Mvfl_part2 Cvpr2012 Spatio-temporal and Higher-Order Feature Learning

http://slidepdf.com/reader/full/mvflpart2-cvpr2012-spatio-temporal-and-higher-order-feature-learning 34/59

y jx i

zk

z

x y

Each hidden variable can blend in one slice W ··k of the parametertensor.Each slice does linear regression in “pixel space”.So for binary hiddens, this is a mixture of 2K image warps .

Roland Memisevic (Uni Frankfurt) Multiview Feature Learning Tutorial at CVPR 2012 51 / 174

Outline

Page 35: Mvfl_part2 Cvpr2012 Spatio-temporal and Higher-Order Feature Learning

7/31/2019 Mvfl_part2 Cvpr2012 Spatio-temporal and Higher-Order Feature Learning

http://slidepdf.com/reader/full/mvflpart2-cvpr2012-spatio-temporal-and-higher-order-feature-learning 35/59

1 IntroductionFeature Learning

Correspondence in Computer VisionRelational feature learning

2 Learning relational featuresSparse Coding ReviewEncoding relationsInferenceLearning

3 Factorization, eigen-spaces and complex cellsFactorizationEigen-spaces, energy models, complex cells

4 ApplicationsApplicationsConclusions

Roland Memisevic (Uni Frankfurt) Multiview Feature Learning Tutorial at CVPR 2012 52 / 174

Inference

Page 36: Mvfl_part2 Cvpr2012 Spatio-temporal and Higher-Order Feature Learning

7/31/2019 Mvfl_part2 Cvpr2012 Spatio-temporal and Higher-Order Feature Learning

http://slidepdf.com/reader/full/mvflpart2-cvpr2012-spatio-temporal-and-higher-order-feature-learning 36/59

y j

x i

x y

z

zk

Given any two sets of variables, it is easy to infer the third.As a result, inference is basically the same as in any standardsparse coding model.(Graph is tri-partite, sparse coding bi-partite.)

Roland Memisevic (Uni Frankfurt) Multiview Feature Learning Tutorial at CVPR 2012 53 / 174

Inference

Page 37: Mvfl_part2 Cvpr2012 Spatio-temporal and Higher-Order Feature Learning

7/31/2019 Mvfl_part2 Cvpr2012 Spatio-temporal and Higher-Order Feature Learning

http://slidepdf.com/reader/full/mvflpart2-cvpr2012-spatio-temporal-and-higher-order-feature-learning 37/59

y j

x i

x y

z

zk

Inferring z (given x and y )Infer the transformation z by modulating parameters linearly:

zk = j

w jk y j =k i

wijk x i y j =ij

wijk x i y j

Roland Memisevic (Uni Frankfurt) Multiview Feature Learning Tutorial at CVPR 2012 53 / 174

Inference

Page 38: Mvfl_part2 Cvpr2012 Spatio-temporal and Higher-Order Feature Learning

7/31/2019 Mvfl_part2 Cvpr2012 Spatio-temporal and Higher-Order Feature Learning

http://slidepdf.com/reader/full/mvflpart2-cvpr2012-spatio-temporal-and-higher-order-feature-learning 38/59

y j

x i

x y

z

zk

Inferring z (given x and y )The meaning of z : The transformation that takes x to y (or viceversa).

Roland Memisevic (Uni Frankfurt) Multiview Feature Learning Tutorial at CVPR 2012 53 / 174

Inference

Page 39: Mvfl_part2 Cvpr2012 Spatio-temporal and Higher-Order Feature Learning

7/31/2019 Mvfl_part2 Cvpr2012 Spatio-temporal and Higher-Order Feature Learning

http://slidepdf.com/reader/full/mvflpart2-cvpr2012-spatio-temporal-and-higher-order-feature-learning 39/59

y j

x i

x y

z

zk

Inferring y (given x and z )For y , we have

y j =k

w jk zk =k i

wijk x i zk =ik

wijk x i zk

Roland Memisevic (Uni Frankfurt) Multiview Feature Learning Tutorial at CVPR 2012 53 / 174

Inference

Page 40: Mvfl_part2 Cvpr2012 Spatio-temporal and Higher-Order Feature Learning

7/31/2019 Mvfl_part2 Cvpr2012 Spatio-temporal and Higher-Order Feature Learning

http://slidepdf.com/reader/full/mvflpart2-cvpr2012-spatio-temporal-and-higher-order-feature-learning 40/59

y j

x i

x y

z

zk

Inferring y (given x and z )The meaning of y : “x transformed according to z ”.

Roland Memisevic (Uni Frankfurt) Multiview Feature Learning Tutorial at CVPR 2012 53 / 174

Inference

Page 41: Mvfl_part2 Cvpr2012 Spatio-temporal and Higher-Order Feature Learning

7/31/2019 Mvfl_part2 Cvpr2012 Spatio-temporal and Higher-Order Feature Learning

http://slidepdf.com/reader/full/mvflpart2-cvpr2012-spatio-temporal-and-higher-order-feature-learning 41/59

y j

x i

x y

z

zk

Inference can mean various other things in addition.For example, given x and y , how likely are these to cometogether?More on this type of inference later.

Roland Memisevic (Uni Frankfurt) Multiview Feature Learning Tutorial at CVPR 2012 54 / 174

Outline

Page 42: Mvfl_part2 Cvpr2012 Spatio-temporal and Higher-Order Feature Learning

7/31/2019 Mvfl_part2 Cvpr2012 Spatio-temporal and Higher-Order Feature Learning

http://slidepdf.com/reader/full/mvflpart2-cvpr2012-spatio-temporal-and-higher-order-feature-learning 42/59

1 IntroductionFeature Learning

Correspondence in Computer VisionRelational feature learning2 Learning relational features

Sparse Coding ReviewEncoding relationsInferenceLearning

3 Factorization, eigen-spaces and complex cellsFactorization

Eigen-spaces, energy models, complex cells4 Applications

ApplicationsConclusions

Roland Memisevic (Uni Frankfurt) Multiview Feature Learning Tutorial at CVPR 2012 55 / 174

Learning

Page 43: Mvfl_part2 Cvpr2012 Spatio-temporal and Higher-Order Feature Learning

7/31/2019 Mvfl_part2 Cvpr2012 Spatio-temporal and Higher-Order Feature Learning

http://slidepdf.com/reader/full/mvflpart2-cvpr2012-spatio-temporal-and-higher-order-feature-learning 43/59

y j

x i

x y

z

zk

Training data are now pairs (x α , y α ) – the points we want torelate.

The parameter-gating relation shows that one way to train thismodel is:

Conditional sparse codingPredict y from x , inferring z along the way as usual.

Roland Memisevic (Uni Frankfurt) Multiview Feature Learning Tutorial at CVPR 2012 56 / 174

Conditional sparse coding

Page 44: Mvfl_part2 Cvpr2012 Spatio-temporal and Higher-Order Feature Learning

7/31/2019 Mvfl_part2 Cvpr2012 Spatio-temporal and Higher-Order Feature Learning

http://slidepdf.com/reader/full/mvflpart2-cvpr2012-spatio-temporal-and-higher-order-feature-learning 44/59

y j

x i

x y

z

zk

The cost that data-case (x α , y α ) contributes is:

jyα j −

ikwijk x αi zαk )2

Differentiating with respect to wijk just like before.Inference is still linear wrt. parameters.

Roland Memisevic (Uni Frankfurt) Multiview Feature Learning Tutorial at CVPR 2012 57 / 174

Conditional sparse coding

Page 45: Mvfl_part2 Cvpr2012 Spatio-temporal and Higher-Order Feature Learning

7/31/2019 Mvfl_part2 Cvpr2012 Spatio-temporal and Higher-Order Feature Learning

http://slidepdf.com/reader/full/mvflpart2-cvpr2012-spatio-temporal-and-higher-order-feature-learning 45/59

Conditional sparse coding is predictive coding :We model the next time frame, given the previous one.

Inference then provides an encoding of the transformation.This is often a sensible strategy, but not always as we shall see.

Roland Memisevic (Uni Frankfurt) Multiview Feature Learning Tutorial at CVPR 2012 58 / 174

Example: Gated Boltzmann machine

Page 46: Mvfl_part2 Cvpr2012 Spatio-temporal and Higher-Order Feature Learning

7/31/2019 Mvfl_part2 Cvpr2012 Spatio-temporal and Higher-Order Feature Learning

http://slidepdf.com/reader/full/mvflpart2-cvpr2012-spatio-temporal-and-higher-order-feature-learning 46/59

y j

x i

x y

z

zk

For a restricted Boltzmann machine, this amounts to changing theenergy function into a three-way energy (Memisevic, Hinton;2007):

E (x , y , z ) =ijk

wijk x i y j zk

Then p(y , z |x ) = 1Z (x ) exp E (x , y , z ) ,

Z (x ) = y ,z exp E (x , y , z )Roland Memisevic (Uni Frankfurt) Multiview Feature Learning Tutorial at CVPR 2012 59 / 174

Example: Gated Boltzmann machine

Page 47: Mvfl_part2 Cvpr2012 Spatio-temporal and Higher-Order Feature Learning

7/31/2019 Mvfl_part2 Cvpr2012 Spatio-temporal and Higher-Order Feature Learning

http://slidepdf.com/reader/full/mvflpart2-cvpr2012-spatio-temporal-and-higher-order-feature-learning 47/59

y j

x i

x y

z

zk

For a restricted Boltzmann machine, this amounts to changing theenergy function into a three-way energy (Memisevic, Hinton;2007):

E (x , y , z ) =ijk

wijk x i y j zk

Then p(y , z |x ) = 1Z (x ) exp E (x , y , z ) ,

Z (x ) = y ,z exp E (x , y , z )Roland Memisevic (Uni Frankfurt) Multiview Feature Learning Tutorial at CVPR 2012 59 / 174

Example: Gated auto-encoder

Page 48: Mvfl_part2 Cvpr2012 Spatio-temporal and Higher-Order Feature Learning

7/31/2019 Mvfl_part2 Cvpr2012 Spatio-temporal and Higher-Order Feature Learning

http://slidepdf.com/reader/full/mvflpart2-cvpr2012-spatio-temporal-and-higher-order-feature-learning 48/59

x izzk

y j y

y j

x

Similar for autoencoders.

Both, encoder and decoder weights turn into functions of x .Learning the same as in a standard auto-encoder modeling y .The model is still a DAG, so back-prop works exactly like in astandard autoencoder.

Roland Memisevic (Uni Frankfurt) Multiview Feature Learning Tutorial at CVPR 2012 60 / 174

Other examples

Page 49: Mvfl_part2 Cvpr2012 Spatio-temporal and Higher-Order Feature Learning

7/31/2019 Mvfl_part2 Cvpr2012 Spatio-temporal and Higher-Order Feature Learning

http://slidepdf.com/reader/full/mvflpart2-cvpr2012-spatio-temporal-and-higher-order-feature-learning 49/59

(Grimes, Rao; 2005): Bi-linear sparse coding(Ohlshausen et al.; 2007): Conditional, bi-linear sparse coding(Luecke, et al.; 2007): Neurally inspired control unit networks

Roland Memisevic (Uni Frankfurt) Multiview Feature Learning Tutorial at CVPR 2012 61 / 174

Toy example: Conditionally trained “Hiddenow elds”

Page 50: Mvfl_part2 Cvpr2012 Spatio-temporal and Higher-Order Feature Learning

7/31/2019 Mvfl_part2 Cvpr2012 Spatio-temporal and Higher-Order Feature Learning

http://slidepdf.com/reader/full/mvflpart2-cvpr2012-spatio-temporal-and-higher-order-feature-learning 50/59

ow-elds

Roland Memisevic (Uni Frankfurt) Multiview Feature Learning Tutorial at CVPR 2012 62 / 174

Toy example: Conditionally trained “Hiddenow elds” inhibitory connections

Page 51: Mvfl_part2 Cvpr2012 Spatio-temporal and Higher-Order Feature Learning

7/31/2019 Mvfl_part2 Cvpr2012 Spatio-temporal and Higher-Order Feature Learning

http://slidepdf.com/reader/full/mvflpart2-cvpr2012-spatio-temporal-and-higher-order-feature-learning 51/59

ow-elds , inhibitory connections

Roland Memisevic (Uni Frankfurt) Multiview Feature Learning Tutorial at CVPR 2012 63 / 174

Toy example: Learning optical ow

Page 52: Mvfl_part2 Cvpr2012 Spatio-temporal and Higher-Order Feature Learning

7/31/2019 Mvfl_part2 Cvpr2012 Spatio-temporal and Higher-Order Feature Learning

http://slidepdf.com/reader/full/mvflpart2-cvpr2012-spatio-temporal-and-higher-order-feature-learning 52/59

x testx y z y (x test , z )

Roland Memisevic (Uni Frankfurt) Multiview Feature Learning Tutorial at CVPR 2012 64 / 174

“Combinatorial owelds”

Page 53: Mvfl_part2 Cvpr2012 Spatio-temporal and Higher-Order Feature Learning

7/31/2019 Mvfl_part2 Cvpr2012 Spatio-temporal and Higher-Order Feature Learning

http://slidepdf.com/reader/full/mvflpart2-cvpr2012-spatio-temporal-and-higher-order-feature-learning 53/59

x testx y z y (x test , z )

Roland Memisevic (Uni Frankfurt) Multiview Feature Learning Tutorial at CVPR 2012 65 / 174

Joint training

Page 54: Mvfl_part2 Cvpr2012 Spatio-temporal and Higher-Order Feature Learning

7/31/2019 Mvfl_part2 Cvpr2012 Spatio-temporal and Higher-Order Feature Learning

http://slidepdf.com/reader/full/mvflpart2-cvpr2012-spatio-temporal-and-higher-order-feature-learning 54/59

y jx i

zk

z

x y

Conditional training makes it hard to answer questions like:“How likely are the given images transforms of one another?”To answer questions like these, we require a joint image model, p(x , y |z ), given mapping units.

Roland Memisevic (Uni Frankfurt) Multiview Feature Learning Tutorial at CVPR 2012 66 / 174

Joint training

Page 55: Mvfl_part2 Cvpr2012 Spatio-temporal and Higher-Order Feature Learning

7/31/2019 Mvfl_part2 Cvpr2012 Spatio-temporal and Higher-Order Feature Learning

http://slidepdf.com/reader/full/mvflpart2-cvpr2012-spatio-temporal-and-higher-order-feature-learning 55/59

y jx i

zk

z

x y

E (x , y , z ) =ijk

wijk x i y j zk

p(x , y , z ) =1

Z exp E (x , y , z )

Z =x ,y ,z

exp E (x , y , z )

(Susskind et al., 2011): Three-way Gibbs sampling in a GatedBoltzmann Machine.Can apply this to matching tasks (more later).

Roland Memisevic (Uni Frankfurt) Multiview Feature Learning Tutorial at CVPR 2012 67 / 174

Joint training

Page 56: Mvfl_part2 Cvpr2012 Spatio-temporal and Higher-Order Feature Learning

7/31/2019 Mvfl_part2 Cvpr2012 Spatio-temporal and Higher-Order Feature Learning

http://slidepdf.com/reader/full/mvflpart2-cvpr2012-spatio-temporal-and-higher-order-feature-learning 56/59

x izzk

y j y

y j

x

For the auto-encoder, there is a simple hack:

Add up two conditional costs: j yα

j − ik wijk x αi zα

k )2+ i x αi − jk wijk yα

j zαk )2

This forces parameters to be able to transform in both directions.

Roland Memisevic (Uni Frankfurt) Multiview Feature Learning Tutorial at CVPR 2012 68 / 174

Joint training

Page 57: Mvfl_part2 Cvpr2012 Spatio-temporal and Higher-Order Feature Learning

7/31/2019 Mvfl_part2 Cvpr2012 Spatio-temporal and Higher-Order Feature Learning

http://slidepdf.com/reader/full/mvflpart2-cvpr2012-spatio-temporal-and-higher-order-feature-learning 57/59

y jzzk

x i x

x i

y

x

For the auto-encoder, there is a simple hack:

Add up two conditional costs: j yα

j − ik wijk x αi zα

k )2+ i x αi − jk wijk yα

j zαk )2

This forces parameters to be able to transform in both directions.

Roland Memisevic (Uni Frankfurt) Multiview Feature Learning Tutorial at CVPR 2012 68 / 174

Pool over products

Page 58: Mvfl_part2 Cvpr2012 Spatio-temporal and Higher-Order Feature Learning

7/31/2019 Mvfl_part2 Cvpr2012 Spatio-temporal and Higher-Order Feature Learning

http://slidepdf.com/reader/full/mvflpart2-cvpr2012-spatio-temporal-and-higher-order-feature-learning 58/59

Take-home messageTo gather evidence for a transformation,

let each hidden unit pool over products of input-components.

Roland Memisevic (Uni Frankfurt) Multiview Feature Learning Tutorial at CVPR 2012 69 / 174

Some references

Page 59: Mvfl_part2 Cvpr2012 Spatio-temporal and Higher-Order Feature Learning

7/31/2019 Mvfl_part2 Cvpr2012 Spatio-temporal and Higher-Order Feature Learning

http://slidepdf.com/reader/full/mvflpart2-cvpr2012-spatio-temporal-and-higher-order-feature-learning 59/59

y jx i

zk

z

x y

(Hinton; 1981), (v.d. Malsburg; 1981)(Grimes, Rao; 2005): Bi-linear sparse coding.(Tenenbaum, Freeman; 2000), (Grimes, Rao; 2005), (Ohlshausen;2007), (Memisevic, Hinton; 2007), (Susskind, et al., 2011)(Zetzsche et al.; 2003, 2005)

Roland Memisevic (Uni Frankfurt) Multiview Feature Learning Tutorial at CVPR 2012 70 / 174