Mvfl_part1 Cvpr2012 Spatio-temporal and Higher-Order Feature Learning

58
Multiview Feature Learning Roland Memisevic Uni Frankfurt Tutorial at CVPR 2012 Roland Memisevic (Uni Frankfurt) Multiview Feature Learning Tutorial at CVPR 2012 1 / 174

Transcript of Mvfl_part1 Cvpr2012 Spatio-temporal and Higher-Order Feature Learning

Page 1: Mvfl_part1 Cvpr2012 Spatio-temporal and Higher-Order Feature Learning

Multiview Feature Learning

Roland Memisevic

Uni Frankfurt

Tutorial at CVPR 2012

Roland Memisevic (Uni Frankfurt) Multiview Feature Learning Tutorial at CVPR 2012 1 / 174

Page 2: Mvfl_part1 Cvpr2012 Spatio-temporal and Higher-Order Feature Learning

Higher-order Feature Learning

http://www.cs.toronto.edu/~rfm/

multiview-feature-learning-cvpr/index.html

Roland Memisevic (Uni Frankfurt) Multiview Feature Learning Tutorial at CVPR 2012 2 / 174

Page 3: Mvfl_part1 Cvpr2012 Spatio-temporal and Higher-Order Feature Learning

Outline

1 IntroductionFeature LearningCorrespondence in Computer VisionRelational feature learning

2 Learning relational featuresSparse Coding ReviewEncoding relationsInferenceLearning

3 Factorization, eigen-spaces and complex cellsFactorizationEigen-spaces, energy models, complex cells

4 ApplicationsApplicationsConclusions

Roland Memisevic (Uni Frankfurt) Multiview Feature Learning Tutorial at CVPR 2012 3 / 174

Page 4: Mvfl_part1 Cvpr2012 Spatio-temporal and Higher-Order Feature Learning

Outline

1 IntroductionFeature LearningCorrespondence in Computer VisionRelational feature learning

2 Learning relational featuresSparse Coding ReviewEncoding relationsInferenceLearning

3 Factorization, eigen-spaces and complex cellsFactorizationEigen-spaces, energy models, complex cells

4 ApplicationsApplicationsConclusions

Roland Memisevic (Uni Frankfurt) Multiview Feature Learning Tutorial at CVPR 2012 4 / 174

Page 5: Mvfl_part1 Cvpr2012 Spatio-temporal and Higher-Order Feature Learning

What this is about

Extending feature learning to model relations.“Bi-linear models”, “energy-models”, “complex cells”,“spatio-temporal features”, “covariance features”, “bi-linearclassification”, “mapping units”, “quadrature features”, “gatedBoltzmann machine”, “mcrbm”, ...

Roland Memisevic (Uni Frankfurt) Multiview Feature Learning Tutorial at CVPR 2012 5 / 174

Page 6: Mvfl_part1 Cvpr2012 Spatio-temporal and Higher-Order Feature Learning

Recognition tasks

Recognition has become a focus of interest in Computer Vision.Recognition of static objects started to work very well. In fact –Recognition is getting quite serious.

(PASCAL challenge)

Roland Memisevic (Uni Frankfurt) Multiview Feature Learning Tutorial at CVPR 2012 6 / 174

Page 7: Mvfl_part1 Cvpr2012 Spatio-temporal and Higher-Order Feature Learning

“It’s the feature, stupid!”

A main reason is the use of the right representation:Recognition started to work after the community converged onlocal features, like SIFT.With the right representation, the choice of top level classifier(SVM, logreg, NN) doesn’t matter all that much.

(PASCAL challenge)

Roland Memisevic (Uni Frankfurt) Multiview Feature Learning Tutorial at CVPR 2012 7 / 174

Page 8: Mvfl_part1 Cvpr2012 Spatio-temporal and Higher-Order Feature Learning

Recognition with local features

Task: Recognize the building.Two approaches:

1 Bag-Of-Features2 Convolution

Roland Memisevic (Uni Frankfurt) Multiview Feature Learning Tutorial at CVPR 2012 8 / 174

Page 9: Mvfl_part1 Cvpr2012 Spatio-temporal and Higher-Order Feature Learning

Bag-Of-Features

Bag-Of-Features1 Find interest points (AKA keypoints).2 Crop patches around interest points.3 Represent each patch with a sparse local descriptor (“features”).4 Add all local descriptors to obtain a global descriptor for the

image.Roland Memisevic (Uni Frankfurt) Multiview Feature Learning Tutorial at CVPR 2012 9 / 174

Page 10: Mvfl_part1 Cvpr2012 Spatio-temporal and Higher-Order Feature Learning

Bag-Of-Features

Bag-Of-Features1 Find interest points (AKA keypoints).2 Crop patches around interest points.3 Represent each patch with a sparse local descriptor (“features”).4 Add all local descriptors to obtain a global descriptor for the

image.Roland Memisevic (Uni Frankfurt) Multiview Feature Learning Tutorial at CVPR 2012 9 / 174

Page 11: Mvfl_part1 Cvpr2012 Spatio-temporal and Higher-Order Feature Learning

Bag-Of-Features

f1

fn

Bag-Of-Features1 Find interest points (AKA keypoints).2 Crop patches around interest points.3 Represent each patch with a sparse local descriptor (“features”).4 Add all local descriptors to obtain a global descriptor for the

image.Roland Memisevic (Uni Frankfurt) Multiview Feature Learning Tutorial at CVPR 2012 9 / 174

Page 12: Mvfl_part1 Cvpr2012 Spatio-temporal and Higher-Order Feature Learning

Bag-Of-Features

fM1

fMnf1n

f11

Bag-Of-Features1 Find interest points (AKA keypoints).2 Crop patches around interest points.3 Represent each patch with a sparse local descriptor (“features”).4 Add all local descriptors to obtain a global descriptor for the

image.Roland Memisevic (Uni Frankfurt) Multiview Feature Learning Tutorial at CVPR 2012 9 / 174

Page 13: Mvfl_part1 Cvpr2012 Spatio-temporal and Higher-Order Feature Learning

Bag-Of-Features

fM1

fMnf1n

f11

Bag-Of-Features1 Find interest points (AKA keypoints).2 Crop patches around interest points.3 Represent each patch with a sparse local descriptor (“features”).4 Add all local descriptors to obtain a global descriptor for the

image.Roland Memisevic (Uni Frankfurt) Multiview Feature Learning Tutorial at CVPR 2012 9 / 174

Page 14: Mvfl_part1 Cvpr2012 Spatio-temporal and Higher-Order Feature Learning

Convolutional

Convolutional1 Crop patches along a regular grid (dense or not).2 Represent each patch with a local descriptor.3 Concatenate all descriptors into a very large vector.

Roland Memisevic (Uni Frankfurt) Multiview Feature Learning Tutorial at CVPR 2012 10 / 174

Page 15: Mvfl_part1 Cvpr2012 Spatio-temporal and Higher-Order Feature Learning

Convolutional

Convolutional1 Crop patches along a regular grid (dense or not).2 Represent each patch with a local descriptor.3 Concatenate all descriptors into a very large vector.

Roland Memisevic (Uni Frankfurt) Multiview Feature Learning Tutorial at CVPR 2012 10 / 174

Page 16: Mvfl_part1 Cvpr2012 Spatio-temporal and Higher-Order Feature Learning

Convolutional

Convolutional1 Crop patches along a regular grid (dense or not).2 Represent each patch with a local descriptor.3 Concatenate all descriptors into a very large vector.

Roland Memisevic (Uni Frankfurt) Multiview Feature Learning Tutorial at CVPR 2012 10 / 174

Page 17: Mvfl_part1 Cvpr2012 Spatio-temporal and Higher-Order Feature Learning

Classification

high-rise

?

cathedral

f2

f1

When images are represented as points in Rn, we can use asimple classifier to do recognition.Eg., Logistic regression, SVM, NN, ...(There are various extensions, like fancy pooling, etc.)

Roland Memisevic (Uni Frankfurt) Multiview Feature Learning Tutorial at CVPR 2012 11 / 174

Page 18: Mvfl_part1 Cvpr2012 Spatio-temporal and Higher-Order Feature Learning

Feature Learning

f1

fn

How do we get good features?Option B: Engineer them. SIFT, HOG, LBP, etc.Natural Images are not nandom.Option A: Learn the representation from image data.

Roland Memisevic (Uni Frankfurt) Multiview Feature Learning Tutorial at CVPR 2012 12 / 174

Page 19: Mvfl_part1 Cvpr2012 Spatio-temporal and Higher-Order Feature Learning

Feature Learning

f1

fn

How do we get good features?Option B: Engineer them. SIFT, HOG, LBP, etc.Natural Images are not nandom.Option A: Learn the representation from image data.

Roland Memisevic (Uni Frankfurt) Multiview Feature Learning Tutorial at CVPR 2012 12 / 174

Page 20: Mvfl_part1 Cvpr2012 Spatio-temporal and Higher-Order Feature Learning

Feature Learning

f1

fn

How do we get good features?Option B: Engineer them. SIFT, HOG, LBP, etc.Natural Images are not nandom.Option A: Learn the representation from image data.

Roland Memisevic (Uni Frankfurt) Multiview Feature Learning Tutorial at CVPR 2012 12 / 174

Page 21: Mvfl_part1 Cvpr2012 Spatio-temporal and Higher-Order Feature Learning

Feature Learning

f1

fn

How do we get good features?Option B: Engineer them. SIFT, HOG, LBP, etc.Natural Images are not nandom.Option A: Learn the representation from image data.

Roland Memisevic (Uni Frankfurt) Multiview Feature Learning Tutorial at CVPR 2012 12 / 174

Page 22: Mvfl_part1 Cvpr2012 Spatio-temporal and Higher-Order Feature Learning

Why Feature Learning

f1

fn

Feature Learning, Dictionary Learning, Receptive FieldLearning, Sparse Coding, etc.

Helps overcome tedious engineering.Helps adapt models to different data domains (including a model’sown representations! – “deep learning”)Biologically consistent.Brings us closer to end-to-end learning of vision systems.

Roland Memisevic (Uni Frankfurt) Multiview Feature Learning Tutorial at CVPR 2012 13 / 174

Page 23: Mvfl_part1 Cvpr2012 Spatio-temporal and Higher-Order Feature Learning

Feature Learning Works

(CIFAR) (NORB)

More importantly... it works wellSee, eg., (Coates, et al., 2011)

Roland Memisevic (Uni Frankfurt) Multiview Feature Learning Tutorial at CVPR 2012 14 / 174

Page 24: Mvfl_part1 Cvpr2012 Spatio-temporal and Higher-Order Feature Learning

Feature Learning

yj

z

zk

y

wjk

Feature LearningEncode patch y using latent variables z.Learn weights W from training data of image patches.

Feature Learning works well for recognition, so...

Roland Memisevic (Uni Frankfurt) Multiview Feature Learning Tutorial at CVPR 2012 15 / 174

Page 25: Mvfl_part1 Cvpr2012 Spatio-temporal and Higher-Order Feature Learning

Feature Learning

yj

z

zk

y

wjk

Feature LearningEncode patch y using latent variables z.Learn weights W from training data of image patches.

Feature Learning works well for recognition, so...

Roland Memisevic (Uni Frankfurt) Multiview Feature Learning Tutorial at CVPR 2012 15 / 174

Page 26: Mvfl_part1 Cvpr2012 Spatio-temporal and Higher-Order Feature Learning

Outline

1 IntroductionFeature LearningCorrespondence in Computer VisionRelational feature learning

2 Learning relational featuresSparse Coding ReviewEncoding relationsInferenceLearning

3 Factorization, eigen-spaces and complex cellsFactorizationEigen-spaces, energy models, complex cells

4 ApplicationsApplicationsConclusions

Roland Memisevic (Uni Frankfurt) Multiview Feature Learning Tutorial at CVPR 2012 16 / 174

Page 27: Mvfl_part1 Cvpr2012 Spatio-temporal and Higher-Order Feature Learning

Beyond object recognition

Can we do more with Feature Learning than recognize things?

Good features work well for object recognition.But brains can do much more than recognize objects.A large number of vision tasks goes beyond object recognition.In surprisingly many vision tasks, the relationship between imagescarries the relevant information

Roland Memisevic (Uni Frankfurt) Multiview Feature Learning Tutorial at CVPR 2012 17 / 174

Page 28: Mvfl_part1 Cvpr2012 Spatio-temporal and Higher-Order Feature Learning

Beyond object recognition

Can we do more with Feature Learning than recognize things?

Good features work well for object recognition.But brains can do much more than recognize objects.A large number of vision tasks goes beyond object recognition.In surprisingly many vision tasks, the relationship between imagescarries the relevant information

Roland Memisevic (Uni Frankfurt) Multiview Feature Learning Tutorial at CVPR 2012 17 / 174

Page 29: Mvfl_part1 Cvpr2012 Spatio-temporal and Higher-Order Feature Learning

Correspondences in Computer Vision

Correspondence is one of the most ubiquitous problems inComputer Vision.

Some correspondence tasks in VisionTrackingStereoGeometryOptical FlowInvariant RecognitionOdometryAction RecognitionContours, Within-image structure

Roland Memisevic (Uni Frankfurt) Multiview Feature Learning Tutorial at CVPR 2012 18 / 174

Page 30: Mvfl_part1 Cvpr2012 Spatio-temporal and Higher-Order Feature Learning

Correspondences in Computer Vision

Correspondence is one of the most ubiquitous problems inComputer Vision.

Some correspondence tasks in VisionTrackingStereoGeometryOptical FlowInvariant RecognitionOdometryAction RecognitionContours, Within-image structure

Roland Memisevic (Uni Frankfurt) Multiview Feature Learning Tutorial at CVPR 2012 18 / 174

Page 31: Mvfl_part1 Cvpr2012 Spatio-temporal and Higher-Order Feature Learning

Correspondences in Computer Vision

Correspondence is one of the most ubiquitous problems inComputer Vision.

Some correspondence tasks in VisionTrackingStereoGeometryOptical FlowInvariant RecognitionOdometryAction RecognitionContours, Within-image structure

Roland Memisevic (Uni Frankfurt) Multiview Feature Learning Tutorial at CVPR 2012 18 / 174

Page 32: Mvfl_part1 Cvpr2012 Spatio-temporal and Higher-Order Feature Learning

Correspondences in Computer Vision

Correspondence is one of the most ubiquitous problems inComputer Vision.

Some correspondence tasks in VisionTrackingStereoGeometryOptical FlowInvariant RecognitionOdometryAction RecognitionContours, Within-image structure

Roland Memisevic (Uni Frankfurt) Multiview Feature Learning Tutorial at CVPR 2012 18 / 174

Page 33: Mvfl_part1 Cvpr2012 Spatio-temporal and Higher-Order Feature Learning

Correspondences in Computer Vision

Correspondence is one of the most ubiquitous problems inComputer Vision.

Some correspondence tasks in VisionTrackingStereoGeometryOptical FlowInvariant RecognitionOdometryAction RecognitionContours, Within-image structure

Roland Memisevic (Uni Frankfurt) Multiview Feature Learning Tutorial at CVPR 2012 18 / 174

Page 34: Mvfl_part1 Cvpr2012 Spatio-temporal and Higher-Order Feature Learning

Correspondences in Computer Vision

Correspondence is one of the most ubiquitous problems inComputer Vision.

Some correspondence tasks in VisionTrackingStereoGeometryOptical FlowInvariant RecognitionOdometryAction RecognitionContours, Within-image structure

Roland Memisevic (Uni Frankfurt) Multiview Feature Learning Tutorial at CVPR 2012 18 / 174

Page 35: Mvfl_part1 Cvpr2012 Spatio-temporal and Higher-Order Feature Learning

Correspondences in Computer Vision

Correspondence is one of the most ubiquitous problems inComputer Vision.

Some correspondence tasks in VisionTrackingStereoGeometryOptical FlowInvariant RecognitionOdometryAction RecognitionContours, Within-image structure

Roland Memisevic (Uni Frankfurt) Multiview Feature Learning Tutorial at CVPR 2012 18 / 174

Page 36: Mvfl_part1 Cvpr2012 Spatio-temporal and Higher-Order Feature Learning

Correspondences in Computer Vision

Correspondence is one of the most ubiquitous problems inComputer Vision.

Some correspondence tasks in VisionTrackingStereoGeometryOptical FlowInvariant RecognitionOdometryAction RecognitionContours, Within-image structure

Roland Memisevic (Uni Frankfurt) Multiview Feature Learning Tutorial at CVPR 2012 18 / 174

Page 37: Mvfl_part1 Cvpr2012 Spatio-temporal and Higher-Order Feature Learning

Correspondences in Computer Vision

Correspondence is one of the most ubiquitous problems inComputer Vision.

Some correspondence tasks in VisionTrackingStereoGeometryOptical FlowInvariant RecognitionOdometryAction RecognitionContours, Within-image structure

Roland Memisevic (Uni Frankfurt) Multiview Feature Learning Tutorial at CVPR 2012 18 / 174

Page 38: Mvfl_part1 Cvpr2012 Spatio-temporal and Higher-Order Feature Learning

Heider and Simmel

Adding frames is not just about adding proportionally moreinformation.The relationships between frames contain additional information,that is not present in any single frame.See Heider and Simmel, 1944: Any single frame shows a bunchof geometric figures. The motions reveal the story.

Roland Memisevic (Uni Frankfurt) Multiview Feature Learning Tutorial at CVPR 2012 19 / 174

Page 39: Mvfl_part1 Cvpr2012 Spatio-temporal and Higher-Order Feature Learning

Outline

1 IntroductionFeature LearningCorrespondence in Computer VisionRelational feature learning

2 Learning relational featuresSparse Coding ReviewEncoding relationsInferenceLearning

3 Factorization, eigen-spaces and complex cellsFactorizationEigen-spaces, energy models, complex cells

4 ApplicationsApplicationsConclusions

Roland Memisevic (Uni Frankfurt) Multiview Feature Learning Tutorial at CVPR 2012 20 / 174

Page 40: Mvfl_part1 Cvpr2012 Spatio-temporal and Higher-Order Feature Learning

Learning features to model correspondences

If correspondences matter in vision, can we learn them?

?

x y

z

Roland Memisevic (Uni Frankfurt) Multiview Feature Learning Tutorial at CVPR 2012 21 / 174

Page 41: Mvfl_part1 Cvpr2012 Spatio-temporal and Higher-Order Feature Learning

Learning features to model correspondences

It turns out that this requireslatent variables to act likegates, that dynamicallychange the connectionsbetween fellow variables.

zk

xi

yj

Roland Memisevic (Uni Frankfurt) Multiview Feature Learning Tutorial at CVPR 2012 22 / 174

Page 42: Mvfl_part1 Cvpr2012 Spatio-temporal and Higher-Order Feature Learning

Learning features to model correspondences

This amounts to lettingvariables multiply connectionsbetween other variables.And it is equivalent to havingthree-way multiplicativeinteractions.

zk

xi

yj

Roland Memisevic (Uni Frankfurt) Multiview Feature Learning Tutorial at CVPR 2012 23 / 174

Page 43: Mvfl_part1 Cvpr2012 Spatio-temporal and Higher-Order Feature Learning

Learning features to model correspondences

Learning and inference in thepresence of gating variables is(slightly) different fromlearning without.We can set things up, suchthat inference is almostunchanged. Yet the meaningof the latent variables will beentirely different.

zk

xi

yj

Roland Memisevic (Uni Frankfurt) Multiview Feature Learning Tutorial at CVPR 2012 24 / 174

Page 44: Mvfl_part1 Cvpr2012 Spatio-temporal and Higher-Order Feature Learning

Learning features to model correspondences

Multiplicative interactions allowhidden variables to blend in awhole “sub”-network.This leads to a qualitativelyquite different behaviour fromall common bi-partite featurelearning models.

zk

xi

yj

Roland Memisevic (Uni Frankfurt) Multiview Feature Learning Tutorial at CVPR 2012 25 / 174

Page 45: Mvfl_part1 Cvpr2012 Spatio-temporal and Higher-Order Feature Learning

Multiplicative interactions

Brief history of multiplication“Mapping units” (Hinton; 1981), “dynamic mappings” (v.d.Malsburg; 1981)Binocular+Motion Energy models (Adelson, Bergen; 1985),(Ozhawa, DeAngelis, Freeman; 1990), (Fleet et al., 1994)Higher-order neural nets, “Sigma-Pi-units”Bi-linear models (Tenenbaum, Freeman; 2000), (Ohlshausen;1994), (Grimes, Rao; 2005)Subspace SOM (Kohonen, 1996)ISA, topographic ICA (Hyvarinen, Hoyer; 2000), (Karklin, Lewicki;2003): Higher-order within image structure(2006 –) GBM, mcRBM, RAE, convISA, applications...

Roland Memisevic (Uni Frankfurt) Multiview Feature Learning Tutorial at CVPR 2012 26 / 174

Page 46: Mvfl_part1 Cvpr2012 Spatio-temporal and Higher-Order Feature Learning

Multiplicative interactions

Brief history of multiplication“Mapping units” (Hinton; 1981), “dynamic mappings” (v.d.Malsburg; 1981)Binocular+Motion Energy models (Adelson, Bergen; 1985),(Ozhawa, DeAngelis, Freeman; 1990), (Fleet et al., 1994)Higher-order neural nets, “Sigma-Pi-units”Bi-linear models (Tenenbaum, Freeman; 2000), (Ohlshausen;1994), (Grimes, Rao; 2005)Subspace SOM (Kohonen, 1996)ISA, topographic ICA (Hyvarinen, Hoyer; 2000), (Karklin, Lewicki;2003): Higher-order within image structure(2006 –) GBM, mcRBM, RAE, convISA, applications...

Roland Memisevic (Uni Frankfurt) Multiview Feature Learning Tutorial at CVPR 2012 26 / 174

Page 47: Mvfl_part1 Cvpr2012 Spatio-temporal and Higher-Order Feature Learning

Multiplicative interactions

Brief history of multiplication“Mapping units” (Hinton; 1981), “dynamic mappings” (v.d.Malsburg; 1981)Binocular+Motion Energy models (Adelson, Bergen; 1985),(Ozhawa, DeAngelis, Freeman; 1990), (Fleet et al., 1994)Higher-order neural nets, “Sigma-Pi-units”Bi-linear models (Tenenbaum, Freeman; 2000), (Ohlshausen;1994), (Grimes, Rao; 2005)Subspace SOM (Kohonen, 1996)ISA, topographic ICA (Hyvarinen, Hoyer; 2000), (Karklin, Lewicki;2003): Higher-order within image structure(2006 –) GBM, mcRBM, RAE, convISA, applications...

Roland Memisevic (Uni Frankfurt) Multiview Feature Learning Tutorial at CVPR 2012 26 / 174

Page 48: Mvfl_part1 Cvpr2012 Spatio-temporal and Higher-Order Feature Learning

Multiplicative interactions

Brief history of multiplication“Mapping units” (Hinton; 1981), “dynamic mappings” (v.d.Malsburg; 1981)Binocular+Motion Energy models (Adelson, Bergen; 1985),(Ozhawa, DeAngelis, Freeman; 1990), (Fleet et al., 1994)Higher-order neural nets, “Sigma-Pi-units”Bi-linear models (Tenenbaum, Freeman; 2000), (Ohlshausen;1994), (Grimes, Rao; 2005)Subspace SOM (Kohonen, 1996)ISA, topographic ICA (Hyvarinen, Hoyer; 2000), (Karklin, Lewicki;2003): Higher-order within image structure(2006 –) GBM, mcRBM, RAE, convISA, applications...

Roland Memisevic (Uni Frankfurt) Multiview Feature Learning Tutorial at CVPR 2012 26 / 174

Page 49: Mvfl_part1 Cvpr2012 Spatio-temporal and Higher-Order Feature Learning

Multiplicative interactions

Brief history of multiplication“Mapping units” (Hinton; 1981), “dynamic mappings” (v.d.Malsburg; 1981)Binocular+Motion Energy models (Adelson, Bergen; 1985),(Ozhawa, DeAngelis, Freeman; 1990), (Fleet et al., 1994)Higher-order neural nets, “Sigma-Pi-units”Bi-linear models (Tenenbaum, Freeman; 2000), (Ohlshausen;1994), (Grimes, Rao; 2005)Subspace SOM (Kohonen, 1996)ISA, topographic ICA (Hyvarinen, Hoyer; 2000), (Karklin, Lewicki;2003): Higher-order within image structure(2006 –) GBM, mcRBM, RAE, convISA, applications...

Roland Memisevic (Uni Frankfurt) Multiview Feature Learning Tutorial at CVPR 2012 26 / 174

Page 50: Mvfl_part1 Cvpr2012 Spatio-temporal and Higher-Order Feature Learning

Multiplicative interactions

Brief history of multiplication“Mapping units” (Hinton; 1981), “dynamic mappings” (v.d.Malsburg; 1981)Binocular+Motion Energy models (Adelson, Bergen; 1985),(Ozhawa, DeAngelis, Freeman; 1990), (Fleet et al., 1994)Higher-order neural nets, “Sigma-Pi-units”Bi-linear models (Tenenbaum, Freeman; 2000), (Ohlshausen;1994), (Grimes, Rao; 2005)Subspace SOM (Kohonen, 1996)ISA, topographic ICA (Hyvarinen, Hoyer; 2000), (Karklin, Lewicki;2003): Higher-order within image structure(2006 –) GBM, mcRBM, RAE, convISA, applications...

Roland Memisevic (Uni Frankfurt) Multiview Feature Learning Tutorial at CVPR 2012 26 / 174

Page 51: Mvfl_part1 Cvpr2012 Spatio-temporal and Higher-Order Feature Learning

Multiplicative interactions

Brief history of multiplication“Mapping units” (Hinton; 1981), “dynamic mappings” (v.d.Malsburg; 1981)Binocular+Motion Energy models (Adelson, Bergen; 1985),(Ozhawa, DeAngelis, Freeman; 1990), (Fleet et al., 1994)Higher-order neural nets, “Sigma-Pi-units”Bi-linear models (Tenenbaum, Freeman; 2000), (Ohlshausen;1994), (Grimes, Rao; 2005)Subspace SOM (Kohonen, 1996)ISA, topographic ICA (Hyvarinen, Hoyer; 2000), (Karklin, Lewicki;2003): Higher-order within image structure(2006 –) GBM, mcRBM, RAE, convISA, applications...

Roland Memisevic (Uni Frankfurt) Multiview Feature Learning Tutorial at CVPR 2012 26 / 174

Page 52: Mvfl_part1 Cvpr2012 Spatio-temporal and Higher-Order Feature Learning

Mapping units 1981

(Hinton, 1981)

Roland Memisevic (Uni Frankfurt) Multiview Feature Learning Tutorial at CVPR 2012 27 / 174

Page 53: Mvfl_part1 Cvpr2012 Spatio-temporal and Higher-Order Feature Learning

Mapping units 1981

(Hinton, 1981)

Roland Memisevic (Uni Frankfurt) Multiview Feature Learning Tutorial at CVPR 2012 27 / 174

Page 54: Mvfl_part1 Cvpr2012 Spatio-temporal and Higher-Order Feature Learning

Example application: Action recognition

(Hollywood 2)

(Marszałek et al., 2009)

Roland Memisevic (Uni Frankfurt) Multiview Feature Learning Tutorial at CVPR 2012 28 / 174

Page 55: Mvfl_part1 Cvpr2012 Spatio-temporal and Higher-Order Feature Learning

ISA applied to action recognition

(Le, et al., 2011)KTH Hollywood2 UCF YouTube

until 2011 92.1 50.9 85.6 71.2hierarchical ISA 93.9 53.3 86.5 75.8

Roland Memisevic (Uni Frankfurt) Multiview Feature Learning Tutorial at CVPR 2012 29 / 174

Page 56: Mvfl_part1 Cvpr2012 Spatio-temporal and Higher-Order Feature Learning

Learning higher-order features

(Ranzato et al., 2010)

Roland Memisevic (Uni Frankfurt) Multiview Feature Learning Tutorial at CVPR 2012 30 / 174

Page 57: Mvfl_part1 Cvpr2012 Spatio-temporal and Higher-Order Feature Learning

Bi-linear classification

x y

xi

z

zk

yj

Let labels act like gates. (Nair et al., 2009; Memisevic et al., 2010)SVMs NNet RBM DEEP GSM

dataset/model: SVMRBF SVMPOL NNet DBN1 DBN3 SAA3 GSM (unfact)rectangles 2.15 2.15 7.16 4.71 2.60 2.41 0.83 (0.56)rect.-images 24.04 24.05 33.20 23.69 22.50 24.05 22.51 (23.17)mnistplain 3.03 3.69 4.69 3.94 3.11 3.46 3.70 (3.98)convexshapes 19.13 19.82 32.25 19.92 18.63 18.41 17.08 (21.03)mnistbackrand 14.58 16.62 20.04 9.80 6.73 11.28 10.48 (11.89)mnistbackimg 22.61 24.01 27.41 16.15 16.31 23.00 23.65 (22.07)mnistrotbackimg 55.18 56.41 62.16 52.21 47.39 51.93 55.82 (55.16)mnistrot 11.11 15.42 18.11 14.69 10.30 10.30 11.75 (16.15)

Roland Memisevic (Uni Frankfurt) Multiview Feature Learning Tutorial at CVPR 2012 31 / 174

Page 58: Mvfl_part1 Cvpr2012 Spatio-temporal and Higher-Order Feature Learning

Tracking

(Bazzani et al.), (Larochelle, Hinton, 2011)

Roland Memisevic (Uni Frankfurt) Multiview Feature Learning Tutorial at CVPR 2012 32 / 174