
Transcript of Lecture11

Page 1: Lecture11

Visual Recognition: Learning Features

Raquel Urtasun

TTI Chicago

Feb 21, 2012

Raquel Urtasun (TTI-C) Visual Recognition Feb 21, 2012 1 / 82

Page 2: Lecture11

Learning Representations

Sparse coding

Deep architectures

Topic models


Page 3: Lecture11

Image Classification using BoW

Dictionary learning: Dense/Sparse SIFT → K-means → Dictionary

Image classification: Dense/Sparse SIFT → VQ coding → Spatial pyramid pooling → Nonlinear SVM

[Source: K. Yu]
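The VQ-coding and average-pooling stages of this pipeline can be sketched in a few lines of numpy; the descriptors and the dictionary below are random stand-ins for SIFT features and k-means centers.

```python
# Toy sketch of VQ coding + average pooling in the BoW pipeline.
# Descriptors and dictionary are random stand-ins, not real SIFT/k-means.
import numpy as np

rng = np.random.default_rng(0)
descriptors = rng.normal(size=(500, 128))   # dense local descriptors of one image
dictionary = rng.normal(size=(64, 128))     # visual words (e.g., from k-means)

# VQ coding: hard-assign each descriptor to its nearest visual word
d2 = ((descriptors[:, None, :] - dictionary[None, :, :]) ** 2).sum(axis=2)
assignments = d2.argmin(axis=1)

# Average pooling: normalized histogram of visual-word counts
hist = np.bincount(assignments, minlength=len(dictionary)).astype(float)
hist /= hist.sum()
```

The resulting histogram is the image-level feature fed to the (nonlinear SVM) classifier.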


Page 4: Lecture11

BoW+SPM: the architecture

Nonlinear SVM is not scalable

VQ coding may be too coarse

Average pooling is not optimal

Why not learn the whole thing?

Pipeline: Local gradients (e.g., SIFT, HOG) → VQ coding → Average pooling (obtain histogram) → Nonlinear SVM

[Source: K. Yu]

Page 5: Lecture11

Sparse Architecture

Nonlinear SVM is not scalable → Scalable linear classifier

VQ coding may be too coarse → Better coding methods

Average pooling is not optimal → Better pooling methods

Why not learn the whole thing? → Deep learning

Pipeline: Local gradients (e.g., SIFT, HOG) → Better coding → Better pooling → Scalable linear classifier

[Source: A. Ng]

Page 6: Lecture11

Feature learning problem

Given a 14 × 14 image patch x, we can represent it using 196 real numbers.

Problem: Can we learn a better representation for this?

Given a set of images, learn a better way to represent images than raw pixels.

[Source: A. Ng]

Page 7: Lecture11

Learning an image representation

Sparse coding [Olshausen & Field,1996]

Input: Images x^(1), x^(2), · · · , x^(m) (each in ℝ^{n×n})

Learn: Dictionary of bases φ_1, · · · , φ_k (also in ℝ^{n×n}), so that each input x can be approximately decomposed as

x ≈ ∑_{j=1}^{k} a_j φ_j

such that the a_j's are mostly zero, i.e., sparse.

[Source: A. Ng]


Page 8: Lecture11

Sparse Coding Illustration

Natural images → learned bases φ_1, · · · , φ_64: “edges”

Test example:

x ≈ 0.8 · φ_36 + 0.3 · φ_42 + 0.5 · φ_63

[a_1, · · · , a_64] = [0, · · · , 0, 0.8, 0, · · · , 0, 0.3, 0, · · · , 0, 0.5, 0] (feature representation)

Compact & easily interpretable

[Source: A. Ng]


Page 9: Lecture11

More examples

The method hypothesizes that edge-like patches are the most basic elements of a scene, and represents an image in terms of the edges that appear in it.

Used to obtain a more compact, higher-level representation of the scene than pixels.

[Source: A. Ng]


Page 10: Lecture11

Sparse Coding details

Input: Images x^(1), x^(2), · · · , x^(m) (each in ℝ^{n×n})

Obtain dictionary elements and weights by

min_{a,φ} ∑_{i=1}^{m} ||x^(i) − ∑_{j=1}^{k} a_j^(i) φ_j||_2^2 + λ ∑_{j=1}^{k} |a_j^(i)|_1   (sparsity term)

Alternating minimization with respect to the φ_j's and the a's.

The φ-step is closed form (least squares); the a-step is harder because of the ℓ_1 term.
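The alternating scheme can be sketched in numpy. Sizes, λ and iteration counts below are illustrative, and the a-step uses iterative soft-thresholding (ISTA) rather than the sign-guessing algorithm described on the next slide.

```python
# Minimal alternating-minimization sketch for sparse coding.
# Illustrative sizes; the a-step is ISTA, the phi-step is least squares.
import numpy as np

rng = np.random.default_rng(1)
m, n, k, lam = 100, 64, 32, 0.1
X = rng.normal(size=(m, n))            # one patch per row
Phi = rng.normal(size=(k, n))          # dictionary of k bases
Phi /= np.linalg.norm(Phi, axis=1, keepdims=True)
A = np.zeros((m, k))                   # sparse codes

def soft(v, t):                        # soft-thresholding: prox of the L1 term
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

for _ in range(20):
    # a-step (the harder one): a few ISTA iterations at fixed dictionary
    L = np.linalg.norm(Phi @ Phi.T, 2)      # Lipschitz constant of the gradient
    for _ in range(10):
        A = soft(A - (A @ Phi - X) @ Phi.T / L, lam / L)
    # phi-step: closed-form least squares at fixed codes, then renormalize
    Phi = np.linalg.lstsq(A, X, rcond=None)[0]
    norms = np.linalg.norm(Phi, axis=1, keepdims=True)
    norms[norms < 1e-8] = 1.0              # guard against unused atoms
    Phi /= norms
    A *= norms.T                           # keep the product A @ Phi unchanged
```

Renormalizing the atoms (and rescaling the codes to compensate) avoids the trivial solution of growing φ while shrinking a.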


Page 11: Lecture11

Fast algorithms

Solving for a is expensive.

Simple algorithm that works well:

Repeatedly guess the sign (+, − or 0) of each of the a_i's.
Solve for the a_i's in closed form.
Refine the guess for the signs.

Other algorithms: projected gradient descent, stochastic subgradient descent, etc.

[Source: A. Ng]

Page 12: Lecture11

Recap of sparse coding for feature learning

Training:

Input: Images x^(1), x^(2), · · · , x^(m) (each in ℝ^{n×n})

Learn: Dictionary of bases φ_1, · · · , φ_k (also in ℝ^{n×n}):

min_{a,φ} ∑_{i=1}^{m} ||x^(i) − ∑_{j=1}^{k} a_j^(i) φ_j||^2 + λ ∑_{j=1}^{k} |a_j^(i)|

Test time:

Input: novel x^(1), x^(2), · · · , x^(m) and learned φ_1, · · · , φ_k.

Solve for the representation a_1, · · · , a_k of each example:

min_a ||x − ∑_{j=1}^{k} a_j φ_j||^2 + λ ∑_{j=1}^{k} |a_j|


Page 13: Lecture11

Sparse coding recap

Much better than a pixel representation, but still not competitive with SIFT, etc.

Three ways to make it competitive:

Combine this with SIFT.
Advanced versions of sparse coding, e.g., LCC.
Deep learning.

[Source: A. Ng]


Page 14: Lecture11

Sparse Classification

[Source: A. Ng]


Page 15: Lecture11

K-means vs sparse coding

[Source: A. Ng]


Page 16: Lecture11

Why does sparse coding help classification?

The coding is a nonlinear feature mapping

Represent data in a higher dimensional space

Sparsity makes prominent patterns more distinctive

[Source: K. Yu]


Page 17: Lecture11

A topic model view of sparse coding

Each basis is a direction or a topic.

Sparsity: each datum is a linear combination of only a few bases.

Applicable to image denoising, inpainting, and super-resolution.

Both figures adapted from CVPR10 tutorial by F. Bach, J. Mairal, J. Ponce and G. Sapiro


[Source: K. Yu]


Page 18: Lecture11

A geometric view of sparse coding

Each basis is somewhat like a pseudo data point or anchor point.

Sparsity: each datum is a sparse combination of neighbor anchors.

The coding scheme explores the manifold structure of data


[Source: K. Yu]


Page 19: Lecture11

Influence of Sparsity

When SC achieves the best classification accuracy, the learned bases are like digits – each basis has a clear local class association.

Error: 4.54% Error: 3.75% Error: 2.64%

[Source: K. Yu]

Page 20: Lecture11

The image classification setting for analysis

Learning an image classifier is a matter of learning nonlinear functions on patches:

f(x) = w^T x = ∑_{i=1}^{m} a_i (w^T φ^(i)) = ∑_{i=1}^{m} a_i f(φ^(i))

where x = ∑_{i=1}^{m} a_i φ^(i); the left-hand side is f on images, the right-hand side a weighted sum of f on patches.
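This identity is easy to verify numerically, with random stand-ins for w, the a_i and the φ^(i):

```python
# Quick check of the identity: if x = sum_i a_i phi_i, a linear classifier's
# response on the image equals the a-weighted sum of its patch responses.
import numpy as np

rng = np.random.default_rng(7)
m, n = 10, 64
phi = rng.normal(size=(m, n))        # patch-level basis elements
a = rng.normal(size=m)               # coding weights
w = rng.normal(size=n)               # linear classifier

x = a @ phi                          # x = sum_i a_i phi_i
lhs = w @ x                          # f(x) on the image
rhs = (a * (phi @ w)).sum()          # sum_i a_i f(phi_i) on patches
```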

Pipeline: Dense local features → Sparse coding → Linear pooling → Linear SVM

[Source: K. Yu]

Page 21: Lecture11

Nonlinear learning via local coding

We assume x_i ≈ ∑_{j=1}^{k} a_{i,j} φ_j and thus

f(x_i) = ∑_{j=1}^{k} a_{i,j} f(φ_j)

The approximation is locally linear over neighboring bases.

[Source: K. Yu]

Page 22: Lecture11

How to learn the non-linear function

1 Learning the dictionary φ1, · · · , φk from unlabeled data

2 Use the dictionary to encode data xi → ai,1, · · · , ai,k

3 Estimate the parameters

Nonlinear local learning via learning a global linear function


Page 23: Lecture11

Local Coordinate Coding (LCC):

If f(x) is (α, β)-Lipschitz smooth, the function approximation error decomposes into a coding-error term plus a locality term.

A good coding scheme should [Yu et al. 09]

have a small coding error,

and also be sufficiently local

[Source: K. Yu]


Page 24: Lecture11

Results

The larger the dictionary, the higher the accuracy, but also the higher the computational cost.

(Yu et al. 09)

(Yang et al. 09)

The same observation holds for Caltech256, PASCAL, and ImageNet.

[Source: K. Yu]

Page 25: Lecture11

Locality-constrained linear coding

A fast implementation of LCC [Wang et al. 10]

Dictionary learning using k-means; the code for x is computed as:

Step 1 (ensure locality): find the K nearest bases [φ_j]_{j∈J(x)}

Step 2 (ensure low coding error):

min_a ||x − ∑_{j∈J(x)} a_{i,j} φ_j||^2,  s.t.  ∑_{j∈J(x)} a_{i,j} = 1
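The two steps can be sketched as follows; the closed-form solve via the local covariance follows Wang et al., with a small ridge added for numerical stability, and the dictionary here is random rather than from k-means.

```python
# LLC coding sketch for one descriptor x: K nearest bases, then a
# sum-to-one constrained least squares with a closed-form solution.
import numpy as np

rng = np.random.default_rng(2)
B = rng.normal(size=(64, 128))       # dictionary (stand-in for k-means centers)
x = rng.normal(size=128)             # one local descriptor
K = 5

# Step 1: locality -- indices of the K nearest bases
J = np.argsort(((B - x) ** 2).sum(axis=1))[:K]

# Step 2: min_a ||x - sum_j a_j phi_j||^2  s.t.  sum_j a_j = 1
# Closed form: solve C a = 1 with the local covariance C, then normalize.
C = (B[J] - x) @ (B[J] - x).T + 1e-6 * np.eye(K)   # small ridge for stability
a = np.linalg.solve(C, np.ones(K))
a /= a.sum()

code = np.zeros(len(B))
code[J] = a                          # sparse LLC code for x
```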

[Source: K. Yu]


Page 26: Lecture11

Results

[Source: K. Yu]


Page 27: Lecture11

Interpretation of BoW + linear classifier

Piecewise local constant (zero-order) approximation: data points are represented by their cluster centers.

[Source: K. Yu]


Page 28: Lecture11

Support vector coding [Zhou et al, 10]

Piecewise local linear (first-order) approximation: a local tangent at each cluster center.

[Source: K. Yu]


Page 29: Lecture11

Details...

Let [a_{i,1}, · · · , a_{i,k}] be the VQ coding of x_i.

Global linear weights are learned on the super-vector codes of the data.

No. 1 position in PASCAL VOC 2009.

[Source: K. Yu]


Page 30: Lecture11

Summary of coding algorithms

All lead to higher-dimensional, sparse, and localized coding

All explore geometric structure of data

New coding methods are suitable for linear classifiers.

Their implementations are quite straightforward.

[Source: K. Yu]


Page 31: Lecture11

PASCAL VOC 2009

No. 1 for 18 of 20 categories.

PASCAL: 20 categories, 10,000 images.

They used only HOG features on gray images.

(Table: per-class results – ours vs. best of other teams, and the difference.)

[Source: K. Yu]

Page 32: Lecture11

ImageNet Large-scale Visual Recognition Challenge 2010

ImageNet: 1000 categories, 1.2 million images for training

[Source: K. Yu]


Page 33: Lecture11

ImageNet Large-scale Visual Recognition Challenge 2010

150 teams registered worldwide, resulting in 37 submissions from 11 teams.

SC achieved 52% accuracy for 1000-class classification.

Teams Top 5 Hit Rate

Our team: NEC-UIUC 72.8%

Xerox European Lab, France 66.4%

Univ. of Tokyo 55.4%

Univ. of California, Irvine 53.4%

MIT 45.6%

NTU, Singapore 41.7%

LIG, France 39.3%

IBM T. J. Watson Research Center 30.0%

National Institute of Informatics, Tokyo 25.8%

SRI (Stanford Research Institute) 24.9%

[Source: K. Yu]

Page 34: Lecture11

Learning Codebooks for Image Classification

Replacing Vector Quantization by Learned Dictionaries

unsupervised: [Yang et al., 2009]

supervised: [Boureau et al., 2010, Yang et al., 2010]

[Source: Mairal]


Page 35: Lecture11

Discriminative Training

[Source: Mairal]

Page 36: Lecture11

Discriminative Dictionaries

Figure: Top: reconstructive; Bottom: discriminative; Left: Bicycle; Right: Background.

[Source: Mairal]

Page 37: Lecture11

Application: Edge detection

[Source: Mairal]


Page 38: Lecture11

Application: Edge detection

[Source: Mairal]


Page 39: Lecture11

Application: Authentic from fake

[Source: Mairal]


Page 40: Lecture11

Predictive Sparse Decomposition (PSD)

Feed-forward predictor function for feature extraction

min_{a,φ,K} ∑_{i=1}^{m} ||x^(i) − ∑_{j=1}^{k} a_j^(i) φ_j||^2 + λ ∑_{j=1}^{k} |a_j^(i)| + β ||a − C(X, K)||_2^2

with e.g., C(X, K) = g · tanh(x ∗ k)

Learning is done by

1) Fix K and a, minimize to get optimal φ

2) Update a and K using optimal φ

3) Scale elements of φ to be unit norm.
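A patch-level numeric sketch of evaluating the PSD objective; shapes, gains, the sparse code, and the predictor's matrix form are illustrative stand-ins, not the paper's exact setup.

```python
# Sketch of the PSD loss for one patch: reconstruction + L1 sparsity +
# penalty tying the code to a feed-forward predictor g * tanh(W x).
import numpy as np

rng = np.random.default_rng(3)
n, k = 144, 256                      # 12x12 patches, 256 dictionary elements
x = rng.normal(size=n)
Phi = rng.normal(size=(n, k))
Phi /= np.linalg.norm(Phi, axis=0)   # unit-norm dictionary columns
W = rng.normal(size=(k, n)) * 0.1    # encoder filters
g = np.ones(k)                       # per-component gains
a = rng.normal(size=k) * (rng.random(k) < 0.1)   # a sparse code
lam, beta = 0.1, 1.0

pred = g * np.tanh(W @ x)            # feed-forward predictor C(x; W, g)
loss = (np.linalg.norm(x - Phi @ a) ** 2
        + lam * np.abs(a).sum()
        + beta * np.linalg.norm(a - pred) ** 2)
```

At test time only the cheap predictor `pred` is evaluated; the expensive sparse inference is avoided.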

[Source: Y. LeCun]


Page 41: Lecture11

Predictive Sparse Decomposition (PSD)

12 × 12 natural image patches with 256 dictionary elements.

Figure: (Left) Encoder, (Right) Decoder

[Source: Y. LeCun]

Page 42: Lecture11

Training Deep Networks with PSD

Train layer wise [Hinton, 06]

C(X, K_1), C(f(K_1), K_2), · · ·

Each layer is trained on the output f(x) produced by the previous layer.

f is a series of non-linearities and pooling operations.

[Source: Y. LeCun]


Page 43: Lecture11

Object Recognition - Caltech 101

[Source: Y. LeCun]


Page 44: Lecture11

Problems with sparse coding

Sparse coding produces filters that are shifted versions of each other.

It ignores the fact that it is going to be used in a convolutional fashion.

Inference is run on all overlapping patches independently.

Problems with sparse coding:

1) the representations are redundant, as training and inference are done at the patch level

2) inference for the whole image is computationally expensive


Page 45: Lecture11

Solutions via Convolutional Nets

Problems

1) the representations are redundant, as training and inference are done at the patch level

2) inference for the whole image is computationally expensive

Solutions

1) Apply sparse coding to the entire image at once, with the dictionary as a convolutional filter bank:

ℓ(x, z, D) = ½ ||x − ∑_{k=1}^{K} D_k ∗ z_k||_2^2 + |z|_1

with D_k an s × s 2D filter, x a w × h image, and z_k a (w + s − 1) × (h + s − 1) 2D feature map.

2) Use feed-forward, non-linear encoders to produce a fast approximation to the sparse code:

ℓ(x, z, D, W) = ½ ||x − ∑_{k=1}^{K} D_k ∗ z_k||_2^2 + ∑_{k=1}^{K} ||z_k − f(W^k ∗ x)||_2^2 + |z|_1
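The first loss can be evaluated numerically with scipy's 2-D convolution; image, filter-bank and feature-map sizes below are illustrative.

```python
# Numeric sketch of the convolutional sparse-coding loss: a 'valid'
# convolution of each (w+s-1, h+s-1) feature map with an s x s filter
# gives a w x h reconstruction of the image.
import numpy as np
from scipy.signal import convolve2d

rng = np.random.default_rng(4)
w, h, s, K = 16, 16, 5, 4
x = rng.normal(size=(w, h))                       # image
D = rng.normal(size=(K, s, s))                    # convolutional filter bank
z = rng.normal(size=(K, w + s - 1, h + s - 1))    # feature maps
z *= rng.random(z.shape) < 0.05                   # make them sparse

recon = sum(convolve2d(z[k], D[k], mode='valid') for k in range(K))
loss = 0.5 * ((x - recon) ** 2).sum() + np.abs(z).sum()
```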


Page 46: Lecture11

Convolutional Nets

Figure: Image from Kavukcuoglu et al. 10. (left) Dictionary from sparse coding.(right) Dictionary learned from conv. nets.


Page 47: Lecture11

Stacking

One or more additional stages can be stacked on top of the first one.

Each stage then takes the output of its preceding stage as input and processes it using the same series of operations, with different architectural parameters such as size and connections.

Figure: Image from Kavukcuoglu et al. 10. Second layer.


Page 48: Lecture11

Convolutional Nets

[Source: Y. LeCun]


Page 49: Lecture11

Convolutional Nets

[Source: H. Lee]


Page 50: Lecture11

INRIA pedestrians

Figure: Image from Kavukcuoglu et al. 10. Second layer.


Page 51: Lecture11

Many more deep architectures

Restricted Boltzmann machines (RBMs)

Deep belief nets

Deep Boltzmann machines

Stacked denoising auto-encoders

· · ·


Page 52: Lecture11

Topic Models


Page 53: Lecture11

Topic Models

Topic models have become a powerful unsupervised learning tool for text summarization, document classification, information retrieval, link prediction, and collaborative filtering.

Topic modeling introduces latent topic variables in text and images that reveal the underlying structure.

Topics are groups of related words.

[Source: J. Zen]

Page 54: Lecture11

Probabilistic Topic Modeling

Treat data as observations that arise from a generative model composed of latent variables.

The latent variables reflect the semantic structure of the document.

Infer the latent structure using posterior inference (MAP): what are the topics that summarize the document network?

Predict new data with the estimated topic model: how does the new data fit into the estimated topic structures?

[Source: J. Zen]


Page 55: Lecture11

Latent Dirichlet Allocation (LDA) [Blei et al. 03]

Simple intuition: a document exhibits multiple topics

[Source: J. Zen]


Page 56: Lecture11

Generative models

Each document is a mixture of corpus-wide topics.

Each word is drawn from one of those topics.

[Source: J. Zen]


Page 57: Lecture11

The posterior distribution (inference)

Observations include documents and their words. Our goal is to infer the underlying topic structures (marked by ?) from the observations.

[Source: J. Zen]


Page 58: Lecture11

What does it mean with visual words?

[Source: B. Schiele]


Page 59: Lecture11

How to read Graphical Models

[Source: B. Schiele]


Page 60: Lecture11

LDA

[Source: J. Zen]


Page 61: Lecture11

LDA generative process

[Source: J. Zen]
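The generative process (draw per-document topic proportions from a Dirichlet, then a topic and a word per token) can be sketched directly; the vocabulary and sizes are toy values, and β is sampled here rather than learned.

```python
# Sketch of the LDA generative process for one document (Blei et al. 03).
import numpy as np

rng = np.random.default_rng(5)
V, T, n_words = 50, 3, 20            # vocab size, topics, words per document
alpha = np.full(T, 0.1)              # Dirichlet prior on topic proportions
beta = rng.dirichlet(np.ones(V), size=T)   # per-topic word distributions

theta = rng.dirichlet(alpha)         # 1. theta ~ Dir(alpha) for the document
doc = []
for _ in range(n_words):
    z = rng.choice(T, p=theta)       # 2. topic assignment z ~ Mult(theta)
    w = rng.choice(V, p=beta[z])     # 3. word w ~ Mult(beta_z)
    doc.append(w)
```

With a small α, θ concentrates on few topics, so each document exhibits only a handful of them.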


Page 62: Lecture11

LDA as matrix factorization

[Source: B. Schiele]
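The factorization view can be illustrated with toy numbers: the document × topic proportions times the topic × word distributions give a document × word probability matrix.

```python
# LDA as matrix factorization: P(word | doc) = theta @ beta.
import numpy as np

rng = np.random.default_rng(6)
D, T, V = 8, 3, 50                   # documents, topics, vocabulary
theta = rng.dirichlet(np.full(T, 0.5), size=D)   # doc-topic proportions
beta = rng.dirichlet(np.ones(V), size=T)         # topic-word distributions

P = theta @ beta                     # document x word probability matrix
# each row of P is a distribution over the vocabulary
```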



Page 64: Lecture11

Dirichlet Distribution


Page 65: Lecture11

LDA inference

[Source: J. Zen]


Page 66: Lecture11

LDA: intractable exact inference

[Source: J. Zen]


Page 67: Lecture11

LDA: approximate inference

[Source: J. Zen]


Page 68: Lecture11

Computer Vision Applications

Natural Scene Categorization [Fei-Fei and Perona, 05]

Object Recognition [Sivic et al. 05]

Object recognition by modeling the distribution of patches [Fritz et al.]

Activity Recognition [Niebles et al. ]

Text and image [Saenko et al. ]

· · ·


Page 69: Lecture11

Natural Scene Categorization

A different model is learned for each class.

[Source: J. Zen]


Page 70: Lecture11

Natural Scene Categorization

In reality it is a bit more complicated, as it also reasons about the class [Fei-Fei et al., 05].


Page 71: Lecture11

Natural Scene Categorization


Page 72: Lecture11

Classification Results


Page 73: Lecture11

More Results


Page 74: Lecture11

Object Recognition [Sivic et al. 05]

Matrix Factorization view

[Source: B. Schiele]


Page 75: Lecture11

Most likely words given topic

[Source: B. Schiele]



Page 77: Lecture11

Object Recognition [Sivic et al. 05]

Matrix Factorization view

[Source: B. Schiele]


Page 78: Lecture11

Object Recognition [Sivic et al. 05]

[Source: B. Schiele]



Page 80: Lecture11

Object Recognition [Fritz et al.]

[Source: B. Schiele]


Page 81: Lecture11

Learning of Generative Decompositions [Fritz et al.]

[Source: B. Schiele]


Page 82: Lecture11

Topics [Fritz et al.]

[Source: B. Schiele]


Page 83: Lecture11

Supervised Classification [Fritz et al.]

[Source: B. Schiele]


Page 84: Lecture11

UIUC Cars

[Source: B. Schiele]


Page 85: Lecture11

Extensions

Introduce supervision, e.g., discriminative LDA, supervised LDA

Introduce spatial information

Multi-view approaches, e.g., correlated LDA, Pachinko allocation

Use infinite mixtures, e.g., HDP, PY processes, to model heavy tails.

And many more.
