High Level Computer Vision: Topic Models in Vision
Bernt Schiele - [email protected]
Mario Fritz - [email protected]
http://www.d2.mpi-inf.mpg.de/cv
High Level Computer Vision - July 15, 2014
Topic Models for Object Recognition
slide credit Fei-Fei et al.
Today
• Unsupervised learning
‣ topic models
‣ object class discovery
‣ extension to spatial information
How many object categories are there?
~10,000 to 30,000
[Biederman 1987]
Datasets and Their Issues In Object Recognition
Caltech101/256 [Fei-Fei et al. 2004] [Griffin et al. 2007]
PASCAL [Everingham et al. 2009]
MSRC [Shotton et al. 2006]
ESP [Ahn et al. 2006]
LabelMe [Russell et al. 2005]
Tiny Images [Torralba et al. 2007]
Lotus Hill [Yao et al. 2007]
Size of existing datasets

Dataset      # of categories   # of images per category   # of total images   Collected by
Caltech101   101               ~100                       ~10K                Human
Lotus Hill   ~300              ~500                       ~150K               Human
LabelMe      183               ~200                       ~30K                Human
Ideal        ~30K              >>10^2                     A LOT               Machine
• An ontology of images based on WordNet
• Collected using Amazon Mechanical Turk
• ~10^5+ nodes, ~10^8+ images
[Figure: WordNet branch animal → shepherd dog, sheep dog → German shepherd, collie]
Deng, Dong, Socher, Li & Fei-Fei, CVPR 2009
image-net.org
21,841 categories, 14,197,122 images
• Animals
‣ Fish
‣ Bird
‣ Mammal
‣ Invertebrate
• Scenes
‣ Indoors
‣ Geological formations
• Sport Activities
• Fabric Materials
• Instrumentation
– Tool
– Appliances
– …
• Plants
– …
Deng, Wei, Socher, Li, Li, Fei-Fei, CVPR 2009
“Cycling”
“Drawing room, withdrawing room”
How much supervision do you need to learn models of objects?
Object label + segmentation
Agarwal & Roth ’02, Leibe & Schiele ’03, Torralba et al. ’05
LabelMe, PASCAL, TU Darmstadt, MIT scenes and objects
MIT+CMU frontal faces
Viola & Jones ’01, Rowley et al. ’98
Weakly Supervised: Object appears somewhere in the image
Fergus et al. ’03, Csurka et al. ’04, Dorko & Schmid ’05
Caltech 101, PASCAL, MSRC
Example classes: airplane, motorbike, face, car
Image + text caption
Corel, Flickr, Names+faces, ESP game
Barnard et al. ’03, Berg et al. ’04
Images only
Given a collection of unlabeled images, discover visual object categories and their segmentation
• Which images contain the same object(s)?
• Where is the object in the image?
Training Conditions
• Object labels + segmentation
• Object label + bounding box
• Object label
• Image and text (e.g. caption: “bridge, sky, water”)
• Images only
‣ example technique: LDA
Unsupervised Learning
• Applications:
‣ refining web search (outlier removal)
‣ discovering object classes
• No labels -> generative models
• Modeling p(x)
• What can the data distribution tell us about the problem?
• Simple examples:
‣ clustering (e.g. k-means)
‣ Gaussian mixture models
• Today:
‣ “multinomial mixture model” with a twist
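A minimal sketch of k-means, the simple clustering example mentioned above; the toy 2-D data and k = 2 are illustrative choices, not from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)
# two well-separated 2-D blobs as toy unlabeled data
X = np.vstack([rng.normal(0, 1, (50, 2)),
               rng.normal(5, 1, (50, 2))])

k = 2
centers = X[rng.choice(len(X), size=k, replace=False)]  # init from data points
for _ in range(20):
    # assign each point to its nearest center
    labels = ((X[:, None, :] - centers[None]) ** 2).sum(-1).argmin(axis=1)
    # re-estimate each center as the mean of its assigned points
    # (keep the old center if a cluster happens to go empty)
    centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centers[j] for j in range(k)])

print(np.round(centers, 1))  # the two recovered cluster centers
```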
Gaussian Mixture
• Only red data points are observed
• We can recover cluster/class structure by making a distribution assumption
• But how to do this with visual words?

p(x) = Σ_i p(x|c_i) p(c_i)

with components p(x|c_1), p(x|c_2) and mixing weights p(c_1), p(c_2).
[Graphical model: parameters θ, latent assignment z, observation x with component parameters µ, Σ; plate over N_c observations and C components.]
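A quick numerical sketch of this mixture density; the two 1-D Gaussian components and their weights are illustrative choices, not from the slides.

```python
import numpy as np

# p(x) = sum_i p(x|c_i) p(c_i) with two 1-D Gaussian components
def gauss_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

priors = [0.3, 0.7]                 # mixing weights p(c_1), p(c_2)
params = [(-2.0, 1.0), (3.0, 0.5)]  # (mu, sigma) of p(x|c_1), p(x|c_2)

def p(x):
    return sum(pc * gauss_pdf(x, mu, s) for pc, (mu, s) in zip(priors, params))

xs = np.linspace(-8.0, 8.0, 2001)
mass = (p(xs) * (xs[1] - xs[0])).sum()  # Riemann sum; should be close to 1
print(mass)
```

Fitting such a model to unlabeled data (e.g. with EM) recovers the cluster structure; the question on the slide is how to do the analogous thing for discrete visual words, which leads to the multinomial topic models below.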
How to read graphical models
• Encoded independence assumptions (all the arrows you don’t see)
• Plate notation
• Observed vs. unobserved variables
Learning as Inference
Figure 9.1: (a) Belief network for the coin-tossing model: θ with children v_1, v_2, ..., v_N. (b) Plate notation equivalent of (a). A plate replicates the quantities inside the plate a number of times as specified on the plate.
In the above, Σ_{n=1}^N I[v_n = 1] is the number of occurrences of heads, which we more conveniently denote as N_H. Likewise, Σ_{n=1}^N I[v_n = 0] is the number of tails, N_T. Hence

p(θ|v_1, ..., v_N) ∝ p(θ) θ^{N_H} (1 − θ)^{N_T}    (9.1.6)

For an experiment with N_H = 2, N_T = 8, the posterior distribution is

p(θ = 0.1|V) = k × 0.15 × 0.1^2 × 0.9^8 = k × 6.46 × 10^-4    (9.1.7)
p(θ = 0.5|V) = k × 0.8 × 0.5^2 × 0.5^8 = k × 7.81 × 10^-4    (9.1.8)
p(θ = 0.8|V) = k × 0.05 × 0.8^2 × 0.2^8 = k × 8.19 × 10^-8    (9.1.9)

where V is shorthand for v_1, ..., v_N. From the normalisation requirement we have 1/k = 6.46 × 10^-4 + 7.81 × 10^-4 + 8.19 × 10^-8 = 0.0014, so that

p(θ = 0.1|V) = 0.4525,  p(θ = 0.5|V) = 0.5475,  p(θ = 0.8|V) = 0.0001    (9.1.10)
as shown in fig(9.2b). These are the ‘posterior’ parameter beliefs. In this case, if we were asked to choose a single a posteriori most likely value for θ, it would be θ = 0.5, although our confidence in this is low since the posterior belief that θ = 0.1 is also appreciable. This result is intuitive since, even though we observed more tails than heads, our prior belief was that it was more likely the coin is fair.
Repeating the above with N_H = 20, N_T = 80, the posterior changes to

p(θ = 0.1|V) = 1 − 1.93 × 10^-6,  p(θ = 0.5|V) = 1.93 × 10^-6,  p(θ = 0.8|V) = 2.13 × 10^-35    (9.1.11)

see fig(9.2c), so that the posterior belief in θ = 0.1 dominates. This is reasonable since in this situation there are so many more tails than heads that this is unlikely to occur from a fair coin. Even though we a priori thought that the coin was fair, a posteriori we have enough evidence to change our minds.
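The posterior values above can be checked numerically; a short sketch, using the discrete prior p(θ) = (0.15, 0.8, 0.05) over θ = (0.1, 0.5, 0.8) from the text:

```python
import numpy as np

theta = np.array([0.1, 0.5, 0.8])   # candidate coin biases
prior = np.array([0.15, 0.80, 0.05])  # prior beliefs p(theta)

def posterior(n_heads, n_tails):
    # p(theta|V) ∝ p(theta) * theta^NH * (1 - theta)^NT
    unnorm = prior * theta**n_heads * (1 - theta)**n_tails
    return unnorm / unnorm.sum()

print(posterior(2, 8))    # ≈ [0.4525, 0.5475, 0.0001]
print(posterior(20, 80))  # theta = 0.1 dominates
```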
9.1.2 Making decisions
In itself, the Bayesian posterior merely represents our beliefs and says nothing about how best to summarise these beliefs. In situations in which decisions need to be taken under uncertainty, we additionally need to specify what the utility of any decision is, as in chapter(7).
Figure 9.2: (a) Prior encoding our beliefs about the amount the coin is biased to heads. (b) Posterior having seen N_H = 2 heads and N_T = 8 tails. (c) Posterior having seen N_H = 20 heads and N_T = 80 tails. Assuming any value of 0 ≤ θ ≤ 1 is possible, the ML setting is θ = 0.2 in both N_H = 2, N_T = 8 and N_H = 20, N_T = 80.
Example: a graphical model with edges x → y and x → z encodes the factorization

p(x, y, z) = p(z|x) p(y|x) p(x)

e.g. x = object, y = position, z = orientation. [Barber, Bishop]
Topic Models
• Originally developed for text and document analysis
• Unsupervised learning without any labels
• Given
‣ documents in bag-of-words representation (histogram of word counts)
• Assumption
‣ each document is about a subset of possible topics
‣ each topic has characteristic words
• Goal
‣ infer characteristic word distributions for each topic
‣ infer which topics a document is about
Figure from [Blei]
Latent Topic Models (pLSA, LDA, ...)
2. Generative Models
A generative model for documents is based on simple probabilistic sampling rules that describe how words in documents might be generated on the basis of latent (random) variables. When fitting a generative model, the goal is to find the best set of latent variables that can explain the observed data (i.e., observed words in documents), assuming that the model actually generated the data. Figure 2 illustrates the topic modeling approach in two distinct ways: as a generative model and as a problem of statistical inference. On the left, the generative process is illustrated with two topics. Topics 1 and 2 are thematically related to money and rivers and are illustrated as bags containing different distributions over words. Different documents can be produced by picking words from a topic depending on the weight given to the topic. For example, documents 1 and 3 were generated by sampling only from topic 1 and 2 respectively while document 2 was generated by an equal mixture of the two topics. Note that the superscript numbers associated with the words in documents indicate which topic was used to sample the word. The way that the model is defined, there is no notion of mutual exclusivity that restricts words to be part of one topic only. This allows topic models to capture polysemy, where the same word has multiple meanings. For example, both the money and river topic can give high probability to the word BANK, which is sensible given the polysemous nature of the word.
[Figure 2: left, the probabilistic generative process. Topic 1 (money, loan, bank) and topic 2 (river, stream, bank) are bags of words; DOC1 is generated from topic 1 alone (weight 1.0), DOC2 from an equal mixture (.5/.5), and DOC3 from topic 2 alone (weight 1.0), with superscripts marking the generating topic of each word. Right, the problem of statistical inference: the topic assignments and distributions are unknown (?) and must be recovered from the observed words.]
Figure 2. Illustration of the generative process and the problem of statistical inference underlying topic models
The generative process described here does not make any assumptions about the order of words as they appear in documents. The only information relevant to the model is the number of times words are produced. This is known as the bag-of-words assumption, and is common to many statistical models of language including LSA. Of course, word-order information might contain important cues to the content of a document and this information is not utilized by the model. Griffiths, Steyvers, Blei, and Tenenbaum (2005) present an extension of the topic model that is sensitive to word-order and automatically learns the syntactic as well as semantic factors that guide word choice (see also Dennis, this book for a different approach to this problem).
The right panel of Figure 2 illustrates the problem of statistical inference. Given the observed words in a set of documents, we would like to know what topic model is most likely to have generated the data. This involves inferring the probability distribution over words associated with each topic, the distribution over topics for each document, and, often, the topic responsible for generating each word.
[Figure 6: the topic model factorizes the normalized words × documents co-occurrence matrix C into mixture components (a words × topics matrix) times mixture weights (a topics × documents matrix); LSA factorizes C as U D V^T over latent dimensions.]
Figure 6. The matrix factorization of the LSA model compared to the matrix factorization of the topic model
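The contrast in Figure 6 can be made concrete in a few lines; the toy count matrix, the rank k = 2, and the normalization below are illustrative stand-ins, not a fitted topic model:

```python
import numpy as np

rng = np.random.default_rng(0)
C = rng.poisson(1.0, size=(12, 8)).astype(float)  # toy 12-word x 8-doc counts

# LSA: C ≈ U_k D_k V_k^T, a truncated SVD over k latent dimensions
U, s, Vt = np.linalg.svd(C, full_matrices=False)
k = 2
C_lsa = U[:, :k] * s[:k] @ Vt[:k, :]

# Topic-model view: Phi columns are P(word|topic), Theta columns are
# P(topic|doc); their product approximates the column-normalized C.
# (Only the shapes and normalization are illustrated here.)
Phi = np.abs(U[:, :k]); Phi /= Phi.sum(axis=0)         # words x topics
Theta = np.abs(Vt[:k, :]); Theta /= Theta.sum(axis=0)  # topics x docs
approx = Phi @ Theta                                   # each column sums to 1

print(np.linalg.norm(C - C_lsa))  # rank-2 reconstruction error
```

The key structural difference: the topic-model factors are non-negative and column-normalized (probability distributions), while the LSA factors are unconstrained orthogonal directions.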
Other applications. The statistical model underlying the topic modeling approach has been extended to include other sources of information about documents. For example, Cohn and Hofmann (2001) extended the pLSI model by integrating content and link information. In their model, the topics are associated not only with a probability distribution over terms, but also over hyperlinks or citations between documents. Recently, Steyvers, Smyth, Rosen-Zvi, and Griffiths (2004) and Rosen-Zvi, Griffiths, Steyvers, and Smyth (2004) proposed the author-topic model, an extension of the LDA model that integrates authorship information with content. Instead of associating each document with a distribution over topics, the author-topic model associates each author with a distribution over topics and assumes each multi-authored document expresses a mixture of the authors’ topic mixtures. The statistical model underlying the topic model has also been applied to data other than text. The grade-of-membership (GoM) models developed by statisticians in the 1970s are of a similar form (Manton, Woodbury, & Tolley, 1994), and Erosheva (2002) considers a GoM model equivalent to a topic model. The same model has been used for data analysis in genetics (Pritchard, Stephens, & Donnelly, 2000).
4. Algorithm for Extracting Topics
The main variables of interest in the model are the topic-word distributions φ and the topic distributions θ for each document. Hofmann (1999) used the expectation-maximization (EM) algorithm to obtain direct estimates of φ and θ. This approach suffers from problems involving local maxima of the likelihood function, which has motivated a search for better estimation algorithms (Blei et al., 2003; Buntine, 2002; Minka, 2002). Instead of directly estimating the topic-word distributions φ and the topic distributions θ for each document, another approach is to directly estimate the posterior distribution over z (the assignment of word tokens to topics), given the observed words w, while marginalizing out φ and θ. Each z_i gives an integer value in [1..T] for the topic that word token i is assigned to. Because many text collections contain millions of word tokens, the estimation of the posterior over z requires efficient estimation procedures. We will describe an algorithm that uses Gibbs sampling, a form of Markov chain Monte Carlo, which is easy to implement and provides a relatively efficient method of extracting a set of topics from a large corpus (Griffiths & Steyvers, 2004; see also Buntine, 2004, Erosheva 2002 and Pritchard et al., 2000). More information about other algorithms for extracting topics from a corpus can be obtained in the references given above.
Markov chain Monte Carlo (MCMC) refers to a set of approximate iterative techniques designed to sample values from complex (often high-dimensional) distributions (Gilks, Richardson, & Spiegelhalter, 1996). Gibbs sampling (also known as alternating conditional sampling), a specific form of MCMC, simulates a high-dimensional distribution by sampling on lower-dimensional subsets of variables where each subset is conditioned on the value of all others. The sampling is done sequentially and proceeds until the sampled values approximate the target distribution. While the Gibbs procedure we will describe does not provide direct estimates of φ and θ, we will show how φ and θ can be approximated using posterior estimates of z.
• Documents are occurrence histograms over the vocabulary
• Topics are distributions over vocabulary
• Unsupervised learning of mixture model
Figure from [Griffiths]
Latent Dirichlet Allocation (LDA)
Blei et al. 2003
wij - words
zij - topic assignments
θi - topic mixing weights (multinomials)
φk - word mixing weights (multinomials)
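A sketch of the LDA generative process these variables describe; the vocabulary size, topic count, and hyperparameters below are illustrative choices:

```python
import numpy as np

# For each document: theta_i ~ Dirichlet(alpha); for each word token:
# z_ij ~ Multinomial(theta_i), then w_ij ~ Multinomial(phi_{z_ij}).
rng = np.random.default_rng(0)
V, T, alpha, beta = 10, 3, 0.5, 0.1   # vocab size, topics, hyperparameters

phi = rng.dirichlet([beta] * V, size=T)      # T x V: word distribution per topic

def generate_document(n_words):
    theta = rng.dirichlet([alpha] * T)       # topic mixing weights for this doc
    z = rng.choice(T, size=n_words, p=theta) # topic assignment per token
    w = np.array([rng.choice(V, p=phi[t]) for t in z])
    return z, w

z, w = generate_document(20)
print(w)  # 20 word indices in [0, V)
```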
Dirichlet Distribution
• The Dirichlet distribution is the conjugate prior of the multinomial distribution
‣ a Dirichlet prior can be integrated out analytically and again yields a Dirichlet distribution
‣ Example: K = 3 - the Dirichlet distribution over the three variables µ1, µ2, µ3 is confined to a simplex

Dir(µ|α) = Γ(α_0) / (Γ(α_1) · · · Γ(α_K)) · Π_{k=1}^K µ_k^{α_k − 1},   with α_0 = Σ_k α_k

[Figure: the density on the simplex for {α_k} = 0.1, {α_k} = 1, {α_k} = 10]
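A quick empirical look at how the concentration parameter shapes Dirichlet draws on the K = 3 simplex (the sample sizes are arbitrary):

```python
import numpy as np

# Samples from a K = 3 Dirichlet lie on the simplex mu1 + mu2 + mu3 = 1;
# {alpha_k} = 0.1 pushes mass to the corners, {alpha_k} = 10 concentrates
# it near the center.
rng = np.random.default_rng(0)

stats = {}
for alpha in (0.1, 1.0, 10.0):
    mu = rng.dirichlet([alpha] * 3, size=2000)
    assert np.allclose(mu.sum(axis=1), 1.0)   # every draw is on the simplex
    stats[alpha] = mu.max(axis=1).mean()      # average largest component
    print(alpha, round(stats[alpha], 2))
```

Small α makes draws nearly one-hot (largest component close to 1), which is why sparse Dirichlet priors encourage documents to use few topics.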
Inference
wij - words
zij - topic assignments
θi - topic mixing weights
φk - word mixing weights

Use a Gibbs sampler to sample topic assignments:
• Only need to maintain counts of topic assignments
• Sampler typically converges in less than 50 iterations
• Run time is less than an hour
[Griffiths & Steyvers 2004]
Gibbs Sampling
• Random initialization of the z variables
• Iteratively sample each zi given all the other z
• After a burn-in phase -> samples from the true distribution
• A true Bayesian would keep a distribution over z
• In practice, often a single sample is used
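A compact sketch of the collapsed Gibbs sampler of Griffiths & Steyvers (2004), where θ and φ are integrated out and each token's topic is resampled from its count-based conditional; the tiny corpus and hyperparameters are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
V, T, alpha, beta = 6, 2, 0.5, 0.1          # vocab size, topics, hyperparams
docs = [[0, 0, 1, 1, 2], [3, 3, 4, 4, 5], [0, 1, 3, 4]]  # word indices per doc

# count tables: word-topic, document-topic, topic totals
nwt = np.zeros((V, T)); ndt = np.zeros((len(docs), T)); nt = np.zeros(T)
z = [rng.integers(T, size=len(d)) for d in docs]          # random initialization
for d, doc in enumerate(docs):
    for i, w in enumerate(doc):
        t = z[d][i]; nwt[w, t] += 1; ndt[d, t] += 1; nt[t] += 1

for sweep in range(200):
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            t = z[d][i]                                   # remove this token
            nwt[w, t] -= 1; ndt[d, t] -= 1; nt[t] -= 1
            # p(z_i=t | z_-i, w) ∝ (n_wt + beta)/(n_t + V*beta) * (n_dt + alpha)
            p = (nwt[w] + beta) / (nt + V * beta) * (ndt[d] + alpha)
            t = rng.choice(T, p=p / p.sum())              # resample its topic
            z[d][i] = t; nwt[w, t] += 1; ndt[d, t] += 1; nt[t] += 1

phi = (nwt + beta) / (nt + V * beta)   # point estimate of P(word|topic)
print(phi.T)                           # one row per topic
```

Note that only the count tables are maintained, exactly as the slide says; θ and φ are recovered at the end from the final assignments.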
…generated by arbitrarily mixing the two topics. Each circle corresponds to a single word token and each row to a document (for example, document 1 contains 4 occurrences of the word BANK). In Figure 7, the color of the circles indicates the topic assignment (black = topic 1; white = topic 2). At the start of sampling (top panel), the assignments show no structure yet; they just reflect the random assignments to topics. The lower panel shows the state of the Gibbs sampler after 64 iterations. Based on these assignments, Equation 4 gives the following estimates for the distributions over words for topics 1 and 2:

φ^(1): MONEY = .32, LOAN = .29, BANK = .39
φ^(2): RIVER = .25, STREAM = .40, BANK = .35

Given the size of the dataset, these estimates are reasonable reconstructions of the parameters used to generate the data.

[Figure 7 shows the tokens (RIVER, STREAM, BANK, MONEY, LOAN) of 16 documents, before and after sampling.]
Figure 7. An example of the Gibbs sampling procedure.
Exchangeability of topics. There is no a priori ordering on the topics that will make the topics identifiable between or even within runs of the algorithm. Topic j in one Gibbs sample is theoretically not constrained to be similar to topic j in another sample regardless of whether the samples come from the same or different Markov chains (i.e., samples spaced apart that started with the same random assignment or samples from different random assignments). Therefore, the different samples cannot be averaged at the level of topics. However, when topics are used to calculate a statistic which is invariant to the ordering of the topics, it becomes possible and even important to average over different Gibbs samples (see Griffiths and Steyvers, 2004). Model averaging is likely to improve results because it allows sampling from multiple local modes of the posterior.
Stability of Topics. In some applications, it is desirable to focus on a single topic solution in order to interpret each individual topic. In that situation, it is important to know which topics are stable and will reappear across samples and which topics are idiosyncratic for a particular solution. In Figure 8, an analysis is shown of the degree to which two topic solutions can be aligned between samples from different Markov chains. The TASA corpus was taken as
Figure from [Griffiths]
Gibbs Sampling
[Illustration: two variables z_i, z_j, starting from a random initialization]
initialize, then iterate:
  sample from p(z_i | z_j)
  sample from p(z_j | z_i)
After the burn-in period (red) you start drawing samples from the joint distribution.
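The two-variable scheme on this slide can be sketched on a correlated Gaussian target (an illustrative choice), where both conditionals are 1-D Gaussians:

```python
import numpy as np

rng = np.random.default_rng(0)
rho, n_keep, burn_in = 0.8, 5000, 500   # target correlation, sample counts

zi, zj = 3.0, -3.0                      # arbitrary (poor) initialization
cond_std = np.sqrt(1 - rho ** 2)        # std of each Gaussian conditional
samples = []
for step in range(n_keep + burn_in):
    zi = rng.normal(rho * zj, cond_std)  # sample from p(z_i | z_j)
    zj = rng.normal(rho * zi, cond_std)  # sample from p(z_j | z_i)
    if step >= burn_in:                  # discard the burn-in period
        samples.append((zi, zj))

samples = np.array(samples)
# after burn-in these are (correlated) draws from the joint distribution
print(samples.mean(axis=0), np.corrcoef(samples.T)[0, 1])
```

The retained draws recover the target's mean (0, 0) and correlation ρ despite the poor starting point, which is exactly the burn-in behavior sketched on the slide.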
Examples from Text
showed four other topics from this solution). In each of these topics, the word PLAY is given relatively high probability related to the different senses of the word (playing music, theater play, playing games).
Topic 77            Topic 82               Topic 166
word      prob.     word         prob.     word      prob.
MUSIC     .090      LITERATURE   .031      PLAY      .136
DANCE     .034      POEM         .028      BALL      .129
SONG      .033      POETRY       .027      GAME      .065
PLAY      .030      POET         .020      PLAYING   .042
SING      .026      PLAYS        .019      HIT       .032
SINGING   .026      POEMS        .019      PLAYED    .031
BAND      .026      PLAY         .015      BASEBALL  .027
PLAYED    .023      LITERARY     .013      GAMES     .025
SANG      .022      WRITERS      .013      BAT       .019
SONGS     .021      DRAMA        .012      RUN       .019
DANCING   .020      WROTE        .012      THROW     .016
PIANO     .017      POETS        .011      BALLS     .015
PLAYING   .016      WRITER       .011      TENNIS    .011
RHYTHM    .015      SHAKESPEARE  .010      HOME      .010
ALBERT    .013      WRITTEN      .009      CATCH     .010
MUSICAL   .013      STAGE        .009      FIELD     .010
Figure 9. Three topics related to the word PLAY.
Document #29795
Bix beiderbecke, at age060 fifteen207, sat174 on the slope071 of a bluff055 overlooking027 the mississippi137 river137. He was listening077 to music077 coming009 from a passing043 riverboat. The music077 had already captured006 his heart157 as well as his ear119. It was jazz077. Bix beiderbecke had already had music077 lessons077. He showed002 promise134 on the piano077, and his parents035 hoped268 he might consider118 becoming a concert077 pianist077. But bix was interested268 in another kind050 of music077. He wanted268 to play077 the cornet. And he wanted268 to play077 jazz077...
Document #1883
There is a simple050 reason106 why there are so few periods078 of really great theater082 in our whole western046 world. Too many things300 have to come right at the very same time. The dramatists must have the right actors082, the actors082 must have the right playhouses, the playhouses must have the right audiences082. We must remember288 that plays082 exist143 to be performed077, not merely050 to be read254. ( even when you read254 a play082 to yourself, try288 to perform062 it, to put174 it on a stage078, as you go along.) as soon028 as a play082 has to be performed082, then some kind126 of theatrical082...
Document #21359
Jim296 has a game166 book254. Jim296 reads254 the book254. Jim296 sees081 a game166 for one. Jim296 plays166 the game166. Jim296 likes081 the game166 for one. The game166 book254 helps081 jim296. Don180 comes040 into the house038. Don180 and jim296 read254 the game166 book254. The boys020 see a game166 for two. The two boys020 play166 the game166. The boys020 play166 the game166 for two. The boys020 like the game166. Meg282 comes040 into the house282. Meg282 and don180 and jim296 read254 the book254. They see a game166 for three. Meg282 and don180 and jim296 play166 the game166. They play166...
Figure 10. Three TASA documents with the word play.
In a new context, having only observed a single word PLAY, there would be uncertainty over which of these topics could have generated this word. This uncertainty can be reduced by observing other less ambiguous words in context. The disambiguation process can be described by the process of iterative sampling as described in the previous section (Equation 4), where the assignment of each word token to a topic depends on the assignments of the other words in the context. In Figure 10, fragments of three documents are shown from TASA that use PLAY in
showed four other topics from this solution). In each of these topics, the word PLAY is given relatively high probability related to the different senses of the word (playing music, theater play, playing games).
word prob. word prob. word prob.MUSIC .090 LITERATURE .031 PLAY .136
DANCE .034 POEM .028 BALL .129SONG .033 POETRY .027 GAME .065PLAY .030 POET .020 PLAYING .042SING .026 PLAYS .019 HIT .032
SINGING .026 POEMS .019 PLAYED .031BAND .026 PLAY .015 BASEBALL .027
PLAYED .023 LITERARY .013 GAMES .025SANG .022 WRITERS .013 BAT .019
SONGS .021 DRAMA .012 RUN .019DANCING .020 WROTE .012 THROW .016
PIANO .017 POETS .011 BALLS .015PLAYING .016 WRITER .011 TENNIS .011RHYTHM .015 SHAKESPEARE .010 HOME .010ALBERT .013 WRITTEN .009 CATCH .010
MUSICAL .013 STAGE .009 FIELD .010
Topic 166Topic 82Topic 77
Figure 9. Three topics related to the word PLAY.
Document #29795
Bix beiderbecke, at age060 fifteen207, sat174 on the slope071 of a bluff055 overlooking027 the mississippi137 river137. He was listening077 to music077 coming009 from a passing043 riverboat. The music077 had already captured006 his heart157 as well as his ear119. It was jazz077. Bix beiderbecke had already had music077 lessons077. He showed002 promise134 on the piano077, and his parents035 hoped268 he might consider118 becoming a concert077 pianist077. But bix was interested268 in another kind050 of music077. He wanted268 to play077 the cornet. And he wanted268 to play077 jazz077...
Document #1883
There is a simple050 reason106 why there are so few periods078 of really great theater082 in our whole western046 world. Too many things300 have to come right at the very same time. The dramatists must have the right actors082, the actors082 must have the right playhouses, the playhouses must have the right audiences082. We must remember288 that plays082 exist143 to be performed077, not merely050 to be read254. ( even when you read254 a play082 to yourself, try288 to perform062 it, to put174 it on a stage078, as you go along.) as soon028 as a play082 has to be performed082, then some kind126 of theatrical082...
Document #21359
Jim296 has a game166 book254. Jim296 reads254 the book254. Jim296 sees081 a game166 for one. Jim296 plays166 the game166. Jim296 likes081 the game166 for one. The game166 book254 helps081 jim296. Don180 comes040 into the house038. Don180 and jim296 read254 the game166 book254. The boys020 see a game166 for two. The two boys020 play166 the game166. The boys020 play166 the game166 for two. The boys020 like the game166. Meg282 comes040 into the house282. Meg282 and don180 and jim296 read254 the book254. They see a game166 for three. Meg282 and don180 and jim296 play166 the game166. They play166...
Figure 10. Three TASA documents with the word play.
In a new context, having only observed a single word PLAY, there would be uncertainty over which of these topics could have generated this word. This uncertainty can be reduced by observing other less ambiguous words in context. The disambiguation process can be described by the process of iterative sampling as described in the previous section (Equation 4), where the assignment of each word token to a topic depends on the assignments of the other words in the context. In Figure 10, fragments of three documents are shown from TASA that use PLAY in
11
showed four other topics from this solution). In each of these topics, the word PLAY is given relatively high probability related to the different senses of the word (playing music, theater play, playing games).
word prob. word prob. word prob.MUSIC .090 LITERATURE .031 PLAY .136
DANCE .034 POEM .028 BALL .129SONG .033 POETRY .027 GAME .065PLAY .030 POET .020 PLAYING .042SING .026 PLAYS .019 HIT .032
SINGING .026 POEMS .019 PLAYED .031BAND .026 PLAY .015 BASEBALL .027
PLAYED .023 LITERARY .013 GAMES .025SANG .022 WRITERS .013 BAT .019
SONGS .021 DRAMA .012 RUN .019DANCING .020 WROTE .012 THROW .016
PIANO .017 POETS .011 BALLS .015PLAYING .016 WRITER .011 TENNIS .011RHYTHM .015 SHAKESPEARE .010 HOME .010ALBERT .013 WRITTEN .009 CATCH .010
MUSICAL .013 STAGE .009 FIELD .010
Topic 166Topic 82Topic 77
Figure 9. Three topics related to the word PLAY.
Document #29795
Bix beiderbecke, at age060 fifteen207, sat174 on the slope071 of a bluff055 overlooking027 the mississippi137 river137. He was listening077 to music077 coming009 from a passing043 riverboat. The music077 had already captured006 his heart157 as well as his ear119. It was jazz077. Bix beiderbecke had already had music077 lessons077. He showed002 promise134 on the piano077, and his parents035 hoped268 he might consider118 becoming a concert077 pianist077. But bix was interested268 in another kind050 of music077. He wanted268 to play077 the cornet. And he wanted268 to play077 jazz077...
Document #1883
There is a simple050 reason106 why there are so few periods078 of really great theater082 in our whole western046 world. Too many things300 have to come right at the very same time. The dramatists must have the right actors082, the actors082 must have the right playhouses, the playhouses must have the right audiences082. We must remember288 that plays082 exist143 to be performed077, not merely050 to be read254. ( even when you read254 a play082 to yourself, try288 to perform062 it, to put174 it on a stage078, as you go along.) as soon028 as a play082 has to be performed082, then some kind126 of theatrical082...
Document #21359
Jim296 has a game166 book254. Jim296 reads254 the book254. Jim296 sees081 a game166 for one. Jim296 plays166 the game166. Jim296 likes081 the game166 for one. The game166 book254 helps081 jim296. Don180 comes040 into the house038. Don180 and jim296 read254 the game166 book254. The boys020 see a game166 for two. The two boys020 play166 the game166. The boys020 play166 the game166 for two. The boys020 like the game166. Meg282 comes040 into the house282. Meg282 and don180 and jim296 read254 the book254. They see a game166 for three. Meg282 and don180 and jim296 play166 the game166. They play166...
Figure 10. Three TASA documents with the word play.
In a new context, having only observed a single word PLAY, there would be uncertainty over which of these topics could have generated this word. This uncertainty can be reduced by observing other less ambiguous words in context. The disambiguation process can be described by the process of iterative sampling as described in the previous section (Equation 4), where the assignment of each word token to a topic depends on the assignments of the other words in the context. In Figure 10, fragments of three documents are shown from TASA that use PLAY in
11
showed four other topics from this solution). In each of these topics, the word PLAY is given relatively high probability related to the different senses of the word (playing music, theater play, playing games).
word prob. word prob. word prob.MUSIC .090 LITERATURE .031 PLAY .136
DANCE .034 POEM .028 BALL .129SONG .033 POETRY .027 GAME .065PLAY .030 POET .020 PLAYING .042SING .026 PLAYS .019 HIT .032
SINGING .026 POEMS .019 PLAYED .031BAND .026 PLAY .015 BASEBALL .027
PLAYED .023 LITERARY .013 GAMES .025SANG .022 WRITERS .013 BAT .019
SONGS .021 DRAMA .012 RUN .019DANCING .020 WROTE .012 THROW .016
PIANO .017 POETS .011 BALLS .015PLAYING .016 WRITER .011 TENNIS .011RHYTHM .015 SHAKESPEARE .010 HOME .010ALBERT .013 WRITTEN .009 CATCH .010
MUSICAL .013 STAGE .009 FIELD .010
Topic 166Topic 82Topic 77
Figure 9. Three topics related to the word PLAY.
Document #29795
Bix beiderbecke, at age060 fifteen207, sat174 on the slope071 of a bluff055 overlooking027 the mississippi137 river137. He was listening077 to music077 coming009 from a passing043 riverboat. The music077 had already captured006 his heart157 as well as his ear119. It was jazz077. Bix beiderbecke had already had music077 lessons077. He showed002 promise134 on the piano077, and his parents035 hoped268 he might consider118 becoming a concert077 pianist077. But bix was interested268 in another kind050 of music077. He wanted268 to play077 the cornet. And he wanted268 to play077 jazz077...
Document #1883
There is a simple050 reason106 why there are so few periods078 of really great theater082 in our whole western046 world. Too many things300 have to come right at the very same time. The dramatists must have the right actors082, the actors082 must have the right playhouses, the playhouses must have the right audiences082. We must remember288 that plays082 exist143 to be performed077, not merely050 to be read254. ( even when you read254 a play082 to yourself, try288 to perform062 it, to put174 it on a stage078, as you go along.) as soon028 as a play082 has to be performed082, then some kind126 of theatrical082...
Document #21359
Jim296 has a game166 book254. Jim296 reads254 the book254. Jim296 sees081 a game166 for one. Jim296 plays166 the game166. Jim296 likes081 the game166 for one. The game166 book254 helps081 jim296. Don180 comes040 into the house038. Don180 and jim296 read254 the game166 book254. The boys020 see a game166 for two. The two boys020 play166 the game166. The boys020 play166 the game166 for two. The boys020 like the game166. Meg282 comes040 into the house282. Meg282 and don180 and jim296 read254 the book254. They see a game166 for three. Meg282 and don180 and jim296 play166 the game166. They play166...
Figure 10. Three TASA documents with the word play.
In a new context, having only observed a single word PLAY, there would be uncertainty over which of these topics could have generated this word. This uncertainty can be reduced by observing other less ambiguous words in context. The disambiguation process can be described by the process of iterative sampling as described in the previous section (Equation 4), where the assignment of each word token to a topic depends on the assignments of the other words in the context. In Figure 10, fragments of three documents are shown from TASA that use PLAY in
Figure from [Griffiths]
High Level Computer Vision - July 15, 2014
Discovering Objects and Their Location in Images
J. Sivic, B. C. Russell, A. A. Efros, A. Zisserman, W. T. Freeman.
Visual analogy
• document ↔ image
• word ↔ visual word
• topic ↔ object
System overview
Input image → compute visual words → discover visual topics
Bag of words
• The LDA model assumes exchangeability: the order of words does not matter
• Stack the visual-word histograms as columns of a matrix; this throws away all spatial information
[Figure: interest regions → visual words → histogram → dictionary; each detected region is assigned the index of its nearest dictionary entry, and the indices are accumulated into a histogram]
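The bag-of-words construction above can be sketched as follows. The descriptors here are random stand-ins for real SIFT-like region descriptors, and all names are illustrative, not the paper's code.

```python
import numpy as np
from sklearn.cluster import KMeans

def bow_matrix(descriptor_sets, vocab_size=10, seed=0):
    """Quantize local region descriptors against a learned visual
    dictionary and stack the per-image word histograms as the columns
    of a (vocab_size x n_images) matrix; spatial layout is discarded."""
    all_desc = np.vstack(descriptor_sets)
    km = KMeans(n_clusters=vocab_size, n_init=5, random_state=seed).fit(all_desc)
    cols = []
    for desc in descriptor_sets:
        words = km.predict(desc)                        # region -> visual word id
        cols.append(np.bincount(words, minlength=vocab_size))
    return np.stack(cols, axis=1)

# toy stand-ins for SIFT-like descriptors: 4 images, 30 regions each, 128-D
rng = np.random.default_rng(0)
images = [rng.normal(size=(30, 128)) for _ in range(4)]
M = bow_matrix(images)
```

Each column of M is one image's word histogram, which is exactly the exchangeable representation the topic models below operate on.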
Low-rank matrix factorization
• Latent Semantic Analysis (Deerwester, et al. 1990) • Probabilistic Latent Semantic Analysis (Hofmann 2001)
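The rank-reduction step at the heart of LSA can be sketched with a plain truncated SVD; this is a generic illustration, not the cited systems' code.

```python
import numpy as np

def lsa_approx(M, k):
    """Rank-k approximation of a word-by-document matrix via truncated SVD,
    the core operation of Latent Semantic Analysis."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U[:, :k] * s[:k]) @ Vt[:k]

# a rank-1 co-occurrence pattern is recovered exactly with k = 1
M = np.outer([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
M1 = lsa_approx(M, 1)
```

pLSA replaces this linear-algebraic factorization with a probabilistic one, factoring the word-document co-occurrence matrix into non-negative topic distributions.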
Apply to Caltech 4 + background images
Faces: 435
Motorbikes: 800
Airplanes: 800
Cars (rear): 1155
Background: 900
Total: 4090
Most likely words given topic
[Figure: the two most likely visual words, shown as image patches, for Topic 1 and Topic 2]
Most likely words given topic
[Figure: the two most likely visual words, shown as image patches, for Topic 3 and Topic 4]
Polysemy
Regions that map to the same visual word:
In English, "bank" refers to:
1. an institution that handles money
2. the side of a river
Image clustering
Average confusion:
Confusion matrices:
[four confusion matrices over the classes faces, motorbikes, cars (rear), airplanes, background]
Image as a mixture of topics (objects)
Discussion
• Discovered visual topics corresponding to object categories from a corpus of unlabeled images
• Used visual words representation and topic discovery models from the text understanding community
• Classification on unseen images is comparable to supervised methods on Caltech 5 dataset
• The discovered categories can be localized within an image
Learning Object Categories from Contaminated Data
Rob Fergus, Li Fei-Fei, Pietro Perona, Andrew Zisserman
Google’s variable search performance
Web Image Query
• Use Google's automatic translation tool to translate the keyword
• Languages used: German, French, Italian, Spanish, Chinese, English
• Take the first 5 images returned for each translated keyword to form a validation set
Overall learning scheme
• Keyword → collect images from Google → find regions → VQ regions → learn 8-topic pLSA model → propose bounding boxes → learn 8-topic TSI-pLSA model
• Keyword → translate keyword → collect validation set
• Pick the best topic using the validation set → final classifier
Motorbike – pLSA
Motorbike – TSI-pLSA
Car Rear – TSI-pLSA
Decomposition, Discovery, and Detection of Visual Categories Using Topic Models
Mario Fritz Bernt Schiele
Generative Decomposition of Visual Categories for Unsupervised Learning and Supervised Detection
• Learning of generative decompositions:
‣ learn an intermediate representation instead of doing object discovery
‣ learnt components range from local to global
• Unsupervised Google re-ranking task
• Supervised detection by combining generative decompositions with a discriminative classifier
• Quantitative evaluation:
‣ faster learning curve
‣ comparison to the state of the art on detection
‣ comparison to hard-coded special-purpose features
Object Representation
• Dense object representation:
‣ oriented-gradient-histogram representation ([Dalal&Triggs'05], [Bissacco'06])
‣ spatial grid of local histograms
‣ 9 possible edge orientations per local histogram
‣ e.g. a 16 x 10 grid x 9 orientations
• Goal:
‣ learn for each object a decomposition into sub-structures
‣ learn for all objects a set of shared sub-structures
Object Representation
Each oriented-gradient histogram h is decomposed into a mixture of T topic distributions \phi_j with mixing weights \theta_j:

h \;\approx\; \sum_{j=1}^{T} \theta_j\,\phi_j \;=\; \theta_1\phi_1 + \theta_2\phi_2 + \theta_3\phi_3 + \theta_5\phi_5 + \dots

sparsity prior Dirichlet(\alpha) on the mixing weights \theta
sparsity prior Dirichlet(\beta) on the topic distributions \phi
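Numerically, this decomposition with Dirichlet sparsity priors can be sketched as below. The number of topics and the hyperparameter values are assumptions for illustration; the grid layout matches the 16 x 10 x 9 example above.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 4                     # number of topics (assumed for illustration)
V = 16 * 10 * 9           # histogram bins: 16 x 10 grid cells x 9 orientations
# sparse topic distributions phi_j, one per topic (Dirichlet(beta) prior)
phi = rng.dirichlet(np.full(V, 0.05), size=T)
# sparse mixing weights theta (Dirichlet(alpha) prior)
theta = rng.dirichlet(np.full(T, 0.5))
# the object's gradient histogram as a mixture of topic distributions
h = theta @ phi
```

Because theta and every row of phi sum to one, the reconstructed histogram h is itself a valid distribution over the 1440 bins.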
Learning of Generative Decompositions
• Probabilistic topic model:
‣ a document contains a mixture of topics
‣ each topic generates a mixture of words
‣ (Latent Dirichlet Allocation model [Blei'03], estimated by Gibbs sampling [Griffiths'04])
• Dense object representation:
‣ document = image of an object
‣ word = localized edge orientation
Unsupervised Learning of Topics/Feature Representation (Pascal’06 dataset with 10 classes)
Visualizing Neighborhood in Latent (Topic) Space by k-means clustering
Unsupervised Learning Task: Results on Google-Reranking Task
• Task: improve the precision of the first 3 pages returned by Google image search
• Topics trained per class:
‣ cluster images in the latent topic space
‣ re-rank using the largest topic clusters
• Quantitative results:
‣ precision at 15% recall
‣ better than the local-features approach of [Fergus05]
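The cluster-and-rerank step can be sketched as below. Function and variable names are illustrative, and the paper's exact procedure may differ; the idea is simply that images whose topic-distribution vectors fall in the largest clusters are moved to the front.

```python
import numpy as np
from sklearn.cluster import KMeans

def rerank_by_largest_cluster(theta, n_clusters=2, seed=0):
    """Cluster the images' topic-distribution vectors with k-means and
    move members of larger clusters to the front of the ranking."""
    labels = KMeans(n_clusters=n_clusters, n_init=5,
                    random_state=seed).fit_predict(theta)
    sizes = np.bincount(labels, minlength=n_clusters)
    # stable sort: big clusters first, original order kept inside a cluster
    return np.argsort(-sizes[labels], kind="stable")

# toy topic vectors: 6 images dominated by topic 0, 3 by topic 1
theta = np.vstack([np.tile([0.9, 0.05, 0.05], (6, 1)),
                   np.tile([0.05, 0.9, 0.05], (3, 1))])
order = rerank_by_largest_cluster(theta)
```

On contaminated search results, the largest cluster tends to correspond to the dominant (hopefully on-topic) visual category, which is why this improves precision on the first pages.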
Extension to Supervised Detection
• Generative/discriminative training:
‣ generative model:
- the topic model decomposes objects into a mixture of topics
- objects are represented via their topic-distribution vectors
‣ discriminative training:
- SVM classifier (RBF kernel / chi-square kernel) on the topic-distribution vectors
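The discriminative stage can be sketched with a chi-square kernel SVM on topic-distribution vectors. The data here are toy Dirichlet samples, and sklearn's `chi2_kernel` stands in for whichever implementation the authors used.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.metrics.pairwise import chi2_kernel

# toy topic-distribution vectors: positives peak on topic 0, negatives on topic 2
rng = np.random.default_rng(0)
pos = rng.dirichlet([5, 1, 1], size=20)
neg = rng.dirichlet([1, 1, 5], size=20)
X = np.vstack([pos, neg])
y = np.array([1] * 20 + [0] * 20)

# SVC accepts a callable kernel; chi-square suits non-negative histogram-like inputs
clf = SVC(kernel=chi2_kernel).fit(X, y)
```

The chi-square kernel is a common choice for histogram features because it compares distributions bin by bin rather than treating them as points in a Euclidean space.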
Improved Learning Curve Experiment on UIUC cars
• SVM training using the intermediate topic representation:
‣ maximal performance reached with 150 training samples
• SVM trained directly on oriented gradient histograms:
‣ maximal performance reached with 250 training samples
‣ overall performance lower
Pascal 2006 Visual Object Class Challenge (10 classes)
Pascal 2006 Visual Object Class Challenge (10 classes)
• Outperforms best approaches on ‣ cars (+ 5.75% AP)
‣ bus (+ 9.14% AP)
‣ bicycle (+ 5.67% AP)
• Among the best ‣ motorbikes
• About average for articulated objects ‣ cat
‣ cow
‣ dog
‣ horse
‣ sheep
‣ person
slide credits
• Josef Sivic • Trevor Darrell • Rob Fergus • Fei-Fei Li