(Infinitely) Deep Learning in Vision
Max Welling (UCI)
collaborators:
Ian Porteous (UCI), Evgeniy Bart (UCI/Caltech),
Pietro Perona (Caltech)
Outline
• Nonparametric Bayesian Taxonomy models for object categorization
• Hierarchical representations from networks of HDPs
Motivation
• Building systems that learn for a lifetime, from “construction to destruction”
• E.g. unsupervised learning of object category taxonomies (with E. Bart, I. Porteous and P. Perona)
• Hierarchical models can help to:
– act as a prior to transfer information to new categories
– speed up recognition
– classify at the appropriate level of abstraction (Fido → dog → mammal)
– define a similarity measure (kernel)
• The nonparametric Bayesian framework allows models to grow their complexity without bound as the dataset grows
Nonparametric Model for Visual Taxonomy
[Figure: graphical model of the visual taxonomy. Each image/scene follows a path c through the taxonomy tree; each node mixes topics (weights e.g. 0.7, 0.26, 0.04); each topic k has a word distribution over visual words obtained by detection.]
Prior over trees is the nested CRP (Blei et al. ’04): a path is more popular if it has been traveled a lot (see the sketch below).
300 images from Corel database.
(experiments and figures by E. Bart)
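To make the nested-CRP prior concrete, here is a minimal Python sketch (my own illustration, not the authors' code): at each node an image follows an existing branch with probability proportional to how often it has been traveled, or opens a new branch with probability proportional to a concentration parameter gamma.

```python
import random
from collections import defaultdict

def sample_ncrp_path(counts, depth, gamma=1.0):
    """Sample one root-to-leaf path; counts[node][child] = prior traversals."""
    node, path = (), []
    for _ in range(depth):
        children = counts[node]
        r = random.uniform(0, sum(children.values()) + gamma)
        for child, n in children.items():
            r -= n
            if r <= 0:
                break
        else:
            child = len(children)   # r fell in the gamma mass: open a new branch
        children[child] += 1        # the chosen path just got more popular
        node += (child,)
        path.append(node)
    return path

counts = defaultdict(lambda: defaultdict(int))
paths = [sample_ncrp_path(counts, depth=3) for _ in range(300)]
```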
Taxonomy of Quilts
Beyond Trees?
• Deep belief nets are more powerful alternatives to taxonomies (in a modeling sense).
• Nodes in the hierarchy represent overlapping and increasingly abstract categories
• More sharing of statistical strength
• Proposal: stack LDA models
LDA (Blei, Ng, Jordan ’02)
X_ij = w : token i in doc j was assigned to word type w (observed).
Z_ij = k : token i in image j was assigned to topic k (hidden).
θ_j[Z_ij] : image-specific distribution over topics.
φ_z[X_ij] : topic-specific distribution over words.
[Diagram: graphical model θ_j → Z_ij → X_ij ← φ_z.]
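As a reference point for the stacked model that follows, a minimal sketch of LDA's generative process (my own illustration, assuming the θ_j and φ_z defined above):

```python
import numpy as np

def generate_lda(n_images=10, n_topics=5, vocab=100, tokens=50,
                 alpha=0.5, beta=0.1, rng=np.random.default_rng(0)):
    phi = rng.dirichlet(beta * np.ones(vocab), size=n_topics)       # phi_z[x]
    X, Z = [], []
    for j in range(n_images):
        theta_j = rng.dirichlet(alpha * np.ones(n_topics))          # theta_j[z]
        z_j = rng.choice(n_topics, size=tokens, p=theta_j)          # Z_ij ~ theta_j
        x_j = np.array([rng.choice(vocab, p=phi[z]) for z in z_j])  # X_ij ~ phi_z
        X.append(x_j); Z.append(z_j)
    return X, Z, phi
```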
Stage-wise LDA
[Diagram: two stacked LDA modules. Layer 1: θ^1_j[z^1_ij] → Z^1_ij and φ^{1,0}_{z^1_ij}[x_ij] → X_ij. Layer 2: θ^2_j[z^2_ij] → Z^2_ij and φ^{2,1}_{z^2_ij}[Z^1_ij] → Z^1_ij.]
• Use Z^1 as pseudo-data for the next layer.
• After the second LDA model is fit, we have 2 distributions over Z^1.
• We combine these distributions by taking their mixture:
P[Z^1] = λ θ^1_j[Z^1] + (1 − λ) φ^{2,1}_{z^2_ij}[Z^1]
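A small sketch of this combination step (hedged: λ and the function name are my own; the slide only specifies a mixture of the two distributions over Z^1):

```python
import numpy as np

def mix_over_z1(theta1_j, phi21_z2, lam=0.5):
    """Convex mixture of the two distributions over Z^1."""
    return lam * theta1_j + (1.0 - lam) * phi21_z2

theta1_j = np.array([0.6, 0.3, 0.1])    # layer-1 distribution over Z^1 for image j
phi21_z2 = np.array([0.2, 0.2, 0.6])    # layer-2 topic z^2's distribution over Z^1
p_z1 = mix_over_z1(theta1_j, phi21_z2)  # combined distribution over Z^1
```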
Special Words Layer
[Diagram: the two-layer stack as before, with an additional image-specific word distribution ψ^0_j[x_ij] feeding X_ij alongside φ^{1,0}_{z^1_ij}[x_ij].]
• At the bottom layer we have an image-specific distribution over words.
• It filters out image idiosyncrasies which are not modeled well by topics.
Special words topic model (Chemudugunta, Steyvers, Smyth ’06)
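A tiny sketch of the special-words idea (my illustration with assumed names, not the paper's code): per token, a switch chooses the image-specific word distribution ψ^0_j (idiosyncrasies) or the usual topic route θ_j → φ_z.

```python
import numpy as np

def sample_word(psi0_j, theta_j, phi, lam0, rng=np.random.default_rng()):
    # Special-words route: draw the word from the image's own distribution.
    if rng.random() < lam0:
        return rng.choice(len(psi0_j), p=psi0_j)   # X_ij ~ psi^0_j
    # Topic route: pick a topic, then a word from that topic.
    z = rng.choice(len(theta_j), p=theta_j)        # Z_ij ~ theta_j
    return rng.choice(phi.shape[1], p=phi[z])      # X_ij ~ phi_z
```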
[Diagram: a stack of L LDA layers. Level ℓ has assignments Z^ℓ_ij, an image-specific distribution θ^ℓ_j[Z^ℓ_ij], and an emission φ^{ℓ,ℓ−1}_{z^ℓ_ij}[Z^{ℓ−1}_ij] to the level below; at the bottom, φ^{1,0}_{z^1_ij}[x_ij] and the image-specific ψ^0_j[x_ij] generate X_ij. Annotations: “Last layer that has any data assigned to it.” and “A switching variable has picked this level – all layers above are disconnected.”]
Model
At every level a switching variable picks either the image-specific θ^ℓ_j or the emission φ^{ℓ+1,ℓ} from the level above. The lowest level at which θ was picked disconnects the upstream variables (see the sketch below).
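A sketch of this level-switching generative story (my reconstruction under assumed names, not the authors' code): walk up from the bottom; the lowest level whose switch fires samples from its θ, and the draw is then propagated down through the φ emissions to the observed word.

```python
import numpy as np

def sample_token(theta, phi, lam, rng=np.random.default_rng()):
    """theta[l]: image-specific distribution at level l (theta[0] = psi^0_j over words).
    phi[l]: emission matrix phi^{l,l-1}[z_upper, z_lower], for l = 1..L.
    lam[l]: probability the switch picks theta at level l (all assumed names)."""
    L = len(theta) - 1
    # Lowest level at which the switch picked theta (truncated at L).
    level = next((l for l in range(L + 1) if rng.random() < lam[l]), L)
    z = rng.choice(len(theta[level]), p=theta[level])   # Z^level ~ theta_j^level
    for l in range(level, 0, -1):                       # propagate down via phi
        z = rng.choice(phi[l].shape[1], p=phi[l][z])    # Z^{l-1} ~ phi^{l,l-1}_z
    return z                                            # observed word X_ij
```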
Collapsed Gibbs Sampling
• Given X, perform an upward pass to compute posterior probabilities for each level. Sample a level.
• From that level, sample all downstream Z-variables. (ignore upstream Z-variables)
• Marginalize out the parameters θ, φ (see the schematic below).
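A schematic of one sampling step as described above (structure only; the helper names and the count-based collapsed predictive scores are my own placeholders, not the talk's code):

```python
import numpy as np

def resample_token(x, levels, rng=np.random.default_rng()):
    """levels[l] is assumed to expose collapsed predictive scores after the
    token's current assignments have been removed (count bookkeeping elided)."""
    # Upward pass: unnormalized posterior that the switch stopped at level l.
    scores = np.array([lvl.stop_score(x) for lvl in levels])   # assumed helper
    post = scores / scores.sum()
    l = rng.choice(len(levels), p=post)             # sample a level
    # Downward pass: resample Z at the chosen level and all levels below;
    # Z-variables above level l are disconnected and ignored.
    z = levels[l].sample_top(x, rng)                # assumed helper
    for lvl in reversed(levels[:l]):
        z = lvl.sample_given_parent(z, rng)         # assumed helper
    return l, z
```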
The Digits ...
All experiments done by I. Porteous (and finished 2 hours ago).
(I deeply believe in)
[Figure: learned distributions. ψ^0_j[x_ij]: this level filters out image idiosyncrasies; no information from this level is “transferred” to test data. φ^{1,0}_{z^1}[x]: level 1 topic distributions. φ^{2,1}_{z^2}[z^1]: level 2 topic distributions.]
Assignment to Levels
[Figure: brightness = average level assignment.]
Properties
• Properties which are more specific to an image/document are explained at lower levels of the hierarchy. They act as data filters for the higher layers.
• Higher levels become increasingly abstract, with larger “receptive fields” and higher variance (complex cell property). Limitation?
• Higher levels therefore “own” less data. Hence higher levels have greater plasticity.
• The more data, the more levels become populated.
• We infer the number of layers.
• By marginalizing out the parameters θ, φ, all variables Z become coupled.
Conclusion
• Nonparametric Bayesian models are good candidates for “lifelong learning”
– need to improve computational efficiency & memory requirements
• Algorithm for growing object taxonomies as a function of observed data
• Proposal for a deep belief net based on stacking LDA modules
– more flexible representation & more sharing of statistical strength than a taxonomy
• Infinite Extension:
– LDA → HDP
– mixture over levels → Dirichlet process
– number of hidden variables per layer and number of layers inferred
• demo?