(Infinitely) Deep Learning in Vision
Max Welling (UCI)
collaborators:
Ian Porteous (UCI), Evgeniy Bart (UCI/Caltech),
Pietro Perona (Caltech)
Outline
• Nonparametric Bayesian Taxonomy models for object categorization
• Hierarchical representations from networks of HDPs
Motivation
• Building systems that learn for a lifetime, from “construction to destruction”
• E.g. unsupervised learning of object category taxonomies (with E. Bart, I. Porteous and P. Perona)
• Hierarchical models can help to:
– act as a prior to transfer information to new categories
– speed up recognition
– classify at the appropriate level of abstraction (Fido → dog → mammal)
– define a similarity measure (kernel)
• The nonparametric Bayesian framework allows models to grow their complexity without bound as the dataset grows
Nonparametric Model for Visual Taxonomy
[Figure: graphical model of the visual taxonomy. Each image/scene follows a path c through the taxonomy tree; each node mixes topics (weights e.g. 0.7, 0.26, 0.04); each topic k has a word distribution over visual words obtained by detection.]
Prior over trees is the nested CRP (Blei et al. ’04): a path is more popular if it has been traveled a lot (see the sketch below).
300 images from Corel database.
(experiments and figures by E. Bart)
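To make the nested-CRP prior concrete, here is a minimal Python sketch (my own illustration, not the authors' code): at each node an image follows an existing branch with probability proportional to how often it has been traveled, or opens a new branch with probability proportional to a concentration parameter gamma.

```python
import random
from collections import defaultdict

def sample_ncrp_path(counts, depth, gamma=1.0):
    """Sample one root-to-leaf path; counts[node][child] = prior traversals."""
    node, path = (), []
    for _ in range(depth):
        children = counts[node]
        r = random.uniform(0, sum(children.values()) + gamma)
        for child, n in children.items():
            r -= n
            if r <= 0:
                break
        else:
            child = len(children)   # r fell in the gamma mass: open a new branch
        children[child] += 1        # the chosen path just got more popular
        node += (child,)
        path.append(node)
    return path

counts = defaultdict(lambda: defaultdict(int))
paths = [sample_ncrp_path(counts, depth=3) for _ in range(300)]
```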
Taxonomy of Quilts
Beyond Trees?
• Deep belief nets are more powerful alternatives to taxonomies (in a modeling sense).
• Nodes in the hierarchy represent overlapping and increasingly abstract categories
• More sharing of statistical strength
• Proposal: stack LDA models
LDA (Blei, Ng, Jordan ’02)
X_ij = w : token i in doc j was assigned to word type w (observed).
Z_ij = k : token i in image j was assigned to topic k (hidden).
θ_j[Z_ij] : image-specific distribution over topics.
φ_z[X_ij] : topic-specific distribution over words.
[Diagram: graphical model θ_j → Z_ij → X_ij ← φ_z.]
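As a reference point for the stacked model that follows, a minimal sketch of LDA's generative process (my own illustration, assuming the θ_j and φ_z defined above):

```python
import numpy as np

def generate_lda(n_images=10, n_topics=5, vocab=100, tokens=50,
                 alpha=0.5, beta=0.1, rng=np.random.default_rng(0)):
    phi = rng.dirichlet(beta * np.ones(vocab), size=n_topics)       # phi_z[x]
    X, Z = [], []
    for j in range(n_images):
        theta_j = rng.dirichlet(alpha * np.ones(n_topics))          # theta_j[z]
        z_j = rng.choice(n_topics, size=tokens, p=theta_j)          # Z_ij ~ theta_j
        x_j = np.array([rng.choice(vocab, p=phi[z]) for z in z_j])  # X_ij ~ phi_z
        X.append(x_j); Z.append(z_j)
    return X, Z, phi
```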
Stage-wise LDA
[Diagram: two stacked LDA modules. Layer 1: θ^1_j[z^1_ij] → Z^1_ij and φ^{1,0}_{z^1_ij}[x_ij] → X_ij. Layer 2: θ^2_j[z^2_ij] → Z^2_ij and φ^{2,1}_{z^2_ij}[Z^1_ij] → Z^1_ij.]
• Use Z^1 as pseudo-data for the next layer.
• After the second LDA model is fit, we have 2 distributions over Z^1.
• We combine these distributions by taking their mixture:
P[Z^1] = λ θ^1_j[Z^1] + (1 − λ) φ^{2,1}_{z^2_ij}[Z^1]
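A small sketch of this combination step (hedged: λ and the function name are my own; the slide only specifies a mixture of the two distributions over Z^1):

```python
import numpy as np

def mix_over_z1(theta1_j, phi21_z2, lam=0.5):
    """Convex mixture of the two distributions over Z^1."""
    return lam * theta1_j + (1.0 - lam) * phi21_z2

theta1_j = np.array([0.6, 0.3, 0.1])    # layer-1 distribution over Z^1 for image j
phi21_z2 = np.array([0.2, 0.2, 0.6])    # layer-2 topic z^2's distribution over Z^1
p_z1 = mix_over_z1(theta1_j, phi21_z2)  # combined distribution over Z^1
```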
Special Words Layer
[Diagram: the two-layer stack as before, with an additional image-specific word distribution ψ^0_j[x_ij] feeding X_ij alongside φ^{1,0}_{z^1_ij}[x_ij].]
• At the bottom layer we have an image-specific distribution over words.
• It filters out image idiosyncrasies which are not modeled well by topics.
Special words topic model (Chemudugunta, Steyvers, Smyth ’06)
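A tiny sketch of the special-words idea (my illustration with assumed names, not the paper's code): per token, a switch chooses the image-specific word distribution ψ^0_j (idiosyncrasies) or the usual topic route θ_j → φ_z.

```python
import numpy as np

def sample_word(psi0_j, theta_j, phi, lam0, rng=np.random.default_rng()):
    # Special-words route: draw the word from the image's own distribution.
    if rng.random() < lam0:
        return rng.choice(len(psi0_j), p=psi0_j)   # X_ij ~ psi^0_j
    # Topic route: pick a topic, then a word from that topic.
    z = rng.choice(len(theta_j), p=theta_j)        # Z_ij ~ theta_j
    return rng.choice(phi.shape[1], p=phi[z])      # X_ij ~ phi_z
```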
[Diagram: a stack of L LDA layers. Level ℓ has assignments Z^ℓ_ij, an image-specific distribution θ^ℓ_j[Z^ℓ_ij], and an emission φ^{ℓ,ℓ−1}_{z^ℓ_ij}[Z^{ℓ−1}_ij] to the level below; at the bottom, φ^{1,0}_{z^1_ij}[x_ij] and the image-specific ψ^0_j[x_ij] generate X_ij. Annotations: “Last layer that has any data assigned to it.” and “A switching variable has picked this level – all layers above are disconnected.”]
Model
At every level a switching variable picks either the image-specific θ^ℓ_j or the emission φ^{ℓ+1,ℓ} from the level above. The lowest level at which θ was picked disconnects the upstream variables (see the sketch below).
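A sketch of this level-switching generative story (my reconstruction under assumed names, not the authors' code): walk up from the bottom; the lowest level whose switch fires samples from its θ, and the draw is then propagated down through the φ emissions to the observed word.

```python
import numpy as np

def sample_token(theta, phi, lam, rng=np.random.default_rng()):
    """theta[l]: image-specific distribution at level l (theta[0] = psi^0_j over words).
    phi[l]: emission matrix phi^{l,l-1}[z_upper, z_lower], for l = 1..L.
    lam[l]: probability the switch picks theta at level l (all assumed names)."""
    L = len(theta) - 1
    # Lowest level at which the switch picked theta (truncated at L).
    level = next((l for l in range(L + 1) if rng.random() < lam[l]), L)
    z = rng.choice(len(theta[level]), p=theta[level])   # Z^level ~ theta_j^level
    for l in range(level, 0, -1):                       # propagate down via phi
        z = rng.choice(phi[l].shape[1], p=phi[l][z])    # Z^{l-1} ~ phi^{l,l-1}_z
    return z                                            # observed word X_ij
```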
Collapsed Gibbs Sampling
• Given X, perform an upward pass to compute posterior probabilities for each level. Sample a level.
• From that level, sample all downstream Z-variables. (ignore upstream Z-variables)
• Marginalize out the parameters θ, φ (see the schematic below).
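A schematic of one sampling step as described above (structure only; the helper names and the count-based collapsed predictive scores are my own placeholders, not the talk's code):

```python
import numpy as np

def resample_token(x, levels, rng=np.random.default_rng()):
    """levels[l] is assumed to expose collapsed predictive scores after the
    token's current assignments have been removed (count bookkeeping elided)."""
    # Upward pass: unnormalized posterior that the switch stopped at level l.
    scores = np.array([lvl.stop_score(x) for lvl in levels])   # assumed helper
    post = scores / scores.sum()
    l = rng.choice(len(levels), p=post)             # sample a level
    # Downward pass: resample Z at the chosen level and all levels below;
    # Z-variables above level l are disconnected and ignored.
    z = levels[l].sample_top(x, rng)                # assumed helper
    for lvl in reversed(levels[:l]):
        z = lvl.sample_given_parent(z, rng)         # assumed helper
    return l, z
```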
The Digits ...
All experiments done by I. Porteous (and finished 2 hours ago).
(I deeply believe in)
[Figure: learned distributions. ψ^0_j[x_ij]: this level filters out image idiosyncrasies; no information from this level is “transferred” to test data. φ^{1,0}_{z^1}[x]: level 1 topic distributions. φ^{2,1}_{z^2}[z^1]: level 2 topic distributions.]
Assignment to Levels
[Figure: brightness = average level assignment.]
Properties
• Properties which are more specific to an image/document are explained at lower levels of the hierarchy. They act as data filters for the higher layers.
• Higher levels become increasingly abstract, with larger “receptive fields” and higher variance (complex cell property). Limitation?
• Higher levels therefore “own” less data. Hence higher levels have greater plasticity.
• The more data, the more levels become populated.
• We infer the number of layers.
• By marginalizing out the parameters θ, φ, all variables Z become coupled.
Conclusion
• Nonparametric Bayesian models are good candidates for “lifelong learning”
– need to improve computational efficiency & memory requirements
• Algorithm for growing object taxonomies as a function of observed data
• Proposal for a deep belief net based on stacking LDA modules
– more flexible representation & more sharing of statistical strength than a taxonomy
• Infinite Extension:
– LDA → HDP
– mixture over levels → Dirichlet process
– number of hidden variables per layer and number of layers inferred
• demo?