LOCUS (Learning Object Classes with Unsupervised Segmentation)
A variational approach to learning model-based segmentation.
John Winn, Microsoft Research Cambridge
with Nebojsa Jojic, MSR Redmond
7th July 2006
Overview
Learning object models
The LOCUS model
Experiments & results
Extensions to LOCUS
Goal
Long-term goal: recognise ~10,000 object classes.
Learning from ‘buckets’ of images
[Figure: a learning algorithm turns the bucket of images into a horse model, which can then be used for:]
•Object segmentation
•Object recognition
•Object detection
Object segmentation
[Figure: an image plus the horse model, given as input to LOCUS.]
Related work
Constellation models: weakly supervised; probabilistic framework; sparse; no segmentation.
•Object class recognition by unsupervised scale-invariant learning. R. Fergus, P. Perona, and A. Zisserman. CVPR 2003.
•A Bayesian approach to unsupervised one-shot learning of object categories. L. Fei-Fei, R. Fergus, and P. Perona. ICCV 2003.
Fragment-based: dense model; supervised; non-probabilistic; no global shape model.
•Learning to segment. E. Borenstein and S. Ullman. ECCV 2004.
•Combining top-down and bottom-up segmentation. E. Borenstein, E. Sharon, and S. Ullman. CVPR 2004.
Codebook-based: probabilistic; dense model; supervised; ad-hoc inference.
•Combined object categorization and segmentation with an implicit shape model. B. Leibe, A. Leonardis, and B. Schiele. ECCV 2004.
OBJ CUT: probabilistic; dense model; supervised; requires video.
LOCUS overview
Weakly supervised learning: buckets of images, no annotation required.
Probabilistic generative model of both object and background.
Dense model: all pixels are modelled, not just those at interest points.
Combines global and local cues: models global shape as well as local appearance and edges.
Iterative inference process: simultaneous localisation, segmentation and pose estimation.
The LOCUS model
LOCUS model
[Figure: graphical model of LOCUS.]
Shared between images: class shape π, class edge sprite μo, σo.
Different for each image: deformation field D, position & size T, mask m, object appearance λ1, background appearance λ0, edge image e, image.
LOCUS model: appearance
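[Figure: the mask m selects, at each pixel, between the background and object appearance models that generate image z.]
Background mixture coefficients: λ0. Object mixture coefficients: λ1.
Both models share the same mixture components and differ only in their mixture coefficients.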
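The appearance slide can be made concrete with a small sketch. It assumes 1-D Gaussian intensity components (LOCUS may use colour or texture features instead) and ignores the MRF and shape prior, so each pixel's object probability comes from comparing the object mixture (weights λ1) with the background mixture (weights λ0) over the same shared components. The function names and numbers are hypothetical.

```python
import numpy as np

def mixture_likelihood(z, weights, means, stds):
    """Per-pixel likelihood sum_k weights[k] * N(z; means[k], stds[k]^2)."""
    z = np.asarray(z, dtype=float)[..., None]  # add a component axis for broadcasting
    gauss = np.exp(-0.5 * ((z - means) / stds) ** 2) / (np.sqrt(2 * np.pi) * stds)
    return (gauss * weights).sum(axis=-1)

def mask_posterior(z, lam0, lam1, means, stds, prior_obj=0.5):
    """p(m_i = 1 | z_i) assuming independent pixels (no MRF, no shape prior)."""
    lik0 = mixture_likelihood(z, lam0, means, stds)  # background: weights lambda0
    lik1 = mixture_likelihood(z, lam1, means, stds)  # object: weights lambda1
    num = prior_obj * lik1
    return num / (num + (1 - prior_obj) * lik0)

# Shared components: one dark and one bright intensity cluster (hypothetical values).
means = np.array([0.2, 0.8])
stds = np.array([0.1, 0.1])
lam0 = np.array([0.9, 0.1])  # background mostly uses the dark component
lam1 = np.array([0.1, 0.9])  # object mostly uses the bright component

image = np.array([[0.15, 0.85], [0.2, 0.75]])
post = mask_posterior(image, lam0, lam1, means, stds)
print(np.round(post, 3))
```

Because the components are shared, the mixture coefficients alone decide how strongly a pixel's colour favours object or background; here they cap the posterior near 0.1 or 0.9.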
LOCUS model: mask
[Figure: mask m separating background and object.]
The mask prior is an 8-neighbour Markov random field (as used in GrabCut), which favours segmentation boundaries along contrast edges.
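A contrast-sensitive 8-neighbour MRF of the GrabCut kind charges a cost γ/dist(i, j) · exp(−β‖z_i − z_j‖²) whenever neighbouring pixels i and j take different mask labels, with β = 1/(2⟨‖z_i − z_j‖²⟩) computed from the image. The sketch below computes these pairwise weights. The helper names and γ = 50 are assumptions; the slides do not give LOCUS's exact parameters. A graph-cut solver would take these weights as edge capacities.

```python
import numpy as np

# The four "forward" offsets visit every 8-neighbour pair exactly once.
OFFSETS = [(0, 1), (1, 0), (1, 1), (1, -1)]

def shifted_pairs(img, dy, dx):
    """Return views (a, b) such that b[p] is the (dy, dx)-neighbour of a[p]."""
    h, w = img.shape[:2]
    a = img[0:h - dy, max(0, -dx):w - max(0, dx)]
    b = img[dy:h, max(0, dx):w - max(0, -dx)]
    return a, b

def contrast_weights(img, gamma=50.0):
    """GrabCut-style pairwise weights for each neighbour offset."""
    img = np.asarray(img, dtype=float)
    if img.ndim == 2:
        img = img[..., None]  # treat grey-level images as single-channel
    diffs = {}
    for dy, dx in OFFSETS:
        a, b = shifted_pairs(img, dy, dx)
        diffs[(dy, dx)] = ((a - b) ** 2).sum(axis=-1)  # squared colour difference
    mean_sq = np.mean(np.concatenate([d.ravel() for d in diffs.values()]))
    beta = 1.0 / (2.0 * mean_sq) if mean_sq > 0 else 0.0
    weights = {}
    for (dy, dx), d in diffs.items():
        dist = np.hypot(dy, dx)  # 1 for axis neighbours, sqrt(2) for diagonals
        weights[(dy, dx)] = gamma / dist * np.exp(-beta * d)
    return weights

img = np.zeros((4, 4)); img[:, 2:] = 1.0  # vertical contrast edge between columns 1 and 2
w = contrast_weights(img)
print(w[(0, 1)][0])  # horizontal weights along row 0: cheap to cut only at the edge
```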
LOCUS model: shape/position
[Figure: the class shape π is mapped into each image by its own transformation T1, T2, T3, T4, …, TN.]
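The per-image transformation T (position and size) can be pictured as resampling the class shape π into the image frame, giving a prior on where the mask is likely to be "object". A minimal sketch, assuming T is an isotropic scale followed by a translation and using nearest-cell lookup; the parameterisation and function name are hypothetical.

```python
import numpy as np

def transform_shape(pi, scale, ty, tx, out_shape):
    """Map class shape pi into an image frame by scaling, then translating.

    Output pixel (y, x) takes pi at (floor((y - ty) / scale), floor((x - tx) / scale));
    pixels whose source falls outside pi get object probability 0.
    """
    h, w = out_shape
    ys, xs = np.mgrid[0:h, 0:w]
    src_y = np.floor((ys - ty) / scale).astype(int)
    src_x = np.floor((xs - tx) / scale).astype(int)
    inside = (src_y >= 0) & (src_y < pi.shape[0]) & (src_x >= 0) & (src_x < pi.shape[1])
    out = np.zeros(out_shape)
    out[inside] = pi[src_y[inside], src_x[inside]]
    return out

pi = np.zeros((4, 4)); pi[1:3, 1:3] = 1.0  # a 2x2 "object" in the middle of the shape
prior = transform_shape(pi, scale=2.0, ty=3, tx=5, out_shape=(12, 16))
print(prior.sum())  # the 2x2 object becomes a 4x4 region in the image
```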
Iterative inference
[Figure sequence: the class shape π and transformations T1, …, TN after iterations #1, #2, #3, #5, #8 and #12.]
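The iteration sequence shows the class shape sharpening as the per-image transformations improve. The toy sketch below reproduces that behaviour with plain coordinate ascent on binary masks: it alternates between estimating π as the mean of the aligned masks and re-estimating each image's translation by exhaustive search. This is a simplified stand-in, not LOCUS's variational message passing; it assumes translation-only poses, circular shifts and already-segmented masks, and all names are hypothetical.

```python
import numpy as np

def align(mask, shift):
    """Undo a circular translation (dy, dx) so the mask lines up with the shape frame."""
    return np.roll(mask, (-shift[0], -shift[1]), axis=(0, 1))

def best_shift(mask, pi, max_shift):
    """Exhaustively search translations for the best overlap with the current shape."""
    best, best_score = (0, 0), -np.inf
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            score = np.sum(align(mask, (dy, dx)) * pi)
            if score > best_score:
                best, best_score = (dy, dx), score
    return best

def alternate(masks, n_iters=5, max_shift=3):
    """Alternate shape updates (average of aligned masks) and pose updates."""
    shifts = [(0, 0)] * len(masks)  # start with every image unaligned
    for _ in range(n_iters):
        pi = np.mean([align(m, s) for m, s in zip(masks, shifts)], axis=0)  # shape step
        shifts = [best_shift(m, pi, max_shift) for m in masks]               # pose step
    pi = np.mean([align(m, s) for m, s in zip(masks, shifts)], axis=0)
    return pi, shifts

base = np.zeros((10, 10)); base[3:7, 3:7] = 1.0
masks = [np.roll(base, t, axis=(0, 1)) for t in [(0, 0), (1, 1), (-1, 0)]]
pi, shifts = alternate(masks)
print(shifts)
```

Poses are only recovered up to a common offset, since the shape frame is arbitrary; here the first image happens to fix it.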
Non-rigid objects
[Figure: class shape π]
Translation and scale are not enough.
LOCUS model: pose
[Figure: class shape π, transformation T and deformation field D.]
Deformation field D: 5x5 blocks; a prior ensures smoothness.
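One way to read "5x5 blocks, prior ensures smoothness" is sketched below. It assumes one 2-D displacement vector per 5x5-pixel block, expanded to every pixel in the block, with a quadratic penalty on differences between neighbouring blocks (a Gaussian smoothness prior up to a constant). The block interpretation and the function names are assumptions, not taken from the slides.

```python
import numpy as np

BLOCK = 5  # assumption: one displacement vector per 5x5-pixel block

def expand_field(block_disp, block=BLOCK):
    """Upsample a (By, Bx, 2) block displacement field to per-pixel (By*block, Bx*block, 2)."""
    return np.repeat(np.repeat(block_disp, block, axis=0), block, axis=1)

def smoothness_energy(block_disp):
    """Sum of squared differences between 4-neighbouring block displacements.

    A Gaussian smoothness prior would be p(D) proportional to exp(-energy / (2 * sigma**2)).
    """
    dy = np.diff(block_disp, axis=0)  # differences between vertically adjacent blocks
    dx = np.diff(block_disp, axis=1)  # differences between horizontally adjacent blocks
    return float((dy ** 2).sum() + (dx ** 2).sum())

smooth = np.ones((4, 4, 2))  # every block displaced by (1, 1): a pure translation
rough = np.zeros((4, 4, 2)); rough[1, 2] = [3.0, -2.0]  # one block jumps away
print(smoothness_energy(smooth), smoothness_energy(rough))
```

A uniform displacement costs nothing, so global motion is left to T while D only absorbs local, smoothly varying deformation.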
LOCUS model: pose
[Figure: the class shape π mapped into each image by TD1, TD2, TD3, …, TDN.]
LOCUS model: edge
[Figure: original images, their edge images e, and the class edge sprite μo, σo, related through TD1, TD2, TD3, …, TDN.]
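The edge model can be illustrated by scoring an edge image against a sprite that has already been warped by the image's TD. The sketch assumes each edge pixel is Gaussian with mean and standard deviation taken from the warped sprite, and uses gradient magnitude as a stand-in edge detector; both choices and the function names are assumptions.

```python
import numpy as np

def edge_image(img):
    """Gradient-magnitude edge map (an assumed stand-in for the edge detector)."""
    gy, gx = np.gradient(np.asarray(img, dtype=float))
    return np.hypot(gx, gy)

def edge_log_likelihood(e, mu, sigma):
    """Sum over pixels of log N(e_i; mu_i, sigma_i^2) for an already warped sprite."""
    return float(np.sum(-0.5 * ((e - mu) / sigma) ** 2 - np.log(np.sqrt(2 * np.pi) * sigma)))

img = np.zeros((8, 8)); img[:, 4:] = 1.0  # a vertical step edge
e = edge_image(img)
sigma = np.full(e.shape, 0.2)
good_sprite = e.copy()              # sprite edges line up with the image edges
bad_sprite = np.roll(e, 2, axis=1)  # sprite edges misregistered by 2 pixels
print(edge_log_likelihood(e, good_sprite, sigma), edge_log_likelihood(e, bad_sprite, sigma))
```

Misregistered edges lower the likelihood, which is what lets the edge term pull T and D towards aligning object outlines.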
LOCUS model: overview
[Figure: the complete LOCUS graphical model, repeated from the start of this section.]
Inference
Aim: infer all latent variables.
For each image: background appearance λ0, object appearance λ1, deformation D, transformation T, mask m.
Class variables: shape π, edge sprite μo, σo.
Bayesian inference is carried out using variational message passing with a fully factorised variational distribution.
Optimisation of the grid-structured variational free energy terms (those relating to the deformation field D and the mask m) is achieved using graph cuts.
Experiments & results
Experiments
LOCUS was applied to 8 sets of 20 images, each set containing objects of a single class:
•Horses
•Faces
•Cars (rear)
•Cars (side)
•Motorbikes
•Aeroplanes
•Cows
•Trees
For each class, we ran separate experiments for color and texture appearance models.
Results: horses
Results: cars
Results: remaining classes
Cars (rear), Faces, Motorbikes, Planes, Cows, Trees
Segmentation accuracy
Method: Horses / Cars (side)
LOCUS (color), unannotated training images: 93.1% / 91.4%
LOCUS (texture), unannotated training images: 93.0% / 94.0%
Borenstein et al., hand-segmented training images: 93.6% / -
Each image segmented separately: 88.6% / 82.1%
To evaluate segmentation quantitatively, we used hand segmentations for horses and cars (side).
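The slides do not define the accuracy measure. A common choice when comparing against hand segmentations is the fraction of pixels whose object/background label agrees with the ground truth; the sketch below assumes that metric, and the function name is hypothetical.

```python
import numpy as np

def segmentation_accuracy(pred, truth):
    """Fraction of pixels whose predicted label matches the hand segmentation."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    if pred.shape != truth.shape:
        raise ValueError("prediction and ground truth must have the same shape")
    return float(np.mean(pred == truth))

truth = np.zeros((10, 10), dtype=bool); truth[2:8, 2:8] = True  # 36 object pixels
pred = truth.copy(); pred[2, 2:8] = False                       # miss one row of 6 pixels
print(segmentation_accuracy(pred, truth))  # → 0.94
```

Averaging this per-image score over a class's images would give one figure per class, though how the reported numbers were aggregated is not stated.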
Object registration
Transformation + deformation field registers object outlines (and some internal edges).
Extensions to LOCUS
Recognition + segmentation
Object recognition using only global shape:
Overall: 88% accuracy.
Probabilistic Index Maps
[Figure: index maps learned with 2 indices and with 9 indices.]
Each image has its own ‘palette’ of appearance models, giving palette invariance.
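Palette invariance can be shown with a minimal sketch: a per-pixel index map is shared across images, while each image supplies its own palette (here, one Gaussian mean per index with a common standard deviation, an assumption). Two images with the same layout but swapped colours then receive the same index labelling. The function name and all values are hypothetical.

```python
import numpy as np

def index_posterior(img, index_prior, palette_means, palette_std):
    """q(s_i = k) proportional to index_prior[i, k] * N(img_i; palette_means[k], palette_std^2).

    index_prior: (H, W, K) probabilistic index map shared by all images.
    palette_means: (K,) palette for this image; palette_std: shared scalar.
    """
    z = np.asarray(img, dtype=float)[..., None]
    log_post = np.log(index_prior) - 0.5 * ((z - palette_means) / palette_std) ** 2
    log_post -= log_post.max(axis=-1, keepdims=True)  # numerical stability
    post = np.exp(log_post)
    return post / post.sum(axis=-1, keepdims=True)

# Shared index map: left half mostly index 0, right half mostly index 1.
H, W = 4, 6
index_prior = np.full((H, W, 2), 0.1)
index_prior[:, :3, 0] = 0.9
index_prior[:, 3:, 1] = 0.9

# Two images with the same layout but swapped colours, each with its own palette.
img_a = np.where(np.arange(W) < 3, 0.2, 0.8) * np.ones((H, 1))
img_b = 1.0 - img_a
post_a = index_posterior(img_a, index_prior, np.array([0.2, 0.8]), 0.1)
post_b = index_posterior(img_b, index_prior, np.array([0.8, 0.2]), 0.1)
print(post_a.argmax(axis=-1))
print(post_b.argmax(axis=-1))
```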
Learning objects from video
Object shape
Object edge sprite
Locumotion
Add flow and track constraints to achieve motion segmentation:
Tracking/flow estimation by Larry Zitnick
Conclusions
LOCUS gives unsupervised segmentations of accuracy equivalent to state-of-the-art supervised methods.
The general-purpose model allows:
•Object localisation
•Pose estimation
•Object segmentation
•Motion segmentation/object tracking
•Object recognition/detection (in combination with a discriminative model)
Questions?