Generative Models for Image Analysis
Transcript of "Generative Models for Image Analysis"
Stuart Geman (with E. Borenstein, L.-B. Chang, W. Zhang)
I. Bayesian (generative) image models
II. Feature distributions and data distributions
III. Conditional modeling
IV. Sampling and the choice of null distribution
V. Other applications of conditional modeling
I. Bayesian (generative) image models

Prior: $P(x)$, where
* $I$ = set of possible "interpretations" or "parses"
* $x \in I$ = a particular interpretation
* $P(x)$ = probability model on $I$

* very structured and constrained
* organizing principles: hierarchy and reusability (Amit, Buhmann, Poggio, Yuille, Zhu, etc.)
* non-Markovian (context/content sensitive)
Conditional likelihood: $P(y \mid x)$, where
* $y$ = the image
* $P(y \mid x)$ = conditional probability model

Posterior: $P(x \mid y) \propto P(y \mid x) \, P(x)$

The focus here is on $P(y \mid x)$.
II. Feature distributions and data distributions

$y = \{y_s\}_{s \in S}$, $y_s$ = pixel intensity at $s \in S$ (an image patch).

$f(y)$ = a "feature", e.g.
* variance of the patch
* histogram of gradients, SIFT features, etc.
* template correlation

$P^g_F$ = probability model of the feature under category $g$ (edge, corner, eye, face, ...).

Model the patch through a feature model:
e.g. detection and recognition of eyes

$y = \{y_s\}_{s \in S}$, $y_s$ = pixel intensity at $s \in S$ (an image patch).

Consider the template correlation
$$f(y) = c_T(y) = \mathrm{corr}(T, y) = \frac{\frac{1}{|S|} \sum_s (T_s - \bar{T})(y_s - \bar{y})}{\sqrt{\frac{1}{|S|} \sum_s (T_s - \bar{T})^2} \, \sqrt{\frac{1}{|S|} \sum_s (y_s - \bar{y})^2}}$$
and the model
$$P^e_{C_T}(c) \propto e^{-\lambda (1 - c)}, \qquad c \in [-1, 1].$$

Problem: given samples $y^1, \ldots, y^N$ of eye patches, learn $\lambda$ and $T$.
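A minimal sketch of the correlation feature in Python (NumPy assumed; "corr" is an illustrative helper, not code from the talk):

    import numpy as np

    def corr(T, y):
        """Normalized correlation c_T(y) between template T and patch y,
        both arrays over the same pixel lattice S."""
        Tc = T.ravel() - T.mean()
        yc = y.ravel() - y.mean()
        return float(Tc @ yc / (np.linalg.norm(Tc) * np.linalg.norm(yc)))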
[Figure: distribution of the correlation $c$ on $[-1, 1]$.]
(Actually, with normalization: $P^e_{C_T}(c) = \frac{1}{Z(\lambda)} e^{-\lambda (1 - c)}$ on $[-1, 1]$.)

Tempting to PRETEND that the data is $c_T(y^1), \ldots, c_T(y^N)$:
$$L(c_T(y^1), \ldots, c_T(y^N)) = \prod_{k=1}^N P^e_{C_T}(c_T(y^k)) \propto e^{-\lambda \sum_{k=1}^N (1 - c_T(y^k))}$$

Caution: $P^e_{C_T}(c_T(y)) \propto e^{-\lambda (1 - c_T(y))}$ is different from $P^e_Y(y) = \frac{1}{Z} e^{-\lambda (1 - c_T(y))}$.

BUT the data is $y^1, \ldots, y^N$, and
$$P^e_Y(y) = P^e_{C_T}(c_T(y)) \, P^e_Y(y \mid C_T = c_T(y))$$
$$L(y^1, \ldots, y^N) = \prod_{k=1}^N P^e_{C_T}(c_T(y^k)) \, P^e_Y(y^k \mid C_T = c_T(y^k))$$

The first is fine for estimating $\lambda$, but not fine for estimating $T$.

Use maximum likelihood… but what is the likelihood?
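For the $\lambda$ part, the pretend (feature-only) likelihood is a one-dimensional exponential family and its MLE is a root-find. A minimal sketch, assuming the truncated-exponential model $P^e_{C_T}(c) = \lambda e^{-\lambda (1 - c)} / (1 - e^{-2\lambda})$ on $[-1, 1]$ (SciPy assumed; "mle_lambda" is an illustrative helper):

    import numpy as np
    from scipy.optimize import brentq

    def mle_lambda(c, lo=1e-6, hi=1e3):
        """MLE of lambda given correlations c in [-1, 1]; solves the
        score equation 1/lam - 2/(exp(2*lam) - 1) = mean(1 - c).
        Assumes the root lies in (lo, hi), i.e. mean(1 - c) is well inside (0, 1)."""
        m = float(np.mean(1.0 - np.asarray(c)))
        score = lambda lam: 1.0 / lam - 2.0 / np.expm1(2.0 * lam) - m
        return brentq(score, lo, hi)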
III. Conditional modeling

For any category $g$ (e.g. "eye") and feature $F = f(Y)$:
$$P^g_Y(y) = P^g_F(f(y)) \, P^g_Y(y \mid F = f(y)).$$
Easy to model $P^g_F$; hard to model $P^g_Y(y \mid F = f)$.

Principle: start with a "null" or "background" distribution $P^0_Y$ and choose $P^g_Y$
1. consistent with $P^g_F$, and
2. otherwise "as close as possible" to $P^0_Y$.

Specifically, given $P^g_F$ and a null distribution $P^0_Y$, choose
$$P^g_Y = \arg\min_{P_Y : \, f(Y) \text{ has distribution } P^g_F} D(P_Y \| P^0_Y).$$
Then
$$P^g_Y(y) = P^g_F(f(y)) \, P^0_Y(y \mid F = f(y))$$
(where $D(P \| P^0) = \int P(y) \log \frac{P(y)}{P^0(y)} \, dy$ is the K-L divergence).

Conditional modeling: a perturbation of the null distribution.
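Why the minimizer takes this form: by the chain rule for K-L divergence applied to the pair $(F, Y)$,
$$D(P_Y \| P^0_Y) = D(P_F \| P^0_F) + \mathbb{E}_{P_F}\!\left[ D\big(P_Y(\cdot \mid F) \,\|\, P^0_Y(\cdot \mid F)\big) \right].$$
The constraint fixes $P_F = P^g_F$, so the first term is determined; the second is minimized (at zero) by taking $P_Y(\cdot \mid F) = P^0_Y(\cdot \mid F)$, which gives the stated product form.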
Estimation

Given $y^1, \ldots, y^N$, $P^g_F$, and $P^0_Y$:
$$L(y^1, \ldots, y^N) = \prod_{k=1}^N P^g_F(f(y^k)) \, P^0_Y(y^k \mid F = f(y^k)) = \prod_{k=1}^N \frac{P^g_F(f(y^k))}{P^0_F(f(y^k))} \, P^0_Y(y^k) \propto \prod_{k=1}^N \frac{P^g_F(f(y^k))}{P^0_F(f(y^k))}$$

Much Easier!
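A sketch of this simplified objective in Python for the correlation feature (illustrative; the arguments log_null_pdf, supplying $\log P^0_F$, and log_Z, the normalizer of $P^g_F$, are assumptions of this sketch):

    import numpy as np

    def corr(T, y):
        Tc = T.ravel() - T.mean()
        yc = y.ravel() - y.mean()
        return float(Tc @ yc / (np.linalg.norm(Tc) * np.linalg.norm(yc)))

    def loglik_ratio(patches, T, lam, log_null_pdf, log_Z):
        """sum_k [log P^g_F(c_k) - log P^0_F(c_k)], with
        P^g_F(c) = exp(-lam*(1 - c)) / Z(lam) and c_k = corr(T, y^k)."""
        c = np.array([corr(T, y) for y in patches])
        return float(np.sum(-lam * (1.0 - c) - log_Z - log_null_pdf(c)))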
Example: learning eye templates

Take $f_m(y) = c_{T_m}(y) = \mathrm{corr}(T_m, y)$, $m = 1, \ldots, M$, and model the patch as a MIXTURE:
$$P^e_Y(y) = \sum_{m=1}^M \epsilon_m \, P^e_{C_{T_m}}(c_{T_m}(y)) \, P^0_Y(y \mid C_{T_m} = c_{T_m}(y)) \propto \sum_{m=1}^M \epsilon_m \, e^{-\lambda_m (1 - c_{T_m}(y))} \, P^0_Y(y \mid C_{T_m} = c_{T_m}(y))$$

$y = \{y_s\}_{s \in S}$, $y_s$ = pixel intensity at $s \in S$ (an image patch).
Example: learning eye templates

$$L(y^1, \ldots, y^N \mid \epsilon_1, \ldots, \epsilon_M, \lambda_1, \ldots, \lambda_M, T_1, \ldots, T_M) = \prod_{k=1}^N \sum_{m=1}^M \epsilon_m \, P^e_{C_{T_m}}(c_{T_m}(y^k)) \, P^0_Y(y^k \mid C_{T_m} = c_{T_m}(y^k))$$
$$= \prod_{k=1}^N P^0_Y(y^k) \sum_{m=1}^M \epsilon_m \, \frac{P^e_{C_{T_m}}(c_{T_m}(y^k))}{P^0_{C_{T_m}}(c_{T_m}(y^k))} \propto \prod_{k=1}^N \sum_{m=1}^M \epsilon_m \, \frac{e^{-\lambda_m (1 - c_{T_m}(y^k))}}{P^0_{C_{T_m}}(c_{T_m}(y^k))}$$
Example: learning eye templates

Take (for now) $P^0_Y(y) = \left(\frac{1}{256}\right)^{|S|}$ (iid uniform). Then, by a Central Limit Theorem,
$$P^0_{C_T}(c_T(y)) \propto e^{-\frac{|S| \, c_T(y)^2}{2}}$$
(i.e., under the null, $C_T$ is approximately $N(0, 1/|S|)$).
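A quick empirical check of this null (an illustrative NumPy sketch): under iid-uniform patches, the correlation with any fixed template should be approximately $N(0, 1/|S|)$.

    import numpy as np

    rng = np.random.default_rng(0)
    S = 15 * 15                                  # |S|, an illustrative patch size
    T = rng.integers(0, 256, S).astype(float)    # any fixed template
    Tc = (T - T.mean()) / np.linalg.norm(T - T.mean())

    y = rng.integers(0, 256, (20_000, S)).astype(float)
    yc = y - y.mean(axis=1, keepdims=True)
    c = (yc @ Tc) / np.linalg.norm(yc, axis=1)

    print(c.std() * np.sqrt(S))                  # close to 1 if C_T ~ N(0, 1/|S|)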
Example: learning eye templates

$$L(y^1, \ldots, y^N \mid \epsilon_1, \ldots, \epsilon_M, \lambda_1, \ldots, \lambda_M, T_1, \ldots, T_M) \propto \prod_{k=1}^N \sum_{m=1}^M \epsilon_m \, e^{-\lambda_m (1 - c_{T_m}(y^k))} \, e^{\frac{|S| \, c_{T_m}(y^k)^2}{2}}$$

Maximize the data likelihood for the mixing probabilities, the feature parameters, and the templates themselves…
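One way to carry out the maximization is EM over the latent mixture component. A rough sketch, assuming standardized templates stored as rows of Ts; the M-step shown (responsibility-weighted template average, with the lambdas held fixed) is a simplification for illustration, not the talk's exact update:

    import numpy as np

    def em_step(patches, Ts, lam, eps):
        """One EM-style pass for the likelihood
        prod_k sum_m eps_m exp(-lam_m (1 - c_mk)) exp(|S| c_mk^2 / 2)."""
        S = patches[0].size
        Y = np.array([(p.ravel() - p.mean()) / np.linalg.norm(p.ravel() - p.mean())
                      for p in patches])           # N standardized patches
        C = Y @ Ts.T                               # N x M correlations c_mk
        logw = np.log(eps) - lam * (1 - C) + S * C**2 / 2
        logw -= logw.max(axis=1, keepdims=True)
        R = np.exp(logw)
        R /= R.sum(axis=1, keepdims=True)          # E-step: responsibilities
        eps_new = R.mean(axis=0)                   # M-step: mixing probabilities
        Ts_new = R.T @ Y                           # M-step: weighted patch average
        Ts_new /= np.linalg.norm(Ts_new, axis=1, keepdims=True)
        return Ts_new, eps_new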
Example: learning (right) eye templates
Example: learning (right) eye templates

What if we forget all this nonsense and just maximize
$$\prod_{k=1}^N \sum_{m=1}^M \epsilon_m \, e^{-\lambda_m (1 - c_{T_m}(y^k))}$$
instead of
$$\prod_{k=1}^N \sum_{m=1}^M \epsilon_m \, e^{-\lambda_m (1 - c_{T_m}(y^k))} \, e^{\frac{|S| \, c_{T_m}(y^k)^2}{2}} \; ?$$
How good are the templates? A classification experiment…

In general,
$$P^g_Y(y) = P^g_F(f(y)) \, P^0_Y(y \mid F = f(y))$$
or a mixture of these models:
$$P^g_Y(y) = \sum_m \epsilon_m \, P^g_{F_m}(f_m(y)) \, P^0_Y(y \mid F_m = f_m(y))$$

$f_m(y)$ is any function (feature), such as a correlation with a SUBIMAGE. Thus $m$ can index
* alternative models (e.g. 8 eye templates)
* transformations of scale, rotation, ... (e.g. as in the work of Amit and Trouvé)
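A sketch of classification with these mixtures (illustrative; the normalizers of $P^g_{F_m}$ are absorbed into the scores here, a simplification of this sketch):

    import numpy as np

    def mixture_score(y, Ts, lam, eps):
        """log sum_m eps_m exp(-lam_m (1 - c_m)) exp(|S| c_m^2 / 2):
        the (unnormalized) category-vs-null log likelihood ratio for patch y."""
        S = y.size
        yc = y.ravel() - y.mean()
        c = Ts @ (yc / np.linalg.norm(yc))
        a = np.log(eps) - lam * (1 - c) + S * c**2 / 2
        return float(a.max() + np.log(np.exp(a - a.max()).sum()))

    def classify(y, model_a, model_b):
        """Pick the category with the larger score; model_* = (Ts, lam, eps)."""
        return "a" if mixture_score(y, *model_a) >= mixture_score(y, *model_b) else "b"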
How good are the templates? A classification experiment…

Classify East Asian vs. South Asian, mixing over 4 scales and 8 templates.

East Asian: (L) examples of training images, (M) progression of EM, (R) trained templates.
South Asian: (L) examples of training images, (M) progression of EM, (R) trained templates.

Classification rate: 97%
Other examples: noses. 16 templates; multiple scales, shifts, and rotations.
[Figure: samples from the training set; learned templates.]

Other examples: mixture of noses and mouths.
[Figure: samples from the training set (1/2 noses, 1/2 mouths); 32 learned templates.]

Other examples: train on 58 faces, half with glasses, half without.
[Figure: samples from the training set; 32 learned templates.]

Other examples: train on 58 faces, half with glasses, half without.
[Figure: 8 learned templates; a random eight of the 58 faces; rows 2 to 4, top to bottom, templates ordered by posterior likelihood.]

Other examples: train on random patches ("sparse representation"). 500 random 15x15 training patches from random internet images; 24 10x10 templates.
Other examples: coarse representation. Use $f(y) = \mathrm{Corr}(T, D(y))$, where $D$ downconverts.
(Go the other way for super-resolution: $f(y) = \mathrm{Corr}(D(T), y)$?)
[Figure: training of 8 low-res (10x10) templates; samples from the training set (down-converted images).]
IV. Sampling and the choice of null distribution

Take a closer look at $P^0_Y$ by (approximately) sampling from
$$P^g_Y(y) = \sum_{m=1}^M \epsilon_m \, e^{-\lambda_m (1 - c_{T_m}(y))} \, P^0_Y(y \mid C_{T_m} = c_{T_m}(y))$$

* Standardize templates and patches:
$$T \to \frac{T - \bar{T}}{\|T - \bar{T}\|}, \qquad y \to \frac{y - \bar{y}}{\|y - \bar{y}\|}$$
* View $P^g_Y$ and $P^0_Y$ as distributions on the unit sphere in $R^{|S|}$
* Then sample $y$ from $P^g_Y$ by
  1. choosing a mixing component $m \in \{1, \ldots, M\}$ according to $\epsilon_1, \ldots, \epsilon_M$
  2. choosing a correlation $c$ according to $e^{-\lambda_m (1 - c)}$
  3. choosing $y^0$ according to $P^0_Y$ and computing $y = c \, T_m + \sqrt{1 - c^2} \; y^0_\perp / \|y^0_\perp\|$, where $y^0_\perp$ is the component of $y^0$ orthogonal to $T_m$
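A sketch of this sampler in Python (illustrative; the null_sampler argument, returning a raw patch drawn from $P^0_Y$, is an assumption of the sketch):

    import numpy as np

    def sample_patch(Ts, lam, eps, null_sampler, rng):
        """Approximate sample from the mixture on the unit sphere in R^|S|.
        Rows of Ts are standardized templates; eps sums to 1."""
        m = rng.choice(len(eps), p=eps)            # 1. mixing component
        # 2. correlation c on [-1, 1] with density prop. to exp(-lam_m (1 - c)),
        #    via inverse-CDF sampling of the truncated exponential
        u = rng.random()
        c = 1.0 + np.log(u + (1.0 - u) * np.exp(-2.0 * lam[m])) / lam[m]
        # 3. null sample, standardized, then moved to correlation c with T_m
        y0 = null_sampler(rng).ravel().astype(float)
        y0 = (y0 - y0.mean()) / np.linalg.norm(y0 - y0.mean())
        perp = y0 - (y0 @ Ts[m]) * Ts[m]
        perp /= np.linalg.norm(perp)
        return c * Ts[m] + np.sqrt(1.0 - c**2) * perp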
(approximate) sampling…

[Figure: the unit sphere in $R^{|S|}$, showing the template $T_m$, a sample $y$, and the correlation $c$.]
(approximate) sampling…
[Figure: 32 samples from the mixture model, with $P^0_Y$ white noise.]

(approximate) sampling…
[Figure: 32 samples from the mixture model, with $P^0_Y$ from Caltech 101.]

(approximate) sampling…
[Figure: 32 samples from the mixture model, with $P^0_Y$ from outdoor scenes.]

(approximate) sampling…
[Figure: 32 samples from the mixture model, with $P^0_Y$ from random patches satisfying a bound on $\max_{s \in S} |y_s - \bar{y}| / \|y - \bar{y}\|$.]
V. Other applications of conditional modeling

1. $P^g_Y(y)$ when two templates overlap.

2. Gibbs sampling: the problem is to draw a sample $y = \{y_s\}_{s \in S}$ from some distribution $P^0(y)$.
* Given a sample $y^t$ at iteration $t$ from some probability $P^t$, visit a site $s \in S$.
* Replace $y_s$ by a sample from $P^0(y_s \mid y_{S \setminus s})$.

Then
$$P^{t+1}(y) = P^t(y_{S \setminus s}) \, P^0(y_s \mid y_{S \setminus s}) = \arg\min_{P : \, P(y_{S \setminus s}) = P^t(y_{S \setminus s})} D(P \| P^0)$$
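Each such site update is exactly the K-L projection described above. A minimal Gibbs sweep for a toy Ising-type $P^0$ (illustrative, not from the talk):

    import numpy as np

    def gibbs_sweep(y, beta, rng):
        """One sweep: at each site s, replace y_s (a +/-1 spin) by a draw
        from P^0(y_s | rest) for the Ising model with coupling beta."""
        H, W = y.shape
        for i in range(H):
            for j in range(W):
                nb = sum(y[a, b] for a, b in ((i-1,j),(i+1,j),(i,j-1),(i,j+1))
                         if 0 <= a < H and 0 <= b < W)    # free boundary
                p1 = 1.0 / (1.0 + np.exp(-2.0 * beta * nb))
                y[i, j] = 1 if rng.random() < p1 else -1
        return y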
3. Hierarchical models and the Markov Dilemma

* $x_p \in \{0,1\}$: $x_p = 1 \Leftrightarrow$ 'pair of eyes'
* $x_l \in \{0,1\}$: $x_l = 1 \Leftrightarrow$ 'left eye'
* $x_r \in \{0,1\}$: $x_r = 1 \Leftrightarrow$ 'right eye'

Markov model. The Markov property helps with:
* Estimation
* Computation
* Representation

But: given $x_p = 1$, there are probabilistic constraints on the poses and appearances of the left and right eyes.
Hierarchical models and the Markov Dilemma

More generally:
* $P^0(x)$ = Markov distribution
* $B$ = a composed object, e.g. 'pair of eyes'; $1_B(x) = 1$ when $B$ is instantiated in $x$
* $a(x)$ = attribute (e.g. relative poses of the two eyes)
* $P^A(a(x) \mid 1_B(x) = 1)$ = desired conditional distribution
Choose
$$P^1(x) = \arg\min_{P : \, P(a(x) \mid 1_B(x) = 1) \, = \, P^A(a(x) \mid 1_B(x) = 1)} D(P \| P^0)$$
then
$$P^1(x) = P^0(x) \, \frac{P^A(a(x) \mid 1_B(x) = 1)}{P^0(a(x) \mid 1_B(x) = 1)} \quad \text{when } 1_B(x) = 1 \quad (\text{and } P^1(x) = P^0(x) \text{ otherwise}).$$
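A toy numeric illustration of this perturbation (assumed numbers, not from the talk): reweight only the states with $1_B(x) = 1$, leaving the marginal on $1_B$ and all off-$B$ states untouched.

    import numpy as np

    # States (b, a): b = 1_B(x), attribute a in {0, 1, 2}.
    P0 = np.array([0.30, 0.10, 0.10,   # b = 0
                   0.20, 0.20, 0.10])  # b = 1 (null: a Markov-style prior)
    PA = np.array([0.10, 0.30, 0.60])  # desired P(a | b = 1)

    P1 = P0.copy()
    P1[3:] *= PA / (P0[3:] / P0[3:].sum())   # P^1 = P^0 * PA / P^0(a | 1_B = 1)
    print(P1.sum())                          # 1.0: total mass preserved
    print(P1[3:] / P1[3:].sum())             # equals PA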
Example: license plates. The hierarchy (top to bottom):
* license plates
* license numbers (3 digits + 3 letters, 4 digits + 2 letters)
* plate boundaries; strings (2 letters, 3 digits, 3 letters, 4 digits)
* generic letter, generic number, L-junctions of sides
* characters, plate sides
* parts of characters, parts of plate sides
Hierarchical models and the Markov dilemma

[Figure: original image; zoomed license region; the top object under the Markov distribution vs. under the perturbed ("content-sensitive") distribution.]
Hierarchical models and the Markov dilemma
PATTERN SYNTHESIS = PATTERN ANALYSIS
(Ulf Grenander)