Visual Recognition: Inference in Graphical Models
Raquel Urtasun
TTI Chicago
Feb 28, 2012
Raquel Urtasun (TTI-C) Visual Recognition Feb 28, 2012 1 / 31
Graphical models
Applications
Representation
Inference
message passing (LP relaxations)
graph cuts
Learning
Inference with graph cuts
Submodular Functions
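A pseudo-boolean function $f : \{0,1\}^n \to \mathbb{R}$ is submodular if
$$f(A) + f(B) \geq f(A \vee B) + f(A \wedge B) \quad \forall A, B \in \{0,1\}^n$$
where $\vee$ is the element-wise OR and $\wedge$ is the element-wise AND.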
Example: n = 2, A = [1, 0], B = [0, 1]
f ([1, 0]) + f ([0, 1]) ≥ f ([1, 1]) + f ([0, 0])
A sum of submodular functions is submodular (easy to prove).
Some energies in computer vision can be submodular
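The definition can be checked by brute force for small n. The following is a minimal sketch; the function name `is_submodular` and the two toy energies are illustrative assumptions, not part of the lecture.

```python
from itertools import product

def is_submodular(f, n):
    """Check f(A) + f(B) >= f(A OR B) + f(A AND B) for all A, B in {0,1}^n."""
    points = list(product((0, 1), repeat=n))
    for a in points:
        for b in points:
            a_or_b = tuple(x | y for x, y in zip(a, b))
            a_and_b = tuple(x & y for x, y in zip(a, b))
            if f(a) + f(b) < f(a_or_b) + f(a_and_b):
                return False
    return True

# Hypothetical energies on two binary variables.
potts = lambda x: 0 if x[0] == x[1] else 1       # penalizes disagreement
anti_potts = lambda x: 1 if x[0] == x[1] else 0  # penalizes agreement

print(is_submodular(potts, 2))       # True
print(is_submodular(anti_potts, 2))  # False
```

The check costs 4^n evaluations, so it is only meant for verifying small building blocks such as pairwise terms.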
Minimizing submodular Functions
Pairwise submodular functions can be transformed to st-mincut/max-flow [Hammer, 65].
Very low running time in practice, empirically ∼ O(n); the worst-case bounds of max-flow algorithms are polynomial but higher.
The ST-mincut problem
Suppose we have a graph G = (V, E, C), with vertices V, edges E and costs C.
An st-cut (S, T) divides the nodes between source and sink.
The cost of an st-cut is the sum of the costs of all edges going from S to T.
The st-mincut is the st-cut with the minimum cost.
[Source: P. Kohli]
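To make the definition concrete, here is a small sketch that computes an st-mincut with the Edmonds-Karp max-flow algorithm. The choice of Edmonds-Karp, the function name `st_mincut` and the toy graph are assumptions made for illustration; the lecture does not prescribe a particular solver.

```python
from collections import deque

def st_mincut(capacity, s, t):
    """Return (cut cost, source side S) for a directed graph {u: {v: cost}}.

    Runs Edmonds-Karp (BFS augmenting paths). By max-flow/min-cut duality,
    the final flow value equals the cost of the minimum st-cut.
    """
    # Residual capacities, with reverse edges initialised to 0.
    residual = {s: {}, t: {}}
    for u, nbrs in capacity.items():
        for v, c in nbrs.items():
            residual.setdefault(u, {})
            residual.setdefault(v, {})
            residual[u][v] = residual[u].get(v, 0) + c
            residual[v].setdefault(u, 0)

    flow = 0
    while True:
        # BFS from s over edges with remaining capacity.
        parent = {s: None}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v, c in residual[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            # Nodes still reachable from s form the source side S.
            return flow, set(parent)
        # Bottleneck capacity along the path found.
        bottleneck, v = float("inf"), t
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]
        # Push the bottleneck flow along the path.
        v = t
        while parent[v] is not None:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        flow += bottleneck

# Hypothetical graph with two non-terminal nodes a and b.
graph = {"s": {"a": 2, "b": 9}, "a": {"t": 5, "b": 2}, "b": {"t": 4, "a": 1}}
cost, source_side = st_mincut(graph, "s", "t")
print(cost, sorted(source_side))  # 7 ['b', 's']
```

The cut S = {s, b} severs s→a (2), b→t (4) and b→a (1), which sums to 7.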
Back to our energy minimization
Construct a graph such that
1. Any st-cut corresponds to an assignment of x
2. The cost of the cut is equal to the energy of x: E(x)
[Source: P. Kohli]
St-mincut and Energy Minimization
[Source: P. Kohli]
How are they equivalent?
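The pairwise term $\theta_{ij}(x_i, x_j)$ is a 2×2 table with rows indexed by $x_i \in \{0,1\}$ and columns by $x_j \in \{0,1\}$:
$$\begin{pmatrix} A & B \\ C & D \end{pmatrix} = A + \begin{pmatrix} 0 & 0 \\ C-A & C-A \end{pmatrix} + \begin{pmatrix} 0 & D-C \\ 0 & D-C \end{pmatrix} + \begin{pmatrix} 0 & B+C-A-D \\ 0 & 0 \end{pmatrix}$$
where $A = \theta_{ij}(0,0)$, $B = \theta_{ij}(0,1)$, $C = \theta_{ij}(1,0)$, $D = \theta_{ij}(1,1)$.
If $x_i = 1$, add $C - A$.
If $x_j = 1$, add $D - C$.
$B + C - A - D \geq 0$ holds by the submodularity of $\theta_{ij}$.
$$\theta_{ij}(x_i,x_j) = \theta_{ij}(0,0) + (\theta_{ij}(1,0)-\theta_{ij}(0,0))\,x_i + (\theta_{ij}(1,1)-\theta_{ij}(1,0))\,x_j + (\theta_{ij}(1,0)+\theta_{ij}(0,1)-\theta_{ij}(0,0)-\theta_{ij}(1,1))\,(1-x_i)\,x_j$$
[Source: P. Kohli]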
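The decomposition into a constant, two unary terms and one non-negative pairwise term can be checked numerically. This is only a sanity-check sketch; the table values and the function name `decomposed` are hypothetical.

```python
from itertools import product

def decomposed(theta, xi, xj):
    """Evaluate constant + unary + pairwise terms for a 2x2 table theta[xi][xj]."""
    A, B = theta[0]
    C, D = theta[1]
    return A + (C - A) * xi + (D - C) * xj + (B + C - A - D) * (1 - xi) * xj

# Hypothetical submodular table: B + C >= A + D.
theta = [[1, 4],
         [3, 2]]
pairwise_weight = theta[0][1] + theta[1][0] - theta[0][0] - theta[1][1]
print(pairwise_weight)  # 4, non-negative as required for an edge cost

for xi, xj in product((0, 1), repeat=2):
    assert decomposed(theta, xi, xj) == theta[xi][xj]
```

Note that a unary coefficient such as D - C may be negative (here it is -1) while the pairwise coefficient stays non-negative for a submodular table.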
Graph Construction
[Source: P. Kohli]
How to compute the St-mincut?
[Source: P. Kohli]
What does the code look like?
[Source: P. Kohli]
Graph cuts for multi-label problems
Exact transformation to QPBF [Roy and Cox 98] [Ishikawa 03] [Schlesinger et al. 06] [Ramalingam et al. 08]
Very high computational cost
[Source: P. Kohli]
Alternative: Move making
[Source: P. Kohli]
Computing the Optimal Move
[Source: P. Kohli]
Move Making Algorithms
[Source: P. Kohli]
Energy Minimization
Consider pairwise MRFs
$$E(f) = \sum_{\{p,q\} \in N} V_{p,q}(f_p, f_q) + \sum_{p} D_p(f_p)$$
with N defining the interactions between nodes, e.g., pixels
Dp non-negative, but arbitrary.
Same as before, where Vp,q ≡ −θα and Dp ≡ −θp.
This is the graph-cuts notation.
Important to notice it’s the same thing.
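A minimal sketch of evaluating this energy for a given labeling. The data layout (`unary[p][l]` for D_p(l), a list of neighbor pairs for N) and the use of a single V shared by all pairs are simplifying assumptions made for the example.

```python
def energy(f, unary, V, neighbors):
    """E(f) = sum over {p,q} in N of V(f_p, f_q) + sum over p of D_p(f_p)."""
    data_term = sum(unary[p][label] for p, label in enumerate(f))
    smoothness_term = sum(V(f[p], f[q]) for p, q in neighbors)
    return data_term + smoothness_term

# Hypothetical 1D "image": 4 pixels, 3 labels, chain neighborhood.
unary = [[0, 2, 5], [1, 0, 4], [3, 1, 0], [4, 2, 0]]  # unary[p][l] = D_p(l)
neighbors = [(0, 1), (1, 2), (2, 3)]                  # unordered pairs {p, q}
potts = lambda a, b: 0 if a == b else 2               # V(a, b)

print(energy([0, 1, 2, 2], unary, potts, neighbors))  # 4
print(energy([0, 0, 2, 2], unary, potts, neighbors))  # 3
```

The second labeling pays a data cost of 1 at pixel 1 but saves one label discontinuity, so its total energy is lower.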
Metric vs Semimetric
Two general classes of pairwise interactions
Metric if it satisfies for any set of labels α, β, γ
V (α, β) = 0 ↔ α = β
V (α, β) = V (β, α) ≥ 0
V (α, β) ≤ V (α, γ) + V (γ, β)
Semi-metric if it satisfies for any set of labels α, β, γ
V (α, β) = 0 ↔ α = β
V (α, β) = V (β, α) ≥ 0
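For a finite label set, the three conditions can be tested exhaustively. The helper below is an illustrative sketch; its name and return values are assumptions.

```python
from itertools import product

def classify_potential(V, labels):
    """Return 'metric', 'semi-metric' or 'neither' for V over a finite label set."""
    labels = list(labels)
    for a, b in product(labels, repeat=2):
        if (V(a, b) == 0) != (a == b):         # V(a, b) = 0  <->  a = b
            return "neither"
        if V(a, b) != V(b, a) or V(a, b) < 0:  # symmetry and non-negativity
            return "neither"
    for a, b, c in product(labels, repeat=3):
        if V(a, b) > V(a, c) + V(c, b):        # triangle inequality violated
            return "semi-metric"
    return "metric"

print(classify_potential(lambda a, b: abs(a - b), range(5)))    # metric
print(classify_potential(lambda a, b: (a - b) ** 2, range(5)))  # semi-metric
```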
Examples for 1D label set
Truncated quadratic is a semi-metric
V (α, β) = min(K , |α − β|²)
with K a constant.
Truncated absolute distance is a metric
V (α, β) = min(K , |α− β|)
with K a constant.
For multi-dimensional, replace | · | by any norm.
Potts model is a metric
V (α, β) = K · T (α ≠ β)
with T (·) = 1 if the argument is true and 0 otherwise.
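The three potentials are easy to write down, and a single triple of labels shows why the truncated quadratic is only a semi-metric. The value K = 3 and the labels 0, 1, 2 are arbitrary choices for this sketch.

```python
def truncated_quadratic(a, b, K):
    return min(K, abs(a - b) ** 2)

def truncated_absolute(a, b, K):
    return min(K, abs(a - b))

def potts(a, b, K):
    return K * (a != b)  # K if the labels differ, 0 otherwise

K = 3
# Triangle inequality V(0, 2) <= V(0, 1) + V(1, 2), via the intermediate label 1.
lhs = truncated_quadratic(0, 2, K)                                 # min(3, 4) = 3
rhs = truncated_quadratic(0, 1, K) + truncated_quadratic(1, 2, K)  # 1 + 1 = 2
print(lhs > rhs)  # True: the triangle inequality fails

# The same triple is fine for the other two potentials.
print(truncated_absolute(0, 2, K) <= truncated_absolute(0, 1, K) + truncated_absolute(1, 2, K))  # True
print(potts(0, 2, K) <= potts(0, 1, K) + potts(1, 2, K))                                         # True
```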
Binary Moves
α-β swap moves work for semi-metrics
α-expansion moves work when V is a metric
Figure: from P. Kohli's tutorial on graph cuts
For certain x1 and x2, the move energy is a submodular QPBF
Swap Move
[Source: P. Kohli]
Expansion Move
[Source: P. Kohli]
More formally
Any labeling can be uniquely represented by a partition of image pixels P = {P_l | l ∈ L}, where P_l = {p ∈ P | f_p = l} is the subset of pixels assigned label l.
There is a one-to-one correspondence between labelings f and partitions P.
Given a pair of labels α, β, a move from a partition P (labeling f) to a new partition P′ (labeling f′) is called an α-β swap if P_l = P′_l for any label l ≠ α, β.
The only difference between P and P′ is that some pixels that were labeled α in P are now labeled β in P′, and vice versa.
Given a label α, a move from a partition P (labeling f) to a new partition P′ (labeling f′) is called an α-expansion if P_α ⊂ P′_α and P′_l ⊂ P_l for any label l ≠ α.
An α-expansion move allows any set of image pixels to change their labels to α.
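In terms of labelings (lists of labels, one per pixel), the two definitions reduce to simple per-pixel tests. The function names below are illustrative assumptions.

```python
def is_swap_move(f, f_new, alpha, beta):
    """True if f_new is within one alpha-beta swap of f.

    Pixels labeled outside {alpha, beta} keep their label; pixels labeled
    alpha or beta may keep their label or switch to the other one.
    """
    for old, new in zip(f, f_new):
        if old in (alpha, beta):
            if new not in (alpha, beta):
                return False
        elif new != old:
            return False
    return True

def is_expansion_move(f, f_new, alpha):
    """True if every pixel either keeps its label or switches to alpha."""
    return all(new == old or new == alpha for old, new in zip(f, f_new))

f = [0, 1, 2, 1]
print(is_swap_move(f, [1, 0, 2, 1], 0, 1))    # True
print(is_swap_move(f, [1, 0, 0, 1], 0, 1))    # False: a pixel labeled 2 changed
print(is_expansion_move(f, [2, 1, 2, 2], 2))  # True
print(is_expansion_move(f, [0, 2, 2, 0], 2))  # False: a pixel moved from 1 to 0
```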
Example
Figure: (a) Current partition (b) local move (c) α-β swap (d) α-expansion.
Algorithms
Finding the optimal swap move
Given an input labeling f (partition P) and a pair of labels α, β, we want to find a labeling f̂ that minimizes E over all labelings within one α-β swap of f.
This is done by computing a labeling corresponding to a minimum cut on a graph G_αβ = (V_αβ, E_αβ).
The structure of this graph is dynamically determined by the current partition P and by the labels α, β.
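To make the search space explicit, the sketch below finds the optimal swap move by exhaustive enumeration. This is a stand-in for the minimum cut, not the graph-cut algorithm itself: it is exponential in the number of pixels labeled α or β and is only usable on toy problems. All names and data are hypothetical.

```python
from itertools import product

def energy(f, unary, V, neighbors):
    return (sum(unary[p][label] for p, label in enumerate(f))
            + sum(V(f[p], f[q]) for p, q in neighbors))

def best_swap_move(f, alpha, beta, unary, V, neighbors):
    """Search all labelings within one alpha-beta swap of f (brute force)."""
    movable = [p for p, label in enumerate(f) if label in (alpha, beta)]
    best, best_energy = list(f), energy(f, unary, V, neighbors)
    for choice in product((alpha, beta), repeat=len(movable)):
        candidate = list(f)
        for p, label in zip(movable, choice):
            candidate[p] = label
        e = energy(candidate, unary, V, neighbors)
        if e < best_energy:
            best, best_energy = candidate, e
    return best, best_energy

# Hypothetical 4-pixel chain with 3 labels and a Potts smoothness term.
unary = [[0, 2, 5], [1, 0, 4], [3, 1, 0], [4, 2, 0]]
neighbors = [(0, 1), (1, 2), (2, 3)]
potts = lambda a, b: 0 if a == b else 2

f = [1, 0, 2, 2]
print(energy(f, unary, potts, neighbors))                # 7
print(best_swap_move(f, 0, 1, unary, potts, neighbors))  # ([0, 0, 2, 2], 3)
```

Pixels 2 and 3 carry label 2 and therefore stay fixed; only pixels 0 and 1 take part in the move.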
Graph Construction
The set of vertices includes the two terminals α and β, as well as the image pixels p in the sets P_α and P_β (i.e., f_p ∈ {α, β}); write P_αβ = P_α ∪ P_β.
Each pixel p ∈ P_αβ is connected to the terminals α and β by edges called t-links.
Each pair of neighboring pixels p, q ∈ P_αβ is connected by an edge e_{p,q}.
Computing the Cut
Any cut must sever exactly one of the two t-links of each pixel p ∈ P_αβ, leaving the other one uncut.
This defines a labeling.
There is a one-to-one correspondence between cuts and labelings within one α-β swap of f.
The cost of the cut equals the energy of the labeling, up to an additive constant.
See Boykov et al., "Fast Approximate Energy Minimization via Graph Cuts", PAMI 2001.
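The correspondence between cut cost and energy can be checked on a toy problem. The edge weights used below are not given on the slide; they follow the construction in the Boykov et al. paper cited above (each t-link carries the data cost plus the smoothness cost to neighbors outside the swap, each edge between swapped neighbors carries V(α, β)). Function names and data are hypothetical, and V is assumed to be a semi-metric.

```python
from itertools import product

def energy(f, unary, V, neighbors):
    return (sum(unary[p][label] for p, label in enumerate(f))
            + sum(V(f[p], f[q]) for p, q in neighbors))

def swap_graph_weights(f, alpha, beta, unary, V, neighbors):
    """t-link and neighbor-edge weights of the swap graph (Boykov et al. weights)."""
    in_ab = {p for p, label in enumerate(f) if label in (alpha, beta)}
    t_alpha = {p: unary[p][alpha] for p in in_ab}
    t_beta = {p: unary[p][beta] for p in in_ab}
    n_links = {}
    for p, q in neighbors:
        if p in in_ab and q in in_ab:
            n_links[(p, q)] = V(alpha, beta)
        elif p in in_ab:                 # q keeps its label f[q]
            t_alpha[p] += V(alpha, f[q])
            t_beta[p] += V(beta, f[q])
        elif q in in_ab:                 # p keeps its label f[p]
            t_alpha[q] += V(alpha, f[p])
            t_beta[q] += V(beta, f[p])
    return t_alpha, t_beta, n_links

def cut_cost(labels_ab, t_alpha, t_beta, n_links, alpha):
    """Cost of the cut that gives pixel p the label labels_ab[p].

    A pixel labeled alpha pays its severed t-link to terminal alpha,
    otherwise its t-link to terminal beta; a neighbor edge is severed
    when its endpoints receive different labels.
    """
    cost = sum(t_alpha[p] if label == alpha else t_beta[p]
               for p, label in labels_ab.items())
    return cost + sum(w for (p, q), w in n_links.items()
                      if labels_ab[p] != labels_ab[q])

# Hypothetical 3-pixel chain, 3 labels, Potts smoothness term.
unary = [[0, 2, 5], [1, 0, 4], [3, 1, 2]]
neighbors = [(0, 1), (1, 2)]
potts = lambda a, b: 0 if a == b else 2
f, alpha, beta = [1, 0, 2], 0, 1

t_alpha, t_beta, n_links = swap_graph_weights(f, alpha, beta, unary, potts, neighbors)
movable = sorted(t_alpha)
constants = set()
for choice in product((alpha, beta), repeat=len(movable)):
    labels_ab = dict(zip(movable, choice))
    g = list(f)
    for p, label in labels_ab.items():
        g[p] = label
    constants.add(energy(g, unary, potts, neighbors)
                  - cut_cost(labels_ab, t_alpha, t_beta, n_links, alpha))
print(constants)  # {2}
```

Every swap labeling differs from its cut cost by the same amount, here the data cost of pixel 2, which does not take part in the swap. Minimizing the cut cost therefore minimizes the energy over all swap moves.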
Properties
For any cut, then