
Page 1: Bayesian Belief Propagation and Image Interpretation

Bayesian Belief Propagation and Image Interpretation

Presenter: David Rosenberg

March 13, 2002

Page 2: Bayesian Belief Propagation and Image Interpretation

Overview

• Deals with problems in which we want to estimate local scene properties that may depend, to some extent, on global properties

• The paper demonstrates that Bayesian Belief Propagation (BBP) is a very good technique for this class of problems
– In the paper’s examples, the answers are often significantly better, and converge significantly faster, than with other techniques.

Page 3: Bayesian Belief Propagation and Image Interpretation

An Introductory Problem: Interpolation

• Find a sequence of consecutive segments that
– approximates our data points and
– has small derivatives for each segment.


Page 4: Bayesian Belief Propagation and Image Interpretation

Interpolation Problem (continued)

• We can formalize this problem as minimizing the following cost functional:

    J(Y) = Σ_k (y_k − y*_k)² + λ Σ_k (y_{k+1} − y_k)²

where Y = (y_1, …, y_n, y*_1, …, y*_n), the y*_k are the observed values, and λ weights the smoothness (small-derivative) term.

• Standard solutions to minimization problems:
– Gradient descent / relaxation (sketched in code below)
  • Gauss-Seidel relaxation
  • Successive over-relaxation (SOR)
– Simulated annealing

NOTE: J(Y) is a sum of terms, each containing “neighboring” variables
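As an aside (not from the slides), here is a minimal Python sketch of the cost functional above and a plain gradient-descent minimizer; the smoothness weight lam and the step size are assumed parameters.

    import numpy as np

    def cost(y, y_obs, lam=1.0):
        # J(Y) = sum_k (y_k - y*_k)^2 + lam * sum_k (y_{k+1} - y_k)^2
        return np.sum((y - y_obs) ** 2) + lam * np.sum(np.diff(y) ** 2)

    def gradient_descent(y_obs, lam=1.0, step=0.05, iters=2000):
        y = y_obs.astype(float)              # start at the observations
        for _ in range(iters):
            grad = 2.0 * (y - y_obs)         # gradient of the data term
            d = np.diff(y)
            grad[:-1] -= 2.0 * lam * d       # gradient of the smoothness term
            grad[1:]  += 2.0 * lam * d
            y -= step * grad
        return y

    y_obs = np.array([1.0, 3.0, 2.0, 5.0, 4.0])
    y_hat = gradient_descent(y_obs, lam=1.0)
    print(cost(y_hat, y_obs), y_hat)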

Page 5: Bayesian Belief Propagation and Image Interpretation

The Core Idea

• We can rewrite certain cost functional minimization problems as MAP estimate problems for Markov Random Fields

• This is important because Bayesian Belief Propagation gives optimal solutions very quickly for MRFs with certain graph structures (chains and trees)

Page 6: Bayesian Belief Propagation and Image Interpretation

Mapping Cost Minimization to MAP

• Suppose our cost functional has the form

    J(Y) = Σ_C J_C(Y_C)

  where each Y_C is a subset of the variables in Y.

• Note that minimizing J(Y) is equivalent to maximizing exp( - J(Y) ).

• Then we can also find the Y that maximizes

    exp(−J(Y)) = exp(−Σ_C J_C(Y_C)) = Π_C exp(−J_C(Y_C))

Already looks like a product of localized potentials.

Page 7: Bayesian Belief Propagation and Image Interpretation

Mapping Cost Minimization to MAP (continued)

• By constraining J to be a sum, we’ve reduced our problem to the maximization of

    Π_C exp(−J_C(Y_C)) = exp(−Σ_C J_C(Y_C))

• Since this function is strictly positive, we can normalize it to create a PDF:

    P(Y) = (1/Z) exp(−Σ_C J_C(Y_C))

• (This could be a Gibbs distribution!)
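A small illustration (my own, not from the slides) of this normalization for discrete variables: enumerate the joint assignments, exponentiate the negated costs, and divide by the partition function Z.

    import numpy as np
    from itertools import product

    def gibbs_from_costs(cost_fn, domains):
        # P(Y) = (1/Z) * exp(-J(Y)), enumerated over all joint assignments
        states = list(product(*domains))
        unnorm = np.array([np.exp(-cost_fn(s)) for s in states])
        return states, unnorm / unnorm.sum()   # divide by Z

    # toy cost: two variables that prefer to agree
    J = lambda s: (s[0] - s[1]) ** 2
    states, probs = gibbs_from_costs(J, [range(3), range(3)])
    print(max(zip(probs, states)))             # the mode, i.e. a MAP assignment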

Page 8: Bayesian Belief Propagation and Image Interpretation

Mapping Cost Minimization to MAP (continued)

• So finding the y’s that minimize J(Y), subject to the observations that constrain some of the y*’s, is equivalent to finding the mode (peak) of the distribution P(Y | Y*).

• This is just the MAP estimate of Y given Y*.

Page 9: Bayesian Belief Propagation and Image Interpretation

Cost Minimization to MAP on MRF (continued)

• We have

    P(Y) = (1/Z) exp(−Σ_C J_C(Y_C))

• If we can associate each r.v. in Y with a node of a graph G
– such that each of the Y_C’s is a clique in G,
– then P(Y) is a Gibbs distribution w.r.t. G.

• If P(Y) is a Gibbs distribution w.r.t. a graph G,
– then the r.v.’s Y are a Markov random field (MRF)
– (Hammersley-Clifford Theorem)

Page 10: Bayesian Belief Propagation and Image Interpretation

MAP on MRF to Cost Function Minimization

• Start with the MAP problem on an MRF.
• Every MRF has a Gibbs distribution,
– also by the Hammersley-Clifford theorem.

• By reversing our steps, we will find a cost function J(Y) whose minimization corresponds to the MAP estimate on the MRF.

• Thus any problem we can solve by finding the MAP estimate on an MRF, we can also solve by minimizing some cost functional.
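For concreteness (my reading of “reversing our steps”, not spelled out on the slide): given clique potentials V_C, one such cost functional is

    J(Y) = −Σ_C ln V_C(Y_C)

since exp(−J(Y)) is then the product of the potentials, so minimizing J(Y) is the same as maximizing P(Y) (the normalization constant Z does not affect the argmax).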

Page 11: Bayesian Belief Propagation and Image Interpretation

Our Simplified Problem (from the paper)

• We have
– hidden “scene” variables X_j
– observed “image” variables Y_j

• We assume that the following graph structure is implicit in our cost functional:

• The Problem:
– Given some Y_j’s, estimate the X_j’s

[Figure: chain MRF — hidden nodes x1, x2, x3, each linked to its observation y1, y2, y3, with pairwise potentials Ψ(x1,y1), Ψ(x2,y2), Ψ(x3,y3), Ψ(x1,x2), Ψ(x2,x3)]

Page 12: Bayesian Belief Propagation and Image Interpretation

Straightforward Exact Inference

• Given the joint PDF
– typically specified using potential functions

• We can just marginalize to
– get the a posteriori distribution for each X_j

• We can immediately extract the
– MAP estimate -- just the mode of the a posteriori distribution
– Least-squares (MMSE) estimate -- just the expected value of the a posteriori distribution
(a brute-force sketch follows below)
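As a sketch of what this brute-force inference looks like (my own illustration, using the three-node chain of the next slides; the potential tables are random placeholders):

    import numpy as np
    from itertools import product

    K = 5                                                # each x_i takes one of K values
    rng = np.random.default_rng(0)
    psi_obs  = [rng.random(K) for _ in range(3)]         # Psi(x_i, y_i) with y_i fixed
    psi_pair = [rng.random((K, K)) for _ in range(2)]    # Psi(x1,x2), Psi(x2,x3)

    # unnormalized joint over (x1, x2, x3)
    joint = np.zeros((K, K, K))
    for x1, x2, x3 in product(range(K), repeat=3):
        joint[x1, x2, x3] = (psi_obs[0][x1] * psi_obs[1][x2] * psi_obs[2][x3]
                             * psi_pair[0][x1, x2] * psi_pair[1][x2, x3])
    joint /= joint.sum()                                 # normalize to a PDF

    post_x1 = joint.sum(axis=(1, 2))                     # a posteriori distribution of x1
    print(post_x1.argmax())                              # MAP estimate: the mode
    print((np.arange(K) * post_x1).sum())                # MMSE estimate: the posterior mean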

Page 13: Bayesian Belief Propagation and Image Interpretation

Derivation of belief propagation

    x1_MMSE = mean_{x1} Σ_{x2} Σ_{x3} P(x1, x2, x3, y1, y2, y3)

(here mean_{x1} denotes the posterior mean over x1, i.e. Σ_{x1} x1 · (...), up to normalization)

[Figure: chain MRF — x1, x2, x3 with observations y1, y2, y3 and potentials Ψ(x1,y1), Ψ(x2,y2), Ψ(x3,y3), Ψ(x1,x2), Ψ(x2,x3)]

Page 14: Bayesian Belief Propagation and Image Interpretation

The posterior factorizes

    x1_MMSE = mean_{x1} Σ_{x2} Σ_{x3} P(x1, x2, x3, y1, y2, y3)

            = mean_{x1} Σ_{x2} Σ_{x3} Ψ(x1,y1) Ψ(x2,y2) Ψ(x3,y3) Ψ(x1,x2) Ψ(x2,x3)

            = mean_{x1} Ψ(x1,y1) Σ_{x2} Ψ(x1,x2) Ψ(x2,y2) Σ_{x3} Ψ(x2,x3) Ψ(x3,y3)

[Figure: chain MRF — x1, x2, x3 with observations y1, y2, y3 and potentials Ψ(xi,yi), Ψ(xi,xi+1)]
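A quick numerical check (mine, not from the slides) that pushing the sums inward as above gives the same answer as brute-force summation; the potential tables are random placeholders:

    import numpy as np

    K = 5
    rng = np.random.default_rng(0)
    psi_obs  = [rng.random(K) for _ in range(3)]          # Psi(x_i, y_i), y_i fixed
    psi_pair = [rng.random((K, K)) for _ in range(2)]     # Psi(x1,x2), Psi(x2,x3)

    # brute force: sum the full joint over x2 and x3
    joint = (psi_obs[0][:, None, None] * psi_obs[1][None, :, None] * psi_obs[2][None, None, :]
             * psi_pair[0][:, :, None] * psi_pair[1][None, :, :])
    brute = joint.sum(axis=(1, 2))

    # factorized: sum over x3 first, then x2, exactly as in the last line above
    m3 = psi_pair[1] @ psi_obs[2]             # sum_x3 Psi(x2,x3) Psi(x3,y3)
    m2 = psi_pair[0] @ (psi_obs[1] * m3)      # sum_x2 Psi(x1,x2) Psi(x2,y2) (...)
    factored = psi_obs[0] * m2

    assert np.allclose(brute, factored)       # same unnormalized marginal for x1
    post = factored / factored.sum()
    print((np.arange(K) * post).sum())        # the MMSE estimate for x1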

Page 15: Bayesian Belief Propagation and Image Interpretation

Propagation rules

[Figure: chain MRF — x1, x2, x3 with observations y1, y2, y3]

    x1_MMSE = mean_{x1} Σ_{x2} Σ_{x3} P(x1, x2, x3, y1, y2, y3)

            = mean_{x1} Σ_{x2} Σ_{x3} Ψ(x1,y1) Ψ(x2,y2) Ψ(x3,y3) Ψ(x1,x2) Ψ(x2,x3)

            = mean_{x1} Ψ(x1,y1) Σ_{x2} Ψ(x1,x2) Ψ(x2,y2) Σ_{x3} Ψ(x2,x3) Ψ(x3,y3)

Page 16: Bayesian Belief Propagation and Image Interpretation

Propagation rules

[Figure: chain MRF — x1, x2, x3 with observations y1, y2, y3]

    x1_MMSE = mean_{x1} Ψ(x1,y1) Σ_{x2} Ψ(x1,x2) Ψ(x2,y2) Σ_{x3} Ψ(x2,x3) Ψ(x3,y3)

Reading off the inner sums as messages:

    M3(x2) = Σ_{x3} Ψ(x2,x3) Ψ(x3,y3)            (message from node 3 to node 2)
    M2(x1) = Σ_{x2} Ψ(x1,x2) Ψ(x2,y2) M3(x2)     (message from node 2 to node 1)

Page 17: Bayesian Belief Propagation and Image Interpretation

Belief and message updates

• Message passed to node i from node j:

    M_ij(x_i) = Σ_{x_j} Ψ_ij(x_i, x_j) Π_{k ∈ N(j)\i} M_kj(x_j)

• Belief at node j:

    b_j(x_j) ∝ Π_{k ∈ N(j)} M_kj(x_j)

  where N(j) is the set of neighbors of node j and N(j)\i excludes node i.
– (In our chain, the observation y_j is just another neighbor of x_j, so the evidence potential Ψ(x_j, y_j) enters as one of the incoming messages.)
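A compact sketch (my own, not the paper’s code) of these updates for the three-node chain, with the evidence potentials folded in as fixed vectors; the particular evidence vectors and the 10:1 pairwise potential are illustrative assumptions:

    import numpy as np

    def bp_chain(evid, pair, iters=10):
        # evid[j]: vector Psi(x_j, y_j) with y_j fixed; pair[j]: matrix Psi(x_j, x_{j+1})
        n, K = len(evid), len(evid[0])
        # msgs[(j, i)] is the message M_{j->i}(x_i), initialized uniformly
        msgs = {(j, i): np.ones(K) for j in range(n) for i in (j - 1, j + 1) if 0 <= i < n}
        for _ in range(iters):
            new = {}
            for (j, i) in msgs:
                # evidence at j times messages into j from all neighbors except i
                prod = evid[j].copy()
                for k in (j - 1, j + 1):
                    if 0 <= k < n and k != i:
                        prod = prod * msgs[(k, j)]
                psi = pair[min(i, j)]                  # potential on edge {i, j}
                psi = psi if i < j else psi.T          # orient rows to index x_i
                m = psi @ prod                         # sum over x_j
                new[(j, i)] = m / m.sum()              # normalize for stability
            msgs = new
        # belief at each node: evidence times all incoming messages, normalized
        beliefs = []
        for j in range(n):
            b = evid[j].copy()
            for k in (j - 1, j + 1):
                if 0 <= k < n:
                    b = b * msgs[(k, j)]
            beliefs.append(b / b.sum())
        return beliefs

    K = 5
    evid = [np.array([25., 16, 9, 4, 1]),     # e.g. "observed 1" prior
            np.array([9., 16, 25, 16, 9]),    # "observed 3" prior
            np.array([1., 4, 9, 16, 25])]     # "observed 5" prior
    pair = [np.eye(K) * 9 + 1] * 2            # tightness 10:1 pairwise potentials
    for b in bp_chain(evid, pair):
        print(np.round(b, 3))

On a chain the converged beliefs are the exact posterior marginals, so taking each node’s posterior mode (MAP) or mean (MMSE) reproduces the estimates discussed above.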

Page 18: Bayesian Belief Propagation and Image Interpretation

Optimal solution in a chain or tree: Belief Propagation

• “Do the right thing” Bayesian algorithm.
• For Gaussian random variables over time: the Kalman filter.
• For hidden Markov models: the forward/backward algorithm (and the MAP variant is Viterbi).

Page 19: Bayesian Belief Propagation and Image Interpretation

No factorization with loops!

[Figure: MRF with a loop — x1, x2, x3 pairwise connected, including an extra edge with potential Ψ(x1,x3), each xi linked to its observation yi]

With the extra potential Ψ(x1,x3), the joint no longer factorizes along the chain:

    x1_MMSE = mean_{x1} Σ_{x2} Σ_{x3} Ψ(x1,y1) Ψ(x2,y2) Ψ(x3,y3) Ψ(x1,x2) Ψ(x2,x3) Ψ(x1,x3)

and the sums over x2 and x3 can no longer be pushed inside as before.

Page 20: Bayesian Belief Propagation and Image Interpretation

The (Discrete) Interpolation Problem

• Used the integers {1,…,5} as the domain and range.

• Used evidence:

[Figure: plot of the evidence points on axes running from 0 to 6]

Page 21: Bayesian Belief Propagation and Image Interpretation

The (Discrete) Interpolation Problem

• How do we put the evidence into the MRF?
– As a prior on the random variables.
– Comes from the noise or sensor model.

• I tried two priors:
– 1. (example prior)
  • Observed 1 --> Prior: 25 16 9 4 1
  • Observed 3 --> Prior: 9 16 25 16 9
– 2. (example prior; a more sharply peaked version of the first)
  • Observed 1 --> Prior: 625 256 81 16 1

[Figure: chain MRF — x1, x2, x3 with observations y1, y2, y3]

Page 22: Bayesian Belief Propagation and Image Interpretation

The (Discrete) Interpolation Problem

• How do we specify the derivative constraint?
– We adjust the potential functions between adjacent random variables.
– We want potential functions that look something like:

    10  1  1  1  1
     1 10  1  1  1
     1  1 10  1  1
     1  1  1 10  1
     1  1  1  1 10

• I call the ratio “10:1” the tightness.

[Figure: chain MRF — x1, x2, x3 with observations y1, y2, y3]
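A small sketch (mine) of how this potential matrix and the evidence priors from the previous slide might be constructed; the exponent used for the priors is my reading of the listed numbers:

    import numpy as np

    def tightness_potential(K=5, tightness=10.0):
        # K x K matrix with `tightness` on the diagonal and 1 everywhere else
        return np.ones((K, K)) + (tightness - 1.0) * np.eye(K)

    def evidence_prior(observed, K=5, power=2):
        # domain {1..K}; e.g. observed=1, power=2 gives 25 16 9 4 1
        vals = np.arange(1, K + 1)
        return (K - np.abs(vals - observed)) ** power

    print(tightness_potential(tightness=10))
    print(evidence_prior(1), evidence_prior(3))     # first prior
    print(evidence_prior(1, power=4))               # second prior: 625 256 81 16 1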

Page 23: Bayesian Belief Propagation and Image Interpretation

Results for First Prior

[Figure: grids of interpolation results for the first prior, for tightness = 2, 4, and 6]

Page 24: Bayesian Belief Propagation and Image Interpretation

Results for Second Prior

[Figure: grids of interpolation results for the second prior, for tightness = 2, 4, and 6]

Page 25: Bayesian Belief Propagation and Image Interpretation

Weiss’s Examples

• Interior/exterior example
• Motion example
• In both examples, BBP had results that were much better, and converged much faster, than other techniques.

Page 26: Bayesian Belief Propagation and Image Interpretation

Conclusions: When to use BBP?

• Among all problems expressible as cost function minimization.

• Among problems expressible as MAP or MMSE problems on MRFs
– Graph topology should be relatively sparse.
  • The number of messages per iteration increases linearly with the number of edges.
– Reasonably small number of dimensions for the r.v. distributions.

• Approximate Inference

Page 27: Bayesian Belief Propagation and Image Interpretation

EXTRA SLIDES

Page 28: Bayesian Belief Propagation and Image Interpretation

Slide on Weiss’s Motion Detection

Page 29: Bayesian Belief Propagation and Image Interpretation

Mention some approximate inference approaches

Page 30: Bayesian Belief Propagation and Image Interpretation

Complexity issues with message passing

• How long are the messages?
• How many messages do we have to pass per iteration?
• How many iterations until convergence?
• The problem quickly becomes intractable (a rough accounting follows below).
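A rough accounting (my own, not from the slides): for pairwise potentials over discrete variables with k states, each message is a length-k vector, one parallel iteration passes 2|E| messages (two per edge), and each message update is essentially a k x k matrix-vector product, so one iteration costs on the order of |E| * k^2 operations. On a chain or tree roughly diameter-many iterations suffice; on loopy graphs neither the iteration count nor convergence itself is guaranteed.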

Page 31: Bayesian Belief Propagation and Image Interpretation

Slides on message passing with jointly Gaussian distributions???

Page 32: Bayesian Belief Propagation and Image Interpretation

BACKUP SLIDES

Page 33: Bayesian Belief Propagation and Image Interpretation

Markov Random Fields

• Let G be an undirected graph
– nodes: {1, …, n}

• Associate a random variable X_t to each node t in G.

• (X_1, …, X_n) is a Markov random field on G if
– every r.v. is independent of its non-neighbors conditioned on its neighbors:
– P(X_t = x_t | X_s = x_s for all s ≠ t) = P(X_t = x_t | X_s = x_s for all s ∈ N(t)),
  where N(t) is the set of neighbors of node t.

Page 34: Bayesian Belief Propagation and Image Interpretation

Specifying a Markov Random Field

• It would be nice if we could just specify P(X | N(X)) for all r.v.’s X (as with Bayesian networks).

• Unfortunately, this would overspecify the joint PDF.
– E.g., two connected binary nodes X_1 -- X_2:
  • the joint PDF has 3 degrees of freedom
  • the conditional PDFs X_1|X_2 and X_2|X_1 have 2 degrees of freedom each

• The Hammersley-Clifford Theorem helps to specify MRFs

Page 35: Bayesian Belief Propagation and Image Interpretation

The Gibbs Distribution

• A Gibbs distribution w.r.t. a graph G is a probability mass function that can be expressed in the form
– P(x_1, …, x_n) = Π_{cliques C} V_C(x_1, …, x_n)
– where V_C(x_1, …, x_n) depends only on those x_i in C.

• We can combine potential functions into products over maximal cliques, so
– P(x_1, …, x_n) = Π_{maximal cliques C} V_C(x_1, …, x_n)
– This may be better in certain circumstances because we don’t have to specify as many potential functions.

Page 36: Bayesian Belief Propagation and Image Interpretation

Hammersley Clifford Theorem

• Let the r.v.’s {X_j} have a positive joint probability mass function.

• Then the Hammersley-Clifford Theorem says that {X_j} is a Markov random field on graph G iff it has a Gibbs distribution w.r.t. G.
– Side Note: Hammersley and Clifford discovered this theorem in 1971, but they didn’t publish it because they kept thinking they should be able to remove or relax the positivity assumption. They couldn’t. Clifford published the result in 1990.

• Specifying the potential functions is equivalent to specifying the joint probability distribution of all variables.

• Now it’s easy to specify a valid MRF
– still not easy to determine the degrees of freedom in the distribution (normalization)

Page 37: Bayesian Belief Propagation and Image Interpretation
Page 38: Bayesian Belief Propagation and Image Interpretation

Incorporating Evidence nodes into MRFs

• We would like to have nodes that don’t change their beliefs -- they are just observations.

• Can we do this via the potential functions on the non-maximal clique containing just that node?

• I think this is what they do in the Yair Weiss implementation.

• What if we don’t want to specify a potential function? Make it identically one, since it’s in a product.

Page 39: Bayesian Belief Propagation and Image Interpretation

From cost functional to transition matrix

Page 40: Bayesian Belief Propagation and Image Interpretation

From cost functional to update rule

Page 41: Bayesian Belief Propagation and Image Interpretation

From update rule to transition matrix

Page 42: Bayesian Belief Propagation and Image Interpretation

The factorization into pairwise potentials -- good for general Markov networks

Page 43: Bayesian Belief Propagation and Image Interpretation

Other Stuff

• For shorthand, we will write x = (x_1, …, x_n).