
Chapter 8-3: Markov Random Fields

Topics

1. Introduction
   1. Undirected Graphical Models
   2. Terminology
2. Conditional Independence
3. Factorization Properties
   1. Maximal Cliques
   2. Hammersley-Clifford Theorem
4. Potential Function and Energy Function
5. Image de-noising example
6. Relation to Directed Graphs
   1. Converting directed graphs to undirected
      1. Moralization
   2. D-map, I-map and Perfect map


1. Introduction

• Directed graphical models specify
  – a factorization of the joint distribution over a set of variables
  – a set of conditional independence properties that must be satisfied by any distribution that factorizes according to the graph

• An MRF is an undirected graphical model that likewise specifies
  – a factorization
  – a set of conditional independence relations


Markov Random Field Terminology

• Also known as a Markov network or undirected graphical model

• Set of nodes corresponding to variables or groups of variables

• Set of links connecting pairs of nodes

• Links are undirected (do not carry arrows)

• Conditional independence is an important concept


2. Conditional Independence

• In directed graphs, conditional independence is tested by d-separation
  – i.e., whether all paths between two sets of nodes are blocked
  – the definition of "blocked" is subtle due to the presence of head-to-head nodes

• In MRFs the asymmetry between parent and child nodes is removed
  – the subtleties associated with head-to-head nodes no longer arise


Conditional Independence Test

• Identify three sets of nodes A, B and C

• To test the conditional independence property A ⊥ B | C
  – consider all possible paths from nodes in set A to nodes in set B
  – if all such paths pass through one or more nodes in C, then every path is blocked and the conditional independence property holds

• If there is a path that is unblocked
  – the property may not necessarily hold
  – more precisely, there will be at least some distribution for which conditional independence does not hold

• A sketch of this test as code follows below
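Since the test above is just reachability in the graph after deleting the nodes in C, it is easy to express in code. Below is a minimal sketch in Python; the adjacency-set representation, the name is_separated, and the three-node example graph are illustrative assumptions, not part of the slides.

    # Minimal sketch of the graph-separation test: A is conditionally
    # independent of B given C iff every path from A to B passes through C.
    # Plain adjacency sets; no external libraries needed.

    def is_separated(adj, A, B, C):
        """Return True if all paths from A to B are blocked by C."""
        visited = set(A) - set(C)      # start the search from A (minus observed nodes)
        frontier = list(visited)
        while frontier:
            node = frontier.pop()
            for nbr in adj[node]:
                if nbr in C or nbr in visited:
                    continue           # path blocked by an observed node, or already seen
                if nbr in B:
                    return False       # found an unblocked path into B
                visited.add(nbr)
                frontier.append(nbr)
        return True

    # Hypothetical chain a - c - b: conditioning on c blocks the only path.
    adj = {"a": {"c"}, "b": {"c"}, "c": {"a", "b"}}
    print(is_separated(adj, {"a"}, {"b"}, {"c"}))   # True:  a and b separated given c
    print(is_separated(adj, {"a"}, {"b"}, set()))   # False: unblocked path a - c - b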


Conditional Independence

• A ⊥ B | C holds when every path from any node in A to any node in B passes through C

• There is no explaining away
  – so testing for independence is simpler than in directed graphs

• Alternative view
  – remove all nodes in set C, together with all their connecting links, from the graph
  – if no paths from A to B remain, then conditional independence holds


Markov Blanket for Undirected Graph

• The Markov blanket takes a simple form for MRFs

• A node is conditionally independent of all other nodes given its neighboring nodes:

  p(xi | x\{i}) = p(xi | ne(i))

  – where ne(i) denotes the set of neighbors of node xi

• For conditional independence to hold between two non-neighboring nodes xi and xj
  – the factorization must be such that xi and xj do not appear in the same factor
  – this leads to the graph concept of a clique

3. Factorization Properties

• We seek a factorization rule corresponding to the conditional independence test described earlier
• A notion of locality is needed

• Consider two nodes xi and xj not connected by a link
  – they are conditionally independent given all other nodes in the graph
    • because there is no direct path between them, and
    • all other paths pass through nodes that are observed, and hence those paths are blocked
  – expressed as

    p(xi, xj | x\{i, j}) = p(xi | x\{i, j}) p(xj | x\{i, j})

    where x\{i, j} denotes the set x of all variables with xi and xj removed

• A numerical check of this property appears after the figure below

[Figure: two nodes xi and xj in an undirected graph with no direct link between them]
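To make the factorization argument concrete, the following sketch verifies p(xi, xj | x\{i, j}) = p(xi | x\{i, j}) p(xj | x\{i, j}) numerically for the two non-adjacent ends of a three-node chain; the pairwise potential values are made-up assumptions.

    # Check that the non-adjacent ends x0 and x2 of the chain x0 - x1 - x2
    # are conditionally independent given the middle node x1 (here x\{0,2} = {x1}).

    import itertools
    import numpy as np

    def psi(a, b):                        # hypothetical pairwise potential
        return 2.0 if a == b else 1.0

    def joint(x0, x1, x2):                # unnormalized p(x0, x1, x2)
        return psi(x0, x1) * psi(x1, x2)

    vals = [0, 1]
    Z = sum(joint(*x) for x in itertools.product(vals, repeat=3))

    for x1 in vals:                       # condition on the separating node x1
        p1 = sum(joint(a, x1, b) for a in vals for b in vals) / Z   # p(x1)
        for x0, x2 in itertools.product(vals, repeat=2):
            p_both = joint(x0, x1, x2) / Z / p1                     # p(x0, x2 | x1)
            p_x0 = sum(joint(x0, x1, b) for b in vals) / Z / p1     # p(x0 | x1)
            p_x2 = sum(joint(a, x1, x2) for a in vals) / Z / p1     # p(x2 | x1)
            assert np.isclose(p_both, p_x0 * p_x2)
    print("conditional independence verified")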

Clique in a graph

• A clique is a subset of nodes in a graph such that there exists a link between all pairs of nodes in the subset
  – the set of nodes in a clique is fully connected

• Maximal clique
  – a clique such that it is not possible to include any other node of the graph in the set without it ceasing to be a clique

[Figure: a four-node graph containing five cliques of two nodes and two maximal cliques]


Factors as Cliques

• Define the factors in the joint distribution as functions of the maximal cliques
• The set of variables in clique C is denoted xC

• The joint distribution is written as a product of potential functions ψC(xC) over the maximal cliques:

  p(x) = (1/Z) ∏C ψC(xC)

• where Z, called the partition function, is a normalization constant:

  Z = ∑x ∏C ψC(xC)

• A small sketch of this factorization in code follows
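The following is a minimal sketch of this factorization for three binary variables on a chain; the potential function and the clique list are made-up assumptions, chosen only to show the roles of ψC and Z.

    # Joint distribution as a normalized product of clique potentials:
    # p(x) = (1/Z) * prod_C psi_C(x_C), with Z summing over all joint states.

    import itertools
    import numpy as np

    def psi(xa, xb):
        # Hypothetical pairwise potential favoring equal neighbors.
        return np.exp(1.0 if xa == xb else -1.0)

    cliques = [(0, 1), (1, 2)]            # maximal cliques of the chain x0 - x1 - x2

    def unnormalized(x):
        p = 1.0
        for a, b in cliques:
            p *= psi(x[a], x[b])
        return p

    # Partition function Z: sum of the unnormalized product over all states.
    states = list(itertools.product([-1, +1], repeat=3))
    Z = sum(unnormalized(x) for x in states)

    print(unnormalized((+1, +1, -1)) / Z)            # probability of one state
    print(sum(unnormalized(x) / Z for x in states))  # sanity check: sums to 1.0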


Graphical Model as Filter

• UI is the set of distributions that are consistent with the set of conditional independence statements read from the graph using graph separation

• UF is the set of distributions that can be expressed as a factorization of the form p(x) = (1/Z) ∏C ψC(xC)

• The Hammersley-Clifford theorem states that UI and UF are identical


4. Potential Functions

• Potential functions ψC(xC) should be strictly positive

• It is convenient to express them as exponentials:

  ψC(xC) = exp(−E(xC))

  where E(xC) is called an energy function

• This exponential representation is called the Boltzmann distribution

• The total energy is obtained by adding the energies of the maximal cliques, so that

  p(x) = (1/Z) exp(−∑C E(xC))
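As a tiny illustration of the Boltzmann form ψC(xC) = exp(−E(xC)), the sketch below converts a made-up table of clique energies into strictly positive potentials.

    import numpy as np

    # Hypothetical energy table for a clique of two binary variables:
    # low energy (0.0) when they agree, high energy (2.0) when they differ.
    E = np.array([[0.0, 2.0],
                  [2.0, 0.0]])

    psi = np.exp(-E)   # potentials are strictly positive by construction
    print(psi)         # agreeing states get weight 1.0, disagreeing ~0.135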


5. Illustration: Image de-noising

• Noise removal from a binary image

• Observed noisy image
  – binary pixel values yi ∈ {−1, +1}, i = 1, ..., D

• Unknown noise-free image
  – binary pixel values xi ∈ {−1, +1}, i = 1, ..., D

• The noisy image is assumed to be generated by randomly flipping the sign of pixels with some small probability


Markov Random Field Model

• Known properties
  – strong correlation between each noise-free pixel xi and its observed pixel yi
    • since the noise level is small
  – neighboring pixels xi and xj are strongly correlated
    • a property of images

• This prior knowledge is captured using an MRF
  – whose undirected graph links each latent pixel xi to its observed pixel yi and to its neighboring pixels

[Figure: lattice MRF with observed (input) nodes yi attached to latent (output) nodes xi]


Energy Functions

• The graph has two types of cliques, each with two variables

1. {xi, yi} expresses the correlation between these variables
   • choose the simple energy function −η xi yi
   • this gives lower energy (higher probability) when xi and yi have the same sign

2. {xi, xj} for neighboring pixels
   • choose −β xi xj
   • for the same reason


Potential Function

• The complete energy function of the model is

  E(x, y) = h ∑i xi − β ∑{i,j} xi xj − η ∑i xi yi

  – the β term runs over the cliques of all pairs of neighboring pixels in the entire image
  – the η term runs over the cliques of input and output pixels

• The h xi term biases towards pixel values that have one particular sign
  – e.g., more white than black
  – h = 0 means that the prior probabilities of the two states of xi are equal

• This defines a joint distribution over x and y given by

  p(x, y) = (1/Z) exp(−E(x, y))

• The smaller E(x, y), the larger p(x, y)


De-noising problem statement

• We fix y to the observed pixels in the noisy image

• p(x | y) is then a conditional distribution over all noise-free images
  – this model is called the Ising model in statistical physics

• We wish to find an image x that has a high probability, ideally the maximum probability


De-noising algorithm

• Coordinate-wise gradient ascent on p(x | y)
  – set xi = yi for all i
  – take one node xj at a time
    • evaluate the total energy for the states xj = +1 and xj = −1
    • keeping all other node variables fixed
  – set xj to the value for which the energy is lower
    • this is a local computation
    • which can be done efficiently
  – repeat for all pixels until a stopping criterion is met
  – nodes may be updated systematically, e.g. by raster scan, or randomly

• Finds a local maximum (which need not be global)
• The algorithm is called Iterated Conditional Modes (ICM); a sketch in code follows
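Here is a minimal sketch of ICM for the energy E(x, y) = h ∑ xi − β ∑ xi xj − η ∑ xi yi defined earlier; the 4-connected neighborhood, the raster-scan schedule, the stopping rule, and the tiny random test image are illustrative assumptions.

    import numpy as np

    def icm_denoise(y, beta=1.0, eta=2.1, h=0.0, n_sweeps=10):
        """Iterated Conditional Modes for the binary de-noising MRF."""
        x = y.copy()                       # initialize x_i = y_i
        H, W = x.shape
        for _ in range(n_sweeps):
            changed = False
            for i in range(H):             # raster scan over pixels
                for j in range(W):
                    nbr = 0.0              # sum over the 4-connected neighbors
                    if i > 0:     nbr += x[i - 1, j]
                    if i < H - 1: nbr += x[i + 1, j]
                    if j > 0:     nbr += x[i, j - 1]
                    if j < W - 1: nbr += x[i, j + 1]
                    # Local energy of setting this pixel to s is
                    # s * (h - beta*nbr - eta*y_ij); pick the sign minimizing it.
                    g = h - beta * nbr - eta * y[i, j]
                    best = -1 if g > 0 else +1
                    if best != x[i, j]:
                        x[i, j] = best
                        changed = True
            if not changed:                # stop after a full sweep with no flips
                break
        return x

    # Usage on a tiny image with ~10% of pixels flipped (hypothetical data).
    rng = np.random.default_rng(0)
    clean = np.ones((8, 8), dtype=int)
    noisy = np.where(rng.random((8, 8)) < 0.1, -clean, clean)
    print(icm_denoise(noisy))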

Image Restoration Results

• Parameters: β = 1.0, η = 2.1, h = 0

[Figure: noise-free image; noisy image in which 10% of the pixels are corrupted; result of ICM; global maximum obtained by the graph-cut algorithm]


Some Observations on the de-noising algorithm

• The de-noising algorithm given is an algorithm for finding the most likely x
  – with graphical models this is called an inference algorithm

• It was assumed that the parameters β and η are known
• Parameter values can instead be determined by another gradient descent algorithm that learns from ground-truthed noisy images
  – which can be set up by taking the gradient of E(x, y) with respect to the parameters
  – note that each pixel will have to be ground-truthed

• Note that β = 0 means that the links between neighboring pixels are removed
  – the most probable solution is then xi = yi for all i


6. Relation to Directed Graphs

• Directed and undirected graphs are two graphical frameworks for representing probability distributions

• Converting directed graphs to undirected graphs
  – plays an important role in exact inference techniques such as the junction-tree algorithm

• Converting undirected graphs to directed graphs is less important
  – it presents problems due to the normalization constraints


Converting to an Undirected Graph

• The joint distribution of a directed chain is

  p(x) = p(x1) p(x2 | x1) p(x3 | x2) ... p(xN | xN−1)

• In the equivalent undirected chain, the maximal cliques are pairs of neighboring nodes, so we wish to write the joint distribution as

  p(x) = (1/Z) ψ1,2(x1, x2) ψ2,3(x2, x3) ... ψN−1,N(xN−1, xN)

• This is done by identifying

  ψ1,2(x1, x2) = p(x1) p(x2 | x1)
  ψ2,3(x2, x3) = p(x3 | x2)
  ...
  ψN−1,N(xN−1, xN) = p(xN | xN−1)

  in which case Z = 1

[Figure: a simple directed chain x1 → x2 → ... → xN and the equivalent undirected chain]

• A numerical check of this identification appears below
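The identification above can be checked numerically; in the sketch below the probability tables for p(x1), p(x2 | x1) and p(x3 | x2) are made-up values, used only to confirm that the product of potentials reproduces the directed factorization with Z = 1.

    import itertools
    import numpy as np

    p1   = np.array([0.6, 0.4])                    # p(x1)
    p2g1 = np.array([[0.7, 0.3], [0.2, 0.8]])      # p(x2 | x1), rows indexed by x1
    p3g2 = np.array([[0.9, 0.1], [0.5, 0.5]])      # p(x3 | x2), rows indexed by x2

    psi12 = p1[:, None] * p2g1                     # psi_{1,2}(x1, x2) = p(x1) p(x2|x1)
    psi23 = p3g2                                   # psi_{2,3}(x2, x3) = p(x3|x2)

    for x1, x2, x3 in itertools.product(range(2), repeat=3):
        directed   = p1[x1] * p2g1[x1, x2] * p3g2[x2, x3]
        undirected = psi12[x1, x2] * psi23[x2, x3]  # no extra normalization: Z = 1
        assert np.isclose(directed, undirected)
    print("chain factorizations agree, Z = 1")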

Generalizing the Construction

• For nodes of the directed graph having just one parent
  – replace each directed link with an undirected link

• For nodes with more than one parent
  – conditional terms such as p(x4 | x1, x2, x3) must become cliques in the undirected graph
  – so add undirected links between all pairs of parents
  – this step is called moralization, and the result is the moral graph

[Figure: a simple directed graph with p(x) = p(x1) p(x2) p(x3) p(x4 | x1, x2, x3) and the equivalent moral graph, in which x1, x2 and x3 are linked]

• A sketch of moralization in code follows
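Below is a minimal sketch of moralization as code, using a dict of parent lists; the representation, the function name moralize, and the example graph are illustrative assumptions. It applies both rules above: keep every directed link (as an undirected one) and link all pairs of parents.

    from itertools import combinations

    def moralize(parents):
        """parents: dict mapping each node to the list of its parents.
        Returns the moral graph as a dict of undirected adjacency sets."""
        nodes = set(parents) | {p for ps in parents.values() for p in ps}
        adj = {n: set() for n in nodes}
        for child, ps in parents.items():
            for p in ps:                       # replace each directed link
                adj[child].add(p)
                adj[p].add(child)
            for a, b in combinations(ps, 2):   # "marry" all pairs of parents
                adj[a].add(b)
                adj[b].add(a)
        return adj

    # p(x) = p(x1) p(x2) p(x3) p(x4 | x1, x2, x3): x1, x2, x3 form a clique with x4.
    print(moralize({"x1": [], "x2": [], "x3": [], "x4": ["x1", "x2", "x3"]}))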


D-map, I-map and Perfect Map

• D (dependency) map of a distribution
  – every conditional independence statement satisfied by the distribution is reflected in the graph
  – a completely disconnected graph is a trivial D-map for any distribution

• I (independence) map of a distribution
  – every conditional independence statement implied by the graph is satisfied by the distribution
  – a fully connected graph is a trivial I-map for any distribution

• A perfect map is both an I-map and a D-map

[Figure: Venn diagram in which P is the set of all distributions, D is the set of distributions that can be represented as a perfect map using a directed graph, and U is the set of distributions that can be represented as a perfect map using an undirected graph]

• Example: a directed graph that is a perfect map, satisfying A ⊥ B | φ but not A ⊥ B | C
  – there is no undirected graph over the same 3 variables that is a perfect map

• Example: an undirected graph that is a perfect map, satisfying C ⊥ D | A ∪ B and A ⊥ B | C ∪ D but not A ⊥ B | φ
  – there is no directed graph over the same 4 variables that implies the same set of conditional independence properties