Chapter 8-3: Markov Random Fields
Topics
1. Introduction
   1. Undirected Graphical Models
   2. Terminology
2. Conditional Independence
3. Factorization Properties
   1. Maximal Cliques
   2. Hammersley-Clifford Theorem
4. Potential Function and Energy Function
5. Image de-noising example
6. Relation to Directed Graphs
   1. Converting directed graphs to undirected
      1. Moralization
   2. D-map, I-map and Perfect map
1. Introduction
• Directed graphical models specify
  – A factorization of the joint distribution over a set of variables
  – A set of conditional independence properties that must be satisfied by any distribution that factorizes according to the graph
• An MRF is an undirected graphical model that also specifies
  – A factorization
  – Conditional independence relations
Markov Random Field Terminology
• Also known as a Markov network or undirected graphical model
• A set of nodes corresponding to variables or groups of variables
• A set of links connecting pairs of nodes
• Links are undirected (do not carry arrows)
• Conditional independence is an important concept
2. Conditional Independence
• In directed graphs conditional independence is tested by d-separation
  – Whether two sets of nodes are blocked
  – The definition of blocked is subtle due to the presence of head-to-head nodes
• In MRFs the asymmetry between parent and child is removed
  – The subtleties with head-to-head nodes no longer arise
Conditional Independence Test
• Identify three sets of nodes A, B and C• To test conditional independence property
A B|C• Consider all possible paths from nodes in set A
to nodes in set B• If all such paths pass through one or more
nodes in C then path is blocked andindependence holds
• If there is a path that is unblocked– May not necessarily hold
– There will be at least some distribution for whichconditional independence does not hold
Conditional Independence
• Every path from any node in A to any node in B passes through C
• No explaining away
  – Testing for independence is simpler than in directed graphs
• Alternative view
  – Remove all nodes in set C together with all their connecting links
  – If no paths remain from A to B, then conditional independence holds
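The "alternative view" above translates directly into code: delete the nodes in C and test reachability. A minimal sketch, assuming the graph is given as an adjacency dict; the node names and the `separated` helper are our own, not a library API:

```python
from collections import deque

def separated(adj, A, B, C):
    """True if every path from A to B passes through C, i.e. removing
    the nodes in C disconnects A from B in the undirected graph."""
    blocked = set(C)
    start = set(A) - blocked
    seen, frontier = set(start), deque(start)
    while frontier:
        u = frontier.popleft()
        if u in B:
            return False              # found an unblocked path into B
        for v in adj[u]:              # breadth-first search, skipping C
            if v not in blocked and v not in seen:
                seen.add(v)
                frontier.append(v)
    return True

# Hypothetical 4-node chain: A - C - B - D
adj = {'A': ['C'], 'C': ['A', 'B'], 'B': ['C', 'D'], 'D': ['B']}
```

With this graph, conditioning on C blocks the only A–B path, so `separated(adj, {'A'}, {'B'}, {'C'})` holds, while with C unobserved the path A–C–B is unblocked.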
Markov Blanket for Undirected Graphs
• Takes a particularly simple form for MRFs
• A node is conditionally independent of all other nodes given its neighboring nodes:
  p(xi | x\{i}) = p(xi | ne(i))
  – where ne(i) denotes the set of neighbors of node i
• For conditional independence to hold
  – The factorization must be such that xi and xj do not appear in the same factor
  – This leads to the graph concept of a clique
3. Factorization Properties
• We seek a factorization rule corresponding to the conditional independence test described earlier
• A notion of locality is needed
• Consider two nodes xi and xj not connected by a link
  – They are conditionally independent given all other nodes in the graph
    • Because there is no direct path between them, and
    • All other paths pass through nodes that are observed, and hence those paths are blocked
  – Expressed as
    p(xi, xj | x\{i,j}) = p(xi | x\{i,j}) p(xj | x\{i,j})
    where x\{i,j} denotes the set x of all variables with xi and xj removed
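This factorization can be checked numerically on a toy chain MRF x1 — x2 — x3, where x1 and x3 share no link; the potential tables below are hypothetical values chosen only for illustration:

```python
import numpy as np

# Toy chain MRF x1 - x2 - x3 over binary variables: x1 and x3 are not
# linked, so p(x1, x3 | x2) should factorize into p(x1|x2) p(x3|x2).
psi12 = np.array([[2.0, 1.0], [1.0, 3.0]])     # psi(x1, x2), hypothetical
psi23 = np.array([[1.0, 4.0], [2.0, 1.0]])     # psi(x2, x3), hypothetical

# Joint p(x1, x2, x3) proportional to psi12[x1, x2] * psi23[x2, x3]
joint = np.einsum('ab,bc->abc', psi12, psi23)
joint /= joint.sum()                           # divide by partition function Z

for x2 in (0, 1):
    cond = joint[:, x2, :] / joint[:, x2, :].sum()   # p(x1, x3 | x2)
    marg1 = cond.sum(axis=1)                         # p(x1 | x2)
    marg3 = cond.sum(axis=0)                         # p(x3 | x2)
    assert np.allclose(cond, np.outer(marg1, marg3)) # factorizes exactly
```

The assertion passes for any strictly positive potential tables, since conditioning on x2 decouples the two factors.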
Cliques in a Graph
• A clique is a subset of nodes in a graph such that there exists a link between all pairs of nodes in the subset
  – The set of nodes in a clique is fully connected
• Maximal clique
  – It is not possible to include any other node of the graph in the set without it ceasing to be a clique
[Figure: a four-node graph with five cliques of two nodes and two maximal cliques]
Factors as Cliques
• Factors are functions of the maximal cliques
• The set of variables in clique C is denoted xC
• The joint distribution is written as a product of potential functions ψC(xC):
  p(x) = (1/Z) ∏C ψC(xC)
• Z, called the partition function, is a normalization constant:
  Z = ∑x ∏C ψC(xC)
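As a concrete illustration, Z can be computed by brute force for a small binary MRF; the clique structure and potential tables here are hypothetical:

```python
import itertools
import numpy as np

# Tiny binary MRF whose maximal cliques are the pairs {x0,x1} and
# {x1,x2} (a 3-node chain). Potential tables: hypothetical, positive.
psi = {
    (0, 1): np.array([[2.0, 1.0], [1.0, 3.0]]),
    (1, 2): np.array([[1.0, 4.0], [2.0, 1.0]]),
}

def unnorm(x):
    """prod_C psi_C(x_C) for one configuration x."""
    return np.prod([tab[x[i], x[j]] for (i, j), tab in psi.items()])

# Z = sum over all 2^3 configurations of the product of potentials
Z = sum(unnorm(x) for x in itertools.product([0, 1], repeat=3))

# p(x) = unnorm(x) / Z is then a valid distribution (sums to 1)
total = sum(unnorm(x) / Z for x in itertools.product([0, 1], repeat=3))
```

The exponential cost of this sum over all configurations is exactly why Z is the hard part of working with undirected models.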
Graphical Model as Filter
• UI is the set of distributions consistent with the set of conditional independence statements read from the graph using graph separation
• UF is the set of distributions that can be expressed as a factorization of the form p(x) = (1/Z) ∏C ψC(xC)
• The Hammersley-Clifford theorem states that UI and UF are identical
4. Potential Functions
• Potential functions ψC(xC) should be strictly positive
• It is convenient to express them as exponentials:
  ψC(xC) = exp{−E(xC)}
  where E(xC) is called an energy function
• The exponential representation is called the Boltzmann distribution
• The total energy is obtained by adding the energies of the maximal cliques
5. Illustration: Image De-noising
• Noise removal from a binary image
• Observed noisy image
  – Binary pixel values yi ∈ {−1, +1}, i = 1, …, D
• Unknown noise-free image
  – Binary pixel values xi ∈ {−1, +1}, i = 1, …, D
• The noisy image is assumed to randomly flip the sign of pixels with small probability
Markov Random Field Model
• Known
  – Strong correlation between input yi and output xi
    • Since the noise level is small
  – Neighboring pixels xi and xj are strongly correlated
    • A property of images
• This prior knowledge is captured using an MRF
[Figure: the undirected graph of the model, with each observed (input) node yi linked to its latent (output) node xi, and neighboring xi linked in a grid]
Energy Functions
• The graph has two types of cliques, each with two variables
1. {xi, yi} expresses the correlation between input and output
  • Choose a simple energy function −η xi yi
  • Lower energy (higher probability) when xi and yi have the same sign
2. {xi, xj}, where xi and xj are neighboring pixels
  • Choose −β xi xj
  • For the same reason
Potential Function
• The complete energy function of the model:
  E(x, y) = h ∑i xi − β ∑{i,j} xi xj − η ∑i xi yi
  – The β sum runs over cliques of all pairs of neighboring pixels in the entire image
  – The η sum runs over cliques of input and output pixels
• The h xi term biases towards pixel values that have one particular sign
  – E.g., more white than black
  – h = 0 means that the prior probabilities of the two states of xi are equal
• This defines a joint distribution over x and y given by
  p(x, y) = (1/Z) exp{−E(x, y)}
• The smaller E(x, y), the larger p(x, y)
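The complete energy above can be written down in a few lines; this sketch assumes x and y are 2-D NumPy arrays of ±1 pixels (the `energy` function name is ours):

```python
import numpy as np

def energy(x, y, h=0.0, beta=1.0, eta=2.1):
    """E(x, y) = h*sum_i x_i - beta*sum_{i,j} x_i x_j - eta*sum_i x_i y_i,
    where x, y are 2-D arrays of +/-1 pixels and the {i,j} sum runs over
    horizontally and vertically adjacent pixel pairs."""
    pair = (x[:, :-1] * x[:, 1:]).sum() + (x[:-1, :] * x[1:, :]).sum()
    return h * x.sum() - beta * pair - eta * (x * y).sum()
```

Agreement between x and y (the η term) and smooth neighborhoods in x (the β term) both lower the energy and hence raise the probability.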
De-noising Problem Statement
• We fix y to the observed pixels in the noisy image
• p(x|y) is then a conditional distribution over all noise-free images
  – This model is called the Ising model in statistical physics
• We wish to find an image x that has high probability
De-noising Algorithm
• Gradient ascent
  – Set xi = yi for all i
  – Take one node xj at a time
    • Evaluate the total energy for the states xj = +1 and xj = −1
    • Keeping all other node variables fixed
  – Set xj to the value for which the energy is lower
    • This is a local computation which can be done efficiently
  – Repeat for all pixels until a stopping criterion is met
  – Nodes may be updated systematically, by raster scan, or randomly
• Finds a local maximum (which need not be global)
• The algorithm is called Iterated Conditional Modes (ICM)
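The steps above can be sketched as follows; a minimal implementation sketch, with the function name and default parameter values (β = 1.0, η = 2.1, h = 0) chosen for illustration:

```python
import numpy as np

def icm_denoise(y, h=0.0, beta=1.0, eta=2.1, max_sweeps=10):
    """Iterated Conditional Modes for the Ising de-noising model.
    y is a 2-D array of pixels in {-1, +1}."""
    x = y.copy()                              # initialize x_i = y_i
    rows, cols = x.shape
    for _ in range(max_sweeps):
        changed = False
        for i in range(rows):                 # systematic raster scan
            for j in range(cols):
                # Only the terms of E(x, y) containing x_ij matter:
                # E = x_ij * (h - beta * sum(neighbors) - eta * y_ij) + const
                nb = sum(x[a, b]
                         for a, b in ((i - 1, j), (i + 1, j),
                                      (i, j - 1), (i, j + 1))
                         if 0 <= a < rows and 0 <= b < cols)
                coeff = h - beta * nb - eta * y[i, j]
                best = -1.0 if coeff > 0 else 1.0   # state with lower energy
                if best != x[i, j]:
                    x[i, j] = best
                    changed = True
        if not changed:                       # local maximum of p(x | y)
            break
    return x
```

Because only the terms containing the node being updated change, each update is a cheap local computation, as the slide notes.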
Image Restoration Results
• Parameters: β = 1.0, η = 2.1, h = 0
[Figure: noise-free image; noisy image in which 10% of the pixels are corrupted; result of ICM; global maximum obtained by the Graph Cut algorithm]
Some Observations on the De-noising Algorithm
• The de-noising algorithm given is an algorithm for finding the most likely x
  – With graphical models this is called an inference algorithm
• It was assumed that the parameters β and η are known
• Parameter values can be determined by another gradient descent algorithm that learns from ground-truthed noisy images
  – This can be set up by taking the gradient of E(x, y) w.r.t. the parameters
  – Note that each pixel will have to be ground-truthed
• Note that β = 0 means the links between neighboring pixels are removed
  – The result is then xi = yi for all i
6. Relation to Directed Graphs
• Directed and undirected graphs are two frameworks for representing probability distributions
• Converting directed to undirected plays an important role in exact inference techniques such as the junction-tree algorithm
• Converting undirected to directed is less important
  – It presents problems due to the normalization constraints
Converting to an Undirected Graph
• Joint distribution of a simple directed chain:
  p(x) = p(x1)p(x2|x1)p(x3|x2)…p(xN|xN-1)
• In the equivalent undirected graph the maximal cliques are pairs of neighboring nodes
• We wish to write the joint distribution as
  p(x) = (1/Z) ψ1,2(x1,x2) ψ2,3(x2,x3) … ψN-1,N(xN-1,xN)
• This is done by identifying
  ψ1,2(x1,x2) = p(x1)p(x2|x1)
  ψ2,3(x2,x3) = p(x3|x2)
  …
[Figure: a simple directed chain x1 → x2 → … → xN and the equivalent undirected chain]

Generalizing the Construction
• For nodes of the directed graph having just one parent
  – Replace the directed link with an undirected link
• For nodes with more than one parent
  – Conditional terms such as p(x4|x1,x2,x3) should become cliques
  – Add links between all pairs of parents; this step is called moralization
[Figure: a simple directed graph with p(x) = p(x1)p(x2)p(x3)p(x4|x1,x2,x3) and the equivalent moral graph]
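The moralization step can be sketched over a parent-list representation; the `moralize` helper and the dict format are our own, not a library API:

```python
from itertools import combinations

def moralize(parents):
    """Return the edge set of the moral graph: drop arrow directions
    and 'marry' all pairs of parents of each node."""
    edges = set()
    for child, pa in parents.items():
        for p in pa:                        # undirected version of each link
            edges.add(frozenset((p, child)))
        for a, b in combinations(pa, 2):    # link all pairs of parents
            edges.add(frozenset((a, b)))
    return edges

# The 4-node example above: x4 has parents x1, x2, x3
g = {'x1': [], 'x2': [], 'x3': [], 'x4': ['x1', 'x2', 'x3']}
moral = moralize(g)
```

For this graph the moral graph is fully connected: three child links plus three marrying links among the parents, so the single maximal clique can absorb p(x4|x1,x2,x3).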
D-map, I-map and Perfect Map
• D (dependency) map of a distribution
  – Every conditional independence statement satisfied by the distribution is reflected in the graph
  – A completely disconnected graph is a trivial D-map for any distribution
• I (independence) map of a distribution
  – Every conditional independence statement implied by the graph is satisfied by the distribution
  – A fully connected graph is a trivial I-map for any distribution
• A perfect map is both an I-map and a D-map
[Figure: a directed graph that is a perfect map, satisfying A ⊥ B | ∅ but not A ⊥ B | C; no undirected graph over the same 3 variables is a perfect map]
[Figure: an undirected graph that is a perfect map, satisfying C ⊥ D | A∪B and A ⊥ B | C∪D but not A ⊥ B | ∅; no directed graph over the same 4 variables implies the same set of conditional independence properties]
[Venn diagram: P = the set of all distributions; D = distributions that can be represented as a perfect map using a directed graph; U = distributions that can be represented as a perfect map using an undirected graph]