STAT 598L Probabilistic Graphical Models · skirshne/teaching/STAT598L_F09/mn.pdf · 2010-01-27
STAT 598L: Probabilistic Graphical Models
Instructor: Sergey Kirshner
Markov Networks
Motivating Example
• Is there a Bayesian network that is a P-map for {(A ⊥ B │ C, D), (C ⊥ D │ A, B)}?
– No other independencies hold except those obtained by symmetry, so all remaining pairs are dependent (in a P-map)
– Skeleton
– Adding directions
• Without loss of generality, A->C
• Cannot have B->C (A->C<-B), so C->B
• Cannot have D->B (C->B<-D), so B->D
• Cannot have A->D (A->D<-B), so D->A, which closes the directed cycle A->C->B->D->A
STAT 598L: Probabilistic Graphical Models (Markov Networks)
[Figure: skeleton over nodes A, C, B, D]
No BN P-map!
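The case analysis can also be checked by brute force. The sketch below assumes the skeleton is the 4-cycle A-C-B-D-A (read off from the two independencies, which leave only A, B and C, D non-adjacent); the helper names are illustrative. It enumerates every orientation of the four edges and confirms that each acyclic one contains a v-structure whose parents are non-adjacent, which is exactly what the argument rules out.

```python
from itertools import combinations, product

NODES = "ABCD"
# Assumed skeleton: the only non-adjacent pairs are (A, B) and (C, D).
SKELETON = [("A", "C"), ("C", "B"), ("B", "D"), ("D", "A")]
ADJACENT = {frozenset(edge) for edge in SKELETON}


def is_acyclic(edges):
    """Kahn-style check: repeatedly peel off nodes with no incoming edge."""
    remaining = set(NODES)
    while remaining:
        sources = {n for n in remaining
                   if not any(v == n and u in remaining for u, v in edges)}
        if not sources:
            return False  # every remaining node has a parent: directed cycle
        remaining -= sources
    return True


def has_immorality(edges):
    """True if some node has two parents that are not adjacent."""
    for child in NODES:
        parents = [u for u, v in edges if v == child]
        if any(frozenset(pair) not in ADJACENT
               for pair in combinations(parents, 2)):
            return True
    return False


# Each edge (u, v) is oriented either u -> v or v -> u.
orientations = [
    [(v, u) if flip else (u, v) for (u, v), flip in zip(SKELETON, flips)]
    for flips in product([False, True], repeat=len(SKELETON))
]
dags = [edges for edges in orientations if is_acyclic(edges)]

print(len(orientations), len(dags))          # 16 orientations, 14 acyclic
print(all(has_immorality(e) for e in dags))  # True
```

Every acyclic orientation has a sink, and on this skeleton a sink's two parents are never adjacent, so no DAG avoids the forbidden v-structures.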
Undirected Model
• Is there a different framework that can represent these dependencies?
– What if we had undirected separation instead of d-separation?
[Figure: undirected graph over nodes A, C, B, D]
• Markov networks (Markov random fields, MRFs)
– Represent conditional independence relations with an undirected graph
– Encode functional dependence using potential functions or factors
Factors
{X1,X2,…,Xn} = set of variables
{Y1,Y2,…,Yk} ⊆ {X1,X2,…,Xn} = subset of variables
φ: Val(Y1) × Val(Y2) × … × Val(Yk) → R+ = factor
scope[φ] = {Y1,Y2,…,Yk}
Joint probability = product of factors
Factor = measure of relationship for a group of variables
Example
normalization constant (partition function)
Gibbs distribution
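For reference, a Gibbs distribution is a normalized product of factors. The instance below is a sketch that assumes the four pairwise scopes {a,c}, {a,d}, {b,c}, {b,d} discussed later in these slides; the factor names φ1 to φ4 are illustrative, not taken from the slide.

```latex
P(a,b,c,d) = \frac{1}{Z}\,\phi_1(a,c)\,\phi_2(a,d)\,\phi_3(b,c)\,\phi_4(b,d),
\qquad
Z = \sum_{a,b,c,d} \phi_1(a,c)\,\phi_2(a,d)\,\phi_3(b,c)\,\phi_4(b,d)
```

Z is the normalization constant (partition function) that makes the product sum to one.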
Example (continued)
How many free parameters? 3+3+3+3=12
Factors and Free Parameters
• For this analysis, stick to binary variables
• Each factor of k variables = 2^k-1 free parameters
• Assume all factors are of the same size
– C(n,k) possible factors (O(n^k))
– Total of O(n^k 2^k) free parameters
– Compare to O(2^n) for a full table
• Conclusion: even using large factors reduces the number of free parameters
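To make the comparison concrete, here is a small sketch that counts parameters using the slide's convention of 2^k-1 per binary factor. The values n = 20 and k = 3 are arbitrary choices for illustration.

```python
from math import comb


def factor_params(n: int, k: int) -> int:
    """Free parameters if every size-k subset of n binary variables gets a factor."""
    return comb(n, k) * (2**k - 1)


def full_table_params(n: int) -> int:
    """Free parameters of a full joint table over n binary variables."""
    return 2**n - 1


n, k = 20, 3
print(factor_params(n, k))   # 7980
print(full_table_params(n))  # 1048575
```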
BNs: Special Case
Factor Operations: Product
X=x Y=y φ1(x,y)
1 1 0.4
1 0 0.7
0 1 1
0 0 0.8
Y=y Z=z φ2(y,z)
1 1 0.3
1 0 0.9
0 1 0.5
0 0 1
X=x Y=y Z=z φ12(x,y,z)
1 1 1 0.12
1 1 0 0.36
1 0 1 0.35
1 0 0 0.7
0 1 1 0.3
0 1 0 0.9
0 0 1 0.4
0 0 0 0.8
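The product table can be reproduced mechanically: each entry of φ12 multiplies the entries of φ1 and φ2 that agree on the shared variable Y. A minimal sketch, with factors stored as dictionaries keyed by assignments (a representation chosen here for illustration):

```python
from itertools import product

phi1 = {(1, 1): 0.4, (1, 0): 0.7, (0, 1): 1.0, (0, 0): 0.8}  # phi1(x, y)
phi2 = {(1, 1): 0.3, (1, 0): 0.9, (0, 1): 0.5, (0, 0): 1.0}  # phi2(y, z)


def factor_product(f_xy, f_yz):
    """phi12(x, y, z) = phi1(x, y) * phi2(y, z) for binary x, y, z."""
    return {(x, y, z): f_xy[(x, y)] * f_yz[(y, z)]
            for x, y, z in product((1, 0), repeat=3)}


phi12 = factor_product(phi1, phi2)
for assignment, value in phi12.items():
    print(assignment, round(value, 2))
```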
Conditional Independence?
• What about {a,c}, {a,d}, {b,c}, and {b,d}?
– They cannot be made independent!
– Edges connect variables in the same scope
– Resulting graph = Markov network
[Figure: Markov network over nodes A, C, B, D]
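A numerical sanity check of this picture, as a sketch: the factor values below are made up, and only the four scopes {a,c}, {a,d}, {b,c}, {b,d} come from the slide. With binary variables, two variables are conditionally independent given the rest exactly when every 2x2 slice of the joint has zero determinant.

```python
from itertools import product

VALS = (0, 1)
# Hypothetical positive factor tables, one per scope.
phi_ac = {(0, 0): 5.0, (0, 1): 1.0, (1, 0): 2.0, (1, 1): 7.0}
phi_ad = {(0, 0): 1.0, (0, 1): 4.0, (1, 0): 3.0, (1, 1): 2.0}
phi_bc = {(0, 0): 6.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 3.0}
phi_bd = {(0, 0): 2.0, (0, 1): 5.0, (1, 0): 4.0, (1, 1): 1.0}

unnorm = {(a, b, c, d): phi_ac[a, c] * phi_ad[a, d] * phi_bc[b, c] * phi_bd[b, d]
          for a, b, c, d in product(VALS, repeat=4)}
Z = sum(unnorm.values())                      # partition function
P = {x: v / Z for x, v in unnorm.items()}     # joint over (A, B, C, D)


def ci_gap(i, j):
    """Largest violation of X_i independent of X_j given the other two variables."""
    rest = [k for k in range(4) if k not in (i, j)]
    gap = 0.0
    for r in product(VALS, repeat=2):
        def p(xi, xj):
            x = [0] * 4
            x[i], x[j] = xi, xj
            x[rest[0]], x[rest[1]] = r
            return P[tuple(x)]
        # Determinant of the 2x2 slice; zero means independence in this context.
        gap = max(gap, abs(p(0, 0) * p(1, 1) - p(0, 1) * p(1, 0)))
    return gap


print(ci_gap(0, 1) < 1e-12)  # A, B given C, D: independent
print(ci_gap(2, 3) < 1e-12)  # C, D given A, B: independent
print(ci_gap(0, 2) > 1e-6)   # A, C share a factor: dependent
```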
Factorization: Formal Definition
• Given: Gibbs distribution P with non-negative factors Φ={φ1,…,φK}, and a Markov network H
• P factorizes over H: scope of every factor corresponds to a complete subgraph of H
[Figure: two Markov networks over nodes A, C, B, D]
Factorization
• Collection of factors is not unique
– Are the scopes {A,B}, {A,C}, and {B,C}, or is it just {A,B,C}?
– Networks can obscure scopes (structures) of original factors
[Figure: fully connected graph over nodes A, C, B]
Graphical Model
Graphical Model = Graph + Parameters
Bayesian network = parents in chain decomposition + conditional probability distributions
Markov network = variables in factors + factors
Undirected vs Directed Model
• Bayesian networks:
– DAG => dimensionality reduction with chain rule for probability (simple justification)
– Possible causal dependence (interpretation of the edge directions)
– Parameters are interpretable
– Represented independencies depend on the order of variables (drawback)
• Undirected model:
– No ordering to consider! (Fewer objects, one less uncertainty to worry about)
– Intuition using exponential models (later in the course)
– Difficult to interpret (and to elicit) the parameters
Representational Power: BN vs MN
• Can Bayesian networks represent all independencies from Markov networks?
– No: {(A ⊥ B │ C, D), (C ⊥ D │ A, B)}
• Can Markov networks represent all independencies from Bayesian networks?
– No: A -> B <- C
• What is the overlap?
– Later
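The v-structure counterexample can be made concrete with a small sketch. The CPDs are hypothetical: A and C are fair coins and B is a noisy XOR of them. A and C are marginally independent but become dependent once B is observed; an undirected graph with no A-C edge would instead assert independence given B.

```python
from itertools import product

PAIRS = list(product((0, 1), repeat=2))


def p_joint(a, b, c):
    """Hypothetical BN A -> B <- C: fair coins A, C; B = A xor C with 10% noise."""
    p_b = 0.9 if b == (a ^ c) else 0.1
    return 0.5 * 0.5 * p_b


P = {(a, b, c): p_joint(a, b, c) for a, b, c in product((0, 1), repeat=3)}

# Marginal check: P(a, c) should equal P(a) P(c) = 0.25 everywhere.
marginal_gap = max(abs(sum(P[a, b, c] for b in (0, 1)) - 0.25) for a, c in PAIRS)


def conditional_gap(b):
    """Largest difference between P(a, c | b) and P(a | b) P(c | b)."""
    p_b = sum(P[a, b, c] for a, c in PAIRS)
    gap = 0.0
    for a, c in PAIRS:
        p_ac = P[a, b, c] / p_b
        p_a = sum(P[a, b, c2] for c2 in (0, 1)) / p_b
        p_c = sum(P[a2, b, c] for a2 in (0, 1)) / p_b
        gap = max(gap, abs(p_ac - p_a * p_c))
    return gap


print(marginal_gap < 1e-12)      # True: A and C are marginally independent
print(conditional_gap(0) > 0.1)  # True: observing B couples A and C
```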
Graph Separation
• Need to establish conditional independence from undirected graph properties
• Active path = none of the intermediate variables are observed
• No active paths = separation
• Monotonic: adding observed variables can only reduce active paths
[Figure: graph over nodes A, C, B, D, E with a blocked path]
Set of global independencies (global Markov property)
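Separation is a plain reachability question: X and Y are separated given Z when no path from X to Y avoids Z. The sketch below uses a breadth-first search; the edge list is hypothetical (the slide's figure did not survive extraction) and only the node names A to E are taken from it.

```python
from collections import deque


def separated(adj, xs, ys, zs):
    """True if every path from xs to ys passes through an observed node in zs."""
    blocked = set(zs)
    frontier = deque(x for x in xs if x not in blocked)
    seen = set(frontier)
    while frontier:
        node = frontier.popleft()
        if node in ys:
            return False  # found an active path
        for nb in adj[node]:
            if nb not in seen and nb not in blocked:
                seen.add(nb)
                frontier.append(nb)
    return True


# Hypothetical undirected edges.
edges = [("A", "C"), ("A", "D"), ("B", "C"), ("B", "D"), ("D", "E")]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

print(separated(adj, {"A"}, {"B"}, {"C", "D"}))       # True
print(separated(adj, {"A"}, {"B"}, {"C"}))            # False: A-D-B is active
print(separated(adj, {"A"}, {"B"}, {"C", "D", "E"}))  # True: observing more keeps separation
```

The last call illustrates monotonicity: enlarging the observed set can only block more paths.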
Representation Theorem for BNs
P factorizes according to G ⇔ each variable is independent of its non-descendants given its parents (local Markov assumption)
independencies ↔ graph structure
Representation Theorem for MNs
P factorizes according to H ⇒ global independencies set by scopes of factors (global Markov property)
independencies ↔ graph structure
[Figure: graph over nodes A, C, B, D, E]
Converse: ?
Representation Theorem for MNs
• Proof: Need to show
– Case 1: Assume
• Partition Di so that either Di⊆A∪C or Di⊆B∪C
[Figure: sets A and B with C between them]
Representation Theorem for MNs
• Proof: Need to show
– Case 2:
[Figure: sets A, B, C, U1, U2]
Converse?
• Think xor
global independencies set by scopes of factors (global Markov property) ⇒ P factorizes according to H?
[Figure: graph over nodes A, C, B, D, E]
Hammersley-Clifford Theorem
If P is positive and the global independencies set by H hold (global Markov property), then P factorizes according to H
[Figure: graph over nodes A, C, B, D, E]
Completeness of separation
Active trail between X and Y given Z ⇒ X and Y are dependent given Z in some P that factorizes according to H
• Interpreting the statement
• Sketch of proof (by construction):
– All factors not in the trail are uniform (remove nodes and edges not in the trail)
– Make the remaining factors almost deterministic
More General Result
Soundness
Completeness (almost)
Intuition: Two binary variables X and Y; 3-d space of possible factors with a 2-d manifold for independence
[Figure: nodes X and Y]
Representation Theorem for BNs
P factorizes according to G ⇔ each variable is independent of its non-descendants given its parents (local Markov assumption)
independencies ↔ graph structure
Other Ways to Encode Independence
• Local Markov independence:
• Pairwise independence:
Markov blanket (local)
Pairwise Markov independencies
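The two sets of independencies are usually written as follows. This is a sketch in textbook-style notation, which is an assumption here: 𝒳 denotes the set of all variables and MB_H(X) the neighbors of X in H (its Markov blanket).

```latex
\mathcal{I}_{\ell}(H) = \bigl\{ \bigl(X \perp \mathcal{X} - \{X\} - \mathrm{MB}_H(X) \,\bigm|\, \mathrm{MB}_H(X)\bigr) : X \in \mathcal{X} \bigr\}

\mathcal{I}_{p}(H) = \bigl\{ \bigl(X \perp Y \,\bigm|\, \mathcal{X} - \{X, Y\}\bigr) : X - Y \notin H \bigr\}
```

The first says each variable is independent of everything else given its neighbors; the second says every non-adjacent pair is independent given all remaining variables.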
Relation Between Independencies
• Two separated nodes will also be separated by the neighbors of either node
• Variables corresponding to non-adjacent nodes are conditionally independent given the variables corresponding to neighbors
– Conditionally independent also given the rest of the variables (monotonic)
global ⇒ local ⇒ pairwise
Converse
global ⇐ pairwise
• For all disjoint A, B, and C,
– Induction on size of C
• |C|=n-2:
• |C|=k-1<n-2, case I:
Converse
global ⇐ pairwise
• For all disjoint A, B, and C,
– Induction on size of C
• |C|=k-1<n-2, case II:
• Assume |A|=|B|=1, otherwise approach as in case I
Equivalence
• Given P is positive, the following are equivalent:
– Global Markov property
– Local Markov property
– Pairwise Markov property
How To Recover MNs from Distribution
• If P is positive
– Check whether A ⊥ B | X-A-B for each pair A, B (add an edge if not), or
– Find smallest C such that A ⊥ X-A-C | C
• C=MB_P(A) (Markov blanket)
– In both cases, the graph is a minimal I-map of P
– Graphs are the same: such an I-map is unique!
• If P is not positive
– No guarantee that the resulting graph is an I-map of P
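The pairwise recipe can be sketched for a small positive distribution. Everything concrete below is an assumption for illustration: the distribution is a Gibbs distribution built from made-up factors on a 4-cycle, and independence is tested exactly on the full joint table (with a numerical tolerance) rather than estimated from data.

```python
from itertools import combinations, product

NAMES = "ABCD"
VALS = (0, 1)
# Hypothetical positive factors; the true graph is the 4-cycle A-C, A-D, B-C, B-D.
factors = {
    ("A", "C"): {(0, 0): 5.0, (0, 1): 1.0, (1, 0): 2.0, (1, 1): 7.0},
    ("A", "D"): {(0, 0): 1.0, (0, 1): 4.0, (1, 0): 3.0, (1, 1): 2.0},
    ("B", "C"): {(0, 0): 6.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 3.0},
    ("B", "D"): {(0, 0): 2.0, (0, 1): 5.0, (1, 0): 4.0, (1, 1): 1.0},
}


def unnormalized(x):
    value = 1.0
    for (u, v), table in factors.items():
        value *= table[x[NAMES.index(u)], x[NAMES.index(v)]]
    return value


states = list(product(VALS, repeat=4))
Z = sum(unnormalized(x) for x in states)
P = {x: unnormalized(x) / Z for x in states}


def dependent_given_rest(i, j, tol=1e-9):
    """True if X_i and X_j are dependent given all remaining variables."""
    rest = [k for k in range(4) if k not in (i, j)]
    for r in product(VALS, repeat=len(rest)):
        def p(xi, xj):
            x = [0] * 4
            x[i], x[j] = xi, xj
            for k, val in zip(rest, r):
                x[k] = val
            return P[tuple(x)]
        if abs(p(0, 0) * p(1, 1) - p(0, 1) * p(1, 0)) > tol:
            return True
    return False


# Add an edge for every pair that is NOT conditionally independent given the rest.
edges = {(NAMES[i], NAMES[j])
         for i, j in combinations(range(4), 2) if dependent_given_rest(i, j)}
print(sorted(edges))  # [('A', 'C'), ('A', 'D'), ('B', 'C'), ('B', 'D')]
```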
Finding P-maps
• If P-map exists
– Find a minimal I-map
– It is also a P-map!
• Does it always exist?
– Think v-structure
Alternative Parametrizations
• Structure of the Markov network may hide the scopes of the factors
– Think complete graph: is it one factor with all variables in the scope or a product of factors with pairs of variables in the scope?
• May want to make factorization more explicit in the structure
Factor Graphs
• Bipartite graph: variables vs factors
[Figure: Markov network over nodes A, C, B, D and a factor graph over the same variables]
Log-Linear Model
• Product into a sum
• Convert factors into a finer set of features
• Break down factors further (context)
• Different features may share same scope
energy functions
weights, features
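The labels above annotate formulas that are not in the text. A common way to write them is sketched below; the sign convention (a minus sign in the exponent) is an assumption, and some texts absorb it into the weights.

```latex
\phi_i(D_i) = \exp\bigl(-\epsilon_i(D_i)\bigr)
\quad\Longrightarrow\quad
P(x) = \frac{1}{Z}\prod_i \phi_i(D_i) = \frac{1}{Z}\exp\Bigl(-\sum_i \epsilon_i(D_i)\Bigr)

P(x) = \frac{1}{Z}\exp\Bigl(-\sum_{j} w_j\, f_j(D_j)\Bigr)
```

Here ε_i are the energy functions, w_j the weights, and f_j the features; taking logarithms turns the product of factors into a sum.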
Ising Model
Binary x_i's
http://www.cis.upenn.edu/~jshi/GraphTutorial/
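A minimal sketch of an Ising-style model, under stated assumptions: spins take values in {-1, +1}, the energy is -Σ w x_i x_j - Σ u x_i over a hypothetical 2x2 grid, and the coupling w and bias u are made-up constants. The slide's own formula is not in the text, so this follows the usual textbook form.

```python
from itertools import product
from math import exp

# Hypothetical 2x2 grid: nodes 0..3, edges between grid neighbours.
edges = [(0, 1), (2, 3), (0, 2), (1, 3)]
w = 0.5  # assumed uniform pairwise coupling
u = 0.0  # assumed zero bias on every node


def energy(x):
    """Pairwise terms reward agreeing neighbours when w > 0."""
    pair = -sum(w * x[i] * x[j] for i, j in edges)
    single = -sum(u * xi for xi in x)
    return pair + single


states = list(product((-1, 1), repeat=4))
Z = sum(exp(-energy(x)) for x in states)          # partition function
P = {x: exp(-energy(x)) / Z for x in states}

print(P[(1, 1, 1, 1)] > P[(1, -1, -1, 1)])  # True: aligned spins are more likely
```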
Recap
• Parameterizations for Markov networks
– Features
– Overparameterizations
– How many parameters are free?
– Canonical parameterization
Plan
• Proof of Hammersley-Clifford theorem (if there is interest)
• Justification for Markov networks using Maximum Entropy principle (later)
• Relating Bayesian and Markov networks
– Proof of soundness theorem for Bayesian networks
– Determining which Markov networks are P-maps for which Bayesian networks
Information Theory
• P(X) encodes our uncertainty about X
– Some variables are more uncertain than others
– How can we quantify this intuition?
• Entropy: average number of bits required to encode X
• Entropy is maximized when X is uniform
[Figure: distributions P(X) and P(Y) over X and Y]
$$H(X) = E\left[\log \frac{1}{p(x)}\right] = \sum_x P(x) \log \frac{1}{P(x)}$$
From Carlos Guestrin’s 10-708 Probabilistic Graphical Models Fall 2008 at CMU
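A short sketch of the entropy formula in bits, using base-2 logarithms and the usual convention that zero-probability outcomes contribute nothing. The example distributions are made up.

```python
from math import log2


def entropy(p):
    """H(X) = sum_x P(x) log2(1 / P(x)), skipping zero-probability outcomes."""
    return sum(pi * log2(1 / pi) for pi in p if pi > 0)


print(entropy([0.25, 0.25, 0.25, 0.25]))  # 2.0 bits: uniform over four outcomes
print(entropy([0.7, 0.1, 0.1, 0.1]))      # less than 2 bits
print(entropy([1.0, 0.0, 0.0, 0.0]))      # 0.0: no uncertainty
```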
Maximum Entropy Principle
• Given everything else the same, pick a distribution with the maximum entropy
– Closest to uniform
• Example: ¾ of kangaroos are left-handed and ¾ drink Foster's
– Want to reconstruct the full probability table knowing only p11+p12=0.75 and p11+p21=0.75
– Have 3 free parameters and only 2 constraints, leaving 1 free parameter
Probability table:
p11 p12
p21 p22
MaxEnt Principle Continued
• Since we are not given that left-handedness is correlated with drinking Foster's, ideally we do not want to introduce the correlation into the model
• Which objective function to maximize?
• Entropy is (the only) such function
– Want to maximize H_P(X) subject to the constraints p11+p12=0.75 and p11+p21=0.75
Gull S.F., Skilling J. (1984), “The Maximum Entropy Method,” in Indirect Imaging
Direct Solution
Left-handedness is independent of drinking Foster's!
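The direct solution can be checked numerically. The sketch below parameterizes every table satisfying both constraints by the single free value t = p11, then maximizes entropy with a ternary search (a choice made here because entropy is concave along this line; the slide itself presumably solves it analytically). The maximizer is the independent table, where p11 = 0.75 × 0.75.

```python
from math import log


def entropy(ps):
    """Entropy in nats, skipping zero entries."""
    return -sum(p * log(p) for p in ps if p > 0)


def table(t):
    """All tables with p11+p12 = 0.75 and p11+p21 = 0.75, indexed by p11 = t."""
    return [t, 0.75 - t, 0.75 - t, t - 0.5]  # p11, p12, p21, p22


lo, hi = 0.5, 0.75  # range of t in which all four entries are non-negative
for _ in range(200):
    m1 = lo + (hi - lo) / 3
    m2 = hi - (hi - lo) / 3
    if entropy(table(m1)) < entropy(table(m2)):
        lo = m1   # the maximum lies to the right of m1
    else:
        hi = m2   # the maximum lies to the left of m2
t = (lo + hi) / 2

print(round(t, 4))                     # 0.5625
print([round(p, 4) for p in table(t)])
```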
Round-about Solution
• Constraints = Lagrange multipliers
Round-about Solution
• How to find the weights?
– Plug in the log-linear model for P(x) and maximize F(x)
– Or, satisfy the constraints
Log-linear model!
MaxEnt in a More General Setting
• Given a set of constraints
– General solution to the MaxEnt formulation is
• Log-linear model is an approximation to a distribution that preserves some properties (constraints) while making the distribution as close to uniform as possible
– Duality between constraints and weights
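A sketch of the standard Lagrange-multiplier derivation behind this statement. The symbols are assumptions introduced here: features f_i with target expectations c_i, multipliers λ_i for the constraints and μ for normalization.

```latex
\max_{P}\; H(P) = -\sum_x P(x)\log P(x)
\quad\text{s.t.}\quad
\sum_x P(x)\, f_i(x) = c_i \;\; (i = 1,\dots,m),
\qquad \sum_x P(x) = 1

F(P) = -\sum_x P(x)\log P(x)
     + \sum_i \lambda_i \Bigl(\sum_x P(x)\, f_i(x) - c_i\Bigr)
     + \mu \Bigl(\sum_x P(x) - 1\Bigr)

\frac{\partial F}{\partial P(x)} = -\log P(x) - 1 + \sum_i \lambda_i f_i(x) + \mu = 0
\quad\Longrightarrow\quad
P(x) = \frac{1}{Z(\lambda)} \exp\Bigl(\sum_i \lambda_i f_i(x)\Bigr),
\qquad
Z(\lambda) = \sum_x \exp\Bigl(\sum_i \lambda_i f_i(x)\Bigr)
```

Each constraint contributes one multiplier, and the multipliers play the role of the log-linear weights (up to the sign convention chosen for the exponent).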
Soundness of d-separation
For all P that factorizes according to G:
– G is an I-map for P (local graph property)
– G is a BN structure for P
– d-separation in G (global separation property) ⇒ conditional independence in P
Proof Outline
• Given evidence, convert Bayesian network into an equivalent Markov network
– Construct such network
– Show that it is an equivalent Markov network
• Use separation property of the Markov network to prove the theorem
Constructing MNs from BNs
[Figure: Bayesian network G over nodes A, C, B, D, E and its moralized graph H]
Moralized graph H: I-map, minimal I-map
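Moralization has two steps: connect every pair of parents that share a child, then drop edge directions. A minimal sketch follows; the DAG is hypothetical (the slide's figure is not in the text) and only reuses its node names.

```python
from itertools import combinations


def moralize(parents):
    """parents: dict node -> list of parents. Returns a set of undirected edges."""
    edges = set()
    for child, pas in parents.items():
        for p in pas:
            edges.add(frozenset((p, child)))  # keep each edge, without direction
        for p, q in combinations(pas, 2):
            edges.add(frozenset((p, q)))      # marry co-parents
    return edges


# Hypothetical DAG: A -> C <- B, C -> D, C -> E
parents = {"A": [], "B": [], "C": ["A", "B"], "D": ["C"], "E": ["C"]}
H = moralize(parents)
print(sorted("".join(sorted(e)) for e in H))  # ['AB', 'AC', 'BC', 'CD', 'CE']
```

The added edge A-B is what lets a single factor over {A, B, C} hold the CPD of C given its parents.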
Constructing MNs from BNs with Evidence
[Figure: Bayesian network G over nodes A, C, B, D, E with evidence and its moralized graph H]
P-map for Moral Graphs
[Figure: moral graph G over nodes A, C, B, D, E and its moralized graph H (minimal I-map)]
Proof: pick an active (minimal) trail in G. Show it is in H.
Two cases:
– Trail has no v-structures: no marked nodes, so the same trail is in H
– Trail has v-structures: the v-structure is covered, so the trail is not minimal (contradiction)
Soundness for d-separation
• What if the graph is not moral?
– What if immoralities did not matter?
– They do matter if the common effect or one of its descendants is in the evidence
• Only consider the subgraphs for which immoralities have a descendant in the evidence
– Upward closure of evidence nodes
Upward Closure and Its MN
[Figure: Bayesian network G over nodes A, C, B, D, E; upward closure G’ over nodes A, C, B, D, with E removed as a barren node; Markov network H over nodes A, C, B, D]
Exercise 3.8: BN(G’) agrees with BN(G) over nodes of G’
Soundness of d-separation
• Consider X and Y d-separated by Z
• Build an upward closure for X∪Y∪Z
• d-separation is equivalent to separation in H
• Separation in H implies conditional independence
For all P that factorizes according to G (G is a BN structure for P): d-separation in G ⇒ conditional independence in P
[Figure: Bayesian network G over nodes A, C, B, D, E and Markov network H over nodes A, C, B, D]
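The four steps above translate into a procedure for testing d-separation: restrict to the upward closure of X∪Y∪Z, moralize, then test ordinary graph separation. The sketch below assumes a hypothetical DAG and illustrative function names.

```python
from collections import deque
from itertools import combinations


def ancestral_set(parents, nodes):
    """Upward closure: the given nodes together with all their ancestors."""
    closure, stack = set(), list(nodes)
    while stack:
        n = stack.pop()
        if n not in closure:
            closure.add(n)
            stack.extend(parents[n])
    return closure


def d_separated(parents, xs, ys, zs):
    xs, ys, zs = set(xs), set(ys), set(zs)
    keep = ancestral_set(parents, xs | ys | zs)
    # Moralize the sub-DAG induced by the upward closure.
    adj = {n: set() for n in keep}
    for child in keep:
        pas = parents[child]  # parents of a kept node are always kept
        for p in pas:
            adj[p].add(child)
            adj[child].add(p)
        for p, q in combinations(pas, 2):
            adj[p].add(q)
            adj[q].add(p)
    # Undirected separation: search from xs while never entering zs.
    frontier = deque(x for x in xs if x not in zs)
    seen = set(frontier)
    while frontier:
        n = frontier.popleft()
        if n in ys:
            return False
        for nb in adj[n] - seen - zs:
            seen.add(nb)
            frontier.append(nb)
    return True


# Hypothetical DAG: A -> C <- B, C -> D, C -> E
parents = {"A": [], "B": [], "C": ["A", "B"], "D": ["C"], "E": ["C"]}
print(d_separated(parents, {"A"}, {"B"}, set()))  # True: unobserved collider blocks
print(d_separated(parents, {"A"}, {"B"}, {"D"}))  # False: a descendant of C is observed
print(d_separated(parents, {"D"}, {"E"}, {"C"}))  # True: common cause is observed
```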
From Markov Networks to Bayesian Networks
• As seen before, Markov networks cannot represent immoralities
• Can show that if a Bayesian network G is a minimal I-map for some Markov network structure H, it contains no immoralities
• No immoralities = every three nodes with v-structure are covered
• Undirected cycle of length >3 forces a v-structure in any acyclic orientation
– So it must have a chord
• All minimal BN I-maps of Markov networks are chordal
– No BN P-map exists for a non-chordal MN
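Chordality can be tested by greedy elimination: a graph is chordal exactly when vertices can be removed one at a time such that each removed vertex's remaining neighbors form a clique. A sketch with illustrative names, applied to the 4-cycle from the motivating example:

```python
from itertools import combinations


def make_adj(edges):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def is_chordal(adj):
    """Repeatedly delete a vertex whose remaining neighbours are pairwise adjacent."""
    adj = {v: set(nbs) for v, nbs in adj.items()}  # work on a copy
    while adj:
        simplicial = next(
            (v for v, nbs in adj.items()
             if all(q in adj[p] for p, q in combinations(nbs, 2))),
            None,
        )
        if simplicial is None:
            return False  # no vertex can be eliminated: a chordless cycle remains
        for nb in adj.pop(simplicial):
            adj[nb].discard(simplicial)
    return True


cycle_edges = [("A", "C"), ("C", "B"), ("B", "D"), ("D", "A")]
print(is_chordal(make_adj(cycle_edges)))                 # False
print(is_chordal(make_adj(cycle_edges + [("A", "B")])))  # True
```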
Markov Networks: Summary
• Mass/density = normalized product of factors
• Represent conditional independence with independence graphs
– Conditional independence = separation in the graph
– Global separation = local separation (Markov blanket) = pairwise separation, all in positive distributions
• Interpretation: closest to uniform under constraints specified by features
– Scope of features determines the structure of the graph (representation theorem)
• Relationship between Markov and Bayesian networks
– MNs cannot represent v-structures of BNs
– BNs cannot represent chordless loops of MNs
– Chordal graphs can be represented (as P-maps) by both