Undirected Graphical Models (Markov random fields)
Seung-Hoon Na
Chonbuk National University
Hopfield networks
• Bidirectional associative memory
Bidirectional associative memory
• In general, the two layers are updated alternately from the sign of their excitations
• After some iterations, BAM reaches a fixed point (𝒙, 𝒚) (see the code sketch after this list):
…
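The alternating update and fixed-point behaviour described above can be sketched in code. This is a minimal illustration (not from the slides), assuming bipolar (±1) row-vector states and a sign nonlinearity that maps zero excitation to +1:

import numpy as np

def sgn(v):
    # Bipolar sign function; zero excitation is mapped to +1 by convention.
    return np.where(v >= 0, 1, -1)

def bam_recall(W, x0, max_iters=100):
    """Iterate the BAM dynamics from x0 until (x, y) stops changing."""
    x = sgn(np.asarray(x0, dtype=float))
    y = sgn(x @ W)                 # excitation of the y layer: x W
    for _ in range(max_iters):
        x_new = sgn(y @ W.T)       # excitation of the x layer: y W^T
        y_new = sgn(x_new @ W)
        if np.array_equal(x_new, x) and np.array_equal(y_new, y):
            return x_new, y_new    # fixed point (stable state) reached
        x, y = x_new, y_new
    return x, y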
Bidirectional associative memory
• Hebbian learning
– Then, the weight matrix for a single pair (𝒙, 𝒚) is the outer product of 𝒙 and 𝒚
• For a set of m vector pairs, the outer products are summed (see the code sketch below)
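A minimal sketch of the Hebbian rule for m pattern pairs, assuming bipolar row-vector patterns stacked in arrays X and Y (names chosen here for illustration): the weight matrix is the sum of the outer products of the pairs.

import numpy as np

def hebbian_bam_weights(X, Y):
    """X: (m, n) array of x patterns, Y: (m, k) array of y patterns.
    Returns W = sum_i x_i^T y_i (sum of outer products)."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    return X.T @ Y  # equivalent to summing np.outer(x_i, y_i) over all pairs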
Bidirectional associative memory
– 𝒙0: initial vector
– the excitation vector:
– (𝒙0, 𝒚0): a stable state of the network if
– The energy becomes smaller (because of the minus sign) if the vector 𝑾𝒚0ᵀ lies closer to 𝒙0
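For reference (the slide's formulas were lost in extraction), in row-vector notation the excitation of the y layer given 𝒙0 is 𝒙0𝑾, the excitation of the x layer given 𝒚0 is 𝒚0𝑾ᵀ, and (𝒙0, 𝒚0) is a stable state of the network if

\operatorname{sgn}(\mathbf{x}_0 \mathbf{W}) = \mathbf{y}_0 \quad \text{and} \quad \operatorname{sgn}(\mathbf{y}_0 \mathbf{W}^{T}) = \mathbf{x}_0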
The energy function
Bidirectional associative memory
• The energy function E of BAM:
• Generally, using a threshold-unit for each node
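For reference, a standard form of the BAM energy (the slide's equation did not survive extraction; some texts omit the 1/2 factor):

E(\mathbf{x}, \mathbf{y}) = -\tfrac{1}{2}\, \mathbf{x} \mathbf{W} \mathbf{y}^{T}

With a threshold unit for each node, additional linear terms in 𝒙 and 𝒚 are added to this expression.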
Hopfield networks
• Asynchronous BAM
– Each unit computes its excitation at random times and changes its state to 1 or −1 independently of the others and according to the sign of its total excitation
– Each update does not increase the energy function (it strictly decreases whenever a unit changes state)
• Hopfield network
– A special case of an asynchronous BAM: x=y
– Each unit is connected to all other units except itself
– With two necessary conditions on the weight matrix 𝑾:
• 1) Symmetry (𝑾 = 𝑾ᵀ) 2) Zero diagonal (𝑤𝑖𝑖 = 0)
Hopfield networks
• Network with a non-zero diagonal
• Network with asymmetric connections; the example states shown are (1, 1, 1) and (−1, −1, −1)
Hopfield networks
• Energy function
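The slide's equation was lost in extraction; the standard Hopfield energy for symmetric weights 𝑾 and thresholds 𝜽 (sign conventions vary slightly between texts) is:

E(\mathbf{x}) = -\tfrac{1}{2} \sum_{i \neq j} w_{ij} x_i x_j + \sum_i \theta_i x_i = -\tfrac{1}{2}\, \mathbf{x} \mathbf{W} \mathbf{x}^{T} + \boldsymbol{\theta} \mathbf{x}^{T}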
Hopfield networks: Example
• A flip-flop network
– The only stable states are (1, −1) and (−1, 1).
Energy function of a flip-flop
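A worked version of this example, assuming flip-flop weights w_{12} = w_{21} = -1 and zero thresholds (an assumption; the slide's concrete numbers were lost in extraction):

E(x_1, x_2) = -\tfrac{1}{2}\,(w_{12}\, x_1 x_2 + w_{21}\, x_2 x_1) = x_1 x_2

so E(1, -1) = E(-1, 1) = -1 (the two energy minima, i.e., the stable states) and E(1, 1) = E(-1, -1) = +1.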
Hopfield networks: Example
Hopfield networks
• A fully connected Ising model with a symmetric weight matrix 𝑾 = 𝑾ᵀ
• Exact inference is intractable
• Iterative conditional modes (ICM): just sets each node to its most likely (lowest energy) state, given all its neighbors
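A minimal ICM sketch for a binary (±1) pairwise model with symmetric weight matrix W and biases b (an illustrative formulation, not code from the slides):

import numpy as np

def icm(W, b, x0, max_sweeps=50):
    """Iterative conditional modes for a pairwise model with states in {-1, +1}.
    Each node is set to the state minimizing its local energy given its neighbors,
    i.e., to the sign of its total input."""
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(max_sweeps):
        changed = False
        for i in range(len(x)):
            local = W[i] @ x - W[i, i] * x[i] + b[i]   # exclude self-connection
            new_state = 1.0 if local >= 0 else -1.0
            if new_state != x[i]:
                x[i] = new_state
                changed = True
        if not changed:       # no node changed in a full sweep: local optimum
            break
    return x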
Learning MRFs: Moment matching
• MRF
• Log-likelihood:
• Gradient for 𝜽𝑐 :
Moment matching
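For reference, the standard log-linear MRF expressions that the bullets above refer to (restated here since the slide's equations were lost in extraction):

p(\mathbf{y} \mid \boldsymbol{\theta}) = \frac{1}{Z(\boldsymbol{\theta})} \exp\Big(\sum_c \boldsymbol{\theta}_c^{T} \boldsymbol{\phi}_c(\mathbf{y})\Big)

\ell(\boldsymbol{\theta}) = \frac{1}{N} \sum_i \Big[ \sum_c \boldsymbol{\theta}_c^{T} \boldsymbol{\phi}_c(\mathbf{y}^{(i)}) - \log Z(\boldsymbol{\theta}) \Big]

\frac{\partial \ell}{\partial \boldsymbol{\theta}_c} = \frac{1}{N} \sum_i \boldsymbol{\phi}_c(\mathbf{y}^{(i)}) - \mathbb{E}_{p(\mathbf{y} \mid \boldsymbol{\theta})}\big[\boldsymbol{\phi}_c(\mathbf{y})\big]

Setting the gradient to zero gives the moment-matching condition \mathbb{E}_{p_{emp}}[\boldsymbol{\phi}_c(\mathbf{y})] = \mathbb{E}_{p(\mathbf{y} \mid \boldsymbol{\theta})}[\boldsymbol{\phi}_c(\mathbf{y})].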
Learning MRFs: Latent variable models
• Log-likelihood:
• Gradient:
• Alternatively, we can use a generalized EM
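For reference, with hidden variables \mathbf{h} the corresponding standard expressions are (the slide's equations were lost in extraction):

\ell(\boldsymbol{\theta}) = \frac{1}{N} \sum_i \log \sum_{\mathbf{h}} p(\mathbf{y}^{(i)}, \mathbf{h} \mid \boldsymbol{\theta})

\frac{\partial \ell}{\partial \boldsymbol{\theta}_c} = \frac{1}{N} \sum_i \Big\{ \mathbb{E}\big[\boldsymbol{\phi}_c(\mathbf{h}, \mathbf{y}^{(i)}) \mid \mathbf{y}^{(i)}, \boldsymbol{\theta}\big] - \mathbb{E}\big[\boldsymbol{\phi}_c(\mathbf{h}, \mathbf{y}) \mid \boldsymbol{\theta}\big] \Big\}

i.e., a clamped (data-conditioned) expectation minus the unclamped model expectation.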
Approximate methods for Learning MRFs
• Pseudo-likelihood
• Stochastic maximum likelihood
• Iterative proportional fitting
• Generalized iterative scaling
Pseudo-likelihood
– 𝑃(𝒚) ≈ ∏𝑖 𝑃(𝑦𝑖 | MB(𝑦𝑖)) (a code sketch follows below)
– The representation used by pseudo-likelihood
– MB(𝑦𝑖): the Markov blanket of 𝑦𝑖
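A minimal sketch of the pseudo-log-likelihood for a binary (±1) pairwise MRF with symmetric weights W and biases b (an illustrative model choice, not the slides'): each term conditions one variable on its Markov blanket, so no partition function is required.

import numpy as np

def pseudo_log_likelihood(W, b, Y):
    """Y: (N, D) array of observed {-1, +1} configurations.
    Returns sum_n sum_i log P(y_i | MB(y_i)) for a pairwise model, where
    P(y_i = s | rest) = sigmoid(2 * s * local) and local is the total input to node i."""
    total = 0.0
    for y in np.asarray(Y, dtype=float):
        for i in range(len(y)):
            local = W[i] @ y - W[i, i] * y[i] + b[i]         # neighbors only
            # log sigmoid(z) = -log(1 + exp(-z)), computed stably
            total += -np.logaddexp(0.0, -2.0 * y[i] * local)
    return total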
Approximate methods for Learning MRFs: Stochastic maximum likelihood
• Approximate the model expectation in the gradient using MC sampling
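A minimal sketch of stochastic maximum likelihood for the same illustrative binary pairwise model: the intractable model expectation in the gradient is approximated with Gibbs samples from persistent chains (all names and the model form are assumptions for illustration).

import numpy as np

rng = np.random.default_rng(0)

def gibbs_sweep(W, b, y):
    """One in-place Gibbs sweep over a {-1, +1} pairwise model."""
    for i in range(len(y)):
        local = W[i] @ y - W[i, i] * y[i] + b[i]
        p_plus = 1.0 / (1.0 + np.exp(-2.0 * local))   # P(y_i = +1 | rest)
        y[i] = 1.0 if rng.random() < p_plus else -1.0
    return y

def sml_step(W, b, data, chains, lr=0.01, n_sweeps=1):
    """data: (N, D) observed {-1, +1} array; chains: (K, D) persistent samples.
    One gradient step: empirical moments minus Monte Carlo model moments."""
    for k in range(len(chains)):
        for _ in range(n_sweeps):
            chains[k] = gibbs_sweep(W, b, chains[k])
    emp = data.T @ data / len(data)            # empirical E[y_i y_j]
    mod = chains.T @ chains / len(chains)      # sampled estimate of model E[y_i y_j]
    W += lr * (emp - mod)
    np.fill_diagonal(W, 0.0)                   # keep the zero diagonal
    b += lr * (data.mean(axis=0) - chains.mean(axis=0))
    return W, b, chains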
Iterative proportional fitting
• The MRF can be written in log-linear form or, equivalently, in terms of clique potentials:
p(\mathbf{y} \mid \boldsymbol{\theta}) = \frac{1}{Z} \exp\Big(\sum_c \boldsymbol{\theta}_c^{T} \boldsymbol{\phi}_c(\mathbf{y})\Big) = \frac{1}{Z} \prod_c \psi_c(\mathbf{y}_c) = p(\mathbf{y} \mid \boldsymbol{\psi}), \qquad Z = \sum_{\mathbf{y}} \prod_c \psi_c(\mathbf{y}_c)
• Log-likelihood:
\log L = \sum_i \log p(\mathbf{y}^{(i)} \mid \boldsymbol{\theta}) = \sum_i \Big[ \sum_c \log \psi_c(\mathbf{y}_c^{(i)}) - \log Z \Big]
– \psi_c(\mathbf{y}_c^{(i)}): the potential function of clique c evaluated on the i-th example
• Gradient with respect to a potential entry \psi_c(y_c):
\frac{\partial \log L}{\partial \psi_c(y_c)} = \sum_i \frac{I(\mathbf{y}_c^{(i)} = y_c)}{\psi_c(y_c)} - \frac{N}{Z} \sum_{\mathbf{y} \in Y(y_c)} \prod_{c' \neq c} \psi_{c'}(\mathbf{y}_{c'}) = \sum_i \frac{I(\mathbf{y}_c^{(i)} = y_c)}{\psi_c(y_c)} - \frac{N}{Z} \sum_{\mathbf{y} \in Y(y_c)} \frac{Z\, p(\mathbf{y} \mid \boldsymbol{\psi})}{\psi_c(y_c)}
– Y(y_c): the set of all \mathbf{y} in which clique c takes the value y_c; N: the number of training examples
Iterative proportional fitting
\frac{\partial \log L}{\partial \psi_c(y_c)} = \sum_i \frac{I(\mathbf{y}_c^{(i)} = y_c)}{\psi_c(y_c)} - N \sum_{\mathbf{y} \in Y(y_c)} \frac{p(\mathbf{y} \mid \boldsymbol{\psi})}{\psi_c(y_c)} = N \left[ \frac{p_{emp}(y_c)}{\psi_c(y_c)} - \frac{p_{model}(y_c \mid \boldsymbol{\psi})}{\psi_c(y_c)} \right]
• Setting the gradient to zero gives a fixed-point algorithm, the IPF update:
\psi_c(y_c) \leftarrow \psi_c(y_c)\, \frac{p_{emp}(y_c)}{p_{model}(y_c \mid \boldsymbol{\psi})}
Iterative proportional fitting
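A minimal IPF sketch for a small discrete model where the joint can be enumerated (purely illustrative; clique and variable handling is simplified, and the marginals p_model(y_c | ψ) would normally come from an inference routine rather than brute force):

import numpy as np
from itertools import product

def ipf(cliques, cards, p_emp_marginals, n_iters=50):
    """cliques: list of tuples of variable indices, e.g. [(0, 1), (1, 2)]
    cards: list of variable cardinalities
    p_emp_marginals: dict mapping each clique to its empirical marginal table.
    Returns potential tables psi[c] whose model marginals match the empirical ones."""
    psi = {c: np.ones([cards[v] for v in c]) for c in cliques}
    states = list(product(*[range(k) for k in cards]))
    for _ in range(n_iters):
        for c in cliques:
            # brute-force joint distribution and model marginal over clique c
            joint = np.array([np.prod([psi[cl][tuple(s[v] for v in cl)]
                                       for cl in cliques]) for s in states])
            joint /= joint.sum()
            p_model = np.zeros_like(psi[c])
            for s, p in zip(states, joint):
                p_model[tuple(s[v] for v in c)] += p
            # IPF update: psi_c <- psi_c * p_emp(y_c) / p_model(y_c)
            psi[c] *= p_emp_marginals[c] / np.maximum(p_model, 1e-12)
    return psi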
Learning MRFs: Generalized iterative scaling
Markov Logic Networks
• A Markov logic network 𝐿 is a set of pairs (𝐹𝑖 , 𝑤𝑖), where 𝐹𝑖 is a formula in first-order logic and 𝑤𝑖 is a real number.
• Together with a finite set of constants 𝐶 = {𝑐1, 𝑐2, . . . , 𝑐|𝐶|}, 𝐿 defines a Markov network 𝑀𝐿,𝐶:
– 𝑀𝐿,𝐶 contains one binary node for each possible grounding of each predicate appearing in 𝐿. The value of the node is 1 if the ground atom is true, and 0 otherwise.
– 𝑀𝐿,𝐶 contains one feature for each possible grounding of each formula 𝐹𝑖 in 𝐿. The value of this feature is 1 if the ground formula is true, and 0 otherwise. The weight of the feature is the 𝑤𝑖 associated with 𝐹𝑖 in 𝐿.
Markov Logic Networks
• The probability distribution of a possible world 𝑥:
– 𝑛𝑖(𝑥): the number of true groundings of 𝐹𝑖 in 𝑥
– 𝑥{𝑖}: the state (truth values) of the atoms appearing in 𝐹𝑖
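For reference, the distribution the bullet above refers to is the standard MLN form (the slide's formula was lost in extraction):

P(X = x) = \frac{1}{Z} \exp\Big( \sum_i w_i\, n_i(x) \Big) = \frac{1}{Z} \prod_i \phi_i(x_{\{i\}})^{\,n_i(x)}

where \phi_i(x_{\{i\}}) = e^{w_i}.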
Markov Logic Networks: Example
Markov Logic Networks
• Inference
Markov Logic Networks
• Learning
– Use pseudo-likelihood to approximate the model probability
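For reference, the pseudo-log-likelihood commonly optimized for MLN weight learning is (a standard form, restated here):

\log P^{*}_{w}(X = x) = \sum_{l=1}^{n} \log P_{w}\big(X_l = x_l \mid MB_x(X_l)\big)

where n is the number of ground atoms and MB_x(X_l) is the state of the Markov blanket of X_l in x.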