Structure Learning of Partitioned Markov...

20
Structure Learning of Partitioned Markov Networks Song Liu 1 , Taiji Suzuki 2 , Masashi Sugiyama 3 and Kenji Fukumizu 1 1. The Institute of Statistical Mathematics 2. Tokyo Institute of Technology, JST PRESTO 3. University of Tokyo ICML 2016 NYC 1

Transcript of Structure Learning of Partitioned Markov...

Page 1: Structure Learning of Partitioned Markov Networksbigdata.nii.ac.jp/eratokansyasai3/wp-content/.../2016/09/0810_3_son… · Structure Learning of Partitioned Markov Networks Song Liu1,

Structure Learning of Partitioned Markov Networks

Song Liu1, Taiji Suzuki2, Masashi Sugiyama3 and Kenji Fukumizu1

1. The Institute of Statistical Mathematics2. Tokyo Institute of Technology, JST PRESTO

3. University of Tokyo

ICML 2016 NYC

1

Page 2: Structure Learning of Partitioned Markov Networksbigdata.nii.ac.jp/eratokansyasai3/wp-content/.../2016/09/0810_3_son… · Structure Learning of Partitioned Markov Networks Song Liu1,

1. Markov Network

2. Partitioned Markov Network

3. Learning Partitioned Markov Network

4. Experiments

5. Conclusion

Outline

2

Page 3: Structure Learning of Partitioned Markov Networksbigdata.nii.ac.jp/eratokansyasai3/wp-content/.../2016/09/0810_3_son… · Structure Learning of Partitioned Markov Networks Song Liu1,

• Markov Network: undirected graphical model, encoding the conditional independence between random variables.

Markov Network (MN)

𝑃(𝑋1, 𝑋2, … , 𝑋𝑚)

𝑋𝑢 and 𝑋𝑣 are notconnected

𝑋𝑢 ⊥ 𝑋𝑣| 𝑋∖ 𝑋𝑢,𝑋𝑣

𝑋𝑢

𝑋𝑣

3

Page 4: Structure Learning of Partitioned Markov Networksbigdata.nii.ac.jp/eratokansyasai3/wp-content/.../2016/09/0810_3_son… · Structure Learning of Partitioned Markov Networks Song Liu1,

• Hammersley-Clifford Theorem (Hammersley & Clifford,1971).

• Learning a sparse factorization of density function.

• 𝚯 = argmax𝚯σ𝒙 log 𝑝(𝒙;𝚯) − 𝜆 𝚯1, 𝚯 ∈ ℝ𝑚×𝑚.

• (Friedman et al., 2008)

Learning Sparse Structure of MN

Cliques 𝒄: fully connected sub-graph

1 2

𝑋𝜙(𝑥1, 𝑥2)

𝑝 𝒙 ∝ෑ

𝑐∈𝑪

𝜙(𝒙𝑐)

4

Sparsity inducing norm

Page 5: Structure Learning of Partitioned Markov Networksbigdata.nii.ac.jp/eratokansyasai3/wp-content/.../2016/09/0810_3_son… · Structure Learning of Partitioned Markov Networks Song Liu1,

• Too few samples.

• Full model is beyond our understanding.

• We are only interested in partial connections!

Why not learn a full MN?

5

Not a good idea…

Page 6: Structure Learning of Partitioned Markov Networksbigdata.nii.ac.jp/eratokansyasai3/wp-content/.../2016/09/0810_3_son… · Structure Learning of Partitioned Markov Networks Song Liu1,

1. Markov Network

2. Partitioned Markov Network

3. Learning Partitioned Markov Network

4. Experiments

5. Conclusion

Outline

6

Page 7: Structure Learning of Partitioned Markov Networksbigdata.nii.ac.jp/eratokansyasai3/wp-content/.../2016/09/0810_3_son… · Structure Learning of Partitioned Markov Networks Song Liu1,

Some networks are naturally partitioned.

Partitioned NetworksBritish Parliament

7

(WIKIPEDIA: House of Commons of the United Kingdom)

Page 8: Structure Learning of Partitioned Markov Networksbigdata.nii.ac.jp/eratokansyasai3/wp-content/.../2016/09/0810_3_son… · Structure Learning of Partitioned Markov Networks Song Liu1,

Partitioned MN

• Discover intergroup links of a partitioned

network is key in many applications

𝑋𝑣

𝑋𝑢

a full network

?

?? ?

??

?

𝑋𝑣

𝑋𝑢

a partitioned network

8

Focus!

Page 9: Structure Learning of Partitioned Markov Networksbigdata.nii.ac.jp/eratokansyasai3/wp-content/.../2016/09/0810_3_son… · Structure Learning of Partitioned Markov Networks Song Liu1,

Partitioned (Local) Markovian Property

Partitioned Markovian Property:∀𝑋𝑢 ∈ 𝑋1, ∀𝑋𝑣∈ 𝑋2,

If 𝑋𝑢 and 𝑋𝑣 are not connected, 𝑋𝑢 ⊥ 𝑋𝑣 | 𝑋\{𝑋𝑢,𝑋𝑣}

𝑋 = (𝑋1, 𝑋2) is a partition of 𝑋.

Partitioned MN does not guarantee:∀𝑋𝑢, 𝑋𝑣 ∈ 𝑋, If 𝑋𝑢 and 𝑋𝑣 are not connected,

𝑋𝑢 ⊥ 𝑋𝑣 | 𝑋\{𝑋𝑢,𝑋𝑣}

𝑋1 𝑋2

?

???

9

Page 10: Structure Learning of Partitioned Markov Networksbigdata.nii.ac.jp/eratokansyasai3/wp-content/.../2016/09/0810_3_son… · Structure Learning of Partitioned Markov Networks Song Liu1,

Local Factorization of Partitioned MN

10

Recall, the full MN factorizes over the cliques (Hammersley Clifford Theorem).

What can we say about the partitioned MN?

Cliques 𝒄: fully connected sub-graph

1 2

𝑋𝜙(𝑥1, 𝑥2)

𝑝 𝒙 ∝ෑ

𝑐∈𝑪

𝜙(𝒙𝑐)

Page 11: Structure Learning of Partitioned Markov Networksbigdata.nii.ac.jp/eratokansyasai3/wp-content/.../2016/09/0810_3_son… · Structure Learning of Partitioned Markov Networks Song Liu1,

• An analog to Hamersley-Clifford Theorem

• Factorization is over a novel structure: Passage.

Factorization of Partitioned MN

Partitioned Ratio𝑋1 𝑋2

𝑝(𝑥1, 𝑥2)𝑝(𝑥1)𝑝(𝑥2)

𝑩: the set of Passage structures.

11

𝑏∈𝑩

𝜙(𝒙𝑏)

Page 12: Structure Learning of Partitioned Markov Networksbigdata.nii.ac.jp/eratokansyasai3/wp-content/.../2016/09/0810_3_son… · Structure Learning of Partitioned Markov Networks Song Liu1,

Passage

𝑋1 𝑋2

Passage…𝑋1 𝑋2

𝑋1 𝑋2

We define passage 𝐵 a sub-graph of 𝐺, such that𝑋𝐵 ∩ 𝑋1 ≠ ∅, 𝑋𝐵 ∩ 𝑋2 ≠ ∅, ∀𝑋𝑢 ∈ 𝑋1 ∩ 𝑋𝐵 and∀𝑋𝑣 ∈ 𝑋2 ∩ 𝑋𝐵, 𝑋𝑢, 𝑋𝑣 is in the edge set.

12

Page 13: Structure Learning of Partitioned Markov Networksbigdata.nii.ac.jp/eratokansyasai3/wp-content/.../2016/09/0810_3_son… · Structure Learning of Partitioned Markov Networks Song Liu1,

1. Markov Network

2. Partitioned Markov Network

3. Learning Partitioned Markov Network

4. Experiments

5. Conclusion

Outline

13

Page 14: Structure Learning of Partitioned Markov Networksbigdata.nii.ac.jp/eratokansyasai3/wp-content/.../2016/09/0810_3_son… · Structure Learning of Partitioned Markov Networks Song Liu1,

• Knowing factorization of Partitioned Ratio = Knowing passages structures in Partitioned MN.

• Pairwise assumption:𝑝(𝑥1,𝑥2)

𝑝(𝑥1)𝑝(𝑥2)∝ ς𝑏∈𝑩𝜙 𝒙𝑏 = ς𝑏∈𝑩ς𝑋𝑢,𝑋𝑣∈𝑋𝐵

ℎ𝑢,𝑣 𝑥𝑢, 𝑥𝑣

From Partitioned Ratio to Structure

If 𝑥𝑢 ∈ 𝑋1, 𝑥𝑣 ∈ 𝑋2 do not appear in the same passage factor at least once, 𝑋𝑢 ⊥ 𝑋𝑣| \𝑋𝑢, 𝑋𝑣 (See Proposition 2 in the paper).

14

Page 15: Structure Learning of Partitioned Markov Networksbigdata.nii.ac.jp/eratokansyasai3/wp-content/.../2016/09/0810_3_son… · Structure Learning of Partitioned Markov Networks Song Liu1,

• Given the partition 𝑃 𝒙 = 𝑃 𝒙1, 𝒙2 , 𝒙 ∈ 𝑅𝑚 and

paired sample 𝒙1𝑖, 𝒙2

𝑖

𝑖=1

𝑛∼ 𝑃, PR can be modelled

as:

𝑔 𝒙; 𝜽 =1

𝑁 𝜽exp

𝑢≤𝑣

𝜽𝑢,𝑣⊤ 𝒇 𝒙𝑢,𝑣 ,

• 𝑁 𝜽 is a normalization term:

• 𝑁 𝜽 := ∫ 𝑝 𝒙1 ∫ 𝑝 𝒙2 exp σ𝑢,𝑣 𝜽𝑢,𝑣⊤ 𝒇 𝒙𝑢,𝑣 𝑑𝒙2𝑑𝒙𝟏

Learning Pairwise Partitioned Ratio

15

Page 16: Structure Learning of Partitioned Markov Networksbigdata.nii.ac.jp/eratokansyasai3/wp-content/.../2016/09/0810_3_son… · Structure Learning of Partitioned Markov Networks Song Liu1,

• 𝑁 𝜽 can be approximated by U-statistics

(Hoeffding, 1963)

• 𝑁 𝜽 ≈ 𝑁 𝜽 =12𝑛

σ𝑗≠𝑘 exp σ𝑢,𝑣𝜽𝑢,𝑣⊤ 𝒇 𝒙𝑢,𝑣

𝑗,𝑘

• where 𝒙[𝑗,𝑘] = 𝒙1𝑗, 𝒙2

𝑘.

Learning Pairwise Partitioned Ratio

መ𝜃 = argmax𝚯

𝑃X log 𝑔 𝒙; 𝜽 − 𝜆

𝑢,𝑣

||𝜽𝑢,𝑣||2

Maximum Likelihood Mutual Information: (Suzuki et al., 2008):

Included for sparsity

16

Page 17: Structure Learning of Partitioned Markov Networksbigdata.nii.ac.jp/eratokansyasai3/wp-content/.../2016/09/0810_3_son… · Structure Learning of Partitioned Markov Networks Song Liu1,

1. Markov Network

2. Partitioned Markov Network

3. Learning Partitioned Markov Network

4. Experiments

5. Conclusion

Outline

17

Page 18: Structure Learning of Partitioned Markov Networksbigdata.nii.ac.jp/eratokansyasai3/wp-content/.../2016/09/0810_3_son… · Structure Learning of Partitioned Markov Networks Song Liu1,

• Learn the Bipartisanship in U.S. 109th senate.

• 45 Democrats, 55 Republicans.

• PMN learned over 645 recorded votes (Yea, Nay, not voting).

• Increasing regularization parameter until more than 15 edges.

Experiments: Bipartisanship

vote agreementvote disagreement

18

Page 19: Structure Learning of Partitioned Markov Networksbigdata.nii.ac.jp/eratokansyasai3/wp-content/.../2016/09/0810_3_son… · Structure Learning of Partitioned Markov Networks Song Liu1,

1. Markov Network

2. Partitioned Markov Network

3. Learning Partitioned Markov Network

4. Experiments

5. Conclusion

Outline

22

Page 20: Structure Learning of Partitioned Markov Networksbigdata.nii.ac.jp/eratokansyasai3/wp-content/.../2016/09/0810_3_son… · Structure Learning of Partitioned Markov Networks Song Liu1,

• We learn interactions across two groups of random variables.

• Such interaction is expressed via Partitioned Markov Network.

• Learning sparse factorization of Partitioned Ratio leads to the discovery of interactionsbetween two groups.

Conclusion

23