
Graph Partitioning using Bayesian Inference on GPU

Carl Yang, Steven Dalton, Maxim Naumov, Michael Garland, Aydın Buluç, John D. Owens

UC Davis, NVIDIA intern

ctcyang@ucdavis.edu

March 26, 2018


Overview

1. Introduction
2. Stochastic Block Model
3. Bayesian inference for graph partitioning
4. Parallelization strategy
5. Experiments

Problem: How can we break this graph up into smaller pieces so we can understand it?

Problem definition

Problem 1: Can MCMC be sped up by using a GPU?

Problem 2: How is convergence affected?

Problem 3: Is MCMC a scalable solution to the graph clustering problem?


Related work

Minimum-cut method

Hierarchical clustering

Girvan–Newman algorithm

Modularity maximization

Clique-based methods


Generative models

Idea

Before thinking of how to partition, we should come up with a model that generates what we are looking for.

Want:

The parameters should describe block structure in a graph.

The parameter values are unknown, but can be inferred from the data and the current state in a principled, statistical way.

Stochastic Block Model (SBM)

Holland, Laskey, and Leinhardt. "Stochastic blockmodels: First steps." Social Networks 5.2 (1983).

Parameters: η_i → probability a node belongs to block i; M_rs → probability an edge exists between block r and block s.

Rules for placing N nodes in B blocks:

1. Sample b_i ∼ Cat(η) to obtain each node's colour.
2. Sample e_ij ∼ Poisson(M) to determine which two blocks r and s the edge connects.
3. Sample i ∼ Uniform(n_r) and j ∼ Uniform(n_s) to get two nodes in blocks r and s, respectively, for edge e_ij.

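To make the generative process concrete, here is a minimal NumPy sketch of the three sampling rules (our illustration, not the authors' implementation; it takes the common reading that the edge count between each block pair is Poisson with rate M_rs, and the function name sample_sbm is ours):

import numpy as np

def sample_sbm(N, eta, M, rng=None):
    """Sample a directed multigraph from the three SBM rules above.
    eta: length-B block membership probabilities; M: B x B matrix of
    expected interblock edge counts (treated as Poisson rates here)."""
    rng = np.random.default_rng() if rng is None else rng
    B = len(eta)
    b = rng.choice(B, size=N, p=eta)               # rule 1: colours ~ Cat(eta)
    edges = []
    for r in range(B):
        for s in range(B):
            nodes_r = np.flatnonzero(b == r)
            nodes_s = np.flatnonzero(b == s)
            if nodes_r.size == 0 or nodes_s.size == 0:
                continue
            for _ in range(rng.poisson(M[r, s])):  # rule 2: e_rs ~ Poisson(M_rs)
                i = rng.choice(nodes_r)            # rule 3: endpoints uniform
                j = rng.choice(nodes_s)            #         within each block
                edges.append((int(i), int(j)))
    return b, edges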

Formulate clustering as an exact recovery problem

1. Given G and b^(t), find M^(t).
2. Given G and M^(t), find arg max_b P(b | G, M). This becomes b^(t+1).



Bayesian inference

We want to find the partition b that maximizes:

$$P(b \mid G, M) = \frac{P(G \mid b, M)\,P(b, M)}{P(G)}$$

Taking negative logs of both sides, we want to minimize Σ:

$$\Sigma = \underbrace{-\log P(G \mid b, M)}_{S} \;\underbrace{-\;\log P(b, M)}_{L} \;+\; \underbrace{\log P(G)}_{\text{constant}}$$

S is the amount of information required to describe the graph when the model is known. L is the amount of information required to describe the model.

Computing terms

S can be found by counting the number of configurations of the graph. The fewer configurations, the better our model fits the graph:

$$S = \log\frac{1}{\Omega} = \log\left(\frac{\prod_{rs} M_{rs}!}{\prod_r k_r^{+}!\,\prod_r k_r^{-}!}\right)^{-1}$$

L can be found by counting (double parentheses denote multiset coefficients):

$$L = \underbrace{\log\left(\!\!\binom{B}{N}\!\!\right) + \log N! - \sum_r \log n_r!}_{b\ \text{term}} \;+\; \underbrace{\log\left(\!\!\binom{B^{2}}{E}\!\!\right)}_{M\ \text{term}}$$

Design decision: Ignore L for now in the prototype, but leave room for it to be added in the future.

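As a sanity check on the S term, a small NumPy/SciPy sketch (ours, not the authors' kernel) can evaluate the formula above with log-factorials; it assumes k_r^+ and k_r^- are the row and column sums of the interblock edge-count matrix M, i.e. the block out- and in-degrees:

import numpy as np
from scipy.special import gammaln

def log_fact(x):
    # log(x!) computed stably via the gamma function
    return gammaln(np.asarray(x, dtype=np.float64) + 1.0)

def entropy_S(M):
    """Evaluate S = -log( prod_rs M_rs! / (prod_r k_r^+! prod_r k_r^-!) )
    for a B x B interblock edge-count matrix M (assumption: k_r^+ / k_r^-
    are its row/column sums)."""
    k_out = M.sum(axis=1)
    k_in = M.sum(axis=0)
    return -(log_fact(M).sum() - log_fact(k_out).sum() - log_fact(k_in).sum())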

Intuition

[Two figure-only slides; the illustrations were not preserved in the transcript.]

Combinatorial optimization problem

So we want to find the partition b such that Σ is minimized.

However, for a graph of B blocks and N nodes, there are B^N possible partitions b for which we would need to compute that quantity.

We need an efficient way to traverse this large state space.

MCMC sampling

1. Propose a move.
2. Calculate the move acceptance probability.
3. Commit the move.

Upside: the stationary distribution will converge to the probability distribution we are trying to find.

Merge phase

[Figure-only slides illustrating a block merge move; images not preserved in the transcript.]

Nodal (MCMC) phase

[Figure-only slides illustrating a nodal move; images not preserved in the transcript.]

MCMC sampling applied to solve graph partitioning

Merge phase:
1. Propose a move.
2. Calculate the change in the objective function.
3. Get the block move that improves the objective function the most.
4. Commit the move.
5. Go to 1) until n_blocks_initial / r blocks are left.

MCMC phase:
1. Propose a move.
2. Calculate the change in the objective function.
3. Calculate the move acceptance probability.
4. Commit the move.
5. Go to 1) until the MCMC chain has converged.

Do Merge phase, MCMC phase, Merge phase, MCMC phase, etc., until the target cluster count has been reached (sketched below).

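A hedged Python sketch of that outer alternation (our own scaffolding, not the authors' code; merge_phase and mcmc_phase are supplied callables standing in for the two phases above, and the shrink factor r is a parameter):

def alternate_phases(b, n_blocks_initial, n_blocks_target,
                     merge_phase, mcmc_phase, r=2):
    """Outer driver: alternate greedy block merges and nodal MCMC sweeps
    until the target block count is reached (sketch of the loop only).
    merge_phase(b, n_blocks) merges down to n_blocks blocks;
    mcmc_phase(b) runs nodal moves until the chain converges (assumptions)."""
    n_blocks = n_blocks_initial
    while n_blocks > n_blocks_target:
        n_blocks = max(n_blocks // r, n_blocks_target)
        b = merge_phase(b, n_blocks)  # merge until n_blocks blocks remain
        b = mcmc_phase(b)             # nodal moves until convergence
    return b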

1. Propose move

A counter-based RNG allows O(1) skip-ahead for each thread.

This allows independent random numbers to be generated within a device function.
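NumPy ships a counter-based generator (Philox) that illustrates the skip-ahead property on the host side; a small sketch of the idea (ours, not the authors' CUDA code, which would use an equivalent device-side generator):

import numpy as np

# Philox is a counter-based RNG: advancing the counter is O(1), so each
# logical thread can jump straight to its own disjoint substream.
base = np.random.Philox(seed=42)

def thread_rng(thread_id):
    # jumped(n) returns a bit generator advanced by n * 2**128 draws in O(1)
    return np.random.Generator(base.jumped(thread_id))

# Each "thread" draws independent proposals without any coordination.
proposals = [int(thread_rng(t).integers(0, 10)) for t in range(4)]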

2. Calculate objective function

Problem: How do we compute the objective function as if we have already made the move, but without actually changing our graph?

Key insight: A merge move and a node move can both be expressed as the simultaneous element-wise addition of rows and columns of a matrix.

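A hedged NumPy sketch of that insight (ours): evaluate a proposed node move on the B x B interblock matrix by adding the node's edge-count contributions to the destination row/column and subtracting them from the source, without touching the graph. The name propose_move and the inputs d_out/d_in (the node's edge counts into each block) are our assumptions:

import numpy as np

def propose_move(M, r, s, d_out, d_in):
    """Return a *proposed* interblock matrix for moving one node from
    block r to block s, leaving M itself unchanged.
    d_out[t]: node's out-edges into block t; d_in[t]: in-edges from block t."""
    Mp = M.astype(np.int64, copy=True)
    Mp[r, :] -= d_out; Mp[s, :] += d_out   # move the out-edge row contribution
    Mp[:, r] -= d_in;  Mp[:, s] += d_in    # move the in-edge column contribution
    # Note: edges among blocks r and s themselves need the usual diagonal
    # correction, omitted here for brevity (assumption: caller handles it).
    return Mp

The change in the objective for the proposal is then entropy_S(Mp) - entropy_S(M), using the S sketch shown earlier.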

We have a graph

How to express in matrix notation node 1 being moved from blue to yellow?

Elementwise, move node 1's out-edge contribution from blue to yellow, then move node 1's in-edge contribution from blue to yellow. Move complete.

[Figure-only slide sequence; the step-by-step matrix illustrations were not preserved in the transcript.]

2. Calculate objective function

For sparse matrices, elementwise addition is equivalent to doing a set union.

A warp-wide sorting network allows us to do set unions using register memory.
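On the host side, the same set-union view of sparse row addition can be sketched in a few lines (our illustration; the GPU version would run the sorted merge in registers across a warp rather than calling NumPy):

import numpy as np

def add_sparse_rows(idx_a, val_a, idx_b, val_b):
    """Elementwise addition of two sparse rows as a set union:
    concatenate, sort by column index (the role of the warp-wide
    sorting network), then sum duplicate indices."""
    idx = np.concatenate([idx_a, idx_b])
    val = np.concatenate([val_a, val_b])
    order = np.argsort(idx, kind="stable")
    idx, val = idx[order], val[order]
    uniq, start = np.unique(idx, return_index=True)  # segment starts
    return uniq, np.add.reduceat(val, start)         # per-index sums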

3. Commit move

A triple matrix product is used to update the model between the Merge and MCMC phases.

Hypothesis 1: Committing merge moves in parallel does not affect the convergence rate.

Hypothesis 2: Committing MCMC moves in parallel does not affect the convergence rate.
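One plausible reading of that triple product (our assumption; the slide does not spell it out): with P the N x B one-hot block-assignment matrix, the interblock edge-count matrix is M = Pᵀ A P. A SciPy sketch:

import numpy as np
import scipy.sparse as sp

def rebuild_model(A, b, B):
    """Recompute the B x B interblock matrix M = P^T A P, where P is the
    one-hot N x B assignment matrix built from the block vector b and
    A is the sparse N x N adjacency matrix."""
    N = len(b)
    P = sp.csr_matrix((np.ones(N), (np.arange(N), b)), shape=(N, B))
    return (P.T @ A @ P).toarray()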

Parallelization summary

                        Reference impl.        Our contribution
                        CPU Seq    CPU Par     GPU Seq    GPU Par
Merge   Propose move    par        par         par        par
        Calculate obj   par        par         par        par
        Commit move     seq        seq         seq        par
MCMC    Propose move    seq        par         par        par
        Calculate obj   seq        par         par        par
        Commit move     seq        seq         par        par


Experimental Setup

Hardware:

CPU: Intel Core i7-5820K @ 3.30 GHz, 32 GB RAM
GPU: Titan Xp, 12 GB RAM

Datasets: synthetic datasets with ground-truth partitions for each node.

Nodes   50    100   1K    5K     20K    50K   500K
Edges   319   6K    20K   102K   409K   1M    10M

Speedup comparison

[Figure: Speedup comparison across the four implementations (CPU Seq, CPU Par, GPU Seq, GPU Par), plotted against number of nodes from 50 to 500000.]

Runtime breakdown

[Figure: Runtime breakdown (Build, Merge, MCMC) for the four implementations (CPU Seq, CPU Par, GPU Seq, GPU Par) on graphs of 50 to 50000 nodes.]

Rate of convergence

[Figure: Change in objective function plotted against number of moves, for GPU, CPU Seq, and CPU Par.]

Rate of convergence (in runtime)

[Figure: Change in objective function plotted against runtime in seconds, for GPU, CPU Seq, and CPU Par.]

Raw runtime numbers and accuracy

                CPU Seq            CPU Par            GPU Seq            GPU Par
Nodes     Time (s)  Acc (%)   Time (s)  Acc (%)   Time (s)  Acc (%)   Time (s)  Acc (%)
50           0.519      100      0.519      100     0.0876      100     0.0603      100
100          0.802      100      0.531       82     0.2249      100     0.1779      100
1000         5.193    81.41      0.939      100     3.153       100     1.5649      100
5000        16.443       90      2.255     81.7    27.093    92.943     3.113      87.6
20000      118.201     94.6     29.97     93.93    51.519      96.5     7.671      88.5
50000      272.249     89.8     97.68     87.15  2902.4        97.6    23.707      89.2

Takeaways

It is surprisingly easy to make MCMC converge.

However, it’s a different story to make MCMC scalable.


Future work

Use a specialized triple matrix product kernel to take advantage of knowledge about the matrix structure.

Use load-balancing methods such as TWC to handle unbalanced data.

Try newer Bayesian inference methods such as minibatch MCMC and ADVI (automatic differentiation variational inference) that claim to scale better with data size than standard MCMC.

Add multi-GPU support.

Questions?


Stochastic Block Model (SBM)

Holland, Laskey, and Leinhardt. "Stochastic blockmodels: First steps." Social Networks 5.2 (1983).

Given N nodes in B blocks:

State: b_i → block that node i belongs to

Parameters: η_i → probability a node belongs in block i; λ_rs → probability an edge exists between block r and block s

1. Sample each node i.i.d. over η_i to obtain each node's colour.
2. Sample each edge i.i.d. over Poi(λ_rs) to obtain the blocks r and s it connects. For each edge, sample one node in block r with probability 1/n_r and one node in block s with probability 1/n_s to determine which two nodes the edge connects.

Stochastic Block Model (SBM)

The probability of generating a graph G and partition b given parameters η, λ, assuming a Bernoulli edge distribution, is:

$$P(G \mid b, M) = \prod_i \eta_{b_i} \prod_{i<j} \lambda_{b_i b_j}^{A_{ij}} \left(1 - \lambda_{b_i b_j}\right)^{1 - A_{ij}}$$
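A direct log-space transcription of that likelihood (our sketch; it assumes a dense 0/1 adjacency matrix A, an integer block vector b, and parameter arrays eta and lam):

import numpy as np

def log_likelihood(A, b, eta, lam):
    """Log of the Bernoulli-SBM expression above: sum of log eta_{b_i}
    plus, over pairs i < j, log lambda or log(1 - lambda)."""
    N = A.shape[0]
    Lrs = lam[np.ix_(b, b)]              # lambda_{b_i b_j} for every pair
    iu = np.triu_indices(N, k=1)         # keep i < j pairs only
    edge_terms = np.where(A[iu] == 1,
                          np.log(Lrs[iu]),
                          np.log1p(-Lrs[iu]))
    return np.log(eta[b]).sum() + edge_terms.sum()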

Variant of SBM we will use

Non-parametric: use a Bayesian formulation instead of maximum likelihood. This solves the over-fitting problem.

Degree-corrected: add additional parameters k_i for every node i representing its propensity for high degree. This accounts for the power-law degree distribution that many real-world graphs exhibit.

Expression

Taking negative logs of both sides:

$$-\log P(b \mid G, M) = \underbrace{-\log P(G \mid b, M)}_{S} \;\underbrace{-\;\log P(b, M)}_{L} \;+\; \underbrace{\log P(G)}_{\text{constant}}$$

Sequential MCMC for graph partitioning

Input: b: N × 1 current block assignment vector; M: B × B interblock edge-count matrix; A: N × N adjacency matrix.

1: procedure MCMCSequential(b, M, A)
2:   for each node i do
3:     Propose a random move for i: block r → s
4:     Acceptance probability:
5:       $p_{\text{accept}} = \min\left[\exp(-\beta\,\Delta S)\,\frac{p_{s \to r}}{p_{r \to s}},\; 1\right]$
6:     Perform the move by updating b, M
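A minimal host-side sketch of one sweep of this procedure (ours, not the authors' kernel; delta_S and proposal_prob are supplied callables standing in for the ΔS computation and the proposal probabilities p_{r→s} above):

import numpy as np

def mcmc_sweep(b, B, delta_S, proposal_prob, beta=1.0, rng=None):
    """One variable-at-a-time sweep: for each node, propose a block move
    r -> s and accept with min(exp(-beta * dS) * p(s->r)/p(r->s), 1).
    delta_S(i, r, s) and proposal_prob(r, s) are assumed helpers."""
    rng = np.random.default_rng() if rng is None else rng
    for i in range(len(b)):
        r = int(b[i])
        s = int(rng.integers(B))          # propose a random destination block
        if s == r:
            continue
        dS = delta_S(i, r, s)
        hastings = proposal_prob(s, r) / proposal_prob(r, s)
        if rng.random() < min(np.exp(-beta * dS) * hastings, 1.0):
            b[i] = s                      # commit the move
    return b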


Generative models

Idea

Before thinking of how to partition, we should come up with a model of what we are looking for.

The parameters should describe block structure.

The parameter values are unknown, but can be inferred from the data and the current state in a principled, statistical way.

Generative models: Sketch of algorithm

Given data G and an initial guess of the partition b^(0), we can compute M^(1) and b^(1):

1. Compute model parameters M^(1) using G and b^(0).
2. Make a better guess for partition b^(1) using Bayesian inference:

$$\arg\max_b P(b \mid G, M) = \arg\max_b \frac{P(G \mid b, M)\,P(b, M)}{P(G)}$$


Variable-at-a-time Metropolis-Hastings

Algorithm 1: Sequential MCMC.

Input: b^0: N × 1 state vector initialized randomly
Output: b^T: N × 1 vector distributed according to the stationary distribution

1: for iteration t = 1, 2, ... do
2:   for node i = 1, 2, ..., N do
3:     Propose: $b_i^{\text{cand}} \sim q(b_i^t \mid b^{t-1})$
4:     Acceptance probability:
       $$\alpha = \min\left(\frac{q(b_i^{t-1} \mid b_i^{\text{cand}})\,\pi(b_i^{\text{cand}})}{q(b_i^{\text{cand}} \mid b_i^{t-1})\,\pi(b_i^{t-1})},\; 1\right)$$
5:     u ∼ Uniform(0, 1)
6:     if u < α then
7:       Accept proposal: $b_i^t \leftarrow b_i^{\text{cand}}$
8:     else
9:       Reject proposal: $b_i^t \leftarrow b_i^{t-1}$

Where SBM fits into machine learning

Hidden Markov Model

Latent Variable Model

Variational auto-encoders
