CS246: Mining Massive Datasets Jure Leskovec,...

39
CS246: Mining Massive Datasets Jure Leskovec, Stanford University http://cs246.stanford.edu

Transcript of CS246: Mining Massive Datasets Jure Leskovec,...

Page 1: CS246: Mining Massive Datasets Jure Leskovec, ...snap.stanford.edu/class/cs246-2014/slides/12-graphs2.pdfInternational Conference on Web Search and Data Mining (WSDM), 2014. Community

CS246: Mining Massive Datasets Jure Leskovec, Stanford University

http://cs246.stanford.edu

Page 2: CS246: Mining Massive Datasets Jure Leskovec, ...snap.stanford.edu/class/cs246-2014/slides/12-graphs2.pdfInternational Conference on Web Search and Data Mining (WSDM), 2014. Community

2

Nodes: Football Teams Edges: Games played

Can we identify node groups?

(communities, modules, clusters)

2/13/2014 Jure Leskovec, Stanford C246: Mining Massive Datasets

Page 3: CS246: Mining Massive Datasets Jure Leskovec, ...snap.stanford.edu/class/cs246-2014/slides/12-graphs2.pdfInternational Conference on Web Search and Data Mining (WSDM), 2014. Community

3

NCAA conferences

Nodes: Football Teams Edges: Games played

2/13/2014 Jure Leskovec, Stanford C246: Mining Massive Datasets

Page 4: CS246: Mining Massive Datasets Jure Leskovec, ...snap.stanford.edu/class/cs246-2014/slides/12-graphs2.pdfInternational Conference on Web Search and Data Mining (WSDM), 2014. Community

4

Can we identify functional modules?

Nodes: Proteins Edges: Physical interactions

2/13/2014 Jure Leskovec, Stanford C246: Mining Massive Datasets

Page 5: CS246: Mining Massive Datasets Jure Leskovec, ...snap.stanford.edu/class/cs246-2014/slides/12-graphs2.pdfInternational Conference on Web Search and Data Mining (WSDM), 2014. Community

5

Functional modules

Nodes: Proteins Edges: Physical interactions

2/13/2014 Jure Leskovec, Stanford C246: Mining Massive Datasets

Page 6: CS246: Mining Massive Datasets Jure Leskovec, ...snap.stanford.edu/class/cs246-2014/slides/12-graphs2.pdfInternational Conference on Web Search and Data Mining (WSDM), 2014. Community

6

Can we identify social communities?

Nodes: Facebook Users Edges: Friendships

2/13/2014 Jure Leskovec, Stanford C246: Mining Massive Datasets

Page 7: CS246: Mining Massive Datasets Jure Leskovec, ...snap.stanford.edu/class/cs246-2014/slides/12-graphs2.pdfInternational Conference on Web Search and Data Mining (WSDM), 2014. Community

7

High school Summer internship

Stanford (Squash) Stanford (Basketball)

Social communities

Nodes: Facebook Users Edges: Friendships

2/13/2014 Jure Leskovec, Stanford C246: Mining Massive Datasets

Page 8: CS246: Mining Massive Datasets Jure Leskovec, ...snap.stanford.edu/class/cs246-2014/slides/12-graphs2.pdfInternational Conference on Web Search and Data Mining (WSDM), 2014. Community

Non-overlapping vs. overlapping communities

2/13/2014 Jure Leskovec, Stanford C246: Mining Massive Datasets 8

Page 9: CS246: Mining Massive Datasets Jure Leskovec, ...snap.stanford.edu/class/cs246-2014/slides/12-graphs2.pdfInternational Conference on Web Search and Data Mining (WSDM), 2014. Community

9

Network Adjacency matrix

Nodes

Nod

es

2/13/2014 Jure Leskovec, Stanford C246: Mining Massive Datasets

Page 10: CS246: Mining Massive Datasets Jure Leskovec, ...snap.stanford.edu/class/cs246-2014/slides/12-graphs2.pdfInternational Conference on Web Search and Data Mining (WSDM), 2014. Community

What is the structure of community overlaps: Edge density in the overlaps is higher!

10

Communities as โ€œtilesโ€ 2/13/2014 Jure Leskovec, Stanford C246: Mining Massive Datasets

Page 11: CS246: Mining Massive Datasets Jure Leskovec, ...snap.stanford.edu/class/cs246-2014/slides/12-graphs2.pdfInternational Conference on Web Search and Data Mining (WSDM), 2014. Community

11

This is what we want! Communities in a network

2/13/2014 Jure Leskovec, Stanford C246: Mining Massive Datasets

Page 12: CS246: Mining Massive Datasets Jure Leskovec, ...snap.stanford.edu/class/cs246-2014/slides/12-graphs2.pdfInternational Conference on Web Search and Data Mining (WSDM), 2014. Community

1) Given a model, we generate the network:

2) Given a network, find the โ€œbestโ€ model

2/13/2014 Jure Leskovec, Stanford C246: Mining Massive Datasets 12

C

A

B

D E

H

F

G

C

A

B

D E

H

F

G

Generative model for networks

Generative model for networks

Page 13: CS246: Mining Massive Datasets Jure Leskovec, ...snap.stanford.edu/class/cs246-2014/slides/12-graphs2.pdfInternational Conference on Web Search and Data Mining (WSDM), 2014. Community

Goal: Define a model that can generate networks The model will have a set of โ€œparametersโ€ that we

will later want to estimate (and detect communities)

Q: Given a set of nodes, how do communities โ€œgenerateโ€ edges of the network?

2/13/2014 Jure Leskovec, Stanford C246: Mining Massive Datasets 13

C

A

B

D E

H

F

G

Generative model for networks

Page 14: CS246: Mining Massive Datasets Jure Leskovec, ...snap.stanford.edu/class/cs246-2014/slides/12-graphs2.pdfInternational Conference on Web Search and Data Mining (WSDM), 2014. Community

Generative model B(V, C, M, {pc}) for graphs: Nodes V, Communities C, Memberships M Each community c has a single probability pc

Later we fit the model to networks to detect communities

14

Model

Network

Communities, C

Nodes, V

Model

pA pB

Memberships, M

2/13/2014 Jure Leskovec, Stanford C246: Mining Massive Datasets

Page 15: CS246: Mining Massive Datasets Jure Leskovec, ...snap.stanford.edu/class/cs246-2014/slides/12-graphs2.pdfInternational Conference on Web Search and Data Mining (WSDM), 2014. Community

AGM generates the links: For each For each pair of nodes in community ๐‘จ๐‘จ,

we connect them with prob. ๐’‘๐’‘๐‘จ๐‘จ The overall edge probability is:

15

Model

โˆโˆฉโˆˆ

โˆ’โˆ’=vu MMc

cpvuP )1(1),(

Network

Communities, C

Nodes, V Community Affiliations

pA pB

Memberships, M

If ๐’–๐’–,๐’—๐’— share no communities: ๐‘ท๐‘ท ๐’–๐’–,๐’—๐’— = ๐œบ๐œบ Think of this as an โ€œORโ€ function: If at least 1 community says โ€œYESโ€ we create an edge

๐‘ด๐‘ด๐’–๐’– โ€ฆ set of communities node ๐’–๐’– belongs to

2/13/2014 Jure Leskovec, Stanford C246: Mining Massive Datasets

Page 16: CS246: Mining Massive Datasets Jure Leskovec, ...snap.stanford.edu/class/cs246-2014/slides/12-graphs2.pdfInternational Conference on Web Search and Data Mining (WSDM), 2014. Community

16

Model

Network 2/13/2014 Jure Leskovec, Stanford C246: Mining Massive Datasets

Page 17: CS246: Mining Massive Datasets Jure Leskovec, ...snap.stanford.edu/class/cs246-2014/slides/12-graphs2.pdfInternational Conference on Web Search and Data Mining (WSDM), 2014. Community

AGM can express a variety of community structures: Non-overlapping, Overlapping, Nested

17 2/13/2014 Jure Leskovec, Stanford C246: Mining Massive Datasets

Page 18: CS246: Mining Massive Datasets Jure Leskovec, ...snap.stanford.edu/class/cs246-2014/slides/12-graphs2.pdfInternational Conference on Web Search and Data Mining (WSDM), 2014. Community

Detecting communities with AGM:

18

C

A

B

D E

H

F

G

Given a Graph, find the Model 1) Affiliation graph M 2) Number of communities C 3) Parameters pc

2/13/2014 Jure Leskovec, Stanford C246: Mining Massive Datasets

Page 19: CS246: Mining Massive Datasets Jure Leskovec, ...snap.stanford.edu/class/cs246-2014/slides/12-graphs2.pdfInternational Conference on Web Search and Data Mining (WSDM), 2014. Community

Maximum Likelihood Principle (MLE): Given: Data ๐‘ฟ๐‘ฟ Assumption: Data is generated by some model ๐’‡๐’‡(๐šฏ๐šฏ) ๐’‡๐’‡ โ€ฆ model ๐šฏ๐šฏ โ€ฆ model parameters

Want to estimate ๐‘ท๐‘ท๐’‡๐’‡ ๐‘ฟ๐‘ฟ ๐šฏ๐šฏ): The probability that our model ๐’‡๐’‡ (with parameters ๐œฃ๐œฃ)

generated the data

Now letโ€™s find the most likely model that could have generated the data: arg max

ฮ˜ ๐‘ท๐‘ท๐’‡๐’‡ ๐‘ฟ๐‘ฟ ๐šฏ๐šฏ)

2/13/2014 Jure Leskovec, Stanford C246: Mining Massive Datasets 19

Page 20: CS246: Mining Massive Datasets Jure Leskovec, ...snap.stanford.edu/class/cs246-2014/slides/12-graphs2.pdfInternational Conference on Web Search and Data Mining (WSDM), 2014. Community

Imagine we are given a set of coin flips Task: Figure the bias of a coin! Data: Sequence of coin flips: ๐‘ฟ๐‘ฟ = [๐Ÿ๐Ÿ,๐ŸŽ๐ŸŽ,๐ŸŽ๐ŸŽ,๐ŸŽ๐ŸŽ,๐Ÿ๐Ÿ,๐ŸŽ๐ŸŽ,๐ŸŽ๐ŸŽ,๐Ÿ๐Ÿ] Model: ๐’‡๐’‡ ๐šฏ๐šฏ = return 1 with prob. ฮ˜, else return 0 What is ๐‘ท๐‘ท๐’‡๐’‡ ๐‘ฟ๐‘ฟ ๐šฏ๐šฏ ? Assuming coin flips are independent So, ๐‘ท๐‘ท๐’‡๐’‡ ๐‘ฟ๐‘ฟ ๐šฏ๐šฏ = ๐‘ท๐‘ท๐’‡๐’‡ ๐ŸŽ๐ŸŽ ๐šฏ๐šฏ โˆ— ๐‘ท๐‘ท๐’‡๐’‡ ๐Ÿ๐Ÿ ๐šฏ๐šฏ โˆ— ๐‘ท๐‘ท๐’‡๐’‡ ๐Ÿ๐Ÿ ๐šฏ๐šฏ โ€ฆ โˆ— ๐‘ท๐‘ท๐’‡๐’‡ ๐ŸŽ๐ŸŽ ๐šฏ๐šฏ What is ๐‘ท๐‘ท๐’‡๐’‡ ๐Ÿ๐Ÿ ๐šฏ๐šฏ ? Simple, ๐‘ท๐‘ท๐’‡๐’‡ ๐ŸŽ๐ŸŽ ๐šฏ๐šฏ = ๐šฏ๐šฏ

Then, ๐‘ท๐‘ท๐’‡๐’‡ ๐‘ฟ๐‘ฟ ๐šฏ๐šฏ = ๐šฏ๐šฏ๐Ÿ‘๐Ÿ‘ ๐Ÿ๐Ÿ โˆ’ ๐šฏ๐šฏ ๐Ÿ“๐Ÿ“ For example: ๐‘ท๐‘ท๐’‡๐’‡ ๐‘ฟ๐‘ฟ ๐šฏ๐šฏ = ๐ŸŽ๐ŸŽ.๐Ÿ“๐Ÿ“ = ๐ŸŽ๐ŸŽ.๐ŸŽ๐ŸŽ๐ŸŽ๐ŸŽ๐Ÿ‘๐Ÿ‘๐ŸŽ๐ŸŽ๐ŸŽ๐ŸŽ๐ŸŽ๐ŸŽ

๐‘ท๐‘ท๐’‡๐’‡ ๐‘ฟ๐‘ฟ ๐šฏ๐šฏ = ๐Ÿ‘๐Ÿ‘๐Ÿ–๐Ÿ–

= ๐ŸŽ๐ŸŽ.๐ŸŽ๐ŸŽ๐ŸŽ๐ŸŽ๐Ÿ“๐Ÿ“๐ŸŽ๐ŸŽ๐ŸŽ๐ŸŽ๐ŸŽ๐ŸŽ

What did we learn? Our data was most likely generated by coin with bias ๐šฏ๐šฏ = ๐Ÿ‘๐Ÿ‘/๐Ÿ–๐Ÿ–

2/13/2014 Jure Leskovec, Stanford C246: Mining Massive Datasets 20 ๐‘ท๐‘ท๐’‡๐’‡๐‘ฟ๐‘ฟ๐šฏ๐šฏ

๐šฏ๐šฏ

๐šฏ๐šฏโˆ— = ๐Ÿ‘๐Ÿ‘/๐Ÿ–๐Ÿ–

Page 21: CS246: Mining Massive Datasets Jure Leskovec, ...snap.stanford.edu/class/cs246-2014/slides/12-graphs2.pdfInternational Conference on Web Search and Data Mining (WSDM), 2014. Community

How do we do MLE for graphs? Model generates a probabilistic adjacency matrix We then flip all the entries of the probabilistic

matrix to obtain the binary adjacency matrix

The likelihood of AGM generating graph G:

2/13/2014 Jure Leskovec, Stanford C246: Mining Massive Datasets 21

0.25 0.10 0.10 0.04 0.05 0.15 0.02 0.06 0.05 0.02 0.15 0.06 0.01 0.03 0.03 0.09

1 1 0 0 0 1 0 1 1 0 1 1 0 1 0 1

For every pair of nodes ๐’–๐’–,๐’—๐’— AGM gives the prob. ๐’‘๐’‘๐’–๐’–๐’—๐’— of them being linked

Flip biased coins

)),(1(),()|(),(),(

vuPvuPGPGvuGvu

โˆ’ฮ ฮ =ฮ˜โˆ‰โˆˆ

Page 22: CS246: Mining Massive Datasets Jure Leskovec, ...snap.stanford.edu/class/cs246-2014/slides/12-graphs2.pdfInternational Conference on Web Search and Data Mining (WSDM), 2014. Community

Given graph G and ฮ˜ we calculate likelihood that ฮ˜ generated G: P(G|ฮ˜)

0.25 0.10 0.10 0.04

0.05 0.15 0.02 0.06

0.05 0.02 0.15 0.06

0.01 0.03 0.03 0.09 ฮ˜=B(V, C, M, {pc})

1 0 1 1

0 1 0 1

1 0 1 1

1 1 1 1

G

P(G|ฮ˜)

)),(1(),()|(),(),(

vuPvuPGPGvuGvu

โˆ’ฮ ฮ =ฮ˜โˆ‰โˆˆ

G

2/13/2014 22 Jure Leskovec, Stanford C246: Mining Massive Datasets

Page 23: CS246: Mining Massive Datasets Jure Leskovec, ...snap.stanford.edu/class/cs246-2014/slides/12-graphs2.pdfInternational Conference on Web Search and Data Mining (WSDM), 2014. Community

Our goal: Find ๐šฏ๐šฏ = ๐‘ฉ๐‘ฉ(๐‘ฝ๐‘ฝ,๐‘ช๐‘ช,๐‘ด๐‘ด, ๐’‘๐’‘๐‘ช๐‘ช ) such that:

How do we find ๐‘ฉ๐‘ฉ(๐‘ฝ๐‘ฝ,๐‘ช๐‘ช,๐‘ด๐‘ด, ๐’‘๐’‘๐‘ช๐‘ช ) that maximizes the likelihood? No nice way, have to search over

all affiliation graphs But, donโ€™t despairโ€ฆ

2/13/2014 Jure Leskovec, Stanford C246: Mining Massive Datasets 23

ฮ˜P( | ) AGM

arg max ฮ˜

๐‘ฎ๐‘ฎ

Page 24: CS246: Mining Massive Datasets Jure Leskovec, ...snap.stanford.edu/class/cs246-2014/slides/12-graphs2.pdfInternational Conference on Web Search and Data Mining (WSDM), 2014. Community

Relaxation: Memberships have strengths

๐‘ญ๐‘ญ๐’–๐’–๐‘จ๐‘จ: The membership strength of node ๐’–๐’– to community ๐‘จ๐‘จ (๐‘ญ๐‘ญ๐’–๐’–๐‘จ๐‘จ = ๐ŸŽ๐ŸŽ: no membership) Each community ๐‘จ๐‘จ links nodes independently:

๐‘ท๐‘ท๐‘จ๐‘จ ๐’–๐’–,๐’—๐’— = ๐Ÿ๐Ÿ โˆ’ ๐ž๐ž๐ž๐ž๐ž๐ž (โˆ’๐‘ญ๐‘ญ๐’–๐’–๐‘จ๐‘จ โ‹… ๐‘ญ๐‘ญ๐’—๐’—๐‘จ๐‘จ)

24

๐‘ญ๐‘ญ๐’–๐’–๐‘จ๐‘จ

u v

2/13/2014 Jure Leskovec, Stanford C246: Mining Massive Datasets

Page 25: CS246: Mining Massive Datasets Jure Leskovec, ...snap.stanford.edu/class/cs246-2014/slides/12-graphs2.pdfInternational Conference on Web Search and Data Mining (WSDM), 2014. Community

Community membership strength matrix ๐‘ญ๐‘ญ

2/13/2014 Jure Leskovec, Stanford C246: Mining Massive Datasets 25

๐‘ญ๐‘ญ =

j

Communities

Nod

es

๐‘ญ๐‘ญ๐’–๐’–๐‘จ๐‘จ โ€ฆ strength of ๐’–๐’–โ€™s membership to ๐‘จ๐‘จ

๐‘ญ๐‘ญ๐’—๐’— โ€ฆ vector of community membership strengths of ๐’–๐’–

๐‘ท๐‘ท๐‘จ๐‘จ ๐’–๐’–,๐’—๐’— = ๐Ÿ๐Ÿ โˆ’ ๐ž๐ž๐ž๐ž๐ž๐ž (โˆ’๐‘ญ๐‘ญ๐’–๐’–๐‘จ๐‘จ โ‹… ๐‘ญ๐‘ญ๐’—๐’—๐‘จ๐‘จ) Probability of connection is

proportional to the product of strengths

Prob. that at least one common community ๐‘ช๐‘ช links the nodes: ๐‘ท๐‘ท ๐’–๐’–,๐’—๐’— = ๐Ÿ๐Ÿ โˆ’ โˆ ๐Ÿ๐Ÿ โˆ’ ๐‘ท๐‘ท๐‘ช๐‘ช ๐’–๐’–,๐’—๐’—๐‘ช๐‘ช

Page 26: CS246: Mining Massive Datasets Jure Leskovec, ...snap.stanford.edu/class/cs246-2014/slides/12-graphs2.pdfInternational Conference on Web Search and Data Mining (WSDM), 2014. Community

Community ๐‘จ๐‘จ links nodes ๐’–๐’–,๐’—๐’— independently: ๐‘ท๐‘ท๐‘จ๐‘จ ๐’–๐’–,๐’—๐’— = ๐Ÿ๐Ÿ โˆ’ ๐ž๐ž๐ž๐ž๐ž๐ž (โˆ’๐‘ญ๐‘ญ๐’–๐’–๐‘จ๐‘จ โ‹… ๐‘ญ๐‘ญ๐’—๐’—๐‘จ๐‘จ)

Then prob. at least one common ๐‘ช๐‘ช links them: ๐‘ท๐‘ท ๐’–๐’–,๐’—๐’— = ๐Ÿ๐Ÿ โˆ’ โˆ ๐Ÿ๐Ÿ โˆ’ ๐‘ท๐‘ท๐‘ช๐‘ช ๐’–๐’–,๐’—๐’—๐‘ช๐‘ช = ๐Ÿ๐Ÿ โˆ’ ๐ž๐ž๐ž๐ž๐ž๐ž (โˆ’โˆ‘ ๐‘ญ๐‘ญ๐’–๐’–๐‘ช๐‘ช โ‹… ๐‘ญ๐‘ญ๐’—๐’—๐‘ช๐‘ช๐‘ช๐‘ช )

= ๐Ÿ๐Ÿ โˆ’ ๐ž๐ž๐ž๐ž๐ž๐ž (โˆ’๐‘ญ๐‘ญ๐’–๐’– โ‹… ๐‘ญ๐‘ญ๐’—๐’—๐‘ป๐‘ป) For example:

26

๐‘ญ๐‘ญ๐’–๐’–:

๐‘ญ๐‘ญ๐’—๐’—: Then: ๐‘ญ๐‘ญ๐’–๐’– โ‹… ๐‘ญ๐‘ญ๐’—๐’—๐‘ป๐‘ป = ๐ŸŽ๐ŸŽ.๐Ÿ๐Ÿ๐ŸŽ๐ŸŽ And: ๐‘ท๐‘ท ๐’–๐’–,๐’—๐’— = ๐Ÿ๐Ÿ โˆ’ ๐’†๐’†๐’†๐’†๐’‘๐’‘ โˆ’๐ŸŽ๐ŸŽ.๐Ÿ๐Ÿ๐ŸŽ๐ŸŽ = ๐ŸŽ๐ŸŽ.๐Ÿ๐Ÿ๐Ÿ๐Ÿ But: ๐‘ท๐‘ท ๐’–๐’–,๐’˜๐’˜ = ๐ŸŽ๐ŸŽ.๐Ÿ–๐Ÿ–๐Ÿ–๐Ÿ– ๐‘ท๐‘ท ๐’—๐’—,๐’˜๐’˜ = ๐ŸŽ๐ŸŽ

๐‘ญ๐‘ญ๐’˜๐’˜: Node community

membership strengths 2/13/2014 Jure Leskovec, Stanford C246: Mining Massive Datasets

0 1.2 0 0.2

0.5 0 0 0.8

0 1.8 1 0

Page 27: CS246: Mining Massive Datasets Jure Leskovec, ...snap.stanford.edu/class/cs246-2014/slides/12-graphs2.pdfInternational Conference on Web Search and Data Mining (WSDM), 2014. Community

Another way to think about this is: Think of a weighted graph (but we donโ€™t see the

weights) ๐’–๐’–,๐’—๐’— interact with strength ๐‘ฟ๐‘ฟ๐’–๐’–๐’—๐’—๐‘จ๐‘จ ~๐‘ƒ๐‘ƒ๐‘ƒ๐‘ƒ๐‘ƒ๐‘ƒ๐‘ƒ๐‘ƒ(๐‘ญ๐‘ญ๐’–๐’–๐‘จ๐‘จ โ‹… ๐‘ญ๐‘ญ๐’—๐’—๐‘จ๐‘จ) Total interaction strength is the sum: ๐‘ฟ๐‘ฟ๐’–๐’–๐’—๐’— = โˆ‘ ๐‘ฟ๐‘ฟ๐’–๐’–๐’—๐’—๐‘ช๐‘ช๐‘ช๐‘ช Then: ๐‘ฟ๐‘ฟ๐’–๐’–๐’—๐’—~๐‘ƒ๐‘ƒ๐‘ƒ๐‘ƒ๐‘ƒ๐‘ƒ๐‘ƒ๐‘ƒ(โˆ‘ ๐‘ญ๐‘ญ๐’–๐’–๐‘ช๐‘ช โ‹… ๐‘ญ๐‘ญ๐’—๐’—๐‘ช๐‘ช๐ถ๐ถ ) We observe an edge if there is non-zero interaction: ๐‘ƒ๐‘ƒ ๐‘‹๐‘‹๐‘ข๐‘ข๐‘ข๐‘ข > 0 = 1 โˆ’ ๐‘ƒ๐‘ƒ ๐‘‹๐‘‹๐‘ข๐‘ข๐‘ข๐‘ข = 0 = = ๐Ÿ๐Ÿ โˆ’ ๐ž๐ž๐ž๐ž๐ž๐ž (โˆ’๐‘ญ๐‘ญ๐’–๐’– โ‹… ๐‘ญ๐‘ญ๐’—๐’—๐‘ป๐‘ป)

27 2/13/2014 Jure Leskovec, Stanford C246: Mining Massive Datasets

Poisson PMF:

๐‘ท๐‘ท(๐‘ฟ๐‘ฟ = ๐’†๐’†) =๐€๐€๐’Œ๐’Œ

๐’Œ๐’Œ!๐’†๐’†โˆ’๐€๐€

Page 28: CS246: Mining Massive Datasets Jure Leskovec, ...snap.stanford.edu/class/cs246-2014/slides/12-graphs2.pdfInternational Conference on Web Search and Data Mining (WSDM), 2014. Community

Task: Given a network, estimate ๐‘ญ๐‘ญ Find ๐‘ญ๐‘ญ that maximizes the log-likelihood Many times we take the logarithm of the likelihood and

call it log-likelihood: ๐‘™๐‘™ ๐น๐น = log ๐‘ƒ๐‘ƒ(๐บ๐บ|๐น๐น)

28

Objective function is biconvex Non-negative matrix factorization: Update ๐‘ญ๐‘ญ๐’–๐’–๐‘ช๐‘ช for node ๐’–๐’– while fixing the

memberships of all other nodes

[wsdm โ€˜13]

2/13/2014 Jure Leskovec, Stanford C246: Mining Massive Datasets

Page 29: CS246: Mining Massive Datasets Jure Leskovec, ...snap.stanford.edu/class/cs246-2014/slides/12-graphs2.pdfInternational Conference on Web Search and Data Mining (WSDM), 2014. Community

Task: Given a network, estimate ๐‘ญ๐‘ญ Find ๐‘ญ๐‘ญ that maximizes the log-likelihood Many times we take the logarithm of the likelihood and

call it log-likelihood: ๐’๐’ ๐‘ญ๐‘ญ = ๐ฅ๐ฅ๐ฅ๐ฅ๐ฅ๐ฅ ๐‘ท๐‘ท(๐‘ฎ๐‘ฎ|๐‘ญ๐‘ญ)

29

Connection: Non-negative matrix factorization We want to โ€œfactorโ€ the adjacency matrix G

F FT Matrix of

edge probs.

๐Ÿ๐Ÿ โˆ’ ๐’†๐’†๐’†๐’†๐’‘๐’‘ (โˆ’ . ) = 2/13/2014 Jure Leskovec, Stanford C246: Mining Massive Datasets

Graph G

Page 30: CS246: Mining Massive Datasets Jure Leskovec, ...snap.stanford.edu/class/cs246-2014/slides/12-graphs2.pdfInternational Conference on Web Search and Data Mining (WSDM), 2014. Community

Non-negative matrix factorization: Use a gradient based approach! But updating the whole ๐‘ญ๐‘ญ takes too long Computing the gradient takes quadratic time! Our network are far too big to allow for that: Network with 1M nodes, can have 1012 different edges

Instead, update ๐‘ญ๐‘ญ in small โ€œunitsโ€: Update ๐‘ญ๐‘ญ๐’–๐’–๐‘ช๐‘ช for node ๐’–๐’– while fixing the memberships of

all other nodes

2/13/2014 Jure Leskovec, Stanford C246: Mining Massive Datasets 30

Page 31: CS246: Mining Massive Datasets Jure Leskovec, ...snap.stanford.edu/class/cs246-2014/slides/12-graphs2.pdfInternational Conference on Web Search and Data Mining (WSDM), 2014. Community

Compute gradient of a single row ๐‘ญ๐‘ญ๐’–๐’– of ๐‘ญ๐‘ญ:

Coordinate gradient ascent: Iterate over the rows of ๐‘ญ๐‘ญ: Compute gradient ๐œต๐œต๐’๐’ ๐‘ญ๐‘ญ๐’–๐’– of row ๐’–๐’– (while keeping others fixed) Update the row ๐‘ญ๐‘ญ๐’–๐’–: ๐‘ญ๐‘ญ๐’–๐’– โ† ๐‘ญ๐‘ญ๐’–๐’– + ๐œผ๐œผ ๐›๐›๐’๐’(๐‘ญ๐‘ญ๐’–๐’–) Project ๐‘ญ๐‘ญ๐’–๐’– back to a non-negative vector: If ๐‘ญ๐‘ญ๐’–๐’–๐‘ช๐‘ช < ๐ŸŽ๐ŸŽ: ๐‘ญ๐‘ญ๐’–๐’–๐‘ช๐‘ช = ๐ŸŽ๐ŸŽ

This is slow! Computing ๐œต๐œต๐’๐’ ๐‘ญ๐‘ญ๐’–๐’– takes linear time!

31 2/13/2014 Jure Leskovec, Stanford C246: Mining Massive Datasets

Page 32: CS246: Mining Massive Datasets Jure Leskovec, ...snap.stanford.edu/class/cs246-2014/slides/12-graphs2.pdfInternational Conference on Web Search and Data Mining (WSDM), 2014. Community

However, we notice:

We can cache โˆ‘ ๐‘ญ๐‘ญ๐’—๐’—๐’—๐’— So, computing โˆ‘ ๐‘ญ๐‘ญ๐’—๐’—๐’—๐’—โˆ‰๐“๐“(๐’–๐’–) now takes linear time

in the degree ๐“๐“(๐’–๐’–) of ๐’–๐’– In networks degree of a node is much smaller to the total

number of nodes in the network, so this is a significant speedup!

32 2/13/2014 Jure Leskovec, Stanford C246: Mining Massive Datasets

Page 33: CS246: Mining Massive Datasets Jure Leskovec, ...snap.stanford.edu/class/cs246-2014/slides/12-graphs2.pdfInternational Conference on Web Search and Data Mining (WSDM), 2014. Community

BigCLAM takes 5 minutes for 300k node nets Other methods take 10 days

Can process networks with 100M edges! 33

Page 34: CS246: Mining Massive Datasets Jure Leskovec, ...snap.stanford.edu/class/cs246-2014/slides/12-graphs2.pdfInternational Conference on Web Search and Data Mining (WSDM), 2014. Community

34 2/13/2014 Jure Leskovec, Stanford C246: Mining Massive Datasets

Page 35: CS246: Mining Massive Datasets Jure Leskovec, ...snap.stanford.edu/class/cs246-2014/slides/12-graphs2.pdfInternational Conference on Web Search and Data Mining (WSDM), 2014. Community

Extension: Make community membership edges directed: Outgoing membership: Nodes โ€œsendsโ€ edges Incoming membership: Node โ€œreceivesโ€ edges

35 2/13/2014 Jure Leskovec, Stanford C246: Mining Massive Datasets

Page 36: CS246: Mining Massive Datasets Jure Leskovec, ...snap.stanford.edu/class/cs246-2014/slides/12-graphs2.pdfInternational Conference on Web Search and Data Mining (WSDM), 2014. Community

2/13/2014 Jure Leskovec, Stanford C246: Mining Massive Datasets 36

Page 37: CS246: Mining Massive Datasets Jure Leskovec, ...snap.stanford.edu/class/cs246-2014/slides/12-graphs2.pdfInternational Conference on Web Search and Data Mining (WSDM), 2014. Community

Everything is almost the same except now we have 2 matrices: ๐‘ญ๐‘ญ and ๐‘ฏ๐‘ฏ ๐‘ญ๐‘ญโ€ฆ out-going community memberships ๐‘ฏ๐‘ฏโ€ฆ in-coming community memberships

Edge prob.: ๐‘ท๐‘ท ๐’–๐’–,๐’—๐’— = ๐Ÿ๐Ÿ โˆ’ ๐’†๐’†๐’†๐’†๐’‘๐’‘(โˆ’๐‘ญ๐‘ญ๐’–๐’–๐‘ฏ๐‘ฏ๐’—๐’—๐‘ป๐‘ป)

Network log-likelihood:

which we optimize the same way as before 2/13/2014 Jure Leskovec, Stanford C246: Mining Massive Datasets 37

๐‘ญ๐‘ญ ๐‘ฏ๐‘ฏ

Page 38: CS246: Mining Massive Datasets Jure Leskovec, ...snap.stanford.edu/class/cs246-2014/slides/12-graphs2.pdfInternational Conference on Web Search and Data Mining (WSDM), 2014. Community

38 2/13/2014 Jure Leskovec, Stanford C246: Mining Massive Datasets

Page 39: CS246: Mining Massive Datasets Jure Leskovec, ...snap.stanford.edu/class/cs246-2014/slides/12-graphs2.pdfInternational Conference on Web Search and Data Mining (WSDM), 2014. Community

Overlapping Community Detection at Scale: A Nonnegative Matrix Factorization Approach by J. Yang, J. Leskovec. ACM International Conference on Web Search and Data Mining (WSDM), 2013.

Detecting Cohesive and 2-mode Communities in Directed and Undirected Networks by J. Yang, J. McAuley, J. Leskovec. ACM International Conference on Web Search and Data Mining (WSDM), 2014.

Community Detection in Networks with Node Attributes by J. Yang, J. McAuley, J. Leskovec. IEEE International Conference On Data Mining (ICDM), 2013.

2/13/2014 Jure Leskovec, Stanford C246: Mining Massive Datasets 39