Stanford Universitysnap.stanford.edu/na09/12-influence-annot.pdf · Stanford University ......

30
CS 322: (Social and Information) Network Analysis Jure Leskovec Stanford University

Transcript of Stanford Universitysnap.stanford.edu/na09/12-influence-annot.pdf · Stanford University ......

Page 1: Stanford Universitysnap.stanford.edu/na09/12-influence-annot.pdf · Stanford University ... 10/29/2009 Jure Leskovec, Stanford CS322: Network Analysis 23

CS 322: (Social and Information) Network AnalysisJure LeskovecStanford University

Page 2: Stanford Universitysnap.stanford.edu/na09/12-influence-annot.pdf · Stanford University ... 10/29/2009 Jure Leskovec, Stanford CS322: Network Analysis 23

Initially some nodes S are activeInitially some nodes S are active Each edge (a,b) has probability (weight) pab

a d0.4

d

b

a d

f

0.4 0.20.20.3

0.30.3

d

e

g

fh

i

0.4

0.20.4 0.3

0.30.3 0.2g

i

e

Node a becomes active:  ti t d b ith b

cg

i0.4g

i

activates node b with prob. pab Activations spread through the network

10/29/2009 2Jure Leskovec, Stanford CS322: Network Analysis

Page 3: Stanford Universitysnap.stanford.edu/na09/12-influence-annot.pdf · Stanford University ... 10/29/2009 Jure Leskovec, Stanford CS322: Network Analysis 23

If S is initial active set, let f(S) denoteIf S is initial active set, let f(S) denote expected size of final active set 

S is more influential if f(S) is larger10/29/2009 3Jure Leskovec, Stanford CS322: Network Analysis

Page 4: Stanford Universitysnap.stanford.edu/na09/12-influence-annot.pdf · Stanford University ... 10/29/2009 Jure Leskovec, Stanford CS322: Network Analysis 23

Blogs – information epidemics Blogs – information epidemics Which are the influential/infectious blogs?

Viral marketing Who are the trendsetters?Who are the trendsetters?  Influential people?

Disease spreading Where to place monitoring stations to detect p gepidemics?

410/29/2009 Jure Leskovec, Stanford CS322: Network Analysis

Page 5: Stanford Universitysnap.stanford.edu/na09/12-influence-annot.pdf · Stanford University ... 10/29/2009 Jure Leskovec, Stanford CS322: Network Analysis 23

Most influential set of size k: set S of k nodesMost influential set of size k: set S of k nodes producing largest expected cascade size f(S) if activated [Domingos‐Richardson ‘01]activated [Domingos Richardson  01]

10/29/2009 5Jure Leskovec, Stanford CS322: Network Analysis

Page 6: Stanford Universitysnap.stanford.edu/na09/12-influence-annot.pdf · Stanford University ... 10/29/2009 Jure Leskovec, Stanford CS322: Network Analysis 23

Optimization problem: Optimization problem:

)(maxksizeofS

Sf

How hard is this problem?

ksizeofS

NP‐HARD! Show that finding most influential Show that finding most influential set is at least as hard as set cover

10/29/2009 6Jure Leskovec, Stanford CS322: Network Analysis

Page 7: Stanford Universitysnap.stanford.edu/na09/12-influence-annot.pdf · Stanford University ... 10/29/2009 Jure Leskovec, Stanford CS322: Network Analysis 23

Set cover:Set cover: Given U={u1,…,un} and sets S1,…, Sm U Are there k sets among S S Are there k sets among S1,…, Smsuch that their union is U?

Goal:Goal: Encode set cover as an instance of 

10/29/2009 Jure Leskovec, Stanford CS322: Network Analysis

)(maxk size of S

Sf7

Page 8: Stanford Universitysnap.stanford.edu/na09/12-influence-annot.pdf · Stanford University ... 10/29/2009 Jure Leskovec, Stanford CS322: Network Analysis 23

Given a set cover instance with sets S1,…, SGiven a set cover instance with sets S1,…, Sm Build a graph:

‐‐Create  edge (Si, u)edge (Si, u)Si uSi.‐‐ Directed edge from all sets to elements.‐‐ Put weight 1 on each edge

There exists a set S of size k with f(S)=k+n iff

g

There exists a set S of size k with f(S) k+n iffthere exists a size k set cover

10/29/2009 Jure Leskovec, Stanford CS322: Network Analysis 8

Page 9: Stanford Universitysnap.stanford.edu/na09/12-influence-annot.pdf · Stanford University ... 10/29/2009 Jure Leskovec, Stanford CS322: Network Analysis 23

Bad news: Bad news: Influence maximization is NP‐hard

Good news: There e ists an appro imation al orithm! There exists an approximation algorithm!

Consider: Consider: Greedy hill‐climbing to find a good set S

10/29/2009 Jure Leskovec, Stanford CS322: Network Analysis 9

Page 10: Stanford Universitysnap.stanford.edu/na09/12-influence-annot.pdf · Stanford University ... 10/29/2009 Jure Leskovec, Stanford CS322: Network Analysis 23

Start with S ={} Start with S0={} For i=1…k Choose node v that max f(S {v}) f(Si 1{v}) Choose node v that max f(Si‐1{v}) Let Si = Si‐1{v}

a

bd

f(Si‐1{v})

What is the runtime?E h t j t ti

b

ca

b

c de

Each step just runs n time steps for each node v

c d

e

10/29/2009 Jure Leskovec, Stanford CS322: Network Analysis 10

Page 11: Stanford Universitysnap.stanford.edu/na09/12-influence-annot.pdf · Stanford University ... 10/29/2009 Jure Leskovec, Stanford CS322: Network Analysis 23

Hill climbing produces a solution S whereHill climbing produces a solution S where f(s) (1‐1/e) of optimal value  (~63%)[Hemhauser, Fisher, Wolsey ’78, Kempe, Kleinberg, Tardos ‘03]

Claim holds for functions f with 2 properties: f is monotone:f is monotone:if S T then f(S)  f(T) and f({})=0 f is submodular:

ddi l i l i hadding element to a set gives less improvement than adding to one of subsets

10/29/2009 Jure Leskovec, Stanford CS322: Network Analysis 11

Page 12: Stanford Universitysnap.stanford.edu/na09/12-influence-annot.pdf · Stanford University ... 10/29/2009 Jure Leskovec, Stanford CS322: Network Analysis 23

Diminishing returns:gf

size of set

f(S { }) f(S) ≥ f(T { }) f(T)Gain of adding a node to a small set Gain of adding a node to a large set

f(S  {u}) – f(S)   ≥  f(T  {u}) – f(T)10/29/2009 12Jure Leskovec, Stanford CS322: Network Analysis

Page 13: Stanford Universitysnap.stanford.edu/na09/12-influence-annot.pdf · Stanford University ... 10/29/2009 Jure Leskovec, Stanford CS322: Network Analysis 23

Show 2 things: Show 2 things:

1) Our f(S) is submodular) ( )

2) Hill climbing works well for monotone submodular functions

10/29/2009 Jure Leskovec, Stanford CS322: Network Analysis 13

Page 14: Stanford Universitysnap.stanford.edu/na09/12-influence-annot.pdf · Stanford University ... 10/29/2009 Jure Leskovec, Stanford CS322: Network Analysis 23

Keep adding nodes that give the largest gain Keep adding nodes that give the largest gain Start with S0={}, produce sets S1, S2,…,Sk Add elements one by one Add elements one by one

Marginal gain: i = f(Si) ‐ f(Si‐1)

Let T={t1…tk} be the best set of size k

We need to show: f(S)  (1‐1/e) f(T)

10/29/2009 Jure Leskovec, Stanford CS322: Network Analysis 14

Page 15: Stanford Universitysnap.stanford.edu/na09/12-influence-annot.pdf · Stanford University ... 10/29/2009 Jure Leskovec, Stanford CS322: Network Analysis 23

Claim: f(AB) f(A) k f(A{b }) f(A) Claim: f(AB)‐f(A)  j f(A {bj}) ‐ f(A) where: B = {b1,…,bk} and f is submodular, 

Proof: Let B = {b b } Proof: Let Bi = {b1,…bi}f(AB)‐f(A) = 

10/29/2009 Jure Leskovec, Stanford CS322: Network Analysis 15

Page 16: Stanford Universitysnap.stanford.edu/na09/12-influence-annot.pdf · Stanford University ... 10/29/2009 Jure Leskovec, Stanford CS322: Network Analysis 23

f(T) f(SiT)                               T={t1…tk}( ) ( i ) { 1 k}=  f(SiT) ‐ f(Si)  + f(Si) 

jk [f(Si {tj}) ‐ f(Si)] + f(Si)

jk i+1 + f(Si) =j i+1  ( i)

Thus: f(T)  f(Si) + k i+1( ) ( i) i+1

10/29/2009 Jure Leskovec, Stanford CS322: Network Analysis 16

Page 17: Stanford Universitysnap.stanford.edu/na09/12-influence-annot.pdf · Stanford University ... 10/29/2009 Jure Leskovec, Stanford CS322: Network Analysis 23

We know: 1/k (f(T) f(S )) We know: i+1  1/k (f(T)‐f(Si))

What is f(S )= What is f(Si+1)=

What is f(S )? What is f(Sk)?

10/29/2009 Jure Leskovec, Stanford CS322: Network Analysis 17

Page 18: Stanford Universitysnap.stanford.edu/na09/12-influence-annot.pdf · Stanford University ... 10/29/2009 Jure Leskovec, Stanford CS322: Network Analysis 23

Claim: )(111)( TfSfi

Claim:

Proof by induction:

)(11)( Tfk

Sf i

Proof by induction: i=0:

10/29/2009 Jure Leskovec, Stanford CS322: Network Analysis 18

Page 19: Stanford Universitysnap.stanford.edu/na09/12-influence-annot.pdf · Stanford University ... 10/29/2009 Jure Leskovec, Stanford CS322: Network Analysis 23

Claim: )(111)( TfSfi

Claim: 

Induction: At i+1:

)(11)( Tfk

Sf i

At i+1:

10/29/2009 Jure Leskovec, Stanford CS322: Network Analysis 19

Page 20: Stanford Universitysnap.stanford.edu/na09/12-influence-annot.pdf · Stanford University ... 10/29/2009 Jure Leskovec, Stanford CS322: Network Analysis 23

Thus: )(111)()( TfSfSfk

Thus:

Then:

)(11)()( Tfk

SfSf k

Then:

f(S ) =f(Sk) =  

10/29/2009 Jure Leskovec, Stanford CS322: Network Analysis 20

Page 21: Stanford Universitysnap.stanford.edu/na09/12-influence-annot.pdf · Stanford University ... 10/29/2009 Jure Leskovec, Stanford CS322: Network Analysis 23

Next we must show f is submodular: S T Next, we must show f is submodular: S T

Gain of adding a node to a small set Gain of adding a node to a large set

F(S  {u}) – f(S)   ≥  f(T  {u}) – f(T)Gain of adding a node to a small set Gain of adding a node to a large set

Basic fact: If f1,…,fK are submodular, and c1,…,ck 0 then 

is also submodular2110/29/2009 Jure Leskovec, Stanford CS322: Network Analysis

Page 22: Stanford Universitysnap.stanford.edu/na09/12-influence-annot.pdf · Stanford University ... 10/29/2009 Jure Leskovec, Stanford CS322: Network Analysis 23

f(S {u}) – f(S) ≥ f(T {u}) – f(T)f(S  {u}) – f(S)   ≥  f(T  {u}) – f(T) A simple submodular function: Sets S S Sets S1,…,Sk f(S) = |i Si|

10/29/2009 Jure Leskovec, Stanford CS322: Network Analysis 22

Page 23: Stanford Universitysnap.stanford.edu/na09/12-influence-annot.pdf · Stanford University ... 10/29/2009 Jure Leskovec, Stanford CS322: Network Analysis 23

Principle of deferreda d

0.4

0 4Principle of deferred decision: Generate randomness

bef

h

0.4 0.20.20.3

0.30.3

Generate randomness ahead of time

cg

h

i0.4

0.4 0.2 0.4 0.30.30.3 0.2

• Flip a coin for each edge to decide whether it will succeed when (if ever) it attempts to transmitEd hi h ti ti ill d li• Edges on which activation will succeed are live

• f(Si) = size of the set reachable by live‐edge paths10/29/2009 23Jure Leskovec, Stanford CS322: Network Analysis

Page 24: Stanford Universitysnap.stanford.edu/na09/12-influence-annot.pdf · Stanford University ... 10/29/2009 Jure Leskovec, Stanford CS322: Network Analysis 23

Fix outcome i of coina d

0.4

0 4 Fix outcome i of coin flips

Let fi(S) be size of d f S i

bef

h

0.4 0.20.20.3

0.30.3

cascade from S given these coin flips

cg

h

i0.4

0.4 0.2 0.4 0.30.30.3 0.2

• Let fi(v) = set of nodes reachable from v on live‐edge paths

• fi(S) = size of union fi(v) → fi is submodularfi( ) fi( ) fi• f = ∑ fi→ f is submodular10/29/2009 24Jure Leskovec, Stanford CS322: Network Analysis

Page 25: Stanford Universitysnap.stanford.edu/na09/12-influence-annot.pdf · Stanford University ... 10/29/2009 Jure Leskovec, Stanford CS322: Network Analysis 23

What do we know about optimizing ll l b

p gsubmodular functions?

A hill‐climbing (i.e., greedy) is near a d

reward

Hill‐climbing

optimal (1-1/e (~63%) of optimal) But 

a

b ab

d

Hill‐climbing algorithm is slow At each iteration we need to re‐evaluate marginal gains

cc

d

e

g g It scales as O(n k)e

Add sensor with highest Add sensor with highest marginal gain

Jure Leskovec, Stanford CS322: Network Analysis10/29/2009 25

Page 26: Stanford Universitysnap.stanford.edu/na09/12-influence-annot.pdf · Stanford University ... 10/29/2009 Jure Leskovec, Stanford CS322: Network Analysis 23

[Leskovec et al., KDD ’07]

Observation:Submodularity guarantees that marginal benefits decrease with the solution size

Marginal gain f(Si{x})

Idea: exploit submodularity, doing lazy 

x

p y, g yevaluations!

Jure Leskovec, Stanford CS322: Network Analysis Part 2‐2610/29/2009

Page 27: Stanford Universitysnap.stanford.edu/na09/12-influence-annot.pdf · Stanford University ... 10/29/2009 Jure Leskovec, Stanford CS322: Network Analysis 23

[Leskovec et al., KDD ’07]

Lazy hill climbing: Lazy hill‐climbing: Keep an ordered list of marginal benefits b from previous

a dMarginal gain

benefits bi from previous iteration Re‐evaluate bi only for top

b

c

ab

ceRe evaluate bi only for top 

node Re‐sort and prune

cd

eRe sort and prune e

Jure Leskovec, Stanford CS322: Network Analysis Part 2‐2710/29/2009

Page 28: Stanford Universitysnap.stanford.edu/na09/12-influence-annot.pdf · Stanford University ... 10/29/2009 Jure Leskovec, Stanford CS322: Network Analysis 23

[Leskovec et al., KDD ’07]

Lazy hill climbing: Lazy hill‐climbing: Keep an ordered list of marginal benefits b from previous

a dMarginal gain

benefits bi from previous iteration Re‐evaluate bi only for top

ab

c

b

c eRe evaluate bi only for top node Re‐sort and prune

cd

eRe sort and prune e

Jure Leskovec, Stanford CS322: Network Analysis Part 2‐2810/29/2009

Page 29: Stanford Universitysnap.stanford.edu/na09/12-influence-annot.pdf · Stanford University ... 10/29/2009 Jure Leskovec, Stanford CS322: Network Analysis 23

[Leskovec et al., KDD ’07]

Lazy hill climbing: Lazy hill‐climbing: Keep an ordered list of marginal benefits b from previous

a dMarginal gain

benefits bi from previous iteration Re‐evaluate bi only for top

ab

c

d

b eRe evaluate bi only for top node Re‐sort and prune c

ce

Re sort and prune

Jure Leskovec, Stanford CS322: Network Analysis Part 2‐2910/29/2009

Page 30: Stanford Universitysnap.stanford.edu/na09/12-influence-annot.pdf · Stanford University ... 10/29/2009 Jure Leskovec, Stanford CS322: Network Analysis 23

[Leskovec et al., KDD ’07]

Given a real city waterGiven a real city water distribution network

A d d t h And data on how contaminants spread in the network

Problem posed by US Environmental SSEnvironmental Protection Agency

SS

10/29/2009 30Jure Leskovec, Stanford CS322: Network Analysis