Understanding and Managing Cascades on Large Graphs

156
Understanding and Managing Cascades on Large Graphs B. Aditya Prakash Carnegie Mellon University Virginia Tech. Christos Faloutsos Carnegie Mellon University Graph Analytics wkshp B. A. Prakash; C. Faloutsos

description

Understanding and Managing Cascades on Large Graphs. B. Aditya Prakash Carnegie Mellon University  Virginia Tech. Christos Faloutsos Carnegie Mellon University. From: VLDB’12 tutorial. http://vldb.org/pvldb/vol5/p2024_badityaprakash_vldb2012.pdf. B. Aditya Prakash. - PowerPoint PPT Presentation

Transcript of Understanding and Managing Cascades on Large Graphs

Page 1: Understanding and Managing Cascades on Large Graphs

B. A. Prakash; C. Faloutsos

Understanding and Managing Cascades on

Large GraphsB. Aditya Prakash

Carnegie Mellon University Virginia Tech.

Christos FaloutsosCarnegie Mellon University

Graph Analytics wkshp

Page 2: Understanding and Managing Cascades on Large Graphs

B. A. Prakash; C. Faloutsos 2

From: VLDB’12 tutorial• http://vldb.org/pvldb/vol5/

p2024_badityaprakash_vldb2012.pdf

Graph Analytics wkshp

B. Aditya Prakash

Page 3: Understanding and Managing Cascades on Large Graphs

B. A. Prakash; C. Faloutsos 3

Networks are everywhere!

Human Disease Network [Barabasi 2007]

Gene Regulatory Network [Decourty 2008]

Facebook Network [2010]

The Internet [2005]

Graph Analytics wkshp

Page 4: Understanding and Managing Cascades on Large Graphs

B. A. Prakash; C. Faloutsos 4

Dynamical Processes over networks are also everywhere!

Graph Analytics wkshp

Page 5: Understanding and Managing Cascades on Large Graphs

B. A. Prakash; C. Faloutsos 5

Why do we care?• Social collaboration• Information Diffusion• Viral Marketing• Epidemiology and Public Health• Cyber Security• Human mobility • Games and Virtual Worlds • Ecology........Graph Analytics wkshp

Page 6: Understanding and Managing Cascades on Large Graphs

B. A. Prakash; C. Faloutsos 6

Why do we care? (1: Epidemiology)

• Dynamical Processes over networks[AJPH 2007]

CDC data: Visualization of the first 35 tuberculosis (TB) patients and their 1039 contacts

Diseases over contact networks

Graph Analytics wkshp

Page 7: Understanding and Managing Cascades on Large Graphs

B. A. Prakash; C. Faloutsos 8

Why do we care? (1: Epidemiology)

• Dynamical Processes over networks

• Each circle is a hospital• ~3000 hospitals• More than 30,000 patients transferred

[US-MEDICARE NETWORK 2005]

Problem: Given k units of disinfectant, whom to immunize?

Graph Analytics wkshp

Page 8: Understanding and Managing Cascades on Large Graphs

B. A. Prakash; C. Faloutsos 9

Why do we care? (1: Epidemiology)

CURRENT PRACTICE OUR METHOD

~6x fewer!

[US-MEDICARE NETWORK 2005]

Hospital-acquired inf. took 99K+ lives, cost $5B+ (all per year)Graph Analytics wkshp

Page 9: Understanding and Managing Cascades on Large Graphs

B. A. Prakash; C. Faloutsos 10

Why do we care? (2: Online Diffusion)

> 800m users, ~$1B revenue [WSJ 2010]

~100m active users

> 50m users

Graph Analytics wkshp

Page 10: Understanding and Managing Cascades on Large Graphs

B. A. Prakash; C. Faloutsos 11

Why do we care? (2: Online Diffusion)

• Dynamical Processes over networks

Celebrity

Buy Versace™!

Followers

Social Media MarketingGraph Analytics wkshp

Page 11: Understanding and Managing Cascades on Large Graphs

B. A. Prakash; C. Faloutsos 12

Why do we care? (3: To change the world?)

• Dynamical Processes over networks

Social networks and Collaborative ActionGraph Analytics wkshp

Page 12: Understanding and Managing Cascades on Large Graphs

B. A. Prakash; C. Faloutsos 13

High Impact – Multiple Settings

Q. How to squash rumors faster?

Q. How do opinions spread?

Q. How to market better?

epidemic out-breaks

products/viruses

transmit s/w patches

Graph Analytics wkshp

Page 13: Understanding and Managing Cascades on Large Graphs

B. A. Prakash; C. Faloutsos 14

Research Theme

DATALarge real-world

networks & processes

ANALYSISUnderstanding

POLICY/ ACTIONManaging

Graph Analytics wkshp

Page 14: Understanding and Managing Cascades on Large Graphs

B. A. Prakash; C. Faloutsos 15

Research Theme – Public Health

DATAModeling # patient

transfers

ANALYSISWill an epidemic

happen?

POLICY/ ACTION

How to control out-breaks?Graph Analytics wkshp

Page 15: Understanding and Managing Cascades on Large Graphs

B. A. Prakash; C. Faloutsos 16

Research Theme – Social Media

DATAModeling Tweets

spreading

POLICY/ ACTION

How to market better?

ANALYSIS# cascades in

future?

Graph Analytics wkshp

Page 16: Understanding and Managing Cascades on Large Graphs

B. A. Prakash; C. Faloutsos 17

In this tutorial

ANALYSISUnderstanding

Given propagation models:

Q1: Will an epidemic happen?

Graph Analytics wkshp

Page 17: Understanding and Managing Cascades on Large Graphs

B. A. Prakash; C. Faloutsos 18

In this tutorial

Q2: How to immunize and control out-breaks better?

POLICY/ ACTIONManaging

Graph Analytics wkshp

Page 18: Understanding and Managing Cascades on Large Graphs

B. A. Prakash; C. Faloutsos 19

In this tutorial

DATALarge real-world

networks & processes

Q3: How do #hashtags spread?

Graph Analytics wkshp

Page 19: Understanding and Managing Cascades on Large Graphs

B. A. Prakash; C. Faloutsos 20

Outline• Motivation• Part 1: Understanding Epidemics (Theory)• Part 2: Policy and Action (Algorithms)• Part 3: Learning Models (Empirical Studies)• Conclusion

Graph Analytics wkshp

Page 20: Understanding and Managing Cascades on Large Graphs

B. A. Prakash; C. Faloutsos 21

Part 1: Theory• Q1: What is the epidemic threshold?• Q2: How do viruses compete?

Graph Analytics wkshp

Page 21: Understanding and Managing Cascades on Large Graphs

B. A. Prakash; C. Faloutsos 22

A fundamental questionStrong Virus

Epidemic?

Graph Analytics wkshp

Page 22: Understanding and Managing Cascades on Large Graphs

B. A. Prakash; C. Faloutsos 23

example (static graph)Weak Virus

Epidemic?

Graph Analytics wkshp

Page 23: Understanding and Managing Cascades on Large Graphs

B. A. Prakash; C. Faloutsos 24

Problem Statement

Find, a condition under which– virus will die out exponentially quickly– regardless of initial infection condition

above (epidemic)

below (extinction)

# Infected

time

Separate the regimes?

Graph Analytics wkshp

Page 24: Understanding and Managing Cascades on Large Graphs

B. A. Prakash; C. Faloutsos 25

Threshold (static version)Problem Statement• Given: –Graph G, and –Virus specs (attack prob. etc.)

• Find: –A condition for virus extinction/invasion

Graph Analytics wkshp

Page 25: Understanding and Managing Cascades on Large Graphs

B. A. Prakash; C. Faloutsos 26

Threshold: Why important?• Accelerating simulations• Forecasting (‘What-if’ scenarios)• Design of contagion and/or topology• A great handle to manipulate the spreading– Immunization– Maximize collaboration…..

Graph Analytics wkshp

Page 26: Understanding and Managing Cascades on Large Graphs

B. A. Prakash; C. Faloutsos 27

Part 1: Theory• Q1: What is the epidemic threshold?– Background– Result and Intuition (Static Graphs)– Proof Ideas (Static Graphs)– Bonus: Dynamic Graphs

• Q2: How do viruses compete?

Graph Analytics wkshp

Page 27: Understanding and Managing Cascades on Large Graphs

B. A. Prakash; C. Faloutsos 28

“SIR” model: life immunity (mumps)

• Each node in the graph is in one of three states– Susceptible (i.e. healthy)– Infected– Removed (i.e. can’t get infected again)

Prob. β Prob. δ

t = 1 t = 2 t = 3

Background

Graph Analytics wkshp

Page 28: Understanding and Managing Cascades on Large Graphs

B. A. Prakash; C. Faloutsos 29

Terminology: continued• Other virus propagation models (“VPM”)– SIS : susceptible-infected-susceptible, flu-like– SIRS : temporary immunity, like pertussis– SEIR : mumps-like, with virus incubation (E = Exposed)….………….

• Underlying contact-network – ‘who-can-infect-whom’

Background

Graph Analytics wkshp

Page 29: Understanding and Managing Cascades on Large Graphs

B. A. Prakash; C. Faloutsos 30

Related Work R. M. Anderson and R. M. May. Infectious Diseases of Humans. Oxford University Press,

1991. A. Barrat, M. Barthélemy, and A. Vespignani. Dynamical Processes on Complex Networks.

Cambridge University Press, 2010. F. M. Bass. A new product growth for model consumer durables. Management Science,

15(5):215–227, 1969. D. Chakrabarti, Y. Wang, C. Wang, J. Leskovec, and C. Faloutsos. Epidemic thresholds in

real networks. ACM TISSEC, 10(4), 2008. D. Easley and J. Kleinberg. Networks, Crowds, and Markets: Reasoning About a Highly

Connected World. Cambridge University Press, 2010. A. Ganesh, L. Massoulie, and D. Towsley. The effect of network topology in spread of

epidemics. IEEE INFOCOM, 2005. Y. Hayashi, M. Minoura, and J. Matsukubo. Recoverable prevalence in growing scale-free

networks and the effective immunization. arXiv:cond-at/0305549 v2, Aug. 6 2003. H. W. Hethcote. The mathematics of infectious diseases. SIAM Review, 42, 2000. H. W. Hethcote and J. A. Yorke. Gonorrhea transmission dynamics and control. Springer

Lecture Notes in Biomathematics, 46, 1984. J. O. Kephart and S. R. White. Directed-graph epidemiological models of computer

viruses. IEEE Computer Society Symposium on Research in Security and Privacy, 1991. J. O. Kephart and S. R. White. Measuring and modeling computer virus prevalence. IEEE

Computer Society Symposium on Research in Security and Privacy, 1993. R. Pastor-Santorras and A. Vespignani. Epidemic spreading in scale-free networks.

Physical Review Letters 86, 14, 2001.

……… ……… ………

All are about either:

• Structured topologies (cliques, block-diagonals, hierarchies, random)

• Specific virus propagation models

• Static graphs

Background

Graph Analytics wkshp

Page 30: Understanding and Managing Cascades on Large Graphs

B. A. Prakash; C. Faloutsos 31

Part 1: Theory• Q1: What is the epidemic threshold?– Background– Result and Intuition (Static Graphs)– Proof Ideas (Static Graphs)– Bonus: Dynamic Graphs

• Q2: How do viruses compete?

Graph Analytics wkshp

Page 31: Understanding and Managing Cascades on Large Graphs

B. A. Prakash; C. Faloutsos 32

How should the answer look like?

• Answer should depend on:– Graph– Virus Propagation Model (VPM)

• But how??– Graph – average degree? max. degree? diameter?– VPM – which parameters? – How to combine – linear? quadratic? exponential?

?diameterdavg ?/)( max22 ddd avgavg …..

Graph Analytics wkshp

Page 32: Understanding and Managing Cascades on Large Graphs

B. A. Prakash; C. Faloutsos 33

Static Graphs: Our Main Result

• Informally,

For, any arbitrary topology (adjacency matrix A) any virus propagation model (VPM) in standard literature

the epidemic threshold depends only 1. on the λ, first eigenvalue of A, and 2. some constant , determined by

the virus propagation model

λVPMC

No epidemic if λ *

< 1

VPMCVPMC

In Prakash+ ICDM 2011 (Selected among best papers).

Graph Analytics wkshp

Page 33: Understanding and Managing Cascades on Large Graphs

B. A. Prakash; C. Faloutsos 34

Our thresholds for some models

• s = effective strength• s < 1 : below threshold

Models Effective Strength (s) Threshold (tipping point)

SIS, SIR, SIRS, SEIRs = λ .

s = 1

SIV, SEIV s = λ .

(H.I.V.) s = λ .

12

221

vvv

2121 VVISI

Graph Analytics wkshp

Page 34: Understanding and Managing Cascades on Large Graphs

B. A. Prakash; C. Faloutsos 35

Our result: Intuition for λ“Official” definition:

• Let A be the adjacency matrix. Then λ is the root with the largest magnitude of the characteristic polynomial of A [det(A – xI)].

• Doesn’t give much intuition!

“Un-official” Intuition • λ ~ # paths in the

graph

uu≈ .k

kA

(i, j) = # of paths i j of length k

kAGraph Analytics wkshp

Page 35: Understanding and Managing Cascades on Large Graphs

B. A. Prakash; C. Faloutsos 36N nodes

Largest Eigenvalue (λ)

λ ≈ 2 λ = N λ = N-1

N = 1000λ ≈ 2 λ= 31.67 λ= 999

better connectivity higher λ

Graph Analytics wkshp

Page 36: Understanding and Managing Cascades on Large Graphs

B. A. Prakash; C. Faloutsos 37

Examples: Simulations – SIR (mumps)

(a) Infection profile (b) “Take-off” plot

PORTLAND graph31 million links, 6 million nodes

Frac

tion

of In

fecti

ons

Foot

prin

tEffective StrengthTime ticks

Graph Analytics wkshp

Page 37: Understanding and Managing Cascades on Large Graphs

B. A. Prakash; C. Faloutsos 38

Examples: Simulations – SIRS (pertusis)

Frac

tion

of In

fecti

ons

Foot

prin

tEffective StrengthTime ticks

(a) Infection profile (b) “Take-off” plot

PORTLAND graph31 million links, 6 million nodesGraph Analytics wkshp

Page 38: Understanding and Managing Cascades on Large Graphs

B. A. Prakash; C. Faloutsos 39

Part 1: Theory• Q1: What is the epidemic threshold?– Background– Result and Intuition (Static Graphs)– Proof Ideas (Static Graphs)– Bonus: Dynamic Graphs

• Q2: How do viruses compete?

Graph Analytics wkshp

Page 39: Understanding and Managing Cascades on Large Graphs

B. A. Prakash; C. Faloutsos 40

λ * < 1VPMC

Graph-based

Model-based

Proof Sketch

General VPM structure

Topology and stability

Graph Analytics wkshp

Page 40: Understanding and Managing Cascades on Large Graphs

B. A. Prakash; C. Faloutsos 41

Models and more modelsModel Used for

SIR Mumps

SIS Flu

SIRS Pertussis

SEIR Chicken-pox

……..

SICR Tuberculosis

MSIR Measles

SIV Sensor Stability

H.I.V.……….

2121 VVISI

Graph Analytics wkshp

Page 41: Understanding and Managing Cascades on Large Graphs

B. A. Prakash; C. Faloutsos 42

Ingredient 1: Our generalized model

Endogenous Transitions

Susceptible Infected

Vigilant

Exogenous Transitions

Endogenous Transitions

Endogenous Transitions

Susceptible Infected

Vigilant

Graph Analytics wkshp

Page 42: Understanding and Managing Cascades on Large Graphs

B. A. Prakash; C. Faloutsos 43

Special case

Susceptible Infected

Vigilant

Graph Analytics wkshp

Page 43: Understanding and Managing Cascades on Large Graphs

B. A. Prakash; C. Faloutsos 44

Special case: H.I.V.

2121 VVISI

Multiple Infectious, Vigilant states

“Terminal”

“Non-terminal”

Graph Analytics wkshp

Page 44: Understanding and Managing Cascades on Large Graphs

B. A. Prakash; C. Faloutsos 45

Ingredient 2: NLDS+Stability

• View as a NLDS– discrete time – non-linear dynamical system (NLDS)

Probability vector Specifies the state of the system at time t

Details

size mN x 1

.

.

.

.

.

size N (number of nodes in the graph)

.

.

.

S

I

V

Graph Analytics wkshp

Page 45: Understanding and Managing Cascades on Large Graphs

B. A. Prakash; C. Faloutsos 46

Ingredient 2: NLDS + Stability

• View as a NLDS– discrete time – non-linear dynamical system (NLDS)

Non-linear functionExplicitly gives the evolution of system

Details

size mN x 1

.

.

.

.

.

.

.

.Graph Analytics wkshp

Page 46: Understanding and Managing Cascades on Large Graphs

B. A. Prakash; C. Faloutsos 47

Ingredient 2: NLDS + Stability

• View as a NLDS– discrete time – non-linear dynamical system (NLDS)

• Threshold Stability of NLDS

Graph Analytics wkshp

Page 47: Understanding and Managing Cascades on Large Graphs

B. A. Prakash; C. Faloutsos 48

= probability that node i is not attacked by any of its infectious neighbors

Special case: SIR

size 3N x 1 I

R

S

NLDS

I

R

S

Details

Graph Analytics wkshp

Page 48: Understanding and Managing Cascades on Large Graphs

B. A. Prakash; C. Faloutsos 49

Fixed Point

11.

00.

00.

State when no node is infected

Q: Is it stable?

Details

Graph Analytics wkshp

Page 49: Understanding and Managing Cascades on Large Graphs

B. A. Prakash; C. Faloutsos 50

Stability for SIR

Stableunder threshold

Unstableabove threshold

Graph Analytics wkshp

Page 50: Understanding and Managing Cascades on Large Graphs

B. A. Prakash; C. Faloutsos 51

λ * < 1VPMC

Graph-based

Model-basedGeneral VPM structure

Topology and stability

See paper for full proof

Graph Analytics wkshp

Page 51: Understanding and Managing Cascades on Large Graphs

B. A. Prakash; C. Faloutsos 52

Part 1: Theory• Q1: What is the epidemic threshold?– Background– Result and Intuition (Static Graphs)– Proof Ideas (Static Graphs)– Bonus: Dynamic Graphs

• Q2: How do viruses compete?

Graph Analytics wkshp

Page 52: Understanding and Managing Cascades on Large Graphs

B. A. Prakash; C. Faloutsos 53

Dynamic Graphs: Epidemic?

adjacency matrix

8

8

Alternating behaviorsDAY (e.g., work)

Graph Analytics wkshp

Page 53: Understanding and Managing Cascades on Large Graphs

B. A. Prakash; C. Faloutsos 54

adjacency matrix

8

8

Dynamic Graphs: Epidemic?Alternating behaviorsNIGHT

(e.g., home)

Graph Analytics wkshp

Page 54: Understanding and Managing Cascades on Large Graphs

B. A. Prakash; C. Faloutsos 55

• SIS model– recovery rate δ– infection rate β

• Set of T arbitrary graphs

Model Description

day

N

N night

N

N , weekend…..

Infected

Healthy

XN1

N3

N2

Prob. βProb. β Prob. δ

Graph Analytics wkshp

Page 55: Understanding and Managing Cascades on Large Graphs

B. A. Prakash; C. Faloutsos 56

• Informally, NO epidemic if

eig (S) = < 1

Our result: Dynamic Graphs Threshold

Single number! Largest eigenvalue of The system matrix S

In Prakash+, ECML-PKDD 2010

S =

Details

Graph Analytics wkshp

Page 56: Understanding and Managing Cascades on Large Graphs

B. A. Prakash; C. Faloutsos 57

Synthetic MIT Reality Mining

log(fraction infected)

Time

BELOW

AT

ABOVE ABOVE

AT

BELOW

Infection-profile

Graph Analytics wkshp

Page 57: Understanding and Managing Cascades on Large Graphs

B. A. Prakash; C. Faloutsos 58

“Take-off” plotsFootprint (# infected @ “steady state”)

Our threshold

Our threshold

(log scale)

NO EPIDEMIC

EPIDEMIC

EPIDEMIC

NO EPIDEMIC

Synthetic MIT Reality

Graph Analytics wkshp

Page 58: Understanding and Managing Cascades on Large Graphs

B. A. Prakash; C. Faloutsos 59

Part 1: Theory• Q1: What is the epidemic threshold?• Q2: What happens when viruses compete?– Mutually-exclusive viruses– Interacting viruses

Graph Analytics wkshp

Page 59: Understanding and Managing Cascades on Large Graphs

B. A. Prakash; C. Faloutsos 60

Competing ContagionsiPhone v Android

Blu-ray v HD-DVD

Biological common flu/avian flu, pneumococcal inf etc

Attack Retreatv

Graph Analytics wkshp

Page 60: Understanding and Managing Cascades on Large Graphs

B. A. Prakash; C. Faloutsos 61

A simple model• Modified flu-like • Mutual Immunity (“pick one of the two”)• Susceptible-Infected1-Infected2-Susceptible

Virus 1 Virus 2

Details

Graph Analytics wkshp

Page 61: Understanding and Managing Cascades on Large Graphs

B. A. Prakash; C. Faloutsos 62

Question: What happens in the end?

green: virus 1red: virus 2

Footprint @ Steady State Footprint @ Steady State = ?

Number of Infections

ASSUME: Virus 1 is stronger than Virus 2Graph Analytics wkshp

Page 62: Understanding and Managing Cascades on Large Graphs

B. A. Prakash; C. Faloutsos 63

Question: What happens in the end?

green: virus 1red: virus 2

Number of Infections

ASSUME: Virus 1 is stronger than Virus 2

Strength Strength

??= Strength Strength

2

Footprint @ Steady State Footprint @ Steady State

Graph Analytics wkshp

Page 63: Understanding and Managing Cascades on Large Graphs

B. A. Prakash; C. Faloutsos 64

Answer: Winner-Takes-Allgreen: virus 1red: virus 2

ASSUME: Virus 1 is stronger than Virus 2

Number of Infections

Graph Analytics wkshp

Page 64: Understanding and Managing Cascades on Large Graphs

B. A. Prakash; C. Faloutsos 65

Our Result: Winner-Takes-All

In Prakash+ WWW 2012

Given our model, and any graph, the weaker virus always dies-out completely

1. The stronger survives only if it is above threshold 2. Virus 1 is stronger than Virus 2, if: strength(Virus 1) > strength(Virus 2)3. Strength(Virus) = λ β / δ same as before!

Details

Graph Analytics wkshp

Page 65: Understanding and Managing Cascades on Large Graphs

B. A. Prakash; C. Faloutsos 66

Real Examples

Reddit v Digg Blu-Ray v HD-DVD

[Google Search Trends data]

Graph Analytics wkshp

Page 66: Understanding and Managing Cascades on Large Graphs

B. A. Prakash; C. Faloutsos 67

Part 1: Theory• Q1: What is the epidemic threshold?• Q2: What happens when viruses compete?– Mutually-exclusive viruses– Interacting viruses

Graph Analytics wkshp

Page 67: Understanding and Managing Cascades on Large Graphs

B. A. Prakash; C. Faloutsos 68

A simple model: SI1|2S• Modified flu-like (SIS) • Susceptible-Infected1 or 2-Susceptible• Interaction Factor ε– Full Mutual Immunity: ε = 0– Partial Mutual Immunity (competition): ε < 0– Cooperation: ε > 0

Virus 1 Virus 2

&

Graph Analytics wkshp

Page 68: Understanding and Managing Cascades on Large Graphs

B. A. Prakash; C. Faloutsos 69

Question: What happens in the end?

ASSUME: Virus 1 is stronger than Virus 2

ε = 0Winner takes all

ε = 1Co-exist independently

ε = 2Viruses cooperate

What about for 0 < ε <1?Is there a point at which both viruses can co-exist?

Graph Analytics wkshp

Page 69: Understanding and Managing Cascades on Large Graphs

B. A. Prakash; C. Faloutsos 70

Answer: Yes! There is a phase transition

ASSUME: Virus 1 is stronger than Virus 2Graph Analytics wkshp

Page 70: Understanding and Managing Cascades on Large Graphs

B. A. Prakash; C. Faloutsos 71

Answer: Yes! There is a phase transition

ASSUME: Virus 1 is stronger than Virus 2Graph Analytics wkshp

Page 71: Understanding and Managing Cascades on Large Graphs

B. A. Prakash; C. Faloutsos 72

Answer: Yes! There is a phase transition

ASSUME: Virus 1 is stronger than Virus 2Graph Analytics wkshp

Page 72: Understanding and Managing Cascades on Large Graphs

B. A. Prakash; C. Faloutsos 73

1. The stronger survives only if it is above threshold 2. Virus 1 is stronger than Virus 2, if: strength(Virus 1) > strength(Virus 2)3. Strength(Virus) σ = N β / δ

Our Result: Viruses can Co-exist

Given our model and a fully connected graph, there exists an εcritical such that for ε ≥ εcritical,

there is a fixed point where both viruses survive.

Details

In Beutel+ KDD 2012Graph Analytics wkshp

Page 73: Understanding and Managing Cascades on Large Graphs

B. A. Prakash; C. Faloutsos 74

Real Examples

Hulu v Blockbuster

[Google Search Trends data]

Graph Analytics wkshp

Page 74: Understanding and Managing Cascades on Large Graphs

B. A. Prakash; C. Faloutsos 75

Real Examples

Chrome v Firefox

[Google Search Trends data]

Graph Analytics wkshp

Page 75: Understanding and Managing Cascades on Large Graphs

B. A. Prakash; C. Faloutsos 76

Outline• Motivation• Part 1: Understanding Epidemics (Theory)• Part 2: Policy and Action (Algorithms)• Part 3: Learning Models (Empirical Studies)• Conclusion

Graph Analytics wkshp

Page 76: Understanding and Managing Cascades on Large Graphs

B. A. Prakash; C. Faloutsos 77

Part 2: Algorithms• Q3: Whom to immunize?• Q4: How to detect outbreaks?• Q5: Who are the culprits?

Graph Analytics wkshp

Page 77: Understanding and Managing Cascades on Large Graphs

B. A. Prakash; C. Faloutsos 78

?

?

Given: a graph A, virus prop. model and budget k; Find: k ‘best’ nodes for immunization (removal).

k = 2

??

Full Static Immunization

Graph Analytics wkshp

Page 78: Understanding and Managing Cascades on Large Graphs

B. A. Prakash; C. Faloutsos 79

Part 2: Algorithms• Q3: Whom to immunize?– Full Immunization (Static Graphs)– Full Immunization (Dynamic Graphs)– Fractional Immunization

• Q4: How to detect outbreaks?• Q5: Who are the culprits?

Graph Analytics wkshp

Page 79: Understanding and Managing Cascades on Large Graphs

B. A. Prakash; C. Faloutsos 80

Challenges• Given a graph A, budget k, Q1 (Metric) How to measure the ‘shield-

value’ for a set of nodes (S)? Q2 (Algorithm) How to find a set of k nodes

with highest ‘shield-value’?

Graph Analytics wkshp

Page 80: Understanding and Managing Cascades on Large Graphs

B. A. Prakash; C. Faloutsos 81

Proposed vulnerability measure λ

Increasing λ Increasing vulnerability

λ is the epidemic threshold

“Safe” “Vulnerable” “Deadly”

Graph Analytics wkshp

Page 81: Understanding and Managing Cascades on Large Graphs

B. A. Prakash; C. Faloutsos 82

1

9

10

3

4

5

7

8

6

2

9

1

11

10

3

4

56

7

8

2

9

Original Graph Without {2, 6}

Eigen-Drop(S) Δ λ = λ - λs

Δ

A1: “Eigen-Drop”: an ideal shield value

Graph Analytics wkshp

Page 82: Understanding and Managing Cascades on Large Graphs

B. A. Prakash; C. Faloutsos 83

(Q2) - Direct Algorithm too expensive!

• Immunize k nodes which maximize Δ λ

S = argmax Δ λ• Combinatorial!• Complexity:– Example: • 1,000 nodes, with 10,000 edges • It takes 0.01 seconds to compute λ• It takes 2,615 years to find 5-best nodes!

Graph Analytics wkshp

Page 83: Understanding and Managing Cascades on Large Graphs

B. A. Prakash; C. Faloutsos 84

A2: Our Solution• Part 1: Shield Value–Carefully approximate Eigen-drop (Δ λ)–Matrix perturbation theory

• Part 2: Algorithm–Greedily pick best node at each step–Near-optimal due to submodularity

• NetShield (linear complexity)–O(nk2+m) n = # nodes; m = # edges

In Tong, Prakash+ ICDM 2010Graph Analytics wkshp

Page 84: Understanding and Managing Cascades on Large Graphs

B. A. Prakash; C. Faloutsos 85

Our Solution: Part 1• Approximate Eigen-drop (Δ λ)

• Δ λ ≈ SV(S) =

– Result using Matrix perturbation theory–u(i) == ‘eigenscore’ ~~ pagerank(i) A u = λ . u

u(i)

Details

Graph Analytics wkshp

Page 85: Understanding and Managing Cascades on Large Graphs

B. A. Prakash; C. Faloutsos 86

P1: node importance P2: set diversity

Original Graph Select by P1 Select by P1+P2

Details

Graph Analytics wkshp

Page 86: Understanding and Managing Cascades on Large Graphs

B. A. Prakash; C. Faloutsos 87

Our Solution: Part 2: NetShield

• We prove that: SV(S) is sub-modular (& monotone non-decreasing)

• NetShield: Greedily add best node at each step

Corollary: Greedy algorithm works 1. NetShield is near-optimal (w.r.t. max SV(S)) 2. NetShield is O(nk2+m)

Footnote: near-optimal means SV(S NetShield) >= (1-1/e) SV(S Opt)Graph Analytics wkshp

Page 87: Understanding and Managing Cascades on Large Graphs

B. A. Prakash; C. Faloutsos 88

Experiment: Immunization qualityLog(fraction of infected nodes)

NetShield

Degree

PageRank

Eigs (=HITS)Acquaintance

Betweeness (shortest path)

Lower is

better TimeGraph Analytics wkshp

Page 88: Understanding and Managing Cascades on Large Graphs

B. A. Prakash; C. Faloutsos 89

Part 2: Algorithms• Q3: Whom to immunize?– Full Immunization (Static Graphs)– Full Immunization (Dynamic Graphs)– Fractional Immunization

• Q4: How to detect outbreaks?• Q5: Who are the culprits?

Graph Analytics wkshp

Page 89: Understanding and Managing Cascades on Large Graphs

B. A. Prakash; C. Faloutsos 90

Full Dynamic Immunization • Given: Set of T arbitrary graphs

• Find: k ‘best’ nodes to immunize (remove)

day

N

N night

N

N , weekend…..

In Prakash+ ECML-PKDD 2010Graph Analytics wkshp

Page 90: Understanding and Managing Cascades on Large Graphs

B. A. Prakash; C. Faloutsos 91

Full Dynamic Immunization • Our solution– Recall theorem– Simple: reduce (= )

• Goal: max eigendrop Δ

• No competing policy for comparison• We propose and evaluate many policies

Matrix Product

Δ =

day night

after before λλ λλ

λ

Graph Analytics wkshp

Page 91: Understanding and Managing Cascades on Large Graphs

B. A. Prakash; C. Faloutsos 92

Greedy-DmaxA

Greedy-Dav

gA

Greedy-Aav

gA

Greedy-S

Optimal2022242628303234

Footprint after k=6 immunizations

Footprint

Performance of Policies

MIT Reality Mining

Lower is better

Graph Analytics wkshp

Page 92: Understanding and Managing Cascades on Large Graphs

B. A. Prakash; C. Faloutsos 93

Part 2: Algorithms• Q3: Whom to immunize?– Full Immunization (Static Graphs)– Full Immunization (Dynamic Graphs)– Fractional Immunization

• Q4: How to detect outbreaks?• Q5: Who are the culprits?

Graph Analytics wkshp

Page 93: Understanding and Managing Cascades on Large Graphs

B. A. Prakash; C. Faloutsos 94

Fractional Immunization of NetworksB. Aditya Prakash, Lada Adamic, Theodore Iwashyna (M.D.), Hanghang Tong, Christos Faloutsos

Under Submission

Graph Analytics wkshp

Page 94: Understanding and Managing Cascades on Large Graphs

B. A. Prakash; C. Faloutsos 95

?

?

Given: a graph A, virus prop. model and budget k; Find: k ‘best’ nodes for immunization (removal).

k = 2

Full Static Immunization

Graph Analytics wkshp

Page 95: Understanding and Managing Cascades on Large Graphs

B. A. Prakash; C. Faloutsos 96

Fractional Asymmetric Immunization

• Fractional Effect [ f(x) = ]• Asymmetric Effect

# antidotes = 3

x5.0

Graph Analytics wkshp

Page 96: Understanding and Managing Cascades on Large Graphs

B. A. Prakash; C. Faloutsos 97

Fractional Asymmetric Immunization

• Fractional Effect [ f(x) = ]• Asymmetric Effect

# antidotes = 3

x5.0

Graph Analytics wkshp

Page 97: Understanding and Managing Cascades on Large Graphs

B. A. Prakash; C. Faloutsos 98

Fractional Asymmetric Immunization

• Fractional Effect [ f(x) = ]• Asymmetric Effect

# antidotes = 3

x5.0

Graph Analytics wkshp

Page 98: Understanding and Managing Cascades on Large Graphs

B. A. Prakash; C. Faloutsos

Fractional Asymmetric Immunization

Hospital Another Hospital

99

Drug-resistant Bacteria (like XDR-TB)

Graph Analytics wkshp

Page 99: Understanding and Managing Cascades on Large Graphs

B. A. Prakash; C. Faloutsos

Fractional Asymmetric Immunization

Hospital Another Hospital

Drug-resistant Bacteria (like XDR-TB)

100

= f

Graph Analytics wkshp

Page 100: Understanding and Managing Cascades on Large Graphs

B. A. Prakash; C. Faloutsos

Fractional Asymmetric Immunization

Hospital Another Hospital

101

Problem: Given k units of disinfectant, how to distribute them to maximize

hospitals saved?

Graph Analytics wkshp

Page 101: Understanding and Managing Cascades on Large Graphs

B. A. Prakash; C. Faloutsos 102

Our Algorithm “SMART-ALLOC”

CURRENT PRACTICE SMART-ALLOC

[US-MEDICARE NETWORK 2005]• Each circle is a hospital, ~3000 hospitals• More than 30,000 patients transferred

~6x fewer!

Graph Analytics wkshp

Page 102: Understanding and Managing Cascades on Large Graphs

B. A. Prakash; C. Faloutsos

Running Time

103

Simulations SMART-ALLOC

> 1 week

14 secs

> 30,000x speed-up!

Wall-Clock Time

Lower is better

Graph Analytics wkshp

Page 103: Understanding and Managing Cascades on Large Graphs

B. A. Prakash; C. Faloutsos 104

Experiments

K = 200 K = 2000

PENN-NETWORK SECOND-LIFE

~5 x ~2.5 x

Lower is better

Graph Analytics wkshp

Page 104: Understanding and Managing Cascades on Large Graphs

B. A. Prakash; C. Faloutsos 105

Part 2: Algorithms• Q3: Whom to immunize?• Q4: How to detect outbreaks?• Q5: Who are the culprits?

Graph Analytics wkshp

Page 105: Understanding and Managing Cascades on Large Graphs

B. A. Prakash; C. Faloutsos 106

Outbreak detection• Problems of finding sources of

contamination in water networks and finding “hot” stories on blogs are isomorphic.– Minimize time to detection,

population affected– Maximize probability of detection.– Minimize sensor placement cost.

Blogs

Posts

LinksInformation

cascadeGraph Analytics wkshp

Page 106: Understanding and Managing Cascades on Large Graphs

B. A. Prakash; C. Faloutsos 107

• J. Leskovec, A. Krause, C. Guestrin, C. Faloutsos, J. VanBriesen, N. Glance. "Cost-effective Outbreak Detection in Networks” KDD 2007

Graph Analytics wkshp

Page 107: Understanding and Managing Cascades on Large Graphs

B. A. Prakash; C. Faloutsos 108

CELF: Main idea• Given a graph G(V,E)• and a budget of B sensors • and data on how contaminations spread over the

network: – for each contamination i we know the time T(i, u) when it

contaminated node u• Minimize time to detect outbreak• CELF algorithm uses submodularity and lazy evaluation

Graph Analytics wkshp

Page 108: Understanding and Managing Cascades on Large Graphs

B. A. Prakash; C. Faloutsos 109

Blogs: Comparison to heuristics

Benefit(higher=better)

Graph Analytics wkshp

Page 109: Understanding and Managing Cascades on Large Graphs

B. A. Prakash; C. Faloutsos 110

• k PA score Blog NP IL OLO OLA• 1 0.1283 http://instapundit.com 4593 4636 1890 5255• 2 0.1822 http://donsurber.blogspot.com 1534 1206 679 3495• 3 0.2224 http://sciencepolitics.blogspot.com 924 576 888 2701• 4 0.2592 http://www.watcherofweasels.com 261 941 1733 3630• 5 0.2923 http://michellemalkin.com 1839 12642 1179 6323• 6 0.3152 http://blogometer.nationaljournal.com 189 2313 3669 9272• 7 0.3353 http://themodulator.org 475 717 1844 4944• 8 0.3508 http://www.bloggersblog.com 895 247 1244 10201• 9 0.3654 http://www.boingboing.net 5776 6337 1024 6183• 10 0.3778 http://atrios.blogspot.com 4682 3205 795 3102

“Best 10 blogs to read”NP - number of posts, IL- in-links, OLO- blog out links, OLA- all out links

Graph Analytics wkshp

Page 110: Understanding and Managing Cascades on Large Graphs

B. A. Prakash; C. Faloutsos 111

Part 2: Algorithms• Q3: Whom to immunize?• Q4: How to detect outbreaks?• Q5: Who are the culprits?

Graph Analytics wkshp

Page 111: Understanding and Managing Cascades on Large Graphs

B. A. Prakash; C. Faloutsos 112

• B. Aditya Prakash, Jilles Vreeken, Christos Faloutsos ‘Detecting Culprits in Epidemics: Who and How many?’

Under Submission

Graph Analytics wkshp

Page 112: Understanding and Managing Cascades on Large Graphs

B. A. Prakash; C. Faloutsos

Problem definition

113

2-d grid‘+’ -> infectedWho started it?

Graph Analytics wkshp

Page 113: Understanding and Managing Cascades on Large Graphs

B. A. Prakash; C. Faloutsos

Problem definition

114

2-d grid‘+’ -> infectedWho started it?

Graph Analytics wkshp

Page 114: Understanding and Managing Cascades on Large Graphs

B. A. Prakash; C. Faloutsos 115

Culprits: Exoneration

Graph Analytics wkshp

Page 115: Understanding and Managing Cascades on Large Graphs

B. A. Prakash; C. Faloutsos 116

Who are the culprits• Two-part solution– use MDL for number of seeds– for a given number:• exoneration = centrality + penalty• our method uses smallest eigenvector of Laplacian

submatrix

• Running time =– linear! (in edges and nodes)

Graph Analytics wkshp

Page 116: Understanding and Managing Cascades on Large Graphs

B. A. Prakash; C. Faloutsos 117

Culprits: Results

Graph Analytics wkshp

Page 117: Understanding and Managing Cascades on Large Graphs

B. A. Prakash; C. Faloutsos 118

Outline• Motivation• Part 1: Understanding Epidemics (Theory)• Part 2: Policy and Action (Algorithms)• Part 3: Learning Models (Empirical Studies)• Conclusion

Graph Analytics wkshp

Page 118: Understanding and Managing Cascades on Large Graphs

B. A. Prakash; C. Faloutsos 119

Part 3: Empirical Studies• Q6: How do cascades look like?• Q7: How does activity evolve over time?• Q8: How does external influence act?

Graph Analytics wkshp

Page 119: Understanding and Managing Cascades on Large Graphs

B. A. Prakash; C. Faloutsos 120

Cascading Behavior in Large Blog

Graphs

How does information propagate over the blogosphere?

Blogs Posts

LinksInformation

cascade

J. Leskovec, M.McGlohon, C. Faloutsos, N. Glance, M. Hurst. Cascading Behavior in Large Blog Graphs. SDM 2007.

Graph Analytics wkshp

Page 120: Understanding and Managing Cascades on Large Graphs

B. A. Prakash; C. Faloutsos 3 - 121

Cascades on the Blogosphere

Cascade is graph induced by a time ordered propagation of information (edges)

Cascades

B1 B2

B4B3

a

b c

de

B1 B2

B4B3

11

2

1 3

1

d

e

b c

e

a

Blogosphereblogs + posts

Blog networklinks among blogs

Post networklinks among posts

Graph Analytics wkshp

Page 121: Understanding and Managing Cascades on Large Graphs

B. A. Prakash; C. Faloutsos 3 - 122

Blog data 45,000 blogs participating in cascades All their posts for 3 months (Aug-Sept ‘05) 2.4 million posts ~5 million links (245,404 inside the dataset)

Time [1 day]

Num

ber o

f pos

tsNumber of posts

Graph Analytics wkshp

Page 122: Understanding and Managing Cascades on Large Graphs

B. A. Prakash; C. Faloutsos 123

Popularity over time

Post popularity drops-off – exponentially?

lag: days after post

# in links

1 2 3

@t

@t + lag

Graph Analytics wkshp

Page 123: Understanding and Managing Cascades on Large Graphs

B. A. Prakash; C. Faloutsos 124

Popularity over time

Post popularity drops-off – exponentially?POWER LAW!Exponent?

# in links(log)

days after post(log)

Graph Analytics wkshp

Page 124: Understanding and Managing Cascades on Large Graphs

B. A. Prakash; C. Faloutsos 125

Popularity over time

Post popularity drops-off – exponentially?POWER LAW!Exponent? -1.6 • close to -1.5: Barabasi’s stack model• and like the zero-crossings of a random walk

# in links(log)

-1.6

days after post(log)

Graph Analytics wkshp

Page 125: Understanding and Managing Cascades on Large Graphs

CMU SCS

B. A. Prakash; C. Faloutsos

-1.5 slopeJ. G. Oliveira & A.-L. Barabási Human Dynamics: The

Correspondence Patterns of Darwin and Einstein. Nature 437, 1251 (2005) . [PDF]

Graph Analytics wkshp 126

Page 126: Understanding and Managing Cascades on Large Graphs

B. A. Prakash; C. Faloutsos 3 - 127

Topological patterns: Cascades

Procedure for gathering cascades: Find all initiators (nodes with out-degree 0) Follow in-links Produces directed acyclic graph Count cascade shapes (use our multi-level graph

isomorphism testing algorithm)a

b c

de

a

b c

de

d

e

b c

e

a

Graph Analytics wkshp

Page 127: Understanding and Managing Cascades on Large Graphs

B. A. Prakash; C. Faloutsos 128

Topological ObservationsHow do we measure how information flows through

the network?Common cascade shapes extracted using algorithms in [Leskovec, Singh, Kleinberg; PAKDD 2006].

Graph Analytics wkshp

Page 128: Understanding and Managing Cascades on Large Graphs

B. A. Prakash; C. Faloutsos 129

Topological Observations

Cascade size distributions also follow power law.

What graph properties do cascades exhibit?

Observation 2: The probability of observing a cascade on n nodes follows a Zipf distribution:

p(n) ∝ n-2

Cascade size (# of nodes)

Coun

t

a=-2

Graph Analytics wkshp

Page 129: Understanding and Managing Cascades on Large Graphs

B. A. Prakash; C. Faloutsos 130

Topological ObservationsWhat graph properties do cascades exhibit?

Stars and chains also follow a power law, with different exponents (star -3.1, chain -8.5).

Size of chain (# nodes)

Coun

t

Size of star (# nodes)

Coun

t

a=-3.1 a=-8.5

Graph Analytics wkshp

Page 130: Understanding and Managing Cascades on Large Graphs

B. A. Prakash; C. Faloutsos 131

Blogs and structure• Cascades take on different shapes (sorted by

frequency):

How can we use cascades to identify communities?

Graph Analytics wkshp

Page 131: Understanding and Managing Cascades on Large Graphs

B. A. Prakash; C. Faloutsos 132

PCA on cascade types• Perform PCA on sparse

matrix.• Use log(count+1)• Project onto 2 PC…

.01…

.07.67…

1.12.1…

5.1…

4.2…

.073.41.13.2boingboing

.092.14.6slashdot

…………

~9,000 cascade types

~44,

000

blog

s

Graph Analytics wkshp

Page 132: Understanding and Managing Cascades on Large Graphs

B. A. Prakash; C. Faloutsos 13328

PCA on cascade types• Observation: Content of blogs and cascade behavior are

often related.

• Distinct clusters for “conservative” and “humorous” blogs (hand-labeling).

M. McGlohon, J. Leskovec, C. Faloutsos, M. Hurst, N. Glance. Finding Patterns in Blog Shapes and Blog Evolution. ICWSM 2007.

Graph Analytics wkshp

Page 133: Understanding and Managing Cascades on Large Graphs

B. A. Prakash; C. Faloutsos 13429

PCA on cascade types• Observation: Content of blogs and cascade behavior are

often related.

• Distinct clusters for “conservative” and “humorous” blogs (hand-labeling).

M. McGlohon, J. Leskovec, C. Faloutsos, M. Hurst, N. Glance. Finding Patterns in Blog Shapes and Blog Evolution. ICWSM 2007.

Graph Analytics wkshp

Page 134: Understanding and Managing Cascades on Large Graphs

B. A. Prakash; C. Faloutsos 135

Part 3: Empirical Studies• Q6: How do cascades look like?• Q7: How does activity evolve over time?• Q8: How does external influence act?

Graph Analytics wkshp

Page 135: Understanding and Managing Cascades on Large Graphs

B. A. Prakash; C. Faloutsos

• Meme (# of mentions in blogs)– short phrases Sourced from U.S. politics in 2008

136

“you can put lipstick on a pig”

“yes we can”

Rise and fall patterns in social media

Graph Analytics wkshp

Page 136: Understanding and Managing Cascades on Large Graphs

B. A. Prakash; C. Faloutsos

Rise and fall patterns in social media

137

• Can we find a unifying model, which includes these patterns?• four classes on YouTube [Crane et al. ’08]• six classes on Meme [Yang et al. ’11]

Graph Analytics wkshp

Page 137: Understanding and Managing Cascades on Large Graphs

B. A. Prakash; C. Faloutsos

Rise and fall patterns in social media

138

• Answer: YES!

• We can represent all patterns by single model

In Matusbara+ SIGKDD 2012 Graph Analytics wkshp

Page 138: Understanding and Managing Cascades on Large Graphs

B. A. Prakash; C. Faloutsos

Main idea - SpikeM- 1. Un-informed bloggers (uninformed about rumor)- 2. External shock at time nb (e.g, breaking news)- 3. Infection (word-of-mouth)

139

Infectiveness of a blog-post at age n:

- Strength of infection (quality of news)

- Decay function (how infective a blog posting is)

Time n=0 Time n=nb Time n=nb+1

β

Power LawGraph Analytics wkshp

Page 139: Understanding and Managing Cascades on Large Graphs

B. A. Prakash; C. Faloutsos

SpikeM - with periodicity• Full equation of SpikeM

140

Periodicity

12pmPeak activity 3am

Low activity

Time n

Bloggers change their activity over time

(e.g., daily, weekly, yearly)

activity

Graph Analytics wkshp

Page 140: Understanding and Managing Cascades on Large Graphs

B. A. Prakash; C. Faloutsos

Details• Analysis – exponential rise and power-raw fall

141

Liner-log

Log-log

Rise-part

SI -> exponential SpikeM -> exponential

Graph Analytics wkshp

Page 141: Understanding and Managing Cascades on Large Graphs

B. A. Prakash; C. Faloutsos

Details• Analysis – exponential rise and power-raw fall

142

Liner-log

Log-log

Fall-part

SI -> exponential SpikeM -> power law

Graph Analytics wkshp

Page 142: Understanding and Managing Cascades on Large Graphs

B. A. Prakash; C. Faloutsos

Tail-part forecasts

143

• SpikeM can capture tail part

Graph Analytics wkshp

Page 143: Understanding and Managing Cascades on Large Graphs

B. A. Prakash; C. Faloutsos

“What-if” forecasting

144

e.g., given (1) first spike, (2) release date of two sequel movies (3) access volume before the release date

? ?

(1) First spike (2) Release date (3) Two weeks before release

Graph Analytics wkshp

Page 144: Understanding and Managing Cascades on Large Graphs

B. A. Prakash; C. Faloutsos

“What-if” forecasting

145

–SpikeM can forecast not only tail-part, but also rise-part!

• SpikeM can forecast upcoming spikes

(1) First spike (2) Release date (3) Two weeks before release

Graph Analytics wkshp

Page 145: Understanding and Managing Cascades on Large Graphs

B. A. Prakash; C. Faloutsos 146

Part 3: Empirical Studies• Q6: How do cascades look like?• Q7: How does activity evolve over time?• Q8: How does external influence act?

Graph Analytics wkshp

Page 146: Understanding and Managing Cascades on Large Graphs

B. A. Prakash; C. Faloutsos 147

Tweets Diffusion: Problem Definition

• Given: – Action log of people tweeting a #hashtag– A network of users

• Find:– How external influence varies with #hashtags?

? ??

?

?? ??

Graph Analytics wkshp

Page 147: Understanding and Managing Cascades on Large Graphs

B. A. Prakash; C. Faloutsos 148

Tweet Diffusion: Data• Yahoo! Twitter firehose• More than 750 million tweets (> 10 Tera-bytes)• Test-bed of > 6000 machines– Hadoop+PIG system ver 0.20.204.0

• Took top 500 hashtags (by volume) in Feb 2011• Network of users:– connecting user X to user Y if X directed at least 3 @-

messages to Y (or RT-ed a tweet)

Graph Analytics wkshp

Page 148: Understanding and Managing Cascades on Large Graphs

B. A. Prakash; C. Faloutsos 149

Tweet Diffusion• Propagation = Influence + External• Developed a model– takes the previous observations into account– with parameters representing external influence

• Learn from previous data– EM-style alternating minimizing algorithm

• Group tags according to learnt params

Graph Analytics wkshp

Page 149: Understanding and Managing Cascades on Large Graphs

B. A. Prakash; C. Faloutsos 150

Results: External Influence vs Time

time

“External Effects”

#nowwatching, #nowplaying, #epictweets

#purpleglasses, #brits, #famouslies

#oscar, #25jan

#openfollow, #ihatequotes, #tweetmyjobs

Can also use for Forecasting, Anomaly Detection!

Bursty, external events

“Word-of-mouth” Not trending

Long-running tags

“Word-of-mouth”

Graph Analytics wkshp

Page 150: Understanding and Managing Cascades on Large Graphs

B. A. Prakash; C. Faloutsos 151

Outline• Motivation• Part 1: Understanding Epidemics (Theory)• Part 2: Policy and Action (Algorithms)• Part 3: Learning Models (Empirical Studies)• Conclusion

Graph Analytics wkshp

Page 151: Understanding and Managing Cascades on Large Graphs

B. A. Prakash; C. Faloutsos 152

Conclusions• Epidemic Threshold– It’s the Eigenvalue

• Fast Immunization– Max. drop in eigenvalue, linear-time near-optimal algorithm

• Bursts: SpikeM model– Exponential growth, Power-law decay

Graph Analytics wkshp

Page 152: Understanding and Managing Cascades on Large Graphs

B. A. Prakash; C. Faloutsos

ML & Stats.

Comp. Systems

Theory & Algo.

Biology

Econ.

Social Science

Physics

153

Propagation on Networks

Graph Analytics wkshp

Page 153: Understanding and Managing Cascades on Large Graphs

B. A. Prakash; C. Faloutsos 154

Publications1. Winner-takes-all: Competing Viruses or Ideas on fair-play networks (B. Aditya Prakash, Alex Beutel, Roni Rosenfeld,

Christos Faloutsos) – In WWW 2012, Lyon2. Threshold Conditions for Arbitrary Cascade Models on Arbitrary Networks (B. Aditya Prakash, Deepayan

Chakrabarti, Michalis Faloutsos, Nicholas Valler, Christos Faloutsos) - In IEEE ICDM 2011, Vancouver (Invited to KAIS Journal Best Papers of ICDM.)

3. Times Series Clustering: Complex is Simpler! (Lei Li, B. Aditya Prakash) - In ICML 2011, Bellevue4. Epidemic Spreading on Mobile Ad Hoc Networks: Determining the Tipping Point (Nicholas Valler, B. Aditya Prakash,

Hanghang Tong, Michalis Faloutsos and Christos Faloutsos) – In IEEE NETWORKING 2011, Valencia, Spain5. Formalizing the BGP stability problem: patterns and a chaotic model (B. Aditya Prakash, Michalis Faloutsos and

Christos Faloutsos) – In IEEE INFOCOM NetSciCom Workshop, 2011.6. On the Vulnerability of Large Graphs (Hanghang Tong, B. Aditya Prakash, Tina Eliassi-Rad and Christos Faloutsos) –

In IEEE ICDM 2010, Sydney, Australia7. Virus Propagation on Time-Varying Networks: Theory and Immunization Algorithms (B. Aditya Prakash, Hanghang

Tong, Nicholas Valler, Michalis Faloutsos and Christos Faloutsos) – In ECML-PKDD 2010, Barcelona, Spain8. MetricForensics: A Multi-Level Approach for Mining Volatile Graphs (Keith Henderson, Tina Eliassi-Rad, Christos

Faloutsos, Leman Akoglu, Lei Li, Koji Maruhashi, B. Aditya Prakash and Hanghang Tong) - In SIGKDD 2010, Washington D.C.

9. Parsimonious Linear Fingerprinting for Time Series (Lei Li, B. Aditya Prakash and Christos Faloutsos) - In VLDB 2010, Singapore

10. EigenSpokes: Surprising Patterns and Scalable Community Chipping in Large Graphs (B. Aditya Prakash, Ashwin Sridharan, Mukund Seshadri, Sridhar Machiraju and Christos Faloutsos) – In PAKDD 2010, Hyderabad, India

11. BGP-lens: Patterns and Anomalies in Internet-Routing Updates (B. Aditya Prakash, Nicholas Valler, David Andersen, Michalis Faloutsos and Christos Faloutsos) – In ACM SIGKDD 2009, Paris, France.

12. Surprising Patterns and Scalable Community Detection in Large Graphs (B. Aditya Prakash, Ashwin Sridharan, Mukund Seshadri, Sridhar Machiraju and Christos Faloutsos) – In IEEE ICDM Large Data Workshop 2009, Miami

13. FRAPP: A Framework for high-Accuracy Privacy-Preserving Mining (Shipra Agarwal, Jayant R. Haritsa and B. Aditya Prakash) – In Intl. Journal on Data Mining and Knowledge Discovery (DKMD), Springer, vol. 18, no. 1, February 2009, Ed: Johannes Gehrke.

14. Complex Group-By Queries For XML (C. Gokhale, N. Gupta, P. Kumar, L. V. S. Lakshmanan, R. Ng and B. Aditya Prakash) – In IEEE ICDE 2007, Istanbul, Turkey.

****

**

**

Graph Analytics wkshp

Page 154: Understanding and Managing Cascades on Large Graphs

B. A. Prakash; C. Faloutsos 155

Acknowledgements

Collaborators Christos Faloutsos Roni Rosenfeld, Michalis Faloutsos, Lada Adamic, Theodore Iwashyna (M.D.), Dave Andersen, Tina Eliassi-Rad, Iulian Neamtiu,

Varun Gupta, Jilles Vreeken,

Deepayan Chakrabarti, Hanghang Tong, Kunal Punera, Ashwin Sridharan, Sridhar Machiraju, Mukund Seshadri, Alice Zheng, Lei Li, Polo Chau, Nicholas Valler, Alex Beutel, Xuetao Wei

Graph Analytics wkshp

Page 155: Understanding and Managing Cascades on Large Graphs

B. A. Prakash; C. Faloutsos 156

Acknowledgements

Funding

Graph Analytics wkshp

Page 156: Understanding and Managing Cascades on Large Graphs

B. A. Prakash; C. Faloutsos 157

Analysis Policy/Action Data

Dynamical Processes on Large Networks

B. Aditya Prakash Christos Faloutsos

Graph Analytics wkshp