© 2012 IBM Corporation
IBM Research
Gelling, and Melting, Large Graphs by Edge Manipulation
Joint Work by
Hanghang Tong (IBM)
B. Aditya Prakash (Virginia Tech)
Tina Eliassi-Rad (Rutgers)
Michalis Faloutsos (UCR)
Christos Faloutsos (CMU)
Presenter: Hanghang Tong
An Example: Flu/Virus/Rumor/Idea Propagation
[Figure legend: Healthy, Sick, Contact]
An Example: Flu/Virus Propagation
[Figure legend: Healthy, Sick, Contact]
1: Sneeze to neighbors
2: Some neighbors get sick
3: Try to recover
Q: How to guide propagation by optimizing the link structure?
- Q1: Understand the tipping point (existing work)
- Q2: Minimize the propagation (this paper)
- Q3: Maximize the propagation (this paper)
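The three steps of the flu example can be sketched as one time tick of a simple SIS-style simulation. This is a minimal illustration, not the model used in the talk: the function name `sis_step`, the infection probability `beta`, the recovery probability `delta`, and the tiny contact network are all assumptions made for the example.

```python
import random

def sis_step(neighbors, sick, beta, delta, rng):
    """One time tick: sick nodes sneeze to neighbors, then try to recover."""
    next_sick = set()
    for node in sick:
        # Steps 1 and 2: each healthy neighbor gets sick with probability beta.
        for nb in neighbors[node]:
            if nb not in sick and rng.random() < beta:
                next_sick.add(nb)
        # Step 3: the sick node recovers with probability delta.
        if rng.random() >= delta:
            next_sick.add(node)
    return next_sick

# Hypothetical contact network: a path 0 - 1 - 2, with node 0 sick at the start.
neighbors = {0: [1], 1: [0, 2], 2: [1]}
rng = random.Random(0)
sick = {0}
for tick in range(3):
    sick = sis_step(neighbors, sick, beta=0.5, delta=0.3, rng=rng)
    print(tick + 1, sorted(sick))
```

Which edges exist in `neighbors` decides who can be sneezed on, which is why changing the link structure changes how far the infection spreads.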
Social Analytics & Collaboration Technologies Group
Roadmap
Motivation: An Illustrative Example
Q1: Understanding the Tipping Point (Background)
Q2: Minimize Propagation
Q3: Maximize Propagation
Conclusion
Eigenvalue is the Key! [ICDM 2011]
• (Informal description) For
  – any arbitrary topology (adjacency matrix A)
  – any virus propagation model (VPM) in the standard literature (~25 in total)
• the epidemic threshold depends only on
  – λ (the leading eigenvalue of A)
  – a model constant C_vpm (determined by the propagation model itself)
Theorem [Faloutsos2 + ICDM 2011]: No epidemic if λ × C_vpm ≤ 1.
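The threshold condition can be checked numerically on a small graph. The sketch below assumes an undirected graph (so a symmetric eigen-solver applies) and, purely for illustration, an SIS model whose constant is taken as β/δ. The helper names `leading_eigenvalue` and `below_threshold` and the rate values are hypothetical.

```python
import numpy as np

def leading_eigenvalue(A):
    """Largest eigenvalue of a symmetric adjacency matrix."""
    return float(np.linalg.eigvalsh(A)[-1])

def below_threshold(A, c_vpm):
    """True when lambda * C_vpm <= 1, i.e. the theorem predicts no epidemic."""
    return leading_eigenvalue(A) * c_vpm <= 1.0

# Hypothetical 4-node cycle; its leading eigenvalue is 2.
A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]], dtype=float)

beta, delta = 0.1, 0.5  # assumed infection and recovery rates
print(leading_eigenvalue(A))
print(below_threshold(A, beta / delta))
```

Only λ and the model constant enter the check; no other property of the topology is needed.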
Epidemic Threshold for Alternating Behavior [PKDD 2010, Networking 2011]
Theorem [PKDD 2010, Networking 2011]: No epidemic if λ(S) ≤ 1.
System matrix S = Π_i S_i
S_i = (1-δ)I + β A_i
[Figure: alternating N × N adjacency matrices A_i (day, night, ...)]
[Plot: Log (Infection Ratio) vs. Time Ticks; curves: Below, At Threshold, Above]
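The system matrix for alternating contact patterns can be assembled directly from its definition. The sketch below is illustrative only: the two 3-node "day" and "night" graphs, the rate values, and the helper names `system_matrix` and `spectral_radius` are assumptions. λ(S) is taken to be the spectral radius of S.

```python
import numpy as np
from functools import reduce

def system_matrix(adjacency_list, beta, delta):
    """S = product of S_i, with S_i = (1 - delta) I + beta A_i."""
    n = adjacency_list[0].shape[0]
    factors = [(1.0 - delta) * np.eye(n) + beta * A for A in adjacency_list]
    return reduce(np.matmul, factors)

def spectral_radius(M):
    """Largest eigenvalue magnitude; S need not be symmetric."""
    return float(np.max(np.abs(np.linalg.eigvals(M))))

# Hypothetical contact graphs: a path 0 - 1 - 2 by day, a single edge 0 - 2 by night.
A_day = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
A_night = np.array([[0, 0, 1], [0, 0, 0], [1, 0, 0]], dtype=float)

S = system_matrix([A_day, A_night], beta=0.1, delta=0.5)
print(spectral_radius(S) <= 1.0)  # the theorem's "no epidemic" condition
```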
Why is λ So Important?
• λ as the capacity of a graph: larger λ → better connected
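A quick way to see λ acting as a connectivity measure is to compare graphs on the same node set. The sketch below uses three hypothetical 5-node undirected graphs; the helper names `adjacency` and `leading_eigenvalue` are assumptions.

```python
import numpy as np

def leading_eigenvalue(A):
    """Largest eigenvalue of a symmetric adjacency matrix."""
    return float(np.linalg.eigvalsh(A)[-1])

def adjacency(n, edges):
    """Symmetric 0/1 adjacency matrix of an undirected graph."""
    A = np.zeros((n, n))
    for i, j in edges:
        A[i, j] = A[j, i] = 1.0
    return A

n = 5
path = adjacency(n, [(i, i + 1) for i in range(n - 1)])
star = adjacency(n, [(0, i) for i in range(1, n)])
clique = adjacency(n, [(i, j) for i in range(n) for j in range(i + 1, n)])

for name, A in [("path", path), ("star", star), ("clique", clique)]:
    print(name, round(leading_eigenvalue(A), 3))
```

The path has λ = √3 ≈ 1.732, the star has λ = 2, and the clique has λ = 4, so λ grows as the graph becomes better connected.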
Minimizing Propagation: Edge Deletion
• Given: a graph A, a virus propagation model, and a budget k
• Find: the k 'best' edges to delete from A so as to minimize λ
[Figure: a 'Bad' vs. a 'Good' choice of edges to delete]
Q: How to find the k best edges to delete efficiently?
[Formula annotations: left eigen-score of source; right eigen-score of target]
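One way to read the eigen-score annotations is to rank each edge by the product of the left eigen-score of its source and the right eigen-score of its target, then delete the top k. The sketch below is a simplified illustration under that reading, not the authors' implementation. It assumes an undirected graph, so the left and right leading eigenvectors coincide. The helper names and the example graph are hypothetical.

```python
import numpy as np

def leading_eigenpair(A):
    """Leading eigenvalue and non-negative unit eigenvector of a symmetric A."""
    vals, vecs = np.linalg.eigh(A)
    return float(vals[-1]), np.abs(vecs[:, -1])

def top_k_edges_to_delete(A, k):
    """Rank existing edges (i, j), i < j, by u[i] * v[j] and return the top k."""
    _, u = leading_eigenpair(A)
    v = u  # assumption: undirected graph, so left and right eigenvectors match
    n = len(A)
    edges = [(i, j) for i in range(n) for j in range(i + 1, n) if A[i, j] > 0]
    edges.sort(key=lambda e: u[e[0]] * v[e[1]], reverse=True)
    return edges[:k]

def delete_edges(A, edges):
    """Return a copy of A with the given undirected edges removed."""
    B = A.copy()
    for i, j in edges:
        B[i, j] = B[j, i] = 0.0
    return B

# Hypothetical graph: a 4-clique on nodes 0..3 with a tail 3 - 4 - 5.
A = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (3, 4), (4, 5)]:
    A[i, j] = A[j, i] = 1.0

removed = top_k_edges_to_delete(A, k=2)
print(removed)
print(leading_eigenpair(A)[0], leading_eigenpair(delete_edges(A, removed))[0])
```

In this example the chosen edges sit inside the dense clique rather than on the sparse tail, which matches the idea that well-placed deletions lower λ the most.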
Minimizing Propagation: Evaluations
[Plot: Log (Infected Ratio) vs. Time Ticks, with 'Our Method' and a '(better)' direction label]
Data set: Oregon Autonomous System Graph (14K nodes, 61K edges)
Discussions: Node Deletion vs. Edge Deletion
• Observations:
  • Node or edge deletion → λ decreases
  • Nodes on A = Edges on its line graph L(A)
• Questions:
  • Is Edge Deletion on A = Node Deletion on L(A)?
  • Which strategy is better (when both are feasible)?
[Figure: original graph A and its line graph L(A)]
Discussions: Node Deletion vs. Edge Deletion
• Q: Is Edge Deletion on A = Node Deletion on L(A)?
• A: Yes!
Theorem (Line Graph Spectrum): Eigenvalue of A → Eigenvalue of L(A)
• But node deletion itself is not easy:
Theorem (Hardness of Node Deletion): Finding the optimal k-node immunization is NP-hard.
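The link between the spectra of A and L(A) can be checked numerically on a toy example. The sketch below assumes the directed line graph: one node per edge of A, with an arc from edge e to edge f when the target of e is the source of f. That definition, the helper names, and the 3-node example are assumptions, since the slides do not spell out the construction.

```python
import numpy as np

def adjacency(n, edges):
    """Directed 0/1 adjacency matrix."""
    A = np.zeros((n, n))
    for i, j in edges:
        A[i, j] = 1.0
    return A

def directed_line_graph(edges):
    """L[e, f] = 1 when edge e ends where edge f starts."""
    m = len(edges)
    L = np.zeros((m, m))
    for e, (_, tgt) in enumerate(edges):
        for f, (src, _) in enumerate(edges):
            if tgt == src:
                L[e, f] = 1.0
    return L

def spectral_radius(M):
    return float(np.max(np.abs(np.linalg.eigvals(M))))

# Hypothetical strongly connected directed graph on 3 nodes.
edges = [(0, 1), (1, 2), (2, 0), (0, 2), (2, 1)]
A = adjacency(3, edges)
L = directed_line_graph(edges)
print(spectral_radius(A), spectral_radius(L))
```

Under this definition the two leading eigenvalues agree, which is why deleting an edge of A can be viewed as deleting a node of L(A).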
Discussions: Node Deletion vs. Edge Deletion
• Q: Which strategy is better (when both are feasible)?
• A: Edge Deletion > Node Deletion
[Plot with a '(better)' direction label. Green: Node Deletion (e.g., shut down a Twitter account). Red: Edge Deletion (e.g., un-friend two users)]
Maximizing Propagation: Edge Addition
• Given: a graph A, a virus propagation model, and a budget k
• Find: the k 'best' new edges to add into A
• By first-order perturbation, λ_S - λ ≈ Gv(S) = c Σ_{e∈S} u(i_e) v(j_e)
  (u(i_e): left eigen-score of the source; v(j_e): right eigen-score of the target)
• So, are we done? No: scoring every candidate edge needs O(n^2 - m) complexity
[Figure: Low Gv vs. High Gv]
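The first-order estimate can be compared against the exact change in λ on a small example. The sketch below assumes an undirected graph with a unit-norm leading eigenvector u = v. In that setting, adding the undirected edge {i, j} has a first-order gain of 2·u_i·u_j. The slide's constant c is not defined in this excerpt, so that factor is an assumption tied to the symmetric case, and the helper names are hypothetical.

```python
import numpy as np

def leading_eigenpair(A):
    """Leading eigenvalue and non-negative unit eigenvector of a symmetric A."""
    vals, vecs = np.linalg.eigh(A)
    return float(vals[-1]), np.abs(vecs[:, -1])

def first_order_gain(A, new_edges):
    """Estimated increase in lambda: sum of 2 * u[i] * u[j] over the new edges."""
    _, u = leading_eigenpair(A)
    return float(sum(2.0 * u[i] * u[j] for i, j in new_edges))

def exact_gain(A, new_edges):
    """Actual increase in lambda after adding the undirected edges."""
    B = A.copy()
    for i, j in new_edges:
        B[i, j] = B[j, i] = 1.0
    return leading_eigenpair(B)[0] - leading_eigenpair(A)[0]

# Hypothetical path 0 - 1 - 2 - 3 - 4; close it into a cycle with edge (0, 4).
A = np.zeros((5, 5))
for i in range(4):
    A[i, i + 1] = A[i + 1, i] = 1.0

new_edges = [(0, 4)]
print(first_order_gain(A, new_edges), exact_gain(A, new_edges))
```

For symmetric matrices the estimate never exceeds the exact gain (a Rayleigh-quotient bound), so it serves as a cheap score for ranking candidate edges rather than as an exact prediction.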
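Maximizing Propagation: Edge Addition
λ_S - λ ≈ Gv(S) = c Σ_{e∈S} u(i_e) v(j_e)
• Q: How to find the k new edges with the highest Gv(S)?
• A: Modified Fagin's algorithm
  #1: Sort sources by u
  #2: Sort targets by v
  #3: Search space: the top (k+d) sources × the top (k+d) targets
[Figure: sorted source × target grid with a k × k block and the (k+d) × (k+d) search space; marked cells are existing edges]
Time complexity: O(m + nt + kt^2), t = max(k, d)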
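The sort-then-restrict idea behind the edge-addition search can be sketched as follows. This is a simplified illustration, not the authors' modified Fagin's algorithm. It assumes an undirected graph (u = v) and assumes that d is the maximum degree, which the slides do not define. It simply scores the non-existing pairs inside the top (k+d) × (k+d) block. The function name `top_k_new_edges` is hypothetical.

```python
import numpy as np

def top_k_new_edges(A, k):
    """Pick k non-existing undirected edges with the highest u[i] * v[j]."""
    n = A.shape[0]
    vals, vecs = np.linalg.eigh(A)
    u = np.abs(vecs[:, -1])
    v = u                              # assumption: undirected graph
    d = int(A.sum(axis=1).max())       # assumption: d is the maximum degree
    t = min(n, k + d)
    top_sources = np.argsort(-u)[:t]   # step 1: sort sources by u
    top_targets = np.argsort(-v)[:t]   # step 2: sort targets by v
    scores = {}
    for i in top_sources:              # step 3: search only the t x t block
        for j in top_targets:
            a, b = (int(i), int(j)) if i < j else (int(j), int(i))
            if a != b and A[a, b] == 0:
                scores[(a, b)] = u[a] * v[b]
    return sorted(scores, key=scores.get, reverse=True)[:k]

# Hypothetical path graph 0 - 1 - 2 - 3 - 4 - 5.
A = np.zeros((6, 6))
for i in range(5):
    A[i, i + 1] = A[i + 1, i] = 1.0

print(top_k_new_edges(A, k=2))
```

Restricting the search to a small block avoids scoring all O(n^2 - m) candidate pairs, which is the point of sorting sources and targets first.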
Maximizing Propagation: Evaluation
[Plot: Log (Infected Ratio) vs. Time Ticks, with 'Our Method' and a '(better)' direction label]
Conclusion
Goal: Guide influence propagation by optimizing the link structure
Our observation: optimizing influence propagation = optimizing λ
Our solutions:
– NetMelt to minimize propagation
– NetGel to maximize propagation