03/22/11
I2.2: Analysis of significant substructures in time-varying networks
Ambuj Singh
(in collaboration with P. Bogdanov, M. Mongiovi, X. Yang)
NS-CTA INARC Mid-Year Review, March 2011
Dynamic networks
• Dynamic networks are commonplace
o online interaction networks: Twitter, Wikipedia, LinkedIn, Facebook, ...
o mobile networks: cyber-physical scenario (EDIN, INARC), virus propagation (E2.1)
• Generative models to explain the network structure
o preferential attachment [Barabasi '99]
o forest-fire [Lescovec '09]
• Markov chain models (discrete, continuous)
o when, where, what changes [Avin '08, Clementi '09]
• Latent space / context models [Zheng '05]
• Network flow/traffic [Daganzo '94, Bickel '01, Stoev '09]
• Disease propagation, blog cascade, SIS [Lescovec '07]
• Stochastic actor-based models [Snijders '09]
Our focus
• Dynamic edge attributes
• Simplest case
o edge is +1 or -1
o +1 means flow of interest: congestion, flow above historical threshold
o real values are a general case and can also be considered
• Query: find highest scoring substructures in graph over time
o combines graph structure and time
Outline
• Motivation
• Problem definition
• Solving for a fixed time interval
• Heuristic for multiple time intervals
• Path forward
Problem definition
[Figure: example graph whose edges carry a value of 1 or -1 at each of the time steps t1 to t4. Extracted values per time step:]
t1 t2 t3 t4
 1 -1  1 -1
 1 -1 -1 -1
-1 -1 -1 -1
 1  1  1 -1
 1 -1  1 -1
• A time-evolving graph G = (V, E, F_t(e))
o V: set of nodes
o E: set of edges
o F_t(e): mapping of edges to {-1, 1} at time t
• Score of an edge e in interval [t1, t2] = ∑_{t = t1..t2} F_t(e)
• Score of a subgraph in an interval = ∑ score(e), for all e in the subgraph
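The two score definitions can be sketched in a few lines of Python. The three-edge graph, the node names, and the function names `edge_score` and `subgraph_score` are hypothetical and chosen only for illustration; time steps are 0-based here, so t1..t4 map to indices 0..3.

```python
from typing import Dict, Iterable, List, Tuple

Edge = Tuple[str, str]


def edge_score(series: List[int], t1: int, t2: int) -> int:
    """Sum F_t(e) over the inclusive interval [t1, t2] (0-based indices)."""
    return sum(series[t1:t2 + 1])


def subgraph_score(F: Dict[Edge, List[int]], edges: Iterable[Edge],
                   t1: int, t2: int) -> int:
    """Sum the interval scores of all edges in the subgraph."""
    return sum(edge_score(F[e], t1, t2) for e in edges)


# Hypothetical edge time series (not taken from the slides).
F = {
    ("a", "b"): [1, -1, 1, -1],
    ("b", "c"): [1, 1, 1, -1],
    ("c", "d"): [-1, -1, -1, -1],
}

score_ab = edge_score(F[("a", "b")], 0, 2)                        # 1 - 1 + 1
score_sub = subgraph_score(F, [("a", "b"), ("b", "c")], 0, 2)      # 1 + 3
print(score_ab, score_sub)
```

Extending the interval to include the last time step lowers both scores, which is why the query has to search over intervals as well as over subgraphs.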
Prize-collecting Steiner Tree (PCST)
• Given a graph G = (V, E) with positive node weights (prizes) p(v) and edge costs c(e) that count negatively, find a subtree T' = (V', E') that optimizes one of:
o Goemans-Williamson Minimization (GW-PCST): minimize GW(T') = ∑_{e in E'} c(e) + ∑_{v not in V'} p(v)
o Net Worth Maximization (NW-PCST): maximize NW(T') = ∑_{v in V'} p(v) - ∑_{e in E'} c(e)
• Both are NP-hard (equivalent objective functions) [Johnson '00]
o GW-PCST has an approximation factor of 2 - 1/(n-1).
o The rooted version of NW-PCST is NP-hard to approximate within any constant factor [Feig '01]
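A small sketch of the two objectives, assuming non-negative costs c(e) and a hypothetical three-node instance. It also shows why the objectives are equivalent at the optimum: GW(T') + NW(T') equals the total prize of the graph, a constant, so minimizing GW is the same as maximizing NW. The function names are illustrative only.

```python
from typing import Dict, Set, Tuple

Edge = Tuple[str, str]


def gw_objective(prize: Dict[str, float], cost: Dict[Edge, float],
                 tree_nodes: Set[str], tree_edges: Set[Edge]) -> float:
    """Edge costs paid plus prizes of the nodes left out of the tree."""
    paid = sum(cost[e] for e in tree_edges)
    lost = sum(p for v, p in prize.items() if v not in tree_nodes)
    return paid + lost


def nw_objective(prize: Dict[str, float], cost: Dict[Edge, float],
                 tree_nodes: Set[str], tree_edges: Set[Edge]) -> float:
    """Prizes collected minus edge costs paid."""
    return sum(prize[v] for v in tree_nodes) - sum(cost[e] for e in tree_edges)


# Hypothetical instance: a path a - b - c.
prize = {"a": 5.0, "b": 1.0, "c": 4.0}
cost = {("a", "b"): 2.0, ("b", "c"): 1.0}
total_prize = sum(prize.values())

whole = ({"a", "b", "c"}, {("a", "b"), ("b", "c")})
single = ({"a"}, set())

for nodes, edges in (whole, single):
    gw = gw_objective(prize, cost, nodes, edges)
    nw = nw_objective(prize, cost, nodes, edges)
    # The two objectives always add up to the total prize.
    print(sorted(nodes), gw, nw, gw + nw == total_prize)
```

The identity holds for every subtree, but it says nothing about approximation ratios, because a constant additive shift changes the ratio between an approximate and an optimal value.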
Why the same guarantee doesn’t hold for NW?
In this specific example (ratio = OPT/APX):
GW-PCST
• APX = 3*(k-1)
• OPT = 2*k
• ratio ≈ 2/3
NW-PCST
• OPT = k
• APX = 3
• ratio = k/3
[Figure: example instance with the OPT and APX solutions marked. Optimal solution: the whole graph.]
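The arithmetic behind the two ratios can be checked numerically. The sketch below only evaluates the OPT and APX expressions given on the slide for growing k; it does not construct the example graph, and the function names are illustrative.

```python
def gw_ratio(k: int) -> float:
    """OPT / APX for GW-PCST in the example: 2k / (3(k-1))."""
    return (2 * k) / (3 * (k - 1))


def nw_ratio(k: int) -> float:
    """OPT / APX for NW-PCST in the example: k / 3."""
    return k / 3


for k in (10, 100, 1000):
    # The GW ratio settles near 2/3 while the NW ratio grows linearly in k.
    print(k, round(gw_ratio(k), 4), round(nw_ratio(k), 2))
```

The GW ratio stays bounded as k grows, whereas the NW ratio is unbounded, so the same solution that is acceptable under GW can be arbitrarily poor under NW.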
Merge-and-refine approximation
• Merge nodes into clusters in a bottom-up fashion
o shortest-path metric graph using edge costs
• Merge triangle and star structures considering both node values and interconnect cost
• Multiple refinement iterations
• Approximation quality
o OPT <= APX + c*N(OPT), where N is the cost of interconnection
o Good approximation for instances in which there are cheaply connected clusters of high-prize nodes
• Challenges
o Relatively high computational cost due to all-pairs shortest path computation
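The shortest-path metric graph that the merging step relies on can be sketched with Floyd-Warshall. The slides do not say which all-pairs shortest path algorithm is used, so Floyd-Warshall is an assumption here, as are the function name `metric_closure` and the four-node example.

```python
import math
from typing import Dict, List, Tuple

Edge = Tuple[str, str]


def metric_closure(nodes: List[str],
                   cost: Dict[Edge, float]) -> Dict[str, Dict[str, float]]:
    """All-pairs shortest path distances on an undirected graph (Floyd-Warshall)."""
    dist = {u: {v: (0.0 if u == v else math.inf) for v in nodes} for u in nodes}
    for (u, v), c in cost.items():
        if c < dist[u][v]:
            dist[u][v] = c
            dist[v][u] = c
    # Allow each node k in turn as an intermediate stop.
    for k in nodes:
        for i in nodes:
            d_ik = dist[i][k]
            if d_ik == math.inf:
                continue
            for j in nodes:
                candidate = d_ik + dist[k][j]
                if candidate < dist[i][j]:
                    dist[i][j] = candidate
    return dist


# Hypothetical edge costs: the direct edge a-d is more expensive than the path via b and c.
nodes = ["a", "b", "c", "d"]
cost = {("a", "b"): 1.0, ("b", "c"): 2.0, ("c", "d"): 1.0, ("a", "d"): 10.0}
dist = metric_closure(nodes, cost)
print(dist["a"]["d"])
```

The triple loop is cubic in the number of nodes, which is consistent with the all-pairs computation dominating the running time.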
An example
• Aggregate edge values within the interval
• Transform the edge-weighted graph into NW-PCST
• Apply the Merge-and-refine approximation
Running time of merge-and-refine
• APSP comprises 90% of the approximation running time
• Takes more than a second for N=360 for one interval
Baseline solution across time
• Find the best subgraph in time by exhaustive enumeration
o Consider all O(t^2) intervals
o Apply the solution for a fixed interval in each
o Take the best obtained subgraph in all intervals
• Polynomial cost, but impractical for real-world problems
o The highway system of Southern California has ~4k edges with live-traffic measurements
o The Autonomous Systems (AS)-level Internet backbone has hundreds of thousands of links
o The baseline solution would not be practical for networks of this scale
• Need for scalable solutions of acceptable quality
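The exhaustive baseline can be sketched as a double loop over interval endpoints with prefix sums for the per-interval edge aggregates. The fixed-interval solver is passed in as a function; `positive_edges_solver` below is a deliberately trivial stand-in that ignores connectivity and is not the Merge-and-refine procedure. All names and the two-edge example are hypothetical.

```python
from itertools import accumulate
from typing import Callable, Dict, List, Optional, Tuple

Edge = Tuple[str, str]
Solution = Tuple[float, List[Edge]]


def prefix_sums(F: Dict[Edge, List[int]]) -> Dict[Edge, List[int]]:
    """Per-edge running totals so any interval sum costs O(1)."""
    return {e: [0] + list(accumulate(series)) for e, series in F.items()}


def aggregate(prefix: Dict[Edge, List[int]], t1: int, t2: int) -> Dict[Edge, int]:
    """Edge scores for the inclusive interval [t1, t2]."""
    return {e: p[t2 + 1] - p[t1] for e, p in prefix.items()}


def baseline(F: Dict[Edge, List[int]], T: int,
             solve_fixed: Callable[[Dict[Edge, int]], Solution]):
    """Try all T(T+1)/2 intervals and keep the best fixed-interval solution."""
    prefix = prefix_sums(F)
    best: Tuple[float, Optional[Tuple[int, int]], Optional[List[Edge]]]
    best = (float("-inf"), None, None)
    for t1 in range(T):
        for t2 in range(t1, T):
            score, sub = solve_fixed(aggregate(prefix, t1, t2))
            if score > best[0]:
                best = (score, (t1, t2), sub)
    return best


def positive_edges_solver(weights: Dict[Edge, int]) -> Solution:
    """Stand-in solver: takes every positive edge, ignoring connectivity."""
    chosen = [e for e, w in weights.items() if w > 0]
    return sum(weights[e] for e in chosen), chosen


F = {("a", "b"): [1, -1, 1, -1], ("b", "c"): [1, 1, 1, -1]}
best = baseline(F, 4, positive_edges_solver)
print(best)
```

Every interval triggers one call to the fixed-interval solver, so the quadratic number of intervals multiplies whatever that solver costs.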
Best-first approach using bounds
• Idea: reduce the number of calls to Merge-and-refine
o Estimate solutions for different intervals
o Evaluate the most "promising" intervals first
o Prune intervals that do not contain the best solution
• Bound the solution in an interval
o Computationally simple to compute
o Effective in terms of pruning power
• Best-first procedure
o Order intervals by their upper bound
o Prune infeasible intervals using lower bound
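One way to read the best-first procedure is as a priority queue ordered by upper bound, with a cutoff taken from the lower bounds and from the best solved interval. The sketch below assumes that, for every interval, lower bound <= solved score <= upper bound; the bound functions, the `solve` callback, and the toy score table are hypothetical stand-ins for the bounds and for Merge-and-refine.

```python
import heapq
from typing import Callable, List, Optional, Tuple

Interval = Tuple[int, int]


def best_first(intervals: List[Interval],
               upper_bound: Callable[[Interval], float],
               lower_bound: Callable[[Interval], float],
               solve: Callable[[Interval], float]):
    """Solve intervals in decreasing upper-bound order, stopping once none can win."""
    # The largest lower bound is a score that some interval is known to reach.
    incumbent = max(lower_bound(iv) for iv in intervals)
    heap = [(-upper_bound(iv), iv) for iv in intervals]
    heapq.heapify(heap)

    best_score: float = float("-inf")
    best_interval: Optional[Interval] = None
    calls = 0
    while heap:
        neg_ub, iv = heapq.heappop(heap)
        ub = -neg_ub
        # All remaining intervals have an upper bound no larger than this one.
        if ub <= best_score or ub < incumbent:
            break
        score = solve(iv)  # the expensive call
        calls += 1
        if score > best_score:
            best_score, best_interval = score, iv
    return best_score, best_interval, calls


# Toy scores with bounds that are off by one in each direction.
true_score = {(0, 0): 2, (0, 1): 5, (1, 1): 1}
result = best_first(
    list(true_score),
    upper_bound=lambda iv: true_score[iv] + 1,
    lower_bound=lambda iv: true_score[iv] - 1,
    solve=lambda iv: true_score[iv],
)
print(result)
```

In this toy run only one of the three intervals is solved. How much is saved in practice depends on how tight the bounds are.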
Upper bound (UB)
• Offline:
o Consider a hyper-graph in which original edges become nodes and original nodes become hyper-edges
o Split the original edges into k partitions via hyper-graph partitioning
o Maintain edges at partition "boundaries"
• Online UB estimation for a fixed interval:
o UB of a partition is the aggregate of its positive edges
o Edges between partitions: 0 cost if there is at least one positive boundary edge, cheapest boundary edge otherwise
o Solve the NW-PCST on the obtained coarse-level graph
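The online step can be sketched as building a coarse prize/cost graph from the aggregated edge scores of one interval. The input layout is an assumption: `part_of` maps each interior edge to its partition and `boundary` lists the boundary edges between each pair of partitions. The hyper-graph partitioning itself and the final NW-PCST solve are not shown, and the edge identifiers are hypothetical.

```python
from typing import Dict, List, Tuple


def coarse_graph(weights: Dict[str, int],
                 part_of: Dict[str, int],
                 boundary: Dict[Tuple[int, int], List[str]]):
    """Build partition prizes and inter-partition costs for one interval."""
    # Prize of a partition: the sum of its positive edge scores.
    prize: Dict[int, int] = {}
    for e, p in part_of.items():
        prize[p] = prize.get(p, 0) + max(weights[e], 0)

    # Cost between partitions: 0 if any boundary edge is positive,
    # otherwise the cost of the cheapest (least negative) boundary edge.
    cost: Dict[Tuple[int, int], int] = {}
    for pair, edges in boundary.items():
        scores = [weights[e] for e in edges]
        if any(s > 0 for s in scores):
            cost[pair] = 0
        else:
            cost[pair] = min(-s for s in scores)
    return prize, cost


# Hypothetical aggregated scores for one interval.
weights = {"e1": 3, "e2": -2, "e3": 4, "e4": -1, "b1": -2, "b2": -5}
part_of = {"e1": 0, "e2": 0, "e3": 1, "e4": 1}
boundary = {(0, 1): ["b1", "b2"]}

prize, cost = coarse_graph(weights, part_of, boundary)
print(prize, cost)
```

Dropping negative edges inside a partition and charging only the cheapest connection between partitions both overestimate what a real subgraph can score, which is what makes the coarse solution usable as an upper bound.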
Upper bound effectiveness
• The upper bound is more effective if:
o Partitions are well connected (small diameter)
o Edges within partitions are correlated
o Boundary edges are minimal and have expected value closer to -1 than within-partition edges
• The upper bound is a coarse aggregation of the original graph
o Coarseness is controlled by # partitions
o Trade-off between efficiency and effectiveness
Upper bound quality
• Random Markovian graph (N=150, M=180, T=300).
• Number of partitions: 2-64.
• Random 64 is a random partitioning of edges into 64.
Lower bound
• Local iterative search in the solution space within an interval
o Simulated Annealing (SA) procedure that grows/shrinks a subgraph within an interval
o Possible moves: add/remove an edge from an existing solution
o Allow sub-optimal moves according to an annealing schedule
• Better quality than simple greedy algorithm
o Due to sub-optimal moves, high-score clusters can be joined even if they are more than 2 hops away
• Better running time than Merge-and-refine
o No computation of all-pairs shortest paths
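A minimal simulated-annealing sketch of the add/remove moves, assuming the subgraph must stay connected and using a geometric cooling schedule. The schedule, the parameter values, the function names, and the path-shaped example are assumptions, not the authors' settings. Any connected subgraph the search visits gives a valid lower bound on the interval's best score.

```python
import math
import random
from typing import Dict, Set, Tuple

Edge = Tuple[str, str]


def is_connected(edges: Set[Edge]) -> bool:
    """True if the edges form a single connected component."""
    adj: Dict[str, Set[str]] = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    if not adj:
        return True
    start = next(iter(adj))
    seen, stack = {start}, [start]
    while stack:
        x = stack.pop()
        for y in adj[x] - seen:
            seen.add(y)
            stack.append(y)
    return len(seen) == len(adj)


def anneal(weights: Dict[Edge, int], seed_edge: Edge, steps: int = 2000,
           t0: float = 2.0, cooling: float = 0.995, seed: int = 0):
    """Grow/shrink a connected edge set, sometimes accepting worse moves."""
    rng = random.Random(seed)
    all_edges = sorted(weights)
    current, score = {seed_edge}, weights[seed_edge]
    best, best_score = set(current), score
    temp = t0
    for _ in range(steps):
        e = rng.choice(all_edges)
        if e in current:
            candidate, delta = current - {e}, -weights[e]   # remove move
        else:
            candidate, delta = current | {e}, weights[e]    # add move
        if candidate and is_connected(candidate):
            # Improving moves are always taken; worse ones with probability exp(delta/temp).
            if delta >= 0 or rng.random() < math.exp(delta / temp):
                current, score = candidate, score + delta
                if score > best_score:
                    best, best_score = set(current), score
        temp = max(temp * cooling, 1e-9)
    return best_score, best


# Hypothetical path a-b-c-d-e: two positive edges separated by a negative one.
weights = {("a", "b"): 3, ("b", "c"): -1, ("c", "d"): 4, ("d", "e"): -5}
lb_score, lb_edges = anneal(weights, ("a", "b"))
print(lb_score, sorted(lb_edges))
```

Joining the two positive edges requires first accepting the negative edge between them, which is the kind of move a purely greedy search would reject.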
Summary
• Dynamic graphs with changing edge attributes
• Simplest query: find the highest scoring substructure
• Heuristics under development
• Approximation guarantee
• Empirical validation on
o traffic network
o Twitter messages
Path forward
• Maximal scoring subgraph is a building block for richer queries and analyses
o What is the structure of a congestion? Global (short and large), longitudinal (prolonged and localized) or a combination of both?
o What characterizes the evolution of a network?
o How do different network regions compare?
o Is evolution similar across networks of different genres?
• Index structures
o Use statistical models for indexing real-world networks: exploit locality within the network and locality in time; represent the network at different levels of coarseness
o Queries constrained by time and neighborhood
o Similarity queries
Connections
• Queries/analysis of information flow (E2.1)
• Queries on mobile networks (E2.2, E2.3)
• Formal modeling of time (E1.1)
• Dynamic network models (E2.1)
Army relevance
• Query/analysis of mobility networks
• Cyber-physical scenario
• Query/analysis of evolving networks
• Patterns of behavior in composite networks
• Find terrorist groups using temporal interactions
Publications
• P. Bogdanov, B. Baumer, P. Basu, A. Singh, and A. Bar-Noy, "Discovering Influential Groups of Agents Using Composite Network Analysis," submitted to NetSci 2011.
• P. Bogdanov, Nicholas D. Larusso and Ambuj K. Singh, "Towards Community Discovery in Signed Collaborative Interaction Networks," published in SIASP at 2010 IEEE International Conference on Data Mining, 2010.
• K. Macropol and A. Singh, "Content-based Modeling and Prediction of Information Dissemination," submitted to ASONAM 2011.
• M. Mongiovi, A. Singh, X. Yan, B. Zong, K. Psounis, "An Indexing System for Mobility-aware Information Management," submitted to VLDB.
• Ziyu Guan, Jian Wu, Zheng Yun, Ambuj K. Singh and Xifeng Yan, "Assessing and Ranking Structural Correlations in Graphs," to appear at SIGMOD 2011.
• Nicholas D. Larusso and Ambuj K. Singh, "Synopses for Probabilistic Data over Large Domains," in EDBT 2011.
Markovian dynamic models
• Markovian: the graph state is a Markov chain
o Fixed set of nodes
o Edges at time t depend on edges at time t-1
• Cover Time of Dynamic Graphs [Avin '08]
o Introduction of Markovian Dynamic Graphs
o Exponential cover time
o Lazy random walks
• Information spread in Markovian graphs [Clementi '09]
o Edge-Markovian
o Geometric Markovian: node mobility
• Evolving range-dependent graphs [Grindrod '09]
o Edge dynamics as a birth/death process
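An edge-Markovian graph can be simulated in a few lines: over a fixed node set, each absent edge appears with a birth probability and each present edge disappears with a death probability, independently at every step. The function name and parameter names below are illustrative, and the sketch is not tied to any particular paper's notation.

```python
import random
from typing import List, Set, Tuple

Pair = Tuple[int, int]


def simulate_edge_markovian(n: int, p_birth: float, p_death: float,
                            steps: int, rng: random.Random) -> List[Set[Pair]]:
    """Return the edge set at each step, starting from the empty graph."""
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    present: Set[Pair] = set()
    history: List[Set[Pair]] = []
    for _ in range(steps):
        nxt: Set[Pair] = set()
        for e in pairs:
            if e in present:
                # A present edge survives unless it dies in this step.
                if rng.random() >= p_death:
                    nxt.add(e)
            elif rng.random() < p_birth:
                # An absent edge is born in this step.
                nxt.add(e)
        present = nxt
        history.append(present)
    return history


history = simulate_edge_markovian(5, p_birth=0.2, p_death=0.5, steps=10,
                                  rng=random.Random(0))
print([len(edges) for edges in history])
```

The state at time t depends only on the state at time t-1, which is the Markov property the slide refers to.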
Dynamic models of traffic
• The cell transmission model (CTM) [Daganzo '94]
o Dynamic model of highway traffic
o Inspired by hydrodynamic theory
• Traffic Flow on a Freeway Network [Bickel '01]
o Time and context Markovian model of the traffic flow
o The state of a segment at time t depends on the state of its neighbors and itself at time t-1
o Model of a single highway. How about junctions?
• Computer Network Traffic [Stoev '09]
o Statistical model of traffic flow across all links
o Applied to traffic prediction
Background literature
[Avin '08] Chen Avin and Zvi Lotker. "How to Explore a Fast-Changing World." 2008
[Bickel '01] Peter Bickel, Chao Chen, Jaimyoung Kwon, and John Rice. "Traffic Flow on a Freeway Network." Electrical Engineering, 2001.
[Clementi '09] Andrea Clementi, Angelo Monti, Francesco Pasquale, and Riccardo Silvestri. "Information Spreading in Stationary Markovian Evolving Graphs." Informatica, 2009
[Feig '01] J. Feigenbaum, C. Papadimitriou, and S. Shenker. "Sharing the Cost of Multicast Transmissions." JCSS, 63, 21-41, 2001.
[Grindrod '09] Peter Grindrod and Desmond J. Higham. "Evolving Graphs: Dynamical Models, Inverse Problems and Propagation." 2009
[Johnson '00] D. Johnson, M. Minkoff, S. Phillips. "The Prize Collecting Steiner Tree Problem: Theory and Practice." ACM SODA, 2000.
[Lescovec '07] Jure Leskovec, Mary McGlohon, Christos Faloutsos, Natalie Glance, Matthew Hurst. "Cascading Behavior in Large Blog Graphs: Patterns and a Model." SDM, 2007
[Ribeiro '11] B. Ribeiro, D. Figueiredo, E. de Souza e Silva, and D. Towsley, "Characterizing Dynamic Graphs with Continuous-time Random Walks" SIGMETRICS 2011.
[Snijders '09] Tom A.B. Snijders, Gerhard G. van de Bunt, Christian E.G. Steglich, "Introduction to Stochastic Actor-Based Models for Network Dynamics", Social Networks, 2009
[Stoev '09] Stilian A. Stoev, George Michailidis, and Joel Vaughan. "Global Modeling and Prediction of Computer Network", arXiv, 2009
[Zheng '05] A. X. Zheng and A. Goldenberg "A Generative Model for Dynamic Contextual Friendship Networks", Learning, 2005