Post on 18-Jan-2016
Presentation: Genetic clustering of social networks using random walks
ELSEVIER
Computational Statistics & Data Analysis, February 2007
Genetic clustering of social networks using random walks
Aykut Firat, Sangit Chatterjee, Mustafa Yilmaz
College of Business Administration, Northeastern University, Boston, MA 02115, USA
Presented by Oleg Kolgushev
Computational Epidemiology Research Lab (CERL) - Department of Computer Science and Engineering - University of North Texas - 2011/03/21
• Introduction to Clustering in networks
• Random walk based distance measure
• Genetic representation
• Experiments
– Synthetic data creation
– Network clustering experiments
– Spatial data experiments
• Conclusion
Contents
• Popularity of social networks
• An exact mathematical model is a dream; use heuristic techniques instead.
• Clustering is an NP-hard problem.
• Genetic algorithm with a medoid-based representation.
• The random walk measure is superior to Euclidean distance.
Introduction
• The network is represented by a weighted graph (V, E, w), where w is a measure of similarity between vertices.
• The objective is to find a decomposition into k clusters (non-overlapping subgraphs of highly connected vertices).
• A random walker is likely to stay inside a cluster until most of its vertices have been visited.
• Calculating “escape probabilities”.
• The GA fitness function classifies a node based on the sum of edges inside a cluster versus the sum of edges leading to different sets.
Background
• Average first passage time m(i,j)
• Average commute time (ACT)
• In matrix-vector form it is represented as
• where
• u_i = [0 1 0 0 … 0], L = D - A, A is the similarity matrix (w_ij), e is a column vector of ones [1 1 1 … 1], and
Random walk based distance
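For reference, the quantities named on this slide have the following standard forms in the random-walk literature. This is a sketch under assumptions, not a transcription of the slide: the symbols p_ik (transition probability), n(i,j) (commute time), V_G (graph volume), L^+ (Moore-Penrose pseudoinverse of L) and N (number of nodes) are introduced here, D is taken to be the diagonal degree matrix, and the graph is assumed to be connected.

```latex
\begin{align}
m(i,j) &= 1 + \sum_{k \neq j} p_{ik}\, m(k,j), \qquad m(j,j) = 0, \qquad
          p_{ik} = \frac{w_{ik}}{\sum_{l} w_{il}} \\
n(i,j) &= m(i,j) + m(j,i) \\
n(i,j) &= V_G \,(u_i - u_j)^{T} L^{+} (u_i - u_j), \qquad
          V_G = \sum_{i}\sum_{j} w_{ij} \\
L^{+}  &= \left(L - \frac{e e^{T}}{N}\right)^{-1} + \frac{e e^{T}}{N}
\end{align}
```

The last identity holds because L and e e^T / N share eigenvectors: subtracting e e^T / N moves the zero eigenvalue of L to -1, which makes the matrix invertible, and adding e e^T / N back after inversion restores the zero.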
• This measure is appealing for social networks because nodes in the same cluster are connected by many short paths, and clusters need not be of similar size or spherical in shape.
Random walk based distance
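The effect described above can be checked on a toy graph. The sketch below is illustrative only and is not the authors' implementation: it computes commute times directly from first passage times by solving small linear systems in pure Python, and the helper names (`solve`, `first_passage_times`, `commute_times`) and the two-triangle example network are assumptions made for this example. It assumes a connected graph with a symmetric weight matrix.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Partial pivoting: bring the largest entry of this column to the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def first_passage_times(w, target):
    """Return m(i, target) for every node i: expected steps to first reach target."""
    n = len(w)
    deg = [sum(row) for row in w]
    others = [i for i in range(n) if i != target]
    # For i != target: m(i) - sum_k p_ik m(k) = 1, with m(target) = 0.
    a = [[(1.0 if i == k else 0.0) - w[i][k] / deg[i] for k in others]
         for i in others]
    sol = solve(a, [1.0] * len(others))
    times = [0.0] * n
    for i, value in zip(others, sol):
        times[i] = value
    return times


def commute_times(w):
    """Return the matrix n(i, j) = m(i, j) + m(j, i)."""
    n = len(w)
    to_target = [first_passage_times(w, j) for j in range(n)]  # to_target[j][i] = m(i, j)
    return [[to_target[j][i] + to_target[i][j] for j in range(n)] for i in range(n)]


# Hypothetical toy network: triangles {0,1,2} and {3,4,5} joined by the edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
w = [[0.0] * 6 for _ in range(6)]
for i, j in edges:
    w[i][j] = w[j][i] = 1.0

act = commute_times(w)
print("same cluster (0,1):", round(act[0][1], 3))
print("different clusters (0,5):", round(act[0][5], 3))
```

Nodes 0 and 1 sit in the same triangle and have a much smaller commute time than nodes 0 and 5, which can only reach each other through the single bridging edge. Solving one linear system per target node is simple to read but slower than the matrix form on the previous slide, so it is only suitable for small examples.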
• A GA is a computer simulation of evolutionary processes (inheritance, mutation, selection, and crossover).
• Representation is a key choice:
– Array of size N (nodes in graph), elements restricted by k (clusters)
– k bins with elements restricted by N (nodes)
– k-medoids: each cluster is represented by one node, and the other nodes are assigned to the nearest cluster
• A possible gene is [3,7], with assignment [{1,2,3,4},{5,6,7,8}]
• Small genome, tight clustering.
Genetic Representation
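A minimal sketch of how a medoid genome could be decoded into clusters, assuming a precomputed distance matrix. The function name `decode` and the one-dimensional toy positions are hypothetical; any distance, including a commute-time matrix, could be supplied instead.

```python
def decode(medoids, dist):
    """Assign every node to its nearest medoid; returns {medoid: [nodes]}."""
    clusters = {m: [] for m in medoids}
    for node in range(len(dist)):
        nearest = min(medoids, key=lambda m: dist[node][m])
        clusters[nearest].append(node)
    return clusters


# Hypothetical data: eight nodes on a line, forming two well-separated groups.
positions = [0, 1, 2, 3, 10, 11, 12, 13]
dist = [[abs(a - b) for b in positions] for a in positions]

genome = [2, 6]  # 0-based ids of the slide's medoids 3 and 7
clusters = decode(genome, dist)

# Convert back to the 1-based node labels used on the slide.
one_based = [[node + 1 for node in clusters[m]] for m in genome]
print(one_based)  # [[1, 2, 3, 4], [5, 6, 7, 8]]
```

The genome stores only k integers regardless of the number of nodes, which is the "small genome" property noted above.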
• The exception bin contains nodes that do not follow the assignment implied by the medoids.
• A possible gene [3,7] suggests the allocation [{1,2,3,4},{5,6,7,8}], with exception [3,7,{5,6},{2}]
• Crossover is defined by randomly interchanging genes
• Mutation is a mode of exception creation based on proximity
• Fitness functions used: inverse of the sum of the distances to the medoids; inverse of the sum of all pair-wise distances within a group; min sum of all pair-wise distances between nodes.
Medoid-based representation with exception bins
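Two of the operators listed above can be sketched as follows. This is an illustration under assumptions, not the paper's code: it implements only the first fitness variant (inverse of the sum of distances to the medoids) and a position-wise random swap for crossover, and it leaves out exception bins entirely. The names `fitness` and `crossover` and the toy data are hypothetical.

```python
import random


def fitness(medoids, dist):
    """Inverse of the summed distance from each node to its nearest medoid."""
    total = sum(min(dist[node][m] for m in medoids) for node in range(len(dist)))
    return 1.0 / total if total > 0 else float("inf")


def crossover(parent_a, parent_b, rng):
    """Swap each gene position between the two parents with probability 1/2."""
    child_a, child_b = list(parent_a), list(parent_b)
    for pos in range(len(child_a)):
        if rng.random() < 0.5:
            child_a[pos], child_b[pos] = child_b[pos], child_a[pos]
    return child_a, child_b


# Hypothetical data: eight nodes on a line, forming two well-separated groups.
positions = [0, 1, 2, 3, 10, 11, 12, 13]
dist = [[abs(a - b) for b in positions] for a in positions]

good, poor = [2, 6], [0, 1]  # one medoid per group vs. both medoids in one group
print(fitness(good, dist), fitness(poor, dist))

child_a, child_b = crossover([2, 6], [1, 5], random.Random(0))
print(child_a, child_b)
```

A genome that places one medoid in each group scores higher than one that places both medoids in the same group. A practical version would also need to handle children that end up with the same medoid twice, which this sketch does not address.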
• How accurate are the clustering results compared to Euclidean distance clustering?
• How efficient is this approach, and what is the algorithm's complexity?
• Synthetic data creation:
Experiments
• An example 50-node network with 6 clusters is shown.
Network clustering experiments
• Results of transformation and clustering of 150 iris specimens, 50 from each of three species (Fisher’s Iris data)
Spatial data experiments
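One way spatial data could be turned into a network is a symmetric k-nearest-neighbor graph, sketched below. The unit edge weights, the function name `knn_graph`, and the sample points are assumptions made for illustration; the slides do not state how the edges were weighted.

```python
import math


def knn_graph(points, k):
    """Connect each point to its k nearest neighbors; returns a symmetric 0/1 matrix."""
    n = len(points)
    w = [[0.0] * n for _ in range(n)]
    for i, p in enumerate(points):
        neighbors = sorted((j for j in range(n) if j != i),
                           key=lambda j: math.dist(p, points[j]))
        for j in neighbors[:k]:
            # An edge is kept if either endpoint lists the other as a neighbor.
            w[i][j] = w[j][i] = 1.0
    return w


# Hypothetical 2-D points forming two small, well-separated groups.
points = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
w = knn_graph(points, k=2)
for row in w:
    print(row)
```

With k=2 each point links only to members of its own group, so the graph splits into two disconnected components. Random walk distances between components are then undefined, which is one reason the choice of the number of nearest neighbors matters.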
• The O(n^3) cost limits the applicability of random walk distances to large networks.
• Excellent results when the number of clusters is known. What k is right?
• Superior results compared to Euclidean distances, regardless of the clustering algorithm used.
• Exceptionally good clustering results when spatial data is represented as a network and the optimum number of nearest neighbors is used.
Conclusion