Integrating Network Discovery and Community Detection (IRE IIITH) Team 24

Team No. 24

Nikhil Daliya - 201301142

Athresh G - 201505565

Overview

This project integrates network discovery and community detection routines for the nodes of a given network, and identifies the characteristics of each node (constant or rapidly changing) in the network.

Dataset: Railway

The railway network, proposed by [Ghosh et al. 2011], consists of nodes representing railway stations in India. Two stations s_i and s_j are connected by an edge if there exists at least one train route on which both s_i and s_j are scheduled halts. Here the communities are the states/provinces of India, since the number of trains within each state is much higher than the number of trains between two states.
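The edge rule above (two stations are linked when they share at least one route) can be sketched in a few lines of Python. This is only an illustration: the function name `build_station_graph`, the route format (a list of halts per train) and the station names are assumptions, not part of the original dataset or code.

```python
from itertools import combinations

def build_station_graph(routes):
    """Return an adjacency dict: station -> set of stations sharing a route."""
    graph = {}
    for halts in routes:
        unique_halts = sorted(set(halts))
        for station in unique_halts:
            graph.setdefault(station, set())
        # Every pair of scheduled halts on the same route gets an edge.
        for s_i, s_j in combinations(unique_halts, 2):
            graph[s_i].add(s_j)
            graph[s_j].add(s_i)
    return graph

# Hypothetical toy routes; not taken from the actual railway dataset.
routes = [
    ["A", "B", "C"],
    ["C", "D"],
]
graph = build_station_graph(routes)
```

In this toy input, station C lies on both routes, so it is connected to A, B and D, while A and D share no route and stay unconnected.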

Dataset: Football

The football network, proposed by [Girvan and Newman 2002a], contains the network of American football games between Division IA colleges during the regular season of Fall 2000. The vertices represent teams (identified by their college names), and the edges represent regular-season games between the two teams they connect.

Dataset: Football (continued)

The teams are divided into conferences (the communities) of around 8-12 teams each. Games are more frequent between members of the same conference than between members of different conferences. Teams that are geographically close but belong to different conferences are more likely to play one another than teams separated by large geographic distances.

Application

● Exploring adversarial networks (such as terrorist networks).

● Clustering in social networks.

● Politeness policies on crawling websites make it difficult to mine the whole network on social networking sites. Space and bandwidth limits further constrain the size of the network that can be mined.

Challenges

● Dynamic discovery of the network complicates the clustering of nodes.

● Identifying the characteristic of a node (constant, changing or rapidly changing) is a difficult problem.

● The dataset grows rapidly with network discovery, and keeping track of each node's probability distribution over the different communities is a challenging task.

Tools Used

● Third-party package ( https://sites.google.com/site/santofortunato/inthepress2 ) for generating synthetic graphs as input.

● Languages: Python and Java. Packages such as pandas, NumPy, scikit-learn, NetworkX and igraph are used as needed.

● matplotlib for plotting the results, for better visualization and understanding.

Implementation

● We use two main modules: ChooseNode, which chooses the node to be merged into the network in each iteration, and UpdateCommunity, which updates the communities (clusters) starting from the chosen node.

● Spectral clustering is applied to the initial set of target nodes.

Implementation

During ChooseNode we use two measures to choose the node for the update.

Ncut measure: minimizes the similarity across a cut while simultaneously maximizing the similarity within the same community.

Modularity: the fraction of the edges that fall within the given communities in excess of the expected fraction.
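As a rough illustration of the two measures, the sketch below computes both on an unweighted, undirected edge list. It assumes the common textbook forms, which the slides do not spell out: modularity as the sum over communities of (internal edges / m) minus (community degree / 2m) squared, and Ncut as the sum over communities of (cut edges / community degree). The function names and the toy graph are made up for the example, and the graph is assumed to have at least one edge.

```python
def modularity(edges, community_of):
    """Modularity of a partition of an unweighted, undirected graph."""
    m = len(edges)
    internal = {}  # community -> number of edges with both ends inside it
    degree = {}    # community -> sum of the degrees of its nodes
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree[cu] = degree.get(cu, 0) + 1
        degree[cv] = degree.get(cv, 0) + 1
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + 1
    return sum(internal.get(c, 0) / m - (degree[c] / (2 * m)) ** 2 for c in degree)

def ncut(edges, community_of):
    """Normalized cut: for each community, cut edges divided by its total degree."""
    cut = {}     # community -> number of edges leaving it
    volume = {}  # community -> sum of the degrees of its nodes
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        volume[cu] = volume.get(cu, 0) + 1
        volume[cv] = volume.get(cv, 0) + 1
        if cu != cv:
            cut[cu] = cut.get(cu, 0) + 1
            cut[cv] = cut.get(cv, 0) + 1
    return sum(cut.get(c, 0) / volume[c] for c in volume)

# Toy graph (an assumption for illustration): two triangles joined by one edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
community_of = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
q = modularity(edges, community_of)
cut_value = ncut(edges, community_of)
```

A good partition has high modularity and low Ncut; moving node 2 into community "b" in this toy graph lowers the first and raises the second.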

Implementation

Input:

● Initial clustering, initial network, cost and budget.

Output:

● Final network, with the discovered nodes grouped into clusters.

● List of rapidly changing nodes in the network.
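The inputs and outputs listed above suggest a budgeted discovery loop. The sketch below shows one possible shape for such a loop and is not the project's actual code: the selection rule (pick the frontier node with the most edges into the known network), the update rule (majority vote over labelled neighbours), the `flip_threshold` parameter and the name `discover` are all stand-in assumptions. The real ChooseNode uses the Ncut and modularity measures.

```python
from collections import Counter

def discover(hidden_graph, known_nodes, community_of, cost, budget, flip_threshold=2):
    """Budgeted discovery loop; the selection and update rules are stand-ins."""
    known = set(known_nodes)
    labels = dict(community_of)
    flips = Counter()  # how many times each node changed community
    while budget >= cost:
        # Frontier: undiscovered nodes adjacent to the known network.
        frontier = {v for u in known for v in hidden_graph[u]} - known
        if not frontier:
            break
        # Stand-in for ChooseNode: most edges into the known network,
        # ties broken by the smallest node name.
        chosen = max(sorted(frontier), key=lambda v: len(hidden_graph[v] & known))
        budget -= cost
        known.add(chosen)
        # Stand-in for UpdateCommunity: relabel the chosen node and its known
        # neighbours by majority vote of their labelled neighbours.
        for node in [chosen] + sorted(hidden_graph[chosen] & known):
            votes = Counter(labels[n] for n in hidden_graph[node] if n in labels)
            if not votes:
                continue
            best = min(votes, key=lambda c: (-votes[c], c))
            if node in labels and labels[node] != best:
                flips[node] += 1
            labels[node] = best
    changing = sorted(n for n, count in flips.items() if count >= flip_threshold)
    return known, labels, changing

# Toy hidden network (an assumption): two triangles joined by the edge 2-3.
hidden_graph = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
initial_labels = {0: "a", 1: "a", 4: "b", 5: "b"}
known, labels, changing = discover(hidden_graph, {0, 1, 4, 5}, initial_labels, cost=1, budget=2)
```

With a budget of two unit-cost queries, the loop discovers nodes 2 and 3 and assigns each to the triangle it belongs to. Nodes whose label flips at least `flip_threshold` times would be reported as rapidly changing.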

Results and Analysis

- We use Average Clustering Purity (ACP) and Average Clustering Entropy (ACE) to measure the effectiveness of our algorithm.

- Both measures are based on the fraction of nodes in a particular cluster that belong to the same class.
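A minimal sketch of how the two scores could be computed is given below. The slides do not give the exact formulas, so this assumes one common convention: the purity of a cluster is the largest class fraction inside it, the entropy of a cluster is the Shannon entropy (base 2) of its class fractions, and both are averaged over clusters without weighting by cluster size. The function names and the toy data are hypothetical.

```python
import math
from collections import Counter

def class_fractions(cluster, true_class):
    """Fraction of the cluster's nodes that falls in each ground-truth class."""
    counts = Counter(true_class[n] for n in cluster)
    return [c / len(cluster) for c in counts.values()]

def average_clustering_purity(clusters, true_class):
    # Purity of one cluster: share of its nodes in its dominant class.
    return sum(max(class_fractions(c, true_class)) for c in clusters) / len(clusters)

def average_clustering_entropy(clusters, true_class):
    # Entropy of one cluster: 0 when all of its nodes share a single class.
    def entropy(fractions):
        return -sum(p * math.log2(p) for p in fractions)
    return sum(entropy(class_fractions(c, true_class)) for c in clusters) / len(clusters)

# Toy example (an assumption): one mixed cluster and one pure cluster.
clusters = [[1, 2, 3, 4], [5, 6]]
true_class = {1: "X", 2: "X", 3: "X", 4: "Y", 5: "Y", 6: "Y"}
acp = average_clustering_purity(clusters, true_class)
ace = average_clustering_entropy(clusters, true_class)
```

Higher ACP and lower ACE both indicate clusters that line up more closely with the ground-truth classes.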

Results and Analysis: Railway Dataset

Total no. of target nodes: 80

Average cluster purity: 0.79

Average cluster entropy: 0.17

Rapidly changing nodes: 6, 47, 84, 91

Results and Analysis: Football Dataset

Total no. of target nodes: 48

Average cluster purity: 0.91

Average cluster entropy: 0.11

Changing nodes: 51, 63, 49

References

Research paper: On Integrating Network and Community Discovery

http://hanj.cs.illinois.edu/pdf/wsdm15_jliu.pdf

Thank You !!!!