KONECT Cloud – Large Scale Network Mining in the Cloud

KONECT Cloud – Large Scale Network Mining in the Cloud. Jérôme Kunegis, Future SOC Lab Day, 18.04.2012

description

In the Winter 2011/2012 run at the Future SOC Lab, we used the KONECT framework (Koblenz Network Collection) to compute ten different network statistics on a large collection of downsampled versions of a large network dataset, with the goal of determining whether sampling of a large network can be used to reduce the computational effort needed to compute a network statistic. Preliminary results show that this is indeed the case.

Transcript of KONECT Cloud – Large Scale Network Mining in the Cloud

Page 1: KONECT Cloud – Large Scale Network Mining in the Cloud

KONECT Cloud

Large Scale Network Mining in the Cloud

Jérôme Kunegis Future SOC Lab Day, 18.04.2012

Page 2: KONECT Cloud – Large Scale Network Mining in the Cloud

Networks are Everywhere

Communication

Authorship

Friendship

Interaction

Trust

Co-occurrence

Page 3: KONECT Cloud – Large Scale Network Mining in the Cloud

Social Networks

friend

Page 4: KONECT Cloud – Large Scale Network Mining in the Cloud

Trust Networks

trust

Page 5: KONECT Cloud – Large Scale Network Mining in the Cloud

Friend/Enemy Network

enemy

friend

Page 6: KONECT Cloud – Large Scale Network Mining in the Cloud

Interaction Network

listen

Page 7: KONECT Cloud – Large Scale Network Mining in the Cloud

KONECT – Koblenz Network Collection

148 network datasets

26 are undirected

38 are directed

84 are bipartite

59 have unweighted edges

77 allow multiple edges

4 have signed edges

8 have ratings as edges

78 have edge arrival times

konect.uni-koblenz.de

Page 8: KONECT Cloud – Large Scale Network Mining in the Cloud

Largest Network

Directed “who follows who” network

41 652 230 users

1 468 365 182 edges

konect.uni-koblenz.de/networks/twitter

Page 9: KONECT Cloud – Large Scale Network Mining in the Cloud

148 Network Datasets

authorship

communication

co-occurrence

features

folksonomy

interaction

physical

ratings

reference

semantic

social

trust

Page 10: KONECT Cloud – Large Scale Network Mining in the Cloud

What We Computed

Connected components

Network diameter

Clustering coefficients

Degree distributions

Spectral distribution

Eigenvector centrality

Graph drawing

Temporal analysis

Link prediction

←at Future SOC Lab

Page 11: KONECT Cloud – Large Scale Network Mining in the Cloud

Network Diameter

6

Page 12: KONECT Cloud – Large Scale Network Mining in the Cloud

90 Percentile Effective Diameter

5

Page 13: KONECT Cloud – Large Scale Network Mining in the Cloud

90 Percentile Effective Diameter

3

Page 14: KONECT Cloud – Large Scale Network Mining in the Cloud

90 Percentile Effective Diameter

3.75

Page 15: KONECT Cloud – Large Scale Network Mining in the Cloud

Computing the Effective Diameter

for each node i {                        // |V| iterations
    count hops needed to reach 90%       // |E| work per node
}

Total runtime: |E| × |V|
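The loop above can be sketched in Python as one breadth-first search per source node, which is where the |E| × |V| runtime comes from. This is a minimal illustration, not the KONECT implementation (the slides mention Matlab): it assumes an undirected, unweighted graph stored as a dictionary of neighbor sets, it returns the smallest whole number of hops covering the requested fraction of reachable node pairs, and it omits the interpolation that yields fractional values such as 3.75. The names `build_undirected`, `bfs_distances`, and `effective_diameter` are hypothetical.

```python
from collections import deque


def build_undirected(edges):
    """Build a dict-of-sets adjacency structure from (u, v) pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def bfs_distances(adj, source):
    """Hop distance from source to every reachable node; O(|E|) per call."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def effective_diameter(adj, percentile=0.9, sources=None):
    """Smallest hop count d such that at least `percentile` of the
    reachable (source, target) pairs lie within d hops.

    `sources` defaults to all nodes; passing a subset gives an estimate.
    """
    if sources is None:
        sources = list(adj)
    hop_counts = {}  # hop distance -> number of ordered pairs at that distance
    for s in sources:
        for node, d in bfs_distances(adj, s).items():
            if node != s:
                hop_counts[d] = hop_counts.get(d, 0) + 1
    total = sum(hop_counts.values())
    if total == 0:
        return 0
    reached = 0
    for d in sorted(hop_counts):
        reached += hop_counts[d]
        if reached >= percentile * total:
            return d
    return max(hop_counts)
```

Passing a list of sampled nodes as `sources` replaces the |V| factor with the number of sampled nodes, at the cost of an approximate result.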

Page 16: KONECT Cloud – Large Scale Network Mining in the Cloud

Graph Sampling

Keep X% of edges
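Edge sampling of this kind could look like the following Python sketch. The slide does not say whether exactly X% of the edges are kept or whether each edge is kept independently with probability X%; the sketch assumes the former, drawing a uniform random subset without replacement. The function name `sample_edges` and its `seed` parameter are hypothetical.

```python
import random


def sample_edges(edges, fraction, seed=None):
    """Return a uniform random subset holding `fraction` of the edges.

    `edges` must be a list of (u, v) pairs; `fraction` is in (0, 1].
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    rng = random.Random(seed)  # a fixed seed makes a sampling run repeatable
    keep = round(fraction * len(edges))
    return rng.sample(edges, keep)
```

Calling the function repeatedly with different seeds gives independent random samplings of the same size.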

Page 17: KONECT Cloud – Large Scale Network Mining in the Cloud

Computation

× 1 000 vertices (sampled)

× 120 840 391 edges

× 20 sample sizes (5%, 10%, …, 100%)

× 50 random samplings
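A rough reading of the four factors above is that they multiply into an estimate of the total number of edge visits. The Python sketch below works that out under two assumptions that are not stated on the slide: one breadth-first search per sampled vertex visits every edge of the (sub)graph once, and a sample of size X% contains X% of the edges. All variable names are hypothetical.

```python
SOURCES = 1_000         # sampled start vertices per effective diameter run
EDGES = 120_840_391     # edges in the full network
SAMPLE_SIZES = 20       # 5%, 10%, ..., 100%
REPEATS = 50            # random samplings per sample size

# Upper bound: treat every run as if it touched the full edge set.
upper_bound = SOURCES * EDGES * SAMPLE_SIZES * REPEATS

# Scaled estimate: the k-th sample size keeps k/20 of the edges,
# so the sizes sum to (1 + 2 + ... + 20) / 20 = 10.5 full edge sets.
size_steps = sum(range(1, SAMPLE_SIZES + 1))  # 210
scaled = SOURCES * REPEATS * EDGES * size_steps // SAMPLE_SIZES

print(f"upper bound: {upper_bound:.3e} edge visits")
print(f"scaled:      {scaled:.3e} edge visits")
```

Under these assumptions the scaled estimate is 52.5% of the upper bound, since the average sample keeps 10.5 / 20 of the edges.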

Evaluation on single machine:

1 TiB memory

64 cores

Matlab 64 bit

Page 18: KONECT Cloud – Large Scale Network Mining in the Cloud

Results

Page 19: KONECT Cloud – Large Scale Network Mining in the Cloud

Dr. Jérôme Kunegis

west.uni-koblenz.de

Thank You!

konect.uni-koblenz.de