USING DYNAMOGRAPH: APPLICATION SCENARIOS FOR …...⬛ 53.5 billion HTTP headers logged from Sept....
Transcript of USING DYNAMOGRAPH: APPLICATION SCENARIOS FOR …...⬛ 53.5 billion HTTP headers logged from Sept....
![Page 1: USING DYNAMOGRAPH: APPLICATION SCENARIOS FOR …...⬛ 53.5 billion HTTP headers logged from Sept. 2006 to May 2010 ... ⬛ In collaboration with Ecker (iiWAS 2015) data from Internet](https://reader035.fdocuments.us/reader035/viewer/2022071420/6119f9d26b529c2fa5139d29/html5/thumbnails/1.jpg)
USING DYNAMOGRAPH: APPLICATION SCENARIOS FOR LARGE-SCALE TEMPORAL GRAPH PROCESSING
Matthias Steinbauer, Gabriele Anderst-Kotsis
Institute of Telecooperation
![Page 2: USING DYNAMOGRAPH: APPLICATION SCENARIOS FOR …...⬛ 53.5 billion HTTP headers logged from Sept. 2006 to May 2010 ... ⬛ In collaboration with Ecker (iiWAS 2015) data from Internet](https://reader035.fdocuments.us/reader035/viewer/2022071420/6119f9d26b529c2fa5139d29/html5/thumbnails/2.jpg)
TALK OUTLINE
⬛ Introduction and Motivation ⬛ Preliminaries on temporal graphs ⬛ DynamoGraph a platform for large-scale temporal graph
processing ⬛ How users visit the web ⬛ Real-world Social Networks ⬛ The Global Social Learning Network ⬛ Conclusions
![Page 3: USING DYNAMOGRAPH: APPLICATION SCENARIOS FOR …...⬛ 53.5 billion HTTP headers logged from Sept. 2006 to May 2010 ... ⬛ In collaboration with Ecker (iiWAS 2015) data from Internet](https://reader035.fdocuments.us/reader035/viewer/2022071420/6119f9d26b529c2fa5139d29/html5/thumbnails/3.jpg)
INTRODUCTION / MOTIVATION
⬛ Graphs serve as models for real world structures in many different disciplines ⬛ Social sciences > Social networks ⬛ Biology > Protein-Protein Interactions ⬛ Cartography > Digital road maps ⬛ Web > The web graph
⬛ Graph and network models are very well studied in mathematics and related disciplines, and computer science
⬛ So why is there need for new research in this area?
![Page 4: USING DYNAMOGRAPH: APPLICATION SCENARIOS FOR …...⬛ 53.5 billion HTTP headers logged from Sept. 2006 to May 2010 ... ⬛ In collaboration with Ecker (iiWAS 2015) data from Internet](https://reader035.fdocuments.us/reader035/viewer/2022071420/6119f9d26b529c2fa5139d29/html5/thumbnails/4.jpg)
REAL WORLD GRAPHS OFTEN GROW TO LARGE SCALES
It is not feasible to process large graphs with traditional
tools and algorithms
require new tactics for
visualisation
Exceed memory size of single
computer
![Page 5: USING DYNAMOGRAPH: APPLICATION SCENARIOS FOR …...⬛ 53.5 billion HTTP headers logged from Sept. 2006 to May 2010 ... ⬛ In collaboration with Ecker (iiWAS 2015) data from Internet](https://reader035.fdocuments.us/reader035/viewer/2022071420/6119f9d26b529c2fa5139d29/html5/thumbnails/5.jpg)
THE DIMENSION OF TIME CANNOT BE NEGLECTED
Static reachability measures in social networks
do not hold for dynamic networks
Biological processes are time dependentStatic views on
graphs show often show blurred or too dense
data
![Page 6: USING DYNAMOGRAPH: APPLICATION SCENARIOS FOR …...⬛ 53.5 billion HTTP headers logged from Sept. 2006 to May 2010 ... ⬛ In collaboration with Ecker (iiWAS 2015) data from Internet](https://reader035.fdocuments.us/reader035/viewer/2022071420/6119f9d26b529c2fa5139d29/html5/thumbnails/6.jpg)
TEMPORAL GRAPHS
F. Harary and G. Gupta. Dynamic Graph Models. Mathl. Comput. Modelling, 25(7):79–87, 1997.
Graph G is a pair (V, E)where V denotes the set of vertices and E denotes the set of edges between any v, e ∈ V
A temporal graph T can be given as a set of graphs T = {G1, G2, G3, …, Gt} where each Gx = (Vx, Ex) Gx is called a static snapshot at time x And Gtm..tn as a selection of multiple Gx from T is a static snapshot for a timespan
![Page 7: USING DYNAMOGRAPH: APPLICATION SCENARIOS FOR …...⬛ 53.5 billion HTTP headers logged from Sept. 2006 to May 2010 ... ⬛ In collaboration with Ecker (iiWAS 2015) data from Internet](https://reader035.fdocuments.us/reader035/viewer/2022071420/6119f9d26b529c2fa5139d29/html5/thumbnails/7.jpg)
VERTEX CENTRIC EMBODIMENT
{id: 39827736,name: 'Rob Henderson',description: '',inEdges: [ {
weight: 7.3,edgeType: 'PHONE',source: 39761932,target: 39827736, } ],
outEdges: [ {weight: 10.0,edgeType: 'EMAIL',source: 39827736,target: 39761932, } ],
}
![Page 8: USING DYNAMOGRAPH: APPLICATION SCENARIOS FOR …...⬛ 53.5 billion HTTP headers logged from Sept. 2006 to May 2010 ... ⬛ In collaboration with Ecker (iiWAS 2015) data from Internet](https://reader035.fdocuments.us/reader035/viewer/2022071420/6119f9d26b529c2fa5139d29/html5/thumbnails/8.jpg)
VERTEX CENTRIC EMBODIMENT WITH TEMPORAL DOCUMENT
{id: 39827736,resolution: 'MONTHS','1420070400': {
name: 'Rob Henderson',description: '',inEdges: [ {
weight: 3.3,edgeType: 'PHONE',source: 39761932,target: 39827736, } ],
outEdges: [ {weight: 4.0,edgeType: 'EMAIL',source: 39827736,target: 39761932, } ],
},'1422748800': {
inEdges: [ {weight: 4.0,edgeType: 'PHONE',source: 39761932,
![Page 9: USING DYNAMOGRAPH: APPLICATION SCENARIOS FOR …...⬛ 53.5 billion HTTP headers logged from Sept. 2006 to May 2010 ... ⬛ In collaboration with Ecker (iiWAS 2015) data from Internet](https://reader035.fdocuments.us/reader035/viewer/2022071420/6119f9d26b529c2fa5139d29/html5/thumbnails/9.jpg)
HORIZONTAL SCALABILITY
host1 host2 host35 2
![Page 10: USING DYNAMOGRAPH: APPLICATION SCENARIOS FOR …...⬛ 53.5 billion HTTP headers logged from Sept. 2006 to May 2010 ... ⬛ In collaboration with Ecker (iiWAS 2015) data from Internet](https://reader035.fdocuments.us/reader035/viewer/2022071420/6119f9d26b529c2fa5139d29/html5/thumbnails/10.jpg)
PREGEL / COMPUTE AGGREGATE BROADCAST
G. Malewicz, M. Austern, A. Bik, J. Dehnert, I. Horn, N. Leiser, and G. Czajkowski. Pregel: A System for Large-Scale Graph Processing. In ACM SIGMOD International Conference on Management of Data, 2010.
Master
Worker 1
Worker 2
Worker 3
Worker 4
Initia
lisation
Local C
om
puta
tion
Me
ssage R
outing
Step 1
Syncro
nis
ation
Local C
om
puta
tion
Step 2M
essage R
outing
Syncro
nis
ation
Local C
om
puta
tion
Step 3
Me
ssage R
outing
Ha
ltin
g
![Page 11: USING DYNAMOGRAPH: APPLICATION SCENARIOS FOR …...⬛ 53.5 billion HTTP headers logged from Sept. 2006 to May 2010 ... ⬛ In collaboration with Ecker (iiWAS 2015) data from Internet](https://reader035.fdocuments.us/reader035/viewer/2022071420/6119f9d26b529c2fa5139d29/html5/thumbnails/11.jpg)
ZooKeeper
PrivateCommunication
Network
Host 0 (Master) Host 1 (Worker)Client
PartitionManager
SuperStepManager
CodeManager
Client API
MasterProcess
PublicCommunication
Network
ClientApp WorkerProcess / Slots
CassandraNode
Partition
StepExecutor
MessageQueue
WorkerProcess / Slots
CassandraNode
Partition
StepExecutor
MessageQueue
Host n (Worker)
DYNAMOGRAPH ARCHITECTURE
![Page 12: USING DYNAMOGRAPH: APPLICATION SCENARIOS FOR …...⬛ 53.5 billion HTTP headers logged from Sept. 2006 to May 2010 ... ⬛ In collaboration with Ecker (iiWAS 2015) data from Internet](https://reader035.fdocuments.us/reader035/viewer/2022071420/6119f9d26b529c2fa5139d29/html5/thumbnails/12.jpg)
HOW USERS VISIT THE WEB
⬛ Center for Complex Networks and Systems Research at Indiana University Bloomington collected the Click Dataset
⬛ 53.5 billion HTTP headers logged from Sept. 2006 to May 2010 ⬛ Available as a ~2.8 TB compressed (~12 TB uncompressed)
⬛ Imported to DynamoGraph in monthly resolution ⬛ retained only human generated traffic ⬛ created vertices based on domain names (3.4 million after cleansing)
http://cnets.indiana.edu/groups/nan/webtraffic/click-dataset/
![Page 13: USING DYNAMOGRAPH: APPLICATION SCENARIOS FOR …...⬛ 53.5 billion HTTP headers logged from Sept. 2006 to May 2010 ... ⬛ In collaboration with Ecker (iiWAS 2015) data from Internet](https://reader035.fdocuments.us/reader035/viewer/2022071420/6119f9d26b529c2fa5139d29/html5/thumbnails/13.jpg)
HOW USERS VISIT THE WEB
⬛ Click temporal graph was successfully loaded into DynamoGraph ⬛ 10 compute nodes with 8 threads (graph partitions) each ⬛ On top of an OpenStack cloud
⬛ Currently experiments with distributed PageRank are conducted ⬛ PageRank is computed in sliding windows of 6 to 3 months ⬛ It is expected that popularity trends in web-sites, ad-networks, and
social networks are visible in the data
⬛ First look into the data clearly shows the drop of popularity for the online social network MySpace.com
![Page 14: USING DYNAMOGRAPH: APPLICATION SCENARIOS FOR …...⬛ 53.5 billion HTTP headers logged from Sept. 2006 to May 2010 ... ⬛ In collaboration with Ecker (iiWAS 2015) data from Internet](https://reader035.fdocuments.us/reader035/viewer/2022071420/6119f9d26b529c2fa5139d29/html5/thumbnails/14.jpg)
REAL WORLD SOCIAL NETWORKS
⬛ Users interacting with each other (online or in real-life) form a social network
⬛ Much research from diverse disciplines (sociology, mathematics, computer science, reality mining, …) was already conducted in this area
⬛ Social networks can be modelled as temporal graphs
⬛ Many social interactions are now happening online > easy to track and record
⬛ In collaboration with Ecker (iiWAS 2015) data from Internet Relay Chat (IRC) was logged, annotated and imported to DynamoGraph
![Page 15: USING DYNAMOGRAPH: APPLICATION SCENARIOS FOR …...⬛ 53.5 billion HTTP headers logged from Sept. 2006 to May 2010 ... ⬛ In collaboration with Ecker (iiWAS 2015) data from Internet](https://reader035.fdocuments.us/reader035/viewer/2022071420/6119f9d26b529c2fa5139d29/html5/thumbnails/15.jpg)
full time-span
![Page 16: USING DYNAMOGRAPH: APPLICATION SCENARIOS FOR …...⬛ 53.5 billion HTTP headers logged from Sept. 2006 to May 2010 ... ⬛ In collaboration with Ecker (iiWAS 2015) data from Internet](https://reader035.fdocuments.us/reader035/viewer/2022071420/6119f9d26b529c2fa5139d29/html5/thumbnails/16.jpg)
selected time-span
1st June 2008 15th June 2008
![Page 17: USING DYNAMOGRAPH: APPLICATION SCENARIOS FOR …...⬛ 53.5 billion HTTP headers logged from Sept. 2006 to May 2010 ... ⬛ In collaboration with Ecker (iiWAS 2015) data from Internet](https://reader035.fdocuments.us/reader035/viewer/2022071420/6119f9d26b529c2fa5139d29/html5/thumbnails/17.jpg)
REAL WORLD SOCIAL NETWORKS
⬛ Data is available as online resource ⬛ Users can filter and load data into DynamoGraph ⬛ Algorithms for automatic layout of the visualisation are available
(ForceAtlas2)
⬛ In visualisation it is already clear that reachability measures computed on the full time-span will often not hold on shorter time-spans ⬛ How is information dissemination influenced by that fact? ⬛ Can we perform cluster analysis on users and identify topics of
interest?
![Page 18: USING DYNAMOGRAPH: APPLICATION SCENARIOS FOR …...⬛ 53.5 billion HTTP headers logged from Sept. 2006 to May 2010 ... ⬛ In collaboration with Ecker (iiWAS 2015) data from Internet](https://reader035.fdocuments.us/reader035/viewer/2022071420/6119f9d26b529c2fa5139d29/html5/thumbnails/18.jpg)
THE GLOBAL SOCIAL LEARNING NETWORK
![Page 19: USING DYNAMOGRAPH: APPLICATION SCENARIOS FOR …...⬛ 53.5 billion HTTP headers logged from Sept. 2006 to May 2010 ... ⬛ In collaboration with Ecker (iiWAS 2015) data from Internet](https://reader035.fdocuments.us/reader035/viewer/2022071420/6119f9d26b529c2fa5139d29/html5/thumbnails/19.jpg)
THE GLOBAL SOCIAL LEARNING NETWORK
⬛ Experience API (xAPI) used to store learning logs independent of a certain Learning Management System (LMS)
⬛ Scrapers available that convert data from other systems
⬛ An xAPI proxy service is available that can ingest xAPI statements into DynamoGraph
⬛ Setting up experiments with real students
Scrapers
LMS
xAPI Proxy
xAPI Repository
DynamoGraph
![Page 20: USING DYNAMOGRAPH: APPLICATION SCENARIOS FOR …...⬛ 53.5 billion HTTP headers logged from Sept. 2006 to May 2010 ... ⬛ In collaboration with Ecker (iiWAS 2015) data from Internet](https://reader035.fdocuments.us/reader035/viewer/2022071420/6119f9d26b529c2fa5139d29/html5/thumbnails/20.jpg)
CONCLUSIONS AND FUTURE WORK
⬛ DynamoGraph as a platform for large-scale temporal graph processing has matured enough to be evaluated in scientific scenarios
⬛ Three scenarios were discussed that motivate the need for temporal graph analytics and layout the paths for our future work
⬛More temporal graph algorithms are to be implemented (reachability, clustering, …) to provide more interesting metrics to our users