Optimization of graph storage using GoFFish
-
Upload
anushree-prasanna-kumar -
Category
Data & Analytics
-
view
52 -
download
2
Transcript of Optimization of graph storage using GoFFish
GoFFish: A Sub-Graph Centric Framework for
Large-Scale Graph Analytics
Guide: Prof. Dinkar Sitaram Department of Computer Science and Engineering PES Institute of Technology
Ms. Prafullata Department of Computer Science and Engineering PES Institute of Technology
Guide: Prof. Yogesh Simmhan Department of SERC Indian Institute of Science
Team Members: Anushree P K 1PI11IS017 Bhavani B 1PI11IS027 Mithilesh K G 1PI11IS059
What is GoFFISH?
GoFFish is a scalable software framework for storing graphs, and composing and executing graph
analytics in a Cloud and commodity cluster environment
It consists of :-
1. GoFS – It is a distributed store for partitioning, storing and accessing graph datasets across hosts in a cluster.
2. Gopher - Gopher is a programming framework that offers sub-graph centric abstractions on a a Cloud or cluster in conjunction with GoFS.
GoFFish is implemented in Java.
Existing GoFFISH Storage Architecture
Worker 2 Worker 3
Worker 1 + Head Node
Partition 2
Partition 1
Partition 3
t0 – t10
Slice 1 Slice 1
Same Storage format in worker 1
• Large graph partitioned into subgraphs and distributed across workers.
• GoFFISH default storage 10 bins in a slice. 10 instances of every subgraph.
Using the GoFFish framework: To store the real time graphs in – temporal and spatial formats To compute the efficiencies of both storage formats based on the input algorithm (gopher job)
Intuition :-
The time taken for a gopher job over large graphs should be optimized when the graphs are stored
In the given two formats depending on the algorithm run.
For example for vertex count algorithm temporal format should take lessar computation time
Problem Definition
GoFFISH Storage Architecture – Our Model
Worker 2 Worker 3
Worker 1 + Head Node
Partition 2
Partition 1
Partition 3
t0 – tn
Slice 1
Same Storage format in worker 1
• All instances for a subgraph in one slice.
• One bin per slice.
Slice 2
Slice 3
Slice 4
We used slicing pointers instancegroupingsize and numsubgraphbins to manipulate the way the graphs were partitioned in order to obtain the desired storage format. These slicing pointers correspond the temporal and subgraph bin packing schemes in GoFFISH.
Algorithms • Vertex Count Each subgraph processor calculate number of vertices within a subgraph. They send messages to all other subgraphs with their count where each subgraph calculate the total using the messages received.
• Connected Components Each subgraph finds the smallest vertex id for each subgraph and propagate that smallest value to its connected subgraphs. If incoming value to a subgraph is different from its current value it updates the current value and propagate the changes to its neighbours.
Flow Diagram
Dataset :- Road network graph Vehicle route tracking using traffic cams –Time-series graph of sync camera snapshots –Sensors are vertices –Edges are road connectivity w/ distance weight Graph instance is image metadata every N sec –License plate, vehicle color, direction, speed Urban
Dataset
0
2000
4000
6000
8000
10000
12000
0 1 2 3 4
TIm
e T
ake
n
Partition Id
Partition-Wise Total App Time
S=10,t=10
S=1,T=ALL
4200
4250
4300
4350
4400
4450
0 1 2 3 4
TIm
e T
ake
n
Partition Id
Partition-Wise Total App Time
S=10,t=10
S=1,T=ALL
Vertex Count Connected Components
Performance Analysis
Partition 1
0
500
1000
1500
2000
2500
3000
3500
4000
4500
0 1 2 3 4
Tim
e T
ake
n
Superstep
Superstep- wise Compute Task Time
VC(s=10,t=10)
VC(s=1,t=ALL)
0
50
100
150
200
250
300
350
400
0 0.5 1 1.5 2 2.5 3 3.5
Tim
e T
ake
n
Superstep
Superstep- wise Compute Task Time
CC(s=10,t=10)
CC(s=1,t=ALL)
Vertex Count Connected Components
Performance Analysis
DEMO + Loggers
Screenshots
Screenshots
GoFFish: A Sub-Graph Centric Framework for Large-Scale Graph Analytics - Indian
Institute of Science, Bangalore 560012 India, University of Southern California, Los
Angeles CA 90089 USA, November 26, 2013 Scalable Analytics over Distributed Time-series Graphs using GoFFish - Indian
Institute of Science, Bangalore 560012 India, University of Southern California, Los
Angeles CA 90089 USA, June 23, 2014 Chronos: A Graph Engine for Temporal Graph Analysis - Tsinghua University,
University of Science and Technology of China, Microsoft Research
References
The Team
Anushree Prasanna Kumar 8th Sem ISE
Bhavani B 8th Sem ISE
Mithilesh Kumar 8th Sem ISE
Thank you