Thorny Path to Large-Scale Graph Processing, Alexey Zinoviev (Tamtek)
Thorny Path to Large-Scale Graph Processing
Alexey Zinoviev
About
• I am a <graph theory, machine learning, traffic jam prediction, BigData algorithms> scientist
• But I'm a <Java, JavaScript, Android, NoSQL, Hadoop, Spark> programmer
BigData & Graph Theory
Big Data of old times
• Astronomy
• Weather
• Trading
• Sea routes
• Battles
And now ...
• Web graph
• Facebook friend network
• Gmail email graph
• EU road network
• Citation graph
• PayPal transaction graph
Graph                    | Vertices    | Edges       | Volume | Data per day
Web graph                | 1.5 * 10^12 | 1.2 * 10^13 | 100 PB | 300 TB
Facebook (friends graph) | 1.1 * 10^9  | 160 * 10^9  | 1 PB   | 15 TB
EU road graph            | 18 * 10^6   | 42 * 10^6   | 20 GB  | 50 MB
Road graph of this city  | 250,000     | 460,000     | 500 MB | 100 KB
Problems
• Popularity rank (PageRank)
• Determining popular users, news, jobs, etc.
• Shortest paths
• Max flow
• How are users, groups connected?
• Clustering, semi-clustering
• Max clique, triangle closure, label propagation algorithms
• Finding related people, groups, interests
Node Centrality Problem
• Vertices with high impact
• Removing important vertices reduces reliability
Cases:
• Bioinformatics
• Social connections
• Road network
• Spam detection
• Recommendation system
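To make "high impact" concrete, one simple centrality measure is degree centrality; the sketch below (toy graph and function name are my own, not from the talk) scores a star graph where removing the hub disconnects everything:

```python
from collections import defaultdict

# Degree centrality: the fraction of other vertices a vertex touches.
# Real systems also use betweenness, closeness, PageRank, etc.
def degree_centrality(edges):
    degree = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    n = len(degree)
    # Normalize by the maximum possible degree (n - 1).
    return {v: d / (n - 1) for v, d in degree.items()}

# A star graph: the hub dominates, and its removal disconnects the rest.
edges = [("hub", "a"), ("hub", "b"), ("hub", "c")]
scores = degree_centrality(edges)
```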
Small World Problem
Network                 | Avg. path length | Users | Edges
Facebook                | 4.74             | 712 M | 69 G
Twitter                 | 3.67             | ----  | 5 G follows
MSN Messenger (1 month) | 6.6              | 180 M | 1.3 G arcs
Large graph processing tools
Think like a vertex…
• The majority of graph algorithms are iterative and traverse the graph in some way
• Classic MapReduce overheads (job startup/shutdown, reloading data from HDFS, shuffling)
• High complexity of reducing graph problems to the key-value model
• Iterative algorithms become multiple chained jobs in M/R, with a full save and reload of state on each iteration
Why not use MapReduce/Hadoop?
• Example: PageRank, Google's famous algorithm for measuring the authority of a webpage based on the underlying network of hyperlinks
• Defined recursively: each vertex distributes its authority to its neighbors in equal proportions
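The recursive definition above can be computed by power iteration; here is a minimal sketch (toy graph and damping factor 0.85 are my assumptions, not Google's actual implementation):

```python
# PageRank by power iteration: each vertex repeatedly distributes its
# authority to its out-neighbors in equal proportions.
def pagerank(graph, damping=0.85, iterations=30):
    n = len(graph)
    rank = {v: 1.0 / n for v in graph}
    for _ in range(iterations):
        new_rank = {v: (1.0 - damping) / n for v in graph}
        for v, neighbors in graph.items():
            if neighbors:
                share = damping * rank[v] / len(neighbors)
                for u in neighbors:
                    new_rank[u] += share
            else:
                # A dangling vertex spreads its authority uniformly.
                for u in graph:
                    new_rank[u] += damping * rank[v] / n
        rank = new_rank
    return rank

graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
ranks = pagerank(graph)
```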
Google Pregel
• Distributed system developed especially for large-scale graph processing
• Bulk Synchronous Parallel (BSP) as the execution model
• Supersteps are atomic units of parallel computation
• Any superstep can be restarted from a checkpoint (checkpoints need not be user-defined)
• A new superstep provides an opportunity for rebalancing of components among available resources
Superstep in BSP
Vertex-centric BSP
• Each vertex has an id, a value, a list of its adjacent vertex ids and the corresponding edge values
• Each vertex is invoked in each superstep, can recompute its value and send messages to other vertices, which are delivered over superstep barriers
• Advanced features: termination votes, combiners, aggregators, topology mutations
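The vertex-centric model above can be simulated in a few lines; this sketch propagates the maximum vertex value with per-superstep message delivery and termination votes (an illustration of the model only, not the Pregel or Giraph API):

```python
# Toy vertex-centric BSP computation: every vertex propagates the largest
# value it has seen, and votes to halt when its value stops changing.
def bsp_max(values, edges):
    inbox = {v: [] for v in values}
    active = set(values)
    superstep = 0
    while active:
        outbox = {v: [] for v in values}
        for v in list(active):
            new_value = max([values[v]] + inbox[v])
            if superstep == 0 or new_value > values[v]:
                values[v] = new_value
                for u in edges.get(v, []):
                    outbox[u].append(new_value)  # delivered next superstep
            else:
                active.discard(v)                # vote to halt
        inbox = outbox                           # superstep barrier
        # A halted vertex is reactivated when a message arrives for it.
        active |= {v for v, msgs in inbox.items() if msgs}
        superstep += 1
    return values

values = bsp_max({"a": 3, "b": 6, "c": 2}, {"a": ["b"], "b": ["c"], "c": ["a"]})
```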
C++ API, Pregel
Apache Giraph
Why Apache Giraph
Pregel is proprietary, but:
• Apache Giraph is an open-source implementation of Pregel
• Runs on standard Hadoop infrastructure
• Computation is executed in memory
• Can be a job in a pipeline (MapReduce, Hive)
• Uses Apache ZooKeeper for synchronization
Why Apache Giraph
• No locks: message-based communication
• No semaphores: global synchronization
• Iteration isolation: massively parallelizable
ZooKeeper in Apache Giraph
ZooKeeper: responsible for computation state
• Partition/worker mapping
• Global state: superstep
• Checkpoint paths, aggregator values, statistics
Master in Apache Giraph
Master: responsible for coordination
• Assigns partitions to workers
• Coordinates synchronization
• Requests checkpoints
• Aggregates aggregator values
• Collects health statuses
Worker in Apache Giraph
Worker: responsible for vertices
• Invokes the compute() function of active vertices
• Sends, receives and assigns messages
• Computes local aggregation values
Scaling Giraph to a trillion edges
Fault tolerance
No single point of failure from Giraph threads:
• With multiple master threads, if the current master dies, a new one automatically takes over.
• If a worker thread dies, the application is rolled back to a previously checkpointed superstep.
• If a ZooKeeper server dies, the application can proceed as long as a quorum remains.
Hadoop single points of failure still exist (NameNode, JobTracker)
Worker Scalability, 250m nodes
Vertex scalability, 300 workers
Vertex/workers scalability
MapReduce vs Giraph
6 machines with 2x8-core Opteron CPUs, 4x1 TB disks and 32 GB RAM each; ran 1 Giraph worker per core
Wikipedia page link graph (6 million vertices, 200 million edges)
PageRank on Hadoop/Mahout
• 10 iterations: approx. 29 minutes
• average time per iteration: approx. 3 minutes
PageRank on Giraph
• 30 iterations: approx. 15 minutes
• average time per iteration: approx. 30 seconds
10x performance improvement
Okapi
• Apache Mahout for graphs
• Graph-based recommenders: ALS, SGD, SVD++, etc.
• Graph analytics: graph partitioning, community detection, k-core, etc.
Giraph's killer
Spark
• MapReduce in memory
• Up to 50x faster than Hadoop
• Support for Shark (like Hive), MLlib (machine learning), GraphX (graph processing)
• RDD is the basic building block (immutable distributed collections of objects)
Spark in the old Hadoop family
GraphX
Supported algorithms
● PageRank
● Connected components
● Label propagation
● SVD++
● Strongly connected components
● Triangle count
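Connected components from the list above is typically implemented vertex-centrically as min-label propagation: every vertex repeatedly adopts the smallest label among itself and its neighbors. A sequential sketch of that idea (toy data and names are mine, not GraphX code):

```python
# Connected components via min-label propagation: labels converge to the
# smallest vertex id in each component.
def connected_components(vertices, edges):
    labels = {v: v for v in vertices}
    neighbors = {v: set() for v in vertices}
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    changed = True
    while changed:
        changed = False
        for v in vertices:
            best = min([labels[v]] + [labels[u] for u in neighbors[v]])
            if best < labels[v]:
                labels[v] = best
                changed = True
    return labels

labels = connected_components([1, 2, 3, 4, 5], [(1, 2), (2, 3), (4, 5)])
```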
GraphChi
• Asynchronous disk-based version of GraphLab
• Utilizes the parallel sliding windows method
• Very small number of non-sequential accesses to the disk
• Works even when the graph does not fit in memory
• The input graph is split into P disjoint intervals to balance edges, each associated with a shard
GraphChi
Road Networks
Definition
• Edge weights > 0
• A few classes of roads
• Lat/lon attributes for each vertex
• Subgraphs for crossroads
• Not as big as the web graph
• Static
Shortest path problem
• Full Dijkstra
• Bi-directional Dijkstra
• A*
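For reference, the baseline that the preprocessing techniques discussed later try to beat is plain Dijkstra with a priority queue (toy adjacency-list graph assumed):

```python
import heapq

# Textbook Dijkstra with a binary heap.
# graph: {vertex: [(neighbor, weight), ...]} with weights > 0.
def dijkstra(graph, source):
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, v = heapq.heappop(heap)
        if d > dist.get(v, float("inf")):
            continue  # stale heap entry, already settled cheaper
        for u, w in graph.get(v, []):
            nd = d + w
            if nd < dist.get(u, float("inf")):
                dist[u] = nd
                heapq.heappush(heap, (nd, u))
    return dist

dist = dijkstra({"a": [("b", 1), ("c", 4)], "b": [("c", 2)]}, "a")
```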
We need a fast system!
• Response < 10 ms (with high accuracy)
• Shortest path (SP) queries in O(n)
• Preprocessing phase
• Don't keep all SPs: that takes O(n^2) space
• Use geo attributes
• Use compression and recoding for disk storage
• The network is stable
EU road network
Algorithm:  | Dijkstra  | ALT    | RE    | HH    | CH   | TN  | HL
Query cost: | 2,008,300 | 24,656 | 2,444 | 462.0 | 94.0 | 1.8 | 0.3
• ALT: [Goldberg & Harrelson 05], [Delling & Wagner 07]
• RE: [Gutman 05], [Goldberg et al. 07]
• HH: [Sanders & Schultes 06]
• CH: [Geisberger et al. 08]
• TN: [Geisberger et al. 08]
• HL: [Abraham et al. 11]
A* with landmarks (ALT)
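The ALT idea: precompute shortest-path distances from a few landmarks, then use the triangle inequality |d(L, t) - d(L, v)| as an admissible A* heuristic. A sketch on an undirected toy graph (one distance table per landmark suffices only because the graph is undirected; all names and data here are illustrative):

```python
import heapq

# Preprocessing: full Dijkstra from a landmark gives its distance table.
def dijkstra_all(graph, source):
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, v = heapq.heappop(heap)
        if d > dist.get(v, float("inf")):
            continue
        for u, w in graph.get(v, []):
            if d + w < dist.get(u, float("inf")):
                dist[u] = d + w
                heapq.heappush(heap, (d + w, u))
    return dist

# Query: A* guided by landmark lower bounds (the "ALT" heuristic).
def alt_shortest_path(graph, landmark_tables, s, t):
    def h(v):
        # Triangle inequality: |d(L, t) - d(L, v)| <= d(v, t).
        return max((abs(ld.get(t, 0) - ld.get(v, 0)) for ld in landmark_tables),
                   default=0)
    dist = {s: 0}
    heap = [(h(s), s)]
    while heap:
        _, v = heapq.heappop(heap)
        if v == t:
            return dist[v]
        for u, w in graph.get(v, []):
            nd = dist[v] + w
            if nd < dist.get(u, float("inf")):
                dist[u] = nd
                heapq.heappush(heap, (nd + h(u), u))
    return float("inf")

graph = {"a": [("b", 1), ("c", 5)],
         "b": [("a", 1), ("c", 2)],
         "c": [("b", 2), ("a", 5)]}
landmark = dijkstra_all(graph, "a")  # "a" chosen as the single landmark
d = alt_shortest_path(graph, [landmark], "a", "c")
```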
Reach (RE)
Transit nodes (TN)
• Divide graph G into subgraphs G_i
• Find R (a subset of G_i) for each G_i
• All shortest paths out of G_i pass through R
• Build pairs (v_i, r_k) for each v_i, where r_k is the closest transit node
• Calculate shortest paths between transit nodes in R
• Save it!
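Once the transit-node table is saved, a long-range query reduces to a handful of table lookups over the access nodes of s and t; a sketch with toy data (function and node names are my own, not from the talk):

```python
# TN query step: combine the distance from s to each of its access nodes,
# the precomputed transit-node-to-transit-node distance, and the distance
# from t's access node to t, taking the minimum over all pairs.
def tn_query(access_s, access_t, table):
    # access_s / access_t: {transit_node: distance from s / to t}
    # table[a][b]: precomputed shortest distance between transit nodes.
    return min(
        ds + table[a][b] + dt
        for a, ds in access_s.items()
        for b, dt in access_t.items()
    )

shortest = tn_query({"r1": 2, "r2": 5},
                    {"r3": 1},
                    {"r1": {"r3": 10}, "r2": {"r3": 3}})
```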
TN + ALT
Special Cases
Optimization problems
• Unstable graph
• Preprocessing phase is meaningless
• How to invest $1B in the road network to minimize human time lost in traffic jams
• How to invest $1M in the road network to improve reliability before the flooding
Last steps ...
• I/O-efficient algorithms and data structures
• Graphs and memory errors
Omsk
Novosibirsk
Novosibirsk, TN preprocessing