Thorny Path to Large-Scale Graph Processing, Alexey Zinoviev (Tamtek)
Thorny Path to Large-Scale Graph Processing
Alexey Zinoviev
About
• I am a <graph theory, machine learning, traffic jam prediction, BigData algorithms> scientist
• But I'm a <Java, JavaScript, Android, NoSQL, Hadoop, Spark> programmer
BigData & Graph Theory
Big Data of old times
• Astronomy
• Weather
• Trading
• Sea routes
• Battles
And now ...
• Web graph
• Facebook friend network
• Gmail email graph
• EU road network
• Citation graph
• PayPal transaction graph
Graph                    | Vertices    | Edges       | Volume | Data per day
Web graph                | 1.5 * 10^12 | 1.2 * 10^13 | 100 PB | 300 TB
Facebook (friends graph) | 1.1 * 10^9  | 160 * 10^9  | 1 PB   | 15 TB
EU road graph            | 18 * 10^6   | 42 * 10^6   | 20 GB  | 50 MB
Road graph of this city  | 250,000     | 460,000     | 500 MB | 100 KB
Problems
• Popularity rank (PageRank)
• Determining popular users, news, jobs, etc.
• Shortest paths
• Max flow
• How are users, groups connected?
• Clustering, semi-clustering
• Max clique, triangle closure, label propagation algorithms
• Finding related people, groups, interests
Node Centrality Problem
• Vertices with high impact
• Removing important vertices reduces reliability
Cases:
• Bioinformatics
• Social connections
• Road network
• Spam detection
• Recommendation system
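To make "high impact" concrete, one simple centrality measure is degree centrality; the sketch below (toy graph and function name are my own, not from the talk) scores a star graph where removing the hub disconnects everything:

```python
from collections import defaultdict

# Degree centrality: the fraction of other vertices a vertex touches.
# Real systems also use betweenness, closeness, PageRank, etc.
def degree_centrality(edges):
    degree = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    n = len(degree)
    # Normalize by the maximum possible degree (n - 1).
    return {v: d / (n - 1) for v, d in degree.items()}

# A star graph: the hub dominates, and its removal disconnects the rest.
edges = [("hub", "a"), ("hub", "b"), ("hub", "c")]
scores = degree_centrality(edges)
```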
Small World Problem
Network                 | Avg. path length | Users | Edges
Facebook                | 4.74             | 712 M | 69 G
Twitter                 | 3.67             | ----  | 5 G follows
MSN Messenger (1 month) | 6.6              | 180 M | 1.3 G arcs
Large graph processing tools
Think like a vertex…
• The majority of graph algorithms are iterative and traverse the graph in some way
• Classic MapReduce overheads (job startup/shutdown, reloading data from HDFS, shuffling)
• High complexity of reducing graph problems to the key-value model
• Iterative algorithms become multiple chained jobs in M/R, with a full save and reload of state on each iteration
Why not use MapReduce/Hadoop?
• Example: PageRank, Google's famous algorithm for measuring the authority of a webpage based on the underlying network of hyperlinks
• Defined recursively: each vertex distributes its authority to its neighbors in equal proportions
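The recursive definition above can be computed by power iteration; here is a minimal sketch (toy graph and damping factor 0.85 are my assumptions, not Google's actual implementation):

```python
# PageRank by power iteration: each vertex repeatedly distributes its
# authority to its out-neighbors in equal proportions.
def pagerank(graph, damping=0.85, iterations=30):
    n = len(graph)
    rank = {v: 1.0 / n for v in graph}
    for _ in range(iterations):
        new_rank = {v: (1.0 - damping) / n for v in graph}
        for v, neighbors in graph.items():
            if neighbors:
                share = damping * rank[v] / len(neighbors)
                for u in neighbors:
                    new_rank[u] += share
            else:
                # A dangling vertex spreads its authority uniformly.
                for u in graph:
                    new_rank[u] += damping * rank[v] / n
        rank = new_rank
    return rank

graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
ranks = pagerank(graph)
```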
Google Pregel
• Distributed system developed especially for large-scale graph processing
• Bulk Synchronous Parallel (BSP) as the execution model
• Supersteps are atomic units of parallel computation
• Any superstep can be restarted from a checkpoint (checkpoints need not be user-defined)
• A new superstep provides an opportunity for rebalancing of components among available resources
Superstep in BSP
Vertex-centric BSP
• Each vertex has an id, a value, a list of its adjacent vertex ids and the corresponding edge values
• Each vertex is invoked in each superstep, can recompute its value and send messages to other vertices, which are delivered over superstep barriers
• Advanced features: termination votes, combiners, aggregators, topology mutations
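The vertex-centric model above can be simulated in a few lines; this sketch propagates the maximum vertex value with per-superstep message delivery and termination votes (an illustration of the model only, not the Pregel or Giraph API):

```python
# Toy vertex-centric BSP computation: every vertex propagates the largest
# value it has seen, and votes to halt when its value stops changing.
def bsp_max(values, edges):
    inbox = {v: [] for v in values}
    active = set(values)
    superstep = 0
    while active:
        outbox = {v: [] for v in values}
        for v in list(active):
            new_value = max([values[v]] + inbox[v])
            if superstep == 0 or new_value > values[v]:
                values[v] = new_value
                for u in edges.get(v, []):
                    outbox[u].append(new_value)  # delivered next superstep
            else:
                active.discard(v)                # vote to halt
        inbox = outbox                           # superstep barrier
        # A halted vertex is reactivated when a message arrives for it.
        active |= {v for v, msgs in inbox.items() if msgs}
        superstep += 1
    return values

values = bsp_max({"a": 3, "b": 6, "c": 2}, {"a": ["b"], "b": ["c"], "c": ["a"]})
```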
C++ API, Pregel
Apache Giraph
Why Apache Giraph
Pregel is proprietary, but:
• Apache Giraph is an open-source implementation of Pregel
• Runs on standard Hadoop infrastructure
• Computation is executed in memory
• Can be a job in a pipeline (MapReduce, Hive)
• Uses Apache ZooKeeper for synchronization
Why Apache Giraph
• No locks: message-based communication
• No semaphores: global synchronization
• Iteration isolation: massively parallelizable
ZooKeeper in Apache Giraph
ZooKeeper: responsible for computation state
• Partition/worker mapping
• Global state: superstep
• Checkpoint paths, aggregator values, statistics
Master in Apache Giraph
Master: responsible for coordination
• Assigns partitions to workers
• Coordinates synchronization
• Requests checkpoints
• Aggregates aggregator values
• Collects health statuses
Worker in Apache Giraph
Worker: responsible for vertices
• Invokes the compute() function of active vertices
• Sends, receives and assigns messages
• Computes local aggregation values
Scaling Giraph to a trillion edges
Fault tolerance
No single point of failure from Giraph threads:
• With multiple master threads, if the current master dies, a new one automatically takes over.
• If a worker thread dies, the application is rolled back to a previously checkpointed superstep.
• If a ZooKeeper server dies, the application can proceed as long as a quorum remains.
Hadoop single points of failure still exist (NameNode, JobTracker)
Worker Scalability, 250m nodes
Vertex scalability, 300 workers
Vertex/workers scalability
MapReduce vs Giraph
6 machines with 2x8-core Opteron CPUs, 4x1 TB disks and 32 GB RAM each; ran 1 Giraph worker per core
Wikipedia page link graph (6 million vertices, 200 million edges)
PageRank on Hadoop/Mahout
• 10 iterations: approx. 29 minutes
• average time per iteration: approx. 3 minutes
PageRank on Giraph
• 30 iterations: approx. 15 minutes
• average time per iteration: approx. 30 seconds
10x performance improvement
Okapi
• Apache Mahout for graphs
• Graph-based recommenders: ALS, SGD, SVD++, etc.
• Graph analytics: graph partitioning, community detection, k-core, etc.
Giraph's killer
Spark
• MapReduce in memory
• Up to 50x faster than Hadoop
• Support for Shark (like Hive), MLlib (machine learning), GraphX (graph processing)
• RDD is the basic building block (immutable distributed collections of objects)
Spark in the old Hadoop family
GraphX
Supported algorithms
● PageRank
● Connected components
● Label propagation
● SVD++
● Strongly connected components
● Triangle count
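Connected components from the list above is typically implemented vertex-centrically as min-label propagation: every vertex repeatedly adopts the smallest label among itself and its neighbors. A sequential sketch of that idea (toy data and names are mine, not GraphX code):

```python
# Connected components via min-label propagation: labels converge to the
# smallest vertex id in each component.
def connected_components(vertices, edges):
    labels = {v: v for v in vertices}
    neighbors = {v: set() for v in vertices}
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    changed = True
    while changed:
        changed = False
        for v in vertices:
            best = min([labels[v]] + [labels[u] for u in neighbors[v]])
            if best < labels[v]:
                labels[v] = best
                changed = True
    return labels

labels = connected_components([1, 2, 3, 4, 5], [(1, 2), (2, 3), (4, 5)])
```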
GraphChi
• Asynchronous disk-based version of GraphLab
• Utilizes the parallel sliding windows method
• Very small number of non-sequential accesses to the disk
• Works even when the graph does not fit in memory
• The input graph is split into P disjoint intervals to balance edges, each associated with a shard
GraphChi
Road Networks
Definition
• Edge weights > 0
• A few classes of roads
• Lat/lon attributes for each vertex
• Subgraphs for crossroads
• Not as big as the web graph
• Static
Shortest path problem
• Full Dijkstra
• Bi-directional Dijkstra
• A*
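For reference, the baseline that the preprocessing techniques discussed later try to beat is plain Dijkstra with a priority queue (toy adjacency-list graph assumed):

```python
import heapq

# Textbook Dijkstra with a binary heap.
# graph: {vertex: [(neighbor, weight), ...]} with weights > 0.
def dijkstra(graph, source):
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, v = heapq.heappop(heap)
        if d > dist.get(v, float("inf")):
            continue  # stale heap entry, already settled cheaper
        for u, w in graph.get(v, []):
            nd = d + w
            if nd < dist.get(u, float("inf")):
                dist[u] = nd
                heapq.heappush(heap, (nd, u))
    return dist

dist = dijkstra({"a": [("b", 1), ("c", 4)], "b": [("c", 2)]}, "a")
```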
We need a fast system!
• Response < 10 ms (with high accuracy)
• Shortest path (SP) queries in O(n)
• Preprocessing phase
• Don't keep all SPs: that takes O(n^2) space
• Use geo attributes
• Use compression and recoding for disk storage
• The network is stable
EU road network
Algorithm:  | Dijkstra  | ALT    | RE    | HH    | CH   | TN  | HL
Query cost: | 2,008,300 | 24,656 | 2,444 | 462.0 | 94.0 | 1.8 | 0.3
• ALT: [Goldberg & Harrelson 05], [Delling & Wagner 07]
• RE: [Gutman 05], [Goldberg et al. 07]
• HH: [Sanders & Schultes 06]
• CH: [Geisberger et al. 08]
• TN: [Geisberger et al. 08]
• HL: [Abraham et al. 11]
A* with landmarks (ALT)
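The ALT idea: precompute shortest-path distances from a few landmarks, then use the triangle inequality |d(L, t) - d(L, v)| as an admissible A* heuristic. A sketch on an undirected toy graph (one distance table per landmark suffices only because the graph is undirected; all names and data here are illustrative):

```python
import heapq

# Preprocessing: full Dijkstra from a landmark gives its distance table.
def dijkstra_all(graph, source):
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, v = heapq.heappop(heap)
        if d > dist.get(v, float("inf")):
            continue
        for u, w in graph.get(v, []):
            if d + w < dist.get(u, float("inf")):
                dist[u] = d + w
                heapq.heappush(heap, (d + w, u))
    return dist

# Query: A* guided by landmark lower bounds (the "ALT" heuristic).
def alt_shortest_path(graph, landmark_tables, s, t):
    def h(v):
        # Triangle inequality: |d(L, t) - d(L, v)| <= d(v, t).
        return max((abs(ld.get(t, 0) - ld.get(v, 0)) for ld in landmark_tables),
                   default=0)
    dist = {s: 0}
    heap = [(h(s), s)]
    while heap:
        _, v = heapq.heappop(heap)
        if v == t:
            return dist[v]
        for u, w in graph.get(v, []):
            nd = dist[v] + w
            if nd < dist.get(u, float("inf")):
                dist[u] = nd
                heapq.heappush(heap, (nd + h(u), u))
    return float("inf")

graph = {"a": [("b", 1), ("c", 5)],
         "b": [("a", 1), ("c", 2)],
         "c": [("b", 2), ("a", 5)]}
landmark = dijkstra_all(graph, "a")  # "a" chosen as the single landmark
d = alt_shortest_path(graph, [landmark], "a", "c")
```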
Reach (RE)
Transit nodes (TN)
• Divide graph G into subgraphs G_i
• Find R (a subset of G_i) for each G_i
• All shortest paths out of G_i pass through R
• Build pairs (v_i, r_k) for each v_i, where r_k is the closest transit node
• Calculate shortest paths between transit nodes in R
• Save it!
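Once the transit-node table is saved, a long-range query reduces to a handful of table lookups over the access nodes of s and t; a sketch with toy data (function and node names are my own, not from the talk):

```python
# TN query step: combine the distance from s to each of its access nodes,
# the precomputed transit-node-to-transit-node distance, and the distance
# from t's access node to t, taking the minimum over all pairs.
def tn_query(access_s, access_t, table):
    # access_s / access_t: {transit_node: distance from s / to t}
    # table[a][b]: precomputed shortest distance between transit nodes.
    return min(
        ds + table[a][b] + dt
        for a, ds in access_s.items()
        for b, dt in access_t.items()
    )

shortest = tn_query({"r1": 2, "r2": 5},
                    {"r3": 1},
                    {"r1": {"r3": 10}, "r2": {"r3": 3}})
```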
TN + ALT
Special Cases
Optimization problems
• Unstable graph
• Preprocessing phase is meaningless
• How to invest $1B in the road network to minimize human time lost in traffic jams
• How to invest $1M in the road network to improve reliability before the flooding
Last steps ...
• I/O-efficient algorithms and data structures
• Graphs and memory errors
Omsk
Novosibirsk
Novosibirsk, TN preprocessing