Real Time Graph Computations in Storm, Neo4J, Python - PyCon India 2013
Real-Time Stream Computation on Graphs using Storm, Neo4j and Python
Sonal Raj
http://www.sonalraj.com
Presented at PyCon India 2013
Bangalore, India
Copyrights © 2013, Sonal Raj, http://www.sonalraj.com
Introduction
• With data multiplying each day, storage and knowledge extraction are major concerns.
• Typical applications: social data analysis, business intelligence
• Constraints: real-time and fault-tolerant processing
In this Talk
• A look at Storm as a distributed computation framework
• Neo4J as a NoSQL graph database
• Some cool pictures
• What are we trying to achieve?
Disclaimer!
• This talk presents an overview of Storm and Neo4J, with fewer of the dirty details.
• I’m going to go pretty fast . . . Please hang on.
Part 1
Storm – The Hadoop of Real Time
Don’t we have Hadoop?
Storm v/s Hadoop
Common to both:
• Distributed processing
• Fault tolerance
Hadoop:
• Large but finite jobs
• Processes a lot of data at once
• High latency
Storm:
• Infinite computations called topologies
• Processes infinite streams of data, one tuple at a time
• Low latency
So, what does Storm give us?
• Real-time computations
• Guaranteed data processing
• Horizontal scalability and fault tolerance
• No intermediate message brokers
• A higher abstraction than message passing, so it makes sense!
A little deeper . . Concepts
Streams
Tuple Tuple Tuple Tuple Tuple
An unbounded sequence of tuples
So, what kind of a tuple is this?
Spouts
A source of streams
But what is the source for the spouts?
Bolts
Computational units that process input streams and produce new streams
Just one stream?
Topologies
A network of spouts and bolts
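The concepts so far (tuples, streams, spouts, bolts, topologies) can be sketched in plain Python. This is only a toy model, not the Storm or Petrel API: the names FriendEvent, event_spout, filter_bolt, count_bolt and run_topology are made up for illustration, everything runs in one process, and the stream is finite, whereas a real Storm stream is unbounded.

```python
from collections import namedtuple

# A Storm tuple is a named list of values; a namedtuple stands in for it here.
FriendEvent = namedtuple("FriendEvent", ["action", "user", "friend"])


def event_spout(source):
    """Spout stand-in: turns an external source into a stream of tuples."""
    for action, user, friend in source:
        yield FriendEvent(action, user, friend)


def filter_bolt(stream, action):
    """Bolt stand-in: consumes one stream and emits a new, filtered stream."""
    for tup in stream:
        if tup.action == action:
            yield tup


def count_bolt(stream):
    """Bolt stand-in: keeps a running count per user and emits (user, count)."""
    counts = {}
    for tup in stream:
        counts[tup.user] = counts.get(tup.user, 0) + 1
        yield tup.user, counts[tup.user]


def run_topology(source):
    """Wire spout -> filter bolt -> count bolt and collect the output."""
    return list(count_bolt(filter_bolt(event_spout(source), "add")))


# Hypothetical sample data: (action, user, friend)
SAMPLE_SOURCE = [
    ("add", "alice", "bob"),
    ("remove", "alice", "bob"),
    ("add", "alice", "carol"),
    ("add", "bob", "carol"),
]
```

Calling run_topology(SAMPLE_SOURCE) pushes each event through the chain one tuple at a time, which is the processing model described for Storm above.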
Is that it?
Tasks and Parallelism
A spout or bolt can execute as multiple tasks across the cluster
Mr. Tuple: Oh shoot, where do I go now?
Groupings . . To the rescue of Mr. Tuple!
• Shuffle Grouping # picks a random task
• Fields Grouping # mod hashing on a subset of tuple fields
• All Grouping # sends to all tasks
• Global Grouping # picks the task with the lowest task id
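A rough sketch of how each grouping could pick target tasks, in plain Python. This is an illustration under assumptions, not Storm's implementation: the function names are hypothetical, tuples are modelled as dicts, and CRC32 stands in for whatever hash Storm applies to the grouping fields.

```python
import random
import zlib


def shuffle_grouping(num_tasks, rng=random):
    """Pick a random task index."""
    return rng.randrange(num_tasks)


def fields_grouping(tup, field_names, num_tasks):
    """Hash the chosen fields and take the result modulo the task count.

    Tuples with equal values in those fields always reach the same task.
    """
    key = "|".join(str(tup[name]) for name in field_names)
    return zlib.crc32(key.encode("utf-8")) % num_tasks


def all_grouping(num_tasks):
    """Send the tuple to every task."""
    return list(range(num_tasks))


def global_grouping(task_ids):
    """Send the tuple to the task with the lowest id."""
    return min(task_ids)
```

Fields grouping is the one that matters for stateful bolts: if a bolt keeps per-user state, grouping on the user field keeps all of one user's tuples on a single task.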
A Storm Cluster
• Nimbus (1 node)
• ZooKeeper (3 nodes)
• Supervisor (5 nodes)
If this were Hadoop: Nimbus would be the JobTracker and each Supervisor a TaskTracker.
But it’s NOT Hadoop! ZooKeeper coordinates everything between Nimbus and the Supervisors.
Salient Features . .
• Storm 0.7 and later supports transactional topologies, which process tuples in small batches. If a failure occurs during commit, both the batch and the commit are retried.
• Storm guarantees message processing using acknowledgements.
• Petrel by AirSage is a Python wrapper for Storm; you can write and submit topologies in Python.
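The acknowledgement idea can be sketched as a replay loop: a message stays pending until it is acked, and anything that fails is processed again. This is a toy model only. The name run_with_acks and the max_attempts cap are assumptions made for this sketch; real Storm tracks tuples with acker tasks and timeouts, and replay is triggered from the spout.

```python
def run_with_acks(messages, process, max_attempts=3):
    """Process every message at least once, replaying the ones that fail.

    Returns (attempts per message id, messages given up on).
    """
    pending = dict(enumerate(messages))  # message id -> payload awaiting ack
    attempts = {msg_id: 0 for msg_id in pending}
    failed = []
    while pending:
        for msg_id in list(pending):
            attempts[msg_id] += 1
            try:
                process(pending[msg_id])
            except Exception:
                # No ack: keep the message pending so it is replayed,
                # unless the (assumed) attempt cap has been reached.
                if attempts[msg_id] >= max_attempts:
                    failed.append(pending.pop(msg_id))
                continue
            del pending[msg_id]  # ack: the message is fully processed
    return attempts, failed
```

Because a failed message is simply run again, the processing step should tolerate seeing the same message more than once.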
Part 2
Neo4J – “Get Graphed”
This is how graph data was represented in an RDBMS.
Enter, NoSQL Databases
Types of NOSQL Databases
[Figure: key-value stores, column-family stores, document databases and graph databases, plotted by data size against data complexity]
Why NOSQL Databases
• Easily horizontally scalable
• Dynamic schemas; handle unstructured data really well
• Excel in speed and volume
• Trade off consistency for efficiency (except in graph databases . . . we’ll see why)
• A pleasure to code
• Free to use any query language (even SQL!)
• Downtime? What downtime?
The Property Graph Model of Graph Databases
• Core abstractions:
  Nodes
  Relationships between nodes
  Properties on both
• Traversal framework:
  High-performance queries on connected datasets
• Bindings:
  REST, Gremlin, etc.
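A minimal in-memory sketch of the property graph model, to make the three core abstractions concrete. It is not Neo4J or Py2Neo code; Node, Relationship, PropertyGraph and their methods are hypothetical names used only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: int
    properties: dict = field(default_factory=dict)


@dataclass
class Relationship:
    start: int          # id of the node the relationship leaves
    end: int            # id of the node it points to
    rel_type: str
    properties: dict = field(default_factory=dict)


class PropertyGraph:
    def __init__(self):
        self.nodes = {}
        self.relationships = []

    def add_node(self, node_id, **properties):
        self.nodes[node_id] = Node(node_id, properties)
        return self.nodes[node_id]

    def relate(self, start, rel_type, end, **properties):
        rel = Relationship(start, end, rel_type, properties)
        self.relationships.append(rel)
        return rel

    def neighbours(self, node_id, rel_type):
        """One traversal step: follow outgoing relationships of one type."""
        return [self.nodes[r.end] for r in self.relationships
                if r.start == node_id and r.rel_type == rel_type]
```

For example, after add_node(1, name="Alice"), add_node(2, name="Bob") and relate(1, "FRIEND", 2, since=2013), neighbours(1, "FRIEND") returns the Bob node. Both the nodes and the relationship carry their own properties.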
Neo4J
• Fully ACID, with rollback support (unbelievable!)
• Schema-less, with efficient storage of semi-structured data
• Fast deep traversals instead of slow SQL queries that span many table joins
• Whiteboard friendly
• Very natural to express graph-related problems with traversals (recommendation engines, shortest path, etc.)
Neo4J Pythonized!
• Py2Neo is an excellent binding for Neo4J
• Accesses Neo4J using its RESTful API
• Still under development . . features like labels are yet to be included!
So, will relational databases go extinct?
OOPS!
Categories of Graph Data
• Social networks
• Citations
• Product co-purchasing
• Internet peer-to-peer
• Road network and map data
• Web graphs
Excellent source of sample graph data: http://snap.Stanford.edu/data/
Part 3
Get your hands dirty!
A demo . .
• Sample social network data set
• Data covers one month of activity: people signing up, adding friends, unfriending, etc.
• Neo4J: stores and updates the social data
• Storm: calculates the “friendship-index”
A demo . .
• “friendship-index”
n = the number of people through whom person “A” is connected to person “B”
Gives an idea of how close two people are!
Useful while searching for friends on social networks (something like the friends-of-friends concept in Facebook’s Graph Search)
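One way to compute such an index is a breadth-first search over the friendship graph. The sketch below assumes that "through how many people" means the number of intermediaries on the shortest path, and it uses a plain dict of sets in place of Neo4J. The function name friendship_index and the sample data are illustrative, not taken from the talk's code.

```python
from collections import deque


def friendship_index(friends, a, b):
    """Number of people on the shortest path strictly between a and b.

    `friends` maps each person to the set of their direct friends.
    Direct friends give 0, friends of friends give 1, and so on.
    Returns None when the two people are not connected at all.
    """
    if a == b:
        return 0
    visited = {a}
    queue = deque([(a, 0)])  # (person, hops from a)
    while queue:
        person, hops = queue.popleft()
        for friend in friends.get(person, ()):
            if friend == b:
                return hops  # `hops` people sit between a and b
            if friend not in visited:
                visited.add(friend)
                queue.append((friend, hops + 1))
    return None


# Hypothetical sample network: A - C - D - B, with E isolated.
SAMPLE_FRIENDS = {
    "A": {"C"},
    "C": {"A", "D"},
    "D": {"C", "B"},
    "B": {"D"},
    "E": set(),
}
```

In a graph database the same question is a shortest-path traversal, which is the kind of query the earlier Neo4J slides describe as natural to express.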
The Topology . .
[Figure: two pipelines, each fed by its own source: UpdateSpout feeding UpdateBolt, and QuerySpout feeding QueryBolt]
Update Spout
• Defines what kind of tuples are emitted
• Gets and emits tuple streams
Update Bolt
• Objects for database access and the indexing service
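The slide's code is not part of this transcript, so here is only a rough sketch of what an update step could look like. It applies sign-up, add-friend and unfriend events to an in-memory dict of sets that stands in for the Neo4J database. The event names and the function apply_update are assumptions made for illustration.

```python
def apply_update(friends, event):
    """Apply one update event to the friendship graph and return the graph.

    `friends` maps each person to the set of their direct friends.
    Events are tuples: ("signup", person), ("add_friend", a, b)
    or ("unfriend", a, b).
    """
    action = event[0]
    if action == "signup":
        friends.setdefault(event[1], set())
    elif action == "add_friend":
        _, a, b = event
        # Friendship is stored in both directions.
        friends.setdefault(a, set()).add(b)
        friends.setdefault(b, set()).add(a)
    elif action == "unfriend":
        _, a, b = event
        friends.get(a, set()).discard(b)
        friends.get(b, set()).discard(a)
    else:
        raise ValueError("unknown action: %r" % (action,))
    return friends
```

A bolt built this way handles one event tuple per call, so the stored graph is always current up to the last event processed.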
Query Spout
• The emitted tuple can contain multiple entities.
Query Bolt
• Retrieves the caller friend and requested friend ids, as stored in the database
Create Topology
• Import all spout and bolt files
• Unfortunately, there was no option in Petrel to turn off console debug output, so the console view is really messy.
Topology.yaml
Configuration for the topology is specified in this file
A little More . .
[Figure: the same topology again: UpdateSpout, UpdateBolt, QuerySpout, QueryBolt and their two sources]
Final Thoughts
• A Storm-Neo4j framework is a boon for real-time graph computations.
• It is quite flexible in Java; the Python bindings and implementations still have a long way to go.
• If you are an admin or developer, analyse your data and computing requirements before narrowing down on a framework.
…to play with Storm and Neo4J
• My PyCon talk repo (slides, code skeletons, etc.): http://www.sonalraj.com/neo-storm.html
• Storm documentation (official): http://github.com/nathanmarz/storm
• Storm book: http://www.amazon.com/Getting-Started-Storm-Jonathan-Leibiusky/dp/1449324010
• Deployment of Storm on AWS: http://github.com/nathanmarz/storm-deploy
• Neo4J documentation: http://www.neo4j.org
Ex-terminated . . .
- That’s it
- Thanks for listening!
- Questions