Titan - Graph Computing with Cassandra

AURELIUS THINKAURELIUS.COM TITAN Graph Computing with Cassandra Matthias Broecheler, CTO @mbroecheler June XI, MMXIII #CASSANDRA13

description

This presentation introduces Titan, Faunus, and scalable graph computing in general. We present a case study of how Pearson builds an education social network on top of Titan, Faunus, and Cassandra to support learning in the 21st century. Titan is an open source distributed graph database built on top of Cassandra that can power real-time applications with thousands of concurrent users over graphs with billions of edges. Faunus is an open source global graph processing engine built on top of Hadoop and compatible with Cassandra that can analyze graphs, compute graph statistics, and execute global traversals. Titan and Faunus are components of the Aurelius Graph Cluster, which enables scalable graph computation and powers applications in social networking, recommendation engines, advertisement optimization, knowledge representation, health care, education, and security.

Transcript of Titan - Graph Computing with Cassandra

Page 1: Titan - Graph Computing with Cassandra

AURELIUS THINKAURELIUS.COM

TITAN Graph Computing with Cassandra

Matthias Broecheler, CTO @mbroecheler June XI, MMXIII

#CASSANDRA13

Page 2: Titan - Graph Computing with Cassandra

Thank You

pull requests, feature suggestions

bug reports, community support

Page 3: Titan - Graph Computing with Cassandra

June 14th 2012

September 2012

December 2012

March 2013

May 2013

Alpha Release

Titan 0.1.0

Titan 0.2.0

Titan 0.3.0

Titan 0.3.1

Experimental release of a distributed, open-source graph database

First stable release

Rewrite of core, Indexing / ElasticSearch

Performance, Bugfixing

Page 4: Titan - Graph Computing with Cassandra

June 14th 2012

September 2012

December 2012

March 2013

May 2013

Alpha Release

Titan 0.1.0

Titan 0.2.0

Titan 0.3.0

Titan 0.3.1

Experimental release of a distributed, open-source graph database

First stable release

Rewrite of core, Indexing / ElasticSearch

Performance, Bugfixing

Faunus Release

Page 5: Titan - Graph Computing with Cassandra

Titan

Graph Database

distributed

real time

open source

Page 6: Titan - Graph Computing with Cassandra

name: Hercules type: demigod

name: Cerberus type: monster

battled

time:12

Vertex

Edge Label

Edge

Property
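The four terms on this slide (vertex, edge, edge label, property) can be made concrete with a small sketch. This is plain Python, not Titan's API: the class names, the vertex ids 1 and 2, and the edge direction (Hercules to Cerberus) are assumptions chosen for illustration; only the names, types, the "battled" label, and time:12 come from the slide.

```python
from dataclasses import dataclass, field


@dataclass
class Vertex:
    id: int                                   # ids are illustrative, not from the slide
    properties: dict = field(default_factory=dict)


@dataclass
class Edge:
    label: str                                # the edge label, e.g. "battled"
    out_vertex: Vertex                        # tail of the edge (assumed direction)
    in_vertex: Vertex                         # head of the edge
    properties: dict = field(default_factory=dict)


hercules = Vertex(1, {"name": "Hercules", "type": "demigod"})
cerberus = Vertex(2, {"name": "Cerberus", "type": "monster"})
battled = Edge("battled", hercules, cerberus, {"time": 12})
edges = [battled]


def out_edges(vertex, all_edges, label):
    """Return the edges with the given label that leave the given vertex."""
    return [e for e in all_edges if e.out_vertex is vertex and e.label == label]


# Names of everything Hercules battled.
opponents = [e.in_vertex.properties["name"] for e in out_edges(hercules, edges, "battled")]
```

Both vertices and edges carry key/value properties here, which is what distinguishes a property graph from a plain adjacency list.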

Page 7: Titan - Graph Computing with Cassandra

Value in Relationships low high

Key-Value

When should you use a Graph Database?

K V

BigTable K V V V V

Document

Relational

Graph

Page 8: Titan - Graph Computing with Cassandra

Educating the Planet

Page 9: Titan - Graph Computing with Cassandra

Educating the Planet

Page 10: Titan - Graph Computing with Cassandra

Person

Person Student Teacher

Course

Institution

Concept

Discussion

Comment

Share

enrolledIn

teaches

relatesTo

hasCourse

belongsTo

follows

author references

hasComment relatesTo

author

partOf

relatesTo

Page 11: Titan - Graph Computing with Cassandra

Person

Person Student Teacher

Course

Institution

Concept

Discussion

Comment

Share

enrolledIn

teaches

relatesTo

hasCourse

belongsTo

follows

author references

hasComment relatesTo

author

partOf

relatesTo

Page 12: Titan - Graph Computing with Cassandra

Titan

Integrative Data Model in a polyglot storage world

Page 13: Titan - Graph Computing with Cassandra

Student

Person

Teacher

Course

Institution

Concept

Discussion

Comment

Share

enrolledIn

teaches

relatesTo

hasCourse

belongsTo

follows

author references

hasComment relatesTo

author

partOf

DiscussionRank

relatesTo

Page 14: Titan - Graph Computing with Cassandra

Titan

Analyze Relationships in real time

Page 15: Titan - Graph Computing with Cassandra

Scaling Titan

number of transactions

size of the graph

Page 16: Titan - Graph Computing with Cassandra

121 Billion Edges 6.2 Billion Vertices

1 Million Universities

Page 17: Titan - Graph Computing with Cassandra

Placement Group

hi1.4xl

Page 18: Titan - Graph Computing with Cassandra

1.1 million edges / sec

using batch mode

Data Ingestion

Page 19: Titan - Graph Computing with Cassandra

80 m1.medium

Page 20: Titan - Graph Computing with Cassandra

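// x collects the users already followed; m maps each candidate to a count
x = [] as Set; m = [:]
// the traversal fills m as a side effect, so it is iterated rather than assigned to m
user.out('follows').aggregate(x)[0..(num*2)]
    .out('follows').except(x)[0..limit]
    .groupCount(m).iterate()

// rank candidates by count and emit the top entries as userid/points maps
m.sort{-it.value}[0..num]._()
    .transform{ [userid: it.key.id, points: it.value] }

Follow Recommendation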
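For readers unfamiliar with Gremlin, the same friends-of-friends logic can be sketched in plain Python over an adjacency dictionary. This is an illustration of the traversal's steps, not Titan or TinkerPop code: the function name, the `follows` dictionary, and the sample users are hypothetical. Gremlin ranges of the form [0..n] are inclusive, which is why the slices below use n + 1.

```python
from collections import Counter
from itertools import islice


def recommend_follows(follows, user, num, limit):
    """Rank users followed by the people `user` follows, skipping those already followed."""
    followed = list(follows.get(user, []))
    already = set(followed)                       # plays the role of `x`
    first_hop = followed[: num * 2 + 1]           # [0..(num*2)]
    candidates = (
        candidate
        for friend in first_hop
        for candidate in follows.get(friend, [])  # second out('follows') hop
        if candidate not in already               # except(x)
    )
    counts = Counter(islice(candidates, limit + 1))   # [0..limit], then groupCount(m)
    ranked = sorted(counts.items(), key=lambda kv: -kv[1])[: num + 1]
    return [{"userid": uid, "points": pts} for uid, pts in ranked]


# Hypothetical sample data: ann follows bob and cat.
follows = {
    "ann": ["bob", "cat"],
    "bob": ["cat", "dan", "eve"],
    "cat": ["dan"],
}
result = recommend_follows(follows, "ann", num=2, limit=10)
```

In this sample, dan is reachable through both bob and cat and therefore ranks above eve. Like the traversal on the slide, the sketch does not filter out the requesting user if someone follows them back.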

Page 21: Titan - Graph Computing with Cassandra

Generic Graph API

Dataflow Processing

Traversal Language

Object-Graph Mapper

Graph Algorithms

Graph Server

exciting work coming

REST / JSON

Titan's Ecosystem

query language

http://tinkerpop.com

Page 22: Titan - Graph Computing with Cassandra

10,200 transactions / sec

16 randomly chosen complex traversal templates

Throughput

Page 23: Titan - Graph Computing with Cassandra

Transaction Description | Avg (ms) | Stdev (ms)
Student retrieves all content for a single course in their course list | 279.32 | 81.83
Student follows another student | 193.72 | 22.77
Student is recommended people to follow | 241.33 | 256.48
Student reads their stream and shares an item with followers | 284.07 | 68.20
Student retrieves their profile | 53.740 | 22.61
Student reads the most recent comments for their courses | 211.07 | 45.56

Page 24: Titan - Graph Computing with Cassandra

Scaling Titan

technical perspective

Page 25: Titan - Graph Computing with Cassandra

Vertex Representation

time: 1

5

8

4

9

2

7

mother

battled

battled

battled

fought

time: 4

time: 7

induced order

name: Hercules type: demigod

5

Property

Property

Edge

Edge

Edge

Edge

Edge

row indices for fast vertex centric queries
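The idea on this slide, one row per vertex whose edges are kept in a sort order so that a query touches only a slice of the row, can be sketched as follows. This is a simplified model, not Titan's storage code: the column tuple layout (label, sort key, adjacent vertex id), the pairing of times with vertex ids, and the sort key 0 on the "mother" and "fought" edges are assumptions for illustration.

```python
from bisect import bisect_left, bisect_right

# One row for the Hercules vertex. Columns are kept sorted, as in a wide row
# of a column store. Each column key is (edge label, sort key, adjacent vertex id).
hercules_row = sorted([
    ("battled", 1, 9),
    ("battled", 4, 2),
    ("battled", 7, 7),
    ("fought", 0, 4),
    ("mother", 0, 8),
])


def edges_in_range(row, label, low, high):
    """Return edges with `label` whose sort key lies in [low, high] via binary search."""
    start = bisect_left(row, (label, low))                  # first column >= (label, low)
    end = bisect_right(row, (label, high, float("inf")))    # past the last (label, high, *)
    return row[start:end]


# "battled" edges with time between 4 and 7, found without scanning the whole row.
recent = edges_in_range(hercules_row, "battled", 4, 7)
```

Because the row is sorted by label first and sort key second, a vertex with millions of edges can still answer a narrow query by reading a contiguous slice.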

Page 26: Titan - Graph Computing with Cassandra

Edge Representation

Column / Value

label id + direction

primary key

edge id

Δ vertex id

signature properties

other properties

compressed serialized objects

variable long encoding
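To show why a delta-encoded vertex id combined with a variable-length long encoding saves space, here is a minimal sketch. It uses a generic 7-bits-per-byte unsigned varint, which is an assumption: the slide does not specify Titan's actual byte format, and the two vertex ids are made up.

```python
def encode_varint(value):
    """Encode a non-negative integer using 7 data bits per byte, low bits first."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative integers")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)   # high bit set: more bytes follow
        else:
            out.append(byte)          # high bit clear: last byte
            return bytes(out)


def decode_varint(data):
    """Decode a single varint produced by encode_varint."""
    result = 0
    shift = 0
    for byte in data:
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result
        shift += 7
    raise ValueError("truncated varint")


# Hypothetical ids of an edge's two endpoints.
vertex_id = 6_200_000_004
adjacent_id = 6_200_000_132

delta = adjacent_id - vertex_id        # small when neighbors have nearby ids
full = encode_varint(adjacent_id)      # storing the id itself
small = encode_varint(delta)           # storing only the difference
```

A fixed-width long would take 8 bytes for every edge, whereas the delta in this example fits in 2. A real format would also need to handle negative deltas, which this sketch deliberately leaves out.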

Page 27: Titan - Graph Computing with Cassandra

Token Ring

Graph Partitioning

assigns ids to map vertices into “optimal” token range

Lots of interesting questions for future work

uses BOP
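With an order-preserving partitioner, row keys are placed on the ring by their byte order, so the way ids are assigned decides which node stores a vertex. The sketch below illustrates that idea under loud assumptions: the 8-bit partition prefix, the four-node ring, and every function name are hypothetical, and real Cassandra token ranges (which wrap around the ring) are simplified to sorted range starts.

```python
from bisect import bisect_right

COUNT_BITS = 56   # hypothetical split: 8 bits of partition prefix, 56 bits of counter


def make_vertex_id(partition, counter):
    """Put the partition in the high bits so ids of one partition sort together."""
    return (partition << COUNT_BITS) | counter


def row_key(vertex_id):
    """Big-endian bytes, so byte order matches numeric order."""
    return vertex_id.to_bytes(8, "big")


# Hypothetical 4-node ring: node i owns keys from range_starts[i] up to the next start.
range_starts = [row_key(make_vertex_id(p, 0)) for p in (0, 64, 128, 192)]


def owning_node(key):
    """Find the node whose range contains the key by binary search."""
    return bisect_right(range_starts, key) - 1


a = make_vertex_id(70, 1)
b = make_vertex_id(70, 999)
c = make_vertex_id(200, 5)
nodes = [owning_node(row_key(v)) for v in (a, b, c)]
```

Vertices `a` and `b` share a partition prefix and land on the same node, so a traversal between them would stay local, while `c` lands elsewhere. Choosing which vertices should share a prefix is the partitioning question the slide leaves as future work.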

Page 28: Titan - Graph Computing with Cassandra

Aurelius Graph Cluster

Stores a massive-scale property graph allowing real-time traversals and updates

Batch processing of large graphs with Hadoop

Runs global graph algorithms on large, compressed, in-memory graphs

Map/Reduce Load & Compress

Analysis results back into Titan

Bulk Load

TITAN FAUNUS FULGORA

Apache 2

[email protected]

titan.thinkaurelius.com faunus.thinkaurelius.com

Page 29: Titan - Graph Computing with Cassandra

Special Thanks

Steve Hill (@kindageeky) Director Architecture & Innovation

at Pearson Education

Page 30: Titan - Graph Computing with Cassandra

AURELIUS THINKAURELIUS.COM

We are Hiring