Titan - Graph Computing with Cassandra

Post on 15-Jan-2015

description

This presentation introduces Titan, Faunus, and scalable graph computing in general. We present a case study of how Pearson builds an education social network on top of Titan, Faunus, and Cassandra to support learning in the 21st century. Titan is an open source distributed graph database built on top of Cassandra that can power real-time applications with thousands of concurrent users over graphs with billions of edges. Faunus is an open source global graph processing engine built on top of Hadoop and compatible with Cassandra that can analyze graphs, compute graph statistics, and execute global traversals. Titan and Faunus are components of the Aurelius Graph Cluster, which enables scalable graph computation and powers applications in social networking, recommendation engines, advertisement optimization, knowledge representation, health care, education, and security.

Transcript of Titan - Graph Computing with Cassandra

AURELIUS THINKAURELIUS.COM

TITAN Graph Computing with Cassandra

Matthias Broecheler, CTO @mbroecheler June XI, MMXIII

#CASSANDRA13

Thank You

pull requests, feature suggestions

bug reports, community support

June 14th 2012

September 2012

December 2012

March 2013

May 2013

Alpha Release

Titan 0.1.0

Titan 0.2.0

Titan 0.3.0

Titan 0.3.1

Experimental release of a distributed, open-source graph database

First stable release

Rewrite of core, Indexing & ElasticSearch

Performance, Bugfixing

Faunus Release

Titan

Graph Database

distributed

real time

open source

name: Hercules type: demigod

name: Cerberus type: monster

battled

time:12

Vertex

Edge Label

Edge

Property
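
The four labels above (vertex, edge label, edge, property) make up the property graph model. As a minimal sketch of that model in plain Python, and not Titan's actual API, the Hercules and Cerberus example from the slide could be represented as follows. The class and method names (`PropertyGraph`, `add_vertex`, `add_edge`, `out`) are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Vertex:
    id: int
    properties: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Edge:
    label: str            # the edge label, e.g. "battled"
    out_vertex: Vertex    # vertex the edge leaves
    in_vertex: Vertex     # vertex the edge points to
    properties: Dict[str, Any] = field(default_factory=dict)


class PropertyGraph:
    """Toy in-memory property graph (hypothetical names, illustration only)."""

    def __init__(self) -> None:
        self.vertices: Dict[int, Vertex] = {}
        self.edges: List[Edge] = []

    def add_vertex(self, **properties: Any) -> Vertex:
        vertex = Vertex(id=len(self.vertices) + 1, properties=properties)
        self.vertices[vertex.id] = vertex
        return vertex

    def add_edge(self, out_vertex: Vertex, label: str, in_vertex: Vertex,
                 **properties: Any) -> Edge:
        edge = Edge(label, out_vertex, in_vertex, properties)
        self.edges.append(edge)
        return edge

    def out(self, vertex: Vertex, label: str) -> List[Vertex]:
        """Follow outgoing edges with the given label."""
        return [e.in_vertex for e in self.edges
                if e.out_vertex is vertex and e.label == label]


graph = PropertyGraph()
hercules = graph.add_vertex(name="Hercules", type="demigod")
cerberus = graph.add_vertex(name="Cerberus", type="monster")
battle = graph.add_edge(hercules, "battled", cerberus, time=12)

opponents = [v.properties["name"] for v in graph.out(hercules, "battled")]
```

Both vertices and edges carry key-value properties here, which is what distinguishes a property graph from a plain adjacency list.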

Value in Relationships low high

Key-Value

When should you use a Graph Database?

K V

BigTable K V V V V

Document

Relational

Graph

Educating the Planet

Person

Person Student Teacher

Course

Institution

Concept

Discussion

Comment

Share

enrolledIn

teaches

relatesTo

hasCourse

belongsTo

follows

author references

hasComment relatesTo

author

partOf

relatesTo

Titan

Integrative Data Model in a polyglot storage world

Student

Person

Teacher

Course

Institution

Concept

Discussion

Comment

Share

enrolledIn

teaches

relatesTo

hasCourse

belongsTo

follows

author references

hasComment relatesTo

author

partOf

DiscussionRank

relatesTo

Titan

Analyze Relationships in real time

Scaling Titan

number of transactions

size of the graph

121 Billion Edges 6.2 Billion Vertices

1 Million Universities

Placement Group

hi1.4xl

1.1 million edges / sec

using batch mode

Data Ingestion

80 m1.medium
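
As a back-of-the-envelope check on the ingestion numbers above, the following sketch estimates how long a full load would take. It assumes, which the slides do not state, that the 1.1 million edges per second batch rate is sustained for the entire 121 billion edge graph.

```python
# Figures taken from the slides.
TOTAL_EDGES = 121_000_000_000      # 121 billion edges
EDGES_PER_SECOND = 1_100_000       # 1.1 million edges / sec in batch mode

# Assumption: the rate stays constant for the whole load.
load_seconds = TOTAL_EDGES / EDGES_PER_SECOND
load_hours = load_seconds / 3600

print(f"estimated load time: {load_seconds:,.0f} s (about {load_hours:.1f} h)")
```

Under that assumption the load takes on the order of a day and a quarter rather than weeks, which is the practical point of the batch-mode figure.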

x = [] as Set; m = [:]
m = user.out('follows').aggregate(x)[0..(num*2)]
    .out('follows').except(x)[0..limit]
    .groupCount(m);

m.sort{-it.value}[0..num]._()
    .transform{ [userid: it.key.id, points: it.value] };

Follow Recommendation
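
To make the logic of the follow recommendation traversal easier to read, here is a rough restatement in plain Python over an in-memory dictionary. This is a sketch of the same idea (collect the people a user follows, look at whom they follow, drop anyone already followed, count and rank), not the Gremlin semantics in every detail. The function name `recommend_follows` and the `follows` dictionary are hypothetical.

```python
from collections import Counter
from itertools import islice
from typing import Dict, List


def recommend_follows(follows: Dict[str, List[str]], user: str,
                      num: int, limit: int) -> List[dict]:
    """Rank people followed by the user's followees (hypothetical helper)."""
    followed = follows.get(user, [])
    already_followed = set(followed)          # plays the role of x

    # Groovy ranges are inclusive, so [0..(num*2)] keeps num*2 + 1 items.
    sample = followed[: num * 2 + 1]

    # Second hop: whom do those people follow, minus anyone already followed.
    # As in the slide's traversal, the user is not explicitly excluded.
    candidates = (candidate
                  for followee in sample
                  for candidate in follows.get(followee, [])
                  if candidate not in already_followed)

    # Cap the number of paths considered, then count per candidate (m).
    counts = Counter(islice(candidates, limit + 1))

    # Highest count first; keep num + 1 entries to mirror [0..num].
    ranked = sorted(counts.items(), key=lambda item: -item[1])[: num + 1]
    return [{"userid": uid, "points": points} for uid, points in ranked]


follows = {
    "alice": ["bob", "carol"],
    "bob": ["carol", "dave", "erin"],
    "carol": ["dave", "frank"],
}
recommendations = recommend_follows(follows, "alice", num=1, limit=10)
```

The two caps (`num * 2` followees, `limit` second-hop paths) bound the work per request, which matters when a user follows many people.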

Generic Graph API

Dataflow Processing

Traversal Language

Object-Graph Mapper

Graph Algorithms

Graph Server

exciting work coming

REST & JSON Titan’s Ecosystem

query language

http://tinkerpop.com

10,200 transactions / sec

16 randomly chosen complex traversal templates

Throughput

Transaction Description                                                  Avg (ms)  Stdev (ms)
Student retrieves all content for a single course in their course list    279.32     81.83
Student follows another student                                            193.72     22.77
Student is recommended people to follow                                    241.33    256.48
Student reads their stream and shares an item with followers               284.07     68.20
Student retrieves their profile                                             53.740    22.61
Student reads the most recent comments for their courses                   211.07     45.56

Scaling Titan

technical perspective

Vertex Representation

time: 1

5

8

4

9

2

7

mother

battled

battled

battled

fought

time: 4

time: 7

induced order

name: Hercules type: demigod

5

Property

Property

Edge

Edge

Edge

Edge

Edge

row indices for fast vertex-centric queries
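
The idea behind the vertex representation, keeping a vertex's edges in an induced sort order so that vertex-centric queries become range lookups, can be sketched in a few lines of Python. This is an illustration of the principle only, assuming edges are ordered by label and then by a `time` sort key; it does not reproduce Titan's storage format. The class `VertexRow` and the neighbor ids used below are hypothetical.

```python
import bisect
from typing import List, Tuple

# One entry per edge: (label, time, neighbor vertex id).
EdgeEntry = Tuple[str, int, int]


class VertexRow:
    """All edges of one vertex, kept sorted by (label, time) (hypothetical)."""

    def __init__(self, vertex_id: int) -> None:
        self.vertex_id = vertex_id
        self._edges: List[EdgeEntry] = []

    def add_edge(self, label: str, time: int, neighbor_id: int) -> None:
        # insort keeps the list in sorted order on every insert.
        bisect.insort(self._edges, (label, time, neighbor_id))

    def edges(self, label: str, min_time: int, max_time: int) -> List[EdgeEntry]:
        """Edges with the given label and min_time <= time <= max_time."""
        # A 2-tuple sorts before any 3-tuple sharing the same prefix, so
        # these probes land exactly on the boundaries of the range.
        start = bisect.bisect_left(self._edges, (label, min_time))
        stop = bisect.bisect_left(self._edges, (label, max_time + 1))
        return self._edges[start:stop]


row = VertexRow(vertex_id=5)
row.add_edge("battled", 7, 2)
row.add_edge("mother", 0, 8)
row.add_edge("battled", 1, 4)
row.add_edge("battled", 4, 9)

early_battles = row.edges("battled", 1, 4)
```

Because the edges are sorted, the query touches only the matching slice instead of scanning every edge of the vertex, which is what keeps lookups fast on vertices with very many edges.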

label id + direction

primary key edge id Δ

vertex id signature

properties other

properties

Edge Representation

Column Value

compressed serialized objects

variable long encoding
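
A variable-length encoding stores small numbers in fewer bytes than a fixed 8-byte long. The slides do not give Titan's exact byte layout, so the following is a generic 7-bits-per-byte scheme for non-negative integers, shown only to illustrate the principle. The function names are hypothetical.

```python
from typing import Tuple


def encode_varlong(value: int) -> bytes:
    """Encode a non-negative integer using 7 payload bits per byte."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative values")
    out = bytearray()
    while True:
        low_bits = value & 0x7F
        value >>= 7
        if value:
            out.append(low_bits | 0x80)   # high bit set: more bytes follow
        else:
            out.append(low_bits)          # high bit clear: last byte
            return bytes(out)


def decode_varlong(data: bytes, offset: int = 0) -> Tuple[int, int]:
    """Decode one value; return it with the offset of the next byte."""
    result = 0
    shift = 0
    while True:
        byte = data[offset]
        offset += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, offset
        shift += 7


small = encode_varlong(12)
medium = encode_varlong(300)
large = encode_varlong(2**40)
```

With such a scheme, values below 128 fit in a single byte, so fields that are usually small (for example differences between ids) cost far less than 8 bytes each.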

Token Ring

Graph Partitioning

assigns ids to map vertices into “optimal” token range

Lots of interesting questions for future work

uses BOP
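
With an order-preserving partitioner such as BOP (Cassandra's ByteOrderedPartitioner), the row key itself decides which node stores a row, so the choice of vertex id controls placement. The sketch below illustrates that idea under loudly stated assumptions: the id layout (8 partition bits followed by a 56-bit counter), the simplified ring in which each node owns keys from its start key up to the next node's start key, and all names are hypothetical and not taken from Titan.

```python
import bisect
from typing import List, Tuple

COUNTER_BITS = 56   # assumed layout: 8 partition bits + 56 counter bits


def make_vertex_id(partition: int, counter: int) -> int:
    """Put the partition in the high bits so ids of one partition are adjacent."""
    return (partition << COUNTER_BITS) | counter


def row_key(vertex_id: int) -> bytes:
    # Big-endian bytes sort in the same order as the integers they encode.
    return vertex_id.to_bytes(8, "big")


class TokenRing:
    """Simplified ring: each node owns keys from its start key onward."""

    def __init__(self, ranges: List[Tuple[bytes, str]]) -> None:
        ordered = sorted(ranges)
        self._starts = [start for start, _ in ordered]
        self._nodes = [node for _, node in ordered]

    def node_for(self, key: bytes) -> str:
        index = bisect.bisect_right(self._starts, key) - 1
        return self._nodes[index]


ring = TokenRing([
    (row_key(make_vertex_id(0, 0)), "node-a"),
    (row_key(make_vertex_id(64, 0)), "node-b"),
    (row_key(make_vertex_id(128, 0)), "node-c"),
    (row_key(make_vertex_id(192, 0)), "node-d"),
])

# Two vertices given the same partition land on the same node.
student = row_key(make_vertex_id(70, 1))
course = row_key(make_vertex_id(70, 2))
other = row_key(make_vertex_id(200, 1))
placement = [ring.node_for(student), ring.node_for(course), ring.node_for(other)]
```

Assigning closely connected vertices the same partition keeps traversals between them on one machine, which is the goal the slide describes as mapping vertices into an "optimal" token range.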

Aurelius Graph Cluster

Stores a massive-scale property graph allowing real-time traversals and updates

Batch processing of large graphs with Hadoop

Runs global graph algorithms on large, compressed,

in-memory graphs

Map/Reduce Load & Compress

Analysis results back into Titan

Bulk Load

TITAN FAUNUS FULGORA

Apache 2

aureliusgraphs@googlegroups.com

titan.thinkaurelius.com faunus.thinkaurelius.com

Special Thanks

Steve Hill (@kindageeky) Director Architecture & Innovation

at Pearson Education

We are Hiring