Cloud east shutl_talk

Tuesday, 28 May 13

How Neo4j helps Shutl to deliver even faster...


Volker Pacher

senior developer @shutl

@vpacher

http://github.com/vpacher


• SaaS platform

• we provide an API for carriers and merchants

• shutl.it C2C platform

• customers can choose a delivery either within 90 minutes of purchase or in a 1-hour window of their choice (same day or any day)

• fastest delivery to date: 15:00 min

• SOA with services built using JRuby, Sinatra, MongoDB and Neo4j

Problems?

http://xkcd.com/287/

problems with our previous attempt (v1):

• exponential growth of joins in MySQL with added features

• code base too complex and unmaintainable

• API response times grew too large as more data was added

• our fastest delivery was quicker than our slowest query!

The case for graph databases:

• relationships are stored explicitly (relational databases have no first-class relationships and rebuild them with joins at query time)

• domain modelling is simplified because adding new ‘subgraphs’ doesn’t affect the existing structure and queries (additive model)

• whiteboard friendly

• schema-less

• db performance remains relatively constant because a query is localized to its portion of the graph: the same query stays roughly constant-time (O(1)) as the total graph grows

• traversals of relationships are easy and very fast

What is a graph anyway?

[diagram: four nodes, Node 1 to Node 4, connected by edges]

a collection of vertices (nodes) connected by edges (relationships)

a short history

Leonhard Euler

the seven bridges of Königsberg (1735)

directed graph

[diagram: the same four nodes, Node 1 to Node 4, connected by directed relationships]

each relationship has a direction, i.e. one start node and one end node

property graph

• nodes contain properties (key, value)

• relationships have a type and are always directed

• relationships can contain properties too

[diagram: example property graph with nodes name: Volker, name: Sam, name: Megan and name: Paul, connected by two :friends relationships, two :knows relationships (one with the property since: 2005) and one :works_for relationship]

a graph is its own index (constant query performance)
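To make the property-graph model and the "graph is its own index" idea concrete, here is a minimal plain-Ruby sketch. The Node and Relationship classes are illustrative assumptions, not the neo4j gem's API, and the wiring of Volker, Sam and Megan is hypothetical. Each node keeps the list of its own relationships, so following a relationship never consults a global index.

```ruby
# Minimal in-memory property graph (plain Ruby, NOT the neo4j gem's API).
class Node
  attr_reader :props, :outgoing, :incoming

  def initialize(props = {})
    @props = props   # key/value properties
    @outgoing = []   # relationships starting at this node
    @incoming = []   # relationships ending at this node
  end

  def [](key)
    @props[key]
  end

  # Follow outgoing relationships of one type. Only this node's own list is
  # read, so the cost depends on this node's relationships, not on the size
  # of the whole graph.
  def neighbours(type)
    @outgoing.select { |rel| rel.type == type }.map(&:end_node)
  end
end

# Relationships always have a type and exactly one start and one end node
# (they are directed), and may carry properties of their own.
class Relationship
  attr_reader :start_node, :type, :end_node, :props

  def initialize(start_node, type, end_node, props = {})
    @start_node = start_node
    @type = type
    @end_node = end_node
    @props = props
    start_node.outgoing << self
    end_node.incoming << self
  end
end

# Hypothetical wiring of the slide's names; the real diagram's edges may differ.
volker = Node.new(name: "Volker")
sam    = Node.new(name: "Sam")
megan  = Node.new(name: "Megan")

Relationship.new(volker, :friends, sam)
knows = Relationship.new(volker, :knows, megan, since: 2005)

p volker.neighbours(:friends).map { |n| n[:name] } # prints ["Sam"]
```

In Neo4j itself this bookkeeping lives in the storage layer; the sketch only shows why a traversal touches the neighbourhood it walks rather than the whole dataset.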

the case for Neo4j

• we can run it embedded in the same JVM

• we can use JRuby, as we already know Ruby very well

• lots of good Ruby libraries are available; we chose the neo4j gem by Andreas Ronge (https://github.com/andreasronge/neo4j)

• it speaks Cypher

• the guys from Neo Technology are awesome



some graph dbs available:

Neo4j (JVM)

FlockDB (JVM)

DEX (C++)

OrientDB (JVM)

Sones GraphDB (C#)

embedded vs. standalone

embedded

pros: better performance; transaction support; the neo4j gem is available; we can use both Cypher and traversals

cons: only the code running the db has access to the db

standalone

pros: access via the REST API and Cypher; language independent, so the code doesn’t need to run on the JVM

cons: not as performant; only works with Cypher; transactions are on a per-query basis; we need to write model wrappers ourselves
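For the standalone option, any language with an HTTP client can talk to the server. A sketch in Ruby, assuming a Neo4j 1.x server on localhost:7474 that exposes the legacy Cypher endpoint POST /db/data/cypher (host, port and query are assumptions). It only builds the request; the commented lines show how it might be sent.

```ruby
require "net/http"
require "json"
require "uri"

# Assumption: standalone Neo4j 1.x server with the legacy Cypher REST endpoint.
CYPHER_URI = URI("http://localhost:7474/db/data/cypher")

# Build (but do not send) a parameterised Cypher request.
def cypher_request(query, params = {})
  request = Net::HTTP::Post.new(CYPHER_URI)
  request["Content-Type"] = "application/json"
  request["Accept"] = "application/json"
  request.body = JSON.generate("query" => query, "params" => params)
  request
end

req = cypher_request("START me=node({id}) MATCH me-[:knows]->friend RETURN friend",
                     "id" => 1)

# Sending it against a live server would look roughly like this:
#   response = Net::HTTP.start(CYPHER_URI.host, CYPHER_URI.port) { |http| http.request(req) }
#   rows = JSON.parse(response.body)["data"]
```

Passing the node id as a parameter rather than interpolating it into the query string keeps the query text constant, which also lets the server reuse its parsed form.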

gotchas and other stuff to consider:

• testing proved to be difficult and we had to write our own tools

• migrations of schema-less dbs are harder to stay on top of, and graph dbs require special solutions

• seeding an embedded database is hard

• partitioning a graph db is almost impossible, and the whole graph needs to be in memory

• encoding dates and times so that they are stored in UTC and work across time zones is non-trivial

• nested data structures (hashes and arrays) can’t be stored and need to be converted to JSON
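One possible way to handle the last two points, sketched in plain Ruby under the assumption that node properties should only hold primitive values: a time is stored as UTC epoch seconds plus the UTC offset it was recorded in, and nested hashes/arrays become a JSON string. The helper names (encode_time, decode_time, encode_nested, decode_nested) are hypothetical.

```ruby
require "json"

# Sketch, assuming properties should hold only primitives (numbers,
# strings, booleans), so times and nested structures are encoded first.

# Store an instant as whole UTC epoch seconds plus its original UTC offset:
# comparisons and sorting happen in UTC, display can use local time.
def encode_time(time)
  { "at_utc" => time.to_i, "utc_offset" => time.utc_offset }
end

def decode_time(props)
  Time.at(props["at_utc"]).getlocal(props["utc_offset"])
end

# Nested hashes and arrays become a single JSON string property.
def encode_nested(value)
  JSON.generate(value)
end

def decode_nested(json)
  JSON.parse(json)
end

pickup = Time.new(2013, 5, 28, 14, 30, 0, "+01:00") # 13:30 UTC
props = encode_time(pickup)
basket = encode_nested({ "items" => [1, 2], "note" => "fragile" })
```

Keeping the offset separate leaves UTC as the single value used for comparisons while still allowing the time to be shown in the customer's zone; this sketch drops sub-second precision.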

Querying the graph: Cypher

• declarative query language specific to Neo4j

• easy to learn and intuitive

• enables the user to specify patterns to query for (something that looks like ‘this’)

• inspired partly by SQL (WHERE and ORDER BY) and SPARQL (pattern matching)

• focuses on what to query for, not how to query for it

• switching from a MySQL world is made easier by using Cypher instead of having to learn a traversal framework straight away

cypher clauses

• START: starting points in the graph, obtained via index lookups or by element IDs

• MATCH: the graph pattern to match, bound to the starting points in START

• WHERE: filtering criteria

• RETURN: what to return

• CREATE: creates nodes and relationships

• DELETE: removes nodes, relationships and properties

• SET: sets values on properties

• FOREACH: performs updating actions once per element in a list

• WITH: divides a query into multiple, distinct parts

an example graph

[diagram: Node 1 (me), Node 2 (Steve), Node 3 (Sam), Node 4 (David) and Node 5 (Megan), connected by :knows relationships]

me -[:knows]-> Steve -[:knows]-> David

me -[:knows]-> Sam -[:knows]-> Megan

Megan -[:knows]-> David

the query

START me=node(1)
MATCH me-[:knows]->()-[:knows]->fof
RETURN fof

START me=node(1)
MATCH me-[:knows*2..]->fof
WHERE fof.name =~ 'Da.*'
RETURN fof
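To see what the two queries return on the example graph, here is a rough plain-Ruby emulation over an adjacency list, a hypothetical stand-in for the database. It assumes the graph is acyclic and deduplicates results, whereas Cypher returns one row per matching path: in the second query David is reached by two paths (via Steve, and via Sam and Megan).

```ruby
# The example graph as an adjacency list: name => names it :knows.
KNOWS = {
  "me"    => ["Steve", "Sam"],
  "Steve" => ["David"],
  "Sam"   => ["Megan"],
  "Megan" => ["David"],
  "David" => []
}.freeze

# Rough equivalent of MATCH me-[:knows]->()-[:knows]->fof (exactly two hops).
def friends_of_friends(start)
  KNOWS.fetch(start).flat_map { |friend| KNOWS.fetch(friend) }
end

# Rough equivalent of MATCH me-[:knows*2..]->fof WHERE fof.name =~ pattern:
# every node reachable in two or more hops, deduplicated, then filtered.
# Assumes an acyclic graph; a cycle would make this loop forever.
def reachable_in_two_or_more(start, pattern)
  found = []
  frontier = KNOWS.fetch(start) # depth 1
  until frontier.empty?
    frontier = frontier.flat_map { |name| KNOWS.fetch(name) } # depth 2, 3, ...
    found.concat(frontier)
  end
  found.uniq.select { |name| name.match?(pattern) }
end

p friends_of_friends("me")                           # prints ["David", "Megan"]
p reachable_in_two_or_more("me", /\ADa.*\z/)          # prints ["David"]
```

The regex is anchored at both ends because Cypher's =~ must match the whole string.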

a good place to try it out:

http://console.neo4j.org/



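representing dates/times

[diagram: a date tree. The root (0) node links to Year nodes 2013 and 2014 through relationships typed by the year (2013, 2014); Year 2013 links to Month nodes 01 and 05, Year 2014 to Month node 06; Month 05 links to Day nodes 24, 25 and 26; Event 1, Event 2 and Event 3 are attached to day nodes via :happens relationships]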

find all events on a specific day

START root=node(0)
MATCH root-[:`2013`]-()-[:`05`]-()-[:`24`]-()-[:happens]-event
RETURN event

[diagram: the same date tree, with :next relationships linking consecutive day nodes 24 → 25 → 26]

find all events for a given range

START root=node(0)
MATCH root-[:`2013`]-()-[:`05`]-()-[:`24`]-start,
      root-[:`2013`]-()-[:`05`]-()-[:`26`]-end,
      start-[:next*0..]-middle-[:next*0..]-end,
      middle-[:happens]-event
RETURN event

[diagram: the same date tree with :next relationships between days; Event 1 is node 20]

does an event happen on a certain date?

START event=node(20)
MATCH event-[:happens]-()-[:`24`]-()-[:`05`]-()-[:`2013`]-()
RETURN event
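The three date queries can be mimicked in plain Ruby to show the idea behind the tree: a day is found by walking year, month and day; a range by following :next from the first to the last day; membership by checking a day's events. DateTree, DayNode and the assignment of events to days are hypothetical; in Neo4j these would be nodes and typed relationships.

```ruby
# Plain-Ruby sketch of the date tree (hypothetical structure, not Neo4j):
# year -> month -> day, a :next link between consecutive days, and the
# events that "happen" on each day.
DayNode = Struct.new(:date, :events, :next_day)

class DateTree
  def initialize
    @years = {} # year => { month => { day => DayNode } }
  end

  # Find or create a day node (like walking root-[:`2013`]-()-[:`05`]-...).
  def day(year, month, d)
    days = (@years[year] ||= {})[month] ||= {}
    days[d] ||= DayNode.new([year, month, d], [], nil)
  end

  def add_event(event, year, month, d)
    day(year, month, d).events << event
  end

  # Chain day nodes with :next; the caller passes them in date order.
  def link(*days)
    days.each_cons(2) { |earlier, later| earlier.next_day = later }
  end

  # Events on one day: walk year -> month -> day, then follow :happens.
  def events_on(year, month, d)
    node = @years.dig(year, month, d)
    node ? node.events : []
  end

  # Events in a range: follow :next from the first day until the last one
  # (or until the chain ends, if `last` is not reachable).
  def events_between(first, last)
    events = []
    node = first
    while node
      events.concat(node.events)
      break if node.equal?(last)

      node = node.next_day
    end
    events
  end

  # Does an event happen on a given date?
  def happens_on?(event, year, month, d)
    events_on(year, month, d).include?(event)
  end
end

# Hypothetical data: which event sits on which day is an assumption.
tree = DateTree.new
d24 = tree.day(2013, 5, 24)
d25 = tree.day(2013, 5, 25)
d26 = tree.day(2013, 5, 26)
tree.link(d24, d25, d26)
tree.add_event("Event 1", 2013, 5, 24)
tree.add_event("Event 2", 2013, 5, 25)
tree.add_event("Event 3", 2013, 5, 26)
```

As in the graph version, a range query only visits the days between its two endpoints, however many other days the tree holds.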

testing and importing:

• we use rspec for all tests on the API and practice TDD/BDD

• setting up ‘scenarios’ for an integration test was difficult and slow with existing tools

• we decided to build our own DSL, based on the geoff notation developed by Nigel Small, to allow for setting up scenarios and importing data from MySQL

geoff:

• developed by Nigel Small (@technige, http://geoff.nigelsmall.net/)

• allows modelling of graphs in a human-readable form:

(A) {"name": "Alice"}
(B) {"name": "Bob"}
(A)-[:KNOWS]->(B)

• provides a Java interface to insert them into an existing graph
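A tiny, hypothetical helper (not part of geoff or the geoff-importer gem) that emits the notation above from Ruby data, assuming node property maps are written as JSON as on the slide:

```ruby
require "json"

# Hypothetical helper: nodes is { name => properties },
# relationships is [[from_name, TYPE, to_name], ...].
def to_geoff(nodes, relationships)
  node_lines = nodes.map do |name, props|
    "(#{name}) #{JSON.generate(props, space: ' ')}" # space after ':' as on the slide
  end
  rel_lines = relationships.map { |from, type, to| "(#{from})-[:#{type}]->(#{to})" }
  (node_lines + rel_lines).join("\n")
end

geoff = to_geoff({ "A" => { "name" => "Alice" }, "B" => { "name" => "Bob" } },
                 [["A", "KNOWS", "B"]])
puts geoff
```

Generating the text from data like this is one way fixtures could be produced for an importer; the escaping of names that contain parentheses or brackets is not handled here.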

geoff-importer gem (https://github.com/shutl/geoff-importer)

• imports any geoff file into a Neo4j db

• it is open source

geoff gem (https://github.com/shutl/geoff)

• provides a DSL for creating a graph and inserting it into the db

• it is open source

• it works together with FactoryGirl (https://github.com/thoughtbot/factory_girl)

• it currently supports only the graph structure of the neo4j gem

• we haven’t solved all the issues with event listeners yet

geoff gem (https://github.com/shutl/geoff)

Geoff(Company, Person) do
  company 'Acme' do
    address "13 Something Road"
    outgoing :employees do
      person 'Geoff'
      person 'Nigel' do
        name 'Nigel Small'
      end
    end
  end

  company 'Github' do
    outgoing :customers do
      person 'Tom'
      person 'Dick'
      person 'Harry'
    end
  end

  person 'Harry' do
    incoming :customers do
      company 'NeoTech'
    end
  end
end
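One way a nested DSL like this can be built in Ruby is to evaluate each block with instance_eval against a builder that keeps a stack of the current node and relationship. The GraphBuilder below is a hypothetical sketch of that pattern, not the geoff gem's implementation, and GraphBuilder.build stands in for Geoff(Company, Person); it records the same nodes and relationships as the example.

```ruby
# Hypothetical builder showing the block + instance_eval pattern such a DSL
# can use. It is NOT the geoff gem's implementation.
class GraphBuilder
  attr_reader :nodes, :relationships

  def self.build(&block)
    builder = new
    builder.instance_eval(&block)
    builder
  end

  def initialize
    @nodes = {}         # [label, key] => properties
    @relationships = [] # [from, type, to], ends given as [label, key]
    @frames = []        # stack of { node: [label, key], rel: [type, dir] or nil }
  end

  # One method per node label, mirroring `company 'Acme' do ... end`.
  %i[company person].each do |label|
    define_method(label) do |key, &block|
      node = [label, key]
      @nodes[node] ||= {}
      connect(node)
      if block
        @frames.push({ node: node, rel: nil })
        instance_eval(&block)
        @frames.pop
      end
      node
    end
  end

  def outgoing(type, &block)
    within_relationship(type, :out, &block)
  end

  def incoming(type, &block)
    within_relationship(type, :in, &block)
  end

  # Other one-argument calls inside a node block set a property,
  # e.g. `address "13 Something Road"` or `name 'Nigel Small'`.
  def method_missing(property, *args, &block)
    frame = @frames.last
    return super unless frame && frame[:rel].nil? && args.size == 1 && block.nil?

    @nodes[frame[:node]][property] = args.first
  end

  def respond_to_missing?(name, include_private = false)
    (!@frames.empty? && @frames.last[:rel].nil?) || super
  end

  private

  def within_relationship(type, direction, &block)
    frame = @frames.last
    frame[:rel] = [type, direction]
    instance_eval(&block)
    frame[:rel] = nil
  end

  # Link a new node to the enclosing node, respecting outgoing/incoming.
  def connect(node)
    frame = @frames.last
    return unless frame && frame[:rel]

    type, direction = frame[:rel]
    @relationships << (direction == :out ? [frame[:node], type, node] : [node, type, frame[:node]])
  end
end

graph = GraphBuilder.build do
  company 'Acme' do
    address "13 Something Road"
    outgoing :employees do
      person 'Geoff'
      person 'Nigel' do
        name 'Nigel Small'
      end
    end
  end

  company 'Github' do
    outgoing :customers do
      person 'Tom'
      person 'Dick'
      person 'Harry'
    end
  end

  person 'Harry' do
    incoming :customers do
      company 'NeoTech'
    end
  end
end
```

Harry is created once and reused when the last block adds the incoming :customers relationship from NeoTech, which is why the builder keys nodes by label and name.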

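[diagram: the resulting graph. The root node links to a :company and a :person node, which link via :all relationships to each company (Acme, 13 Something Road; NeoTech; GitHub) and each person (Geoff, Nigel Small, Tom, Dick, Harry). Acme has :employees relationships to Geoff and Nigel Small, GitHub has :customers relationships to Tom, Dick and Harry, and NeoTech has a :customers relationship to Harry]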

QUESTIONS?

Volker Pacher

[email protected]
