Introduction to Gremlin

Introduction to Gremlin Chicago Graph Database Meet-Up Max De Marzi

Transcript of Introduction to Gremlin

Page 1: Introduction to Gremlin

Introduction to Gremlin

Chicago Graph Database Meet-Up
Max De Marzi

Page 2: Introduction to Gremlin

About Me

• My Blog: http://maxdemarzi.com
• Find me on Twitter: @maxdemarzi
• Email me: [email protected]
• GitHub: http://github.com/maxdemarzi

Built the Neography Gem (Ruby Wrapper to the Neo4j REST API)
Playing with Neo4j since 10/2009

Page 3: Introduction to Gremlin

Agenda

• What is Gremlin?
• Gremlin in Neo4j
• Gremlin Steps
• Gremlin Recommends

Page 4: Introduction to Gremlin

What is Gremlin?

Page 5: Introduction to Gremlin

Not a Car

Page 6: Introduction to Gremlin

Not a little Monster

Page 7: Introduction to Gremlin

Gremlin is

• A Graph Traversal Language
• A domain specific language for traversing property graphs
• Implemented by most Graph Database Vendors
• Primarily seen with the Groovy Language
• With JVM connectivity in Java, Scala, and other languages

Page 8: Introduction to Gremlin

Created by:

Marko Rodriguez
http://markorodriguez.com

Page 9: Introduction to Gremlin

A Graph DSL

A Dynamic Language for the JVM

A Data Flow Framework

“JDBC” for Graph DBs

Page 10: Introduction to Gremlin

Gremlin in Neo4j

Page 11: Introduction to Gremlin

Gremlin in Neo4j

g = (neo4jgraph[EmbeddedGraphDatabase [/neo4j/data/graph.db]]

Page 12: Introduction to Gremlin

g.v(1)

1

Page 13: Introduction to Gremlin

g.v(1).first_name

1

first_name=Max

Page 14: Introduction to Gremlin

g.v(1).last_name

1

last_name=De Marzi

Page 15: Introduction to Gremlin

g.v(1).map()

1

first_name=Max
last_name=De Marzi

Page 16: Introduction to Gremlin

g.v(1).outE

Diagram: vertex 1 with outgoing edges created, knows, knows.

Page 17: Introduction to Gremlin

g.v(1).outE.since

Diagram: vertex 1 with outgoing edges created (null), knows (since=2009), knows (since=2010).

Page 18: Introduction to Gremlin

g.v(1).outE.inV

Diagram: vertex 1 with outgoing edges created, knows, knows, leading to vertices 2, 3, 4.

Page 19: Introduction to Gremlin

g.v(1).outE.inV.name

Diagram: vertex 1 with outgoing edges created, knows, knows, leading to vertices 2 (name=neography), 3 (name=Neo4j), 4 (name=Gremlin).

Page 20: Introduction to Gremlin

g.v(1).outE.filter{it.label=='knows'}

Diagram: vertex 1 with outgoing edges created, knows, knows.

Page 21: Introduction to Gremlin

g.v(1).outE.filter{it.label=='knows'}.count()

2

Page 22: Introduction to Gremlin

g.v(1).outE.filter{it.label=='knows'}.inV.name

Diagram: vertex 1 with outgoing edges created, knows, knows; the knows edges lead to vertices 3 (name=Neo4j) and 4 (name=Gremlin).

Page 23: Introduction to Gremlin

g.v(1).out('knows').name

Diagram: vertex 1 with outgoing edges created, knows, knows; the knows edges lead to vertices 3 (name=Neo4j) and 4 (name=Gremlin).
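The last two slides reach the same names two ways: the long form (outE, filter on the label, inV) and the out('knows') shorthand. A minimal sketch of that equivalence follows. It is plain Python, not the Gremlin API: the edge list mirrors the diagram on the slides, and the helpers out_e, in_v, and out are hypothetical names chosen for illustration.

```python
# Toy model of the slide's graph. Plain Python, not the Gremlin API.
# Vertex ids, names, and edge labels are taken from the slides.
names = {2: "neography", 3: "Neo4j", 4: "Gremlin"}

# Each edge is (out_vertex, label, in_vertex).
edges = [
    (1, "created", 2),
    (1, "knows", 3),
    (1, "knows", 4),
]

def out_e(vertex):
    """Outgoing edges of a vertex, like the outE step."""
    return [e for e in edges if e[0] == vertex]

def in_v(edge):
    """Head (target) vertex of an edge, like the inV step."""
    return edge[2]

def out(vertex, label):
    """Adjacent vertices over edges with this label, like out('label')."""
    return [in_v(e) for e in out_e(vertex) if e[1] == label]

# Long form: outE.filter{it.label=='knows'}.inV.name
long_form = [names[in_v(e)] for e in out_e(1) if e[1] == "knows"]
# Short form: out('knows').name
short_form = [names[v] for v in out(1, "knows")]

print(long_form)
print(short_form)
```

Both lists come out the same, which is why the shorthand is usually preferred when the edge itself is not needed.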

Page 24: Introduction to Gremlin

g.v(1).out('created')

Diagram: vertex 1 with a created edge to vertex 2.

Page 25: Introduction to Gremlin

g.v(1).out('created').in('contributed')

Diagram: vertex 1 with a created edge to vertex 2; vertex 5 with a contributed edge to vertex 2.

Page 26: Introduction to Gremlin

g.v(1).out('created').in('contributed').name

Diagram: vertex 1 with a created edge to vertex 2; vertex 5 (name=Peter) with a contributed edge to vertex 2.

Page 27: Introduction to Gremlin

g.v(1).out('created').in('contributed').name.back(1)

Diagram: vertex 1 with a created edge to vertex 2; vertex 5 (name=Peter) with a contributed edge to vertex 2.

Page 28: Introduction to Gremlin

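g.v(1).out('created').in('contributed').name.back(1).sideEffect{g.addEdge(g.v(1), it, 'collaborator')}

Diagram: vertex 1 with a created edge to vertex 2; vertex 5 (name=Peter) with a contributed edge to vertex 2; a new collaborator edge from vertex 1 to vertex 5.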
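The traversal above finds everyone who contributed to something vertex 1 created, then adds a collaborator edge to each of them as a side effect. A rough sketch of that flow in plain Python (not the Gremlin API) is shown here; the edge tuples mirror the slide's diagram, and the helper names out and in_ are hypothetical.

```python
# Plain Python model of the slide's traversal, not the Gremlin API.
# Each edge is (out_vertex, label, in_vertex), as drawn on the slide.
names = {5: "Peter"}
edges = [
    (1, "created", 2),
    (5, "contributed", 2),
]

def out(vertex, label):
    """Vertices reached by following outgoing edges with this label."""
    return [head for tail, lbl, head in edges if tail == vertex and lbl == label]

def in_(vertex, label):
    """Vertices reached by following incoming edges with this label."""
    return [tail for tail, lbl, head in edges if head == vertex and lbl == label]

# out('created').in('contributed'): who contributed to what vertex 1 created.
contributors = [
    person
    for project in out(1, "created")
    for person in in_(project, "contributed")
]

# sideEffect{...}: do something per item (add an edge) without changing the item.
for person in contributors:
    edges.append((1, "collaborator", person))

print([names[p] for p in contributors])
print(edges[-1])
```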

Page 29: Introduction to Gremlin

Gremlin Steps

Page 30: Introduction to Gremlin

Gremlin Transform Steps

_, V, E, id, label, out, outE, outV

in, inE, inV, both, bothE, bothV, key, map

memoize, gather, scatter, path, cap, select, transform

Page 31: Introduction to Gremlin

Gremlin Filter Steps

[i], [i..j], has, hasNot, back, and, or, random

dedup, simplePath, except, retain, filter

Page 32: Introduction to Gremlin

Gremlin Side-Effect Steps

groupBy, groupCount, aggregate, table, tree, as, optional, store

sideEffect

Page 33: Introduction to Gremlin

Gremlin Branch Steps

loop, ifThenElse, copySplit, fairMerge, exhaustMerge

Page 34: Introduction to Gremlin

Gremlin Recommends

Page 35: Introduction to Gremlin

Our Graph (from MovieLens)

Page 36: Introduction to Gremlin

Recommendation Algorithm

m = [:]; x = [] as Set; v = g.v(node_id);

v.out('hasGenre').aggregate(x).back(2).
  inE('rated').filter{it.stars > 3}.outV.
  outE('rated').filter{it.stars > 3}.inV.
  filter{it != v}.
  filter{it.out('hasGenre').toSet().equals(x)}.
  groupCount(m){"${it.id}:${it.title}"}.iterate();

m.sort{a,b -> b.value <=> a.value}[0..24]

Page 37: Introduction to Gremlin

Explanation

m = [:];x = [] as Set; v = g.v(node_id);

In Groovy, [:] is a map; this is what we will return.
The set “x” will hold the collection of genres we want our recommended movies to have.

v is our starting point.

Page 38: Introduction to Gremlin

Explanation

v.out('hasGenre'). (we are now at a genre node)
aggregate(x).

We fill the empty set “x” with the genres of our movie. These are the properties we want to make sure our recommendations have.

Page 39: Introduction to Gremlin

Explanation

back(2). (we are back to our starting point)
inE('rated').filter{it.stars > 3}. (we are now at the link between our movie and users)

We go back two steps to our starting movie, go to the relationship ‘rated’ and filter it so we only keep those with more than 3 stars.

Page 40: Introduction to Gremlin

Explanation

outV. (we are now at a user node)
outE('rated').filter{it.stars > 3}. (we are now at the link between user and movie)

We follow our relationships to the users who made them, and then go to the “rated” relationships of movies which also received more than 3 stars.

Page 41: Introduction to Gremlin

Explanation

inV. (we are now at a movie node)
filter{it != v}.

We follow these relationships to the movies that received them, but filter out “v”, which is our starting movie. We do not want the system to recommend the same movie we just watched.

Page 42: Introduction to Gremlin

Explanation

filter{it.out('hasGenre').toSet().equals(x)}.

We also want to keep only the movies that have the same genres as our starting movie. People gave Toy Story and Terminator both 4 stars, but you wouldn’t want to recommend one to fans of the other.
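The genre filter is strict: a candidate passes only if its set of genres equals the starting movie's set exactly. The sketch below contrasts that with a looser "any shared genre" rule. It is plain Python rather than Gremlin, and the titles and genre sets are hypothetical placeholders, not MovieLens data.

```python
# Hypothetical genre sets, for illustration only (not taken from MovieLens).
genres = {
    "starting movie": {"Animation", "Comedy"},
    "same genres": {"Comedy", "Animation"},
    "one genre shared": {"Comedy", "Romance"},
    "no genre shared": {"Action", "Sci-Fi"},
}

# x plays the role of the set filled by aggregate(x).
x = genres["starting movie"]
candidates = {t: g for t, g in genres.items() if t != "starting movie"}

# Mirrors filter{it.out('hasGenre').toSet().equals(x)}: exact set equality.
exact = [title for title, g in candidates.items() if g == x]

# A looser alternative (not what the slides use): at least one shared genre.
overlap = [title for title, g in candidates.items() if g & x]

print(exact)
print(overlap)
```

Exact equality gives fewer but more closely matched recommendations; the overlap rule is shown only to make the trade-off visible.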

Page 43: Introduction to Gremlin

Explanation

groupCount(m){"${it.id}:${it.title}"}.iterate();

groupCount does what it sounds like and stores the counts in the map “m” we created earlier, but we want to retain the id and title of the movies, so we use them as the key.

iterate() is needed when calling from the Neo4j REST API; the Gremlin shell does it automatically for you. You will forget this one day and kill 30 minutes of your life trying to figure out why you get nothing.
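Why forgetting iterate() yields nothing: the pipeline is lazy, so the counting side effect only happens when something pulls items through it. The following is an analogy in plain Python, not Gremlin itself. A generator stands in for the pipe, and the function name group_count is a hypothetical stand-in for the groupCount step.

```python
# Plain Python analogy, not Gremlin: a generator is lazy in the same way a pipe is.
m = {}

def group_count(items, counts):
    """Count each item as it flows past, then pass it along unchanged."""
    for item in items:
        counts[item] = counts.get(item, 0) + 1
        yield item

pipe = group_count(["a", "b", "a"], m)
before = dict(m)   # still empty: nothing has been pulled through the pipe yet

for _ in pipe:     # draining the pipe plays the role of iterate()
    pass
after = dict(m)

print(before)
print(after)
```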

Page 44: Introduction to Gremlin

Explanation

m.sort{a,b -> b.value <=> a.value}[0..24]

Finally, we sort our map by value in descending order and grab the top 25 items… and you’re done.
See http://maxdemarzi.com/2012/01/16/neo4j-on-heroku-part-two/ for the full walk-through including data loading.
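To see the whole algorithm end to end without a graph database, here is a small model in plain Python. It is a sketch under stated assumptions, not the author's implementation: the movie ids, genres, and ratings are made up for illustration, and recommend is a hypothetical function name. The comments point to the Gremlin step each part mirrors.

```python
from collections import Counter

# Made-up toy data for illustration; not the MovieLens data set.
genres = {
    "A": {"Comedy"},
    "B": {"Comedy"},
    "C": {"Comedy"},
    "D": {"Action"},
}
# Each rating is (user, movie, stars).
ratings = [
    ("u1", "A", 5), ("u1", "B", 4), ("u1", "D", 5),
    ("u2", "A", 4), ("u2", "B", 5), ("u2", "C", 4),
    ("u3", "A", 2), ("u3", "C", 5),
]

def recommend(v, limit=25):
    # out('hasGenre').aggregate(x): genres of the starting movie.
    x = genres[v]
    # inE('rated').filter{it.stars > 3}.outV: users who rated v above 3 stars.
    fans = [user for user, movie, stars in ratings if movie == v and stars > 3]
    counts = Counter()
    for fan in fans:
        # outE('rated').filter{it.stars > 3}.inV: other movies that fan rated highly.
        for user, movie, stars in ratings:
            if user != fan or stars <= 3:
                continue
            # filter{it != v} and the genre-set equality filter.
            if movie != v and genres[movie] == x:
                counts[movie] += 1  # groupCount(m)
    # m.sort{a,b -> b.value <=> a.value}[0..24]: descending by count, top 25.
    return sorted(counts.items(), key=lambda kv: kv[1], reverse=True)[:limit]

print(recommend("A"))
```

With this toy data, user u3 is ignored because the rating for A was only 2 stars, and movie D is dropped because its genres differ from A's.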

Page 45: Introduction to Gremlin

How to treat Gremlin in Neo4j

As the equivalent of Stored Procedures in SQL.

Accept only parameters from end users; do not generate Gremlin dynamically, or you’ll have the mother of all SQL injection vulnerabilities…
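The difference between a generated script and a parameterized one can be shown without talking to a server. In this minimal sketch, the request body with "script" and "params" keys is an assumption about how the script is submitted, the variable names are hypothetical, and nothing is actually sent anywhere.

```python
import json

# Hostile input, for illustration only.
user_input = "1); System.exit(0); g.v(1"

# Unsafe: the input becomes part of the script text itself,
# so the attacker's code would run with the full power of the JVM.
unsafe_script = "g.v(" + user_input + ").out('knows').name"

# Safer: the script text is fixed, like a stored procedure;
# the input travels separately as a named parameter.
SCRIPT = "g.v(node_id).out('knows').name"
payload = json.dumps({"script": SCRIPT, "params": {"node_id": user_input}})

print(unsafe_script)
print(payload)
```

In the parameterized form the script never changes, whatever the user types; a bad value can at worst fail to match a vertex.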

Gremlin => Groovy => JVM => Full Power

Page 46: Introduction to Gremlin

Questions?


Page 47: Introduction to Gremlin

Thank you!
http://maxdemarzi.com