The Football Graph - Neo4j and the Premier League
Date posted: 14-Sep-2014
In a League of their Own: Neo4j and Premiership Football
Mark Needham @markhneedham
Outline
• Intro to graphs
• When do we need a graph?
• Property graph model
• Neo4j’s query language
• The football graph
• Using Neo4j from .NET
Let’s talk graphs
You mean these?
Eating Brains
Dancing With Michael Jackson
Nope!
These are Charts! NOT Graphs!
Ok so what’s a graph then?
Node
Relationship
The tube
The social network (graph)
Complexity
What are graphs good for?
complexity = f(size, semi-structure, connectedness)
Data Complexity
Size
complexity = f(size, semi-structure, connectedness)
The Real Complexity
Semi-Structure
Email: [email protected] Email: [email protected] Twitter: @markhneedham Skype: mk_jnr1984
USER
CONTACT
CONTACT_TYPE
FIRST_NAME  LAST_NAME  USER_ID  EMAIL_1            EMAIL_2            TWITTER        FACEBOOK  SKYPE
Mark        Needham    315      [email protected]  [email protected]  @markhneedham  NULL      mk_jnr1984
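The sparse USER row above can be contrasted with a node that stores only the properties it actually has. This is a minimal Python sketch, assuming plain dictionaries stand in for a table row and a node. The email values are placeholders, since the real addresses are redacted in the slide.

```python
# Column list of the USER table shown on the slide.
COLUMNS = ["FIRST_NAME", "LAST_NAME", "USER_ID", "EMAIL_1",
           "EMAIL_2", "TWITTER", "FACEBOOK", "SKYPE"]


def to_row(properties):
    """Relational style: every record carries every column (None plays NULL)."""
    return {column: properties.get(column) for column in COLUMNS}


def to_node(properties):
    """Graph style: a node simply omits the properties it does not have."""
    return {key: value for key, value in properties.items() if value is not None}


# Placeholder email values (assumption): the originals are redacted in the source.
mark = {
    "FIRST_NAME": "Mark", "LAST_NAME": "Needham", "USER_ID": 315,
    "EMAIL_1": "email-1-placeholder", "EMAIL_2": "email-2-placeholder",
    "TWITTER": "@markhneedham", "SKYPE": "mk_jnr1984",
}

row = to_row(mark)    # has a FACEBOOK key holding None
node = to_node(mark)  # has no FACEBOOK key at all
```

The row always has eight entries however few are filled in. The node grows only when a property is actually set.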
Semi-Structure
complexity = f(size, semi-structure, connectedness)
The Real Complexity
Connectedness
When do we need a graph?
Densely Connected
Semi Structured
Densely connected?
Lots of join tables
Semi-Structured?
Lots of sparse tables
Properties of graph databases
• Millions of ‘joins’ per second
• Consistent query times as dataset grows
• Join Complexity and Performance
• Easy to evolve data model
• Easy to ‘layer’ different types of data together
Property Graph Data Model
Nodes
Nodes can have properties
• Used to represent entity attributes and/or metadata (e.g. timestamps, version)
• Key-value pairs
• Java primitives
• Arrays
• null is not a valid value
• Every node can have different properties
What’s a node?
Relationships
Relationships
• Relationships are first class citizens
• Every relationship has a name and a direction
  – Add structure to the graph
  – Provide semantic context for nodes
• Properties used to represent quality or weight of relationship, or metadata
• Every relationship must have a start node and end node
Relationships
Nodes can have more than one relationship
Self relationships are allowed
Nodes can be connected by more than one relationship
Labels
Think Gmail labels
• Nodes – Entities
• Relationships – Connect entities and structure domain
• Properties – Entity attributes, relationship qualities, and metadata
• Labels – Group nodes by role
Four Building Blocks
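The four building blocks can be sketched in a few lines of Python. This is only an illustration of the model as the slides describe it, not Neo4j's storage format or API. The `Node` and `Relationship` classes and the relationship types prefixed `hypothetical_` are assumptions made for the example. `Team`, `Game` and `away_team` come from the football graph queries later in the deck.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    labels: set = field(default_factory=set)        # labels group nodes by role
    properties: dict = field(default_factory=dict)  # key-value pairs

    def set_property(self, key, value):
        # Mirrors the slide: null is not a valid property value.
        if value is None:
            raise ValueError("null is not a valid property value")
        self.properties[key] = value


@dataclass
class Relationship:
    start: Node      # every relationship must have a start node...
    rel_type: str    # ...a name...
    end: Node        # ...and an end node; start and end give it a direction
    properties: dict = field(default_factory=dict)


arsenal = Node(labels={"Team"})
arsenal.set_property("name", "Arsenal")
game = Node(labels={"Game"})  # no properties: every node can have different ones

relationships = [
    Relationship(game, "away_team", arsenal),
    # Two nodes can be connected by more than one relationship.
    Relationship(game, "hypothetical_second_link", arsenal, {"weight": 1}),
    # Self relationships are allowed.
    Relationship(arsenal, "hypothetical_self_link", arsenal),
]
```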
Purposeful abstraction of a domain designed to satisfy particular application/end-user goals
Models
Model Query
Design for Queryability
Model Model
Design for Queryability
Model Query
Design for Queryability
Introducing Cypher
• Declarative Pattern-Matching language
• SQL-like syntax
• Designed for graphs
Patterns, patterns, everywhere
A
B C
(a) --> (b)
a b
It’s all about the ASCII art!
a b
The most basic query
MATCH (a)-->(b) RETURN a, b
(a)-[:ACTED_IN]->(m)
a m
Adding in a relationship type
ACTED IN
a m
Adding in a relationship type
MATCH (a)-[:ACTED_IN]->(m) RETURN a.name, m.name
ACTED IN
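To make the two patterns concrete, here is a rough Python sketch of what `(a)-->(b)` and `(a)-[:ACTED_IN]->(m)` describe. It shows the meaning of the pattern only, not how Neo4j evaluates Cypher. The `match` helper, the node names and the `HYPOTHETICAL_DIRECTED` type are invented for the example.

```python
def match(relationships, rel_type=None):
    """Yield (start, end) node pairs for the pattern (a)-[:rel_type]->(b).

    With rel_type=None this corresponds to the untyped pattern (a)-->(b).
    """
    for start, kind, end in relationships:
        if rel_type is None or kind == rel_type:
            yield start, end


# Hypothetical data: nodes are dicts of properties, relationships are
# (start, type, end) triples, so each one has a direction.
actor = {"name": "Hypothetical Actor"}
movie = {"name": "Hypothetical Movie"}
director = {"name": "Hypothetical Director"}

relationships = [
    (actor, "ACTED_IN", movie),
    (director, "HYPOTHETICAL_DIRECTED", movie),
]

# MATCH (a)-->(b) RETURN a, b  (projected to names here for readability)
all_pairs = [(a["name"], b["name"]) for a, b in match(relationships)]

# MATCH (a)-[:ACTED_IN]->(m) RETURN a.name, m.name
acted_in = [(a["name"], m["name"]) for a, m in match(relationships, "ACTED_IN")]
```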
The football graph
The football graph
Find Arsenal’s away matches
Find Arsenal’s away matches
Find Arsenal’s away matches
MATCH (team:Team)<-[:away_team]-(game)
WHERE team.name = "Arsenal"
RETURN game
Graph Pattern
MATCH (team:Team)<-[:away_team]-(game)
WHERE team.name = "Arsenal"
RETURN game.name
Anchor pattern in graph
MATCH (team:Team)<-[:away_team]-(game)
WHERE team.name = "Arsenal"
RETURN game.name
Create projection of results
MATCH (team:Team)<-[:away_team]-(game)
WHERE team.name = "Arsenal"
RETURN game.name
Find Arsenal’s away matches
Evolving the football graph
Find the top away goal scorers
Find the top away goal scorers
MATCH (team)<-[:away_team]-(game:Game),
(game)<-[:contains_match]-(season:Season),
(team)<-[:for]-(stats)<-[:played]-(player),
(stats)-[:in]->(game)
WHERE season.name = "2012-2013"
RETURN player.name,
COLLECT(DISTINCT team.name),
SUM(stats.goals) as goals
ORDER BY goals DESC
LIMIT 10
Multiple graph patterns
MATCH (team)<-[:away_team]-(game:Game),
(game)<-[:contains_match]-(season:Season),
(team)<-[:for]-(stats)<-[:played]-(player),
(stats)-[:in]->(game)
WHERE season.name = "2012-2013"
RETURN player.name,
COLLECT(DISTINCT team.name),
SUM(stats.goals) as goals
ORDER BY goals DESC
LIMIT 10
Anchor pattern in the graph
MATCH (team)<-[:away_team]-(game:Game),
(game)<-[:contains_match]-(season:Season),
(team)<-[:for]-(stats)<-[:played]-(player),
(stats)-[:in]->(game)
WHERE season.name = "2012-2013"
RETURN player.name,
COLLECT(DISTINCT team.name),
SUM(stats.goals) as goals
ORDER BY goals DESC
LIMIT 10
Group by player
MATCH (team)<-[:away_team]-(game:Game),
(game)<-[:contains_match]-(season:Season),
(team)<-[:for]-(stats)<-[:played]-(player),
(stats)-[:in]->(game)
WHERE season.name = "2012-2013"
RETURN player.name,
COLLECT(DISTINCT team.name),
SUM(stats.goals) as goals
ORDER BY goals DESC
LIMIT 10
Find the top away goal scorers
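The RETURN clause above groups implicitly: `player.name` is the only non-aggregated column, so `COLLECT` and `SUM` are computed per player. A small Python sketch of that grouping, ordering and limiting follows. It assumes the MATCH and WHERE steps have already produced `(player, team, goals)` rows. The `top_scorers` function and the sample rows are made up for illustration and are not real match data.

```python
from collections import defaultdict


def top_scorers(rows, limit=10):
    """Group rows by player, sum goals, collect distinct teams, rank, limit."""
    goals = defaultdict(int)
    teams = defaultdict(set)
    for player, team, scored in rows:
        goals[player] += scored   # SUM(stats.goals)
        teams[player].add(team)   # COLLECT(DISTINCT team.name)
    # ORDER BY goals DESC, then LIMIT
    ranked = sorted(goals, key=lambda player: goals[player], reverse=True)[:limit]
    # Teams are sorted only to make this sketch's output deterministic.
    return [(player, sorted(teams[player]), goals[player]) for player in ranked]


# Hypothetical rows, one per (player, away game) appearance.
rows = [
    ("Player A", "Team X", 2),
    ("Player B", "Team Y", 1),
    ("Player A", "Team X", 1),
    ("Player B", "Team Z", 3),
]

result = top_scorers(rows, limit=1)
# result == [("Player B", ["Team Y", "Team Z"], 4)]
```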
Other football queries
• Goals scored in each month by Michu
• Tottenham results when Gareth Bale scores
• What did Wayne Rooney do in April?
• Which players only score when a game is televised?
Graph Query Design
The relational version
Graph vs Relational
Relational:
• Tables - assume records all have the same structure
• Foreign keys between tables - joins calculated at run time - the more tables you join to a query the slower the query gets
Graphs:
• Nodes - no need to set a property if it doesn’t exist
• Relationships - stored as a ‘pre-computed index’ at write time - very easy to do lots of ‘hops’ between relationships
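The difference between joins calculated at run time and relationships stored at write time can be sketched roughly in Python. This is a deliberately simplified picture and assumes an unindexed foreign-key table. Real relational engines use indexes, and this is not how Neo4j is implemented internally. All function names and the tiny edge list are invented for the example.

```python
from collections import defaultdict


def hops_via_join(edge_table, start, hops):
    """Join style: every hop re-scans the whole (source, target) table."""
    frontier = {start}
    for _ in range(hops):
        frontier = {dst for src, dst in edge_table if src in frontier}
    return frontier


def build_adjacency(edge_table):
    """Write-time work: record each node's outgoing relationships once."""
    adjacency = defaultdict(list)
    for src, dst in edge_table:
        adjacency[src].append(dst)
    return adjacency


def hops_via_adjacency(adjacency, start, hops):
    """Graph style: every hop only touches the current nodes' own relationships."""
    frontier = {start}
    for _ in range(hops):
        frontier = {dst for node in frontier for dst in adjacency.get(node, ())}
    return frontier


# Hypothetical edges: a -> b -> c -> d, plus a shortcut a -> c.
edge_table = [("a", "b"), ("b", "c"), ("c", "d"), ("a", "c")]
adjacency = build_adjacency(edge_table)

two_hops_join = hops_via_join(edge_table, "a", 2)
two_hops_graph = hops_via_adjacency(adjacency, "a", 2)
```

Both functions return the same nodes. The cost of the first grows with the size of the whole table per hop, while the second depends only on how many relationships the visited nodes have.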
.NET and Neo4j
REST Client
ApplicaJon
HTTP
Neo4j Server
Neo4jClient
.NET and Neo4j
ApplicaJon
HTTP
Neo4j Server
REST Client
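Whatever the client language, the application ends up sending Cypher to the server over HTTP. As a language-neutral illustration (shown in Python rather than .NET), here is a sketch that only builds such a request and does not send it. It assumes the transactional HTTP endpoint of Neo4j 2.x (`/db/data/transaction/commit`), the default port 7474, and the 2.x-era `{name}` parameter syntax, which later versions write as `$name`. The `build_cypher_request` helper is hypothetical and is not part of Neo4jClient or any other library.

```python
import json


def build_cypher_request(statement, parameters=None,
                         base_url="http://localhost:7474"):
    """Return (url, headers, body) for one parameterised Cypher statement."""
    url = base_url + "/db/data/transaction/commit"  # assumed Neo4j 2.x endpoint
    body = {
        "statements": [
            {"statement": statement, "parameters": parameters or {}}
        ]
    }
    headers = {"Content-Type": "application/json", "Accept": "application/json"}
    return url, headers, json.dumps(body)


# The away-matches query from earlier, with the team name as a parameter.
query = (
    "MATCH (team:Team)<-[:away_team]-(game) "
    "WHERE team.name = {name} "
    "RETURN game.name"
)

url, headers, payload = build_cypher_request(query, {"name": "Arsenal"})
```

Passing the team name as a parameter instead of concatenating it into the query string keeps the statement text constant across calls.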
.NET and Neo4j
Thinking in graphs
Graphs should be fun!
Ask for help if you get stuck
Last Wednesday of the month
Come take a copy, it’s free!
Ian Robinson, Jim Webber & Emil Eifrem
Graph Databases
Compliments of Neo Technology
www.graphdatabases.com
Questions?
Mark Needham @markhneedham [email protected]