Graph Databases with Neo4j – Emil Eifrem
Neo4j is Teh Awesome
Graph Database 101
@emileifrem #neo4j
“So what’s a graph database?”
“A traditional relational database may tell you the
average age of everyone in this room...”
“... but a graph database will tell you who is most
likely to buy you a beer!”
No. Srsly.
๏ Nodes
๏ Relationships
๏ Properties
How fast is it?
๏ a sample social graph
•with ~1,000 persons
๏ average 50 friends per person
๏ pathExists(a,b) limited to depth 4
๏ caches warmed up to eliminate disk I/O

Database               # persons    query time
Relational database    1,000        2000ms
Neo4j                  1,000        2ms
Neo4j                  1,000,000    2ms
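A minimal sketch of what a pathExists(a,b) check limited to depth 4 could look like, written in Python over an in-memory adjacency dict. The `friends` mapping, the person names and the `path_exists` helper are illustrative assumptions; this is not Neo4j's API or its implementation, and it says nothing about the timings quoted above.

```python
from collections import deque


def path_exists(friends, a, b, max_depth=4):
    """Breadth-first search: is b reachable from a within max_depth hops?"""
    if a == b:
        return True
    visited = {a}
    frontier = deque([(a, 0)])
    while frontier:
        person, depth = frontier.popleft()
        if depth == max_depth:
            continue  # do not expand nodes that already sit at the depth limit
        for friend in friends.get(person, ()):
            if friend == b:
                return True
            if friend not in visited:
                visited.add(friend)
                frontier.append((friend, depth + 1))
    return False


# Hypothetical friendship chain: ann - bob - cat - dan - eve - fay
friends = {
    "ann": ["bob"],
    "bob": ["ann", "cat"],
    "cat": ["bob", "dan"],
    "dan": ["cat", "eve"],
    "eve": ["dan", "fay"],
    "fay": ["eve"],
}

print(path_exists(friends, "ann", "eve"))  # eve is 4 hops away
print(path_exists(friends, "ann", "fay"))  # fay is 5 hops away
```

The search only ever touches the neighbourhood of the start node, which is the intuition behind the claim that query time depends on the part of the graph traversed rather than on the total number of persons.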
So how do you query it?
Cypher
Graph Patterns
ASCII art
(A) -[:LOVES]-> (B)
START A=node:person(name="A")
MATCH (A) -[:LOVES]-> (B)
RETURN B as lover
Example: Finding Friends of Friends
// step 1: find starting point
START andreas=node:persons(name = 'Andreas')
// step 2: describe pattern and results
START andreas=node:persons(name = 'Andreas')
MATCH (andreas)-->()-->(foaf)
RETURN foaf
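To make the two-hop pattern (andreas)-->()-->(foaf) concrete, here is a rough Python equivalent over a plain dict of outgoing relationships. The `knows` data, the names other than Andreas, and the `friends_of_friends` helper are made up for illustration; the sketch returns one entry per matching two-hop path and is not how Cypher executes the query.

```python
def friends_of_friends(knows, start):
    """Follow exactly two outgoing hops from start: (start)-->()-->(foaf)."""
    result = []
    for friend in knows.get(start, []):      # first hop
        for foaf in knows.get(friend, []):   # second hop
            result.append(foaf)              # one entry per matching path
    return result


# Hypothetical outgoing relationships
knows = {
    "Andreas": ["Peter", "Emil"],
    "Peter": ["Michael", "Andreas"],
    "Emil": ["Michael", "Johan"],
}

print(friends_of_friends(knows, "Andreas"))
```

A caller who wants each person only once can wrap the result in `set(...)`.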
Neo4j is a Graph Database
๏ A Graph Database:
•a Property Graph with Nodes, Relationships
•and Properties on both
•perfect for complex, highly connected data
๏ A Graph Database:
•reliable with real ACID Transactions
•scalable: high availability clustering in Neo4j Enterprise
•server with HTTP API, or embeddable on the JVM
•high-performance with High-Availability (read scaling)
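The property graph model named above (nodes, relationships, and properties on both) can be sketched as a small in-memory data structure. Everything here (`Node`, `Relationship`, `PropertyGraph`, the `since` property) is a hypothetical illustration in Python, not Neo4j's storage format or its Java API.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    id: int
    properties: dict = field(default_factory=dict)


@dataclass
class Relationship:
    start: int          # id of the start node
    rel_type: str       # e.g. "LOVES"
    end: int            # id of the end node
    properties: dict = field(default_factory=dict)


class PropertyGraph:
    def __init__(self):
        self.nodes = {}
        self.relationships = []

    def add_node(self, **properties):
        node = Node(id=len(self.nodes), properties=properties)
        self.nodes[node.id] = node
        return node

    def relate(self, start, rel_type, end, **properties):
        rel = Relationship(start.id, rel_type, end.id, properties)
        self.relationships.append(rel)
        return rel

    def outgoing(self, node, rel_type):
        """Nodes reached from `node` over relationships of the given type."""
        return [self.nodes[r.end] for r in self.relationships
                if r.start == node.id and r.rel_type == rel_type]


graph = PropertyGraph()
a = graph.add_node(name="A")
b = graph.add_node(name="B")
graph.relate(a, "LOVES", b, since=2012)   # properties live on relationships too

print([n.properties["name"] for n in graph.outgoing(a, "LOVES")])
```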
Who’s using graphs today?
Accenture
So how do you get your hands on Neo4j?
๏ Option A: Download and install locally...
•go to http://neo4j.org
•click the shiny “Download Neo4j Now” button
•expand the archive, read the readmes
๏ Or B...
Graphs In The Cloud (BETA)
// new to heroku? get help
$ heroku help
// create a new application
$ heroku create intro-to-neo4j
// add Neo4j
$ heroku addons:add neo4j --app intro-to-neo4j
// find out about the application
$ heroku info --app intro-to-neo4j
// find the Neo4j Webadmin
$ heroku config --app intro-to-neo4j
// done trying it out? remove the application
$ heroku destroy --app intro-to-neo4j
I needs thy help
๏ Cloud Devops Engineer (San Mateo, London or Malmö, SE)
•Come help us build a world class graph database cloud platform!
๏ Director of Community North America (San Mateo)
•Head up our developer outreach and evangelism in NA
๏ Developer Evangelist (San Mateo)
•Preach graphs to Silicon Valley and the world
thank you!
stay connected
Early Adopter Graph Database Segments
๏ Core Industries: Web / ISV, Finance & Insurance, Datacom / Telecom, Logistics, Life Sciences, Media & Publishing, Education, Not-for-Profit, Government, Aerospace, Gaming, ...
๏ Use Cases: Network / Cloud Mgmt, MDM, Social, Geo, Resource Auth & Access Control, Content Management, Recommendations, Data Center Management, Fraud Detection, ...
Early Adopters Going Mainstream
Telenor
Resource authorization & Access Control in Telecommunications
Background
•10th largest Telco provider in the world, leading in the Nordics
•Online self-serve system where large business customers manage employee subscriptions and plans
•24/7 availability critical to customer satisfaction
Business problem
•Resource authorization and access control across millions of plans, customers, administrators and groups, all interconnected, becomes a challenge
•Used Sybase RDBMS for pre-computing access rights daily
•Pre-computation time projected to reach 9 hours in 2014
•Users cannot log in until their rights are computed
Solution & Benefits
•Resource graph easily modeled and queried in Neo4j
•1500 lines of stored procedures => 10s of lines of Neo4j code
•All requests computed in real time in milliseconds
•Changes to customer resources reflected immediately
•Customer retention risks mitigated
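As a rough illustration of computing access rights at request time instead of pre-computing them daily, the check can be framed as a reachability question on a resource graph. The sketch below is Python over a hand-made edge list; the node names, the relationship types (`MEMBER_OF`, `MANAGES`, `HAS_PLAN`) and the `can_access` helper are assumptions for illustration and are not taken from Telenor's actual model.

```python
def can_access(edges, admin, resource):
    """Return True if resource is reachable from admin along directed edges."""
    stack, seen = [admin], {admin}
    while stack:
        current = stack.pop()
        if current == resource:
            return True
        for _rel_type, target in edges.get(current, []):
            if target not in seen:
                seen.add(target)
                stack.append(target)
    return False


# Hypothetical resource graph: administrator -> group -> customer -> plan
edges = {
    "admin:kari": [("MEMBER_OF", "group:acme-admins")],
    "group:acme-admins": [("MANAGES", "customer:acme")],
    "customer:acme": [("HAS_PLAN", "plan:mobile-100")],
    "customer:globex": [("HAS_PLAN", "plan:mobile-200")],
}

print(can_access(edges, "admin:kari", "plan:mobile-100"))
print(can_access(edges, "admin:kari", "plan:mobile-200"))
```

Because the answer is derived from the current edges on every call, a change to the graph is reflected in the very next check, with no batch step in between.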
SFR
Network Management in Telecommunications
Background
•Second largest Telco in France
•Part of Vivendi Group, partnering with Vodafone
Business problem
•Need for flexible network inventory management, aggregation, and troubleshooting
•Impact analysis of planned and unplanned network outages, so that affected services can be notified or receive increased redundancy
•Highly volatile network structure changing daily, with business requirements changing as well
Solution & Benefits
•Neo4j Enterprise with a highly available cluster
•Dynamic system allowing for new applications to tie into network structure data
•Near 1:1 mapping of real world to graph, greatly reducing modeling work
•High adaptability to changing business requirements
[Diagram: Service, Router, Switch, Fiber Link and Oceanfloor Cable nodes connected by DEPENDS_ON and LINKED relationships]
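The impact analysis mentioned for SFR can be pictured as walking DEPENDS_ON relationships backwards from a failed component. This is a small Python sketch under assumed data: the component names and the `affected_by` helper are hypothetical, and the real system runs such queries inside Neo4j rather than over a dict.

```python
from collections import defaultdict


def affected_by(depends_on, failed):
    """Return every component that directly or transitively depends on `failed`."""
    # Invert the DEPENDS_ON edges so we can walk from a component to its dependents.
    dependents = defaultdict(set)
    for component, requirements in depends_on.items():
        for requirement in requirements:
            dependents[requirement].add(component)

    affected, stack = set(), [failed]
    while stack:
        current = stack.pop()
        for dependent in dependents[current]:
            if dependent not in affected:
                affected.add(dependent)
                stack.append(dependent)
    return affected


# Hypothetical network inventory: component -> components it DEPENDS_ON
depends_on = {
    "service-1": ["router-1"],
    "service-2": ["switch-2"],
    "router-1": ["fiber-link-1"],
    "switch-2": ["fiber-link-2"],
    "fiber-link-1": ["oceanfloor-cable-1"],
    "fiber-link-2": ["oceanfloor-cable-2"],
}

print(sorted(affected_by(depends_on, "oceanfloor-cable-1")))
```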
Adobe
Content Management, Access Control & Collaboration
Background
•Creative Cloud, announced 2011, is a cloud-based offering for professional users of Adobe’s creative suite
•Collaborative Cloud is the social element of the Creative Cloud, connecting professional users around the world
Business problem
Solution & Benefits
•Identifies which collections a user has access to
•Finds third-party assets that are like a user’s assets
•Infers professional relations based on user-generated content
•Fit: Graph model is a natural fit for social network
•Collaborative user experience adds competitive advantage to Adobe offering
•Flexibility: Data model can be easily evolved to support permissions and more sophisticated recommendation strategies
•Performance: Sub-second results for large, densely-connected datasets
[Diagrams: Domain Graph, Deployment Architecture]
Viadeo
Recommendations in Social
๏ Customer: a professional social network
• 35 million users, adding 30,000+ each day
๏ Goal: up-to-date recommendations
• Scalable solution with real-time end-user experience
• Low maintenance and reliable architecture
• 8-week implementation
๏ Problem:
• Real-time recommendation imperative to attract new users and maintain positive user retention
• Clustered MySQL solution not scalable or fast enough to support real-time requirements
• Upgrade from running a batch job
๏ initial hour-long batch job
• but then success happened, and it became a day
• then two days
• With Neo4j, real-time recommendations
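One common way to phrase such a recommendation as a graph query is "people two hops away, ranked by how many contacts we share". The Python sketch below illustrates that idea only; the `contacts` data and the `recommend` helper are assumptions, and nothing here is taken from Viadeo's actual algorithm.

```python
from collections import Counter


def recommend(contacts, member, limit=3):
    """Rank non-contacts of `member` by the number of shared contacts."""
    direct = contacts.get(member, set())
    scores = Counter()
    for contact in direct:
        for candidate in contacts.get(contact, set()):
            if candidate != member and candidate not in direct:
                scores[candidate] += 1  # one more shared contact
    # Highest score first; ties broken alphabetically for a stable order.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [name for name, _count in ranked[:limit]]


# Hypothetical contact graph (undirected, stored in both directions)
contacts = {
    "ana": {"ben", "cy"},
    "ben": {"ana", "dee", "eli"},
    "cy": {"ana", "dee"},
    "dee": {"ben", "cy"},
    "eli": {"ben"},
}

print(recommend(contacts, "ana"))
```

The work done per request is bounded by the member's two-hop neighbourhood, which is why this style of query can run on demand rather than as a batch job over the whole user base.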
First off: the name
๏ WE ALL HATES IT, M’KAY?
NOSQL is NOT...
๏ NO to SQL
๏ NEVER SQL
NOSQL is simply
Not Only SQL
But why now?
Trends in BigData & NOSQL
๏ 1. increasing data size (big data)
•“Every 2 days we create as much information as we did up to 2003” - Eric Schmidt
๏ 2. increasingly connected data (graph data)
•for example, text documents to html
๏ 3. semi-structured data
•individualization of data, with common sub-set
๏ 4. architecture - a facade over multiple services
•from monolithic to modular, distributed applications
4 Categories of NOSQL
Key-Value Category
๏ “Dynamo: Amazon’s Highly Available Key-Value Store” (2007)
๏ Data model:
•Global key-value mapping
•Big scalable HashMap
•Highly fault tolerant (typically)
๏ Examples:
•Riak, Redis, Voldemort
Key-Value: Pros & Cons
๏ Strengths
•Simple data model
•Great at scaling out horizontally
•Scalable
•Available
๏ Weaknesses:
•Simplistic data model
•Poor for complex data
Column-Family Category
๏ Google’s “Bigtable: A Distributed Storage System for Structured Data” (2006)
•Column-Family stores are essentially Bigtable clones
๏ Data model:
•A big table, with column families
•Map-reduce for querying/processing
๏ Examples:
•HBase, HyperTable, Cassandra
Column-Family: Pros & Cons
๏ Strengths
•Data model supports semi-structured data
•Naturally indexed (columns)
•Good at scaling out horizontally
๏ Weaknesses:
•Unsuited for connected data
Document Database Category
๏ Data model
•Collections of documents
•A document is a key-value collection
•Index-centric, lots of map-reduce
๏ Examples
•CouchDB, MongoDB
Document Database: Pros & Cons
๏ Strengths
•Simple, powerful data model
•Good scaling (especially if sharding supported)
๏ Weaknesses:
•Unsuited for connected data
•Query model limited to keys (and indexes)
Graph Database Category
๏ Data model:
•Nodes & Relationships
•Hypergraph, sometimes (edges with multiple endpoints)
๏ Examples:
•Neo4j (of course), OrientDB, InfiniteGraph, AllegroGraph
Graph Database: Pros & Cons
๏ Strengths
•Powerful data model, as general as RDBMS
•Fast, for connected data
•Easy to query
๏ Weaknesses:
•Requires conceptual shift
‣though graph-like thinking becomes addictive
The NOSQL Space
[Chart: Key/Value stores, ColumnFamily stores, Document databases and Graph databases plotted on axes “Scaling to Size” and “Scaling to Complexity”]
My subjective view: > 90% of use cases
100+ billion nodes and relationships