Post on 18-Mar-2018
GraphDB Whirlwind Tour
Michael Hunger
Code Days - OOP
(Michael Hunger)-[:WORKS_FOR]->(Neo4j)
michael@neo4j.com | @mesirii | github.com/jexp | jexp.de/blog
Michael Hunger - Head of Developer Relations @Neo4j
Why Graphs?
Use Cases
Data Model
Query-ing
Neo4j
Why Graphs?
Because the World is a Graph!
Everything and Everyone is Connected
• people, places, events
• companies, markets
• countries, history, politics
• sciences, art, teaching
• technology, networks, machines, applications, users
• software, code, dependencies, architecture, deployments
• criminals, fraudsters and their behavior
Value from Relationships
Value from Data Relationships
Common Use Cases
Internal Applications
Master Data Management
Network and IT Operations
Fraud Detection
Customer-Facing Applications
Real-Time Recommendations
Graph-Based Search
Identity and Access Management
The Rise of Connections in Data
Networks of People: e.g., Employees, Customers, Suppliers, Partners, Influencers
Business Processes: e.g., Risk management, Supply chain, Payments
Knowledge Networks: e.g., Enterprise content, Domain specific content, eCommerce content
Data connections are increasing as rapidly as data volumes
Harnessing Connections Drives Business Value
Enhanced Decision Making
Hyper Personalization
Massive Data Integration
Data Driven Discovery & Innovation
Product Recommendations
Personalized Health Care
Media and Advertising
Fraud Prevention
Network Analysis
Law Enforcement
Drug Discovery
Intelligence and Crime Detection
Product & Process Innovation
360 view of customer
Compliance
Optimize Operations
Connected Data at the Center
AI & Machine Learning
Price optimization
Product Recommendations
Resource allocation
Digital Transformation Megatrends
Graph Databases Are Hot
Lots of Choice
Newcomers in the last 3 years
• DSE Graph
• Agens Graph
• IBM Graph
• JanusGraph
• Tibco GraphDB
• Microsoft CosmosDB
• TigerGraph
• MemGraph
• AWS Neptune
• SAP HANA Graph
Database Technology Architectures
Graph DB
Connected Data
Discrete Data
Relational DBMS
Other NoSQL
Right Tool for the Job
The impact of Graphs
How Graphs are changing the World
GRAPHS FOR GOOD
Neo4j ICIJ Distribution
Better Health with Graphs
Cancer Research - Candiolo Cancer Institute
“Our application relies on complex hierarchical data, which required a more flexible model than the one provided by the traditional relational database model,” said Andrea Bertotti, MD
neo4j.com/case-studies/candiolo-cancer-institute-ircc/
Graph Databases in Healthcare and Life Sciences
14 Presenters from all around Europe on:
• Genome
• Proteome
• Human Pathway
• Reactome
• SNP
• Drug Discovery
• Metabolic Symbols
• ...
neo4j.com/blog/neo4j-life-sciences-healthcare-workshop-berlin/
DISRUPTION WITH GRAPHS
BETTER BUSINESS WITH GRAPHS
Real-Time Recommendations
Fraud Detection
Network & IT Operations
Master Data Management
Knowledge Graph
Identity & Access Management
Common Graph Technology Use Cases
AirBnb
Handling Large Graph Workloads for Enterprises
Real-time promotion recommendations
• Record “Cyber Monday” sales
• About 35M daily transactions
• Each transaction is 3-22 hops
• Queries executed in 4ms or less
• Replaced IBM Websphere commerce
Marriott’s Real-time Pricing Engine
• 300M pricing operations per day
• 10x transaction throughput on half the hardware compared to Oracle
• Replaced Oracle database
Handling Package Routing in Real-Time
• Large postal service with over 500k employees
• Neo4j routes 7M+ packages daily at peak, with peaks of 5,000+ routing operations per second.
Software
Financial Services Telecom
Retail & Consumer Goods
Media & Entertainment Other Industries
Airbus
NEW INSIGHTS WITH GRAPHS
Machine Learning is Based on Graphs
The Property Graph
Model, Import, Query
The Whiteboard Model Is the Physical Model
Eliminates Graph-to-Relational Mapping
In your data model: Bridge the gap between business and IT models
In your application: Greatly reduce need for application code
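[Figure: example property graph. Person node (name: "Dan", born: May 29, 1970, twitter: "@dan"); Person node (name: "Ann", born: Dec 5, 1975); CAR node (brand: "Volvo", model: "V70"); one relationship carries the property since: Jan 10, 2011.]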
Property Graph Model Components
Nodes
• The objects in the graph
• Can have name-value properties
• Can be labeled
Relationships
• Relate nodes by type and direction
• Can have name-value properties
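The components listed above can be sketched in a few lines of plain Python. This is only an illustrative in-memory model, not how Neo4j stores data; the class names, the `match` helper and the `LIVES_WITH` property value are assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    labels: set                                       # e.g. {"Person"}
    properties: dict = field(default_factory=dict)    # name-value pairs


@dataclass
class Relationship:
    start: Node                                       # direction: start -> end
    type: str                                         # e.g. "LOVES"
    end: Node
    properties: dict = field(default_factory=dict)    # name-value pairs


dan = Node({"Person"}, {"name": "Dan", "twitter": "@dan"})
ann = Node({"Person"}, {"name": "Ann"})

relationships = [
    Relationship(dan, "LOVES", ann),
    Relationship(ann, "LOVES", dan),
    # The "since" value on this relationship is an assumption for illustration.
    Relationship(dan, "LIVES_WITH", ann, {"since": "2011-01-10"}),
]


def match(rels, start_name, rel_type):
    """Return end nodes reached from the named start node via rel_type."""
    return [r.end for r in rels
            if r.type == rel_type and r.start.properties.get("name") == start_name]


loved_by_dan = [n.properties["name"] for n in match(relationships, "Dan", "LOVES")]
print(loved_by_dan)
```

Direction matters here: swapping `start` and `end` in a `LOVES` relationship changes who loves whom, which is why both directions are stored as separate relationships.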
[Figure: two PERSON nodes connected by LOVES, LOVES and LIVES WITH relationships]
Cypher: Powerful and Expressive Query Language
MATCH (:Person { name:"Dan"} ) -[:LOVES]-> (:Person { name:"Ann"} )
[Figure annotations: each parenthesized NODE carries a LABEL (Person) and a PROPERTY (name); LOVES is the relationship from Dan to Ann.]
Relational Versus Graph Models
[Figure: Relational Model with Person, Person-Friend and Friend tables, next to the Graph Model in which ANDREAS, TOBIAS, MICA and DELIA are nodes connected by KNOWS relationships]
Retail ...
Recommendations
Our starting point – Northwind ER
Building Relationships in Graphs
(Customer)-[:ORDERED]->(Order)
Locate Foreign Keys
(FKs)-[:BECOME]->(Relationships) & Correct Directions
Drop Foreign Keys
Find the Join Tables
Simple Join Tables Become Relationships
Attributed Join Tables Become Relationships with Properties
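A minimal sketch of the conversion steps above, in plain Python. The rows are hypothetical sample data shaped like the Northwind tables, and the `nodes`/`rels` structures and the `who_bought` helper are assumptions for illustration, not an import tool.

```python
# Hypothetical sample rows shaped like the Northwind tables.
customers = [{"CustomerID": "ALFKI", "CompanyName": "Alfreds Futterkiste"},
             {"CustomerID": "BONAP", "CompanyName": "Bon app"}]
orders = [{"OrderID": 1, "CustomerID": "ALFKI"},
          {"OrderID": 2, "CustomerID": "BONAP"}]
products = [{"ProductID": 48, "ProductName": "Chocolat"},
            {"ProductID": 1, "ProductName": "Chai"}]
order_details = [{"OrderID": 1, "ProductID": 48, "Quantity": 3},
                 {"OrderID": 2, "ProductID": 1, "Quantity": 5}]

nodes = {}   # (label, key) -> properties
rels = []    # (start, type, end, properties)

for c in customers:
    nodes[("Customer", c["CustomerID"])] = {"companyName": c["CompanyName"]}
for p in products:
    nodes[("Product", p["ProductID"])] = {"productName": p["ProductName"]}
for o in orders:
    nodes[("Order", o["OrderID"])] = {}
    # The foreign key orders.CustomerID becomes an ORDERED relationship
    # and is not kept as a property on the Order node.
    rels.append((("Customer", o["CustomerID"]), "ORDERED",
                 ("Order", o["OrderID"]), {}))
for od in order_details:
    # Each row of the attributed join table becomes an INCLUDES
    # relationship that carries the row's extra columns as properties.
    rels.append((("Order", od["OrderID"]), "INCLUDES",
                 ("Product", od["ProductID"]), {"quantity": od["Quantity"]}))


def who_bought(product_name):
    """Walk Product <-INCLUDES- Order <-ORDERED- Customer."""
    product_keys = {k for k, props in nodes.items()
                    if k[0] == "Product" and props.get("productName") == product_name}
    order_keys = {s for s, t, e, _ in rels if t == "INCLUDES" and e in product_keys}
    customer_keys = {s for s, t, e, _ in rels if t == "ORDERED" and e in order_keys}
    return sorted(nodes[k]["companyName"] for k in customer_keys)


print(who_bought("Chocolat"))
```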
(One) Northwind Graph Model
(:You)-[:QUERY]->(:Data)
in a graph
Who bought Chocolat?
You all know SQL
SELECT DISTINCT c.CompanyName
FROM customers AS c
JOIN orders AS o ON (c.CustomerID = o.CustomerID)
JOIN order_details AS od ON (o.OrderID = od.OrderID)
JOIN products AS p ON (od.ProductID = p.ProductID)
WHERE p.ProductName = 'Chocolat'
Apache Tinkerpop 3.3.x - Gremlin
g = graph.traversal();
g.V().hasLabel('Product')
  .has('productName','Chocolat')
  .in('INCLUDES')
  .in('ORDERED')
  .values('companyName').dedup();
W3C Sparql
PREFIX sales_db: <http://sales.northwind.com/>
SELECT DISTINCT ?company_name WHERE {
  ?c sales_db:CompanyName ?company_name .
  ?c sales_db:ORDERED ?o .
  ?o sales_db:ITEMS ?od .
  ?od sales_db:INCLUDES ?p .
  ?p sales_db:ProductName "Chocolat" .
}
openCypher
MATCH (c:Customer)-[:ORDERED]->(o)-[:INCLUDES]->(p:Product)
WHERE p.productName = 'Chocolat'
RETURN distinct c.companyName
Basic Pattern: Customer's Orders?
MATCH (:Customer {custName:"Delicatessen"} ) -[:ORDERED]-> (order:Order) RETURN order
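[Figure annotations: the first NODE has a LABEL (Customer) and a PROPERTY (custName); the second NODE has a VAR (order) and a LABEL (Order); ORDERED is the REL from Customer to Order.]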
Basic Query: Customer's Orders?
MATCH (c:Customer)-[:ORDERED]->(order)
WHERE c.customerName = 'Delicatessen'
RETURN *
Basic Query: Customer's Frequent Purchases?
MATCH (c:Customer)-[:ORDERED]->()-[:INCLUDES]->(p:Product)
WHERE c.customerName = 'Delicatessen'
RETURN p.productName, count(*) AS freq
ORDER BY freq DESC LIMIT 10;
openCypher - Recommendation
MATCH (c:Customer)-[:ORDERED]->(o1)-[:INCLUDES]->(p),
      (peer)-[:ORDERED]->(o2)-[:INCLUDES]->(p),
      (peer)-[:ORDERED]->(o3)-[:INCLUDES]->(reco)
WHERE c.customerId = $customerId
  AND NOT (c)-[:ORDERED]->()-[:INCLUDES]->(reco)
RETURN reco.productName, count(*) AS freq
ORDER BY freq DESC LIMIT 10
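The idea behind the recommendation query can be sketched in plain Python: find peers who bought something the customer bought, then rank the products those peers ordered that the customer has not. The data and the `recommend` function are hypothetical, and the frequency here counts peer orders containing a product, which is a simplification of how Cypher's `count(*)` counts matched paths.

```python
from collections import Counter

# Hypothetical purchase history: customer -> list of orders (sets of products).
orders = {
    "alice": [{"Chocolat", "Tea"}],
    "bob": [{"Chocolat", "Coffee"}, {"Coffee"}],
    "carol": [{"Tea", "Biscuits"}],
    "dave": [{"Milk"}],
}


def recommend(customer, limit=10):
    """Rank products bought by peers but not yet by the customer."""
    mine = set().union(*orders[customer])
    freq = Counter()
    for peer, peer_orders in orders.items():
        if peer == customer:
            continue
        theirs = set().union(*peer_orders)
        if not (mine & theirs):
            continue  # no shared product, so not a peer
        for order in peer_orders:
            for product in order - mine:   # skip what the customer already bought
                freq[product] += 1
    return freq.most_common(limit)


print(recommend("alice"))
```

In this sample, "dave" shares no product with "alice", so his purchases never become candidates.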
Product Cross-Sell
MATCH (:Product {productName: 'Chocolat'})<-[:INCLUDES]-(:Order)<-[:SOLD]-(employee)-[:SOLD]->(o2)-[:INCLUDES]->(cross:Product)
RETURN employee.firstName, cross.productName, count(distinct o2) AS freq
ORDER BY freq DESC LIMIT 5;
openCypher
openCypher...
...is a community effort to evolve Cypher, and to make it the most useful language for querying property graphs
openCypher implementations
SAP Hana Graph, Redis, Agens Graph, Cypher.PL, Neo4j
github.com/opencypher
Language Artifacts
● Cypher 9 specification
● ANTLR and EBNF Grammars
● Formal Semantics (SIGMOD)
● TCK (Cucumber test suite)
● Style Guide
Implementations & Code
● openCypher for Apache Spark
● openCypher for Gremlin
● open source frontend (parser)
● ...
Cypher 10
● Next version of Cypher
● Actively working on natural language specification
● New features
  ○ Subqueries
  ○ Multiple graphs
  ○ Path patterns
  ○ Configurable pattern matching semantics
Extending Neo4j
Extending Neo4j - User Defined Procedures & Functions
[Figure: Applications connect via Bolt to the Neo4j Execution Engine, which runs User Defined Procedures and User Defined Functions]
User Defined Procedures & Functions let you write custom code that is:
• Written in any JVM language
• Deployed to the Database
• Accessed by applications via Cypher
Procedure Examples
Built-In
• Metadata Information
• Index Management
• Security
• Cluster Information
• Query Listing & Cancellation
• ...
Libraries
• APOC (std library)
• Spatial
• RDF (neosemantics)
• NLP
• ...
neo4j.com/developer/procedures-functions
Example: Data(base) Integration
Graph Analytics
Neo4j Graph Algorithms
“Graph analysis is possibly the single most effective competitive differentiator for organizations pursuing data-driven operations and decisions”
The Impact of Connected Data
Existing Options (so far)
• Data Processing
  • Spark with GraphX, Flink with Gelly
  • Gremlin Graph Computer
• Dedicated Graph Processing
  • Urika, GraphLab, Giraph, Mosaic, GPS, Signal-Collect, Gradoop
• Data Scientist Toolkit
  • igraph, NetworkX, Boost in Python, R, C
Goal: Iterate Quickly
•Combine data from sources into one graph
•Project to relevant subgraphs
•Enrich data with algorithms
•Traverse, collect, filter, aggregate with queries
•Visualize, Explore, Decide, Export
•From all APIs and Tools
Usage
1. Call as Cypher procedure
2. Pass in specification (Label, Prop, Query) and configuration
3. The ~.stream variant returns (a lot of) results
CALL algo.<name>.stream('Label','TYPE',{conf})
YIELD nodeId, score
4. The non-stream variant writes results to the graph and returns statistics
CALL algo.<name>('Label','TYPE',{conf})
Cypher Projection
Pass in Cypher statements for node- and relationship-lists.
CALL algo.<name>(
'MATCH ... RETURN id(n)',
'MATCH (n)-->(m)
RETURN id(n) as source,
id(m) as target', {graph:'cypher'})
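To make the difference between the two calling styles concrete, here is a small Python sketch that uses PageRank as a stand-in algorithm. It is not the library's implementation; the function names, the `write_property` parameter and the shape of the returned statistics are assumptions for illustration.

```python
def pagerank(nodes, edges, damping=0.85, iterations=50):
    """Plain power-iteration PageRank over a node list and (source, target) pairs."""
    out = {n: [] for n in nodes}
    for source, target in edges:
        out[source].append(target)
    score = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        nxt = {n: (1.0 - damping) / len(nodes) for n in nodes}
        for n in nodes:
            targets = out[n] or nodes          # dangling nodes spread evenly
            share = damping * score[n] / len(targets)
            for t in targets:
                nxt[t] += share
        score = nxt
    return score


def stream(nodes, edges):
    """Stream style: yield (nodeId, score) rows and leave the graph untouched."""
    scores = pagerank(nodes, edges)
    for node_id in sorted(scores, key=scores.get, reverse=True):
        yield node_id, scores[node_id]


def write(properties, nodes, edges, write_property="pagerank"):
    """Write style: store scores as node properties and return only statistics."""
    scores = pagerank(nodes, edges)
    for node_id, s in scores.items():
        properties.setdefault(node_id, {})[write_property] = s
    return {"nodes": len(scores), "writeProperty": write_property}


# The node list and the (source, target) list play the role of the two
# projection statements: one returns ids, the other returns pairs.
nodes = ["a", "b", "c"]
edges = [("a", "c"), ("b", "c"), ("c", "a")]

rows = list(stream(nodes, edges))
node_properties = {}
stats = write(node_properties, nodes, edges)
print(rows[0][0], stats)
```

Streaming suits ad hoc exploration of a result, while writing back lets later queries filter or sort on the stored score.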
DEMO: OOP
Development
Data Storage and Business Rules Execution
Data Mining and Aggregation
Neo4j Fits into Your Environment
Application
Graph Database Cluster
Neo4j Neo4j Neo4j
Ad Hoc Analysis
Bulk Analytic Infrastructure
Graph Compute Engine
EDW …
Data Scientist
End User
Databases
Relational
NoSQL
Hadoop
Official Language Drivers
• Foundational drivers for popular programming languages
• Bolt: streaming binary wire protocol
• Authoritative mapping to native type system, uniform across drivers
• Pluggable into richer frameworks
JavaScript, Java, .NET, Python, PHP, ...
Drivers
Bolt
Bolt + Official Language Drivers
http://neo4j.com/developer/ http://neo4j.com/developer/language-guides/
Using Bolt: Official Language Drivers look all the same
With JavaScript
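var neo4j = require("neo4j-driver").v1;  // 1.x driver; newer releases export driver() directly
var driver = neo4j.driver("bolt://localhost");
var session = driver.session();
var result = session.run("MATCH (u:User) RETURN u.name");  // resolves asynchronously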
neo4j.com/developer/spring-data-neo4j
Spring Data Neo4j Neo4j OGM
@NodeEntity
public class Talk {
  @Id @GeneratedValue
  Long id;
  String title;
  Slot slot;
  Track track;
  @Relationship(type="PRESENTS", direction=INCOMING)
  Set<Person> speaker = new HashSet<>();
}
Spring Data Neo4j Neo4j OGM
interface TalkRepository extends Neo4jRepository<Talk, Long> {
@Query("MATCH (t:Talk)<-[rating:RATED]-(user) WHERE t.id = {talkId} RETURN rating")
List<Rating> getRatings(@Param("talkId") Long talkId);
List<Talk> findByTitleContaining(String title);
}
github.com/neo4j-contrib/neo4j-spark-connector
Neo4j Spark Connector
Neo4j
THE Graph Database Platform
Graph Transactions
Graph Analytics
Data Integration
Development & Admin
Analytics Tooling
Drivers & APIs Discovery & Visualization
Developers
Admins
Applications Business Users
Data Analysts
Data Scientists
• Operational workloads
• Analytics workloads
Real-time Transactional
and Analytic Processing • Interactive graph exploration
• Graph representation of data
Discovery and
Visualization
• Native property graph model
• Dynamic schema
Agility
• Cypher - Declarative query language
• Procedural language extensions
• Worldwide developer community
Developer Productivity
• 10x less CPU with index-free adjacency
• 10x less hardware than other platforms
Hardware efficiency
Neo4j: Graph Platform
Performance
• Index-free adjacency
• Millions of hops per second
Index-free adjacency ensures lightning-fast retrieval of data and relationships
Native Graph Architecture
Index-free adjacency
Unlike other database models, Neo4j connects data as it is stored
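A conceptual sketch of what index-free adjacency means, contrasted with a join-table lookup. This is a toy Python model, not Neo4j's storage format; the `Node` class, the sample names and both helper functions are assumptions for illustration.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.out = []        # direct references to neighbouring Node objects


def hops(start, depth):
    """Follow stored references; each hop touches only the node's own neighbours."""
    frontier = {start}
    for _ in range(depth):
        frontier = {neighbour for node in frontier for neighbour in node.out}
    return {node.name for node in frontier}


# The same data as a join table: every hop has to search the whole table
# (or an index over it) to find the matching rows.
friend_rows = [("Andreas", "Tobias"), ("Tobias", "Mica"), ("Mica", "Delia")]


def hops_by_join(start_name, depth):
    frontier = {start_name}
    for _ in range(depth):
        frontier = {dst for src, dst in friend_rows if src in frontier}
    return frontier


# Build the pointer-based graph from the same rows.
by_name = {}
for src, dst in friend_rows:
    a = by_name.setdefault(src, Node(src))
    b = by_name.setdefault(dst, Node(dst))
    a.out.append(b)

print(hops(by_name["Andreas"], 2), hops_by_join("Andreas", 2))
```

Both functions return the same answer; the difference is that the cost of a hop in the first depends on the number of neighbours, while in the second it depends on the size of the table or index.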
Neo4j Query Planner
Cost based Query Planner since Neo4j
• Uses transactional database statistics
• High performance Query Engine
• Bytecode compiled queries
• Future: Parallelism
Architecture Components
Index-Free Adjacency
In memory and on flash/disk
vs
ACID Foundation
Required for safe writes
Full-Stack Clustering
Causal consistency, Security
Language, Drivers, Tooling
Developer Experience, Graph Efficiency
Graph Engine
Cost-Based Optimizer, Graph Statistics, Cypher Runtime
Hardware Optimizations
For next-gen infrastructure
Neo4j – allows you to connect the dots
• Was built to efficiently
• store,
• query and
• manage highly connected data
• Transactional, ACID
• Real-time OLTP
• Open source
• Highly scalable on few machines
High Query Performance: Some Numbers
• Traverse 2-4M+ relationships per second and core
• Cost based query optimizer: complex queries return in milliseconds
• Import 100K-1M records per second transactionally
• Bulk import tens of billions of records in a few hours
Get Started
Neo4j Sandbox
How do I get it? Desktop – Container – Cloud
http://neo4j.com/download/
docker run neo4j
Neo4j Cluster Deployment Options
• Developer: Neo4j Desktop (free Enterprise License)
• On premise – Standalone or via OS package
• Containerized with official Docker Image
• In the Cloud
  • AWS, GCE, Azure
• Using Resource Managers
  • DC/OS – Marathon
  • Kubernetes
  • Docker Swarm
Active Community
• Downloads: 10M+ (3M+ from Neo4j Distribution, 7M+ from Docker)
• Events: 400+ (approximate number of Neo4j events per year)
• Meetups: 50k+ (number of Meetup members globally)
• Trained Developers: 50k+ (trained/certified Neo4j professionals)
Summary: Graphs allow you to ...
• Keep your rich data model
• Handle relationships efficiently
• Write queries easily
• Develop applications quickly
• Have fun
Thank You!
Questions?!
@neo4j | neo4j.com
@mesirii | Michael Hunger
Users Love Neo4j
Causal Clustering
Core & Replica Servers Causal Consistency
Causal Clustering - Features
• Two Zones – Core + Edge
• Group of Core Servers – Consistent and Partition tolerant (CP)
• Transactional Writes
• Quorum Writes, Cluster Membership, Leader via Raft Consensus
• Scale out with Read Replicas
• Smart Bolt Drivers with Routing, Read & Write Sessions
• Causal Consistency with Bookmarks
Replica
• For massive query throughput
• Read-only replicas
• Not involved in Consensus Commit
• Disposable, suitable for auto-scaling
Core
• Small group of Neo4j databases
• Fault-tolerant Consensus Commit
• Responsible for data safety
Writing to the Core Cluster
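[Figure: an Application Server uses the Neo4j Driver to write to the Neo4j Cluster; the write is acknowledged (✓ ✓ ✓) and reported as Success. Names shown in the figure: Max, Jim, Jane, Mark.]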
Routed write statements
driver = GraphDatabase.driver( "bolt+routing://aCoreServer" );
try ( Session session = driver.session( AccessMode.WRITE ) )
{
try ( Transaction tx = session.beginTransaction() )
{
tx.run( "MERGE (user:User {userId: {userId}})",
parameters( "userId", userId ) );
tx.success();
}
}
Bookmark
• Session token
• String (for portability)
• Opaque to application
• Represents ultimate user’s most recent view of the graph
• More capabilities to come
3.0 High Availability: Data Redundancy, Massive Throughput, High Availability
3.1 Causal Clustering: Bigger Clusters, Consensus Commit, Built-in load balancing
Neo4j 3.0 High Availability Cluster
• Master-Slave architecture
• Paxos consensus used for master election
• Transaction committed once written durably on the master
• Practical deployments: 10s of servers
Neo4j 3.1 Causal Cluster
• Two-part cluster: writeable Core and read-only read replicas
• Raft protocol used for leader election, membership changes and commitment of all transactions
• Transaction committed once written durably on a majority of the core members
• Practical deployments: 100s of servers
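The "committed once written durably on a majority of the core members" rule can be sketched as follows. This is a deliberately simplified Python model and not Raft: it has no leader election, terms or log repair, and the `CoreMember` class and `replicate` function are assumptions for illustration.

```python
class CoreMember:
    def __init__(self, name, available=True):
        self.name = name
        self.available = available
        self.log = []                 # entries this member has written

    def append(self, entry):
        """Write the entry if the member is reachable; report success."""
        if not self.available:
            return False
        self.log.append(entry)
        return True


def replicate(members, entry):
    """Return True when a majority of core members acknowledged the write."""
    acks = sum(member.append(entry) for member in members)
    quorum = len(members) // 2 + 1
    return acks >= quorum


cores = [CoreMember("core-1"), CoreMember("core-2"), CoreMember("core-3")]

cores[2].available = False                          # one of three is down
one_down = replicate(cores, "MERGE (user:User)")    # 2 of 3 acknowledge

cores[1].available = False                          # two of three are down
two_down = replicate(cores, "MERGE (user:User)")    # only 1 of 3 acknowledges

print(one_down, two_down)
```

With three core members the cluster tolerates one failure; tolerating two would take five, since the quorum is always more than half.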
Flexible Authentication Options
Choose authentication method
• Built-in native users repository: Testing/POC, single-instance deployments
• LDAP connector to Active Directory or openLDAP: Production deployments
• Custom auth provider plugins: Special deployment scenarios
[Figure: Neo4j authenticates through Built-in Native Users, the LDAP connector (Active Directory, openLDAP), or a Custom Plugin attached via the Auth Plugin Extension Module]
Flexible Authentication Options
LDAP Group to Role Mapping
dbms.security.ldap.authorization.group_to_role_mapping= \
 "CN=Neo4j Read Only,OU=groups,DC=example,DC=com" = reader; \
 "CN=Neo4j Read-Write,OU=groups,DC=example,DC=com" = publisher; \
 "CN=Neo4j Schema Manager,OU=groups,DC=example,DC=com" = architect; \
 "CN=Neo4j Administrator,OU=groups,DC=example,DC=com" = admin; \
 "CN=Neo4j Procedures,OU=groups,DC=example,DC=com" = allowed_role
./conf/neo4j.conf
[Figure: LDAP directory tree under the BASE DN DC=example,DC=com. OU=people contains CN=Bob Smith and CN=Carl Junior. OU=groups contains CN=Neo4j Read Only, CN=Neo4j Read-Write, CN=Neo4j Schema Manager, CN=Neo4j Administrator and CN=Neo4j Procedures, which map to Neo4j permissions.]
Use Cases
Case Study: Knowledge Graphs at eBay
Bags
Men’s Backpack
Handbag
Case study: Solving real-time recommendations for the world’s largest retailer
Challenge
• In its drive to provide the best web experience for its customers, Walmart wanted to optimize its online recommendations.
• Walmart recognized the challenge it faced in delivering recommendations with traditional relational database technology.
Use of Neo4j
• Walmart uses Neo4j to quickly query customers’ past purchases, as well as instantly capture any new interests shown in the customers’ current online visit, which is essential for making real-time recommendations.
“As the current market leader in graph databases, and with enterprise features for scalability and availability, Neo4j is the right choice to meet our demands.”
- Marcos Vada, Walmart
Result/Outcome
• With Neo4j, Walmart could replace a heavy batch process with a simple, real-time graph database.
Case study: eBay Now Tackles eCommerce Delivery Service Routing with Neo4j
Challenge
• The queries used to select the best courier for eBay’s routing system were taking too long, and eBay needed a solution to maintain a competitive service.
• The MySQL joins being used created a code base too slow and complex to maintain.
Use of Neo4j
• eBay is now using Neo4j’s graph database platform to redefine e-commerce, by making delivery of online and mobile orders quick and convenient.
Result/Outcome
• With Neo4j, eBay managed to eliminate the biggest roadblock between retailers and online shoppers: the lack of an option to have an item delivered the same day.
• The schema-flexible nature of the database allowed easy extensibility, speeding up development.
• The Neo4j solution was more than 1000x faster than the prior MySQL solution.
“Our Neo4j solution is literally thousands of times faster than the prior MySQL solution, with queries that require 10-100 times less code.”
– Volker Pacher, eBay
Case study: Solving real-time promotions for a top US retailer
Challenge
• Suffered significant revenue loss due to legacy infrastructure.
• Particularly challenging when handling transaction volumes on peak shopping occasions such as Thanksgiving and Cyber Monday.
Use of Neo4j
• Neo4j is used to revolutionize and reinvent its real-time promotions engine.
• On average, Neo4j processes 90% of this retailer’s 35M+ daily transactions, each 3-22 hops, in 4ms or less.
Result/Outcome
• Reached an all-time high in online revenues, due to the Neo4j-based friction-free solution.
• Neo4j also enabled the company to be one of the first retailers to provide the same promotions across both online and traditional retail channels.
“On average, Neo4j processes 90% of this retailer’s 35M+ daily transactions, each 3-22 hops, in 4ms or less.”
– Top Tier US Retailer
Relational DBs Can’t Handle Relationships Well
• Cannot model or store data and relationships without complexity
• Performance degrades with number and levels of relationships, and database size
• Query complexity grows with need for JOINs
• Adding new types of data and relationships requires schema redesign, increasing time to market
… making traditional databases inappropriate when data relationships are valuable in real-time
• Slow development
• Poor performance
• Low scalability
• Hard to maintain
Unlocking Value from Your Data Relationships
• Model your data as a graph of data and relationships
• Use relationship information in real-time to transform your business
• Add new relationships on the fly to adapt to your changing business
MATCH (sub)-[:REPORTS_TO*0..3]->(boss),
      (report)-[:REPORTS_TO*1..3]->(sub)
WHERE boss.name = "Andrew K."
RETURN sub.name AS Subordinate, count(report) AS Total
Express Complex Queries Easily with Cypher
Find all direct reports and how many people they manage, up to 3 levels down
Cypher Query
SQL Query