Data Modeling with Neo4j

Post on 27-Jan-2015

This presentation covers several aspects of modeling data and domains with a graph database like Neo4j. The graph data model allows high-fidelity modeling. Because relationships are first-class in the graph model, you can use much higher forms of normalization than you would in a relational database. Video here: https://vimeo.com/67371996

Transcript of Data Modeling with Neo4j


Michael Hunger, Neo Technology

@neo4j | @mesirii | michael@neo4j.org

(Michael) -[:WORKS_ON]-> (Neo4j)

[Diagram labels: ME, Spring Cloud, Community, Cypher, console, community graph, Server]

Neo4j is a NOSQL Graph Database

A graph database...

NO: not for charts & diagrams, or vector artwork

YES: for storing data that is structured as a graph

remember linked lists, trees?

graphs are the general-purpose data structure

“A relational database may tell you the average age of everyone in this place, but a graph database will tell you who is most likely to buy you a beer.”

You know relational

Tables: foo, bar, foo_bar (join table)

now consider relationships...

We're talking about a Property Graph

Nodes

Relationships

Properties (each a key+value)

+ Indexes (for easy look-ups)

[Diagram: person nodes (Emil, Andrés, Lars, Johan, Allison, Peter, Michael, Tobias, Andreas, Ian, Mica, Delia) connected by "knows" relationships]
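The four building blocks listed above can be sketched in a few lines of Python. This is a toy in-memory model, not Neo4j's API: the class names, the `name_index` dictionary, and the `since` property are assumptions made only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    # Properties are plain key/value pairs attached to the node.
    properties: dict = field(default_factory=dict)


@dataclass
class Relationship:
    # A relationship has a start node, a type, an end node, and its own properties.
    start: Node
    type: str
    end: Node
    properties: dict = field(default_factory=dict)


class PropertyGraph:
    """Toy in-memory property graph (illustrative names, not a Neo4j API)."""

    def __init__(self):
        self.nodes = []
        self.relationships = []
        self.name_index = {}  # index: value of the "name" property -> node

    def add_node(self, **properties):
        node = Node(properties)
        self.nodes.append(node)
        if "name" in properties:
            self.name_index[properties["name"]] = node
        return node

    def relate(self, start, rel_type, end, **properties):
        rel = Relationship(start, rel_type, end, properties)
        self.relationships.append(rel)
        return rel


graph = PropertyGraph()
emil = graph.add_node(name="Emil")
andreas = graph.add_node(name="Andreas")
# "since" is a hypothetical relationship property.
knows = graph.relate(emil, "knows", andreas, since=2010)

print(graph.name_index["Emil"].properties)
print(knows.type, knows.properties)
```

The index is only used to find a node by name; everything else about the data lives in the nodes and relationships themselves.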

Aggregate vs. Connected Data-Model

NOSQL Databases

Key-Value: Riak, Redis

Column oriented: Cassandra

Document: Mongo, Couch

Graph: Neo4j

Relational: MySQL, Postgres

“There is a significant downside - the whole approach works really well when data access is aligned with the aggregates, but what if you want to look at the data in a different way? Order entry naturally stores orders as aggregates, but analyzing product sales cuts across the aggregate structure. The advantage of not using an aggregate structure in the database is that it allows you to slice and dice your data different ways for different audiences.

This is why aggregate-oriented stores talk so much about map-reduce.”

Martin Fowler

Aggregate Oriented Model

The connected data model is based on fine-grained elements that are richly connected; the emphasis is on extracting many dimensions and attributes as elements. Connections are cheap and can be used not only for the domain-level relationships but also for additional structures that allow efficient access for different use-cases. The fine-grained model requires an external scope for mutating operations that ensures Atomicity, Consistency, Isolation and Durability (ACID), also known as transactions.

Michael Hunger

Connected Data Model

Data Modeling

Why Data Modeling

๏ What is modeling?

๏ Aren't we schema free?

๏ How does it work in a graph?

๏ Where should modeling happen? DB or Application

Data Models

Model mis-match

Real World Model

Model mis-match

Application Model Database Model

Trinity of models

Whiteboard --> Data

[Diagram: Andreas, Peter, Emil and Allison connected by "knows" relationships]

// Cypher query - friend of a friend
start n=node(0)
match (n)--()--(foaf)
return foaf

You traverse the graph

// lookup starting point in an index
START n=node:People(name = 'Andreas')

// then traverse to find results
START me=node:People(name = 'Andreas')
MATCH (me)-[:FRIEND]-(friend)-[:FRIEND]-(friend2)
RETURN friend2
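The two steps above, an index lookup followed by a traversal, can be sketched outside the database as well. The snippet below is a minimal Python illustration, not how Neo4j executes the query; the friendship pairs are hypothetical sample data.

```python
from collections import defaultdict

# Hypothetical sample data: undirected FRIEND relationships between people.
friendships = [
    ("Andreas", "Peter"),
    ("Andreas", "Emil"),
    ("Peter", "Allison"),
    ("Emil", "Allison"),
]

# Adjacency sets stand in for the relationships stored with each node.
friends = defaultdict(set)
for a, b in friendships:
    friends[a].add(b)
    friends[b].add(a)


def friends_of_friends(start):
    """Follow two FRIEND hops from the start node, skipping the start itself."""
    result = set()
    for friend in friends[start]:        # first hop
        for friend2 in friends[friend]:  # second hop
            if friend2 != start:
                result.add(friend2)
    return result


# The dictionary lookup by name plays the role of the index lookup;
# everything after it only follows relationships.
print(sorted(friends_of_friends("Andreas")))
```

The cost of the traversal depends on how many relationships the visited nodes have, not on how many people are stored overall.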

SELECT skills.*, user_skill.*
FROM users
JOIN user_skill ON users.id = user_skill.user_id
JOIN skills ON user_skill.skill_id = skills.id
WHERE users.id = 1

START user = node(1)
MATCH user -[user_skill]-> skill
RETURN skill, user_skill
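To make the comparison concrete, here is a small Python sketch that answers the same question both ways: once with the two joins in SQLite (from the standard library), once by walking relationships held directly on a node. The sample rows, the `level` column, and the `HAS_SKILL` type are assumptions for illustration only.

```python
import sqlite3

# Relational side: the skills of user 1 require two joins through user_skill.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE skills (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE user_skill (user_id INTEGER, skill_id INTEGER, level TEXT);
    INSERT INTO users VALUES (1, 'Andreas');
    INSERT INTO skills VALUES (10, 'Cypher'), (11, 'Java');
    INSERT INTO user_skill VALUES (1, 10, 'expert'), (1, 11, 'novice');
""")
rows = db.execute("""
    SELECT skills.name, user_skill.level
    FROM users
    JOIN user_skill ON users.id = user_skill.user_id
    JOIN skills ON user_skill.skill_id = skills.id
    WHERE users.id = 1
    ORDER BY skills.name
""").fetchall()

# Graph side: the node keeps its outgoing relationships, so the same answer
# is a walk over the user's own relationship list.
user = {"name": "Andreas", "out": []}
for skill_name, level in [("Cypher", "expert"), ("Java", "novice")]:
    skill = {"name": skill_name}
    user["out"].append({"type": "HAS_SKILL", "level": level, "end": skill})

walked = sorted((rel["end"]["name"], rel["level"]) for rel in user["out"])

print(rows)
print(walked)
```

Both print the same pairs; the difference is that the join table and its keys disappear on the graph side, and the qualifying data sits on the relationship.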

An Example

What language do they speak here?

Language Country

Tables

Language: language_code, language_name, word_count

Country: country_code, country_name, flag_uri

Need to model the relationship

Language: language_code, language_name, word_count

Country: country_code, country_name, flag_uri, language_code

What if the cardinality changes?

Language: language_code, language_name, word_count, country_code

Country: country_code, country_name, flag_uri

Or we go many-to-many?

Language: language_code, language_name, word_count

Country: country_code, country_name, flag_uri

LanguageCountry: language_code, country_code

Or we want to qualify the relationship?

Language: language_code, language_name, word_count

Country: country_code, country_name, flag_uri

LanguageCountry: language_code, country_code, primary

Start talking about Graphs

Explicit Relationship

Nodes: Language (name, word_count), Country (name, flag_uri)

Relationship: IS_SPOKEN_IN

Relationship Properties

Nodes: Language (name, word_count), Country (name, flag_uri)

Relationship: IS_SPOKEN_IN (as_primary)

What’s different?

[Diagram: the tables Language, Country and LanguageCountry (language_code, country_code, primary) next to the graph relationship IS_SPOKEN_IN]

๏ Implementation of maintaining relationships is left up to the database

๏ Artificial keys disappear or are unnecessary

๏ Relationships get an explicit name

• can be navigated in both directions
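A minimal Python sketch of the last two points: each relationship is stored once, with a name and its own properties, and can be followed from either end. The countries, languages, and `as_primary` values are sample data, and the two lookup dictionaries are an assumption about how one might model this in memory, not a description of Neo4j internals.

```python
from collections import defaultdict

# Each relationship is stored once: (start, type, end, properties).
relationships = [
    ("English", "IS_SPOKEN_IN", "Canada", {"as_primary": True}),
    ("French", "IS_SPOKEN_IN", "Canada", {"as_primary": True}),
    ("French", "IS_SPOKEN_IN", "France", {"as_primary": True}),
    ("English", "IS_SPOKEN_IN", "France", {"as_primary": False}),
]

outgoing = defaultdict(list)  # node -> relationships that start there
incoming = defaultdict(list)  # node -> relationships that end there
for rel in relationships:
    start, _type, end, _props = rel
    outgoing[start].append(rel)
    incoming[end].append(rel)


def countries_speaking(language):
    """Follow IS_SPOKEN_IN in its stored direction."""
    return sorted(end for _, t, end, _ in outgoing[language] if t == "IS_SPOKEN_IN")


def primary_languages_of(country):
    """Follow the same relationships against their direction."""
    return sorted(
        start
        for start, t, _, props in incoming[country]
        if t == "IS_SPOKEN_IN" and props["as_primary"]
    )


print(countries_speaking("French"))
print(primary_languages_of("France"))
```

No language_code or country_code keys are needed, and the `as_primary` qualifier lives on the relationship rather than in a join table.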

Relationship specialisation

Nodes: Language (name, word_count), Country (name, flag_uri)

Relationship: IS_SPOKEN_IN (as_primary)

Bidirectional relationships

Nodes: Language (name, word_count), Country (name, flag_uri)

Relationships: IS_SPOKEN_IN, PRIMARY_LANGUAGE

Weighted relationships

Nodes: Language (name, word_count), Country (name, flag_uri)

Relationship: POPULATION_SPEAKS (population_fraction)

Keep on adding relationships

Nodes: Language (name, word_count), Country (name, flag_uri)

Relationships: POPULATION_SPEAKS (population_fraction), SIMILAR_TO, ADJACENT_TO

EMBRACE the paradigm

Use the building blocks

๏ Nodes

๏ Relationships (RELATIONSHIP_NAME)

๏ Properties (name: value)

Anti-pattern: rich properties

name: “Canada”

languages_spoken: “[ ‘English’, ‘French’ ]”
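One way to get out of this anti-pattern is to turn every entry of the packed property into its own node and connect it with a relationship. The Python sketch below assumes the list is stored as a JSON string and reuses the IS_SPOKEN_IN relationship from the earlier slides; the helper names and the France record are hypothetical.

```python
import json

# The anti-pattern: a list serialized into a single string property.
canada = {"name": "Canada", "languages_spoken": '["English", "French"]'}

nodes = {}          # (label, name) -> node properties
relationships = []  # (start_key, type, end_key)


def get_or_create(label, name):
    """Return the key of the node with this label and name, creating it once."""
    key = (label, name)
    nodes.setdefault(key, {"name": name})
    return key


def normalize_country(record):
    """Turn each entry of languages_spoken into a Language node plus a relationship."""
    country = get_or_create("Country", record["name"])
    for language_name in json.loads(record["languages_spoken"]):
        language = get_or_create("Language", language_name)
        relationships.append((language, "IS_SPOKEN_IN", country))


normalize_country(canada)
normalize_country({"name": "France", "languages_spoken": '["French"]'})

print(sorted(nodes))
print(relationships)
```

After normalizing, "French" exists once and is shared by both countries, so a question like "where is French spoken?" becomes a traversal instead of a string search.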

Normalize Nodes

Anti-Pattern: Node represents multiple concepts

Country: name, flag_uri, language_name, number_of_words, yes_in_language, no_in_language, currency_code, currency_name

Split up in separate concepts

Nodes: Country (name, flag_uri), Language (name, number_of_words, yes, no), Currency (currency_code, currency_name)

Relationships: SPEAKS, USES_CURRENCY

Challenge: Property or Relationship?

๏ Can every property be replaced by a relationship?

• Hint: triple stores. Are they easy to use?

๏ Should all entities with the same property values be connected?

Object Mapping

๏ Similar to how you would map objects to a relational database, using an ORM such as Hibernate

๏ Generally simpler and easier to reason about

๏ Examples

• Java: Spring Data Neo4j

• Ruby: Active Model

๏ Why Map?

• Do you use mapping because you are scared of SQL?

• Following DDD, could you write your repositories directly against the graph API?

CONNECT for fast access

In-Graph Indices

Relationships for querying

๏ like in other databases

• same structure for different use-cases (OLTP and OLAP) doesn't work

• graph allows: add more structures

๏ Relationships should be the primary means to access nodes in the database

๏ Traversing relationships is cheap; that's the whole design goal of a graph database

๏ Use lookups only to find starting nodes for a query

Data Modeling examples in Manual

Anti-pattern: unconnected graph

[Diagram: many unconnected nodes, each with name: “Jones”]

Pattern: Linked List

Pattern: Multiple Relationships

Pattern-Trees: Tags and Categories

Pattern-Tree: Multi-Level-Tree

Pattern-Trees: R-Tree (spatial)

Example: Activity Stream

Graph Evolution

Combine multiple Domains in a Graph

๏ you start with a single domain

๏ add more connected domains as your system evolves

๏ more domains allow you to ask different queries

๏ one domain "indexes" the other

๏ Example: Facebook Graph Search

• social graph

• location graph

• activity graph

• favorite graph

• ...

Notes on the Graph Data Model

๏ Schema free, but constraints

๏ Model your graph with a whiteboard and a wise man

๏ Nodes as main entities but useless without connections

๏ Relationships are first-class citizens in the model and database

๏ Normalize more than in a relational database

๏ Use meaningful relationship-types, not generic ones like IS_

๏ Use in-graph structures to allow different access paths

๏ Evolve your graph to your needs, incremental growth

Real World Examples

Real World Use Cases:

• [A] ACL from Hell

• [B] Timely recommendations

• [C] Global collaboration

[A] ACL from Hell

๏ Customer:

• leading consumer utility company with tons and tons of users

๏ Goal:

• comprehensive access control administration for customers

๏ Benefits:

• Flexible and dynamic architecture

• Exceptional performance

• Extensible data model supports new applications and features

• Low cost

• A reliable access control administration system for 5 million customers, subscriptions and agreements

• Complex dependencies between groups, companies, individuals, accounts, products, subscriptions, services and agreements

• Broad and deep graphs (master customers with 1000s of customers, subscriptions & agreements)

[Diagram: example access-control graph. Nodes: name: Andreas, account: 9758352794, agreement: ultimate, subscription: sports, subscription: local, service: NFL, service: Ravens, group: graphistas, promotion: fall, company: Neo Technology. Relationships: owns, subscribes to, has plan, includes, provides, member of, offered, discounts, works with, gets discount on]

[B] Timely Recommendations

๏ Customer:

• a professional social network

• 35 million users, adding 30,000+ each day

๏ Goal: up-to-date recommendations

• Scalable solution with real-time end-user experience

• Low maintenance and reliable architecture

• 8-week implementation

๏ Problem:

• Real-time recommendation imperative to attract new users and maintain positive user retention

• Clustered MySQL solution not scalable or fast enough to support real-time requirements

๏ Upgrade from running a batch job

• initial hour-long batch job

• but then success happened, and it became a day

• then two days

๏ With Neo4j, real-time recommendations

[Diagram: social graph of people with name and job properties (Andreas: talking, Allison: plumber, Tobias: coding, Peter: building, Emil: plumber, Stephen: DJ, Delia: barking, Tiberius: dancer) connected by "knows" relationships]

[C] Collaboration on Global Scale

๏ Customer: a worldwide software leader

• highly collaborative end-users

๏ Goal: offer an online platform for global collaboration

• Highly flexible data analysis

• Sub-second results for large, densely-connected data

• User experience - competitive advantage

• Massive amounts of data tied to members, user groups, member content, etc. all interconnected

• Infer collaborative relationships through user-generated content

• Worldwide Availability

[Diagram: Asia, North America, Europe]

How to get started?

๏ Documentation

• neo4j.org

‣ http://www.neo4j.org/learn/nosql

• docs.neo4j.org - tutorials+reference

‣ Data Modeling Examples

• http://console.neo4j.org

• Neo4j in Action

• Good Relationships

๏ Worldwide one-day Neo4j Trainings

๏ Get Neo4j

• http://neo4j.org/download

• http://addons.heroku.com/neo4j/

Really, once you start thinking in graphs it's hard to stop

Recommendations, MDM, Systems Management, Geospatial, Social computing, Business intelligence, Biotechnology, Making Sense of all that data, your brain, access control, linguistics, catalogs, genealogy, routing, compensation, market vectors

What will you build?

Thank You! Questions?