MySQL to Neo4j: A DBA Perspective - David Stern @ GraphConnect NY 2013

Post on 15-Jan-2015

description

This session walks through best practices for Neo4j administrators, from installation and initial setup, through maintenance and performance tuning, all the way to production use.

Transcript of MySQL to Neo4j: A DBA Perspective - David Stern @ GraphConnect NY 2013

(MySQL)-[:to]->(neo4j)

A DBA Perspective

Dave Stern

@davestern1

Dev Ops @ FiftyThree

MySQL user & admin since 1998

Multiple tiers of masters & slaves

Bare metal & AWS - EC2/RDS

MySQL & Percona

neo4j user & admin since 2012

neo4j 1.8, 1.9

AWS: Multiple 3-instance enterprise clusters

How do you use MySQL?

Single Instance

Master/Slave, Multi-master

MySQL Cluster

Have you tried neo4j yet?

Where does FiftyThree use neo4j?

Much more in development...

What is this talk about?

Comparison

Configuration

Use

Comparison

Logical Partitioning

http://www.mysql.com/products/workbench/

MySQL

Strictly enforced schema

neo4j

No logical databases

No tables

...no schema

...no joins

2.0: schema-optional

Physical Partitioning & Sharding

Improves write performance, usually disk I/O

MySQL

innodb_file_per_table

Databases on separate partitions or devices

Shard horizontally (e.g. by time range)

Shard vertically (e.g. by table or function)

Logs can be on separate partitions for I/O gain

neo4j

No logical partitioning by DB or table

Highly connected data: no clear separation

Logs can be on separate partitions for I/O gain

SCALE UP!

Authentication & Authorization

MySQL

mysql> select Host, db, user, select_priv, insert_priv, update_priv, delete_priv from db;
+-----------+---------+-----------+-------------+-------------+-------------+-------------+
| Host      | db      | user      | select_priv | insert_priv | update_priv | delete_priv |
+-----------+---------+-----------+-------------+-------------+-------------+-------------+
| %         | test    |           | Y           | Y           | Y           | Y           |
| %         | test\_% |           | Y           | Y           | Y           | Y           |
| localhost | Orders  | admin     | Y           | Y           | Y           | Y           |
| localhost | Events  | admin     | Y           | Y           | Y           | Y           |
| localhost | Events  | events    | Y           | Y           | Y           | N           |
| 10.%      | Events  | events    | Y           | N           | N           | N           |
+-----------+---------+-----------+-------------+-------------+-------------+-------------+

Authentication & Authorization

neo4j

No permissions

No users

How do you secure the DB?

1. Protect the database in a Private Network or VPC
2. Firewall: router, AWS Security Groups, iptables
3. Proxy requests via web server or Load Balancer

If you must allow access, use HTTPS & authenticate at the proxy.

Replication

http://www.mysqlperformanceblog.com/wp-content/uploads/2013/07/23.png

Replication

STOP SLAVE;
SET GLOBAL sql_slave_skip_counter = 1;
START SLAVE;

Replication vs. HA

MySQL

Free

Slaves pull updates

Eventual consistency

One-way, asynchronous

neo4j

Enterprise edition: can cost $ depending on use

Slaves can pull asynchronous updates

Eventual consistency; optimistic pushes to slaves are the default

Writes to any cluster member

JVM

Buffers & Memory management =~ JVM settings

The database itself is extendable via Java

... if you're into that sort of thing

Built-in Tools

Data Browser

Backup Script

neo4j

$ /opt/neo4j/bin/neo4j-backup -from single://10.66.182.177:6362 \
> -to /media/neo4j-backup/production/2013-11-02T05:40:10Z
Performing full backup from 'single://10.66.182.177:6362'
............................................[44 Files copied]
Full consistency check
.................... 10%
.................... 20%
.................... 30%
.................... 40%
.................... 50%
.................... 60%
.................... 70%
.................... 80%
.................... 90%
.................... 100%
Done
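The timestamped target directory in the command above lends itself to a small wrapper script. The following is a minimal Python sketch, not something shown in the talk: the helper name `build_backup_command` is hypothetical, the default tool path and addresses are copied from the example, and only the `-from` and `-to` options that appear on the slide are used.

```python
from datetime import datetime, timezone


def build_backup_command(host, port, backup_root,
                         tool="/opt/neo4j/bin/neo4j-backup", now=None):
    """Return the neo4j-backup argument list for a timestamped backup.

    Hypothetical helper: it only assembles the command shown on the slide
    and does not run it.
    """
    now = now or datetime.now(timezone.utc)
    # Directory name in the same style as the example: 2013-11-02T05:40:10Z
    stamp = now.strftime("%Y-%m-%dT%H:%M:%SZ")
    return [
        tool,
        "-from", f"single://{host}:{port}",
        "-to", f"{backup_root}/{stamp}",
    ]


if __name__ == "__main__":
    cmd = build_backup_command("10.66.182.177", 6362,
                               "/media/neo4j-backup/production")
    print(" ".join(cmd))
    # To execute it, pass the list to subprocess.run(cmd, check=True).
```

Returning a list rather than a shell string keeps the command safe to hand to `subprocess.run` without quoting concerns.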

MySQL

$ innobackupex --user=DBUSER --password=DBUSERPASS /path/to/BACKUP-DIR/

innobackupex: Backup created in directory '/path/to/BACKUP-DIR/2013-03-25_00-00-09'
innobackupex: MySQL binlog position: filename 'mysql-bin.000003', position 1946
111225 00:00:53 innobackupex: completed OK!

Built-in Tools

Visual Server Info

Configuration

MySQL

So many options...

mysql> SHOW VARIABLES;
+-----------------------------------------+---------------------------+
| Variable_name                           | Value                     |
+-----------------------------------------+---------------------------+
| auto_increment_increment                | 1                         |
| auto_increment_offset                   | 1                         |
| autocommit                              | ON                        |
| automatic_sp_privileges                 | ON                        |
| back_log                                | 50                        |
| basedir                                 | /home/mysql/bin/mysql-5.5 |
| big_tables                              | OFF                       |
| binlog_cache_size                       | 32768                     |
| binlog_direct_non_transactional_updates | OFF                       |
| binlog_format                           | STATEMENT                 |
| binlog_stmt_cache_size                  | 32768                     |
| bulk_insert_buffer_size                 | 8388608                   |
...
| max_allowed_packet                      | 1048576                   |
| max_binlog_cache_size                   | 18446744073709547520      |
| max_binlog_size                         | 1073741824                |
| max_binlog_stmt_cache_size              | 18446744073709547520      |
| max_connect_errors                      | 10                        |
| max_connections                         | 151                       |
| max_delayed_threads                     | 20                        |
| max_error_count                         | 64                        |
| max_heap_table_size                     | 16777216                  |
| max_insert_delayed_threads              | 20                        |
| max_join_size                           | 18446744073709551615      |
...

You can optimize dozens of settings like these...

MySQL Configuration

Buffers, Caching & I/O

innodb_buffer_pool_size = 12G
innodb_buffer_pool_instances = 8
innodb_additional_mem_pool_size = 256M

innodb_flush_log_at_trx_commit = 2
innodb_flush_method = O_DIRECT
innodb_log_file_size = 128M
innodb_log_buffer_size = 64M

innodb_file_per_table
innodb_io_capacity = 500
innodb_read_io_threads = 64
innodb_write_io_threads = 64

and these...

MySQL Configuration

Network & Concurrency

table_cache = 2048
max_connections = 1000

max_allowed_packet = 16M

and these...

MySQL Configuration

Replication

server-id = 2
master-host = db-master.mycompany.com
master-port = 3306
master-user = username
master-password = password
master-connect-retry = 60

And these, depending on version & hardware...

MySQL Configuration

Other

sort_buffer_size = 2M
tmp_table_size = 32M

join_buffer_size = 128k

query_cache_type = 1
query_cache_size = 64M

open_files_limit = 8192

....

neo4j Configuration Tuning

Simple Questions

How many nodes do you expect?

How many relationships do you expect?

Average number of properties per node and relationship?

Optional: How do you expect to traverse the graph?

Long paths and/or large result sets?

Short paths and/or small result sets?

3 things to calculate:

File Cache Mapped Memory & Object Caches

Heap Size

RAM for OS
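These three numbers have to fit into physical RAM together. A rough budgeting sketch in Python follows. The helper name `memory_budget` and the figures in the usage line (a 16 GB host, a 4 GB heap, 2 GB kept for the OS) are illustrative assumptions, not values from the talk.

```python
def memory_budget(total_ram_mb, heap_mb, os_reserve_mb):
    """Return the RAM (in MB) left over for memory-mapped store files.

    Assumes the simple split: RAM = OS reserve + JVM heap + mapped memory.
    The object caches are held on the JVM heap, so they are covered by
    heap_mb rather than subtracted separately.
    """
    mapped_mb = total_ram_mb - heap_mb - os_reserve_mb
    if mapped_mb <= 0:
        raise ValueError("heap and OS reserve leave no room for mapped memory")
    return mapped_mb


if __name__ == "__main__":
    # Hypothetical 16 GB host with a 4 GB heap and 2 GB kept for the OS.
    print(memory_budget(total_ram_mb=16384, heap_mb=4096, os_reserve_mb=2048))
```

If the result is smaller than the expected size of the store files, the options are a smaller heap, more RAM, or accepting that part of the store will be read from disk.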

neo4j Configuration

Store file                          Record size   Contents
neostore.nodestore.db               9 B           Nodes
neostore.relationshipstore.db       33 B          Relationships
neostore.propertystore.db           41 B          Properties for nodes and relationships
neostore.propertystore.db.strings   128 B         Values of string properties
neostore.propertystore.db.arrays    128 B         Values of array properties

Capacity Planning Estimates:

Node size (9 B; 14 B in 2.0) x expected nodes

Relationship size (33 B) x expected relationships

Property size (41B) x expected properties

Strings & Arrays
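The multiplication above can be sketched in a few lines of Python. The record sizes are the ones listed in the talk; the function name `estimate_store_bytes` and the graph size in the usage example are hypothetical. String and array stores are left out because their size depends on the actual property values.

```python
# Record sizes in bytes, as listed above for neo4j 1.9.
NODE_BYTES = 9            # 14 in neo4j 2.0
RELATIONSHIP_BYTES = 33
PROPERTY_BYTES = 41


def estimate_store_bytes(nodes, relationships, properties,
                         node_bytes=NODE_BYTES):
    """Estimate the size in bytes of the three fixed-record store files.

    Excludes the string and array stores, whose size depends on the data.
    """
    return {
        "neostore.nodestore.db": nodes * node_bytes,
        "neostore.relationshipstore.db": relationships * RELATIONSHIP_BYTES,
        "neostore.propertystore.db": properties * PROPERTY_BYTES,
    }


if __name__ == "__main__":
    # Hypothetical graph: 1M nodes, 5M relationships, 10M properties.
    sizes = estimate_store_bytes(1_000_000, 5_000_000, 10_000_000)
    for store, size in sizes.items():
        print(f"{store}: {size / 1024 / 1024:.1f} MiB")
    print(f"total: {sum(sizes.values()) / 1024 / 1024:.1f} MiB")
```

Pass `node_bytes=14` to estimate for neo4j 2.0.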

Configuration

Main config files

neo4j-wrapper.conf

neo4j.properties

neo4j-server.properties

Configuration

neo4j-wrapper.conf

Heap Size

GC method

Configuration

neo4j.properties

File Caches: Mapped memory

Object Caches

Indexes

HA

Backup

Configuration

neo4j-server.properties

HTTP/S

Admin client

REST

Database mode

Logging

Configuration

21.2. Server Configuration

25. Configuration & Performance

neo4j: Buffers, Caching & I/O

neo4j-wrapper.conf

# Initial Java Heap Size (in MB)
wrapper.java.initmemory=1024

# Maximum Java Heap Size (in MB)
wrapper.java.maxmemory=1024

neo4j: Buffers, Caching & I/O

neo4j.properties

Two types of caches: file buffer and object cache

File Buffer Cache:

# Default values for the low-level graph engine
neostore.nodestore.db.mapped_memory=25M
neostore.relationshipstore.db.mapped_memory=50M
neostore.propertystore.db.mapped_memory=90M
neostore.propertystore.db.strings.mapped_memory=130M
neostore.propertystore.db.arrays.mapped_memory=130M

Object Cache:

node_cache_size=256M
relationship_cache_size=256M
# optional
node_cache_array_fraction=5
relationship_cache_array_fraction=5

# The GC resistant cache described below is only available in the
# Neo4j Enterprise Edition.
# cache_type values: soft (default), weak, strong
cache_type=gcr

neo4j: Concurrency

neo4j-server.properties

# concurrent HTTP requests that the server will service.
org.neo4j.server.webserver.maxthreads=64

neo4j: HA

neo4j-server.properties

org.neo4j.server.database.mode=HA

neo4j.properties

ha.server_id=1

ha.initial_hosts=server1:5001,server2:5001
#ha.discovery.url=http://example.com/list

# Host & port to bind the cluster management communication.
ha.cluster_server=server1:5001

# Hostname and port to bind the HA server.
ha.server=my-domain.com:6001

##### Optional cluster strategies #####
# Interval of pulling updates from master.
ha.pull_interval=10s

# The number of slaves the master will ask to replicate a committed
# transaction.
ha.tx_push_factor=1

# Push strategy of a transaction to a slave during commit: fixed or round_robin.
ha.tx_push_strategy=fixed
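In a multi-instance cluster, only a few of these HA lines differ between members. A minimal Python sketch for rendering them per instance is shown below. The helper `render_ha_properties` is hypothetical, the host name `server3` is an assumption added to make a three-instance example, and mapping `ha.server_id` to a position in the host list is a convention of this sketch only, not a neo4j requirement.

```python
def render_ha_properties(server_id, hosts, cluster_port=5001, ha_port=6001):
    """Render the per-member HA lines of neo4j.properties.

    server_id is 1-based and, by this sketch's own convention, selects
    the member's host name from `hosts`.
    """
    if not 1 <= server_id <= len(hosts):
        raise ValueError("server_id must be between 1 and len(hosts)")
    me = hosts[server_id - 1]
    initial_hosts = ",".join(f"{host}:{cluster_port}" for host in hosts)
    return "\n".join([
        f"ha.server_id={server_id}",
        f"ha.initial_hosts={initial_hosts}",
        f"ha.cluster_server={me}:{cluster_port}",
        f"ha.server={me}:{ha_port}",
    ])


if __name__ == "__main__":
    cluster = ["server1", "server2", "server3"]  # hypothetical host names
    for server_id in range(1, len(cluster) + 1):
        print(f"# --- member {server_id} ---")
        print(render_ha_properties(server_id, cluster))
```

Generating the files from one host list keeps `ha.initial_hosts` identical on every member, which is easy to get wrong when editing by hand.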

Use

File System

$PATH_TO_NEO4J = /opt/neo4j

/opt/neo4j/bin (/usr/bin/mysql)
    neo4j
    neo4j-backup

/opt/neo4j/conf (/etc/mysql)
    neo4j.properties
    neo4j-server.properties
    neo4j-wrapper.conf

/opt/neo4j/data (/var/lib/mysql)

/opt/neo4j/data/graph.db (/var/lib/mysql/data)
    The actual graph data

/opt/neo4j/data/log (/var/log/mysql)
    All logs

Use

Indexes

The database itself is a natural index

Lucene for searches

neo4j 2.0:

Nodes have labels (Person, Location, etc.) that group them into sets

CREATE INDEX ON :Person(name)

Look familiar?

CREATE INDEX id_index ON Person (id);

Use

Indexes

neo4j 2.0:

Properties can have unique constraints

CREATE CONSTRAINT ON (book:Book) ASSERT book.isbn IS UNIQUE

Look familiar?

CREATE UNIQUE INDEX email_index ON Person (email);

Use

Indexes

Current 1.9.x:

Auto indexing (deprecated):

one for nodes, one for relationships

off by default

Use

Querying

mysql> select * from graph_local limit 10;
+----+-------------------+---------+---------------+------------+
| id | graph_template_id | host_id | snmp_query_id | snmp_index |
+----+-------------------+---------+---------------+------------+
|  1 |                12 |       1 |             0 |            |
|  2 |                 9 |       1 |             0 |            |
|  3 |                10 |       1 |             0 |            |
|  4 |                 8 |       1 |             0 |            |
|  5 |                58 |       2 |             0 |            |
|  6 |                62 |       2 |             0 |            |
|  7 |                53 |       2 |             0 |            |
|  8 |                37 |       2 |             0 |            |
|  9 |                67 |       2 |             0 |            |
| 10 |                65 |       2 |             0 |            |
+----+-------------------+---------+---------------+------------+
10 rows in set (0.00 sec)

http://www.mysql.com/products/workbench/

Use

Querying via REST

POST http://localhost:7474/db/data/cypher
Accept: application/json; charset=UTF-8
Content-Type: application/json

{
  "query" : "start x = node:node_auto_index(name={startName}) match path = (x-[r]-friend) where friend.name = {name} return TYPE(r)",
  "params" : {
    "startName" : "I",
    "name" : "you"
  }
}

Example response:

200: OK
Content-Type: application/json; charset=UTF-8

{
  "columns" : [ "TYPE(r)" ],
  "data" : [ [ "know" ] ]
}
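The same request can be issued from a script. Below is a minimal Python sketch using only the standard library. The function names `build_cypher_request` and `run_cypher` are hypothetical; the URL, headers, query, and parameters are copied from the example above. Running `run_cypher` requires a neo4j server listening at that address with the legacy Cypher endpoint enabled, so the `__main__` section only builds and prints the request.

```python
import json
import urllib.request

# Endpoint and query taken from the REST example above.
CYPHER_URL = "http://localhost:7474/db/data/cypher"
QUERY = ("start x = node:node_auto_index(name={startName}) "
         "match path = (x-[r]-friend) "
         "where friend.name = {name} return TYPE(r)")


def build_cypher_request(query, params, url=CYPHER_URL):
    """Build (but do not send) the POST request for a parameterized query."""
    body = json.dumps({"query": query, "params": params}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Accept": "application/json; charset=UTF-8",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def run_cypher(query, params, url=CYPHER_URL):
    """Send the query and return the decoded JSON response.

    Needs a reachable server; not called in the demo below.
    """
    request = build_cypher_request(query, params, url)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    req = build_cypher_request(QUERY, {"startName": "I", "name": "you"})
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
```

Passing values through `params` instead of concatenating them into the query string mirrors the placeholder style of the example and avoids quoting problems.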

DBA Perspective

Use the best database for the job, or both

neo4j ships with great tools

neo4j is easier to configure: fewer options, less complex, still flexible for optimization

HA is more robust and more opaque than basic replication

For better or worse, JVM handles a lot for you

Authorization - it's up to you

Scaling up is easier than changing your data model

We're hiring

jobs@fiftythree.com

Thank You!

Thanks to:

Aseem Kishore @aseemk

Chris Leishman @cleishm

Max De Marzi @maxdemarzi