Scale Out Your Graph Across Servers and Clouds with OrientDB
#gdsf17
Luca Garulli, Founder and CEO @lgarulli GraphDay - San Francisco - June 17, 2017
Copyright (c) - OrientDB LTD
We all want the same thing: an open source GraphDB that is
Fast, Flexible, Scalable and … Unbreakable!
Systems/Complexity
[Chart: complexity grows with scalability, from Single Thread to Multi Thread to Distributed systems]
Auto-Discovery
[Diagram: the first Master Node starts alone ("I'm the only one!") and serves a client]
[Diagram: a second Master Node starts, is auto-discovered and joins ("Connected!")]
Clients See the Distributed Configuration
The updated distributed configuration is broadcast to all the connected clients.
Auto-Reconnect in Case of Failure
In case of failure, the clients auto-reconnect to the available nodes.
Database Auto-Deploy
Databases are automatically deployed to the newly joined nodes.
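Under the hood, the auto-discovery shown above is delegated to Hazelcast: nodes find each other through the multicast settings in config/hazelcast.xml. A minimal sketch, assuming a stock OrientDB 2.2 distribution (the group and port shown are the shipped defaults; adjust them for your network):

```xml
<hazelcast>
  <network>
    <!-- Port each node binds for cluster traffic; auto-increment lets
         several nodes share one host during testing -->
    <port auto-increment="true">2434</port>
    <join>
      <!-- Multicast-based auto-discovery: all nodes using the same
           group/port find each other automatically -->
      <multicast enabled="true">
        <multicast-group>235.1.1.1</multicast-group>
        <multicast-port>2434</multicast-port>
      </multicast>
    </join>
  </network>
</hazelcast>
```

On networks where multicast is blocked (typical in clouds), Hazelcast's tcp-ip join with an explicit member list is the usual alternative.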
Replication: Under the Hood (Client Commits a Transaction)
[Diagram: the client sends the transaction to one Master Node; the transaction requests are appended to the HA Queue of every Master Node]
Replication: Under the Hood (Response Handling)
[Diagram: with writeQuorum = 2, as soon as 2 nodes respond OK the coordinator sends OK back to the client]
Replication: Under the Hood (Fix the Unaligned Node)
[Diagram: nodes whose response matches the quorum receive a Commit message; the unaligned node receives a Fix]
Replication: Under the Hood (2pc Messages)
- Fix (response != quorum)
- Commit (response == quorum)
- Rollback (quorum not reached)
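The three outcomes can be sketched as a small decision function. This is illustrative only (the class and method names are not OrientDB's internal API): each node's 2pc message depends on whether the quorum was reached and whether that node's response matched the winning result.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class QuorumDecision {
    public enum Outcome { COMMIT, FIX, ROLLBACK }

    /** Decide the 2pc message for one node, given all responses and the writeQuorum. */
    public static Outcome decide(List<String> responses, String nodeResponse, int writeQuorum) {
        // Count how many nodes returned each result
        Map<String, Long> counts = responses.stream()
            .collect(Collectors.groupingBy(r -> r, Collectors.counting()));
        // The result agreed on by the most nodes
        String winner = counts.entrySet().stream()
            .max(Map.Entry.comparingByValue())
            .map(Map.Entry::getKey).orElse(null);
        if (winner == null || counts.get(winner) < writeQuorum)
            return Outcome.ROLLBACK;           // quorum not reached
        return nodeResponse.equals(winner)
            ? Outcome.COMMIT                   // response == quorum
            : Outcome.FIX;                     // response != quorum
    }
}
```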
Optimistic MVCC Distributed Transaction
Node A becomes the transaction coordinator:
(1) Node A locks all records in order and executes the TX locally
(2) Node A sends the tx task to Master Node B
(3) Node B locks all records in order and executes the TX locally
(4) Node B returns the result
(5) Node A checks the results and unlocks all records
(6) Node A sends the response back to the client
(7) Node A sends the asynch* commit, rollback or fix message
(8) Node B executes the message and unlocks all records
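Locking all records "in order" (steps 1 and 3) is the classic trick to avoid deadlocks between concurrent transactions: if every transaction acquires its locks in the same global order, no two can hold each other's locks. A minimal single-JVM illustration (hypothetical class, not OrientDB's actual lock manager):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

public class OrderedRecordLocks {
    private final Map<String, ReentrantLock> locks = new ConcurrentHashMap<>();

    /** Lock all record ids in sorted order, so two transactions can never
     *  acquire the same pair of locks in opposite order. */
    public List<String> lockAll(List<String> recordIds) {
        List<String> ordered = new ArrayList<>(recordIds);
        ordered.sort(String::compareTo);               // deterministic global order
        for (String rid : ordered)
            locks.computeIfAbsent(rid, k -> new ReentrantLock()).lock();
        return ordered;                                // acquisition order, for unlock
    }

    public void unlockAll(List<String> ordered) {
        for (int i = ordered.size() - 1; i >= 0; i--)  // release in reverse order
            locks.get(ordered.get(i)).unlock();
    }
}
```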
Consistency
- Distributed locks don't block reads
- During the transaction, the old version of the record is retrieved
- The 2pc message is asynchronous, so a record may not be updated yet on server X
- If you need stronger consistency, don't use the ROUND_ROBIN_REQUEST load balancer policy
Default Configuration (JSON)

{
  "autoDeploy": true,
  "writeQuorum": "majority",
  "newNodeStrategy": "static",
  "clusters": {
    "*": { "servers": ["<NEW_NODE>"] }
  }
}

- autoDeploy: deploys the database automatically on new nodes
- writeQuorum: the number of nodes that must agree on each write (writeQuorum=2 means at least 2 nodes); set it to "majority" to ensure consistency
- newNodeStrategy: "static" (the default) or "dynamic"; with "static", once nodes join they remain part of the configuration
- clusters: the distributed configuration per cluster
- the "*" cluster is the default cluster configuration; any new node joins it, containing all the clusters
Custom Setting per Cluster
The "customer" cluster extends the default configuration with a custom writeQuorum:

{
  "autoDeploy": true,
  "writeQuorum": "majority",
  "newNodeStrategy": "static",
  "clusters": {
    "*": { "servers": ["<NEW_NODE>"] },
    "customer": { "writeQuorum": 3, "servers": ["<NEW_NODE>"] }
  }
}
What about scalability?
Performance with Reads
[Diagram: 3 Master Nodes, each serving its own client at 10,000 req/sec]
Full replication provides linear scalability on reads, but what about scalability on writes?
Performance with Writes
[Diagram: 1 Master Node: 12,000 req/sec]
[Diagram: 2 Master Nodes: 7,000 req/sec each]
[Diagram: 3 Master Nodes: 5,000 req/sec each]
The more replicas you configure, the more the propagation cost will impact performance, based on writeQuorum.
In order to scale up writes, you need sharding + replication. We used a solution similar to RAID for hard drives.
Sharding in OrientDB v2.2
Assign 1 Cluster per Node
[Diagram: the class Customer is split into clusters customer_usa, customer_europe and customer_china, assigned to Master Nodes usa, europe and china respectively]
RAID for Databases
[Diagram: each node owns its own Customer cluster and also holds a copy of another node's cluster: replica factor = 2]
Traversal with Spark (Pregel)
https://spark.apache.org/docs/latest/graphx-programming-guide.html#pregel-api
Sharding Configuration

LEGEND: X = Owner, o = Copy

+---------------+-----------+----------+-------+-------+-------+
|               |           |          |MASTER |MASTER |MASTER |
|               |           |          |ONLINE |ONLINE |ONLINE |
+---------------+-----------+----------+-------+-------+-------+
|CLUSTER        |writeQuorum|readQuorum| usa   |europe |china  |
+---------------+-----------+----------+-------+-------+-------+
|*              |     2     |    1     |   X   |       |       |
|customer_usa   |     2     |    1     |   X   |       |   o   |
|customer_europe|     2     |    1     |   o   |   X   |       |
|customer_china |     2     |    1     |       |   o   |   X   |
+---------------+-----------+----------+-------+-------+-------+
Sharding Configuration
The first node in the list is the "owner" for that cluster:

{
  "clusters": {
    "customer_usa":    { "servers": ["usa", "china"] },
    "customer_europe": { "servers": ["europe", "usa"] },
    "customer_china":  { "servers": ["china", "europe"] }
  }
}
Static Owner
The owner can be static. If the owner is not online, new records can't be inserted in the cluster:

{
  "clusters": {
    "client_usa": {
      "owner": "usa",
      "servers": ["usa", "europe", "asia"]
    }
  }
}

This ensures the owner is not assigned dynamically.
Performance with Sharding
[Diagram: 1 Master Node: 5,000 req/sec]
[Diagram: 2 Master Nodes: 5,000 req/sec each]
[Diagram: 5 Master Nodes: 5,000 req/sec each]
Each write is replicated on only 2 nodes.
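The figures above can be reproduced with a rough back-of-the-envelope model (illustrative only, not a benchmark): each node can apply a fixed number of writes per second, and every logical write lands on replicaFactor nodes, so the aggregate logical write throughput is nodes * perNodeOps / replicaFactor.

```java
public class WriteThroughput {
    /**
     * Rough aggregate write-throughput model: each of the N nodes applies
     * `perNodeOps` writes/sec, and every logical write is applied on
     * `replicaFactor` nodes (full replication: replicaFactor == nodes).
     */
    public static double aggregate(int nodes, double perNodeOps, int replicaFactor) {
        return nodes * perNodeOps / replicaFactor;
    }
}
```

With full replication on 3 nodes, adding nodes does not add logical write capacity; with a fixed replica factor of 2, capacity grows linearly with the number of shards, which is what the sharding slides show.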
[Diagram: many Master Nodes, each serving several clients]
Linear and Elastic Scalability on both Reads & Writes!
Replica-Only Nodes
[Diagram: Master Nodes for usa plus many Replica Nodes]
Replica servers don't count toward the writeQuorum.
Server Role Configuration
With "*", by default each node is a master; the nodes listed after it are replica only:

{
  "servers": {
    "*": "master",
    "usa_r1": "replica",
    "usa_r2": "replica",
    "usa_r3": "replica",
    "europe_r1": "replica",
    "china1_r1": "replica"
  }
}
Load-Balancing on Client Side
[Diagram: the client spreads its connections across the Master Nodes]
Load-Balancing Configuration

import com.orientechnologies.orient.client.remote.OStorageRemote;
import com.tinkerpop.blueprints.impls.orient.OrientGraph;
import com.tinkerpop.blueprints.impls.orient.OrientGraphFactory;

// Connect through the remote protocol and choose the server per round-robin
final OrientGraphFactory factory = new OrientGraphFactory("remote:localhost/demo");
factory.setConnectionStrategy(OStorageRemote.CONNECTION_STRATEGY.ROUND_ROBIN_CONNECT);
OrientGraph graph = factory.getTx();

Available strategies:
- STICKY
- ROUND_ROBIN_CONNECT
- ROUND_ROBIN_REQUEST
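Conceptually, the two round-robin strategies just cycle through the server list: once per new connection (ROUND_ROBIN_CONNECT) or on every request (ROUND_ROBIN_REQUEST), while STICKY keeps using the first server. A toy picker to illustrate the idea (not the driver's actual implementation):

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class RoundRobinPicker {
    private final List<String> servers;
    private final AtomicInteger next = new AtomicInteger(0);

    public RoundRobinPicker(List<String> servers) {
        this.servers = servers;
    }

    /** ROUND_ROBIN_CONNECT would call this once per new connection;
     *  ROUND_ROBIN_REQUEST would call it on every request. */
    public String pick() {
        // floorMod keeps the index valid even if the counter wraps around
        int i = Math.floorMod(next.getAndIncrement(), servers.size());
        return servers.get(i);
    }
}
```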
What if some records are not aligned for **ANY** reason?
OrientDB Auto-Repairer
- Executed in chain
- Works in batches of 50 records at a time (configurable)
- Strategies:
  - Quorum: checks if it meets the configured write quorum
  - Content: checks the content
  - Majority: checks if there is at least a majority
  - Version: takes the highest version
  - DC (EE only). Example: dc{winner:asia}
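The "Version" strategy above can be pictured as picking, among the copies of a record returned by the replicas, the one with the highest version; that winner would then be written back to the unaligned replicas. A sketch with hypothetical types, not OrientDB internals:

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

public class VersionRepair {
    /** A record copy as seen on one replica: record id, version, payload. */
    public record Copy(String rid, int version, String content) {}

    /** "Version" strategy: the copy with the highest version wins. */
    public static Optional<Copy> winner(List<Copy> copies) {
        return copies.stream().max(Comparator.comparingInt(Copy::version));
    }
}
```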
Auto-Repair Flow
Between Master Node A and Master Node B:
(1) tx read [#10:33, #43:90, #12:23]
(2) lock [#10:33, #43:90, #12:23]
(3) return [{#10:33 v1}, {#43:90 v5}, {#12:23 v9}]
(4) fix [{#43:90 v6}]
Dynamic Timeouts (from v2.2.18)

orientdb> HA STATUS -latency -output=text

REPLICATION LATENCY AVERAGE (in milliseconds)
+-------+-----+------+-----+
|Servers|node1|node2*|node3|
+-------+-----+------+-----+
|node1  |     |  0.60| 0.43|
|node2* | 0.39|      | 0.38|
|node3  | 0.35|  0.53|     |
+-------+-----+------+-----+
Cross Datacenter & Cloud
[Diagram: Dublin and Austin data centers]
Cross data center and cloud replication; data centers don't need to have the same number of servers.
Data-Center (Enterprise only)

"dataCenters": {
  "rome": {
    "writeQuorum": "all",
    "servers": ["europe-0", "europe-1", "europe-2"]
  },
  "austin": {
    "writeQuorum": "all",
    "servers": ["usa-0", "usa-1", "usa-2"]
  }
}
Performance
Yahoo Benchmark, 3 Nodes
[Chart: throughput in ops/sec]
OrientDB is 2x-3x faster than Cassandra on the same HW/SW configuration, with the same workload.
OrientDB v3.1 (Q1 2018)
- DHT-like algorithm = distributed sharded index
- Native support for batch traversal (Pregel-like) without Spark
- Background refactoring of the graph based on usage statistics