A New MongoDB Sharding Architecture, Leif Walsh (Tokutek)

Talk by Leif Walsh at HighLoad++ 2014.

Transcript of A New MongoDB Sharding Architecture, Leif Walsh (Tokutek)

Page 1:

A New MongoDB Sharding Architecture for Higher Availability and Better Resource Utilization
Leif Walsh, @leifwalsh

Page 2:

A Traditional MongoDB Cluster

•  3 shards.
•  3 replicas per shard.

Page 3:

A Traditional MongoDB Cluster

•  3x write throughput.
•  3x read throughput.

Page 4:

A Traditional MongoDB Cluster

•  1 node can go down without losing availability.

Page 5:

A Traditional MongoDB Cluster

•  Data can survive destruction of 2 nodes.

Page 6:

General MongoDB Cluster

•  Sx write throughput.
•  Rx read throughput.
•  Fewer than R/2 nodes can go down without losing availability.
•  Data can survive destruction of R-1 nodes.
•  S×R hardware & maintenance cost.
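
The figures for a general cluster can be checked with a few lines of Python. This is only a sketch of the model used on the slides: the names `ClusterProfile` and `traditional_cluster` are made up for illustration, and the availability figure assumes that a replica set needs a strict majority of its members to keep a primary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterProfile:
    write_scale: int         # multiples of single-node write throughput
    read_scale: int          # multiples of single-node read throughput
    tolerated_failures: int  # worst-case nodes that can go down with a primary still electable
    survivable_losses: int   # worst-case nodes that can be destroyed without losing data
    machines: int            # hardware and maintenance cost


def traditional_cluster(shards: int, replicas: int) -> ClusterProfile:
    """Model a cluster of `shards` replica sets with `replicas` members each."""
    if shards < 1 or replicas < 1:
        raise ValueError("shards and replicas must be positive")
    return ClusterProfile(
        write_scale=shards,
        read_scale=replicas,
        # Assumption: a strict majority of each replica set must stay up.
        tolerated_failures=(replicas - 1) // 2,
        # One surviving copy is enough to keep the data.
        survivable_losses=replicas - 1,
        machines=shards * replicas,
    )


# The 3-shard, 3-replica cluster from the earlier slides.
profile = traditional_cluster(shards=3, replicas=3)
print(profile)
```

With S = 3 and R = 3 this reproduces the earlier slides: 3x writes, 3x reads, 1 node of downtime tolerated, 2 nodes of data loss survived, 9 machines.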

Page 7:

TokuMX: MongoDB with Fractal Trees

•  MongoDB fork.
•  Compression, performance, transactions.
•  Details about Fractal Trees after lunch.

Page 8:

TokuMX: MongoDB with Fractal Trees

•  Read-free Replication
•  Fast Updates
•  Optimized Sharding Migrations
•  Ark Consensus for Replication Failover
•  Partitioned Collections
•  Clustering Indexes & Primary Keys
•  tokutek.com/tokumx

Page 9:

Fractal Tree Performance Basics

Writes are cheap:
•  O(1/B) I/Os per op.
•  ≈10k/s

Reads are expensive:
•  Ω(1) I/O per op.
•  ≈100/s
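
The two rates on this slide are consistent with simple arithmetic. The sketch below assumes a disk that sustains about 100 random I/Os per second and a block that holds about B = 100 entries; both numbers are assumptions chosen for illustration and do not come from the talk.

```python
def ops_per_second(disk_iops: int, ops_per_io: int) -> int:
    """Operations per second when each I/O serves `ops_per_io` operations."""
    return disk_iops * ops_per_io


DISK_IOPS = 100      # assumed: random I/Os per second for one disk
BLOCK_ENTRIES = 100  # assumed: B, the number of entries that fit in one block

# Reads need at least one I/O each, so throughput is at most the disk's IOPS.
reads_per_second = ops_per_second(DISK_IOPS, ops_per_io=1)

# Writes are batched, so one I/O carries about B of them (1/B I/Os per write).
writes_per_second = ops_per_second(DISK_IOPS, ops_per_io=BLOCK_ENTRIES)

print(reads_per_second, writes_per_second)
```

Under these assumptions reads come out at about 100/s and writes at about 10k/s, which is the gap the slide describes.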

Page 10:

Read-free Replication

Updates are reads + writes. Secondaries can trust the primary, only do writes.

Page 11:

Read-free Replication

Updates are reads + writes. Secondaries can trust the primary, only do writes. Looking at I/O utilization, secondaries are very cheap compared to primaries.

Page 12:

A Traditional TokuMX Cluster

•  9 machines, only 3x throughput benefit.

•  Secondaries are under-utilized.

Page 13:

A TokuMX Cluster With Read-free Replication

•  3x write throughput.
•  3x read throughput.
•  (maybe separately)

Page 14:

A TokuMX Cluster With Read-free Replication

•  1 node can go down without losing availability.

Page 15:

A TokuMX Cluster With Read-free Replication

•  Data can survive destruction of 2 nodes.

Page 16:

A TokuMX Cluster With Read-free Replication

•  Only 3x hardware cost, down from 9x.

Page 17:

Dynamo Architecture

•  Developed at Amazon.
•  Used by Cassandra, Riak, Voldemort.
•  Many components; I will focus on data partitioning.

http://www.allthingsdistributed.com/2007/10/amazons_dynamo.html

Page 18:

Dynamo Architecture

•  Servers are equal peers, not separate primaries and secondaries.
•  Store overlapping subsets of data (MongoDB shards store disjoint subsets).
•  Data partitioning determined by consistent hashing.

Page 19:

Dynamo Partitioning

•  N servers in a ring.
•  hash(K) is a location around the ring.
•  Store data for K on the next R servers on the ring.
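
The three bullets above can be sketched as a small consistent-hashing ring. This is an illustration rather than Dynamo's actual implementation: the `Ring` class, the use of MD5 truncated to 64 bits, and placing each server at the hash of its own name are all assumptions made for the example.

```python
import bisect
import hashlib


def ring_position(key: str) -> int:
    """Map a string to a location on the ring (here: first 64 bits of MD5)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class Ring:
    def __init__(self, servers: list[str], replicas: int) -> None:
        if not 1 <= replicas <= len(servers):
            raise ValueError("need 1 <= replicas <= number of servers")
        self.replicas = replicas
        # Servers sorted by their location around the ring.
        self._points = sorted((ring_position(s), s) for s in servers)
        self._positions = [pos for pos, _ in self._points]

    def owners(self, key: str) -> list[str]:
        """Return the next R servers clockwise from hash(key)."""
        start = bisect.bisect_right(self._positions, ring_position(key))
        n = len(self._points)
        return [self._points[(start + i) % n][1] for i in range(self.replicas)]


ring = Ring(["s1", "s2", "s3", "s4", "s5"], replicas=3)
print(ring.owners("user:42"))  # three distinct servers, adjacent on the ring
```

Because neighbouring keys share most of their owners, the servers end up holding overlapping subsets of the data, as the previous slide notes.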

Page 20:

Dynamo Partitioning

•  All nodes accept writes: ~linear write scaling.
•  Data replicated R times: Rx read performance/reliability.

Page 21:

Dynamo-style Sharding in TokuMX

•  Each node is primary for some chunks, secondary for others.

•  Nodes store overlapping subsets of the data set.

Page 22:

Dynamo-style Sharding in TokuMX

•  S primaries in the ring: Sx write throughput.

•  R copies of each chunk on separate machines: Rx read throughput, availability & recovery guarantees.
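
One layout consistent with these bullets puts S machines in a ring, makes each machine the primary of one replica set, and places that set's R-1 secondaries on the machines that follow it. The sketch below only illustrates that assumed layout; the function names and the `rs0`, `rs1`, ... replica-set names are hypothetical and are not TokuMX configuration.

```python
def dynamo_style_layout(nodes: list[str], copies: int) -> dict[str, list[str]]:
    """Map each replica set to its members: primary first, then secondaries."""
    if not 1 <= copies <= len(nodes):
        raise ValueError("need 1 <= copies <= number of nodes")
    n = len(nodes)
    return {
        f"rs{i}": [nodes[(i + j) % n] for j in range(copies)]
        for i in range(n)
    }


def roles_per_node(layout: dict[str, list[str]]) -> dict[str, dict[str, list[str]]]:
    """Invert the layout: which replica sets each node leads or follows."""
    roles: dict[str, dict[str, list[str]]] = {}
    for name, (primary, *secondaries) in layout.items():
        roles.setdefault(primary, {"primary": [], "secondary": []})
        roles[primary]["primary"].append(name)
        for node in secondaries:
            roles.setdefault(node, {"primary": [], "secondary": []})
            roles[node]["secondary"].append(name)
    return roles


layout = dynamo_style_layout(["a", "b", "c"], copies=3)
roles = roles_per_node(layout)
print(layout)
print(roles)
```

With S = 3 and R = 3, every machine runs one primary and two cheap secondaries, so three machines carry the load that the traditional cluster spreads over nine.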

Page 23:

Dynamo-style Sharding in TokuMX

•  Adding a node:
    –  Move one secondary from each of the next 2 nodes to the new node.
    –  Initialize a new replica set on the new node and the next 2 nodes.
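
The procedure for adding a node can be traced on a small ring. The sketch assumes the layout in which each node is primary for one replica set whose secondaries sit on the following nodes; `layout_for` and `add_node` are hypothetical helpers that only compute which copies would move, not real administration commands.

```python
def layout_for(ring: list[str], copies: int) -> dict[str, list[str]]:
    """Map each primary node to its replica set: itself plus the next copies-1 nodes."""
    n = len(ring)
    return {ring[i]: [ring[(i + j) % n] for j in range(copies)] for i in range(n)}


def add_node(ring: list[str], new_node: str, position: int, copies: int):
    """Insert new_node into the ring and report the resulting changes.

    Returns (new_ring, moves, new_set) where each move is
    (primary of the affected replica set, node losing a secondary, node gaining it).
    """
    before = layout_for(ring, copies)
    new_ring = ring[:position] + [new_node] + ring[position:]
    after = layout_for(new_ring, copies)
    moves = [
        (primary, old_member, new_node)
        for primary, members in before.items()
        for old_member in members
        if old_member not in after[primary]
    ]
    return new_ring, moves, after[new_node]


# Insert "x" between "b" and "c" in a 4-node ring with R = 3.
new_ring, moves, new_set = add_node(["a", "b", "c", "d"], "x", position=2, copies=3)
print(new_ring)  # ['a', 'b', 'x', 'c', 'd']
print(moves)     # [('a', 'c', 'x'), ('b', 'd', 'x')]
print(new_set)   # ['x', 'c', 'd']
```

In this example one secondary leaves each of "c" and "d", the two nodes after the new one, and the new replica set spans "x", "c" and "d", which matches the two steps on the slide.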

Page 24:

Future Work

Chunk balancer is not sophisticated:
•  Adding/removing machines is rough and overloads the machine’s neighbors.
•  Can we use ideas from Cassandra & Riak to improve this?

MongoDB architecture requires managing multiple processes on each machine.
•  We can do better with good tools. Talk to me if you want to write them.

Page 25:

Thanks! Come to my talk after lunch for details about Fractal Trees.

[email protected] @leifwalsh

tokutek.com/tokumx slidesha.re/13pxgH8