Dynamo Highly Available Key-Value Store 1Dennis Kafura – CS5204 – Operating Systems.

19
Dynamo Highly Available Key- Value Store 1 Dennis Kafura – CS5204 – Operating Systems

Transcript of Dynamo Highly Available Key-Value Store 1Dennis Kafura – CS5204 – Operating Systems.

Page 1: Dynamo Highly Available Key-Value Store 1Dennis Kafura – CS5204 – Operating Systems.

Dynamo

Highly Available Key-Value Store

1Dennis Kafura – CS5204 – Operating Systems

Page 2: Dynamo Highly Available Key-Value Store 1Dennis Kafura – CS5204 – Operating Systems.

Dynamo

Dennis Kafura – CS5204 – Operating Systems

Context

Core e-commerce services need scalable and reliable storage for massive amounts of data n x 100 of services n x 100,000 concurrent sessions on key services

Size and scalability require a storage architecture that is highly decentralized high component count commodity hardware

High component count creates reliability problems (“treats failure handling as the normal case”)

Address reliability problems by replication Replication raises issues of:

Consistency (replicas differ after failure) Performance

When to enforce consistency (on read, on write) Who enforces consistency (client, storage system)

2

Page 3: Dynamo Highly Available Key-Value Store 1Dennis Kafura – CS5204 – Operating Systems.

Dynamo

System Elements Maintain state of services with

High reliability requirements Latency-sensitive performance Control tradeoff between consistency and performance

Used only internally Can leverage characteristics of services and workloads Non-hostile environment (no security requirements)

Simple key-value interface Applications do not require more complicated (e.g. database) semantics or hierarchical

name space Key is unique identifier for data item; Value is a binary object (blob) No operations over multiple data items

Adopts weaker model of consistency (eventual consistency) in favor of higher availability

Service level agreements (SLA) At 99.9% percentile Key factors: service latency at a given request rate Example: response time of 300ms for 99.9% of requests at peak client load of 500

requests per second State manage (storage) efficiency a key factor in SLAs

Dennis Kafura – CS5204 – Operating Systems 3

Page 4: Dynamo Highly Available Key-Value Store 1Dennis Kafura – CS5204 – Operating Systems.

Dynamo

Design Considerations

Consistency vs. availability Strict consistency means that data is unavailable in case of

failure to one of the replicas To improve availability,

use weaker form of consistency (eventual consistency) allow optimistic updates (changes propagate in the background)

Can lead to conflicting changes which must be detected and resolved

Conflicts Dynamo applications require “always writeable” storage Perform conflict detection/resolution on reads

Other factors Incremental scalability Symmetry/decentralization (P2P organization/control) Heterogeneity (not all servers the same)

Dennis Kafura – CS5204 – Operating Systems 4

Page 5: Dynamo Highly Available Key-Value Store 1Dennis Kafura – CS5204 – Operating Systems.

Dynamo

Design Overview

Dennis Kafura – CS5204 – Operating Systems 5

Page 6: Dynamo Highly Available Key-Value Store 1Dennis Kafura – CS5204 – Operating Systems.

Dynamo

Partitioning

Interface get(key)

Returns context and A single object or a list of conflicting objects

put(key, context, object) Context from previous read

Object placement/replication MD5 hash of key yields

128 bit identifier Consistent hashing

Dennis Kafura – CS5204 – Operating Systems 6

preference list

Page 7: Dynamo Highly Available Key-Value Store 1Dennis Kafura – CS5204 – Operating Systems.

Dynamo

Versioning

Failure free operation

What to do in case of failure?

Dennis Kafura – CS5204 – Operating Systems 7

?

put replicas

put replicas

Page 8: Dynamo Highly Available Key-Value Store 1Dennis Kafura – CS5204 – Operating Systems.

Dynamo

Versioning

Object content is treated as immutable and an update operation creates a new version

Dennis Kafura – CS5204 – Operating Systems 8

v1

v1

v1

put

v2

v2

v2

Page 9: Dynamo Highly Available Key-Value Store 1Dennis Kafura – CS5204 – Operating Systems.

Dynamo

Versioning

Versoning can lead to inconsistency Due to network partitioning

Dennis Kafura – CS5204 – Operating Systems 9

v1

v1

v1

put

v2

v2

Page 10: Dynamo Highly Available Key-Value Store 1Dennis Kafura – CS5204 – Operating Systems.

Dynamo

Versioning

Versoning can lead to inconsistency Due to concurrent updates

Dennis Kafura – CS5204 – Operating Systems 10

puta

putb

v1v2av2b

v1v2bv2a

v1v2av2b

Page 11: Dynamo Highly Available Key-Value Store 1Dennis Kafura – CS5204 – Operating Systems.

Dynamo

Object Resolution

Uses vector-clocks Conflicting versions

passed to application as output of get operation

Application resolves conflicts and puts a new (consistent) version

Inconsistent version rare: 99.94% of get operations saw exactly one version

Dennis Kafura – CS5204 – Operating Systems 11

Page 12: Dynamo Highly Available Key-Value Store 1Dennis Kafura – CS5204 – Operating Systems.

Dynamo

Handling get/put operations

Operating handled by coordinator: First among the top N nodes in the preference list Located by

call to load balancer (no Dynamo-specific node needed in application but may require extra level of indirection)

Direct call to coordinator (via Dynamo-specific client library)

Quorum voting R nodes must agree to a get operation W nodes must agree to a put operation R+W > N (N, R, W) can be chosen to achieve desired tradeoff Common configuration is (3,2,2)

“Sloppy quorum” Top N’ healthy nodes in the preference list Coordinator is first in this group Replicas sent to node contain a “hint” indicating the (unavailable) original node

that should hold the replica Hinted replicas are stored by available node and sent forwarded when original

node recovers.

Dennis Kafura – CS5204 – Operating Systems 12

Page 13: Dynamo Highly Available Key-Value Store 1Dennis Kafura – CS5204 – Operating Systems.

Dynamo

Replica synchronization

Accelerates detection of inconsistent replicas using Merkle tree

Separate tree maintained by each node for each key range

Adds overhead to maintain Merkle trees

Dennis Kafura – CS5204 – Operating Systems 13

Page 14: Dynamo Highly Available Key-Value Store 1Dennis Kafura – CS5204 – Operating Systems.

Dynamo

Ring membership

Nodes are explicitly added to/removed from a ring Membership, partitioning, and placement information

propagates via periodic exchanges (a gossip protocol) Existing nodes transfer key ranges to newly added

node or receive key ranges from exiting nodes Nodes eventually know key ranges of its peers and

can forward requests to them Some “seed” nodes are well-known Nodes failures detected by lack of responsiveness and

recovery detected by periodic retry

Dennis Kafura – CS5204 – Operating Systems 14

Page 15: Dynamo Highly Available Key-Value Store 1Dennis Kafura – CS5204 – Operating Systems.

Dynamo

Partition/Placement Strategies

Dennis Kafura – CS5204 – Operating Systems 15

Strategy Placement Partition

1 T random tokens per node Consecutive tokens create a partition

2 T random tokens per node Q equal sized partitions

3 Q/S tokens per node Q equal sized partitions

S = number of nodes

Page 16: Dynamo Highly Available Key-Value Store 1Dennis Kafura – CS5204 – Operating Systems.

Dynamo

Strategy Performance Factors

Strategy 1 Bootstrapping of new node is lengthy

It must acquire its key ranges from other nodes Other nodes process scanning/transmission of key

ranges for new node as background activities Has taken a full day during peak periods

Numerous nodes many have to adjust their Merkle trees when a new node joins/leaves system

Archival process difficult Key ranges may be in transit No obvious synchronization/checkpointing structure

Dennis Kafura – CS5204 – Operating Systems 16

Page 17: Dynamo Highly Available Key-Value Store 1Dennis Kafura – CS5204 – Operating Systems.

Dynamo

Strategy Performance Factors

Strategy 2 Decouples partition and placement Allows changing of placement scheme at run-time

Strategy 3 Decouples partition and placement Faster bootstrapping/recovery and ease of archiving

because key ranges can be segregates into different files that can be shared/archived separately

Dennis Kafura – CS5204 – Operating Systems 17

Page 18: Dynamo Highly Available Key-Value Store 1Dennis Kafura – CS5204 – Operating Systems.

Dynamo

Partition Strategies - Performance

Strategies have different tuning parameters Fair comparison: evaluate the skew in their load distributions for a

fixed amount of space to maintain membership information Strategy 3 superior

Dennis Kafura – CS5204 – Operating Systems 18

Page 19: Dynamo Highly Available Key-Value Store 1Dennis Kafura – CS5204 – Operating Systems.

Dynamo

Client- vs Server-Side Coordination

Any node can coordinate read requests; write requests handled by coordinator State-machine for coordination can be in load balancing server or

incorporated into client Client-driven coordination has lower latency because it avoids extra network

hop (redirection)

Dennis Kafura – CS5204 – Operating Systems 19