Dynamo: Highly Available Key-Value Store
Dennis Kafura – CS5204 – Operating Systems
Context
- Core e-commerce services need scalable, reliable storage for massive amounts of data: hundreds (n × 100) of services, with hundreds of thousands (n × 100,000) of concurrent sessions on key services.
- Size and scalability require a storage architecture that is highly decentralized: high component count, commodity hardware.
- A high component count creates reliability problems (Dynamo "treats failure handling as the normal case").
- Reliability problems are addressed by replication, which raises issues of:
  - Consistency (replicas differ after failure) vs. performance
  - When to enforce consistency (on read, on write) and who enforces it (client, storage system)
System Elements
- Maintains the state of services with high reliability requirements and latency-sensitive performance; controls the tradeoff between consistency and performance.
- Used only internally, so it can leverage the characteristics of services and workloads; non-hostile environment (no security requirements).
- Simple key-value interface: applications do not require more complicated (e.g., database) semantics or a hierarchical name space. The key is a unique identifier for a data item; the value is a binary object (blob). There are no operations over multiple data items.
- Adopts a weaker model of consistency (eventual consistency) in favor of higher availability.
- Service level agreements (SLAs) are stated at the 99.9th percentile. Key factor: service latency at a given request rate. Example: a response time of 300 ms for 99.9% of requests at a peak client load of 500 requests per second. State management (storage) efficiency is a key factor in meeting SLAs.
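A percentile SLA of this form can be sketched as a simple check. The latency samples and the nearest-rank percentile method below are illustrative only, not Amazon's measurement pipeline:

```python
# Sketch: checking an SLA of the form "99.9% of requests under 300 ms".
# Latency samples are made up for illustration.

def percentile(samples, p):
    """Return the value at percentile p (0-100) using the nearest-rank method."""
    ordered = sorted(samples)
    # nearest rank: smallest index covering fraction p of the samples
    rank = max(1, -(-len(ordered) * p // 100))  # ceiling division
    return ordered[int(rank) - 1]

latencies_ms = [12, 20, 35, 48, 250, 90, 15, 310, 22, 18]
p999 = percentile(latencies_ms, 99.9)  # tail latency
sla_met = p999 <= 300                  # 300 ms SLA threshold
```

Note that the SLA is driven by the tail (99.9th percentile), not the mean: a single slow request among thousands can violate it.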
Design Considerations
- Consistency vs. availability: strict consistency makes data unavailable when one of the replicas fails. To improve availability, use a weaker form of consistency (eventual consistency) and allow optimistic updates (changes propagate in the background). This can lead to conflicting changes, which must be detected and resolved.
- Conflicts: Dynamo applications require "always writeable" storage, so conflict detection/resolution is performed on reads.
- Other factors: incremental scalability, symmetry/decentralization (P2P organization/control), heterogeneity (not all servers are the same).
Design Overview
Partitioning
- Interface:
  - get(key): returns a context and either a single object or a list of conflicting objects.
  - put(key, context, object): the context comes from a previous read.
- Object placement/replication:
  - An MD5 hash of the key yields a 128-bit identifier.
  - Consistent hashing places objects on a ring; each key has a preference list of nodes responsible for its replicas.
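A minimal sketch of this placement scheme, assuming one token per node and a hypothetical five-node ring (Dynamo actually assigns multiple virtual nodes per host, which this omits):

```python
import hashlib
from bisect import bisect_right

# Sketch of consistent hashing as described above: MD5(key) gives a 128-bit
# position on a ring, and the preference list is the next N distinct nodes
# clockwise. Node names and N = 3 are illustrative assumptions.

def ring_pos(s):
    return int(hashlib.md5(s.encode()).hexdigest(), 16)  # 128-bit identifier

nodes = ["A", "B", "C", "D", "E"]
ring = sorted((ring_pos(n), n) for n in nodes)  # one token per node here

def preference_list(key, n=3):
    """Return the n nodes clockwise from the key's ring position."""
    idx = bisect_right([pos for pos, _ in ring], ring_pos(key))
    return [ring[(idx + i) % len(ring)][1] for i in range(n)]
```

Because only the ring positions matter, adding or removing a node moves only the key ranges adjacent to its tokens, which is what makes the scheme incrementally scalable.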
Versioning
- Failure-free operation: a put sends replicas to each node in the preference list.
- What to do in case of failure?
[figure: put operations sending replicas; one replica's outcome is marked "?"]
Versioning
- Object content is treated as immutable; an update operation creates a new version.
[figure: a put advances all three replicas from version v1 to version v2]
Versioning
- Versioning can lead to inconsistency due to network partitioning.
[figure: during a partition, a put advances only the reachable replicas from v1 to v2; one replica remains at v1]
Versioning
- Versioning can lead to inconsistency due to concurrent updates.
[figure: concurrent puts a and b create sibling versions v2a and v2b from v1; different replicas receive them in different orders]
Object Resolution
- Uses vector clocks to capture the causal relationship between versions.
- Conflicting versions are passed to the application as the output of a get operation.
- The application resolves the conflict and puts a new (consistent) version.
- Inconsistent versions are rare: 99.94% of get operations saw exactly one version.
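The conflict test can be sketched with a minimal vector clock (the clocks and node names below are illustrative, not Dynamo's internal representation):

```python
# Minimal vector-clock sketch: each version carries {node: counter}. One
# clock "descends" from another if it is >= component-wise; clocks that are
# incomparable represent concurrent updates, i.e. a conflict that the
# application must resolve on read.

def descends(a, b):
    """True if clock a is a descendant of (or equal to) clock b."""
    return all(a.get(node, 0) >= count for node, count in b.items())

def conflict(a, b):
    """True if neither clock descends from the other."""
    return not descends(a, b) and not descends(b, a)

v1 = {"Sx": 1}
v2a = {"Sx": 1, "Sy": 1}  # update coordinated by node Sy
v2b = {"Sx": 1, "Sz": 1}  # concurrent update coordinated by node Sz
```

Here v2a and v2b both descend from v1 but not from each other, so a get would return both, and the application's resolved put would carry a clock covering both histories.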
Handling get/put operations
- Operations are handled by a coordinator: the first among the top N nodes in the preference list. The coordinator is located either by:
  - a call to a load balancer (no Dynamo-specific code is needed in the application, but this may add an extra level of indirection), or
  - a direct call to the coordinator (via a Dynamo-specific client library).
- Quorum voting: R nodes must agree to a get operation and W nodes must agree to a put operation, with R + W > N. (N, R, W) can be chosen to achieve the desired tradeoff; a common configuration is (3, 2, 2).
- "Sloppy quorum": the top N' healthy nodes in the preference list are used, with the coordinator first in this group. A replica sent to a substitute node carries a "hint" naming the (unavailable) node that should hold it. Hinted replicas are stored by the available node and forwarded when the original node recovers.
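A sketch of a sloppy-quorum write with hinted handoff may make the mechanism concrete. The node names and the healthy-set below are illustrative assumptions, not Dynamo internals:

```python
# Sketch: write to the first N *healthy* nodes in the preference list;
# any substitute node records a hint naming the unavailable node whose
# replica it is holding.

N, R, W = 3, 2, 2        # common configuration cited above
assert R + W > N          # read and write quorums must overlap

preference = ["A", "B", "C", "D"]   # top of a key's preference list
healthy = {"A", "C", "D"}           # node B is currently unreachable

def sloppy_write():
    """Return (node, hint) acks for one write, or None if W is not met."""
    intended = preference[:N]                              # ideal holders
    targets = [n for n in preference if n in healthy][:N]  # actual holders
    missed = [n for n in intended if n not in targets]     # unreachable
    acks = []
    for node in targets:
        # substitutes record which unavailable node the replica is for
        hint = missed.pop(0) if node not in intended else None
        acks.append((node, hint))
    return acks if len(acks) >= W else None
```

With B down, the write lands on A, C, and D, and D's copy is tagged with the hint "B" so it can be forwarded when B recovers.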
Replica synchronization
- Detection of inconsistent replicas is accelerated using Merkle trees.
- Each node maintains a separate tree for each key range it hosts.
- Maintaining the Merkle trees adds overhead.
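A small sketch of why Merkle trees help: two replicas compare root hashes for a key range and descend (or re-sync) only when the roots differ, so an already-synchronized range costs a single hash exchange. The tree construction below is a generic illustration, not Dynamo's implementation:

```python
import hashlib

def h(data):
    return hashlib.sha256(data).hexdigest()

def merkle_root(leaves):
    """Hash each leaf, then pairwise-hash levels up to a single root."""
    level = [h(x.encode()) for x in leaves]
    while len(level) > 1:
        if len(level) % 2:                 # duplicate last hash on odd levels
            level.append(level[-1])
        level = [h((level[i] + level[i + 1]).encode())
                 for i in range(0, len(level), 2)]
    return level[0]
```

Equal key ranges yield equal roots; a single differing key changes the root, and the mismatch can be localized by walking down only the differing subtrees.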
Ring membership
- Nodes are explicitly added to or removed from the ring.
- Membership, partitioning, and placement information propagates via periodic exchanges (a gossip protocol).
- Existing nodes transfer key ranges to a newly added node, or receive key ranges from exiting nodes.
- Nodes eventually know the key ranges of their peers and can forward requests to them.
- Some "seed" nodes are well-known.
- Node failures are detected by lack of responsiveness; recovery is detected by periodic retry.
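The periodic-exchange step can be sketched as a merge of version-stamped membership views (the view format and node names are illustrative, not Dynamo's wire protocol):

```python
# Sketch of gossip-style membership exchange: each node keeps a
# version-stamped view, and on exchange the newer entry wins per node,
# so views converge without central coordination.

def gossip_merge(view_a, view_b):
    """Merge two membership views; the higher version wins per node."""
    merged = dict(view_a)
    for node, (version, status) in view_b.items():
        if node not in merged or version > merged[node][0]:
            merged[node] = (version, status)
    return merged

a = {"n1": (3, "up"), "n2": (1, "up")}
b = {"n2": (2, "down"), "n3": (1, "up")}
```

After one exchange both sides learn that n2 is down (version 2 beats version 1) and that n3 exists; repeated pairwise exchanges spread such updates through the whole ring.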
Partition/Placement Strategies

Strategy | Placement                | Partition
1        | T random tokens per node | Consecutive tokens create a partition
2        | T random tokens per node | Q equal-sized partitions
3        | Q/S tokens per node      | Q equal-sized partitions

(S = number of nodes)
Strategy Performance Factors
- Strategy 1: bootstrapping a new node is lengthy.
  - The new node must acquire its key ranges from other nodes.
  - Other nodes process the scanning/transmission of key ranges for the new node as background activities.
  - Bootstrapping has taken a full day during peak periods.
- Numerous nodes may have to adjust their Merkle trees when a node joins or leaves the system.
- The archival process is difficult: key ranges may be in transit, and there is no obvious synchronization/checkpointing structure.
Strategy Performance Factors
- Strategy 2: decouples partitioning and placement, allowing the placement scheme to be changed at run time.
- Strategy 3: also decouples partitioning and placement, and offers faster bootstrapping/recovery and ease of archiving, because key ranges can be segregated into different files that can be shared/archived separately.
Partition Strategies - Performance
- The strategies have different tuning parameters. For a fair comparison, evaluate the skew in their load distributions for a fixed amount of space used to maintain membership information.
- By this measure, Strategy 3 is superior.
Client- vs. Server-Side Coordination
- Any node can coordinate read requests; write requests are handled by a coordinator.
- The state machine for coordination can live in the load-balancing server or be incorporated into the client.
- Client-driven coordination has lower latency because it avoids the extra network hop (redirection).