Amazon Dynamo


description

Big data class presentation of the Amazon Dynamo storage service.

Transcript of Amazon Dynamo

Page 1: Amazon Dynamo

Dynamo: Amazon’s Highly Available Key-Value Store

Farley Lai

University of Iowa

[email protected]

February 21, 2014

Farley Lai (UIOWA) Amazon Dynamo (Big Data) February 21, 2014 1 / 14

Page 2: Amazon Dynamo

Motivation

MapReduce processes big data in a parallel and distributed fashion.

Dynamo forms the foundation of big data, namely, the storage.

Shopping Cart

Clients tend to insert and update items frequently but review the cart to check out only at the end. Is it fun when the system keeps asking you to retry in a few minutes every time an item is inserted or updated in the shopping cart?


Page 3: Amazon Dynamo

SOA of Amazon’s Platform


Page 4: Amazon Dynamo

Roles

Service Provider: Amazon

Service: Dynamo, the storage service

Customer: application/service vendors

Client: applications/services

User: human and/or bots

Service Level Agreements (SLA)

SLAs are contracts agreed between service providers and customers, specifying the quality of service guaranteed for a given client access distribution.

Example: a service guaranteeing that it will provide a response within 300 ms for 99.9% of its requests at a peak client load of 500 requests per second.
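A minimal sketch of how the latency part of such an SLA could be checked against measured response times. The function names and the nearest-rank percentile definition are assumptions made for illustration, and the 500 requests per second load condition is not modelled.

```python
def percentile_per_mille(latencies_ms, per_mille):
    """Nearest-rank percentile, with the rank given in thousandths (999 = 99.9%)."""
    if not latencies_ms:
        raise ValueError("no latency samples")
    ordered = sorted(latencies_ms)
    # Ceiling of per_mille * n / 1000 in integer arithmetic (1-based rank).
    rank = max(1, (per_mille * len(ordered) + 999) // 1000)
    return ordered[rank - 1]


def meets_sla(latencies_ms, limit_ms=300, per_mille=999):
    """True if at least 99.9% of the samples completed within limit_ms."""
    return percentile_per_mille(latencies_ms, per_mille) <= limit_ms


if __name__ == "__main__":
    samples = [10] * 999 + [5000]   # one slow request out of 1000
    print(meets_sla(samples))
```

With one slow request in 1000 the check still passes; with two slow requests in 1000 the 99.9th percentile lands on a slow sample and the check fails.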


Page 5: Amazon Dynamo

What is Dynamo?

A distributed key-value storage service built on a ring topology with

high availability for writes

eventual consistency


Page 6: Amazon Dynamo

Requirements and Assumptions

Requirements

Simple read/write to data items identified by unique keys

ACID (atomicity, consistency, isolation and durability): relaxed, trading consistency for availability; no isolation guarantees, single-key updates only

SLA: latency constraints at the 99.9th percentile of the distribution

Assumptions

Trusted environment and machines without security concerns


Page 7: Amazon Dynamo

Problems, Techniques and Advantages

Problem | Technique | Advantage
Partitioning | Consistent hashing | Incremental scalability
High write availability | Vector clocks with conflict resolution | Version size is decoupled from update rates
Temporary failures | Sloppy quorum, hinted handoff | High availability and durability guarantee despite some unavailable replicas
Permanent failures | Merkle trees | Fast replica synchronization
Membership | Gossip protocol | Decentralized registry for storing membership and liveness info
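The membership technique is not revisited on later slides, so here is a rough sketch of the idea. It assumes each node keeps a view mapping node names to a version number and that two nodes reconcile by keeping the higher version per entry; the data layout and the names merge_views and gossip_round are hypothetical, not Dynamo's actual protocol.

```python
def merge_views(mine, theirs):
    """Combine two membership views, keeping the newest version of each entry."""
    merged = dict(mine)
    for node, version in theirs.items():
        if version > merged.get(node, -1):
            merged[node] = version
    return merged


def gossip_round(views, pairs):
    """Apply one round of pairwise exchanges; views maps node -> its local view."""
    for a, b in pairs:
        merged = merge_views(views[a], views[b])
        views[a] = dict(merged)
        views[b] = dict(merged)


if __name__ == "__main__":
    # Initially every node only knows about itself.
    views = {"A": {"A": 1}, "B": {"B": 1}, "C": {"C": 1}}
    gossip_round(views, [("A", "B"), ("B", "C")])
    gossip_round(views, [("A", "C")])
    print(views["A"])
```

After the two rounds every node holds the same view, without any central registry having been consulted.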


Page 8: Amazon Dynamo

Partitioning

Consistent hashing

1. key space

2. tokens assignment

3. replication

4. load distribution

5. node availability

6. node capacity
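The items above can be sketched as a small consistent-hashing ring. This is an illustration under stated assumptions: MD5 output is used as the key space, each node receives several tokens (more for a higher capacity), and a key is replicated on the first n distinct nodes found clockwise from its position. The class name Ring and its parameters are hypothetical.

```python
import bisect
import hashlib


def ring_position(name):
    """Map a string to a point in the key space (128-bit MD5 here)."""
    return int.from_bytes(hashlib.md5(name.encode("utf-8")).digest(), "big")


class Ring:
    def __init__(self, tokens_per_node=8):
        self.tokens_per_node = tokens_per_node
        self._tokens = []  # sorted list of (position, node)

    def add_node(self, node, capacity=1):
        # Token assignment: a node with more capacity gets more tokens.
        for i in range(self.tokens_per_node * capacity):
            bisect.insort(self._tokens, (ring_position(f"{node}#{i}"), node))

    def remove_node(self, node):
        # An unavailable node's ranges fall to its clockwise successors.
        self._tokens = [t for t in self._tokens if t[1] != node]

    def preference_list(self, key, n=3):
        """Replication: first n distinct nodes clockwise from the key."""
        if not self._tokens:
            return []
        positions = [p for p, _ in self._tokens]
        start = bisect.bisect_right(positions, ring_position(key))
        nodes = []
        for i in range(len(self._tokens)):
            node = self._tokens[(start + i) % len(self._tokens)][1]
            if node not in nodes:
                nodes.append(node)
                if len(nodes) == n:
                    break
        return nodes


if __name__ == "__main__":
    ring = Ring()
    for name in ["A", "B", "C", "D"]:
        ring.add_node(name)
    print(ring.preference_list("cart:42"))
```

Removing a node only affects keys whose preference list contained it; the remaining replicas keep their relative order, which is what makes scaling incremental.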


Page 9: Amazon Dynamo

Data Versioning

Operations

1. read() ⇒ get()

2. write() ⇒ put()

3. conflict resolution

4. vector clock
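A minimal sketch of how vector clocks can tell superseded versions from concurrent ones. Clocks are represented here as dictionaries from node name to counter, and the helper names (increment, descends, merge, siblings) as well as the node names Sx, Sy, Sz are illustrative assumptions.

```python
def increment(clock, node):
    """Return a copy of clock with node's counter advanced by one."""
    updated = dict(clock)
    updated[node] = updated.get(node, 0) + 1
    return updated


def descends(a, b):
    """True if a has seen every event recorded in b (a >= b component-wise)."""
    return all(a.get(node, 0) >= count for node, count in b.items())


def merge(a, b):
    """Component-wise maximum of two clocks."""
    return {node: max(a.get(node, 0), b.get(node, 0)) for node in set(a) | set(b)}


def siblings(versions):
    """Keep only the (clock, value) pairs not superseded by another version."""
    kept = []
    for clock, value in versions:
        superseded = any(
            other != clock and descends(other, clock) for other, _ in versions
        )
        if not superseded and (clock, value) not in kept:
            kept.append((clock, value))
    return kept


if __name__ == "__main__":
    d2 = {"Sx": 2}
    d3 = increment(d2, "Sy")   # write handled by node Sy
    d4 = increment(d2, "Sz")   # concurrent write handled by node Sz
    print(siblings([(d2, {"book"}), (d3, {"book", "pen"}), (d4, {"book", "cup"})]))
```

Here d2 is dropped because both later versions descend from it, while d3 and d4 survive as concurrent siblings. A client that reads both can reconcile them (for a shopping cart, for example by taking the union of items) and write back under the merged clock.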


Page 10: Amazon Dynamo

Sloppy Quorum

1. R + W > N, e.g. R = 2, W = 2, N = 3

2. latency
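The quorum condition and the "sloppy" part can be sketched as follows. The function names and the list-based ring order are assumptions for illustration: a write goes to the first n healthy nodes along the ring, and a stand-in node carries a hint naming the unavailable node it covers for. On latency, the coordinator has to wait for R (or W) replies, so the slowest of those replicas determines the response time.

```python
def quorums_overlap(n, r, w):
    """True if every read quorum intersects every write quorum."""
    return r + w > n


def choose_write_targets(ring_order, down, n=3):
    """Pick the first n healthy nodes walking the ring.

    Returns (node, hint) pairs; hint is None for an intended replica, or the
    name of the unavailable intended node that this stand-in covers for.
    """
    intended = ring_order[:n]
    missing = [node for node in intended if node in down]
    targets = []
    for node in ring_order:
        if node in down:
            continue
        if node in intended:
            targets.append((node, None))
        else:
            targets.append((node, missing.pop(0)))  # hinted handoff
        if len(targets) == n:
            break
    return targets


if __name__ == "__main__":
    print(quorums_overlap(n=3, r=2, w=2))
    print(choose_write_targets(["A", "B", "C", "D", "E"], down={"A"}))
```

With node A down, the write lands on B, C and D, and D holds the replica with a hint that it belongs to A, so it can be handed back once A recovers.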


Page 11: Amazon Dynamo

Replica Synchronization

Figure: Merkle hash tree 1

Figure: Merkle hash tree 2
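A small sketch of why Merkle trees make replica synchronization fast: two replicas compare root hashes first and descend only into subtrees that differ. It assumes both replicas hash the same non-empty, equally sized list of key-range buckets; SHA-256 and the function names are illustrative choices.

```python
import hashlib


def _h(data):
    return hashlib.sha256(data).digest()


def build_levels(leaf_values):
    """Return the tree as a list of levels: leaf hashes first, root last."""
    level = [_h(value) for value in leaf_values]
    levels = [level]
    while len(level) > 1:
        # Hash pairs of children; an unpaired last node is promoted unchanged.
        level = [
            _h(level[i] + level[i + 1]) if i + 1 < len(level) else level[i]
            for i in range(0, len(level), 2)
        ]
        levels.append(level)
    return levels


def differing_leaves(a, b, depth=None, index=0):
    """Indices of leaves that differ, visiting only mismatching subtrees."""
    if depth is None:
        depth = len(a) - 1  # start at the root
    if a[depth][index] == b[depth][index]:
        return []  # identical subtree, nothing to transfer
    if depth == 0:
        return [index]
    out = []
    for child in (2 * index, 2 * index + 1):
        if child < len(a[depth - 1]):
            out += differing_leaves(a, b, depth - 1, child)
    return out


if __name__ == "__main__":
    replica_1 = build_levels([b"k1=v1", b"k2=v2", b"k3=v3", b"k4=v4"])
    replica_2 = build_levels([b"k1=v1", b"k2=v2", b"k3=stale", b"k4=v4"])
    print(differing_leaves(replica_1, replica_2))
```

Only the third bucket (index 2) is reported, so only that key range needs to be exchanged; identical replicas are detected with a single root comparison.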


Page 12: Amazon Dynamo

Evaluation: latency


Page 13: Amazon Dynamo

Evaluation: load balance


Page 14: Amazon Dynamo

Evaluation: write buffer
