Dynamo and BigTable - Review and Comparison

Dynamo and BigTable Review and Comparison IEEEI 2014 Grisha Weintraub


Page 1: Dynamo and BigTable - Review and Comparison

Dynamo and BigTable Review and Comparison

IEEEI 2014 Grisha Weintraub

Page 2: Dynamo and BigTable - Review and Comparison

Outline

• Introduction to NoSQL

• Introduction to Dynamo and BigTable

• Dynamo vs. BigTable comparison

• Open source implementations

Page 3: Dynamo and BigTable - Review and Comparison

Introduction to NoSQL

• New generation of databases

• Response to a “big data” challenge

• Main characteristics:

– Non-relational

– Distributed

– Fault tolerant

– Scalable

Page 4: Dynamo and BigTable - Review and Comparison

Introduction to NoSQL

Page 5: Dynamo and BigTable - Review and Comparison

Dynamo and BigTable - Introduction

Dynamo (Amazon)

• Giuseppe DeCandia, et al.: Dynamo: Amazon's Highly Available Key-value Store. SOSP 2007

• Highlighted in the title: Highly Available, Key-value

BigTable (Google)

• Fay Chang, et al.: BigTable: A Distributed Storage System for Structured Data. OSDI 2006

• Highlighted in the title: Structured Data

Page 6: Dynamo and BigTable - Review and Comparison

Dynamo vs. BigTable

Criteria compared for BigTable and Dynamo:

• Architecture

• Data model

• API

• Security

• Partitioning

• Replication

• Storage

• Membership and failure detection

Page 7: Dynamo and BigTable - Review and Comparison

Architecture

Dynamo

• Decentralized:

– Every node has the same set of responsibilities as its peers.

– There is no single point of failure.

BigTable

• Centralized:

– Single master node maintains all system metadata.

– Other nodes (tablet servers) handle read and write requests.

[Figure: Master]

Page 8: Dynamo and BigTable - Review and Comparison

Data Model

Dynamo

• Key-value – data is stored as <key, value> pairs, where the key is a unique identifier and the value is an arbitrary entry.

BigTable

• Multidimensional sorted map – the map is indexed by a row key and a column key, and ordered by row key. Column keys are grouped into sets called column families.

Dynamo example:

Key | Value
188 | { "Name" : "John", "Email" : "[email protected]", "Card" : "6652" }
145 | { "Name" : "Bob", "Phone" : "781455", "Card" : "9875" }

BigTable example:

User ID | Personal Data | Financial Data
145 | Name = "Bob", Phone = "781455" | Card = "9875"
188 | Name = "John", Email = "[email protected]" | Card = "6652"

Here User ID is the row key, Personal Data and Financial Data are column families, and Name, Phone, Email and Card are column keys.

Page 9: Dynamo and BigTable - Review and Comparison

API

Dynamo

• get – returns an object associated with the given key.

• put – associates the given object with the specified key.

BigTable

• get – returns values from the individual rows.

• scan – iterates over multiple rows.

• put – inserts a value to the specified table's cell.

• delete – deletes a whole row or a specified cell inside a particular row.
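The difference between the two interfaces can be sketched in a few lines. This is a minimal in-memory illustration, not either system's actual client API: the class names `KeyValueStore` and `SortedTable`, the `family:column` naming of column keys, and the half-open range used by `scan` are all assumptions made for the example, and details such as versioning are left out.

```python
from typing import Dict, Iterator, Optional, Tuple


class KeyValueStore:
    """Dynamo-style interface: an opaque value addressed by a single key."""

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}

    def get(self, key: str) -> Optional[bytes]:
        # Returns the object associated with the key, or None if absent.
        return self._data.get(key)

    def put(self, key: str, value: bytes) -> None:
        # Associates the object with the key, replacing any previous value.
        self._data[key] = value


class SortedTable:
    """BigTable-style interface: cells addressed by (row key, column key)."""

    def __init__(self) -> None:
        self._rows: Dict[str, Dict[str, str]] = {}

    def put(self, row: str, column: str, value: str) -> None:
        # Inserts a value into one cell of the table.
        self._rows.setdefault(row, {})[column] = value

    def get(self, row: str) -> Dict[str, str]:
        # Returns a copy of all cells of one row (empty if the row is absent).
        return dict(self._rows.get(row, {}))

    def scan(self, start: str, stop: str) -> Iterator[Tuple[str, Dict[str, str]]]:
        # Iterates over rows in row-key order, with start <= row key < stop.
        for row in sorted(self._rows):
            if start <= row < stop:
                yield row, dict(self._rows[row])

    def delete(self, row: str, column: Optional[str] = None) -> None:
        # Deletes a whole row, or a single cell when a column is given.
        if column is None:
            self._rows.pop(row, None)
        elif row in self._rows:
            self._rows[row].pop(column, None)
            if not self._rows[row]:
                del self._rows[row]
```

The key-value store can only answer "what is stored under this key", while the sorted table can also answer range questions, because its rows are kept in key order.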

Page 10: Dynamo and BigTable - Review and Comparison

Security

Dynamo

• No security features

BigTable

• Access control rights are granted at column family level.

Row Key | Personal Data | Financial Data
145 | Name = "Bob", Phone = "781455" | Card = "9875"
188 | Name = "John", Email = "[email protected]" | Card = "6652"

[Figure: three example access levels: views Personal Data; views/updates Personal Data; views/updates all the data]
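Access control at column family level can be pictured as a lookup keyed by user and column family. The sketch below is an illustration only: the user names (`viewer`, `editor`, `admin`), the `ACL` table layout and the `is_allowed` helper are hypothetical and are not taken from the BigTable paper.

```python
from typing import Dict, Set

# Hypothetical permission table: user -> column family -> allowed operations.
ACL: Dict[str, Dict[str, Set[str]]] = {
    "viewer": {"personal": {"read"}},
    "editor": {"personal": {"read", "write"}},
    "admin": {
        "personal": {"read", "write"},
        "financial": {"read", "write"},
    },
}


def is_allowed(user: str, family: str, operation: str) -> bool:
    """Return True if the user may perform the operation on the column family."""
    # Unknown users and unlisted column families get no rights.
    return operation in ACL.get(user, {}).get(family, set())
```

Because rights attach to the column family rather than to individual cells, one entry covers every column key in that family for every row.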

Page 11: Dynamo and BigTable - Review and Comparison

Partitioning

Dynamo

• Consistent hashing:

– Each node is assigned a random position on the ring.

– A key is hashed to a fixed point on the ring.

– The node is chosen by walking clockwise from the hash location.
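The three steps of consistent hashing can be sketched as follows. This is a simplified illustration under stated assumptions: the `Ring` class and `ring_position` helper are hypothetical names, node positions are derived from a hash of the node name (so the example is reproducible) rather than drawn at random, and MD5 is used only as a convenient stable hash.

```python
import bisect
import hashlib
from typing import List, Tuple


def ring_position(name: str) -> int:
    """Map a string to a point on a ring of size 2**32."""
    digest = hashlib.md5(name.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


class Ring:
    def __init__(self, nodes: List[str]) -> None:
        # Assumption: a node's position is the hash of its name.
        self._points: List[Tuple[int, str]] = sorted(
            (ring_position(node), node) for node in nodes
        )
        self._positions: List[int] = [pos for pos, _ in self._points]

    def node_for(self, key: str) -> str:
        """Walk clockwise from hash(key) to the first node at or after it."""
        idx = bisect.bisect_left(self._positions, ring_position(key))
        # Past the last node, wrap around to the first node on the ring.
        return self._points[idx % len(self._points)][1]
```

For example, `Ring(["A", "B", "C", "D", "E", "F", "G"]).node_for("user:188")` returns the same node on every call, and adding or removing a node only moves the keys that fall between that node and its predecessor on the ring.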

BigTable

• Data is stored ordered by row key.

• Each table consists of a set of tablets.

• Each tablet is assigned to exactly one tablet server.

• METADATA table stores the location of a tablet under a row key.
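Range-based lookup can be sketched with a sorted list of tablet end keys. The code below is a simplified illustration, not the real METADATA layout: the `METADATA` list, the integer row keys, the tablet-to-server assignments and the `locate` helper are assumptions chosen for the example.

```python
import bisect
from typing import List, Tuple

# Hypothetical METADATA contents, sorted by the last row key of each tablet:
# (last row key in the tablet, tablet name, tablet server).
METADATA: List[Tuple[int, str, str]] = [
    (20000, "Tablet 1", "Tablet Server 1"),
    (25000, "Tablet 2", "Tablet Server 2"),
]
END_KEYS: List[int] = [end for end, _, _ in METADATA]


def locate(row_key: int) -> Tuple[str, str]:
    """Return (tablet, tablet server) for the tablet whose range holds row_key."""
    # The first tablet whose end key is >= row_key covers that row.
    idx = bisect.bisect_left(END_KEYS, row_key)
    if idx == len(METADATA):
        raise KeyError(f"row key {row_key} is beyond the last tablet")
    _, tablet, server = METADATA[idx]
    return tablet, server
```

Because rows are stored in key order, a lookup is a binary search over tablet boundaries, and a scan over adjacent row keys usually touches a single tablet.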

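[Figure: Dynamo ring with nodes A to G; hash(key) marks a position on the ring]

[Figure: table rows ordered by id and split into tablets; Tablet 1 covers ids 15000 to 20000, Tablet 2 covers ids 20001 to 25000]

[Figure: Tablet Server 1 holds Tablet-51, Tablet-32, Tablet-16 and Tablet-1; Tablet Server 2 holds Tablet-11, Tablet-7, Tablet-8 and Tablet-21]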

Page 12: Dynamo and BigTable - Review and Comparison

Replication

Dynamo

• Each data item is replicated at N nodes (N is a user-defined parameter).

• Each key K is assigned to a coordinator node.

• Coordinator stores the data associated with K locally, and also replicates it at the N-1 healthy clockwise successor nodes in the ring.
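Choosing the replica set for a key can be sketched as a clockwise walk from the coordinator. This is a minimal illustration with assumptions: the `preference_list` function and its parameters are hypothetical names, the ring is given as a list of node names in clockwise order, and the coordinator itself is assumed to be healthy.

```python
from typing import List, Set


def preference_list(ring: List[str], coordinator: str, n: int, down: Set[str]) -> List[str]:
    """Return the coordinator plus up to n - 1 healthy clockwise successors."""
    start = ring.index(coordinator)
    replicas = [coordinator]
    # Visit every other node at most once, in clockwise order.
    for step in range(1, len(ring)):
        if len(replicas) == n:
            break
        candidate = ring[(start + step) % len(ring)]
        if candidate not in down:  # skip nodes currently considered failed
            replicas.append(candidate)
    return replicas
```

With the ring `["A", "B", "C", "D", "E", "F", "G"]`, coordinator `"B"` and N = 3, the result is `["B", "C", "D"]`; if `"C"` is down, the walk skips it and returns `["B", "D", "E"]`.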

BigTable

• Each tablet is stored in GFS as a sequence of read-only files called SSTables.

• SSTables are divided into fixed-size chunks, and these chunks are stored on chunkservers.

• Each chunk in GFS is replicated across multiple chunkservers.

[Figure: Dynamo ring with nodes A to G and N = 3; hash(key) marks a position on the ring]

[Figure: SSTable1 to SSTable3 divided into Chunk1 to Chunk3; each chunk is stored on two of Chunkserver 1, Chunkserver 2 and Chunkserver 3]

Page 13: Dynamo and BigTable - Review and Comparison

Storage

Dynamo

• Each node in Dynamo has a local persistence engine where data items are stored as binary objects.

• Different Dynamo instances may use different persistence engines (e.g., MySQL, BDB).

• Applications choose the persistence engine based on their object size distribution.

BigTable

• Data is stored in GFS in SSTable file format.

• SSTable is an immutable ordered map, whose keys and values are arbitrary strings.

• SSTable supports "get by key" and "get by key range" requests.
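The two SSTable request types map naturally onto binary search over sorted keys. The class below is an in-memory sketch for illustration, not the on-disk SSTable format; the class layout and the half-open range convention of `scan` are assumptions.

```python
import bisect
from typing import Dict, Iterator, Optional, Tuple


class SSTable:
    """Immutable ordered map from string keys to string values."""

    def __init__(self, entries: Dict[str, str]) -> None:
        items = sorted(entries.items())
        # Tuples are used so the contents cannot be modified after creation.
        self._keys: Tuple[str, ...] = tuple(k for k, _ in items)
        self._values: Tuple[str, ...] = tuple(v for _, v in items)

    def get(self, key: str) -> Optional[str]:
        """Get by key: binary search for an exact match."""
        idx = bisect.bisect_left(self._keys, key)
        if idx < len(self._keys) and self._keys[idx] == key:
            return self._values[idx]
        return None

    def scan(self, start: str, stop: str) -> Iterator[Tuple[str, str]]:
        """Get by key range: yield pairs with start <= key < stop, in order."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, stop)
        for i in range(lo, hi):
            yield self._keys[i], self._values[i]
```

Since the map never changes after it is built, updates have to be written to a new SSTable rather than applied in place.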

Page 14: Dynamo and BigTable - Review and Comparison

Membership and Failure detection

Dynamo

• Gossip-based protocol:

– Each node contacts a peer chosen at random every second and the two nodes exchange their membership data (every node maintains a persistent view of the membership).
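One gossip round can be simulated in a few lines. This is a toy, single-process sketch with assumptions: the `Node` class, the per-member version counters and the `gossip_round` helper are hypothetical, peers are picked from a shared list of all nodes (standing in for however nodes first discover each other), and nothing is persisted or sent over a network.

```python
import random
from typing import Dict, List


class Node:
    def __init__(self, name: str) -> None:
        self.name = name
        # Membership view: member name -> version of the latest known change.
        self.view: Dict[str, int] = {name: 1}

    def exchange(self, peer: "Node") -> None:
        """Merge both views, keeping the highest version seen for each member."""
        merged = dict(self.view)
        for member, version in peer.view.items():
            if version > merged.get(member, 0):
                merged[member] = version
        # After the exchange both nodes hold the same merged view.
        self.view = dict(merged)
        peer.view = dict(merged)


def gossip_round(nodes: List[Node], rng: random.Random) -> None:
    """Each node contacts one other node chosen at random (one 'second')."""
    for node in nodes:
        peer = rng.choice([other for other in nodes if other is not node])
        node.exchange(peer)
```

Calling `gossip_round` repeatedly on nodes A to G spreads every node's entry to all the others without any central coordinator, which is the property the decentralized design relies on.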

BigTable

• Failed tablet servers are identified by regular handshakes between the master and all tablet servers.

[Figure: Dynamo ring with nodes A to G; BigTable Master]

Page 15: Dynamo and BigTable - Review and Comparison

Dynamo vs. BigTable

Criterion | BigTable | Dynamo
Architecture | centralized | decentralized
Data model | sorted map | key-value
API | get, put, scan, delete | get, put
Security | access control | no
Partitioning | key range based | consistent hashing
Replication | chunkservers in GFS | successor nodes in the ring
Storage | SSTables in GFS | plug-in
Membership and failure detection | handshakes initiated by master | gossip-based protocol

Page 16: Dynamo and BigTable - Review and Comparison

Open source implementations

Page 17: Dynamo and BigTable - Review and Comparison

Thank You