Sangmi Lee Pallickara - CSUcs435/slides/week13-B-2.pdfOverview of Dynamo •Partitions and...


CS435 Introduction to Big Data, Fall 2019, Colorado State University

11/20/2019, Week 13-B, Sangmi Lee Pallickara

11/20/2019 CS435 Introduction to Big Data - Fall 2019 W13.B.0

CS435 Introduction to Big Data

PART 2. LARGE SCALE DATA STORAGE SYSTEMS
NOSQL DATA STORAGE WITH A CASE STUDY OF AMAZON’S DYNAMO

Sangmi Lee Pallickara
Computer Science, Colorado State University
http://www.cs.colostate.edu/~cs435

FAQs

• Term project presentations
  • 12/9 (team 1-6), 12/11 (team 6-12), 12/13 (team 13-16)
  • Please attend at least 2 presentation sessions and ask questions or provide comments
  • Participation score (attendance + question)
  • 12 minutes (including transition time)/team
  • Submit your slides (No PDF!) on Canvas
• Final Report Grading criteria (12 points)
  • Relevance: Is this project using sufficient Big Data technologies? (0~4)
  • Completeness: Does this project present all the required components in depth? (0~4)
  • Challenge: Does this project demonstrate an adequate level of challenge? (0~4)


Today’s topics

• NoSQL storage

Part 2. Large scale data storage system

NoSQL Storage: Key-Value Stores
Introduction


NoSQL databases

• Basic idea
  • Operates without a schema
  • Allows users to add fields without having to define any changes in structure first
  • Useful when dealing with nonuniform data and custom fields

• Stands for “Not Only SQL”

• Handles data access with size and performance that demand a cluster

• Improves the productivity of application development by using a more convenient data interaction style

Polyglot persistence

• Using different data stores in different circumstances
  • Without picking a particular database for all situations

• Most organizations have a mix of data storage technologies for different circumstances


NoSQL Storage Data Model: (1) Key-Value Store

• Imagine a simple hash table
  • All access to the storage is via primary key
  • Get the value for the key
  • Put a value for a key
  • Delete a key
  • Add a key
• “value” is stored as a blob
  • Without caring or knowing what’s inside
  • Application is responsible for understanding data
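The key-value model above can be sketched as a thin wrapper around a hash table. This is a minimal in-memory illustration, not Dynamo's interface; the class name `BlobStore` and the JSON encoding chosen by the application are assumptions made for the example.

```python
import json


class BlobStore:
    """Toy in-memory key-value store (hypothetical class); values are opaque bytes."""

    def __init__(self):
        self._table = {}  # primary key -> blob

    def put(self, key: str, value: bytes) -> None:
        # Adds the key if it is new, or overwrites the blob if it exists.
        self._table[key] = value

    def get(self, key: str) -> bytes:
        return self._table[key]

    def delete(self, key: str) -> None:
        del self._table[key]


store = BlobStore()

# The application, not the store, decides how a value is encoded.
cart = {"user": "alice", "items": ["book", "lamp"]}
store.put("cart:alice", json.dumps(cart).encode("utf-8"))

# The store hands back the same bytes; the application decodes them.
restored = json.loads(store.get("cart:alice").decode("utf-8"))
print(restored)
```

The store never inspects the bytes, so it cannot answer a question such as "which carts contain a lamp"; that kind of query is what the document model adds.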

NoSQL Storage Data Model: (2) Document Storage Model

• Documents
  • Self-describing
  • Data structure: maps, collections, trees, and scalar values
• Stores documents in the value part of the key-value store
• MongoDB, CouchDB, OrientDB, RavenDB, etc.
• Users can query the data inside the document
  • without having to retrieve the whole document


NoSQL Storage Data Model: (3) Column-Family Stores

• Cassandra, HBase, Hypertable, and Amazon SimpleDB
• Stores data in column families as rows
  • Each row has many columns associated with a row key
• Column families
  • Groups of related data that are often accessed together

Part 2. Large scale data storage system

NoSQL Storage: Key-Value Stores
Dynamo


This material is based on:

• Giuseppe DeCandia, et al., “Dynamo: Amazon’s Highly Available Key-Value Store,” Proceedings of SOSP, 2007

• Ion Stoica, Robert Morris, David Karger, M. Frans Kaashoek, and Hari Balakrishnan, “Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications,” Proc. 2001 SIGCOMM, Mar. 2001, pp. 149-160

What Amazon needs (1/2)

• Amazon’s architecture
  • A highly decentralized, loosely coupled, service-oriented architecture consisting of hundreds of services
• Storage technologies that are always available
• Customers should be able to view and add items to the shopping cart even if:
  • The disks are failing
  • Network routes are flapping, or
  • Data centers are being destroyed by tornados


If you were designing a data storage system for

• Amazon.com, to store transaction data for shopping cart management, how would you prioritize the properties Consistency, Availability, and Partition tolerance? Why?

What Amazon needs (2/2)

• Highly available system with failure resilience
  • At any given time, a small but significant number of servers and network components are failing
• Failure handling should not impact availability or performance


Overview of Dynamo

• Partitions and replicates data using consistent hashing

• Tracks object version to provide consistency

• Uses quorum-like technique to ensure consistency among replicas

• Uses a decentralized synchronization protocol
  • Storage nodes can be added and removed from Dynamo without any manual partitioning or redistribution

• Gossip based distributed failure detection and membership protocol

• DynamoDB supports document storage features as well

• We will focus on the original Dynamo’s key-value storage features and architecture

System Assumptions (1/2)

• Query model
  • Simple read and write operations to a data item uniquely identified by a key
  • State is stored as binary objects (i.e., blobs) identified by unique keys
    • Usually less than 1 MB
  • No operations span multiple data items
  • No need for relational schema
• ACID properties
  • Dynamo targets applications that operate with weaker consistency if this results in high availability
  • Dynamo does not provide any isolation guarantees


System Assumptions (2/2)

• Efficiency
  • The system needs to function on a commodity hardware infrastructure
  • Stringent latency requirements

Summary of techniques

Problem                                 Technique
(1) Partitioning                        Consistent hashing
(2) High availability for writes        Vector clocks with reconciliation during reads
(3) Handling temporary failures         Sloppy quorum and hinted handoff
(4) Recovering from permanent failures  Anti-entropy using Merkle trees
(5) Membership and failure detection    Gossip-based membership protocol and failure detection


Part 2. Large scale data storage system

NoSQL Storage: Key-Value Stores (Dynamo)
(1) Partitioning
(2) High Availability for writes
(3) Handling temporary failures
(4) Recovering from permanent failures
(5) Membership and failure detection

Data partitioning algorithm

• Dynamically partitions the data over the set of nodes

• Distributes the load across multiple storage hosts

• Dynamo uses consistent hashing

• Zero-hop DHT (distributed hash table)


Non-consistent hashing vs. consistent hashing

• When a hash table is resized
  • A non-consistent hashing algorithm requires re-hashing the complete table
  • A consistent hashing algorithm requires re-hashing only part of the records in the table
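The difference can be observed by adding one node and counting which keys change owner. The sketch below is only an illustration under assumptions: the node names, the key names, and the use of MD5 as the hash function are all hypothetical choices, not something the slides specify.

```python
import bisect
import hashlib


def h(value: str) -> int:
    """Stable 32-bit hash (MD5 is used here only for its even spread)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16) % (2 ** 32)


def mod_owner(key: str, nodes: list) -> str:
    # Non-consistent hashing: the owner depends on the number of nodes.
    return nodes[h(key) % len(nodes)]


def ring_owner(key: str, nodes: list) -> str:
    # Consistent hashing: the owner is the first node at or after the key's
    # position on the ring, wrapping around at the end.
    ring = sorted((h(n), n) for n in nodes)
    positions = [p for p, _ in ring]
    i = bisect.bisect_left(positions, h(key)) % len(ring)
    return ring[i][1]


keys = [f"key-{i}" for i in range(1000)]
old_nodes = ["node0", "node1", "node2", "node3"]
new_nodes = old_nodes + ["node4"]  # the table is "resized" by one node

moved_mod = [k for k in keys if mod_owner(k, old_nodes) != mod_owner(k, new_nodes)]
moved_ring = [k for k in keys if ring_owner(k, old_nodes) != ring_owner(k, new_nodes)]

print(f"modulo hashing moved {len(moved_mod)} of {len(keys)} keys")
print(f"consistent hashing moved {len(moved_ring)} of {len(keys)} keys")
```

With the ring, the only keys that move are the ones that now belong to the new node; every other key keeps its previous owner. With modulo hashing, a key's owner can change between any pair of nodes.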

Consistent hashing (1/3)

[Figure: identifier circle with m = 3 (identifiers 0 through 7) and nodes A, B, and C placed on the circle]

• A consistent hash function assigns each node and key an m-bit identifier using a hashing function
  • For a node, e.g., the hash value of its IP address
• m-bit identifier: $2^m$ identifiers
  • m has to be big enough to make the probability of two nodes or keys hashing to the same identifier negligible


Consistent hashing (2/3)

[Figure: identifier circle with $2^m$ identifiers (0 through 7) and nodes A, B, and C]

• Consistent hashing assigns keys to nodes: key k is assigned to the first node whose identifier is equal to or follows k in the identifier space
• Machine B is the successor node of key 1: successor(1) = 1
• Key 2 will be stored in machine C: successor(2) = 5
• Key 3 will be stored in machine C: successor(3) = 5

Consistent hashing (3/3)

[Figure: the same identifier circle, with machine C leaving and a new node N joining]

• If machine C leaves the circle, successor(5) will point to A
• If a new machine N joins the circle, successor(2) will point to N
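The successor rule and the join/leave cases can be reproduced in a few lines. This is a sketch under assumptions: the slides state that B sits at identifier 1 and C at 5, while placing A at 0 and the new node N at 3 is inferred from the zero-hop DHT table later in the deck.

```python
import bisect

M = 3                 # identifier bits, as on the slides
RING_SIZE = 2 ** M    # identifiers 0..7


def successor(key_id: int, node_ids: dict) -> str:
    """Return the node whose identifier is equal to or follows key_id."""
    ids = sorted(node_ids)
    # bisect_left finds the first identifier >= key_id; the modulo wraps
    # around to the smallest identifier when no such node exists.
    i = bisect.bisect_left(ids, key_id % RING_SIZE) % len(ids)
    return node_ids[ids[i]]


nodes = {0: "A", 1: "B", 5: "C"}   # A at 0 is an assumption
print(successor(1, nodes))
print(successor(2, nodes))
print(successor(6, nodes))          # wraps around the circle

# Membership changes only affect the keys next to the changed node.
with_n = {**nodes, 3: "N"}          # N at 3 is an assumption
without_c = {k: v for k, v in nodes.items() if v != "C"}
print(successor(2, with_n))
print(successor(5, without_c))
```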


Scalable Key location

• In consistent hashing:
  • Each node need only be aware of its successor node on the circle
  • Queries can be passed around the circle until they reach the first node whose bucket contains the specified key
• What is the disadvantage of this scheme?
  • It may require traversing all N nodes to find the appropriate mapping
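The cost of successor-only routing can be made concrete by counting how many times a query is forwarded. The ring below (8 nodes spaced evenly over 64 identifiers) and the function name are hypothetical, chosen only to show the best and worst cases.

```python
def hops_to_owner(start: int, key_id: int, node_ids: list) -> int:
    """Count forwards needed when each node only knows its successor."""
    ids = sorted(node_ids)
    n = len(ids)

    def owns(idx: int) -> bool:
        # A node owns the keys in (predecessor, node], wrapping around the ring.
        pred, node = ids[idx - 1], ids[idx]
        if pred < node:
            return pred < key_id <= node
        return key_id > pred or key_id <= node

    idx = ids.index(start)
    hops = 0
    while not owns(idx):
        idx = (idx + 1) % n   # forward the query to the successor
        hops += 1
    return hops


nodes = list(range(0, 64, 8))         # 8 nodes on a 64-identifier ring
print(hops_to_owner(8, 7, nodes))     # the start node already owns key 7
print(hops_to_owner(16, 7, nodes))    # the query travels almost the full circle
```

In the worst case the query starts at the node just after the owner and is forwarded through every other node, so lookup cost grows linearly with the number of nodes.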


Dynamo’s partitioning

• Zero-hop DHT
  • Each node maintains the entire mapping list of key ranges and storage nodes
  • Inspired by consistent hashing and Chord
• When a node starts for the first time
  • Chooses its set of tokens (virtual nodes in the consistent hash space)
  • Maps nodes to their respective token sets
  • Stores both tokens and nodes onto disk
• Repeated reconciliation of membership changes
• Partitioning and placement information is propagated via the gossip-based protocol
  • Each node learns the token ranges handled by its peers
• Direct forwarding of read/write operations is possible

Zero-hop DHT

[Figure: identifier circle with nodes A, B, N, and C; every node holds the same mapping table]

Key Range   Successor
[5, 0)      A
[0, 1)      B
[1, 3)      N
[3, 5)      C


Zero-hop DHT

[Figure: a query with the key 4 arrives at a node on the identifier circle with nodes A, B, N, and C]

Key Range   Successor
[5, 0)      A
[0, 1)      B
[1, 3)      N
[3, 5)      C

• Query with the key 4
• Look up the table
• Forward the query
• Retrieve the value for key 4
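The lookup in this example can be sketched as a local table scan followed by one direct request. The table is copied from the slide and its ranges are read exactly as written, as [start, end) on a ring of 8 identifiers; the `storage` dictionary is a hypothetical stand-in for the per-node storage engines.

```python
# Routing table copied from the slide: (range start, range end, node).
ROUTING_TABLE = [
    (5, 0, "A"),
    (0, 1, "B"),
    (1, 3, "N"),
    (3, 5, "C"),
]


def lookup(key_id: int) -> str:
    """Find the responsible node locally, with no forwarding between nodes."""
    for start, end, node in ROUTING_TABLE:
        if start < end:
            if start <= key_id < end:
                return node
        elif key_id >= start or key_id < end:   # range wraps past identifier 7
            return node
    raise KeyError(key_id)


# Hypothetical per-node storage, standing in for the real storage engines.
storage = {"A": {}, "B": {}, "N": {}, "C": {4: b"value-for-key-4"}}


def get(key_id: int) -> bytes:
    node = lookup(key_id)           # 1. look up the table
    return storage[node][key_id]    # 2. forward the query directly to that node


print(lookup(4))
print(get(4))
```

Because every node holds the full table, any node that receives the query can resolve the owner in one step, at the price of keeping the table in sync on all nodes.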

Part 2. Large scale data storage system

NoSQL Storage: 1. Key-Value Stores (Dynamo)
(1) Partitioning
(2) High Availability for writes
(3) Handling temporary failures
(4) Recovering from permanent failures
(5) Membership and failure detection


Replication (1/3)

• Dynamo replicates its data on multiple hosts
  • Each data item is replicated at R hosts, where R is a parameter configured “per-instance”
• Each key k is assigned to a coordinator node
  • The coordinator manages the replication of the data items that fall within its range
  • It stores each item locally and at the R-1 clockwise successor nodes in the ring
• Coordinator node and (R-1) consecutive successor nodes
  • Each node is responsible for the region of the ring between it and its Rth predecessor


Replication (2/3)

[Figure: identifier circle (identifiers 0 through 7) with machines A, B, C, and D]

• Machine D will store the keys in [0,1), [1,3), and [3,5)
• Machine C will store the keys in [1,3), [0,1), and [5,0)
• Machine B will store the keys in [0,1), [5,0), and [3,5)
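These answers follow from one rule: with R = 3, a node holds its own range plus the ranges of its two predecessors. The sketch below assumes the node positions A = 0, B = 1, C = 3, and D = 5, which are inferred from the ranges listed above, and writes ranges in the slide's notation.

```python
R = 3  # replication factor used in the slide's example

# Node positions assumed from the figure: A=0, B=1, C=3, D=5.
ring = [(0, "A"), (1, "B"), (3, "C"), (5, "D")]


def primary_range(i: int) -> str:
    """Range coordinated by ring[i], written in the slide's [start,end) style."""
    start = ring[i - 1][0]   # predecessor's identifier (wraps for i = 0)
    end = ring[i][0]
    return f"[{start},{end})"


def stored_ranges(name: str) -> list:
    """Ranges held by a node: its own plus those of its R-1 predecessors."""
    i = [n for _, n in ring].index(name)
    return [primary_range((i - back) % len(ring)) for back in range(R)]


for _, name in ring:
    print(name, stored_ranges(name))
```

The same function also gives the case the slides leave out: machine A holds [5,0), [3,5), and [1,3).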


Replication (3/3)

• Preference list
  • The list of nodes that are responsible for storing a particular key
• To account for node failures
  • The preference list contains more than R nodes
• With virtual nodes, the first R positions on the ring may belong to fewer than R distinct machines
  • The preference list for a key is therefore constructed only from distinct physical nodes
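One way to build such a list is to walk the ring clockwise from the key and skip any position whose machine has already been chosen. This is a minimal sketch, not Dynamo's implementation; the token positions and machine names `M1` to `M3` are hypothetical.

```python
def preference_list(key_pos: int, tokens: list, r: int) -> list:
    """Walk clockwise from the key and collect r distinct physical nodes.

    tokens is a list of (position, physical_node) pairs; one physical node
    may own several positions (virtual nodes).
    """
    ordered = sorted(tokens)
    # Index of the first token at or after the key, wrapping to 0 if none.
    start = next((i for i, (pos, _) in enumerate(ordered) if pos >= key_pos), 0)
    chosen = []
    for step in range(len(ordered)):
        node = ordered[(start + step) % len(ordered)][1]
        if node not in chosen:      # skip extra virtual nodes of a chosen host
            chosen.append(node)
        if len(chosen) == r:
            break
    return chosen


# Hypothetical ring: three machines with two tokens each.
tokens = [(10, "M1"), (20, "M1"), (30, "M2"), (40, "M3"), (50, "M2"), (60, "M3")]
print(preference_list(5, tokens, r=3))
print(preference_list(45, tokens, r=3))
```

For the key at position 5, the first two ring positions both belong to `M1`, so taking the first three positions blindly would place two of the three replicas on one machine.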

Data Versioning

• Dynamo provides eventual consistency
  • Allows for updates to be propagated to all replicas asynchronously
  • A put call may return to its caller before the update has been applied to all the replicas


“Add to Cart” example

• The shopping cart application requires that an “Add to Cart” operation can never be forgotten or rejected

• If the most recent cart is not available and the user makes changes to an old version of the cart
  • The change should still be meaningful and preserved

• “Add to cart” and “delete item from cart” should be translated into put operations to Dynamo

• The divergent versions are reconciled later

Maintaining vector clock

• Dynamo treats the result of each modification as a new and immutable version of the data
• Multiple versions of data can be in the system
• Version branching
  • Due to failures in nodes, there can be conflicting versions of an object
• Merging
  • Collapses multiple branches of data evolution back into one
  • Semantic reconciliation
    • e.g., merging different versions of a shopping cart and preserving all of the items that clients put into the cart
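For the shopping cart, semantic reconciliation can be as simple as a set union over the divergent versions. The sketch below assumes a cart is represented as a set of item names, which is a simplification made for the example.

```python
def reconcile_carts(versions: list) -> set:
    """Merge divergent cart versions by keeping every item seen in any of them."""
    merged = set()
    for cart in versions:
        merged |= cart
    return merged


# Two branches that diverged from the same older cart {"book"}.
branch_a = {"book", "lamp"}       # client added a lamp on one replica
branch_b = {"book", "headset"}    # client added a headset on another
print(sorted(reconcile_carts([branch_a, branch_b])))
```

A union never loses an "add to cart", which is the property the application asks for. The trade-off, noted in the Dynamo paper, is that an item deleted on one branch can resurface after the merge.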


Vector clocks

• Used to capture causality between different versions of the same object
  • Two versions of an object are either on parallel branches or have a causal ordering
• Vector clock
  • A list of (node, counter) pairs
  • One vector clock is associated with every version of every object

Definition of the vector clocks

• $VC(x)$ denotes the vector clock of event x
  • $VC(x)_z$ denotes the component of that clock for process z
• $x \to y$ denotes that event x happens before event y
  • If $x \to y$, then $VC(x) < VC(y)$

$$VC(x) < VC(y) \iff \forall z\,[VC(x)_z \le VC(y)_z] \land \exists z'\,[VC(x)_{z'} < VC(y)_{z'}]$$


Examples

• VC(D1) = ([Sx, 3], [Sy, 2], [Sz, 2], [Sq, 2])
• VC(D2) = ([Sx, 3], [Sy, 2], [Sz, 2], [Sq, 1])
• VC(D3) = ([Sx, 3], [Sy, 2], [Sq, 1])
• VC(D4) = ([Sx, 3], [Sy, 3], [Sz, 2], [Sq, 1])
• Is VC(D1) > VC(D2)? YES
• Is VC(D3) > VC(D2)? NO (D3 has no entry for Sz, so its Sz component is smaller than that of D2)
• Is VC(D4) > VC(D2)? YES

$$VC(x) < VC(y) \iff \forall z\,[VC(x)_z \le VC(y)_z] \land \exists z'\,[VC(x)_{z'} < VC(y)_{z'}]$$
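The comparison rule can be checked mechanically against these examples. In this sketch a vector clock is a dictionary from node name to counter, and a node missing from a clock is treated as having counter 0; both choices are assumptions made for the illustration.

```python
def vc_less(a: dict, b: dict) -> bool:
    """True if VC a < VC b: every component <=, and at least one strictly <.

    A node missing from a clock is treated as having counter 0.
    """
    nodes = set(a) | set(b)
    all_le = all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
    some_lt = any(a.get(n, 0) < b.get(n, 0) for n in nodes)
    return all_le and some_lt


d1 = {"Sx": 3, "Sy": 2, "Sz": 2, "Sq": 2}
d2 = {"Sx": 3, "Sy": 2, "Sz": 2, "Sq": 1}
d3 = {"Sx": 3, "Sy": 2, "Sq": 1}
d4 = {"Sx": 3, "Sy": 3, "Sz": 2, "Sq": 1}

print(vc_less(d2, d1))   # is VC(D1) > VC(D2)?
print(vc_less(d2, d3))   # is VC(D3) > VC(D2)?
print(vc_less(d2, d4))   # is VC(D4) > VC(D2)?
```

The same function shows that D1 and D4 are not ordered in either direction, which is what "parallel branches" means: D1 is ahead on Sq while D4 is ahead on Sy.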


Properties of the vector clocks

• If $VC(a) < VC(b)$, then $a \to b$
• Antisymmetry
  • If $VC(a) < VC(b)$, then NOT $(VC(b) < VC(a))$
• Transitivity
  • If $VC(a) < VC(b)$ and $VC(b) < VC(c)$, then $VC(a) < VC(c)$

[Figure: version evolution of an object over time. Each version Dx is labeled with its vector clock, a list of [node that performed the update and updated the replicas, counter] pairs]

• D1([Sx,1]): write handled by Sx
• D2([Sx,2]): write handled by Sx
  • Some of the replicas might still have ([Sx,1])
• D3([Sx,2],[Sy,1]): write handled by Sy
  • Node failure in Sy before updating replicas
• D4([Sx,2],[Sz,1]): write handled by Sz
  • Could not get the vector clock ([Sx,2],[Sy,1])
• D5([Sx,3],[Sy,1],[Sz,1]): reconciled and written by Sx
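The sequence D1 through D5 can be replayed with three small helpers. This is a sketch of the bookkeeping only, assuming a clock is a dictionary from node name to counter; the function names `increment`, `merge`, and `descends` are hypothetical.

```python
def increment(clock: dict, node: str) -> dict:
    """Return a new clock with node's counter advanced by one."""
    new = dict(clock)
    new[node] = new.get(node, 0) + 1
    return new


def merge(a: dict, b: dict) -> dict:
    """Component-wise maximum of two clocks."""
    return {n: max(a.get(n, 0), b.get(n, 0)) for n in set(a) | set(b)}


def descends(a: dict, b: dict) -> bool:
    """True if clock a is equal to or newer than clock b in every component."""
    return all(a.get(n, 0) >= c for n, c in b.items())


d1 = increment({}, "Sx")    # write handled by Sx
d2 = increment(d1, "Sx")    # write handled by Sx
d3 = increment(d2, "Sy")    # write handled by Sy
d4 = increment(d2, "Sz")    # write handled by Sz, which never saw D3

# D3 and D4 are on parallel branches: neither descends from the other.
conflict = not descends(d3, d4) and not descends(d4, d3)

# Sx reconciles both branches and records the write under its own counter.
d5 = increment(merge(d3, d4), "Sx")
print(conflict, d5)
```

Because D5 descends from both D3 and D4, a replica that later sees all three versions can discard the two older ones.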


Execution of get and put operations

• Users can send the operations to any storage node in Dynamo

• Coordinator
  • A node handling a read or write operation
  • Typically the first among the top N nodes in the preference list
• A client can select a coordinator in two ways
  • Route its request through a generic load balancer
  • Use a partition-aware client library
    • Directly accesses the coordinators