Data Management in Large-Scale Distributed Systems
Apache Cassandra
Thomas Ropars
thomas.ropars@univ-grenoble-alpes.fr
http://tropars.github.io/
2021
References
• The lecture notes of V. Leroy
• The lecture notes of F. Zanon Boito
• Designing Data-Intensive Applications by Martin KleppmannI Chapters 2 and 7
• Designing Data-Intensive Applications by Martin Kleppmann
  – Chapters 2 and 7
In this lecture
• The design of Apache Cassandra
• Scaling to a large number of nodes
  – Trade-off between consistency and performance
  – Distributed hash tables
  – Quorum systems
Apache Cassandra
• Column family data store
• Paper published by Facebook in 2010 (A. Lakshman and P. Malik)
  – Used for implementing search functionalities
  – Released as open source
• Warning: some changes in the design have been made since the first version of Cassandra
Disclaimer
About the information provided in this lecture:
• Not necessarily up to date with the most recent version of Cassandra
• The goal is to understand some generally applicable ideas
• We are not going to describe all parts of Cassandra:
  – Focus on partitioning and consistency
The design principles of Cassandra are mostly inspired by other systems:
• Google BigTable
  – Storage engine
• Amazon Dynamo
  – Partitioning and consistency
Partitioning in Cassandra
Partitioning based on a hashed name space
• Data items are identified by keys
• Data are assigned to nodes based on a hash of the key
  – Tries to avoid hot spots
Namespace represented as a ring
• Allows the size of the system to be increased incrementally
• Each node is assigned a random identifier
  – Defines the position of the node on the ring
• Each node is responsible for all the keys in the range between its identifier and that of the previous node
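The ring lookup described above can be sketched in a few lines of Python. This is a minimal illustration, not Cassandra's actual partitioner: the node names, the 32-bit namespace, and the use of MD5 are assumptions made for the example.

```python
import bisect
import hashlib

def ring_hash(s: str) -> int:
    # Map any string to a position on the ring (0 .. 2**32 - 1).
    return int.from_bytes(hashlib.md5(s.encode()).digest()[:4], "big")

class HashRing:
    """Minimal hash ring: each node owns the keys whose hash falls
    between its predecessor's position (exclusive) and its own
    position (inclusive)."""

    def __init__(self, nodes):
        # One position per node, derived here from the node name so the
        # example is deterministic (the slides use random identifiers).
        self.positions = sorted((ring_hash(n), n) for n in nodes)

    def lookup(self, key: str) -> str:
        # First node at or after the key's hash, wrapping around.
        i = bisect.bisect_left(self.positions, (ring_hash(key), ""))
        return self.positions[i % len(self.positions)][1]

ring = HashRing(["node-a", "node-b", "node-c"])
owner = ring.lookup("user:42")  # always the same node for the same key
```

Note that `lookup` only does a binary search over the sorted node positions; assigning a key never requires contacting the other nodes.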
Better version of the partitioning
Limits of the current approach: high risk of imbalance
• Some nodes may store more keys than others
  – Nodes are not necessarily well distributed on the ring
  – Especially true with a low number of nodes
• Issues when nodes join or leave the system
  – When a node joins, it takes over part of the load of its successor
  – When a node leaves, all the corresponding keys are assigned to its successor

Concept of virtual nodes
• Assign multiple random positions to each node
Partitioning and virtual nodes
If a node crashes, the load is redistributed between multiple nodes
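Virtual nodes are a small extension of the previous sketch: each physical node is hashed at several positions instead of one. The count of 64 virtual nodes per node and the key names below are illustrative assumptions.

```python
import bisect
import hashlib
from collections import Counter

def ring_hash(s: str) -> int:
    return int.from_bytes(hashlib.md5(s.encode()).digest()[:4], "big")

class VNodeRing:
    """Ring with virtual nodes: each physical node appears at several
    positions, which evens out the load and spreads the keys of a
    failed node over many successors."""

    def __init__(self, nodes, vnodes_per_node=64):
        self.positions = sorted(
            (ring_hash(f"{node}#{v}"), node)
            for node in nodes
            for v in range(vnodes_per_node)
        )

    def lookup(self, key: str) -> str:
        i = bisect.bisect_left(self.positions, (ring_hash(key), ""))
        return self.positions[i % len(self.positions)][1]

ring = VNodeRing(["node-a", "node-b", "node-c"])
# Count how many of 9000 keys land on each physical node; with many
# virtual nodes the counts are roughly balanced, whereas a single
# position per node can be very skewed.
load = Counter(ring.lookup(f"key-{i}") for i in range(9000))
print(load)
```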
Partitioning and replication
Items are replicated for fault tolerance.
Strategies for replica placement
• Simple strategy
  – Place replicas on the next R nodes in the ring
• Topology-aware placement
  – Iterate through the nodes clockwise until finding a node meeting the required condition
  – For example, a node in a different rack
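Both placement strategies can be expressed as a clockwise walk over the ring with an acceptance condition. The sketch below is an assumption-laden illustration: the ring positions, the `rack1/a` naming scheme, and the `accept` predicate are invented for the example, not Cassandra's actual snitch/strategy API.

```python
import bisect

def place_replicas(positions, key_pos, r, accept=lambda node, chosen: True):
    """Walk the ring clockwise from the key's position and pick the next
    r distinct nodes; `accept` models a topology condition such as
    'rack not already used by a chosen replica'."""
    n = len(positions)
    start = bisect.bisect_left(positions, (key_pos, ""))
    chosen = []
    for step in range(n):
        _, node = positions[(start + step) % n]
        if node not in chosen and accept(node, chosen):
            chosen.append(node)
            if len(chosen) == r:
                break
    return chosen

# Hypothetical topology: the node name encodes its rack.
positions = [(10, "rack1/a"), (40, "rack1/b"), (70, "rack2/c"), (90, "rack2/d")]

def rack(node):
    return node.split("/")[0]

def different_rack(node, chosen):
    return rack(node) not in {rack(c) for c in chosen}

print(place_replicas(positions, 95, 2))                  # ['rack1/a', 'rack1/b']
print(place_replicas(positions, 95, 2, different_rack))  # ['rack1/a', 'rack2/c']
```

With the simple strategy both replicas may end up in the same rack; the topology-aware walk skips `rack1/b` and forces the second replica onto another rack.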
Replication in Cassandra
Replication is based on quorums
• A read/write request might be sent to a subset of the replicas
  – To tolerate f faults, it has to be sent to f + 1 replicas

Consistency
• The user can choose the level of consistency
  – Trade-off between consistency and performance (and availability)
• Eventual consistency
  – If an item is modified, readers will eventually see the new value
A Read/Write request

Figure from https://dzone.com/articles/introduction-apache-cassandras

• A client can contact any node in the system; that node acts as the coordinator for the request
• The coordinator contacts all replicas
• The coordinator waits for a specified number of responses before sending an answer to the client
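The coordinator's behavior, answering the client as soon as enough replicas respond, can be sketched with threads. This is a toy model under stated assumptions: `replica_read` simulates replica latency with a sleep, and the names are invented.

```python
import concurrent.futures
import random
import time

def replica_read(name):
    # Stand-in for a replica answering a read, with variable latency.
    time.sleep(random.uniform(0.01, 0.05))
    return name, "value"

def coordinator_read(replicas, needed):
    """Contact all replicas, but build the answer for the client as
    soon as `needed` responses have arrived (the consistency level)."""
    with concurrent.futures.ThreadPoolExecutor(len(replicas)) as pool:
        futures = [pool.submit(replica_read, r) for r in replicas]
        answers = []
        for f in concurrent.futures.as_completed(futures):
            answers.append(f.result())
            if len(answers) == needed:
                break
    # The remaining replicas still answer in the background; a real
    # coordinator can also use those late answers (e.g. for read repair).
    return answers

answers = coordinator_read(["replica-1", "replica-2", "replica-3"], needed=2)
```

With `needed=1` this models consistency level ONE; with `needed=2` out of three replicas it models QUORUM.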
Consistency levels
ONE (default level)
• The coordinator waits for one ack on a write before answering the client
• The coordinator waits for one answer on a read before answering the client
• Lowest level of consistency
  – Reads might return stale values
  – We will still read the most recent value in most cases

QUORUM
• The coordinator waits for a majority of acks on a write before answering the client
• The coordinator waits for a majority of answers on a read before answering the client
• High level of consistency
  – At least one replica will return the most recent value
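The guarantee behind QUORUM is simple counting: with n replicas, a write acked by w nodes and a read answered by r nodes must share at least one replica whenever w + r > n. A short sketch of that arithmetic:

```python
def overlap_guaranteed(n, w, r):
    """With n replicas, a write acked by w nodes and a read answered by
    r nodes always share at least one replica iff w + r > n."""
    return w + r > n

n = 3                   # replication factor
majority = n // 2 + 1   # = 2 for n = 3

# QUORUM writes + QUORUM reads: every read quorum intersects every
# write quorum, so at least one contacted replica has the new value.
assert overlap_guaranteed(n, majority, majority)

# ONE + ONE: the quorums may be disjoint, hence possibly stale reads.
assert not overlap_guaranteed(n, 1, 1)
```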
Additional references
Mandatory reading
• Cassandra: A Decentralized Structured Storage System, A. Lakshman and P. Malik, SIGOPS Operating Systems Review, 2010.
Suggested reading
• Facebook's Cassandra paper, annotated and compared to Apache Cassandra 2.0: https://docs.datastax.com/en/articles/cassandra/cassandrathenandnow.html