Cassandra Workshop - Cassandra from scratch in one day

Post on 13-Jan-2017

@calonso

CASSANDRA WORKSHOP

Cassandra from scratch in one day.

@calonso

• Introductions

• Cassandra Core concepts

• CQL

• Data modelling

• More Cassandra Concepts

• Hardware Considerations

@calonso

CARLOS ALONSO

• Spanish Londoner

• MSc Salamanca University, Spain

• Software Engineer @MyDrive Solutions

• Cassandra certified developer

• Cassandra MVP 2015

• @calonso / http://mrcalonso.com

@calonso

MYDRIVE SOLUTIONS

• World leading driver profiling company

• Using technology and data to understand how to improve driving behaviour

• Recently acquired by the Generali Group

• @_MyDrive / http://mydrivesolutions.com

• We are hiring!!

@calonso

AND YOU?

I’m a DB admin (ORACLE?) and I want to learn Cassandra

I’ve never heard about NoSQL

I’ve never heard about SQL

I don’t know what I’m doing here

I’ve heard about Cassandra and want to get my hands on it

I’ve been using Cassandra for some tests and want to go deeper

I’m evaluating Cassandra as a potential solution

I’m rolling in production with Cassandra

@calonso

CASSANDRA

• A.k.a Alexandra or Kassandra

• Daughter of King Priam and Queen Hecuba of Troy.

• Apollo gave her the power of prophecy to seduce her. She refused, and Apollo spat in her mouth, cursing her never to be believed.

• https://en.wikipedia.org/wiki/Cassandra

@calonso

CASSANDRA

• Open Source distributed database management system

• Initially developed at Facebook

• Inspired by Amazon’s Dynamo and Google BigTable papers

• Became Apache top-level project in Feb, 2010

• Nowadays developed by the Apache community, with DataStax as a major contributor

@calonso

WHY CASSANDRA?

“Cassandra is the cursed ORACLE”

@calonso

CASSANDRA CORE CONCEPTS

Technical introduction to Apache Cassandra

@calonso

NOSQL

@calonso

BIG DATA REQUIREMENTS

• Everywhere

• Fast

• Always available

• Consistent

+

Ingestion Consumption

@calonso

THE CAP THEOREM

@calonso

SCALING

Vertical vs. Horizontal

@calonso

CASSANDRA

• Fast Distributed NoSQL Database

• High Availability

• Linear Scalability => Predictability

• No SPOF

• Multi-DC

• Horizontally scalable => $$$

• Not a drop in replacement for RDBMS

@calonso

CASSANDRA CLUSTER

@calonso

REPLICATION FACTOR

How many copies (replicas) of your data are stored

@calonso

CONSISTENCY LEVEL

How many replicas of your data must respond OK?

@calonso

CASSANDRA DATA MODEL

• Query driven data model

• Column family non relational db

@calonso

CQL

CREATE TABLE users ( id UUID, name VARCHAR, surname VARCHAR, birthdate TIMESTAMP, PRIMARY KEY(id));

Familiar row-column SQL-like approach.

INSERT INTO users (id, name, surname, birthdate) VALUES (uuid(), 'Carlos', 'Alonso', '1985-03-19');

SELECT * FROM users WHERE id = f81d4fae-7dec-11d0-a765-00a0c91e6bf6;

ALTER TABLE users ADD address VARCHAR;

@calonso

DISTRIBUTIONS: APACHE CASSANDRA

• Latest features

• JIRA

• Support via mailing list & IRC

• http://cassandra.apache.org

@calonso

DISTRIBUTIONS: DATASTAX ENTERPRISE

• Integrated Solr for Multi-DC Search

• Integrated Spark for Analytics

• Free Startup Program

• Expert support

• Focused on stable releases for enterprises

• http://www.datastax.com/products/datastax-enterprise

@calonso

CASSANDRA: YES

• If you need:

• No SPOF

• Linear horizontal scalability in commodity hardware

• Real-time writes

• Reliable data replication across distributed data centres

• Clearly defined schema in a NoSQL environment

@calonso

CASSANDRA: NO

• If you need:

• ACID transactions with rollback

• Justification for high-end software

@calonso

REVIEW QUESTIONS

What do consistency, availability and partition tolerance mean?

Consistency: All clients see the exact same value for the whole data set at any given point.

Availability: All clients can read and write to the system at any given point.

Partition tolerance: The system keeps working even when some nodes are disconnected from the rest of the system.

@calonso

REVIEW QUESTIONS

Where does Cassandra fit within the CAP Theorem?

AP: Cassandra trades off consistency in order to guarantee availability and partition tolerance, but in a configurable way, so it’s up to the developer where to sit for each query.

@calonso

REVIEW QUESTIONS

Which are the technological roots of Cassandra?

Google BigTable and Amazon Dynamo pulled together by developers at Facebook

@calonso

REVIEW QUESTIONS

What technology does Cassandra use to model data?

CQL: Cassandra Query Language

@calonso

INSTALLATION

Installing, configuring and running Cassandra

@calonso

REQUIREMENTS

JAVA >= 1.7.0_25

All nodes synchronised (NTP)

@calonso

INSTALLATION

http://cassandra.apache.org/download/

@calonso

CONFIGURATION

conf/cassandra.yaml

• cluster_name

• listen_address

• rpc_address

• commitlog_directory

• data_file_directories

• saved_caches_directory

@calonso

CONFIGURATION

conf/cassandra-env.sh

• MAX_HEAP_SIZE

• if system memory < 2G => 1/2 of it

• if between 2G and 4G => 1G

• if > 4G => 1/4 of it but no more than 8G

• HEAP_NEWSIZE

• 1/4 of MAX_HEAP_SIZE
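The heap sizing rule on this slide can be written as a small calculation. This is only a sketch of the rule as stated here, in Python and with sizes in megabytes; the function names are hypothetical and are not part of Cassandra, whose real logic lives in conf/cassandra-env.sh.

```python
def default_max_heap_mb(system_memory_mb: int) -> int:
    """MAX_HEAP_SIZE rule from the slide, in megabytes.

    Below 2G: half of the memory. Between 2G and 4G: 1G.
    Above 4G: a quarter of the memory, capped at 8G.
    """
    half_capped_at_1g = min(system_memory_mb // 2, 1024)
    quarter_capped_at_8g = min(system_memory_mb // 4, 8192)
    return max(half_capped_at_1g, quarter_capped_at_8g)


def default_heap_newsize_mb(max_heap_mb: int) -> int:
    """HEAP_NEWSIZE rule from the slide: a quarter of MAX_HEAP_SIZE."""
    return max_heap_mb // 4


if __name__ == "__main__":
    for memory_mb in (1024, 3072, 16384, 65536):
        heap = default_max_heap_mb(memory_mb)
        print(memory_mb, heap, default_heap_newsize_mb(heap))
```

Taking the larger of the two capped values reproduces the three cases in one expression, which makes the boundaries at 2G and 4G easy to check.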

@calonso

START/STOP

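• Foreground: start with sudo bin/cassandra -f, stop with Ctrl-C

• Background: start with sudo bin/cassandra, stop with ps aux | grep cassandra followed by sudo kill <pid>

• Service: start with sudo service cassandra start, stop with sudo service cassandra stop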

@calonso

START/STOP

Node localhost/127.0.0.1 state jump to normal

@calonso

REVIEW QUESTIONS

Which setting determines a node’s cluster? Where is it configured?

cluster_name: In conf/cassandra.yaml

@calonso

REVIEW QUESTIONS

How would you stop a Cassandra instance running in the background on a Unix-based machine?

1. Get the PID: ps aux | grep cassandra

2. Kill the process: kill <pid>

@calonso

REVIEW QUESTIONS

Which settings would you adjust to tune how much memory Cassandra uses?

In which file?

MAX_HEAP_SIZE in conf/cassandra-env.sh

@calonso

BASIC TOOLS

Knowing tools required for basic Cassandra management

NODETOOL

The command line Swiss army knife.

@calonso

NODETOOL

status: displays cluster state, load, host ID and token

info: displays node memory use, disk load, uptime …

ring: displays node status and cluster ring state

help: displays all possible commands and description

CQLSH

Our data management and first exploration tool

@calonso

CQLSH

DESC[RIBE]: shows information of the arguments

SOURCE: executes a file containing CQL statements

TRACING: enables/disables the tracing mode

help: shows available cqlsh + CQL commands

SELECT, ALTER, INSERT, …

CASSANDRA-STRESS

Our tool to assess performance

@calonso

CASSANDRA-STRESS

read: to execute a read-only workload

mixed: executes mixed workload

user: user defined schema and workloads

write: to execute a write-only workload

CCM

One tool to manage them all.

@calonso

CCM

github: pcmanus/ccm

Prerequisites

• Python 2.7 +

• PyYAML

• Six

• Ant

• Loopback IP aliases (Mac OS)

Limitations

• Testing tool

• Communicates with localhost only

@calonso

CCM

start/stop: starts/stops all nodes in cluster

status: shows current cluster status

<node> <command>: runs command connecting to node, e.g. ccm node1 cqlsh

create: downloads, compiles and builds cluster

@calonso

REVIEW QUESTIONS

Which tool/command would I use to know the start/stop status of a particular node of my cluster?

nodetool status

@calonso

REVIEW QUESTIONS

Name and describe two non CQL commands allowed in cqlsh.

CAPTURE, COPY, DESCRIBE, EXPAND, PAGING, SOURCE, CONSISTENCY, DESC, EXIT, HELP, SHOW, TRACING

@calonso

REVIEW QUESTIONS

Can I manage my production cluster remotely using CCM?

No, that’s CCM’s biggest limitation. Only connects to localhost.

@calonso

REVIEW QUESTIONS

What happens if, in a cqlsh session I type: DESCRIBE KEY and press TAB?

@calonso

INTERNAL ARCHITECTURE

Internal processes that make Cassandra work

@calonso

CLUSTER COMPONENTS

• Column: The smallest key-value pair.

• Row: Collection of columns. Identified by a row key.

• Partition: Bucket containing several rows. Identified by a token.

• Node: a Cassandra instance. Contains a token range.

• Rack: a logical set of nodes

• Data Center : a logical set of racks.

• Cluster : The full set of nodes. Covers a whole token ring.

CONSISTENT HASHING

Which node holds this data?

“Hashing is the transformation of a string of characters into a usually shorter fixed-length value or key that represents the original string. Hashing is used to index and retrieve items in a database because it is faster to find the item using the shorter hashed key than to find it using the original value.” (Wikipedia)

@calonso

CONSISTENT HASHING

• Data is stored in partitions, identified by a unique token within the range -2^63 to 2^63 - 1

• Nodes contain partition ranges.

@calonso

THE PARTITIONER

• System running on each node that computes hashes through a hash function.

• Various partitioners available.

• Default is murmur3

• All nodes MUST use the same!!!!

[Figure: a hash function turning the key “Carlos” into the token 185664; other example tokens shown: 1773456738847666528349 and -894763734895827651234]
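The idea behind the partitioner and the token ring can be sketched in a few lines of Python. This is a minimal illustration, not Cassandra's implementation: Python's standard library has no Murmur3, so an MD5 digest truncated to a signed 64-bit integer stands in for it, and the Ring class, the node names and their tokens are all hypothetical.

```python
import bisect
import hashlib
from typing import Dict

MIN_TOKEN = -2**63
MAX_TOKEN = 2**63 - 1


def token_for(key: str) -> int:
    """Hash a partition key to a signed 64-bit token (MD5 stand-in for Murmur3)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], byteorder="big", signed=True)


class Ring:
    """Toy token ring: each node owns the range (previous token, own token]."""

    def __init__(self, node_tokens: Dict[str, int]) -> None:
        self._entries = sorted((token, name) for name, token in node_tokens.items())
        self._tokens = [token for token, _ in self._entries]

    def owner(self, token: int) -> str:
        # First node whose token is >= the partition token, wrapping around the ring.
        index = bisect.bisect_left(self._tokens, token)
        if index == len(self._tokens):
            index = 0
        return self._entries[index][1]


if __name__ == "__main__":
    ring = Ring({"node1": -2**62, "node2": 0, "node3": 2**62})
    token = token_for("Carlos")
    print(token, ring.owner(token))
```

Because every node applies the same hash function to the same key, any node can work out where a partition lives without asking a central directory, which is why all nodes must use the same partitioner.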

@calonso

VNODES

@calonso

REQUEST COORDINATION

How are client requests coordinated?

@calonso

THE COORDINATOR

• The node designated to coordinate a particular query.

• ANY node can coordinate ANY request.

• No SPOF: one of Cassandra’s main principles.

• The driver chooses which node will coordinate

@calonso

A FULL EXAMPLE

CREATE TABLE users ( id UUID, name VARCHAR, surname VARCHAR, birthdate TIMESTAMP, PRIMARY KEY(id));

SELECT * FROM users WHERE id = f81d4fae-7dec-11d0-a765-00a0c91e6bf6;

[Figure: the client’s driver sends the query to a coordinator node; the partitioner hashes the key f81d4fae-… to token 834]

CREATE KEYSPACE test WITH REPLICATION = { 'class': 'SimpleStrategy', 'replication_factor': 3 };

CONSISTENCY QUORUM;

REPLICATION

How many copies of your data?

@calonso

WHY REPLICATION?

• Disaster recovery

• Bring data closer to users (to reduce latencies)

• Workload segregation (analytical vs transactional)

@calonso

REPLICATION

Defined at keyspace level

CREATE KEYSPACE <my_keyspace> WITH REPLICATION = { 'class': 'SimpleStrategy', 'replication_factor': 2 };

CREATE KEYSPACE <my_keyspace> WITH REPLICATION = { 'class': 'NetworkTopologyStrategy', 'dc-east': 2, 'dc-west': 3 };

@calonso

SIMPLESTRATEGY

[Figure: Client and Driver writing to the token ring]

CREATE KEYSPACE <my_keyspace> WITH REPLICATION = { 'class': 'SimpleStrategy', 'replication_factor': 3 };

Token: 834
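SimpleStrategy places the first replica on the node that owns the token and the remaining ones on the next nodes clockwise around the ring. The Python sketch below illustrates that walk for token 834 with a replication factor of 3; the five-node ring, its node names and its tokens are made up for the example, and vnodes are ignored.

```python
import bisect
from typing import List, Tuple


def simple_strategy_replicas(
    ring: List[Tuple[int, str]], token: int, replication_factor: int
) -> List[str]:
    """Return the replica nodes for a token.

    `ring` is a list of (node token, node name) pairs sorted by token.
    The first replica is the node owning the token; the others are the
    next nodes clockwise, wrapping around the end of the ring.
    """
    tokens = [node_token for node_token, _ in ring]
    start = bisect.bisect_left(tokens, token) % len(ring)
    count = min(replication_factor, len(ring))
    return [ring[(start + offset) % len(ring)][1] for offset in range(count)]


if __name__ == "__main__":
    # Hypothetical ring with small tokens, in the spirit of the slide's "Token: 834".
    ring = [(200, "node1"), (400, "node2"), (600, "node3"), (800, "node4"), (1000, "node5")]
    print(simple_strategy_replicas(ring, 834, 3))
```

NetworkTopologyStrategy follows the same clockwise walk but applies it per data center and tries to spread replicas across racks, which this sketch does not attempt.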

@calonso

NETWORKTOPOLOGYSTRATEGY

[Figure: Client and Driver writing token 834 to two data centers, dc-east and dc-west, each with racks rack-1 and rack-2]

CREATE KEYSPACE <my_keyspace> WITH REPLICATION = { 'class': 'NetworkTopologyStrategy', 'dc-east': 2, 'dc-west': 3 };

@calonso

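WHAT IF A NODE OR DC IS DOWN?

Hinted Handoff to the rescue!

[Figure: Client and Driver writing token 834 while one replica is down; the write reaches the remaining replicas and is delivered to the failed one once it is back]

CONSISTENCY

How much consistency do we want?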

@calonso

CONSISTENCY LEVEL

How many nodes must successfully write for the write to succeed?

How many nodes must send their data for the read to succeed?

@calonso

CONSISTENCY LEVEL

RF = 3

ANY (Only writes)

ONE, TWO, THREE

QUORUM = floor(RF / 2) + 1

ALL

LOCAL_ONE

LOCAL_QUORUM
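As a quick check of the quorum formula, the following Python sketch computes the quorum size for a few replication factors, together with how many replicas can be down while a QUORUM request still succeeds. The function names are illustrative only.

```python
def quorum(replication_factor: int) -> int:
    """Number of replicas that form a quorum: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


def tolerated_failures(replication_factor: int) -> int:
    """Replicas that may be unavailable while QUORUM can still be reached."""
    return replication_factor - quorum(replication_factor)


if __name__ == "__main__":
    for rf in range(1, 6):
        print(rf, quorum(rf), tolerated_failures(rf))
```

With RF = 3, as on the slide, a quorum is 2 replicas, so one replica can be down without affecting QUORUM reads or writes.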

@calonso

CONSISTENCY LEVEL

[Figure: consistency levels on a scale from availability / partition tolerance at one end to consistency at the other]

@calonso

DEMO

Play with RFs, CLs and hints

REPAIR

Strengthening consistency.

@calonso

DIGEST QUERY

In consistent reads, only one node is asked for data, the others are asked for a digest.

@calonso

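READ REPAIR

What if nodes disagree?

CL >= QUORUM

SELECT city FROM …

[Figure: the replicas answer Madrid: 123, Salamanca: 125 and London: 150, where the numbers appear to be write timestamps; the coordinator returns London, the most recent value, to the client]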
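The reconciliation in this example can be sketched as a last-write-wins comparison. The Python below assumes that the numbers next to each city (123, 125, 150) are write timestamps; the ReplicaResponse type, the replica names and the reconcile function are hypothetical and only illustrate the principle.

```python
from typing import List, NamedTuple, Tuple


class ReplicaResponse(NamedTuple):
    replica: str
    value: str
    write_timestamp: int


def reconcile(responses: List[ReplicaResponse]) -> Tuple[str, List[str]]:
    """Pick the most recently written value and list the replicas needing repair."""
    winner = max(responses, key=lambda response: response.write_timestamp)
    stale = [
        response.replica
        for response in responses
        if response.write_timestamp < winner.write_timestamp
    ]
    return winner.value, stale


if __name__ == "__main__":
    responses = [
        ReplicaResponse("replica1", "Madrid", 123),
        ReplicaResponse("replica2", "Salamanca", 125),
        ReplicaResponse("replica3", "London", 150),
    ]
    value, to_repair = reconcile(responses)
    print(value, to_repair)
```

The value with the newest timestamp is what the client receives, and the replicas listed as stale are the ones that would be sent the newer value.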

@calonso

READ REPAIR

And if CL < QUORUM?

The coordinator will issue a read_repair based on read_repair_chance table property.

CREATE TABLE users ( …) WITH read_repair_chance = 0.1;

@calonso

MANUAL REPAIR

Last defense against data entropy.

The nodetool repair command makes all data on a node consistent with the latest replicas in the cluster.

--partitioner-range: option to restrict repair to node’s primary range only

@calonso

MANUAL REPAIR

• Run nodetool repair:

• When recovering a failed node

• When increasing RF

• Periodically on every node

• Sequentially

• Once a week

GOSSIP

Nodes gossip between themselves

@calonso

GOSSIP

• Every second, each node gossips with up to three other nodes

• Heartbeat + Versioned information of the whole cluster.

@calonso

GOSSIP

• Provide consistent list of seeds

• At least one per DC

Nodes prefer (10%) to gossip with their seeds

@calonso

SNITCH

• Allows the node to know its rack and data center topology.

• Enables replication in different racks

@calonso

SNITCH

• GossipingPropertyFileSnitch: config from cassandra-rackdc.properties and propagated by gossiping

• Ec2Snitch: Amazon EC2 aware. Single region. Single DC. Availability zone = Rack

• Ec2MultiRegionSnitch: Multiple regions. Region = DC.

• …

@calonso

REVIEW QUESTIONS

Describe the relationship of nodes, racks, data centers and clusters.

A cluster contains data centers, a data center contains racks and a rack contains nodes (node < rack < data center < cluster).

@calonso

REVIEW QUESTIONS

What is the function of the partitioner?

The partitioner’s function is to hash keys. Then the rest of the cluster uses that output to determine where the data should live.

@calonso

REVIEW QUESTIONS

Can a node hold a partition with a token outside its primary range?

Yes, if it’s replicating data for some other node, or if it’s holding a hint.

@calonso

REVIEW QUESTIONS

In a 3-node cluster with RF = 2, how much of the total data volume does each node own?

Roughly 67% (RF / number of nodes = 2 / 3)

@calonso

REVIEW QUESTIONS

What is the function of the nodetool repair operation?

Synchronising replicas. Ensuring the node’s data is the most recent.

@calonso

REVIEW QUESTIONS

What is a remote coordinator?

When using multiple DCs and NetworkTopologyStrategy, at the point of replicating to the second DC, a single node in that DC receives the data and coordinates the request there. That node is the remote coordinator.

This is to avoid transmitting all data to all nodes from DC to DC.

@calonso

REVIEW QUESTIONS

How could RF and CL be tuned to ensure immediate consistency?

• RF >= 3, combined with any of:

• CL Write = ONE and CL Read = ALL

• CL Write = ALL and CL Read = ONE

• CL Write = QUORUM and CL Read = QUORUM
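What the three combinations in this answer have in common is that the replicas written plus the replicas read add up to more than RF, so every read overlaps with the latest write on at least one replica. The Python sketch below checks that condition; the function names are illustrative, and only the levels ONE, TWO, THREE, QUORUM and ALL are modelled.

```python
def replicas_required(level: str, replication_factor: int) -> int:
    """Number of replicas that must answer for a given consistency level."""
    fixed_levels = {"ONE": 1, "TWO": 2, "THREE": 3}
    if level in fixed_levels:
        return fixed_levels[level]
    if level == "QUORUM":
        return replication_factor // 2 + 1
    if level == "ALL":
        return replication_factor
    raise ValueError("unsupported consistency level: " + level)


def is_immediately_consistent(write_level: str, read_level: str, replication_factor: int) -> bool:
    """True when the write and read replica sets are guaranteed to overlap."""
    written = replicas_required(write_level, replication_factor)
    read = replicas_required(read_level, replication_factor)
    return written + read > replication_factor


if __name__ == "__main__":
    for write_level, read_level in [("ONE", "ALL"), ("ALL", "ONE"), ("QUORUM", "QUORUM"), ("ONE", "ONE")]:
        print(write_level, read_level, is_immediately_consistent(write_level, read_level, 3))
```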

@calonso

CQL

The Cassandra Query Language

@calonso

PHYSICAL DATA STRUCTURE

DDL + DML

Defining our data shape and actually using it

@calonso

DEV CENTER

@calonso

DDL

CREATE KEYSPACE musicdb WITH REPLICATION = { 'class': 'SimpleStrategy', 'replication_factor': 3};

DROP KEYSPACE musicdb;

USE musicdb;

@calonso

PRACTICE TIME!

We need to build a system for an online electronic books reading site.

CREATE KEYSPACE e_library WITH REPLICATION = { 'class': 'SimpleStrategy', 'replication_factor': 3};

@calonso

DDL

CREATE TABLE performer ( name VARCHAR, type VARCHAR, country VARCHAR, style VARCHAR, founded INT, born TIMESTAMP, died TIMESTAMP, PRIMARY KEY (name));

@calonso

PRIMARY KEY

PARTITION KEY + CLUSTERING COLUMN(S)

@calonso

PRIMARY KEY

• Simple partition key, no clustering columns:

• PRIMARY KEY (name)

• Composite partition key, no clustering columns:

• PRIMARY KEY ((album_title, year))

• Simple partition key and clustering columns:

• PRIMARY KEY (album_title, number)

• Composite partition key and clustering columns:

• PRIMARY KEY ((album_title, year), number)
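To make the four shapes above concrete, the following Python sketch splits a PRIMARY KEY clause into its partition key columns and its clustering columns. It is a deliberately small parser written for these four shapes only, not a CQL parser, and the function name is hypothetical.

```python
import re
from typing import List, Tuple


def split_primary_key(clause: str) -> Tuple[List[str], List[str]]:
    """Return (partition key columns, clustering columns) for a PRIMARY KEY clause."""
    match = re.fullmatch(
        r"\s*PRIMARY\s+KEY\s*\((.*)\)\s*", clause, flags=re.IGNORECASE | re.DOTALL
    )
    if match is None:
        raise ValueError("not a PRIMARY KEY clause: " + clause)
    body = match.group(1).strip()
    if body.startswith("("):
        # Composite partition key: everything inside the inner parentheses.
        close = body.index(")")
        partition = [column.strip() for column in body[1:close].split(",")]
        rest = body[close + 1:]
    else:
        # Simple partition key: the first column only.
        first, _, rest = body.partition(",")
        partition = [first.strip()]
    clustering = [column.strip() for column in rest.split(",") if column.strip()]
    return partition, clustering


if __name__ == "__main__":
    for clause in (
        "PRIMARY KEY (name)",
        "PRIMARY KEY ((album_title, year))",
        "PRIMARY KEY (album_title, number)",
        "PRIMARY KEY ((album_title, year), number)",
    ):
        print(clause, split_primary_key(clause))
```

The partition key decides which node stores the row, while the clustering columns decide how rows are ordered inside the partition.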

@calonso

PRIMARY KEYS

CREATE TABLE tracks_by_album ( album_title VARCHAR, year INT, performer VARCHAR STATIC, genre VARCHAR STATIC, number INT, track_title VARCHAR, PRIMARY KEY ((album_title, year), number));

CREATE TABLE albums_by_track ( track_title VARCHAR, performer VARCHAR, year INT, album_title VARCHAR, PRIMARY KEY ( track_title, performer, year, album_title));

CQL TYPE | Constants | Description
ASCII | strings | US-ASCII character strings
BIGINT | integers | 64-bit signed long
BLOB | blobs | Arbitrary bytes (no validation), as hexadecimal
BOOLEAN | booleans | true or false
COUNTER | integers | Distributed counter value (64 bit long)
DECIMAL | integers or floats | Variable precision decimal
DOUBLE | integers | 64-bit IEEE-754 floating point
FLOAT | integers, floats | 32-bit IEEE-754 floating point
INET | strings | IP address string in IPv4 or IPv6 format
INT | integers | 32-bit signed integer
LIST | n/a | A collection of one or more ordered elements
MAP | n/a | A JSON style array of literals { literal: literal, literal: literal, …}
SET | n/a | A collection of one or more elements
TEXT | strings | UTF-8 encoded text
TIMESTAMP | integers, strings | Date + time as milliseconds since epoch
TUPLE | n/a | Up to 32k fields
UUID | uuids | Standard UUID
VARCHAR | strings | UTF-8 encoded string
VARINT | integers | Arbitrary precision integer
TIMEUUID | uuids | Type 1 UUID

@calonso

INSERT

• CQL INSERTS are:

• Atomic: Either all the values are inserted or none

• Isolated: Two inserts on the exact same PK happen one after the other, no mixed values.

INSERT INTO albums_by_performer (performer, year, title, genre) VALUES ('The Beatles', 1966, 'Revolver', 'Rock');

@calonso

UPDATE

• Primary Key columns cannot be changed.

• Full Primary key is required as predicate.

• CQL UPDATES are:

• Atomic: Either all the values are updated or none

• Isolated: Two updates on the exact same PK happen one after the other, no mixed values.

UPDATE albums_by_performer SET genre = 'Rock' WHERE performer = 'The Beatles' AND year = 1966 AND title = 'Revolver';

@calonso

UPSERT

INSERT INTO albums_by_performer (performer, year, title, genre) VALUES ('The Beatles', 1966, 'Revolver', 'Rock');

==

UPDATE albums_by_performer SET genre = 'Rock' WHERE performer = 'The Beatles' AND year = 1966 AND title = 'Revolver';

@calonso

LWT

• Use at your own discretion:

• Cassandra uses Paxos algorithm to determine if the record exists or not.

• In total 6x performance penalty.

INSERT INTO albums_by_performer (performer, year, title, genre) VALUES ('The Beatles', 1966, 'Revolver', 'Rock') IF NOT EXISTS;

@calonso

PRACTICE TIME!

We need to design a system that holds users. Users will have name, ID card (unique), a phones list (home, mobile and work), birth date and an email address.

NOTE: As we haven’t studied SELECT, use SELECT * FROM <table name>; to inspect your data.

@calonso

PRACTICE TIME!

CREATE TABLE users ( ID VARCHAR PRIMARY KEY, name VARCHAR, home_phone VARCHAR, work_phone VARCHAR, mobile_phone VARCHAR, email VARCHAR, birth_date TIMESTAMP

);

@calonso

MORE DDL

ALTER TABLE album ADD cover_image VARCHAR;

ALTER TABLE album DROP cover_image;

ALTER TABLE album ALTER cover_image TYPE BLOB;

@calonso

MORE DDL

CREATE TABLE albums_by_genre ( genre VARCHAR, performer VARCHAR, year INT, album_title VARCHAR, PRIMARY KEY ( genre, performer, year, album_title)) WITH CLUSTERING ORDER BY (performer ASC, year DESC, album_title ASC);

@calonso

SECONDARY INDEXES

• Tables are indexed on columns in a PK

• Search on a partition key is very efficient

• Search on a partition key and clustering column is very efficient

• Search on other columns is not supported

• Secondary indexes allow indexing other columns to be queried.

• One index per column

@calonso

SECONDARY INDEXES

CREATE TABLE performer ( name VARCHAR, type VARCHAR, country VARCHAR, style VARCHAR, founded INT, born TIMESTAMP, died TIMESTAMP, PRIMARY KEY (name));

CREATE INDEX performers_by_style ON performer (style);

DROP INDEX performers_by_style;

@calonso

SECONDARY INDEXES

• Same recommendations as for RDBMS

• Use indexes on low cardinality fields

• Beware of the write overhead

• Every node indexes its local data, therefore a read hits all nodes!!

• Don’t use them. Use lookup tables instead.

@calonso

PRACTICE TIME!

We need to query the users by name.

CREATE INDEX users_by_name ON users (name);

@calonso

UUID

• Type 4 UUID

• Our way to ensure uniqueness in a distributed system.

7ffa4040-9132-4e0b-b04f-610e869d8717

@calonso

PRACTICE TIME!

Our system has another entity: Books. Books have a title and an author. We have no guarantee of any of them or even their combination to be unique.

CREATE TABLE books ( uid TIMEUUID PRIMARY KEY, title VARCHAR, author VARCHAR);

@calonso

TIMEUUID

• Timestamp + UUID

• Type 1 UUID

• Generated with CQL now() function

• Can extract the Timestamp with CQL dateof() function

c9cc9e60-711c-11e5-9d70-feff819cdc9f
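The difference between the two identifier types can be explored outside Cassandra with Python's standard uuid module. In this sketch, uuid4() plays the role of CQL's uuid(), uuid1() the role of now(), and the helper timeuuid_to_datetime (a hypothetical name) mimics what dateof() does by reading the timestamp embedded in a type 1 UUID.

```python
import uuid
from datetime import datetime, timezone

# 100-nanosecond intervals between the UUID epoch (1582-10-15) and the Unix epoch (1970-01-01).
UUID_EPOCH_OFFSET = 0x01B21DD213814000


def timeuuid_to_datetime(value: uuid.UUID) -> datetime:
    """Extract the embedded timestamp from a type 1 (time-based) UUID."""
    if value.version != 1:
        raise ValueError("not a type 1 (time-based) UUID")
    unix_seconds = (value.time - UUID_EPOCH_OFFSET) / 10_000_000
    return datetime.fromtimestamp(unix_seconds, tz=timezone.utc)


if __name__ == "__main__":
    random_id = uuid.uuid4()  # type 4: random, no embedded time
    time_id = uuid.uuid1()    # type 1: timestamp plus node information
    print(random_id, random_id.version)
    print(time_id, time_id.version, timeuuid_to_datetime(time_id))

    # The sample TIMEUUID shown on the slide.
    slide_timeuuid = uuid.UUID("c9cc9e60-711c-11e5-9d70-feff819cdc9f")
    print(timeuuid_to_datetime(slide_timeuuid))
```

Because the timestamp is part of the value, TIMEUUID columns sort chronologically, which is what makes them useful as clustering columns.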

@calonso

TIMEUUID

CREATE TABLE track_ratings_by_user ( user UUID, activity TIMEUUID, rating INT, album_title VARCHAR, album_year INT, track_title VARCHAR, PRIMARY KEY (user, activity)) WITH CLUSTERING ORDER BY (activity DESC);

@calonso

TTL

• Time To Live for columns specified in seconds.

• After TTL expires, column is marked with a Tombstone.

INSERT INTO albums_by_performer (performer, year, title, genre) VALUES ('The Beatles', 1966, 'Revolver', 'Rock') USING TTL 30;

@calonso

PRACTICE TIME!

We are in the Big Data era and therefore we want to measure absolutely everything our users do in our portal. Actions will be defined by a type (string) and a receiver (int).

CREATE TABLE user_action ( user_ID VARCHAR, time TIMESTAMP, type VARCHAR, receiver INT, PRIMARY KEY(user_ID, time) );

@calonso

DELETE

• A whole partition:

• DELETE FROM <table> WHERE <partition_key> = value;

• A row:

• DELETE FROM <table> WHERE <primary key> = value;

• A column:

• DELETE <column name> FROM <table> WHERE <primary key> = value;

• Deleted things are marked with a tombstone, not actually removed.

@calonso

TRUNCATE

TRUNCATE albums_by_performer;

@calonso

COUNTERS

• Implements distributed counters

• The value can only be updated, never set

• Cannot be part of the PK

• If present on a table, all non counter columns in the same table must be part of the PK

CREATE TABLE ratings_by_track ( album_title VARCHAR, album_year INT, track_title VARCHAR, num_ratings COUNTER, sum_ratings COUNTER, PRIMARY KEY (album_title, album_year, track_title));

@calonso

COUNTERS

• Performance considerations

• Update requires a read before

• Accuracy considerations

• Counter update is not idempotent, so retrying false failures leads to wrong value.
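The accuracy concern can be shown with a toy simulation in Python. It models a "false failure": the operation is applied, the acknowledgement is lost, and the client retries. The row dictionary and all function names are made up for the illustration and say nothing about how Cassandra stores counters.

```python
from typing import Callable, Dict


def apply_with_lost_ack(row: Dict[str, object], operation: Callable[[Dict[str, object]], None]) -> None:
    """Apply an operation, then apply it again as a client retry after a lost acknowledgement."""
    operation(row)  # first attempt: applied, but the client never sees the acknowledgement
    operation(row)  # retry: applied a second time


def set_genre(row: Dict[str, object]) -> None:
    """Regular column write: setting the same value twice changes nothing."""
    row["genre"] = "Rock"


def increment_albums(row: Dict[str, object]) -> None:
    """Counter update: every application adds one more."""
    row["albums"] = int(row["albums"]) + 1


if __name__ == "__main__":
    row = {"genre": None, "albums": 0}
    apply_with_lost_ack(row, set_genre)
    apply_with_lost_ack(row, increment_albums)
    print(row)
```

The regular write ends up with the intended value after the retry, while the counter is incremented twice for a single intended update.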

@calonso

COUNTERS

• No INSERT

• No value set, only update.

CREATE TABLE stats ( performer VARCHAR, albums COUNTER, concerts COUNTER, PRIMARY KEY (performer));

UPDATE stats SET albums = albums + 1, concerts = concerts + 10 WHERE performer = 'The Beatles';

@calonso

PRACTICE TIME!

We need to keep track of the number of times a specific book has been read by a specific user.

CREATE TABLE books_read_by_user ( book_uid UUID, user_ID VARCHAR, times COUNTER, PRIMARY KEY(book_uid, user_ID));

@calonso

COLLECTIONS

Our users can have several email addresses…

• Set: Uniqueness

• email_addresses SET<VARCHAR>

• List: Order

• email_addresses LIST<VARCHAR>

• Map: Key-Value pairs

• email_addresses MAP<VARCHAR, VARCHAR>

@calonso

SETS

• Insert:

• INSERT INTO band (name, members) VALUES ('The Beatles', {'John', 'Paul', 'George'});

• Union (duplicates are removed transparently):

• UPDATE band SET members = members + {'John', 'Ringo'} WHERE name = 'The Beatles';

• Difference:

• UPDATE band SET members = members - {'Ringo'} WHERE name = 'The Beatles';

• Deletion (of the whole set):

• DELETE members FROM band WHERE name = 'The Beatles';

@calonso

LISTS

• Insert:

• INSERT INTO song (name, songwriters) VALUES ('Hold your hand', ['John', 'Paul']);

• Append:

• UPDATE song SET songwriters = songwriters + ['Paul'] WHERE name = …;

CREATE TABLE song (
  name VARCHAR,
  songwriters LIST<VARCHAR>,
  PRIMARY KEY (name)
);

@calonso

LISTS

• Prepend:

• UPDATE song SET songwriters = ['Paul'] + songwriters WHERE name = …;

• Update (by index):

• UPDATE song SET songwriters[1] = 'Jonathan' WHERE name = …;

• Subtract:

• UPDATE song SET songwriters = songwriters - ['Jonathan'] WHERE name = …;

• Delete (by index):

• DELETE songwriters[0] FROM song WHERE name = …;

@calonso

MAPS

• Insert:

• INSERT INTO album (title, tracks) VALUES ('Revolver', { 1: 'Taxman', 2: 'Eleanor' });

• Update:

• UPDATE album SET tracks[3] = 'Yellow Submarine' WHERE title = …;

• Delete:

• DELETE tracks[3] FROM album WHERE title = …;

CREATE TABLE album (
  title VARCHAR,
  tracks MAP<INT, VARCHAR>,
  PRIMARY KEY (title)
);

@calonso

PRACTICE TIME!

Our users can define a set of preferences in the portal: TimeZone, Language and Currency

ALTER TABLE users ADD preferences MAP<VARCHAR, VARCHAR>;

@calonso

USER DEFINED TYPES

CREATE TYPE track (
  album_title VARCHAR,
  album_year INT,
  track_title VARCHAR
);

CREATE TABLE track_ratings_by_user (
  user UUID,
  activity TIMEUUID,
  rating INT,
  song FROZEN<track>,
  PRIMARY KEY (user, activity)
) WITH CLUSTERING ORDER BY (activity DESC);

The type must be created before the table that uses it.

FROZEN: the value has to be written as a whole; a single field (e.g. album_year) cannot be updated on its own.

@calonso

USER DEFINED TYPES

• Insert:

• INSERT INTO track_ratings_by_user (user, activity, rating, song) VALUES (6ed4f220…, now(), 10, { album_title: 'Let it be', album_year: 1970, track_title: 'Let it be' });

@calonso

USER DEFINED TYPES

• Update:

• UPDATE track_ratings_by_user SET song = { album_title: 'Let it be', album_year: 1970, track_title: 'Two of us' } WHERE user = … AND activity = …;

• Delete:

• DELETE song FROM track_ratings_by_user WHERE user = … AND activity = …;

@calonso

TUPLES

CREATE TABLE user (
  id UUID PRIMARY KEY,
  email TEXT,
  name TEXT,
  preferences SET<TEXT>,
  equalizer FROZEN<TUPLE<FLOAT, FLOAT, FLOAT, INT, VARCHAR>>
);

INSERT INTO user (id, equalizer) VALUES (6ed4f220…, (3.0, 1.1, 5.1, 3, 'Pop-Rock'));

@calonso

PRACTICE TIME!

Our users can have an e-reader, defined by brand and model.

CREATE TYPE e_reader (
  brand VARCHAR,
  model VARCHAR
);

ALTER TABLE users ADD reader FROZEN <e_reader>;

BATCH

Grouping and atomising queries.

@calonso

BATCH

• Combines multiple INSERT, UPDATE and DELETE operations into a single logical operation:

• Saves on client - coordinator communication

• Atomic: if one succeeds, all will

• No isolation: other clients can read or write data affected by a partially applied batch.

@calonso

BATCH

• All modified cells share the same timestamp, so on read the batch looks atomic => no order guarantee between its statements!

• Don’t use BATCHES with operations on the same PK, as in this example:

BEGIN BATCH
  DELETE FROM albums WHERE name = 'Let it be';
  INSERT INTO albums (name) VALUES ('Let it be');
APPLY BATCH;

@calonso

BATCH + LWT

• The whole BATCH will only run if conditions for all LWT are met.

• All operations in the BATCH will run sequentially.

BEGIN BATCH
  UPDATE user SET lock = true WHERE id = … IF lock = false;
  DELETE FROM albums WHERE name = 'Let it be';
  INSERT INTO albums (name) VALUES ('Let it be');
  UPDATE user SET lock = false WHERE id = …;
APPLY BATCH;

@calonso

ROLLBACK

• Not necessary

• An RDBMS cannot know, at the beginning of a transaction, whether all queries will be able to succeed

• Cassandra can, so if they won’t, the batch doesn’t even start

SELECT

Searching for data

@calonso

SELECT

• All rows:

• SELECT * FROM album;

• Specific columns:

• SELECT performer, title, year FROM album;

• Specific field from a UDT:

• SELECT performer.lastname FROM album;

• Count:

• SELECT COUNT(*) FROM album;

@calonso

WHERE

• Equality matches:

• SELECT * FROM tracks_by_album WHERE album_title = 'Revolver' AND year = 1966;

• SELECT * FROM tracks_by_album WHERE album_title = 'Revolver' AND year = 1966 AND number = 6;

• IN:

• Only applicable to the last restriction in the WHERE clause

• SELECT * FROM tracks_by_album WHERE album_title = 'Revolver' AND year = 1966 AND number IN (2, 3, 4);

@calonso

WHERE

• Range search:

• Only on clustering columns.

• SELECT * FROM tracks_by_album WHERE album_title = 'Revolver' AND year = 1966 AND number >= 2 AND number < 6;

• ALLOW FILTERING:

• Allows scanning through all partitions => potentially very time consuming

• SELECT * FROM tracks_by_album WHERE number = 2 ALLOW FILTERING;

@calonso

DATA MODELLING

Processes and good practices to design our schema.

@calonso

DATA MODELLING

• Understand your data

• Decide how you’ll query the data

• Define column families to satisfy those queries

• Implement and optimize

@calonso

DATA MODELLING

Conceptual Model => (Query-Driven Methodology) => Logical Model => (Analysis & Validation) => Physical Model

@calonso

DATA MODELLING

E-R Diagram => (Query-Driven Methodology) => Chebotko Diagram => (Analysis & Validation) => Physical-level Chebotko Diagram

@calonso

CONCEPTUAL MODEL

@calonso

QUERY DRIVEN METHODOLOGY

• Spread data evenly around the cluster

• Minimize the number of partitions read

• Follow the mapping rules:

• Entities and relationships: map to tables

• Equality search attributes: must be at the beginning of the primary key

• Inequality search attributes: become clustering columns

• Ordering attributes: become clustering columns

• Key attributes: map to primary key columns

@calonso

LOGICAL MODEL

@calonso

ANALYSIS & VALIDATION

• Are write conflicts (overwrites) possible?

• How large are partitions?

• Ncells = Nrows × (Ncols - Npk - Nstatic) + Nstatic < 1M

• How much data duplication? (batches)

• Client side joins or new table?
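The partition size check above can be sketched as a small calculation. The function name, the constant name and the sample figures below are assumptions made up for this illustration; the column counts correspond to a table with 4 columns, 2 of them in the primary key and no static columns, such as user_action from the earlier practice.

```python
MAX_CELLS_PER_PARTITION = 1_000_000  # the "< 1M" limit from the slide


def cells_per_partition(n_rows, n_cols, n_pk, n_static):
    """Ncells = Nrows x (Ncols - Npk - Nstatic) + Nstatic."""
    return n_rows * (n_cols - n_pk - n_static) + n_static


# Hypothetical workload: one row per minute for a year in a single partition.
rows_per_year = 365 * 24 * 60  # 525600 rows
per_minute = cells_per_partition(rows_per_year, n_cols=4, n_pk=2, n_static=0)

# Hypothetical workload: one row per hour for a year in a single partition.
per_hour = cells_per_partition(365 * 24, n_cols=4, n_pk=2, n_static=0)

print(per_minute, per_minute < MAX_CELLS_PER_PARTITION)  # prints "1051200 False"
print(per_hour, per_hour < MAX_CELLS_PER_PARTITION)      # prints "17520 True"
```

When the estimate exceeds the limit, as in the per-minute case, the usual reaction is to add something to the partition key (for example a time bucket) so the rows are spread over more partitions.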

@calonso

PHYSICAL MODEL

@calonso

REVIEW QUESTIONS

What is the relationship between a column family and a CQL table?

Terminologically they’re the same, but technically a column family refers to the physical representation while table refers to the logical tabular representation when queried from CQL.

@calonso

REVIEW QUESTIONS

How are clustering columns ordered by default? How can we modify it?

Ascending by default. We can modify it by adding WITH CLUSTERING ORDER BY … to the CQL table definition.

@calonso

REVIEW QUESTIONS

Which is the biggest reason for using UUIDs in Cassandra?

Distributed uniqueness. Every node or client can generate UUIDs on its own, without coordination, and collisions are practically impossible.

@calonso

REVIEW QUESTIONS

What is the difference between an UUID and a TIMEUUID?

A TIMEUUID is a version 1 UUID: it has date and time information embedded, so it can also be ordered by time.
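The difference can be seen with Python's standard uuid module, used here only as an illustration of the two UUID versions, not as a Cassandra driver. The helper uuid1_to_datetime and the constant GREGORIAN_EPOCH are names made up for this sketch.

```python
import uuid
from datetime import datetime, timedelta, timezone

# Version 1 UUIDs count 100-nanosecond intervals since 1582-10-15 (UTC).
GREGORIAN_EPOCH = datetime(1582, 10, 15, tzinfo=timezone.utc)


def uuid1_to_datetime(value):
    """Extract the embedded timestamp from a version 1 (time-based) UUID."""
    return GREGORIAN_EPOCH + timedelta(microseconds=value.time // 10)


random_id = uuid.uuid4()  # random UUID: no time information inside
time_id = uuid.uuid1()    # time-based UUID: the kind a TIMEUUID column holds

print(random_id.version, time_id.version)  # prints "4 1"
print(uuid1_to_datetime(time_id))          # roughly the current date and time
```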

@calonso

REVIEW QUESTIONS

When should secondary indexes be used?

Very rarely. Only when the indexed column holds values with very low cardinality and a lookup table is truly inconvenient.

@calonso

REVIEW QUESTIONS

Are CQL COUNTERS 100% accurate?

No, not 100%, because counter updates are not idempotent: retrying an update that wrongly appeared to fail applies it twice and leaves a wrong value.

@calonso

REVIEW QUESTIONS

What does it mean that Cassandra does UPSERTs?

That INSERT and UPDATE behave the same way: both write the given values whether or not the row already exists.

@calonso

REVIEW QUESTIONS

What predicates are allowed in a CQL query?

Equality, inequality (range) and IN.

@calonso

REVIEW QUESTIONS

When should the ALLOW FILTERING clause be used?

Typically never. Only in development to scan through all your data.

@calonso

REVIEW QUESTIONS

How can data from two tables be combined in a CQL query?

Cassandra doesn’t support JOIN statements, so we can:

• Nest dependent data in the same table.

• JOIN at application level.

@calonso

REVIEW QUESTIONS

What is the purpose of Chebotko Diagrams?

Capture our entities and properties as tables along with the query access patterns expected on them.

@calonso

REVIEW QUESTIONS

Which is the most important thing to keep in mind when designing our data models?

Minimize the number of accessed partitions.

@calonso

MORE CASSANDRA CONCEPTS

Write and read paths and compactions.

WRITE PATH

The writing process

@calonso

WRITE PATH

RDBMS vs. Cassandra

@calonso

WRITE PATH

• Memtable: in-memory tables corresponding to CQL tables.

• CommitLog: append-only log to make writes durable.

• SSTables: Memtable snapshots periodically flushed to disk. Never updated.

• Compaction: Periodic process to merge and streamline SSTables.
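How these pieces fit together can be sketched with a toy, single-process model. It is a deliberate simplification, not how Cassandra is implemented: ToyNode and memtable_limit are hypothetical names, the flush is triggered by a row count instead of a memory threshold, and flushed commit log entries are simply dropped.

```python
class ToyNode:
    """Toy model of the write path: commit log, Memtable, SSTables."""

    def __init__(self, memtable_limit=2):
        self.commit_log = []  # append-only log, written first for durability
        self.memtable = {}    # in-memory table, updated in place
        self.sstables = []    # immutable snapshots, oldest first
        self.memtable_limit = memtable_limit

    def write(self, key, value):
        self.commit_log.append((key, value))  # sequential append, no seek
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        if not self.memtable:
            return
        # Dump the Memtable as a new, sorted, never-updated SSTable.
        self.sstables.append(tuple(sorted(self.memtable.items())))
        self.memtable = {}
        # The logged writes now live in an SSTable, so the log can go.
        self.commit_log.clear()

    def read(self, key):
        # Newest data first: Memtable, then SSTables from newest to oldest.
        if key in self.memtable:
            return self.memtable[key]
        for sstable in reversed(self.sstables):
            for k, v in sstable:
                if k == key:
                    return v
        return None


node = ToyNode(memtable_limit=2)
node.write("a", 1)
node.write("b", 2)  # reaches the limit and triggers a flush
node.write("a", 3)  # newer value stays in the Memtable

print(node.sstables)                   # prints "[(('a', 1), ('b', 2))]"
print(node.read("a"), node.read("b"))  # prints "3 2"
```

The old value of "a" is still sitting in the SSTable; it is compaction that eventually merges it away.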

@calonso

FLUSH PROCESS

• Dumps a Memtable to a new SSTable on disk and its summary index.

• Marks associated commit log entries as flushed

• Triggered by:

• memtable_total_space_in_mb reached

• commitlog_total_space_in_mb reached

• nodetool flush

READ PATH

The reading process

@calonso

READ PATH

• Memtable: in memory table. Serves data as part of the merge process

• RowCache: in memory cache. Stores recently read columns

• BloomFilter: predicts whether a partition key may be in its corresponding SSTable

• KeyCaches: maps recently read partition keys to specific SSTable offsets

• Partition summaries: indexes the partition indexes.

• Partition indexes: Sorted partition keys mapped to their SSTables offsets

• SSTables: static files containing data.
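The Bloom filter step of the read path can be sketched as follows. This is a minimal illustration of the data structure, not Cassandra's implementation: the class, its parameters (size_bits, num_hashes) and the SHA-256 based hashing are assumptions chosen for readability.

```python
import hashlib


class BloomFilter:
    """Answers "definitely not here" or "maybe here" for a partition key."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, key):
        # Derive num_hashes bit positions from salted SHA-256 digests.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key):
        # False means the key was never added; True may be a false positive.
        return all((self.bits >> pos) & 1 for pos in self._positions(key))


# One filter per SSTable: only "maybe" answers cost a disk read.
sstable_filter = BloomFilter()
for partition_key in ("Revolver", "Let it be"):
    sstable_filter.add(partition_key)

print(sstable_filter.might_contain("Revolver"))  # prints "True"
```

A key that was added always yields True, so no data is ever missed; the price is an occasional unnecessary SSTable read when an absent key yields True.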

@calonso

READ/WRITE STATS

nodetool cfstats <keyspace>.<column family>

COMPACTIONS

Streamlining tables on disk

@calonso

DELETES

• When a column is deleted a Tombstone is applied to the column in its Memtable

• Tombstoned columns are ignored on reads

• Tombstoned columns are kept around for at least gc_grace_seconds.

• gc_grace_seconds is configurable, but beware of “Zombies”

@calonso

COMPACTIONS

• Merges most recent partition keys and columns

• Evicts tombstoned columns

• Creates new SSTable

• Rebuilds partition indexes and summaries

• Deletes old SSTables
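The merge and eviction steps above can be sketched like this. It is a simplified model under stated assumptions: each SSTable is a plain dict mapping a key to a (value, timestamp) pair, TOMBSTONE is a marker object made up for this example, and timestamps are plain seconds.

```python
TOMBSTONE = object()  # marker stored in place of a deleted value


def compact(sstables, now, gc_grace_seconds):
    """Merge SSTables into one: newest cell wins, expired tombstones go."""
    merged = {}
    for sstable in sstables:
        for key, (value, timestamp) in sstable.items():
            # Keep only the most recent version of each cell.
            if key not in merged or timestamp > merged[key][1]:
                merged[key] = (value, timestamp)
    # Evict tombstones that are older than gc_grace_seconds.
    return {
        key: (value, timestamp)
        for key, (value, timestamp) in merged.items()
        if not (value is TOMBSTONE and now - timestamp >= gc_grace_seconds)
    }


old_sstable = {"a": ("1", 10), "b": ("2", 10)}
new_sstable = {"a": ("3", 20), "b": (TOMBSTONE, 20)}

# Long after the delete: the tombstone and the data it shadows are both gone.
late = compact([old_sstable, new_sstable], now=100, gc_grace_seconds=50)
print(late)  # prints "{'a': ('3', 20)}"

# Within gc_grace_seconds: the tombstone survives the compaction.
early = compact([old_sstable, new_sstable], now=30, gc_grace_seconds=50)
print(sorted(early))  # prints "['a', 'b']"
```

The inputs are never modified; the result plays the role of the new SSTable, after which the old ones can be deleted.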

@calonso

COMPACTIONS

• SizeTieredCompactionStrategy

• LeveledCompactionStrategy

• DateTieredCompactionStrategy

CREATE TABLE user (
  id UUID PRIMARY KEY,
  email TEXT,
  name TEXT,
  preferences SET<TEXT>
) WITH COMPACTION = { 'class': '<strategy>', <params> };

@calonso

SIZE TIERED COMPACTION

@calonso

SIZE TIERED COMPACTION

• Fast to complete

• SSTable sizes keep growing

• Potentially inconsistent read latency for updated data

• May waste disk as we don’t know when deleted data will be merged away

• Requires free disk space of up to 2x the size of the largest table

• Recommended for write-once, read-many use cases

@calonso

LEVELED COMPACTION

@calonso

LEVELED COMPACTION

• Continuously compacting (more I/O)

• 10 x sstable_size_in_mb (160 MB) as max required disk space

• Ensures low read latency

• Recommended with overwrites (updates) and tombstones

@calonso

DATETIERED COMPACTION

• Compacts together data that was written near in time

• Recommended for time series

@calonso

REVIEW QUESTIONS

What happens when a Memtable is flushed?

We create a new SSTable on disk. Also the corresponding CommitLog entries are marked as flushed.

@calonso

REVIEW QUESTIONS

What causes a Memtable to flush?

• memtable_total_space_in_mb reached

• commitlog_total_space_in_mb reached

• nodetool flush manually executed

@calonso

REVIEW QUESTIONS

Do disk seeks happen during writes?

No. During writes we only touch the Memtable, which lives in memory, and the commit log, which is an append-only log. That means that writes happen sequentially on disk.

@calonso

REVIEW QUESTIONS

What benefit do Bloom Filters provide to the read process?

They let us skip reading SSTables that do not have the data we’re looking for.

@calonso

REVIEW QUESTIONS

Is the partition summary read for partition keys found in the key cache?

No. The key cache allows us to skip the partition summary and partition index and go straight to the SSTable.

@calonso

REVIEW QUESTIONS

What is the relationship between the partition summary and the partition index?

The partition summary is an index on the partition index.

@calonso

REVIEW QUESTIONS

What are zombie columns and how do you prevent them?

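Zombie columns are deleted columns that reappear after bringing up a node that was down for longer than gc_grace_seconds and therefore never saw the tombstone. To prevent them, repair every node within gc_grace_seconds and don’t bring back a node that has been down for longer than that without wiping its data first.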

@calonso

REVIEW QUESTIONS

What are the benefits of SizeTieredCompaction?

• Enable fast write operations

• Less disk I/O pressure

@calonso

REVIEW QUESTIONS

What are the benefits of LeveledCompaction?

• Predictable fast read performance

• Not necessary to have a lot of free disk space for it to happen.

@calonso

HARDWARE CONSIDERATIONS

@calonso

MEMORY

• Memory helps reads

• Recommendations

• Dedicated machines: 16 GB - 64 GB. Never below 8 GB

• Virtual machines: 8 GB - 16 GB. Never below 4 GB

• Testing machines: Virtual machines ~ 256 MB

@calonso

CPU

• CPU helps writes

• Recommendations

• Dedicated machines: 8 core processors

• Virtual machines: 8 cores + CPU burst

@calonso

DISK

• SizeTieredCompaction: 50% free disk space

• LeveledCompaction: 10% free disk space

• Recommendations

• 500 GB to 1 TB per node

• Two drives: One for data, one for CommitLog

• SSDs if possible

@calonso

NETWORK

• Gigabit ethernet or faster

@calonso

THANK YOU!