Pythian: My First 100 days with a Cassandra Cluster


Presented by:
Gustavo René Antúnez, DBA Team Lead
Carlos Rolo, Cassandra MVP
September, 2015

Welcome to Cassandra Summit 2015

About Pythian

• 18 years of data infrastructure management consulting
• 200+ top brands
• 6000+ databases under management
• Over 400 DBAs in 35 countries
• Top 5% of the DBA workforce: 9 Oracle ACEs, 2 Microsoft MVPs, 1 Cassandra MVP
• Oracle, Microsoft, MySQL, DataStax partners, Netezza, Hadoop and MongoDB, plus UNIX sysadmin and Oracle apps

Where does René come from

• Oracle DBA
• Started with Version 9.2 in 2004
– Speaker at Oracle Open World, Developers Day and Collaborate
– APress Q1 2016: “Practical Data Refresh”
– Movie Fanatic & Music Lover
– Bringing the best from México (Mexihtli) to the rest of the world and in the process photographing it :)
– rene-ace.com
– @rene_ace

Where does Carlos come from

• Cassandra Consultant
• First contact was 0.8
• Cassandra MVP & DataStax Certified Architect
• Lisbon Cassandra Meetup
• Passion for distributed systems
• Loves a good challenge
• Waterpolo is my sport
• @cjrolo

How did you get to be a DBA


6th Happiest Job of 2015!


http://www.forbes.com/sites/susanadams/2014/03/20/the-happiest-and-unhappiest-jobs-in-2014/

Work-life balance
Relationship with boss and co-workers
Daily tasks
Job resources

Field will grow by 15% between 2012 and 2022

DBA can be the key driver of success

Happiest Job of 2034?

Oxford University: THE FUTURE OF EMPLOYMENT: HOW SUSCEPTIBLE ARE JOBS TO COMPUTERISATION?

• 47 percent of American jobs are at high risk of being taken by computers within the next two decades.

– 1st Wave: Computers will start replacing people in especially vulnerable fields like transportation/logistics, production labor, and administrative support.

– 2nd Wave: Dependent upon the development of good artificial intelligence. This could next put jobs in management, science and engineering, and the arts at risk.

What is Cassandra?

• NoSQL database, developed in Java
• Fully distributed DB
– Meaning that there is no master DB, unlike Oracle or MySQL
• Linearly scalable
• Based on 2 core technologies: Google’s Bigtable and Amazon’s Dynamo
• 2 versions of Cassandra
– Community Edition: distributed under the Apache™ License
– Enterprise Edition: distributed by DataStax

CAP Theorem

• In a distributed system you can only have two out of the following three guarantees across a write/read pair:

• Consistency: A read is guaranteed to return the most recent write for a given client.

• Availability: A non-failing node will return a reasonable response within a reasonable amount of time (no error or timeout).

• Partition Tolerance: The system will continue to function when network partitions occur.


What is Cassandra ?

• Cassandra is a BASE (Basically Available, Soft state, Eventually consistent) type system


• Not an ACID (Atomicity, Consistency, Isolation, Durability) type system

It Can be as easy as …

• Start your machine and install the following:
– ntp (packages are normally ntp, ntpdate and ntp-doc)
– wget (unless you have your packages copied over via other means)
– vim (or your favorite text editor)
• Yum Package Management
• Root or sudo access to the install machine
• Latest version of Oracle Java SE Runtime Environment (JRE) 8 (recommended) or OpenJDK 7
• Python 2.6+ (needed if installing OpsCenter)

It Can be as easy as …

• Install Cassandra.
~$ sudo yum install dsc21-2.1.5-1 cassandra21-2.1.5-1

• Install optional utilities.
~$ sudo yum install cassandra21-tools-2.1.5-1

• Stop the Cassandra service and clear the system data before configuring the node.
~$ sudo service cassandra stop
~$ sudo rm -rf /var/lib/cassandra/data/system/*

• In the cassandra-rackdc.properties file:
# indicate the rack and dc for this node
dc=Pythian
rack=RAC1

• Start the Cassandra service.
~$ sudo service cassandra start

Where is everything in Cassandra?

Directories                          Description
/var/lib/cassandra                   Data directories
/var/log/cassandra                   Log directory
/var/run/cassandra                   Runtime files
/usr/share/cassandra                 Environment settings
/usr/share/cassandra/lib             JAR files
/usr/bin                             Optional utilities, such as sstablelevelreset, sstablerepairedset, and sstablesplit
/usr/bin, /usr/sbin                  Binary files
/etc/cassandra                       Configuration files
/etc/init.d                          Service startup script
/etc/security/limits.d               Cassandra user limits
/etc/default
/usr/share/doc/cassandra/examples    Sample cassandra.yaml files for stress testing

I come from this world…

12c Version Architecture…

I come from this world… Oracle…

[Diagram: Oracle storage structures. Logical: database, schema, tablespace, segment, extent, Oracle data block. Physical: online redo log, data files, control files, OS block.]

RAC - For Node Point of Failure

[Diagram: a RAC cluster of three nodes (Node1, Node2, Node3), each with an ASM and a DB instance, sharing ASM disks. Networks shown: public, storage, ASM, CSS. Global Data Services provides service failover / load balancing.]

Dataguard - For Failover

[Diagram: Primary, Far Sync Instance and Standby, connected by SYNC and ASYNC links. Zero data loss failover.]

Cassandra Architecture

[Diagram: a Cassandra cluster spanning two datacenters. Datacenter México: Rack 1 with nodes N1 and N2. Datacenter Portugal: Rack 2 with nodes N3 and N4.]

One Ring to Rule them All


• The total amount of data managed by the cluster is represented as a ring

• Each node is assigned a part of the database to hold based on each table’s primary key.

• To guarantee both availability and durability multiple nodes will be assigned to the same data.

• There is no master node; all nodes can perform all operations.

[Diagram: a ring of four nodes (1 to 4). Each node holds three key ranges: A-F,T-Z,M-S; G-L,A-F,T-Z; M-S,G-L,A-F; T-Z,M-S,G-L.]
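The range lists in the ring diagram can be reproduced with a few lines of Python. This is only a sketch under two assumptions that the slide does not state: the primary ranges are laid out clockwise in the order A-F, G-L, M-S, T-Z, and each range is also copied to the next two nodes on the ring. The names `RANGES` and `ranges_held` are made up for the illustration.

```python
# Hypothetical four-node ring; the key ranges mirror the slide's diagram.
RANGES = ["A-F", "G-L", "M-S", "T-Z"]  # assumed primary range of each node, clockwise


def ranges_held(node_index, replication_factor=3):
    """Return the ranges stored on a node: its own primary range plus the
    ranges of the (replication_factor - 1) nodes that precede it on the ring."""
    n = len(RANGES)
    return [RANGES[(node_index - step) % n] for step in range(replication_factor)]


for i in range(len(RANGES)):
    print(f"node {i + 1}: {','.join(ranges_held(i))}")
```

Under these assumptions every range ends up on three different nodes, so losing one node still leaves two copies of everything it held.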

Gossip

• Peer-to-peer communication protocol in which nodes periodically exchange state information

• Runs every second and exchanges state messages with up to three other nodes in the cluster

• Failure detection
– It determines locally from gossip state and history if another node in the system is down or has come back up.

Consistent Hashing

• A hash consists of one or more arithmetic operations on a piece of data

• Common way of load balancing across several nodes

• Hash function must have an upper and lower bound so objects can be mapped in a circle

• Common hash algorithms
– Simple checksums
– Message Digest (MD5)
– Secure Hash Algorithm (SHA-1/2)
– MurmurHash
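A minimal sketch of consistent hashing, assuming MD5 as the hash and a 32-bit ring; it is not Cassandra's implementation. `HashRing`, `ring_position` and the node names are invented for the example.

```python
import bisect
import hashlib

RING_SIZE = 2 ** 32  # upper bound of the hash space; the lower bound is 0


def ring_position(value: str) -> int:
    """Map any string to a point on the ring using the first 4 bytes of its MD5."""
    digest = hashlib.md5(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % RING_SIZE


class HashRing:
    def __init__(self, nodes):
        # Each node sits on the ring at the hash of its name.
        self._points = sorted((ring_position(n), n) for n in nodes)
        self._positions = [p for p, _ in self._points]

    def owner(self, key: str) -> str:
        """Return the first node at or after the key's position, wrapping around."""
        idx = bisect.bisect_left(self._positions, ring_position(key))
        return self._points[idx % len(self._points)][1]


ring = HashRing(["n1", "n2", "n3", "n4"])
print(ring.owner("videogames"))
```

The useful property is what happens when the ring changes: adding a fifth node only takes over the keys that fall just before its position, and every other key keeps its owner.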

Partitioners

• Determines how data is distributed across the nodes in the cluster

• Function for deriving a token representing a row from its partition key

• Cassandra offers:
– Murmur3Partitioner
– RandomPartitioner
– ByteOrderedPartitioner

Virtual Nodes

• Solution for avoiding calculating node tokens and thinking about the cluster size beforehand

• Each node has multiple virtual nodes

• Each virtual node owns a much smaller subset of the data
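To see how virtual nodes divide up the ring, here is a small sketch that gives each node several random tokens and measures how much of the ring it ends up owning. The random token choice, the 64-bit ring size and the function names are assumptions for the illustration, not how Cassandra allocates tokens internally.

```python
import random


def assign_tokens(nodes, num_tokens, ring_size=2 ** 64, seed=0):
    """Give each node num_tokens random tokens (virtual nodes) on the ring."""
    rng = random.Random(seed)
    return {node: sorted(rng.randrange(ring_size) for _ in range(num_tokens))
            for node in nodes}


def ownership(tokens, ring_size=2 ** 64):
    """Fraction of the ring each node owns. A token owns the range that
    ends at it and starts after the previous token on the ring."""
    points = sorted((t, node) for node, ts in tokens.items() for t in ts)
    share = {node: 0 for node in tokens}
    for i, (token, node) in enumerate(points):
        previous = points[i - 1][0]  # for i == 0 this wraps to the last token
        share[node] += (token - previous) % ring_size
    return {node: width / ring_size for node, width in share.items()}


for num_tokens in (1, 256):
    shares = ownership(assign_tokens(["n1", "n2", "n3"], num_tokens))
    print(num_tokens, {node: round(s, 3) for node, s in shares.items()})
```

With one token per node the shares depend entirely on where those three tokens happen to land; with many small ranges per node the shares tend to even out, which is why tokens no longer have to be calculated by hand.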

Coordinators

• Acts as a proxy between the client application and the nodes that own the data being requested

• Any client request can be sent to any node.

Snitch

• Is responsible for keeping all of the nodes up to date on what node has what data, what nodes are currently down, what nodes are bootstrapping, etc.

• It interprets the topology

• The most popular are:
– Gossiping property file snitch
– EC2 Snitch
– EC2 Multi-region snitch
– Dynamic Snitch

Logical database container


Data is Stored in Keyspaces

A CASSANDRA TABLE OR COLUMN FAMILY

[Diagram: a Cassandra node. Components: coordinator, snitch, commit log writer, memtable writer, memtable flush (SSTable writer), reader, memtables, bloom filters. On disk: commit log and SSTables.]

A CASSANDRA TABLE OR COLUMN FAMILY

• Consists of one or more SSTables and 0 or more memtables

• SSTable stands for Sorted String Table.
– I.e., all of the columns in the SSTable are sorted in order by key.

• Each SSTable consists of the data table, bloom filter, index and some other minor files.

• SSTables are immutable. Once written they are never altered, only read and eventually deleted.

SSTables on disk (/var/lib/cassandra):

videogames-events-data-jb-1.db
videogames-events-filters-jb-1.db
videogames-events-index-jb-1.db
videogames-events-data-jb-2.db
videogames-events-filters-jb-2.db
videogames-events-index-jb-2.db
videogames-events-data-jb-3.db
videogames-events-filters-jb-3.db
videogames-events-index-jb-3.db
videogames-events-data-jb-4.db
videogames-events-filters-jb-4.db
videogames-events-index-jb-4.db
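The relationship between the commit log, memtables and SSTables can be sketched as a toy model. This is a heavy simplification: the `Table` class, the flush threshold and the "newest SSTable wins" read are assumptions for illustration, and real Cassandra resolves versions by write timestamp and uses bloom filters and indexes to avoid scanning.

```python
class Table:
    """Toy model of one table: a commit log, one memtable, and SSTables."""

    def __init__(self, flush_threshold=3):
        self.commit_log = []   # append-only record of every write
        self.memtable = {}     # in memory, mutable
        self.sstables = []     # oldest first; each one is an immutable tuple
        self.flush_threshold = flush_threshold

    def write(self, key, value):
        self.commit_log.append((key, value))  # durability first
        self.memtable[key] = value
        if len(self.memtable) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # Sort by key and freeze: an SSTable is never modified after this.
        self.sstables.append(tuple(sorted(self.memtable.items())))
        self.memtable = {}
        self.commit_log.clear()  # flushed data no longer needs the log

    def read(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for sstable in reversed(self.sstables):  # newest SSTable first
            for k, v in sstable:
                if k == key:
                    return v
        return None


table = Table(flush_threshold=2)
table.write("b", 1)
table.write("a", 2)   # second write reaches the threshold and triggers a flush
table.write("a", 3)   # newer value lives in the memtable, the old one stays on "disk"
print(table.read("a"), table.sstables)
```

Note that the update to "a" did not touch the existing SSTable. The old value stays there until a compaction merges it away.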

REPLICATION FACTOR (RF) AND CONSISTENCY

• Replication Factor is the number of copies of columns stored in the ring

• Replication factor should not exceed the number of nodes in the cluster
– RF=1 means one copy: the data for each column is stored only once in the ring.
– RF=3 (default) means every column stored in the database is stored three times.
– Quorum: the read and write must be acked/returned from a quorum of nodes.

REPLICATION FACTOR (RF) AND CONSISTENCY

• Consistency
– When a write or read is performed, the application can choose to wait for n copies of the data to be written or read. This is referred to as a consistency of n.
– There is a special consistency value called quorum, which means a response from RF/2+1 nodes is required (with the division rounded down).
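The quorum arithmetic is easy to check by hand or in code. The sketch below computes RF/2+1 with integer division and also tests the usual overlap rule (write acks + read acks > RF), which is what lets a quorum read see the latest quorum write. The function names are invented for the example.

```python
def quorum(replication_factor: int) -> int:
    """Replicas that must respond for a quorum: RF/2 + 1, division rounded down."""
    return replication_factor // 2 + 1


def overlaps(replication_factor: int, write_acks: int, read_acks: int) -> bool:
    """True when every read set shares at least one replica with every
    write set, i.e. write_acks + read_acks > replication_factor."""
    return write_acks + read_acks > replication_factor


for rf in (1, 3, 5):
    q = quorum(rf)
    print(f"RF={rf}: quorum={q}, quorum reads see quorum writes: {overlaps(rf, q, q)}")
```

For RF=3 a quorum is 2, so one replica can be down and quorum reads and writes still succeed. Reading and writing with n=1 at RF=3 gives no overlap guarantee.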

HOW TO MAKE SURE WE DON’T LOSE DATA

A.K.A. Anti-Entropy

• Three anti-entropy mechanisms in Cassandra
1) Hinted handoff
2) Read repair
3) Repair
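Of the three mechanisms, read repair is the simplest to sketch. The version below is an assumption-laden toy: replicas are plain dicts holding `(timestamp, value)` pairs, every replica is consulted, and the newest timestamp wins. It only illustrates the idea of fixing stale copies as a side effect of a read.

```python
def read_with_repair(replicas, key):
    """Read `key` from every replica, return the newest value, and
    overwrite any replica that is missing it or holds an older version.

    Each replica is a dict mapping key -> (timestamp, value)."""
    versions = [replica[key] for replica in replicas if key in replica]
    if not versions:
        return None
    newest = max(versions, key=lambda version: version[0])
    for replica in replicas:
        if replica.get(key) != newest:
            replica[key] = newest  # repair the stale or missing copy
    return newest[1]


a = {"k": (1, "old")}
b = {"k": (2, "new")}
c = {}  # this replica missed the write entirely
print(read_with_repair([a, b, c], "k"), a, c)
```

After the call all three replicas hold the same version, without a separate repair run.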

WRITE PATH


COMPACTIONS

• SSTables are immutable.
• Deletes and updates are just new writes.
• SSTables are merged together by partition key. Old obsolete data is discarded.
• Lots of SSTables become a few.
• Compaction can require a lot of disk space. DO NOT LET your disks get more than 50% full.
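A compaction can be pictured as a merge that keeps only the newest version of each key. The sketch below assumes each SSTable is a list of `(key, timestamp, value)` tuples and uses a made-up `TOMBSTONE` marker for deletes. It drops tombstones immediately, whereas real Cassandra keeps them for a grace period first.

```python
TOMBSTONE = object()  # hypothetical marker written by a delete


def compact(sstables):
    """Merge SSTables into one, keeping only the newest version of each key.

    Each SSTable is a list of (key, timestamp, value) tuples."""
    newest = {}
    for sstable in sstables:
        for key, timestamp, value in sstable:
            if key not in newest or timestamp > newest[key][0]:
                newest[key] = (timestamp, value)
    # Drop deleted keys, then write the survivors back out sorted by key.
    return [(key, ts, value)
            for key, (ts, value) in sorted(newest.items(), key=lambda item: item[0])
            if value is not TOMBSTONE]


old = [("a", 1, "x"), ("b", 1, "y")]
new = [("a", 2, "z"), ("b", 3, TOMBSTONE), ("c", 2, "w")]
print(compact([old, new]))
```

The input SSTables can only be deleted once the merged output is fully written, so both exist on disk at the same time. That temporary doubling is the reason for the disk space warning above.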

CQL - Cassandra Query Language


CQL is not SQL

• Default and primary interface into the Cassandra database (since 2.0)
• Cassandra does not support joins or subqueries
• Only way to create users and user-based permissions
• Very similar to SQL:

cqlsh> CREATE KEYSPACE sandbox WITH REPLICATION = { 'class' : 'NetworkTopologyStrategy', 'DC1' : 1 };
cqlsh> USE sandbox;
cqlsh:sandbox> CREATE TABLE data (id uuid, data text, PRIMARY KEY (id));
cqlsh:sandbox> INSERT INTO data (id, data) VALUES (c37d661d-7e61-49ea-96a5-68c34e83db3a, 'testing');
cqlsh:sandbox> SELECT * FROM data;

Feature/Function | DSE/Cassandra | Oracle RDBMS
Core architecture | “Masterless”; peer-to-peer with all nodes being the same | Traditional standalone
High availability | Continuous availability with built-in redundancy and hardware rack awareness in both single and multiple data centers | Oracle Dataguard (for failover), Oracle RAC (node SPOF), GoldenGate
Data model | Google Bigtable | Relational/tabular
Data consistency model | Tunable consistency (CAP theorem consistency per operation) | Traditional ACID
Storage model | Targeted directories with separation | Tablespaces
Logical database container | Keyspace | Database
Backup/recovery | Online, point-in-time restore | Online, point-in-time restore
Enterprise management/monitoring | DataStax OpsCenter | Oracle Enterprise Manager

LESSONS LEARNED

• Understand the Data Model Differences
• Hardware Setup does Matter
• Grep the logs for errors and warnings
• Make sure each node is created properly
• Know your tools
– nodetool utility
– Cassandra bulk loader (sstableloader)
– JConsole/Java VisualVM
– cassandra-stress
– OpsCenter
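"Grep the logs" can be as simple as the sketch below, which counts ERROR and WARN entries. It assumes each log line starts with its level (as in `WARN  [thread] 2015-09-22 ...`). The sample lines are invented, and the `scan` helper is not part of any Cassandra tool.

```python
import re
from collections import Counter

# Assumption: the log level is the first token on each line.
LEVEL = re.compile(r"^\s*(ERROR|WARN)\b")


def scan(lines):
    """Return (counts per level, matching lines) for ERROR and WARN entries."""
    counts, hits = Counter(), []
    for line in lines:
        match = LEVEL.match(line)
        if match:
            counts[match.group(1)] += 1
            hits.append(line.rstrip("\n"))
    return counts, hits


# Made-up sample lines standing in for a real log file.
sample = [
    "INFO  [main] node started\n",
    "WARN  [main] compaction is falling behind\n",
    "ERROR [main] unable to reach seed\n",
]
counts, hits = scan(sample)
print(dict(counts))
```

To run it against a real node, pass an open file handle instead of `sample`, for example the system log under the /var/log/cassandra directory listed earlier.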

FIT-ACER

• F – Focus (SLOW DOWN! Are you ready?)

• I – Identify server/DB name, time, authorization

• T – Type the command (do not hit enter yet)

• A – Assess the command (SPEND TIME HERE!)

• C – Check the server / database name again

• E – Execute the command

• R – Review and document the results

http://rene-ace.com

To contact us

[email protected]

1-877-PYTHIAN

To follow us

http://www.pythian.com/blog

http://www.facebook.com/pages/The-Pythian-Group/163902527671

@pythian

http://www.linkedin.com/company/pythian

Thank you – Q&A
