Pythian: My First 100 days with a Cassandra Cluster
My First 100 days with a Cassandra Cluster
Presented by: Gustavo René Antúnez, DBA Team Lead, and Carlos Rolo, Cassandra MVP
September 2015
Welcome to Cassandra Summit 2015
About Pythian
• 18 years of data infrastructure management consulting
• 200+ top brands
• 6000+ databases under management
• Over 400 DBAs in 35 countries
• Top 5% of the DBA workforce: 9 Oracle ACEs, 2 Microsoft MVPs, 1 Cassandra MVP
• Oracle, Microsoft, MySQL, DataStax partners, Netezza, Hadoop and MongoDB, plus UNIX sysadmin and Oracle apps
Where does René come from
– Oracle DBA
• Started with version 9.2 in 2004
– Speaker at Oracle OpenWorld, Developers Day and Collaborate
– Apress, Q1 2016: “Practical Data Refresh”
– Movie fanatic and music lover
– Bringing the best from México (Mexihtli) to the rest of the world and in the process photographing it :)
– rene-ace.com
– @rene_ace
Where does Carlos come from
• Cassandra Consultant
• First contact was 0.8
• Cassandra MVP & DataStax Certified Architect
• Lisbon Cassandra Meetup
• Passion for distributed systems
• Loves a good challenge
• Waterpolo is my sport
• @cjrolo
How did you get to be a DBA
6th Happiest Job of 2015!
http://www.forbes.com/sites/susanadams/2014/03/20/the-happiest-and-unhappiest-jobs-in-2014/
Work-life balance
Relationship with boss and co-workers
Daily tasks
Job resources
Field will grow by 15% between 2012 and 2022
DBA can be the key driver of success
Happiest Job of 2034?
Oxford University: THE FUTURE OF EMPLOYMENT: HOW SUSCEPTIBLE ARE JOBS TO COMPUTERISATION?
• 47 percent of American jobs are at high risk of being taken by computers within the next two decades.
– 1st Wave: computers will start replacing people in especially vulnerable fields like transportation/logistics, production labor, and administrative support.
– 2nd Wave: dependent upon the development of good artificial intelligence. This could next put jobs in management, science and engineering, and the arts at risk.
What is Cassandra?
• NoSQL database, developed in Java
• Fully distributed DB
– Meaning that there is no master DB, unlike Oracle or MySQL
• Linearly scalable
• Based on 2 core technologies: Google’s Bigtable and Amazon’s Dynamo
• 2 versions of Cassandra
– Community Edition: distributed under the Apache™ License
– Enterprise Edition: distributed by DataStax
CAP Theorem
• In a distributed system you can only have two out of the following three guarantees across a write/read pair:
– Consistency: a read is guaranteed to return the most recent write for a given client.
– Availability: a non-failing node will return a reasonable response within a reasonable amount of time (no error or timeout).
– Partition Tolerance: the system will continue to function when network partitions occur.
What is Cassandra?
• Cassandra is a BASE (Basically Available, Soft state, Eventually consistent) type system
• Not an ACID (Atomicity, Consistency, Isolation, Durability) type system
It Can be as easy as …
• Start your machine and install the following:
– ntp (packages are normally ntp, ntpdate and ntp-doc)
– wget (unless you have your packages copied over via other means)
– vim (or your favorite text editor)
• Yum Package Management
• Root or sudo access to the install machine
• Latest version of Oracle Java SE Runtime Environment (JRE) 8 (recommended) or OpenJDK 7
• Python 2.6+ (needed if installing OpsCenter)
• Install Cassandra.
~$ sudo yum install dsc21-2.1.5-1 cassandra21-2.1.5-1
• Install optional utilities.
~$ sudo yum install cassandra21-tools-2.1.5-1
• If the Cassandra service is already running, stop it and clear the system data.
~$ sudo service cassandra stop
~$ sudo rm -rf /var/lib/cassandra/data/system/*
• In the cassandra-rackdc.properties file, set the rack and DC.
# indicate the rack and dc for this node
dc=Pythian
rack=RAC1
• Start the Cassandra service.
~$ sudo service cassandra start
Where is everything in Cassandra?
Directories                          Description
/var/lib/cassandra                   Data directories
/var/log/cassandra                   Log directory
/var/run/cassandra                   Runtime files
/usr/share/cassandra                 Environment settings
/usr/share/cassandra/lib             JAR files
/usr/bin                             Optional utilities, such as sstablelevelreset, sstablerepairedset, and sstablesplit
/usr/bin, /usr/sbin                  Binary files
/etc/cassandra                       Configuration files
/etc/init.d                          Service startup script
/etc/security/limits.d               Cassandra user limits
/etc/default
/usr/share/doc/cassandra/examples    Sample cassandra.yaml files for stress testing
I come from this world…Oracle…
12c Version Architecture…
[Diagram: Oracle logical and physical storage structures: database, schema, tablespace, segment, extent, Oracle data block, data file, OS block, online redo log, control files]
RAC - For Node Point of Failure
[Diagram: RAC cluster of three nodes (Node1, Node2, Node3), each with ASM and DB instances, attached to shared ASM disks; networks shown: public, storage, ASM and CSS]
Global Data Services – Service Failover / Load Balancing
Data Guard - For Failover
[Diagram: Primary, Far Sync Instance and Standby, linked by SYNC and ASYNC transport; zero data loss failover]
Cassandra Architecture
[Diagram: Cassandra cluster with Datacenter México (Rack 1: nodes N1, N2) and Datacenter Portugal (Rack 2: nodes N3, N4)]
One Ring to Rule them All
• The total amount of data managed by the cluster is represented as a ring
• Each node is assigned a part of the database to hold, based on the partition key (the first part of each table’s primary key).
• To guarantee both availability and durability, multiple nodes are assigned to the same data.
• There is no master node; all nodes can perform all operations.
[Diagram: ring of four nodes (1 to 4); each node holds three of the four key ranges: A-F,T-Z,M-S / G-L,A-F,T-Z / M-S,G-L,A-F / T-Z,M-S,G-L]
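The range lists in the ring diagram are consistent with a simple placement rule: each range lives on its owning node plus the next nodes clockwise, up to the replication factor. The sketch below reproduces that under stated assumptions: the node numbering, the order of the ranges, and the names `RANGES`, `NODES`, `replicas_for` and `ranges_per_node` are illustrative, not taken from Cassandra.

```python
# Key ranges as shown on the slide. Node numbering and the assignment of
# the first range to node 1 are assumptions made for this illustration.
RANGES = ["A-F", "G-L", "M-S", "T-Z"]
NODES = [1, 2, 3, 4]


def replicas_for(range_index, rf, nodes=NODES):
    """Return the nodes holding a range: its owner plus the next rf-1 nodes clockwise."""
    return [nodes[(range_index + step) % len(nodes)] for step in range(rf)]


def ranges_per_node(rf):
    """Invert the placement: which ranges does each node end up storing?"""
    held = {node: [] for node in NODES}
    for index, key_range in enumerate(RANGES):
        for node in replicas_for(index, rf):
            held[node].append(key_range)
    return held


if __name__ == "__main__":
    for node, held in ranges_per_node(rf=3).items():
        print(node, held)
```

With three copies of each range, any single node can be lost and every range is still available on two other nodes.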
Gossip
• Peer-to-peer communication protocol in which nodes periodically exchange state information
• Runs every second and exchanges state messages with up to three other nodes in the cluster
• Failure detection
– Each node determines locally, from gossip state and history, whether another node in the system is down or has come back up.
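To get a feel for how quickly state spreads when every node talks to only a few peers per round, here is a toy simulation. It is a heavily simplified sketch, not Cassandra's gossip implementation: the message format, the version counters, and the function names `gossip_round` and `rounds_until_converged` are all assumptions, and failure detection is not modeled.

```python
import random


def gossip_round(views, rng, fanout=3):
    """One round: every node merges state with up to `fanout` random peers."""
    names = list(views)
    for name in names:
        others = [n for n in names if n != name]
        peers = rng.sample(others, k=min(fanout, len(others)))
        for peer in peers:
            merged = dict(views[peer])
            for node, version in views[name].items():
                # Keep the highest version seen so far for each node.
                if version > merged.get(node, -1):
                    merged[node] = version
            # Both sides leave the exchange with the same merged view.
            views[name] = dict(merged)
            views[peer] = dict(merged)


def rounds_until_converged(node_count, seed=0, max_rounds=100):
    """Count rounds until every node has heard about every other node."""
    rng = random.Random(seed)
    # Initially each node only knows its own state.
    views = {f"n{i}": {f"n{i}": 1} for i in range(node_count)}
    for round_number in range(1, max_rounds + 1):
        gossip_round(views, rng)
        if all(len(view) == node_count for view in views.values()):
            return round_number
    return None


if __name__ == "__main__":
    print(rounds_until_converged(50))
```

The point of the sketch is only that a small fanout is enough for information to reach the whole cluster within a few rounds, which is why no central coordinator is needed.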
Consistent Hashing
• A hash consists of one or more arithmetic operations on a piece of data
• Common way of load balancing across several nodes
• Hash function must have an upper and lower bound so objects can be mapped in a circle
• Common hash algorithms
– Simple checksums
– Message Digest (MD5)
– Secure Hash Algorithm (SHA-1/2)
– MurmurHash
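A minimal sketch of the idea, assuming a ring of size 2**32 and MD5 as the hash (chosen only because it ships with Python's standard library). The class `HashRing` and the helper `token` are hypothetical names for this illustration.

```python
import bisect
import hashlib


def token(value: str) -> int:
    """Map a string to a point on a ring of size 2**32 (MD5 used for illustration)."""
    digest = hashlib.md5(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


class HashRing:
    def __init__(self, nodes):
        # Each node sits at the ring position given by hashing its name.
        self._ring = sorted((token(name), name) for name in nodes)
        self._tokens = [t for t, _ in self._ring]

    def owner(self, key: str) -> str:
        """Return the first node at or after the key's token, wrapping around the circle."""
        index = bisect.bisect_left(self._tokens, token(key)) % len(self._ring)
        return self._ring[index][1]


if __name__ == "__main__":
    ring = HashRing(["node1", "node2", "node3"])
    print(ring.owner("user-42"))
```

The useful property is that adding a node only moves the keys that now fall to the new node; every other key keeps its owner, unlike a plain `hash(key) % node_count` scheme where most keys would move.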
Partitioners
• Determines how data is distributed across the nodes in the cluster
• Function for deriving a token representing a row from its partition key
Cassandra offers:
– Murmur3Partitioner
– RandomPartitioner
– ByteOrderedPartitioner
Virtual Nodes
• Solution for avoiding calculating node tokens and thinking about the cluster size beforehand
• Each node has multiple virtual nodes
• Each virtual node owns a much smaller subset of data
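A rough sketch of virtual nodes on a hash ring, assuming each physical node is simply given several tokens derived from its name. The token derivation and the helper names (`build_ring`, `owner`, `load`) are illustrative; in Cassandra the number of tokens per node is set with `num_tokens` in cassandra.yaml and the tokens themselves are chosen differently.

```python
import bisect
import hashlib
from collections import Counter


def token(value: str) -> int:
    """Hash a string to an integer position on the ring (MD5 for illustration)."""
    digest = hashlib.md5(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def build_ring(nodes, vnodes_per_node):
    """Give every physical node several tokens, one per virtual node."""
    ring = []
    for node in nodes:
        for i in range(vnodes_per_node):
            ring.append((token(f"{node}#vnode{i}"), node))
    return sorted(ring)


def owner(ring, key):
    """Physical node owning the key: first vnode token at or after the key's token."""
    tokens = [t for t, _ in ring]
    index = bisect.bisect_left(tokens, token(key)) % len(ring)
    return ring[index][1]


def load(ring, keys):
    """Count how many keys land on each physical node."""
    return Counter(owner(ring, key) for key in keys)


if __name__ == "__main__":
    keys = [f"key-{i}" for i in range(3000)]
    print("1 token per node: ", load(build_ring(["n1", "n2", "n3"], 1), keys))
    print("64 vnodes per node:", load(build_ring(["n1", "n2", "n3"], 64), keys))
```

Running it with different `vnodes_per_node` values shows how many small slices per node tend to even out the share of keys each node holds, without anyone computing tokens by hand.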
Coordinators
• Acts as a proxy between the client application and the nodes that own the data being requested
• Any client request can be sent to any node.
Snitch
• Is responsible for keeping all of the nodes up to date on what node has what data, what nodes are currently down, what nodes are bootstrapping, etc.
• It interprets the topology: which datacenter and rack each node belongs to
The most popular are:
– Gossiping property file snitch
– EC2 Snitch
– EC2 Multi-region snitch
– Dynamic Snitch
Data is Stored in Keyspaces
• Logical database container
A CASSANDRA TABLE OR COLUMN FAMILY
[Diagram: Cassandra node components: Coordinator, Snitch, Commitlog writer, Memtable writer, Memtable flush (SSTable writer), Reader, Memtables, Bloom filters, CommitLog, SSTables]
• Consists of one or more SSTables and zero or more memtables
• SSTable stands for Sorted String Table, i.e. all of the columns in the SSTable are sorted in order by key.
• Each SSTable consists of the data table, bloom filter, index and some other minor files.
• SSTables are immutable. Once written they are never altered, only read and eventually deleted.
SSTables on disk in /var/lib/cassandra:
videogames-events-data-jb-1.db
videogames-events-filters-jb-1.db
videogames-events-index-jb-1.db
videogames-events-data-jb-2.db
videogames-events-filters-jb-2.db
videogames-events-index-jb-2.db
videogames-events-data-jb-3.db
videogames-events-filters-jb-3.db
videogames-events-index-jb-3.db
videogames-events-data-jb-4.db
videogames-events-filters-jb-4.db
videogames-events-index-jb-4.db
REPLICATION FACTOR (RF) AND CONSISTENCY
• Replication Factor is the number of copies of each column stored in the ring
• Replication factor should not exceed the number of nodes in the cluster
– RF=1 is one copy: the data for each column is stored only once in the ring.
– RF=3 (a common choice) means every column stored in the database is stored three times.
– Quorum: the read or write must be acknowledged by a quorum of nodes.
• Consistency
– When a write or read is performed, the application can choose to wait for n copies of the data to be written or read; this is referred to as a consistency of n.
– There is a special consistency value called quorum, which means a response from floor(RF/2) + 1 nodes is required.
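The quorum arithmetic is easy to get wrong for even replication factors, so a small sketch may help. The function names are hypothetical; the second helper encodes the commonly cited overlap rule (read replicas + write replicas > RF), which is not stated on the slide.

```python
def quorum(replication_factor: int) -> int:
    """Number of replicas that must respond: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


def read_overlaps_write(write_acks: int, read_acks: int, replication_factor: int) -> bool:
    """True when every read set must share at least one replica with every write set."""
    return write_acks + read_acks > replication_factor


if __name__ == "__main__":
    for rf in (1, 2, 3, 5):
        print(f"RF={rf}: quorum={quorum(rf)}")
    # Quorum writes plus quorum reads overlap at RF=3; ONE plus ONE does not.
    print(read_overlaps_write(quorum(3), quorum(3), 3))
    print(read_overlaps_write(1, 1, 3))
```

With RF=3, quorum is 2, so one replica can be down while quorum reads and writes still succeed and still overlap on at least one up-to-date replica.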
HOW TO MAKE SURE WE DON’T LOSE DATA
A.K.A. Anti-Entropy
• Three anti-entropy mechanisms in Cassandra
1) Hinted handoff
2) Read repair
3) Repair
WRITE PATH
COMPACTIONS
• SSTables are immutable.
• Deletes and updates are just new writes.
• SSTables are merged together by partition key. Old, obsolete data is discarded.
• Lots of SSTables become a few.
• Compaction can require a lot of disk space. DO NOT LET your disks get more than 50% full.
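The merge step can be pictured with a toy model in which each SSTable is a dictionary of timestamped cells and a delete is a write with an empty value. This is a sketch under those assumptions, not Cassandra's compaction code: the `compact` function and `TOMBSTONE` marker are invented for the example, and real Cassandra keeps delete markers for a grace period (`gc_grace_seconds`) before dropping them.

```python
TOMBSTONE = None  # stand-in for the marker a delete writes


def compact(sstables):
    """Merge SSTables into one, keeping the newest cell per key and dropping deletes.

    Each SSTable is modeled as a dict {partition_key: (timestamp, value)}.
    """
    newest = {}
    for sstable in sstables:
        for key, (timestamp, value) in sstable.items():
            # An update is just a newer write: the highest timestamp wins.
            if key not in newest or timestamp > newest[key][0]:
                newest[key] = (timestamp, value)
    # The result is sorted by key, with deleted entries discarded.
    return {
        key: cell
        for key, cell in sorted(newest.items(), key=lambda item: item[0])
        if cell[1] is not TOMBSTONE
    }


if __name__ == "__main__":
    first = {"mario": (1, "level 1"), "zelda": (1, "level 3")}
    second = {"mario": (2, "level 2"), "sonic": (3, TOMBSTONE)}
    third = {"sonic": (1, "level 9")}
    print(compact([first, second, third]))
```

Note that the inputs are never modified: the merged table is written alongside them and the old ones are only removed afterwards, which is where the extra disk space requirement comes from.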
CQL - Cassandra Query Language
CQL is not SQL
• Default and primary interface into the Cassandra database (since 2.0)
• Cassandra does not support joins or subqueries
• Only way to create users and user-based permissions
• Very similar:
cqlsh> CREATE KEYSPACE sandbox WITH REPLICATION = { 'class' : 'NetworkTopologyStrategy', 'DC1' : 1 };
cqlsh> USE sandbox;
cqlsh:sandbox> CREATE TABLE data (id uuid, data text, PRIMARY KEY (id));
cqlsh:sandbox> INSERT INTO data (id, data) VALUES (c37d661d-7e61-49ea-96a5-68c34e83db3a, 'testing');
cqlsh:sandbox> SELECT * FROM data;
Feature/Function | DSE/Cassandra | Oracle RDBMS
Core architecture | “Masterless”; peer-to-peer with all nodes being the same | Traditional standalone
High availability | Continuous availability with built-in redundancy and hardware rack awareness in both single and multiple data centers | Oracle Data Guard (for failover), Oracle RAC (node SPOF), GoldenGate
Data model | Google Bigtable | Relational/tabular
Data consistency model | Tunable consistency (CAP theorem consistency per operation) | Traditional ACID
Storage model | Targeted directories with separation | Tablespaces
Logical database container | Keyspace | Database
Backup/recovery | Online, point-in-time restore | Online, point-in-time restore
Enterprise management/monitoring | DataStax OpsCenter | Oracle Enterprise Manager
LESSONS LEARNED
• Understand the data model differences
• Hardware setup does matter
• Grep the logs for errors and warnings
• Make sure each node is created properly
• Know your tools
– nodetool utility
– Cassandra bulk loader (sstableloader)
– jconsole/Java VisualVM
– cassandra-stress
– OpsCenter
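For the "grep the logs" lesson, a small filter like the following can be a starting point. It assumes each log line begins with its level (for example `WARN` or `ERROR`), and the sample lines are made up for illustration rather than real Cassandra output; `errors_and_warnings` is a hypothetical helper name.

```python
import re

# Assumption: the log level is the first word on each line.
LEVEL_PATTERN = re.compile(r"^(ERROR|WARN)\b")


def errors_and_warnings(lines):
    """Return only the lines whose level is ERROR or WARN, without trailing newlines."""
    return [line.rstrip("\n") for line in lines if LEVEL_PATTERN.match(line)]


if __name__ == "__main__":
    # Illustrative lines only; real log content will differ.
    sample = [
        "INFO  [main] node starting\n",
        "WARN  [main] something looks odd\n",
        "ERROR [main] something failed\n",
    ]
    for line in errors_and_warnings(sample):
        print(line)
```

To use it on a node, pass it an open file from the log directory listed earlier (for example `system.log` under /var/log/cassandra) instead of the sample list.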
FIT-ACER
• F – Focus (SLOW DOWN! Are you ready?)
• I – Identify server/DB name, time, authorization
• T – Type the command (do not hit enter yet)
• A – Assess the command (SPEND TIME HERE!)
• C – Check the server / database name again
• E – Execute the command
• R – Review and document the results
To contact us
1-877-PYTHIAN
To follow us
http://www.pythian.com/blog
http://www.facebook.com/pages/The-Pythian-Group/163902527671
@pythian
http://www.linkedin.com/company/pythian
Thank you – Q&A