Continuous Deployment with Cassandra
Continuous Deployment with C*: Treating C* as First-Class Code
Michael Kjellman (@mkjellman)
Software Engineer, Barracuda Networks
C* At Barracuda
• Powers 100% of our Spam and Webfilter Backend
• 48 Node Cluster
• 2 Datacenters
• Requests: 20k writes/sec, 30k reads/sec
• Latency: 1 ms/write, 1.6 ms/read
• > 30TB of Data
• Almost entirely native protocol/CQL3
Hardware Configuration
• 32GB of RAM
• 1x SSD
• 2x Spinning Disks
• 2x 6 Core AMD
Key Configuration Options
• key_cache_size_in_mb: 1024
• row_cache_size_in_mb: 0
• memtable_total_space_in_mb: 2048
• HEAP_NEWSIZE = "1200M" (-Xmn)
• MAX_HEAP_SIZE = "8G" (-Xmx)
• -XX:SurvivorRatio=6
• Sidenote: Java 7u40 is out!
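As a rough worked example of what the heap settings above imply: HotSpot's -XX:SurvivorRatio is the ratio of eden to one survivor space, and the young generation (-Xmn) holds eden plus two survivor spaces. The sketch below only does that arithmetic. The function name is illustrative, and the JVM aligns region sizes, so real figures may differ slightly.

```python
def young_gen_layout(new_size_mb, survivor_ratio):
    """Split the young generation into eden and two survivor spaces.

    SurvivorRatio = eden / one survivor space, and
    young generation = eden + 2 * survivor.
    """
    survivor_mb = new_size_mb / (survivor_ratio + 2)
    eden_mb = survivor_mb * survivor_ratio
    return eden_mb, survivor_mb


# HEAP_NEWSIZE = "1200M" and -XX:SurvivorRatio=6, as on the slide.
eden, survivor = young_gen_layout(new_size_mb=1200, survivor_ratio=6)
print(f"eden ~{eden:.0f} MB, each survivor ~{survivor:.0f} MB")
# prints: eden ~900 MB, each survivor ~150 MB
```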
How do I keep my graphs pretty during a C* upgrade?
September 18th 2013
Make a C* Build
$> git clone http://git-wip-us.apache.org/repos/asf/cassandra.git
$> cd cassandra
$> git checkout -t origin/cassandra-1.2
$> git log
$> vim build.xml   (change the version number every time you make a build!)
$> ant clean release
Deployment
• Make release
• Test release with CCM
• Push release to Puppet (deals with config, etc)
• Run controlled and scripted rolling restart one datacenter at a time
  – flush
  – stop
  – start
  – validate node
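The rolling restart above could be scripted roughly as follows. This is a minimal sketch, not the script used at Barracuda: the slides name only the four steps, so the concrete commands are assumptions. `nodetool flush` and `nodetool status` are real nodetool subcommands, the `service cassandra ...` lines assume an init-script install, and the host names are placeholders. A real validation step would also wait for the node to rejoin the ring before moving on.

```python
import subprocess

# Assumed commands for the four steps named on the slide.
STEPS = [
    ("flush", "nodetool flush"),
    ("stop", "sudo service cassandra stop"),
    ("start", "sudo service cassandra start"),
    ("validate", "nodetool status"),  # placeholder check only
]


def ssh_run(host, command):
    """Run a command on a host over ssh; True when it exits with status 0."""
    return subprocess.run(["ssh", host, command]).returncode == 0


def rolling_restart(datacenters, run=ssh_run):
    """Restart one node at a time, one datacenter at a time.

    Halts at the first failing step so a bad build stays on a single node.
    Returns the (host, step) pairs that completed.
    """
    done = []
    for dc_name, hosts in datacenters.items():
        for host in hosts:
            for step, command in STEPS:
                if not run(host, command):
                    raise RuntimeError(
                        f"{step} failed on {host} ({dc_name}); halting rollout"
                    )
                done.append((host, step))
    return done


def dry_run(host, command):
    """Print what would be executed instead of touching any node."""
    print(f"[{host}] {command}")
    return True


if __name__ == "__main__":
    # Hypothetical topology: two datacenters, placeholder host names.
    rolling_restart({"dc1": ["dc1-node1", "dc1-node2"], "dc2": ["dc2-node1"]},
                    run=dry_run)
```

Passing the runner in as a parameter keeps the ordering logic testable without a cluster, which fits the "Test release with CCM" step before anything reaches production.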
Automate, Automate, Automate
So, why not just apt-get install cassandra?
• Makes running a custom release in the future a complete nightmare
• Lost visibility into changes in the release
• WHY are you upgrading?
• Treat a C* build just as if it were a release of your own code. What commits did you put into your own release?
Simply Put:
MY CODE DOESN’T WORK WITHOUT A STABLE C* CLUSTER
When things go wrong
• Every commit (those by C* committers or my own) comes with potential bugs and regressions
• Gossip Bugs Can Bite Hard:
  – CASSANDRA-5665: Gossiper.handleMajorStateChange can lose existing node ApplicationState
• At 48 nodes, even small mistakes are massive
Writing your code to deal with node failure
• Upgrading a C* cluster means constant node failures for the duration of the rolling restart
• How does your code deal with read latency and retries?
  – CASSANDRA-4705: Eager Retries for reads for 2.0+
• The mythical "constantly failing" code != stability.
  – Handle exceptions (and node/read failures) gracefully!
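One way to handle node and read failures gracefully on the client side is a bounded retry with backoff. The sketch below is driver-agnostic and is not taken from the talk: `TransientError` stands in for whatever timeout or unavailable exception your driver raises, and `with_retries` and its parameters are illustrative names.

```python
import random
import time


class TransientError(Exception):
    """Stand-in for a driver's timeout/unavailable exception (hypothetical)."""


def with_retries(operation, attempts=3, base_delay=0.05, sleep=time.sleep):
    """Call operation(); on TransientError retry with jittered exponential backoff.

    Re-raises the last error once the attempts are used up, so the caller
    still sees a failure that outlasts the restart window.
    """
    for attempt in range(attempts):
        try:
            return operation()
        except TransientError:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * (2 ** attempt) * random.uniform(0.5, 1.5))


if __name__ == "__main__":
    outcomes = iter([TransientError("node restarting"), "row data"])

    def flaky_read():
        """Fails once, as if the coordinator were mid-restart, then succeeds."""
        result = next(outcomes)
        if isinstance(result, Exception):
            raise result
        return result

    print(with_retries(flaky_read))
```

Retrying like this is safe for reads; writes should only be retried when they are idempotent.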
Why treat C* like your own code
• Using C* will move much of your own application logic to C*
• The bugs have to go somewhere!
• Data replication at database layer or at application layer
QUESTIONS?
Thanks for Listening!