Continuous Deployment with Cassandra

Continuous Deployment with C*: Treating C* as First-Class Code

Michael Kjellman (@mkjellman)

Software Engineer, Barracuda Networks

C* At Barracuda
• Powers 100% of our Spam and Webfilter Backend
• 48 Node Cluster
• 2 Datacenters
• Requests: 20k writes/sec, 30k reads/sec
• Latency: 1 ms/write, 1.6 ms/read
• > 30TB of Data
• Almost entirely native protocol/CQL3

Hardware Configuration
• 32GB of RAM
• 1x SSD
• 2x Spinning Disks
• 2x 6 Core AMD

Key Configuration Options
• key_cache_size_in_mb: 1024
• row_cache_size_in_mb: 0
• memtable_total_space_in_mb: 2048
• HEAP_NEWSIZE="1200M" (-Xmn)
• MAX_HEAP_SIZE="8G" (-Xmx)
• -XX:SurvivorRatio=6

• Sidenote: Java 7u40 is out!

How do I keep my graphs pretty during a C* upgrade?

September 18th 2013

Make a C* Build
$> git clone http://git-wip-us.apache.org/repos/asf/cassandra.git
$> cd cassandra
$> git checkout -t origin/cassandra-1.2
$> git log
$> vim build.xml (change version number every time you make a build!)
$> ant clean release
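The manual "vim build.xml" step is easy to forget, so it is a natural candidate for scripting. Below is a minimal sketch, assuming the version lives in a `base.version` property in build.xml (check your own checkout) and that appending a suffix such as "bn1" is an acceptable naming scheme. The function name and the suffix are hypothetical, not something from the talk.

```python
import re

# Assumption: build.xml declares the version as
#   <property name="base.version" value="1.2.9"/>
VERSION_PROPERTY = re.compile(
    r'(<property\s+name="base\.version"\s+value=")([^"]+)(")'
)


def stamp_version(build_xml_text, suffix):
    """Return build.xml text with "-<suffix>" appended to base.version."""
    new_text, count = VERSION_PROPERTY.subn(
        lambda m: m.group(1) + m.group(2) + "-" + suffix + m.group(3),
        build_xml_text,
        count=1,
    )
    if count != 1:
        # Fail loudly rather than ship a build with an unchanged version.
        raise ValueError("base.version property not found in build.xml")
    return new_text
```

A wrapper script could read build.xml, call `stamp_version` with a counter or a short git hash as the suffix, write the file back, and then run `ant clean release`.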

Deployment
• Make release
• Test release with CCM
• Push release to Puppet (deals with config, etc.)
• Run controlled and scripted rolling restart one datacenter at a time
  – flush
  – stop
  – start
  – validate node
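The rolling restart above (flush, stop, start, validate, one datacenter at a time) might be scripted roughly as follows. This is a sketch under stated assumptions, not the scripts used at Barracuda. The per-node commands (`nodetool flush`, `service cassandra stop/start`), the `run` callable that executes a command on a host, and the `is_healthy` check are all placeholders to swap for whatever your environment uses.

```python
import time

# Assumed per-node commands; adjust to your init system and tooling.
STEPS = [
    ("flush", ["nodetool", "flush"]),
    ("stop", ["service", "cassandra", "stop"]),
    ("start", ["service", "cassandra", "start"]),
]


def rolling_restart(datacenters, run, is_healthy,
                    poll_seconds=5.0, max_polls=60, sleep=time.sleep):
    """Restart one node at a time, finishing each datacenter before the next.

    datacenters: dict mapping datacenter name -> list of host names
    run(host, argv): hypothetical helper that runs argv on host, raising on failure
    is_healthy(host): hypothetical check that returns True once the node is back
    """
    restarted = []
    for dc, hosts in datacenters.items():
        for host in hosts:
            for _name, argv in STEPS:
                run(host, argv)
            # Validate the node before touching the next one.
            for _ in range(max_polls):
                if is_healthy(host):
                    break
                sleep(poll_seconds)
            else:
                # Stop the whole rollout instead of taking down more nodes.
                raise RuntimeError(f"{host} in {dc} did not come back; halting rollout")
            restarted.append(host)
    return restarted
```

Halting on the first node that fails validation keeps a bad build from spreading past a single node, which matters when the whole backend depends on the cluster.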

Automate, Automate, Automate

So, why not just apt-get install cassandra?

• Makes running a custom release in the future a complete nightmare

• Lost visibility into changes in the release
• WHY are you upgrading?
• Treat a C* build just as if it was a release of your code. What commits did you put into your own release?

Simply Put:

MY CODE DOESN’T WORK WITHOUT A STABLE C* CLUSTER

When things go wrong
• Every commit (those by C* committers or my own) comes with potential bugs and regressions
• Gossip Bugs Can Bite Hard:
  – CASSANDRA-5665: Gossiper.handleMajorStateChange can lose existing node ApplicationState
• At 48 nodes, even small mistakes are massive

Writing your code to deal with node failure

• Upgrading a C* cluster means constant node failures for the duration of the rolling restart

• How does your code deal with read latency and retries?
  – CASSANDRA-4705: Eager Retries for reads for 2.0+

• The mythical “constantly failing” code != stability.
  – Handle exceptions (and node/read failures) gracefully!
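One way to handle read failures gracefully on the client side is a bounded retry with backoff around each read. The sketch below is illustrative only: `TransientReadError` is a hypothetical stand-in for whatever timeout or unavailable exception your driver raises, and `do_read` is any zero-argument callable that performs the query. It is separate from the server-side eager retries of CASSANDRA-4705.

```python
import random
import time


class TransientReadError(Exception):
    """Hypothetical stand-in for a driver's timeout/unavailable exception."""


def read_with_retries(do_read, attempts=3, base_delay=0.05, sleep=time.sleep):
    """Call do_read(), retrying transient failures with jittered backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(attempts):
        try:
            return do_read()
        except TransientReadError as exc:
            last_error = exc
            if attempt < attempts - 1:
                # Exponential backoff with jitter so clients do not retry in lockstep.
                sleep(base_delay * (2 ** attempt) * random.uniform(0.5, 1.5))
    # Out of attempts: surface the failure so the caller can degrade gracefully.
    raise last_error
```

Keeping the attempt count small bounds the extra load placed on a cluster that is already short a node during a rolling restart.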

Why treat C* like your own code
• Using C* will move much of your own application logic to C*
• The bugs have to go somewhere!
• Data replication at database layer or at application layer

QUESTIONS?
Thanks for Listening!
