Cassandra 2.0 and timeseries
©2013 DataStax Confidential. Do not distribute without consent.
@PatrickMcFadin
Patrick McFadin, Chief Evangelist/Solution Architect - DataStax
Cassandra 2.0: Intro + Time Series
Friday, October 11, 13
Who I am
• Patrick McFadin
• Solution Architect at DataStax
• Cassandra MVP
• User for years
• Follow me for more: @PatrickMcFadin
I talk about Cassandra and building scalable, resilient apps ALL THE TIME!
Dude. Uptime == $$
Cassandra - An introduction
Cassandra - Intro
• Based on the Amazon Dynamo and Google BigTable papers
• Shared nothing
• Data safe as possible
• Predictable scaling
Cassandra - More than one server
• All nodes participate in a cluster
• Shared nothing
• Add or remove as needed
• More capacity? Add a server
• Each node owns a token
• Tokens denote a range of keys
• 4 nodes? -> Key range/4
• Each node owns 1/4 of the data
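A minimal sketch of the token idea from this slide, not Cassandra's actual partitioner. The 32-bit ring size, the MD5 hash and the function names are assumptions chosen for illustration. Each node gets one evenly spaced token, and a key belongs to the node whose token is the first one at or after the key's hash.

```python
from bisect import bisect_left
from hashlib import md5

RING_SIZE = 2 ** 32  # hypothetical small token space, chosen for readability


def assign_tokens(node_names):
    """Give each node one evenly spaced token on the ring."""
    step = RING_SIZE // len(node_names)
    return sorted((i * step, name) for i, name in enumerate(node_names))


def token_for(key):
    """Hash a row key onto the ring (MD5 is an illustrative choice)."""
    return int(md5(key.encode("utf-8")).hexdigest(), 16) % RING_SIZE


def owner(ring, key):
    """Return the node whose token is the first one at or after the key's token."""
    tokens = [token for token, _ in ring]
    index = bisect_left(tokens, token_for(key)) % len(ring)  # wrap around the ring
    return ring[index][1]


ring = assign_tokens(["node1", "node2", "node3", "node4"])
print(owner(ring, "weatherstation-100"))
```

With 4 nodes the tokens sit a quarter of the ring apart, so each node is responsible for about 1/4 of the keys.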
Cassandra - Locally Distributed
• Client writes to any node
• Node coordinates with others
• Data replicated in parallel
• Replication factor: How many copies of your data?
• RF = 3 here
Each node stores 3/4 of the cluster's total data.
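The 3/4 figure can be checked with a small sketch. It assumes 4 nodes with equal token ranges and that replicas go to the next nodes clockwise on the ring (similar in spirit to SimpleStrategy, but not taken from Cassandra's code). The function names are hypothetical.

```python
def replicas_for(ring, primary_index, replication_factor):
    """Return the primary node plus the next nodes clockwise on the ring."""
    return [ring[(primary_index + i) % len(ring)] for i in range(replication_factor)]


def share_of_data(ring, replication_factor):
    """Fraction of the cluster's data each node stores, assuming equal ranges."""
    held = {node: 0 for node in ring}
    for primary_index in range(len(ring)):
        for node in replicas_for(ring, primary_index, replication_factor):
            held[node] += 1  # this node keeps a copy of this token range
    return {node: count / len(ring) for node, count in held.items()}


ring = ["node1", "node2", "node3", "node4"]
print(replicas_for(ring, 3, 3))   # wraps around: node4, node1, node2
print(share_of_data(ring, 3))     # every node holds 0.75 of the data
```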
Cassandra - Geographically Distributed
• Client writes local
• Data syncs across WAN
• Replication Factor per DC
Single coordinator
Cassandra - Consistency
• Consistency Level (CL)
• Client specifies per read or write
• ALL = All replicas ack
• QUORUM = A majority (more than 50%) of replicas ack
• LOCAL_QUORUM = A majority of replicas in the local DC ack
• ONE = Only one replica acks
Cassandra - Transparent to the application
• A single node failure shouldn't bring failure
• Replication Factor + Consistency Level = Success
• This example:
• RF = 3
• CL = QUORUM
More than half of the replicas ack, so we are good!
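The arithmetic behind "Replication Factor + Consistency Level = Success" can be sketched as follows. The helper names are hypothetical and only ONE, QUORUM and ALL are modelled. LOCAL_QUORUM would apply the same majority formula to the replication factor of the local data center.

```python
def quorum(replication_factor):
    """Smallest number of replicas that forms a majority."""
    return replication_factor // 2 + 1


def required_acks(consistency_level, replication_factor):
    """Number of replica acknowledgements needed for a request to succeed."""
    if consistency_level == "ONE":
        return 1
    if consistency_level == "QUORUM":
        return quorum(replication_factor)
    if consistency_level == "ALL":
        return replication_factor
    raise ValueError("unsupported consistency level: %s" % consistency_level)


def request_succeeds(consistency_level, replication_factor, live_replicas):
    """True if enough replicas are up to satisfy the consistency level."""
    return live_replicas >= required_acks(consistency_level, replication_factor)


# RF = 3 with one replica down: QUORUM needs 2 acks and 2 replicas are alive.
print(request_succeeds("QUORUM", 3, live_replicas=2))
# The same failure at CL = ALL would fail the request.
print(request_succeeds("ALL", 3, live_replicas=2))
```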
Cassandra Applications - Drivers
• DataStax Drivers for Cassandra
• Java
• C#
• Python
• more on the way
Application Example - Layout
• Active-Active
• Service based DNS routing
Cassandra Replication
Application Example - Uptime
• Normal server maintenance
• Application is unaware
Cassandra Replication
Application Example - Failure
• Data center failure
• Data is safe. Route traffic.
Another happy user!
Cassandra 2.0 - Big new features
Five Years of Cassandra
[Timeline figure, axis marks Jul-08, Jul-09, May-10, Feb-11, Dec-11, Oct-12, Jul-13: releases 0.1, 0.3, 0.6, 0.7, 1.0, 1.2, ... 2.0, and DSE]
Lightweight transactions: the problem
Session 1:
SELECT * FROM users WHERE username = 'jbellis';
[empty resultset]
INSERT INTO users (username, password) VALUES ('jbellis', 'xdg44hh');
Session 2:
SELECT * FROM users WHERE username = 'jbellis';
[empty resultset]
INSERT INTO users (username, password) VALUES ('jbellis', '8dhh43k');
It's a Race!
Who wins?
Why Locking Doesn't Work
• Client locks
• Write times out
• Lock released
• Hint is replayed!!
[Diagram: the client (holding the lock) sends a request to the coordinator, which sends an internal request to a replica. The internal request fails (X), the coordinator stores a hint and returns a timeout response to the client.]
Paxos
• Consensus algorithm
• All operations are quorum-based
• Each replica sends information about unfinished operations to the leader during prepare
• Paxos Made Simple
LWT: details
• 4 round trips vs 1 for normal updates
• Paxos state is durable
• Immediate consistency with no leader election or failover
• ConsistencyLevel.SERIAL
• http://www.datastax.com/dev/blog/lightweight-transactions-in-cassandra-2-0
LWT: Use with caution
• Great for 1% of your application
• Eventual consistency is your friend
• http://www.slideshare.net/planetcassandra/c-summit-2013-eventual-consistency-hopeful-consistency-by-christos-kalantzis
Using LWT
• Don't overwrite an existing record
INSERT INTO USERS (username, email, ...)
VALUES ('jbellis', '[email protected]', ...)
IF NOT EXISTS;
• Only update record if condition is met
UPDATE USERS SET email = '[email protected]', ...
WHERE username = 'jbellis'
IF email = '[email protected]';
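To make the difference concrete, here is a small in-memory simulation of the two behaviours. It is not a Cassandra client: the UserTable class and its methods are hypothetical, and the threading lock only stands in for the Paxos round that makes the check and the write a single atomic step in Cassandra.

```python
import threading


class UserTable:
    """In-memory stand-in for the users table (illustration only)."""

    def __init__(self):
        self._rows = {}
        self._lock = threading.Lock()  # stands in for the Paxos round

    def select(self, username):
        """Return the stored password, or None for an empty result set."""
        return self._rows.get(username)

    def insert(self, username, password):
        """Plain INSERT: the last write silently wins."""
        self._rows[username] = password

    def insert_if_not_exists(self, username, password):
        """Like INSERT ... IF NOT EXISTS: True only if the row was applied."""
        with self._lock:
            if username in self._rows:
                return False
            self._rows[username] = password
            return True


# The race: both sessions see an empty result, then both insert.
racy = UserTable()
if racy.select("jbellis") is None and racy.select("jbellis") is None:
    racy.insert("jbellis", "xdg44hh")   # session 1
    racy.insert("jbellis", "8dhh43k")   # session 2 overwrites session 1
print(racy.select("jbellis"))

# The conditional insert: only the first session is applied.
safe = UserTable()
print(safe.insert_if_not_exists("jbellis", "xdg44hh"))
print(safe.insert_if_not_exists("jbellis", "8dhh43k"))
```

In the racy version session 1 never learns that its account was overwritten. With the conditional insert the second session gets an explicit "not applied" answer and can ask the user to pick another name.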
CQL Improvements
• Cursors
• Large result sets now have ->next() functionality
• Prevents massive result sets OOMing
• No more client side hacks with LIMIT
• Warning: Not isolated
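The cursor idea can be sketched without a driver. Here fetch_page is a hypothetical stand-in for one round trip to the server, and the list of rows plays the role of the table. The point is that the client holds at most one page in memory at a time.

```python
def fetch_page(rows, start, fetch_size):
    """Hypothetical server call: return one page and the next offset (or None)."""
    page = rows[start:start + fetch_size]
    more = start + fetch_size < len(rows)
    return page, (start + fetch_size if more else None)


def iterate_rows(rows, fetch_size=100):
    """Yield rows one by one, fetching a new page only when needed."""
    start = 0
    while start is not None:
        page, start = fetch_page(rows, start, fetch_size)
        yield from page


table = list(range(250))           # pretend this is a large result set
total = sum(1 for _ in iterate_rows(table, fetch_size=100))
print(total)                       # 250 rows seen, fetched in 3 pages
```

Because each page is fetched separately, rows written between two fetches may or may not show up, which is what the "Not isolated" warning is about.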
CQL Improvements
• ALTER DROP
• Remove a field from a CQL table.
ALTER TABLE users DROP address3;
• Conditional schema changes
• Only execute if condition met
CREATE KEYSPACE IF NOT EXISTS ks
  WITH replication = { 'class': 'SimpleStrategy', 'replication_factor' : 3 };
CREATE TABLE IF NOT EXISTS test (k int PRIMARY KEY);
DROP KEYSPACE IF EXISTS ks;
CQL Improvements
• Aliases in SELECT
SELECT event_id, dateOf(created_at) AS creation_date, blobAsText(content) AS content FROM timeline;

 event_id                | creation_date            | content
-------------------------+--------------------------+----------------------
 550e8400-e29b-41d4-a716 | 2013-07-26 10:44:33+0200 | Something happened!?

• Limit and TTL in prepared statements
SELECT * FROM myTable LIMIT ?;
UPDATE myTable USING TTL ? SET v = 2 WHERE k = 'foo';
Triggers
CREATE TRIGGER <name> ON <table> USING <classname>;
DROP TRIGGER <name> ON [<keyspace>.]<table>;
• Executed on the coordinator before mutation
• Takes the original mutation and adds any new mutations
• Jars deployed per server
Trigger implementation
class MyTrigger implements ITrigger
{
    public Collection<RowMutation> augment(ByteBuffer key, ColumnFamily update)
    {
        ...
    }
}
• You have to implement your own ITrigger (for now)
• Compile and deploy to each server
Experimental!
• Relies on internal RowMutation, ColumnFamily classes
• Not sandboxed. Be careful!
• Expect changes in 2.1
Cassandra and Time Series
Time Series - Taming the beast
• Peter Higgs and François Englert. Nobel Prize in Physics
• Theorized the existence of the Higgs boson
• Found using ATLAS
• Data stored in P-BEAST
• Time series running on Cassandra
Use Cassandra for time series
Get a Nobel prize
Time Series - Why
• Storage model from BigTable is perfect
• One row key and tons of (variable) columns
• Single layout on disk

Row Key | Column Name  | Column Name
        | Column Value | Column Value
Time Series - Example
• Storing weather data
• One weather station
• Temperature measurements every minute

WeatherStation ID | 2013-10-09 10:00 AM | 2013-10-09 10:00 AM | 2013-10-10 11:00 AM
                  | 72 Degrees          | 72 Degrees          | 65 Degrees
Time Series - Example
• Query data
• Weather Station ID = Locality of single node

WeatherStation ID 100 | 2013-10-09 10:00 AM | 2013-10-09 10:00 AM | 2013-10-10 11:00 AM
                      | 72 Degrees          | 72 Degrees          | 65 Degrees

Date query:
weatherStationID = 100 AND date = 2013-10-09 10:00 AM
OR
Date Range:
weatherStationID = 100 AND date > 2013-10-09 10:00 AM AND date < 2013-10-10 11:01 AM
Time Series - How
• CQL expresses this well
• Data partitioned by weather station ID and ordered by time within each partition
• Easy to insert data
• Easy to query
CREATE TABLE temperature (
    weatherstation_id text,
    event_time timestamp,
    temperature text,
    PRIMARY KEY (weatherstation_id, event_time)
);
INSERT INTO temperature (weatherstation_id, event_time, temperature)
VALUES ('1234ABCD', '2013-04-03 07:01:00', '72F');
SELECT temperature FROM temperature
WHERE weatherstation_id = '1234ABCD'
AND event_time > '2013-04-03 07:01:00'
AND event_time < '2013-04-03 07:04:00';
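Why the range query is cheap can be shown with a toy model of one storage row. This is an assumption-laden sketch, not Cassandra's storage engine: the Partition class is hypothetical and simply keeps its cells sorted by event time, so a time range becomes one contiguous slice.

```python
from bisect import bisect_left, bisect_right
from datetime import datetime


class Partition:
    """Toy model of one storage row: cells kept sorted by event time."""

    def __init__(self):
        self._times = []   # sorted event times (the column names)
        self._values = []  # temperatures, in the same order

    def insert(self, event_time, temperature):
        index = bisect_left(self._times, event_time)
        if index < len(self._times) and self._times[index] == event_time:
            self._values[index] = temperature  # same key: overwrite
        else:
            self._times.insert(index, event_time)
            self._values.insert(index, temperature)

    def slice(self, after, before):
        """Return temperatures with after < event_time < before."""
        lo = bisect_right(self._times, after)
        hi = bisect_left(self._times, before)
        return self._values[lo:hi]


station = Partition()
station.insert(datetime(2013, 4, 3, 7, 3), "74F")   # arrival order does not matter
station.insert(datetime(2013, 4, 3, 7, 1), "72F")
station.insert(datetime(2013, 4, 3, 7, 2), "73F")
station.insert(datetime(2013, 4, 3, 7, 4), "75F")
print(station.slice(datetime(2013, 4, 3, 7, 1), datetime(2013, 4, 3, 7, 4)))
```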
Time Series - Further partitioning
• At one measurement every minute you will eventually run out of columns in a single storage row
• 2 billion columns per storage row
• Data partitioned by weather station ID and time
• Use the partition key to split things up
CREATE TABLE temperature_by_day (
    weatherstation_id text,
    date text,
    event_time timestamp,
    temperature text,
    PRIMARY KEY ((weatherstation_id, date), event_time)
);
Time Series - Further partitioning
• Still easy to insert
• Still easy to query
INSERT INTO temperature_by_day (weatherstation_id, date, event_time, temperature)
VALUES ('1234ABCD', '2013-04-03', '2013-04-03 07:01:00', '72F');
SELECT temperature FROM temperature_by_day
WHERE weatherstation_id = '1234ABCD'
AND date = '2013-04-03'
AND event_time > '2013-04-03 07:01:00'
AND event_time < '2013-04-03 07:04:00';
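On the application side, the per-day layout means the client has to derive the date part of the partition key itself. The helpers below are a hypothetical sketch of that step. The names are made up, and the 'YYYY-MM-DD' text format follows the INSERT example.

```python
from datetime import datetime, timedelta

MINUTES_PER_DAY = 24 * 60  # one measurement per minute gives 1440 columns per day


def partition_key(weatherstation_id, event_time):
    """Build the (weatherstation_id, date) partition key for one measurement."""
    return (weatherstation_id, event_time.strftime("%Y-%m-%d"))


def partitions_for_range(weatherstation_id, start, end):
    """List every (station, date) partition that a time range touches."""
    keys = []
    day = start.date()
    while day <= end.date():
        keys.append((weatherstation_id, day.isoformat()))
        day += timedelta(days=1)
    return keys


print(partition_key("1234ABCD", datetime(2013, 4, 3, 7, 1)))
print(partitions_for_range("1234ABCD",
                           datetime(2013, 4, 3, 23, 0),
                           datetime(2013, 4, 5, 1, 0)))
```

The trade-off is that a query spanning several days now needs one SELECT per day partition, in exchange for storage rows that stay small.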
Time Series - Use cases
• Logging
• Thing Tracking (IoT)
• Sensor Data
• User Tracking
• Fraud Detection
• Nobel prizes!
Thank you!
Apache Cassandra 2.0 - Data model on fire
Next talk in my data model series!