Time series with apache cassandra strata


description

This talk is geared around understanding the basics of how Apache Cassandra stores and accesses time series data.

Transcript of Time series with apache cassandra strata

Page 1: Time series with apache cassandra   strata

©2013 DataStax Confidential. Do not distribute without consent.

@PatrickMcFadin

Patrick McFadin, Chief Evangelist

Time Series with Apache Cassandra


Page 2: Time series with apache cassandra   strata

Quick intro to Cassandra
• Shared nothing
• Masterless peer-to-peer
• Based on Dynamo

Page 3: Time series with apache cassandra   strata

Scaling
• Add nodes to scale
• Millions Ops/s

[Chart: throughput (ops/sec) for Cassandra, HBase, Redis, MySQL]

Page 4: Time series with apache cassandra   strata

Uptime
• Built to replicate
• Resilient to failure
• Always on

NONE

Page 5: Time series with apache cassandra   strata

Easy to use
• CQL is a familiar syntax
• Friendly to programmers
• Paxos for locking

CREATE TABLE users (
    username varchar,
    firstname varchar,
    lastname varchar,
    email list<varchar>,
    password varchar,
    created_date timestamp,
    PRIMARY KEY (username)
);

INSERT INTO users (username, firstname, lastname,
    email, password, created_date)
VALUES ('pmcfadin','Patrick','McFadin',
    ['[email protected]'],'ba27e03fd95e507daf2937c937d499ab',
    '2011-06-20 13:50:00');

INSERT INTO users (username, firstname,
    lastname, email, password, created_date)
VALUES ('pmcfadin','Patrick','McFadin',
    ['[email protected]'],
    'ba27e03fd95e507daf2937c937d499ab',
    '2011-06-20 13:50:00')
IF NOT EXISTS;

Page 6: Time series with apache cassandra   strata

Time series in production
• It’s all about “What’s happening”
• Data is the new currency

“Sirca, a non-profit university consortium based in Sydney, is the world’s biggest broker of financial data, ingesting into its database 2 million pieces of information a second from every major trading exchange.”*

* http://www.theage.com.au/it-pro/business-it/help-poverty-theres-an-app-for-that-20140120-hv948.html

Page 7: Time series with apache cassandra   strata

Why Cassandra for Time Series

• Scales
• Resilient
• Good data model
• Efficient Storage Model

What about that?

Page 8: Time series with apache cassandra   strata

Data Model
• Weather Station Id and Time are unique
• Store as many as needed

CREATE TABLE temperature (
    weatherstation_id text,
    event_time timestamp,
    temperature text,
    PRIMARY KEY (weatherstation_id,event_time)
);

INSERT INTO temperature(weatherstation_id,event_time,temperature)
VALUES ('1234ABCD','2013-04-03 07:01:00','72F');

INSERT INTO temperature(weatherstation_id,event_time,temperature)
VALUES ('1234ABCD','2013-04-03 07:02:00','73F');

INSERT INTO temperature(weatherstation_id,event_time,temperature)
VALUES ('1234ABCD','2013-04-03 07:03:00','73F');

INSERT INTO temperature(weatherstation_id,event_time,temperature)
VALUES ('1234ABCD','2013-04-03 07:04:00','74F');

Page 9: Time series with apache cassandra   strata

Storage Model - Logical View

SELECT weatherstation_id,event_time,temperature FROM temperature WHERE weatherstation_id='1234ABCD';

weatherstation_id   event_time            temperature
1234ABCD            2013-04-03 07:01:00   72F
1234ABCD            2013-04-03 07:02:00   73F
1234ABCD            2013-04-03 07:03:00   73F
1234ABCD            2013-04-03 07:04:00   74F

Page 10: Time series with apache cassandra   strata

Storage Model - Disk Layout

SELECT weatherstation_id,event_time,temperature FROM temperature WHERE weatherstation_id='1234ABCD';

1234ABCD:
    2013-04-03 07:01:00   72F
    2013-04-03 07:02:00   73F
    2013-04-03 07:03:00   73F
    2013-04-03 07:04:00   74F
    2013-04-03 07:05:00   74F
    2013-04-03 07:06:00   75F

Merged, Sorted and Stored Sequentially

Page 11: Time series with apache cassandra   strata

Query patterns
• Range queries
• “Slice” operation on disk

SELECT event_time,temperature
FROM temperature
WHERE weatherstation_id='1234ABCD'
  AND event_time >= '2013-04-03 07:01:00'
  AND event_time <= '2013-04-03 07:04:00';

1234ABCD:
    2013-04-03 07:01:00   72F
    2013-04-03 07:02:00   73F
    2013-04-03 07:03:00   73F
    2013-04-03 07:04:00   74F
    2013-04-03 07:05:00   74F
    2013-04-03 07:06:00   75F

Single seek on disk

Page 12: Time series with apache cassandra   strata

Query patterns
• Range queries
• “Slice” operation on disk

SELECT weatherstation_id,event_time,temperature
FROM temperature
WHERE weatherstation_id='1234ABCD'
  AND event_time >= '2013-04-03 07:01:00'
  AND event_time <= '2013-04-03 07:04:00';

weatherstation_id   event_time            temperature
1234ABCD            2013-04-03 07:01:00   72F
1234ABCD            2013-04-03 07:02:00   73F
1234ABCD            2013-04-03 07:03:00   73F
1234ABCD            2013-04-03 07:04:00   74F

Programmers like this

Sorted by event_time
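
Because rows within a partition come back sorted by event_time, the same table can also answer "what is the latest reading?" without a scan. The query below is a minimal sketch against the temperature table from the Data Model slide; it is an illustrative follow-up, not a query from the talk.

```cql
-- Read the partition in reverse clustering order and stop after one row,
-- which returns the most recent reading for this weather station.
SELECT event_time, temperature
FROM temperature
WHERE weatherstation_id = '1234ABCD'
ORDER BY event_time DESC
LIMIT 1;
```

ORDER BY is only accepted here because the partition key is fixed by an equality and event_time is the clustering column.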

Page 13: Time series with apache cassandra   strata

Ingestion models
• Apache Kafka
• Apache Flume
• Storm
• Custom Applications

Apache Kafka

Your totally killer application

Page 14: Time series with apache cassandra   strata

Dealing with data at speed
• 1 million writes per second?
• 1 insert every microsecond
• Collisions?

• Primary Key determines node placement
• Random partitioning
• Special data type - TimeUUID

Your totally killer application

weatherstation_id='1234ABCD'

weatherstation_id='5678EFGH'
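
To see how the partition key drives placement, CQL exposes a token() function that returns the partitioner's token for a partition key. The sketch below assumes the temperature table from the Data Model slide already holds rows for both stations; the actual token values depend on the cluster's configured partitioner, so none are shown.

```cql
-- The token computed from weatherstation_id decides which nodes own the partition.
-- All rows for one station share a token, so they live together on the same replicas.
SELECT weatherstation_id, token(weatherstation_id)
FROM temperature
WHERE weatherstation_id = '1234ABCD'
LIMIT 1;

-- A different station hashes to an unrelated token and may land on other nodes.
SELECT weatherstation_id, token(weatherstation_id)
FROM temperature
WHERE weatherstation_id = '5678EFGH'
LIMIT 1;
```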

Page 15: Time series with apache cassandra   strata

TimeUUID

• Also known as a Version 1 UUID
• Sortable
• Reversible

Timestamp to Microsecond + UUID = TimeUUID

04d580b0-9412-11e3-baa8-0800200c9a66 = Wednesday, February 12, 2014 6:18:06 PM GMT

http://www.famkruithof.net/uuid/uuidgen
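
One way to put "Sortable" and "Reversible" to work is to use a timeuuid as the clustering column, so two readings in the same instant do not overwrite each other. The following is a sketch only: the table temperature_by_event and the column event_id are hypothetical names, not from the talk. dateOf() matches the CQL of this talk's era; newer releases provide toTimestamp() for the same purpose.

```cql
-- Hypothetical variant of the temperature table, clustered by a TimeUUID
CREATE TABLE temperature_by_event (
    weatherstation_id text,
    event_id timeuuid,
    temperature text,
    PRIMARY KEY (weatherstation_id, event_id)
);

-- now() generates a fresh TimeUUID for each insert
INSERT INTO temperature_by_event (weatherstation_id, event_id, temperature)
VALUES ('1234ABCD', now(), '72F');

-- "Reversible": dateOf() extracts the timestamp embedded in the TimeUUID
SELECT dateOf(event_id), temperature
FROM temperature_by_event
WHERE weatherstation_id = '1234ABCD';

-- "Sortable": slice by time with minTimeuuid()/maxTimeuuid() as range bounds
SELECT dateOf(event_id), temperature
FROM temperature_by_event
WHERE weatherstation_id = '1234ABCD'
  AND event_id > minTimeuuid('2013-04-03 07:01:00+0000')
  AND event_id < maxTimeuuid('2013-04-03 07:04:00+0000');
```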

Page 16: Time series with apache cassandra   strata

Way more information

• 5 minute interviews
• Use cases
• Free training!

www.planetcassandra.org

Page 17: Time series with apache cassandra   strata

Thank You!

Follow me for more updates all the time: @PatrickMcFadin