Time Series with Apache Cassandra (Strata)

Post on 15-Jan-2015

Description

This talk is geared around understanding the basics of how Apache Cassandra stores and accesses time series data.

Transcript of Time Series with Apache Cassandra (Strata)

©2013 DataStax Confidential. Do not distribute without consent.

@PatrickMcFadin

Patrick McFadin, Chief Evangelist

Time Series with Apache Cassandra

Quick intro to Cassandra
• Shared nothing
• Masterless peer-to-peer
• Based on Dynamo

Scaling
• Add nodes to scale
• Millions of ops/s

[Chart: throughput (ops/sec) for Cassandra, HBase, Redis and MySQL]

Uptime
• Built to replicate
• Resilient to failure
• Always on

Easy to use
• CQL is a familiar syntax
• Friendly to programmers
• Paxos for locking

CREATE TABLE users (
    username varchar,
    firstname varchar,
    lastname varchar,
    email list<varchar>,
    password varchar,
    created_date timestamp,
    PRIMARY KEY (username)
);

INSERT INTO users (username, firstname, lastname,
    email, password, created_date)
VALUES ('pmcfadin','Patrick','McFadin',
    ['patrick@datastax.com'],'ba27e03fd95e507daf2937c937d499ab',
    '2011-06-20 13:50:00');

INSERT INTO users (username, firstname,
    lastname, email, password, created_date)
VALUES ('pmcfadin','Patrick','McFadin',
    ['patrick@datastax.com'],
    'ba27e03fd95e507daf2937c937d499ab',
    '2011-06-20 13:50:00')
IF NOT EXISTS;

Time series in production
• It’s all about “What’s happening”
• Data is the new currency

“Sirca, a non-profit university consortium based in Sydney, is the world’s biggest broker of financial data, ingesting into its database 2 million pieces of information a second from every major trading exchange.”*

* http://www.theage.com.au/it-pro/business-it/help-poverty-theres-an-app-for-that-20140120-hv948.html

Why Cassandra for Time Series

• Scales
• Resilient
• Good data model
• Efficient storage model

What about that?

Data Model
• Weather station ID and time are unique
• Store as many as needed

CREATE TABLE temperature (
    weatherstation_id text,
    event_time timestamp,
    temperature text,
    PRIMARY KEY (weatherstation_id,event_time)
);

INSERT INTO temperature(weatherstation_id,event_time,temperature)
VALUES ('1234ABCD','2013-04-03 07:01:00','72F');

INSERT INTO temperature(weatherstation_id,event_time,temperature)
VALUES ('1234ABCD','2013-04-03 07:02:00','73F');

INSERT INTO temperature(weatherstation_id,event_time,temperature)
VALUES ('1234ABCD','2013-04-03 07:03:00','73F');

INSERT INTO temperature(weatherstation_id,event_time,temperature)
VALUES ('1234ABCD','2013-04-03 07:04:00','74F');

Storage Model - Logical View

SELECT weatherstation_id,event_time,temperature
FROM temperature
WHERE weatherstation_id='1234ABCD';

 weatherstation_id | event_time          | temperature
-------------------+---------------------+-------------
 1234ABCD          | 2013-04-03 07:01:00 | 72F
 1234ABCD          | 2013-04-03 07:02:00 | 73F
 1234ABCD          | 2013-04-03 07:03:00 | 73F
 1234ABCD          | 2013-04-03 07:04:00 | 74F

Storage Model - Disk Layout

SELECT weatherstation_id,event_time,temperature
FROM temperature
WHERE weatherstation_id='1234ABCD';

1234ABCD
    2013-04-03 07:01:00    72F
    2013-04-03 07:02:00    73F
    2013-04-03 07:03:00    73F
    2013-04-03 07:04:00    74F
    2013-04-03 07:05:00    74F
    2013-04-03 07:06:00    75F

Merged, Sorted and Stored Sequentially
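The disk layout above can be pictured with a small in-memory sketch. This is only an illustration under stated assumptions, not Cassandra's storage engine: the `partitions` dict and the `insert` helper are hypothetical names, and timestamps are kept as ISO-style strings so that string order matches time order.

```python
import bisect
from collections import defaultdict

# Hypothetical stand-in for one table:
# partition key -> list of (event_time, temperature), kept sorted by event_time.
partitions = defaultdict(list)

def insert(weatherstation_id, event_time, temperature):
    """Place a cell in its partition, keeping cells ordered by event_time."""
    cells = partitions[weatherstation_id]
    keys = [t for t, _ in cells]
    i = bisect.bisect_left(keys, event_time)
    if i < len(cells) and cells[i][0] == event_time:
        # Same primary key (station, time): the new value replaces the old one.
        cells[i] = (event_time, temperature)
    else:
        cells.insert(i, (event_time, temperature))

# Writes may arrive out of order; they still end up stored sequentially.
insert("1234ABCD", "2013-04-03 07:03:00", "73F")
insert("1234ABCD", "2013-04-03 07:01:00", "72F")
insert("1234ABCD", "2013-04-03 07:04:00", "74F")
insert("1234ABCD", "2013-04-03 07:02:00", "73F")

for event_time, temperature in partitions["1234ABCD"]:
    print(event_time, temperature)
```

All readings for one weather station sit together and in time order, which is the property the range queries in this talk rely on.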

Query patterns
• Range queries
• “Slice” operation on disk

SELECT event_time,temperature
FROM temperature
WHERE weatherstation_id='1234ABCD'
AND event_time >= '2013-04-03 07:01:00'
AND event_time <= '2013-04-03 07:04:00';

1234ABCD
    2013-04-03 07:01:00    72F
    2013-04-03 07:02:00    73F
    2013-04-03 07:03:00    73F
    2013-04-03 07:04:00    74F
    2013-04-03 07:05:00    74F
    2013-04-03 07:06:00    75F

Single seek on disk
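Why a range query is cheap on sorted data can be sketched in a few lines. This is a rough illustration, not how Cassandra reads SSTables: `slice_range` is a hypothetical helper, the cell values are the ones from the slide, and the inclusive bounds are an assumption of the sketch.

```python
import bisect

# Cells of one partition, already sorted by event_time (values from the slide).
cells = [
    ("2013-04-03 07:01:00", "72F"),
    ("2013-04-03 07:02:00", "73F"),
    ("2013-04-03 07:03:00", "73F"),
    ("2013-04-03 07:04:00", "74F"),
    ("2013-04-03 07:05:00", "74F"),
    ("2013-04-03 07:06:00", "75F"),
]

def slice_range(cells, start, end):
    """Return cells with start <= event_time <= end.

    One binary search finds each bound; everything in between is contiguous.
    """
    keys = [t for t, _ in cells]
    lo = bisect.bisect_left(keys, start)
    hi = bisect.bisect_right(keys, end)
    return cells[lo:hi]

result = slice_range(cells, "2013-04-03 07:01:00", "2013-04-03 07:04:00")
for event_time, temperature in result:
    print(event_time, temperature)
```

Because the matching cells are adjacent, the read is one positioning step followed by a sequential scan, which is the idea behind "single seek on disk".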

Query patterns
• Range queries
• “Slice” operation on disk

SELECT event_time,temperature
FROM temperature
WHERE weatherstation_id='1234ABCD'
AND event_time >= '2013-04-03 07:01:00'
AND event_time <= '2013-04-03 07:04:00';

 weatherstation_id | event_time          | temperature
-------------------+---------------------+-------------
 1234ABCD          | 2013-04-03 07:01:00 | 72F
 1234ABCD          | 2013-04-03 07:02:00 | 73F
 1234ABCD          | 2013-04-03 07:03:00 | 73F
 1234ABCD          | 2013-04-03 07:04:00 | 74F

Programmers like this

Sorted by event_time

Ingestion models
• Apache Kafka
• Apache Flume
• Storm
• Custom Applications

Apache Kafka

Your totally killer application

Dealing with data at speed
• 1 million writes per second?
• 1 insert every microsecond
• Collisions?
• Partition key (the first part of the primary key) determines node placement
• Random partitioning
• Special data type - TimeUUID

[Diagram: "Your totally killer application" writing rows for weatherstation_id='1234ABCD' and weatherstation_id='5678EFGH']
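The placement idea can be sketched as hashing the key and mapping the hash to a node. Everything here is an assumption made for illustration: the `NODES` list is hypothetical, MD5 stands in for the partitioner's hash, and the modulo step is a simplification of the token ranges a real cluster assigns to nodes.

```python
import hashlib

NODES = ["node-a", "node-b", "node-c", "node-d"]  # hypothetical four-node cluster

def token(partition_key: str) -> int:
    """Hash the partition key to a large integer (MD5 as a stand-in hash)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big")

def node_for(partition_key: str) -> str:
    """Simplified placement: token modulo node count.

    Real clusters give each node token ranges instead of using modulo.
    """
    return NODES[token(partition_key) % len(NODES)]

for key in ("1234ABCD", "5678EFGH"):
    print(key, "->", node_for(key))
```

The same key always lands on the same node, while different keys spread across the cluster, so writes for many weather stations do not pile up on one machine.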

TimeUUID

• Also known as a Version 1 UUID
• Sortable
• Reversible

Timestamp to Microsecond + UUID = TimeUUID

04d580b0-9412-11e3-baa8-0800200c9a66 = Wednesday, February 12, 2014 6:18:06 PM GMT

http://www.famkruithof.net/uuid/uuidgen
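The "reversible" property can be checked with Python's standard `uuid` module. This is a minimal sketch and not part of the original talk: `timeuuid_to_datetime` is a hypothetical helper name, and it relies on a Version 1 UUID storing its timestamp as a count of 100-nanosecond intervals since 1582-10-15.

```python
import uuid
from datetime import datetime, timedelta, timezone

# Version 1 UUID timestamps count 100-nanosecond intervals from this date.
GREGORIAN_EPOCH = datetime(1582, 10, 15, tzinfo=timezone.utc)

def timeuuid_to_datetime(value: str) -> datetime:
    """Recover the embedded timestamp from a Version 1 (time-based) UUID."""
    u = uuid.UUID(value)
    if u.version != 1:
        raise ValueError("not a version 1 (time-based) UUID")
    # u.time is in 100 ns units; integer-divide by 10 to get microseconds.
    return GREGORIAN_EPOCH + timedelta(microseconds=u.time // 10)

decoded = timeuuid_to_datetime("04d580b0-9412-11e3-baa8-0800200c9a66")
print(decoded.isoformat())  # 2014-02-12T18:18:06.267000+00:00
```

The decoded value matches the date shown on the slide, and the remaining bits of the UUID are what keep two writes in the same instant from colliding.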

Way more information

• 5 minute interviews
• Use cases
• Free training!

www.planetcassandra.org

Thank You!

Follow me for more updates all the time: @PatrickMcFadin