Building a custom time series db - Colin Hemmings at #DOXLON

Colin talks about how he architected and built a high-performance time series database from the ground up at Dataloop.io, handling hundreds of thousands of metrics per second. One of the objectives was to provide real-time graphing and alerting. If you're 'rolling your own' metrics, are interested in Node.js and highly scalable architectures, and like listening to plenty of war stories, you should enjoy this talk. Video: http://youtu.be/vx6Ms5TNtqo DevOps Exchange Meetup Group: http://bit.ly/doxlonmeetup

Transcript of Building a custom time series db - Colin Hemmings at #DOXLON

Page 1: Building a custom time series db - Colin Hemmings at #DOXLON

www.dataloop.io | @dataloopio | [email protected]

Colin Hemmings | Architect

Time-series Datastore on Riak

Page 2: Building a custom time series db - Colin Hemmings at #DOXLON

Architecture

• Collection

• Storage

• Analytics

Page 3: Building a custom time series db - Colin Hemmings at #DOXLON

The Storage Problem

Just stick it in a database, right?

Page 4: Building a custom time series db - Colin Hemmings at #DOXLON

Past Solutions

TempoDB - the phantom menace

Page 5: Building a custom time series db - Colin Hemmings at #DOXLON

Past Solutions

MongoDB - return of the Jedi

Page 6: Building a custom time series db - Colin Hemmings at #DOXLON

Riak - Our New Hope

• Scales

• Ops Friendly

• Actually works

• No random JVM crashes here

Page 7: Building a custom time series db - Colin Hemmings at #DOXLON

Objectives

• Handle the load

• Semi-arbitrary queries

• Data retention windows

• Low latency

Page 8: Building a custom time series db - Colin Hemmings at #DOXLON

Data structure

• Resolution/rollup based queries

• Minimum 24 hours at 1 second resolution

• Second, minute and hour resolution

Page 9: Building a custom time series db - Colin Hemmings at #DOXLON

Data structure

• Roughly 86,400 data points per resolution

• 1 second -> 24 hour retention

• 1 minute -> 60 day retention

• 1 hour -> 10 year retention
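
The retention figures above can be checked with a few lines of arithmetic. This is only a sketch: the tier names and the 365-day year are assumptions made for illustration, not something stated in the talk.

```python
DAY = 24 * 60 * 60  # seconds in a day

# (tier name, seconds between points, retention in seconds), as on the slide.
TIERS = [
    ("second", 1, 1 * DAY),
    ("minute", 60, 60 * DAY),
    ("hour", 3600, 10 * 365 * DAY),  # assumption: 365-day years
]


def points_per_tier(tiers):
    """Return {tier name: points held when the retention window is full}."""
    return {name: retention // step for name, step, retention in tiers}


counts = points_per_tier(TIERS)
for name, n in counts.items():
    print(f"{name:>6}: {n:,}")
print(f" total: {sum(counts.values()):,} points per metric")
```

Under these assumptions the second and minute tiers hold exactly 86,400 points each, and the hour tier holds slightly more (87,600), so each tier is roughly the same size.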

Page 10: Building a custom time series db - Colin Hemmings at #DOXLON

Data structure

• per metric -> 250k data points

• 1000 metrics per host -> 2.5M data points

• 300 hosts per user -> 750M data points

• 1000 customers -> 750B data points!!!!!
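
A quick way to sanity-check this kind of capacity estimate is to multiply the inputs out. The sketch below uses the slide's inputs exactly as written; the function and variable names are illustrative only.

```python
PER_METRIC = 250_000      # data points per metric (slide figure)
METRICS_PER_HOST = 1000   # slide figure
HOSTS_PER_USER = 300      # slide figure
CUSTOMERS = 1000          # slide figure


def scale(per_metric, metrics_per_host, hosts_per_user, customers):
    """Return (points per host, points per user, total points)."""
    per_host = per_metric * metrics_per_host
    per_user = per_host * hosts_per_user
    total = per_user * customers
    return per_host, per_user, total


per_host, per_user, total = scale(
    PER_METRIC, METRICS_PER_HOST, HOSTS_PER_USER, CUSTOMERS
)
print(f"per host: {per_host:,}")
print(f"per user: {per_user:,}")
print(f"total:    {total:,}")
```

Multiplied out literally, these inputs give totals larger than the ones on the slide; the slide's 2.5M, 750M and 750B figures are what you get with 10 metrics per host. Either way, the figures are best read as order-of-magnitude estimates.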

Page 11: Building a custom time series db - Colin Hemmings at #DOXLON

Simple Riak Storage

• Timestamp keyed object per metric value

• 2i and MapReduce are too slow

• Especially across millions of keys

• Writes would soon cripple our Riak cluster

Page 12: Building a custom time series db - Colin Hemmings at #DOXLON

Intelligent Riak Storage

• Units of storage: time based data blocks

• Compute keys

• Mutable data windows
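
The idea of computed keys over time-based blocks can be sketched as follows. The key layout (metric, resolution, block start) mirrors the `cpu_second_t1b` style keys used later in the talk, but the block lengths and function names here are hypothetical; the talk does not give the real values.

```python
# Hypothetical block lengths in seconds; the talk does not state the real ones.
BLOCK_SECONDS = {"second": 3600, "minute": 86400, "hour": 30 * 86400}


def block_start(ts: int, resolution: str) -> int:
    """Round a Unix timestamp down to the start of its block."""
    size = BLOCK_SECONDS[resolution]
    return ts - (ts % size)


def block_key(metric: str, resolution: str, ts: int) -> str:
    """Derive the storage key for the block that contains ts."""
    return f"{metric}_{resolution}_{block_start(ts, resolution)}"


# Two points in the same hour map to the same key, so no index lookup is needed.
print(block_key("cpu", "second", 7265))
print(block_key("cpu", "second", 7300))
print(block_key("cpu", "second", 10800))
```

Because the key is a pure function of metric, resolution and timestamp, both writers and readers can find a block without 2i or MapReduce. Every write within a block's time span lands on the same key, which is presumably what makes the most recent block a mutable window.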

Page 13: Building a custom time series db - Colin Hemmings at #DOXLON

Query

Get cpu metrics for host A for period t1-t4 at 1 second resolution

• Pull the correct blocks from Riak, based on block boundaries

• GET /buckets/host_a/keys/cpu_second_t1b

• GET /buckets/host_a/keys/cpu_second_t2b

• GET /buckets/host_a/keys/cpu_second_t3b

• GET /buckets/host_a/keys/cpu_second_t4b
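
The block selection above can be sketched as a function that enumerates every block overlapping the query range and builds one GET path per block. The one-hour block length and the helper names are assumptions for illustration; only the `/buckets/<host>/keys/<key>` path shape comes from the slide.

```python
def block_keys_for_range(metric, resolution, start, end, block_seconds):
    """Return the keys of all blocks that overlap [start, end] (Unix seconds)."""
    first = start - (start % block_seconds)  # block containing the start time
    return [
        f"{metric}_{resolution}_{block}"
        for block in range(first, end + 1, block_seconds)
    ]


def get_paths(host, keys):
    """Build one HTTP GET path per block, with the host as the bucket."""
    return [f"/buckets/{host}/keys/{key}" for key in keys]


# Hypothetical one-hour blocks; the range below spans four of them.
keys = block_keys_for_range("cpu", "second", 100, 14000, block_seconds=3600)
for path in get_paths("host_a", keys):
    print("GET", path)
```

The number of requests depends only on the length of the query range and the block size, not on how many points were written.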

Page 14: Building a custom time series db - Colin Hemmings at #DOXLON

Query

• Filter points outside of our query range

• Aggregate all the data points

• Perform further operations for more complex queries
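
A minimal sketch of the filter and aggregate steps, assuming each fetched block is a mapping of timestamp to value. That representation, and the choice of a mean as the summary, are assumptions for illustration; the talk does not describe the block format.

```python
from statistics import mean


def query(blocks, start, end, aggregate=mean):
    """Merge blocks, drop points outside [start, end], then summarise.

    blocks: list of {timestamp: value} dicts (assumed block format).
    Returns (sorted points, aggregate value or None if nothing matched).
    """
    points = sorted(
        (ts, value)
        for block in blocks
        for ts, value in block.items()
        if start <= ts <= end
    )
    summary = aggregate([value for _, value in points]) if points else None
    return points, summary


blocks = [{1: 10.0, 2: 20.0}, {3: 30.0, 4: 40.0}]
points, average = query(blocks, start=2, end=3)
print(points, average)
```

Blocks are fetched whole, so the first and last blocks usually contain points outside the requested range; filtering happens in the application tier after the fetch.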

Page 15: Building a custom time series db - Colin Hemmings at #DOXLON

Expiring

• Cleanup worker

• Removes keys outside the retention window

• Host-keyed, making it easier to clear all host or account data
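
One way a cleanup worker could decide which blocks to delete is sketched below. It assumes keys of the form `metric_resolution_blockstart`, with the block start as Unix seconds; the key format, block length and function name are hypothetical, and the actual delete calls are left out.

```python
def expired_keys(keys, now, retention_seconds, block_seconds):
    """Return keys whose whole block lies outside the retention window.

    Assumes each key ends in "_<block start in Unix seconds>".
    """
    cutoff = now - retention_seconds
    expired = []
    for key in keys:
        start = int(key.rsplit("_", 1)[1])
        # A block is only safe to delete once its last point has aged out.
        if start + block_seconds <= cutoff:
            expired.append(key)
    return expired


keys = ["cpu_second_0", "cpu_second_3600", "cpu_second_90000"]
print(expired_keys(keys, now=93600, retention_seconds=86400, block_seconds=3600))
```

Deleting whole blocks keeps the number of delete operations proportional to the number of blocks rather than the number of data points.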

Page 16: Building a custom time series db - Colin Hemmings at #DOXLON

Our cluster

• Riak 2.0

• 5 nodes on LevelDB

• Each with 2 x 500 GB striped SSDs

• Average 1 ms GET and PUT latencies

Page 17: Building a custom time series db - Colin Hemmings at #DOXLON

Page 18: Building a custom time series db - Colin Hemmings at #DOXLON

Comments

• Awesome, especially for ops

• A bit more work in application tier

• Always compute keys; avoid 2i and MapReduce

• Looking forward to using the new data types