Building a custom time series db - Colin Hemmings at #DOXLON
www.dataloop.io | @dataloopio | [email protected]
Colin Hemmings | Architect
Time-series Datastore on Riak
Just stick it in a database, right?
The Storage Problem
Riak - Our New Hope
• Scales
• Ops Friendly
• Actually works
• No random JVM crashes here
Objectives
• Handle the load
• Semi-arbitrary queries
• Data retention windows
• Low latency
Data structure
• Resolution/rollup based queries
• Minimum 24 hours at 1 second resolution
• Second, minute and hour resolution
Data structure
• Roughly 86,400 data points per resolution
• 1 second -> 24 hour retention
• 1 minute -> 60 day retention
• 1 hour -> 10 year retention
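The per-resolution figure follows from dividing each retention window by its sample interval. A minimal sketch of that arithmetic, assuming a year is counted as 365 days (the names `TIERS` and `points_per_tier` are illustrative, not from the talk):

```python
# Data points kept per resolution tier = retention window / sample interval.
# Assumption: a year is counted as 365 days.
SECONDS_PER_DAY = 24 * 60 * 60

TIERS = {
    # name: (sample interval in seconds, retention window in seconds)
    "second": (1, 1 * SECONDS_PER_DAY),
    "minute": (60, 60 * SECONDS_PER_DAY),
    "hour": (3600, 10 * 365 * SECONDS_PER_DAY),
}


def points_per_tier(tiers):
    """Return the number of data points each tier holds when full."""
    return {name: retention // step for name, (step, retention) in tiers.items()}


counts = points_per_tier(TIERS)
total_per_metric = sum(counts.values())
```

Under these assumptions the second and minute tiers each hold 86,400 points and the hour tier holds 87,600, so one metric needs a little over 250k points in total.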
Data structure
• Per metric -> ~250k data points
• 1,000 metrics per host -> 250M data points
• 300 hosts per user -> 75B data points
• 1,000 customers -> 75T data points!!!!!
Simple Riak Storage
• Timestamp keyed object per metric value
• 2i and MapReduce are too slow
• Especially across millions of keys
• Writes would soon cripple our Riak cluster
Intelligent Riak Storage
• Units of storage: time based data blocks
• Compute keys
• Mutable data windows
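One way to read "compute keys" is that the key of a block is derived purely from the metric name, the resolution and the timestamp, so no index lookup is needed to find it. A rough sketch under that reading; the block sizes in `BLOCK_SECONDS` and the exact key format are assumptions for illustration, not values given in the talk:

```python
# Hypothetical block sizes: how many seconds of data one stored object covers.
BLOCK_SECONDS = {"second": 3600, "minute": 86400, "hour": 30 * 86400}


def block_start(timestamp, resolution):
    """Round a Unix timestamp down to the start of its block."""
    size = BLOCK_SECONDS[resolution]
    return timestamp - (timestamp % size)


def block_key(metric, resolution, timestamp):
    """Derive the key of the block that holds this timestamp."""
    return f"{metric}_{resolution}_{block_start(timestamp, resolution)}"
```

Because the key is a pure function of its inputs, both the writer and the reader can arrive at the same object without scanning or querying for keys.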
Query
Get cpu metrics for host A for period t1-t4 at 1 second resolution
• Pull the correct blocks from Riak, based on block boundaries
• GET /buckets/host_a/keys/cpu_second_t1b
• GET /buckets/host_a/keys/cpu_second_t2b
• GET /buckets/host_a/keys/cpu_second_t3b
• GET /buckets/host_a/keys/cpu_second_t4b
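The list of GETs can be generated from the query range alone. A minimal sketch, assuming fixed-size blocks aligned to multiples of the block size and keys that end in the block's start timestamp (the function name `block_paths` and the key layout are hypothetical; only the `/buckets/<bucket>/keys/<key>` path shape comes from the slide):

```python
def block_paths(host, metric, resolution, start, end, block_seconds):
    """List the Riak HTTP paths for every block overlapping [start, end]."""
    # Align the first block to the boundary at or before the query start.
    first = start - (start % block_seconds)
    paths = []
    for block in range(first, end + 1, block_seconds):
        paths.append(f"/buckets/{host}/keys/{metric}_{resolution}_{block}")
    return paths


# Example: a query that spans four one-hour blocks.
paths = block_paths("host_a", "cpu", "second", 3700, 14500, 3600)
```

Each path can then be fetched independently, so the reads are plain key lookups that can run in parallel.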
Query
• Filter points outside of our query range
• Aggregate all the data points
• Perform further operations for more complex queries
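Since whole blocks are fetched, the first and last blocks usually contain points outside the requested range, which is why filtering happens in the application tier. A small sketch of the filter and aggregate steps, assuming each fetched block decodes to a list of `(timestamp, value)` pairs (that representation and the function names are assumptions):

```python
from statistics import mean


def query_points(blocks, start, end):
    """Flatten fetched blocks and keep points with start <= timestamp <= end."""
    points = [
        (ts, value)
        for block in blocks
        for ts, value in block
        if start <= ts <= end
    ]
    return sorted(points)


def aggregate(points, func=mean):
    """Apply an aggregation function to the values, or None if there are none."""
    return func([value for _, value in points]) if points else None


# Example: two blocks, with the query covering only the middle two points.
blocks = [[(1, 10.0), (2, 20.0)], [(3, 30.0), (4, 40.0)]]
selected = query_points(blocks, 2, 3)
average = aggregate(selected)
```

Other aggregations such as `max` or `sum` can be passed as `func` for more complex queries.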
Expiring
• Cleanup worker
• Removes keys that fall outside the retention window
• Keyed by host, so it is easier to clear all data for a host or an account
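With computed keys, a cleanup worker can also work out which blocks have expired without listing keys. A hedged sketch of that idea, assuming the worker remembers the start of the oldest block it has not yet deleted (`cleaned_up_to`) and that keys end in the block start timestamp; none of these names come from the talk:

```python
def expired_block_keys(metric, resolution, block_seconds,
                       retention_seconds, now, cleaned_up_to):
    """Return keys of blocks that lie entirely before the retention cutoff."""
    cutoff = now - retention_seconds
    keys = []
    block = cleaned_up_to
    # A block is expired only once its last second is older than the cutoff.
    while block + block_seconds <= cutoff:
        keys.append(f"{metric}_{resolution}_{block}")
        block += block_seconds
    return keys


# Example: one-hour blocks, 24-hour retention.
stale = expired_block_keys("cpu", "second", 3600, 86400, 100000, 0)
```

The worker would then issue a delete for each returned key in the host's bucket and advance `cleaned_up_to` past the last deleted block.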
Our cluster
• Riak 2.0
• 5 nodes on LevelDB
• Each with 2 x 500GB striped SSDs
• Average 1ms GET and PUT latencies
Comments
• Awesome, especially for ops
• A bit more work in application tier
• Always compute keys; avoid 2i and MapReduce
• Looking forward to using the new data types