Scaling Automated Database Monitoring at Uber · 2019. 6. 3. · Monitoring Applications at Scale...
Scaling Automated Database Monitoring at Uber… with M3 and Prometheus
Richard Artoul
Agenda
01 Automated database monitoring
02 Why scaling automated monitoring is hard
03 M3 architecture and why it scales so well
04 How you can use M3
Uber’s “Architecture”
[Figure labels: 2015, 2019]
● 4000+ microservices - Most of which directly or indirectly depend on storage
● 22+ storage technologies - Ranging from C* to MySQL
● 1000s of dedicated servers running databases - Monitoring all of these is hard
Monitoring Databases
● Hardware
● Technology
● Application
Hardware Level Metrics
Technology Level Metrics
Application Level Metrics
● Number of successes, errors, and latency broken down by
○ All queries against a given database
○ All queries issued by a specific service
○ A specific query
■ SELECT * FROM TABLE WHERE USER_ID = ?
What’s so hard about that?
Monitoring Applications at Scale
1200 Microservices w/ dedicated storage
× 100 Instances per service
× 20 Instances per DB cluster
× 20 Queries per service
× 10 Metrics per query
= 480 million unique time series
100+ million dollars!
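The 480 million figure is the product of the factors listed on this slide. A minimal sketch of that arithmetic follows; the dictionary keys are illustrative names, not identifiers from M3.

```python
# Factors taken from the slide; the key names are illustrative.
factors = {
    "microservices_with_dedicated_storage": 1200,
    "instances_per_service": 100,
    "instances_per_db_cluster": 20,
    "queries_per_service": 20,
    "metrics_per_query": 10,
}


def unique_series(factors):
    """Multiply every cardinality factor to get the number of unique time series."""
    total = 1
    for count in factors.values():
        total *= count
    return total


total_series = unique_series(factors)
print(f"{total_series:,} unique time series")
```

Each factor multiplies the others, which is why per-query, per-instance metrics become expensive so quickly.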
Workload
● Writes per second (post replication): 110M
● Datapoints emitted pre-aggregation: 800M
● Unique Metric IDs: 9B
● Datapoints read per second: 200B
How do we do it?
A Brief History of M3
● 2014-2015: Graphite + WhisperDB
○ No replication, operations were ‘cumbersome’
● 2015-2016: Cassandra
○ Solved operational issues
○ 16x YoY Growth
○ Expensive (> 1500 Cassandra Hosts)
○ Compactions => R.F=2
● 2016-Today: M3DB
M3DB Overview
M3DB
An open source distributed time series database
● Store arbitrary timestamp precision data points at any resolution for any retention
● Tag (key/value pair) based inverted index
● Optimized file-system storage with minimal need for compactions of time series data for real-time workloads
High-Level Architecture
Like a Log-Structured Merge (LSM) tree
Except a typical LSM tree uses levelled or size-based compaction
M3DB uses time-window compaction
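One way to picture time-window organization is to bucket datapoints into fixed-size time blocks, so that data can be flushed or expired a whole block at a time instead of being merged by size or level. The sketch below only illustrates that idea and is not M3DB's implementation; the two-hour block size and all function names are assumptions.

```python
from collections import defaultdict

BLOCK_SIZE_SECONDS = 2 * 60 * 60  # assumed block size, for illustration only


def block_start(timestamp, block_size=BLOCK_SIZE_SECONDS):
    """Return the start of the fixed time window that contains `timestamp`."""
    return timestamp - (timestamp % block_size)


def group_into_blocks(datapoints, block_size=BLOCK_SIZE_SECONDS):
    """Group (timestamp, value) pairs by the time window they fall into."""
    blocks = defaultdict(list)
    for timestamp, value in datapoints:
        blocks[block_start(timestamp, block_size)].append((timestamp, value))
    return dict(blocks)


def expire_blocks(blocks, now, retention_seconds, block_size=BLOCK_SIZE_SECONDS):
    """Drop whole windows that ended at or before the retention cutoff."""
    cutoff = now - retention_seconds
    return {
        start: points
        for start, points in blocks.items()
        if start + block_size > cutoff
    }


points = [(0, 1.0), (3600, 2.0), (7200, 3.0), (15000, 4.0)]
blocks = group_into_blocks(points)
kept = expire_blocks(blocks, now=21600, retention_seconds=7200)
```

Because expiry works on whole windows, old data can be removed without rewriting newer data.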
Topology and Consistency
● Strongly consistent topology (using etcd)
○ No gossip
○ Replicated with zone/rack aware layout and configurable replication factor
● Consistency managed via synchronous quorum writes and reads
○ Configurable consistency level for both read and write
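The quorum idea can be sketched in a few lines: with replication factor N, a majority is N // 2 + 1 replicas, and a majority read always overlaps a majority write. The level names and functions below are illustrative assumptions, not M3DB's client API.

```python
def quorum(replication_factor):
    """Smallest number of replicas that forms a majority."""
    return replication_factor // 2 + 1


def required_acks(level, replication_factor):
    """Map an illustrative consistency level to the number of responses needed."""
    if level == "one":
        return 1
    if level == "majority":
        return quorum(replication_factor)
    if level == "all":
        return replication_factor
    raise ValueError(f"unknown consistency level: {level}")


def is_successful(acks, level, replication_factor):
    """A request succeeds once enough replicas have acknowledged it."""
    return acks >= required_acks(level, replication_factor)


rf = 3
# A majority write and a majority read must share at least one replica.
reads_overlap_writes = quorum(rf) + quorum(rf) > rf
```

With a replication factor of 3, a majority write tolerates one unavailable replica while still overlapping with any later majority read.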
Cost Savings and Performance
● Disk Usage in 2017
○ ~1.4PB for Cassandra at R.F=2
○ ~200TB for M3DB at R.F=3
● Much higher throughput per node with M3DB
○ Hundreds of thousands of writes/s on commodity hardware
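Taking the two disk usage figures at face value, the comparison is even more favourable per replica, because M3DB stores three copies where Cassandra stored two. The sketch below assumes decimal units (1PB = 1000TB) and that the two figures describe comparable data, which the slide does not state.

```python
cassandra_total_tb = 1400  # ~1.4PB at R.F=2, assuming 1PB = 1000TB
m3db_total_tb = 200        # ~200TB at R.F=3

# Disk used by a single copy of the data in each system.
cassandra_per_replica_tb = cassandra_total_tb / 2
m3db_per_replica_tb = m3db_total_tb / 3

total_ratio = cassandra_total_tb / m3db_total_tb
per_replica_ratio = cassandra_per_replica_tb / m3db_per_replica_tb

print(f"total: {total_ratio:.1f}x, per replica: {per_replica_ratio:.1f}x")
```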
But what about the index?
Centralized Elasticsearch Index
● Actual time series data was stored in Cassandra and then M3DB
● But indexing of data (for querying) was handled by Elasticsearch
● Worked for us for a long time
● … but scaling writes and reads required a lot of engineering
Elasticsearch Index - Write Path
[Diagram components: m3agg, Indexer, Redis Cache (“Don’t write cache”), E.S, M3DB]
Influx of new metrics:
1. Large Service Deployment
2. Datacenter Failover
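A “don’t write” cache can be sketched as a set of metric IDs that have already been indexed: repeats are skipped, and only IDs seen for the first time reach the index. The class below is a hypothetical stand-in that uses an in-memory set where the slide shows Redis; the names and the metric ID format are assumptions.

```python
class DontWriteCache:
    """Remembers which metric IDs were already indexed so repeats can be skipped."""

    def __init__(self):
        self._seen = set()  # in-memory stand-in for Redis in this sketch

    def should_index(self, metric_id):
        """Return True only the first time a metric ID is observed."""
        if metric_id in self._seen:
            return False
        self._seen.add(metric_id)
        return True


def index_writes(metric_ids, cache):
    """Return the metric IDs that would actually be sent to the index."""
    return [m for m in metric_ids if cache.should_index(m)]


cache = DontWriteCache()
steady_state = index_writes(["cpu{host=a}", "cpu{host=a}", "cpu{host=b}"], cache)
# A deployment introduces new IDs, and every one of them misses the cache.
after_deploy = index_writes(["cpu{host=c}", "cpu{host=d}", "cpu{host=a}"], cache)
```

This also shows why an influx of new metrics hurts: new IDs bypass the cache and all land on the index at once.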
Elasticsearch Index - Read Path
[Diagram: Query goes to (1) E.S, then (2) M3DB]
Elasticsearch Index - Read Path
[Diagram components: Query, Redis Query Cache, E.S]
Need high T.T.L to prevent overwhelming E.S
… but high T.T.L means long delay for new time series to become queryable
Elasticsearch Index - Read Path
[Diagram: Query → Redis Query Cache (Short TTL) → E.S Short; Query → Redis Query Cache (Long TTL) → E.S Long; results are merged on read]
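The two-cache arrangement can be sketched as follows: a short-TTL cache in front of the short-term index keeps new series visible quickly, a long-TTL cache shields the long-term index, and the two results are merged on read. Everything here (class, function names, TTL values, the in-memory "indexes") is a hypothetical illustration rather than the production design.

```python
class TTLCache:
    """Tiny cache whose entries expire `ttl` seconds after being stored."""

    def __init__(self, ttl):
        self.ttl = ttl
        self._entries = {}  # key -> (stored_at, value)

    def get(self, key, now):
        entry = self._entries.get(key)
        if entry is None or now - entry[0] >= self.ttl:
            return None
        return entry[1]

    def put(self, key, value, now):
        self._entries[key] = (now, value)


def cached_lookup(cache, backend, query, now):
    """Serve from the cache when possible, otherwise ask the backend index."""
    result = cache.get(query, now)
    if result is None:
        result = backend(query)
        cache.put(query, result, now)
    return result


def merged_query(query, now, short_cache, short_index, long_cache, long_index):
    """Merge on read: union of the short-term and long-term index results."""
    recent = cached_lookup(short_cache, short_index, query, now)
    older = cached_lookup(long_cache, long_index, query, now)
    return set(recent) | set(older)


short_data = {"service:foo": {"series-new"}}
long_data = {"service:foo": {"series-old"}}
short_cache, long_cache = TTLCache(ttl=60), TTLCache(ttl=3600)


def short_index(query):
    return set(short_data.get(query, set()))


def long_index(query):
    return set(long_data.get(query, set()))


first = merged_query("service:foo", 0, short_cache, short_index, long_cache, long_index)
short_data["service:foo"].add("series-newer")  # a new series appears
still_cached = merged_query("service:foo", 30, short_cache, short_index, long_cache, long_index)
refreshed = merged_query("service:foo", 61, short_cache, short_index, long_cache, long_index)
```

The new series stays invisible until the short TTL expires, which bounds the delay without exposing the long-term index to every query.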
Elasticsearch Index - Final Breakdown
● Two Elasticsearch clusters with separate configuration
● Two query caches
● Two don’t-write caches
● A stateful indexing tier that requires consistent hashing, an in-memory cache, and breaks everything if you deploy it too quickly
● Another service just for automatically rotating the short-term Elasticsearch cluster indexes
● A background process that’s always running and trying to delete stale documents from the long-term index
M3 Inverted Index
● Not nearly as expressive or feature-rich as Elasticsearch
● … but like M3DB, it’s designed for high throughput
● Temporal by nature (T.T.Ls are cheap)
● Fewer moving parts
M3 Inverted Index
● Primary use case: support queries of the form
○ service = “foo” AND
○ endpoint = “bar” AND
○ client_version regexp matches “3.*”
[Diagram: service=“foo” AND endpoint=“bar” AND client_version=“3.*”, where client_version=“3.*” expands to client_version=“3.1” OR client_version=“3.2” OR …]
AND → Intersection
OR → Union
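The query shape above maps directly onto set operations. The sketch below uses plain Python sets and a toy index to show the idea; the index contents, metric IDs, and function names are made up for illustration and are not M3's API.

```python
import re

# Hypothetical postings: tag name -> tag value -> set of metric IDs.
index = {
    "service": {"foo": {1, 2, 3, 4}, "baz": {5}},
    "endpoint": {"bar": {1, 2, 3, 5}, "qux": {4}},
    "client_version": {"3.1": {1}, "3.2": {2}, "4.0": {3}},
}


def match_exact(index, name, value):
    """Metric IDs whose tag `name` equals `value`."""
    return set(index.get(name, {}).get(value, set()))


def match_regexp(index, name, pattern):
    """Union (OR) of the postings for every value the pattern fully matches."""
    matched = set()
    for value, ids in index.get(name, {}).items():
        if re.fullmatch(pattern, value):
            matched |= ids
    return matched


def query(index):
    """service = "foo" AND endpoint = "bar" AND client_version matches "3.*"."""
    return (
        match_exact(index, "service", "foo")
        & match_exact(index, "endpoint", "bar")
        & match_regexp(index, "client_version", "3.*")
    )


result = query(index)
```

The regexp term is resolved first into the union of its matching values, and the AND terms are then intersected.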
M3 Inverted Index - F.S.Ts
● The inverted index is similar to Lucene in that it uses F.S.T segments to build an efficient and compressed structure for fast regexp matching.
● Each time series label has its own F.S.T, which can be searched to find the offset used to unpack another data structure: the set of metric IDs associated with a particular label value (the postings list).
Encoded Relationships
are --> 4
ate --> 2
see --> 3
Compressed + fast regexp!
M3 Inverted Index - Postings List
● For every term combination in the form of service=“foo”, we need to store the set of metric IDs (integers) that match. This set is called a postings list.
● The index is broken into blocks and segments, so we need to be able to calculate intersections (AND - across terms) and unions (OR - within a term).
[Diagram: 12P.M -> 2P.M Index Block: service=“foo” INTERSECT endpoint=“bar” INTERSECT client_version=“3.*”, where client_version=“3.*” is the UNION of client_version=“3.1”, client_version=“3.2”, …]
Intersect → AND
Union → OR
The primary data structure for the postings list in M3DB is the roaring bitmap
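Because metric IDs are integers, a postings list can be held as a bitmap, where intersection and union become bitwise AND and OR. The sketch below uses an uncompressed Python integer as the bitmap; it is a stand-in to show the operations, not a roaring bitmap, and the IDs are made up.

```python
def to_bitmap(metric_ids):
    """Encode a collection of non-negative integer IDs as bits of a Python int."""
    bitmap = 0
    for metric_id in metric_ids:
        bitmap |= 1 << metric_id
    return bitmap


def from_bitmap(bitmap):
    """Decode the set bits back into a sorted list of IDs."""
    ids, position = [], 0
    while bitmap:
        if bitmap & 1:
            ids.append(position)
        bitmap >>= 1
        position += 1
    return ids


service_foo = to_bitmap([1, 2, 3, 4])
endpoint_bar = to_bitmap([1, 2, 3, 5])
version_3_1 = to_bitmap([1])
version_3_2 = to_bitmap([2])

version_3_any = version_3_1 | version_3_2             # union (OR) within a term
matches = service_foo & endpoint_bar & version_3_any  # intersection (AND) across terms
matching_ids = from_bitmap(matches)
```

A roaring bitmap supports the same operations while compressing sparse and dense ranges of IDs, which a single flat bitmap does not.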
M3 Inverted Index - File Structure
────────────────────────────── Time ────────────────────────────────────────▶
┌──────────────────────────────────────┐
│/var/lib/m3db/data/namespace-a/shard-0│
└───────────────────┬───────────────┬──┴────────────┬───────────────┬───────────────┐
                    │Fileset File   │Fileset File   │Fileset File   │Fileset File   │
                    │Block          │Block          │Block          │Block          │
                    └───────────────┴───────────────┴───────────────┴───────────────┘
┌──────────────────────────────────────┐
│/var/lib/m3db/index/namespace-a       │
└───────────────────┬──────────────────┴────────────┬───────────────────────────────┐
                    │Index Fileset File             │Index Fileset File             │
                    │Block                          │Block                          │
                    └───────────────────────────────┴───────────────────────────────┘
M3 Summary
● Time series compression + temporal data and file structures
● Distributed and horizontally scalable by default
● Complexity pushed to the storage layer
How you can use M3
M3 and Prometheus
Scalable monitoring
M3 and Prometheus
[Diagram components: My App, Prometheus, Alerting, Grafana, M3 Coordinator, M3DB (three nodes)]
Directly query M3 using coordinator for single Grafana datasource
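As a rough sketch of the wiring, Prometheus can be pointed at the M3 Coordinator through its remote write and remote read settings. The hostname `m3coordinator` below is a placeholder, and the port and paths are an assumption based on the coordinator's usual defaults; check them against the M3 version you deploy.

```yaml
# prometheus.yml fragment (sketch; hostname is a placeholder)
remote_write:
  - url: "http://m3coordinator:7201/api/v1/prom/remote/write"

remote_read:
  - url: "http://m3coordinator:7201/api/v1/prom/remote/read"
    # Also read recent data through M3 rather than only from local storage.
    read_recent: true
```

Grafana can then use the coordinator as a single Prometheus-compatible datasource, as the slide describes.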
Roadmap
Roadmap
● Bring the Kubernetes operator out of Alpha with further lifecycle management
● Arbitrary out-of-order writes for writing data into the past and backfilling
● Asynchronous cross-region replication (for disaster recovery)
● Evolving M3DB into a generic time series database (think event store)
○ Efficient compression of events in the form of Protobuf messages