Real-Time Analytics at Uber Scale
Apollo
James Burkhart
Uber - Staff Engineer
Agenda
- Motivation
- Ingest
- Storage
- Query
Motivation
- Business Intelligence
- Real-time
- Time series aggregates
- Geospatial
What is Apollo?
- Real-time analytics platform focused on:
  - Recent data (~7 weeks)
  - Immediate visibility (1500 ms to 3 minute p99 ingest latency)
  - Ad-hoc queryability
    - Arbitrary drilldown
    - Geospatial functionality
  - Data correctness/deduplication (exactly-once)
  - Extremely low latency query (<100ms p95, <1s p99)
- Powering internal data tools at Uber
Real-time operational analytics dashboarding
- Used by majority of Operations weekly
Apollo Query Builder
- Web UI for Apollo Query Language
- Fully interactive
NYE 2016-2017
Motivation, Functionality Requirements
- Index based on data timestamp, not arrival timestamp
- Out of order and late (up to days later) arrival
- Mutability
- Sub-linear performance impact of scaling QPS
Apollo architecture
Users
Environment Management(MemSQL Cluster Sizes)
Datacenter 1 Datacenter 2
Production Prime: 33x 256GB
Production Prime 2: 43x 256GB
Production Minor: 5x 256GB
Production Minor 2: 7x 256GB
Staging/Preprod: 25x 256GB
mirrored
Ingestion
● Simple transformations
  ○ (e.g., string UUID to binary representation)
    ■ "123e4567-e89b-12d3-a456-426655440000" => 36 B
    ■ 0x123E4567E89B12D3A456426655440000 => 16 B
● Filters
● Each job is one input stream to (>=1) output tables
● Independent job instance per environment
// One Kafka input stream fans out to each of the job's output tables.
val inputStream = KafkaInputStream(topic);
job.outputTables.forEach((outputTable) => {
  inputStream
    .filter( ... )
    .map(..transformations -> sql row...)
    .grouped(outputTable.batchSize)
    .forEach(writeBatchToDatabase)
});
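The pipeline above (filter, transform, group into batches, write) can be sketched in plain Python. This is only an illustration under assumptions: `run_job`, `batched`, `uuid_to_binary`, and the message fields are hypothetical names, and an in-memory list stands in for both the Kafka stream and the database writer.

```python
import uuid
from itertools import islice


def uuid_to_binary(text):
    """Convert a 36-character UUID string into its 16-byte binary form."""
    return uuid.UUID(text).bytes


def batched(rows, batch_size):
    """Yield lists of at most batch_size rows from any iterable."""
    it = iter(rows)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


def run_job(messages, keep, to_row, batch_size, write_batch):
    """Filter messages, map them to rows, and write them batch by batch."""
    rows = (to_row(m) for m in messages if keep(m))
    for batch in batched(rows, batch_size):
        write_batch(batch)


# Hypothetical input; a real job would read these from a Kafka topic.
messages = [
    {"trip_uuid": "123e4567-e89b-12d3-a456-426655440000", "status": "completed"},
    {"trip_uuid": "123e4567-e89b-12d3-a456-426655440001", "status": "ignored"},
    {"trip_uuid": "123e4567-e89b-12d3-a456-426655440002", "status": "completed"},
    {"trip_uuid": "123e4567-e89b-12d3-a456-426655440003", "status": "completed"},
]

written = []  # stands in for the database; each element is one written batch
run_job(
    messages,
    keep=lambda m: m["status"] != "ignored",
    to_row=lambda m: (uuid_to_binary(m["trip_uuid"]), m["status"]),
    batch_size=2,
    write_batch=written.append,
)
```

Storing the UUID as 16 bytes instead of a 36-character string is the kind of simple transformation the slide mentions; the batching step keeps the number of database round trips low.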
Ingestion
● Upserts - No double counting!
● Async RF=2 MemSQL replication
  ○ Can lose recent writes during hardware failure
● Solution -> every 6 hours, upsert last 72h worth of data in batch from Hive
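Why upserts prevent double counting can be shown with a small sketch. It uses SQLite's `INSERT OR REPLACE` as a stand-in for the MemSQL upsert; the table layout, the `upsert_batch` helper, and the sample rows are assumptions made for illustration, not the talk's actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE trips (trip_id TEXT PRIMARY KEY, status TEXT, fare REAL)"
)


def upsert_batch(conn, rows):
    """Write rows keyed by trip_id; an existing key is overwritten, not duplicated."""
    conn.executemany(
        "INSERT OR REPLACE INTO trips (trip_id, status, fare) VALUES (?, ?, ?)",
        rows,
    )
    conn.commit()


# Rows arriving from the stream; the same batch is delivered twice on purpose.
stream_batch = [("t1", "requested", 0.0), ("t2", "completed", 12.5)]
upsert_batch(conn, stream_batch)
upsert_batch(conn, stream_batch)

# A later batch backfill corrects t1 and restores t3, which the stream path lost.
backfill_batch = [
    ("t1", "completed", 9.0),
    ("t2", "completed", 12.5),
    ("t3", "completed", 7.0),
]
upsert_batch(conn, backfill_batch)

trip_count = conn.execute("SELECT COUNT(*) FROM trips").fetchone()[0]
total_fare = conn.execute("SELECT SUM(fare) FROM trips").fetchone()[0]
```

Because every write is keyed, replaying a stream batch or re-upserting a window from the batch source leaves exactly one row per trip, so aggregates stay correct.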
Storage
● In-memory rowstore - mutable/recent
● Columnstore - immutable/older
Caching
● Partial, recomposable results
● Sharded MySQLs
Apollo Query Language (AQL)
● Custom Analytical Time-Series Query Language
● Goals:
  ○ Flexibility like SQL
  ○ Minimal Learning Curve
  ○ Ease-of-Use
● Features:
  ○ Canonicalization
  ○ Ease-of-parsing
  ○ Error detection
  ○ Automatic optimization
Example
{
  "table": "trips",
  "joins": [
    {
      "alias": "g",
      "table": "geofences",
      "conditions": [ "geography_intersects(request_at, g.shape)" ]
    }
  ],
  "dimensions": [
    {
      "sqlExpression": "request_at",
      "timeBucketizer": "day",
      "timeUnit": "millisecond"
    }
  ],
  "measures": [
    {
      "sqlExpression": "count(*)",
      "rowFilters": [ "status='completed'" ]
    }
  ],
  "rowFilters": [ "city_id=1", "g.uuid=0x0A" ],
  "timeFilter": {
    "column": "request_at",
    "from": "yesterday",
    "to": "yesterday"
  },
  "timezone": "America/Los_Angeles"
}
Why SQL is hard for time series OLAP
● Date/time functions:
  ○ ROUND(UNIX_TIMESTAMP(CONVERT_TZ(DATE_FORMAT(CONVERT_TZ(FROM_UNIXTIME(((trips.request_at) - (trips.request_at) % 900000) / 1000), 'GMT', 'America/Los_Angeles'), '%Y-%m-%d'), 'America/Los_Angeles', 'UTC')) / 0.001, 0)
  ○ Cheap timestamp snapping to 15m
  ○ Conversion from milliseconds to seconds
  ○ Conversion from Unix timestamp to SQL time
  ○ Adding timezone to Unix time
  ○ Date/time formatting/truncation
  ○ Timezone conversion
  ○ Conversion from SQL time to Unix timestamp
  ○ Conversion from seconds to milliseconds
Field Value
Dimension.SQLExpression request_at
Dimension.TimeBucketizer day
Dimension.TimeUnit millisecond
Timezone America/Los_Angeles
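What the SQL expression above computes (the start of the local day, returned as a millisecond Unix timestamp) can be sketched outside the database. The helper name `day_bucket_ms` is hypothetical, and the example uses a fixed UTC-7 offset so that it runs without a timezone database.

```python
from datetime import datetime, timedelta, timezone, tzinfo


def day_bucket_ms(ts_ms: int, tz: tzinfo) -> int:
    """Return the start of ts_ms's local day in tz, as epoch milliseconds."""
    # Milliseconds -> seconds -> wall-clock time in the requested timezone.
    local = datetime.fromtimestamp(ts_ms / 1000, tz)
    # Truncate to local midnight (the "day" bucketizer).
    midnight = local.replace(hour=0, minute=0, second=0, microsecond=0)
    # Back to a Unix timestamp, then seconds -> milliseconds.
    return int(midnight.timestamp() * 1000)


# Fixed offset used only for illustration (Pacific Daylight Time, UTC-7).
PDT = timezone(timedelta(hours=-7))
request_at = int(datetime(2016, 3, 22, 23, 30, tzinfo=PDT).timestamp() * 1000)
bucket = day_bucket_ms(request_at, PDT)
```

With a real zone such as `zoneinfo.ZoneInfo("America/Los_Angeles")` passed as `tz`, the same function would also follow daylight-saving changes. The point of the slide stands either way: the user states three fields (column, bucketizer, timezone) and the query layer has to generate all of these conversion steps.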
Why SQL is hard for time series OLAP
● City/Region/Country based timezone
  ○ ROUND(UNIX_TIMESTAMP(CONVERT_TZ(DATE_FORMAT(CONVERT_TZ(FROM_UNIXTIME(((trips.request_at) - (trips.request_at) % 900000) / 1000), 'GMT', __tz__.sub_region_timezone), '%Y-%m-%d'), __tz__.sub_region_timezone, 'UTC')) / 0.001, 0) FROM trips JOIN api_cities AS __tz__ ON trips.city_id = __tz__.id
  ○ Join with api_cities (which has timezone info of each level) on city_id
  ○ Use the corresponding timezone column from api_cities
Field Value
Dimension.SQLExpression request_at
Dimension.TimeBucketizer day
Dimension.TimeUnit millisecond
Timezone sub_region_timezone(city_id)
Why SQL is hard for time series OLAP
● #completed_trips / #requested_trips
  ○ SUM(CASE WHEN trips.status='completed' THEN 1 ELSE 0 END) / SUM(CASE WHEN trips.status!='ignored' THEN 1 ELSE 0 END)
  ○ SELECT …, _1.completed / _2.requested FROM (SELECT …, COUNT(*) AS completed FROM trips WHERE status='completed' GROUP BY ...) AS _1 JOIN (SELECT …, COUNT(*) AS requested FROM trips WHERE status!='ignored' GROUP BY ...) AS _2 ON ...
  ○ Filters make measures complex
Field Value
Measure[0].SQLExpression count(*)
Measure[0].Filters status='completed'
Measure[0].Alias completed
Measure[1].SQLExpression count(*)
Measure[1].Filters status!='ignored'
Measure[1].Alias requested
Measure[2].SQLExpression completed / requested
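One way a query layer could turn filtered measures into the `SUM(CASE WHEN ...)` form is sketched below. The `compile_count_measure` helper is hypothetical and not Apollo's actual compiler; SQLite stands in for the database, and the sample rows are made up.

```python
import sqlite3


def compile_count_measure(alias, row_filter):
    """Render a filtered count(*) measure as a conditional sum."""
    return f"SUM(CASE WHEN {row_filter} THEN 1 ELSE 0 END) AS {alias}"


measures = [
    compile_count_measure("completed", "status = 'completed'"),
    compile_count_measure("requested", "status != 'ignored'"),
]
sql = f"SELECT {', '.join(measures)} FROM trips"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trips (id INTEGER PRIMARY KEY, status TEXT)")
conn.executemany(
    "INSERT INTO trips (status) VALUES (?)",
    [("completed",), ("completed",), ("canceled",), ("ignored",)],
)

# Both measures are computed in one scan; the ratio is derived afterwards.
completed, requested = conn.execute(sql).fetchone()
completion_rate = completed / requested
```

Each measure carries its own filter, so the two counts come from a single pass over the table rather than from two subqueries joined together.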
Why SQL is hard for time series OLAP
● #Trips by geofence for geofence A, B and C
  ○ SELECT count(*), geofences.uuid FROM trips JOIN geofences ON geography_intersects(trips.request_point, geofences.shape) WHERE geofences.uuid IN (A, B, C) GROUP BY geofences.uuid
● Total #Trips for geofence A, B and C
  ○ SELECT count(*) FROM trips JOIN geofences ON geography_intersects(trips.request_point, geofences.shape) WHERE geofences.uuid IN (A, B, C)
● Overlapping is OK, overcounting is not!
  ○ SELECT count(*) FROM trips WHERE EXISTS (SELECT * FROM geofences WHERE geography_intersects(trips.request_point, geofences.shape) AND geofences.uuid IN (A, B, C))
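The overcounting problem can be reproduced with a small sketch. SQLite has no `geography_intersects`, so this example assumes axis-aligned rectangular geofences and a point-in-box test as a stand-in; the tables and coordinates are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE trips (id INTEGER PRIMARY KEY, x REAL, y REAL);
    CREATE TABLE geofences (uuid TEXT PRIMARY KEY, x0 REAL, y0 REAL, x1 REAL, y1 REAL);
    -- Trip 1 is in A only, trip 2 is in the overlap of A and B, trip 3 is outside.
    INSERT INTO trips (id, x, y) VALUES (1, 1.0, 1.0), (2, 5.0, 5.0), (3, 20.0, 20.0);
    INSERT INTO geofences VALUES ('A', 0, 0, 6, 6), ('B', 4, 4, 10, 10), ('C', 8, 8, 12, 12);
    """
)

# Stand-in for geography_intersects(trips.request_point, geofences.shape).
INTERSECTS = "t.x BETWEEN g.x0 AND g.x1 AND t.y BETWEEN g.y0 AND g.y1"

# A join emits one row per (trip, geofence) match, so trip 2 is counted twice.
join_count = conn.execute(
    f"SELECT COUNT(*) FROM trips t JOIN geofences g ON {INTERSECTS} "
    "WHERE g.uuid IN ('A', 'B', 'C')"
).fetchone()[0]

# EXISTS asks only whether any geofence matches, so each trip counts once.
exists_count = conn.execute(
    "SELECT COUNT(*) FROM trips t WHERE EXISTS ("
    f"SELECT 1 FROM geofences g WHERE {INTERSECTS} AND g.uuid IN ('A', 'B', 'C'))"
).fetchone()[0]
```

Here the join reports three trips while only two distinct trips fall inside any of the geofences, which is the difference between the second and third queries on the slide.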
Bad SQL queries
● SELECT count(*), request_at FROM trips GROUP BY request_at;
  ○ Time needs to be bucketized! Grouping by milliseconds makes no sense!
● SELECT count(*), fare_total FROM trips GROUP BY fare_total;
  ○ Some numeric values such as fare need to be bucketized (reported as histograms)!
● SELECT sum(fare_total) FROM trips, other_table WHERE trips.fare_total>1.0 AND other_table.foo='BAR';
  ○ Join condition is missing, cartesian product is bad!
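A structured query format makes mistakes like these detectable before any SQL runs. The sketch below is a hypothetical validator, not Apollo's implementation: it reuses the field names from the AQL example (`dimensions`, `timeUnit`, `timeBucketizer`, `joins`, `conditions`), and the two rules it checks are assumptions about what such error detection might cover.

```python
def validate_aql(query):
    """Return a list of human-readable problems found in an AQL-style query dict."""
    errors = []
    for dim in query.get("dimensions", []):
        # A dimension with a timeUnit is a time column and must be bucketized.
        if "timeUnit" in dim and not dim.get("timeBucketizer"):
            errors.append(
                f"dimension {dim.get('sqlExpression')!r} is a time column "
                "without a timeBucketizer"
            )
    for join in query.get("joins", []):
        # A join without conditions would produce a cartesian product.
        if not join.get("conditions"):
            errors.append(
                f"join on table {join.get('table')!r} has no conditions"
            )
    return errors


good_query = {
    "table": "trips",
    "joins": [{"alias": "g", "table": "geofences",
               "conditions": ["geography_intersects(request_point, g.shape)"]}],
    "dimensions": [{"sqlExpression": "request_at",
                    "timeBucketizer": "day", "timeUnit": "millisecond"}],
}
bad_query = {
    "table": "trips",
    "joins": [{"table": "other_table", "conditions": []}],
    "dimensions": [{"sqlExpression": "request_at", "timeUnit": "millisecond"}],
}

good_errors = validate_aql(good_query)
bad_errors = validate_aql(bad_query)
```

The same checks are hard to apply to free-form SQL, where the intent behind a `GROUP BY request_at` or a comma join has to be inferred from the text.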
AQL Query Optimization
Date/time function performance issue
● CONCAT(DATE_FORMAT(FROM_UNIXTIME((__d0__) / 1000), '%Y-%m-%d '), LPAD(3 * FLOOR(HOUR(FROM_UNIXTIME((__d0__) / 1000)) / 3), 2, '0'), ':00')
● Run for every row (trip)!
Two-stage aggregation
Single stage: group by date/time function bucketization of request_at; count(*)
Stage 1: group by t - t % 15m on request_at; count(*) as c
Stage 2: group by date/time function bucketization of the stage 1 buckets; sum(c)
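The two-stage idea can be sketched as follows: stage 1 snaps each timestamp to a 15-minute boundary with cheap integer arithmetic and counts, and stage 2 applies the expensive date/time function once per stage 1 bucket instead of once per row. `expensive_bucket` is a stand-in that formats a UTC 3-hour bucket, similar in spirit to the SQL expression on the previous slide; all names here are illustrative.

```python
from collections import Counter
from datetime import datetime, timezone

FIFTEEN_MIN_MS = 15 * 60 * 1000


def expensive_bucket(ts_ms):
    """Stand-in for the costly date/time function: a UTC 3-hour bucket label."""
    dt = datetime.fromtimestamp(ts_ms / 1000, tz=timezone.utc)
    return f"{dt:%Y-%m-%d} {3 * (dt.hour // 3):02d}:00"


def single_stage(timestamps):
    """Apply the expensive function to every row."""
    return Counter(expensive_bucket(t) for t in timestamps)


def two_stage(timestamps):
    """Pre-aggregate with cheap modulo snapping, then bucketize the few results."""
    stage1 = Counter(t - t % FIFTEEN_MIN_MS for t in timestamps)  # count(*) as c
    stage2 = Counter()
    for snapped, c in stage1.items():
        stage2[expensive_bucket(snapped)] += c  # sum(c)
    return stage2


base = int(datetime(2016, 3, 22, 13, 17, tzinfo=timezone.utc).timestamp() * 1000)
# 1,000 hypothetical trips spread over roughly 11 hours, one every 40 seconds.
timestamps = [base + i * 40_000 for i in range(1000)]
result = two_stage(timestamps)
```

The two results agree because every 15-minute boundary falls inside exactly one 3-hour bucket, so snapping first never moves a row across a bucket edge. This sketch only covers UTC buckets; zones whose offsets are not a multiple of 15 minutes would need a finer stage 1 granularity.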
Time Series Bucket Splitting
Now: 2016-03-22 13:17
From: this week To: now
Split / Rollup
2016-03-21 (partial week), split into:
  2016-03-21 (day)
  2016-03-22 00:00 (hour)
  2016-03-22 01:00 (hour)
  ... (hour)
  2016-03-22 12:00 (hour)
  2016-03-22 13:00 (15m)
  2016-03-22 13:15 (15m), split into:
    2016-03-22 13:15 (minute)
    2016-03-22 13:16 (minute)
Time Series Bucket Splitting
Now: 2016-03-22 13:17
From: -20d To: -12h
BucketSize: week
Split / Rollup
2016-03-02 (partial week), split into:
  2016-03-02 (day)
  2016-03-03 (day)
  ... (day)
  2016-03-06 (day)
2016-03-07 (week)
2016-03-14 (week)
2016-03-21 (partial week), split into:
  2016-03-21 (day)
  2016-03-22 00:00 (hour)
  2016-03-22 01:00 (hour)
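A greedy version of this splitting is sketched below: at each step it takes the largest bucket that is aligned at the current position and still fits before the end of the range. The function name, the granularity list, and the Monday-aligned weeks in naive UTC are assumptions for illustration; the slides do not show Apollo's actual algorithm.

```python
from datetime import datetime, timedelta

# Largest first, so the greedy loop prefers coarse (more reusable) buckets.
GRANULARITIES = [
    ("week", timedelta(weeks=1)),
    ("day", timedelta(days=1)),
    ("hour", timedelta(hours=1)),
    ("15m", timedelta(minutes=15)),
    ("minute", timedelta(minutes=1)),
]
# 1970-01-05 was a Monday; measuring from it aligns weeks to Monday 00:00.
MONDAY_EPOCH = datetime(1970, 1, 5)


def split_buckets(start, end):
    """Split [start, end) into aligned buckets, largest granularity first."""
    buckets = []
    cursor = start
    while cursor < end:
        for name, size in GRANULARITIES:
            aligned = (cursor - MONDAY_EPOCH) % size == timedelta(0)
            if aligned and cursor + size <= end:
                buckets.append((cursor, name))
                cursor += size
                break
        else:
            raise ValueError("range must start and end on whole minutes")
    return buckets


# "From: this week, To: now" with now = 2016-03-22 13:17, as on the slide.
this_week = split_buckets(datetime(2016, 3, 21), datetime(2016, 3, 22, 13, 17))
```

For that range the sketch yields one day bucket, thirteen hour buckets, one 15-minute bucket, and two minute buckets, matching the first splitting slide. Complete buckets do not change once their time has passed, which is what makes partial results cacheable and recomposable.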
AQL Query Optimization
Aggregate rollups
avg(x) = sum(x) / count(*)
Original function    Stage 1     Stage 2 (rollup)
count                count       sum
sum                  sum         sum
min                  min         min
max                  max         max
count distinct       distinct    count distinct
HyperLogLog
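The rollup table can be read as a merge rule for partial aggregates. The sketch below keeps count, sum, min, and max per bucket in stage 1 and combines them in stage 2; `Partial`, `stage1`, and `rollup` are illustrative names, and count distinct is left out because it needs the distinct values themselves or an approximate structure.

```python
from dataclasses import dataclass


@dataclass
class Partial:
    """Stage 1 result for one bucket."""
    count: int
    total: float
    minimum: float
    maximum: float


def stage1(values):
    """Aggregate the raw rows of a single bucket."""
    return Partial(len(values), sum(values), min(values), max(values))


def rollup(partials):
    """Stage 2: counts and sums are summed, min and max are re-applied."""
    return Partial(
        count=sum(p.count for p in partials),
        total=sum(p.total for p in partials),
        minimum=min(p.minimum for p in partials),
        maximum=max(p.maximum for p in partials),
    )


def avg(partial):
    """avg(x) = sum(x) / count(*), derived only after the rollup."""
    return partial.total / partial.count


# Two hypothetical buckets of fares with very different sizes.
buckets = [stage1([1.0, 2.0, 3.0]), stage1([10.0])]
combined = rollup(buckets)
overall_avg = avg(combined)
avg_of_avgs = sum(avg(b) for b in buckets) / len(buckets)
```

Here the correct average is 4.0, while averaging the per-bucket averages gives 6.0. That is why avg is stored as a sum and a count rather than rolled up directly.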
Contracts
SELECT AVG(fare), ts_15m FROM trips WHERE time >= (now() - 1h) (where city=x) group by 15m(, city);
Query                               Metric   1h      24h     21d (group by 24h)
(where city=x)                      p95      50ms    60ms    70ms
For x in cities: (where city=x)     sum      ~9s     ~10s    ~12s
group by city                       p95      200ms   ~1s     ~7s
Contracts
SELECT COUNT(1), AVG(fare), SUM(fare), AVG(eta) FROM trips WHERE ...
SELECT COUNT(1), AVG(fare), SUM(fare), SUM(eta) FROM trips WHERE ...
Contracts
SELECT COUNT(1) FROM trips WHERE / GROUP BY
  City = 'San Francisco'
  State = 'completed'
  Product = 'Uber-X'
(City,State,Product), (City,State), (City,Product), (City), (State), (State,Product), (Product), (∅)
Geographical Breakdowns:
World > North America > United States > US West > California > BayArea > SF
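The list of dimension combinations on this slide is the power set of the three dimensions. A short sketch, with a hypothetical helper name, shows how the number of combinations grows.

```python
from itertools import combinations


def dimension_subsets(dimensions):
    """Enumerate every subset of the dimensions, largest subsets first."""
    return [
        combo
        for size in range(len(dimensions), -1, -1)
        for combo in combinations(dimensions, size)
    ]


subsets = dimension_subsets(["City", "State", "Product"])
```

Three dimensions give 2^3 = 8 combinations, including the empty set for the unfiltered total. Each additional dimension doubles the count, and each level of the geographical hierarchy adds further variants on top of that.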
Stats
● p80 <= 10ms
● p90 <= 50ms
● p95 <= 100ms
● p99 <= 1000ms
● p99.5 <= 5000ms
● Millions of queries/day
● ~250k distinct queries
● Billions of MySQL writes/day
Future Plans (next 3-6 months)
● Product
  ○ Self-service onboarding and schema management
  ○ Schema change management and automation
● Technology
  ○ Cost Accounting
  ○ Contract automation
  ○ Query cost estimation
Challenges and Learnings
Schema Challenges
● Many Schemas:
  ○ Ingestion transformations
    ■ Hive
    ■ Avro-encoded Kafka
  ○ MemSQL Schema
  ○ Query layer schema
Ingestion
Performance differences for largest job
Metric         Spark    Golang
Containers     32       4
CPU Cores      160      8
Memory (GB)    226      16
Throughput     36k/s    60k/s
Questions?
(PS: We’re hiring)
Uber Engineering Blog: eng.uber.com
Uber Open Source: uber.github.io
Uber Eng Twitter: twitter.com/ubereng
These slides: https://tinyurl.com/apollostrata, msql.co/uberscale
Check out ‘Hoodie: Incremental processing on Hadoop at Uber’ Thursday 1:50-2:30 for the next Uber Strata presentation.