Driving towards the intersection of capacity and demand with dynamic Presto scaling Puneet Jaiswal Software Engineer, Data Infra - Interactive Querying 06.20.2019

Driving towards the intersection of capacity and demand with dynamic Presto scaling

Puneet Jaiswal

Software Engineer, Data Infra - Interactive Querying

06.20.2019

Mission

Improve people’s lives with the world’s best transportation

Agenda

● Presto Infra @ Lyft

● Gateway

● Schedule based Scaling

● Further Perf improvement

● GSheets Connector

● Future work

● Questions

Presto Infra @ Lyft

● Prestosql version 309

● 40 PB queryable event data

● 100K (peaks) daily queries (1.5M monthly)

● 950 DAUs

● 240 - 500 worker nodes total

○ 55 TB total available memory at peak time

○ 24K vCPUs

○ Worker node type - m5.12xlarge - 48 vCPUs / 192 GB memory

● Schedule based scaling

Presto infra stack

Clients

Presto Gateway

Presto - load balancer

Presto 1

Presto 2

Multiple Presto clusters

Presto-Gateway

proxy/gateway/load balancer for presto

https://github.com/lyft/presto-gateway

Problems

● Single Presto Coordinator

● Scale down was not easy - worker reduction affected running queries

● Upgrade requires downtime

● Single cluster vs multi cluster

● Clients (Tableau / Mode / Looker etc.) that use a single connection

○ They do not pass the session user with each query, which gives poor resource / queue isolation.

Presto Gateway

● Transparent API layer to access presto clusters without changing the protocol.

● Separate proxy end-point to access each presto cluster.

● API to activate / deactivate presto clusters

● Monitoring and alerting (Email / PagerDuty)

● Fast access query UI to trace queries

● Recovery speed - it is easier to block a bad cluster than to fix it on the spot.

Query Routing

Currently round robin routing for all queries

Future:

Source / use case / available resources based routing

We can add simple rules to route queries selectively.
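A minimal sketch of round robin routing over the active clusters, to make the idea concrete. This is not the presto-gateway implementation; the class name, method names, and cluster names are hypothetical.

```python
from itertools import count


class RoundRobinRouter:
    """Hypothetical sketch: rotates queries across active backend clusters."""

    def __init__(self, backends):
        self._backends = list(backends)     # fixed order of known clusters
        self._active = set(self._backends)  # clusters currently taking traffic
        self._counter = count()             # increases by one per routed query

    def deactivate(self, name):
        # Stop sending new queries to this cluster.
        self._active.discard(name)

    def activate(self, name):
        # Only known clusters can be put back into rotation.
        if name in self._backends:
            self._active.add(name)

    def route(self):
        candidates = [b for b in self._backends if b in self._active]
        if not candidates:
            raise RuntimeError("no active Presto cluster")
        return candidates[next(self._counter) % len(candidates)]


router = RoundRobinRouter(["presto-1", "presto-2", "presto-3"])
first_three = [router.route() for _ in range(3)]
router.deactivate("presto-2")
after_block = [router.route() for _ in range(4)]
```

A rule-based router of the kind listed under "Future" could replace `route()` with a lookup on query source or use case and fall back to this rotation.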

Presto gateway UI

Shows last N queries

Links to access query details page in native presto cluster

Shows active available clusters

Prestoadm tool

CLI tool for easy activate / deactivate operations

Schedule based Scaling

Steady growth in users & queries - scaling required

● Presto gateway as load balancer

● Presto cluster is unit of scaling

● Gateway APIs to activate/deactivate backend clusters

● Scaling is triggered based on schedule
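The points above can be sketched as a small planner that maps the current time to a number of active clusters. The schedule values, function names, and cluster names are illustrative assumptions, not Lyft's actual schedule; a real job would pass the resulting plan to the gateway's activate / deactivate API.

```python
from datetime import datetime

# Hypothetical weekday schedule: (start_hour, end_hour, active_clusters).
WEEKDAY_SCHEDULE = [(0, 7, 1), (7, 20, 2), (20, 24, 1)]
WEEKEND_CLUSTERS = 1  # assumed: minimal capacity on weekends


def target_active_clusters(now: datetime) -> int:
    """Return how many clusters should be active at the given time."""
    if now.weekday() >= 5:  # Saturday or Sunday
        return WEEKEND_CLUSTERS
    for start, end, clusters in WEEKDAY_SCHEDULE:
        if start <= now.hour < end:
            return clusters
    raise ValueError(f"schedule does not cover hour {now.hour}")


def plan(all_clusters, now):
    """Split clusters into those to activate and those to deactivate."""
    n = target_active_clusters(now)
    return {"activate": all_clusters[:n], "deactivate": all_clusters[n:]}
```

Because the cluster is the unit of scaling, deactivating one only stops new queries from being routed to it; queries already running there can finish before its workers are removed.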

Query volume pattern

Granularity - minute

Raw data scanned rate (GB/s)

Peaks at 300 GB/s

Scheduled scaling - nodes vs query volume

Cutting 50% of the infra during non-work hours resulted in a 30% reduction in cost.
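A rough check of how the 50% and 30% figures relate, assuming cost is proportional to node-hours (a simplification that the slides do not state). Under that assumption, a 50% cut yields a 30% saving when the cut applies to about 60% of all hours.

```python
HOURS_PER_WEEK = 168


def cost_reduction(infra_cut, off_hours_fraction):
    """Fraction of cost saved if cost scales linearly with node-hours (assumed)."""
    return infra_cut * off_hours_fraction


def off_hours_fraction_for(infra_cut, reduction):
    """Fraction of hours that must run at reduced capacity to reach the saving."""
    return reduction / infra_cut


fraction = off_hours_fraction_for(0.5, 0.3)    # share of hours at half capacity
reduced_hours = fraction * HOURS_PER_WEEK      # same share expressed per week
```

Here `fraction` comes out at about 0.6, or roughly 100 of the 168 hours in a week, which is in the range one would expect for nights plus weekends.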

Perf improvements

Performance boost - timeline

The longer the cluster ran, the slower it became.

P75 - query wall time (seconds)

Weekly Rolling P75

This graph visualizes trends in query performance over a 7-day rolling window.

To avoid long GC pauses, we started recycling nodes every day

Java 11 upgrade

To tackle increased load we added more clusters
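One way the weekly rolling P75 on this slide could be computed, as a sketch. It assumes the wall times of all queries in each 7-day window are pooled before taking the percentile; the slides do not say how the metric was actually built, and the function names are hypothetical.

```python
from statistics import quantiles


def p75(values):
    # quantiles(n=4) returns the three quartile cut points; the last is P75.
    return quantiles(values, n=4, method="inclusive")[2]


def weekly_rolling_p75(daily_wall_times, window=7):
    """daily_wall_times: one list of query wall times (seconds) per day."""
    result = []
    for end in range(window, len(daily_wall_times) + 1):
        # Pool every query from the last `window` days into one sample.
        pooled = [t for day in daily_wall_times[end - window:end] for t in day]
        result.append(p75(pooled))
    return result


# Seven ordinary days followed by one slow day (made-up numbers).
days = [[1, 2, 3, 4, 5]] * 7 + [[10, 20, 30, 40, 50]]
rolling = weekly_rolling_p75(days)
```

Pooling over a week smooths out daily noise, so a gradual slowdown of a long-running cluster shows up as a slow upward drift rather than a spike.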

P99 total query time (exec + queued) (seconds)

Daily query volume

Google Sheets plugin

Why? Business data is usually maintained in sheets; with the plugin it can be joined with Hive tables.

v 0.1

● All columns are varchar type

● First line in sheet is treated as column name

● Sheet to table name mapping stored in a metadata sheet

● Presto connects to gsheets api using service account credentials

● The service account user needs view access to all the sheets
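The v0.1 conventions above (first row as column names, every column varchar) can be illustrated with a small sketch. It works on rows already fetched from a sheet and makes no Google Sheets API calls; the function name and the padding of short rows with NULLs are assumptions, not the plugin's documented behavior.

```python
def sheet_to_table(rows):
    """Turn raw sheet rows into (columns, records) under the v0.1 rules."""
    if not rows:
        return [], []
    # The first row supplies the column names.
    header = [str(name).strip() for name in rows[0]]
    # Every column is typed as varchar, regardless of cell content.
    columns = [(name, "varchar") for name in header]
    records = []
    for raw in rows[1:]:
        # Assumed: short rows are padded with None, extra cells are dropped.
        padded = list(raw) + [None] * (len(header) - len(raw))
        records.append(
            tuple(None if v is None else str(v) for v in padded[: len(header)])
        )
    return columns, records


columns, records = sheet_to_table([["city", "rides"], ["SF", 120], ["NYC"]])
```

Since every value arrives as varchar, a query that joins sheet data with typed Hive tables would need explicit casts, which is why automatic column type detection appears under future work.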

Table-to-sheet mapping (metadata sheet)

Data Table

Querying

Apache Superset

Sends EXPLAIN (TYPE VALIDATE) queries to deeply validate columns, UDFs, tables, etc.

Pre query validation & SQL IDE experience
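A sketch of how a client could build the validation statement before running a query. `EXPLAIN (TYPE VALIDATE)` is Presto syntax; the helper name and the trailing-semicolon handling are assumptions and not taken from Superset's code.

```python
def validation_statement(sql: str) -> str:
    """Wrap a query so Presto validates it without executing it."""
    # Drop surrounding whitespace and a trailing semicolon, if any.
    body = sql.strip().rstrip(";").strip()
    if not body:
        raise ValueError("empty query")
    return f"EXPLAIN (TYPE VALIDATE) {body}"


stmt = validation_statement("SELECT city, count(*) FROM rides GROUP BY city;\n")
```

The client would send `stmt` to Presto first and surface any reported error in the editor, then run the original query only once validation passes.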

Future work

● GSheets connector plugin

○ opensource

○ auto column type detection

○ easy sheet onboarding.

● Apache Superset

○ Showing query cost as user runs the presto query

● Scheduled vs adhoc query routing

● Gateway - HA

Questions?

Credits

Data Infra - Interactive Querying

Thank you, Presto dev community!