Presto Summit NYC 2019 - Starburst Data… · Slack’s internal analytics portal - Product...

Presto Summit NYC 2019 December 11, 2019 Slack handles: @cheolsoo; @abhonsule slack-corp.com

Mission

Make people’s working lives simpler, more pleasant and more productive.

Slack

215B Logs
+270M Daily Messages
700B Daily Records
250B Messages Table

Data Engineering at Slack

Custodian of all data generated within Slack, the product. We provide the infrastructure and tooling necessary for stakeholders to reliably access product data for user-facing features, product and business insights.

Databooks

AB Testing framework

BI portal

Presto

Airflow

Analytics.ts

Sqooper

Slack’s AB testing/Experiments framework

Tool used by Analysts, Data scientists, Marketing, Sales, Finance

BI tool used by Corp/Biztech

Batch ingestion system

Slack’s internal analytics portal - Product Managers, Engineers, Analysts, Data scientists, Sales, Marketing, Finance

DAGs running on ETL scheduling system

Presto at Slack

clog queries

Query client logs

Presto at Slack

Past: Presto on EMR, single cluster

Present: Starburst on EC2, multiple clusters

Future: Federated clusters

Query success rate

Query count

Multiple clusters

● Static load balancing

● Per cluster config properties

● Per cluster capacity planning
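
Static load balancing can be pictured as a fixed lookup from query source to cluster. The sketch below is only an illustration of that idea: the source names, cluster names, and the `route` helper are all hypothetical and do not come from the talk.

```python
# Hypothetical static routing table: each query source is pinned to one cluster.
# None of these names appear in the slides; they are placeholders.
STATIC_ROUTES = {
    "airflow": "presto-etl",
    "bi-portal": "presto-bi",
    "adhoc": "presto-adhoc",
}
DEFAULT_CLUSTER = "presto-adhoc"


def route(source: str) -> str:
    """Return the cluster a query source is statically assigned to."""
    return STATIC_ROUTES.get(source, DEFAULT_CLUSTER)
```

Because the mapping never changes at runtime, each cluster can be sized and configured for the one workload it serves, which is what makes per cluster config and capacity planning practical.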

Shadow clusters

● Read-only shadow cluster in parallel

● Useful for testing config changes or version upgrades

Terraform module

● Provision a cluster with 25 lines of code

● ASG optionally with spot

● Dedicated HMS per cluster

Resource groups

● Per cluster resource groups config

● Per group scheduling policies config

● Fair (ad-hoc) vs weighted_fair (etl)
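
A minimal sketch of what such a config might look like, built in Python and serialized to the JSON layout used by Presto's file-based resource group manager. The field names (`rootGroups`, `schedulingPolicy`, `schedulingWeight`, `selectors`, and so on) follow that format; the group names, limits, weights, and source patterns are assumptions chosen for illustration, not Slack's actual settings.

```python
import json


def build_resource_groups() -> dict:
    """Build an illustrative resource groups config (all values are assumptions)."""
    return {
        "rootGroups": [
            {
                # Ad-hoc queries: plain fair scheduling.
                "name": "adhoc",
                "softMemoryLimit": "40%",
                "hardConcurrencyLimit": 20,
                "maxQueued": 200,
                "schedulingPolicy": "fair",
            },
            {
                # ETL queries: sub-groups are picked in proportion to their weight.
                "name": "etl",
                "softMemoryLimit": "60%",
                "hardConcurrencyLimit": 40,
                "maxQueued": 1000,
                "schedulingPolicy": "weighted_fair",
                "subGroups": [
                    {
                        "name": "critical",
                        "softMemoryLimit": "70%",
                        "hardConcurrencyLimit": 30,
                        "maxQueued": 500,
                        "schedulingWeight": 3,
                    },
                    {
                        "name": "backfill",
                        "softMemoryLimit": "30%",
                        "hardConcurrencyLimit": 10,
                        "maxQueued": 500,
                        "schedulingWeight": 1,
                    },
                ],
            },
        ],
        # Selectors are evaluated in order; the last one is a catch-all.
        "selectors": [
            {"source": "airflow-critical.*", "group": "etl.critical"},
            {"source": "airflow.*", "group": "etl.backfill"},
            {"group": "adhoc"},
        ],
    }


if __name__ == "__main__":
    print(json.dumps(build_resource_groups(), indent=2))
```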

JMX exporter

JVM

-javaagent:/usr/local/jmx_exporter/jmx_exporter.jar=7071:/usr/local/jmx_exporter/exporter.yml

Prometheus

self.consul_job(
    'presto',
    datacenters=[env + '-us-east-1-dw1'],
    services=['presto']
)
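
To see what Prometheus would scrape from the exporter on port 7071, the text exposition output can be fetched and parsed by hand. The following is a rough sketch, assuming the exporter serves `/metrics` and emits no timestamps; `fetch_metrics`, `parse_metrics`, and the sample metric lines (in particular `presto_running_queries`) are hypothetical and only illustrate the format.

```python
import urllib.request

# Illustrative sample of the Prometheus text exposition format (not real output).
SAMPLE = """\
# HELP jvm_memory_bytes_used Used bytes of a given JVM memory area.
# TYPE jvm_memory_bytes_used gauge
jvm_memory_bytes_used{area="heap"} 1.5E9
presto_running_queries 12.0
"""


def parse_metrics(text: str) -> dict:
    """Map each series (name plus labels) to its float value.

    Assumes every sample line ends with the value and carries no timestamp.
    """
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        series, value = line.rsplit(None, 1)
        samples[series] = float(value)
    return samples


def fetch_metrics(host: str = "localhost", port: int = 7071) -> dict:
    """Fetch and parse metrics from a running exporter (requires network access)."""
    with urllib.request.urlopen(f"http://{host}:{port}/metrics") as resp:
        return parse_metrics(resp.read().decode("utf-8"))
```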

Grafana dashboard

Autoscaling

Graceful decommission

curl -XPUT localhost:8889/v1/info/state -d '"SHUTTING_DOWN"' -H "Content-type: application/json"

Chef role

"auto_scaling_group": {
    "prepare_for_termination_cmd": "<cmd>"
}
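
The same graceful decommission call can be sketched in Python with only the standard library. This assumes the worker listens on port 8889 as in the slide and that the endpoint expects the state as a JSON-encoded string; `build_shutdown_request` and `decommission` are hypothetical helper names.

```python
import json
import urllib.request


def build_shutdown_request(host: str = "localhost", port: int = 8889) -> urllib.request.Request:
    """Build the PUT request that asks a Presto worker to shut down gracefully."""
    return urllib.request.Request(
        url=f"http://{host}:{port}/v1/info/state",
        # json.dumps adds the surrounding quotes, so the body is "SHUTTING_DOWN".
        data=json.dumps("SHUTTING_DOWN").encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def decommission(host: str = "localhost", port: int = 8889) -> int:
    """Send the request and return the HTTP status code (needs a live worker)."""
    with urllib.request.urlopen(build_shutdown_request(host, port)) as resp:
        return resp.status
```

A script like this could serve as the `<cmd>` run before an instance is terminated, so the worker stops accepting new tasks and finishes its running ones first.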

Federated clusters

● Dynamic load balancing

● High availability

● Minimize the impact of rogue queries
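
One way to picture dynamic load balancing across federated clusters is a router that skips unhealthy clusters and sends each query to the least loaded one. The slides do not describe the actual routing logic, so the sketch below is purely an assumption: `ClusterStatus`, its fields, and `pick_cluster` are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class ClusterStatus:
    """Hypothetical snapshot of one cluster's load, e.g. taken from its metrics."""
    name: str
    healthy: bool
    running_queries: int
    queued_queries: int


def pick_cluster(clusters: Iterable[ClusterStatus]) -> Optional[ClusterStatus]:
    """Pick the healthy cluster with the fewest queued, then fewest running, queries."""
    candidates = [c for c in clusters if c.healthy]
    if not candidates:
        return None  # no healthy cluster available
    return min(candidates, key=lambda c: (c.queued_queries, c.running_queries))
```

Skipping unhealthy clusters gives the high availability property, and preferring short queues steers new queries away from a cluster that a rogue query has saturated.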

Q&A

Slack handles: @cheolsoo; @abhonsule

slack-corp.com