Lessons from building large-scale, multi-cloud, SaaS software at Databricks

Jeff Pang, Principal Software Engineer @ Databricks


Who am I?

▪ Jeff Pang, Principal Software Engineer, Databricks

▪ Databricks Platform Engineering: To help data teams solve the world’s toughest problems, the Databricks Platform team provides the world-class, multi-cloud platform that enables us to expand fast and iterate quickly

http://databricks.com/careers

About

▪ Founded in 2013 by the original creators of Apache Spark

▪ Data and AI platform as a service for 5000+ customers

▪ 1000+ employees, 200+ engineers, >$200M annual recurring revenue

Our product

Data scientists Data engineers Business users

Agenda

The architecture
▪ Inside the Unified Analytics Platform

Challenges & lessons
▪ Growing a SaaS data platform
▪ Operating on multiple clouds
▪ Accelerating a data platform with data & AI

The architecture
Inside the Unified Analytics Platform

Simple data engineering architecture

[Diagram: CSV, JSON, TXT… → Data Lake (S3, HDFS, Blob Store, etc.) → Bronze (raw ingestion) → Silver (filtered, cleaned, augmented) → Gold (business-level aggregates) → reporting and analytics, running on one cluster]
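The bronze, silver, and gold stages on this slide can be illustrated with a minimal sketch in plain Python (not Spark). The record fields, the cleaning rules, and the function names are illustrative assumptions and do not come from the talk.

```python
from collections import defaultdict

# Hypothetical raw records, as they might land in the data lake.
RAW_EVENTS = [
    {"user": "alice", "amount": "10.5", "country": "us"},
    {"user": "bob", "amount": "n/a", "country": "DE"},   # malformed amount
    {"user": "alice", "amount": "4.5", "country": "US"},
    {"user": "", "amount": "3.0", "country": "FR"},      # missing user
]


def to_bronze(raw):
    """Bronze: keep every raw record, only tagging it with an ingestion index."""
    return [dict(record, ingest_id=i) for i, record in enumerate(raw)]


def to_silver(bronze):
    """Silver: drop unusable rows, parse types, and normalize values."""
    silver = []
    for row in bronze:
        if not row["user"]:
            continue  # filter rows without a user
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # filter rows whose amount cannot be parsed
        silver.append(
            {"user": row["user"], "amount": amount, "country": row["country"].upper()}
        )
    return silver


def to_gold(silver):
    """Gold: business-level aggregate, here total amount per country."""
    totals = defaultdict(float)
    for row in silver:
        totals[row["country"]] += row["amount"]
    return dict(totals)


def run_pipeline(raw):
    """Chain the three stages in order."""
    return to_gold(to_silver(to_bronze(raw)))
```

Keeping the bronze stage lossless means the silver and gold stages can be recomputed later if the cleaning rules change.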

Modern data engineering architecture

[Diagram: CSV, JSON, TXT… and Kinesis streams → Data Lake → Bronze → Silver → Gold → reporting, notebooks, AI, streaming, and analytics; supported by workflow scheduling and cluster management across multiple clusters]

Multiply by thousands of customers...

[Diagram: one control plane (collaborative notebooks, AI, streaming, analytics, workflow scheduling, cluster management, admin & security, reporting, business insights) serving many customer networks, each with its own data lake, file sources (CSV, JSON, TXT…), and Kinesis streams]

...across many regions...

...on multiple clouds...

→ millions of VMs managed per day

That’s the Databricks control plane

What did we learn from building a large-scale, multi-cloud data platform?

▪ 100,000s of users
▪ 100,000s of Spark clusters per day
▪ Millions of VMs launched per day
▪ Exabytes of data processed per day

Growing a SaaS data platform

Evolution of the Databricks control plane

We didn’t start with a global-scale, multi-cloud data platform

Challenge: Scaling a data platform from one customer to 5000+

Lesson: The factory that builds and evolves the data platform is more important than the data platform itself

Fast time to market

Databricks control plane “in-a-box”
▪ Need to deliver value quickly
▪ Need to iterate quickly
▪ Can’t break things while iterating!

Keys to success:
▪ Modern CI
▪ Fast developer tools
▪ Testing, testing, testing

V1 V2

▪ 25-500x Scala build speedups
▪ 10s of millions of tests per day
▪ 100s of Databricks “in-a-box” test envs per day

Expand the total addressable market

Replicating control planes quickly
▪ Need different configurations for different environments
▪ Need to update many environments
▪ Can’t slow down platform development!

Keys to success:
▪ Declarative infrastructure
▪ Modern CD infrastructure

jsonnet
▪ 10 million lines
▪ 250k lines
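The idea of declarative infrastructure with different configurations per environment can be sketched as a shared base definition plus small per-environment overrides. The slide names jsonnet as the tool; the Python below only imitates that style, and the service fields, environment names, and helper functions are all hypothetical.

```python
import copy

# Hypothetical base definition shared by every environment.
BASE_SERVICE = {
    "name": "api-server",
    "replicas": 2,
    "resources": {"cpu": "1", "memory": "2Gi"},
    "env": {"LOG_LEVEL": "info"},
}

# Hypothetical per-environment overrides: only the differences are written down.
ENVIRONMENT_OVERRIDES = {
    "dev": {"replicas": 1, "env": {"LOG_LEVEL": "debug"}},
    "prod-aws-us-west": {"replicas": 10, "resources": {"memory": "8Gi"}},
}


def deep_merge(base, override):
    """Return a new dict where `override` is applied on top of `base`, recursively."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


def render_all(base, overrides):
    """Expand the compact description into one full config per environment."""
    return {env: deep_merge(base, patch) for env, patch in overrides.items()}
```

Calling `render_all(BASE_SERVICE, ENVIRONMENT_OVERRIDES)` yields a complete config for each environment, so adding a new environment means adding a few override lines rather than copying a full definition.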

Land and expand workloads

Scaling the control plane
▪ Need to support more users & workloads
▪ Need to build more features that scale
▪ Don’t want devs to reinvent the wheel!

Keys to success:
▪ A service framework to do the hard stuff
▪ Decompose monoliths to microservices

Service Framework: container & replica management, APIs & RPCs, rate limits, metrics, logging, secrets & security, ...

[Diagram: cluster manager evolution as usage grows. Version 1: a single Cluster Manager between the Cloud VM API and customer clusters. An intermediate version: CM Master with Workers and an API Server. Version 3: CM Master, CM Shards, and multiple API Servers]
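The diagram shows the cluster manager split into shards, but the talk does not say how clusters are assigned to them. As one possible illustration, the sketch below routes each cluster ID to a shard with a stable hash. The hashing scheme and the function names are assumptions, not the Databricks design.

```python
import hashlib


def shard_for(cluster_id: str, num_shards: int) -> int:
    """Map a cluster ID to a shard index using a hash that is stable across processes."""
    digest = hashlib.sha256(cluster_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def assign(cluster_ids, num_shards):
    """Group cluster IDs by the shard that would own them."""
    shards = {index: [] for index in range(num_shards)}
    for cluster_id in cluster_ids:
        shards[shard_for(cluster_id, num_shards)].append(cluster_id)
    return shards
```

A plain modulo scheme like this reshuffles most clusters when the shard count changes; a production system would more likely use consistent hashing or an explicit assignment table.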

The Databricks data platform factory

[Diagram: customer networks are served by the data platform services (CM Master, CM Shards, Workers, API Servers), which are built on a shared stack: Envoy, GraphQL; HCVault, Consul, Prometheus, ELK, Jaeger, Grafana, common IAM, onboarding, billing, ...; Kubernetes; cloud VMs, network, storage, databases]

Operating on multiple clouds

Why multi-cloud?

The data platform needs to be where the data is
▪ Performance, latency, egress data costs
▪ Cloud-specific integrations
▪ Data governance policies

Challenge: Supporting multiple clouds without sacrificing dev velocity

Lesson: A cloud-agnostic layer is key to dev velocity, but it also needs to integrate with the standards of each cloud and deal with their quirks

Challenge: dev velocity on multiple clouds

Services? Many cloud services have no direct equivalents
▪ DynamoDB vs ?
▪ CosmosDB vs ?
▪ Aurora vs ?
▪ SQL DW vs ?

APIs? Cloud APIs don’t look like each other
▪ SDK: no common interfaces
▪ Auth: IAM vs AAD
▪ ACLs: IAM vs Azure RBAC

Ops? Operational tools for each cloud are very different
▪ Templates: CloudFormation vs ARM templates
▪ Logs: CloudWatch vs Azure Monitor

Approach: cloud agnostic dev framework

Use lowest common denominator cloud services
▪ EKS ← Kubernetes → AKS
▪ EC2 ≈ Azure Compute
▪ VPC ≈ VNet
▪ RDS MySQL/Postgres ≈ Azure Database for MySQL/Postgres
▪ ELB ≈ Azure Load Balancer

[Diagram: services (CM Master, CM Shards, Workers, API Servers) are written against the service framework API, which sits on Envoy; HCVault, Consul, Prometheus, ELK, Jaeger, Grafana, common IAM, onboarding, billing, ...; and Kubernetes]
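A cloud-agnostic framework of this kind usually means service code depends on one interface while each cloud supplies its own implementation. The sketch below shows that shape with in-memory fakes. The `VmProvider` interface, the class names, and `resize_cluster` are hypothetical, and no real AWS or Azure SDK calls are made.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class VmProvider(ABC):
    """Hypothetical cloud-agnostic interface that service code depends on."""

    @abstractmethod
    def launch(self, count: int) -> List[str]:
        """Start `count` VMs and return their provider-specific IDs."""

    @abstractmethod
    def terminate(self, vm_ids: List[str]) -> None:
        """Stop the given VMs."""


class InMemoryProvider(VmProvider):
    """Fake backend; a real implementation would call the cloud SDK here."""

    id_prefix = "vm"

    def __init__(self) -> None:
        self.running: Dict[str, bool] = {}
        self._next = 0

    def launch(self, count: int) -> List[str]:
        ids = []
        for _ in range(count):
            vm_id = f"{self.id_prefix}-{self._next}"
            self._next += 1
            self.running[vm_id] = True
            ids.append(vm_id)
        return ids

    def terminate(self, vm_ids: List[str]) -> None:
        for vm_id in vm_ids:
            self.running.pop(vm_id, None)


class FakeEc2Provider(InMemoryProvider):
    id_prefix = "i"  # stands in for an EC2-backed implementation


class FakeAzureComputeProvider(InMemoryProvider):
    id_prefix = "azvm"  # stands in for an Azure Compute-backed implementation


def resize_cluster(provider: VmProvider, current: List[str], target: int) -> List[str]:
    """Cloud-agnostic logic: grow or shrink a cluster using only the interface."""
    if target > len(current):
        return current + provider.launch(target - len(current))
    provider.terminate(current[target:])
    return current[:target]
```

Because `resize_cluster` only sees `VmProvider`, the same logic runs unchanged against either fake, which is the property the slide attributes to the service framework API.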

Challenge: not everything can be cloud agnostic

▪ Customers want to integrate with the standards of each cloud
▪ “Equivalent” cloud services have implementation quirks

Approach: abstraction layer for key integrations

Key integrations behind the abstraction layer:
▪ Bring your own key encryption: KMS ↔ Azure Key Vault
▪ AuthN / AuthZ / Identity: Okta, OneLogin, etc. and IAM roles ↔ Azure Active Directory
▪ Unified usage service: AWS Marketplace, Custom Billing ↔ Azure Commerce Billing
▪ Databricks file system: S3 and S3 commit service ↔ Azure Storage

Underlying cloud services:
▪ Fargate ← Kubernetes → AKS
▪ EC2 ≈ Azure Compute
▪ VPC ≈ VNet
▪ RDS MySQL/Postgres ≈ Azure Database for MySQL/Postgres
▪ ELB ≈ Azure Load Balancer

[Diagram: services (CM Master, CM Shards, Workers, API Servers) sit on top of both layers]

Approach: harmonize “equivalent” cloud service quirks

Virtual machines: Promise of elastic compute is unevenly distributed
▪ Provisioning speed differs
▪ Deletion speed differs (speed to refill quota)
→ Need to adapt to cloud resource and API limits

Network: TCP connections are hard
▪ “Invisible” NATs have connection & timeout limits
→ Need tuned keep alive, connection limit configs
▪ Kernel TCP SACK bug caused API hangs in one cloud only
→ Need to do deep robustness testing against both clouds (ex: poor NIC reliability)

Databases: When MySQL != MySQL
▪ Host OS matters (ex: case sensitivity defaults)
▪ Default DB params matter (ex: tablespace config → 100x difference in recovery time)
→ Need expertise in DB tuning to ensure equivalence
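One common way to adapt to cloud API limits is to retry throttled calls with capped exponential backoff and jitter. The talk does not describe Databricks' mechanism, so the sketch below is only an illustration: `ThrottledError`, `call_with_backoff`, and the default delays are hypothetical, and in practice the delays would likely be tuned per cloud.

```python
import random
import time


class ThrottledError(Exception):
    """Hypothetical error raised by a cloud client when an API rate limit is hit."""


def call_with_backoff(operation, max_attempts=5, base_delay=0.5, max_delay=30.0,
                      sleep=time.sleep, jitter=random.random):
    """Call `operation`, retrying on ThrottledError with capped exponential backoff.

    `sleep` and `jitter` are injectable so the behavior can be tested without waiting.
    """
    for attempt in range(max_attempts):
        try:
            return operation()
        except ThrottledError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the throttling error to the caller
            delay = min(max_delay, base_delay * (2 ** attempt))
            # Full jitter: wait a random fraction of the capped delay so that
            # many clients do not retry at the same instant.
            sleep(delay * jitter())
```

Only throttling errors are retried here; other exceptions propagate immediately so that real failures are not hidden behind retries.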

Accelerating a data platform with data & AI

Inception: Improving a data platform with data & AI

We are one of our biggest customers

Challenge: Building a data platform is hard without a data platform
▪ Need data to track usage, maintain security
▪ Need data to observe and improve how users use the data platform
▪ Need data to keep the data platform up and running

Lesson: Data & AI can accelerate data platform features, product analytics, and devops

How we use Databricks to accelerate itself

Key platform features
▪ Usage and billing reports
▪ Audit logs

Essential product analytics
▪ Feature usage, trends, prediction
▪ Growth and churn forecast, models

Mission critical devops
▪ Service KPIs and SLAs
▪ API and application structured logs
▪ Spark debug logs
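As a small illustration of deriving a service KPI from structured logs, the sketch below computes availability (the share of requests without a server error) from JSON log lines. The log schema, the sample lines, and the `availability` helper are hypothetical; at the scale described in the talk this would run as a distributed job rather than a local loop.

```python
import json

# Hypothetical structured API log lines, one JSON object per line.
LOG_LINES = [
    '{"service": "api", "status": 200, "latency_ms": 40}',
    '{"service": "api", "status": 500, "latency_ms": 900}',
    '{"service": "api", "status": 404, "latency_ms": 12}',
    '{"service": "jobs", "status": 503, "latency_ms": 30}',
    '{"service": "api", "status": 200, "latency_ms": 55}',
]


def availability(lines, service):
    """Fraction of a service's requests that did not end in a 5xx status.

    Returns None when the service has no requests in the given lines.
    """
    total = 0
    ok = 0
    for line in lines:
        record = json.loads(line)
        if record["service"] != service:
            continue
        total += 1
        if record["status"] < 500:
            ok += 1
    return ok / total if total else None
```

Comparing the result with a target such as 0.999 turns the raw logs into an SLA check; client errors (4xx) are counted as available here, which is itself a policy choice.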

Our distributed data pipelines

▪ Data foundation & analytics
▪ 100s of TB logs per day
▪ Millions of time series per second
▪ Time-series, raw logs, request tracing, dashboards
▪ Kinesis, Event Hubs
▪ Declarative data pipeline deployments
▪ Real-time streaming

Takeaways

The architecture
▪ Managing millions of VMs around the world in multiple clouds

Challenges & lessons
▪ The factory that builds and evolves the data platform is more important than the data platform itself
▪ A cloud-agnostic platform that integrates with cloud standards and quirks is the key to multi-cloud
▪ Data & AI accelerates data platform features, product analytics, and devops

Join us! http://databricks.com/careers

Feedback

Your feedback is important to us.

Don’t forget to rate and review the sessions.

Our Product

Built around open source:

Interactive data science

Scheduled jobs

SQL frontend

Data scientists

Data engineers

Business users

Cloud Storage

Compute Clusters

Databricks Runtime

Customer’s Cloud Account

Databricks Service
