MesosCon 2015 Aug 20th Netflix Senior Software Engineer ... · Reactive stream processing, Mantis...

60
Heterogeneous Resource Scheduling Using Apache Mesos for Cloud Native Frameworks Sharma Podila Senior Software Engineer Netflix Aug 20th MesosCon 2015

Transcript of MesosCon 2015 Aug 20th Netflix Senior Software Engineer ... · Reactive stream processing, Mantis...

Page 1: MesosCon 2015 Aug 20th Netflix Senior Software Engineer ... · Reactive stream processing, Mantis Cloud native Lightweight, dynamic jobs Stateful, multi-stage Real time, anomaly detection,

Heterogeneous Resource Scheduling Using Apache Mesos for Cloud Native Frameworks

Sharma PodilaSenior Software Engineer

Netflix

Aug 20th MesosCon 2015

Page 2: MesosCon 2015 Aug 20th Netflix Senior Software Engineer ... · Reactive stream processing, Mantis Cloud native Lightweight, dynamic jobs Stateful, multi-stage Real time, anomaly detection,

Agenda

● Context, motivation● Fenzo scheduler library● Usage at Netflix● Future direction

Page 3: MesosCon 2015 Aug 20th Netflix Senior Software Engineer ... · Reactive stream processing, Mantis Cloud native Lightweight, dynamic jobs Stateful, multi-stage Real time, anomaly detection,

Why use Apache Mesos in a cloud?

Resource granularity

Application start latency

Page 4: MesosCon 2015 Aug 20th Netflix Senior Software Engineer ... · Reactive stream processing, Mantis Cloud native Lightweight, dynamic jobs Stateful, multi-stage Real time, anomaly detection,

A tale of two frameworks

Reactive stream processing

Container deployment and management

Page 5: MesosCon 2015 Aug 20th Netflix Senior Software Engineer ... · Reactive stream processing, Mantis Cloud native Lightweight, dynamic jobs Stateful, multi-stage Real time, anomaly detection,

Reactive stream processing, Mantis

● Cloud native● Lightweight, dynamic jobs

○ Stateful, multi-stage○ Real time, anomaly detection, etc.

● Task placement constraints○ Cloud constructs○ Resource utilization

● Service and batch style

Page 6: MesosCon 2015 Aug 20th Netflix Senior Software Engineer ... · Reactive stream processing, Mantis Cloud native Lightweight, dynamic jobs Stateful, multi-stage Real time, anomaly detection,

Mantis job topology

Worker

Sou

rce

App

Sta

ge 1

Sta

ge 2

Sta

ge 3

Sin

k

A job is set of 1 or more stagesA stage is a set of 1 or more workersA worker is a Mesos task

Job

Page 7: MesosCon 2015 Aug 20th Netflix Senior Software Engineer ... · Reactive stream processing, Mantis Cloud native Lightweight, dynamic jobs Stateful, multi-stage Real time, anomaly detection,

Container management, Titan

● Cloud native● Service and batch workloads● Jobs with multiple sets of container tasks● Container placement constraints

○ Cloud constructs○ Resource affinity○ Task locality

Page 8: MesosCon 2015 Aug 20th Netflix Senior Software Engineer ... · Reactive stream processing, Mantis Cloud native Lightweight, dynamic jobs Stateful, multi-stage Real time, anomaly detection,

Job 2Set 0

Set 1

Container scheduling model

Job 1

Co-locate tasks from multiple task sets

Page 9: MesosCon 2015 Aug 20th Netflix Senior Software Engineer ... · Reactive stream processing, Mantis Cloud native Lightweight, dynamic jobs Stateful, multi-stage Real time, anomaly detection,

Why develop a new framework?

Page 10: MesosCon 2015 Aug 20th Netflix Senior Software Engineer ... · Reactive stream processing, Mantis Cloud native Lightweight, dynamic jobs Stateful, multi-stage Real time, anomaly detection,

Easy to write a new framework?

Page 11: MesosCon 2015 Aug 20th Netflix Senior Software Engineer ... · Reactive stream processing, Mantis Cloud native Lightweight, dynamic jobs Stateful, multi-stage Real time, anomaly detection,

Easy to write a new framework?

What about scale?Performance?Fault tolerance?Availability?

Page 12: MesosCon 2015 Aug 20th Netflix Senior Software Engineer ... · Reactive stream processing, Mantis Cloud native Lightweight, dynamic jobs Stateful, multi-stage Real time, anomaly detection,

Easy to write a new framework?

What about scale?Performance?Fault tolerance?Availability?

And scheduling is a hard problem to solve

Page 13: MesosCon 2015 Aug 20th Netflix Senior Software Engineer ... · Reactive stream processing, Mantis Cloud native Lightweight, dynamic jobs Stateful, multi-stage Real time, anomaly detection,

Long term justification is needed to create a new Mesos framework

Page 14: MesosCon 2015 Aug 20th Netflix Senior Software Engineer ... · Reactive stream processing, Mantis Cloud native Lightweight, dynamic jobs Stateful, multi-stage Real time, anomaly detection,

Our motivations for new framework

● Cloud native(autoscaling)

Page 15: MesosCon 2015 Aug 20th Netflix Senior Software Engineer ... · Reactive stream processing, Mantis Cloud native Lightweight, dynamic jobs Stateful, multi-stage Real time, anomaly detection,

Our motivations for new framework

● Cloud native(autoscaling)

● Customizable task placement optimizations(Mix of service, batch, and stream topologies)

Page 16: MesosCon 2015 Aug 20th Netflix Senior Software Engineer ... · Reactive stream processing, Mantis Cloud native Lightweight, dynamic jobs Stateful, multi-stage Real time, anomaly detection,

Cluster autoscaling challenge

Host 1 Host 2 Host 3 Host 4

Host 1 Host 2 Host 3 Host 4

vs.

For long running stateful services

Page 17: MesosCon 2015 Aug 20th Netflix Senior Software Engineer ... · Reactive stream processing, Mantis Cloud native Lightweight, dynamic jobs Stateful, multi-stage Real time, anomaly detection,

Cluster autoscaling challenge

Host 1 Host 2 Host 3 Host 4

Host 1 Host 2 Host 3 Host 4

vs.

For long running stateful services

Page 18: MesosCon 2015 Aug 20th Netflix Senior Software Engineer ... · Reactive stream processing, Mantis Cloud native Lightweight, dynamic jobs Stateful, multi-stage Real time, anomaly detection,

Components of a mesos framework

API for users to interact

Page 19: MesosCon 2015 Aug 20th Netflix Senior Software Engineer ... · Reactive stream processing, Mantis Cloud native Lightweight, dynamic jobs Stateful, multi-stage Real time, anomaly detection,

Components of a mesos framework

API for users to interact

Be connected to Mesos via the driver

Page 20: MesosCon 2015 Aug 20th Netflix Senior Software Engineer ... · Reactive stream processing, Mantis Cloud native Lightweight, dynamic jobs Stateful, multi-stage Real time, anomaly detection,

Components of a mesos framework

API for users to interact

Be connected to Mesos via the driver

Compute resource assignments for tasks

Page 21: MesosCon 2015 Aug 20th Netflix Senior Software Engineer ... · Reactive stream processing, Mantis Cloud native Lightweight, dynamic jobs Stateful, multi-stage Real time, anomaly detection,

Components of a mesos framework

API for users to interact

Be connected to Mesos via the driver

Compute resource assignments for tasks

Page 22: MesosCon 2015 Aug 20th Netflix Senior Software Engineer ... · Reactive stream processing, Mantis Cloud native Lightweight, dynamic jobs Stateful, multi-stage Real time, anomaly detection,

Fenzo

A common scheduling library for Mesos frameworks

Page 23: MesosCon 2015 Aug 20th Netflix Senior Software Engineer ... · Reactive stream processing, Mantis Cloud native Lightweight, dynamic jobs Stateful, multi-stage Real time, anomaly detection,

Fenzo usage in frameworksMesos master

Mesos framework

Tasksrequests

Availableresource

offers

Fenzo taskscheduler

Task assignment result• Host1

• Task1• Task2

• Host2• Task3• Task4

Persistence

Page 24: MesosCon 2015 Aug 20th Netflix Senior Software Engineer ... · Reactive stream processing, Mantis Cloud native Lightweight, dynamic jobs Stateful, multi-stage Real time, anomaly detection,

Fenzo scheduling library

Heterogeneous resources

Autoscaling of cluster Visibility of

scheduler actions

Plugins forConstraints, Fitness

High speed

Heterogeneous task requests

Page 25: MesosCon 2015 Aug 20th Netflix Senior Software Engineer ... · Reactive stream processing, Mantis Cloud native Lightweight, dynamic jobs Stateful, multi-stage Real time, anomaly detection,

Announcing availability of Fenzo in Netflix OSS suite

Page 26: MesosCon 2015 Aug 20th Netflix Senior Software Engineer ... · Reactive stream processing, Mantis Cloud native Lightweight, dynamic jobs Stateful, multi-stage Real time, anomaly detection,

Fenzo details

Page 27: MesosCon 2015 Aug 20th Netflix Senior Software Engineer ... · Reactive stream processing, Mantis Cloud native Lightweight, dynamic jobs Stateful, multi-stage Real time, anomaly detection,

Scheduling problem

Fitness

Pending

Assigned

Urg

ency

N tasks to assign from M possible slaves

Page 28: MesosCon 2015 Aug 20th Netflix Senior Software Engineer ... · Reactive stream processing, Mantis Cloud native Lightweight, dynamic jobs Stateful, multi-stage Real time, anomaly detection,

Scheduling optimizations

Speed Accuracy

First fit assignment Optimal assignment

Real world trade-offs~ O (1) ~ O (N * M)1

1 Assuming tasks are not reassigned

Page 29: MesosCon 2015 Aug 20th Netflix Senior Software Engineer ... · Reactive stream processing, Mantis Cloud native Lightweight, dynamic jobs Stateful, multi-stage Real time, anomaly detection,

Scheduling strategy

For each task

On each host

Validate hard constraintsEval fitness and soft constraints

Until fitness good enough, and

A minimum #hosts evaluated

Page 30: MesosCon 2015 Aug 20th Netflix Senior Software Engineer ... · Reactive stream processing, Mantis Cloud native Lightweight, dynamic jobs Stateful, multi-stage Real time, anomaly detection,

Task constraints

SoftHard

Page 31: MesosCon 2015 Aug 20th Netflix Senior Software Engineer ... · Reactive stream processing, Mantis Cloud native Lightweight, dynamic jobs Stateful, multi-stage Real time, anomaly detection,

Task constraints

SoftHardExtensible

Page 32: MesosCon 2015 Aug 20th Netflix Senior Software Engineer ... · Reactive stream processing, Mantis Cloud native Lightweight, dynamic jobs Stateful, multi-stage Real time, anomaly detection,

Built-in Constraints

Page 33: MesosCon 2015 Aug 20th Netflix Senior Software Engineer ... · Reactive stream processing, Mantis Cloud native Lightweight, dynamic jobs Stateful, multi-stage Real time, anomaly detection,

Host attribute value constraint

TaskHostAttrConstraint:instanceType=r3

Host1Attr:instanceType=m3

Host2Attr:instanceType=r3

Host3Attr:instanceType=c3

Fenzo

Page 34: MesosCon 2015 Aug 20th Netflix Senior Software Engineer ... · Reactive stream processing, Mantis Cloud native Lightweight, dynamic jobs Stateful, multi-stage Real time, anomaly detection,

Unique host attribute constraint

TaskUniqueAttr:zone

Host1Attr:zone=1a

Host2Attr:zone=1a

Host3Attr:zone=1b

Fenzo

Page 35: MesosCon 2015 Aug 20th Netflix Senior Software Engineer ... · Reactive stream processing, Mantis Cloud native Lightweight, dynamic jobs Stateful, multi-stage Real time, anomaly detection,

Balance host attribute constraint

Host1Attr:zone=1a

Host2Attr:zone=1b

Host3Attr:zone=1c

Job with 9 tasks, BalanceAttr:zone

Fenzo

Page 36: MesosCon 2015 Aug 20th Netflix Senior Software Engineer ... · Reactive stream processing, Mantis Cloud native Lightweight, dynamic jobs Stateful, multi-stage Real time, anomaly detection,

Fitness evaluation

Degree of fitness

Composable

Page 37: MesosCon 2015 Aug 20th Netflix Senior Software Engineer ... · Reactive stream processing, Mantis Cloud native Lightweight, dynamic jobs Stateful, multi-stage Real time, anomaly detection,

Bin packing fitness calculator

Fitness for

Host1 Host2 Host3 Host4 Host5

fitness = usedCPUs / totalCPUs

Page 38: MesosCon 2015 Aug 20th Netflix Senior Software Engineer ... · Reactive stream processing, Mantis Cloud native Lightweight, dynamic jobs Stateful, multi-stage Real time, anomaly detection,

Bin packing fitness calculator

Fitness for 0.25 0.5 0.75 1.0 0.0

fitness = usedCPUs / totalCPUs

Host1 Host2 Host3 Host4 Host5

Page 39: MesosCon 2015 Aug 20th Netflix Senior Software Engineer ... · Reactive stream processing, Mantis Cloud native Lightweight, dynamic jobs Stateful, multi-stage Real time, anomaly detection,

Bin packing fitness calculator

Fitness for 0.25 0.5 0.75 1.0 0.0

Host1 Host2 Host3 Host4 Host5

fitness = usedCPUs / totalCPUs

Host1 Host2 Host3 Host4 Host5

Page 40: MesosCon 2015 Aug 20th Netflix Senior Software Engineer ... · Reactive stream processing, Mantis Cloud native Lightweight, dynamic jobs Stateful, multi-stage Real time, anomaly detection,

Composable fitness calculatorsFitness

= ( BinPackFitness * BinPackWeight +RuntimePackFitness * RuntimeWeight) / 2.0

Page 41: MesosCon 2015 Aug 20th Netflix Senior Software Engineer ... · Reactive stream processing, Mantis Cloud native Lightweight, dynamic jobs Stateful, multi-stage Real time, anomaly detection,

Cluster autoscaling in Fenzo

ASG/Cluster: mantisagentMinIdle: 8MaxIdle: 20CooldownSecs: 360

ASG/Cluster: mantisagentMinIdle: 8MaxIdle: 20CooldownSecs: 360

ASG/cluster: computeCluster

MinIdle: 8MaxIdle: 20CooldownSecs: 360

Fenzo

ScaleUp action:

Cluster, N

ScaleDown action:Cluster, HostList

Page 42: MesosCon 2015 Aug 20th Netflix Senior Software Engineer ... · Reactive stream processing, Mantis Cloud native Lightweight, dynamic jobs Stateful, multi-stage Real time, anomaly detection,

Rules based cluster autoscaling

● Set up rules per host attribute value○ E.g., one autoscale rule per ASG/cluster, one cluster for network-

intensive jobs, another for CPU/memory-intensive jobs● Sample:

#Idlehosts

Trigger down scale

Trigger up scalemin

max

Cluster Name Min Idle Count Max Idle Count Cooldown Secs

NetworkClstr 5 15 360

ComputeClstr 10 20 300

Page 43: MesosCon 2015 Aug 20th Netflix Senior Software Engineer ... · Reactive stream processing, Mantis Cloud native Lightweight, dynamic jobs Stateful, multi-stage Real time, anomaly detection,

Shortfall analysis based autoscaling

● Rule-based scale up has a cool down period○ What if there’s a surge of incoming requests?

● Pending requests trigger shortfall analysis○ Scale up happens regardless of cool down period○ Remembers which tasks have already been covered

Page 44: MesosCon 2015 Aug 20th Netflix Senior Software Engineer ... · Reactive stream processing, Mantis Cloud native Lightweight, dynamic jobs Stateful, multi-stage Real time, anomaly detection,

Usage at Netflix

Page 45: MesosCon 2015 Aug 20th Netflix Senior Software Engineer ... · Reactive stream processing, Mantis Cloud native Lightweight, dynamic jobs Stateful, multi-stage Real time, anomaly detection,

Cluster autoscalingN

umbe

r of M

esos

Sla

ves

Time

Page 46: MesosCon 2015 Aug 20th Netflix Senior Software Engineer ... · Reactive stream processing, Mantis Cloud native Lightweight, dynamic jobs Stateful, multi-stage Real time, anomaly detection,

Scheduler run time (milliseconds)

Scheduler run time in milliseconds(over a week)

Average 2 mS

Maximum 38 mS

Note: times can vary depending on # of tasks, # and types of constraints, and # of hosts

Page 47: MesosCon 2015 Aug 20th Netflix Senior Software Engineer ... · Reactive stream processing, Mantis Cloud native Lightweight, dynamic jobs Stateful, multi-stage Real time, anomaly detection,

Experimenting with Fenzo

Note: Experiments can be run without requiring a physical cluster

Page 48: MesosCon 2015 Aug 20th Netflix Senior Software Engineer ... · Reactive stream processing, Mantis Cloud native Lightweight, dynamic jobs Stateful, multi-stage Real time, anomaly detection,

A bin packing experiment

Host 3

Host 2Host 1

Mesos Slaves

Host 3000

Tasks with cpu=1

Tasks with cpu=3

Tasks with cpu=6

Fenzo

Iteratively assignBunch of tasks

Start with idle cluster

Page 49: MesosCon 2015 Aug 20th Netflix Senior Software Engineer ... · Reactive stream processing, Mantis Cloud native Lightweight, dynamic jobs Stateful, multi-stage Real time, anomaly detection,

Bin packing sample results

Bin pack tasks using Fenzo’s built-in CPU bin packer

Page 50: MesosCon 2015 Aug 20th Netflix Senior Software Engineer ... · Reactive stream processing, Mantis Cloud native Lightweight, dynamic jobs Stateful, multi-stage Real time, anomaly detection,

Bin packing sample results

Bin pack tasks using Fenzo’s built-in CPU bin packer

Page 51: MesosCon 2015 Aug 20th Netflix Senior Software Engineer ... · Reactive stream processing, Mantis Cloud native Lightweight, dynamic jobs Stateful, multi-stage Real time, anomaly detection,

Task runtime bin packing sampleBin pack tasks based on custom fitness calculator to pack short vs. long run time jobs separately

Page 52: MesosCon 2015 Aug 20th Netflix Senior Software Engineer ... · Reactive stream processing, Mantis Cloud native Lightweight, dynamic jobs Stateful, multi-stage Real time, anomaly detection,

Scheduler speed experiment

# of hosts # of tasks to assign each time

Avg time Avg time per task

Min time

Max time Total time

Hosts: 8-CPUs eachTask mix: 20% running 1-CPU jobs, 40% running 4-CPU, and 40% running 6-CPU jobsGoal: starting from an empty cluster, assign tasks to fill all hostsScheduling strategy: CPU bin packing

Page 53: MesosCon 2015 Aug 20th Netflix Senior Software Engineer ... · Reactive stream processing, Mantis Cloud native Lightweight, dynamic jobs Stateful, multi-stage Real time, anomaly detection,

Scheduler speed experiment

# of hosts # of tasks to assign each time

Avg time Avg time per task

Min time

Max time Total time

1,000 1 3 mS 3 mS 1 mS 188 mS 9 s

1,000 200 40 mS 0.2 mS 17 mS 100 mS 0.5 s

Hosts: 8-CPUs eachTask mix: 20% running 1-CPU jobs, 40% running 4-CPU, and 40% running 6-CPU jobsGoal: starting from an empty cluster, assign tasks to fill all hostsScheduling strategy: CPU bin packing

Page 54: MesosCon 2015 Aug 20th Netflix Senior Software Engineer ... · Reactive stream processing, Mantis Cloud native Lightweight, dynamic jobs Stateful, multi-stage Real time, anomaly detection,

Scheduler speed experimentHosts: 8-CPUs eachTask mix: 20% running 1-CPU jobs, 40% running 4-CPU, and 40% running 6-CPU jobsGoal: starting from an empty cluster, assign tasks to fill all hostsScheduling strategy: CPU bin packing

# of hosts # of tasks to assign each time

Avg time Avg time per task

Min time

Max time Total time

1,000 1 3 mS 3 mS 1 mS 188 mS 9 s

1,000 200 40 mS 0.2 mS 17 mS 100 mS 0.5 s

10,000 1 29 mS 29 mS 10 mS 240 mS 870 s

10,000 200 132 mS 0.66 mS 22 mS 434 mS 19 s

Page 55: MesosCon 2015 Aug 20th Netflix Senior Software Engineer ... · Reactive stream processing, Mantis Cloud native Lightweight, dynamic jobs Stateful, multi-stage Real time, anomaly detection,

Accessing Fenzo

Code at https://github.com/Netflix/Fenzo

Wiki athttps://github.com/Netflix/Fenzo/wiki

Page 56: MesosCon 2015 Aug 20th Netflix Senior Software Engineer ... · Reactive stream processing, Mantis Cloud native Lightweight, dynamic jobs Stateful, multi-stage Real time, anomaly detection,

Future directions

● Task management SLAs● Support for newer Mesos features● Collaboration

Page 57: MesosCon 2015 Aug 20th Netflix Senior Software Engineer ... · Reactive stream processing, Mantis Cloud native Lightweight, dynamic jobs Stateful, multi-stage Real time, anomaly detection,

To summarize...

Page 58: MesosCon 2015 Aug 20th Netflix Senior Software Engineer ... · Reactive stream processing, Mantis Cloud native Lightweight, dynamic jobs Stateful, multi-stage Real time, anomaly detection,

Fenzo: scheduling library for frameworks

Heterogeneous resources

Autoscaling of cluster Visibility of

scheduler actions

Plugins forConstraints, Fitness

High speed

Heterogeneous task requests

Page 59: MesosCon 2015 Aug 20th Netflix Senior Software Engineer ... · Reactive stream processing, Mantis Cloud native Lightweight, dynamic jobs Stateful, multi-stage Real time, anomaly detection,

Fenzo is now available in Netflix OSS suite at https://github.com/Netflix/Fenzo

Page 60: MesosCon 2015 Aug 20th Netflix Senior Software Engineer ... · Reactive stream processing, Mantis Cloud native Lightweight, dynamic jobs Stateful, multi-stage Real time, anomaly detection,

Questions?

Heterogeneous Resource Scheduling Using Apache Mesos for Cloud Native Frameworks

Sharma Podilaspodila @ netflix . com

@podila