MesosCon 2015 Aug 20th Netflix Senior Software Engineer ... · Reactive stream processing, Mantis...
Heterogeneous Resource Scheduling Using Apache Mesos for Cloud Native Frameworks
Sharma Podila, Senior Software Engineer
Netflix
Aug 20th MesosCon 2015
Agenda
● Context, motivation
● Fenzo scheduler library
● Usage at Netflix
● Future direction
Why use Apache Mesos in a cloud?
Resource granularity
Application start latency
A tale of two frameworks
Reactive stream processing
Container deployment and management
Reactive stream processing, Mantis
● Cloud native
● Lightweight, dynamic jobs
  ○ Stateful, multi-stage
  ○ Real time, anomaly detection, etc.
● Task placement constraints
  ○ Cloud constructs
  ○ Resource utilization
● Service and batch style
Mantis job topology
[Figure: Mantis job topology. Labels: Job, Source, Stage 1, Stage 2, Stage 3, Sink, App, Worker]
A job is a set of 1 or more stages.
A stage is a set of 1 or more workers.
A worker is a Mesos task.
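The job, stage, and worker relationship described on this slide could be modelled roughly as below. This is an illustrative sketch only: the class names, the `mesos_task_id` field, and the `make_job` helper are hypothetical and are not taken from Mantis.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Worker:
    """One worker; per the slide, each worker is a Mesos task."""
    mesos_task_id: str  # hypothetical field name


@dataclass
class Stage:
    """A stage is a set of 1 or more workers."""
    workers: List[Worker]

    def __post_init__(self) -> None:
        if not self.workers:
            raise ValueError("a stage needs at least one worker")


@dataclass
class Job:
    """A job is a set of 1 or more stages."""
    stages: List[Stage]

    def __post_init__(self) -> None:
        if not self.stages:
            raise ValueError("a job needs at least one stage")

    def mesos_task_count(self) -> int:
        # Every worker in every stage corresponds to one Mesos task.
        return sum(len(stage.workers) for stage in self.stages)


def make_job(workers_per_stage: List[int], name: str = "job") -> Job:
    """Hypothetical helper: build a job from a list of per-stage worker counts."""
    stages = [
        Stage([Worker(f"{name}-s{s}-w{w}") for w in range(count)])
        for s, count in enumerate(workers_per_stage, start=1)
    ]
    return Job(stages)
```

For example, `make_job([1, 2, 3])` describes a three-stage job that needs six Mesos tasks in total.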
Container management, Titan
● Cloud native
● Service and batch workloads
● Jobs with multiple sets of container tasks
● Container placement constraints
  ○ Cloud constructs
  ○ Resource affinity
  ○ Task locality
Container scheduling model
[Figure labels: Job 1; Job 2 with Set 0 and Set 1]
Co-locate tasks from multiple task sets
Why develop a new framework?
Easy to write a new framework?
What about scale? Performance? Fault tolerance? Availability?
And scheduling is a hard problem to solve
Long term justification is needed to create a new Mesos framework
Our motivations for new framework
● Cloud native (autoscaling)
● Customizable task placement optimizations (mix of service, batch, and stream topologies)
Cluster autoscaling challenge
[Figure: two layouts of Host 1, Host 2, Host 3, Host 4, contrasted with "vs."]
For long running stateful services
Components of a Mesos framework
API for users to interact
Be connected to Mesos via the driver
Compute resource assignments for tasks
Fenzo
A common scheduling library for Mesos frameworks
Fenzo usage in frameworks
[Figure labels: Mesos master, Mesos framework, Fenzo task scheduler, Persistence, Task requests, Available resource offers]
Task assignment result
• Host1
  • Task1
  • Task2
• Host2
  • Task3
  • Task4
Fenzo scheduling library
● Heterogeneous resources
● Heterogeneous task requests
● Autoscaling of cluster
● Visibility of scheduler actions
● Plugins for constraints, fitness
● High speed
Announcing availability of Fenzo in Netflix OSS suite
Fenzo details
Scheduling problem
[Figure labels: Fitness, Urgency, Pending, Assigned]
N tasks to assign from M possible slaves
Scheduling optimizations
Speed: First fit assignment, ~ O(1)
Real world trade-offs
Accuracy: Optimal assignment, ~ O(N * M) ¹
¹ Assuming tasks are not reassigned
Scheduling strategy
For each task
  On each host
    Validate hard constraints
    Eval fitness and soft constraints
  Until fitness good enough, and
  a minimum #hosts evaluated
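The loop on this slide can be sketched as follows. This is a minimal illustration under stated assumptions, not Fenzo's implementation: the `Host` and `Task` classes, the `good_enough` threshold, the `min_hosts` count, and the choice to count every visited host as "evaluated" are all hypothetical. Soft constraints are assumed to be folded into the `fitness` callable.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Host:
    name: str
    total_cpus: float
    used_cpus: float = 0.0


@dataclass
class Task:
    name: str
    cpus: float


def pick_host(
    task: Task,
    hosts: List[Host],
    hard_ok: Callable[[Task, Host], bool],
    fitness: Callable[[Task, Host], float],
    good_enough: float = 0.9,  # assumed threshold
    min_hosts: int = 2,        # assumed minimum number of hosts to look at
) -> Optional[Host]:
    best: Optional[Host] = None
    best_score = -1.0
    evaluated = 0
    for host in hosts:
        evaluated += 1
        if hard_ok(task, host):
            score = fitness(task, host)
            if score > best_score:
                best, best_score = host, score
        # Stop early once the fit is good enough and enough hosts were seen.
        if best_score >= good_enough and evaluated >= min_hosts:
            break
    return best


def fits(task: Task, host: Host) -> bool:
    # Hard constraint: the task must fit in the host's remaining CPUs.
    return host.used_cpus + task.cpus <= host.total_cpus


def cpu_bin_pack(task: Task, host: Host) -> float:
    # Fitness: CPU utilization of the host if the task were placed on it.
    return (host.used_cpus + task.cpus) / host.total_cpus


def schedule(tasks: List[Task], hosts: List[Host]) -> Dict[str, List[str]]:
    result: Dict[str, List[str]] = {}
    for task in tasks:
        host = pick_host(task, hosts, fits, cpu_bin_pack)
        if host is None:
            continue  # no host qualifies; the task stays pending
        host.used_cpus += task.cpus
        result.setdefault(host.name, []).append(task.name)
    return result
```

Raising `good_enough` or `min_hosts` moves the sketch toward the exhaustive end of the speed versus accuracy trade-off; lowering them moves it toward first fit.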
Task constraints
Soft
Hard
Extensible
Built-in Constraints
Host attribute value constraint
[Figure: a task, Fenzo, and three hosts]
Task: HostAttrConstraint:instanceType=r3
Host1: Attr:instanceType=m3
Host2: Attr:instanceType=r3
Host3: Attr:instanceType=c3
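A host attribute value constraint of this kind amounts to an equality check between the value the task asks for and the attribute each host advertises. The sketch below uses the slide's example values; the function names and the dictionary layout are hypothetical and do not represent Fenzo's API.

```python
from typing import Dict, List

# Host attributes as shown on the slide.
HOSTS: Dict[str, Dict[str, str]] = {
    "Host1": {"instanceType": "m3"},
    "Host2": {"instanceType": "r3"},
    "Host3": {"instanceType": "c3"},
}


def host_attr_value_ok(attr: str, value: str, host_attrs: Dict[str, str]) -> bool:
    """Hard constraint: the host must carry attr with exactly this value."""
    return host_attrs.get(attr) == value


def eligible_hosts(attr: str, value: str, hosts: Dict[str, Dict[str, str]]) -> List[str]:
    """Names of hosts that satisfy the constraint, in input order."""
    return [name for name, attrs in hosts.items()
            if host_attr_value_ok(attr, value, attrs)]
```

With the slide's values, `eligible_hosts("instanceType", "r3", HOSTS)` leaves only `Host2` as a candidate.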
Unique host attribute constraint
[Figure: a task, Fenzo, and three hosts]
Task: UniqueAttr:zone
Host1: Attr:zone=1a
Host2: Attr:zone=1a
Host3: Attr:zone=1b
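One way to read the unique host attribute constraint is that no two tasks of the same job may land on hosts sharing a value for the named attribute. The sketch below works under that assumption and uses the slide's zone values; `place_unique` is a hypothetical name, not a Fenzo call.

```python
from typing import Dict, List, Set

# Zone attribute per host, as shown on the slide.
HOST_ZONE: Dict[str, str] = {"Host1": "1a", "Host2": "1a", "Host3": "1b"}


def place_unique(task_names: List[str], host_attr: Dict[str, str]) -> Dict[str, str]:
    """Place each task on the first host whose attribute value is still unused.

    Tasks that cannot be placed without repeating a value are left out
    of the result, i.e. they stay pending.
    """
    used_values: Set[str] = set()
    placement: Dict[str, str] = {}
    for task in task_names:
        for host, value in host_attr.items():
            if value in used_values:
                continue  # another task of this job already uses this value
            placement[task] = host
            used_values.add(value)
            break
    return placement
```

With two zones available, a job of three tasks gets two placements and one task left pending, which is the behaviour a hard uniqueness constraint would be expected to produce.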
Balance host attribute constraint
[Figure: a job, Fenzo, and three hosts]
Job with 9 tasks, BalanceAttr:zone
Host1: Attr:zone=1a
Host2: Attr:zone=1b
Host3: Attr:zone=1c
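Balancing across an attribute can be sketched as always sending the next task to the attribute value that currently holds the fewest tasks of the job. The code below is a simplified illustration that ignores host capacity; the function names are hypothetical and it is not Fenzo's implementation.

```python
from typing import Dict, List

# Zone attribute per host, as shown on the slide.
HOST_ZONE: Dict[str, str] = {"Host1": "1a", "Host2": "1b", "Host3": "1c"}


def balance_by_attr(task_names: List[str], host_attr: Dict[str, str]) -> Dict[str, str]:
    """Assign each task to a host whose attribute value has the fewest tasks so far."""
    counts: Dict[str, int] = {value: 0 for value in host_attr.values()}
    placement: Dict[str, str] = {}
    for task in task_names:
        # min() keeps the first host on ties, so placement is deterministic.
        host = min(host_attr, key=lambda h: counts[host_attr[h]])
        placement[task] = host
        counts[host_attr[host]] += 1
    return placement


def tasks_per_value(placement: Dict[str, str], host_attr: Dict[str, str]) -> Dict[str, int]:
    """Count placed tasks per attribute value."""
    totals: Dict[str, int] = {value: 0 for value in host_attr.values()}
    for host in placement.values():
        totals[host_attr[host]] += 1
    return totals
```

For the slide's job of 9 tasks over three zones, this spreads the tasks three per zone.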
Fitness evaluation
Degree of fitness
Composable
Bin packing fitness calculator
fitness = usedCPUs / totalCPUs
Host:     Host1  Host2  Host3  Host4  Host5
Fitness:  0.25   0.5    0.75   1.0    0.0
[Figure: ✔ marks the selected host]
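The formula `fitness = usedCPUs / totalCPUs` can be tried out with a small sketch. The slide does not say whether `usedCPUs` includes the task being placed; the code below assumes it does, and it skips hosts where the task does not fit. The host sizes are made-up numbers chosen so that current utilization is 0.25, 0.5, 0.75, 1.0 and 0.0, and `best_host` is a hypothetical helper.

```python
from typing import Dict, Optional, Tuple

# (used CPUs, total CPUs) per host; hypothetical example numbers.
HOSTS: Dict[str, Tuple[float, float]] = {
    "Host1": (1.0, 4.0),
    "Host2": (2.0, 4.0),
    "Host3": (3.0, 4.0),
    "Host4": (4.0, 4.0),
    "Host5": (0.0, 4.0),
}


def bin_pack_fitness(used_cpus: float, total_cpus: float) -> float:
    """fitness = usedCPUs / totalCPUs, in the range 0.0 to 1.0."""
    return used_cpus / total_cpus


def best_host(task_cpus: float, hosts: Dict[str, Tuple[float, float]]) -> Optional[str]:
    """Pick the host that would be fullest after placing the task."""
    best_name: Optional[str] = None
    best_fit = -1.0
    for name, (used, total) in hosts.items():
        if used + task_cpus > total:
            continue  # the task does not fit on this host
        fit = bin_pack_fitness(used + task_cpus, total)
        if fit > best_fit:
            best_name, best_fit = name, fit
    return best_name
```

Under these assumptions a 1-CPU task goes to `Host3`, which it fills completely, while the idle `Host5` stays empty and remains a candidate for scale down.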
Composable fitness calculators
Fitness = (BinPackFitness * BinPackWeight + RuntimePackFitness * RuntimeWeight) / 2.0
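The weighted combination on this slide translates directly into a small function. The parameter names mirror the slide's terms; the function itself is an illustrative sketch rather than Fenzo code.

```python
def combined_fitness(
    bin_pack_fitness: float,
    runtime_pack_fitness: float,
    bin_pack_weight: float = 1.0,
    runtime_weight: float = 1.0,
) -> float:
    """Combine two fitness scores as on the slide: weighted sum divided by 2.0."""
    weighted_sum = (bin_pack_fitness * bin_pack_weight
                    + runtime_pack_fitness * runtime_weight)
    return weighted_sum / 2.0
```

With both weights at 1.0 the result is the plain average of the two scores, so `combined_fitness(0.75, 0.5)` gives 0.625. Weights below 1.0 reduce the influence of the corresponding calculator.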
Cluster autoscaling in Fenzo
[Figure: autoscale rules go into Fenzo; scale actions come out]
ASG/Cluster: mantisagent
  MinIdle: 8
  MaxIdle: 20
  CooldownSecs: 360
ASG/Cluster: computeCluster
  MinIdle: 8
  MaxIdle: 20
  CooldownSecs: 360
ScaleUp action: Cluster, N
ScaleDown action: Cluster, HostList
Rules based cluster autoscaling
● Set up rules per host attribute value
  ○ E.g., one autoscale rule per ASG/cluster, one cluster for network-intensive jobs, another for CPU/memory-intensive jobs
● Sample:
[Figure: # idle hosts against min and max thresholds; labels: Trigger down scale, Trigger up scale]
Cluster Name   Min Idle Count   Max Idle Count   Cooldown Secs
NetworkClstr   5                15               360
ComputeClstr   10               20               300
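A rule of this shape can be evaluated with a few comparisons. The sketch below is an illustration under assumptions: the class and function names are hypothetical, scale up is assumed to trigger when idle hosts fall below the minimum and scale down when they exceed the maximum, and the amounts (top up to the minimum, trim down to the maximum) are a guess rather than Fenzo's documented behaviour. The action tuples follow the slide's "Cluster, N" and "Cluster, HostList" shapes.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple, Union


@dataclass
class AutoscaleRule:
    cluster: str
    min_idle: int
    max_idle: int
    cooldown_secs: int


Action = Tuple[str, str, Union[int, List[str]]]


def evaluate_rule(
    rule: AutoscaleRule,
    idle_hosts: List[str],
    secs_since_last_action: float,
) -> Optional[Action]:
    """Return a scale action for one cluster, or None if nothing should happen."""
    if secs_since_last_action < rule.cooldown_secs:
        return None  # still cooling down from the previous action
    idle = len(idle_hosts)
    if idle < rule.min_idle:
        # Assumed amount: add enough hosts to get back to the minimum.
        return ("ScaleUp", rule.cluster, rule.min_idle - idle)
    if idle > rule.max_idle:
        # Assumed amount: release the idle hosts beyond the maximum.
        return ("ScaleDown", rule.cluster, idle_hosts[: idle - rule.max_idle])
    return None


NETWORK_RULE = AutoscaleRule("NetworkClstr", min_idle=5, max_idle=15, cooldown_secs=360)
```

Returning a host list for scale down reflects the slide's point that the scheduler, not the cloud provider, should choose which hosts to terminate.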
Shortfall analysis based autoscaling
● Rule-based scale up has a cool down period
  ○ What if there’s a surge of incoming requests?
● Pending requests trigger shortfall analysis
  ○ Scale up happens regardless of cool down period
  ○ Remembers which tasks have already been covered
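The idea of remembering which pending tasks have already been covered can be sketched as below. This is a rough illustration, not Fenzo's algorithm: the class name is hypothetical, hosts are assumed to be identical in size, and the host count is a simple CPU total rounded up, which ignores fragmentation.

```python
import math
from typing import Dict, Set


class ShortfallEvaluator:
    """Tracks pending tasks so each one triggers a scale up at most once."""

    def __init__(self, cpus_per_host: float) -> None:
        self.cpus_per_host = cpus_per_host
        self.covered: Set[str] = set()

    def hosts_needed(self, pending_cpus: Dict[str, float]) -> int:
        """Return how many extra hosts to request for newly pending tasks.

        No cool down period is consulted here, matching the slide's point
        that shortfall scale up happens regardless of it.
        """
        # Forget tasks that are no longer pending.
        self.covered &= set(pending_cpus)
        new_tasks = {t: c for t, c in pending_cpus.items() if t not in self.covered}
        if not new_tasks:
            return 0
        self.covered.update(new_tasks)
        return math.ceil(sum(new_tasks.values()) / self.cpus_per_host)
```

Calling `hosts_needed` twice with the same pending set requests hosts only the first time, which avoids scaling up repeatedly while the new hosts are still starting.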
Usage at Netflix
Cluster autoscaling
[Figure: number of Mesos slaves over time]
Scheduler run time (milliseconds)
Scheduler run time in milliseconds (over a week)
Average: 2 ms
Maximum: 38 ms
Note: times can vary depending on # of tasks, # and types of constraints, and # of hosts
Experimenting with Fenzo
Note: Experiments can be run without requiring a physical cluster
A bin packing experiment
[Figure: Mesos slaves (Host 1, Host 2, Host 3, Host 3000) and Fenzo]
Start with idle cluster
Iteratively assign a bunch of tasks:
Tasks with cpu=1
Tasks with cpu=3
Tasks with cpu=6
Bin packing sample results
Bin pack tasks using Fenzo’s built-in CPU bin packer
Task runtime bin packing sample
Bin pack tasks based on custom fitness calculator to pack short vs. long run time jobs separately
Scheduler speed experiment
Hosts: 8 CPUs each
Task mix: 20% running 1-CPU jobs, 40% running 4-CPU, and 40% running 6-CPU jobs
Goal: starting from an empty cluster, assign tasks to fill all hosts
Scheduling strategy: CPU bin packing
# of hosts   # of tasks to assign each time   Avg time   Avg time per task   Min time   Max time   Total time
1,000        1                                3 ms       3 ms                1 ms       188 ms     9 s
1,000        200                              40 ms      0.2 ms              17 ms      100 ms     0.5 s
10,000       1                                29 ms      29 ms               10 ms      240 ms     870 s
10,000       200                              132 ms     0.66 ms             22 ms      434 ms     19 s
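The "Avg time per task" column follows from dividing the average time of one scheduling run by the number of tasks assigned in that run. The short sketch below reproduces that arithmetic from the table's own numbers; the variable and function names are illustrative.

```python
from typing import List, Tuple

# (number of hosts, tasks assigned per run, average run time in ms), from the table.
ROWS: List[Tuple[int, int, float]] = [
    (1_000, 1, 3.0),
    (1_000, 200, 40.0),
    (10_000, 1, 29.0),
    (10_000, 200, 132.0),
]


def avg_ms_per_task(avg_run_ms: float, tasks_per_run: int) -> float:
    """Average scheduling time attributable to a single task."""
    return avg_run_ms / tasks_per_run


PER_TASK_MS: List[float] = [avg_ms_per_task(ms, n) for _, n, ms in ROWS]
```

Assigning 200 tasks per run costs more per run but far less per task, which is why the batched rows also finish with a much smaller total time.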
Accessing Fenzo
Code at https://github.com/Netflix/Fenzo
Wiki at https://github.com/Netflix/Fenzo/wiki
Future directions
● Task management SLAs
● Support for newer Mesos features
● Collaboration
To summarize...
Fenzo: scheduling library for frameworks
● Heterogeneous resources
● Heterogeneous task requests
● Autoscaling of cluster
● Visibility of scheduler actions
● Plugins for constraints, fitness
● High speed
Fenzo is now available in Netflix OSS suite at https://github.com/Netflix/Fenzo
Questions?
Heterogeneous Resource Scheduling Using Apache Mesos for Cloud Native Frameworks
Sharma Podila
spodila @ netflix . com
@podila