Save Millions By Efficient Resource Utilization Through Mesos
By Smarth Madan
PayPal and its codebase are growing YOY
2010: ~560 services, 15 MLOC
2014: ~1021 services, 45 MLOC
2018: ~2000 services, 80 MLOC
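The slide's figures imply steady compounding growth. Using only the numbers on the slide, here is a quick check of the implied compound annual growth rate (CAGR) between 2010 and 2018:

```python
# (services, MLOC) per year, taken from the slide.
stats = {
    2010: (560, 15),
    2014: (1021, 45),
    2018: (2000, 80),
}

def cagr(start: float, end: float, years: int) -> float:
    """Compound annual growth rate between two values."""
    return (end / start) ** (1 / years) - 1

first, last = min(stats), max(stats)
years = last - first
services_rate = cagr(stats[first][0], stats[last][0], years)
mloc_rate = cagr(stats[first][1], stats[last][1], years)

print(f"Services: x{stats[last][0] / stats[first][0]:.1f} over {years} years, "
      f"~{services_rate:.1%} per year")
print(f"MLOC:     x{stats[last][1] / stats[first][1]:.1f} over {years} years, "
      f"~{mloc_rate:.1%} per year")
```

With these numbers, services grow roughly 17% per year and lines of code roughly 23% per year. Code volume is therefore outpacing the number of services.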
Test Cycle at PayPal Before Managed Stage
Secure VM, Configure VM
Code
Deploy on VM
Learn, Debug & Troubleshoot Failing Services
Up-rev Dependent Services + DB Schema
Test
Keep your Dependent Services UP
Developer Pain points
Deploying all the components
Tons of time spent on setup and testing
Identifying transitive dependencies
Maintaining a stable environment
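Identifying transitive dependencies means finding every service that a service calls, directly or indirectly. This is a graph reachability problem. The sketch below uses breadth-first search over a hypothetical service-to-dependencies map, and the service names are made up for illustration:

```python
from collections import deque

# Hypothetical direct-dependency map: service -> services it calls.
DEPENDENCIES = {
    "checkout": ["payments", "risk"],
    "payments": ["ledger", "fx"],
    "risk": ["ledger"],
    "ledger": ["db-schema"],
    "fx": [],
    "db-schema": [],
}

def transitive_dependencies(service: str, deps: dict[str, list[str]]) -> set[str]:
    """Return every service reachable from `service` (excluding itself) via BFS."""
    seen: set[str] = set()
    queue = deque(deps.get(service, []))
    while queue:
        current = queue.popleft()
        if current in seen:
            continue  # already visited; also protects against cycles
        seen.add(current)
        queue.extend(deps.get(current, []))
    seen.discard(service)  # a cycle may lead back to the starting service
    return seen

print(sorted(transitive_dependencies("checkout", DEPENDENCIES)))
```

Even in this toy map, testing `checkout` requires five other services to be up. A shared, always-on stage removes that setup burden from each developer.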
Infrastructure Team Pain points
Hardware requirements grow YOY
Huge maintenance cost for ~4K test environments
Network bandwidth
Test topology is not the same as production
Requirements
o We need a production-like environment
o A cluster of machines running all services
o Multiple instances of each service, with auto-healing for availability
o Ability to scale as the number of users grows
o Code refresh in minutes
o Easy to connect from all other VMs
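Auto-healing and scaling with user growth both reduce to a reconcile step. The step compares the desired instance count with what is actually running, then starts or stops instances to close the gap. The sketch below is a hypothetical in-memory model of that idea, not Aurora's scheduler code. `ServiceState` and `reconcile` are illustrative names:

```python
from dataclasses import dataclass, field

@dataclass
class ServiceState:
    name: str
    desired_instances: int
    running: set[int] = field(default_factory=set)  # ids of healthy instances

def reconcile(service: ServiceState) -> list[str]:
    """Move the running instances toward the desired count; return actions taken."""
    actions = []
    # Start any missing instance id in [0, desired_instances), e.g. after a crash.
    for instance_id in range(service.desired_instances):
        if instance_id not in service.running:
            service.running.add(instance_id)
            actions.append(f"start {service.name}/{instance_id}")
    # Stop instances beyond the desired count, e.g. after a scale-down.
    for instance_id in sorted(service.running):
        if instance_id >= service.desired_instances:
            service.running.discard(instance_id)
            actions.append(f"stop {service.name}/{instance_id}")
    return actions

svc = ServiceState("foo", desired_instances=3, running={0, 1, 2})
svc.running.discard(1)       # simulate a crashed instance
print(reconcile(svc))        # auto-healing restarts foo/1
svc.desired_instances = 4    # scale up as the number of users grows
print(reconcile(svc))        # starts foo/3
```

A real scheduler runs this comparison continuously. Both healing and scaling then become changes to the desired state rather than manual operations.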
Resource Management
o Manage the compute capacity of a large set of machines
o Moving away from static allocation of machines
o Identifying unused capacity
o Slave Categorization (diverse PayPal Tech stack)
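Mesos agents (slaves) can be started with key/value attributes through the agent's `--attributes` flag. This is one way to categorize slaves by pool or tech stack. The sketch below shows attribute matching plus a per-pool view of unused CPU. The `pool` and `stack` attribute names and the host numbers are assumptions for illustration:

```python
from dataclasses import dataclass

@dataclass
class Agent:
    hostname: str
    attributes: dict[str, str]  # hypothetical keys, e.g. {"pool": "front"}
    total_cpus: float
    used_cpus: float

    @property
    def free_cpus(self) -> float:
        return self.total_cpus - self.used_cpus

def candidates(agents: list[Agent], required: dict[str, str], cpus: float) -> list[Agent]:
    """Agents whose attributes match every required key/value and have enough free CPU."""
    return [
        a for a in agents
        if all(a.attributes.get(k) == v for k, v in required.items())
        and a.free_cpus >= cpus
    ]

def unused_capacity(agents: list[Agent]) -> dict[str, float]:
    """Free CPUs per pool: capacity that static allocation would leave idle."""
    free: dict[str, float] = {}
    for a in agents:
        pool = a.attributes.get("pool", "unassigned")
        free[pool] = free.get(pool, 0.0) + a.free_cpus
    return free

agents = [
    Agent("host-1", {"pool": "front", "stack": "java"}, 16, 12),
    Agent("host-2", {"pool": "front", "stack": "nodejs"}, 16, 4),
    Agent("host-3", {"pool": "back", "stack": "java"}, 32, 8),
]
print([a.hostname for a in candidates(agents, {"pool": "front"}, cpus=6)])
print(unused_capacity(agents))
```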
Job Scheduler
o Defines jobs with ordered tasks
o Binding jobs to specific slaves using constraints
o Auto-healing
o REST API capability
o A final task to clean up the slave before any new job
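The scheduler requirements above call for ordered tasks plus a final task that always cleans up the slave. These map onto a run-in-order, always-finalize pattern. Here is a hypothetical sketch in plain Python, where step names such as `fetch_packages` and `cleanup_slave` are illustrative:

```python
from typing import Callable

Step = tuple[str, Callable[[], None]]

def run_job(steps: list[Step], cleanup: Step) -> list[str]:
    """Run steps in order, stop at the first failure, and always run cleanup last."""
    log: list[str] = []
    try:
        for name, action in steps:
            log.append(f"run {name}")
            action()
    except Exception as exc:  # a failed step aborts the remaining steps
        log.append(f"failed {name}: {exc}")
    finally:
        cleanup_name, cleanup_action = cleanup
        log.append(f"run {cleanup_name}")
        cleanup_action()
    return log

def fail() -> None:
    raise RuntimeError("port already in use")

steps = [("fetch_packages", lambda: None), ("start_service", fail), ("smoke_test", lambda: None)]
for line in run_job(steps, ("cleanup_slave", lambda: None)):
    print(line)
```

Using `finally` guarantees the cleanup step runs whether the job succeeds or fails. That is the property the slide asks for before the slave takes a new job.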
What’s a Managed Stage?
Managed Stage is a continuously available, Mesos-based, multi-node staging
environment that allows developers and quality engineers to certify apps and
release them to LIVE.
Managed Stage Architecture
[Diagram: CC; one active and two standby Mesos masters; one active and two standby Aurora schedulers; DB; Mesos slaves grouped into a Router pool, Front pool, Mid pool, Back pool, and other pools; a three-node ZooKeeper ensemble (ZooKeeper 1-3)]
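The architecture runs one active and several standby Mesos masters and Aurora schedulers, coordinated by a three-node ZooKeeper ensemble. Two pieces of arithmetic explain that layout. First, an ensemble needs a majority of its nodes to stay available. Second, the standard ZooKeeper leader-election recipe picks the candidate holding the lowest ephemeral sequential znode. The sketch below models both with hypothetical master names and sequence numbers. It does not reproduce Mesos's or Aurora's actual election code:

```python
def quorum_size(ensemble: int) -> int:
    """Smallest majority of a ZooKeeper ensemble."""
    return ensemble // 2 + 1

def tolerated_failures(ensemble: int) -> int:
    """How many nodes can fail while a majority remains."""
    return ensemble - quorum_size(ensemble)

def elect_leader(candidates: dict[str, int]) -> str:
    """Leader-election recipe: the lowest sequence number wins; the rest stand by."""
    return min(candidates, key=candidates.get)

# Hypothetical sequence numbers of each master's ephemeral sequential znode.
masters = {"master-a": 17, "master-b": 12, "master-c": 23}
leader = elect_leader(masters)
print(leader, "is active; standby:", sorted(m for m in masters if m != leader))

del masters[leader]  # leader's session expires, so its ephemeral znode disappears
print("new leader:", elect_leader(masters))

for n in (3, 5):
    print(f"{n} ZooKeeper nodes: quorum {quorum_size(n)}, "
          f"tolerates {tolerated_failures(n)} failure(s)")
```

Three ZooKeeper nodes tolerate one failure, and five would tolerate two. Standby masters need no reconfiguration because the next-lowest znode simply takes over.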
Sample Aurora Task
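The sample task on this slide survives only as an image in this transcript. As a stand-in, here is a hypothetical Aurora job written in Aurora's Python-based configuration DSL. It shows the features the scheduler requirements list: ordered processes, a finalizing cleanup process, multiple instances, and a value constraint that pins the job to slaves with a given attribute. The script names, cluster, role, and `pool` attribute are assumptions. The config has not been validated against a specific Aurora release.

```python
# Hypothetical .aurora config; Process, Task, order, Resources, Service,
# GB are names provided by the Aurora client when it loads the file.
# Script names, cluster, role and the 'pool' attribute are illustrative only.

fetch = Process(
  name = 'fetch_packages',
  cmdline = './fetch_packages.sh')

run = Process(
  name = 'run_service',
  cmdline = './start_service.sh')

# final = True marks a finalizing process, run after the regular processes
# finish (or when the task is killed), so it can leave the slave clean.
cleanup = Process(
  name = 'cleanup_slave',
  cmdline = './cleanup_slave.sh',
  final = True)

task = Task(
  name = 'foo',
  processes = [fetch, run, cleanup],
  constraints = order(fetch, run),   # fetch must finish before run starts
  resources = Resources(cpu = 1.0, ram = 1*GB, disk = 2*GB))

jobs = [
  Service(                           # Service jobs are restarted when their tasks exit
    cluster = 'managed-stage',
    role = 'stage',
    environment = 'test',
    name = 'foo',
    instances = 2,                   # multiple instances for availability
    constraints = {'pool': 'mid'},   # only slaves started with attribute pool:mid
    task = task)
]
```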
Complex Job Scenario
Dynamic Service Discovery & Registration
[Diagram: the Managed Stage architecture (CC; one active and two standby Mesos masters; one active and two standby Aurora schedulers; DB; Mesos slaves in Router, Front, Mid, Back, and other pools; ZooKeeper 1-3), with a Job being placed on the cluster]
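When tasks land on arbitrary slaves and ports, callers need a registry. Instances join it when they start and leave it when they die. Aurora can announce task endpoints into ZooKeeper for this purpose. The sketch below is only an in-memory stand-in for that kind of registry, with hypothetical host names and ports:

```python
import random
from collections import defaultdict

class Registry:
    """In-memory stand-in for a ZooKeeper-backed service registry.

    In ZooKeeper each live instance would own an ephemeral znode that vanishes
    when the instance's session ends; here `deregister` plays that role.
    """

    def __init__(self) -> None:
        self._endpoints: dict[str, set[tuple[str, int]]] = defaultdict(set)

    def register(self, service: str, host: str, port: int) -> None:
        self._endpoints[service].add((host, port))

    def deregister(self, service: str, host: str, port: int) -> None:
        self._endpoints[service].discard((host, port))

    def resolve(self, service: str) -> tuple[str, int]:
        endpoints = self._endpoints.get(service)
        if not endpoints:
            raise LookupError(f"no live instances of {service}")
        return random.choice(sorted(endpoints))  # naive client-side load balancing

registry = Registry()
registry.register("foo", "host-2", 31001)    # port chosen when the task launched
registry.register("foo", "host-7", 31544)
registry.deregister("foo", "host-2", 31001)  # instance died or was rescheduled
print(registry.resolve("foo"))
```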
Code Refresh Cycle
o In-place and full deploys
o Total number of packages deployed: 5000
  • Full deploy takes < 1 hr
  • Incremental deploy takes < 20 min
o Code refresh frequency:
  • Daily incremental code refresh
  • Biweekly full code refresh
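An incremental refresh can stay well under the time of a full deploy of ~5000 packages if it redeploys only packages whose version changed. The sketch below assumes a hypothetical `name -> version` manifest for the current stage and the target build:

```python
def plan_deploy(current: dict[str, str], target: dict[str, str],
                full: bool = False) -> dict[str, list[str]]:
    """Decide which packages to (re)deploy and which to remove."""
    removed = sorted(set(current) - set(target))  # packages dropped from the build
    if full:
        # Full refresh: redeploy everything in the target manifest.
        return {"deploy": sorted(target), "remove": removed}
    # Incremental refresh: only new packages or packages with a new version.
    changed = [name for name, version in target.items() if current.get(name) != version]
    return {"deploy": sorted(changed), "remove": removed}

current = {"foo": "1.4.0", "bar": "2.0.1", "baz": "0.9.0"}
target = {"foo": "1.4.1", "bar": "2.0.1", "qux": "1.0.0"}
print(plan_deploy(current, target))             # daily incremental refresh
print(plan_deploy(current, target, full=True))  # biweekly full refresh
```

The periodic full refresh is a reasonable complement: it resets any drift that incremental, in-place updates may accumulate.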
Dependent Stage
[Diagram: Managed Stage environment with 1000+ services; two Foo [N+1] instances; HAProxy]
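The diagram only names the pieces: a Managed Stage with 1000+ services, Foo [N+1], and HAProxy. One plausible reading is that a developer deploys the next version of their own service. A proxy then sends traffic for that service to the developer's build, while every other dependency resolves to the shared Managed Stage. The sketch below models that routing decision in Python. It is not HAProxy configuration, and the service names and endpoints are hypothetical:

```python
# Hypothetical endpoints of the shared Managed Stage pools.
MANAGED_STAGE = {
    "foo": "managed-stage-foo:8080",
    "bar": "managed-stage-bar:8080",
}

def route(service: str, user_overrides: dict[str, str]) -> str:
    """Prefer the developer's own build of a service; fall back to Managed Stage."""
    if service in user_overrides:
        return user_overrides[service]
    try:
        return MANAGED_STAGE[service]
    except KeyError:
        raise LookupError(f"unknown service {service}") from None

overrides = {"foo": "userstage-42:8080"}  # developer deployed Foo [N+1]
print(route("foo", overrides))  # the developer's build
print(route("bar", overrides))  # shared Managed Stage dependency
```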
Advantage
o Developer
  • Stage setup reduced by 90%
  • Productivity increased by 30-40%
  • All transitive dependencies abstracted away
o Infrastructure
  • Optimized resource utilization
  • Reduced hardware cost in the data center
  • Less network traffic for deploys
Test Cycle at PayPal After Managed Stage
Code
Deploy Your Service On UserStage
Test
Self-Serve UserStage
Managed Stage Versions
[Diagram: three Managed Stage versions, MsMaster (Live), MsRelease (N+1), and MsLnP, each shown as three stacks of N, R, S, H, C components]
Why Mesos ?
o CI on Mesos was a success at PayPal
o Setup cost and time are really low
o Mesos & Aurora: an out-of-the-box solution
o Docker integration in the future
Cost Analysis
o Number of VMs: ~5000
o Average stage size: 16 CPUs, 64 GB RAM, 256 GB HDD
o Total CPUs: 80,000
o CPUs used by Managed Stage: 1500 per environment
o Reclaimed CPUs: 40,000 (from just a 50% reduction)
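The totals follow from the averages, assuming every one of the ~5000 VMs is an average-sized stage:

```python
vms = 5000              # approximate number of test VMs, from the slide
cpus_per_stage = 16     # average stage size, from the slide
ram_gb_per_stage = 64

total_cpus = vms * cpus_per_stage              # 80,000 CPUs
reduction = 0.50                               # the slide's 50% reduction
reclaimed_cpus = int(total_cpus * reduction)   # 40,000 CPUs

print(f"Total CPUs: {total_cpus:,}")
print(f"Reclaimed CPUs at {reduction:.0%} reduction: {reclaimed_cpus:,}")
print(f"Total RAM: {vms * ram_gb_per_stage:,} GB")
```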
Roadmap
o Using network isolation, which was introduced in Mesos 0.22.0
o Elastic scaling based on usage patterns
o Merging clusters with CI
o Exploring Docker for containerizing pools
Vision
With the click of a button, you can create your own environment with components at specific versions.
Q&A
Thanks