Post on 14-Jun-2015
Cluster Management and Containerization Benjamin Hindman, @benh
Twitter, Inc.
$ whoami
2006 -‐ 2011 2009 -‐ 2010 -‐
cluster management
cluster management (server/IT automation)
cluster management
① configuration/package management
② deployment
③ naming
cluster management
① configuration/package management
② deployment
③ naming
④ monitoring
cluster management
① configuration/package management
② deployment
③ naming
④ monitoring
ops
cluster management
① configuration/package management
② deployment
③ naming
④ monitoring
ops developers
cluster management
configuration/package management
naming deployment
configuration/package management
“what/how do things get installed?”
(10’s of machines)
hosts.txt
web1.twttr.com web2.twttr.com web3.twttr.com web4.twttr.com
$ ssh host ./configure && make install
configuration/package management
“what/how do things get installed?”
(10’s of machines)
hosts.txt
web1.twttr.com web2.twttr.com web3.twttr.com web4.twttr.com
$ ssh host rpm -‐ivh pkg-‐x.y.z.rpm
deployment
“what should run where?” “how should it be started/stopped?”
$ ssh host nohup myapp (10’s of machines)
hosts.txt
web1.twttr.com web2.twttr.com web3.twttr.com web4.twttr.com
deployment
“what should run where?” “how should it be started/stopped?”
$ ssh host monit start myapp (10’s of machines)
hosts.txt
web1.twttr.com web2.twttr.com web3.twttr.com web4.twttr.com
deployment
“what should run where?” “how should it be started/stopped?”
$ scp myapp host $ ssh host monit myapp
(10’s of machines)
hosts.txt
web1.twttr.com web2.twttr.com web3.twttr.com web4.twttr.com
deployment
“what should run where?” “how should it be started/stopped?”
(10’s of machines)
$ ssh host git pull && \ monit myapp
hosts.txt
web1.twttr.com web2.twttr.com web3.twttr.com web4.twttr.com
“how should apps find each other?”
naming
webhosts.txt
web1.twttr.com web2.twttr.com web3.twttr.com web4.twttr.com
dbhosts.txt
db1.twttr.com db2.twttr.com db3.twttr.com db4.twttr.com
(10’s of machines)
(10’s of machines)
(100’s -‐> 1000’s of machines)
to scale, need more automation
Twitter, circa 2010
webhosts.txt dbhosts.txt
$ ssh host …
(configuration/package management)
(deployment)
MySQL Cassandra Rails Hadoop memcached
challenges
challenges ① failures
failures
MySQL Cassandra Rails Hadoop memcached
MySQL Cassandra Rails Hadoop memcached
types of failure: fault domains
machine (disk, memory, CPU, etc)
rack (switch, PDU)
datacenter
challenges ② maintenance
(aka “planned failures”)
maintenance ① upgrading software (i.e., installing and
uninstalling packages)
ops developers
maintenance ① upgrading software (i.e., installing and
uninstalling packages)
② replacing machines, switches, PDUs, etc
MySQL Cassandra Rails Hadoop memcached
MySQL Cassandra Rails Hadoop memcached
challenges ③ utilization
Rails
Hadoop
memcached
utilization
utilization
Rails
Hadoop
memcached
utilization
Rails
Hadoop
memcached buy less machines
or run more applications!
challenges ① failures
② maintenance
③ utilization
challenges ① failures
② maintenance
③ utilization
planning for failure?
planning for failure
challenges ① failures
② maintenance
③ utilization
planning for utilization?
planning for utilization intra-‐machine resource sharing:
share a single machine’s resources between multiple applications (multi-‐tenancy)
intra-‐datacenter resource sharing:
share multiple machine’s resources between multiple applications
Twitter, circa 2010
what software can help me!?
cluster management
industry academia
different software
academia industry
• MPI (Message Passing Interface) • Apache (mod_perl, mod_php) • web services (Java, Ruby, …)
different scale (at first)
academia industry
• 1,000’s of machines • 10’s of machines
cluster management
academia industry
• PBS (Portable Batch System) • TORQUE • SGE (Sun Grid Engine)
• ssh • Puppet/Chef • Capistrano/Ansible
cluster managers
cluster manager provides a level-‐of-‐indirection between hardware resources (machines) and applications/jobs
MySQL Cassandra Rails Hadoop memcached
cluster manager
Rails Hadoop memcached …
…
cluster management
academia industry
• PBS (Portable Batch System) • TORQUE • SGE (Sun Grid Engine)
• ssh • Puppet/Chef • Capistrano/Ansible
batch computation!
Mesos is a modern general purpose cluster manager (i.e., not just focused on batch scheduling)
Mesos
service batch storage …
…
streaming
support many different types of computation/scheduling
Mesos
service batch storage … streaming
(1) coordinate for resources
Mesos
service batch storage … streaming
(2) launch tasks
Mesos
service batch storage … streaming
(3) launch tasks
Mesos
service batch storage … streaming
Mesos
service batch storage … streaming
Mesos
service batch storage … streaming
(4) task termination
Mesos
service batch storage … streaming
(5) task status update
challenges revisited ① failures
② maintenance
③ utilization
challenges revisited ① failures
② maintenance
③ utilization
Mesos
service batch storage … streaming
Mesos
service batch storage … streaming
Mesos
service batch storage … streaming
Mesos
service batch storage … streaming
(5) task status update
challenges revisited ① failures
② maintenance
③ utilization
Mesos
service batch storage … streaming
(1) when resources become idle, can be scheduled and reused by other schedulers
Mesos
service batch storage … streaming
(1) when resources become idle, can be scheduled and reused by other schedulers
Mesos
service batch storage … streaming
(1) when resources become idle, can be scheduled and reused by other schedulers
Mesos
service batch storage … streaming
(1) when resources become idle, can be scheduled and reused by other schedulers
Mesos
service batch storage … streaming
(1) when resources become idle, can be scheduled and reused by other schedulers
Mesos
service batch storage … streaming
(1) when resources become idle, can be scheduled and reused by other schedulers
(2) multi-‐tenancy on individual machines
Mesos
service batch storage … streaming
(1) when resources become idle, can be scheduled and reused by other schedulers
(2) multi-‐tenancy on individual machines
multi-‐tenancy
task!
task!
containers
task!
containerization started leveraging containerization technology
in ~2011
2011
LXC
2012
cgroups
2013
Docker (preliminary)
2014
how Mesos has changed cluster management at Twitter today
configuration/package management
developers
(1) bundle services as jar, tar/gzip
(2) upload to HDFS
configuration/package management (planning)
developers
(1) bundle services as jar, tar/gzip, and using Docker
(2) upload to HDFS (or use registry)
deployment
Apache Aurora (incubating), a scheduler for running stateless services written in any language (but primarily used at Twitter for JVM services)
deployment (via Aurora)
developers
(1) describe service using Python based DSL
(2) submit service to Aurora using CLI
deployment (via Marathon)
developers
(1) describe services using JSON
(2) submit service to Marathon via REST
naming
Apache ZooKeeper
using Apache ZooKeeper and server sets (github.com/twitter/commons)
naming
(1) task gets launched on machine
Apache ZooKeeper
using Apache ZooKeeper and server sets (github.com/twitter/commons)
naming
(2) service gets registered in a server set in ZooKeeper
(1) task gets launched on machine
Apache ZooKeeper
using Apache ZooKeeper and server sets (github.com/twitter/commons)
naming
(2) service gets registered in a server set in ZooKeeper
(1) task gets launched on machine
(3) other services use ZooKeeper to find services they need
Apache ZooKeeper
using Apache ZooKeeper and server sets (github.com/twitter/commons)
naming
(2) service gets registered in a server set in ZooKeeper
(1) task gets launched on machine
(3) other services use ZooKeeper to find services they need
(4) services connect directly with one another
Apache ZooKeeper
using Apache ZooKeeper and server sets (github.com/twitter/commons)
naming alternative
(2) update HAProxy with new service location
(1) task gets launched on machine
(3) other services send traffic through HAProxy
ZooKeeper/server sets requires injecting code into your clients!
where are we today?
ops developers
where are we today?
ops developers
deploys decoupled from ops (many deploys per day, per service)
maintenance consists of “draining” hosts, getting tasks rescheduled, then pulling the cord
wait … don’t virtual machines solve my cluster management
challenges?
wait … don’t virtual machines solve my cluster management
challenges?
No.
VMs are neither sufficient nor necessary!
big computers
small applications
IaaS public private
challenges revisited ① failures
② maintenance
③ utilization
challenges revisited ① failures
② maintenance
③ utilization
public or private IaaS, failures still occur (on EC2, instead of racks, have availability zones, instead of datacenters, have regions)
challenges revisited ① failures
② maintenance
③ utilization provider wins with public IaaS, better resource sharing with private IaaS, but a static partition of VMs is still a static partition!
MySQL Cassandra Rails Hadoop memcached
cluster manager
Rails Hadoop memcached …
…
Mesos: level of abstraction
Mesos build and run using resources
Mesos: level of abstraction
IaaS
Mesos
provision and manage machines
build and run using resources
Mesos on IaaS
IaaS
Mesos
use OpenStack or EC2 to run Mesos
Mesos on IaaS/hardware
IaaS
Mesos
hardware use OpenStack or EC2 or physical machines
to run Mesos
physical machines virtual machines
aggregation not virtualization
physical machines datacenter computer
aggregation not virtualization
small computers
?
big applications
"small" computers?
power wall
time
complex single core
simple many core
"big" applications?
applications don’t fit on a single computer anymore
"BIG"
(1) lots of data … (2) lots of users …
growing everyday
these applications need lots of resources (CPUs, memory, I/O)
these applications need datacenters
the datacenter is the new computer
desktop computer
server datacenter
OS
OS
OS
the datacenter computer needs an OS
operating system
“a collection of software that manages the computer hardware resources and provides common services for computer programs”
- Wikipedia
datacenter operating system
“a collection of software that manages the datacenter computer hardware resources and provides common services for computer programs”
- Wikipedia
datacenter operating system
“a collection of software that manages the datacenter computer hardware resources and provides common services for computer programs”
- Wikipedia
today
Your App
API
tomorrow
datacenter OS provides common functionality every new distributed system re-‐implements:
• failure detection • package distribution • task starting • resource isolation • resource monitoring • task killing, cleanup • …
today
Your App
API
tomorrow
provides common functionality every new distributed system re-‐implements:
• failure detection • package distribution • task starting • resource isolation • resource monitoring • task killing, cleanup • …
datacenter OS
Don’t reinvent the wheel!
case studies
distributed in-‐memory analytics framework
distributed cron scheduler (with dependencies)
github.com/apache/spark
github.com/airbnb/chronos
~Presentation()
cluster management w/ Docker + Mesos
① configuration/package management
② deployment
③ naming
datacenter OS
IaaS
Mesos
hardware
provide common functionality via an API (kernel)
your distributed system
Mesos 0.19.0 released today!
mesos.apache.org
mesos.apache.org/blog
@ApacheMesos