Enabling Hybrid Workflows with Docker/Mesos @Orbitz
Transcript of Enabling Hybrid Workflows with Docker/Mesos @Orbitz
#mesoscon2015
Steve Hoffman
Senior Principal Engineer
@bacoboy
If you haven’t heard of us…
• Multiple Brands
• Websites
• Webservices
• Multiple Backends
• 500+ apps / thousands of instances
• Deployments Daily (sometimes more)
Mesos as…
• … a Microservices Platform using Docker
• … a Jenkins Build Farm
• … a Jenkins Deployment Farm
• HA Notes
• CentOS Notes
• Questions
Case 1: Docker Microservices Platform
• Launch Docker apps in
• Multiple environments (dev -> qa -> staging -> production)
• Multiple datacenters
• Update Docker apps in rolling fashion
• Restart anything that needs it
[Diagram: delivery pipeline. Labels: Code Review & Push, Build, Unit Test, Deploy Dev, Acceptance Test, Deploy Staging, Deploy Prod, Open RFC, Close RFC; grouped under Pre-Production and Production.]
Deploy
[Animation frames: the Deploy step runs "- tasks: marathon: …" against three running instances of version 1.2.16 and issues the request below.]
PUT /apps/editorial-module { "image": "orbitz/editorial-module:1.2.17" … }
app = GET /v2/apps/editorial-module
if not app then
  deploy_id = POST /v2/apps { "image": "orbitz/editorial-module:1.2.17", "id": "editorial-module" }
else
  deploy_id = PUT /v2/apps/editorial-module { "image": "orbitz/editorial-module:1.2.17" }
end if
while GET /v2/deployments contains deploy_id
  // still deploying
end
// deploy complete
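The pseudocode above could be sketched in Python roughly as follows. This is a minimal sketch, not the deployer used in the talk. The `http` callable, the `deploy` and `fake_marathon` names, and the response fields (`deploymentId` for PUT, `deployments[].id` for POST) are assumptions for illustration. The request bodies are abbreviated as on the slide; a real Marathon app definition carries more fields.

```python
import time
from typing import Any, Callable, Optional, Tuple

# Hypothetical transport: http(method, path, body) -> (status_code, parsed_json)
Http = Callable[[str, str, Optional[dict]], Tuple[int, Any]]


def deploy(http: Http, app_id: str, image: str, poll_seconds: float = 1.0) -> str:
    """Create or update an app, then block until its deployment finishes."""
    status, _ = http("GET", f"/v2/apps/{app_id}", None)
    if status == 404:
        # App does not exist yet: create it.
        _, created = http("POST", "/v2/apps", {"id": app_id, "image": image})
        deploy_id = created["deployments"][0]["id"]  # assumed response shape
    else:
        # App exists: update it in place.
        _, updated = http("PUT", f"/v2/apps/{app_id}", {"image": image})
        deploy_id = updated["deploymentId"]  # assumed response shape

    # The deployment is done once its id no longer appears in /v2/deployments.
    while True:
        _, running = http("GET", "/v2/deployments", None)
        if all(d["id"] != deploy_id for d in running):
            return deploy_id
        time.sleep(poll_seconds)  # still deploying


def fake_marathon() -> Http:
    """In-memory stand-in for Marathon so the sketch runs without a cluster."""
    state = {"apps": {}, "polls_left": 2}

    def http(method: str, path: str, body: Optional[dict]) -> Tuple[int, Any]:
        if method == "GET" and path.startswith("/v2/apps/"):
            app = state["apps"].get(path.rsplit("/", 1)[-1])
            return (200, app) if app else (404, None)
        if method == "POST" and path == "/v2/apps":
            state["apps"][body["id"]] = body
            return 201, {"deployments": [{"id": "deploy-1"}]}
        if method == "PUT":
            state["apps"][path.rsplit("/", 1)[-1]].update(body)
            return 200, {"deploymentId": "deploy-2"}
        if method == "GET" and path == "/v2/deployments":
            # Report the deployments as in flight for the first poll only.
            state["polls_left"] -= 1
            ids = ["deploy-1", "deploy-2"] if state["polls_left"] > 0 else []
            return 200, [{"id": i} for i in ids]
        return 400, None

    return http


marathon = fake_marathon()
first = deploy(marathon, "editorial-module", "orbitz/editorial-module:1.2.16", 0)
second = deploy(marathon, "editorial-module", "orbitz/editorial-module:1.2.17", 0)
```

Passing the transport in as a function keeps the create-or-update logic separate from the HTTP client, so the same function can be exercised against the fake above or wired to a real client.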
[Animation frames: three 1.2.17 instances start alongside the three 1.2.16 instances; /health is checked and returns 200 OK three times; afterwards only the three 1.2.17 instances remain.]
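The frames above can be read as: start the new version next to the old one, wait for /health to return 200 OK, then retire the old instances. The toy model below illustrates only that ordering. It is not Marathon's scheduler; `Instance`, `rolling_upgrade` and the `is_healthy` callback are hypothetical names, and keeping the old version when a check fails is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Instance:
    version: str
    healthy: bool = False


def rolling_upgrade(running: List[Instance], new_version: str,
                    is_healthy: Callable[[Instance], bool]) -> List[Instance]:
    """Return the instances that serve traffic after the upgrade attempt."""
    # 1. Start as many new instances as there are old ones.
    new = [Instance(new_version) for _ in running]
    # 2. Probe each new instance (stand-in for GET /health -> 200 OK).
    for inst in new:
        inst.healthy = is_healthy(inst)
    # 3. Retire the old version only if every new instance passed (assumption).
    if all(inst.healthy for inst in new):
        return new
    return running  # keep serving from the old version


old = [Instance("1.2.16", healthy=True) for _ in range(3)]
upgraded = rolling_upgrade(old, "1.2.17", lambda inst: True)
kept = rolling_upgrade(old, "1.2.17", lambda inst: False)
```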
And off to the next environment…
What if?
[Animation frames: one of the three 1.2.17 instances disappears; a replacement 1.2.17 instance appears, /health returns 200 OK, and three instances are running again.]
Video: http://bit.ly/oww-dockercon2015-video
Slides: http://bit.ly/oww-microservices-dockercon2015
Case 2: The Build Farm
• Existing Solution
• Dedicated Jenkins Slaves
• Hand created
• Snapshotted & Rolled Back to “Clean” state after each Job
• Hard to Manage Build Environment for 300+ apps across many OSes, Java versions, Ruby versions, Perl versions, Python versions, protocol buffer compiler versions, etc.
[Diagram, Before: a Commit/Push or Pull Request/Merge reaches the Master via Poll or Push Trigger; the Slave does Clone & Build and Push Artifacts.]
[Diagram, After: the same flow (Commit/Push or Pull Request/Merge, Poll or Push Trigger, Clone & Build, Push Artifacts), now drawn with two Master boxes and two Slave boxes.]
Jenkins mesos-plugin
https://github.com/jenkinsci/mesos-plugin
Plugin Configuration
• Don’t forget to install libmesos on Jenkins master!
• Point at ZK to find active master
• Leave framework running
In Mesos Console…
Framework Registered
Plugin Configuration Per Slave Type
• Jenkins Slave Tag
• Which Mesos Servers this job can use
• Jenkins Slave Image
• 1 job per docker slave
• Short timeout for single use
• RAM/CPU needed
Job Configuration
• Run on specific Docker slave
• Mark offline immediately
Running Docker Slaves
Success!
• Ephemeral Docker Slave in Mesos
• Its tag
• Docker IP
Case 2+: The Docker Build Farm
• I ALSO need to build my jenkins/docker slaves
• Need a jenkins docker slave to build docker images…
• Can I do Docker-in-Docker-on-Mesos?
• Will need to build manually the first time
http://kb.sparknearby.com/wp-content/uploads/2015/05/chicken-or-egg-cropped1.jpg
The docker-builder
• Jenkins Slave (Java - jre)
• Docker daemon (running & in supervisor mode!)
• Registry credentials provided to slave via Credentials Binding Plugin from Jenkins managed security
• https://wiki.jenkins-ci.org/display/JENKINS/Credentials+Binding+Plugin
• Reuse docker layers (aka share /var/lib/docker)
wrapdocker script
• Start Docker daemon and then start jenkins slave
• https://github.com/jpetazzo/dind/blob/master/wrapdocker
• I had to change (variable substitution wasn’t working):
    [[ $1 ]] && exec "$@"
  to:
    [[ $1 ]] && eval exec $@
https://blog.docker.com/2013/09/docker-can-now-run-within-docker/
docker-builder Dockerfile
FROM docker.orbitz.net/centos:7
MAINTAINER Steve Hoffman <[email protected]>

# Need to override default YUM repos
# and DNS resolution
RUN rm /etc/yum.repos.d/*
ADD src/repos/*.repo /etc/yum.repos.d/
ADD src/dns/resolv.conf /etc/

RUN \
  # load repo metadata from above
  yum clean all && yum makecache && \
  # install packages (jenkins needs at least
  # java and git)
  yum install -y jre1.8.0_51 git docker-engine && \
  # update everything not already newer
  yum update -y

# For git to work in bridged mode, we need to setup user identity
ADD src/git/gitconfig /root/.gitconfig

# Include helper script
ADD src/wrapdocker /usr/local/bin/wrapdocker

# Mount docker daemon storage
VOLUME /var/lib/docker
Plugin Configuration Additions
• Privileged Mode
• Docker Builder Image
• Wrapper script to start Docker daemon then run Jenkins slave
• Shared Docker layers for reuse
• Additional Docker daemon options
Job Configuration
• Bind Docker registry credentials to ENV variable
• Copy to user’s ~/.dockercfg
• Cleanup!
• Run on docker-builder
• Build
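The job steps above (bind the registry credentials to an ENV variable, copy them to ~/.dockercfg, clean up afterwards) might look like the following sketch. The slide does not show the actual job script, so the variable name `DOCKER_REGISTRY_AUTH`, its `user:password` format and the helper name `dockercfg` are assumptions. The file layout follows the legacy `.dockercfg` JSON format, which maps a registry to a base64 `auth` string.

```python
import base64
import json
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def dockercfg(registry: str, home: str, env_var: str = "DOCKER_REGISTRY_AUTH"):
    """Write ~/.dockercfg from an ENV variable and remove it when done."""
    user_pass = os.environ[env_var]  # assumed format: "user:password"
    auth = base64.b64encode(user_pass.encode()).decode()
    path = os.path.join(home, ".dockercfg")
    # Create the file readable by the owner only, since it holds a secret.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as handle:
        json.dump({registry: {"auth": auth, "email": ""}}, handle)
    try:
        yield path
    finally:
        os.remove(path)  # the "Cleanup!" step


os.environ["DOCKER_REGISTRY_AUTH"] = "builder:s3cret"  # normally set by Jenkins
with tempfile.TemporaryDirectory() as home:
    with dockercfg("docker.orbitz.net", home) as cfg_path:
        with open(cfg_path) as handle:
            written = json.load(handle)
        # "docker build" and "docker push" would run here
    cleaned_up = not os.path.exists(cfg_path)
```

Using a context manager means the credentials file is removed even when the build step raises, which matters more once slaves are reused instead of being single-use.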
For Example: The go-builder
• Jenkins Slave
• Compile Go program
• Package as Docker app
go-builder Dockerfile
FROM docker.orbitz.net/docker-builder:latest
MAINTAINER Steve Hoffman <[email protected]>

RUN mkdir -p go/{bin,pkg,src}
ENV GOPATH /go

RUN \
  # install packages
  yum install -y golang && \
  # update everything not already newer
  yum update -y && \
  # remove local caching repos
  yum clean all
Case 3: The Deploy Farm
• Create single purpose client images with tools baked in
• Run corresponding Jenkins work against that slave
• Not just talking to Marathon… Talk to anything…
The marathon-deployer
• Launch/Upgrade/Downgrade Docker apps via Marathon
• Use the marathon Python module
• https://github.com/thefactory/marathon-python
• Inside an Ansible playbook
• http://www.ansible.com
marathon-deployer Dockerfile (template)
FROM docker.orbitz.net/centos:7
MAINTAINER Steve Hoffman <[email protected]>

# Need to override default YUM repos and DNS resolution
RUN rm /etc/yum.repos.d/*
ADD src/repos/*.repo /etc/yum.repos.d/
ADD src/dns/resolv.conf /etc/

RUN \
  # Need java to run a jenkins slave and git
  yum install -y jre1.8.0_31 git

# For git to work in bridged mode, we need to setup user identity
ADD src/git/gitconfig /root/.gitconfig

# extra RPMs stored in git
RUN mkdir /tmp/packages
ADD src/packages/*.rpm /tmp/packages/

RUN \
  yum clean all && yum makecache && \
  yum install -y ansible pythons && \
  rpm -Uvh /tmp/packages/*.rpm

RUN \
  pip install{{ with .HTTPS_PROXY }} --proxy={{ . }}{{ end }} -v marathon && \
  yum install -y python-boto python-requests python-crypto && \
  # update everything not already newer & clean
  yum update -y && \
  yum clean all && rm -rf /tmp/packages
marathon-deployer Jenkins Job
• Run Template Engine To Apply ENV to Dockerfile
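The `{{ with .HTTPS_PROXY }} … {{ end }}` construct in the Dockerfile template resembles Go text/template syntax, but the talk does not name the template engine. To show what "apply ENV to Dockerfile" produces, here is a tiny Python stand-in that handles only that one construct. The `render` helper is hypothetical and the proxy URL is made up.

```python
import re

# Matches "{{ with .NAME }}body{{ end }}" and captures NAME and body.
WITH_BLOCK = re.compile(
    r"\{\{\s*with\s+\.(\w+)\s*\}\}(.*?)\{\{\s*end\s*\}\}", re.DOTALL
)
# Matches the "{{ . }}" placeholder inside a with-block.
DOT = re.compile(r"\{\{\s*\.\s*\}\}")


def render(template: str, env: dict) -> str:
    """Expand with-blocks from env; drop a block whose variable is unset or empty."""
    def expand(match: "re.Match") -> str:
        value = env.get(match.group(1), "")
        if not value:
            return ""
        # A function replacement avoids backslash processing of the value.
        return DOT.sub(lambda _: value, match.group(2))

    return WITH_BLOCK.sub(expand, template)


line = "pip install{{ with .HTTPS_PROXY }} --proxy={{ . }}{{ end }} -v marathon"
with_proxy = render(line, {"HTTPS_PROXY": "http://proxy.example.com:3128"})
without_proxy = render(line, {})
```

In a Jenkins job, `env` would be `os.environ`, so the same template builds both behind a proxy and without one.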
Case 2++/3+: AWS Builder/Deployer
• Build AMIs using Packer in AWS using the amazon-ebs builder
• Build with Jenkins from source in Git with Packer shell provisioner
• Also perform scaling & rolling upgrades via aws-cli
• Needed AWS capable Jenkins Slave…
The aws-monkey
• Jenkins Slave (jenkins user + java)
• AWS CLI, Ansible & Packer (pre-installed)
• http://aws.amazon.com/cli/
• http://www.ansible.com/
• http://packer.io
• AWS credentials provided to slave via Credentials Binding Plugin from Jenkins managed security
• https://wiki.jenkins-ci.org/display/JENKINS/Credentials+Binding+Plugin
http://images.spatiallyadjusted.com/GeoMonkey-AWS.jpg
aws-monkey Dockerfile
FROM docker.orbitz.net/centos:7
MAINTAINER Steve Hoffman <[email protected]>

# Need to override default YUM repos and DNS resolution
RUN rm /etc/yum.repos.d/*
ADD src/repos/*.repo /etc/yum.repos.d/
ADD src/dns/resolv.conf /etc/

RUN \
  # install java and git
  yum clean all && yum makecache && \
  yum install -y jre1.8.0_31 git

# For git to work in bridged mode, we need
# to setup user identity
ADD src/git/gitconfig /root/.gitconfig

RUN \
  # install ansible, packer and other utils
  yum install -y ansible python-boto python-requests packer unzip tar pytz python-dateutil && \
  # install aws-cli (no RPMs - but gets latest)
  curl "https://s3.amazonaws.com/aws-cli/awscli-bundle.zip" -o "awscli-bundle.zip" && \
  unzip awscli-bundle.zip && \
  ./awscli-bundle/install -i /usr/local/aws -b /usr/local/bin/aws && \
  rm -rf ./awscli-bundle awscli-bundle.zip && \
  # update everything not already newer
  yum update -y

ENV LANG en_US.UTF-8
ENV LC_ALL en_US.UTF-8
Plugin Configuration
• Jenkins Slave Tag
• Which Mesos Servers this job can use
• The Jenkins Environment
• 1 job per docker slave
Job Configuration
• Run on specific Docker slave
• Pass AWS Credentials as ENV vars
[Diagram: the delivery pipeline again. Labels: Code Review & Push, Build, Unit Test, Deploy Dev, Acceptance Test, Deploy Staging, Deploy Prod, Open RFC, Close RFC; grouped under Pre-Production and Production.]
HA Mesos
• Run Multiple Masters so no SPOF
• Run Multiple Mesos Clusters for DR/HA
• Per Datacenter
• Per Environment (Production/Pre-Production)
sum(mesos-master) =
• Zookeeper (for election and state)
• Mesos Master (1 elected leader)
• Marathon (1 leader, but all serve UI)
• Chronos (1 leader, but all serve UI)
• Web Server for Authentication for Marathon and Chronos UIs
[Diagram: one master node containing Master, ZK, Marathon, Chronos and httpd.]
[Diagram: Load Balancer in front of the Marathon & Chronos UI/API, with the callouts "Terminate SSL" and "AD/LDAP Authentication".]
Zookeeper Configuration
$ cat /etc/zookeeper/zoo.cfg
server.1=master-1.foo.com:2888:3888
server.2=master-2.foo.com:2888:3888
server.3=master-3.foo.com:2888:3888

$ cat /var/lib/zookeeper/data/myid
1    (on master-1, 2 on master-2, 3 on master-3)

$ echo "stat" | nc localhost 2181 | grep Mode
Mode: leader|follower    (1 leader, 2 followers)
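The `echo "stat" | nc … | grep Mode` check above can be scripted across all three masters. The following is a minimal sketch under stated assumptions: `query_stat`, `parse_mode` and `ensemble_ok` are hypothetical helper names, and the sample `stat` output is made up to keep the example runnable without a ZooKeeper server.

```python
import re
import socket
from typing import Dict


def query_stat(host: str, port: int = 2181, timeout: float = 2.0) -> str:
    """Send the four-letter 'stat' command, like `echo stat | nc host 2181`."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(b"stat")
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:  # the server closes the connection after replying
                break
            chunks.append(data)
    return b"".join(chunks).decode()


def parse_mode(stat_output: str) -> str:
    """Extract the value of the 'Mode:' line (the `grep Mode` step)."""
    match = re.search(r"^Mode:\s*(\w+)", stat_output, re.MULTILINE)
    if match is None:
        raise ValueError("no Mode line in stat output")
    return match.group(1)


def ensemble_ok(modes: Dict[str, str]) -> bool:
    """Expect exactly one leader, with every other node a follower."""
    values = list(modes.values())
    return values.count("leader") == 1 and values.count("follower") == len(values) - 1


# Made-up sample output; in practice use query_stat("master-2.foo.com").
sample = "Zookeeper version: 3.4.6\nClients:\n\nMode: follower\nNode count: 42\n"
modes = {
    "master-1.foo.com": "leader",
    "master-2.foo.com": parse_mode(sample),
    "master-3.foo.com": "follower",
}
healthy = ensemble_ok(modes)
```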
Mesos Master Configuration (mesosphere rpms)
$ echo "zk://master-1.foo.com:2181,master-2.foo.com:2181,master-3.foo.com:2181/mesos" > /etc/mesos/zk
$ hostname > /etc/mesos-master/hostname
$ hostname -I > /etc/mesos-master/ip
$ echo "WARNING" > /etc/mesos-master/logging_level
$ echo "2" > /etc/mesos-master/quorum
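The quorum value of 2 follows from the three masters listed in /etc/mesos/zk: the quorum is a strict majority of the masters. A small sketch of that arithmetic, with hypothetical helper names `quorum` and `zk_url`:

```python
from typing import List


def quorum(master_count: int) -> int:
    """Smallest strict majority of the masters."""
    return master_count // 2 + 1


def zk_url(hosts: List[str], port: int = 2181, path: str = "/mesos") -> str:
    """Build the zk:// URL that the slide writes to /etc/mesos/zk."""
    return "zk://" + ",".join(f"{host}:{port}" for host in hosts) + path


masters = [f"master-{n}.foo.com" for n in (1, 2, 3)]
url = zk_url(masters)
needed = quorum(len(masters))
```

With five masters the same formula gives 3, so the value in /etc/mesos-master/quorum has to change whenever the number of masters does.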
Mesos Slave Configuration (mesosphere rpms)
$ echo "zk://master-1.foo.com:2181,master-2.foo.com:2181,master-3.foo.com:2181/mesos" > /etc/mesos/zk
$ hostname > /etc/mesos-slave/hostname
$ hostname -I > /etc/mesos-slave/ip
$ echo "WARNING" > /etc/mesos-slave/logging_level
$ echo "5mins" > /etc/mesos-slave/executor_registration_timeout
$ echo "10secs" > /etc/mesos-slave/docker_stop_timeout
$ echo "docker,mesos" > /etc/mesos-slave/containerizers
$ cat /etc/mesos-slave/attributes
service_cluster:jenkins
chef_environment:production
CentOS Notes
• Setup docker bridge yourself and use --bridge flag
• http://backreference.org/2010/07/28/linux-bridge-mac-addresses-and-dynamic-ports/
• OUR most stable/performant config for Jenkins docker nodes seems to be:
• CentOS 7.1
• kernel 4.1.2 (to get overlayfs)
• --storage-driver=overlayfs (on ext4 /var/lib/docker partition)
• BUT…Beware https://bugzilla.redhat.com/show_bug.cgi?id=1213602 for DinD Yum operations (will be fixed in CentOS 7.2). Workaround in bug report
Summary
• Jenkins + Mesos + Marathon = Kick Ass Docker Platform
• Can do multiple Jenkins in 1 Mesos Cluster
• Can do multiple Mesos clusters with 1 Jenkins
• Works with RHEL/CentOS! (also Ubuntu, so I’m told)
• Separate Mesos clusters if it makes sense
• Make special purpose Mesos Slaves if it makes sense
• Mesos Slaves in VMs are OK - it’s what you get in the public cloud anyway
Finally…
• Don’t force everybody into the same box
• Instead — Manage the boxes (all shapes and sizes) consistently
• What goes in the box is always changing!
Questions?