Container Days

Easing Your Way Into Docker: Lessons From a Journey to Production. ContainerDays NYC, October 30, 2015

Transcript of Container Days

Page 1: Container Days

Easing Your Way Into Docker: Lessons From a Journey to Production

ContainerDays NYC October 30, 2015

Page 2: Container Days

Who are we?

Steve Woodruff

❏ DevOps Engineer at SpareFoot implementing CI/CD

❏ Spent 10+ years at Motorola doing embedded development (C, C++)

❏ Spent 5 years at IBM as a sys admin in a large server farm (Linux, AIX, Solaris)

[email protected]
Twitter: @sjwoodr
GitHub: sjwoodr

Patrick Mizer

❏ Software Engineer at SpareFoot (6 years)

❏ 12 years as a developer in consumer web

❏ Convinced Steve to keep letting us play w/ containers even after we messed it up countless times

[email protected]
GitHub: maximizer

Page 3: Container Days

● Think Hotels.com for self storage*
● All infrastructure in AWS
● 40 Developers on 7 Teams
    ○ Continuous Delivery
● Docker in production since 2014

*This kind of storage:

Page 4: Container Days

What We Will Talk About

● We solved some problems with Docker containers
● We started small and eventually got to production
● We ffff...messed up a lot along the way

This is the talk that we would have liked to see before we learned these lessons the hard way...

Page 5: Container Days

Chapter 1 - Hackathons and Docker

Page 6: Container Days

The Beginning: SpareFoot + Docker

Hackathon! Docker + Fig (now Compose) allowed us to run our production architecture locally.

Page 7: Container Days

The Development Environment

● We want to be as close as possible to production

● We want the development environment to be as fast as possible for interactive use.

Page 8: Container Days

Developing Locally

% apt-get install...
% brew install…
% make && make install…

Page 9: Container Days

Developing Locally

Virtual Machine

% vagrant up

Page 10: Container Days

Complexity ++

[diagram: App -> Service -> Data, talking over a JSON / REST API]

Page 11: Container Days

Aha: Vagrant + Docker?

Virtual Machine

% vagrant up

[diagram: containers inside the VM - App 1, App 2, Redis, Solr Search, MySQL DB]

Page 12: Container Days

Putting it together with Compose

Virtual Machine

% vagrant up
% fig up

[diagram: the same containers - App 1, App 2, Redis, Solr Search, MySQL DB - now managed by fig]
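A fig.yml in this spirit is what let `% fig up` bring the whole stack up inside the Vagrant VM. This is a hypothetical sketch, not SpareFoot's actual file - the service names, images, ports, and paths are assumptions:

```yaml
# Hypothetical fig.yml for the stack in the diagram above
app1:
  build: ./app1
  ports:
    - "3000:3000"
  volumes:
    - ./app1:/src        # synced folder, so edits are picked up live
  links:
    - mysql
    - redis
    - solr
app2:
  build: ./app2
  links:
    - mysql
    - redis
mysql:
  image: mysql:5.6
  environment:
    MYSQL_ROOT_PASSWORD: devpassword
redis:
  image: redis
solr:
  image: solr              # assumed image name
```

Fig's v1 format (flat service keys, `links` instead of networks) is the one Compose later inherited, which is why these files aged well.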

Page 13: Container Days

Lessons Learned

● Docker creates application isolation in a super lightweight way
    ○ We can model our production infrastructure locally
● Compose is fantastic for the local dev environment
● Vagrant + Docker gets us an interactive local dev environment via synced folders and volumes
● We got to cut our teeth on Docker

Page 14: Container Days

Chapter 2 - Docker in Production (kind of)

Page 15: Container Days

Sparephone

(512)921-5050

Page 16: Container Days

Sparephone

Page 17: Container Days

Sparephone

Page 18: Container Days

Yim - Call Center Application

● Used exclusively by our call center
    ○ Chrome ONLY
● Node version n+1
● React + Flux

Vers. n+1

Page 19: Container Days

Yim - Call Center Application

● Used exclusively by our call center
    ○ Chrome ONLY
● Node version n+1
● React + Flux

Vers. n+1

Vers. n+1

Page 20: Container Days

Yim - Call Center Application

● Used exclusively by our call center
    ○ Chrome ONLY
● Node version n+1
● React + Flux

Vers. n+1

Vers. n+1

Vers. n

Page 21: Container Days

Yim - Call Center Application

● Used exclusively by our call center
    ○ Chrome ONLY
● Node version n+1
● React + Flux

Vers. n+1

Vers. n+1

Vers. n

Page 22: Container Days

Separation of concerns

Page 23: Container Days

Encapsulation

Page 24: Container Days

Inside of the container

● Code● Libraries● Package Manager

Page 25: Container Days

Interface

Page 26: Container Days

Outside of the container

● Orchestration● Logging● Monitoring● Configuration Management

Page 27: Container Days

Docker?

App 1

Page 28: Container Days

OK, so Docker feels like a solution… and we kind of know how to do this. But....

● Continuous Integration / Delivery?
    ○ Docker Registry
    ○ Bamboo
    ○ Deployments

● Host Volumes and Port Forwarding rules?
    ○ Not saved with the source code

● Get Docker to run in local, dev, staging, and production environments?
    ○ Configuration?

Page 29: Container Days

CI and Deployments

Janky shell scripts… slow builds, etc…

● Used Bamboo to build images
    ○ Feature branches were built/deployed to Dev
    ○ The master branch was built/deployed to Staging
● Dynamically created a custom container start script
● Tried to auto-detect when the containers started to begin post-deploy tests
● Build times were rather long
● Spent an awfully long time doing docker push (to our registry) and docker pull (on the target hosts)

Page 30: Container Days

Host Volumes and Port Forwarding Rules

● Exposed / published ports were handled via a text file we parsed at build time
● Tried to accommodate the future when we’d have more apps/containers
● Host volumes that had to be mounted were hard-coded in the Bamboo build plan for the app, so they could be added to that dynamically created container start script
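The build-time parsing might have looked roughly like this. The file name, format, and app name here are guesses for illustration, not the actual Bamboo scripts:

```shell
#!/bin/sh
# ports.txt holds one HOSTPORT:CONTAINERPORT pair per line.
printf '3000:3000\n8125:8125\n' > ports.txt

# Turn each mapping into a -p flag for the generated start script.
port_flags=""
while read -r mapping; do
  case "$mapping" in ''|'#'*) continue ;; esac   # skip blanks and comments
  port_flags="$port_flags -p $mapping"
done < ports.txt

# Splice the flags into the dynamically created container start script:
echo "docker run -d$port_flags myapp:latest"
# -> docker run -d -p 3000:3000 -p 8125:8125 myapp:latest
```

The fragility the slides describe follows directly: the mappings live outside the image and outside source control of the service itself.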

Page 31: Container Days

Get Docker Running

Supporting multiple environments:

● Bamboo would deploy rather well to Dev and Staging using these dynamically created scripts
● It felt rather fragile and could break at any time
● Production deploys were scripts that would do a docker pull on several hosts, then kill the old containers on a host, start the new containers on that host, and move on to the next host
● It wasn’t always a zero-downtime deployment

Page 32: Container Days

Docker in Production (technically)!

We had 2 load balanced EC2 instances running a node app.

[diagram: an ELB on port 443 load balancing to two EC2 instances on port 3000]

Page 33: Container Days

Docker in Production (technically)!

We had 2 load balanced EC2 instances running a node app.

Now we have 2 load balanced EC2 instances running docker containers that run a node app!

[diagram: the original ELB setup next to the new one, where the ELB routes port 443 traffic to App 1 containers listening on port 3000]

Page 34: Container Days

Docker in Production (technically)!

[diagram: both setups side by side, the new ELB labeled NEW, each routing port 443 traffic to port 3000 on the instances]

Page 35: Container Days

Yim: Trouble in Docker Paradise

● Hosting our own Docker registry was a bad idea
    ○ Stability was a problem
    ○ No level of access control on the registry itself
● Mimicking servers - 1 container per host. Need orchestration, please!
● Amazon Linux AMI -> old version of Docker… doh!
● Docker push/pull to/from the registry was very slow
    ○ build - push to registry
    ○ deploy - pull from registry to each host, serially
● Performance was fine…
    ○ But stability was the issue
    ○ This internal-facing Node.js app was moved to a pair of EC2 instances and out of Docker after about 4 months of pain and suffering

Page 36: Container Days

Yim: Lessons Learned

● We need orchestration
    ○ Rolling our own Docker deployments was confusing to Ops and to the dev team
● Our own Docker registry is a bad idea
    ○ Stability was a problem
    ○ No level of access control on the registry itself
    ○ Our S3 backend would grow by several GB per month with no automated cleanup
● No easy way to roll back failed deploys
    ○ Just fix it and deploy again...
● All this culminated in a poor build process and affected CI velocity
    ○ Longer builds, longer deploys, no real gain

Page 37: Container Days

Chapter 3 - Microservices

Page 38: Container Days

Like everyone else....

...we are “deconstructing the monolith”

[diagram: Application -> Monolithic Library -> Data]

Page 39: Container Days

Like everyone else....

...we are “deconstructing the monolith”

[diagram: Application -> Monolithic Library -> Data, with a REST API layer being introduced]

Page 40: Container Days

Like everyone else....

...we are “deconstructing the monolith”

[diagram: Application -> Monolithic Library -> Data, with one piece split out into a Microservice with its own Data behind a REST API]

Page 41: Container Days

Like everyone else....

...we are “deconstructing the monolith”

[diagram: Application -> API Gateway -> four Microservices, each with a REST API and its own Data]

Page 42: Container Days

Revisiting The Development Environment

● We want to be as close as possible to production

● We want the development environment to be as fast as possible for interactive use

● We want our microservices isolated

Page 43: Container Days

Revisiting The Development Environment

[diagram: App1 and App2 sharing one MySQL database]

Page 44: Container Days

Revisiting The Development Environment

[diagram: App1/App2 with their MySQL, plus Service1, Service2, and Service3 with a separate MySQL]

Page 45: Container Days

Revisiting The Development Environment

[diagram: App1/App2, Service1-3, and the MySQL instances, with “????” marking the missing connection]

Page 46: Container Days

Revisiting The Development Environment

[diagram: App1/App2 with their MySQL alongside Service1-3 with theirs]

Page 47: Container Days

Revisiting The Development Environment

[diagram: App1/App2 and MySQL, with an HTTP boundary drawn in]

Page 48: Container Days

Revisiting The Development Environment

[diagram: App1/App2 calling Service1-3 over HTTP, with MySQL behind]

Page 49: Container Days

“Tres Vagrantres”

[diagram: three Vagrant VMs - Service1/Service2/Service3, MySQL, and App1/App2 - talking over HTTP]

Page 50: Container Days

“Tres Vagrantres”

[diagram: three Vagrant VMs - Service1/Service2/Service3, MySQL, and App1/App2 - talking over HTTP]

● We want to be as close as possible to production
● We want the development environment to be as fast as possible for interactive use
● We want our microservices isolated

Page 51: Container Days

Bonus: Ownership

[diagram: the same three-VM setup, annotated by owner - Service1-3: Consumer Services; MySQL: Operations / DBA; App1/App2: Feature Team - talking over HTTP]

Page 52: Container Days

Slinging images

[diagram: the Consumer Services team's Service1-3 images flowing through a registry (???)]

Microservice developers push images to the registry. Vagrant pulls images by tag. Access controlled, hopefully.

Page 53: Container Days

A Better Docker Registry

With Yim we learned that rolling our own registry was a bad idea.

● Limited access control
● We have to maintain it

Page 54: Container Days

Let’s try Quay...

● Has access control
    ○ Robots, yusss!

● We don’t have to maintain it

Page 55: Container Days

[diagram: MASTER and BRANCH A builds flowing to the Dev, Staging, and Production environments]

Page 56: Container Days

[diagram: MASTER and BRANCH A builds flowing to the Dev, Staging, and Production environments]

Page 57: Container Days

[diagram: master builds produce service1:prod for Production and service1:stage for Staging; BRANCH A builds produce service1:dev-branch-name for Dev]

Page 58: Container Days

[diagram: the same tags - service1:prod, service1:stage, service1:dev-branch-name - shown without the environment boxes]

Page 59: Container Days

[diagram: the tagged service1 images feeding the local dev environment - Service1-3, App1/App2, and MySQL over HTTP]

Page 60: Container Days

We’ve learned some things...

● Easier than we thought
● Quay was the glue we needed
    ○ Use an off-the-shelf solution
    ○ We like Quay.io
● Bolting on to our existing CI pipeline worked really well
    ○ Developers didn’t have to learn a new process
    ○ Microservice consumers can pull tagged versions
    ○ We can automate tests against all versions

Now we talk containers from local -> dev -> staging, but NOT in production.
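The branch-to-tag mapping behind that pipeline can be sketched as a small helper. The exact naming rules are our assumption, but they match the service1:stage / service1:dev-branch-name scheme shown earlier:

```shell
#!/bin/sh
# Map a git branch to an image tag (naming rules assumed):
#   master           -> stage (promoted to prod on release)
#   any other branch -> dev-<branch>, slashes sanitized
image_tag_for_branch() {
  if [ "$1" = "master" ]; then
    echo "stage"
  else
    echo "dev-$(echo "$1" | tr '/' '-')"
  fi
}

image_tag_for_branch master          # -> stage
image_tag_for_branch feature/search  # -> dev-feature-search
```

Because the tag is derived from the branch, the existing CI jobs only needed a `docker build`/`docker push` step bolted on, which is why developers didn't have to learn a new process.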

Page 61: Container Days

Chapter 4 - To Production! (seriously this time)

Page 62: Container Days

Production - What is still needed

● Orchestration
    ○ Yim sucked because we tried to do this ourselves
● Better deployments
    ○ With rollbacks
● Configuration management
    ○ We have things to hide

Page 63: Container Days

Production - Orchestration

Page 64: Container Days

Production - Orchestration

Page 65: Container Days

Production - Software Selection

● Choosing orchestration software / container service
    ○ StackEngine
        ■ Lacked docker-compose support
    ○ Kubernetes
        ■ PhD required
    ○ Mesosphere
        ■ Nice, but slow to deploy
    ○ EC2 Container Service
        ■ Lacked docker-compose support and custom AMIs
    ○ Tutum
    ○ Rancher

Page 66: Container Days

Production - Enter Rancher

After running proof-of-concepts of both Tutum and Rancher, we decided to continue down our path to production deploys with Rancher.

● Had more mature support for docker-compose files
    ○ Tutum added this after our evaluation had ended
● Did not require us to orchestrate the deployments through their remote endpoint
    ○ The Rancher server runs on our EC2 instances and we are in full control of all the things
● Had a full API we could work with, in addition to the custom rancher-compose CLI
● Had a very active user community and a beta forum where the Rancher development team was active in answering questions and even troubleshooting configuration problems

Page 67: Container Days

Overlaying Docker on AWS

● ELB as a front-end to each service
● The ELB load balances to HAProxy containers
● The HAProxy containers load balance to the service containers

Page 68: Container Days

Overlaying Docker on AWS

[diagram: an ELB in front of EC2 instances]

Page 69: Container Days

Overlaying Docker on AWS

[diagram: an ELB in front of EC2 instances, each running HAProxy and service containers]

Page 70: Container Days

Overlaying Docker on AWS

● Why the extra HAProxy layer?
    ○ Allows us to create the ELBs and leave them alone
    ○ When we deploy new versioned services, we update the service alias / HAProxy links
    ○ Allows for fast rollback to the previous version of the service
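In compose terms, the layering can be sketched like this. The service names, version suffix, and the `rancher/load-balancer-service` pseudo-image are assumptions based on Rancher's compose dialect of the era, not SpareFoot's actual files:

```yaml
# Hypothetical docker-compose.yml fragment for one service's HAProxy layer.
# The ELB targets the haproxy containers; a deploy only repoints the link.
haproxy-bookingservice:
  image: rancher/load-balancer-service   # Rancher-managed HAProxy
  ports:
    - "3000:3000"
  links:
    # the "live" alias points at the current versioned service;
    # a deploy or rollback just repoints this link
    - bookingservice-prod-20151030-b123:live
```

The ELB never changes because it only ever knows about the HAProxy layer; all version churn happens one hop behind it.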

Page 71: Container Days

Deployments and Rollbacks

● Developers can deploy to production whenever they want
    ○ HipChat bot commands to deploy and rollback/revert
● Deployments to each of the 3 environments use rancher-compose to:
    ○ Deploy new versioned services / containers
    ○ Create or update service aliases / HAProxy links
    ○ Delete previous versioned services, except for the current and previous versions
● When things go haywire…
    ○ We simply roll back
    ○ The production deploy creates a docker-compose-rollback.yml file
        ■ Query the Rancher API to get the list of running services
        ■ Allows us to change the HAProxy and service alias links back to the previous version
        ■ Super fast to roll back - no containers need to be spun up!
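A sketch of how that rollback file might be produced. The service name, version string, and file layout are invented for illustration; the real script pulls the list of running services from the Rancher API:

```shell
#!/bin/sh
# Generate docker-compose-rollback.yml pointing the alias back at the
# previous version (values hard-coded here; really fetched from Rancher).
SERVICE=bookingservice
PREVIOUS=prod-20151029-b122

cat > docker-compose-rollback.yml <<EOF
haproxy-${SERVICE}:
  image: rancher/load-balancer-service
  links:
    # repoint the alias; no new containers are spun up
    - ${SERVICE}-${PREVIOUS}:live
EOF

grep "${PREVIOUS}" docker-compose-rollback.yml
```

Rollback is fast precisely because the previous version's containers are still running (the cleanup step keeps live and live-1); only the link changes.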

Page 72: Container Days

Overlaying Docker on AWS

[diagram: ELB -> HAProxy -> service containers on the EC2 instances]

Page 73: Container Days

Overlaying Docker on AWS

[diagram: deploy in progress - ELB, EC2 instances, containers]

Page 74: Container Days

Overlaying Docker on AWS

[diagram: deploy step - ELB, EC2 instances, containers]

Page 75: Container Days

Overlaying Docker on AWS

[diagram: ELB, EC2 instances, containers]

Rollback!

Page 76: Container Days

Technical Challenge - docker-compose

● We needed to support a single docker-compose.yml file, maintained by the developers of an app or service
    ○ They don’t want to maintain local, dev, stage, and prod versions of this file
    ○ Changes to multiple files would be error-prone
    ○ Must support differences in the architecture or configuration of services across environments
    ○ Secret Secret, I’ve got a Secret

Page 77: Container Days

Secret Management

We’re already using SaltStack to manage our EC2 minions (VMs).

● Salt grains are used for some common variables used in Salt states
● Salt pillar data exists, which is configuration data available only to certain minions
● This Salt pillar data is already broken down by environment (dev/stage/prod)
● We should just use this data to dynamically create the docker-compose and rancher-compose files!

Page 78: Container Days

A templated rancher-compose file

{% set sf_env = grains['bookingservice-env'] %}
{% set version = grains['bookingservice-version'] %}

bookingservice-{{ sf_env }}-{{ version }}:
    scale: 1

We use a scale of 1 because we use global host scheduling combined with host affinity so that one container of this service is deployed to each VM of the specified environment (dev/stage/prod). This allows us to spin up a new Rancher host and easily deploy to the new host VM.

Page 79: Container Days

A templated docker-compose file

Page 80: Container Days

A Closer Look

MYSQL_SPAREFOOT_HOST: {{ salt['pillar.get']('bookingservice-dev:MYSQL_SPAREFOOT_HOST') }}

MYSQL_SPAREFOOT_DB: {{ salt['pillar.get']('bookingservice-dev:MYSQL_SPAREFOOT_DB') }}

MYSQL_SPAREFOOT_USER: {{ salt['pillar.get']('bookingservice-dev:MYSQL_SPAREFOOT_USER') }}

MYSQL_SPAREFOOT_PASS: {{ salt['pillar.get']('bookingservice-dev:MYSQL_SPAREFOOT_PASS') }}

MYSQL_SPAREFOOT_PORT: {{ salt['pillar.get']('bookingservice-dev:MYSQL_SPAREFOOT_PORT') }}

APP_LOG_FILE: {{ salt['pillar.get']('bookingservice-dev:APP_LOG_FILE') }}

REDIS_HOST: {{ salt['pillar.get']('bookingservice-dev:REDIS_HOST') }}

REDIS_PORT: {{ salt['pillar.get']('bookingservice-dev:REDIS_PORT') }}
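The pillar data behind those lookups would be shaped roughly like this. The file path and every value are invented (the real ones are secret, which is the point); only the key names come from the slide above:

```yaml
# Hypothetical pillar/bookingservice/dev.sls
bookingservice-dev:
  MYSQL_SPAREFOOT_HOST: mysql-dev.internal.example.com
  MYSQL_SPAREFOOT_DB: sparefoot
  MYSQL_SPAREFOOT_USER: bookingsvc
  MYSQL_SPAREFOOT_PASS: not-the-real-password
  MYSQL_SPAREFOOT_PORT: 3306
  APP_LOG_FILE: /var/log/bookingservice/app.log
  REDIS_HOST: redis-dev.internal.example.com
  REDIS_PORT: 6379
```

One pillar per service and environment means the same templated docker-compose.yml renders correctly for dev, stage, and prod without the developers ever touching secrets.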

Page 81: Container Days

Deployments with rancher-compose

● Deployments to Dev and Staging are done via Bamboo
● Deployments to Production are done by developers via HipChat commands
● In the end, everything is invoking our salt-deploy.py script
    ○ Sets some Salt grains for the target env, version, build ID, and image tag in quay.io
    ○ Services get versioned with a timestamp and Bamboo build ID
    ○ Renders jinja2 / injects Salt grains and pillar data via Salt minion Python code
        ■ caller.sminion.functions['cp.get_template'](cwd + '/docker-compose.yml', cwd + '/docker-compose-salt.yml')
        ■ caller.sminion.functions['cp.get_template'](cwd + '/rancher-compose.yml', cwd + '/rancher-compose-salt.yml')
    ○ Invokes rancher-compose create / up
    ○ Cleanup keeps the live version of a service and the live-1 version. The rest are purged.

Page 82: Container Days

Surprise! Rancher Adds Variable Support

Does the support for interpolating variables, added in Rancher 0.41, deprecate the work we've done with Salt and rendering jinja2 templates?

● No. We already maintain data in grains and pillars, so we just reuse that data.
● Rancher's implementation uses the environment variables on the host running rancher-compose to fill in the blanks.
● It would require logic to load those env variables based on the target env (dev/stage/prod), so we might as well get the data out of the Salt pillar, which has separate pillars for each service, broken down by target environment.

Page 83: Container Days

So we deployed our first microservice and...

Page 84: Container Days

So we deployed our first microservice and...

...Everything worked...

Page 85: Container Days

So we deployed our first microservice and...

...Everything worked...

… Until it didn’t.

Page 86: Container Days

The Day Rancher Died

[diagram: ELB, EC2 instances, containers]

Page 87: Container Days

The Day Rancher Died

[diagram: ELB, EC2 instances, containers]

Page 88: Container Days

The Day Rancher Died

[diagram: ELB, EC2 instances, containers]

Page 89: Container Days

Where are we now?

● 10 microservices in production with Rancher + Docker
    ○ 5-10 deployments per day on average
    ○ Busiest services handling around 50 requests / second
● Consumer-facing applications being containerized in development
    ○ New teams cutting their teeth
    ○ Keep on “Strangling”*

* DO NOT: google image search for “strangling hands”

Page 90: Container Days

Finally

● Start small
● Fail (a lot)
● Move on and apply what you learned

Page 91: Container Days

Thank you!

Slides: http://bit.ly/1S88LBX

Reach out:
● Patrick ([email protected])
● Steve ([email protected], Twitter @sjwoodr)

Questions?