6 Months Sailing with Docker in Production
by @hunglin at @VideoBlocks
About
● A media company: Creative Content Everyone Can Afford
● Since 2011, we have saved our customers more than $5B
● We reached 1M clips of inventory and $1M in marketplace sales faster than any company in history
● 15 engineers (60+ employees in total)
● 9M requests per day, peaking at 300 requests per second
● We deploy about 3 times a week
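As a rough check on what the traffic numbers above mean for capacity, the snippet below computes the average rate implied by 9M requests per day and how far the 300 requests/second peak sits above it. It is plain arithmetic on the slide's figures and assumes the requests are counted over a full 24-hour day.

```python
# Figures from the slide above
REQUESTS_PER_DAY = 9_000_000
PEAK_RPS = 300
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

def average_rps(requests_per_day: int) -> float:
    """Average requests per second over a full 24-hour day."""
    return requests_per_day / SECONDS_PER_DAY

avg = average_rps(REQUESTS_PER_DAY)
peak_to_average = PEAK_RPS / avg

print(f"average: {avg:.1f} req/s")            # average: 104.2 req/s
print(f"peak/average: {peak_to_average:.2f}")  # peak/average: 2.88
```

A peak of roughly 3x the daily average is the kind of gap that the autoscaling goal later in the talk is meant to absorb.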
About Me
● Data Handyman @ Videoblocks
● codingphilosophy.com
● Organizer of the DC Scala meetup (next one on 3/23)
Once Upon a Time...
● Infrastructure: web, Redis, and MySQL servers plus the firewall/load balancer, all managed by Rackspace
● Source Control: GitHub
● Unit Test: CircleCI
● Integration Test: homemade tool + Selenium
● Configuration Management: Chef (by Rackspace)
● Deploy/Rollback: Capistrano (by Rackspace)
● Monitoring: New Relic
Our Team Grows
Goals (or dreams)
❏ Local Dev === Production
❏ Setup Environments with One Command (Push Button Deploy)
❏ Resilient + Autoscaling
❏ Central Logging, Monitoring, and Metrics (Composable)
❏ Add New Service with One Config File
❏ Impenetrable Security
❏ Cloud Provider Independent
❏ Cost Efficient
❏ Minimal Development Time
Plans (in theory)
❏ dockerize everything
❏ run Docker everywhere (docker-compose)
❏ a Docker cluster handles high availability + autoscaling + adding a new service with one config + security
❏ handle credentials with a secret management service
❏ cluster-aware log/metrics collection + a common interface
❏ it's Docker; even Microsoft Azure supports it
❏ use the right tools
Dockerize Everything
● configurable by environment variables
● build time (Docker Hub + docker-machine glitches)
● image size (use Alpine)
● Docker version (use a build machine)
● no state in the Docker image (use Flocker)
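One reading of "configurable by environment variables": the application reads every environment-specific setting at startup, so the same image runs unchanged in dev and prod. The sketch below is in Python for illustration only (the service itself is not necessarily Python). The variable names come from the webhead `docker run` command later in the talk; the `AppConfig` type, the `load_config` helper, and the `localhost` default are hypothetical.

```python
import os
from dataclasses import dataclass
from typing import Mapping

@dataclass(frozen=True)
class AppConfig:
    environment: str
    site: str
    mysql_host: str
    mysql_username: str
    mysql_password: str
    cache_redis_host: str
    fluent_logger_enabled: bool

def _require(env: Mapping[str, str], name: str) -> str:
    """Fail fast at startup if a required variable is missing."""
    try:
        return env[name]
    except KeyError:
        raise RuntimeError(f"missing required environment variable: {name}") from None

def load_config(env: Mapping[str, str] = os.environ) -> AppConfig:
    """Build the app config purely from environment variables (hypothetical helper)."""
    return AppConfig(
        environment=_require(env, "SB_ENVIRONMENT"),
        site=_require(env, "SB_SITE"),
        mysql_host=_require(env, "MYSQL_HOST"),
        mysql_username=_require(env, "MYSQL_USERNAME"),
        mysql_password=_require(env, "MYSQL_PASSWORD"),
        # Hypothetical default so local dev works without a separate Redis host
        cache_redis_host=env.get("CACHE_REDIS_HOST", "localhost"),
        fluent_logger_enabled=env.get("FLUENT_LOGGER_IS_ENABLED", "false").lower() == "true",
    )
```

Failing fast on a missing variable turns a misconfigured container into an immediate startup error rather than a failure deep inside a request.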
Run Docker Everywhere
● Linux only
● need to restart/clean up docker-machine often
● the benefit of PHP is gone
● setting up an IDE/debugger is tricky
● production-only tools (New Relic)
● handling stateful services (in dev and prod)
● docker-compose could be more programmable
● CPU, memory, and disk glitches
● Docker has a new version every month
Docker Cluster
● Apache Mesos
● CoreOS fleet
● AWS ECS
● Kubernetes
● Docker Swarm
The Replacement
[Diagram: an EC2 instance running three containers: loggly, fluentd, and webhead]
# Private IP of this EC2 instance, from the instance metadata service.
# {{ ... }} are template placeholders rendered before these commands run.
HOST_PRIVATE_IP=$(curl -s http://169.254.169.254/latest/meta-data/local-ipv4)

# loggly: syslog receiver on TCP and UDP 514 that forwards to Loggly
docker run -d -p 514:514 -p 514:514/udp \
  --restart=always \
  -e TOKEN=secret \
  -e TAG={{ sb_env }}-{{ sb_site }}-webhead \
  --name loggly \
  sendgridlabs/loggly-docker

# fluentd: receives application events on port 24224;
# the container's own stdout/stderr goes to loggly through the syslog log driver
docker run -d -p 24224:24224 \
  --restart=always \
  --log-driver syslog --log-opt syslog-address=udp://localhost:514 \
  -e FLUENTD_CONF="{{ 'prod.conf' if sb_env == 'prod' else 'fluent.conf' }}" \
  --name fluentd \
  videoblocks/fluentd:{{ sb_config.common.fluentd.docker_image_tag }}

# webhead: the application itself; stdout/stderr goes to loggly via syslog,
# application events go to fluentd on the host's private IP
docker run -d -p 80:80 -p 443:443 -p 3000:3000 -p 3001:3001 -p 3002:3002 \
  --log-driver syslog --log-opt syslog-address=udp://localhost:514 \
  -e SB_ENVIRONMENT='{{ sb_env }}' \
  -e SB_SITE='{{ sb_site }}' \
  -e FLUENT_LOGGER_HOST=$HOST_PRIVATE_IP \
  -e FLUENT_LOGGER_IS_ENABLED='true' \
  -e SPHINX_IP='{{ sb_site }}-sphinx.internal' \
  -e MYSQL_HOST='{{ sb_site }}-rds.internal' \
  -e MYSQL_USERNAME='{{ sb_config[sb_env][sb_site].db.user }}' \
  -e MYSQL_PASSWORD='{{ sb_config[sb_env][sb_site].db.password }}' \
  -e CACHE_REDIS_HOST='{{ sb_site }}-redis.internal' \
  -e JOB_QUEUE_REDIS_HOST='{{ sb_site }}-redis.internal' \
  --name {{ sb_env }}-{{ sb_site }}-webhead \
  {{ sb_config[sb_env].webhead.docker_image }}:{{ deploy_tag }}
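The three commands above share a pattern: a detached container, published ports, the syslog log driver pointing at the local loggly container, and configuration passed as `-e` flags. A small helper can assemble that argument list from a per-service spec, which is one possible approach to the "add a new service with one config file" goal. This is a hypothetical Python sketch: `ServiceSpec` and `docker_run_args` are not from the talk, and the `latest` tag is a placeholder for the templated tag. It only builds the argument list and does not invoke Docker.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

@dataclass
class ServiceSpec:
    name: str
    image: str
    ports: List[str] = field(default_factory=list)  # e.g. "80:80" or "514:514/udp"
    env: Dict[str, str] = field(default_factory=dict)
    restart_always: bool = False
    syslog_address: Optional[str] = None            # e.g. "udp://localhost:514"

def docker_run_args(spec: ServiceSpec) -> List[str]:
    """Return the argv for the `docker run` described by a service spec."""
    args = ["docker", "run", "-d"]
    for port in spec.ports:
        args += ["-p", port]
    if spec.restart_always:
        args.append("--restart=always")
    if spec.syslog_address:
        args += ["--log-driver", "syslog",
                 "--log-opt", f"syslog-address={spec.syslog_address}"]
    for key, value in spec.env.items():
        args += ["-e", f"{key}={value}"]
    args += ["--name", spec.name, spec.image]
    return args

fluentd = ServiceSpec(
    name="fluentd",
    image="videoblocks/fluentd:latest",  # hypothetical tag; the talk templates it
    ports=["24224:24224"],
    env={"FLUENTD_CONF": "fluent.conf"},
    restart_always=True,
    syslog_address="udp://localhost:514",
)
print(" ".join(docker_run_args(fluentd)))
```

Passing the list directly to `subprocess.run(args, check=True)` instead of a shell string avoids quoting problems with values such as passwords.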
The Infrastructure Of One Service
Deployment
● no long tail of DNS
● maybe more expensive than a rolling update, but you get instant rollback to the exact previous code + environment
● but there is no rollback for DB migrations, so we run them first
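As we read this slide, each deploy brings up a complete new copy of the service next to the old one and then switches traffic, instead of updating instances in place. The Python sketch below simulates only the ordering described: run DB migrations first (they cannot be rolled back), bring up the new stack, switch traffic only if the new stack is healthy, and keep the old stack for instant rollback. `Stack`, `Deployer`, and the two callables are hypothetical stand-ins for the real ELB/ASG operations.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

@dataclass
class Stack:
    """Hypothetical stand-in for one immutable copy of the service."""
    image_tag: str

class Deployer:
    def __init__(self, run_migrations: Callable[[str], None],
                 is_healthy: Callable[[Stack], bool]) -> None:
        self.run_migrations = run_migrations
        self.is_healthy = is_healthy
        self.live: Optional[Stack] = None
        self.previous: Optional[Stack] = None
        self.log: List[str] = []

    def deploy(self, image_tag: str) -> bool:
        # 1. Migrations first. There is no migration rollback, so the schema
        #    presumably has to keep working for the old code as well.
        self.run_migrations(image_tag)
        self.log.append(f"migrated for {image_tag}")
        # 2. Bring up a brand-new stack next to the live one.
        candidate = Stack(image_tag)
        # 3. Switch traffic only if the new stack passes its health check.
        if not self.is_healthy(candidate):
            self.log.append(f"{image_tag} unhealthy, live stack unchanged")
            return False
        self.previous, self.live = self.live, candidate
        self.log.append(f"traffic switched to {image_tag}")
        return True

    def rollback(self) -> None:
        # 4. Instant rollback: point traffic back at the untouched previous stack.
        if self.previous is None:
            raise RuntimeError("no previous stack to roll back to")
        self.live, self.previous = self.previous, self.live
        self.log.append(f"rolled back to {self.live.image_tag}")
```

Keeping the previous stack running is what makes rollback instant, and it is also why this approach can cost more than a rolling update: for a while, two full copies of the service exist.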
Wait A Minute... You are using Docker as an AMI
Yes! Docker Image as AMI 2.0
● can run on developers' laptops
● builds a lot faster
● a lot smaller and cheaper
● composable on a single EC2 instance, with no (or minimal) configuration management
Compared to the Original Goals with a Docker Cluster
✓ high availability via ELB health check (careful #1)
✓ autoscaling via ASG scaling policy (careful #2)
❏ add a new service with one config
✓ security
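High availability here depends on the ELB health check: the load balancer periodically requests a configured path on each instance and stops routing traffic to instances that fail it. A minimal health endpoint using only Python's standard library is sketched below. The `/health` path is an assumption (the ELB would be configured to match), and a real check would decide for itself how much of the stack (MySQL, Redis) to probe.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

class HealthHandler(BaseHTTPRequestHandler):
    """Answers the load balancer's health probe; every other path is a 404."""

    def do_GET(self) -> None:
        if self.path == "/health":  # hypothetical path; must match the ELB setting
            body = b"OK"
            self.send_response(200)
        else:
            body = b"not found"
            self.send_response(404)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format: str, *args) -> None:
        pass  # keep frequent health probes out of the logs

def make_server(port: int = 8080) -> HTTPServer:
    """Listen on all interfaces so the ELB can reach the published port."""
    return HTTPServer(("0.0.0.0", port), HealthHandler)
```

To run it, call `make_server().serve_forever()`. The port (8080 here) is also an assumption and would have to be published from the container and match the health check target.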
Log and Metrics Collection
✓ cluster awareness
✓ common interface: stdout, HTTP, syslog
❏ in-app monitoring (New Relic)
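With syslog as a common interface, an application needs nothing Loggly-specific. It writes standard syslog messages to the loggly container, which is published on port 514 of the host (UDP here, matching `-p 514:514/udp` in the setup shown earlier). The sketch below uses the standard library's `SysLogHandler`. The logger name `webhead` and the message format are illustrative assumptions.

```python
import logging
import logging.handlers

def make_syslog_logger(host: str, port: int = 514) -> logging.Logger:
    """Logger that ships records as syslog messages over UDP.

    Inside a container, "localhost" is the container itself, so `host` should be
    the Docker host's private IP, as the talk does for FLUENT_LOGGER_HOST.
    """
    logger = logging.getLogger("webhead")
    logger.setLevel(logging.INFO)
    logger.handlers.clear()  # avoid duplicate handlers if called twice
    handler = logging.handlers.SysLogHandler(address=(host, port))  # UDP by default
    handler.setFormatter(logging.Formatter("webhead: %(levelname)s %(message)s"))
    logger.addHandler(handler)
    return logger
```

The handler sends each record with facility `user` by default. For an INFO record, the message therefore starts with the syslog priority `<14>`.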
Cross Cloud Providers
● need a Docker cluster as the interface for endpoints, service discovery, autoscaling, scheduling, resource allocation, etc.
● data is the high-gravity part
Use The Right Tools
when you have a hammer,
everything looks like your thumb.
The Good - what we achieved
✓ Local Dev ~= Production (without stateful services)
➢ Setup Environments with One Command (Push Button Deploy)
✓ Resilient + Autoscaling
➢ Central Logging, Monitoring, and Metrics (Composable)
❏ Add New Service with One Config File
➢ Impenetrable Security
❏ Cloud Provider Independent
➢ Cost Efficient
➢ Minimal Development Time
The Bad - the limitations we faced
● docker pull issues (Docker Hub, bugs, slowness)
● running Docker on non-Linux machines is not stable
● Docker cluster tooling is not ready (or not easy enough for me)
● not all tools can be controlled by environment variables
● in-app monitoring tools cannot live outside the container
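One common workaround for tools that cannot be controlled by environment variables (not something the talk describes) is a small entrypoint step that renders the tool's config file from the environment before starting the tool. The sketch below uses Python's standard `string.Template`. The template contents and the `render_config`/`write_config` helpers are hypothetical.

```python
import os
from string import Template
from typing import Mapping

# Hypothetical config file for a tool that only reads files, not env vars
TEMPLATE = Template(
    "[database]\n"
    "host = ${MYSQL_HOST}\n"
    "user = ${MYSQL_USERNAME}\n"
)

def render_config(env: Mapping[str, str] = os.environ) -> str:
    """Fill the template from the environment; raises KeyError if a variable is missing."""
    return TEMPLATE.substitute(env)

def write_config(path: str, env: Mapping[str, str] = os.environ) -> None:
    """Write the rendered config where the tool expects to find it."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(render_config(env))
```

An entrypoint would call `write_config(...)` with the tool's config path and then start the tool, for example with `os.execvp`. This keeps environment variables as the single interface to the container even when the tool inside it does not support them.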
The Ugly - the tradeoffs we made
● CloudFormation: infrastructure as code
  ○ it has mutable state
  ○ it's more like a DB migration
● duplicate Docker images due to tools (New Relic) or speed (base image)
● Alpine for image size
The New Hope
● is the Docker cluster finally ready? We'll find out
● new networking capabilities
● Flocker to handle state (data volumes)
● monitoring tools like Sysdig
● infrastructure management (taking snapshots)
questions?
Resources
● http://www.slideshare.net/AlexSchoof/managing-secrets-at-scale-54557759
● https://github.com/ClusterHQ/flocker
● http://kubernetes.io/