● Pipedrive helps small businesses control the complex selling process
● Founded in 2010
● 30,000 paying customers worldwide
● 200+ employees
● Offices in Tallinn, Tartu, and New York, NY
Why use Docker?
● Growing pains with Chef
● New language + new tools = entry barrier
● You write recipes so seldom that you forget how it's done
● "But it runs fine in test!"
Our early Docker platform started with an evaluation of running Docker inside a Vagrant box.
Instead, we started using a custom-built docker-machine.
Lately we moved to Docker for Mac.
First use case for containers
Provision on-demand test environments per branch.
It was implemented only for the execution environment of the test coverage suite.
A lot of custom hacks were needed to make it work.
Docker infrastructure v1
The first Docker builds using Codeship Docker CI beta
The first usage of Tutum (Docker Cloud) as orchestration service
Yeah, we were using Docker, but...
CI with Codeship was slow; the Docker build alone took ~15 minutes
Deployment to the Tutum cluster took another ~10 minutes
Sometimes it was so slow we wondered whether it was still working
Stability issues - we experienced “data loss” and “service downtime”
The Birth of Docker Infrastructure v2.0
Requirements:
Improve the speed of CI processes
Improve the reliability of Docker Infrastructure
Docker Infrastructure v2.0
Jenkins for automating processes
Docker image builds
Container deployment
Docker Swarm
Container Scheduler
Shipyard
Troubleshooting
Improved Docker builds
First iteration:
FROM node
ENV SERVICE_NAME=statistics
ENV SERVICE_DESC="Statistics"
ENV SERVICE_TAGS=statistics
ENV SERVICE_CHECK_HTTP=/health
ENV SERVICE_CHECK_INTERVAL=10s
ENV SERVICE_CHECK_TIMEOUT=5s
EXPOSE 8000
WORKDIR /src
COPY . /src/
RUN npm install
CMD ["node", "."]
Improved:
FROM node:6-alpine
ENV SERVICE_NAME=statistics \
    SERVICE_DESC="Statistics" \
    SERVICE_TAGS=statistics \
    SERVICE_CHECK_HTTP=/health-statistics \
    SERVICE_CHECK_INTERVAL=10s \
    SERVICE_CHECK_TIMEOUT=5s
EXPOSE 8000
WORKDIR /src
USER node
CMD ["node", "."]
COPY libraries/ /src/
COPY src/ /src/
Deployment process optimizations
NB! https://docs.docker.com/engine/userguide/storagedriver/selectadriver/
Replacing Devicemapper with AUFS cut deployment time 10x.
There are still improvements possible:
● Handle Linux signals
● Parallel rolling updates
https://teespring.com/sigkill
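Signal handling matters because `docker stop` first sends SIGTERM and only sends SIGKILL once the grace period (10 seconds by default) runs out. The sketch below shows one possible way to react to SIGTERM. It is written in Python rather than the Node.js the services use, and the names `GracefulShutdown` and `serve` are hypothetical, not taken from the talk.

```python
import signal
import threading


class GracefulShutdown:
    """Records a termination request so the main loop can exit cleanly."""

    def __init__(self):
        self._stop = threading.Event()

    def install(self):
        # SIGTERM is what `docker stop` sends first; SIGINT covers Ctrl+C.
        # Must be called from the main thread.
        for sig in (signal.SIGTERM, signal.SIGINT):
            signal.signal(sig, self.handle)

    def handle(self, signum, frame):
        # Only set a flag here; the real cleanup happens in the main loop.
        self._stop.set()

    @property
    def requested(self):
        return self._stop.is_set()

    def wait(self, timeout=None):
        # Returns early (True) as soon as a shutdown is requested.
        return self._stop.wait(timeout)


def serve(shutdown, do_work, poll_seconds=0.1):
    """Call do_work() repeatedly until a shutdown is requested.

    Returns the number of completed work units.
    """
    handled = 0
    while not shutdown.requested:
        do_work()
        handled += 1
        shutdown.wait(poll_seconds)
    return handled
```

A service would call `install()` once at startup, run `serve(...)`, and close connections after it returns, so the container exits before the grace period ends.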
Beware the service discovery corruption
● Always enable health checks
● Use unique health checks or validate output
SERVICE_CHECK_HTTP=/health
vs
SERVICE_CHECK_HTTP=/statistics-health
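The "validate output" option can be sketched as follows. If service discovery points a check at the wrong container, a generic `/health` endpoint still answers 200, so the check passes for the wrong service. The example assumes, purely for illustration, that the health endpoint returns a JSON body with a `service` field; that payload shape and the function name `is_healthy` are hypothetical.

```python
import json


def is_healthy(status_code, body, expected_service):
    """Return True only if the response is 200 and names the expected service."""
    if status_code != 200:
        return False
    try:
        payload = json.loads(body)
    except (ValueError, TypeError):
        # Not JSON at all (for example a plain "OK"): treat as unhealthy.
        return False
    return isinstance(payload, dict) and payload.get("service") == expected_service
```

With this check, a response of `{"service": "billing"}` is rejected when the caller expected `statistics`, even though the HTTP status is 200.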
https://youtu.be/PivpCKEiQOQ
We deployed Killer-Container to the cluster and rescheduled it every time it managed to crash the Docker host
Issues
● Linux kernel 3.13
● Fluentd logging agent
● Graylog logging driver
● Kernel sysctl parameters
● Swap usage
● PEBKAC
○ "net.ipv4.ip_forward" => 0
● WARNING: No memory limit support
● WARNING: No swap limit support
● WARNING: No kernel memory limit support
● WARNING: No oom kill disable support
● WARNING: No cpu cfs quota support
● WARNING: No cpu cfs period support
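Warnings like these are easy to miss on a freshly provisioned host. A minimal sketch of a check that could run during provisioning, assuming the text has already been captured from `docker info` (the capturing step is left out, and `find_warnings` is a hypothetical helper name):

```python
def find_warnings(docker_info_output):
    """Return the message of every 'WARNING:' line in the given text."""
    prefix = "WARNING:"
    warnings = []
    for line in docker_info_output.splitlines():
        stripped = line.strip()
        if stripped.startswith(prefix):
            # Keep only the message, without the prefix.
            warnings.append(stripped[len(prefix):].strip())
    return warnings


# Sample input built from two of the warnings listed above.
sample = "WARNING: No memory limit support\nWARNING: No swap limit support\n"
missing = find_warnings(sample)
```

A provisioning job could fail the host whenever the returned list is not empty, instead of relying on someone reading the output.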
Service risk mitigation
● Number of nodes in cluster
● Spreading policies
● Multiple instances
● Memory limitations
● Healing policies
○ Autorestart
○ Reschedule
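To illustrate why spreading policies and multiple instances work together, here is a simplified sketch of a spread-style placement: each new instance goes to the node that currently runs the fewest containers. This is not Docker Swarm's actual scheduler, which also takes resources and constraints into account; the function names and node names are hypothetical.

```python
def pick_node_spread(containers_per_node):
    """Pick the node currently running the fewest containers."""
    if not containers_per_node:
        raise ValueError("no nodes available")
    # Key is (container count, node name) so ties resolve deterministically.
    return min(containers_per_node, key=lambda n: (containers_per_node[n], n))


def place_instances(containers_per_node, instances):
    """Place several instances one by one, updating counts after each placement."""
    counts = dict(containers_per_node)  # work on a copy, leave the input untouched
    placement = []
    for _ in range(instances):
        node = pick_node_spread(counts)
        counts[node] += 1
        placement.append(node)
    return placement


# Three instances on three empty nodes end up on three different hosts.
placement = place_instances({"node-a": 0, "node-b": 0, "node-c": 0}, 3)
```

With the instances on different hosts, losing a single node takes down only one of them.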
Gains
Evolution of applications
Generic enough to run in multiple regions and environments
Delivery time from idea to live
From 2 weeks to 1 day
Servers vs Services
They can be managed asynchronously
Statistics
~ 70 in-house built Dockerized services
~ 90 Docker images
~ 500 containers running
3200 container deploys since October
Recommendations for going live with Docker
● You still need to take care of the OS
● Read GitHub issues
● Read from the source
● Keep it up to date
● (Performance) Test it