There and Back Again: How We Drank the Chef Kool-Aid, Sobered Up, and Learned to Cook Responsibly

Post on 15-Aug-2015

Charity Majors @mipsytipsy

There and back again: a Chef tale

How we drank the Kool-Aid, sobered up, and learned to cook responsibly.

Mobile apps platform

500k+ apps

AWS

MongoDB, Cassandra, MySQL, Redis

ruby & rails => golang

Our mission:

• Support relentless growth

• Ship products fast

• Solve mobile apps naively at scale

Active monthly Parse installations

API requests per second

our mission

your mission

Chef the Base System!!

• bootstrapping nodes with knife-ec2

• configuring system packages

• managing deb versions

• ec2 hostname tags from chef node names

• route53 DNS records from hostname tags

• cron jobs, batch jobs

Chef the Services!!

• haproxy configs

• generate yaml files

• generate host lists

• manage config files for Parse services

• monitoring and graphing based off roles
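Generating an haproxy backend from a host list can be sketched in plain Ruby with ERB, the templating language Chef templates use. This is a minimal sketch, not the Parse cookbook: the host names, addresses, port, and the `render_backend` helper are all hypothetical.

```ruby
require "erb"

# Hypothetical host list. In a Chef run this might come from node attributes
# or a service registry; the names, addresses, and port are made up.
API_HOSTS = [
  { name: "api2", address: "10.0.0.12", port: 8080 },
  { name: "api1", address: "10.0.0.11", port: 8080 },
].freeze

# Plain ERB. With trim_mode "-", a tag ending in -%> swallows its newline.
BACKEND_TEMPLATE = <<~TEMPLATE
  backend api
      balance roundrobin
  <% hosts.each do |h| -%>
      server <%= h[:name] %> <%= h[:address] %>:<%= h[:port] %> check
  <% end -%>
TEMPLATE

# Sort by name so the rendered file is stable between runs and only
# changes when the host list itself changes.
def render_backend(hosts)
  sorted = hosts.sort_by { |h| h[:name] }
  ERB.new(BACKEND_TEMPLATE, trim_mode: "-").result_with_hash(hosts: sorted)
end

puts render_backend(API_HOSTS)
```

Sorting before rendering matters in practice: an unordered host list produces spurious config diffs, and each diff can trigger a needless haproxy reload.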

Chef the Databases!!

• creating/managing mongo replica sets

• provisioning & assembling RAID devices

• assigning cassandra initial tokens

• backups, snapshotting & restores

• community cookbooks for mysql, redis
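Assigning Cassandra initial tokens comes down to spacing the nodes evenly around the ring. The sketch below assumes RandomPartitioner, whose token range is 0 to 2**127; the slides do not say which partitioner was in use, and `initial_tokens` is a hypothetical helper name.

```ruby
# Evenly spaced initial tokens for a ring of node_count nodes.
# ASSUMPTION: RandomPartitioner (token range 0...2**127). Murmur3Partitioner
# uses a different, signed range and needs a different formula.
def initial_tokens(node_count)
  raise ArgumentError, "node_count must be positive" unless node_count.positive?

  (0...node_count).map { |i| i * (2**127) / node_count }
end

initial_tokens(4).each_with_index do |token, i|
  puts "node #{i}: initial_token: #{token}"
end
```

Ruby integers are arbitrary precision, so the 2**127 arithmetic needs no special handling.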

Chef the Deploys!!

• deploy Parse services?

….??????

wait …

1) Things we did with chef badly

2) Things that chef was not the right tool for

mistakes were made …

• Overloading roles with too much work

• Confusion between role vs instantiation of service

• Using definitions instead of providers

• Using lots of data bags

• One attribute per config entry instead of a hash of all entries

• Using knife search extensively
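The "hash of all entries" point can be illustrated in plain Ruby. With one hash per config file, the template just iterates, and adding a setting needs no template change. The setting names, values, and the `render_config` helper below are hypothetical.

```ruby
# Hypothetical defaults for one config file, kept as a single hash
# rather than one attribute per setting.
DEFAULT_CONFIG = {
  "port" => 6379,
  "maxmemory" => "2gb",
  "appendonly" => "yes",
}.freeze

# Render every entry as "key value", sorted by key for stable output.
def render_config(entries)
  entries.sort_by { |key, _| key.to_s }
         .map { |key, value| "#{key} #{value}" }
         .join("\n") + "\n"
end

# A role or environment overrides only the entries it cares about.
overrides = { "maxmemory" => "4gb" }
puts render_config(DEFAULT_CONFIG.merge(overrides))
```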

mistakes were made …

• Forking + modifying community cookbooks

• Importing community cookbooks with too many custom dependencies

• Not using repo-per-cookbook / Berkshelf

• Not investing the time into vagrant, unit tests, staging environment, versioning

• Where is my source of truth?!

but these are all solvable problems.

what isn’t?

sometimes, chef just ain’t enough.

Problem areas

• Provisioning from scratch

• Service registration & discovery

• Managing software & configs

• Databases

Provisioning

bootstrapping from vanilla AMIs

launching instances with knife-ec2

Solution: bake AMI with chef, use ASGs

Service discovery

realtime search needs realtime data

Solution: zookeeper, consul, etcd, etc.

avoid snowflake hosts

use distributed locking for cron jobs
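The cron-job locking pattern looks roughly like this: every host schedules the job, but only the host that wins the lock runs it. This is a sketch of the call pattern only. `InMemoryLockStore` is a hypothetical stand-in that coordinates callers inside one process; a real setup would back `acquire` and `release` with zookeeper, consul, or etcd.

```ruby
# Hypothetical in-memory stand-in for a real lock service.
# It only coordinates callers in one process; it shows the interface,
# not a distributed implementation.
class InMemoryLockStore
  def initialize
    @owners = {}
    @mutex = Mutex.new
  end

  # Returns true if owner now holds the named lock, false if it is taken.
  def acquire(name, owner)
    @mutex.synchronize do
      return false if @owners.key?(name)

      @owners[name] = owner
      true
    end
  end

  # Only the current holder can release the lock.
  def release(name, owner)
    @mutex.synchronize { @owners.delete(name) if @owners[name] == owner }
  end
end

# Run the block only if this owner wins the lock; always release afterwards.
def run_exclusively(store, job_name, owner)
  return :skipped unless store.acquire(job_name, owner)

  begin
    yield
    :ran
  ensure
    store.release(job_name, owner)
  end
end

store = InMemoryLockStore.new
puts run_exclusively(store, "nightly-report", "host-a") { puts "running report" }
```

Because any host can win the lock, no single machine has to be the special "cron box", which is what keeps snowflake hosts out of the fleet.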

Managing software & configs

• System software (debs, rpms)

• Developer-owned services

• Internal operations software

Managing software & configs: System software

Managing software & configs: Developer-owned services

• Do not tie code deploys to system changes

• Perform the minimal set of changes

• Configs *are* software. Version together.

Managing software & configs: Internal operations software

• Treat software engineering like software engineering

• Treat systems-y packages like systems packages

• Package and version “util” scripts

• Manage package versions with Chef
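One way to pin package versions on Debian-style systems is to render apt preferences stanzas from a version map. The sketch below assumes that approach; the slides do not say how versions were pinned. The package names, versions, and helper names are hypothetical, and the output follows the apt_preferences(5) stanza format.

```ruby
# Hypothetical pins; package names and versions are illustrative only.
PINNED_VERSIONS = {
  "redis-server" => "2:2.8.4-2",
  "haproxy" => "1.5.3-1",
}.freeze

# Render one stanza in apt_preferences(5) format. A priority above 1000
# lets apt install the pinned version even if that means a downgrade.
def apt_pin(package, version, priority = 1001)
  <<~STANZA
    Package: #{package}
    Pin: version #{version}
    Pin-Priority: #{priority}
  STANZA
end

# Stanzas are separated by a blank line and sorted for stable output.
def render_pins(pins)
  pins.sort.map { |package, version| apt_pin(package, version) }.join("\n")
end

puts render_pins(PINNED_VERSIONS)
```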

Databases at scale

Databases: DBA operations

Not really what chef is best at:

• Imperative commands

• Automatic remediation

• Coordinating actions across nodes

Databases: DBA operations

• Create, tear down replica sets or nodes

• Verify backups

• Rolling version upgrade

• Elect new primary / switch masters

• Enable/disable query killer

• Change schemas or indexes

• Compaction, rotation

• Version replica set state

• Etc

Databases: DBA operations

If you don’t have to do a ton of DBA ops, Chef can manage databases.

Don’t over-engineer in advance of your actual needs.

Databases: Separation of configuration and state

Base system => chef

Detect and publish state changes => chef, zk

Generate monitoring configs => chef

Imperative commands => db tooling
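"Detect and publish state changes" can be sketched as a small change detector: observe the current state on every run, and publish only when it differs from what was last published. `StatePublisher` and the state fields are hypothetical, and the publish block stands in for a write to zk.

```ruby
# Publishes a state snapshot only when it differs from the last one published.
# The block passed to new is a stand-in for a real write (for example to zk).
class StatePublisher
  def initialize(&publish)
    @publish = publish
    @last = nil
  end

  # Returns true if the state changed and was published, false otherwise.
  def observe(state)
    return false if state == @last

    @publish.call(state)
    @last = state
    true
  end
end

published = []
publisher = StatePublisher.new { |state| published << state }

publisher.observe({ primary: "db1", secondaries: ["db2", "db3"] }) # published
publisher.observe({ primary: "db1", secondaries: ["db2", "db3"] }) # unchanged, skipped
publisher.observe({ primary: "db2", secondaries: ["db1", "db3"] }) # published
puts published.length
```

Note that this only records and announces state. Acting on it, such as electing a new primary, stays with the db tooling.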

Databases at scale

We chef for:

• Building base AMIs

• Generating monitoring configs

• Storing encrypted secrets

• Cron jobs (with zk lock)

• Inferring and publishing db state changes

Things we still suck at

• Single source of truth (git / chef-server)

• Isolated staging environment

• Full continuous testing for cookbooks

• Realtime data

• Internal software packaging & management

• Database administration at scale

Things we don’t chef

Charity Majors

@mipsytipsy