Post on 24-May-2015
2012
- A dev/test cloud
- Less than a rack of compute
- Handcrafted by an engineer
- Supported by another engineer
- Zero automation
- Dev to op ratio: 1:0

2014
- Thousands of nodes
- Distributed across several AZs
- Automated
- Operated 24x7
- Running the business
- Dev to op ratio: 5:1
Treat infrastructure as code
1. Fully automate deployments
   o Well-known principle
2. Treat automation artifacts like you treat code
   o Source control → code reviews → tests → deployment
3. Treat automation as a product feature
   o Road map, sprints, bugs, backlog, releases
4. Measure outcomes with KPIs
   o Time to deploy, time to recover, time to roll out a change
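
As a rough illustration of point 4, the KPIs named above can be computed from start/end timestamps of deployments and incidents. This is only a minimal sketch: the `mean_duration` helper and the sample records are hypothetical and do not come from the talk; a real setup would pull these timestamps from its own deployment and incident tooling.

```python
from datetime import datetime, timedelta
from statistics import mean


def mean_duration(events):
    """Average (end - start) over a list of (start, end) datetime pairs."""
    if not events:
        raise ValueError("no events to measure")
    seconds = mean((end - start).total_seconds() for start, end in events)
    return timedelta(seconds=seconds)


# Hypothetical sample data: (started, finished) for each deployment.
deployments = [
    (datetime(2014, 5, 1, 10, 0), datetime(2014, 5, 1, 10, 30)),
    (datetime(2014, 5, 2, 9, 0), datetime(2014, 5, 2, 9, 50)),
]

# Hypothetical sample data: (detected, recovered) for each incident.
incidents = [
    (datetime(2014, 5, 3, 2, 0), datetime(2014, 5, 3, 2, 45)),
]

time_to_deploy = mean_duration(deployments)
time_to_recover = mean_duration(incidents)

print("Mean time to deploy:", time_to_deploy)
print("Mean time to recover:", time_to_recover)
```

Tracking these values per release makes it possible to see whether automation work is actually shortening deployments and recoveries.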
Manage drift
System is in a state other than the desired state!
- Incidents waiting to happen
- Impacts time to recover
- Impacts customers
Drift
- Automation gaps
- Habits
- Transitional
- Debugging
- Incidents
Accept that drift happens - and manage drift mitigation
1. Automated audits
2. Drift tracking
3. Mitigation as a planned routine activity
4. Culture - reward right habits
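
An automated audit, in its simplest form, compares the desired state with the state actually observed on a system and reports the differences. The sketch below assumes both states are available as flat dictionaries; the `audit_drift` function, the `ABSENT` marker, and the setting names are all hypothetical and only illustrate the idea, not any particular tool.

```python
ABSENT = "<absent>"  # hypothetical marker for a setting missing on one side


def audit_drift(desired, actual):
    """Return {key: (desired_value, actual_value)} for every setting that differs."""
    drift = {}
    for key in sorted(desired.keys() | actual.keys()):
        want = desired.get(key, ABSENT)
        have = actual.get(key, ABSENT)
        if want != have:
            drift[key] = (want, have)
    return drift


# Hypothetical desired state, e.g. as kept in source control.
desired = {"ntp_server": "10.0.0.1", "kernel": "3.13", "debug_logging": False}

# Hypothetical observed state: debugging left a flag on and an extra package installed.
actual = {
    "ntp_server": "10.0.0.1",
    "kernel": "3.13",
    "debug_logging": True,
    "tcpdump_installed": True,
}

drift = audit_drift(desired, actual)
for key, (want, have) in drift.items():
    print(f"DRIFT {key}: desired={want!r} actual={have!r}")
```

Running such a check on a schedule and recording its output over time would give a basic form of drift tracking, so that mitigation can be planned rather than discovered during an incident.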
Awareness of systems and operations
Measure everything
- Business KPIs
- Config management
- Alerts
- Drift
Another product feature!
Culture of shared accountability
Ops
- Working on what's running the business
- Knows how the system fails
- Worries about TTR

Dev
- Working on (wants to work on) new things
- Knows how the system is supposed to work
- Wants to understand why

Make TTR a shared goal!
Operations is Engineering
Thanks