Leading a Successful DevOps Transition. Lessons from the Trenches


Leading a Successful DevOps Transition

Lessons from the Trenches

Randy Shoup

Consulting CTO

What Is DevOps?

• Continuous Delivery?

– Rapid cycle times

– Automated testing and Continuous Integration

– Deployment automation and version control

• Lean Management Practices?

– Limiting work-in-progress via small batch sizes

– Rapid feedback via visual displays and monitoring

• Collaborative approach to Development and Operations

– Act as one team across different disciplines

– Solve problems instead of pointing fingers

• Organizational and cultural factors are most important

Taking The DevOps Journey

• Traditional Enterprises Adopting DevOps

– Financial Services: Capital One, ING, Bank of America, Nationwide

– Manufacturers: General Electric, General Motors, Raytheon, Intel, Cisco, HP

– Retailers: Target, Nordstrom, Macy’s

• Higher Throughput and Stability

– High-performing IT organizations have 60x fewer failures and recover 168x faster

– High-performing IT organizations deploy 30x more frequently with 200x shorter lead times

• Improved Business Results

– Public companies with high-performing IT organizations had 50% higher growth in market capitalization over 3 years vs. low-performing IT organizations

Using Conway’s Law

• Organization determines architecture

– Design of a system will be a reflection of the communication paths within the organization

• Agile, modular system requires an agile, modular organization

– Small, independent teams lead to more flexible, composable systems

– Larger, interdependent teams lead to more monolithic systems

• We can engineer the system we want by engineering the organization (!)

Small “Service” Teams

• Team develops a single set of applications or services

– Clear, well-defined area of responsibility

– Minimal, well-defined “interface”

• Amazon “Two Pizza Team”

– No team should be larger than can be fed by 2 large pizzas

– Typically 3-5 people

– Mix of junior and senior people

Small “Service” Teams

• End-to-End Ownership

– Cross-functional team owns application / service from design to deployment to retirement

– Able to move very rapidly and independently

• Self-Sufficiency

– Team has inside it all the skill sets to do the job

– Depends on other teams for supporting services

• “You Build It, You Run It”

– The same team that builds the software operates the software

– No separate maintenance or sustaining engineering team

Lose the Ticket Culture

Ticket Culture | Ownership Culture
Do what is asked for | Do what is needed
One-way communication | Two-way collaboration
Goal is to close the ticket | Goal is product success
Reactive approach | Proactive approach
Reinforces silos | Reinforces collaboration
Prioritizes process | Prioritizes results

Enforce a Service Mentality

• Vendor-Customer Discipline

– Service team is a vendor; the applications are its customers

– Service is useful only to the extent it provides value to its customers

• Customer can choose to use service or not (!)

– Customer team is responsible for deciding what is best for their use case

– Use the right tool for the right job

• Provides powerful incentives

– Service must be *strictly better* than the alternatives of build, buy, borrow

Charge for Usage

• Charge customers for *usage* of the service

– Aligns economic incentives of customer and provider

– Motivates both sides to optimize efficiency

• Free usage leads to waste

– No incentive to control usage or find more efficient alternatives

• E.g., App Engine usage at Google

– Charging particularly egregious internal customer led to 10x reduction in usage
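
A usage-based chargeback can be sketched roughly as follows. This is only an illustration: the team names, the flat price per CPU-hour, and the usage figures are hypothetical and do not describe how Google bills App Engine internally.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UsageRecord:
    team: str         # hypothetical internal customer team
    cpu_hours: float  # metered usage within the billing period


def monthly_charges(records, price_per_cpu_hour):
    """Sum metered usage per team and price it at a flat (assumed) rate."""
    totals = {}
    for record in records:
        totals[record.team] = totals.get(record.team, 0.0) + record.cpu_hours
    return {
        team: round(hours * price_per_cpu_hour, 2)
        for team, hours in totals.items()
    }


# Hypothetical usage for one month.
records = [
    UsageRecord("search", 1200.0),
    UsageRecord("checkout", 300.0),
    UsageRecord("search", 800.0),
]
charges = monthly_charges(records, price_per_cpu_hour=0.05)
```

Because each team sees its own bill, the heaviest consumer has a direct reason to look for a more efficient approach.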

Shared On-Call Duties

• All members of the team rotate on-call responsibilities

– Strong motivator to build in solid monitoring and diagnosis tools

– Best way to learn the real-world behavior of the system

– Best way to develop empathy for customers and other team members

• Common resistance

– Unfamiliarity with production systems and tools

– Fear of making a mistake

– “That’s not my job”

Shared On-Call Duties

• On-call “apprenticeship”

– Apprentice starts as secondary on-call with an experienced primary, observes and learns from the primary in action

– Apprentice next takes primary on-call with an experienced secondary

– Apprentice graduates

• Ops at Google

– Developers are on-call for first 6+ months of a new service

– Service can “graduate” to Ops coverage only after intensive review of its monitoring, reliability, resilience, etc.
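
The on-call apprenticeship above can be modeled as a simple progression of stages. The sketch below is an assumption-laden illustration: the `shifts_required` threshold is hypothetical, since the slides do not say how long an apprentice stays in each stage.

```python
from enum import Enum


class Stage(Enum):
    SECONDARY = "secondary on-call, shadowing an experienced primary"
    PRIMARY = "primary on-call, backed by an experienced secondary"
    GRADUATED = "full member of the on-call rotation"


_ORDER = [Stage.SECONDARY, Stage.PRIMARY, Stage.GRADUATED]


def next_stage(stage, shifts_completed, shifts_required=2):
    """Advance one stage once enough shifts are done (threshold is assumed)."""
    if stage is Stage.GRADUATED or shifts_completed < shifts_required:
        return stage
    return _ORDER[_ORDER.index(stage) + 1]
```

For example, `next_stage(Stage.SECONDARY, shifts_completed=2)` moves the apprentice to `Stage.PRIMARY`, while an apprentice with only one completed shift stays where they are.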

Turn Approvals Into Code

• Reduce or eliminate approval bodies

– E.g., eBay Architecture Review Board

– (-) Too late

– (-) Too slow

– (-) Too disengaged from details

• Package expertise in code

– Smart, experienced people build their knowledge into code

– Teams with specialized skills (databases, security, compliance, etc.) provide services, libraries, or tools

Turn Approvals Into Code

• E.g., Security at Google

– Provide secure foundations by maintaining lower-level libraries and services

– Provide self-service penetration tests, vulnerability assessments, etc.

• The best way to “enforce” a standard practice is with working code
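
As one possible illustration of packaging expertise in code, a review board's checklist could become an automated check that any team runs on its own. The three rules and the configuration keys below are hypothetical examples, not the practices of eBay or Google.

```python
def check_service_config(config):
    """Return a list of policy violations; an empty list means the change may proceed."""
    violations = []
    # Hypothetical rule from a security team: traffic must be encrypted.
    if not config.get("tls_enabled", False):
        violations.append("TLS must be enabled")
    # Hypothetical rule: no inline credentials in configuration.
    if str(config.get("db_password", "")).strip():
        violations.append("credentials must come from a secrets service, not config")
    # Hypothetical rule: every service declares an owning team.
    if not config.get("owner_team"):
        violations.append("an owning team must be declared")
    return violations


good = {"tls_enabled": True, "owner_team": "search"}
bad = {"tls_enabled": False, "db_password": "hunter2"}
```

A check like this gives feedback in seconds at the moment the change is made, rather than weeks later in a review meeting.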

Migrate to Microservices

• Single-purpose

• Simple, well-defined interface

• Independently testable

• Independently deployable

• Easy to understand and reason about

• Smaller surface area

[Diagram: services A, B, C, D, and E]
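
A small sketch of what "simple, well-defined interface" and "independently testable" might look like in code. The `InventoryService` interface, its `reserve` method, and the `place_order` caller are hypothetical names chosen for illustration.

```python
from typing import Protocol


class InventoryService(Protocol):
    """The entire (hypothetical) interface another team depends on."""

    def reserve(self, sku: str, quantity: int) -> bool: ...


class InMemoryInventory:
    """Stand-in implementation that lets callers be tested without a network."""

    def __init__(self, stock):
        self._stock = dict(stock)

    def reserve(self, sku, quantity):
        if quantity <= 0 or self._stock.get(sku, 0) < quantity:
            return False
        self._stock[sku] -= quantity
        return True


def place_order(inventory: InventoryService, sku: str, quantity: int) -> str:
    # The caller knows only the interface, not the implementation behind it.
    return "confirmed" if inventory.reserve(sku, quantity) else "rejected"
```

Because the surface area is a single method, the order logic can be tested against the in-memory stand-in, and the real service can be deployed on its own schedule.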

Embrace the Cloud

• Rapid Provisioning and Deployment

– Minutes, not weeks

• API-driven infrastructure

– Automatable and repeatable

– Constrained threat surface

• Pay For What You Use

– No “utilization risk” from owning / renting

– If it’s not in use, spin it down

• Build on Provider’s Scaling and Security Expertise

– Few organizations have the security resources of Amazon or Google
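
"If it's not in use, spin it down" can be automated once infrastructure is API-driven. The sketch below only selects candidates; it deliberately makes no call to any real cloud provider API. The instance names and the one-hour idle threshold are assumptions.

```python
from datetime import datetime, timedelta


def idle_instances(last_used, now, max_idle=timedelta(hours=1)):
    """Return the names of instances idle for longer than max_idle, sorted."""
    return sorted(
        name for name, timestamp in last_used.items()
        if now - timestamp > max_idle
    )


# Hypothetical fleet with last-activity timestamps.
now = datetime(2024, 1, 1, 12, 0)
last_used = {
    "batch-1": now - timedelta(hours=3),
    "web-1": now - timedelta(minutes=5),
    "batch-2": now - timedelta(minutes=90),
}
to_stop = idle_instances(last_used, now)
```

The resulting list would then be handed to whatever shutdown call the chosen provider exposes.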

Embrace the Cloud

• The 2010s of computing are like the 1910s of electric power

• Soon it will be just as common to run your own computing infrastructure as it is to operate your own electric power generation

Build a Quality Culture

• Quality, Performance, and Reliability are “Priority-0 features”

– “Stop the line” if there is a degradation

– Equally important to users as product features or engaging user experience

• Developers responsible for

– Features

– Quality

– Performance

– Reliability

– Manageability
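
One way to make "stop the line" concrete is an automated gate that blocks a release when performance degrades. This is a minimal sketch; the choice of p99 latency as the metric and the 10% tolerance are assumptions, not figures from the talk.

```python
def should_stop_the_line(baseline_p99_ms, candidate_p99_ms, tolerance=0.10):
    """Return True when the candidate build is slower than the baseline
    by more than the allowed tolerance (assumed to be 10%)."""
    return candidate_p99_ms > baseline_p99_ms * (1 + tolerance)
```

With a 200 ms baseline, a candidate at 230 ms would be blocked, while one at 210 ms would pass.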

Build a Quality Culture

• Developers write tests and code together

– Continuous testing of features, performance, load

• Tests make better code

– Tests “have your back”

– Confidence to break things

– Confidence to refactor

• Tests help you move faster

– Catch bugs earlier, fail faster

– “Slow down to speed up”

Build a Quality Culture

• E.g., Development Process at Google

– Code reviews before submission

– Automated tests for everything

– Single searchable source code repository

• Internal Open Source Model

– Not “here is a bug report”

– Instead “here is the bug; here is the code fix; here is the test that verifies the fix”
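
The "bug, fix, and verifying test" pattern might look like the following. The function and the bug are invented for illustration: assume an earlier version divided by `len(samples)` unconditionally and crashed on an empty window.

```python
def mean_latency_ms(samples):
    """Average latency over a window of samples, in milliseconds."""
    # Fix for the hypothetical bug: an empty window used to raise
    # ZeroDivisionError; it now reports 0.0.
    if not samples:
        return 0.0
    return sum(samples) / len(samples)


def test_mean_latency_handles_empty_window():
    # Regression test submitted together with the fix.
    assert mean_latency_ms([]) == 0.0


def test_mean_latency_of_samples():
    assert mean_latency_ms([10, 20, 30]) == 20.0
```

The regression test stays in the suite, so the same bug cannot quietly return.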

Actively Manage Technical Debt

• Maintain sustainable and well-understood level of debt

– Measured by engineering effort to fix

– Plan for how and when you will pay it off

– Track feature work vs. accrued debt over time

• “Don’t have time to do it right”?

– WRONG -- Don’t have time to do it twice (!)

– The more constrained you are on time and resources, the more important it is to do a solid job the first time
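
Tracking feature work against accrued debt over time can be as simple as a running tally per sprint. The sketch assumes debt is measured in engineer-days of effort to fix, as the slide suggests; the sprint figures are hypothetical.

```python
def debt_trend(sprints):
    """For each sprint, return (cumulative feature days, outstanding debt days).

    Each sprint is a tuple: (feature_days, debt_added_days, debt_paid_days).
    """
    cumulative_feature = 0
    outstanding_debt = 0
    trend = []
    for feature_days, debt_added, debt_paid in sprints:
        cumulative_feature += feature_days
        # Debt cannot go below zero, even if more is paid than is owed.
        outstanding_debt = max(0, outstanding_debt + debt_added - debt_paid)
        trend.append((cumulative_feature, outstanding_debt))
    return trend


# Hypothetical history of three sprints.
history = [(20, 5, 0), (18, 3, 4), (15, 2, 4)]
trend = debt_trend(history)
```

A steadily rising second number is the signal that the level of debt is no longer sustainable.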

Vicious Cycle of Technical Debt

[Diagram: cycle linking Technical Debt, “No time to do it right”, and Quick-and-dirty]

Virtuous Cycle of Investment

[Diagram: cycle linking Invest in Quality, Solid Foundation, Confidence, and Faster and Better]

Blameless Post-Mortems

• Post-mortem After Every Incident

– Document exactly what happened

– What went right

– What went wrong

• Open and Honest Discussion

– What contributed to the incident?

– What could we have done better?

Blameless Post-Mortems

• Take fear and personalization out of it

– Engineers will compete to take personal responsibility (!)

– “Finally we can fix that broken system”

• Focus on Learning and Improvement

– How should we change process, technology, documentation, etc.?

– How could we have automated the problems away?

– How could we have diagnosed more quickly?

– How could we have restored service more rapidly?

DevOps in Action

• eBay Search Ranking Improvements

– Which item should appear 1st, 10th, 100th, 1000th?

– Before: Small number of hand-tuned factors

– Goal: Thousands of machine-learned factors

• Rapid experimentation and feedback

– Deployed hundreds of parallel A/B tests every day

– Full year of steady, incremental improvements

• $120M in incremental eBay revenue
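
Running many A/B tests in parallel requires assigning each user to a variant consistently and independently per experiment. The sketch below shows one common approach, hash-based bucketing; it is not a description of eBay's actual experimentation platform, and the experiment names are hypothetical.

```python
import hashlib


def assign_variant(user_id, experiment, variants=("A", "B")):
    """Deterministically map a user to a variant for a given experiment."""
    key = f"{experiment}:{user_id}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    # Use the first 8 bytes of the hash as an integer bucket.
    bucket = int.from_bytes(digest[:8], "big")
    return variants[bucket % len(variants)]
```

Including the experiment name in the hash key means a user's assignment in one test does not determine their assignment in another, which is what allows hundreds of tests to run side by side.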

Not Just for Unicorns

• DevOps practices have become mainstream

• High performance is achievable by any IT organization

• Organizational and cultural change requires a significant investment of time and effort …

• … but the benefits are well worth it