It's Not Your Fault - Blameless Post-mortems


A deeper look at why we perform "blameless" post-mortems.

Transcript of It's Not Your Fault - Blameless Post-mortems

Page 1: It's Not Your Fault - Blameless Post-mortems

It’s Not Your Fault

(Blameless) Post-mortems

@jasonhand

Page 2: It's Not Your Fault - Blameless Post-mortems

Jason Hand

DevOps “Handyman”

@jasonhand

Page 3: It's Not Your Fault - Blameless Post-mortems

A little about me…

Dir. of Platform Support - AppDirect

Dir. of Technical Support - Standing Cloud

Dir. of Operational Systems - American Fasteners, Inc.

Hiker, climber, brewer, runner, biker, boarder, surfer, painter, singer, reader, writer, picker, coder, racer, camper, volunteer… all the usual “Colorado 1-upper” crap.

Page 4: It's Not Your Fault - Blameless Post-mortems

Alternative names

Also known as (note: public & internal):

Project Retrospectives

Post-mortem Analysis

Post-project Review

Project Analysis Review

Quality Improvement Review

Autopsy Review

Santayana Review

After Action Review

Touchdown Meeting

Page 5: It's Not Your Fault - Blameless Post-mortems

Post-mortem Defined

What?

A process intended to inform improvements by determining aspects that were successful or unsuccessful.

Page 6: It's Not Your Fault - Blameless Post-mortems

Post-mortem Defined

When?

As soon as feasible after the incident is resolved.

Page 7: It's Not Your Fault - Blameless Post-mortems

Post-mortem Defined

Who?

Everybody

Page 8: It's Not Your Fault - Blameless Post-mortems

Post-mortem Defined

Why?

To communicate with your team

To understand what happened, so the team can learn and improve

Page 9: It's Not Your Fault - Blameless Post-mortems

Post-mortem Defined

How?

Talk about the incident timeline

Escalation steps

What was done to resolve the problem

Create a remediation plan

Make it available
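The five “How?” steps above map naturally onto a small data record. The sketch below is illustrative only and assumes Python 3.9+; the class names, the event categories (detection, escalation, action, resolution) and the sample incident are hypothetical, not taken from any particular tool. It keeps the timeline in chronological order, pulls out the escalation steps, and renders a plain-text report that can be published so the post-mortem is available to everyone.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class TimelineEntry:
    at: datetime
    kind: str  # hypothetical categories: "detection", "escalation", "action", "resolution"
    note: str  # what happened, stated as facts rather than blame


@dataclass
class PostMortem:
    title: str
    timeline: list[TimelineEntry] = field(default_factory=list)
    remediation: list[str] = field(default_factory=list)

    def ordered(self) -> list[TimelineEntry]:
        """Timeline entries sorted by time, whatever order they were recorded in."""
        return sorted(self.timeline, key=lambda e: e.at)

    def escalation_steps(self) -> list[TimelineEntry]:
        """Only the escalation entries, in chronological order."""
        return [e for e in self.ordered() if e.kind == "escalation"]

    def render(self) -> str:
        """Plain-text report suitable for a wiki page or an email."""
        lines = [self.title, "", "Timeline:"]
        lines += [f"  {e.at:%Y-%m-%d %H:%M} [{e.kind}] {e.note}" for e in self.ordered()]
        lines += ["", "Remediation plan:"]
        lines += [f"  - {item}" for item in self.remediation]
        return "\n".join(lines)


# Hypothetical example incident, recorded out of order on purpose
pm = PostMortem("Checkout API outage")
pm.timeline += [
    TimelineEntry(datetime(2015, 1, 5, 9, 40), "escalation", "On-call paged the database team"),
    TimelineEntry(datetime(2015, 1, 5, 9, 32), "detection", "Error-rate alert fired"),
    TimelineEntry(datetime(2015, 1, 5, 10, 5), "resolution", "Connection pool limit raised; errors cleared"),
]
pm.remediation.append("Alert on connection pool saturation before errors reach users")
print(pm.render())
```

Sorting at render time, rather than trusting insertion order, matters because timeline entries are often pieced together afterwards from chat logs, alerts and people's memories.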

Page 10: It's Not Your Fault - Blameless Post-mortems

The Three R’s (simple format)

Regret

Acknowledgement and apology

Reason

Initial incident detection to resolution, including the so-called “root causes.”

Remedy

Actionable remediation items

Dave Zwieback, VP Engineering - Next Big Sound
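One way to keep the Three R’s honest is to make the format itself refuse a report that has nothing under Remedy. This is a minimal Python sketch (3.9+), not Zwieback’s own template; the class name, the validation rule and the sample text are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class ThreeRsReport:
    """Post-mortem summary in the Regret / Reason / Remedy format."""
    regret: str        # acknowledgement and apology
    reason: str        # detection-to-resolution narrative, including contributing causes
    remedy: list[str]  # actionable remediation items

    def __post_init__(self) -> None:
        # A report without a remedy only describes the past; require at least one action.
        if not self.remedy:
            raise ValueError("a Three R's report needs at least one remediation item")

    def to_text(self) -> str:
        sections = [
            ("Regret", self.regret),
            ("Reason", self.reason),
            ("Remedy", "\n".join(f"- {item}" for item in self.remedy)),
        ]
        return "\n\n".join(f"{heading}\n{body}" for heading, body in sections)


# Hypothetical example
report = ThreeRsReport(
    regret="We're sorry: checkout was unavailable for 33 minutes this morning.",
    reason=(
        "An error-rate alert fired at 09:32. The database connection pool was "
        "exhausted; raising its limit cleared the errors at 10:05."
    ),
    remedy=["Alert on connection pool saturation (owner: platform team, due: Jan 19)"],
)
print(report.to_text())
```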

Page 11: It's Not Your Fault - Blameless Post-mortems

(Remedy)

Moving from Reaction to Action

Use SMART recommendations:

Specific

Measurable

Agreed upon / Agreeable

Realistic

Time-bound
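SMART criteria are judgement calls, but a team can still flag remediation items that lack the information those judgements need. The Python sketch below assumes hypothetical fields (a success metric, an owner, an effort estimate, a due date) and deliberately crude heuristics, such as treating an action of fewer than three words as not specific and an estimate over 30 days as not realistic; adjust or replace them to suit your team.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class RemediationItem:
    action: str                    # Specific: one concrete change
    success_metric: Optional[str]  # Measurable: how we will know it worked
    owner: Optional[str]           # Agreed upon: someone has accepted the work
    estimate_days: Optional[int]   # Realistic: sized by the people doing the work
    due: Optional[date]            # Time-bound: a date to finish by


def smart_gaps(item: RemediationItem, max_estimate_days: int = 30) -> list[str]:
    """List the SMART criteria this item does not visibly satisfy yet.

    These are crude heuristics: they flag missing information for the team to
    discuss and cannot judge whether an action is actually the right one.
    """
    gaps = []
    if len(item.action.split()) < 3:
        gaps.append("Specific")
    if not item.success_metric:
        gaps.append("Measurable")
    if not item.owner:
        gaps.append("Agreed upon")
    if item.estimate_days is None or item.estimate_days > max_estimate_days:
        gaps.append("Realistic")
    if item.due is None:
        gaps.append("Time-bound")
    return gaps


vague = RemediationItem("Fix monitoring", None, None, None, None)
concrete = RemediationItem(
    action="Alert when checkout p99 latency exceeds 2 seconds",
    success_metric="Alert fires during the next outage simulation",
    owner="platform team",
    estimate_days=3,
    due=date(2015, 3, 1),
)
print(smart_gaps(vague))     # ['Specific', 'Measurable', 'Agreed upon', 'Realistic', 'Time-bound']
print(smart_gaps(concrete))  # []
```

Returning the list of gaps, rather than a pass/fail flag, keeps the check in the spirit of moving from reaction to action: it tells the team what to fill in next.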

Page 12: It's Not Your Fault - Blameless Post-mortems

Blameless

(Image from “Across the Universe”)

Page 13: It's Not Your Fault - Blameless Post-mortems

Cool story, bro

2011 - Hired at Standing Cloud

Cloud marketplace & automated deployment of apps

Build the support team

Provide managed services

Page 14: It's Not Your Fault - Blameless Post-mortems

Cool story, bro


Page 15: It's Not Your Fault - Blameless Post-mortems

“Reprimanding bad apples may seem like a quick and rewarding fix, but it’s like peeing in your pants. You feel relieved and perhaps even nice and warm for a little while, but then it gets cold and uncomfortable. And you look like a fool.”

– Sidney Dekker

Quote first seen in J. Paul Reed’s “A Look at Looking in the Mirror”

Page 16: It's Not Your Fault - Blameless Post-mortems

What is a blameless post-mortem?

Team members are accountable but not responsible

Complete Transparency

Deeper look at circumstances

What happened and how to improve it (specific details)

Real conditions of failure in complex systems


Page 17: It's Not Your Fault - Blameless Post-mortems

“Your organization must continually affirm that individuals are NEVER the ‘root cause’ of outages.”

– Dave Zwieback

Page 18: It's Not Your Fault - Blameless Post-mortems

Paraphrased from “Fallible Humans” by Ian Malpass (DevOpsDays Minneapolis)

Source: http://www.indecorous.com/fallible_humans/

Page 19: It's Not Your Fault - Blameless Post-mortems

ETTO

(Efficiency-Thoroughness Trade-Off)

The trade-off between being efficient and being thorough

Page 20: It's Not Your Fault - Blameless Post-mortems

“We can be thorough and really dig into the task at hand and understand it well, but this takes time: it is inefficient.”

– Ian Malpass

Page 21: It's Not Your Fault - Blameless Post-mortems

Cause & Effect

There are many factors that played a part in the problem.

(Image source: http://xkcd.com)

Page 22: It's Not Your Fault - Blameless Post-mortems

Stress & Cognitive Bias

Page 23: It's Not Your Fault - Blameless Post-mortems

Yerkes-Dodson Model

Source: “The Human Side of Postmortems” - Dave Zwieback

Page 24: It's Not Your Fault - Blameless Post-mortems


Page 25: It's Not Your Fault - Blameless Post-mortems

Reduce stress?

Build muscle memory: simulate many types of problems and outages as “practice.”

Page 26: It's Not Your Fault - Blameless Post-mortems

Evaluative Threat

Being negatively judged plays a big role in stress


Page 27: It's Not Your Fault - Blameless Post-mortems

What is stress surface?

Variables of a situation:

Novel or unusual

Unpredictable

Uncontrollable

Negative judgement (evaluative threats)

ALSO:

Lack of sleep

Problems at home

Health

Relationships

Etc…

Page 28: It's Not Your Fault - Blameless Post-mortems

Capturing the Human Side

Ask questions

Page 29: It's Not Your Fault - Blameless Post-mortems

Stress Questionnaire

During the outage, how often did you feel or think that:

The situation was novel or unusual?

The situation was unpredictable?

You were unable to control the situation?

Others could judge your actions negatively?

0 = Never, 1 = Almost never, 2 = Sometimes, 3 = Fairly often, 4 = Very often
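The slide does not say how the answers should be scored. A common approach for Likert-style questionnaires is simply to add up the responses, which here gives a total from 0 to 16; the Python sketch below assumes that, and also averages each question across the team to show which stressor dominated. Responses are kept anonymous (a plain list with no names), on the assumption that attaching names would reintroduce the evaluative threat the questionnaire is trying to measure. The function names are hypothetical.

```python
QUESTIONS = (
    "The situation was novel or unusual",
    "The situation was unpredictable",
    "You were unable to control the situation",
    "Others could judge your actions negatively",
)
# 0 = Never, 1 = Almost never, 2 = Sometimes, 3 = Fairly often, 4 = Very often
SCALE = range(0, 5)


def stress_score(answers: list[int]) -> int:
    """Total one person's answers: 0 (no stressors) to 16 (all four, very often)."""
    if len(answers) != len(QUESTIONS):
        raise ValueError(f"expected {len(QUESTIONS)} answers, got {len(answers)}")
    if any(a not in SCALE for a in answers):
        raise ValueError("every answer must be a value from 0 to 4")
    return sum(answers)


def per_question_average(responses: list[list[int]]) -> dict[str, float]:
    """Average each question over anonymous responses (no names are stored)."""
    if not responses:
        raise ValueError("no responses collected")
    for answers in responses:
        stress_score(answers)  # reuse the validation above
    n = len(responses)
    return {q: sum(r[i] for r in responses) / n for i, q in enumerate(QUESTIONS)}


responses = [[3, 4, 2, 1], [2, 3, 3, 0]]  # hypothetical answers from two responders
print([stress_score(r) for r in responses])  # [10, 8]
print(per_question_average(responses))
```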

Page 30: It's Not Your Fault - Blameless Post-mortems

Why we don’t punish

People are de-incentivized to share the details

Practically guarantees a repeat of the problem

Understand why actions made sense (at the time)

Create safety AND accountability

Move away from the idea that individuals are “problems”

Create new “experts”

Page 31: It's Not Your Fault - Blameless Post-mortems


Page 32: It's Not Your Fault - Blameless Post-mortems

Promoting from within

Where do we start?

The basics:

• Document your timeline or log data
• Document conversations
• Leave room for notes
• Mean time to resolution / time calculations
• Level of severity
• Archive it for historical retrieval
• Remediation: make it actionable
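For the “mean time to resolution / time calculations” and “level of severity” items, a minimal Python sketch could compute MTTR per severity level from detection and resolution timestamps. The severity labels (SEV1, SEV2) and the incident data are hypothetical, and measuring from detection rather than from the start of impact is an assumption; pick whichever start point your team records consistently.

```python
from datetime import datetime, timedelta

# Hypothetical incident log: (severity, detected_at, resolved_at)
incidents = [
    ("SEV1", datetime(2015, 1, 5, 9, 0), datetime(2015, 1, 5, 10, 30)),
    ("SEV2", datetime(2015, 1, 9, 14, 0), datetime(2015, 1, 9, 14, 20)),
    ("SEV1", datetime(2015, 1, 20, 23, 0), datetime(2015, 1, 21, 0, 0)),
]


def mttr_by_severity(rows):
    """Mean time to resolution (detection -> resolution) for each severity level."""
    durations: dict[str, list[timedelta]] = {}
    for severity, detected, resolved in rows:
        if resolved < detected:
            raise ValueError(f"{severity} incident resolved before it was detected")
        durations.setdefault(severity, []).append(resolved - detected)
    # sum() needs a timedelta start value; dividing a timedelta by an int is supported
    return {sev: sum(ds, timedelta()) / len(ds) for sev, ds in durations.items()}


for severity, mttr in sorted(mttr_by_severity(incidents).items()):
    print(severity, mttr)
# SEV1 1:15:00
# SEV2 0:20:00
```

Grouping by severity keeps a handful of long, high-severity outages from being averaged away by many quick, minor ones.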

Page 33: It's Not Your Fault - Blameless Post-mortems

Tools

Etsy’s Morgue

VictorOps Post-mortem Report

Internal wiki

Page 34: It's Not Your Fault - Blameless Post-mortems

Seek the truth

Don’t blame others…

Don’t blame yourself

Thank You

Page 35: It's Not Your Fault - Blameless Post-mortems

Questions?

Page 36: It's Not Your Fault - Blameless Post-mortems

Resources

“The Human Side of Postmortems” - Dave Zwieback

“The Field Guide to Understanding Human Error” - Sidney Dekker

“A Look at Looking in the Mirror” - J. Paul Reed

“Fallible Humans” - Ian Malpass (http://www.indecorous.com/fallible_humans/)

“4 Questions to ask for an effective Technical Post Mortem” - Jeffrey O’Brien (http://www.maintenanceassistant.com/blog/4-questions-effective-technical-post-mortem/)

“Nine steps to IT post-mortem excellence” - Michael Krigsman (http://www.zdnet.com/blog/projectfailures/nine-steps-to-it-post-mortem-excellence/1069)

“Postmortem reviews: purpose and approaches in software engineering” - Torgeir Dingsøyr (http://www.uio.no/studier/emner/matnat/ifi/INF5180/v10/undervisningsmateriale/reading-materials/p08/post-mortems.pdf)

“Blameless PostMortems and a Just Culture” - John Allspaw (http://codeascraft.com/2012/05/22/blameless-postmortems/)

“What blameless really means” - Jessica Harllee (http://www.jessicaharllee.com/notes/what-blameless-really-means/)

“Each necessary, but only jointly sufficient” - John Allspaw (http://www.kitchensoap.com/2012/02/10/each-necessary-but-only-jointly-sufficient/)
