Brainstorming failure


Page 1: Brainstorming failure

Brainstorming Failure

Bio - Jeff Smith

• Manager, Site Reliability Engineering at Grubhub

• Yes, we are also hiring.

• Yes, there is free food. Yes, it's totally awesome to work here.

Email: [email protected]

Twitter: @DarkAndNerdy

Blog: http://www.allthingsdork.com

Systems Metaphor

First Class Luxuries

Top of the Line Internals

The Cockpit

WAT?

The Cockpit

The cockpit you're expecting

FEEDBACK!

What Do You Measure?

FMEA

Failure Mode and Effects Analysis (FMEA) is a step-by-step approach for identifying the possible ways a process, product or service might fail. The process is commonly used by quality organizations across a wide range of industries.

FMEA in Software Engineering

We can use FMEA in a number of ways in software to help us brainstorm, rank and prioritize actionable items about the system. The process will help us:

• Identify key metrics that need tracking

• Identify monitoring and/or alerts that need to be created

• Identify necessary feedback loops

The Case for Cross-Functional Teams

The Process

1. Examine the process

2. Brainstorm potential failures

3. List potential effects of failure

4. Identify your scale

5. Assign severity ranking

6. Assign occurrence ranking

7. Assign detection ranking
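
Working through these steps produces one worksheet row per failure mode. A minimal sketch in Python, assuming a hypothetical `FailureModeRow` record (the field names are illustrative, not taken from any FMEA standard), that rejects any ranking outside the 1-10 scale used for severity, occurrence and detection:

```python
from dataclasses import dataclass, field


@dataclass
class FailureModeRow:
    """One row of a hypothetical FMEA worksheet (field names are illustrative)."""
    process_step: str
    failure_mode: str
    effects: list[str] = field(default_factory=list)
    severity: int = 1
    occurrence: int = 1
    detection: int = 1

    def __post_init__(self) -> None:
        # Every ranking must sit on the agreed 1-10 scale.
        for name in ("severity", "occurrence", "detection"):
            value = getattr(self, name)
            if not 1 <= value <= 10:
                raise ValueError(f"{name} must be between 1 and 10, got {value}")


# Hypothetical example row; the scores are made up for illustration.
row = FailureModeRow(
    process_step="Checkout",
    failure_mode="Payment service times out",
    effects=["Order not fulfilled", "Degraded customer experience"],
    severity=8,
    occurrence=4,
    detection=5,
)
print(row.failure_mode, row.severity, row.occurrence, row.detection)
```

Validating at construction time means a typo such as a severity of 11 is caught when the worksheet is filled in, not later when rows are ranked.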

Examine the Process

Brainstorm Potential Failures

• Brainstorming should be fluid: everything goes.

• Cross-functional teams should be involved (business, development, operations, design).

List Potential Effects of Failure

Think through the impact of failure. The impact might be process-related, reputation-related or technical, to name just a few. Examples:

• Degraded customer experience

• Order not fulfilled

• Delay in payment to accounts receivable

Agree on Risk Level Scales

Technology Industry

• Low severity could be degraded performance

• High severity could be complete site outage

Airline Industry

• Low severity could be departure delay

• High severity could be customer death

Assign Severity Ranking

Rank the severity on a scale of 1 to 10.

• 1 means the effect is inconsequential

• 10 means the effect is a catastrophic failure

In some organizations, 9 and 10 are reserved for personal injury and death.

If a failure mode has more than one effect, rank it by only the most severe of those effects.
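
The "most severe effect wins" rule amounts to taking the maximum. A minimal Python sketch, assuming a hypothetical helper `failure_mode_severity` and made-up severity scores for the example effects listed earlier:

```python
def failure_mode_severity(effect_severities: dict[str, int]) -> int:
    """Return the severity ranking for a failure mode with several effects.

    Only the most severe effect counts toward the failure mode's ranking.
    """
    if not effect_severities:
        raise ValueError("a failure mode needs at least one effect")
    for effect, score in effect_severities.items():
        if not 1 <= score <= 10:
            raise ValueError(f"severity for {effect!r} must be 1-10, got {score}")
    return max(effect_severities.values())


# Hypothetical rankings for the effects of a single failure mode.
effects = {
    "Degraded customer experience": 4,
    "Order not fulfilled": 8,
    "Delay in payment to accounts receivable": 6,
}
print(failure_mode_severity(effects))  # 8
```

Taking the maximum, rather than summing or averaging, keeps a failure mode with one catastrophic effect from being diluted by several minor ones.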

Assign Occurrence Ranking

Rank the likelihood that this condition will occur.

• 1 being extremely unlikely

• 10 being inevitable.

Assign Detection Ranking

Rank how likely it is that this condition would go undetected if it occurred. A scenario is only considered "detected" if it is found before it impacts a customer or user.

• 1 means the current controls are certain to detect the failure

• 10 means the current controls are certain not to detect the failure

Calculate the Risk Priority Number

The Risk Priority Number (RPN) is a value calculated to rank a particular failure mode. The higher the RPN, the sooner the failure mode should be addressed.

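RPN = S * O * D

where S, O and D are the Severity, Occurrence and Detection rankings. Because each ranking runs from 1 to 10, the RPN ranges from 1 to 1000.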
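
The formula is simple enough to compute directly; the only care needed is keeping each input on the 1-10 scale. A minimal Python sketch, with a hypothetical function name and example scores:

```python
def risk_priority_number(severity: int, occurrence: int, detection: int) -> int:
    """RPN = S * O * D, with each ranking on a 1-10 scale."""
    rankings = (("severity", severity), ("occurrence", occurrence), ("detection", detection))
    for name, value in rankings:
        if not 1 <= value <= 10:
            raise ValueError(f"{name} must be between 1 and 10, got {value}")
    return severity * occurrence * detection


print(risk_priority_number(8, 3, 6))     # 144
print(risk_priority_number(1, 1, 1))     # 1, the lowest possible RPN
print(risk_priority_number(10, 10, 10))  # 1000, the highest possible RPN
```

For the first call, 8 * 3 = 24 and 24 * 6 = 144.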

Develop an Action Plan

Evaluate the list and develop an action plan to eliminate or mitigate the items with the highest RPN value first.

• Prioritize solutions that are self-healing and exist within the system under consideration.

• Develop metrics that track the health of the parts of the system related to each failure mode

• The goal is to reduce the RPN by lowering the Severity, Occurrence or Detection score.
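
Ranking the worksheet and re-scoring after a mitigation can be sketched in a few lines of Python. The failure modes, scores and the queue-depth alert below are hypothetical examples, not taken from a real system:

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class FailureMode:
    name: str
    severity: int
    occurrence: int
    detection: int

    @property
    def rpn(self) -> int:
        return self.severity * self.occurrence * self.detection


# Hypothetical failure modes and rankings, for illustration only.
modes = [
    FailureMode("Payment service times out", severity=8, occurrence=4, detection=5),
    FailureMode("Menu images fail to load", severity=3, occurrence=6, detection=2),
    FailureMode("Order queue backs up silently", severity=7, occurrence=3, detection=9),
]

# Address the highest RPN first.
ranked = sorted(modes, key=lambda m: m.rpn, reverse=True)
for mode in ranked:
    print(f"{mode.rpn:4d}  {mode.name}")

# A hypothetical alert on queue depth makes the failure likely to be caught
# before customers notice, so only the Detection score drops.
queue = modes[2]
mitigated = replace(queue, detection=3)
print(queue.rpn, "->", mitigated.rpn)  # 189 -> 63
```

In this example the silent queue backlog outranks the payment timeout even though its severity is lower, because it is so unlikely to be detected. Improving detection alone cuts its RPN by two thirds.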

Ensuring You Have a Feedback Loop

The feedback loop is the constant evaluation of these measurements and indicators. It should give a strong signal that the system is working as expected, while also exposing trends in the environment.

Leading and Lagging Indicators

Leading Indicator - A measurable factor that changes before the system enters a particular state of failure. (Metrics)

Lagging Indicator - A measurable factor that changes after the system enters a particular state of failure. (Logs/Reporting)
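
To make the distinction concrete, here is a minimal Python sketch. It assumes a hypothetical connection-pool utilization metric as the leading indicator (it climbs before the pool is exhausted) and hypothetical log lines as the lagging indicator (errors only appear once orders have already failed). The thresholds and data are illustrative:

```python
# Hypothetical samples: connection-pool utilization (%) scraped once a minute.
pool_utilization = [52, 58, 66, 74, 83, 91]

# Hypothetical log lines collected over the same window.
log_lines = [
    "INFO order 1001 placed",
    "ERROR order 1002 failed: no database connection available",
    "INFO order 1003 placed",
    "ERROR order 1004 failed: no database connection available",
]


def leading_alert(samples: list[float], threshold: float, rising_points: int = 3) -> bool:
    """Warn before failure: utilization is above threshold and has risen steadily."""
    if len(samples) < rising_points + 1:
        return False
    recent = samples[-(rising_points + 1):]
    rising = all(later > earlier for earlier, later in zip(recent, recent[1:]))
    return rising and recent[-1] >= threshold


def lagging_error_count(lines: list[str]) -> int:
    """Count failures from logs, i.e. after they have already happened."""
    return sum(1 for line in lines if line.startswith("ERROR"))


print(leading_alert(pool_utilization, threshold=85))  # True
print(lagging_error_count(log_lines))                 # 2
```

The leading check can fire while there is still time to act; the lagging count is better suited to confirming impact and reporting after the fact.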

Recap

• Examine your process, and assemble a cross-functional team with different views of the system

• Brainstorm all your potential failure modes

• Calculate your RPN

• Develop action plans to reduce risk. Ensure the system provides feedback loops that identify its current state

• Profit

Thanks!