The Unrealized Role of Monitoring & Alerting w/ Jason Hand

Post on 13-Jan-2017

Transcript of The Unrealized Role of Monitoring & Alerting w/ Jason Hand

The Unrealized Role of:

Monitoring & Alerting

@jasonhand | VictorOps | #AllDayDevOps

THE UNREALIZED ROLE OF:

Monitoring & Alerting

JASON HAND

DevOps Evangelist

VictorOps

2015 MONITORING SURVEY

WHY ARE YOU COLLECTING THIS DATA?
NOTE: You may choose more than one
▸ Performance analysis and trending
▸ Fault and Anomaly detection
▸ Capacity Planning
▸ A/B Testing

THE RESULTS
NOTE: Respondents may have chosen more than one
▸ Performance analysis and trending - 63%
▸ Fault and Anomaly detection - 53%
▸ Capacity Planning - 45%
▸ A/B Testing - 11%

Tyranny of the

S.L.A. (Service Level Agreement)

HIGH AVAILABILITY

Prediction & Prevention

THAT'S IMPORTANT

... BUT ...

BUSINESS OBJECTIVES?

HAPPY CAMPER

CUSTOMERS want more than just

99.999% UPTIME
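
For a sense of scale, an uptime target translates directly into an allowed-downtime budget. The sketch below works through that arithmetic, assuming a 365-day year; the helper name `downtime_budget_minutes` is made up for this illustration and does not come from the talk.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # assumption: a non-leap, 365-day year


def downtime_budget_minutes(uptime_percent: float) -> float:
    """Return the minutes of downtime per year allowed by an uptime target."""
    return MINUTES_PER_YEAR * (1 - uptime_percent / 100)


if __name__ == "__main__":
    # Compare three common uptime targets.
    for target in (99.9, 99.99, 99.999):
        budget = downtime_budget_minutes(target)
        print(f"{target}% uptime -> {budget:.2f} minutes of downtime per year")
```

Under this assumption, "five nines" leaves a little over five minutes of downtime per year.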

WHERE'S THE

INNOVATION?

HOW IMPORTANT IS

Learning & Innovation?

The result of underutilizing monitoring & alerting is that the IT department and the organization have no chance to...

LEARN, IMPROVE, OR INNOVATE.

CONTINUALLY UNDERSTANDING & RESPONDING TO THE FEEDBACK

from monitoring, logging, & alerting

allows you to use information about events in the past to drive future actions.

It's not just about

PREDICTION & PREVENTION

RESPOND & REPAIR

...QUICKLY

NOPE

MTTR Rather Than

MTBF
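
One way to see why recovery time deserves as much attention as failure frequency is the standard steady-state availability formula, availability = MTBF / (MTBF + MTTR). The talk does not state this formula; it is brought in here as an illustration, and the numbers below are hypothetical.

```python
def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability: the fraction of time the service is up."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


if __name__ == "__main__":
    # Hypothetical baseline: a failure every 500 hours, 2 hours to recover.
    baseline = availability(mtbf_hours=500, mttr_hours=2)
    # Option A: double the time between failures (prediction & prevention).
    fewer_failures = availability(mtbf_hours=1000, mttr_hours=2)
    # Option B: halve the time to recover (respond & repair).
    faster_recovery = availability(mtbf_hours=500, mttr_hours=1)

    print(f"baseline:        {baseline:.4%}")
    print(f"fewer failures:  {fewer_failures:.4%}")
    print(f"faster recovery: {faster_recovery:.4%}")
```

In this model only the ratio of MTTR to MTBF matters, so halving recovery time buys the same availability as doubling the time between failures.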

FAILURE IS INEVITABLE

US·ER /ˈYOOZƏR/

DISTRIBUTED FAULT INJECTION TEST SUITE FOR PRODUCTION.

credit: Leon Fayer (@papa_fire)

SUCCESS is a result of

FAILURE

UNDERSTAND

LEARN

INNOVATE

RE·SIL·IENT /RƏˈZILYƏNT/

The ability to resist, absorb, recover from or successfully adapt to adversity or a change in conditions

CHANGE can cause failure

but innovation requires

CHANGE

CONFLICT

CHANGE REQUIRED

Without deviation from the norm, progress is not possible

— Frank Zappa

What Did You

LEARN From the Recovery Efforts?

(including monitoring & alerting)

POSTMORTEMS / LEARNING REVIEWS:
Stories of:

WHAT TOOK PLACE leading up to & during the disruption & recovery efforts

WHO WAS INVOLVED?

WHAT DID THEY

SEE?

WHAT WAS

SAID?

WHAT

ACTIONS WERE TAKEN?

jhand.co/chatopsbook

HOW DO events & actions

CORRELATE OVER TIME?
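
One possible way to answer that question during a learning review is to merge what monitoring reported, what was said in chat, and what actions were taken into a single chronological timeline. The sketch below is an illustration under assumptions, not a method described in the talk: the `Entry` type, the `build_timeline` helper, and all sample data are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, List


@dataclass(frozen=True)
class Entry:
    at: datetime   # when it happened
    source: str    # hypothetical labels: "alert", "chat", "action"
    detail: str    # what was seen, said, or done


def build_timeline(*streams: Iterable[Entry]) -> List[Entry]:
    """Merge several event streams into one list ordered by timestamp."""
    merged = [entry for stream in streams for entry in stream]
    return sorted(merged, key=lambda entry: entry.at)


if __name__ == "__main__":
    # All entries below are invented sample data.
    alerts = [Entry(datetime(2016, 1, 1, 2, 0), "alert", "disk usage above 90%")]
    chat = [Entry(datetime(2016, 1, 1, 2, 4), "chat", "on-call: looking at db-1")]
    actions = [Entry(datetime(2016, 1, 1, 2, 9), "action", "rotated logs on db-1")]

    for entry in build_timeline(actions, chat, alerts):
        print(f"{entry.at:%H:%M} [{entry.source}] {entry.detail}")
```

Reading the merged list top to bottom shows how long it took to go from the first alert to the first conversation to the first action.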

5 Why's

WHAT IS THE "cause" OF THE PROBLEM?

Root Cause is ...

OUR...

obsession with

"Root Cause"

ASKING "WHY".. leads to ..

BLAME

BLAMING LEADS TO..

operators hiding relevant & important information

We must

BELIEVE that our operators are doing their best given the constraints of the "system"

"We are here to"

LEARN From Failure

(and success)

RATHER THAN ..

AVOID FAILURE

WHAT'S THE

STORY?

INNOVATE

Learning from both success & failure to develop & implement small incremental improvements is critical.

MONITORING & ALERTING

Helps us understand the story in greater detail

LEARNING ORGANIZATION

Learning does NOT come from

READING &

LISTENING

Learning comes from

DOING

Real Learning comes from:

OBSERVING
ORIENTING
DECIDING
ACTING

John Boyd's OODA Loop

Example:

LEARNING TO PLAY THE

DOBRO GUITAR

LEARNING

WHY?

Go from knowing...
to understanding...
to learning

NOTE: (Requires making mistakes)

We will trade some uptime in exchange for innovation

- Dave Hahn (Netflix)

DevOpsDays Boise 2016

SHIFT OUR GAZE from:

MAINTAINING & PROTECTING

LEARNING

Which leads to...

IMPROVING & INNOVATING

WE INCREASE VALUE OF:

- Monitoring & Alerting
- IT teams
- Products & Services
- Organization

HYPOTHESIZE
EXPLORE
STRETCH
EXPERIMENT
FAIL
LEARN
Try Again

LEARNING & INNOVATING leads to uncovering new ways of

BUILDING, DEPLOYING, AND MAINTAINING SOFTWARE & INFRASTRUCTURE

Which leads to...

RESILIENT SYSTEMS

The By-product of a highly

RESILIENT system is ...

HIGHLY AVAILABLE

SYSTEM

THE UNREALIZED ROLE OF:

Monitoring & Alerting is ....

LEARNING &

INNOVATION

THANK YOU

Be Victorious!

References:

Monitoring Survey: https://kartar.net/2015/08/monitoring-survey-2015---metrics/

Firefighter: https://www.learyfirefighters.org/wp-content/uploads/2013/09/cover-slide-1.jpg

Mechanic: https://upload.wikimedia.org/wikipedia/commons/4/4b/Flickr_-_Israel_Defense_Forces_-_Airplane_Technician,_March_2010.jpg

Gnome Plan: http://www.nerdfitness.com/wp-content/uploads/2012/04/Screen-Shot-2012-03-30-at-3.15.38-AM-1024x7591.jpg

NOC: https://upload.wikimedia.org/wikipedia/commons/0/03/

Kodak: http://file.answcdn.com/answ-cld/image/upload/v1/tk/brand_image/b59911fc/91d6e71d30a0878dfe3cb30a22751cb874a3ea8c.jpeg

VW Camper: https://upload.wikimedia.org/wikipedia/commons/d/d7/VW_Camper.jpg

Blockbuster: https://jordanandeddie.files.wordpress.com/2013/11/blockbuster-feature.jpg

Borders: http://smashingtops.com/wp-content/uploads/2012/06/borders_logo1.jpg

Chained Hands: https://www.google.com/url?sa=i&rct=j&q=&esrc=s&source=images&cd=&ved=0ahUKEwjgrNCDh5TMAhXJs4MKHaoZDssQjBwIBA&url=http%3A%2F%2Fwww.publicdomainpictures.net%2Fdownload-picture.php%3Fadresar%3D50000%26soubor%3Dhands-in-chains.jpg%26id%3D40426&bvm=bv.119745492,d.amc&psig=AFQjCNFIdnDPzSqiLA-znIW5SCTCUHhqEw&ust=1460926880336203

Inevitable: http://vignette4.wikia.nocookie.net/matrix/images/5/51/SMITH.png/revision/latest?cb=20110214092002

Bulb: https://smhttp-ssl-37293.nexcesscdn.net/media/catalog/

Accident Free: http://www.compliancesigns.com/media/digital-scoreboard/1000/Safety-Awareness-Sign-DSE-195271000.gif

Stewie: http://chroniclesofredmark.com/wp-content/uploads/2014/01/Stewie.gif

change: http://i.imgur.com/EQyC6N3.gif

Hard drive: https://i.imgur.com/pWsKSEf.gif

Change: https://farm6.staticflickr.com/5208/5270199049df99b234e9od.jpg

Value: https://d13yacurqjgara.cloudfront.net/users/6437/screenshots/1405551/value-cropped.gif