Give Me the Bad News Straight:  Why Models are a Broken Approach to Alerting

Post on 23-Jan-2018


Transcript of Give Me the Bad News Straight:  Why Models are a Broken Approach to Alerting

Tech Talk: Give Me the Bad News Straight: Why Models are a Broken Approach to Alerting

David B. Martin

DevOps: Agile Ops

CA Technologies, APM Product Manager

DO5T41T

#CAWorld

© 2015 CA. ALL RIGHTS RESERVED. @CAWORLD #CAWORLD

Give Me the Bad News Straight: Why Models are a Broken Approach to Alerting

The industry standard approach to automatic alerts is to create models from base-lining application latencies. But when something goes wrong, is it because something is really broken or because the model was incorrect? Training the model to avoid mistakes is complex and time-intensive. CA Application Performance Management (CA APM) 10 replaces the whole approach with a brand new one: react to changes in application stability as they occur. Outliers are automatically ignored, while tremors in latency register progressively bigger values for the intensity of an event, a little like the Richter scale for earthquakes. Join the discussion and learn how CA APM transforms automatic alerting.

David B. Martin, CA Technologies, Product Manager


© 2015 CA. All rights reserved. All trademarks referenced herein belong to their respective companies.

The content provided in this CA World 2015 presentation is intended for informational purposes only and does not form any type of warranty. The information provided by a CA partner and/or CA customer has not been reviewed for accuracy by CA.

For Informational Purposes Only
Terms of this Presentation


Agenda

WHY MODELS ARE FAILING

A BRIEF HISTORY OF APM ALERTING

CA TECHNOLOGIES DIFFERENTIAL ANALYSIS

MODELS ARE MADE TO BE BROKEN

DATA-DRIVEN DIVE INTO AUTOMATIC ALERTING MODELS

SHEWHART SAVES THE DAY


Keeping my promise!

§ I will begin this session by making a detailed, data-centric case for why CA Technologies' new differential analysis feature is a superior, market-leading approach to automatic alerting.

§ No, I will not then pull a rabbit out of a hat. ‘Cuz this ain’t magic, people… even if it looks like magic.

§ “Any sufficiently advanced technology is indistinguishable from magic.” - A. C. Clarke


What was CA’s last answer?

§ In the early 90s, Wily implemented Holt’s Linear Exponential Smoothing (HLES) to calculate baselines for metrics.

§ Baselines were fooled by regular production events; many were more about regular patterns in load than about maintenance events. Seasonality debuts to address it.

§ This leads to rules, and rules engines, to address edge cases that seasonality does not address (e.g. “+3 std dev from baseline” to deaden the sensitivity of triggers).

And what are our competitors doing?
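For readers who have not met Holt's linear exponential smoothing, here is a minimal sketch of how such a baseline can be computed. It is the textbook two-parameter form (level plus trend), not Wily's or CA's actual implementation. The function name and the default `alpha` and `beta` values are assumptions chosen for illustration.

```python
from typing import List, Sequence


def holt_linear_baseline(
    samples: Sequence[float], alpha: float = 0.3, beta: float = 0.1
) -> List[float]:
    """Return one-step-ahead Holt forecasts, one per sample after the first.

    alpha smooths the level and beta smooths the trend; both defaults are
    illustrative assumptions, not values taken from any product.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    level = samples[0]
    trend = samples[1] - samples[0]
    forecasts = []
    for value in samples[1:]:
        # The baseline for this interval is predicted before `value` is seen.
        forecasts.append(level + trend)
        previous_level = level
        level = alpha * value + (1 - alpha) * (level + trend)
        trend = beta * (level - previous_level) + (1 - beta) * trend
    return forecasts
```

A rule such as "+3 std dev from baseline" would then compare each new sample against the matching forecast, which is exactly where the model can be fooled by ordinary changes in load.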


What’s the problem with the state-of-the-art?

§ As the following slides will explain, seasonal baselines miss problems that you don’t want to miss.

§ Inevitably, they also report too often.

§ When they do, you have to write rules to resolve the issue with your issues.

§ Now you’ve failed to find the automatic alerting grail.

§ It may actually be more efficient to go back to writing static thresholds for your key components.

Or, a good reason for teaching you some interesting math.

[Chart: Average Response Time with +1 Std Dev, +2 Std Dev and +3 Std Dev bands; y-axis from 440 to 620]

This is a stable application response time, with bands of standard deviation. Most baselines are fancy forms of standard deviation that take into account things like seasonality.
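The bands in a chart like this can be derived directly from the samples. The sketch below assumes a plain population standard deviation over a fixed window with no seasonality adjustment. The function name and band labels are hypothetical.

```python
from statistics import mean, pstdev
from typing import Dict, Sequence


def deviation_bands(samples: Sequence[float], max_sigma: int = 3) -> Dict[str, float]:
    """Return the mean and the +1..+max_sigma standard deviation bands."""
    center = mean(samples)
    sigma = pstdev(samples)  # population standard deviation of the window
    bands = {"mean": center}
    for k in range(1, max_sigma + 1):
        bands[f"+{k} std dev"] = center + k * sigma
    return bands
```

Calling `deviation_bands(response_times_ms)` on a window of recent response times yields the four horizontal lines of such a chart.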

[Chart: response time; y-axis from 0 to 1800]

An outlier… What to do? If it’s in a seasonal window, it has to be a bigger outlier, but the problem of “To Alert or Not to Alert” remains the same.

You must either send an alert for this single spike or write a rule to say that the spike has to be “so big” before you care (which is usually done with a manually written rule like “>3 std dev”).

“Mr. Ops won’t even put down his sandwich for a single failed transaction.”
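To make the dilemma concrete, here is a sketch of the kind of manually written ">3 std dev" rule described above. The function and parameter names are hypothetical, and the baseline mean and standard deviation are assumed to come from whatever baseline model is in use.

```python
from typing import List, Sequence


def static_sigma_alerts(
    samples: Sequence[float],
    baseline_mean: float,
    baseline_std: float,
    k: float = 3.0,
) -> List[int]:
    """Return the indexes of samples above baseline_mean + k * baseline_std."""
    threshold = baseline_mean + k * baseline_std
    return [i for i, value in enumerate(samples) if value > threshold]


# One slow transaction in an otherwise quiet series still raises an alert.
response_times = [500, 510, 495, 1700, 505]
print(static_sigma_alerts(response_times, baseline_mean=500, baseline_std=20))  # [3]
```

Raising `k` silences the lone spike, but it also silences every smaller deviation, which is the trade-off the following slides dwell on.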

[Chart: response time; y-axis from 0 to 2500]

What about the situation of a sustained spike?

Supposedly, seasonality cancels out the normal operations. But how many of you have apps in which a single user logs in and starts running expensive (e.g. reporting) transactions?

The traditional approach again has to decide when to alert. If app users log in at irregular intervals and perform this type of transaction, do you trigger alerts on their normal (non-seasonal) activity?

“cat alerts > /dev/null”.

But how long do you wait then? Once again, a decision you have to make and configure for each of your apps.
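The "how long do you wait" decision usually ends up as a hand-tuned duration parameter. Here is a minimal sketch, with hypothetical names, of a rule that only alerts after a run of consecutive breaches:

```python
from typing import Optional, Sequence


def first_sustained_breach(
    samples: Sequence[float], threshold: float, min_consecutive: int
) -> Optional[int]:
    """Return the index at which `min_consecutive` samples in a row have
    exceeded `threshold`, or None if that never happens."""
    run = 0
    for i, value in enumerate(samples):
        run = run + 1 if value > threshold else 0
        if run >= min_consecutive:
            return i
    return None
```

With `min_consecutive=1` every reporting user pages the operator. With a large value, a real degradation goes unreported until the run is long enough. The value has to be chosen per application.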

[Chart: response time; y-axis from 0 to 3000]

Better hope that sustained, normal changes in response time are seasonal when they happen… If not, you must write rules!

And if you write rules, you might accidentally deaden the threshold to actual problems. Dang, gum!


Our Hero: Walter Shewhart

§ In the 1920s, Walter Shewhart et al. worked on quality control for buried telephone lines.

§ Shewhart observed that while every line displays variation, some lines occasionally display uncontrolled variation. Like a seismometer, there are normal fluctuations and then there are earthquakes.

§ Shewhart invented control charts and the Western Electric Rules to identify uncontrolled variance, earning himself the title: “Father of Statistical Quality Control.”


Translation please!

§ Shewhart taught us to favor real-time observation over mathematical models of a signal’s behavior.

§ We still baseline the signal, but the Western Electric Rules, rather than a simple delta from the baseline model, define the situations in which the signal should be considered to be in a bad state.

§ Shewhart’s method of characterizing the quality of a signal mirrors the behavior of a human observer.

Trust us, you will understand this math.


Shewhart’s Western Electric Rules: Straight off Wikipedia…

The canonical Western Electric Rules use plain, old standard deviation as their real-time measure. Each rule identifies a pattern in the signal:

Rule #1 – A statistically interesting outlier.

Rule #2 – Two somewhat interesting outliers out of three measurements.

Rule #3 – Four smaller outliers out of five measurements.

Rule #4 – Many small outliers over many measurements.

This much we flat out stole from math history!

See comments to the right
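The four patterns above can be sketched in code. The thresholds used here (3, 2 and 1 standard deviations, and a run of points on one side of the center line) follow the commonly published form of the Western Electric Rules. The slide only paraphrases them, so treat the exact numbers as assumptions. References differ on whether the Rule #4 run is eight or nine points, so it is left as a parameter. All function names are hypothetical.

```python
from typing import Sequence, Set


def _count_beyond(points: Sequence[float], center: float, limit: float, side: int) -> int:
    """Count points farther than `limit` from `center` on one side (+1 or -1)."""
    return sum(1 for p in points if side * (p - center) > limit)


def western_electric_breaches(
    samples: Sequence[float], center: float, sigma: float, run_length: int = 8
) -> Set[int]:
    """Return the rule numbers breached by the window ending at the latest sample."""
    breaches: Set[int] = set()
    for side in (1, -1):
        # Rule 1: the latest point is beyond 3 sigma.
        if _count_beyond(samples[-1:], center, 3 * sigma, side) >= 1:
            breaches.add(1)
        # Rule 2: two of the last three points are beyond 2 sigma, same side.
        if len(samples) >= 3 and _count_beyond(samples[-3:], center, 2 * sigma, side) >= 2:
            breaches.add(2)
        # Rule 3: four of the last five points are beyond 1 sigma, same side.
        if len(samples) >= 5 and _count_beyond(samples[-5:], center, sigma, side) >= 4:
            breaches.add(3)
        # Rule 4: the last `run_length` points all sit on the same side of center.
        if (
            len(samples) >= run_length
            and _count_beyond(samples[-run_length:], center, 0.0, side) == run_length
        ):
            breaches.add(4)
    return breaches
```

Each call looks only at the most recent points, which is what makes the rules a real-time observation rather than a model of expected behavior.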


CA Technologies Innovation

§ Western Electric Rules are brilliant for real-time analysis of both telephone signals and application signals.

§ A single rule breach, however, is too dull a blade for slicing through this tough problem.

§ By assigning weights to each rule breach, keeping a running sum and aging out old breaches, we can produce a single, normalized value for variance intensity.

CA APM 10 has several patents pending.
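The slide gives only the outline of this idea: weights, a running sum, aging, and normalization. The sketch below is one possible reading of that outline and is not CA APM's patent-pending algorithm. The weights, the window length, the normalization against a theoretical maximum and every name in the code are assumptions made for illustration.

```python
from collections import deque
from typing import Deque, Dict, Iterable, Tuple

# Hypothetical weights: bigger outliers count for more. Not CA APM's values.
RULE_WEIGHTS: Dict[int, float] = {1: 4.0, 2: 3.0, 3: 2.0, 4: 1.0}


class VarianceIntensity:
    """Running, aged, normalized sum of weighted rule breaches (illustrative)."""

    def __init__(self, max_age: int = 10, weights: Dict[int, float] = RULE_WEIGHTS):
        self.max_age = max_age
        self.weights = weights
        self._breaches: Deque[Tuple[int, float]] = deque()  # (tick, weight)
        self._tick = 0
        # Largest possible sum: every rule breached on every tick in the window.
        self._ceiling = max_age * sum(weights.values())

    def update(self, breached_rules: Iterable[int]) -> float:
        """Record this interval's breaches and return an intensity in [0, 1]."""
        self._tick += 1
        for rule in set(breached_rules):
            self._breaches.append((self._tick, self.weights[rule]))
        # Age out breaches that are `max_age` or more ticks old.
        while self._breaches and self._tick - self._breaches[0][0] >= self.max_age:
            self._breaches.popleft()
        running_sum = sum(weight for _, weight in self._breaches)
        return running_sum / self._ceiling
```

Under these assumptions a lone outlier nudges the value up slightly and then fades as it ages out, while sustained instability keeps adding weight and drives the value toward 1.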


In a busy system, there are always varying levels of stability.

In this picture, can you tell which signals are least stable?


This signal experienced an outlier, but it didn’t turn blue.

A single rule breach isn’t enough for “Pete to put down his sandwich.”


In this case, the change in stability was sustained over about forty minutes.

What happened? Click to find out…


This application experienced a remarkable degradation in performance over a forty-minute period of time.

Both the old approach and our new one would alert here, but CA’s alert would happen early in the event and trigger trace collection automatically.

The old approach might not have let an operator know for thirty minutes or more, based on the rules they configured.


Triage is a battlefield medicine term: where are the wounded soldiers?

CA’s approach means identifying chronic problems as well as acute ones. Which of these lines are more stable but still have chronic stability events at regular intervals?


Differential Analysis Default Configuration


CA TECHNOLOGIES TEAM PEGASUS

Clockwise from left: Prashant Pathak, Mark LoSacco, Weini Yu, Prasanna Ram Venkatachalam, Naresh Chippada, Carey Feldstein, Paul Callahan and Sai Krishna Rayanapati. [not pictured: me]


Recommended Sessions

SESSION #   TITLE   DATE/TIME

DO5X189S   How to Achieve a Customer-Centric View in an Omni-Channel World   11/18/2015 at 1:00 pm

DO5X194S   Monitor Microservices, Containers, Cloud Foundry and Node with CA Application Performance Management   11/18/2015 at 4:30 pm

DO5X193S   Customize CA Application Performance Management with Tips for Using the CA Application Performance Management Open APIs   11/19/2015 at 4:30 pm


Must See Demos

Application Performance Management and DevOps, featuring APM use in preproduction scenarios

Application Performance Management Theater 5

Application Performance Management, Modern Monitoring, featuring the new APM Team Center

Application Performance Management Theater 5

Ensuring a “5 star” mobile app experience with CA Mobile App Analytics

Mobile App Analytics Theater 5

Unified Monitoring: APM Integrations including UIM

Application Performance Management Theater 5


Follow On Conversations At…

Smart Bar: Application Performance Management Theater 5

Tech Talks: Application Performance Management Theater 5

Question and Answer: DAVID.B.MARTIN@CA.COM