Machine Learning + Analytics

Copyright © 2016 Splunk Inc. Machine Learning + Analytics in Splunk Beau Morgan Senior Sales Engineer – Security SME 1 of 1 billion

Transcript of Machine Learning + Analytics

Page 1: Machine Learning + Analytics

Copyright © 2016 Splunk Inc.

Machine Learning + Analytics in Splunk
Beau Morgan, Senior Sales Engineer – Security SME

1 of 1 billion

Page 2: Machine Learning + Analytics

Disclaimer

During the course of this presentation, we may make forward-looking statements regarding future events or the expected performance of the company. We caution you that such statements reflect our current expectations and estimates based on factors currently known to us and that actual events or results could differ materially. For important factors that may cause actual results to differ from those contained in our forward-looking statements, please review our filings with the SEC. The forward-looking statements made in this presentation are being made as of the time and date of its live presentation. If reviewed after its live presentation, this presentation may not contain current or accurate information. We do not assume any obligation to update any forward-looking statements we may make. In addition, any information about our roadmap outlines our general product direction and is subject to change at any time without notice. It is for informational purposes only and shall not be incorporated into any contract or other commitment. Splunk undertakes no obligation either to develop the features or functionality described or to include any such feature or functionality in a future release.

Page 3: Machine Learning + Analytics

Machine Learning and You

Page 4: Machine Learning + Analytics

Agenda

• Machine Learning 101 & Splunk
• Demo of the Machine Learning Toolkit & Showcase
• How to be successful with ML + Splunk

Page 5: Machine Learning + Analytics

Why do we need Machine Learning?

- Improve Decision Making, Improve Future Actions
- Forecast or Predict KPIs, Alert on Deviation
- Uncover hidden trends or relationships

All of this requires Diverse Data from across Many Silos. Lots of Unstructured, Real Time Data.

Page 6: Machine Learning + Analytics

Run The Business in Real-time

Data From the Past
Real-time Data
Statistical Forecast

T – a few days
T + a few days

Security Operations Center
IT Operations Center
Business Operations Center

Predictive (Models)
Descriptive (BI Tools, Data Lakes)
Grey space

Page 7: Machine Learning + Analytics

What is “Learning”?

[Prediction]
• When we see thick clouds and an overcast sky, we predict that it’s likely going to rain

[Estimation / Regression]
• Estimate how much an apartment costs based on its location, condition and prices of properties in that neighborhood

[Classification / Clustering]
• Determine the gender of a person based on her/his features, hair style and the way s/he dresses

[Anomaly Detection]
• Identify the odd one out

[Reinforcement Learning]
• If I made a mistake this time, can I do better next time?

All of us have had some experience in learning. But… what’s behind our experience? How do we translate that knowledge to code?

Page 8: Machine Learning + Analytics

Machine Learning 101: What is it?

Machine Learning is a process for generalizing from examples

Examples = example or “training” data
Generalizing = building “statistical models” to capture correlations
Process = never quite done, we keep validating & refitting models for increasing accuracy

Simple Machine Learning workflow:
• Explore data
• FIT models based on data
• APPLY models in production
• Keep validating models

“All models are wrong, but some are useful.”
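
The explore, fit, apply, validate loop on this slide can be sketched in a few lines of plain Python. This is only an illustration under assumptions: the data values are made up, the helper names (`fit_line`, `apply_line`, `mean_squared_error`) are hypothetical, and a one-feature least-squares line stands in for whatever model a real use case would call for.

```python
from statistics import mean


def fit_line(xs, ys):
    """FIT: ordinary least squares for y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def apply_line(model, xs):
    """APPLY: use the fitted (slope, intercept) pair on new inputs."""
    slope, intercept = model
    return [slope * x + intercept for x in xs]


def mean_squared_error(actual, predicted):
    """VALIDATE: average squared gap between truth and prediction."""
    return mean((a - p) ** 2 for a, p in zip(actual, predicted))


# EXPLORE: hypothetical training data and a small held-out set.
train_x = [1, 2, 3, 4, 5]
train_y = [3.1, 4.9, 7.2, 8.8, 11.0]
test_x = [6, 7]
test_y = [13.1, 14.8]

model = fit_line(train_x, train_y)
predictions = apply_line(model, test_x)
error = mean_squared_error(test_y, predictions)
print("model:", model, "held-out MSE:", error)
```

Validating on data the model has not seen is the point of the last step; refitting happens when that held-out error drifts upward.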

Page 9: Machine Learning + Analytics

ML 101: Existing Applications

Recall: EXPLORE > FIT > APPLY > VALIDATE > REPEAT

• Face detection: find faces in images
• Spam filtering: identify SPAM messages
• Shopping Recommendations: predict what customers would like to buy
• Fraud detection: identify credit card transactions which may be fraudulent in nature
• Weather forecast: predict whether or not it will rain tomorrow; estimate daily max/min

Page 10: Machine Learning + Analytics

Three Types of Machine Learning

1. Supervised Learning: generalizing from labeled data

OR?

Gather data:
• Dimensions
• Stem Length
• Color
• Etc.

Page 11: Machine Learning + Analytics

Three Types of Machine Learning

2. Unsupervised Learning: generalizing from unlabeled data

Will my home sell?

Gather data:
• Square feet
• Levels
• Parks nearby
• Schools
• Zip code
• Etc.

Page 12: Machine Learning + Analytics

Three Types of Machine Learning

3. Reinforcement Learning: generalizing from rewards in time

Recommendation Engines

Page 13: Machine Learning + Analytics

Overview of ML at Splunk

Core Platform Search
Packaged Premium Solutions
Custom ML

Platform for Operational Intelligence

Page 14: Machine Learning + Analytics

Search Includes Machine Learning

Core Platform Search is a powerful and highly flexible interface built with ML

anomaly detection

Page 15: Machine Learning + Analytics

Splunk IT Service Intelligence

Data-Defined, Data-Driven Service Insights

Get Data
Define services, entities and KPIs
Monitor and troubleshoot
Analyze and detect

Packaged ML: Adaptive Thresholds and Anomaly Detection

One of several Premium Solutions

Page 16: Machine Learning + Analytics

Splunk Machine Learning Toolkit

Extends Splunk platform functions and provides a guided modeling environment

Assistants: Guide model building, testing, & deployment for common objectives
Showcases: Interactive examples for over 25 typical IT, security, business, IoT use cases
Algorithms: 25+ standard algorithms available prepackaged with the toolkit
SPL ML Commands: New commands to fit, test and operationalize models
Python for Scientific Computing Library: 300+ open source algorithms available for use

Build custom analytics for any use case

Page 17: Machine Learning + Analytics

What’s New in 2.0?

• New name and abbreviation
• No event limits (removal of 50K limit on fitting models)
• Configurable resource caps via mlspl.conf
• Search head clustering support
• Distributed/streaming apply
• Scheduled fit
• New algorithms (see Slide 7)
  – Feature engineering and selection
  – Stochastic gradient descent (e.g.)
  – ARIMA
• Multi-algorithm support across Assistants
• Scatterplot matrix viz
• Alerting
• Tooltips
• In-app tours
• Cluster Numeric Events assistant

Page 18: Machine Learning + Analytics

Splunk ML Algorithms (v2.0, .conf2016)

• ARIMA
• SGDClassifier
• SGDRegressor
• DecisionTreeClassifier
• DecisionTreeRegressor
• AdaBoostRegressor
• BernoulliNB
• Birch
• DBSCAN
• ElasticNet
• FieldSelector
• GaussianNB
• KMeans
• KernelPCA
• KernelRidge
• Lasso
• LinearRegression
• LogisticRegression
• OneClassSVM
• PCA
• RandomForestClassifier
• RandomForestRegressor
• Ridge
• SVM
• SpectralClustering
• TFIDF
• StandardScaler

Page 19: Machine Learning + Analytics

To The Demo!

Page 20: Machine Learning + Analytics

Success With ML: The Process

Page 21: Machine Learning + Analytics

Custom Machine Learning – Success Formula

Domain Expertise (IT, Security, …)
Data Science Expertise
Splunk Expertise

Identify use cases
Drive decisions
Set business/ops priorities

SPL
Data prep

Statistics/math background
Algorithm selection
Model building

Splunk ML Toolkit facilitates and simplifies via examples & guidance

Operational success

Page 22: Machine Learning + Analytics

Summary: The ML Process

Problem: <Stuff in the world> causes big time & money expense.
Value Hypothesis
Solution: Build ML model to forecast <possible incidents>, act pre-emptively & learn

1. Get all relevant data to problem
2. Explore data, and fit predictive models on past/real-time data
3. Apply & validate models until predictions are accurate
4. Forecast KPIs & notable events associated to use case
5. Surface incidents to X Ops, who INVESTIGATES & ACTS

Operationalize

Page 23: Machine Learning + Analytics

Machine Learning Process with Splunk

Collect Data
Explore/Visualize
Model
Evaluate
Clean/Transform
Publish/Deploy

props.conf, transforms.conf, Datamodels
Add-ons from Splunkbase, etc.

Pivot, Table UI, SPL
ML Toolkit

Alerts, Dashboards, Reports

Page 24: Machine Learning + Analytics

Steps to Building Your Own ML App

Prioritize & solve the big problems:
• Datacenter or critical infrastructure failing
• Hard-to-find, high-risk behaviors

Use ALL data to help solve problems:
• E.g., can’t identify app crashes without app data
• Enrich machine data with tickets, app data, DB, etc.

Find the stakeholders:
• Who owns these problems?
• Who will invest in you to build a solution?

Solutions not science projects:
• If it’s mission-critical, treat it as such (Dev -> QA -> Prod)
• Prototype: build simple MVPs, show value, iterate

Page 25: Machine Learning + Analytics

Fit, Apply & Validate Models

MLSPL – New grammar for doing ML in Splunk

fit – fit models based on training data
– [training data] | fit LinearRegression costly_KPI from feature1 feature2 feature3 into my_model

apply – apply models on testing and production data
– [testing/production data] | apply my_model

Validate Your Model (The Hard Part)
– Why hard? Because statistics is hard! Also: model error ≠ real world risk.
– Analyze residuals, mean-square error, goodness of fit, cross-validate, etc.
– Take Splunk’s Analytics & Data Science Education course
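
The validation checks named on this slide (residuals, mean-square error, goodness of fit, cross-validation) can be sketched outside Splunk in plain Python. This is not how the ML Toolkit computes them; it is a minimal illustration assuming a single hypothetical feature (`feature1`), made-up KPI values, and hypothetical helper names.

```python
from statistics import mean


def fit(xs, ys):
    """Least-squares line; returns (slope, intercept)."""
    x_bar, y_bar = mean(xs), mean(ys)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    return slope, y_bar - slope * x_bar


def predict(model, xs):
    slope, intercept = model
    return [slope * x + intercept for x in xs]


def residuals(ys, preds):
    """Per-point error: actual minus predicted."""
    return [y - p for y, p in zip(ys, preds)]


def mse(ys, preds):
    """Mean-square error."""
    return mean(r ** 2 for r in residuals(ys, preds))


def r_squared(ys, preds):
    """Goodness of fit: share of variance explained by the model."""
    y_bar = mean(ys)
    ss_res = sum(r ** 2 for r in residuals(ys, preds))
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


def cross_validate(xs, ys, k):
    """k-fold CV: hold out every k-th point in turn; return one MSE per fold."""
    scores = []
    pairs = list(zip(xs, ys))
    for fold in range(k):
        train = [p for i, p in enumerate(pairs) if i % k != fold]
        test = [p for i, p in enumerate(pairs) if i % k == fold]
        model = fit(*zip(*train))
        test_x, test_y = zip(*test)
        scores.append(mse(test_y, predict(model, test_x)))
    return scores


# Hypothetical data: one feature and the KPI we want to predict.
feature1 = [1, 2, 3, 4, 5, 6, 7, 8]
costly_kpi = [2.2, 3.9, 6.1, 8.0, 9.8, 12.1, 14.2, 15.9]

model = fit(feature1, costly_kpi)
preds = predict(model, feature1)
print("R^2:", r_squared(costly_kpi, preds))
print("per-fold MSE:", cross_validate(feature1, costly_kpi, k=4))
```

A high R² on the training data alone says little; the per-fold errors are the better hint of how the model behaves on unseen data, and neither number says anything about real-world risk.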

Page 26: Machine Learning + Analytics

ML Commands in Core SPL

Cluster – groups events together based on how textually similar they are to each other.

Anomalies – finds events or field values that are unusual or unexpected.

Predict – forecasts values for one or more sets of time-series data.

Kmeans – k-means clustering on events.

Anomalousvalue – computes an anomaly score for each field of each event, relative to the values of this field across other events.

Anomalydetection – identifies anomalous events by computing a probability for each event and then detecting unusually small probabilities.

X11 – exposes the seasonal trend in your time series.

Associate – change in entropy between two fields.

Findkeywords – given a set of numbered groups (say, from cluster), calculates the common words found in each group.

Analyzefields – measures the ability of a set of fields to predict a single field (univariate analysis).

And lots more

Reference Docs -> http://docs.splunk.com/Documentation/Splunk/latest/SearchReference/ListOfSearchCommands
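
The idea described for Anomalydetection, computing a probability per event and flagging unusually small ones, can be sketched in Python as below. This is a simplified illustration, not Splunk's implementation: it assumes a single categorical field, uses plain empirical frequencies, and the function name, `threshold` default, and sample events are all hypothetical.

```python
from collections import Counter


def anomalous_events(events, field, threshold=0.05):
    """Flag events whose field value has an empirical probability below threshold."""
    counts = Counter(event[field] for event in events)
    total = len(events)
    flagged = []
    for event in events:
        probability = counts[event[field]] / total
        if probability < threshold:
            # Keep the original fields and record why the event was flagged.
            flagged.append({**event, "probability": probability})
    return flagged


# Hypothetical web events: mostly 200s, a few 404s, a single 503.
events = [{"status": 200}] * 45 + [{"status": 404}] * 4 + [{"status": 503}]
rare = anomalous_events(events, "status")
print(rare)
```

The threshold is the tuning knob: lowering it surfaces fewer, rarer events, while raising it trades precision for coverage.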

Page 27: Machine Learning + Analytics

Don’t Forget: 80% of Data Science is Data Munging

Trendline – moving averages of fields.

Erex – use the erex command to extract data from a field when you do not know the regular expression to use.

Correlation – co-occurrence, NOT Pearson-style correlation; don’t mix this up.

Autoregress – copies one or more previous values for a field into an event.

Contingency – frequency distribution matrix.

Cofilter – finds how many times field1 and field2 values occurred together.

Stats, Eventstats, Streamstats, Timechart, Chart – stats reporting.

Eval – evaluate new fields.

And lots more

Reference Docs -> http://docs.splunk.com/Documentation/Splunk/latest/SearchReference/ListOfSearchCommands
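
Two of the munging steps above, a moving average (as in Trendline) and copying previous values into each event (as in Autoregress), can be sketched in Python as follows. This is an illustration only: the function names are hypothetical, the `_p1`, `_p2` suffixes are an assumed naming scheme for the lagged fields, and the sample events are made up.

```python
def simple_moving_average(values, window):
    """Moving average per position; None until a full window is available."""
    averaged = []
    for i in range(len(values)):
        if i + 1 < window:
            averaged.append(None)
        else:
            averaged.append(sum(values[i + 1 - window:i + 1]) / window)
    return averaged


def add_lags(events, field, lags):
    """Copy the previous `lags` values of `field` into each event."""
    enriched = []
    for i, event in enumerate(events):
        row = dict(event)
        for p in range(1, lags + 1):
            # No earlier event exists for the first rows, so use None there.
            row[f"{field}_p{p}"] = events[i - p][field] if i - p >= 0 else None
        enriched.append(row)
    return enriched


# Hypothetical time-ordered events with a single numeric field.
events = [{"count": 10}, {"count": 20}, {"count": 30}, {"count": 40}]
print(simple_moving_average([e["count"] for e in events], window=3))
print(add_lags(events, "count", lags=2))
```

Lagged fields like these are what turn a time series into ordinary feature columns that a regression model can be fit on.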

Page 28: Machine Learning + Analytics

What Now?

• Get the Machine Learning Toolkit from Splunkbase
• Go watch Machine Learning Videos on Splunk Youtube Channel: http://tiny.cc/splunkmlvideos
• Go to Machine Learning talks: https://conf.splunk.com/
  – Advanced Machine Learning in SPL with the Machine Learning Toolkit by Jacob Leverich
  – Extending SPL with Custom Search Commands and the Splunk SDK for Python by Jacob Leverich
• Several Customers and Partner Talks – Cisco, Scianta Analytics, Asian Telco, etc.
• Early Adopter And Customer Advisory Program: [email protected]
• Product Manager: [email protected]
• Field Expert: [email protected]

http://tiny.cc/splunkmlapp

Page 29: Machine Learning + Analytics

Thank you!

Page 30: Machine Learning + Analytics

Appendix

Page 31: Machine Learning + Analytics

Customer Stories

Page 32: Machine Learning + Analytics

Machine Learning Customer Success

Optimizing operations and business results

Network Incident Detection
Service Degradation Detection
Security / Fraud Prevention

Prioritize Website Issues and Predict Root Cause

Predict Gaming Outages
Fraud Prevention

Machine Learning Consulting Services
Analytics App built on ML Toolkit

Cell Tower Incident Detection
Optimize Repair Operations

Entertainment Company

Page 33: Machine Learning + Analytics

ML Toolkit Customer Use Cases

Speeding website problem resolution by automatically ranking actions for support engineers

Reducing customer service disruption with early identification of difficult-to-detect network incidents

Minimizing cell tower degradation and downtime with improved issue detection sensitivity

Improving cell tower uptime and reducing repair truck rolls with anomaly detection and root cause analysis

Predicting and averting potential gaming outage conditions with finer-grained detection

Ensuring mobile device security by detecting anomalies in ID authentication

Preventing fraud by identifying malicious accounts and suspicious activities

Entertainment Company

Page 34: Machine Learning + Analytics

Detect Network Outliers

Reduced downtime + increased service availability = better customer satisfaction

ML Use Case
• Monitor noise rise for 20,000+ cell towers to increase service and device availability, reduce MTTR

Technical overview
• A customized solution deployed in production based on outlier detection.
• Leverage previous month data and voting algorithms

“The ability to model complex systems and alert on deviations is where IT and security operations are headed… Splunk Machine Learning has given us a head start...”
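
The slide does not say which detectors the customer used or how they voted, so the following Python sketch is purely an assumption about what "previous month data and voting algorithms" could look like: three simple, hypothetical detectors each judge a new reading against a baseline of earlier readings, and a majority vote decides. All names, limits, and sample values are illustrative.

```python
from statistics import mean, median, pstdev


def zscore_vote(baseline, value, limit=3.0):
    """Vote 'outlier' if the value is more than `limit` std devs from the mean."""
    spread = pstdev(baseline)
    return spread > 0 and abs(value - mean(baseline)) / spread > limit


def mad_vote(baseline, value, limit=3.5):
    """Vote 'outlier' using the median absolute deviation (robust to spikes)."""
    center = median(baseline)
    mad = median(abs(x - center) for x in baseline)
    return mad > 0 and abs(value - center) / (1.4826 * mad) > limit


def range_vote(baseline, value, margin=0.5):
    """Vote 'outlier' if the value falls well outside the observed range."""
    low, high = min(baseline), max(baseline)
    pad = margin * (high - low)
    return value < low - pad or value > high + pad


def is_outlier(baseline, value, min_votes=2):
    """Majority vote across the three detectors."""
    votes = [
        zscore_vote(baseline, value),
        mad_vote(baseline, value),
        range_vote(baseline, value),
    ]
    return sum(votes) >= min_votes


# Hypothetical noise-rise readings from the previous period for one tower.
previous_readings = [10, 11, 9, 10, 12, 10, 11, 9, 10, 10]
print(is_outlier(previous_readings, 20))
print(is_outlier(previous_readings, 11))
```

Voting trades some sensitivity for fewer false alarms, since a single noisy detector cannot raise an alert on its own.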

Page 35: Machine Learning + Analytics

Reliable website updates

Proactive website monitoring leads to reduced downtime

ML Use Case
• Very frequent code and config updates (1000+ daily) can cause site issues
• Find errors in server pools, then prioritize actions and predict root cause

Technical overview
• Custom outlier detection built using ML Toolkit Outlier assistant
• Built by Splunk Architect with no Data Science background

“Splunk ML helps us rapidly improve end-user experience by ranking issue severity which helps us determine root causes faster thus reducing MTTR and improving SLA”

Page 36: Machine Learning + Analytics

Example: Energy Data

Page 37: Machine Learning + Analytics

Sensor data delivering millions of dollars in energy savings.

Page 38: Machine Learning + Analytics

Robot Analytics to Reduce Costs in the Supply Chain

4% Increased Throughput per Distribution Center

Aggregate machine data from robots

Failure pattern detection and reporting

Preventative maintenance scheduling

Page 39: Machine Learning + Analytics

Energy Data

Process Data

86348;24.03.15 23:59:59;140808,297;140746,031;140919,500;

24-03-2015 01:00:59;EPIP02-03-A;SB;PPR;PR;PRODUCTION;PR;aRTC:accountedtransaction(equip02_evnt_job_unit01);;;;;;;0,014;753,000

correlation over time (join)

Production status
Energy consumption

Use cases
- Transparency of equipment on shop floor level
- Discover process weaknesses
- Condition based and predictive maintenance
- Optimization of energy efficiency of equipment
- Optimization of energy purchasing process (forecast / predictive)
… etc ….

Increased efficiency
Saved energy
Saved $$$

Page 40: Machine Learning + Analytics

Typical Workflow for Analyzing Sensor Data

COLLECT
ENRICH
ANALYZE

sensor data
middleware
lookup data
data analytics
feedback loop

Page 41: Machine Learning + Analytics

Energy and process data for maintenance

Energy is not just an optimization target, but also an influencing factor for maintenance scenarios (rapid impact factor)

Map low level process status to particular energy consumption profiles and learn normal states and boundaries from raw signal

A B C D

Page 42: Machine Learning + Analytics

Condition Monitoring & Alerting

Anomaly detection and proactive monitoring

Page 43: Machine Learning + Analytics

Predictive Maintenance

• Predict anomalies for a particular process step

Heatmap shows (recommends) the timespan in which an error might happen

Predict process steps in which an error might happen

Extrapolation of process steps with integrated predict function or other regression models