Post on 08-Jan-2017
Debugging Skynet
A Machine Learning Approach to Log Analysis
Ianir Ideses - Logz.io
The Problem - Overlogging
• Millions of logs per week
• Important logs get lost in the clutter
• Need to surface the relevant logs, deemphasize irrelevant logs
Proposed Solution
• A Machine Learning approach
• Can sift through large amounts of data
• Can evolve and react to changes in data
• Requires large amounts of data to be effective
Machine Learning
• Unsupervised
  • Clustering
  • Anomaly detection
• Supervised
  • Recommender systems
  • Classifiers
Unsupervised Machine Learning
• No labels are needed, just lots of data
• Useful for reducing a large number of data points to a smaller set of clusters
Unsupervised Machine Learning
"GET /twiki/bin/edit/Main/Double_bounce_sender?topicparent=Main.Confi
"GET /twiki/bin/rdiff/TWiki/NewUserTemplate?rev1=1.3&rev2=1.2 HTTP/1.
"GET /mailman/listinfo/hsdivision HTTP/1.1" 200 6291
"GET /twiki/bin/view/TWiki/WikiSyntax HTTP/1.1" 200 7352
"GET /twiki/bin/view/Main/DCCAndPostFix HTTP/1.1" 200 5253
"GET /twiki/bin/oops/TWiki/AppendixFileSystem?template=oopsmore&param1=1.
"GET /twiki/bin/view/Main/PeterThoeny HTTP/1.1" 200 4924
"GET /twiki/bin/edit/Main/Header_checks?topicparent=Main.Configuratio
"GET /twiki/bin/attach/Main/OfficeLocations HTTP/1.1" 401 12851
"GET /twiki/bin/view/TWiki/WebTopicEditTemplate HTTP/1.1" 200 3732
"GET /app_dev.php/ HTTP/1.1" 200 6715 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.2311.90 Safari/537.36"
"GET /bundles/framework/css/body.css HTTP/1.1" 200 6657 "http://my.log-sandbox/app_dev.php/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.231
"GET /bundles/framework/css/structure.css HTTP/1.1" 200 1191 "http://my.log-sandbox/app_dev.php/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.
"GET /bundles/acmedemo/css/demo.css HTTP/1.1" 200 2204 "http://my.log-sandbox/app_dev.php/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.2311
"GET /bundles/acmedemo/images/welcome-quick-tour.gif HTTP/1.1" 200 4770 "http://my.log-sandbox/app_dev.php/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_3) AppleWebKit/537.36 (KHTML, like Gecko)
"GET /bundles/acmedemo/images/welcome-demo.gif HTTP/1.1" 200 4053 "http://my.log-sandbox/app_dev.php/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrom
Nov 20 17:27:55 HANNIBAL MyProgram[13163]: Program started by User 1000
Nov 21 17:27:53 HANNIBAL MyProgram[13163]: Program terminated by User 1000
Nov 21 17:27:58 JANE MyProgram[13163]: Program started by User 555
Nov 23 18:27:53 ARILOU MyProgram[13163]: Program stopped by User 777
Supervised Machine Learning
• Learning from labeled examples
• Requires a well-defined question:
  • Is this email spam?
  • Is this object a car?
  • Is this log interesting?
• Deployed successfully in many domains; the most notable classifiers are neural networks (NN), SVMs and Bayesian classifiers
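As a rough illustration of learning from labeled examples, the sketch below trains a tiny bag-of-words perceptron. The perceptron is a stand-in chosen for brevity, not one of the classifiers named on the slide, and the labeled messages are hypothetical (1 = interesting, 0 = not interesting).

```python
from collections import defaultdict

# Hypothetical labeled logs: 1 = interesting, 0 = not interesting.
TRAINING = [
    ("connection error unable to reach provider", 1),
    ("connection failure", 1),
    ("connected successfully to provider", 0),
    ("connection success", 0),
]


def train_perceptron(examples, epochs=10):
    """Learn one weight per word plus a bias from (text, label) pairs."""
    weights = defaultdict(float)
    bias = 0.0
    for _ in range(epochs):
        for text, label in examples:
            words = text.split()
            activation = bias + sum(weights[w] for w in words)
            predicted = 1 if activation > 0 else 0
            error = label - predicted  # -1, 0 or +1
            if error:
                # Nudge the bias and every word in the message toward the label.
                bias += error
                for w in words:
                    weights[w] += error
    return weights, bias


def predict(text, weights, bias):
    """Classify a message with the learned weights; unseen words count as 0."""
    activation = bias + sum(weights.get(w, 0.0) for w in text.split())
    return 1 if activation > 0 else 0


weights, bias = train_perceptron(TRAINING)
for text, label in TRAINING:
    print(text, "->", predict(text, weights, bias), "(label", label, ")")
```

With so few examples the learned weights only reflect this toy data; the point is the loop of predicting, comparing against the label, and adjusting.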
Supervised Machine Learning - SVM
• Data elements are arranged in vectors
• Each vector index is assigned a weight in the training phase
• A score is computed by summing up the relevant weights
connection: 0.1
error: 0.5
success: -0.9
failure: 0.3
“Connection failure”: 0.1 + 0.3 = 0.4
“Connection success”: 0.1 - 0.9 = -0.8
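The scoring arithmetic above can be sketched in a few lines. This assumes binary word-presence features and no bias term, which is a simplification of a real linear SVM; the weights are the ones shown on the slide, and the function name `score` is illustrative.

```python
# Per-word weights from the slide; in practice they come out of training.
WEIGHTS = {"connection": 0.1, "error": 0.5, "success": -0.9, "failure": 0.3}


def score(message, weights=WEIGHTS):
    """Sum the weights of the known words that appear in the message."""
    words = set(message.lower().split())
    return sum(weight for term, weight in weights.items() if term in words)


print(round(score("Connection failure"), 2))  # 0.4
print(round(score("Connection success"), 2))  # -0.8
```

A positive score leans toward the "interesting" class and a negative score away from it; where the cut-off sits is a tuning decision.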
Log Relevancy
• An ill-posed problem
• Relevancy is user specific
• People tend to search for known issues
• There are also unknown unknowns
• Labels are potentially very tedious to acquire
Proposed Solution - Labels
• Acquiring labels:
  • Implicit/explicit user behavior
  • Inter-user similarities
  • Public knowledge bases
Machine Learning in Practice
• Data is textual, numerical and alphanumerical
• Classifiers that have shown good results:
  • Random Forests, which resemble flow-chart decision making
  • Linear SVM
• Both classifiers are easy to interpret in the feature space
Machine Learning in Practice
connected: -0.157199772246
to provider: -0.15319903564
connected successfully: -0.15319903564
unable: 0.671539714688
topic: 0.678756599452
error: 0.788508324168
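One way to read these weights is to list which features fire for a given message and how much each contributes. The sketch below assumes the features are lowercase word unigrams and bigrams (the slide does not say how they were extracted), and the two log messages are hypothetical.

```python
import re

# Feature weights copied from the slide (a mix of unigrams and bigrams).
WEIGHTS = {
    "connected": -0.157199772246,
    "to provider": -0.15319903564,
    "connected successfully": -0.15319903564,
    "unable": 0.671539714688,
    "topic": 0.678756599452,
    "error": 0.788508324168,
}


def ngrams(text, max_n=2):
    """Return all word n-grams of length 1..max_n from the lowercased text."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [" ".join(tokens[i:i + n])
            for n in range(1, max_n + 1)
            for i in range(len(tokens) - n + 1)]


def explain(text, weights=WEIGHTS):
    """Return (score, contributions), largest absolute contribution first."""
    contributions = [(g, weights[g]) for g in ngrams(text) if g in weights]
    contributions.sort(key=lambda item: abs(item[1]), reverse=True)
    return sum(w for _, w in contributions), contributions


# Hypothetical messages, not taken from the talk.
for message in ("Connected successfully to provider",
                "Error: unable to subscribe to topic"):
    total, parts = explain(message)
    print(message, "->", total, parts)
```

The first message scores negative and the second positive, and the contribution list shows exactly which terms pushed the score each way.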
Machine Learning in Practice - Modules
• Log normalization
• Label acquisition
• Model training
• Log classification and enhancement
Log Normalization
• Lowercase, stem, remove stop words
• Identify common fields (timestamp, severity, etc.)
• Identify variable, function and class names
• Identify known reserved words
• Cluster logs that share the same prototype
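A minimal sketch of a few of these steps, assuming syslog-style lines like the ones shown earlier in the deck. It lowercases, masks the timestamp, host and numbers, then groups lines that share the resulting prototype. Stemming and stop-word removal are left out, and the placeholder tokens (`<timestamp>`, `<host>`, `<num>`) are illustrative choices.

```python
import re
from collections import defaultdict

TIMESTAMP = re.compile(r"^[A-Z][a-z]{2}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2}")
HOST = re.compile(r"^(<timestamp>)\s+\S+")
NUMBER = re.compile(r"\d+")


def prototype(line):
    """Mask the variable parts of a syslog-style line and lowercase the rest."""
    text = TIMESTAMP.sub("<timestamp>", line)
    text = HOST.sub(r"\1 <host>", text)  # token right after the timestamp
    text = NUMBER.sub("<num>", text)     # PIDs, user IDs, other numbers
    return text.lower()


def cluster(lines):
    """Group raw lines by their shared prototype."""
    groups = defaultdict(list)
    for line in lines:
        groups[prototype(line)].append(line)
    return dict(groups)


LOGS = [
    "Nov 20 17:27:55 HANNIBAL MyProgram[13163]: Program started by User 1000",
    "Nov 21 17:27:53 HANNIBAL MyProgram[13163]: Program terminated by User 1000",
    "Nov 21 17:27:58 JANE MyProgram[13163]: Program started by User 555",
    "Nov 23 18:27:53 ARILOU MyProgram[13163]: Program stopped by User 777",
]

for proto, members in cluster(LOGS).items():
    print(len(members), proto)
```

The two "Program started" lines collapse into one prototype even though host and user differ, which is the kind of reduction the last bullet describes.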
Labeler
• Different sources for labels
  • CQA sites
  • Explicit user interaction
  • Implicit user interaction
  • Heuristics
Log Enhancer
• Use knowledge about log events to add prior data
• Suggest solutions to known problems
• Tag relevant logs for display to the user
Flow
[Diagram boxes: Logs, Log Normalization, Labeler, ML - Training, Classifiers, Log Enhancer, Logs]
Machine Learning at Scale
• Use Spark to drive high throughput, high scale
• Terabytes of data, daily
• Spot Instances to keep costs at bay
To Sum Up
• Formulate your question
• Get enough data
• Get enough labels
• Clean data
• Train your classifier