BSidesLV 2013 - Using Machine Learning to Support Information Security
![Page 1: BSidesLV 2013 - Using Machine Learning to Support Information Security](https://reader031.fdocuments.us/reader031/viewer/2022020803/546c4757b4af9f702c8b4fbe/html5/thumbnails/1.jpg)
Using Machine Learning to support Information Security
Alexandre Pinto [email protected]
@alexcpsec / @MLSecProject
Proving Ground (Many Thanks to Joel Wilbanks)
![Page 2: BSidesLV 2013 - Using Machine Learning to Support Information Security](https://reader031.fdocuments.us/reader031/viewer/2022020803/546c4757b4af9f702c8b4fbe/html5/thumbnails/2.jpg)
WARNING!
• This is a talk about DEFENDING, not attacking.
  – NO systems were harmed in the development of this talk.
  – This is NOT about some vanity hack that will be patched tomorrow.
  – We are actually trying to BUILD something here.
• This talk includes more MATH than the daily recommended intake by the FDA.
• You have been warned...
![Page 3: BSidesLV 2013 - Using Machine Learning to Support Information Security](https://reader031.fdocuments.us/reader031/viewer/2022020803/546c4757b4af9f702c8b4fbe/html5/thumbnails/3.jpg)
Who’s Alex?
• 12 years in Information Security, done a little bit of everything.
• Past 7 or so years leading security consultancy and monitoring teams in Brazil, London and the US.
  – If there is any way a SIEM can hurt you, it did to me.
• Has been researching machine learning and data science in general for the past year or so; participates in Kaggle machine learning competitions (for fun, not for profit).
• First presentation at a real Infosec conference! (give or take a few hours)
![Page 4: BSidesLV 2013 - Using Machine Learning to Support Information Security](https://reader031.fdocuments.us/reader031/viewer/2022020803/546c4757b4af9f702c8b4fbe/html5/thumbnails/4.jpg)
Agenda
• The elephant in the room
• Enter Machine Learning
• Principles and Kinds of ML
• ML and InfoSec
• MLSec Project
• How to get started?
• Take Aways
![Page 5: BSidesLV 2013 - Using Machine Learning to Support Information Security](https://reader031.fdocuments.us/reader031/viewer/2022020803/546c4757b4af9f702c8b4fbe/html5/thumbnails/5.jpg)
The elephant in the room
• “Internet-scale companies”
![Page 6: BSidesLV 2013 - Using Machine Learning to Support Information Security](https://reader031.fdocuments.us/reader031/viewer/2022020803/546c4757b4af9f702c8b4fbe/html5/thumbnails/6.jpg)
The elephant in the room
![Page 7: BSidesLV 2013 - Using Machine Learning to Support Information Security](https://reader031.fdocuments.us/reader031/viewer/2022020803/546c4757b4af9f702c8b4fbe/html5/thumbnails/7.jpg)
Enter Machine Learning
• “Machine learning systems automatically learn programs from data” (*)
• You don’t really code the program; it is inferred from data.
• Intuition of trying to mimic the way the brain learns: that’s where terms like artificial intelligence come from.
(*) CACM 55(10) - A Few Useful Things to Know about Machine Learning
![Page 8: BSidesLV 2013 - Using Machine Learning to Support Information Security](https://reader031.fdocuments.us/reader031/viewer/2022020803/546c4757b4af9f702c8b4fbe/html5/thumbnails/8.jpg)
Applications of Machine Learning
• Sales
• Trading
• Image and Voice Recognition
![Page 9: BSidesLV 2013 - Using Machine Learning to Support Information Security](https://reader031.fdocuments.us/reader031/viewer/2022020803/546c4757b4af9f702c8b4fbe/html5/thumbnails/9.jpg)
Security Applications of ML
• Fraud detection systems:
  – Is what he just did consistent with past behavior?
• Network anomaly detection (?):
  – NOPE!
  – More like statistical analysis, and bad statistical analysis at that
• Predicting the likelihood that specific actors will attack:
  – Create different predictive models and chain them to gain more confidence at each step.
• SPAM filters
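One way to picture "chaining" predictive models is Bayes' rule in odds form: if each model's output is summarized as a likelihood ratio, the posterior odds are the prior odds multiplied by each ratio in turn. This is a minimal sketch under an explicit assumption, not the talk's method: multiplying ratios is only valid if the models' evidence is conditionally independent given the true class. The helper names, base rate, and ratios are hypothetical.

```python
from math import prod
from typing import Iterable


def prob_to_odds(p: float) -> float:
    """Convert a probability in (0, 1) to odds p / (1 - p)."""
    return p / (1.0 - p)


def odds_to_prob(o: float) -> float:
    """Convert odds back to a probability."""
    return o / (1.0 + o)


def chain_likelihood_ratios(prior_prob: float, likelihood_ratios: Iterable[float]) -> float:
    """Hypothetical helper: update P(attacker) with the likelihood ratio from each
    model in the chain. Assumes conditional independence between the models;
    correlated models would overstate confidence."""
    posterior_odds = prob_to_odds(prior_prob) * prod(likelihood_ratios)
    return odds_to_prob(posterior_odds)


# Made-up example: 1% base rate, two chained models that favor "attacker" 5x and 4x.
p = chain_likelihood_ratios(0.01, [5.0, 4.0])
print(round(p, 3))  # 0.168
```

Each extra model that agrees pushes the posterior further from the base rate, which is the "more confidence in each step" effect; the independence assumption is what keeps this from being free.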
![Page 10: BSidesLV 2013 - Using Machine Learning to Support Information Security](https://reader031.fdocuments.us/reader031/viewer/2022020803/546c4757b4af9f702c8b4fbe/html5/thumbnails/10.jpg)
How to do Machine Learning?
• Data Mining:
• Exploring the space:
![Page 11: BSidesLV 2013 - Using Machine Learning to Support Information Security](https://reader031.fdocuments.us/reader031/viewer/2022020803/546c4757b4af9f702c8b4fbe/html5/thumbnails/11.jpg)
Kinds of Machine Learning
• Supervised Learning:
  – Classification (NN, SVM, Naïve Bayes)
  – Regression (linear, logistic; note that logistic regression, despite its name, is normally used for classification)
• Unsupervised Learning:
  – Clustering (k-means)
  – Decomposition (PCA, SVD)
Source – scikit-learn.github.io/scikit-learn-tutorial/
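Unsupervised learning finds structure without labels. As an illustration of clustering, here is a minimal k-means (Lloyd's algorithm) sketch in plain Python, assuming small 2-D numeric points; it alternates assigning each point to its nearest centroid and moving each centroid to the mean of its points. The function names are hypothetical; in practice a library implementation such as scikit-learn's `KMeans` would be the usual choice.

```python
import math
import random


def assign(points, centroids):
    """Assignment step: index of the nearest centroid for every point."""
    return [min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
            for p in points]


def kmeans(points, k, iterations=100, seed=0):
    """Minimal Lloyd's algorithm. Initializes with k distinct input points,
    then alternates assignment and update steps until centroids stop moving."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iterations):
        labels = assign(points, centroids)
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                # Update step: centroid becomes the per-axis mean of its members.
                new_centroids.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return centroids, assign(points, centroids)


points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centroids, labels = kmeans(points, k=2)
print(centroids, labels)
```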
![Page 12: BSidesLV 2013 - Using Machine Learning to Support Information Security](https://reader031.fdocuments.us/reader031/viewer/2022020803/546c4757b4af9f702c8b4fbe/html5/thumbnails/12.jpg)
Kinds of ML: Naïve Bayes (SPAM filters)
• Paper from Microsoft Research circa Sept ’98!
• (Thanks, Wikipedia!)
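The spam-filter idea can be sketched as a multinomial Naïve Bayes classifier: score each class by its log prior plus the sum of per-word log likelihoods (the "naïve" assumption that words are independent given the class), with add-one smoothing so unseen word/class pairs do not zero out the score. This is a toy illustration with whitespace tokenization and made-up training messages, not necessarily the exact model from the paper on the slide; class and method names are hypothetical, and labels are assumed to be exactly "spam" or "ham".

```python
import math
from collections import Counter


def tokenize(text):
    """Toy tokenizer: lowercase and split on whitespace."""
    return text.lower().split()


class NaiveBayesSpamFilter:
    """Multinomial Naive Bayes over word counts, with add-alpha (Laplace) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.doc_counts = Counter()
        self.vocab = set()

    def fit(self, messages, labels):
        for text, label in zip(messages, labels):
            self.doc_counts[label] += 1
            self.word_counts[label].update(tokenize(text))
        self.vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        return self

    def log_score(self, text, label):
        # log P(label) + sum over words of log P(word | label)
        score = math.log(self.doc_counts[label] / sum(self.doc_counts.values()))
        counts = self.word_counts[label]
        denominator = sum(counts.values()) + self.alpha * len(self.vocab)
        for word in tokenize(text):
            if word in self.vocab:  # skip words never seen in training
                score += math.log((counts[word] + self.alpha) / denominator)
        return score

    def predict(self, text):
        return max(("spam", "ham"), key=lambda label: self.log_score(text, label))


messages = ["win money now", "cheap money offer", "meeting at noon", "lunch meeting tomorrow"]
labels = ["spam", "spam", "ham", "ham"]
spam_filter = NaiveBayesSpamFilter().fit(messages, labels)
print(spam_filter.predict("cheap money"))       # spam
print(spam_filter.predict("meeting tomorrow"))  # ham
```

Working in log space avoids floating-point underflow when many small probabilities are multiplied.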
![Page 13: BSidesLV 2013 - Using Machine Learning to Support Information Security](https://reader031.fdocuments.us/reader031/viewer/2022020803/546c4757b4af9f702c8b4fbe/html5/thumbnails/13.jpg)
Kinds of ML: Linear Regression
• One of the simplest examples of ML
• Try to infer a relationship between a result variable (y) and a linear combination of other variables (x), minimizing the sum of “squared errors” (squared distances between predicted and observed y)
Jesse Johnson – shapeofdata.wordpress.com
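For a single feature x, the least-squares fit has a closed form: slope w = cov(x, y) / var(x) and intercept b = mean(y) - w * mean(x), which minimizes the sum of squared errors. A minimal sketch, assuming one feature and made-up data points (the multi-feature case solves the same problem with linear algebra, e.g. the normal equations); function names are hypothetical.

```python
def fit_simple_linear_regression(xs, ys):
    """Ordinary least squares for y ~ w * x + b with one feature.
    Minimizes sum((y - (w * x + b)) ** 2) over w and b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))  # ~ covariance
    sxx = sum((x - mean_x) ** 2 for x in xs)                          # ~ variance
    w = sxy / sxx
    b = mean_y - w * mean_x
    return w, b


def sum_squared_error(xs, ys, w, b):
    """The quantity that least squares minimizes."""
    return sum((y - (w * x + b)) ** 2 for x, y in zip(xs, ys))


xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.1, 4.9, 7.2, 8.8]
w, b = fit_simple_linear_regression(xs, ys)
print(round(w, 2), round(b, 2))  # 1.94 1.15
```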
![Page 14: BSidesLV 2013 - Using Machine Learning to Support Information Security](https://reader031.fdocuments.us/reader031/viewer/2022020803/546c4757b4af9f702c8b4fbe/html5/thumbnails/14.jpg)
Kinds of ML: SVM FTW!
• One of my favorite algorithms!
• Support Vector Machines (SVM):
  – Good for classification problems with numeric features
  – Not a lot of parameters; regularization is built into the model, which helps control overfitting; usually robust
  – However, sometimes slow to train (grows with # of points and # of features)
  – Also awesome: with kernels, it finds a separating hyperplane in a feature space whose mapping is never computed explicitly and can be infinite-dimensional.
Jesse Johnson – shapeofdata.wordpress.com
No idea… Everyone copies this
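To make the SVM objective concrete, here is a minimal linear SVM trained with Pegasos-style stochastic sub-gradient descent on the regularized hinge loss, lam/2 * ||w||^2 + mean(max(0, 1 - y * w·x)). It is a sketch under stated assumptions: labels are +1/-1, the kernel trick from the slide is not implemented (linear only), and the bias is learned by appending a constant 1.0 feature, which also regularizes it (a common simplification). Function names and data are hypothetical; scikit-learn's `SVC` is the usual library route, including kernels.

```python
import random


def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Pegasos-style SGD on the primal soft-margin SVM objective.
    Step size at update t is 1 / (lam * t)."""
    rng = random.Random(seed)
    data = [(list(x) + [1.0], label) for x, label in zip(X, y)]  # append bias feature
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, label in data:
            t += 1
            eta = 1.0 / (lam * t)
            margin = label * sum(wj * xj for wj, xj in zip(w, x))
            # Regularizer gradient: shrink w toward zero.
            w = [(1.0 - eta * lam) * wj for wj in w]
            # Hinge-loss sub-gradient: only points inside the margin push w.
            if margin < 1.0:
                w = [wj + eta * label * xj for wj, xj in zip(w, x)]
    return w


def predict_svm(w, x):
    """Sign of the decision function w·[x, 1]."""
    score = sum(wj * xj for wj, xj in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1


X = [(2.0, 2.0), (3.0, 3.0), (2.0, 3.0), (-2.0, -2.0), (-3.0, -3.0), (-2.0, -3.0)]
y = [1, 1, 1, -1, -1, -1]
w = train_linear_svm(X, y)
print([predict_svm(w, x) for x in X])
```

Only margin violators change w beyond the uniform shrink, which mirrors the idea that the final boundary depends on a subset of points (the support vectors).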
![Page 15: BSidesLV 2013 - Using Machine Learning to Support Information Security](https://reader031.fdocuments.us/reader031/viewer/2022020803/546c4757b4af9f702c8b4fbe/html5/thumbnails/15.jpg)
ML and Infosec
• SIEM and Log Monitoring tools are just vertical BI applications (from the ’90s)
• “I don't have time for your marketing hype!” – Infosec
• How many logs do you think there are in your organization?
![Page 16: BSidesLV 2013 - Using Machine Learning to Support Information Security](https://reader031.fdocuments.us/reader031/viewer/2022020803/546c4757b4af9f702c8b4fbe/html5/thumbnails/16.jpg)
InfoSec Data Scientists
Data Science Venn Diagram by Drew Conway
• “Data Scientist (n.): Person who is better at statistics than any software engineer and better at software engineering than any statistician.” -- Josh Wills, Cloudera
![Page 17: BSidesLV 2013 - Using Machine Learning to Support Information Security](https://reader031.fdocuments.us/reader031/viewer/2022020803/546c4757b4af9f702c8b4fbe/html5/thumbnails/17.jpg)
Considerations on Data Gathering
• Models will (generally) get better with more data
  – But we always have to consider bias and variance as we select our data points
  – Also adversaries: we may be force-fed “bad data”, find signal in weird noise, or design bad (or exploitable) features
• “I’ve got 99 problems, but data ain’t one”
Domingos, 2012; Abu-Mostafa, Caltech, 2012
![Page 18: BSidesLV 2013 - Using Machine Learning to Support Information Security](https://reader031.fdocuments.us/reader031/viewer/2022020803/546c4757b4af9f702c8b4fbe/html5/thumbnails/18.jpg)
Considerations on Data Gathering
• Adversaries: exploiting the learning process
• Understand the model, understand the machine, and you can circumvent it
• Something the InfoSec community knows very well
• Any predictive model in Infosec will be pushed to the limit (LIMIT!)
• Again, think back on the way SPAM engines evolved.
![Page 19: BSidesLV 2013 - Using Machine Learning to Support Information Security](https://reader031.fdocuments.us/reader031/viewer/2022020803/546c4757b4af9f702c8b4fbe/html5/thumbnails/19.jpg)
MLSec Project
• Sign up, send logs, receive reports generated by robots machine learning models!
  – FREE! I need the data! Please help! ;)
• Looking for contributors, ideas, and skeptics to support the project as well.
• Visit https://www.mlsecproject.org, message @MLSecProject or just e-mail me.
![Page 20: BSidesLV 2013 - Using Machine Learning to Support Information Security](https://reader031.fdocuments.us/reader031/viewer/2022020803/546c4757b4af9f702c8b4fbe/html5/thumbnails/20.jpg)
MLSec Project
• We developed an algorithm to detect malicious behavior from log entries of firewall blocks
• Over 6 months of data from SANS DShield
• We don’t focus on frequency or network anomaly detection. Get ground truth “badness” and roll with it.
• After a lot of statistics-based math (true positive ratio, true negative ratio, odds likelihood), it can pinpoint actors that are 13x-18x more likely to attack you.
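The statistics named on the slide combine naturally into a positive likelihood ratio: LR+ = TPR / (1 - TNR), i.e. how much more often a model flags true attackers than non-attackers. A minimal sketch, assuming "N times more likely" is read as LR+ (the slide does not show the project's exact formula); the function name and confusion-matrix counts are hypothetical, not results from the talk.

```python
def positive_likelihood_ratio(tp, fn, tn, fp):
    """LR+ = TPR / FPR = sensitivity / (1 - specificity).
    Assumes fp > 0 so the false positive rate is non-zero."""
    tpr = tp / (tp + fn)  # true positive ratio (sensitivity)
    fpr = fp / (fp + tn)  # false positive ratio, equal to 1 - true negative ratio
    return tpr / fpr


# Hypothetical counts: 60 of 100 attackers flagged, 40 of 1000 benign sources flagged.
print(round(positive_likelihood_ratio(tp=60, fn=40, tn=960, fp=40), 1))  # 15.0
```

A TPR of 0.6 sounds modest on its own, but paired with a 4% false positive rate it means a flagged source is about 15 times more likely to be an attacker than an unflagged baseline source would be flagged.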
![Page 21: BSidesLV 2013 - Using Machine Learning to Support Information Security](https://reader031.fdocuments.us/reader031/viewer/2022020803/546c4757b4af9f702c8b4fbe/html5/thumbnails/21.jpg)
Map of the Internet
• (Hilbert Curve)
• Block port 22
• 2013-07-20
[Figure: Hilbert-curve map of the IPv4 address space; labeled regions: 0, 10, 127, MULTICAST AND FRIENDS]
![Page 22: BSidesLV 2013 - Using Machine Learning to Support Information Security](https://reader031.fdocuments.us/reader031/viewer/2022020803/546c4757b4af9f702c8b4fbe/html5/thumbnails/22.jpg)
Map of the Internet
• (Hilbert Curve)
• Block port 22
• 2013-07-20
[Figure: same map as the previous slide (labeled regions: 0, 10, 127, MULTICAST AND FRIENDS), with additional annotations: CN; RU; CN, BR, TH]
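A Hilbert-curve layout keeps numerically adjacent address blocks visually adjacent. A sketch of that layout, assuming one tile per first octet (each /8, or "Class A", block) on a 16×16 grid (256 = 16²) and the standard iterative Hilbert distance-to-(x, y) conversion; the talk's actual plotting code and curve orientation are not shown, and the function names are hypothetical.

```python
def hilbert_d2xy(n, d):
    """Map distance d along a Hilbert curve to (x, y) on an n x n grid,
    where n is a power of two (iterative d-to-xy algorithm)."""
    x = y = 0
    t = d
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        # Rotate/flip this quadrant so the sub-curve joins its neighbours.
        if ry == 0:
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def tile_for_ipv4(address):
    """Place an IPv4 address on a 16x16 grid by its first octet (its /8 block)."""
    first_octet = int(address.split(".")[0])
    return hilbert_d2xy(16, first_octet)


print(tile_for_ipv4("10.1.2.3"), tile_for_ipv4("127.0.0.1"))
```

Consecutive /8 blocks always land on neighbouring tiles, which is why contiguous allocations (for example, a country's address ranges) show up as compact regions on the map.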
![Page 23: BSidesLV 2013 - Using Machine Learning to Support Information Security](https://reader031.fdocuments.us/reader031/viewer/2022020803/546c4757b4af9f702c8b4fbe/html5/thumbnails/23.jpg)
MLSec Project
• Behavior: block on port 22
• Trial inference on 100k IP addresses per Class A subnet
• Logarithmic scale: brightest tiles are 10 to 1000 times more likely to attack.
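The two mechanics on this slide, sampling addresses per /8 for trial inference and coloring tiles on a log scale, can be sketched as follows. Assumptions, not the talk's code: addresses are drawn uniformly at random with replacement, and brightness maps log10 of the "times more likely" ratio linearly so that 10x is dark (0.0) and 1000x is brightest (1.0), clipping values outside that range. Function names are hypothetical.

```python
import ipaddress
import math
import random

LOW_RATIO, HIGH_RATIO = 10.0, 1000.0  # range quoted on the slide


def tile_brightness(likelihood_ratio):
    """Map a positive 'times more likely to attack' ratio to [0, 1] on a log10 scale:
    10x -> 0.0, 100x -> 0.5, 1000x -> 1.0; values outside the range are clipped."""
    lo, hi = math.log10(LOW_RATIO), math.log10(HIGH_RATIO)
    value = (math.log10(likelihood_ratio) - lo) / (hi - lo)
    return min(1.0, max(0.0, value))


def sample_block(first_octet, k, seed=0):
    """Draw k random addresses (with replacement) from the /8 starting at first_octet.0.0.0."""
    rng = random.Random(seed)
    base = first_octet << 24
    return [ipaddress.IPv4Address(base + rng.randrange(1 << 24)) for _ in range(k)]


print(tile_brightness(100.0), sample_block(10, 3))
```

On a log scale, each step in brightness corresponds to a constant multiplicative change in risk, so a tile at 100x sits halfway between 10x and 1000x rather than near the bottom.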
![Page 24: BSidesLV 2013 - Using Machine Learning to Support Information Security](https://reader031.fdocuments.us/reader031/viewer/2022020803/546c4757b4af9f702c8b4fbe/html5/thumbnails/24.jpg)
MLSec Project - Some interesting results
• Ok, robot: show me who the “evil guys” are on port 80 (highest likelihood of attack), by AS name
![Page 25: BSidesLV 2013 - Using Machine Learning to Support Information Security](https://reader031.fdocuments.us/reader031/viewer/2022020803/546c4757b4af9f702c8b4fbe/html5/thumbnails/25.jpg)
MLSec Project - Some interesting results
• Ok, robot: show me who the “evil guys” are on port 80 (highest likelihood of attack), by AS name
• ZOMG! It KNOWS! Call John Connor!
• 1st model did not take web crawler activity into consideration.
• Without netsec/infosec experience, scientists would be scratching their heads for days.
![Page 26: BSidesLV 2013 - Using Machine Learning to Support Information Security](https://reader031.fdocuments.us/reader031/viewer/2022020803/546c4757b4af9f702c8b4fbe/html5/thumbnails/26.jpg)
How to get started?
• Programming is a must (Python / R)
• Statistical knowledge keeps you from making dumb mistakes
• Specific machine learning courses and books:
  – Coursera (ML / Data Analysis / Data Science)
• Practice, Practice, Practice:
  – Kaggle
  – KDD, VAST, VizSec
![Page 27: BSidesLV 2013 - Using Machine Learning to Support Information Security](https://reader031.fdocuments.us/reader031/viewer/2022020803/546c4757b4af9f702c8b4fbe/html5/thumbnails/27.jpg)
Take Aways
• Big data is here! *BUZZWORD ALERT*
• Machine learning / predictive analytics are coming.
• In 6-12 months, everyone will wish they were a Data Scientist (not really!)
• There is a lot of applicability in InfoSec
• Embrace the change: correctly applied ML models can greatly enhance defensive practices.
• MLSec Project is cool, check out my talk at BH/DC
• And MOST IMPORTANTLY…
![Page 28: BSidesLV 2013 - Using Machine Learning to Support Information Security](https://reader031.fdocuments.us/reader031/viewer/2022020803/546c4757b4af9f702c8b4fbe/html5/thumbnails/28.jpg)
Machine Learning = ROBOT Unicorns + Rainbows
![Page 29: BSidesLV 2013 - Using Machine Learning to Support Information Security](https://reader031.fdocuments.us/reader031/viewer/2022020803/546c4757b4af9f702c8b4fbe/html5/thumbnails/29.jpg)
Machine Learning = ROBOT Unicorns + Rainbows
![Page 30: BSidesLV 2013 - Using Machine Learning to Support Information Security](https://reader031.fdocuments.us/reader031/viewer/2022020803/546c4757b4af9f702c8b4fbe/html5/thumbnails/30.jpg)
Thanks!
• Q&A?
• Feedback is welcome!
• (bad = Joel’s fault :P)
Alexandre Pinto [email protected]
@alexcpsec / @MLSecProject
"Prediction is very difficult, especially if it's about the future." - Niels Bohr