Applying Machine Learning to Network Security Monitoring - BayThreat 2013
![Page 1: Applying Machine Learning to Network Security Monitoring - BayThreat 2013](https://reader033.fdocuments.us/reader033/viewer/2022052900/555bd66fd8b42adf478b5224/html5/thumbnails/1.jpg)
Applying Machine Learning to Network Security Monitoring
Alexandre Pinto Chief Data Scientist | MLSec Project
@alexcpsec @MLSecProject
![Page 2: Applying Machine Learning to Network Security Monitoring - BayThreat 2013](https://reader033.fdocuments.us/reader033/viewer/2022052900/555bd66fd8b42adf478b5224/html5/thumbnails/2.jpg)
• This is a talk about BUILDING, not breaking.
  – NO systems were harmed in the development of this talk.
  – This is NOT about 1337 Android Malware.
• The only thing we are likely to break here is the time limit on the talk.
• This talk includes more MATH than the FDA's recommended daily intake.
• All stunts described in this talk were performed by trained professionals.
WARNING!
![Page 3: Applying Machine Learning to Network Security Monitoring - BayThreat 2013](https://reader033.fdocuments.us/reader033/viewer/2022052900/555bd66fd8b42adf478b5224/html5/thumbnails/3.jpg)
• 13 years in Information Security; done a little bit of everything.
• Past 7 or so years leading security consultancy and monitoring teams in Brazil, London and the US.
  – If there is any way a SIEM can hurt you, it has hurt me.
• Researching machine learning and data science in general for the past year or so, and presenting throughout the year on where they intersect with InfoSec.
• Created MLSec Project in July 2013 to give structure to the research being done.
Who's Alex?
![Page 4: Applying Machine Learning to Network Security Monitoring - BayThreat 2013](https://reader033.fdocuments.us/reader033/viewer/2022052900/555bd66fd8b42adf478b5224/html5/thumbnails/4.jpg)
• Definitions
• Big Data
• Data Science
• Machine Learning
• Y U DO DIS?
• Network Security Monitoring
• PoC || GTFO
• Feature Intuition
• How to get started?
Agenda
![Page 5: Applying Machine Learning to Network Security Monitoring - BayThreat 2013](https://reader033.fdocuments.us/reader033/viewer/2022052900/555bd66fd8b42adf478b5224/html5/thumbnails/5.jpg)
Big Data + Machine Learning + Data Science
![Page 7: Applying Machine Learning to Network Security Monitoring - BayThreat 2013](https://reader033.fdocuments.us/reader033/viewer/2022052900/555bd66fd8b42adf478b5224/html5/thumbnails/7.jpg)
Big Data
![Page 8: Applying Machine Learning to Network Security Monitoring - BayThreat 2013](https://reader033.fdocuments.us/reader033/viewer/2022052900/555bd66fd8b42adf478b5224/html5/thumbnails/8.jpg)
(Security) Data Scientist
Data Science Venn Diagram by Drew Conway
• “Data Scientist (n.): Person who is better at statistics than any software engineer and better at software engineering than any statistician.”
-- Josh Wills, Cloudera
![Page 9: Applying Machine Learning to Network Security Monitoring - BayThreat 2013](https://reader033.fdocuments.us/reader033/viewer/2022052900/555bd66fd8b42adf478b5224/html5/thumbnails/9.jpg)
• “Machine learning systems automatically learn programs from data” (*)
• You don’t really code the program; it is inferred from data.
• The intuition is to mimic the way the brain learns: that's where terms like artificial intelligence come from.
Enter Machine Learning
(*) CACM 55(10) - A Few Useful Things to Know about Machine Learning (Domingos 2012)
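To make "the program is inferred from data" concrete, here is a minimal sketch (not from the talk) of about the smallest possible learner: a one-feature decision stump that picks its threshold from labeled examples instead of having an analyst hard-code it. The feature (connection attempts per hour), the helper name `fit_stump` and the data are all hypothetical.

```python
def fit_stump(values, labels):
    """Return (threshold, errors) for the rule 'malicious if value >= threshold'
    that makes the fewest mistakes on the labeled examples."""
    best_t, best_err = None, len(values) + 1
    for t in sorted(set(values)):
        # Count examples where the rule's verdict disagrees with the label.
        errors = sum((v >= t) != bool(y) for v, y in zip(values, labels))
        if errors < best_err:
            best_t, best_err = t, errors
    return best_t, best_err

# Hypothetical training data: connection attempts per hour, 1 = malicious.
attempts = [2, 3, 5, 8, 40, 55, 70, 120]
is_bad = [0, 0, 0, 0, 1, 1, 1, 1]

threshold, errors = fit_stump(attempts, is_bad)
print(threshold, errors)  # 40 0
```

The "program" here is the single number 40: nobody wrote it down, it came out of the data, and it changes if the data changes.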
![Page 10: Applying Machine Learning to Network Security Monitoring - BayThreat 2013](https://reader033.fdocuments.us/reader033/viewer/2022052900/555bd66fd8b42adf478b5224/html5/thumbnails/10.jpg)
• Supervised Learning:
  – Classification (NN, SVM, Naïve Bayes)
  – Regression (linear, logistic)
• Unsupervised Learning:
  – Clustering (k-means)
  – Decomposition (PCA, SVD)
Kinds of Machine Learning
Source: scikit-learn.github.io/scikit-learn-tutorial/general_concepts.html
Classification Example
VS
![Page 12: Applying Machine Learning to Network Security Monitoring - BayThreat 2013](https://reader033.fdocuments.us/reader033/viewer/2022052900/555bd66fd8b42adf478b5224/html5/thumbnails/12.jpg)
Regression Example
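A minimal regression sketch, assuming a single input variable: ordinary least squares fits y = a + b*x in closed form, with b = sum((x - mean_x)(y - mean_y)) / sum((x - mean_x)^2) and a = mean_y - b*mean_x. The daily block counts are made-up numbers chosen so the fit is exact.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx              # slope
    a = mean_y - b * mean_x    # intercept
    return a, b

days = [1, 2, 3, 4, 5]
blocked = [12, 14, 16, 18, 20]  # hypothetical daily firewall blocks
a, b = fit_line(days, blocked)
print(a, b)  # 10.0 2.0
```

Predicting a continuous value for day 6 is then just `a + b * 6`, which gives 22.0.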
![Page 13: Applying Machine Learning to Network Security Monitoring - BayThreat 2013](https://reader033.fdocuments.us/reader033/viewer/2022052900/555bd66fd8b42adf478b5224/html5/thumbnails/13.jpg)
Considerations on Data Gathering
• Models will (generally) get better with more data
  – But we always have to consider bias and variance as we select our data points
  – Also adversaries: we may be force-fed “bad data”, find signal in weird noise, or design bad (or exploitable) features
• “I’ve got 99 problems, but data ain’t one”
Domingos, 2012; Abu-Mostafa, Caltech, 2012
![Page 14: Applying Machine Learning to Network Security Monitoring - BayThreat 2013](https://reader033.fdocuments.us/reader033/viewer/2022052900/555bd66fd8b42adf478b5224/html5/thumbnails/14.jpg)
• Sales
• Trading
• Image and Voice Recognition
Applications of Machine Learning
![Page 15: Applying Machine Learning to Network Security Monitoring - BayThreat 2013](https://reader033.fdocuments.us/reader033/viewer/2022052900/555bd66fd8b42adf478b5224/html5/thumbnails/15.jpg)
• Common reactions from Security Professionals:
  – “Eh, cool…” *blank stare* *walks away*
  – “Are you high, bro?”
• “Why aren’t you doing some cool research like Android Malware?”
Y U DO DIS?
![Page 16: Applying Machine Learning to Network Security Monitoring - BayThreat 2013](https://reader033.fdocuments.us/reader033/viewer/2022052900/555bd66fd8b42adf478b5224/html5/thumbnails/16.jpg)
Math is HARD
![Page 17: Applying Machine Learning to Network Security Monitoring - BayThreat 2013](https://reader033.fdocuments.us/reader033/viewer/2022052900/555bd66fd8b42adf478b5224/html5/thumbnails/17.jpg)
• Fraud detection systems:
  – Is what he just did consistent with past behavior?
• Network anomaly detection (?):
  – More like bad statistical analysis
  – Has not advanced much, IMO
• Predicting likelihood of attack actors:
  – Create different predictive models and chain them to gain more confidence in each step.
• SPAM filters
Security Applications of ML
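Spam filtering is the classic case. Below is a minimal multinomial Naive Bayes sketch with Laplace (add-one) smoothing, in pure Python. The four training messages and the `train`/`classify` helpers are hypothetical; real filters use far more data and richer features.

```python
import math
from collections import Counter

def train(docs):
    """docs: list of (text, label). Returns per-label word counts, doc counts, vocabulary."""
    word_counts = {"spam": Counter(), "ham": Counter()}
    doc_counts = Counter()
    for text, label in docs:
        doc_counts[label] += 1
        word_counts[label].update(text.lower().split())
    vocab = set(word_counts["spam"]) | set(word_counts["ham"])
    return word_counts, doc_counts, vocab

def classify(text, word_counts, doc_counts, vocab):
    """Return the label with the highest log posterior (add-one smoothed multinomial NB)."""
    total_docs = sum(doc_counts.values())
    scores = {}
    for label, counts in word_counts.items():
        total_words = sum(counts.values())
        score = math.log(doc_counts[label] / total_docs)  # log prior
        for word in text.lower().split():
            # Smoothing keeps unseen words from zeroing out the whole product.
            score += math.log((counts[word] + 1) / (total_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)

training = [
    ("cheap pills buy now", "spam"),
    ("win money now", "spam"),
    ("meeting agenda attached", "ham"),
    ("lunch meeting tomorrow", "ham"),
]
model = train(training)
print(classify("buy cheap pills", *model))      # spam
print(classify("agenda for tomorrow", *model))  # ham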
![Page 18: Applying Machine Learning to Network Security Monitoring - BayThreat 2013](https://reader033.fdocuments.us/reader033/viewer/2022052900/555bd66fd8b42adf478b5224/html5/thumbnails/18.jpg)
• Adversaries: exploiting the learning process
• Understand the model, understand the machine, and you can circumvent it
• Something the InfoSec community knows very well
• Any predictive model in InfoSec will be pushed to the limit
• Again, think back on the way SPAM engines evolved.
Considerations on Data Gathering
![Page 19: Applying Machine Learning to Network Security Monitoring - BayThreat 2013](https://reader033.fdocuments.us/reader033/viewer/2022052900/555bd66fd8b42adf478b5224/html5/thumbnails/19.jpg)
Network Security Monitoring
![Page 20: Applying Machine Learning to Network Security Monitoring - BayThreat 2013](https://reader033.fdocuments.us/reader033/viewer/2022052900/555bd66fd8b42adf478b5224/html5/thumbnails/20.jpg)
• Rules in a SIEM solution invariably are:
  – “Something” has happened “x” times;
  – “Something” has happened and another “something2” has happened, with some relationship (time, same fields, etc.) between them.
• Configuring a SIEM = iterating on combinations until:
  – The customer or management is fooled... I mean, satisfied; or
  – The consulting money runs out.
• Behavioral rules (anomaly detection) help a bit with the “x”s, but are still very laborious and time-consuming.
Correlation Rules: A Primer
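The two rule shapes above can be sketched as follows. The event names, the 60-second window and the helper functions are hypothetical, and the IPs come from documentation ranges; real SIEM products express such rules in their own declarative rule languages.

```python
from collections import defaultdict

# Hypothetical normalized events: (timestamp in seconds, event type, source IP).
events = [
    (0, "ssh_fail", "203.0.113.5"),
    (10, "ssh_fail", "203.0.113.5"),
    (20, "ssh_fail", "203.0.113.5"),
    (25, "ssh_fail", "198.51.100.7"),
    (40, "ssh_success", "203.0.113.5"),
]

def count_rule(events, event_type, x, window):
    """'Something' happened at least x times from the same source within `window` seconds."""
    times = defaultdict(list)
    for ts, etype, src in events:
        if etype == event_type:
            times[src].append(ts)
    hits = set()
    for src, ts_list in times.items():
        ts_list.sort()
        # Slide over every run of x consecutive sightings.
        for i in range(len(ts_list) - x + 1):
            if ts_list[i + x - 1] - ts_list[i] <= window:
                hits.add(src)
                break
    return hits

def sequence_rule(events, first, second, window):
    """'first' followed by 'second' from the same source (shared field) within `window` seconds."""
    hits = set()
    for ts1, e1, src1 in events:
        if e1 != first:
            continue
        for ts2, e2, src2 in events:
            if e2 == second and src2 == src1 and 0 <= ts2 - ts1 <= window:
                hits.add(src1)
    return hits

brute_force = count_rule(events, "ssh_fail", x=3, window=60)
compromise = sequence_rule(events, "ssh_fail", "ssh_success", window=60) & brute_force
print(brute_force, compromise)  # {'203.0.113.5'} {'203.0.113.5'}
```

Every constant here (x=3, 60 seconds, which event types to pair) is a hand-tuned guess, which is exactly the iteration problem the slide describes.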
![Page 21: Applying Machine Learning to Network Security Monitoring - BayThreat 2013](https://reader033.fdocuments.us/reader033/viewer/2022052900/555bd66fd8b42adf478b5224/html5/thumbnails/21.jpg)
• Alert-based:
  – “Traditional” log management
  – SIEM
  – Using “Threat Intelligence” (i.e. blacklists) for about a year or so
  – Lack of context
  – Low effectiveness
  – You get the results handed over to you
Kinds of Network Security Monitoring
• Exploration-based:
  – Network forensics tools (2-3 years ago)
  – Elasticsearch-based log management (LM) systems
  – High effectiveness
  – Lots of people necessary
  – Lots of HIGHLY trained people
• Big Data Security Analytics (BDSA):
  – Run exploration-based monitoring on Hadoop
  – More like Big Data Security Monitoring (BDSM)
![Page 22: Applying Machine Learning to Network Security Monitoring - BayThreat 2013](https://reader033.fdocuments.us/reader033/viewer/2022052900/555bd66fd8b42adf478b5224/html5/thumbnails/22.jpg)
Alert-based + Exploration-based
![Page 23: Applying Machine Learning to Network Security Monitoring - BayThreat 2013](https://reader033.fdocuments.us/reader033/viewer/2022052900/555bd66fd8b42adf478b5224/html5/thumbnails/23.jpg)
A wild army of robots appears
![Page 24: Applying Machine Learning to Network Security Monitoring - BayThreat 2013](https://reader033.fdocuments.us/reader033/viewer/2022052900/555bd66fd8b42adf478b5224/html5/thumbnails/24.jpg)
Using robots to catch bad guys
![Page 25: Applying Machine Learning to Network Security Monitoring - BayThreat 2013](https://reader033.fdocuments.us/reader033/viewer/2022052900/555bd66fd8b42adf478b5224/html5/thumbnails/25.jpg)
• We developed a set of algorithms to detect malicious behavior from log entries of firewall blocks.
• Over 6 months of data from SANS DShield (thanks, guys!)
• After a lot of statistics-based math (true positive ratio, true negative ratio, odds likelihood), it could pinpoint actors that would be 13x-18x more likely to attack you.
• Today it is more like 30x on the SANS data, and it finds around 80% of the “badness” in participant deployments.
PoC || GTFO
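The talk doesn't spell out the exact formula behind "13x-18x more likely", so the sketch below only shows the standard confusion-matrix quantities it names (true positive ratio, true negative ratio) plus two common ways to express "x times more likely": the positive likelihood ratio and lift over the base rate. Which of these the MLSec models report is not stated, and the counts are invented.

```python
def prediction_stats(tp, fp, fn, tn):
    """Confusion-matrix ratios for a 'this IP will attack you' prediction."""
    tpr = tp / (tp + fn)                         # true positive ratio: attackers we flagged
    tnr = tn / (tn + fp)                         # true negative ratio: non-attackers left alone
    fpr = fp / (fp + tn)                         # false positive ratio, equal to 1 - tnr
    precision = tp / (tp + fp)                   # P(attack | flagged)
    base_rate = (tp + fn) / (tp + fp + fn + tn)  # P(attack) for a random IP
    return {
        "tpr": tpr,
        "tnr": tnr,
        "lr_positive": tpr / fpr,        # multiplies the prior odds of attack when an IP is flagged
        "lift": precision / base_rate,   # flagged IPs vs. a random IP
    }

# Hypothetical counts over 10,000 source IPs, 100 of which actually attacked.
stats = prediction_stats(tp=40, fp=160, fn=60, tn=9740)
print(f"lift {stats['lift']:.1f}x, LR+ {stats['lr_positive']:.2f}")  # lift 20.0x, LR+ 24.75
```

Note that a flagged IP can be "20x more likely" to attack while 80% of flags are still false positives, because the base rate is so low; that is why the talk pairs the ratio with the share of "badness" actually found.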
![Page 26: Applying Machine Learning to Network Security Monitoring - BayThreat 2013](https://reader033.fdocuments.us/reader033/viewer/2022052900/555bd66fd8b42adf478b5224/html5/thumbnails/26.jpg)
• Assumptions to aggregate the data
• Correlation / proximity / similarity BY BEHAVIOR
• “Bad Neighborhoods” concept:
  – Spamhaus x CyberBunker
  – Google Report (June 2013)
  – Moura 2013
• Group by Geolocation
• Group by Netblock (/16, /24)
• Group by ASN (thanks, Team Cymru)
Feature Intuition: IP Proximity
![Page 27: Applying Machine Learning to Network Security Monitoring - BayThreat 2013](https://reader033.fdocuments.us/reader033/viewer/2022052900/555bd66fd8b42adf478b5224/html5/thumbnails/27.jpg)
Map of the Internet
[Figure: Hilbert curve map of the IPv4 address space showing blocks on port 22, 2013-07-20. Labeled regions: 0, 10, 127, “MULTICAST AND FRIENDS”, CN, RU, “CN, BR, TH”, and a “You are here!” marker.]
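The map relies on a Hilbert curve, which keeps consecutive netblocks next to each other on the page. Below is a sketch of the standard index-to-coordinate conversion, assuming each /8 (first octet) is placed on a 16x16 grid; the grid layout and the `octet_tile` helper are assumptions about how such a map could be drawn, not the talk's plotting code.

```python
def hilbert_d2xy(n, d):
    """Map index d in [0, n*n) to (x, y) on an n-by-n Hilbert curve (n a power of 2)."""
    x = y = 0
    t = d
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        # Rotate/flip the current s-by-s quadrant so sub-curves join end to end.
        if ry == 0:
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y

def octet_tile(ip):
    """Hypothetical helper: grid tile for the /8 that contains an IPv4 address."""
    return hilbert_d2xy(16, int(ip.split(".")[0]))

print(octet_tile("10.0.0.1"), octet_tile("127.0.0.1"))
```

Consecutive indices always land on neighboring tiles, which is why contiguous allocations show up as solid regions in the figure.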
![Page 28: Applying Machine Learning to Network Security Monitoring - BayThreat 2013](https://reader033.fdocuments.us/reader033/viewer/2022052900/555bd66fd8b42adf478b5224/html5/thumbnails/28.jpg)
• Even bad neighborhoods renovate:
  – Attackers may change ISPs/proxies
  – Botnets may be shut down / relocate
  – A little paranoia is OK, but not EVERYONE is out to get you (at least not all at once)
• As days pass, let's forget, bit by bit, who attacked
• When did I last see this actor, and how often have I seen them?
Feature Intuition: Temporal Decay
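One common way to "forget, bit by bit" is exponential decay: each sighting of an actor contributes a weight that halves every fixed number of days, so the sum reflects both how recently and how often the actor was seen. This is a hedged sketch; the 7-day half-life and the `decayed_score` helper are assumptions, not the project's actual parameters.

```python
import math

def decayed_score(sightings_days_ago, half_life_days=7.0):
    """Sum of per-sighting weights that halve every `half_life_days` days."""
    decay = math.log(2) / half_life_days
    return sum(math.exp(-decay * age) for age in sightings_days_ago)

recent_actor = [0, 1, 2]   # seen today, yesterday, and two days ago
old_actor = [28, 29, 30]   # same frequency, but about a month ago
print(round(decayed_score(recent_actor), 3), round(decayed_score(old_actor), 3))
```

With a 7-day half-life, a month-old burst is worth roughly a sixteenth of an identical burst today, so relocated botnets fade out of the model without being deleted by hand.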
![Page 29: Applying Machine Learning to Network Security Monitoring - BayThreat 2013](https://reader033.fdocuments.us/reader033/viewer/2022052900/555bd66fd8b42adf478b5224/html5/thumbnails/29.jpg)
• Behavior: block on port 22
• Trial inference on 100k IP addresses per Class A (/8) subnet
• Logarithmic scale: the brightest tiles are 10 to 1000 times more likely to attack.
MLSec Project
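Why a log scale: with ratios spanning 1x to 1000x, a linear color scale would leave almost every tile near black. The sketch below maps a "times more likely" ratio to an 8-bit brightness; the 0-255 range, the 1000x cap and the `brightness` helper are assumptions.

```python
import math

def brightness(ratio, max_ratio=1000.0):
    """Map a 'times more likely' ratio to 0-255 on a log10 scale, clamped to [1, max_ratio]."""
    clamped = min(max(ratio, 1.0), max_ratio)
    return round(255 * math.log10(clamped) / math.log10(max_ratio))

print([brightness(r) for r in (1, 10, 100, 1000)])  # [0, 85, 170, 255]
```

Each tenfold increase in likelihood adds the same step in brightness, so 10x and 1000x tiles remain distinguishable.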
![Page 30: Applying Machine Learning to Network Security Monitoring - BayThreat 2013](https://reader033.fdocuments.us/reader033/viewer/2022052900/555bd66fd8b42adf478b5224/html5/thumbnails/30.jpg)
• Who resolves to this IP address?
• Number of domains that resolve to the IP address
• Distribution of their lifetime
• Entropy, size, ccTLDs
• Registrar information
• Reverse DNS information…
• History of DNS registration…
• (Thanks, DNSDB!)
Feature Intuition: DNS features
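Of the DNS features listed, entropy and size are easy to compute locally; the others need passive DNS or registration data. Below is a minimal sketch of Shannon entropy over a domain label plus a few lexical features. The feature names and the `domain_features` helper are hypothetical, and using the longest label is just one reasonable choice.

```python
import math
from collections import Counter

def shannon_entropy(s):
    """Shannon entropy of the string s, in bits per character."""
    counts = Counter(s)
    n = len(s)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def domain_features(domain):
    """A few lexical features of a domain name (hypothetical feature set)."""
    labels = domain.lower().rstrip(".").split(".")
    longest = max(labels, key=len)   # random-looking names usually live in the longest label
    tld = labels[-1]
    return {
        "length": len(".".join(labels)),
        "entropy": shannon_entropy(longest),
        "tld": tld,
        "is_cctld": len(tld) == 2,   # two-letter TLDs are reserved for country codes
    }

print(domain_features("example.com.br"))
print(domain_features("xkq3z9vbt.ru"))
```

A label of nine distinct characters like "xkq3z9vbt" scores log2(9), about 3.17 bits per character, the maximum for its length, while dictionary words tend to score lower.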
![Page 31: Applying Machine Learning to Network Security Monitoring - BayThreat 2013](https://reader033.fdocuments.us/reader033/viewer/2022052900/555bd66fd8b42adf478b5224/html5/thumbnails/31.jpg)
• YAY! We have a bunch of numbers per IP address/domain!
• How do you define what is malicious or not?
• “Advanced expertise in both information security and data science will be a necessary ingredient in enabling accurate discrimination between malicious and benign activity.”
- Anton Chuvakin, Gartner
• Kinda easy for security tools (if you trust them)
• Web application logs need deeper statistical analysis
• Not a normal distribution / standard deviation thing
Training the Model
![Page 32: Applying Machine Learning to Network Security Monitoring - BayThreat 2013](https://reader033.fdocuments.us/reader033/viewer/2022052900/555bd66fd8b42adf478b5224/html5/thumbnails/32.jpg)
• Programming is a must (Python / R)
• Statistical knowledge keeps you from making dumb mistakes
• Specific machine learning courses and books:
  – Coursera (ML / Data Analysis / Data Science)
• Practice, Practice, Practice:
  – Explore your data!
  – (Security Onion)
  – Kaggle
  – KDD, VAST, VizSec
How do I get started on this?
![Page 33: Applying Machine Learning to Network Security Monitoring - BayThreat 2013](https://reader033.fdocuments.us/reader033/viewer/2022052900/555bd66fd8b42adf478b5224/html5/thumbnails/33.jpg)
MLSec Project
• Sign up, send logs, receive reports generated by machine learning models!
• Working with several companies on trying out these models in their environments with their data
• We are hiring (KINDA)
• Visit https://www.mlsecproject.org, message @MLSecProject or just e-mail me.
![Page 34: Applying Machine Learning to Network Security Monitoring - BayThreat 2013](https://reader033.fdocuments.us/reader033/viewer/2022052900/555bd66fd8b42adf478b5224/html5/thumbnails/34.jpg)
• Inbound attacks on exposed services (DEFCON/BH 2013):
  – Information from inbound connections on firewalls, IPS, WAFs
  – Feature extraction and supervised learning
• Malware Distribution and Botnets:
  – Information from outbound connections on firewalls, DNS and Web Proxy
  – Initial labeling provided by intelligence feeds and AV/anti-malware
  – Semi-supervised learning involved
• Kill-chain Ensemble Models:
  – Increased precision by composing different behaviors
  – Web server path -> go through Firewall, then IPS, then WAF
  – Early confirmation of attack failure or success
MLSec Project - Current Research
![Page 35: Applying Machine Learning to Network Security Monitoring - BayThreat 2013](https://reader033.fdocuments.us/reader033/viewer/2022052900/555bd66fd8b42adf478b5224/html5/thumbnails/35.jpg)
Thanks! • Q&A? • Feedback?
Alexandre Pinto @alexcpsec
@MLSecProject https://www.mlsecproject.org/
“Essentially, all models are wrong, but some are useful.” - George E. P. Box