Machine Learning in Information Security by Mohammed Zuber

Machine Learning in Information Security

Mohammed Zuber

Agenda

• Definitions
  – Big Data
  – Data Science
  – Machine Learning
• Kinds of Machine Learning
• Machine Learning and Infosec
• MLSec Project

As Usual

Big Data

Four V’s of Big Data

• Volume – Data Quantity
• Velocity – Data Speed
• Variety – Data Types
• Veracity – Messiness

Data Science

• Data science is an interdisciplinary field about processes and systems to extract knowledge or insights from data in various forms, either structured or unstructured.

• “Data Scientist (n.): Person who is better at statistics than any software engineer and better at software engineering than any statistician.” -- Josh Wills, Cloudera.

Data Science Venn Diagram by Drew Conway

Machine Learning

• “Machine learning systems automatically learn programs from data” *

• You don’t really write the program by hand; it is inferred from the data.

• The intuition is to mimic the way the brain learns: that’s where terms like “artificial intelligence” come from.

* CACM 55(10) - A Few Useful Things to Know about Machine Learning
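To make “the program is inferred from data” concrete, here is a minimal sketch, assuming a toy one-feature problem: instead of hand-coding a rule such as “flag a login as suspicious after more than N failed attempts”, the threshold N is learned from labeled examples (a so-called decision stump). The functions `learn_threshold` and `predict` and the training data are hypothetical illustrations, not taken from the original slides.

```python
from typing import List, Tuple

def learn_threshold(samples: List[Tuple[float, int]]) -> float:
    """Learn a decision stump that predicts 1 when x > threshold.

    samples: (feature_value, label) pairs, label 1 = suspicious, 0 = benign.
    Tries a threshold below the smallest value plus every midpoint between
    consecutive distinct values, keeping the one with the fewest training errors.
    """
    values = sorted({x for x, _ in samples})
    candidates = [values[0] - 1.0]
    candidates += [(a + b) / 2 for a, b in zip(values, values[1:])]

    def errors(t: float) -> int:
        # Count samples whose predicted label (x > t) disagrees with the true label
        return sum((x > t) != bool(y) for x, y in samples)

    # min() keeps the first candidate with the lowest error count
    return min(candidates, key=errors)

def predict(threshold: float, x: float) -> int:
    return int(x > threshold)

# Hypothetical training data: (failed logins in the last hour, label)
training = [(0, 0), (1, 0), (2, 0), (3, 0), (8, 1), (12, 1), (15, 1)]
t = learn_threshold(training)
print(t)               # 5.5
print(predict(t, 10))  # 1
```

Nobody wrote “5.5” into the code: the rule came out of the examples, and retraining on different data would yield a different rule.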

Machine Learning

Applications of Machine Learning

• Sales
• Trading
• Audio & Video Recognition

Security Applications of Machine Learning

• Fraud detection systems
  – Is what he just did consistent with past behavior?
• Network anomaly detection
  – More like statistical analysis.
• Predicting the likelihood of attack actors
  – Create different predictive models and chain them to gain more confidence in each step.
• Spam filters
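The fraud-detection question “is what he just did consistent with past behavior?” can be approximated with plain statistics. The sketch below is one simple option, assuming a per-user history of numeric events (e.g. purchase amounts) and a z-score cutoff of 3; the function name, threshold, and data are hypothetical, and production fraud systems use far richer models.

```python
import statistics
from typing import Sequence

def is_anomalous(history: Sequence[float], value: float, z_threshold: float = 3.0) -> bool:
    """Return True if `value` lies more than `z_threshold` standard deviations
    away from the mean of this user's past behavior (hypothetical rule)."""
    if len(history) < 2:
        return False  # too little history to estimate normal behavior
    mean = statistics.mean(history)
    stdev = statistics.stdev(history)  # sample standard deviation
    if stdev == 0:
        return value != mean  # any deviation from a constant history is unusual
    return abs(value - mean) / stdev > z_threshold

# Hypothetical purchase amounts for one customer
past_purchases = [20.0, 35.5, 18.0, 42.0, 25.0, 30.0]
print(is_anomalous(past_purchases, 28.0))   # False
print(is_anomalous(past_purchases, 950.0))  # True
```

The same idea, a baseline per entity plus a deviation score, underlies much of the “more like statistical analysis” network anomaly detection mentioned above.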

Types of Machine Learning

Supervised Learning
• Classification
  – (NN, SVM, Naïve Bayes)
• Regression
  – (linear, logistic)
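As an example of supervised classification, here is a minimal multinomial Naïve Bayes spam filter sketch. It assumes naive whitespace tokenization, Laplace (add-one) smoothing, and a tiny hypothetical training set; `train_nb` and `classify_nb` are illustrative names, not a library API.

```python
import math
from collections import Counter
from typing import Dict, List, Set, Tuple

# Hypothetical model type: (documents per class, word counts per class, vocabulary)
Model = Tuple[Counter, Dict[str, Counter], Set[str]]

def train_nb(docs: List[Tuple[str, str]]) -> Model:
    """Count documents per class and word occurrences per class."""
    class_counts: Counter = Counter()
    word_counts: Dict[str, Counter] = {}
    vocab: Set[str] = set()
    for text, label in docs:
        words = text.lower().split()  # naive whitespace tokenization
        class_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(words)
        vocab.update(words)
    return class_counts, word_counts, vocab

def classify_nb(model: Model, text: str) -> str:
    """Return the label with the highest log posterior (up to a constant)."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = "", -math.inf
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)  # log prior P(label)
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            if word not in vocab:
                continue  # ignore words never seen in training
            # Laplace-smoothed likelihood P(word | label)
            score += math.log((word_counts[label][word] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical toy training set
train = [
    ("win cash prize now", "spam"),
    ("cheap meds win big", "spam"),
    ("meeting agenda for monday", "ham"),
    ("please review the attached report", "ham"),
]
model = train_nb(train)
print(classify_nb(model, "win a cash prize"))          # spam
print(classify_nb(model, "review the monday report"))  # ham
```

Log probabilities are summed instead of multiplying raw probabilities so that long messages do not underflow to zero.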

Unsupervised Learning
• Clustering
  – (k-means)
• Decomposition
  – (PCA, SVD)
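For unsupervised clustering, the sketch below implements k-means with Lloyd’s algorithm in pure Python. It assumes 2-D points, for instance hypothetical per-host features such as connections per minute and average KB per connection; no labels are given, and the algorithm groups similar hosts on its own. The `kmeans` function and the data are illustrative only.

```python
import math
import random
from typing import List, Tuple

Point = Tuple[float, float]

def kmeans(points: List[Point], k: int, iterations: int = 100, seed: int = 0):
    """Lloyd's algorithm: alternate an assignment step and an update step."""
    rng = random.Random(seed)
    centroids: List[Point] = rng.sample(points, k)  # start from k distinct input points
    clusters: List[List[Point]] = []
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        new_centroids: List[Point] = []
        for i, cluster in enumerate(clusters):
            if cluster:
                xs, ys = zip(*cluster)
                new_centroids.append((sum(xs) / len(cluster), sum(ys) / len(cluster)))
            else:
                new_centroids.append(centroids[i])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged: assignments can no longer change
        centroids = new_centroids
    return centroids, clusters

# Hypothetical per-host features: (connections per minute, avg KB per connection)
hosts = [(1.0, 2.0), (1.5, 1.8), (1.2, 2.2), (8.0, 9.0), (8.5, 8.7), (9.0, 9.5)]
centroids, clusters = kmeans(hosts, k=2)
for c, members in zip(centroids, clusters):
    print(c, members)
```

k-means only finds groups; deciding whether a small or distant cluster is actually malicious still requires an analyst.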

Machine Learning in InfoSec

• SIEM and log monitoring tools are just vertical BI applications (from the ’90s).

• How many logs do you think there are in your organization?

Kinds of Network Security Monitoring

• Alert-based:
  – “Traditional” log management
  – SIEM
  – Using “Threat Intelligence” (i.e., blacklists)
  – Lack of context
  – Low effectiveness
  – You get the results handed over to you

• Exploration-based:
  – Network forensics tools
  – Elasticsearch-based log management (LM) systems
  – High effectiveness
  – Requires lots of highly trained people

• Big Data Security Analytics:
  – Run exploration-based monitoring on Hadoop
  – More like Big Data Security Monitoring (BDSM)
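The alert-based style of monitoring can be sketched as simple blacklist matching. This assumes a hypothetical whitespace-separated log format (`<timestamp> <src_ip> <dst_ip> <action>`) and uses documentation IP ranges; `blacklist_alerts` is an illustrative name, not part of any SIEM product.

```python
import ipaddress
from typing import Iterable, List

def blacklist_alerts(log_lines: Iterable[str], blacklist: Iterable[str]) -> List[str]:
    """Emit one alert per log line whose destination IP is on the blacklist.

    Assumes a hypothetical log format: "<timestamp> <src_ip> <dst_ip> <action>".
    """
    bad_ips = {ipaddress.ip_address(ip) for ip in blacklist}
    alerts = []
    for line in log_lines:
        fields = line.split()
        if len(fields) < 4:
            continue  # skip malformed lines
        try:
            dst = ipaddress.ip_address(fields[2])
        except ValueError:
            continue  # skip lines whose destination is not a valid IP
        if dst in bad_ips:
            # The alert carries no context: only the fact that a listed IP was seen
            alerts.append(f"ALERT {fields[0]} {fields[1]} -> {dst} {fields[3]}")
    return alerts

logs = [
    "2015-06-01T10:00:00 10.0.0.5 203.0.113.7 ALLOW",
    "2015-06-01T10:00:05 10.0.0.8 198.51.100.20 ALLOW",
    "garbage line",
]
print(blacklist_alerts(logs, ["203.0.113.7"]))
# ['ALERT 2015-06-01T10:00:00 10.0.0.5 -> 203.0.113.7 ALLOW']
```

The output illustrates the “lack of context” point: the alert says a listed address was contacted, but nothing about whether that traffic was actually harmful, and anything not on the list passes silently.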

MLSec Project

• Sign up, send logs, receive reports generated by machine learning models!

• Working with several companies to try out these models in their own environments, with their own data

• Visit https://www.mlsecproject.org




How do I get started on this?

• Programming is a must (Python / R)
• Statistical knowledge keeps you from making dumb mistakes
• Specific machine learning courses and books:
  – Coursera (ML / Data Analysis / Data Science)
• Practice, Practice, Practice:
  – Explore your data
  – Security Onion
  – Kaggle

Thank You

Most of the information is taken from http://www.slideshare.net/AlexandrePinto10/
