Machine Learning in Information Security by Mohammed Zuber

Agenda

• Definitions
  – Big Data
  – Data Science
  – Machine Learning
• Kinds of Machine Learning
• Machine Learning and Infosec
• MLSec Project

As Usual

Big Data

Four V’s of Big Data
• Volume – Data Quantity
• Velocity – Data Speed
• Variety – Data Types
• Veracity – Messiness

Data Science

• Data science is an interdisciplinary field about processes and systems to extract knowledge or insights from data in various forms, either structured or unstructured.

• “Data Scientist (n.): Person who is better at statistics than any software engineer and better at software engineering than any statistician.” -- Josh Wills, Cloudera.

Data Science Venn Diagram by Drew Conway

Machine Learning

• “Machine learning systems automatically learn programs from data” *

• You don’t really code the program, but it is inferred from data.

• The intuition is to mimic the way the brain learns; that is where terms like artificial intelligence come from.

* CACM 55(10) - A Few Useful Things to Know about Machine Learning
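To make "the program is inferred from data" concrete, here is a minimal sketch in Python. Nobody hard-codes the alert threshold; it is picked from labeled examples. The data, the feature (failed logins per hour), and the names `learn_threshold` and `predict` are all hypothetical and only serve as an illustration.

```python
# Hypothetical labeled data: (failed logins in one hour, label)
# label 1 = attack, label 0 = benign. Values are made up for illustration.
samples = [(1, 0), (2, 0), (3, 0), (4, 0), (12, 1), (15, 1), (20, 1), (30, 1)]


def learn_threshold(data):
    """Return the threshold that misclassifies the fewest training samples."""
    values = sorted({x for x, _ in data})
    # Candidate thresholds sit halfway between neighbouring observed values.
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]

    def errors(t):
        # A sample is predicted "attack" when its value exceeds the threshold.
        return sum((x > t) != bool(y) for x, y in data)

    return min(candidates, key=errors)


# The "program" (a one-line rule) is learned rather than written by hand.
threshold = learn_threshold(samples)


def predict(failed_logins):
    """Classify a new observation with the learned rule."""
    return int(failed_logins > threshold)


print(threshold, predict(2), predict(18))
```

With different training data the same code would produce a different rule, which is the point of the quote above.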

Machine Learning

Applications of Machine Learning

Sales

Trading

Audio & Video Recognition

Security Applications of Machine Learning

• Fraud detection systems
  – Is what he just did consistent with past behavior?
• Network anomaly detection
  – More like statistical analysis.
• Predicting likelihood of attack actors
  – Create different predictive models and chain them to gain more confidence in each step.
• SPAM Filters
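The fraud-detection question ("is this consistent with past behavior?") can be sketched as a simple statistical check. The example below assumes a z-score test against a user's transaction history; the function name `is_consistent`, the cutoff of 3 standard deviations, and the amounts are assumptions for illustration, not how any particular fraud system works.

```python
from statistics import mean, stdev


def is_consistent(history, new_value, max_z=3.0):
    """Return True if new_value lies within max_z standard deviations of history."""
    mu = mean(history)
    sigma = stdev(history)  # sample standard deviation; needs at least 2 points
    if sigma == 0:
        # All past values are identical, so only an exact match is "consistent".
        return new_value == mu
    return abs(new_value - mu) / sigma <= max_z


# Hypothetical past transaction amounts for one user.
past_amounts = [42.0, 38.5, 51.0, 45.2, 40.3, 47.8, 44.1]

print(is_consistent(past_amounts, 46.0))   # close to past behavior
print(is_consistent(past_amounts, 900.0))  # far outside past behavior
```

A single feature and a fixed cutoff are rarely enough in practice; this only shows the shape of the idea.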

Types of Machine Learning

Supervised Learning
• Classification
  – (NN, SVM, Naïve Bayes)
• Regression
  – (linear, logistic)

Unsupervised Learning
• Clustering
  – (k-means)
• Decomposition
  – (PCA, SVD)
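As an illustration of unsupervised learning, here is a small k-means sketch in plain Python. No labels are given; the algorithm groups hosts purely by how similar their features are. The features (connections per minute, distinct ports contacted), the data points, and the choice of starting centroids are hypothetical.

```python
def kmeans(points, centroids, iterations=10):
    """Plain k-means on 2-D points, starting from the given centroids."""
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            # Assign each point to its nearest centroid (squared Euclidean distance).
            nearest = min(
                range(len(centroids)),
                key=lambda i: (p[0] - centroids[i][0]) ** 2
                + (p[1] - centroids[i][1]) ** 2,
            )
            clusters[nearest].append(p)
        # Move each centroid to the mean of its cluster; keep it if the cluster is empty.
        centroids = [
            (sum(x for x, _ in c) / len(c), sum(y for _, y in c) / len(c)) if c else old
            for c, old in zip(clusters, centroids)
        ]
    return centroids, clusters


# Hypothetical hosts: (connections per minute, distinct ports contacted).
hosts = [(5, 2), (6, 3), (4, 2), (90, 40), (95, 45), (100, 42)]

# Starting centroids are chosen by hand here to keep the run deterministic.
centroids, clusters = kmeans(hosts, centroids=[hosts[0], hosts[3]])
print(clusters)
```

Real implementations usually pick the starting centroids randomly (or with k-means++) and stop when assignments no longer change; the fixed iteration count here just keeps the sketch short.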

Machine Learning in InfoSec

• SIEM and Log Monitoring tools are just vertical BI applications (from the 90’s)

• How many logs do you think there are in your organization?

Kinds of Network Security Monitoring

• Alert-based:
  – “Traditional” log management
  – SIEM
  – Using “Threat Intelligence” (i.e., blacklists)
  – Lack of context
  – Low effectiveness
  – You get the results handed over to you

• Exploration-based:
  – Network forensics tools
  – Elasticsearch-based LM systems
  – High effectiveness
  – Lots of highly trained people necessary

• Big Data Security Analytics:
  – Run exploration-based monitoring on Hadoop
  – More like Big Data Security Monitoring (BDSM)
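For comparison, the alert-based approach with blacklist "threat intelligence" boils down to a lookup like the sketch below. The log format, the field names, and the blacklist entries (taken from IP ranges reserved for documentation) are assumptions made for this example. Note that the output says only that a match happened, which is the "lack of context" problem listed above.

```python
# Hypothetical blacklist; these addresses come from documentation-only ranges.
BLACKLIST = {"203.0.113.7", "198.51.100.23"}

# Hypothetical log lines in a made-up "timestamp key=value ..." format.
log_lines = [
    "2016-01-10T10:00:01 src=10.0.0.5 dst=203.0.113.7 port=443",
    "2016-01-10T10:00:02 src=10.0.0.8 dst=192.0.2.10 port=80",
    "2016-01-10T10:00:03 src=10.0.0.5 dst=198.51.100.23 port=22",
]


def parse(line):
    """Split one log line into a dict of its fields."""
    timestamp, *pairs = line.split()
    fields = dict(pair.split("=", 1) for pair in pairs)
    fields["time"] = timestamp
    return fields


def alerts(lines, blacklist):
    """Return every parsed record whose destination is on the blacklist."""
    return [rec for rec in map(parse, lines) if rec["dst"] in blacklist]


for alert in alerts(log_lines, BLACKLIST):
    print(alert["time"], alert["src"], "->", alert["dst"])
```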

MLSec Project

• Sign up, send logs, receive reports generated by machine learning models!

• Working with several companies on trying out these models on their environment with their data

• Visit https://www.mlsecproject.org

How do I get started on this?

• Programming is a must (Python / R)
• Statistical knowledge keeps you from making dumb mistakes
• Specific machine learning courses and books:
  – Coursera (ML / Data Analysis / Data Science)
• Practice, Practice, Practice:
  – Explore your data
  – Security Onion
  – Kaggle

Thank You

Most of the information is taken from http://www.slideshare.net/AlexandrePinto10/