Identifying Web Attacks Via Data Analysis

This presentation looks at detecting SQL injection with machine learning and at profiling web traffic to find misbehaving hosts. The goal is to get beyond "Top N" style analysis and use multiple features to guide us toward interesting traffic. These techniques work with multiple log types, from web server logs to proxy logs.

Page 1: Identifying Web Attacks Via Data Analysis

Mike Sconzo

@sooshie

R&D at Click Security

Focused on data analysis for security use cases

Interested in machine learning/statistical analysis

NetWitness

ERCOT

Sandia National Labs

● Introduction
● How to use basic log information to detect different attack types
  ○ Drive-by
  ○ SQL Injection

● Closing

● Python
  ○ IPython
  ○ pandas
  ○ numpy
  ○ matplotlib
  ○ scikit-learn

● Bro
● Google
● sqlmap
● JBroFuzz
● sqlparse
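
Bro writes its logs as tab-separated text: metadata lines start with `#`, and the `#fields` line names the columns. A minimal stdlib sketch of loading one of these logs (for example `http.log`) into a list of dicts follows. It assumes Bro's default ASCII writer settings, where `-` marks an unset field and `(empty)` an empty one; the function name `read_bro_log` and the two-line sample log are hypothetical.

```python
import io


def read_bro_log(lines):
    """Parse a Bro ASCII log into a list of dicts keyed by field name.

    Assumes the default writer settings: tab-separated columns, '#'-prefixed
    metadata lines, '-' for unset fields and '(empty)' for empty ones.
    """
    fields = None
    records = []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line:
            continue
        if line.startswith("#"):
            # Only the '#fields' header matters here; other metadata is skipped.
            if line.startswith("#fields"):
                fields = line.split("\t")[1:]
            continue
        if fields is None:
            raise ValueError("data line found before the #fields header")
        record = {}
        for name, value in zip(fields, line.split("\t")):
            # Translate Bro's placeholder markers into Python values.
            if value == "-":
                record[name] = None
            elif value == "(empty)":
                record[name] = ""
            else:
                record[name] = value
        records.append(record)
    return records


# Hypothetical two-request sample in the http.log layout (abbreviated columns).
sample = (
    "#separator \\x09\n"
    "#fields\tts\tid.orig_h\thost\turi\tresp_mime_types\n"
    "1400000000.0\t10.0.0.5\texample.com\t/index.html\ttext/html\n"
    "1400000001.0\t10.0.0.5\texample.com\t/x.jar\t-\n"
)
rows = read_bro_log(io.StringIO(sample))
print(rows[0]["host"], rows[1]["resp_mime_types"])  # example.com None
```

The resulting list of dicts can be handed to pandas with `pandas.DataFrame(rows)` for exploration.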

● Gather data
● Clean up data
● Explore data
● Select/create features (numeric only)*
● Run machine learning algorithm*
● Analyze results

*optional
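
Most scikit-learn estimators expect numeric arrays, so string-valued log fields have to be converted before a machine learning algorithm can use them. One common way to do that is one-hot encoding of categorical columns; a minimal stdlib sketch follows. The helper `one_hot_encode` is hypothetical, and the field names (`method`, `status_code`, `request_body_len`) are assumed to follow Bro's `http.log`.

```python
def one_hot_encode(records, categorical, numeric):
    """Convert dict records into numeric feature vectors.

    Each categorical field becomes one 0/1 indicator column per observed value
    (a missing value is treated as the category 'None'); numeric fields are
    cast to float, with missing values set to 0.0.
    Returns (column_names, rows).
    """
    # Sorted vocabularies keep the column layout deterministic.
    vocab = {f: sorted({str(r.get(f)) for r in records}) for f in categorical}
    columns = [f"{f}={v}" for f in categorical for v in vocab[f]] + list(numeric)
    rows = []
    for r in records:
        row = []
        for f in categorical:
            value = str(r.get(f))
            row.extend(1.0 if value == v else 0.0 for v in vocab[f])
        for f in numeric:
            x = r.get(f)
            row.append(float(x) if x is not None else 0.0)
        rows.append(row)
    return columns, rows


records = [
    {"method": "GET", "status_code": "200", "request_body_len": "0"},
    {"method": "POST", "status_code": "404", "request_body_len": "512"},
]
columns, rows = one_hot_encode(records, ["method", "status_code"], ["request_body_len"])
print(columns)
# ['method=GET', 'method=POST', 'status_code=200', 'status_code=404', 'request_body_len']
print(rows)
# [[1.0, 0.0, 1.0, 0.0, 0.0], [0.0, 1.0, 0.0, 1.0, 512.0]]
```

Treating `status_code` as categorical rather than numeric is a design choice: HTTP status codes are labels, not magnitudes.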

Is it possible to find clients being exploited by various exploit kits by just looking at traffic patterns?

● Gather data
● Clean up data
● Explore data
● Analyze results
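
The question above can be approached by profiling each client instead of counting top talkers. A minimal sketch, assuming records shaped like Bro `http.log` rows (`id.orig_h`, `host`, `resp_mime_types`): it builds a per-host profile from a few features and ranks hosts by their summed z-scores. The feature set, the `RISKY_MIME` list, the helper names, and the scoring rule are illustrative assumptions, not the analysis from the talk.

```python
from collections import defaultdict
from statistics import mean, pstdev

# Hypothetical list of MIME types that exploit kits commonly deliver.
RISKY_MIME = {
    "application/x-java-archive",
    "application/x-shockwave-flash",
    "application/pdf",
    "application/x-dosexec",
}


def host_profiles(records):
    """Aggregate http.log-style records into one feature dict per client IP."""
    stats = defaultdict(lambda: {"requests": 0, "hosts": set(), "risky": 0})
    for r in records:
        s = stats[r["id.orig_h"]]
        s["requests"] += 1
        s["hosts"].add(r.get("host"))
        # resp_mime_types is written as a comma-separated list in Bro's ASCII output.
        mimes = (r.get("resp_mime_types") or "").split(",")
        if RISKY_MIME.intersection(mimes):
            s["risky"] += 1
    return {
        ip: {
            "requests": s["requests"],
            "distinct_hosts": len(s["hosts"]),
            "risky_fraction": s["risky"] / s["requests"],
        }
        for ip, s in stats.items()
    }


def rank_hosts(profiles):
    """Rank hosts by the sum of their per-feature z-scores, highest first."""
    scores = dict.fromkeys(profiles, 0.0)
    for feature in ("requests", "distinct_hosts", "risky_fraction"):
        values = [p[feature] for p in profiles.values()]
        mu, sigma = mean(values), pstdev(values)
        if sigma == 0:
            continue  # identical across hosts, so it cannot separate them
        for ip, p in profiles.items():
            scores[ip] += (p[feature] - mu) / sigma
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical records for three clients.
records = [
    {"id.orig_h": "10.0.0.1", "host": "news.example", "resp_mime_types": "text/html"},
    {"id.orig_h": "10.0.0.1", "host": "news.example", "resp_mime_types": "image/png"},
    {"id.orig_h": "10.0.0.2", "host": "shop.example", "resp_mime_types": "text/html"},
    {"id.orig_h": "10.0.0.3", "host": "blog.example", "resp_mime_types": "text/html"},
    {"id.orig_h": "10.0.0.3", "host": "cdn.example", "resp_mime_types": "application/x-java-archive"},
    {"id.orig_h": "10.0.0.3", "host": "cdn.example", "resp_mime_types": "application/x-dosexec"},
]
ranking = rank_hosts(host_profiles(records))
print(ranking[0][0])  # 10.0.0.3
```

Summing z-scores is the simplest way to combine features; it also rewards plain heavy browsing through `requests`, so in practice the features would likely need weighting or a proper outlier model.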

● 21 GB of network traffic
● 7,600 samples
● 687,627 files
● 807,537 HTTP requests

*MHR (Team Cymru's Malware Hash Registry) will be used as our ground truth
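
Ground truth here means a per-host label saying whether a client actually received something malicious. A minimal sketch of that labeling step, assuming the MHR convention of querying a TXT record at `<hash>.malware.hash.cymru.com` and getting back `<first-seen unix timestamp> <AV detection percentage>`. To stay runnable offline it works on a dictionary of already-fetched answers; the record fields `client` and `md5`, the helper names, and the 10% threshold are hypothetical.

```python
def mhr_query_name(file_hash):
    """DNS name to query (TXT) for a file hash, per the assumed MHR convention."""
    return f"{file_hash.lower()}.malware.hash.cymru.com"


def parse_mhr_answer(txt):
    """Split an MHR TXT answer '<first_seen_epoch> <detection_pct>' into ints."""
    first_seen, detection_pct = txt.split()
    return int(first_seen), int(detection_pct)


def label_hosts(file_records, mhr_answers, min_pct=10):
    """Label each client 1 if it received any file MHR detects at >= min_pct percent.

    file_records: iterable of dicts with hypothetical 'client' and 'md5' keys.
    mhr_answers: dict mapping md5 -> TXT answer string (hashes with no answer
    are absent, i.e. unknown to MHR).
    """
    labels = {}
    for r in file_records:
        labels.setdefault(r["client"], 0)
        answer = mhr_answers.get(r["md5"])
        if answer is not None and parse_mhr_answer(answer)[1] >= min_pct:
            labels[r["client"]] = 1
    return labels


# Hypothetical pre-fetched answers and file records.
answers = {"aaa111": "1400000000 53", "bbb222": "1400000000 3"}
files = [
    {"client": "10.0.0.1", "md5": "aaa111"},
    {"client": "10.0.0.2", "md5": "bbb222"},
    {"client": "10.0.0.3", "md5": "ccc333"},
]
print(mhr_query_name("AAA111"))  # aaa111.malware.hash.cymru.com
print(label_hosts(files, answers))  # {'10.0.0.1': 1, '10.0.0.2': 0, '10.0.0.3': 0}
```

A host that only fetched hashes MHR has never seen stays labeled 0, which means "not known bad" rather than "clean".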

Is it possible to use supervised learning (classification) to detect strings that are likely SQL injection?

● Gather data
● Explore data
● Clean up data
● Transform data
● Select/create features (numeric only)
● Run machine learning algorithm
● Analyze results
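
The supervised workflow can be sketched end to end in a few dozen lines. This is a minimal stdlib sketch, not the model from the talk: it swaps in a nearest-centroid classifier (scikit-learn offers the same idea as `sklearn.neighbors.NearestCentroid`) over four hypothetical numeric features: length, Shannon entropy, fraction of punctuation characters, and a count of words from a small hand-picked SQL keyword list. The eight training strings are toy examples, far too few for real use.

```python
import math
import re
from collections import Counter

# Hypothetical, hand-picked keyword list used as one of the features.
SQL_KEYWORDS = {"select", "union", "or", "and", "from", "where", "drop", "insert"}


def entropy(s):
    """Shannon entropy of the characters in s, in bits."""
    n = len(s)
    if n == 0:
        return 0.0
    return -sum(c / n * math.log2(c / n) for c in Counter(s).values())


def features(s):
    """Map a string to [length, entropy, punctuation ratio, keyword count]."""
    words = re.findall(r"[a-z_]+", s.lower())
    punct = sum(not ch.isalnum() and not ch.isspace() for ch in s)
    return [
        float(len(s)),
        entropy(s),
        punct / len(s) if s else 0.0,
        float(sum(w in SQL_KEYWORDS for w in words)),
    ]


def fit(samples, labels):
    """Standardize features on the training set and compute one centroid per class."""
    X = [features(s) for s in samples]
    cols = list(zip(*X))
    mu = [sum(c) / len(c) for c in cols]
    # Population std dev per column; 'or 1.0' avoids dividing by zero.
    sd = [(sum((v - m) ** 2 for v in c) / len(c)) ** 0.5 or 1.0 for c, m in zip(cols, mu)]

    def scale(x):
        return [(v - m) / d for v, m, d in zip(x, mu, sd)]

    centroids = {}
    for label in set(labels):
        rows = [scale(x) for x, y in zip(X, labels) if y == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return scale, centroids


def predict(model, s):
    """Assign s to the class whose centroid is nearest in Euclidean distance."""
    scale, centroids = model
    x = scale(features(s))
    return min(centroids, key=lambda label: math.dist(x, centroids[label]))


# Toy labeled data: 1 = looks like SQL injection, 0 = benign.
train = [
    ("john smith", 0), ("widgets", 0), ("123 main st", 0), ("hello world", 0),
    ("1' or '1'='1", 1), ("' union select password from users --", 1),
    ("1; drop table users --", 1), ("admin' --", 1),
]
model = fit([s for s, _ in train], [y for _, y in train])
print(predict(model, "a' or 1=1 --"), predict(model, "jane doe"))  # 1 0
```

Standardizing before measuring distance matters here: without it, raw string length would swamp the other three features.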

*Transform the data into a form that might give better insight than a signature
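
One way to transform a string so that its structure, rather than its literal text, carries the signal is to replace each token with its class. The talk's tool list includes sqlparse; the sketch below instead uses a crude, hypothetical regex lexer (`TOKEN_SPEC`, `to_token_classes`) so it runs with the standard library alone. With sqlparse installed, a comparable sequence can be read from the `ttype` of each token in `sqlparse.parse(text)[0].flatten()`.

```python
import re

# Hypothetical keyword set and token classes for a crude SQL-ish lexer.
KEYWORDS = {"select", "union", "insert", "update", "delete", "drop", "from",
            "where", "and", "or", "not", "null", "like", "order", "by"}

# Order matters: comments must be tried before the single-character operators.
TOKEN_SPEC = [
    ("COMMENT", r"--.*|/\*.*?\*/|#.*"),
    ("NUMBER", r"\d+(?:\.\d+)?"),
    ("WORD", r"[A-Za-z_][A-Za-z0-9_]*"),
    ("QUOTE", r"['\"`]"),
    ("OPERATOR", r"[=<>!]+|[-+*/%|&]"),
    ("PUNCT", r"[(),;.]"),
    ("SPACE", r"\s+"),
    ("OTHER", r"."),
]
LEXER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def to_token_classes(text):
    """Replace each token of text with its class name, dropping whitespace."""
    classes = []
    for match in LEXER.finditer(text):
        kind = match.lastgroup
        if kind == "SPACE":
            continue
        if kind == "WORD":
            kind = "KEYWORD" if match.group().lower() in KEYWORDS else "NAME"
        classes.append(kind)
    return classes


print(to_token_classes("1' or '1'='1"))
# ['NUMBER', 'QUOTE', 'KEYWORD', 'QUOTE', 'NUMBER', 'QUOTE', 'OPERATOR', 'QUOTE', 'NUMBER']
print(to_token_classes("john smith"))  # ['NAME', 'NAME']
```

The injected string and the benign one now differ in shape (quotes around a keyword and an operator) without any fixed signature having been written.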

● Strings are great, but patterns might be better
● Extract patterns from the strings
● N-Grams!!!
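
Extracting n-grams is a sliding window over a sequence, and the same function works on raw characters or on token classes. A minimal stdlib sketch with hypothetical helper names: it counts n-grams of several lengths and projects the counts onto a fixed vocabulary so they become the numeric features a classifier needs. The token list in the example is hand-written for illustration.

```python
from collections import Counter


def ngrams(seq, n):
    """Contiguous n-grams of a string or list, returned as tuples."""
    return [tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)]


def ngram_counts(seq, sizes=(1, 2, 3)):
    """Count n-grams of several sizes into a single Counter."""
    counts = Counter()
    for n in sizes:
        counts.update(ngrams(seq, n))
    return counts


def to_vector(counts, vocabulary):
    """Project counts onto a fixed, ordered vocabulary of n-grams."""
    return [counts.get(gram, 0) for gram in vocabulary]


print(ngrams("or 1=1", 3))
# [('o', 'r', ' '), ('r', ' ', '1'), (' ', '1', '='), ('1', '=', '1')]

# Hand-written token-class sequence for the string "1' or '1'='1".
tokens = ["NUMBER", "QUOTE", "KEYWORD", "QUOTE", "NUMBER",
          "QUOTE", "OPERATOR", "QUOTE", "NUMBER"]
counts = ngram_counts(tokens, sizes=(2,))
vocabulary = [("NUMBER", "QUOTE"), ("QUOTE", "KEYWORD"), ("NAME", "NAME")]
print(to_vector(counts, vocabulary))  # [2, 1, 0]
```

In practice the vocabulary would be built from the n-grams seen in the training data, so every string maps to a vector of the same length.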

● It’s possible to make quality decisions and find interesting activity using data
● The more data you have, the more accurate your predictions can be
● Gathering (the right) data for the use case is important
● Cleaning the data takes a lot of effort, but it’s necessary
● Unfortunately, none of this is a silver bullet, but it can help point you in the right direction(s)
● None of this is magic: you can do it too!

http://clicksecurity.github.io/data_hacking/