Post on 18-Nov-2014
BITS Pilani, Hyderabad Campus
Pratik Narang, Jagan Mohan Reddy, Chittaranjan Hota
BITS Pilani, Hyderabad Campus
narangpratik@gmail.com
23rd August 2013
ACM Compute 2013, Vellore
Feature Selection for Detection of Peer-to-Peer Botnet Traffic
Outline
• Introduction
  o P2P Networks
  o P2P Botnets
• Work overview
• Related Work
• Our work
  o Generating traffic
  o Feature extraction & selection
  o Evaluation of feature selection techniques
  o Future scope of work
What is a P2P Network?
[Figure: peers A to H connected in a P2P overlay layer, mapped onto the native IP layer spanning autonomous systems AS1 to AS6]
Generic P2P Architecture
[Figure: layered architecture diagram with the following components]
• Capability & Configuration
• Peer Role Selection
• Operating System
• NAT/Firewall Traversal
• Routing and Forwarding
• Neighbor Discovery
• Join/Leave
• Bootstrap
• Overlay Messaging API
• Content Storage
• Search API
Uses & Misuses
Traditional Botnets
[Figure: botnet topology with a Bot-Master]
Peer-to-Peer Botnets
[Figure: botnet topology with a Bot-Master]
Work overview
• Evaluation of 3 feature selection algorithms:
  o Correlation-based Feature Selection
  o Consistency-based Subset Evaluation
  o Principal Component Analysis
• Models built with 3 machine learning algorithms:
  o Naïve Bayes classifier
  o Bayes Networks
  o C4.5 Decision trees
• Performance evaluation for the detection of some recent and well-known P2P botnets.
Related work
• Early work using feature selection algorithms [1] [2]
used the DARPA dataset, which is no longer suitable
for today’s security research.
• Early approaches for P2P botnet detection [3]
applied static, port-based analysis, which is easily
defeated by modern botnets.
• Recent work [4] [5] has employed machine learning
and data mining techniques for detection of P2P
botnets.
Our work
[Figure: processing pipeline, layers listed top to bottom as on the slide]
• Machine Learning Algorithms: Bayes Network, Naïve Bayes, C4.5 Decision Trees
• Feature Selection: Correlation-based Feature Selection, Consistency-based Subset Evaluation, Principal Component Analysis
• Feature Extraction: source min. packet size, dest. TCP Push flag count, source avg. packet size, dest. total volume, duration, …
• Flow Extraction: <Source IP, Source port, Destination IP, Destination port, Protocol>
• Network captures: jNetPcap Library with Java module
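The flow-extraction stage can be illustrated with a minimal Python sketch. It is not the authors' jNetPcap/Java module: the `Packet` record, its field names, and the choice to merge both directions of a conversation under one key are assumptions made for illustration (the slides only state that a flow is identified by the 5-tuple and that features are bi-directional).

```python
from collections import defaultdict
from typing import NamedTuple


class Packet(NamedTuple):
    """Hypothetical parsed-packet record; field names are illustrative."""
    timestamp: float
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    protocol: str
    size: int


def flow_key(pkt):
    """Return a direction-independent 5-tuple key for the packet."""
    a = (pkt.src_ip, pkt.src_port)
    b = (pkt.dst_ip, pkt.dst_port)
    # Order the two endpoints so that A->B and B->A share one key
    # (assumption: both directions belong to the same flow).
    first, second = sorted((a, b))
    return (first[0], first[1], second[0], second[1], pkt.protocol)


def group_into_flows(packets):
    """Group packets into flows, keeping each flow in timestamp order."""
    flows = defaultdict(list)
    for pkt in sorted(packets, key=lambda p: p.timestamp):
        flows[flow_key(pkt)].append(pkt)
    return dict(flows)


packets = [
    Packet(0.0, "10.0.0.1", 5000, "10.0.0.2", 80, "TCP", 100),
    Packet(0.2, "10.0.0.2", 80, "10.0.0.1", 5000, "TCP", 1500),
    Packet(0.1, "10.0.0.3", 6000, "10.0.0.2", 53, "UDP", 60),
]
flows = group_into_flows(packets)
```

In this example the first two packets travel in opposite directions between the same endpoints and end up in one flow, while the UDP packet forms a second flow.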
Generating Traffic
[Figure: campus network testbed used for traffic generation and capture. Labeled functions: botnet traffic generation; data collection for P2P and web traffic; anonymization (Anon tool); botnet detection module. Labeled network elements: Internet, Firewall, IDS, Core Switch 6509, Distribution Switch 4500, Access Switch 2500, Ethernet, Info. Sec. Lab, Dist. Sys. Lab, Multimedia Lab, Hostels Wing, Content Mgmt., Application Servers, DB Cluster]
Dataset
Data               | Application                                      | Number of flows
Benign data        | HTTP, HTTPS, SMTP, FTP, POP                      | 30,000
Benign data        | P2P apps: eMule, BitTorrent, Mute, Gnutella etc. | 50,000
Botnet data [4, 5] | Zero Access                                      | 720
Botnet data [4, 5] | SkyNet                                           | 770
Botnet data [4, 5] | Waledac                                          | 80,000
Botnet data [4, 5] | Storm                                            | 220,000
Feature Extraction & Selection
• A ‘Flow’ is defined by:
  • <Source IP, Source port, Dest. IP, Dest. port, Protocol>
• Features extracted from each flow:
  • Packet count (bi-directional)
  • Packet size (bytes) (min, max, mean and standard deviation) (bi-directional)
  • Total volume (bytes) (bi-directional)
  • Inter-arrival times (min, max, mean and standard deviation) (bi-directional)
  • TCP Push flag count (bi-directional)
  • Duration of the flow (no context of direction)
• TOTAL - 23 features extracted from each flow
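The count of 23 follows from 11 features per direction (1 packet count, 4 packet-size statistics, 1 volume, 4 inter-arrival statistics, 1 Push flag count) times two directions, plus the flow duration. The sketch below shows one way such a feature vector could be computed. The input layout (per-direction lists of `(timestamp, size, psh)` tuples already sorted by time), the feature names, and the use of the population standard deviation are assumptions for illustration, not the authors' implementation.

```python
import statistics


def _summary(values):
    """Return (min, max, mean, population std); zeros for an empty list."""
    if not values:
        return 0.0, 0.0, 0.0, 0.0
    return (float(min(values)), float(max(values)),
            statistics.fmean(values), statistics.pstdev(values))


def _direction_features(prefix, packets):
    """Compute the 11 per-direction features from (timestamp, size, psh) tuples."""
    times = [t for t, _, _ in packets]
    sizes = [s for _, s, _ in packets]
    # Inter-arrival times are gaps between consecutive packets in one direction.
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    feats = {
        f"{prefix}_pkt_count": len(packets),
        f"{prefix}_volume": sum(sizes),
        f"{prefix}_psh_count": sum(1 for _, _, psh in packets if psh),
    }
    for name, values in (("pkt_size", sizes), ("iat", gaps)):
        for stat, value in zip(("min", "max", "mean", "std"), _summary(values)):
            feats[f"{prefix}_{name}_{stat}"] = value
    return feats


def extract_features(src_packets, dst_packets):
    """Build the 23-feature vector: 11 per direction plus flow duration."""
    feats = {}
    feats.update(_direction_features("src", src_packets))
    feats.update(_direction_features("dst", dst_packets))
    all_times = [t for t, _, _ in src_packets + dst_packets]
    feats["duration"] = max(all_times) - min(all_times) if all_times else 0.0
    return feats


src = [(0.0, 100, False), (1.0, 200, True), (3.0, 300, False)]
dst = [(0.5, 60, True)]
features = extract_features(src, dst)
```

Directions with fewer than two packets have no inter-arrival gaps; this sketch reports zeros in that case, which is a design choice rather than something the slides specify.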
Feature Extraction & Selection
• Three Feature Selection techniques used:
1. Correlation-based Feature Selection (CFS)
2. Consistency-based Subset Evaluation (CSE)
3. Principal Component Analysis (PCA)
• Evaluated with three algorithms:
1. Naïve Bayes
2. Bayes Network
3. C4.5 Decision Trees
Feature Extraction & Selection
Feature Selection | Search method     | No. of features | Description
CFS               | Best first search | 5               | source packet count, source min. packet size, source max. packet size, dest. max. packet size, source inter-arrival time std.
CSE               | Best first search | 8               | source min. packet size, source max. packet size, dest. max. packet size, source avg. packet size, dest. avg. packet size, source max. inter-arrival time, flow duration, source volume
PCA               | -                 | 12              | A linear combination of features
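To make the CFS row more concrete, the sketch below computes the subset merit from Hall's formulation of Correlation-based Feature Selection, merit = k·r_cf / sqrt(k + k(k−1)·r_ff), where k is the subset size, r_cf the mean feature-class correlation and r_ff the mean feature-feature correlation. Two things here are assumptions: absolute Pearson correlation is used as the correlation measure for simplicity (CFS implementations commonly use symmetrical uncertainty on discretized features), and the best-first search over subsets is not shown.

```python
import math
from itertools import combinations


def pearson(xs, ys):
    """Pearson correlation coefficient; 0.0 if either input is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def cfs_merit(columns, labels):
    """Merit of a feature subset; each column holds one feature's values."""
    k = len(columns)
    # Mean absolute correlation between each feature and the class label.
    r_cf = sum(abs(pearson(col, labels)) for col in columns) / k
    # Mean absolute correlation between every pair of features.
    pairs = list(combinations(columns, 2))
    r_ff = (sum(abs(pearson(a, b)) for a, b in pairs) / len(pairs)) if pairs else 0.0
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)


labels = [0, 0, 1, 1]
relevant = [0, 0, 1, 1]      # perfectly correlated with the label
irrelevant = [0, 1, 0, 1]    # uncorrelated with the label

merit_single = cfs_merit([relevant], labels)
merit_redundant = cfs_merit([relevant, relevant], labels)
merit_with_noise = cfs_merit([relevant, irrelevant], labels)
```

Adding a redundant copy of a relevant feature leaves the merit unchanged, while adding an irrelevant feature lowers it, which is why a search guided by this score tends toward small subsets.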
Evaluation of Feature Selection Techniques
[Figure: bar chart of accuracy in % (y-axis, 0 to 100) per classification algorithm (NaiveBayes, BayesNet, C4.5), with one bar each for the Full, CFS, CSE and PCA feature sets. Data labels as extracted: 85.2, 97.08, 98.23, 81.51, 95.92, 98.18, 80.24, 96.2, 98.23, 82.16, 96.67, 98.17]
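[Figure: bar chart of detection rate in % (y-axis, 93 to 99) per classification algorithm (NaiveBayes, BayesNet, C4.5), with one bar each for the Full, CFS, CSE and PCA feature sets. Data labels as extracted: 98.9, 96.9, 98.9, 95.2, 95.3, 98.9, 96.1, 95.7, 99, 95.4, 96.2, 98.9]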
\[ \text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \]
\[ \text{Detection rate} = \frac{TP}{TP + FN} \]
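The two metrics plotted on this slide, accuracy and detection rate (the true-positive rate), can be computed directly from confusion-matrix counts. The short sketch below does so; the function names and the example counts are illustrative and are not results from the presentation.

```python
def accuracy(tp, tn, fp, fn):
    """Fraction of all flows classified correctly."""
    total = tp + tn + fp + fn
    return (tp + tn) / total if total else 0.0


def detection_rate(tp, fn):
    """Fraction of botnet flows that were flagged (true-positive rate)."""
    positives = tp + fn
    return tp / positives if positives else 0.0


# Illustrative counts only: 100 botnet flows and 100 benign flows.
example_accuracy = accuracy(tp=90, tn=80, fp=20, fn=10)
example_detection_rate = detection_rate(tp=90, fn=10)
```

Reporting both matters on a dataset like this one, where botnet flows outnumber benign flows: a high detection rate says nothing about how many benign flows are misclassified.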
Evaluation of Feature Selection Techniques
[Figure: bar chart of normalized classification speed (y-axis, 0 to 1) per classification algorithm (NaiveBayes, BayesNet, C4.5), with one bar each for the Full, CFS, CSE and PCA feature sets]
[Figure: bar chart of normalized build times (y-axis, 0 to 1) per classification algorithm (NaiveBayes, BayesNet, C4.5), with one bar each for the Full, CFS, CSE and PCA feature sets]
Primary Observations
Future Scope
• Ensemble of classifiers
  (Work in progress; paper submitted to I-CARE 2013)
• Close-to-real-time Detection Tool
  (Work in progress)
• Space-efficient data structures
References
1. A. H. Sung and S. Mukkamala. The feature selection and intrusion detection
problems. In Advances in Computer Science - ASIAN 2004. Higher-Level
Decision Making, pages 468–482. Springer, 2005.
2. S. Chebrolu, A. Abraham, and J. P. Thomas. Feature deduction and
ensemble design of intrusion detection systems. Computers & Security,
24(4):295–307, 2005.
3. R. Schoof and R. Koning. Detecting peer-to-peer botnets. University of
Amsterdam, 2007.
4. S. Saad, I. Traore, A. Ghorbani, B. Sayed, D. Zhao, W. Lu, J. Felix, and P.
Hakimian. Detecting P2P botnets through network behavior analysis and
machine learning. In Privacy, Security and Trust (PST), 2011 Ninth Annual
International Conference on, pages 174–180. IEEE, 2011.
5. B. Rahbarinia, R. Perdisci, A. Lanzi, and K. Li. PeerRush: Mining for unwanted
P2P traffic. In DIMVA, 2013.
narangpratik@gmail.com
Visit our Research Group: www.netclique.in