Implementation of Machine Learning and Chaos Combination for Improving Attack Detection Accuracy on Intrusion Detection System (IDS)
Bisyron Wahyudi, Kalamullah Ramli
Department of Electrical Engineering
Universitas Indonesia
Network Security
The most important element in network security: IDS
Intrusion detection principles:
Misuse detection (signature-based)
Anomaly detection (statistics)
Classification with machine learning (research)
Background: IDS
Intrusion detection produces too many false alarms.
New types of attack arise more and more often.
An effective and adaptable detection method is required.
Classification with machine learning gives the best results, but they depend on the kernel function and its parameters, and on the network data attributes/features.
There is no systematic theory for choosing the appropriate kernel and parameters.
Background: Problem
1. Capturing packets transferred on the network.
2. Extracting an extensive set of attributes/features from the network packet data that can describe a network connection or a host session.
3. Learning a model that can accurately describe the behavior of abnormal and normal activities by applying data mining techniques.
4. Detecting intrusions by using the learned models.
Data Mining Approach for IDS
Classification (Supervised)    Clustering (Unsupervised)
K Nearest Neighbor (K-NN)      K-Means
Naïve Bayes                    Hierarchical Clustering
Artificial Neural Network      DBSCAN
Support Vector Machine         Fuzzy C-Means
Fuzzy K-NN                     Self Organizing Map
Data Mining Approach
Machine Learning
Input Training Data
(x,y)
Model Development
Learning Algorithm
Model Implementation
Input Test Data (x,?)
Output Test Data (x,y)
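The flow above (labelled training pairs (x,y) go into a learning algorithm, the resulting model labels test inputs (x,?)) can be sketched as follows. This is only an illustration: it uses a 1-nearest-neighbour rule, the simplest case of the K-NN method listed among the classification techniques, not the SVM studied in the slides, and the function names and the "normal"/"attack" labels are hypothetical.

```python
def train(training_data):
    """'Learn' a 1-nearest-neighbour model by storing the labelled pairs (x, y)."""
    return list(training_data)


def squared_distance(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def predict(model, x):
    """Return the label y of the stored training point closest to x."""
    _, nearest_label = min(model, key=lambda pair: squared_distance(pair[0], x))
    return nearest_label


# Input training data (x, y): toy two-feature vectors with hypothetical labels.
model = train([((0.0, 0.0), "normal"), ((1.0, 1.0), "attack")])

# Input test data (x, ?) becomes output test data (x, y).
test_x = (0.9, 0.8)
test_y = predict(model, test_x)
```

Swapping in another learning algorithm changes only `train` and `predict`; the surrounding flow stays the same.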
SVM Classification
Kernel Name                     Definition of Function
Linear                          $K(x,y) = x \cdot y$
Polynomial                      $K(x,y) = (x \cdot y + c)^d$
Gaussian RBF                    $K(x,y) = \exp(-\|x-y\|^2 / (2\sigma^2))$
Sigmoid (Hyperbolic Tangent)    $K(x,y) = \tanh(\sigma (x \cdot y) + c)$
Inverse Multiquadric            $K(x,y) = 1 / \sqrt{\|x-y\|^2 + c}$
Kernel Function
x and y: a pair of data points from the training dataset; σ, c, d > 0: constant parameters
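As a minimal sketch, the five kernel functions listed on this slide can be written directly in Python. The function names and the default parameter values are assumptions chosen for illustration; the inverse multiquadric follows the slide's form, with c added under the square root.

```python
import math


def dot(x, y):
    """Inner product x . y of two equal-length vectors."""
    return sum(a * b for a, b in zip(x, y))


def squared_distance(x, y):
    """Squared Euclidean distance ||x - y||^2."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def linear(x, y):
    return dot(x, y)


def polynomial(x, y, c=1.0, d=2):
    return (dot(x, y) + c) ** d


def gaussian_rbf(x, y, sigma=1.0):
    return math.exp(-squared_distance(x, y) / (2.0 * sigma ** 2))


def sigmoid(x, y, sigma=1.0, c=1.0):
    return math.tanh(sigma * dot(x, y) + c)


def inverse_multiquadric(x, y, c=1.0):
    return 1.0 / math.sqrt(squared_distance(x, y) + c)
```

Each function takes a pair of feature vectors and returns a single similarity value, which is what an SVM implementation evaluates for every pair of training points.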
How to choose the optimal/significant features of the input dataset.
How to set the best kernel function and parameters: σ, ε and C.
SVM Performance
Three important dynamic properties: intrinsic stochasticity, ergodicity and regularity
Advantage of chaos: the search can escape from local minima
Optimized parameters are obtained more efficiently thanks to its powerful global search ability
Chaos
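A minimal sketch of how a chaotic sequence can drive a global parameter search is given below. The slides do not say which chaotic map is used, so the logistic map, the function names, the step count and the test objective are all assumptions for illustration.

```python
import math


def logistic_map(z):
    """One step of the logistic map z -> 4 z (1 - z), which is chaotic on (0, 1)."""
    return 4.0 * z * (1.0 - z)


def chaos_search(objective, low, high, steps=2000, z0=0.3141):
    """Minimize objective over [low, high] by sampling along a chaotic sequence."""
    z = z0
    best_x, best_val = None, float("inf")
    for _ in range(steps):
        z = logistic_map(z)               # next chaotic value in [0, 1]
        x = low + (high - low) * z        # map it onto the search interval
        val = objective(x)
        if val < best_val:                # keep the best candidate seen so far
            best_x, best_val = x, val
    return best_x, best_val


def bumpy(x):
    """Hypothetical objective with several local minima."""
    return (x - 2.0) ** 2 + math.sin(8.0 * x)


best_x, best_val = chaos_search(bumpy, 0.0, 4.0)
```

Because the sequence wanders over the whole interval instead of following a gradient, a candidate stuck near one local minimum is simply replaced when a better point is visited elsewhere.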
System Design
Methodology
Data Collection
Data Preprocessing
Model Development
Data Classification
Training Dataset
Test Dataset
KDDCUP ’99 DARPA Dataset
Predicted Intrusion Data
Data Preprocessing
Dataset Transformation
Dataset Normalization
Range Discretization
Format Conversion
Dataset Division: Training & Test
KDDCUP ’99 DARPA Dataset
Test Dataset
Training Dataset
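The preprocessing stages named above (normalization, range discretization, division into training and test sets) can be sketched as follows. The slides do not state the exact methods, so min-max scaling, equal-width bins, an ordered split and all function names are assumptions for illustration.

```python
def min_max_normalize(column):
    """Scale a numeric column linearly onto [0, 1]."""
    lowest, highest = min(column), max(column)
    if highest == lowest:                 # constant column: avoid division by zero
        return [0.0 for _ in column]
    return [(value - lowest) / (highest - lowest) for value in column]


def discretize(value, n_bins):
    """Map a value in [0, 1] to an equal-width bin index 0 .. n_bins - 1."""
    return min(int(value * n_bins), n_bins - 1)


def split_dataset(rows, train_fraction):
    """Divide rows, in order, into a training part and a test part."""
    cut = int(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]


# Toy byte counts standing in for one numeric feature column.
normalized = min_max_normalize([0, 5, 10])
bins = [discretize(value, 4) for value in normalized]
training, test = split_dataset(list(range(8)), 0.75)
```

In practice the scaling range would be computed on the training part only and then reused for the test part, so that no information from the test data leaks into the model.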
Model Development
Input Training Data (x,y)
Parameter Selection with
Chaos Optimization
Learning Algorithm
(SVM)
Model Implementation
Input Test Data (x,?)
Output Test Data (x,y)
Kernel Function Selection
Features 1-9: intrinsic features extracted from the packet header
Features 10-22: content attributes obtained from expert knowledge of the packet
Features 23-31: content attributes from connections in the previous 2 seconds
Features 32-41: machine traffic attributes obtained from the previous 100 connections
Payload feature: payload based on time (week)
Features in KDDCUP ’99
Intrinsic Attributes
These attributes are extracted from the header area of the network packets
Content Attributes
These attributes are extracted from the content area of the network packets based on expert knowledge
Time Traffic Attributes
To calculate these attributes we considered the connections that occurred in the past 2 seconds
Machine Traffic Attributes
To calculate these attributes we took into account the previous 100 connections
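The four attribute groups can be captured in a small lookup from feature index to group. The index ranges come from the feature listing in these slides; pairing them with the group names in listing order, like the names `FEATURE_GROUPS` and `feature_group`, is an assumption made for illustration.

```python
# (first index, last index, group name), using 1-based feature indices.
FEATURE_GROUPS = [
    (1, 9, "intrinsic"),
    (10, 22, "content"),
    (23, 31, "time traffic"),
    (32, 41, "machine traffic"),
]


def feature_group(index):
    """Return the attribute group of a feature index between 1 and 41."""
    for first, last, name in FEATURE_GROUPS:
        if first <= index <= last:
            return name
    raise ValueError(f"feature index out of range: {index}")


# Total number of features covered by the four groups.
total_features = sum(last - first + 1 for first, last, _ in FEATURE_GROUPS)
```

A lookup like this makes it easy to check which groups a selected feature subset draws from.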
Network Traffic Classification
The eight features used in previous work by Mukkamala are: src_bytes, dst_bytes, count, srv_count, dst_host_count, dst_host_srv_count, dst_host_same_src_port_rate, dst_host_srv_diff_host_rate.
Selected Features
The 24 features used in previous work by Natesan are: duration, protocol_type, service, flag, src_bytes, dst_bytes, hot, num_failed_logins, logged_in, num_compromised, root_shell, num_root, num_file_creations, num_shells, num_access_files, is_host_login, is_guest_login, count, serror_rate, rerror_rate, diff_srv_rate, dst_host_count, dst_host_diff_srv_rate, dst_host_srv_serror_rate.
Selected Features
Proposed Features
Data Preprocessing
Simulation Experiment
Simulation Process Design
Using the payload can improve the accuracy of the IDS in detecting R2L attacks.
Using SVM with the RBF kernel, detection accuracy reaches up to 98.2%.
Based on the experiment, the average detection rate across feature sets is best with 28 features plus payload:
Experiment Result
Feature Set    Without Payload    With Payload
8 features     95.78%             95.99%
24 features    95.73%             95.91%
28 features    95.91%             96.08%
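The effect of the payload can be read off the table by subtracting the two columns. The short sketch below does that arithmetic on the reported figures; the names `RESULTS` and `payload_gain` are hypothetical, and the values are copied from the table rather than recomputed.

```python
# Feature count -> (accuracy without payload, accuracy with payload), in percent.
RESULTS = {
    8: (95.78, 95.99),
    24: (95.73, 95.91),
    28: (95.91, 96.08),
}


def payload_gain(n_features):
    """Accuracy gain from adding the payload, in percentage points."""
    without_payload, with_payload = RESULTS[n_features]
    return round(with_payload - without_payload, 2)


# Gain per feature set, and the feature set with the highest accuracy using payload.
gains = {n: payload_gain(n) for n in RESULTS}
best_set = max(RESULTS, key=lambda n: RESULTS[n][1])
```

The gains are all around 0.2 percentage points, and the 28-feature set with payload has the highest average accuracy in the table.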