Network Intrusion Detection Using Random Forests

Jiong Zhang, Mohammad Zulkernine
School of Computing, Queen's University, Kingston, Ontario, Canada

Transcript of Network Intrusion Detection Using Random Forests

Page 1: Network Intrusion Detection Using Random Forests

Network Intrusion Detection Using Random Forests

Jiong Zhang
Mohammad Zulkernine

School of Computing
Queen's University
Kingston, Ontario, Canada

Page 2: Network Intrusion Detection Using Random Forests

PST2005, Jiong Zhang and Mohammad Zulkernine

Outline

• Motivation
• Intrusion detection system
• Data mining meets intrusion detection
• Proposed architecture
• Challenges and solutions
• Experimental results
• Conclusion and future work

Page 3: Network Intrusion Detection Using Random Forests

Motivation

• An intrusion prevention system (firewall) cannot prevent all attacks.

[Figure: network diagram with the labels Internet, Intruder, Intruder, Victim, and Firewall.]

Page 4: Network Intrusion Detection Using Random Forests

Motivation (contd.)

Statistical data for intrusions
• Total losses of 2004 (reported): $141,496,560. Source: FBI survey for Year 2004
• 50% of security breaches are undetected. Source: FBI Statistics for Year 2000

Page 5: Network Intrusion Detection Using Random Forests

Intrusion Detection Techniques

Misuse Detection
• Extracts patterns of known intrusions
• Cannot detect novel intrusions
• Has low false positive rate

Anomaly Detection
• Builds profiles for normal activities
• Uses the deviations from the profiles to detect attacks
• Can detect unknown attacks
• Has high false positive rate
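The contrast between the two techniques can be sketched in a few lines. This is only an illustration, not the authors' detector: the signature set, the event fields, and the three-standard-deviation threshold are all assumptions made for the example.

```python
from statistics import mean, stdev

# Hypothetical signatures of known attacks: (service, flag) pairs.
KNOWN_SIGNATURES = {("telnet", "root_shell"), ("ftp", "anonymous_upload")}

def misuse_detect(event):
    """Flag an event only if it matches a known intrusion pattern."""
    return (event["service"], event["flag"]) in KNOWN_SIGNATURES

def build_profile(normal_counts):
    """Summarize normal activity as a mean and a standard deviation."""
    return mean(normal_counts), stdev(normal_counts)

def anomaly_detect(count, profile, k=3.0):
    """Flag an observation more than k standard deviations from the profile."""
    mu, sigma = profile
    return abs(count - mu) > k * sigma

# Profile built from (made-up) connection counts seen during normal operation.
profile = build_profile([10, 12, 11, 9, 13, 10, 11])

# A novel attack: no signature exists for it, but its traffic volume is unusual.
novel = {"service": "http", "flag": "unknown_exploit", "count": 250}
print("misuse detection flags it: ", misuse_detect(novel))
print("anomaly detection flags it:", anomaly_detect(novel["count"], profile))
```

The signature check misses the novel event while the profile check catches it. A low value of k would also flag harmless fluctuations, which is where the higher false positive rate of anomaly detection comes from.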

Page 6: Network Intrusion Detection Using Random Forests

Network Intrusion Detection System (NIDS)

• Monitors network traffic to detect intrusions
• Monitors more targets on a network
• Detects some attacks that host-based systems miss
• Does not affect network operations

Page 7: Network Intrusion Detection Using Random Forests

Current NIDS

Many current NIDSs (like Snort) are:
• Rule-based
• Unable to detect novel attacks
• Costly to maintain

Page 8: Network Intrusion Detection Using Random Forests

Rule Based vs. Data Mining

Rule based systems
• Intrusion Data -> Security Experts -> Rules

Data mining based systems
• Labeled Data -> Data Mining Engine -> Patterns

Page 9: Network Intrusion Detection Using Random Forests

Data Mining Meets Intrusion Detection

• Extract patterns of intrusions for misuse detection
• Build profiles of normal activities for anomaly detection
• Build classifiers to detect attacks
• Some IDSs have successfully applied data mining techniques in intrusion detection

Page 10: Network Intrusion Detection Using Random Forests

Proposed Architecture

[Figure: Architecture of the proposed NIDS.
On line components: Networks, Sensors, On-line Pre-Processors, Detector, Alarmer, Database (On line).
Off line components: Database (Off line), Off-line Pre-processor, Data Set, Pattern Builder.
Data-flow labels: Packets, Audited data, Feature vectors, Training data, Patterns, Alarms.]
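The two phases in the figure can be read as an off-line step that builds patterns from training data and an on-line step that applies them to live traffic. The sketch below follows that reading. Every function name, the feature choice, and the lookup-based "pattern builder" are hypothetical stand-ins, not the authors' components.

```python
def preprocess(packet):
    """Pre-processor: turn one audited packet record into a feature vector."""
    return (packet["service"], packet["count"])

def build_patterns(training_data):
    """Off-line phase (pattern builder): learn patterns from labeled vectors.

    Stand-in learner: remember every feature vector labeled as an attack.
    """
    return {features for features, label in training_data if label != "normal"}

def detect(features, patterns):
    """Detector: compare a feature vector against the learned patterns."""
    return features in patterns

def run_online(packets, patterns):
    """On-line phase: sensors -> pre-processor -> detector -> alarmer."""
    alarms = []
    for packet in packets:
        features = preprocess(packet)
        if detect(features, patterns):
            alarms.append(f"ALARM: {features}")
    return alarms

# Off line: build patterns once from (made-up) labeled training data.
training = [(("telnet", 40), "attack"), (("http", 3), "normal")]
patterns = build_patterns(training)

# On line: apply the patterns to incoming traffic.
alarms = run_online(
    [{"service": "http", "count": 3}, {"service": "telnet", "count": 40}],
    patterns,
)
print(alarms)
```

Keeping pattern building off line means the expensive learning step never sits in the path of live traffic. Only the cheap comparison runs per packet.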

Page 11: Network Intrusion Detection Using Random Forests

Random Forests

• Unsurpassable in accuracy among the current data mining algorithms
• Runs efficiently on large data sets with many features
• Gives estimates of which features are important
• No nominal data problem
• No over-fitting

Page 12: Network Intrusion Detection Using Random Forests

Imbalanced Intrusion

Problems
• Higher error rate for minority intrusions
• Some minority intrusions are more dangerous
• Need to improve the performance for the minority intrusions

Proposed Solution
• Down-sample the majority intrusions and over-sample the minority intrusions
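A minimal sketch of the sampling idea, assuming a single per-class target size. The slides do not give the actual sampling ratios, so the `target` parameter and the toy class sizes here are illustrative only.

```python
import random
from collections import Counter, defaultdict

def rebalance(records, target, seed=0):
    """Down-sample classes larger than `target` and over-sample smaller ones.

    records: list of (features, label) pairs.
    target:  desired number of records per class (an assumed tuning knob).
    """
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for record in records:
        by_label[record[1]].append(record)

    balanced = []
    for label, group in by_label.items():
        if len(group) > target:
            # Down-sample: draw without replacement.
            balanced.extend(rng.sample(group, target))
        else:
            # Over-sample: keep every record, then draw extras with replacement.
            balanced.extend(group)
            balanced.extend(rng.choices(group, k=target - len(group)))
    rng.shuffle(balanced)
    return balanced

# Toy data: one frequent attack class and one rare one.
data = [((i,), "dos") for i in range(1000)] + [((i,), "u2r") for i in range(5)]
balanced = rebalance(data, target=100)
print(Counter(label for _, label in balanced))
```

Down-sampling also shrinks the training set, which shortens the time needed to build patterns.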

Page 13: Network Intrusion Detection Using Random Forests

Feature Selection

• Essential for improving detection rate
• Reduces the computational cost
• Many NIDSs select features by intuition or domain knowledge

Page 14: Network Intrusion Detection Using Random Forests

Feature Selection over the KDD’99 Dataset

• Calculate variable importance using random forests.
• Select the 38 most important features in detection.

[Figure: bar chart of variable importance for each feature number. Axes: Feature, Importance (scale from -10 to 15).]
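Random forests usually estimate variable importance by permuting one feature's values in the out-of-bag samples and measuring how much the error grows. The sketch below shows only that permutation idea, applied to a fixed stand-in classifier rather than a trained forest. The toy data and the `predict` function are assumptions for the example.

```python
import random

def error_rate(predict, rows, labels):
    """Fraction of rows the classifier gets wrong."""
    wrong = sum(predict(r) != y for r, y in zip(rows, labels))
    return wrong / len(rows)

def permutation_importance(predict, rows, labels, n_features, seed=0):
    """Importance of feature j = error after shuffling column j minus baseline."""
    rng = random.Random(seed)
    baseline = error_rate(predict, rows, labels)
    scores = []
    for j in range(n_features):
        column = [r[j] for r in rows]
        rng.shuffle(column)
        # Rebuild each row with only feature j replaced by a shuffled value.
        permuted = [r[:j] + (v,) + r[j + 1:] for r, v in zip(rows, column)]
        scores.append(error_rate(predict, permuted, labels) - baseline)
    return scores

# Toy data: feature 0 determines the label, feature 1 is pure noise.
rng = random.Random(1)
rows = [(rng.random(), rng.random()) for _ in range(200)]
labels = [int(r[0] > 0.5) for r in rows]

def predict(row):
    """Stand-in for a trained classifier; it only looks at feature 0."""
    return int(row[0] > 0.5)

scores = permutation_importance(predict, rows, labels, n_features=2)
ranking = sorted(range(2), key=lambda j: scores[j], reverse=True)
print("importance per feature:", scores)
print("features ranked from most to least important:", ranking)
```

Features whose shuffling barely changes the error are candidates for removal. With a real forest the score can even be slightly negative, which matches the negative end of the importance axis in the figure.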

Page 15: Network Intrusion Detection Using Random Forests

Some Features

The two most important features
• Feature 3. service type, such as http, telnet, and ftp
• Feature 23. count, # of connections to the same host as the current one during the past two seconds

The three least important features
• Feature 7. land, 1 if connection is from/to the same host/port; 0 otherwise
• Feature 20. num_outbound_cmds, # of outbound commands in an ftp session
• Feature 21. is_hot_login, 1 if the login belongs to the “hot” list; 0 otherwise

Page 16: Network Intrusion Detection Using Random Forests

Parameter Optimization for Random Forests

• Optimize the parameter Mtry of random forests to improve detection rate.
• Choose 15 as the optimal value, which reaches the minimum of the out-of-bag (oob) error rate.

[Figure: Oob Error Rate (left axis, 0.00165 to 0.00215) and Time (right axis, 0 to 600) plotted against Mtry (5, 10, 15, 20, 25, 30, 35, 38).]
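The search over Mtry can be sketched as a loop that trains one forest per candidate value and keeps the value with the lowest oob error. To stay self-contained, the example uses a toy forest of one-split trees (decision stumps) on made-up data, so `fit_stump`, `oob_error`, the tree count, and the data set are assumptions rather than the authors' setup.

```python
import random
from collections import Counter

def fit_stump(rows, labels, features):
    """Pick the (feature, threshold) split with the fewest training errors."""
    best = None
    for j in features:
        for t in sorted({r[j] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[j] <= t]
            right = [y for r, y in zip(rows, labels) if r[j] > t]
            lo = Counter(left).most_common(1)[0][0]
            hi = Counter(right).most_common(1)[0][0] if right else lo
            errors = sum(y != lo for y in left) + sum(y != hi for y in right)
            if best is None or errors < best[0]:
                best = (errors, j, t, lo, hi)
    _, j, t, lo, hi = best
    return lambda r: lo if r[j] <= t else hi

def oob_error(rows, labels, mtry, n_trees=50, seed=0):
    """Train a forest of stumps and return its out-of-bag error rate."""
    rng = random.Random(seed)
    n, n_features = len(rows), len(rows[0])
    votes = [Counter() for _ in range(n)]
    for _ in range(n_trees):
        bag = [rng.randrange(n) for _ in range(n)]      # bootstrap sample
        features = rng.sample(range(n_features), mtry)  # random feature subset
        tree = fit_stump([rows[i] for i in bag], [labels[i] for i in bag], features)
        for i in set(range(n)) - set(bag):              # rows this tree never saw
            votes[i][tree(rows[i])] += 1
    # Majority vote per row, counting only trees for which the row was out of bag.
    scored = [(v.most_common(1)[0][0], y) for v, y in zip(votes, labels) if v]
    return sum(pred != y for pred, y in scored) / len(scored)

# Toy data: four features, only feature 0 carries the label.
rng = random.Random(1)
rows = [tuple(rng.random() for _ in range(4)) for _ in range(60)]
labels = [int(r[0] > 0.5) for r in rows]

errors = {mtry: oob_error(rows, labels, mtry) for mtry in (1, 2, 3, 4)}
best_mtry = min(errors, key=errors.get)
print("oob error per Mtry:", errors)
print("chosen Mtry:", best_mtry)
```

Because each tree is tested only on rows left out of its bootstrap sample, the oob error needs no separate validation set. That makes it a convenient criterion for this kind of parameter sweep.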

Page 17: Network Intrusion Detection Using Random Forests

Performance Comparison on the KDD’99 Dataset

• Our approach provides lower overall error rate and cost compared to the best KDD’99 result.
• Feature selection can improve the performance of intrusion detection.

[Figure: Overall Error Rate (axis from 6.95% to 7.35%) for three cases: Best KDD Result, Experiment without feature selection, Experiment with feature selection.]

[Figure: Cost of Misclassification (axis from 0.225 to 0.234) for the same three cases.]
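The slides do not define the two metrics. A common reading, assumed here, is that the overall error rate is the share of misclassified test records and the cost is the average misclassification cost per record under a cost matrix, which is how the KDD’99 contest scored entries. The two-class confusion matrix and cost values below are made up for the example.

```python
def overall_error_rate(confusion):
    """Share of records off the diagonal of the confusion matrix."""
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    return 1 - correct / total

def average_cost(confusion, cost):
    """Average cost per record: sum of count[i][j] * cost[i][j], divided by N."""
    total = sum(sum(row) for row in confusion)
    weighted = sum(
        confusion[i][j] * cost[i][j]
        for i in range(len(confusion))
        for j in range(len(confusion))
    )
    return weighted / total

# Rows are actual classes, columns are predicted classes: (normal, attack).
confusion = [[90, 10],
             [5, 95]]
# Hypothetical costs: a missed attack costs twice as much as a false alarm.
cost = [[0, 1],
        [2, 0]]

print(f"overall error rate: {overall_error_rate(confusion):.3f}")
print(f"average cost:       {average_cost(confusion, cost):.3f}")
```

The two numbers can rank classifiers differently, since the cost metric penalizes some mistakes more than others. This is presumably why the slide reports both.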

Page 18: Network Intrusion Detection Using Random Forests

Conclusion and Future Work

• The random forests algorithm can help improve detection performance and select features.
• Sampling techniques can reduce the time to build patterns and increase the detection rate of minority intrusions.
• In the future, we will focus on anomaly detection and a multiple classifier architecture.
