Benchmarking Anomaly-based Detection Systems
![Page 1: Benchmarking Anomaly-based Detection Systems](https://reader035.fdocuments.us/reader035/viewer/2022062814/56816836550346895dddf69c/html5/thumbnails/1.jpg)
Benchmarking Anomaly-based Detection Systems
Ashish Gupta
Network Security
May 2004
![Page 2: Benchmarking Anomaly-based Detection Systems](https://reader035.fdocuments.us/reader035/viewer/2022062814/56816836550346895dddf69c/html5/thumbnails/2.jpg)
Overview
• The motivation for this paper
  – Waldo example
• The approach
• Structure in data
• Generating the data and anomalies
• Injecting anomalies
• Results
  – Training and testing: the method
  – Scoring
  – Presentation
  – The ROC curves: somewhat obvious
![Page 3: Benchmarking Anomaly-based Detection Systems](https://reader035.fdocuments.us/reader035/viewer/2022062814/56816836550346895dddf69c/html5/thumbnails/3.jpg)
Motivation
Does anomaly detection depend on the regularity/randomness of the data?
![Page 4: Benchmarking Anomaly-based Detection Systems](https://reader035.fdocuments.us/reader035/viewer/2022062814/56816836550346895dddf69c/html5/thumbnails/4.jpg)
Where’s Waldo !
![Page 5: Benchmarking Anomaly-based Detection Systems](https://reader035.fdocuments.us/reader035/viewer/2022062814/56816836550346895dddf69c/html5/thumbnails/5.jpg)
Where’s Waldo !
![Page 6: Benchmarking Anomaly-based Detection Systems](https://reader035.fdocuments.us/reader035/viewer/2022062814/56816836550346895dddf69c/html5/thumbnails/6.jpg)
Where’s Waldo !
![Page 7: Benchmarking Anomaly-based Detection Systems](https://reader035.fdocuments.us/reader035/viewer/2022062814/56816836550346895dddf69c/html5/thumbnails/7.jpg)
The aim
• Hypothesis:
  – Differences in data regularity affect anomaly detection
  – Different environments have different regularity
• Regularity
  – Highly redundant or random?
  – Example of the environment’s effect
010101010101010101010101
Or
0100011000101000100100101
![Page 8: Benchmarking Anomaly-based Detection Systems](https://reader035.fdocuments.us/reader035/viewer/2022062814/56816836550346895dddf69c/html5/thumbnails/8.jpg)
Consequences
One IDS : Different False Alarm Rates
Need custom system/training for each environment ?
Temporal effects: regularity may vary over time?
![Page 9: Benchmarking Anomaly-based Detection Systems](https://reader035.fdocuments.us/reader035/viewer/2022062814/56816836550346895dddf69c/html5/thumbnails/9.jpg)
Structure in data
Measuring randomness
![Page 10: Benchmarking Anomaly-based Detection Systems](https://reader035.fdocuments.us/reader035/viewer/2022062814/56816836550346895dddf69c/html5/thumbnails/10.jpg)
010101010101010101010101
Or
0100011000101000100100101
Measuring Randomness
Relative Entropy + Sequential Dependence
= Conditional Relative Entropy
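To make the idea of measuring randomness concrete, here is a minimal sketch that scores the two example sequences. It assumes a simple first-order conditional entropy (how uncertain the next symbol is, given the previous one), normalized by log2 of the alphabet size. The function names `conditional_entropy` and `regularity_index` are hypothetical, and the normalization is an assumption for illustration, not necessarily the exact measure used in the paper.

```python
from collections import Counter
from math import log2

def conditional_entropy(seq, order=1):
    """Empirical H(next symbol | previous `order` symbols), in bits."""
    contexts = Counter()
    joint = Counter()
    for i in range(len(seq) - order):
        ctx = seq[i:i + order]
        nxt = seq[i + order]
        contexts[ctx] += 1
        joint[(ctx, nxt)] += 1
    total = sum(joint.values())
    h = 0.0
    for (ctx, _nxt), count in joint.items():
        p_joint = count / total          # P(context, next)
        p_cond = count / contexts[ctx]   # P(next | context)
        h -= p_joint * log2(p_cond)
    return h

def regularity_index(seq, order=1):
    """Assumed scale: 0 = perfectly regular, 1 = as random as the alphabet allows."""
    alphabet_size = len(set(seq))
    if alphabet_size < 2:
        return 0.0
    return conditional_entropy(seq, order) / log2(alphabet_size)

regular = "010101010101010101010101"
irregular = "0100011000101000100100101"
print(regularity_index(regular), regularity_index(irregular))
```

Under these assumptions the strictly alternating sequence scores 0, because each symbol is fully determined by its predecessor, while the second sequence scores somewhere above 0.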
![Page 11: Benchmarking Anomaly-based Detection Systems](https://reader035.fdocuments.us/reader035/viewer/2022062814/56816836550346895dddf69c/html5/thumbnails/11.jpg)
The benchmark datasets
• Three types:
  – Training data (the background data)
  – Anomalies
  – Testing data (background + anomalies)
• Generating the sequences
  – 5 sets, each set 11 files (for increasing regularity)
  – Each set has a different alphabet size
  – Alphabet size decides complexity
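The slides do not say how the sequences were generated, so the following is only a sketch of one way to get 11 files of varying regularity for a given alphabet. It assumes a cyclic walk through the alphabet that is perturbed by a random jump with probability `noise`. The function `generate_background` and the `noise` parameter are hypothetical, chosen for illustration.

```python
import random

def generate_background(alphabet, length, noise, seed=None):
    """Walk the alphabet in order; with probability `noise`, jump to a random symbol."""
    rng = random.Random(seed)
    symbols = []
    idx = 0
    for _ in range(length):
        symbols.append(alphabet[idx])
        if rng.random() < noise:
            idx = rng.randrange(len(alphabet))   # random jump
        else:
            idx = (idx + 1) % len(alphabet)      # regular step
    return "".join(symbols)

# 11 regularity levels for one alphabet size: noise 0.0 (fully regular) to 1.0 (random)
noise_levels = [i / 10 for i in range(11)]
files = [generate_background("ABCDEF", 1000, noise, seed=i)
         for i, noise in enumerate(noise_levels)]
```

Repeating this for five different alphabets would give the 5 x 11 layout described above.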
![Page 12: Benchmarking Anomaly-based Detection Systems](https://reader035.fdocuments.us/reader035/viewer/2022062814/56816836550346895dddf69c/html5/thumbnails/12.jpg)
Anomaly Generation
• What’s a surprise?
  – Different from the expected probability
• Types:
  – Juxta-positional: different arrangements of data
    • 001001001001001001111
  – Temporal
    • Unexpected periodicities
  – Other types?
![Page 13: Benchmarking Anomaly-based Detection Systems](https://reader035.fdocuments.us/reader035/viewer/2022062814/56816836550346895dddf69c/html5/thumbnails/13.jpg)
Types in this paper
• Foreign symbol
  – AAABABBBABABCBBABABBA
• Foreign n-gram
  – AAABABAABAABAAABBBBA
• Rare n-gram
  – AABBBABBBABBBABBBABBBABBAA
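The three anomaly types can be told apart by comparing an n-gram against the training data. The sketch below is one way to do that. The function `classify_ngram` and the `rare_threshold` cutoff of 5% are assumptions for illustration; the slides do not state what frequency counts as "rare".

```python
from collections import Counter

def ngram_counts(seq, n):
    """Count every length-n window in the sequence."""
    return Counter(seq[i:i + n] for i in range(len(seq) - n + 1))

def classify_ngram(gram, training, rare_threshold=0.05):
    """Label an n-gram relative to the training (background) data."""
    known_symbols = set(training)
    if any(sym not in known_symbols for sym in gram):
        return "foreign symbol"      # contains a symbol never seen in training
    counts = ngram_counts(training, len(gram))
    if counts[gram] == 0:
        return "foreign n-gram"      # known symbols, unseen arrangement
    if counts[gram] / sum(counts.values()) < rare_threshold:
        return "rare n-gram"         # seen, but infrequent (assumed cutoff)
    return "common"

training = "AB" * 20 + "B"
for gram in ("AC", "AA", "BB", "AB"):
    print(gram, classify_ngram(gram, training))
```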
![Page 14: Benchmarking Anomaly-based Detection Systems](https://reader035.fdocuments.us/reader035/viewer/2022062814/56816836550346895dddf69c/html5/thumbnails/14.jpg)
• Injecting anomalies
  – Make sure they make up no more than 0.24% of the data
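A rough sketch of injection under the 0.24% cap follows. It assumes anomalies overwrite non-overlapping slots of the background at random positions, and that the cap applies to the share of anomalous symbols in the test data. The function `inject_anomalies` and both of those choices are assumptions, not the paper's documented procedure.

```python
import random

def inject_anomalies(background, anomaly, max_fraction=0.0024, seed=None):
    """Overwrite random non-overlapping slots with `anomaly`, staying under the cap."""
    rng = random.Random(seed)
    budget = int(len(background) * max_fraction)   # max anomalous symbols allowed
    count = budget // len(anomaly)                 # how many copies fit in the budget
    slots = len(background) // len(anomaly)        # aligned slots cannot overlap
    chosen = sorted(rng.sample(range(slots), count))
    data = list(background)
    positions = []
    for slot in chosen:
        start = slot * len(anomaly)
        data[start:start + len(anomaly)] = anomaly
        positions.append(start)
    return "".join(data), positions

test_data, positions = inject_anomalies("AB" * 5000, "CCC", seed=0)
print(len(positions), "anomalies injected at", positions)
```

Returning the injection positions alongside the test data keeps the ground truth available for scoring later.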
![Page 15: Benchmarking Anomaly-based Detection Systems](https://reader035.fdocuments.us/reader035/viewer/2022062814/56816836550346895dddf69c/html5/thumbnails/15.jpg)
The experiments
The Hypothesis is true
![Page 16: Benchmarking Anomaly-based Detection Systems](https://reader035.fdocuments.us/reader035/viewer/2022062814/56816836550346895dddf69c/html5/thumbnails/16.jpg)
• The hypothesis:
  – Nature of “normal” background noise affects signal detection
• The anomaly detector
  – To detect anomalous subsequences
  – Learning phase: n-gram probability table
  – Unexpected event: anomaly!
  – Anomaly threshold decides level of surprise
![Page 17: Benchmarking Anomaly-based Detection Systems](https://reader035.fdocuments.us/reader035/viewer/2022062814/56816836550346895dddf69c/html5/thumbnails/17.jpg)
• Example of anomaly detection
AAA 0.12
AAB 0.13
ABA 0.20
BAA 0.17
BBB 0.15
BBA 0.12
AAC ANOMALY !
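The table lookup above can be sketched in a few lines. This version assumes the detector stores plain relative frequencies of 3-grams and scores each test window as 1 minus its learned probability, so an unseen n-gram such as AAC gets the maximum surprise of 1. The names `train_ngram_table` and `detect` and the default threshold are hypothetical; the detector in the paper may compute its probabilities differently.

```python
from collections import Counter

def train_ngram_table(training, n=3):
    """Learning phase: relative frequency of each n-gram in the training data."""
    counts = Counter(training[i:i + n] for i in range(len(training) - n + 1))
    total = sum(counts.values())
    return {gram: c / total for gram, c in counts.items()}

def detect(table, test, n=3, threshold=0.99):
    """Flag every window whose surprise (1 - probability) reaches the threshold."""
    alarms = []
    for i in range(len(test) - n + 1):
        gram = test[i:i + n]
        surprise = 1.0 - table.get(gram, 0.0)   # unseen n-gram: surprise 1.0
        if surprise >= threshold:
            alarms.append((i, gram))
    return alarms

table = train_ngram_table("AAB" * 50)
print(detect(table, "AABAABAACAAB"))
```

Note that a single foreign symbol raises an alarm for every window that contains it, here the three windows AAC, ACA, and CAA.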
![Page 18: Benchmarking Anomaly-based Detection Systems](https://reader035.fdocuments.us/reader035/viewer/2022062814/56816836550346895dddf69c/html5/thumbnails/18.jpg)
Scoring
• Event outcomes
  – Hits
  – Misses
  – False alarms
• Threshold
  – Decides level of surprise
  – 0: completely unsurprising; 1: astonishing
  – Need to calibrate
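The three event outcomes can be counted by comparing the positions the detector flagged with the positions where anomalies were actually injected. The sketch below assumes both are given as collections of event indices. The function `score_outcomes` and its rate definitions are illustrative assumptions.

```python
def score_outcomes(alarms, anomalies, n_events):
    """Compare flagged positions with true anomaly positions."""
    alarms, anomalies = set(alarms), set(anomalies)
    hits = len(alarms & anomalies)           # anomalies that were flagged
    misses = len(anomalies - alarms)         # anomalies that went unnoticed
    false_alarms = len(alarms - anomalies)   # normal events that were flagged
    normal = n_events - len(anomalies)
    return {
        "hits": hits,
        "misses": misses,
        "false_alarms": false_alarms,
        "hit_rate": hits / len(anomalies) if anomalies else 0.0,
        "false_alarm_rate": false_alarms / normal if normal else 0.0,
    }

print(score_outcomes(alarms=[2, 5, 9], anomalies=[5, 9, 12], n_events=100))
```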
![Page 19: Benchmarking Anomaly-based Detection Systems](https://reader035.fdocuments.us/reader035/viewer/2022062814/56816836550346895dddf69c/html5/thumbnails/19.jpg)
Presentation of results
• Presents two aspects:
  – % correct detections
  – % false detections
• Detector operates through a range of sensitivities
  – Higher sensitivity?
  – Need the right sensitivity
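Sweeping the detector through a range of sensitivities yields the points of an ROC curve: one (false detection rate, correct detection rate) pair per threshold. The sketch below assumes each event already has a surprise score in [0, 1] and a true/false anomaly label, and that the data contains at least one anomaly and one normal event. The function `roc_points` and the sample values are made up for illustration.

```python
def roc_points(scores, labels, thresholds):
    """One (false alarm rate, hit rate) pair per threshold.

    Assumes `labels` contains at least one True and one False.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    points = []
    for t in thresholds:
        flagged = [s >= t for s in scores]
        hits = sum(f and l for f, l in zip(flagged, labels))
        false_alarms = sum(f and not l for f, l in zip(flagged, labels))
        points.append((false_alarms / negatives, hits / positives))
    return points

scores = [0.1, 0.4, 0.6, 0.9, 1.0]            # illustrative surprise scores
labels = [False, False, False, True, True]    # True marks an injected anomaly
print(roc_points(scores, labels, thresholds=[0.0, 0.5, 0.95]))
```

A lower threshold means higher sensitivity: more hits, but also more false alarms.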
![Page 20: Benchmarking Anomaly-based Detection Systems](https://reader035.fdocuments.us/reader035/viewer/2022062814/56816836550346895dddf69c/html5/thumbnails/20.jpg)
![Page 21: Benchmarking Anomaly-based Detection Systems](https://reader035.fdocuments.us/reader035/viewer/2022062814/56816836550346895dddf69c/html5/thumbnails/21.jpg)
Interpretation
• Nothing overlaps, so regularity affects detection!
![Page 22: Benchmarking Anomaly-based Detection Systems](https://reader035.fdocuments.us/reader035/viewer/2022062814/56816836550346895dddf69c/html5/thumbnails/22.jpg)
• What does this mean?
• Detection metrics are data dependent
• Cannot say:
  – My XYZ product will flag 75% of anomalies with a 10% false hit rate!
  – Sir, are you sure?
![Page 23: Benchmarking Anomaly-based Detection Systems](https://reader035.fdocuments.us/reader035/viewer/2022062814/56816836550346895dddf69c/html5/thumbnails/23.jpg)
Real world data
• Regularity index for system calls for different users
![Page 24: Benchmarking Anomaly-based Detection Systems](https://reader035.fdocuments.us/reader035/viewer/2022062814/56816836550346895dddf69c/html5/thumbnails/24.jpg)
• Is this surprising?
• What about network traffic?
![Page 25: Benchmarking Anomaly-based Detection Systems](https://reader035.fdocuments.us/reader035/viewer/2022062814/56816836550346895dddf69c/html5/thumbnails/25.jpg)
Conclusions
Data structure affects anomaly detection effectiveness
Evaluation is data dependent
![Page 26: Benchmarking Anomaly-based Detection Systems](https://reader035.fdocuments.us/reader035/viewer/2022062814/56816836550346895dddf69c/html5/thumbnails/26.jpg)
Conclusions
Change in regularity: use a different system
Or
Change the parameters
![Page 27: Benchmarking Anomaly-based Detection Systems](https://reader035.fdocuments.us/reader035/viewer/2022062814/56816836550346895dddf69c/html5/thumbnails/27.jpg)
Quirks ?
• Assumes rather naïve detection systems
  – “Simple retraining will not suffice”
• An intelligent detection system can take this into account.
• What is really an anomaly?
  – If data is highly irregular, won’t randomness produce some anomalies by itself?
• Anomaly is a relative term
  – Here anomalies are generated independently