NetViewer: A Network Traffic Visualization and Analysis Tool
USENIX LISA’05
Seong Soo Kim
A. L. Narasimha Reddy
Electrical and Computer Engineering
Texas A&M University
Contents
• Introduction and Motivation
• Our Approach
• NetViewer’s Architecture
• NetViewer’s Functionality
• Evaluation of NetViewer
• Conclusion
Attack / Anomaly
- Single attacker (DoS)
- Multiple attackers (DDoS)
- Multiple victims (worms, viruses)
Aggregate packet header data as signals
Image-based anomaly/attack detectors
Motivation (1)
• Previous studies looked at the behavior of individual flows
  - These become ineffective with DDoS => aggregate analysis
• Link speeds are increasing
  - Currently at Gb/s, soon to be at 10~100 Gb/s => need simple, effective mechanisms
• Packet inspection can’t be expensive
• Can we make the mechanisms simple enough to implement at line speeds?
Motivation (2)
• Signature (rule)-based approaches are tailored to known attacks
  - Become ineffective when traffic patterns or attacks change
  - New threats are constantly emerging
• Quick identification of network anomalies is necessary to contain threats
• Can we design general mechanisms for attack detection that work in real-time?
Our Approach (1)
• Look at aggregate information of traffic
  - Collect data over a large duration (order of seconds)
  - Can be higher if necessary
• Use sampling to reduce the cost of processing
• Process aggregate data to detect anomalies
  - Individual flows may look normal => look at the aggregate picture
Our Approach (2) - Environment
NetViewer’s Architecture
• Packet Parser : Collects and filters raw packets and traffic data from packet header traces or NetFlow records.
• Signal Computing Engine : Analyzes the statistical properties of aggregate traffic distributions.
• Detection Engine : Sets thresholds through statistical measures of the traffic signal.
• Visualization Engine : Applies image processing and displays traffic signals and images.
• Alerting Engine : Detects and identifies attacks and anomalies in real-time.
[Figure: The block diagram of NetViewer. Network Traffic => Packet Parser => Statistical Analysis & Anomaly Detection => Visualization & Alerting => Detection Report]
Packet Parser (1)
• Packet headers carry a rich set of information
  - Data : Packet counts, byte counts, the number of flows
  - Domain : Source/destination address, source/destination port numbers, protocol numbers
• Processing traffic headers poses challenges
  - Discrete spaces
  - Large domains
    - 2^32 IPv4 addresses
    - 2^16 port numbers
• Need mechanisms to reduce the domain size
• Need mechanisms to generate useful signals
Packet Parser (2) – Data structure for reducing domain size
• 2-dimensional array count[i][j]
  - Records the packet count for the value j in the i-th byte of the IP address
• Normalized packet counts (n is the sampling instant)
  $p_{ijn} = \frac{count[i][j][n]}{\sum_{j=0}^{255} count[i][j][n]}, \quad i = 0, 1, 2, 3, \quad j = 0, \ldots, 255$
• Effects
  - Constant, small memory regardless of the number of packets: 2^32 (4G) => 4*256 (1K)
  - Running time O(n) to O(lg n)
  - Somewhat reversible hash function
Packet Parser (3) – Data structure for reducing domain size
• Simple example
  - IP of Flow1 = 165.91.212.255, Packet1 = 3
  - IP of Flow2 = 64.58.179.230, Packet2 = 2
  - IP of Flow3 = 216.239.51.100, Packet3 = 1
  - IP of Flow4 = 211.40.179.102, Packet4 = 10
  - IP of Flow5 = 203.255.98.2, Packet5 = 2
[Figure: count array over byte values 0..255 after recording Flow1 only; each of the four byte rows holds a single count of 3]
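Packet Parser (3) – Data structure for reducing domain size
• Simple example
  - IP of Flow1 = 165.91.212.255, Packet1 = 3
  - IP of Flow2 = 64.58.179.230, Packet2 = 2
  - IP of Flow3 = 216.239.51.100, Packet3 = 1
  - IP of Flow4 = 211.40.179.102, Packet4 = 10
  - IP of Flow5 = 203.255.98.2, Packet5 = 2
[Figure: count array over byte values 0..255 after recording all five flows; nonzero counts per byte row, in increasing byte value]
  - 1st byte : 64 => 2, 165 => 3, 203 => 2, 211 => 10, 216 => 1
  - 2nd byte : 40 => 10, 58 => 2, 91 => 3, 239 => 1, 255 => 2
  - 3rd byte : 51 => 1, 98 => 2, 179 => 12, 212 => 3
  - 4th byte : 2 => 2, 100 => 1, 102 => 10, 230 => 2, 255 => 3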
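The five-flow example can be replayed in a few lines of Python. This is only a minimal sketch of the count[i][j] idea, not NetViewer's actual code; the helper names `new_counts`, `record` and `normalize` are made up for illustration.

```python
from typing import List

def new_counts() -> List[List[int]]:
    # count[i][j]: packets whose i-th address byte equals j (4 x 256 counters).
    return [[0] * 256 for _ in range(4)]

def record(count: List[List[int]], ip: str, packets: int) -> None:
    # Each packet touches exactly four counters, one per address byte.
    for i, byte in enumerate(int(part) for part in ip.split(".")):
        count[i][byte] += packets

def normalize(count: List[List[int]]) -> List[List[float]]:
    # p[i][j] = count[i][j] / sum_j count[i][j]; an all-zero row stays zero.
    result = []
    for row in count:
        total = sum(row)
        result.append([c / total if total else 0.0 for c in row])
    return result

flows = [("165.91.212.255", 3), ("64.58.179.230", 2), ("216.239.51.100", 1),
         ("211.40.179.102", 10), ("203.255.98.2", 2)]
count = new_counts()
for ip, packets in flows:
    record(count, ip, packets)

# Nonzero counters of each row, in increasing byte value.
nonzero = [[c for c in row if c] for row in count]
p = normalize(count)
```

Flow2 and Flow4 share the third byte 179, so their packets add up in one counter (2 + 10 = 12). That collision is why the mapping is only "somewhat" reversible.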
Signal Computing Engine
• Correlation
  - Measures the strength of the linear relationship between adjacent sampling instants
  $C_{ijn} = \frac{count[i][j][n]}{\sum_{j=0}^{255} count[i][j][n]} \times \frac{count[i][j][n-1]}{\sum_{j=0}^{255} count[i][j][n-1]}, \quad i = 0, 1, 2, 3, \quad j = 0, \ldots, 255$
• Delta
  – The difference of traffic intensity
  – It is remarkable at the instants when attacks begin and end
  $\Delta p_{ijn} = \frac{count[i][j][n]}{\sum_{j=0}^{255} count[i][j][n]} - \frac{count[i][j][n-1]}{\sum_{j=0}^{255} count[i][j][n-1]}, \quad i = 0, 1, 2, 3, \quad j = 0, \ldots, 255$
• Scene change analysis
  – Variance of pixel intensities in the image
  $S = \sigma^2 = \frac{1}{1024 - 1} \sum_{i=0}^{3} \sum_{j=0}^{255} (p_{ijn} - \bar{p}_{ijn})^2$, where $p_{ijn}$ are pixel intensities and $\bar{p}_{ijn} = \frac{1}{1024} \sum_{i=0}^{3} \sum_{j=0}^{255} p_{ijn}$
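The three signals can be sketched directly from two consecutive count arrays. This is an illustrative reading of the formulas on this slide, not the tool's implementation; the function names are hypothetical, and the 1/(N-1) variance denominator is an assumption.

```python
from typing import List

Counts = List[List[int]]  # 4 rows (address bytes) x 256 columns (byte values)

def normalize(row: List[int]) -> List[float]:
    # Normalized counts p for one row; an all-zero row stays zero.
    total = sum(row)
    return [c / total if total else 0.0 for c in row]

def correlation(prev: Counts, curr: Counts) -> List[List[float]]:
    # C[i][j]: product of the normalized counts at samples n and n-1.
    return [[a * b for a, b in zip(normalize(c), normalize(p))]
            for p, c in zip(prev, curr)]

def delta(prev: Counts, curr: Counts) -> List[List[float]]:
    # Delta p[i][j]: normalized count at sample n minus the one at n-1.
    return [[a - b for a, b in zip(normalize(c), normalize(p))]
            for p, c in zip(prev, curr)]

def variance_signal(counts: Counts) -> float:
    # S: variance of all pixel intensities (assumed N-1 denominator).
    pixels = [p for row in counts for p in normalize(row)]
    mean = sum(pixels) / len(pixels)
    return sum((p - mean) ** 2 for p in pixels) / (len(pixels) - 1)

# Two synthetic samples: perfectly uniform traffic and a single hot address.
uniform = [[1] * 256 for _ in range(4)]
concentrated = [[100 if j == 7 else 0 for j in range(256)] for _ in range(4)]
```

With these inputs the uniform image has zero variance and the concentrated one has a clearly positive variance, which is the contrast the scene change analysis relies on.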
Detecting Engine – Threshold setting
• From generated distribution signals (S), derive statistical thresholds
- High threshold TH : Traffic distribution less correlated than usual
- Low threshold TL : Traffic distribution more uniform than usual
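$\text{traffic status} = \begin{cases} \text{semi-random}, & \text{if } S > T_H \\ \text{normal}, & \text{if } T_L \le S \le T_H \\ \text{random}, & \text{if } S < T_L \end{cases}$
$X \sim N(\mu, \sigma^2), \quad \Pr(\mu - 3.0\sigma < X < \mu + 3.0\sigma) = 99.7\%$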
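The threshold rule can be written as a small classifier. The sketch below assumes T_H = mean + 3*std and T_L = mean - 3*std, which is one natural reading of the 3-sigma statement on this slide; `traffic_status` and its parameters are illustrative names.

```python
import math

def traffic_status(s: float, mean: float, std: float, k: float = 3.0) -> str:
    # Assumed thresholds: T_L = mean - k*std, T_H = mean + k*std.
    t_low, t_high = mean - k * std, mean + k * std
    if s > t_high:
        return "semi-random"
    if s < t_low:
        return "random"
    return "normal"

# Probability mass of a Gaussian within 3 standard deviations of its mean.
coverage = math.erf(3.0 / math.sqrt(2.0))
```

For example, with mean 10 and std 1 a sample S = 14 is classified "semi-random", S = 6.5 is "random", and anything in [7, 13] is "normal".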
Visualization Engine
• Treat the traffic data as images
• Apply image processing based analysis
Image Generation
Various generated traffic images
• Image reveals the characteristics of traffic
  – Normal behavior mode
  – A single target (DoS)
  – Semi-random target : a subnet is fixed and the remaining portion of the address changes (prefix-based attacks)
  – Random target : horizontal (worm) and vertical (DDoS) scans
Alerting Engine
• Scrutinize the statistical quantities – correlation and delta
• Identify the IP addresses of suspicious attackers and victims
• Lead to some form of a detection signal
• Generate the detection report
NetViewer’s Functionality
• Traffic Profiling
  – General information of current network traffic
• Monitoring
  – Monitor traffic distribution signal (S) over the latest time-window
• Anomaly Reporting
  – Image-based traffic in the source/destination IP address domain and the 2-dimensional domain
• Auxiliary Function
  – Multidimensional Image
  – Attack Tracking
  – Automatic Spoofed Address Masking
Traffic Profiling Function (1)
Traffic Profiling Function (2)
• Understanding the general nature of the traffic at the monitoring point
• Bandwidth in Kbps and Kpps (packets per sec.)
• Protocol : the proportion occupied by each traffic protocol, in percent
• Top 5 flows : the topmost 5 flows in packet count, byte count, or flow number
  – Based on an LRU (Least Recently Used) policy cache
Monitoring Function (1)
Monitoring Function (2)
• Traffic distribution signal (S) over the latest time-window
  - 3 kinds of selected signals : S of packet count, S of byte count, S of flow count
  - Source IP : packet count distribution signal in the source IP address domain
  - Source FLOW : distribution signal of the number of flows in the source IP address domain
  - Source PORT : packet count distribution signal in the source port domain
  - MULTIDIMENSIONAL : multiple components of the above signals in the source domain
• Pr : the anomalous probability of current traffic under Gaussian distribution
• Signal : the distribution signal computed by
  $S = \sigma^2 = \frac{1}{1024 - 1} \sum_{i=0}^{3} \sum_{j=0}^{255} (p_{ijn} - \bar{p}_{ijn})^2$
  – Illustrated with dotted vertical lines at the 3σ level
  – μ and σ : mean value and standard deviation of the distribution signal using EWMA
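A running mean and standard deviation of the signal can be tracked with an exponentially weighted moving average. The sketch below uses a standard incremental EWMA update; the smoothing factor `alpha`, the class name `EwmaStats`, and the way `Pr` is derived from the Gaussian are assumptions for illustration, since the slides do not spell them out.

```python
import math
from typing import Optional

class EwmaStats:
    """Exponentially weighted mean and variance of a stream of S values."""

    def __init__(self, alpha: float = 0.1) -> None:
        self.alpha = alpha  # hypothetical smoothing factor
        self.mean: Optional[float] = None
        self.var = 0.0

    def update(self, s: float) -> None:
        if self.mean is None:  # first sample initializes the mean
            self.mean = s
            return
        diff = s - self.mean
        self.mean += self.alpha * diff
        self.var = (1.0 - self.alpha) * (self.var + self.alpha * diff * diff)

    @property
    def std(self) -> float:
        return math.sqrt(self.var)

def anomaly_probability(s: float, mean: float, std: float) -> float:
    # One plausible reading of "Pr": the probability that a Gaussian sample
    # lies closer to the mean than the observed value s does.
    if std == 0.0:
        return 0.0 if s == mean else 1.0
    z = abs(s - mean) / std
    return math.erf(z / math.sqrt(2.0))
```

Under this reading, a sample exactly 3 standard deviations from the mean gets a value of about 0.9973, matching the 3σ level drawn on the monitoring plot.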
Anomaly Reporting Function (1)
Anomaly Reporting Function (2) – normal network traffic
• Use variance of pixel intensities
  – Distribution of traffic over the observed domain
• During anomalies, the traffic distribution differs from normal traffic
  – Higher correlation (DoS)
  – Lower correlation (worms)
Anomaly Reporting Function (3)– semi-random targeted attacks
Anomaly Reporting Function (4)– random targeted attacks
• Worm propagation type attack
• DDoS propagation type attack
Anomaly Reporting Function (5)– complicated attacks
• Complicated and mixed attack pattern
• The horizontal (dotted or solid) line => a specific source scanning destination addresses.
• The vertical line => random sources attacking a specific destination.
Anomaly Reporting Function (6)– Summary of Visual representation of traffic
• Worm attacks – horizontal line in 2D image
• DDoS attacks – vertical line in 2D image
  => Line detection algorithm
• Visual images look different in different traffic modes
• Motion prediction can lead to attack prediction
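A very simple way to find such lines is to measure how much of each row and column of the 2D image is occupied. The sketch below is not the line detection algorithm used by NetViewer; it assumes an image indexed as image[source][destination] with nonzero pixels where traffic was seen, and the `min_fill` threshold is a hypothetical parameter.

```python
from typing import List, Tuple

def detect_lines(image: List[List[float]],
                 min_fill: float = 0.5) -> Tuple[List[int], List[int]]:
    """Return (rows, cols) whose share of nonzero pixels reaches min_fill.

    A filled row is a horizontal line: one source touching many destinations
    (worm-like scanning). A filled column is a vertical line: many sources
    hitting one destination (DDoS-like).
    """
    n_rows, n_cols = len(image), len(image[0])
    rows = [r for r in range(n_rows)
            if sum(1 for v in image[r] if v > 0) >= min_fill * n_cols]
    cols = [c for c in range(n_cols)
            if sum(1 for r in range(n_rows) if image[r][c] > 0) >= min_fill * n_rows]
    return rows, cols

# Tiny 8 x 8 example: source 2 scans every destination,
# and every source sends traffic to destination 5.
image = [[0.0] * 8 for _ in range(8)]
for x in range(8):
    image[2][x] = 1.0
    image[x][5] = 1.0
lines = detect_lines(image)
```

Here `lines` comes out as `([2], [5])`: one scanning source and one flooded destination.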
Anomaly Reporting Function (7)
Anomaly Reporting Function (7) - Identification
**************************************************************
[ Time : Tue 10-14-2003 05:12:00 ]
--------------------------------------------------------------
Source IP[1] 134. correlation = 17.48% possession = 18.77% delta = 2.50% S
Source IP[1] 141. correlation = 4.33% possession = 3.94% delta = 0.79% S
Source IP[1] 155. correlation = 58.20% possession = 56.80% delta = 2.84% S
Source IP[1] 210. correlation = 5.66% possession = 6.51% delta = 1.60% S
Source IP[2] 75. correlation = 17.47% possession = 18.77% delta = 2.51% S
Source IP[2] 110. correlation = 4.62% possession = 5.25% delta = 1.21% S
Source IP[2] 223. correlation = 4.31% possession = 3.94% delta = 0.78% S
Source IP[2] 230. correlation = 58.21% possession = 56.84% delta = 2.76% S
Source IP[3] 7. correlation = 15.59% possession = 17.02% delta = 2.74% S
Source IP[3] 14. correlation = 53.99% possession = 52.31% delta = 3.41% S
Source IP[4] 41 correlation = 15.16% possession = 16.36% delta = 2.30% S
Source IP[4] 50 correlation = 52.58% possession = 50.83% delta = 3.54% S
--------------------------------------------------------------
Identified No. 1st = 4, 2nd = 4, 3rd = 2, 4th = 2
==============================================================
Destination IP[1] 18. correlation = 4.37% possession = 3.88% delta = 1.01% S
Destination IP[1] 128. correlation = 6.08% possession = 7.01% delta = 1.75% S
Destination IP[1] 131. correlation = 53.65% possession = 52.33% delta = 2.67% S
Destination IP[2] 181. correlation = 56.03% possession = 54.00% delta = 4.15% S
Destination IP[4] 26 correlation = 3.89% possession = 3.58% delta = 0.65% S
--------------------------------------------------------------
Identified No. 1st = 3, 2nd = 1, 3rd = 0, 4th = 1
==============================================================
* Identified Suspicious Source IP address(es)
 134. 75. 7. 41 correlation = 17.48% possession = 18.77% delta = 2.50% S
 141.223.xxx.xxx correlation = 4.33% possession = 3.94% delta = 0.79% S
 155.230. 14. 50 correlation = 58.20% possession = 56.80% delta = 2.84% S
 210.xxx.xxx.xxx correlation = 5.66% possession = 6.51% delta = 1.60% S
-------------------------
* Identified Suspicious Destination IP address(es)
 18.xxx.xxx.xxx correlation = 4.37% possession = 3.88% delta = 1.01%
 128.xxx.xxx.xxx correlation = 6.08% possession = 7.01% delta = 1.75% S
 131.181.xxx.xxx correlation = 53.65% possession = 52.33% delta = 2.67%
**************************************************************
The detection report of anomaly identification.
• Identify IP using statistical measures
• Black list
Flow-based Network Traffic
• Visual representation based on the number of flows
  – The number of flows in the address domain.
  – The black lines illustrate more concentrated traffic intensity.
  – This analysis is effective for revealing flood types of attacks.
Port-based Network Traffic
• Port number based visual representation
  – Normalized packet counts in the port-number domain.
  – This analysis is effective for revealing portscan types of attacks.
• Normal network traffic
• Attack traffic: SQL Slammer worm
  – Port 1434 (decimal) = 0x059A, i.e. byte values 5 (0x05) and 154 (0x9A)
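The port decomposition is a plain split of the 16-bit port number into its high and low bytes. The sketch below shows the split for the Slammer port; the 2 x 256 counter layout in `record_port` is an assumption made by analogy with the 4 x 256 address array, and both function names are illustrative.

```python
from typing import List, Tuple

def port_bytes(port: int) -> Tuple[int, int]:
    # Split a 16-bit port number into (high byte, low byte).
    if not 0 <= port <= 0xFFFF:
        raise ValueError("port must fit in 16 bits")
    return port >> 8, port & 0xFF

def record_port(count: List[List[int]], port: int, packets: int = 1) -> None:
    # Assumed layout: count[0] indexed by the high byte, count[1] by the low byte.
    high, low = port_bytes(port)
    count[0][high] += packets
    count[1][low] += packets

port_count = [[0] * 256 for _ in range(2)]
record_port(port_count, 1434, packets=3)
```

`port_bytes(1434)` gives `(5, 154)`, so Slammer traffic lands on pixel 5 of the high-byte row and pixel 154 of the low-byte row.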
Multidimensional Visualization
• Study multi-dimensional signals in IP address
  i) packet counts => R
  ii) number of flows => G
  iii) the correlation of packet counts => B
• Comprehensive characteristics.
• Diverse analysis.
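One way to picture the multidimensional image is to map each of the three signals onto one color channel. The sketch below assumes R, G and B are the red, green and blue channels of an ordinary 8-bit image and scales each signal by its own peak; the scaling rule and the function names are illustrative choices, not taken from the slides.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]

def to_channel(values: Sequence[Sequence[float]]) -> List[List[int]]:
    # Scale one signal to 0..255 relative to its own peak (assumed rule).
    peak = max(max(row) for row in values)
    return [[round(255 * v / peak) if peak else 0 for v in row] for row in values]

def rgb_image(packets: Sequence[Sequence[float]],
              flows: Sequence[Sequence[float]],
              corr: Sequence[Sequence[float]]) -> List[List[Pixel]]:
    # Packet counts drive red, flow counts green, correlation blue.
    r, g, b = to_channel(packets), to_channel(flows), to_channel(corr)
    return [[(r[y][x], g[y][x], b[y][x]) for x in range(len(r[0]))]
            for y in range(len(r))]

# Tiny 2 x 2 example with made-up values.
picture = rgb_image(packets=[[0, 4], [4, 0]],
                    flows=[[1, 0], [0, 1]],
                    corr=[[0.0, 0.5], [0.5, 0.0]])
```

A pixel that is bright in red and blue but dark in green, such as `picture[1][0]`, would indicate many correlated packets carried by few flows.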
Evaluation in Address-based signals

Time       D.        TP (1)            FP (2)            NP (3)  NP (4)  LR (5)          NLR (6)
Real-time  SA        81.5% (637/782)   0.06% (2/3563)    76.3%   0.15%   1451.2 / 508.7  0.19 / 0.24
           DA        87.1% (681/782)   0.42% (15/3563)   88.4%   0.15%   206.9 / 589.3   0.13 / 0.12
           (SA, DA)  94.2% (737/782)   0.48% (17/3563)   _       _       197.5           0.06

• NP test shows slightly higher performance than 3σ
• 2-dimensional is better than 1-dimensional.

1. True Positive rate by 3σ, the number of detections / the number of anomalies.
2. False Positive rate by 3σ
3. Expected true positive rate by NP test
4. Expected false positive rate by NP test
5. Likelihood Ratio in measurement by 3σ / LR in NP test
6. Negative Likelihood Ratio by 3σ / NLR in NP test
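The likelihood ratios in the table follow from the raw counts using the usual definitions LR = TPR / FPR and NLR = (1 - TPR) / (1 - FPR). The short sketch below recomputes the measured (3σ) values for the SA and (SA, DA) rows; the function name `detection_metrics` is illustrative.

```python
from typing import Dict

def detection_metrics(detected: int, anomalies: int,
                      false_alarms: int, normal: int) -> Dict[str, float]:
    # True/false positive rates and the two likelihood ratios.
    tpr = detected / anomalies
    fpr = false_alarms / normal
    return {
        "tpr": tpr,
        "fpr": fpr,
        "lr": tpr / fpr if fpr else float("inf"),
        "nlr": (1.0 - tpr) / (1.0 - fpr),
    }

sa = detection_metrics(detected=637, anomalies=782, false_alarms=2, normal=3563)
sa_da = detection_metrics(detected=737, anomalies=782, false_alarms=17, normal=3563)
```

Rounded to the table's precision, `sa` reproduces 81.5%, 0.06%, 1451.2 and 0.19, and `sa_da` reproduces 94.2%, 0.48%, 197.5 and 0.06.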
Port-based signals

Time       D.        TP                FP               NP      NP      LR              NLR
Real-time  SP        83.4% (652/782)   0.14% (5/3563)   94.9%   0.07%   594.1 / 1428.8  0.17 / 0.05
           DP        96.2% (752/782)   0.17% (6/3563)   90.5%   0.14%   571.1 / 630.4   0.04 / 0.09
           (SP, DP)  96.8% (757/782)   0.25% (9/3563)   _       _       383.2           0.03

• Port-based signal could be a powerful signal
• Particularly useful for probing/scanning attacks
Multidimensional signals

Time         D.      TP                FP                LR      NLR
Real-time    (S, D)  97.1% (759/782)   0.62% (22/3563)   157.2   0.03
Post mortem  (S, D)  97.4% (762/782)   0.34% (12/3563)   289.3   0.03

• Combined with three distinct image-based signals : address-based, flow-based and port-based
• Improve the detection rates considerably
• It is possible to detect complicated attacks using various signals
Attack Tracking - Motion prediction
Automatic Spoofed Address Masking
• Unassigned by IANA, especially the 1st byte
• Blue-colored polygons indicate the reserved IP addresses; there should be no pixels matching the unassigned space
• Destination IP : normal traffic
• Source IP : SQL Slammer traffic using randomly spoofed addresses
Comparison with IDS
• Intrusion detection systems (IDS) are signature-based, whereas our approach is measurement-based.
  – Compare traffic with predefined rules
  – Need to be updated with the latest rules.
• Snort as a representative IDS.
• Both show similar detection on the TAMU trace.
• Snort is superior in identification
  – But missed heavy traffic sources and new patterns
  – Required more processing time.
Advantages
• Not looking for specific known attacks
• Generic mechanism
• Works in real-time
  – Latencies of a few samples
  – Simple enough to be implemented inline
• Windows and Unix versions are released at http://dropzone.tamu.edu/~skim/netviewer.html
• Comments to [email protected] or [email protected]
Conclusion
• We studied the feasibility of analyzing packet header data as images for detecting traffic anomalies.
• We evaluated the effectiveness of our approach in real-time mode using network traffic.
• Real-time traffic analysis and monitoring is feasible
  – Simple enough to be implemented inline
• Can rely on many tools from the image processing area
  – More robust offline analysis possible
  – Concise for logging and playback
Thank you !!
Identification (2): Entire IP address level
• Step 1: Employ 4 independent hash functions as a Bloom filter, h1(am), h2(am), h3(am), h4(am).
• Step 2: Concatenation of suspicious IP bytes using -vicinity. Continue to the 4th byte.
• Step 3: Membership query of generated 4-byte IP address
Automatic containment for identified attacks
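The three steps can be sketched with a small Bloom filter. This is a hedged illustration only: salted SHA-256 digests stand in for the four independent hash functions, the filter size is arbitrary, and the vicinity-based pruning of Step 2 is replaced by trying every combination of suspicious bytes. The class and function names are hypothetical; the byte values come from the detection report shown earlier.

```python
import hashlib
from itertools import product
from typing import Iterator, List

class BloomFilter:
    def __init__(self, size_bits: int = 1 << 16, num_hashes: int = 4) -> None:
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits // 8)

    def _positions(self, address: str) -> Iterator[int]:
        # Stand-in for h1..h4: SHA-256 salted with the hash index.
        for k in range(self.num_hashes):
            digest = hashlib.sha256(f"{k}:{address}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, address: str) -> None:
        for pos in self._positions(address):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, address: str) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(address))

def identify(seen: BloomFilter, suspicious_bytes: List[List[int]]) -> List[str]:
    # Steps 2 and 3: build candidate addresses byte by byte, keep those seen.
    candidates = (".".join(map(str, combo)) for combo in product(*suspicious_bytes))
    return [addr for addr in candidates if addr in seen]

# Step 1: addresses observed in the traffic are inserted into the filter.
seen = BloomFilter()
for addr in ["134.75.7.41", "155.230.14.50"]:
    seen.add(addr)

suspects = identify(seen, [[134, 155], [75, 230], [7, 14], [41, 50]])
```

A Bloom filter never misses an inserted address, so both observed addresses are always returned; a rare false positive among the other 14 combinations is possible, which is the usual trade-off for its small, fixed memory.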
Processing and memory complexity
• Two samples of packet header data: 2*P, where P is the size of the sample data
• Summary information (DCT coefficients etc.) over samples: S
• Total space requirement: O(P+S)
• P is 2^32 => 4*256 = 1024 (1D), 2^64 => 256K (2D)
• S is 32*32 => 16
  => Memory requires 258K
• Processing: O(P+S)
• Update 4 counters per domain
• Per-packet data-plane cost is low.