- 1 -
Data Reduction for the Scalable Automated Analysis of
Distributed Darknet Traffic
Michael Bailey, Evan Cooke, David Watson and Farnam Jahanian
University of Michigan
Karl Rosaen, Niels Provos
Google, Inc
Internet Measurement Conference 2005
Thursday, October 20th, 2005
- 2 -
Roadmap
• Motivation for hybrid sensors and filtering
• Explore the bounds of source IP filtering at individual sensors
• Show how source IP filtering across sensors is limited
• Discuss and evaluate a new scheme for filtering across sensors
- 3 -
Fundamental Shift
• Not about big splash, about big cash
• Increasingly robust and complex tools enable increasingly sophisticated attacks without a corresponding increase in attacker knowledge.
• As a result, there is a shift from a need to understand how the system was compromised to a need to understand how the compromised system is used.
How do you observe behavior AND continue to catch new exploits and characterize global threat dynamics?
- 4 -
Hybrid Frameworks
• In order to address the needs of new threats, we look to combine two existing techniques:
  • A Blackhole/Dark IP/Network Telescope sensor monitors an unused, globally advertised address block that contains no active hosts. Traffic is the result of DDoS backscatter, worm propagation, misconfiguration, or other scanning (Breadth)
  • Honeyfarms are collections of high-interaction honeypots, often running actual operating systems and applications along with (complex) forensic monitoring software (Depth)
• Result: fast and comprehensive data about the emergence of a threat, with detailed forensics on the way the threat behaves
- 5 -
Hybrid Architecture
Some hybrid projects:
• Internet Motion Sensor (IMS) http://ims.eecs.umich.edu/
• Potemkin http://www.cs.ucsd.edu/~savage/papers/Sosp05.pdf
• iSink http://www.cs.wisc.edu/~pb/isink_final.pdf
• Collapsar http://www.cs.purdue.edu/homes/jiangx/collapsar/publications/collapsar.pdf
- 6 -
The key problem
• The biggest problem for hybrids today is scalability
  • A single wide-address darknet (/8) can see tens or hundreds of gigabytes of packet data per day
  • One approach is to scale the honeypots to the offered connection load
    • Scalability, Fidelity and Containment in the Potemkin Virtual Honeyfarm. SOSP 2005
• Volume of forensic data
  • E.g., a single honeypot instrumented to capture all sources of non-determinism (à la ReVirt/Backtracker) can capture over a GB per day per IP
In this paper we examine filtering of darknet traffic in order to reduce the offered connection load and the volume of data to be analyzed
- 7 -
Filtering at an individual DarkNet
• Begin with existing work on filtering at individual darknets:
  • Characteristics of Internet Background Radiation. IMC 2004
  • Proposed a variety of Source IP-* methods and showed that Source-Destination filtering achieved 96%-98% reductions in packets
• Great! So let’s apply these methods to 14 IMS sensors in August 2004
  • Explore the methods that were proposed and validate the results
  • Determine why they are so effective
- 8 -
Internet Motion Sensor (IMS)
Tier 1 ISPs, Large Enterprise, Broadband,
Academic, National ISPs, Regional ISPs
Initial /8 deployment in 2001. Currently 60 address blocks at 18 networks on 3 continents
/26 x 5
/25 x 1
/24 x 18
/23 x 2
/22 x 4
/21 x 2
/20 x 8
/19 x 1
/18 x 6
/17 x 3
/16 x 9
/8 x 1
- 9 -
% reduction in packets via source-* filtering
Method               MEAN   Inter-Sensor STDDEV   MIN
Source-Connection    95%    2.7%                  53.8%
Source-(dst) Port    93%    3.1%                  46.7%
Source-Payload       91%    3.9%                  49.1%
• Supported previous results, with differences that can be plausibly explained by monitor block size and monitoring time effects
• Two additional observations relevant to a run-time system:
  • The effectiveness of filtering differs between sensors
  • The effectiveness of filtering varies over time
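A minimal sketch of how the packet reduction for the three source-* filters in the table above could be measured on a trace. The `Packet` fields, the key chosen for each filter, and the keep-only-the-first-packet policy (`limit=1`) are illustrative assumptions, not the IMS implementation or the exact definitions from the IMC 2004 work.

```python
from collections import Counter
from typing import Callable, Hashable, Iterable, NamedTuple


class Packet(NamedTuple):
    src: str        # source IP address
    dst_port: int   # destination port
    payload: bytes  # payload observed from this source


# Assumed keys: each filter suppresses repeats of the same key.
FILTER_KEYS: dict[str, Callable[[Packet], Hashable]] = {
    "source-connection": lambda p: p.src,
    "source-port": lambda p: (p.src, p.dst_port),
    "source-payload": lambda p: (p.src, p.payload),
}


def reduction(packets: Iterable[Packet],
              key: Callable[[Packet], Hashable],
              limit: int = 1) -> float:
    """Fraction of packets dropped when at most `limit` packets per key are kept."""
    seen: Counter = Counter()
    total = dropped = 0
    for pkt in packets:
        total += 1
        k = key(pkt)
        seen[k] += 1
        if seen[k] > limit:
            dropped += 1
    return dropped / total if total else 0.0


# Toy trace: one source repeats itself, a second source probes two ports.
trace = [Packet("192.0.2.1", 445, b"x")] * 8 + [
    Packet("192.0.2.2", 445, b"x"),
    Packet("192.0.2.2", 135, b"y"),
]
for name, key in FILTER_KEYS.items():
    print(f"{name}: {reduction(trace, key):.0%} reduction")
```

Running the same function per sensor and per time window would expose the two observations above: the reduction depends on which sensor and which period the trace comes from.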
Why is the filtering at individual sensors so good?
- 10 -
Role of a source IP in traffic at a sensor
• 90% of the packets are from 10% of the unique source IP addresses
- 11 -
Role of a port in traffic seen at a sensor
• Over 90% of the packets target 0.5% of the TCP/UDP destination ports
- 12 -
How many ports did they contact?
• 55% of sources contact a single port; 99% contact fewer than 10
• A small number of sources contact a very large number of ports
Filtering at individual sensors works because a relatively small number of sources send a lot of packets to a small number of ports.
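The concentration figures on the last three slides can be reproduced from a packet trace with a few counters. The sketch below is illustrative only: the function names and the toy trace are assumptions, and the trace is a list of hypothetical (source, destination port) pairs rather than IMS data.

```python
import math
from collections import Counter, defaultdict


def top_share(counts: Counter, top_fraction: float) -> float:
    """Share of all packets contributed by the busiest `top_fraction` of keys."""
    ranked = sorted(counts.values(), reverse=True)
    if not ranked:
        return 0.0
    n_top = max(1, math.ceil(top_fraction * len(ranked)))
    return sum(ranked[:n_top]) / sum(ranked)


def ports_per_source(packets):
    """Map each source to the number of distinct destination ports it contacted."""
    ports = defaultdict(set)
    for src, dst_port in packets:
        ports[src].add(dst_port)
    return {src: len(p) for src, p in ports.items()}


# Toy trace of (source, destination port) pairs: one heavy hitter, nine light sources.
packets = [("192.0.2.1", 445)] * 91 + [(f"198.51.100.{i}", 135) for i in range(9)]
by_source = Counter(src for src, _ in packets)
by_port = Counter(port for _, port in packets)

print("packet share of top 10% of sources:", top_share(by_source, 0.10))
print("packet share of top 50% of ports:", top_share(by_port, 0.50))
single_port = sum(1 for n in ports_per_source(packets).values() if n == 1)
print("fraction of sources contacting one port:", single_port / len(by_source))
```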
- 13 -
How many sources are there?
• Cumulative number of unique sources at 41 sensors for 21 days from March 19th - April 8th 2005
• Small sensors (/24) see several thousand unique sources per day and large sensors (/8) see several million
• We need additional filtering!
- 14 -
Sources are not prevalent across locations
• Examine the AVERAGE overlap in unique sources per day between sensors over a one-month period.
• While some blocks do see large overlap (d/8 and f/17 saw 82%), most blocks have very little
The reduction from source-based methods across sensors is very small. Each new sensor brings with it its own unique sources
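One way to compute such an average overlap is sketched below. The slide does not define the overlap metric, so this assumes overlap of sensor A with sensor B means the fraction of A's unique sources for a day that B also saw that day, averaged over days; the function names are hypothetical.

```python
def overlap(a: set, b: set) -> float:
    """Fraction of sensor A's unique sources that were also seen at sensor B."""
    return len(a & b) / len(a) if a else 0.0


def average_daily_overlap(days_a: list, days_b: list) -> float:
    """Average the per-day overlap of A with B; each list holds one source set per day."""
    pairs = list(zip(days_a, days_b))
    if not pairs:
        return 0.0
    return sum(overlap(a, b) for a, b in pairs) / len(pairs)


# Two days of toy data for two sensors (sources shown as small integers).
sensor_a = [{1, 2, 3, 4}, {1, 2}]
sensor_b = [{1, 9}, {1, 2, 3}]
print(average_daily_overlap(sensor_a, sensor_b))
```

Under this definition the metric is asymmetric: a small block can be almost entirely covered by a large one while the reverse overlap stays low.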
- 15 -
Intersection in Top Ten Ports
• Examine the top ten ports over a day, week and month time frame.
• Determine how many of those ports appear at each of the sensors.
• Only a few ports are visible at all sensors (e.g. TCP/1433, TCP/445, TCP/135, TCP/139). Many are only visible at one.
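The top-ten comparison can be sketched as follows, assuming per-sensor packet counts keyed by destination port are already available. The sensor names, port counts, and the `n=2` cutoff in the toy example are made up for illustration.

```python
from collections import Counter


def top_ports(port_counts: Counter, n: int = 10) -> set:
    """The n destination ports with the most packets at one sensor."""
    return {port for port, _ in port_counts.most_common(n)}


def sensors_per_port(per_sensor: dict, n: int = 10) -> Counter:
    """For each port, count the sensors at which it appears in the top n."""
    visibility = Counter()
    for counts in per_sensor.values():
        visibility.update(top_ports(counts, n))
    return visibility


# Hypothetical per-sensor packet counts by destination port.
per_sensor = {
    "A": Counter({445: 50, 135: 30, 1433: 5}),
    "B": Counter({445: 40, 80: 20}),
    "C": Counter({445: 10, 135: 9}),
}
visibility = sensors_per_port(per_sensor, n=2)
everywhere = sorted(p for p, k in visibility.items() if k == len(per_sensor))
only_one = sorted(p for p, k in visibility.items() if k == 1)
print("visible at all sensors:", everywhere)
print("visible at only one sensor:", only_one)
```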
- 16 -
Why are we seeing different things?
• Impact of monitored block size, scanning rate, and observation time on the probability of identifying a random scanning event
  • Network telescopes. Technical Report CS2004-0795, UC San Diego, July 2004.
• Lifetime of the events
• Targeted behaviors
  • The zombie roundup: Understanding, detecting, and disrupting botnets. SRUTI 2005 Workshop
• Malware internals
  • Exploiting Underlying Structure for Detailed Reconstruction of an Internet-scale Event. IMC 2005
- 17 -
So now what?
• Source-based methods are effective at filtering because sources repeat themselves
• However:
  • there are lots of unique sources at each sensor
  • neither the sources nor the ports overlap between sensors
• We need to devise a scheme for additional filtering between sensors:
  • that addresses visibility into remote scanning events
  • that accounts for targeted attack behavior
- 18 -
Filtering
• Algorithm
  • At each sensor, compare the average number of unique source IP addresses contacting a destination port over the most recent window to the average over the history window
  • Calculate the number of sensors for which this ratio is greater than an EVENT_THRESHOLD. If the number of sensors is greater than the COVERAGE_THRESHOLD, create an event and forward traffic
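The two steps above can be sketched for a single destination port as follows. This is a sketch under stated assumptions, not the deployed IMS code: the threshold values are placeholders, coverage is expressed as a fraction of sensors, each sensor's input is a list of unique-source counts per time bin (oldest first), and the handling of an all-zero history is a design choice made here.

```python
from statistics import mean

EVENT_THRESHOLD = 3.0     # placeholder: required recent/history ratio at a sensor
COVERAGE_THRESHOLD = 0.3  # placeholder: required fraction of sensors over the ratio


def window_ratio(series: list, recent_len: int) -> float:
    """Average unique sources in the recent window divided by the history average."""
    if recent_len < 1:
        raise ValueError("recent_len must be at least 1")
    history, recent = series[:-recent_len], series[-recent_len:]
    if not history or not recent:
        return 0.0
    base = mean(history)
    if base == 0:
        # Assumption: any activity on a previously silent port counts as an increase.
        return float("inf") if mean(recent) > 0 else 0.0
    return mean(recent) / base


def detect_event(per_sensor: dict, recent_len: int,
                 event_threshold: float = EVENT_THRESHOLD,
                 coverage_threshold: float = COVERAGE_THRESHOLD):
    """Return (event?, coverage) for one destination port across all sensors."""
    if not per_sensor:
        return False, 0.0
    triggered = sum(
        1 for series in per_sensor.values()
        if window_ratio(series, recent_len) > event_threshold
    )
    coverage = triggered / len(per_sensor)
    return coverage > coverage_threshold, coverage


# Unique sources per time bin for one port at three hypothetical sensors.
counts = {
    "sensor-a": [10, 10, 10, 10, 50, 50],
    "sensor-b": [10, 10, 10, 10, 12, 8],
    "sensor-c": [4, 4, 4, 4, 20, 20],
}
is_event, coverage = detect_event(counts, recent_len=2)
print(is_event, round(coverage, 4))
```

When `is_event` is true, a deployment would start forwarding traffic for that port to the honeyfarm; a spike seen at only one sensor leaves coverage low and is filtered as targeted behavior.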
- 19 -
Filtering
• Insights
  • Examine only traffic that demonstrates a significant increase in the number of unique sources contacting a specific port, rather than examining individual IPs
    • Similar to the observation in the context of scanning patterns from: An effective architecture and algorithm for detecting worms with various scan techniques. NDSS 2004
  • Eliminate targeted behavior by only evaluating if a significant number of sensors see this behavior
- 20 -
Evaluation
• Deployment on IMS sensors during the first quarter of 2005
• Evaluation showed 13 unique events in 5 groups
• Validation against security lists and operator logs (e.g. NANOG, ISC) showed that the scheme captured all the human-detected events.
Description                   Port      Date             Multiple   Coverage
WINS                          tcp42     01/13/05 17:31   5.36       0.4815
                              tcp42     01/14/05 05:31   61.85      0.8889
                              tcp42     01/14/05 17:31   9.77       0.6667
Squid and Alt-HTTP SYN Scan   tcp3128   02/05/05 11:31   7.73       0.4074
                              tcp3128   02/05/05 23:31   18.19      0.4074
                              tcp8080   02/05/05 10:51   7.53       0.4074
                              tcp8080   02/05/05 22:51   20.95      0.3704
MYSQL                         tcp3306   01/26/05 09:31   43.89      0.3704
                              tcp3306   01/26/05 21:31   8.2        0.4444
                              tcp3306   01/27/05 09:31   5.7        0.4074
Syn Scan                      tcp5000   01/08/05 14:31   3.42       0.6667
Veritas                       tcp6101   02/23/05 21:32   3.54       0.3704
                              tcp6101   02/24/05 09:32   3.81       0.3333
- 21 -
Effect of coverage on events
• Coverage represents the percentage of sensors that saw an increase in unique sources
• Only a small handful of events are prevalent across all sensors.
- 22 -
Recent TCP/42 Activity
• November 24, 2004: a remotely exploitable overflow vulnerability was announced in the WINS server component of Microsoft Windows
• January 2005: multiple reports noted a significant increase in activity on tcp/42.
- 23 -
TCP/42 Payloads
• Captured live payloads that match byte-for-byte with template exploit code
• Same exploit is being reused to inject many different payloads (same exploit with very different shellcode)
• Evidence suggests the attacks come from manual tools, not an automated worm.
• However, the vulnerability is “wormable”
http://ims.eecs.umich.edu/reports/port42/
- 24 -
Wrap-up
• Source-based methods are effective in filtering at individual sensors because a relatively small number of sources contact the same ports repeatedly.
• Source IP addresses, and surprisingly destination ports, do not consistently overlap across sensors
• We proposed a filtering mechanism that addresses the limited visibility of blocks into remote events and targeted attack behavior
• We evaluated this mechanism by deploying it across IMS sensors and comparing its output, over a three-month period, with human-identified events of interest in operator logs.
- 25 -
Acknowledgements
For more information on the Internet Motion Sensor:
http://ims.eecs.umich.edu
Thanks to the ISPs, academic institutions, and organizations for hosting the IMS! Thanks to Danny McPherson, Jose Nazario, Robert Stone, Rob Malan, and Dug Song at Arbor Networks and Larry Blunk, Bert Rossi, and Manish Karir at Merit Network.
And of course our sponsor: