Automated Worm Fingerprinting
![Page 1: Automated Worm Fingerprinting](https://reader036.fdocuments.us/reader036/viewer/2022062316/5681684f550346895dde4e57/html5/thumbnails/1.jpg)
Automated Worm Fingerprinting
Sumeet Singh, Cristian Estan, George Varghese and Stefan Savage
Department of Computer Science and Engineering, University of California, San Diego
Presented at: Operating Systems Design and Implementation (OSDI) 2004
Ramanarayanan Ramani (Ram)
Support for this work was provided by NIST Grant 60NANB1D0118 and NSF Grant 0137102.
![Page 2: Automated Worm Fingerprinting](https://reader036.fdocuments.us/reader036/viewer/2022062316/5681684f550346895dde4e57/html5/thumbnails/2.jpg)
Overview
• Why Automated Systems
• Detecting Worms
• Characterize Worms
• Worm Containment
• Worm Behavior
• Identify Worm Signatures
• Earlybird System Design
• Statistics
• Conclusion
![Page 3: Automated Worm Fingerprinting](https://reader036.fdocuments.us/reader036/viewer/2022062316/5681684f550346895dde4e57/html5/thumbnails/3.jpg)
Why Automated Systems
• Current practice: identify the worm, manually characterize its signature, then update antivirus and network filters
• The Code Red worm took 14 hours to spread
• Slammer took 10 minutes, leaving no time to identify a signature manually
• Securing networks requires automatic worm signature identification
![Page 4: Automated Worm Fingerprinting](https://reader036.fdocuments.us/reader036/viewer/2022062316/5681684f550346895dde4e57/html5/thumbnails/4.jpg)
Detecting Worms
• Network telescopes: monitor requests to a large unused, yet routable, address space
• Can identify random-scan worms
• Cannot identify hit-list or email worms
• Cannot characterize the signature
![Page 5: Automated Worm Fingerprinting](https://reader036.fdocuments.us/reader036/viewer/2022062316/5681684f550346895dde4e57/html5/thumbnails/5.jpg)
Detecting Worms
• Using honeypots
• Do not allow any malicious incoming traffic
• Unwanted outgoing traffic may be due to a worm: identify the malicious code generating it
• Use the malicious code to identify the signature
• Takes a long time and requires manual signature identification
![Page 6: Automated Worm Fingerprinting](https://reader036.fdocuments.us/reader036/viewer/2022062316/5681684f550346895dde4e57/html5/thumbnails/6.jpg)
Detecting Worms
• Host-based behavioral detection
• Analyze patterns of system calls, e.g. a received packet that is routed to be sent out again
• Identify suspicious activity
• Expensive to manage
• Needs to be deployed on every system separately
![Page 7: Automated Worm Fingerprinting](https://reader036.fdocuments.us/reader036/viewer/2022062316/5681684f550346895dde4e57/html5/thumbnails/7.jpg)
Characterize Worm
• Characterization is the process of analyzing and identifying a new worm or exploit
• Create a priori vulnerability signatures
• Can only be applied to vulnerabilities that are already well known and well characterized manually
![Page 8: Automated Worm Fingerprinting](https://reader036.fdocuments.us/reader036/viewer/2022062316/5681684f550346895dde4e57/html5/thumbnails/8.jpg)
Characterize Worm
• First automated system: used by IBM for viruses
• Allows the virus to infect "decoy" programs
• Identifies invariant strings in infected objects to characterize viruses
• Assumes the presence of a known instance of a virus and a controlled environment to monitor
![Page 9: Automated Worm Fingerprinting](https://reader036.fdocuments.us/reader036/viewer/2022062316/5681684f550346895dde4e57/html5/thumbnails/9.jpg)
Characterize Worm
• Honeycomb system of Kreibich and Crowcroft
• Host-based intrusion detection system
• Automatically generates signatures by looking for longest common subsequences among sets of strings found in message exchanges
• Very slow
![Page 10: Automated Worm Fingerprinting](https://reader036.fdocuments.us/reader036/viewer/2022062316/5681684f550346895dde4e57/html5/thumbnails/10.jpg)
Characterize Worm
• Kim and Karp's Autograph system
• Autograph also uses network-level data to infer worm signatures
• Employs Rabin fingerprints to index counters of content substrings
• Uses white-lists to set aside well-known false positives
• Has extensive support for distributed deployments
• Relies on a pre-filtering step that identifies flows with suspicious scanning activity
![Page 11: Automated Worm Fingerprinting](https://reader036.fdocuments.us/reader036/viewer/2022062316/5681684f550346895dde4e57/html5/thumbnails/11.jpg)
Worm Containment
• Mechanisms used to slow or stop the spread of an active worm
• Host quarantine
• String matching
• Connection throttling
![Page 12: Automated Worm Fingerprinting](https://reader036.fdocuments.us/reader036/viewer/2022062316/5681684f550346895dde4e57/html5/thumbnails/12.jpg)
Worm Behavior
• Worms behave quite differently from popular client-server and peer-to-peer applications
• Worms share some common behavior patterns, which are useful for identifying and characterizing them
![Page 13: Automated Worm Fingerprinting](https://reader036.fdocuments.us/reader036/viewer/2022062316/5681684f550346895dde4e57/html5/thumbnails/13.jpg)
Worm Behavior
• Content invariance
• Some or all of the worm program is invariant across every copy
• Some worms make use of limited polymorphism: encrypting each worm instance independently and/or randomizing filler text
• Even so, some portion remains invariant
![Page 14: Automated Worm Fingerprinting](https://reader036.fdocuments.us/reader036/viewer/2022062316/5681684f550346895dde4e57/html5/thumbnails/14.jpg)
Worm Behavior
• Content prevalence
• Worms are designed foremost to spread, so the invariant portion of a worm's content will appear frequently on the network as it spreads or attempts to spread
![Page 15: Automated Worm Fingerprinting](https://reader036.fdocuments.us/reader036/viewer/2022062316/5681684f550346895dde4e57/html5/thumbnails/15.jpg)
Worm Behavior
• Address dispersion
• Packets containing a live worm will tend to reflect a variety of different source and destination addresses
• This range increases when there is a major outbreak
![Page 16: Automated Worm Fingerprinting](https://reader036.fdocuments.us/reader036/viewer/2022062316/5681684f550346895dde4e57/html5/thumbnails/16.jpg)
Identify Worm Signatures
ProcessTraffic(payload, srcIP, dstIP)
  prevalence[payload]++
  Insert(srcIP, dispersion[payload].sources)
  Insert(dstIP, dispersion[payload].dests)
  if (prevalence[payload] > PrevalenceTh
      and size(dispersion[payload].sources) > SrcDispTh
      and size(dispersion[payload].dests) > DstDispTh)
    if (payload in knownSignatures)
      return
    endif
    Insert(payload, knownSignatures)
    NewSignatureAlarm(payload)
  endif
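The pseudocode above can be sketched directly in Python. This is a minimal illustration of the naive algorithm, not the Earlybird implementation: the threshold values and the `alarms` list that stands in for `NewSignatureAlarm` are assumptions chosen for the example.

```python
from collections import defaultdict

# Illustrative thresholds (assumed values, not taken from the slides).
PREVALENCE_TH = 3
SRC_DISP_TH = 2
DST_DISP_TH = 2

prevalence = defaultdict(int)   # payload -> number of times seen
sources = defaultdict(set)      # payload -> distinct source addresses
dests = defaultdict(set)        # payload -> distinct destination addresses
known_signatures = set()
alarms = []                     # hypothetical stand-in for NewSignatureAlarm


def process_traffic(payload, src_ip, dst_ip):
    """Naive content sifting: exact counters and exact address sets."""
    prevalence[payload] += 1
    sources[payload].add(src_ip)
    dests[payload].add(dst_ip)
    if (prevalence[payload] > PREVALENCE_TH
            and len(sources[payload]) > SRC_DISP_TH
            and len(dests[payload]) > DST_DISP_TH):
        if payload in known_signatures:
            return
        known_signatures.add(payload)
        alarms.append(payload)
```

Content that repeats between a single pair of hosts never raises an alarm here, because its address sets stay small; only content that is both prevalent and dispersed is reported.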
![Page 17: Automated Worm Fingerprinting](https://reader036.fdocuments.us/reader036/viewer/2022062316/5681684f550346895dde4e57/html5/thumbnails/17.jpg)
Identify Worm Signature
• This method is called Content Sifting
• Too much data to be handled in high-speed networks
• Too many substrings need to be stored
• Too much time taken to process one packet
![Page 18: Automated Worm Fingerprinting](https://reader036.fdocuments.us/reader036/viewer/2022062316/5681684f550346895dde4e57/html5/thumbnails/18.jpg)
Earlybird System Design
• Scan the network and process packets
• Identify repeating substrings, along with the list of their sources and destinations
• If repetition exceeds the threshold, declare the substring a signature and ask the network security system to block packets carrying that signature
![Page 19: Automated Worm Fingerprinting](https://reader036.fdocuments.us/reader036/viewer/2022062316/5681684f550346895dde4e57/html5/thumbnails/19.jpg)
Estimate Content prevalence
• Find the packet payloads that appear at least x times among the N packets sent during a given interval
• Use multi-stage filters with conservative update to dramatically reduce the memory footprint of the problem
• Append the destination port and protocol to the content before hashing
• Detect repeating strings with a small fixed length B
• Compute a variant of Rabin fingerprints for all possible substrings of a certain length
• Each packet with a payload of s bytes has s - B + 1 substrings of length B, so the number of memory references per packet is very high
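A multi-stage filter with conservative update can be sketched as follows. This is an illustrative version under stated assumptions: the class name, the number of stages and buckets, the use of salted BLAKE2b as the per-stage hash, and the `make_key` helper are all choices made for this example and are not taken from the slides.

```python
import hashlib


def make_key(content: bytes, dst_port: int, protocol: int) -> bytes:
    """Append destination port and protocol to the content before hashing."""
    return content + dst_port.to_bytes(2, "big") + bytes([protocol])


class MultistageFilter:
    """Several arrays of counters, each indexed by a different hash."""

    def __init__(self, stages: int = 4, buckets: int = 1024):
        self.stages = stages
        self.buckets = buckets
        self.counters = [[0] * buckets for _ in range(stages)]

    def _slots(self, key: bytes):
        # One hash per stage, obtained by salting with the stage number.
        for stage in range(self.stages):
            digest = hashlib.blake2b(key, digest_size=8,
                                     salt=bytes([stage])).digest()
            yield stage, int.from_bytes(digest, "big") % self.buckets

    def increment(self, key: bytes) -> int:
        """Conservative update: raise only counters below the new minimum."""
        slots = list(self._slots(key))
        new_min = min(self.counters[s][i] for s, i in slots) + 1
        for s, i in slots:
            if self.counters[s][i] < new_min:
                self.counters[s][i] = new_min
        return new_min  # estimate of the key's count; never an undercount
```

The returned value plays the role of the prevalence counter: a key is only treated as prevalent once every one of its counters has passed the threshold, which keeps memory fixed regardless of how many distinct strings appear.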
![Page 20: Automated Worm Fingerprinting](https://reader036.fdocuments.us/reader036/viewer/2022062316/5681684f550346895dde4e57/html5/thumbnails/20.jpg)
Estimating address dispersion
• Address dispersion is critical for avoiding false positives
• Count the distinct source IP addresses and destination IP addresses associated with each piece of content suspected of being generated by a worm
• Use approximate counting of distinct addresses with bitmaps
• Direct bitmaps: 32 bits; hash each address to one bit and set that bit
• A threshold of 30 distinct addresses corresponds to 20 bits set
• The ability to estimate the actual value of each counter is limited
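The direct bitmap can be illustrated with a short sketch. The estimator `b * ln(b / zero_bits)` is the same expression that appears in the `ComputeEstimate` pseudocode later in the deck; the function names and the use of SHA-256 as the address hash are assumptions made for this example.

```python
import hashlib
import math

BITS = 32  # size of the direct bitmap


def add_address(bitmap: int, address: str) -> int:
    """Hash the address to one of BITS positions and set that bit."""
    digest = hashlib.sha256(address.encode()).digest()
    position = int.from_bytes(digest[:4], "big") % BITS
    return bitmap | (1 << position)


def estimate(bitmap: int) -> float:
    """Estimate the number of distinct addresses from the unset bits."""
    zeros = BITS - bin(bitmap).count("1")
    if zeros == 0:
        return float("inf")  # bitmap saturated, no meaningful estimate
    return BITS * math.log(BITS / zeros)
```

With 20 of the 32 bits set the estimate is 32 * ln(32/12), a little over 31, which matches the slide's pairing of a threshold of 30 addresses with 20 set bits. Once the bitmap is nearly full the estimate degrades quickly, which is the limitation the last bullet refers to.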
![Page 21: Automated Worm Fingerprinting](https://reader036.fdocuments.us/reader036/viewer/2022062316/5681684f550346895dde4e57/html5/thumbnails/21.jpg)
Estimating address dispersion
• Earlybird technique: scaled bitmaps
• Accurately estimates address dispersion using five times less memory than previous techniques
• Sub-samples the range of the hash space
• E.g., to count up to 64 sources using 32 bits, one might hash sources into a space from 0 to 63 yet only set bits for values that hash between 0 and 31, ignoring half of the sources
• A continuously increasing count is tracked by simply increasing this scaling factor whenever the bitmap is filled
![Page 22: Automated Worm Fingerprinting](https://reader036.fdocuments.us/reader036/viewer/2022062316/5681684f550346895dde4e57/html5/thumbnails/22.jpg)
Estimating address dispersion
• Once the bitmap is scaled to a new configuration, the addresses that were active throughout the previous configuration are lost, and adjusting for this bias directly can lead to double counting
• So multiple bitmaps are used to store history; here we use 3
![Page 23: Automated Worm Fingerprinting](https://reader036.fdocuments.us/reader036/viewer/2022062316/5681684f550346895dde4e57/html5/thumbnails/23.jpg)
Estimating address dispersion
![Page 24: Automated Worm Fingerprinting](https://reader036.fdocuments.us/reader036/viewer/2022062316/5681684f550346895dde4e57/html5/thumbnails/24.jpg)
Estimating address dispersion
UpdateBitmap(IP)
  code = Hash(IP)
  level = CountLeadingZeroes(code)
  bitcode = FirstBits(code << (level+1))
  if (level >= base and level < base+numbmps)
    SetBit(bitcode, bitmaps[level-base])
    if (level == base and CountBitsSet(bitmaps[0]) == max)
      NextConfiguration()
    endif
  endif

ComputeEstimate(bitmaps, base)
  numIPs = 0
  for i = 0 to numbmps-1
    numIPs = numIPs + b ln(b / CountBitsNotSet(bitmaps[i]))
  endfor
  correction = 2(2^base - 1) / (2^numbmps - 1) * b ln(b / (b - max))
  return numIPs * 2^base / (1 - 2^(-numbmps)) + correction
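The two routines above can be turned into a small runnable sketch. Several details are assumptions of this example rather than facts from the slides: SHA-256 truncated to 32 bits as `Hash`, a fill limit of `max_set=24`, a `_next_configuration` that drops the oldest bitmap and appends an empty one, a `>=` test in place of `==` so a recycled bitmap that is already over the limit still advances, and a guard against a fully set bitmap.

```python
import hashlib
import math

HASH_BITS = 32


class ScaledBitmap:
    def __init__(self, b: int = 32, numbmps: int = 3, max_set: int = 24):
        self.b = b                          # bits per bitmap (power of two)
        self.numbmps = numbmps
        self.max = max_set                  # assumed fill limit for bitmaps[0]
        self.base = 0
        self.bitmaps = [0] * numbmps
        self.index_bits = b.bit_length() - 1

    @staticmethod
    def _hash(ip: str) -> int:
        return int.from_bytes(hashlib.sha256(ip.encode()).digest()[:4], "big")

    def _next_configuration(self):
        # Assumed policy: discard the oldest bitmap, start a new empty one.
        self.base += 1
        self.bitmaps = self.bitmaps[1:] + [0]

    def update(self, ip: str) -> None:
        code = self._hash(ip)
        level = HASH_BITS - code.bit_length()          # leading zeroes
        shifted = (code << (level + 1)) & 0xFFFFFFFF    # drop zeroes and first 1
        bitcode = shifted >> (HASH_BITS - self.index_bits)
        if self.base <= level < self.base + self.numbmps:
            self.bitmaps[level - self.base] |= 1 << bitcode
            if level == self.base and bin(self.bitmaps[0]).count("1") >= self.max:
                self._next_configuration()

    def estimate(self) -> float:
        num_ips = 0.0
        for bitmap in self.bitmaps:
            zeros = max(self.b - bin(bitmap).count("1"), 1)  # avoid log(b/0)
            num_ips += self.b * math.log(self.b / zeros)
        correction = (2 * (2 ** self.base - 1) / (2 ** self.numbmps - 1)
                      * self.b * math.log(self.b / (self.b - self.max)))
        return num_ips * 2 ** self.base / (1 - 2 ** (-self.numbmps)) + correction
```

Each level covers half the hash space of the one before it, so three bitmaps at `base = 0` cover 7/8 of all addresses; the final scaling divides by that fraction, and the correction term compensates for addresses lost in earlier configurations.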
![Page 25: Automated Worm Fingerprinting](https://reader036.fdocuments.us/reader036/viewer/2022062316/5681684f550346895dde4e57/html5/thumbnails/25.jpg)
CPU scaling
• Processing each packet payload as a single string is easy
• But with Rabin fingerprints, processing every substring of length B can overload the CPU under high traffic load
• A packet with 1,000 bytes of payload and B = 40 requires processing about 960 Rabin fingerprints (s - B + 1 = 961)
• To reduce processing time, sample the substrings that are processed
• Randomly sampling substrings could cause us to miss a large fraction of the occurrences of each substring
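To make the per-packet cost concrete, the sketch below computes one fingerprint per window of length B using an incremental update. It uses a Rabin-Karp style polynomial hash modulo a prime rather than a true Rabin fingerprint over GF(2); that substitution, along with the function name and the constants, is an assumption of this example.

```python
def rolling_fingerprints(payload: bytes, b: int,
                         base: int = 257, mod: int = (1 << 61) - 1):
    """Return one fingerprint for every substring of length b."""
    if len(payload) < b:
        return []
    high = pow(base, b - 1, mod)     # weight of the byte leaving the window
    fp = 0
    for byte in payload[:b]:
        fp = (fp * base + byte) % mod
    fingerprints = [fp]
    for i in range(b, len(payload)):
        # Remove the oldest byte, shift, and add the newest byte.
        fp = ((fp - payload[i - b] * high) * base + payload[i]) % mod
        fingerprints.append(fp)
    return fingerprints
```

A payload of s bytes yields s - B + 1 fingerprints, which is 961 for the 1,000-byte example with B = 40, close to the figure of 960 quoted above. The same substring always produces the same fingerprint regardless of its offset in the packet, which is what lets repeated content be counted.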
![Page 26: Automated Worm Fingerprinting](https://reader036.fdocuments.us/reader036/viewer/2022062316/5681684f550346895dde4e57/html5/thumbnails/26.jpg)
CPU Scaling
• Instead, use value sampling: select only those substrings whose fingerprint matches a certain pattern, e.g. the last six bits are 0
• The probability of detecting a worm depends on the length x of its signature
• The probability of tracking a worm with a signature of 100 bytes is 55%; for a signature of 200 bytes it increases to 92%, and for 400 bytes to 99.64%
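Value sampling can be sketched in a few lines. The fingerprint here is a truncated BLAKE2b hash standing in for a Rabin fingerprint, and the function names are hypothetical; only the selection rule (keep a window when the low six bits of its fingerprint are zero) comes from the slide.

```python
import hashlib


def fingerprint(window: bytes) -> int:
    """Stand-in fingerprint (assumption: not a real Rabin fingerprint)."""
    return int.from_bytes(hashlib.blake2b(window, digest_size=8).digest(), "big")


def sampled_windows(payload: bytes, b: int = 40, zero_bits: int = 6):
    """Keep only windows whose fingerprint ends in zero_bits zero bits."""
    mask = (1 << zero_bits) - 1
    selected = []
    for i in range(len(payload) - b + 1):
        window = payload[i:i + b]
        if fingerprint(window) & mask == 0:
            selected.append((i, window))
    return selected
```

Because selection depends only on the window's content, a substring that is sampled in one packet is sampled in every packet that contains it. With six zero bits roughly one window in 64 is kept, so longer invariant content offers more chances that at least one of its windows is selected.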
![Page 27: Automated Worm Fingerprinting](https://reader036.fdocuments.us/reader036/viewer/2022062316/5681684f550346895dde4e57/html5/thumbnails/27.jpg)
Complete System
![Page 28: Automated Worm Fingerprinting](https://reader036.fdocuments.us/reader036/viewer/2022062316/5681684f550346895dde4e57/html5/thumbnails/28.jpg)
Program Loop
ProcessPacket()
  InitializeIncrementalHash(payload, payloadLength, dstPort)
  while (currentHash = GetNextHash())
    if (currentADEntry = ADEntryMap.Find(currentHash))
      UpdateADEntry(currentADEntry, srcIP, dstIP, packetTime)
      if ((currentADEntry.srcCount > SrcDispTh)
          and (currentADEntry.dstCount > DstDispTh))
        ReportAnomalousADEntry(currentADEntry, packet)
      endif
    else
      if (MsfIncrement(currentHash) > PrevalenceTh)
        newADEntry = InitializeADEntry(srcIP, dstIP, packetTime)
        ADEntryMap.Insert(currentHash, newADEntry)
      endif
    endif
  endwhile
![Page 29: Automated Worm Fingerprinting](https://reader036.fdocuments.us/reader036/viewer/2022062316/5681684f550346895dde4e57/html5/thumbnails/29.jpg)
Statistics
• Implementation is written in C
• The aggregator also uses the MySQL database to log all events
• Uses the popular rrd-tools library for graphical reporting
• PHP scripting for administrative control
![Page 30: Automated Worm Fingerprinting](https://reader036.fdocuments.us/reader036/viewer/2022062316/5681684f550346895dde4e57/html5/thumbnails/30.jpg)
Content prevalence threshold
• Using a 60-second measurement interval and a whole-packet CRC, over 97 percent of all signatures repeat two or fewer times and 94.5 percent are observed only once
• Using a finer-grained content hash or a longer measurement interval increases these numbers even further
![Page 31: Automated Worm Fingerprinting](https://reader036.fdocuments.us/reader036/viewer/2022062316/5681684f550346895dde4e57/html5/thumbnails/31.jpg)
Address dispersion threshold
After 10 minutes there are over 1000 signatures with a low dispersion threshold of 2
Using a threshold of 30, there are only 5 or 6 prevalent strings meeting the dispersion criteria
![Page 32: Automated Worm Fingerprinting](https://reader036.fdocuments.us/reader036/viewer/2022062316/5681684f550346895dde4e57/html5/thumbnails/32.jpg)
Garbage Collection
When the timeout is set to 100 seconds, then almost 60 percent of all signatures are garbage collected before a subsequent update
Using a timeout of 1000 seconds, this number is reduced to roughly 20 percent of signatures
![Page 33: Automated Worm Fingerprinting](https://reader036.fdocuments.us/reader036/viewer/2022062316/5681684f550346895dde4e57/html5/thumbnails/33.jpg)
Positives
• Automatic detection, characterization and containment
• Low processor time consumed
• Low memory consumption
• Identifies new worms and produces signatures, even for email worms
![Page 34: Automated Worm Fingerprinting](https://reader036.fdocuments.us/reader036/viewer/2022062316/5681684f550346895dde4e57/html5/thumbnails/34.jpg)
Problems
• Cannot identify worms with little or no invariant content
• Attackers can use compression modules such as zip to confuse Earlybird
• Worm traffic carried over IPSec, SSL or VPNs cannot be inspected, so it cannot be secured
• Attackers may attempt to evade the monitor through traditional IDS evasion techniques, such as IP spoofing
• Stealth worms are difficult to identify
• An attacker can deliberately trigger the worm defense to deny some service by spreading similar packets
![Page 35: Automated Worm Fingerprinting](https://reader036.fdocuments.us/reader036/viewer/2022062316/5681684f550346895dde4e57/html5/thumbnails/35.jpg)
Suggestions
• Uncompress packets and identify the original contents
• Deploy the system as a firewall for secure protocols
• Use triggering data across time scales (in the paper) or maintain a history of slowly repeating data
• Check how the suspected worm works: see whether it really behaves as a worm on infected systems
![Page 36: Automated Worm Fingerprinting](https://reader036.fdocuments.us/reader036/viewer/2022062316/5681684f550346895dde4e57/html5/thumbnails/36.jpg)
Questions