Uploaded by jamie-yetman
Chapter 4
Anti-Virus
Anti-Virus
Three tasks for anti-virus
1. Detection
o Infected or not? Provably undecidable…
2. Identification
o May be separate from detection, depending on the detection method used
3. Disinfection
o Remove the virus
Detection: Static Methods
Generic methods
o Detect known and unknown viruses
o For example, anomaly detection
Virus-specific methods
o Detect known viruses
o For example, signature detection
Static --- virus code not running
Dynamic --- virus code running
Detection Outcomes
Also can have a ghost positive
o Virus remnant “detected”
o But virus is no longer there
How can this happen?
o Previous disinfection was incomplete
Static Detection
Detection without running virus code
Three approaches…
1. Scanners
o Signature
2. Heuristics
o Look for “virus-like” code
3. Integrity Checkers
o Hash/checksum
Scanners
On-demand
o Files scanned when you say so
On-access
o Constant scanning in background
o Whenever a file is accessed, it’s scanned
Scanners
Signature scanning
o Viruses represented by a “signature”
o Signature == pattern of bits in a virus (might include wildcards)
“Hundreds of thousands of signatures”
Not feasible to scan for them one by one
o Multiple pattern search
o Efficiency is critical
We look in detail at several algorithms
Algorithm: Aho-Corasick
Developed in 1975 for bibliographic search
Based on a finite automaton (graph)
o Circles are search states
o Edges are transitions
o Double circles are final states/output
And a failure function
o What to do when there is no suitable transition
o I.e., where to resume “matching”
Algorithm: Aho-Corasick
When virus scanning, search for a virus signature, which is a bit string
For simplicity, illustrate the algorithm using English words
For our example…
Scan for any of the following words:
o hi, hips, hip, hit, chip
Algorithm: Aho-Corasick
[Figure: Aho-Corasick example]
Algorithm: Aho-Corasick
How to construct the automaton?
o And the failure function
Build the automaton --- next slide
o A “trie”, also known as a “prefix tree”
Then determine the failure function
o Two slides ahead
Aho-Corasick: Trie
States are labeled in breadth-first order
States closest to the root get the smallest numbers
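Aho-Corasick: Failure Function
Depth 1 nodes
o Fail goes back to the start state
For other states
o Go back to the earliest place where the search can resume, i.e., the state for the longest proper suffix of the text matched so far that is also a prefix in the trie
o Pseudo-code is in the book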
Aho-Corasick
The bottom line…
Linear search that can find multiple signatures
o Like searching in parallel for related signatures
Efficient representation of the automaton is the challenge
o Both time and space issues
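The construction (trie, then failure function) and the single-pass search can be sketched in Python using the example words. This is a minimal illustration under simplifying assumptions: characters stand in for bytes, transitions are stored in dictionaries (not an efficient representation), there are no wildcards, and the function names are hypothetical. State numbers follow insertion order rather than the breadth-first numbering on the trie slide.

```python
from collections import deque


def build_automaton(words):
    """Build the trie (goto), the failure function, and the output sets."""
    goto = [{}]        # goto[state][char] -> next state; state 0 is the start state
    fail = [0]         # fail[state] -> state to fall back to when no transition fits
    output = [set()]   # output[state] -> words recognized on reaching this state
    for word in words:
        state = 0
        for ch in word:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                output.append(set())
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        output[state].add(word)

    # Breadth-first pass: depth-1 states keep fail = 0 (the start state); a deeper
    # state fails to the longest proper suffix of its string that is also in the trie.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            output[nxt] |= output[fail[nxt]]  # also report words ending at the suffix


    return goto, fail, output


def search(text, goto, fail, output):
    """One left-to-right pass over text; returns sorted (start index, word) pairs."""
    matches = []
    state = 0
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]          # resume matching at the failure state
        state = goto[state].get(ch, 0)
        for word in output[state]:
            matches.append((i - len(word) + 1, word))
    return sorted(matches)


automaton = build_automaton(["hi", "hips", "hip", "hit", "chip"])
print(search("chips", *automaton))
# [(0, 'chip'), (1, 'hi'), (1, 'hip'), (1, 'hips')]
```

In “chips”, the words “hi”, “hip”, and “hips” are found without rescanning: on the final “s” there is no transition out of the “chip” state, so the failure function hands the search to the “hip” state, which does have one.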
Algorithm: Veldman
Linear search on “reduced” signatures
o Sequential search on reduced set
From each signature, select 4 adjacent non-wildcard bytes
o Choose them so that as many signatures as possible share each selected 4-byte pattern
Then use 2 hash tables to filter…
o One hash table for the 1st 2 bytes of each pattern, one for the 2nd 2 bytes
Algorithm: Veldman
Example: Suppose we have the following 5 signatures
o blar?g, foo, greep, green, agreed
Select 4-byte patterns with no wildcards
Algorithm: Veldman
Hashes act as filters
Test things that pass through both filters
o In this example, get things like “grar”, which passes both filters but is not a selected pattern
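Veldman-style filtering on the example signatures can be sketched in Python. The selection rule is one reading of the slide: for each signature, take the wildcard-free 4-byte window that the most signatures share. Signatures with no such window, like “foo”, are skipped here because the slides do not say how Veldman handles them. '?' is a single-byte wildcard, and all names are hypothetical.

```python
from collections import defaultdict

K = 4  # length of the reduced pattern taken from each signature


def windows(sig):
    """(offset, pattern) for every K-byte run of sig that contains no '?' wildcard."""
    return [(i, sig[i:i + K]) for i in range(len(sig) - K + 1)
            if "?" not in sig[i:i + K]]


def build(signatures):
    # How many signatures contain each candidate pattern?
    counts = defaultdict(int)
    for sig in signatures:
        for pattern in {p for _, p in windows(sig)}:
            counts[pattern] += 1
    patterns = defaultdict(list)  # pattern -> [(signature, offset of pattern in it)]
    for sig in signatures:
        candidates = windows(sig)
        if not candidates:
            continue  # e.g. "foo": no 4-byte run; not covered by this sketch
        offset, best = max(candidates, key=lambda c: counts[c[1]])
        patterns[best].append((sig, offset))
    first = {p[:2] for p in patterns}   # filter 1: first 2 bytes of each pattern
    second = {p[2:] for p in patterns}  # filter 2: last 2 bytes of each pattern
    return patterns, first, second


def matches_at(sig, text, start):
    """Full comparison of a signature at text[start:], '?' matching any byte."""
    if start < 0 or start + len(sig) > len(text):
        return False
    return all(s == "?" or s == t for s, t in zip(sig, text[start:]))


def scan(text, patterns, first, second):
    found = []
    for i in range(len(text) - K + 1):
        # Cheap filters first; only survivors are looked up and verified.
        if text[i:i + 2] in first and text[i + 2:i + K] in second:
            for sig, offset in patterns.get(text[i:i + K], []):
                if matches_at(sig, text, i - offset):
                    found.append((i - offset, sig))
    return found


patterns, first, second = build(["blar?g", "foo", "greep", "green", "agreed"])
print(sorted(patterns), sorted(first), sorted(second))
# ['blar', 'gree'] ['bl', 'gr'] ['ar', 'ee']
print(scan("xxagreedyyblarzg", patterns, first, second))
# [(2, 'agreed'), (10, 'blar?g')]
```

“grar” gets past both filters (“gr” is in the first, “ar” in the second) and is rejected only at the pattern lookup, which is exactly the kind of candidate the slide mentions.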
Algorithm: Veldman
Veldman allows for wildcards and complex signatures
o Aho-Corasick does not
But both algorithms analyze every byte of input
Is it possible to do better?
o That is, can we skip some of the input?
Algorithm: Wu-Manber
Like Veldman’s algorithm
o But can skip over bytes that can’t possibly match
o Faster, improved performance
Illustrate the algorithm with the same signatures used for Veldman’s:
o blar?g, foo, greep, green, agreed
Algorithm: Wu-Manber
Calculate MINLEN
o Min length of any pattern substring
Two hash tables
o SHIFT --- number of bytes that can safely be skipped
o HASH --- mapping to signatures
Input bytes denoted b1, b2, …, bn
Start at b_MINLEN and consider byte pairs (each byte together with the byte before it)
Algorithm: Wu-Manber
Example: Suppose the hash tables are…
[Figure: Wu-Manber example]
Here, MINLEN = 3
Start at b_MINLEN
Algorithm: Wu-Manber
How to construct the hash tables? It’s a 4-step process
o Calculate MINLEN
o Initialize SHIFT table
o Fill SHIFT table
o Fill HASH table
Algorithm: Wu-Manber
Calculate MINLEN
o Minimum number of adjacent, non-wildcard bytes at the start of any signature
For this example, we have
o blar?g: 4
o foo: 3
o greep: 5
o green: 5
o agreed: 6
So we have MINLEN = 3
Algorithm: Wu-Manber
SHIFT table
Extract MINLEN pattern substrings
o blar?g → bla
o foo → foo
o greep → gre
o green → gre
o agreed → agr
Extract all distinct 2-byte sequences
o bl, la, fo, oo, gr, re, ag
If an input pair is not one of these, it is safe to skip MINLEN - 1 bytes
Algorithm: Wu-Manber
SHIFT table
Initialize SHIFT table to MINLEN - 1
For 2-byte pairs: bl, la, fo, oo, gr, re, ag
o Denote a pair as xy
o Let q_xy be the rightmost ending position of xy in any pattern substring
o For example, gr ends at position 3 in agr (and at position 2 in gre), while bl ends at position 2 in bla
o So, q_gr = 3 while q_bl = 2
o Then set SHIFT[xy] = MINLEN - q_xy
o E.g., SHIFT[gr] = 3 - 3 = 0 and SHIFT[bl] = 3 - 2 = 1
Note: Wildcard matches everything…
Algorithm: Wu-Manber
HASH table
If SHIFT[xy] = MINLEN - q_xy = 0
o Then we are at the right edge of a pattern substring
So, set HASH[xy] to all signatures whose pattern substring ends in xy
For example
o HASH[gr] = agreed
o HASH[re] = greep, green
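The four construction steps and the skipping scan can be sketched in Python on the same signatures. This is the simplest form only, under stated assumptions: '?' is a one-byte wildcard, MINLEN is the length of the leading non-wildcard run (which matches the “blar?g: 4” count), MINLEN >= 2, and the pattern substrings themselves contain no wildcard. The names are hypothetical.

```python
def build_tables(signatures):
    """The 4 construction steps for the simplest Wu-Manber variant."""
    # Step 1: MINLEN = shortest run of non-wildcard bytes at the start of a signature.
    minlen = min(len(sig.split("?", 1)[0]) for sig in signatures)
    # Step 2: every SHIFT entry conceptually starts at MINLEN - 1; here that value
    # is the default used for pairs that are not stored in the dict.
    shift = {}
    hash_table = {}
    for sig in signatures:
        prefix = sig[:minlen]                   # the pattern substring
        # Step 3: q is the 1-based position where pair xy ends in the prefix;
        # keeping the minimum shift keeps the rightmost q over all prefixes.
        for q in range(2, minlen + 1):
            xy = prefix[q - 2:q]
            shift[xy] = min(shift.get(xy, minlen - 1), minlen - q)
        # Step 4: the last pair of each prefix has SHIFT 0; map it to the signature.
        hash_table.setdefault(prefix[-2:], []).append(sig)
    return minlen, shift, hash_table


def matches_at(sig, text, start):
    if start + len(sig) > len(text):
        return False
    return all(s == "?" or s == t for s, t in zip(sig, text[start:]))


def scan(text, minlen, shift, hash_table):
    found = []
    pos = minlen - 1                      # 0-based index of b_MINLEN
    while pos < len(text):
        xy = text[pos - 1:pos + 1]        # pair ending at the current byte
        step = shift.get(xy, minlen - 1)  # unknown pair: skip MINLEN - 1 bytes
        if step == 0:                     # right edge of some pattern substring
            start = pos - minlen + 1
            for sig in hash_table.get(xy, []):
                if matches_at(sig, text, start):
                    found.append((start, sig))
            step = 1
        pos += step
    return found


minlen, shift, hash_table = build_tables(["blar?g", "foo", "greep", "green", "agreed"])
print(minlen, shift)
# 3 {'bl': 1, 'la': 0, 'fo': 1, 'oo': 0, 'gr': 0, 're': 0, 'ag': 1}
print(scan("xagreedfoo", minlen, shift, hash_table))
# [(1, 'agreed'), (7, 'foo')]
```

The resulting HASH table holds gr → agreed and re → greep, green, as on the slide. In the scan, the pair “ee” is not in SHIFT, so the search jumps 2 bytes past it without examining the skipped byte.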
Algorithm: Wu-Manber
Here, we illustrated the simplest form of the algorithm
More advanced forms can handle 10s of thousands of signatures
Worst-case performance is terrible
o Sequential search through every byte of input for every signature…
But tests show it’s good in practice
Testing
How can we know if a scanner works?
Test on live viruses?
o Might not be a good idea
EICAR standard antivirus test file
o Not too useful either
So, what to do?
o Author doesn’t have any suggestions!
Improving Performance
“Grunt scanning” --- scan everything
o Slow slow slow
Search only the beginning and end of files
Scan the code entry point
o And points reachable from the entry point
If the position of the virus in the file is known…
o Make it part of the “signature”
Limit scans to the size of the virus(es)
Improving Performance
Only scan certain types of files
o Not so viable today
Only rescan files that have changed
o How to detect change?
o Where to store this info? Cache? Database? Tagged to file?
o Updates to signatures? Must rescan…
o How to checksum efficiently?
Improving Performance
How to checksum efficiently?
o Checksumming the entire file might take longer than scanning it
o Only checksum the parts that are scanned
How to avoid checksum tampering?
o Encrypt? Where to store the key?
o Checksum the checksums?
o Other?
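One way to “only checksum the parts that are scanned” is to hash the file size plus only the head and tail regions a scanner looks at. The Python sketch below assumes hypothetical 4096-byte regions and a hypothetical function name. It makes the trade-off visible: a change in the unhashed middle of the file goes unnoticed.

```python
import hashlib
import os

HEAD = 4096  # hypothetical size of the scanned region at the start of the file
TAIL = 4096  # hypothetical size of the scanned region at the end of the file


def partial_checksum(path, head=HEAD, tail=TAIL):
    """SHA-256 over the file size plus only its first `head` and last `tail` bytes."""
    size = os.path.getsize(path)
    h = hashlib.sha256(size.to_bytes(8, "little"))  # appended data changes the size
    with open(path, "rb") as f:
        h.update(f.read(head))
        if size > head:
            f.seek(max(head, size - tail))  # do not hash overlapping bytes twice
            h.update(f.read(tail))
    return h.hexdigest()
```

For a 10,000-byte file this reads bytes 0 to 4095 and 5904 to 9999, so it costs about 8 KB of I/O however large the file grows. Tampering with the stored checksums is a separate question that this sketch does not address.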
Improving Performance
Improve the algorithm
o Maybe tailor algorithms to file type
Optimize implementation
o May be of limited value
Other?
Static Heuristics
Like having an expert look at code…
Look for “virus-like” code
o Static, so we don’t execute the code
2-step process
o Gather data
o Analyze data
Static Heuristics
What data to gather?
“Short signatures” or boosters
o Junk code
o Decryption loop
o Self-modifying code
o Undocumented API calls
o Unusual/non-compiler instructions
o Strings containing obscenities or “virus”
Stoppers --- things a virus would not do
Static Heuristics
Other heuristics include…
Length of code
o Too short? May be an appended virus
Statistical analysis of instructions
o Handwritten assembly
o Encrypted code
Might look for signature heuristics
o Common characteristics of signatures
Static Heuristics
Analysis phase
May be simple…
o Weighted sum of various factors
o Unusual opcodes, etc.
…or complex
o Machine learning (HMM, neural nets, etc.)
o Data mining
o Heuristic search (genetic algorithm, etc.)
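A minimal Python sketch of the simple analysis option, a weighted sum of the gathered factors compared against a threshold. Boosters add to the score and a stopper subtracts from it. The feature names, weights, and threshold are all hypothetical; a real product would tune them against false positives.

```python
# Hypothetical feature names and weights: boosters add to the score,
# a stopper (something a virus would not do) subtracts from it.
WEIGHTS = {
    "decryption_loop": 3.0,
    "self_modifying_code": 3.0,
    "undocumented_api_call": 2.0,
    "virus_string": 2.0,
    "unusual_instruction": 1.5,
    "junk_code": 1.0,
    "writes_output_to_user": -4.0,  # stopper
}
THRESHOLD = 5.0  # hypothetical cut-off


def heuristic_score(features):
    """features: feature name -> number of times the data-gathering phase saw it."""
    return sum(WEIGHTS.get(name, 0.0) * count for name, count in features.items())


def looks_viral(features, threshold=THRESHOLD):
    return heuristic_score(features) >= threshold


print(looks_viral({"decryption_loop": 1, "self_modifying_code": 1}))  # True (score 6.0)
print(looks_viral({"decryption_loop": 1, "self_modifying_code": 1,
                   "writes_output_to_user": 1}))                       # False (score 2.0)
```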
Integrity Checkers
Look for unauthorized changes to files
Start with 100% clean files
Compute checksums/hashes
Store checksums
Recompute checksums and compare
o If they differ, a change has occurred
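The integrity-checking steps can be sketched in Python with SHA-256 from the standard library. Storing the baseline in a JSON file and the function names are assumptions made for illustration. Where to keep that file, and how to protect it, are the open questions raised on the earlier performance slides.

```python
import hashlib
import json


def file_hash(path):
    """SHA-256 of a whole file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def store_baseline(paths, store_path):
    """Run once on known-clean files: compute and store their checksums."""
    with open(store_path, "w") as f:
        json.dump({p: file_hash(p) for p in paths}, f)


def changed_files(store_path):
    """Recompute and compare; any difference (or a missing file) is reported."""
    with open(store_path) as f:
        baseline = json.load(f)
    changed = []
    for path, digest in baseline.items():
        try:
            if file_hash(path) != digest:
                changed.append(path)
        except FileNotFoundError:
            changed.append(path)
    return changed
```

A change is only reported, not identified. An integrity checker cannot tell a virus from a legitimate update.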
Integrity Checkers
3 types of integrity checkers
Offline --- recompute checksums periodically (e.g., once/week)
Self-checking --- modify file to check itself when run
o Essentially, a beneficial “virus”
o For example, virus scanner self-checks
Integrity shell --- OS performs checksum before a file is executed
Detection: Dynamic Methods
Detection based on running the code
o Observe the “behavior”
Two types of dynamic methods
o Behavior monitors/blockers
o Emulation
Behavior Monitor/Blocker
Monitor program as it runs
Watch for “suspicious” behavior
What is suspicious?
o It’s too far from “normal”
What is normal?
o A statistical measure, such as the mean (average)
How far is too far?
o Depends on variance, standard deviation
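“Too far from normal” can be sketched as a standard-deviation test. The Python below flags an observation that lies more than k standard deviations from the mean of past normal behavior. The monitored measure and k = 3 are hypothetical choices, and the function name is illustrative only.

```python
from statistics import mean, stdev


def is_suspicious(history, observed, k=3.0):
    """Flag `observed` if it is more than k standard deviations from the mean.

    history: values of some monitored measure (e.g. file writes per second)
    recorded while the program behaved normally; needs at least 2 values.
    k = 3.0 is an arbitrary, hypothetical threshold.
    """
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        return observed != mu  # no variation seen: any change is "too far"
    return abs(observed - mu) / sigma > k


normal_writes = [10, 12, 11, 9, 13, 10, 11, 12]
print(is_suspicious(normal_writes, 12))  # False
print(is_suspicious(normal_writes, 40))  # True
```

Lowering k catches more unusual behavior but also produces more false positives, which is the fundamental problem discussed on the following slides.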
Behavior Monitor/Blocker
“Normal” monitored in 3 ways…
1. Actions that are permitted
o White list, positive detection
2. Actions that are not permitted
o Black list, negative detection
3. Some combination of these two
Analogies to immune system
o Distinguish self from non-self
Behavior Monitor/Blocker
“Care must be taken… because anomalous behavior does not automatically imply viral behavior”
o That’s an understatement!
This is the fundamental problem in anomaly detection
o Potential for lots of false positives
Behavior Monitor/Blocker
Look for short “dynamic signatures”
o Like signature detection, but the input string is generated dynamically
But what to monitor? Infection-like behavior?
o Open an exe for read/write
o Read code start address from header
o Write start address to header
o Seek to end of exe, append to exe, etc.
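A dynamic signature can be matched much like a static one, except the input is the stream of observed actions. The Python sketch below checks whether one target file sees the infection-like steps above in order, with other actions allowed in between. The event names and the (action, target) format are hypothetical; a real monitor would derive them from hooked system calls.

```python
# Hypothetical event names for the infection-like steps listed above.
INFECTION_PATTERN = [
    "open_exe_read_write",
    "read_header_start_address",
    "write_header_start_address",
    "seek_to_end",
    "append",
]


def match_dynamic_signature(events, pattern=INFECTION_PATTERN):
    """events: (action, target_file) pairs in the order the monitor observed them.
    Returns the first target whose actions contain the pattern in order
    (other actions may be interleaved), or None."""
    per_target = {}
    for action, target in events:
        per_target.setdefault(target, []).append(action)
    for target, actions in per_target.items():
        remaining = iter(actions)
        # `step in remaining` consumes the iterator, so the order is enforced.
        if all(step in remaining for step in pattern):
            return target
    return None


trace = [
    ("open_exe_read_write", "host.exe"),
    ("read_config", "app.ini"),
    ("read_header_start_address", "host.exe"),
    ("write_header_start_address", "host.exe"),
    ("seek_to_end", "host.exe"),
    ("append", "host.exe"),
]
print(match_dynamic_signature(trace))  # host.exe
```

An installer that opens, seeks in, and appends to an exe without rewriting its start address does not match. Even so, a pattern this short can still fire on legitimate tools, which is why the next slide raises ownership and false positives.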
Behavior Monitor/Blocker
How to reduce false positives?
o Consider “ownership” --- some apps get more leeway (e.g., a browser clearing its cache)
How to prevent damage?
o “Dynamic” implies the code is actually running…
o System undo capability?
How long to monitor?
o Monitoring increases overhead
o Can a virus outlast the monitor?
Emulation
Execute code, but not for real…
Instead, emulate execution
Emulation can provide all of the info obtained through code execution
o But much safer
“Execute” code in emulator
o Gather info for static/dynamic signatures or heuristics
o Behavior blocker stuff applies too
Emulation
Emulation and polymorphic detection
o Let the virus decrypt itself
o Then use an ordinary signature scan
When has decryption occurred?
o Use some heuristics…
o Execution of code that was modified (decrypted), or of code in such a memory location
o More than N bytes of modified code, etc.
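The “let it decrypt itself, then scan” idea can be sketched in Python with the two heuristics above: executing a modified location, or more than N modified bytes. The toy_decryptor generator is a hypothetical stand-in for a real emulator and simply reports the events an XOR decryption loop would produce. The event format, N = 64, and all names are assumptions.

```python
def toy_decryptor(memory, start, length, key):
    """Hypothetical stand-in for an emulated decryption loop: yields the events an
    emulator would report (instruction fetches and memory writes)."""
    for addr in range(start, start + length):
        yield ("exec", 0)                          # loop code lives at address 0
        yield ("write", addr, memory[addr] ^ key)  # XOR-decrypt one byte
    yield ("exec", start)                          # jump into the decrypted body


def run_until_decrypted(memory, events, max_modified=64):
    """Apply emulated writes and stop when a heuristic says decryption happened:
    executing a modified location, or more than `max_modified` bytes modified."""
    modified = set()
    for event in events:
        if event[0] == "write":
            _, addr, value = event
            memory[addr] = value
            modified.add(addr)
            if len(modified) > max_modified:
                return True
        elif event[0] == "exec" and event[1] in modified:
            return True
    return False


def detect_after_decryption(memory, events, signature):
    """Let the code decrypt itself in the emulator, then do an ordinary signature scan."""
    return run_until_decrypted(memory, events) and signature in memory


body = b"VIRUS-BODY"
key = 0x5A
memory = bytearray(b"\x90" * 4 + bytes(b ^ key for b in body))  # encrypted on disk
print(body in memory)  # False: a static scan of the file finds nothing
print(detect_after_decryption(memory, toy_decryptor(memory, 4, len(body), key), body))  # True
```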
Emulator Anatomy
Emulate by single-stepping through code?
o Easily detected by viruses (???)
o Danger of virus “escaping” emulator
“A more elaborate emulation mechanism is needed”
o Why?
Conceptually, 5 parts to an emulator
o Next slide please…
Emulator Anatomy
5 parts to new-and-improved emulator
1. CPU emulation --- nothing more to say
2. Memory emulation
3. Hardware and OS emulation
4. Emulation controller
5. Extra analyses
Memory Emulation
This could be difficult…
o 32-bit addressing, so 4 GB of “memory”
Do we need to emulate all of this?
o No, most apps only use a small amount
Keep track of memory that’s modified and where it is located
o Only need to deal with memory that is modified by a specific app/virus
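“Only deal with memory that is modified” can be sketched as a sparse, page-based address space. The Python class below covers a full 32-bit range but stores only pages that have been written. Reads of untouched pages come from the loaded image or return 0. The 4 KiB page size, the copy-on-write policy, and the class name are assumptions.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages


class SparseMemory:
    """A 4 GB (32-bit) address space in which only pages that have been written
    are actually stored; everything else is read from the loaded image or is 0."""

    def __init__(self, image=b"", base=0):
        self.image = image    # read-only backing, e.g. the file mapped at `base`
        self.base = base
        self.pages = {}       # page number -> bytearray, only for modified pages

    def read(self, addr):
        page = self.pages.get(addr // PAGE_SIZE)
        if page is not None:
            return page[addr % PAGE_SIZE]
        offset = addr - self.base
        return self.image[offset] if 0 <= offset < len(self.image) else 0

    def write(self, addr, value):
        if not 0 <= addr < 2 ** 32:
            raise ValueError("address outside the 32-bit address space")
        number = addr // PAGE_SIZE
        if number not in self.pages:
            # Copy-on-write: the new page starts with the page's current contents.
            start = number * PAGE_SIZE
            self.pages[number] = bytearray(self.read(a) for a in range(start, start + PAGE_SIZE))
        self.pages[number][addr % PAGE_SIZE] = value & 0xFF

    def modified_pages(self):
        """Start addresses of pages the emulated program has modified."""
        return sorted(number * PAGE_SIZE for number in self.pages)


mem = SparseMemory(image=b"\x01\x02\x03", base=0x400000)
mem.write(0x400001, 0xFF)
mem.write(0xFFFFFFF0, 0x07)
print(mem.read(0x400001), mem.read(0x400002), len(mem.pages))  # 255 3 2
print([hex(a) for a in mem.modified_pages()])                  # ['0x400000', '0xfffff000']
```

Two writes at opposite ends of the 4 GB space cost only two 4 KiB pages. The modified_pages list is also exactly what the decryption heuristics need to look at.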
Hardware/OS Emulation
Use a stripped-down, fake OS, due to…
o Copyright issues
o Size
o Startup time
o Emulator needs additional monitoring
What about OS system calls?
o Return faked/fixed values
o Don’t faithfully emulate some low-level stuff
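“Return faked/fixed values” can be as simple as a lookup table behind a logging hook. In the Python sketch below, the call names and canned results are hypothetical placeholders, not any real OS API. Every call is logged, giving the behavior monitor its extra visibility.

```python
# Hypothetical call names and canned results; a real emulator's list would be far longer.
FAKE_RESULTS = {
    "get_tick_count": 123456,  # fixed "time", so timing tricks see a stable value
    "get_os_version": 0x0A00,
    "open_file": 3,            # pretend file handle
}


class FakeOS:
    """Stripped-down OS stand-in: logs every call for the monitors and returns
    faked/fixed values instead of faithfully emulating the real behavior."""

    def __init__(self):
        self.log = []

    def syscall(self, name, *args):
        self.log.append((name, args))     # extra monitoring hook
        return FAKE_RESULTS.get(name, 0)  # unknown calls just "succeed" with 0


fake = FakeOS()
print(fake.syscall("get_tick_count"), fake.syscall("delete_file", "C:/x"))  # 123456 0
```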
Emulation Controller
When does emulation stop?
o Can’t expect to run code to completion…
Use heuristics to decide when to stop
o Number of instructions?
o Amount of time?
o Threshold on percent of instructions that modify memory?
o “Stoppers”? E.g., assume a virus wouldn’t write output before being malicious
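The stopping heuristics above can be combined in a controller loop, sketched here in Python. The step() hook, its return dictionary, every limit, and the choice to stop when the write fraction is high are hypothetical. The slide names a memory-write threshold but not its direction.

```python
import time


def control_emulation(step, max_instructions=100_000, max_seconds=1.0,
                      max_write_fraction=0.5, min_instructions=1000):
    """Drive an emulator until a stopping heuristic fires; returns the reason.

    step(): hypothetical hook that emulates one instruction and returns a dict
    such as {"writes_memory": True, "writes_output": False, "halted": False}.
    All limits are hypothetical defaults.
    """
    deadline = time.monotonic() + max_seconds
    writes = 0
    for count in range(1, max_instructions + 1):
        info = step()
        if info.get("halted"):
            return "halted"
        if info.get("writes_output"):
            return "stopper"  # assumed: a virus would not write output before acting
        writes += bool(info.get("writes_memory"))
        if count >= min_instructions and writes / count > max_write_fraction:
            return "write-fraction threshold"
        if time.monotonic() > deadline:
            return "time limit"
    return "instruction limit"


print(control_emulation(lambda: {}, max_instructions=10))  # instruction limit
```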
Emulator: Extra Analyses
Post-emulation analysis
For example, look at a histogram of instructions
o Does it match typical polymorphic code?
o Does it match a metamorphic family?
Other examples of post-emulation analysis???
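One way to compare an instruction histogram with known profiles is cosine similarity, sketched below in Python. The similarity measure is a swapped-in choice, not something the slides specify. The reference profiles, the 0.9 threshold, and the function names are hypothetical.

```python
from collections import Counter
from math import sqrt


def cosine_similarity(h1, h2):
    """Similarity of two instruction histograms (1.0 = same proportions)."""
    dot = sum(h1[op] * h2[op] for op in h1.keys() & h2.keys())
    norm = sqrt(sum(v * v for v in h1.values())) * sqrt(sum(v * v for v in h2.values()))
    return dot / norm if norm else 0.0


def closest_family(trace, references, threshold=0.9):
    """trace: mnemonics executed during emulation; references: name -> Counter.
    Returns the most similar reference if it clears the (hypothetical) threshold."""
    histogram = Counter(trace)
    best, best_score = None, 0.0
    for name, reference in references.items():
        score = cosine_similarity(histogram, reference)
        if score > best_score:
            best, best_score = name, score
    return best if best_score >= threshold else None


references = {  # hypothetical reference profiles
    "xor-decryptor family": Counter({"xor": 50, "inc": 50, "loop": 50}),
    "compiled code": Counter({"mov": 60, "call": 20, "push": 20}),
}
print(closest_family(["xor", "inc", "loop"] * 10, references))  # xor-decryptor family
print(closest_family(["nop"] * 10, references))                 # None
```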
If at First You Don’t Succeed
Emulation controller may re-invoke emulator for the following reasons
o Rerun with different CPU parameters
o Test interrupt handlers
o Test multiple possible entry points
o Test for self-replication on “goat” files
o Test untaken branches in code
o Test “unused” memory locations
Emulator Optimizations
Improve performance, reduce size and/or complexity
o Use the real file system (with caution)
o “Data” files must be checked for malware, use lots of stoppers
o Cache state --- if a match is found to a previous (non-virus) run, go to the next file
Cache register values, size, stack pointer and contents, number of writes, checksums, etc.
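The state cache can be sketched in Python by hashing the listed state items into a fingerprint and remembering the fingerprints of runs that turned out clean. The parameter set mirrors the slide's list, but the names and the hashing layout are assumptions. A cache hit is only meaningful if the fingerprint covers enough state (hence the checksum) that identical fingerprints imply identical future behavior.

```python
import hashlib


def state_fingerprint(registers, stack_bytes, file_size, write_count, checksum):
    """Hash of the emulator state items listed above."""
    h = hashlib.sha256()
    for name in sorted(registers):  # fixed order so equal states hash equally
        h.update(f"{name}={registers[name]};".encode())
    h.update(len(stack_bytes).to_bytes(8, "little") + stack_bytes)
    h.update(f"size={file_size};writes={write_count};checksum={checksum}".encode())
    return h.hexdigest()


class CleanStateCache:
    """Remembers fingerprints of states reached in runs that turned out clean."""

    def __init__(self):
        self.seen = set()

    def already_clean(self, fingerprint):
        return fingerprint in self.seen  # match: skip to the next file

    def mark_clean(self, fingerprint):
        self.seen.add(fingerprint)
```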
Comparison of Techniques
Recall, the techniques considered…
1. Scanning
2. Static heuristics
3. Integrity check
4. Behavior blocker
5. Emulation
Comparison of Techniques
Scanning
Pros:
o Precise ID of malware
Cons:
o Requires up-to-date signatures
o Cannot detect new/unknown malware
Comparison of Techniques
Static heuristics
Pros:
o Detect known and unknown malware
Cons:
o Detected malware not identified
o False positives
Comparison of Techniques
Integrity check
Pros:
o Can be efficient and fast
o Detect known and unknown malware
Cons:
o Detected after infection & not identified
o Can’t detect malware in a new/modified file (no clean checksum to compare against)
o Heavy burden on users/admins
Comparison of Techniques
Behavior blocker
Pros:
o Known and unknown malware detected
Cons:
o Probably won’t identify malware
o High overhead
o False positives
o Malware runs on system before being detected
Comparison of Techniques
Emulation
Pros:
o Known, unknown, polymorphic detection
o Malware executed in “safe” environment
Cons:
o Slow
o Malware might outlast emulator
o Might not provide identification
Detection: Bottom Line
Static analysis is fast
o Good approach when it works
Dynamic analysis can “peel away a layer of obfuscation”
o Dynamic analysis is relatively costly
Verification, Quarantine, Disinfect
What to do after a virus is detected?
1. Verify that it really is a virus
2. Quarantine infected code
3. Disinfect --- remove infection
These are done rarely, so they can be slow and costly in comparison to detection
Verification
After detection comes verification
Why verify?
o Secondary test needed due to a short, general signature, or…
o …no signature, due to the detection method
Behavior, heuristic, emulation, etc.
o Do not usually provide identification
Writer might try to make a virus look like some other virus
Verification
How to verify?
“X-ray” the virus
If encrypted, decrypt it, or frequency analysis might suffice
o Like a simple substitution cipher
Extract info/stats, etc.
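For the simplest case, a virus body XORed with one constant byte (a simple substitution on bytes), frequency analysis recovers the key directly. The Python sketch assumes the most frequent plaintext byte is 0x00, which often holds for machine code with many zero bytes but is only an assumption. The function names are illustrative.

```python
from collections import Counter


def guess_xor_key(ciphertext, common_plain_byte=0x00):
    """Assumes a single-byte XOR key and that common_plain_byte is the most frequent
    plaintext byte (zero bytes are often very common in machine code)."""
    most_common, _ = Counter(ciphertext).most_common(1)[0]
    return most_common ^ common_plain_byte


def xray(ciphertext):
    """Recover the key by frequency analysis, then decrypt for further inspection."""
    key = guess_xor_key(ciphertext)
    return key, bytes(b ^ key for b in ciphertext)


plain = b"\x00" * 20 + b"VIRUS" + b"\x00" * 5
cipher = bytes(b ^ 0x37 for b in plain)
key, recovered = xray(cipher)
print(hex(key), recovered == plain)  # 0x37 True
```

Once decrypted, the longer virus-specific checks on the next slide can be applied to the recovered bytes.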
Verification
After x-ray analysis…
o Longer virus-specific signatures
o Checksum all or part of virus
o Call special-purpose verification code
Note that these probably won’t work on (good) metamorphic code
Quarantine
Isolate detected virus from system
o Then ask user if it’s OK to disinfect
o Or do further analysis of virus
How to quarantine a virus?
o Copy to a “quarantine” directory?
o Hide it in an “invisible” location?
o Encrypt it?
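Two of the quarantine options, a quarantine directory plus “encrypt it”, are combined in the Python sketch below. XOR with a fixed byte is used only to make the stored copy inert and unrecognizable, not as real encryption. The key, the file suffixes, and the function names are hypothetical, and name collisions in the quarantine directory are ignored.

```python
import json
import os

KEY = 0xA5  # hypothetical; XOR only makes the stored copy inert, it is not real encryption


def _xor(data):
    return bytes(b ^ KEY for b in data)


def quarantine(path, quarantine_dir):
    """Move a detected file into quarantine_dir in obfuscated form, with metadata."""
    os.makedirs(quarantine_dir, exist_ok=True)
    with open(path, "rb") as f:
        data = f.read()
    stored = os.path.join(quarantine_dir, os.path.basename(path) + ".quarantine")
    with open(stored, "wb") as f:
        f.write(_xor(data))          # stored copy no longer contains the original bytes
    with open(stored + ".json", "w") as f:
        json.dump({"original_path": path}, f)
    os.remove(path)                  # isolate: the original location no longer has it
    return stored


def release(stored):
    """Undo quarantine, e.g. after the user decides it was a false positive."""
    with open(stored + ".json") as f:
        original = json.load(f)["original_path"]
    with open(stored, "rb") as f:
        data = _xor(f.read())
    with open(original, "wb") as f:
        f.write(data)
    os.remove(stored)
    os.remove(stored + ".json")
    return original
```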
Disinfect
Disinfect == remove infection
Not always possible to return a file to its original state
o E.g., file might have been overwritten
Disinfection methods…
Delete the infected file
o Pros and cons?
Disinfect
Disinfection methods…
Restore files from backup
o Pros and cons?
Use virus-specific info
o Info may be found automatically --- compare infected files with uninfected ones
o E.g., an appended virus changes the start address and appends itself to the file, etc.
o Like a chosen plaintext attack
Disinfect
Disinfection methods…
Use virus-behavior-specific info
o E.g., a prepended virus changes the header
Save some info about files
o Header info, for example
o Then changed parts can be restored
o Integrates well with an integrity checker
o Restore parts until the checksum matches…
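“Save some info, then restore parts until the checksum matches” can be sketched in Python for a virus that rewrites the header and appends itself. The saved header, the original length, and a SHA-256 of the clean file are recorded while the file is known clean. The 64-byte header size and the names are hypothetical. A prepended virus would need the body shifted back instead, which this sketch does not attempt.

```python
import hashlib

HEADER_SIZE = 64  # hypothetical: number of leading bytes saved while the file is clean


def save_info(data):
    """Recorded alongside the integrity checksum while the file is known clean."""
    return {"header": data[:HEADER_SIZE], "length": len(data),
            "sha256": hashlib.sha256(data).hexdigest()}


def disinfect(infected, info):
    """Try progressively larger repairs until the checksum matches."""
    header, length = info["header"], info["length"]
    candidates = [
        header + infected[len(header):],          # restore header only
        infected[:length],                        # truncate appended code only
        header + infected[len(header):length],    # both
    ]
    for candidate in candidates:
        if hashlib.sha256(candidate).hexdigest() == info["sha256"]:
            return candidate
    return None  # no repair from saved info works: fall back to backup or deletion


original = bytes(range(256)) * 2
info = save_info(original)
infected = b"JMP!" + original[4:] + b"VIRUSCODE"  # header patched, code appended
print(disinfect(infected, info) == original)       # True
```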
Disinfect
Disinfection methods…
Use the virus to disinfect
o Stealth virus may give original code
Generic disinfection
o Virus may restore code when executed
o Might be dangerous to run virus code…
o …emulation is a better strategy, maybe even disinfect as part of detection
Virus Databases
What to put in a virus database?
o Name of virus?
o Characteristics of virus?
o Signatures?
o Encrypted/hashed signatures?
o Disinfection info?
o Other info?
Virus Databases
How to update database/signatures?
o Push or pull?
o Automatic or manual?
o How often to update?
o How to distribute updates?
o Distribute entire database or deltas?
Also must be able to update AV software
Virus Updates
Update process is a BIG target
o AV’s machines that distribute updates
o Insider attack at AV site
o Trick user into getting “AV” from attacker
o Man-in-the-middle attack on communications between user/AV
Virus Description Languages
AV vendors have specialized virus description languages
2 examples given in the book
Short Subjects
A few quick points…
o Anti-stealth techniques
o Macro viruses
o Compiler optimizations and detection
Anti-Stealth Techniques
Recall, stealth viruses hide their presence
Anti-stealth as part of AV?
o Detect and disable stealth --- check that OS calls go to the right place
o Bypass usual OS features --- direct calls to BIOS, for example
Macro Virus Detection
Macro viruses are tricky to detect
o Macros are in source code
o Easy to change source
o Robust execution when errors occur
So, any change can create a new virus
AV might create a new virus
o E.g., incomplete disinfection
Macro virus can infect other macros
Macro Viruses
One redeeming feature…
They operate in a restricted domain
o So easier to determine “normal”
o Reduces number of false positives
Most/all are not parasitic
o More like companion viruses
All the usual detection techniques can be applied
Macro Viruses: Disinfection
Delete all macros in infected document
Delete all associated macros
Delete macro if in doubt (heuristic)
Emulation to find all macros used by infected macro, and delete them
Basic idea?
o Err on side of caution/deletion
Macro viruses not so common today
Compiler Optimization
Compilers use techniques similar to those of AV
“Optimizing compiler” for detection??
o Constant propagation --- reduces variables
o Dead code (executed, but not needed)
o Polymorphic viruses may have lots of dead code
If used, efficiency could be an issue
o Compilers extensively studied
o Bad cases well-known, so virus writers might take advantage of these
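A toy illustration of the two optimizations on junk-laden, straight-line code follows. Constant propagation folds known values, then dead-code elimination drops instructions whose results are never used. The (dest, op, a, b) instruction format, the register names, and the assumption that instructions have no side effects are hypothetical. This is a sketch of the compiler techniques, not how any AV product is known to apply them.

```python
# Straight-line toy code: (dest, op, a, b); operands are register names (str) or ints.
FOLD = {"add": lambda x, y: (x + y) & 0xFFFFFFFF, "xor": lambda x, y: x ^ y}


def propagate_constants(code):
    """Replace registers whose values are known constants and fold constant operations."""
    known, out = {}, []
    for dest, op, a, b in code:
        a = known.get(a, a) if isinstance(a, str) else a
        b = known.get(b, b) if isinstance(b, str) else b
        if op == "mov" and isinstance(a, int):
            known[dest] = a
        elif op in FOLD and isinstance(a, int) and isinstance(b, int):
            known[dest] = FOLD[op](a, b)
            op, a, b = "mov", known[dest], None   # folded into a constant move
        else:
            known.pop(dest, None)                 # dest is no longer a known constant
        out.append((dest, op, a, b))
    return out


def remove_dead_code(code, live_out):
    """Drop instructions whose results are never used (no side effects assumed)."""
    live, kept = set(live_out), []
    for dest, op, a, b in reversed(code):
        if dest not in live:
            continue                              # executed, but not needed
        live.discard(dest)
        live.update(x for x in (a, b) if isinstance(x, str))
        kept.append((dest, op, a, b))
    return kept[::-1]


junky = [
    ("ebx", "mov", 7, None),        # junk
    ("eax", "mov", 2, None),
    ("ecx", "mov", 3, None),
    ("eax", "add", "eax", "ecx"),
    ("ebx", "xor", "ebx", "ebx"),   # junk
]
print(remove_dead_code(propagate_constants(junky), live_out={"eax"}))
# [('eax', 'mov', 5, None)]
```

Five instructions reduce to one, so differently padded variants normalize to the same short form that a signature could match. The slide's caveat still applies: virus writers know the cases where these analyses become expensive.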