Selected Topics in Automated Diversity
Carnegie Mellon
Stephanie Forrest, University of New Mexico
Mike Reiter and Dawn Song, Carnegie Mellon University
Automated Diversity for Security
· Computer systems are highly uniform
  · Easy targets for standardized attacks
· Use idea of biological diversity:
  · Introduce changes that make each system unique
  · Attack will need to be rewritten for each computer
  · Provide population resilience to unknown environmental threats
· Two approaches:
  · Interface diversity: Adapt vulnerable interfaces such as machine language, system call numbers, and standard library locations
  · Implementation diversity: Utilize diverse implementations of common services
· Two projects:
  · Randomized instruction set emulation [Barrantes, Ackley and Forrest]
  · Behavioral distance for anomaly detection [Gao, Reiter and Song]
Randomized Instruction Set Emulation (RISE)
· An example of interface diversity
· Many current attacks insert binary code into a running program, which is then executed.
· RISE protects the code itself, rather than points of entry:
  · Perimeter defense (e.g., stack protection) not enough.
· Randomize binary code instruction set for every program:
  · Foreign malicious code will try to execute code in the standard format and will fail.
  · Knowledge of a particular translation will gain access only to that particular program.
· Modify compiler/virtual machine to accept this “new” language:
  · Prototype in open-source binary-to-binary translator Valgrind.
  · Related to encrypting compilers.
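The randomization idea above can be illustrated with a small sketch. It assumes the scrambling is a byte-wise XOR with a random per-program mask, applied once at load time and undone at every instruction fetch; the helper names `make_mask` and `xor_bytes` are hypothetical and are not taken from the RISE prototype, which works inside the Valgrind emulator.

```python
import secrets

def make_mask(length: int) -> bytes:
    """Generate a fresh random mask for one program load (hypothetical helper)."""
    return secrets.token_bytes(length)

def xor_bytes(data: bytes, mask: bytes) -> bytes:
    """XOR data with the mask; applying the same mask twice restores the input."""
    return bytes(b ^ m for b, m in zip(data, mask))

# Legitimate code is scrambled at load time with the program's private mask.
program = bytes([0x55, 0x89, 0xE5, 0xC3])  # x86: push ebp; mov ebp, esp; ret
mask = make_mask(len(program))
stored = xor_bytes(program, mask)

# The emulator unscrambles each fetched byte, so legitimate code runs normally.
fetched = xor_bytes(stored, mask)

# Injected code arrives in the standard encoding and was never scrambled,
# so unscrambling it at fetch time turns it into effectively random bytes.
injected = bytes([0x90, 0x90, 0xCC, 0xC3])
fetched_injected = xor_bytes(injected, mask)

print(fetched == program)
```

Because the mask is drawn per program, learning one mask would only help against that one program, which matches the slide's point about knowledge of a particular translation.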
How does foreign code infect a running program?
Results
· Prototype implementation available under GPL from http://www.cs.unm.edu/~immsec:
  · Normal code runs properly.
  · Binary code injection attacks stopped (100% of tested examples).
· Performance (preliminary):
  · Emulation overhead of Valgrind is high.
  · Incremental cost of RISE is small.
  · (Very) roughly a factor of 2 slowdown in current configuration.
  · Significant space penalty:
    · Libraries
    · Mask
Host-Based Anomaly Detector
[Figure: user space and kernel space; a model checks the system call sequence (3 5 11) and answers "Is this system call request anomalous?" (Y/N)]
· Can we use another computer as the model?
Fault-Tolerant System
[Figure: a request is sent to three replicas; their three responses go through output voting]
· Commercial off-the-shelf applications: may not produce the same responses
· Intrusions that do not result in observable deviation in the responses
· Need to observe the behavior
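As a rough illustration of the output voting step in this kind of fault-tolerant system, the sketch below returns the response that a strict majority of replicas agree on. The function name `vote` and the strict-majority rule are assumptions made for the example; the slide does not specify the voting rule.

```python
from collections import Counter
from typing import Optional, Sequence

def vote(responses: Sequence[bytes]) -> Optional[bytes]:
    """Return the response given by a strict majority of replicas, else None."""
    if not responses:
        return None
    winner, count = Counter(responses).most_common(1)[0]
    return winner if count > len(responses) / 2 else None

# Two of three replicas agree, so their response is forwarded.
print(vote([b"200 OK", b"200 OK", b"500 ERR"]))
```

Voting of this kind only compares responses, so an intrusion that leaves the response unchanged passes unnoticed, which is why the slide argues for observing behavior as well.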
The Problem
[Figure: two observed system call sequences, "3 43 5 3 4" and "9 6 302 10 46 6 222". Match?]
· Diverse platform (Linux and Windows)
  · System call numbers observed do not have semantic meanings
  · System calls may not have one-to-one correspondence
  · System call sequences may have different lengths
· Diverse implementation (Apache and Abyss)
  · Correspondence may not exist between individual system calls
Evolutionary Distance
· Are two DNA sequences derived from a common ancestral sequence?
· Evolutionary distance between two DNA sequences
  · Substitutions
  · Deletions
  · Insertions

Example sequences:
  ATGCGTCGTT
  ATCCGCGAT

Aligned with insertion/deletion (I/D) symbols "-":
  ATGC-GTCGTT
  AT-CCG-CGAT

Cost table (symmetric, lower triangle shown):
      A    C    G    T    -
  A   0
  C   0.3  0
  G   0.1  0.1  0
  T   0.2  0.2  0.1  0
  -   0.3  0.6  0.5  0.8  0

$\mathrm{Dist} = 0.5 + 0.6 + 0.8 + 0.2 = 2.1$
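The arithmetic of this example can be checked with a short sketch. It reads the slide's example as the sequences ATGCGTCGTT and ATCCGCGAT aligned as ATGC-GTCGTT over AT-CCG-CGAT, and treats the cost table as symmetric; the names `COST`, `cost`, and `alignment_cost` are illustrative.

```python
GAP = "-"  # the insertion/deletion (I/D) symbol

# Lower-triangular cost table from the slide, stored once per unordered pair.
COST = {
    ("C", "A"): 0.3,
    ("G", "A"): 0.1, ("G", "C"): 0.1,
    ("T", "A"): 0.2, ("T", "C"): 0.2, ("T", "G"): 0.1,
    (GAP, "A"): 0.3, (GAP, "C"): 0.6, (GAP, "G"): 0.5, (GAP, "T"): 0.8,
}

def cost(a: str, b: str) -> float:
    """Look up the cost of pairing a with b; identical symbols cost nothing."""
    if a == b:
        return 0.0
    return COST[(a, b)] if (a, b) in COST else COST[(b, a)]

def alignment_cost(x: str, y: str) -> float:
    """Sum the column costs of two already-aligned, equal-length sequences."""
    if len(x) != len(y):
        raise ValueError("aligned sequences must have the same length")
    return sum(cost(a, b) for a, b in zip(x, y))

total = alignment_cost("ATGC-GTCGTT", "AT-CCG-CGAT")
print(round(total, 1))  # 0.5 + 0.6 + 0.8 + 0.2
```

The four nonzero columns are G/-, -/C, T/-, and T/A. This function scores one given alignment; it does not search for the cheapest one.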
Behavioral Distance and Evolutionary Distance
· Similarities
  · Evaluate the difference between two sequences
  · Substitutions, deletions and insertions
· Differences
  · The same system call number in two sequences does not denote the “same” system call
  · We do not have the cost table in the behavioral distance measure
  · We have training data
Behavioral Distance
· Behavioral distance calculation
· Learning the cost table
  · Initializing the cost table
  · Iteratively updating the cost table
· System call phrase extraction
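Behavioral Distance Calculation
· Two system call sequences:
  $s_1 = \langle s_{1,1}, s_{1,2}, \ldots \rangle$
  $s_2 = \langle s_{2,1}, s_{2,2}, \ldots \rangle$
· $\mathrm{Ext}(s, n)$: the set of sequences obtained by inserting $n - \mathrm{len}(s)$ I/D symbols into $s$, at any location
· Behavioral distance:
  $$\mathrm{Dist}(s_1, s_2) = \min_{s_1' \in \mathrm{Ext}(s_1, n),\; s_2' \in \mathrm{Ext}(s_2, n)} \sum_{i=1}^{n} \mathrm{cost}(s'_{1,i}, s'_{2,i})$$
· Example of two extended sequences:
  ATGC-GTCGTT
  AT-CCG-CGAT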
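The minimization over extended sequences can be sketched as a standard alignment dynamic program. This assumes costs are nonnegative and that pairing two I/D symbols costs nothing, so it is enough to search over where single I/D symbols are placed. The names `dist`, `GAP`, and `unit_cost` are illustrative, and the unit cost function is a stand-in for a learned cost table.

```python
from typing import Callable, Hashable, Optional, Sequence

GAP = None  # stands in for the I/D symbol
Symbol = Optional[Hashable]

def dist(s1: Sequence[Hashable], s2: Sequence[Hashable],
         cost: Callable[[Symbol, Symbol], float]) -> float:
    """Minimum total cost over all ways of padding s1 and s2 with I/D symbols."""
    n1, n2 = len(s1), len(s2)
    # d[i][j] = cheapest alignment of the first i symbols of s1 with the first j of s2
    d = [[0.0] * (n2 + 1) for _ in range(n1 + 1)]
    for i in range(1, n1 + 1):
        d[i][0] = d[i - 1][0] + cost(s1[i - 1], GAP)
    for j in range(1, n2 + 1):
        d[0][j] = d[0][j - 1] + cost(GAP, s2[j - 1])
    for i in range(1, n1 + 1):
        for j in range(1, n2 + 1):
            d[i][j] = min(
                d[i - 1][j - 1] + cost(s1[i - 1], s2[j - 1]),  # pair two symbols
                d[i - 1][j] + cost(s1[i - 1], GAP),            # I/D symbol in s2
                d[i][j - 1] + cost(GAP, s2[j - 1]),            # I/D symbol in s1
            )
    return d[n1][n2]

def unit_cost(a: Symbol, b: Symbol) -> float:
    """Placeholder cost: 0 for identical symbols, 1 for anything else."""
    return 0.0 if a == b else 1.0

print(dist("kitten", "sitting", unit_cost))
```

With `unit_cost` this reduces to ordinary edit distance. For behavioral distance the cost function would instead map a system call on one replica and a system call on the other to a learned cost, since equal numbers on different platforms do not mean the same call.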
Learning the Cost Table
· Training data: subjecting the replicas to a battery of well-formed (benign) requests and observing the system calls induced
· Initializing the cost table
  · The first approach: comparing semantics of individual system calls
  · The second approach: using frequency information
· Iteratively updating the cost table
  · Use the initialized cost table to calculate behavioral distance between system call sequences in the training data
  · Results of the behavioral distance reveal the “proper alignments” between system calls
  · Use these “proper alignments” to update the cost table
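The iterative loop described above might look roughly like the following sketch. The update rule used here (pairs that are aligned more often in the training data get a lower cost) is an illustrative assumption and not the rule from the slides; `align`, `update_cost`, and `learn` are hypothetical names, and unknown pairs default to a cost of 1.0.

```python
from collections import Counter
from typing import Dict, List, Optional, Sequence, Tuple

Sym = Optional[int]        # None stands in for the I/D symbol
Pair = Tuple[Sym, Sym]     # (symbol on replica 1, symbol on replica 2)
Training = Sequence[Tuple[Sequence[int], Sequence[int]]]

def align(s1: Sequence[int], s2: Sequence[int],
          cost: Dict[Pair, float], default: float = 1.0) -> List[Pair]:
    """Return one minimum-cost alignment of s1 and s2 as a list of symbol pairs."""
    def c(a: Sym, b: Sym) -> float:
        return cost.get((a, b), default)

    n1, n2 = len(s1), len(s2)
    d = [[0.0] * (n2 + 1) for _ in range(n1 + 1)]
    for i in range(1, n1 + 1):
        d[i][0] = d[i - 1][0] + c(s1[i - 1], None)
    for j in range(1, n2 + 1):
        d[0][j] = d[0][j - 1] + c(None, s2[j - 1])
    for i in range(1, n1 + 1):
        for j in range(1, n2 + 1):
            d[i][j] = min(d[i - 1][j - 1] + c(s1[i - 1], s2[j - 1]),
                          d[i - 1][j] + c(s1[i - 1], None),
                          d[i][j - 1] + c(None, s2[j - 1]))

    # Trace back from the end to recover which symbols were paired.
    pairs: List[Pair] = []
    i, j = n1, n2
    while i > 0 or j > 0:
        if i > 0 and j > 0 and d[i][j] == d[i - 1][j - 1] + c(s1[i - 1], s2[j - 1]):
            pairs.append((s1[i - 1], s2[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and d[i][j] == d[i - 1][j] + c(s1[i - 1], None):
            pairs.append((s1[i - 1], None))
            i -= 1
        else:
            pairs.append((None, s2[j - 1]))
            j -= 1
    pairs.reverse()
    return pairs

def update_cost(training: Training, cost: Dict[Pair, float]) -> Dict[Pair, float]:
    """Assumed rule: the more often a pair is aligned, the lower its new cost."""
    counts: Counter = Counter()
    for s1, s2 in training:
        counts.update(align(s1, s2, cost))
    new_cost = dict(cost)
    if counts:
        top = max(counts.values())
        for pair, k in counts.items():
            new_cost[pair] = 1.0 - k / top
    return new_cost

def learn(training: Training, cost: Dict[Pair, float], rounds: int = 3) -> Dict[Pair, float]:
    """Alternate between aligning the training data and updating the table."""
    for _ in range(rounds):
        cost = update_cost(training, cost)
    return cost

# Toy training data: the same benign request observed on two replicas.
table = learn([([3, 5], [9, 6])], {})
print(table)
```

A real initialization would come from one of the two approaches on the slide (call semantics or frequency information) rather than from an empty table.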
System Call Phrases
· Correspondence may not exist between individual system calls
· Behavioral distance calculation is very slow when sequences are long
· Solution: group system calls into system call phrases
  · System call phrases are also called system call subsequences
  · A system call phrase is a sequence of system calls that frequently appear together in program execution
  · TEIRESIAS algorithm (also taken from biology)
  · TEIRESIAS algorithm has been used in other intrusion/anomaly detection systems
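To make the phrase idea concrete, here is a simplified stand-in that counts frequent contiguous n-grams and then segments a trace greedily. It is not the TEIRESIAS algorithm; the functions `frequent_phrases` and `segment` and their thresholds are assumptions chosen for the example.

```python
from collections import Counter
from typing import List, Sequence, Set, Tuple

Phrase = Tuple[int, ...]

def frequent_phrases(traces: Sequence[Sequence[int]], min_len: int = 2,
                     max_len: int = 4, min_count: int = 2) -> Set[Phrase]:
    """Collect contiguous system call runs that occur at least min_count times."""
    counts: Counter = Counter()
    for trace in traces:
        for n in range(min_len, max_len + 1):
            for i in range(len(trace) - n + 1):
                counts[tuple(trace[i:i + n])] += 1
    return {p for p, k in counts.items() if k >= min_count}

def segment(trace: Sequence[int], phrases: Set[Phrase],
            max_len: int = 4) -> List[Phrase]:
    """Greedily split a trace into the longest known phrases, else single calls."""
    out: List[Phrase] = []
    i = 0
    while i < len(trace):
        for n in range(min(max_len, len(trace) - i), 1, -1):
            candidate = tuple(trace[i:i + n])
            if candidate in phrases:
                out.append(candidate)
                i += n
                break
        else:
            # No known phrase starts here, so emit the single system call.
            out.append((trace[i],))
            i += 1
    return out

traces = [[5, 3, 4, 6], [5, 3, 4, 9]]
phrases = frequent_phrases(traces)
print(segment([5, 3, 4, 6], phrases))
```

After segmentation the distance is computed over phrases rather than individual calls, which shortens the sequences and so reduces the cost of the alignment.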
Evaluation – Experimental Setup
[Figure: experimental setup with a Linux replica and a Windows XP replica, labeled with the following functions]
· Duplicate Request
· Behavioral Distance Calculation
· Output Voting
Behavioral Distance – Same Application
[Figure: two panels, "Apache Webserver" and "Myserver Webserver"]
Behavioral Distance – Different Application
[Figure: two panels, "Linux: Apache Webserver, Windows: Myserver Webserver" and "Linux: Myserver Webserver, Windows: Apache Webserver"]
Behavioral Distance – Mimicry Attacks

Each cell gives the behavioral distance of the best mimicry attack, followed by the true acceptance rate when the threshold is set to detect the best mimicry attack.

Attacker knows individual IDS on one replica

Server on Linux     Apache                Myserver              Myserver              Apache
Server on Windows   Apache                Myserver              Apache                Myserver
Mimicry on Linux    10.283194, 99.9093%   26.656983, 100%       6.908590, 99.4555%    32.764897, 100%
Mimicry on Windows  6.842813, 99.4555%    9.967780, 99.4555%    13.354194, 100%       5.280875, 99.4555%

Attacker knows behavioral distance and the cost table

Server on Linux     Apache                Myserver              Myserver              Apache
Server on Windows   Apache                Myserver              Apache                Myserver
Mimicry on Linux    3.736, 98.9111%       13.657, 100%          2.731, 98.9111%       13.813, 100%
Mimicry on Windows  2.65, 98.7296%        2.174, 98.0944%       2.187, 98.9111%       2.64, 97.8221%
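The true acceptance rate reported here can be read as the fraction of benign requests that stay below the detection threshold. The sketch below assumes a request is flagged when its behavioral distance is at or above the threshold, and that the threshold is set to the distance of the best mimicry attack. The benign distances are made-up illustration values, not data from the evaluation.

```python
from typing import Sequence

def true_acceptance_rate(benign_distances: Sequence[float], threshold: float) -> float:
    """Fraction of benign requests whose distance stays below the threshold."""
    if not benign_distances:
        raise ValueError("need at least one benign distance")
    accepted = sum(1 for d in benign_distances if d < threshold)
    return accepted / len(benign_distances)

# Hypothetical benign distances; the threshold is one best-mimicry distance
# taken from the results above.
benign = [1.0, 2.0, 3.0, 4.0]
print(true_acceptance_rate(benign, threshold=3.736))
```

A lower best-mimicry distance forces a lower threshold, which rejects more benign requests; this is the trade-off the two numbers in each cell express.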
Performance Overhead
Conclusion
· Behavioral distance detects an attack on one process that causes its behavior to deviate from that of another
· Behavioral distance makes evasion attacks more difficult with moderate overhead