Post on 13-Jan-2016
Detection of ASCII Malware
Parbati Kumar Manna
Dr. Sanjay Ranka
Dr. Shigang Chen
2
Internet Worm and Malware
• Huge damage potential
Infects hundreds of thousands of computers
Costs millions of dollars in damage
Examples: Melissa, ILOVEYOU, Code Red, Nimda, Slammer, SoBig, MyDoom
• Mostly uses Buffer Overflow
• Propagation is automatic (mostly)
3
Recent Trends
• Shift in hacker’s mindset
• Malware becoming increasingly evasive and obfuscative
• Emergence of Zero-day worms
• Arrival of Script Kiddies
4
Motivation for ASCII Attacks
• Prevalence of servers expecting text-only input
• Text-based protocols
• Presumption of text being benign
• Deployment of ASCII filters that let text bypass inspection
5
IDS Detecting ASCII Attack?
• Disassembly-based IDS
All jump instructions are ASCII
Higher proportion of branches
Exponential disassembly cost
High processing overhead for IDS
• Frequency-based IDS
PAYL evaded by ASCII worm
6
Buffer Overflow
7
Constraints of ASCII Malware
• Opcode Unavailability
Shellcode requires binary opcodes
Only xor, and, sub, cmp, etc. are available in ASCII
Opcodes must be generated dynamically
• Difficulty in Encryption
No backward jump
Can’t use the same decrypter routine for each encrypted block
No one-to-one correspondence between ASCII and binary
[Figure: the leading bit is 0 in an ASCII byte; it may vary in a binary byte]
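The opcode constraint can be checked directly against the x86 opcode map. The sketch below assumes "ASCII" means the printable range 0x20–0x7E and looks only at the first opcode byte of a few common IA-32 instructions; the selection of instructions is illustrative, not a list taken from the slides.

```python
# Assumption: "ASCII" means printable bytes 0x20..0x7E.
def is_ascii_printable(code: bytes) -> bool:
    """Return True if every byte falls in the printable ASCII range."""
    return all(0x20 <= b <= 0x7E for b in code)

# First opcode byte of some one-byte-opcode IA-32 instructions.
OPCODES = {
    "and al, imm8": 0x24,
    "sub al, imm8": 0x2C,
    "xor al, imm8": 0x34,
    "cmp al, imm8": 0x3C,
    "push eax": 0x50,
    "pop eax": 0x58,
    "jo rel8 (conditional jump)": 0x70,
    "mov r32, r/m32": 0x8B,
    "int imm8": 0xCD,
    "call rel32": 0xE8,
    "jmp rel8": 0xEB,
}

# Instructions whose opcode byte survives an ASCII-only filter.
available = {name for name, op in OPCODES.items()
             if is_ascii_printable(bytes([op]))}

def rel8_displacement(byte: int) -> int:
    """Interpret one byte as the signed 8-bit displacement of a short jump."""
    return int.from_bytes(bytes([byte]), "little", signed=True)

# Every printable byte decodes to a positive displacement: forward jumps only.
all_forward = all(rel8_displacement(b) > 0 for b in range(0x20, 0x7F))

for name in sorted(OPCODES):
    status = "ASCII" if name in available else "binary only"
    print(f"{name:28s} {status}")
print("short jumps are forward-only:", all_forward)
```

Under these assumptions, mov, int, call and jmp fall outside the ASCII range, which is why such opcodes must be generated at run time. The displacement check also illustrates why no backward jump is possible.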
8
Creation of ASCII Malware
9
Buffer Overflow using ASCII
Overflowing a buffer using an ASCII string:
10
Detection of ASCII Malware
• Opcode Unavailability
Dynamic generation of opcodes needs more ASCII instructions for each binary instruction
• Difficulty in Encryption
No backward jump means the decrypter block for each encrypted block must be hardcoded
Long sequence of contiguous valid instructions likely → high MEL
What is this MEL?
11
Maximum Executable Length
• Indicates maximum length of an execution path
Need to disassemble (and execute) from all possible entry points
All branching must be considered
• Abstract payload execution
Used for binary worms with sled
Effectiveness has since dwindled
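A minimal sketch of the "all possible entry points" idea, assuming a hypothetical decoder has already produced, for each byte offset, the length of the instruction starting there (0 if the bytes are invalid). Branching is deliberately ignored, so this covers straight-line execution only and is not the full MEL computation described above.

```python
from typing import Sequence

def max_executable_length(lengths: Sequence[int]) -> int:
    """Longest straight-line run of valid instructions over all entry points.

    lengths[i] is the byte length of the instruction decoded at offset i,
    or 0 if no valid instruction starts there (hypothetical decoder output).
    """
    n = len(lengths)
    # mel[i] = number of instructions executed when starting at offset i.
    # mel[n] = 0 because running off the end of the input stops execution.
    mel = [0] * (n + 1)
    for i in range(n - 1, -1, -1):
        size = lengths[i]
        # An instruction that is invalid or overruns the input ends the path.
        if size > 0 and i + size <= n:
            mel[i] = 1 + mel[i + size]
    return max(mel)

# Toy input: offset 2 is invalid, offset 1 holds a two-byte instruction.
print(max_executable_length([1, 2, 0, 1, 1, 1]))
```

Filling the table from the end means each offset is decoded once, rather than re-walking the stream from every entry point.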
12
Benign Text has Low MEL
• Contains characters that correspond to invalid instructions
Privileged Instruction (I/O)
Arbitrary Segment Selector
More Memory-accessing instructions – may use uninitialized registers
Long sequence of contiguous valid instructions unlikely → low MEL
13
Proposed Solution
Question:
• How long is “long”?
• Find out the maximum length of valid instruction sequence
• If it is long enough, the stream contains malware
14
Probabilistic Analysis
• Toss a coin n times
• What is the probability that the max distance between two consecutive heads is τ?
Head (H) ↔ Invalid Instruction (I)
Tail (T) ↔ Valid Instruction (V)
T H T T H T T T T T H T T T
V I V V I V V V V V I V V V
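The coin-toss analogy can be made concrete by measuring the longest run of valid instructions in the example sequence from this slide. The sketch assumes the inter-head distance is counted as the number of valid instructions between two invalid ones; `max_valid_run` is a name chosen here for illustration.

```python
def max_valid_run(validity: str) -> int:
    """Length of the longest run of 'V' (valid) between 'I' (invalid) marks."""
    longest = current = 0
    for flag in validity:
        if flag == "V":
            current += 1
            longest = max(longest, current)
        else:
            current = 0  # an invalid instruction ends the current run
    return longest

# The sequence shown on the slide, with spaces removed.
sequence = "V I V V I V V V V V I V V V".replace(" ", "")
print(max_valid_run(sequence))
```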
15
Probabilistic Analysis
$n$ = number of coin tosses
$p$ = probability of a head
$X_i$ = R.V.s for inter-head distances
$X_{\max}$ = max inter-head distance
C.D.F. of $X_{\max}$: $\Pr[X_{\max} \le x] = \left[1 - p(1-p)^{x}\right]^{n}$
F.P. rate: $1 - \Pr[X_{\max} \le \tau] = 1 - \left[1 - p(1-p)^{\tau}\right]^{n}$
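To get a feel for the false positive formula, the sketch below evaluates 1 − [1 − p(1−p)^τ]^n for a few thresholds. The values n = 1000 and p = 0.25 are hypothetical and chosen only for illustration; `expm1` and `log1p` are used so that very small probabilities are not lost to rounding.

```python
import math

def false_positive_rate(n: int, p: float, tau: float) -> float:
    """Evaluate 1 - [1 - p(1-p)^tau]^n in a numerically stable way."""
    q = p * (1.0 - p) ** tau            # p(1-p)^tau
    # 1 - (1-q)^n  ==  -expm1(n * log1p(-q))
    return -math.expm1(n * math.log1p(-q))

# Hypothetical parameters: n = 1000 instructions, p = 0.25.
for tau in (10, 20, 40):
    print(tau, false_positive_rate(1000, 0.25, tau))
```

The rate falls quickly as τ grows, which is what makes a usable threshold possible.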
16
Probabilistic Analysis
For a fixed N = k (exactly k invalid instructions)
17
Probabilistic Analysis
For all possible values of N:
18
Threshold Calculation
Known: $n$, $p$, $\alpha$ (false positive rate)
Unknown: $\tau$ (max inter-head distance)
Threshold:
$\tau = \dfrac{\log\left(1 - (1-\alpha)^{1/n}\right) - \log p}{\log(1-p)}$
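Solving the false positive expression 1 − [1 − p(1−p)^τ]^n for τ gives the threshold directly. The sketch below writes the false positive rate as `alpha` (a symbol chosen here), and the sample values of n and p are hypothetical.

```python
import math

def threshold(n: int, p: float, alpha: float) -> float:
    """tau = [log(1 - (1-alpha)^(1/n)) - log p] / log(1 - p)."""
    # 1 - (1-alpha)^(1/n), computed stably for large n.
    a = -math.expm1(math.log1p(-alpha) / n)
    return (math.log(a) - math.log(p)) / math.log1p(-p)

# Hypothetical settings showing how tau moves with n and p at alpha = 1%.
for n, p in [(1000, 0.25), (10000, 0.25), (1000, 0.10)]:
    print(f"n={n:6d} p={p:.2f} tau={threshold(n, p, 0.01):.1f}")
```

For these sample values, a larger n or a smaller p both push the threshold up.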
19
Independence Assumption
χ² test contingency table
                 Observed                      Expected
                 I2 is valid   I2 is invalid   I2 is valid   I2 is invalid
I1 is valid      8960          2797            8922          2835
I1 is invalid    2797          938             2835          900
• Validity of an instruction is an independent event
• All the Xi’s are independent (while ΣXi = n)
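The expected counts in the contingency table can be recomputed from the observed counts. The sketch below does this and evaluates the χ² statistic, comparing it with 3.841, the standard 5% critical value for one degree of freedom; the choice of the 5% level is an assumption made here, not stated on the slide.

```python
observed = [[8960, 2797],   # I1 valid:   I2 valid, I2 invalid
            [2797, 938]]    # I1 invalid: I2 valid, I2 invalid

def expected_counts(table):
    """Expected cell counts under independence: row total * column total / N."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    return [[r * c / total for c in col_totals] for r in row_totals]

def chi_square(table):
    """Pearson chi-square statistic: sum of (O - E)^2 / E over all cells."""
    expected = expected_counts(table)
    return sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(table, expected)
               for o, e in zip(obs_row, exp_row))

CRITICAL_5_PERCENT_DF1 = 3.841  # assumed significance level: 5%, 1 d.o.f.

statistic = chi_square(observed)
print([[round(e) for e in row] for row in expected_counts(observed)])
print(statistic, statistic < CRITICAL_5_PERCENT_DF1)
```

A statistic below the critical value means the data do not reject independence at that level.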
20
Threshold Calculation
With increasing n, we must choose a larger τ to keep the same false positive rate
21
Threshold Calculation
With decreasing p, we must choose a larger τ to keep the same false positive rate
22
Determine n
$n = \dfrac{C}{E[I]} = \dfrac{\text{number of input characters}}{\text{average instruction size}}$
E[I] = E[Prefix chain length] + E[core instruction length]
Obtained from character frequency of input data
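A rough sketch of estimating n as the number of input characters divided by E[I], with E[I] split into prefix chain length plus core instruction length as above. Two assumptions are made here and are not from the slides: characters are treated as independent draws, so the expected prefix chain length is q/(1−q) where q is the fraction of prefix characters, and the expected core instruction length is passed in as a hypothetical parameter `core_length`.

```python
# x86 prefix bytes that fall in the printable ASCII range:
# segment overrides ES, CS, SS, DS, FS, GS and the operand/address-size prefixes.
ASCII_PREFIX_BYTES = frozenset({0x26, 0x2E, 0x36, 0x3E, 0x64, 0x65, 0x66, 0x67})

def expected_prefix_chain_length(data: bytes) -> float:
    """q/(1-q), assuming each character is independently a prefix with prob. q."""
    if not data:
        raise ValueError("input must not be empty")
    q = sum(b in ASCII_PREFIX_BYTES for b in data) / len(data)
    if q == 1.0:
        raise ValueError("input consists only of prefix bytes")
    return q / (1.0 - q)

def estimate_n(data: bytes, core_length: float) -> float:
    """n = C / E[I], with E[I] = E[prefix chain length] + core_length."""
    expected_size = expected_prefix_chain_length(data) + core_length
    return len(data) / expected_size

# Hypothetical input and core length, for illustration only.
sample = b"abcd" * 25
print(estimate_n(sample, core_length=2.0))
```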
23
Determine p
Invalid Instructions
1. Privileged instructions
2. Wrong Segment Prefix Selector
3. Un-initialized memory access
Only 1. and 2. can be determined on a standalone basis
24
Experimental Setup
25
Implementation
26
Experimental Setup
• Benign data setup
ASCII stream captured from live CISE network using Ethereal
• Malicious data setup
Existing framework used to generate ASCII worms by converting binary worms
• Promising experimental results for max valid instruction length
Benign: max values all below threshold
Malicious: values significantly higher than threshold
27
Experimental Results (DAWN)
28
Experimental Results (APE-L)
29
Contrasting with APE
• Full content examination
• Threshold calculation
• Sled Vs. malware
• Exploiting text-specific properties
30
Multilevel Encryption
Encryption: binary → ASCII → ASCII
Decryption: ASCII → ASCII → binary
Only visible decrypter
31
Multilevel Encryption
Text 0x20 – 0x3F
Text 0x40 – 0x5F
Text 0x60 – 0x7E
Binary
Binary
32
Questions
33
Thank you