Pairwise Alignment of Metamorphic Computer Viruses Student: Scott McGhee Advisor: Dr. Mark Stamp Committee: Dr. David Taylor Dr. Teng Moh
  • date post

    22-Dec-2015

Pairwise Alignment of Metamorphic Computer Viruses

Student: Scott McGhee

Advisor: Dr. Mark Stamp

Committee: Dr. David Taylor, Dr. Teng Moh


Agenda

Introduction
– Virus Obfuscation Techniques
– Existing Virus Detection Methods
– Experimental Detection Using Hidden Markov Models
– Proposed Approach Using Profile Hidden Markov Models
– Op-code Sequences
– Example Multiple Alignment
– Pairwise Alignment


Agenda (cont’d)

Creating / Scoring Alignments
– Substitution Scoring
– Gap Penalties
– Creating a Pairwise Alignment
– Creating a Multiple Alignment
– Feng-Doolittle Algorithm
– Sequence Preprocessing

Case Studies

Application Demo

Conclusion


Introduction

Viruses are becoming increasingly complicated

It is becoming easier for amateur programmers to create viruses using kits that are readily available online

Some viruses can change themselves from one generation to the next, which makes them difficult to detect

The goal is to explore a new approach to detecting these kinds of viruses


Obfuscation Techniques

Encrypted Viruses
– Static decryptor, and an encrypted virus body
– Key changes from one generation to the next
– Weakness is the decryptor never changes

Polymorphic Viruses
– An encrypted virus with varying decryptors
– Weakness is the virus body still never changes

Metamorphic Viruses
– Virus body can change
– Assembly morphing engine
– Virus Generators


Existing Virus Detection Methods

Code Emulation
– Simulated virtual environment
– Retrieval of unencrypted form of the virus

Pattern Based Scanning
– Detect patterns or signatures

Heuristic Analysis
– Detect capabilities of an application


Experimental Approach: Using A Standard Hidden Markov Model

Introduced in a previous student’s Master’s Writing Project

Use a set of disassembled viruses in a particular family of viruses to train a hidden Markov model (HMM)

Use the HMM to score an arbitrary assembly

Designate a threshold such that if the score is over the threshold, the assembly is classified as a virus from that family

Promising results have been shown


Proposed Approach: Using a Profile Hidden Markov Model

Instead of using a standard HMM, the proposal is to use a profile HMM

Profile HMMs use position-specific information within the sequence

A profile HMM is trained using a multiple alignment

This project will concentrate on the problem of creating multiple alignments for op-code sequences

This approach is used in other fields which use sequence analysis


Op-code Sequences

An application such as a virus can usually be disassembled into assembly code

Represent a virus as a sequence of op-codes

The op-codes are parsed from the assembly

Each op-code is given a representative character
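The encoding step described above might be sketched as follows. This is an illustration only: the listing format (one instruction per line, `;` comments, labels on their own line) and the helper names `opcodes_from_assembly` and `encode` are assumptions, not the project's actual parser.

```python
import string


def opcodes_from_assembly(listing):
    """Return the op-code mnemonic of every instruction line in a listing."""
    ops = []
    for line in listing.splitlines():
        line = line.split(";", 1)[0].strip()   # drop comments
        if not line or line.endswith(":"):     # skip blank lines and labels
            continue
        ops.append(line.split()[0].lower())    # first token is the mnemonic
    return ops


def encode(ops):
    """Map each distinct op-code to one character, in order of first use.

    Assumes at most 36 distinct op-codes (A-Z, then 0-9).
    """
    symbols = string.ascii_uppercase + string.digits
    alphabet = {}
    for op in ops:
        if op not in alphabet:
            alphabet[op] = symbols[len(alphabet)]
    return "".join(alphabet[op] for op in ops), alphabet


listing = """
start:
    mov eax, 1      ; initialise
    push eax
    call helper
    mov ebx, eax
    ret
"""
sequence, alphabet = encode(opcodes_from_assembly(listing))
print(sequence)
```

Here the repeated `mov` maps to the same character both times, so variants of one virus that share instruction patterns yield similar strings.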


Example Multiple Alignment

F C D B A A E 0

C D B A A E A A

C D A B A E A A

C D A B A E A A

F C D B 1 A A E A

A B A E A A

C D A B A E A A

D B A A F A A

A F A B P A A E A

A B A A E A A


Example Multiple Alignment

F C D B - A A E 0 -

- C D B - A A E A A

- C D A - B A E A A

- C D A - B A E A A

F C D B 1 A A E - A

- A - B - A - E A A

- C D A - B A E A A

- - D B - A A F A A

A F A B P A A E - A

- A - B - A A E A A


Pairwise Alignment

A special case of a multiple alignment deals with only 2 sequences

A pairwise alignment can be viewed as substitutions and gap insertions

A B A A - - - A D D

A B C A B C D - - D

Substitute A with C (column 3)

Insert gap size 3 (columns 5 to 7, first sequence)

Insert gap size 2 (columns 8 to 9, second sequence)


Creating / Scoring Alignments


Substitution Scoring

Each possible substitution can be assigned a score and placed into a substitution matrix

Ideally the scores should be statistically correlated to the probability that the substitution would take place

Without comprehensive statistics on substitutions of op-codes in real viruses, these values have to be estimated

A simple example is given here

      A    B    C    D
A    10   -5   -5   -5
B    -5   10   -5   -5
C    -5   -5   10   -5
D    -5   -5   -5   10
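As a small illustration, the example matrix above can be held in a nested dictionary. The function name `simple_substitution_matrix` is hypothetical, and the values 10 and -5 are just the example scores from the slide, not measured statistics.

```python
def simple_substitution_matrix(symbols, match=10, mismatch=-5):
    """Build a matrix that rewards identical symbols and penalises all others equally."""
    return {
        a: {b: (match if a == b else mismatch) for b in symbols}
        for a in symbols
    }


matrix = simple_substitution_matrix("ABCD")
print(matrix["A"]["A"], matrix["A"]["C"])
```

A matrix built from real substitution statistics would replace the two constants with a separate score per pair of op-codes.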


Gap Penalties

When inserting a gap, the score will be penalized

The penalty is usually a function of the length of the gap

Common gap penalties include
– Linear Gap Penalty: each gap symbol has the same cost
– Affine Gap Penalty: opening a gap is more expensive than extending a gap

The overall score of a pairwise alignment will be the sum total of substitution scores and gap penalties
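The scoring rule above can be sketched as follows, using the pairwise alignment from the earlier example slide. The penalty values (-3 per gap symbol, -5 to open and -1 to extend) are assumptions chosen for illustration, and the match/mismatch scores follow the simple example matrix.

```python
def linear_gap(length, per_symbol=-3):
    """Every gap symbol costs the same."""
    return per_symbol * length


def affine_gap(length, open_cost=-5, extend_cost=-1):
    """Opening a gap costs more than each further extension."""
    return open_cost + extend_cost * (length - 1)


def score_alignment(a, b, gap_penalty, match=10, mismatch=-5, gap_char="-"):
    """Sum the substitution scores and the penalty of each maximal gap run."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    total, run_owner, run_len = 0, None, 0
    for ca, cb in zip(a, b):
        owner = "a" if ca == gap_char else "b" if cb == gap_char else None
        if owner != run_owner and run_len:     # a gap run just ended
            total += gap_penalty(run_len)
            run_len = 0
        run_owner = owner
        if owner is None:
            total += match if ca == cb else mismatch
        else:
            run_len += 1
    if run_len:                                # gap run at the very end
        total += gap_penalty(run_len)
    return total


x = "ABAA---ADD"
y = "ABCABCD--D"
print(score_alignment(x, y, linear_gap))   # 35 - 15 = 20
print(score_alignment(x, y, affine_gap))   # 35 - 13 = 22
```

The substitution columns contribute 35 in both cases (four matches, one mismatch); only the treatment of the gaps of size 3 and 2 differs.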


Creating a Pairwise Alignment

Use Dynamic Programming

optimum(X1…m, Y1…n) = MAX of
– optimum(X1…m-1, Y1…n) + cost of aligning the mth symbol of X with a gap
– optimum(X1…m, Y1…n-1) + cost of aligning the nth symbol of Y with a gap
– optimum(X1…m-1, Y1…n-1) + substitution score of the mth symbol of X with the nth symbol of Y

Can compute the optimal alignment in time O(m*n) for sequences of size m and n
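A minimal sketch of this dynamic program (global alignment in the Needleman-Wunsch style) is given below. It assumes a linear gap penalty and the simple match/mismatch scores from the example matrix; the function name `align` and the parameter values are illustrative, not the project's implementation.

```python
def align(x, y, match=10, mismatch=-5, gap=-3):
    """Return (score, aligned_x, aligned_y) for a global alignment of x and y."""
    m, n = len(x), len(y)

    def sub(i, j):
        return match if x[i - 1] == y[j - 1] else mismatch

    # score[i][j] is the best score for aligning x[:i] with y[:j].
    score = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        score[i][0] = i * gap
    for j in range(1, n + 1):
        score[0][j] = j * gap
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),  # substitute x[i-1] with y[j-1]
                score[i - 1][j] + gap,            # x[i-1] against a gap
                score[i][j - 1] + gap,            # y[j-1] against a gap
            )

    # Trace back from the bottom-right cell to recover one optimal alignment.
    ax, ay = [], []
    i, j = m, n
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            ax.append(x[i - 1]); ay.append(y[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            ax.append(x[i - 1]); ay.append("-"); i -= 1
        else:
            ax.append("-"); ay.append(y[j - 1]); j -= 1
    return score[m][n], "".join(reversed(ax)), "".join(reversed(ay))


print(align("ABAD", "ABD"))
```

The table has (m+1) by (n+1) cells and each cell is filled in constant time, which is where the O(m*n) bound comes from.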


Creating a Multiple Alignment

Use a Progressive Alignment

Choose 2 sequences to create a pairwise alignment using dynamic programming

Progressively add sequences to this alignment
– Choose a sequence in the alignment, and one not in the alignment
– Create a pairwise alignment
– Update the other sequences in the alignment with any new gaps that were inserted, and add the new aligned sequence to the overall alignment
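The update step above might look like the following sketch. It assumes that gaps already in the multiple alignment are written as `+` and gaps newly introduced by the pairwise alignment as `-`; the function name `merge_into_msa` and its signature are hypothetical.

```python
def merge_into_msa(msa, anchor_index, new_anchor, new_row, gap="+", new_gap="-"):
    """Add new_row to msa after it was pairwise-aligned against msa[anchor_index].

    new_anchor is the anchor row as it came out of the pairwise alignment,
    with any freshly inserted gaps marked by new_gap.
    """
    if new_anchor.replace(new_gap, "") != msa[anchor_index]:
        raise ValueError("new_anchor does not match the anchor row in the MSA")
    if len(new_anchor) != len(new_row):
        raise ValueError("pairwise-aligned rows must have equal length")

    # Columns (in the new coordinates) where the pairwise alignment opened a gap.
    new_columns = [i for i, ch in enumerate(new_anchor) if ch == new_gap]

    updated = []
    for row in msa:
        chars = list(row)
        for col in new_columns:          # ascending, so indices stay valid
            chars.insert(col, gap)
        updated.append("".join(chars))
    updated.append(new_row.replace(new_gap, gap))
    return updated


msa = ["XYZ", "X+Z"]
print(merge_into_msa(msa, 0, "X-YZ", "XQYZ"))
```

Every existing row receives a gap in the same column, so the alignment stays rectangular after the new sequence is appended.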


Feng-Doolittle Algorithm

How do you choose the order in which you add the sequences to the MSA?

If given a set of n sequences, pre-compute alignment scores between each possible pair of sequences (n choose 2 pairs)

Data can be represented as a distance matrix of a fully connected graph of size n

Compute a minimum spanning tree, to minimize the cost (or maximize the score)


Feng-Doolittle Algorithm (cont’d)

Start with the highest-scoring pairwise alignment and follow the tree

[Figure: spanning tree connecting sequences 1 to 10]

Pairwise alignment scores:

       1    2    3    4    5    6    7    8    9   10
 1   ---   85   63   74   70   84   61   57   62   70
 2    85  ---   79   73   66   59   94   61   59   51
 3    63   79  ---   75   68   60   55   85   52   65
 4    74   73   75  ---  105   54   60   78   59   53
 5    70   66   68  105  ---   40   61   79   58   39
 6    84   59   60   54   40  ---   68   45   75   78
 7    61   94   55   60   61   68  ---   64   72   42
 8    57   61   85   78   79   45   64  ---   50   70
 9    62   59   52   59   58   75   72   50  ---   81
10    70   51   65   53   39   78   42   70   81  ---
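One way to derive the joining order from the score matrix above is sketched here. Because the entries are scores rather than distances, the sketch grows a maximum spanning tree with Prim's algorithm, starting from the highest-scoring pair; this choice and the function name `joining_order` are assumptions about how the slides' procedure could be realised.

```python
SCORES = [  # pairwise alignment scores from the slide; the diagonal is unused
    [0, 85, 63, 74, 70, 84, 61, 57, 62, 70],
    [85, 0, 79, 73, 66, 59, 94, 61, 59, 51],
    [63, 79, 0, 75, 68, 60, 55, 85, 52, 65],
    [74, 73, 75, 0, 105, 54, 60, 78, 59, 53],
    [70, 66, 68, 105, 0, 40, 61, 79, 58, 39],
    [84, 59, 60, 54, 40, 0, 68, 45, 75, 78],
    [61, 94, 55, 60, 61, 68, 0, 64, 72, 42],
    [57, 61, 85, 78, 79, 45, 64, 0, 50, 70],
    [62, 59, 52, 59, 58, 75, 72, 50, 0, 81],
    [70, 51, 65, 53, 39, 78, 42, 70, 81, 0],
]


def joining_order(scores):
    """Return tree edges (in_tree, new, score), using 1-based sequence labels."""
    n = len(scores)
    # Seed the tree with the highest-scoring pair.
    _, a, b = max((scores[i][j], i, j) for i in range(n) for j in range(i + 1, n))
    in_tree = {a, b}
    edges = [(a + 1, b + 1, scores[a][b])]
    while len(in_tree) < n:
        # Best edge from any sequence in the tree to any sequence outside it.
        s, i, j = max(
            (scores[i][j], i, j)
            for i in in_tree
            for j in range(n)
            if j not in in_tree
        )
        in_tree.add(j)
        edges.append((i + 1, j + 1, s))
    return edges


for edge in joining_order(SCORES):
    print(edge)
```

Each edge names the sequence already in the alignment and the sequence to add next, so walking the list in order gives one pairwise alignment per step.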


Feng-Doolittle Algorithm (cont’d)

MSA Before New Alignment
5) CDABBAFCDB1AAEAA+CEDA+EQ+CDABABABALF4LBBAFBSBAAAAA
4) 2AABBAFCDABA+EAABCEDCDEQFCDABA+APALF4+BBA++SBAAAAA
8) ++AABA+CDB+AAEAA+CEDCDEQ+CDABPBA+ABF4+BBAFBSBMAAAA
3) A+ABBAFCDABA+EAA+CEDCDEQA++ABFBAN++F4+BBAFBTYBAAAA

New Alignment
2) A-ABNBAFCD-BAAEAABCEDA-EQ-CDABAB--BAF4NBBM-BTYBAAAA
3) A+AB-BAFCDABA+EAA+CEDCDEQA++ABFBAN++F4+BBAFBTYBAAAA
       ^ (gap introduced)

MSA After New Alignment
5) CDAB+BAFCDB1AAEAA+CEDA+EQ+CDABABABALF4LBBAFBSBAAAAA
4) 2AAB+BAFCDABA+EAABCEDCDEQFCDABA+APALF4+BBA++SBAAAAA
8) ++AA+BA+CDB+AAEAA+CEDCDEQ+CDABPBA+ABF4+BBAFBSBMAAAA
3) A+AB+BAFCDABA+EAA+CEDCDEQA++ABFBAN++F4+BBAFBTYBAAAA
2) A+ABNBAFCD+BAAEAABCEDA+EQ+CDABAB++BAF4NBBM+BTYBAAAA
       ^ (gap matched)


Sequence Preprocessing

Some metamorphic viruses will permute subroutines

Permuted sequences will not align well

Removing the permutations in each of the sequences will produce the best alignment

Using subroutine matching, a permutation can be found which will maximize the scores
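The idea of undoing a subroutine permutation can be illustrated with a brute-force sketch. It assumes each virus has already been split into the same number of subroutine op-code strings, uses `difflib.SequenceMatcher` as a stand-in similarity measure, and tries every ordering, which is only practical for a handful of subroutines. None of this is taken from the project's actual matching procedure.

```python
from difflib import SequenceMatcher
from itertools import permutations


def similarity(a, b):
    """Stand-in similarity between two subroutines, in the range 0.0 to 1.0."""
    return SequenceMatcher(None, a, b).ratio()


def best_subroutine_order(reference, candidate):
    """Reorder candidate's subroutines to best match reference, position by position."""
    if len(reference) != len(candidate):
        raise ValueError("this sketch assumes equal numbers of subroutines")
    best = max(
        permutations(candidate),
        key=lambda order: sum(similarity(r, c) for r, c in zip(reference, order)),
    )
    return list(best)


reference = ["ABAB", "CDCD", "EEF"]
candidate = ["EEF", "ABAB", "CDCD"]     # same subroutines, permuted
print(best_subroutine_order(reference, candidate))
```

Concatenating the reordered subroutines gives a sequence that can be aligned against the reference without the permutation getting in the way.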


Case Studies


Selected Viruses

Next Generation Virus Creation Kit (NGVCK)
– Advanced assembly morphing engine
– Junk code insertion
– Function reordering

Virus Creation Lab Win 32 (VCL32)
– No function reordering

Phalcon/Skism Mass-Produced Code Generator (PS-MPC)
– No function reordering


NGVCK Results

[Figure: # of Alignments versus Score, for Raw Sequences and Preprocessed]

[Figure: Conservation % versus Group Size, for Raw Sequences and Preprocessed]

Raw NGVCK viruses did not align well

Preprocessing was required in order to create usable alignments

Profile HMM was able to detect viruses with a 6.8% false-positive rate and 1% false-negative rate


VCL32 and PS-MPC

The raw viruses both aligned well and did not require preprocessing

VCL32 aligned the best

The Profile HMM was able to detect both viruses with 0% false-positive and false-negative rates

[Figure: # of Alignments versus Score, for VCL32, PS-MPC and NGVCK (Preprocessed)]

[Figure: Conservation % versus Group Size, for VCL32, PS-MPC and NGVCK (Preprocessed)]


Visual Representation of Multiple Alignments Created

[Figure panels: Raw NGVCK, groups of 20; Preprocessed NGVCK, groups of 20; PS-MPC, groups of 15; VCL32, group of 10]


Application Demo


Conclusion

The profile HMM works well on metamorphic viruses which do not permute subroutines

Future research is needed in order to fully understand the effects of preprocessing on the profile HMM


Thank you

Email questions to [email protected]