Forensic Analysis of Toolkit-Generated Malicious Programs

Yasmine Kandissounon
TSYS School of Computer Science, Columbus State University

2009 ACM Mid-Southeast Conference, Gatlinburg, Tennessee
November 12-13, 2009



State of the Threat

• (Jan–Jun 2009) Microsoft Security Intelligence Report:
  – 115,854,807 infections in the first half of 2009
  – 94,985,967 infections in the second half of 2008
  – An increase of about 22%

• (2008) AV-Test Labs: 15,000 to 20,000 new specimens analyzed each day (4 times as many as in 2006, 15 times as many as in 2005)

• (ESET) Talented teams of programmers

• Automated malware creation:
  – W32.Evol, W32.Simile, W32.NGVCK, W32.VCL, etc.
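The "about 22%" in the first bullet follows from the two Microsoft counts. A quick arithmetic check in Python (the variable names are only illustrative):

```python
# Infection counts quoted from the Microsoft Security Intelligence Report bullet.
second_half_2008 = 94_985_967
first_half_2009 = 115_854_807

increase = first_half_2009 - second_half_2008
percent = 100 * increase / second_half_2008
print(f"{increase:,} more infections, +{percent:.2f}%")  # 20,868,840 more infections, +21.97%
```

The exact growth is 21.97%, which rounds to the 22% on the slide.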


What Does the AV Industry Need?

• Automation
  – (Szor 2005) The need for analysis by humans is a major bottleneck!

• Ability to quickly and accurately detect new malware
  – (Team Cymru, 2008) Of 1,000 new samples submitted, only 37% were detected by commercial AV products!

• Badly needs “good” generic signatures
  – (Kaspersky Lab, 2008) Windows Explorer was flagged as malicious
  – (AVIEN’s Harley) On average, current detection rates using generic signatures are no better than 70%-80%


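Our Problem: Engine-Generated Malware

[Figure: an engine transforms a virus sample into Variant 1, Variant 2, Variant 3, ..., Variant n, which reach the malware detector over the network; the detector checks incoming code against its signature database (virus definitions).]

Too many signatures challenge the detector.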


Solution: Use Engine Signature

[Figure: the same engine produces Variant 1 through Variant n, which arrive over the Internet; the malware detector now checks incoming code against a single engine signature.]

Use one small piece of information about the engine to detect all of the variants.


MALWARE GENERATION AS A HIDDEN MARKOV MODEL

[Figure: transition graph over NOP, MOV, PUSH, CALL, JMP and * (any other instruction), with edges weighted by the probabilities in the matrix below.]

Example instruction sequence:

MOV MOV JNZ MOV MOV PUSH MOV NOP MOV NOP ADD JMP JMP MOV MOV NOP PUSH JZ PUSH MOV CALL MOV CALL SUB MOV PUSH MOV CALL POP MOV MOV

Take only the n most frequent instructions, for some n, and replace every other instruction with *:

MOV MOV * MOV MOV PUSH MOV NOP MOV NOP * JMP JMP MOV MOV NOP PUSH * PUSH MOV CALL MOV CALL * MOV PUSH MOV CALL * MOV MOV

Transition matrix = engine signature (choice of relevant instructions = 5 most frequent instructions; rows: current instruction, columns: next instruction):

        NOP   MOV   PUSH  CALL  JMP   *
NOP     0.00  0.33  0.33  0.00  0.00  0.33
MOV     0.21  0.29  0.14  0.21  0.00  0.07
PUSH    0.00  0.60  0.40  0.00  0.00  0.00
CALL    0.00  0.67  0.00  0.00  0.00  0.33
JMP     0.00  0.50  0.00  0.00  0.50  0.00
*       0.00  0.67  0.00  0.00  0.33  0.00

The transition matrix is (n+1) by (n+1) and represents the engine. Problem: find the smallest n that yields the best accuracy.
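A minimal Python sketch of turning one instruction sequence into an (n+1) by (n+1) first-order transition matrix, under stated assumptions: mnemonics are already extracted (disassembly is not shown), the relevant set is the n most frequent mnemonics over a corpus (the helper `most_frequent` is hypothetical), and each row is divided by its number of outgoing transitions. The slide does not say how rows are normalized, so that last choice is an assumption.

```python
from collections import Counter

OTHER = "*"  # stands for every instruction outside the relevant set


def most_frequent(sequences, n):
    """The n most frequent mnemonics over a corpus (ties keep first-seen order)."""
    counts = Counter(op for seq in sequences for op in seq)
    return [op for op, _ in counts.most_common(n)]


def transition_matrix(seq, relevant):
    """First-order Markov transition matrix of size (n+1) x (n+1).

    Rows are the current instruction and columns the next one, with OTHER
    last. Each row is divided by that row's number of outgoing transitions
    (assumed normalization), so non-empty rows sum to 1; rows with no
    outgoing transition stay all zero.
    """
    states = list(relevant) + [OTHER]
    index = {op: i for i, op in enumerate(states)}
    # Collapse every irrelevant mnemonic into the single OTHER state.
    mapped = [op if op in index else OTHER for op in seq]
    counts = [[0] * len(states) for _ in states]
    for cur, nxt in zip(mapped, mapped[1:]):
        counts[index[cur]][index[nxt]] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        matrix.append([c / total if total else 0.0 for c in row])
    return states, matrix


# The example sequence from the slide.
example = ("MOV MOV JNZ MOV MOV PUSH MOV NOP MOV NOP ADD JMP JMP MOV MOV NOP PUSH "
           "JZ PUSH MOV CALL MOV CALL SUB MOV PUSH MOV CALL POP MOV MOV").split()
print(sorted(most_frequent([example], 5)))  # ['CALL', 'JMP', 'MOV', 'NOP', 'PUSH']
states, m = transition_matrix(example, ["NOP", "MOV", "PUSH", "CALL", "JMP"])
for name, row in zip(states, m):
    print(f"{name:5}", " ".join(f"{p:.2f}" for p in row))
```

On the slide's example sequence, the five most frequent mnemonics are exactly NOP, MOV, PUSH, CALL and JMP, and the NOP row (0.00 0.33 0.33 0.00 0.00 0.33) and JMP row (0.00 0.50 0.00 0.00 0.50 0.00) agree with the slide's matrix. The other rows come out differently under this normalization, so the sketch illustrates the construction rather than reproducing every number on the slide.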


Subjects and Preparation

• 100 malware samples of W32.Evol and W32.Simile (metamorphic viruses)

• 100 malware samples generated by NGVCK

• 100 malware samples generated by VCL
  – Source: www.vx.netlux.org

• 100 benign samples
  – Source: sourceforge.net, download.com, installation of Windows Vista


Classification Method

• For each family's sample set
  – Identify a training subset of size 30.
  – Compute the transition matrix for each training sample.
  – Take the average of these matrices.
  – This average is the engine signature for the family.

• For each instance not used for training
  – Compute the transition matrix of the instance.
  – Compute the Euclidean distance between the instance's matrix and each of the engine signatures generated in the stage above.
  – The signature closest to the instance's transition matrix determines the instance's family. If there are ties, choose one at random.
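The two stages above can be sketched in Python as follows. Assumptions not stated on the slide: transition matrices are plain lists of lists produced by an earlier step, the training subset is drawn at random, the Euclidean distance runs over all matrix entries, and ties are detected with `math.isclose`. The helpers `build_signatures` and `misclassification_rate` and the toy families "A" and "B" are hypothetical.

```python
import math
import random


def average_matrix(matrices):
    """Element-wise mean of equally sized matrices (the engine signature)."""
    count = len(matrices)
    rows, cols = len(matrices[0]), len(matrices[0][0])
    return [[sum(m[i][j] for m in matrices) / count for j in range(cols)]
            for i in range(rows)]


def euclidean(a, b):
    """Euclidean distance between two matrices, treated as flat vectors."""
    return math.sqrt(sum((x - y) ** 2
                         for row_a, row_b in zip(a, b)
                         for x, y in zip(row_a, row_b)))


def build_signatures(family_matrices, train_size=30, rng=random):
    """Average a randomly chosen training subset per family.

    family_matrices maps a family name to a list of transition matrices.
    Returns (signatures, held_out), where held_out lists the
    (family, matrix) pairs that were not used for training.
    """
    signatures, held_out = {}, []
    for family, matrices in family_matrices.items():
        shuffled = list(matrices)
        rng.shuffle(shuffled)
        signatures[family] = average_matrix(shuffled[:train_size])
        held_out.extend((family, m) for m in shuffled[train_size:])
    return signatures, held_out


def classify(matrix, signatures, rng=random):
    """Return the family whose signature is nearest; ties broken at random."""
    distances = {fam: euclidean(matrix, sig) for fam, sig in signatures.items()}
    best = min(distances.values())
    tied = [fam for fam, d in distances.items() if math.isclose(d, best)]
    return rng.choice(tied)


def misclassification_rate(held_out, signatures, rng=random):
    """Fraction of held-out instances assigned to the wrong family."""
    wrong = sum(classify(m, signatures, rng) != fam for fam, m in held_out)
    return wrong / len(held_out)


# Hypothetical 2 x 2 matrices for two toy families.
toy = {
    "A": [[[1.0, 0.0], [0.0, 1.0]], [[0.8, 0.2], [0.2, 0.8]], [[0.9, 0.1], [0.1, 0.9]]],
    "B": [[[0.0, 1.0], [1.0, 0.0]], [[0.1, 0.9], [0.9, 0.1]], [[0.2, 0.8], [0.8, 0.2]]],
}
sigs, test = build_signatures(toy, train_size=2, rng=random.Random(1))
print(misclassification_rate(test, sigs))  # 0.0
```

Passing an explicit `random.Random` seed makes the training split and any tie-breaking reproducible between runs.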


Average Matrix Classifier (1st Order Markov Chain)

• Results:

Relevant instructions (n)    Misclassifications
20                           5.33%
25                           7.33%
10                           8%
15                           11%
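The problem statement on the Markov-model slide asks for the smallest n that gives the best accuracy. Applied to the reported rates, that rule is a one-line selection. In this sketch the dictionary simply transcribes the results table, and `pick_n` is a hypothetical helper; the last value printed is the number of entries in the (n+1) by (n+1) signature.

```python
# Misclassification rates (%) transcribed from the results table, keyed by n.
reported = {20: 5.33, 25: 7.33, 10: 8.0, 15: 11.0}


def pick_n(rates):
    """Lowest misclassification rate; among equal rates, the smallest n."""
    return min(rates, key=lambda n: (rates[n], n))


best = pick_n(reported)
print(best, reported[best], (best + 1) ** 2)  # 20 5.33 441
```

On these numbers the rule selects n = 20, a 21 by 21 signature with 441 entries. The conclusion slide quotes the 11 by 11 signature (n = 10, 8% misclassifications), which has 121 entries, less than a third of that size.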


Conclusion and Further Work

• Conclusion
  – Good accuracy (8% misclassifications)
  – Small signature (11 by 11 matrix)
  – Fast detection (12 min for 150 tests)

• Further work
  – 2nd-order Markov chain
  – Work with more samples
  – Work with other families of malware
  – Different ways of choosing the relevant instructions
  – Try a different distance measure
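One possible reading of the "2nd-order" item is to condition each transition on the two previous instructions instead of one. The following is a hypothetical Python sketch, not the author's method: it keeps the * convention and assumes the same per-row normalization as a first-order version. Each row is a two-instruction history, so the matrix grows to (n+1)² by (n+1); for n = 10 that is 121 by 11 (1,331 entries) instead of 11 by 11.

```python
from itertools import product

OTHER = "*"  # placeholder for instructions outside the relevant set


def second_order_matrix(seq, relevant):
    """P(next | previous two) as an (n+1)^2 x (n+1) matrix.

    Rows are indexed by the history pairs in `contexts`; columns follow
    `states`. Non-empty rows are divided by their transition count
    (assumed normalization); unseen histories stay all zero.
    """
    states = list(relevant) + [OTHER]
    index = {op: i for i, op in enumerate(states)}
    contexts = list(product(states, repeat=2))  # all two-instruction histories
    row_of = {ctx: r for r, ctx in enumerate(contexts)}
    mapped = [op if op in index else OTHER for op in seq]
    counts = [[0] * len(states) for _ in contexts]
    for a, b, c in zip(mapped, mapped[1:], mapped[2:]):
        counts[row_of[(a, b)]][index[c]] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        matrix.append([k / total if total else 0.0 for k in row])
    return contexts, states, matrix


contexts, states, m = second_order_matrix("MOV MOV PUSH MOV MOV NOP".split(), ["MOV", "PUSH"])
print(len(m), "x", len(states))            # 9 x 3
print(m[contexts.index(("MOV", "MOV"))])   # [0.0, 0.5, 0.5]
```

Because the dense layout keeps every history row, the Euclidean-distance classifier could compare second-order matrices entry by entry, at the cost of a signature roughly n+1 times larger.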


References

• http://www.microsoft.com/security/portal/Threat/SIR.aspx

• http://www.washingtonpost.com/wp-dyn/content/article/2008/03/19/AR2008031901439.html

• http://packetstormsecurity.org/mag/40hex/40HEX-10/40HEX-10.001J

• http://www.research.ibm.com/antivirus/SciPapers/Tesauro/NeuralNets.html, last retrieved April 12, 2009

• M.R. Chouchane, Approximate Detection of Machine-morphed Malicious Programs. Ph.D. Dissertation (2008)

• Chouchane and Lakhotia, Using Engine Signature to Detect Metamorphic Malware. WORM (2006)


• Ivan Krsul and Eugene H. Spafford, Authorship Analysis: Identifying the Author of a Program. Computers & Security (1997)

• Peter Szor, The Art of Computer Virus Research and Defense, Chapter 7 (2005)

• Wing Wong and Mark Stamp, Hunting for Metamorphic Engines. J Comput Virol (2006)

• www.vx.netlux.org, last retrieved April 12, 2009