Malware Detection - A Machine Learning Perspective

Malware Detection - A Machine Learning Perspective C.K.Chen 2014.06.05


A summary of research papers on machine learning applied to malware detection


Page 1: Malware Detection - A Machine Learning Perspective

Malware Detection - A Machine Learning Perspective

C.K.Chen

2014.06.05

Page 2: Malware Detection - A Machine Learning Perspective

Outline

• A Large Wave of Malware Is Coming

• Is Machine Learning the Savior

• You Can't Make Something out of Nothing

• A Garbage In, Garbage Out Game?

• Model, Model, It’s All About The Model

• Every Evaluation in Every Paper is ‘Perfect’

• Democracy World in Machine Learning

• WYSIWYG

• Know Where Your Enemy Is

Page 3: Malware Detection - A Machine Learning Perspective

A Large Wave of Malware Is Coming

• Millions of malware samples are created every year

McAfee Labs Threat Report in Fourth Quarter 2013

Page 4: Malware Detection - A Machine Learning Perspective

Your Anti-Virus Will Not Tell You

• Although the overall detection looks good

Page 5: Malware Detection - A Machine Learning Perspective

Attack Windows in AntiVirus

Anti-Virus Lifecycle

• Attack Windows

Malware Life Cycle

Page 6: Malware Detection - A Machine Learning Perspective

Is Machine Learning the Savior

• The problem is that

• Signature generation is manual work and time-consuming

• Most malware is not brand new, but modified or rewritten from older malware

• Automatic malware creation tool chains

• Mutation techniques

• These may leave some clues for us

• Machine learning sheds light on automatically constructing models and detecting malware

Page 7: Malware Detection - A Machine Learning Perspective

How Does Machine Learning Work?

• Training

• Feature Extraction -> Learning Algorithms -> Generate Classifier

• Testing

• Feature Extraction -> Classifier -> Classification Result
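
The two pipelines above can be sketched in a few lines. This is a minimal illustration, not the method of any cited paper: it assumes byte-frequency histograms as the features and a nearest-centroid rule as the learning algorithm, and the four training samples are made-up toy byte strings.

```python
from collections import Counter

def extract_features(sample: bytes) -> Counter:
    """Feature extraction: count how often each byte value occurs."""
    return Counter(sample)

def train(samples, labels):
    """Learning algorithm (assumed): average the histograms of each class."""
    per_class = Counter(labels)
    sums = {}
    for sample, label in zip(samples, labels):
        sums.setdefault(label, Counter()).update(extract_features(sample))
    # The "classifier" is simply one centroid histogram per label.
    return {label: {b: c / per_class[label] for b, c in hist.items()}
            for label, hist in sums.items()}

def classify(model, sample: bytes) -> str:
    """Testing: same feature extraction, then pick the nearest centroid."""
    feats = extract_features(sample)

    def squared_distance(centroid):
        keys = set(feats) | set(centroid)
        return sum((feats.get(k, 0) - centroid.get(k, 0)) ** 2 for k in keys)

    return min(model, key=lambda label: squared_distance(model[label]))

# Toy training set (hypothetical data, for illustration only).
train_samples = [b"\x90\x90\x90\xcc", b"\x90\x90\xcc\xcc",
                 b"hello world", b"hello there"]
train_labels = ["malware", "malware", "benign", "benign"]

model = train(train_samples, train_labels)
print(classify(model, b"\x90\xcc\x90"))
print(classify(model, b"hello"))
```

The point of the sketch is that training and testing share the same feature extraction step; only the last stage differs (fitting the classifier versus applying it).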

Page 8: Malware Detection - A Machine Learning Perspective

Categories of Machine Learning Approaches

• Categorized by representation, feature selection, and classification algorithm

Page 9: Malware Detection - A Machine Learning Perspective

You Can't Make Something out of Nothing

• The data set is the first step for ML

• Without data, ML can do nothing

• Where to collect samples

• Web, honeypots, user uploads

• Balanced vs. Imbalanced data
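
Why the balance of the data set matters can be shown with a small calculation. The sketch below assumes a hypothetical test set of 10 malware and 990 benign samples (numbers chosen only for illustration) and a degenerate detector that labels everything benign.

```python
def confusion_counts(y_true, y_pred, positive="malware"):
    """Return (tp, fp, fn, tn) with `positive` as the class of interest."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    tn = len(pairs) - tp - fn - fp
    return tp, fp, fn, tn

def accuracy(tp, fp, fn, tn):
    return (tp + tn) / (tp + fp + fn + tn)

def recall(tp, fp, fn, tn):
    # Fraction of real malware that was detected.
    return tp / (tp + fn) if (tp + fn) else 0.0

# Hypothetical imbalanced test set: 1% malware, 99% benign.
y_true = ["malware"] * 10 + ["benign"] * 990
# A useless detector that never raises an alarm.
y_pred = ["benign"] * 1000

counts = confusion_counts(y_true, y_pred)
print("accuracy:", accuracy(*counts))
print("recall:  ", recall(*counts))
```

On imbalanced data, accuracy alone rewards the detector that never fires, so per-class measures such as recall are needed alongside it.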

Page 10: Malware Detection - A Machine Learning Perspective

A Garbage In, Garbage Out Game

• There are many features to choose from

• The quality of the features determines the precision of machine learning

• Features

• Static / Dynamic / PE Structure

• N-gram

• Feature selection is needed

• ReliefF

• Chi-squared

• F-Statistics
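
To make the N-gram and chi-squared items concrete, here is a minimal sketch. It assumes byte 2-grams as binary (present/absent) features and scores each one with the standard chi-squared statistic for a 2x2 contingency table; the four samples are made-up toy byte strings, and the helper names are hypothetical.

```python
def byte_ngrams(data: bytes, n: int = 2) -> set:
    """Return the set of distinct byte n-grams in a sample."""
    return {data[i:i + n] for i in range(len(data) - n + 1)}

def chi_squared(gram_sets, labels, gram, positive="malware"):
    """Chi-squared score of 'gram is present' versus 'sample is positive'."""
    a = b = c = d = 0
    for grams, label in zip(gram_sets, labels):
        present, is_pos = gram in grams, label == positive
        if present and is_pos:
            a += 1          # present, malware
        elif present:
            b += 1          # present, benign
        elif is_pos:
            c += 1          # absent, malware
        else:
            d += 1          # absent, benign
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return 0.0 if denom == 0 else n * (a * d - b * c) ** 2 / denom

# Toy data set (hypothetical, for illustration only).
samples = [b"\x90\x90\xcc", b"\x90\x90\xeb", b"\x00\x01\xcc", b"\x00\x01\x02"]
labels = ["malware", "malware", "benign", "benign"]

gram_sets = [byte_ngrams(s) for s in samples]
vocabulary = set().union(*gram_sets)
scores = {g: chi_squared(gram_sets, labels, g) for g in vocabulary}

# Feature selection: keep the k highest-scoring n-grams.
top2 = sorted(scores, key=scores.get, reverse=True)[:2]
print(top2)
```

N-grams that appear in only one class score highest, which is exactly the kind of feature a selection step is meant to keep.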

Page 11: Malware Detection - A Machine Learning Perspective
Page 12: Malware Detection - A Machine Learning Perspective

Model, Model, It’s All About The Model

• Most important part

• You need to choose the model that interprets your data most closely

• How to choose a model

• Numerical data: classical classifiers (SVM)

• Categorical data: dummy variables, decision tree

• Sequence data: n-gram algorithms, Bayes, Markov chain
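
The dummy-variable step for categorical data can be sketched as follows. This is a minimal illustration assuming a single hypothetical categorical feature (the packer detected in each sample); each category becomes its own 0/1 column so that a numerical classifier can consume it.

```python
def dummy_encode(values):
    """Turn a list of category labels into 0/1 dummy-variable rows."""
    categories = sorted(set(values))
    rows = [[1 if value == category else 0 for category in categories]
            for value in values]
    return categories, rows

# Hypothetical categorical feature: packer name per sample.
packers = ["UPX", "none", "ASPack", "UPX"]

categories, rows = dummy_encode(packers)
print(categories)
for packer, row in zip(packers, rows):
    print(packer, row)
```

Every row contains exactly one 1, in the column of its category, so no artificial ordering between categories is introduced.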

Page 13: Malware Detection - A Machine Learning Perspective

Every Evaluation in Every Paper is ‘Perfect’

• Unlike other research areas, malware detection has no standard benchmark

• New malware is created every day

• Privacy concerns

• There is also no guideline for evaluation

• Some researchers observed this problem and conducted a thorough survey

• They provide some rules for evaluation

Page 14: Malware Detection - A Machine Learning Perspective
Page 15: Malware Detection - A Machine Learning Perspective
Page 16: Malware Detection - A Machine Learning Perspective

Is Machine Learning the Savior

• Machine learning can help us to recognize similar and variant malware

• It cannot identify brand-new malware

• Machine learning-based detectors need careful training and a long time for tuning

Page 17: Malware Detection - A Machine Learning Perspective

Democracy World in Machine Learning

• There are many types of classifiers

• SVM, Decision Tree, Neural Network, ...

• Voting among them to increase precision
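
The voting idea can be sketched with simple majority voting. The three "classifiers" below are hypothetical hand-written rules standing in for trained models such as an SVM, a decision tree, and a neural network; the feature names and thresholds are assumptions made for illustration.

```python
from collections import Counter

def majority_vote(classifiers, sample):
    """Ask every classifier for a label and return the most common one."""
    votes = [clf(sample) for clf in classifiers]
    return Counter(votes).most_common(1)[0][0]

# Hypothetical stand-ins for three independently trained classifiers.
def by_entropy(s):
    return "malware" if s["entropy"] > 7.0 else "benign"

def by_imports(s):
    return "malware" if s["num_imports"] < 5 else "benign"

def by_section(s):
    return "malware" if s["section"].lower().startswith(".upx") else "benign"

ensemble = [by_entropy, by_imports, by_section]

packed = {"entropy": 7.5, "num_imports": 3, "section": ".text"}
normal = {"entropy": 5.0, "num_imports": 40, "section": ".upx0"}

print(majority_vote(ensemble, packed))
print(majority_vote(ensemble, normal))
```

With an odd number of voters and two classes there are no ties, and a single classifier's mistake (the section-name rule on both samples here) is outvoted by the other two.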

Page 18: Malware Detection - A Machine Learning Perspective

WYSIWYG

Page 19: Malware Detection - A Machine Learning Perspective

Know Where Your Enemy Is

• In the security field, attackers always try to break your system

• Causative game

• Attacker poisons data

• Defender trains ML on poisoned data

• Exploratory game

• Defender trains on clean data

• Attacker evades learned classifier/detector
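
Both games can be illustrated on a toy detector. The sketch below assumes a single numeric "suspicion score" per sample and a detector whose threshold is the midpoint between the benign and malware training means; all scores are made-up numbers, and real attacks and detectors are far more complex.

```python
from statistics import mean

def train_threshold(scores, labels):
    """Toy learner: threshold halfway between the two class means."""
    benign = [s for s, l in zip(scores, labels) if l == "benign"]
    malware = [s for s, l in zip(scores, labels) if l == "malware"]
    return (mean(benign) + mean(malware)) / 2

def predict(threshold, score):
    return "malware" if score > threshold else "benign"

# Clean training data (hypothetical scores).
clean_scores = [1, 2, 3, 7, 8, 9]
clean_labels = ["benign"] * 3 + ["malware"] * 3
clean_threshold = train_threshold(clean_scores, clean_labels)

# Causative game: the attacker injects high-scoring samples labeled
# "benign", and the defender retrains on the poisoned data.
poisoned_scores = clean_scores + [9, 9, 9]
poisoned_labels = clean_labels + ["benign"] * 3
poisoned_threshold = train_threshold(poisoned_scores, poisoned_labels)

attack_sample = 6
print(predict(clean_threshold, attack_sample))     # caught by the clean model
print(predict(poisoned_threshold, attack_sample))  # missed after poisoning

# Exploratory game: the model is clean, but the attacker mutates the
# sample until its score falls under the learned threshold.
evasive_sample = 4.5
print(predict(clean_threshold, evasive_sample))
```

In the causative case the attacker moves the decision boundary; in the exploratory case the boundary stays put and the attacker moves the sample.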

Page 20: Malware Detection - A Machine Learning Perspective

References

1. McAfee Labs Threat Report in Fourth Quarter 2013

2. http://www.fireeye.com/blog/corporate/2014/05/ghost-hunting-with-anti-virus.html

3. AV alone is not enough to protect PC from zero-day malware

4. AV Isn't Dead, It Just Can't Keep Up

5. AV comparatives, File Detection Test of Malicious Software, 2014

6. G. Yan, N. Brown, and D. Kong, “Exploring Discriminatory Features for Automated Malware Classification,” DIMVA, 2013.

7. A. Shabtai, R. Moskovitch, Y. Elovici, and C. Glezer, “Detection of malicious code by applying machine learning classifiers on static features: A state-of-the-art survey,” Inf. Secur. Tech. Rep., 2009.

8. C. Rossow, C. J. Dietrich, C. Grier, C. Kreibich, V. Paxson, N. Pohlmann, H. Bos, and M. Van Steen, “Prudent Practices for Designing Malware Experiments: Status Quo and Outlook,” IEEE S&P, 2012.

9. D. Kong and G. Yan, “Discriminant malware distance learning on structural information for automated malware classification,” Proc. 19th ACM SIGKDD KDD ’13, 2013.