Improving accuracy of malware detection by filtering evaluation dataset based on its similarity

Junichi Murakami, Director of Advanced Development
Fourteenforty Research Institute, Inc. (FFRI, Inc.), http://www.ffri.jp

Description

In recent years, it has become more difficult to detect malware with traditional methods such as pattern matching because malware has grown more sophisticated. Machine learning-based detection has therefore been introduced, and various studies report that it achieves a higher detection rate than traditional methods. However, it is well known that detection accuracy degrades significantly on data that differ from the training dataset. This study provides a method to improve detection accuracy by filtering the evaluation dataset based on its similarity to the training dataset.

Transcript of Improving accuracy of malware detection by filtering evaluation dataset based on its similarity

Page 1: Improving accuracy of malware detection by filtering evaluation dataset based on its similarity


Fourteenforty Research Institute, Inc.

FFRI, Inc. http://www.ffri.jp

Improving accuracy of malware detection by filtering evaluation dataset based on its similarity

Junichi Murakami Director of Advanced Development

Page 2: Improving accuracy of malware detection by filtering evaluation dataset based on its similarity


• These slides were used for a presentation at CSS2013

– http://www.iwsec.org/css/2013/english/index.html

• Please refer to the original paper for detailed data

– http://www.ffri.jp/assets/files/research/research_papers/MWS2013_paper.pdf (written in Japanese, but the figures are the same)

• Contact information

– [email protected]

– @FFRI_Research (Twitter)

Preface


Page 3: Improving accuracy of malware detection by filtering evaluation dataset based on its similarity


• Background

• Problem

• Scope and purpose

• Experiment 1

• Experiment 2

• Experiment 3

• Consideration

• Conclusion

Agenda


Page 4: Improving accuracy of malware detection by filtering evaluation dataset based on its similarity


Background – malware and its detection

[Figure: increasing malware (targeted attacks with unknown malware, malware generators, obfuscators) exposes the limitations of signature matching, leading to other methods: heuristics, cloud reputation, and machine learning with big data]

Page 5: Improving accuracy of malware detection by filtering evaluation dataset based on its similarity


Background – Related works

• Mainly focusing on a combination of the factors below

– Feature selection and modification, parameter settings

• Some good results are reported (TPR: 90%+, FPR: 1% or less)

Features: static information, dynamic information, hybrid

Algorithms: SVM, Naive Bayes, Perceptron, etc.

Evaluation: TPR/FPR, ROC curve, accuracy, precision, etc.

Page 6: Improving accuracy of malware detection by filtering evaluation dataset based on its similarity


• General theory of machine learning:

– Classification accuracy declines if the trends of the training and testing data differ

• Does the same hold for malware and benign files?

Problem


Page 7: Improving accuracy of malware detection by filtering evaluation dataset based on its similarity


① Investigating the difference between the similarity distributions of malware and benign files (Experiment-1)

② Investigating how this difference affects classification accuracy (Experiment-2)

③ Based on the results above, confirming the effect of removing data whose similarity to the training data is low (Experiment-3)

Scope and purpose


Page 8: Improving accuracy of malware detection by filtering evaluation dataset based on its similarity


• Used FFRI Dataset 2013 and benign files we collected as the datasets

• Calculated the pairwise similarity among the malware and among the benign files (Jubatus, MinHash)

• Feature vector: counts of 4-grams of sequential API calls (a sketch follows the figure below)

– e.g., NtCreateFile_NtWriteFile_NtWriteFile_NtClose: n times; NtSetInformationFile_NtClose_NtClose_NtOpenMutant: m times

Experiment-1(1/3)

[Figure: pairwise similarity matrices computed separately for the malware set and the benign set; e.g., sim(A, B) = 0.8, sim(A, C) = 0.52, sim(B, C) = 1.0]
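As an illustration of the feature extraction and similarity computation described above, here is a minimal Python sketch. It is not the authors' implementation: the helper names are hypothetical, and the datasketch library's MinHash stands in for the MinHash implementation in Jubatus.

```python
# A minimal sketch (not the authors' code): 4-gram features from an API-call
# trace, and pairwise similarity estimated with MinHash. datasketch stands in
# here for the MinHash implementation in Jubatus.
from collections import Counter
from datasketch import MinHash

def api_4grams(api_calls):
    """Count 4-grams of sequential API calls, e.g.
    'NtCreateFile_NtWriteFile_NtWriteFile_NtClose': n times."""
    grams = ("_".join(api_calls[i:i + 4]) for i in range(len(api_calls) - 3))
    return Counter(grams)

def minhash_of(feature_counts, num_perm=128):
    """MinHash signature over the set of observed 4-grams (counts ignored here)."""
    sig = MinHash(num_perm=num_perm)
    for gram in feature_counts:
        sig.update(gram.encode("utf-8"))
    return sig

# Toy traces standing in for two samples' dynamic API logs.
trace_a = ["NtCreateFile", "NtWriteFile", "NtWriteFile", "NtClose", "NtOpenMutant"]
trace_b = ["NtCreateFile", "NtWriteFile", "NtWriteFile", "NtClose", "NtTerminateProcess"]
sim = minhash_of(api_4grams(trace_a)).jaccard(minhash_of(api_4grams(trace_b)))
print(f"estimated similarity: {sim:.2f}")
```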

Page 9: Improving accuracy of malware detection by filtering evaluation dataset based on its similarity


Grouping malware and benign files based on their similarities

Experiment-1(2/3)

[Figure: at each similarity threshold (0.0 - 1.0), malware and benign files are grouped by whether they have a similar counterpart in the same set]
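A minimal sketch of this grouping step, reusing the MinHash signatures from the previous sketch (an assumption, not the authors' code): a sample is "not unique" at a given threshold if at least one other sample in the same set is at least that similar.

```python
# Hypothetical helper: label each sample "unique" / "not unique" at a threshold.
def group_by_threshold(signatures, threshold):
    """signatures: MinHash signatures of one dataset (malware or benign)."""
    labels = []
    for i, sig in enumerate(signatures):
        has_similar_peer = any(
            sig.jaccard(other) >= threshold
            for j, other in enumerate(signatures)
            if j != i
        )
        labels.append("not unique" if has_similar_peer else "unique")
    return labels
```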

Page 10: Improving accuracy of malware detection by filtering evaluation dataset based on its similarity


Experiment-1(3/3)

[Figure: stacked bars (0% - 100%) showing, for similarity thresholds 0.8, 0.85, 0.9, 0.95, and 1.0, the share of benign files and of malware that have a similar counterpart ("not unique") versus none ("unique")]

It is more difficult to find similar benign files compared to malware.

Page 11: Improving accuracy of malware detection by filtering evaluation dataset based on its similarity


• How much does this difference affect the classification result?

• 50% of the malware/benign files are assigned to a training dataset and the rest to a testing dataset (Jubatus, AROW); see the sketch below

Experiment-2(1/3)

[Figure: the benign and malware sets are each split into a training half and a testing half; Jubatus is trained on the training half and classifies the testing half, yielding TPR (True Positive Rate) and FPR (False Positive Rate)]
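The sketch below illustrates the Experiment-2 protocol under stated assumptions. It is not the authors' setup: Jubatus' AROW is replaced with scikit-learn's PassiveAggressiveClassifier (an online linear learner in the same family), and X/y are assumed to be a matrix of 4-gram counts and labels (1 = malware, 0 = benign).

```python
# A minimal sketch of the Experiment-2 protocol (illustrative substitution):
# scikit-learn's PassiveAggressiveClassifier stands in for Jubatus' AROW.
import numpy as np
from sklearn.linear_model import PassiveAggressiveClassifier
from sklearn.model_selection import train_test_split

def evaluate_split(X, y, seed=0):
    # 50% of the malware/benign files for training, the rest for testing.
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.5, stratify=y, random_state=seed)
    clf = PassiveAggressiveClassifier(random_state=seed).fit(X_tr, y_tr)
    pred = clf.predict(X_te)

    y_te = np.asarray(y_te)
    tp = np.sum((pred == 1) & (y_te == 1))
    fn = np.sum((pred == 0) & (y_te == 1))
    fp = np.sum((pred == 1) & (y_te == 0))
    tn = np.sum((pred == 0) & (y_te == 0))
    return tp / (tp + fn), fp / (fp + tn)  # TPR, FPR
```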

Page 12: Improving accuracy of malware detection by filtering evaluation dataset based on its similarity


Experiment-2(2/3)

[Same setup as the previous slide]

Page 13: Improving accuracy of malware detection by filtering evaluation dataset based on its similarity


Accuracy declines when the trends of the training and testing data differ

Experiment-2(3/3)

TPR: 97.996% (not unique) → 81.297% (unique), a difference of -16.699 points

FPR: 0.624% (not unique) → 4.49% (unique), a difference of +3.866 points

Page 14: Improving accuracy of malware detection by filtering evaluation dataset based on its similarity


Experiment-3(1/6) – After a training

[Figure: scatter plot of benign(train), malware(train), benign(test), and malware(test) samples with the learned dividing line separating benign from malware]

Page 15: Improving accuracy of malware detection by filtering evaluation dataset based on its similarity


Experiment-3(2/6) – After a classification

[Figure: the same scatter plot after classification; test samples fall on either side of the dividing line]

Page 16: Improving accuracy of malware detection by filtering evaluation dataset based on its similarity


Experiment-3(2/6) – After a classification

[Figure: the same plot with misclassified test samples highlighted as FP and FN on either side of the dividing line]

Page 17: Improving accuracy of malware detection by filtering evaluation dataset based on its similarity


Experiment-3(3/6) – Low similarity data

[Figure: test samples with low similarity to the training data; such samples tend to be FN, or TP only accidentally]
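The filtering step evaluated in Experiment-3 can be sketched as below (an assumption about the mechanics, not the authors' code), reusing the MinHash signatures from the earlier sketches: a test sample is handed to the classifier only if its best similarity to any training sample reaches the threshold.

```python
# Hypothetical filter: keep only test samples similar enough to the training data.
def filter_by_training_similarity(test_sigs, train_sigs, threshold):
    """Return indices of test samples whose best MinHash similarity to any
    training sample is at least `threshold`; the rest stay unclassified."""
    kept = []
    for i, sig in enumerate(test_sigs):
        best = max(sig.jaccard(t) for t in train_sigs)
        if best >= threshold:
            kept.append(i)
    return kept
```

Only the kept samples are then classified and TPR/FPR recomputed on them; the filtered-out samples are the ones discussed in the Consideration slides.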

Page 18: Improving accuracy of malware detection by filtering evaluation dataset based on its similarity


Experiment-3(4/6) – Effect to TPR

[Figure: effect of the similarity threshold on TPR; bars: number of classified data (TP, FN, 0 - 1400), line: TPR (0.88 - 1.00); x-axis: threshold of similarity (0, 0.6 - 1.0)]

Page 19: Improving accuracy of malware detection by filtering evaluation dataset based on its similarity


Experiment-3(5/6) – Effect to FPR

[Figure: effect of the similarity threshold on FPR; bars: number of classified data (TN, FP, 0 - 2500), line: FPR (0.000 - 0.014); x-axis: threshold of similarity (0, 0.6 - 1.0)]

Page 20: Improving accuracy of malware detection by filtering evaluation dataset based on its similarity


Experiment-3(6/6)

Transition of the number of classified data

[Figure: the ratio of classified data to total testing data (0% - 120%), for malware and for benign software, as a function of the similarity threshold (0, 0.6 - 1.0)]

Page 21: Improving accuracy of malware detection by filtering evaluation dataset based on its similarity


• In a real scenario:

– we try to classify whether an unknown file/process is benign or not

• If we apply the filtering of Experiment-3:

– files are classified only if similar data has already been trained

– otherwise, files are not classified, which results in

• an FN if the file is malware

• a TN if the file is benign (which is fine as a result)

• Therefore the problem is the "TPR for unique malware" (unique malware is likely to go undetected)

Consideration(1/3)


Page 22: Improving accuracy of malware detection by filtering evaluation dataset based on its similarity


• If malware continues to have many variants, as it does today

– ML-based detection works well

• Having many variants ∝ use of malware generators/obfuscators

• We have to investigate

– Trends in the usage of the tools above

– The possibility of anti-machine-learning (detection-evasion) techniques

Consideration(2/3)


Page 23: Improving accuracy of malware detection by filtering evaluation dataset based on its similarity


• How to deal with unclassified (filtered) data

1. Using other feature vectors

2. Enlarging a training dataset (Unique → Not unique)

3. Using other methods besides ML

Consideration(3/3)


Page 24: Improving accuracy of malware detection by filtering evaluation dataset based on its similarity


• The similarity distributions of malware and benign files are different (Experiment-1)

• Accuracy declines if the trends of the training and testing data are different (Experiment-2)

• The TPR for unique malware declines when we remove low-similarity data (Experiment-3)

• Continual investigation of trends in malware and the related tools is required

• (It might also be necessary to develop technology to identify benign files)

Conclusion
