Improving accuracy of malware detection by filtering evaluation dataset based on its similarity

Junichi Murakami, Director of Advanced Development
Fourteenforty Research Institute, Inc. (FFRI, Inc.), http://www.ffri.jp

Description

In recent years, it has become more difficult to detect malware with traditional methods such as pattern matching because malware has grown more sophisticated. Machine learning-based detection has therefore been introduced, and various studies report that it achieves a higher detection rate than traditional methods. However, it is well known that detection accuracy degrades significantly on data that differ from the training dataset. This study provides a method to improve detection accuracy by filtering the evaluation dataset based on its similarity to the training dataset.

Transcript of Improving accuracy of malware detection by filtering evaluation dataset based on its similarity

Page 1: Improving accuracy of malware detection by filtering evaluation dataset based on its similarity


Fourteenforty Research Institute, Inc.

FFRI, Inc. http://www.ffri.jp

Improving accuracy of malware detection by filtering evaluation dataset based on its similarity

Junichi Murakami Director of Advanced Development

Page 2: Improving accuracy of malware detection by filtering evaluation dataset based on its similarity


• These slides were used for a presentation at CSS2013

– http://www.iwsec.org/css/2013/english/index.html

• Please refer to the original paper for detailed data

– http://www.ffri.jp/assets/files/research/research_papers/MWS2013_paper.pdf (written in Japanese, but the figures are the same)

• Contact information

– [email protected]

– @FFRI_Research (Twitter)

Preface


Page 3: Improving accuracy of malware detection by filtering evaluation dataset based on its similarity


• Background

• Problem

• Scope and purpose

• Experiment 1

• Experiment 2

• Experiment 3

• Consideration

• Conclusion

Agenda


Page 4: Improving accuracy of malware detection by filtering evaluation dataset based on its similarity


Background – malware and its detection

[Figure: increasing malware (targeted attacks with unknown malware, malware generators, obfuscators) exposes the limitations of signature matching, leading to other methods: heuristics, cloud reputation, and machine learning with big data]

Page 5: Improving accuracy of malware detection by filtering evaluation dataset based on its similarity


Background – Related works

• Mainly focusing on a combination of the factors below

– Feature selection and modification, parameter settings

• Some good results are reported (TPR: 90%+, FPR: 1% or less)

Features: static information, dynamic information, hybrid

Algorithms: SVM, Naive Bayes, Perceptron, etc.

Evaluation: TPR/FPR, ROC curve, accuracy, precision, etc.

Page 6: Improving accuracy of malware detection by filtering evaluation dataset based on its similarity


• General theory of machine learning:

– Classification accuracy declines if the trends of the training and testing data differ

• Does the same hold for malware and benign files?

Problem


Page 7: Improving accuracy of malware detection by filtering evaluation dataset based on its similarity


① Investigating the difference between the similarity distributions of malware and benign files (Experiment-1)

② Investigating how this difference affects classification accuracy (Experiment-2)

③ Based on the results above, confirming the effect of removing data whose similarity to the training data is low (Experiment-3)

Scope and purpose


Page 8: Improving accuracy of malware detection by filtering evaluation dataset based on its similarity


• Used FFRI Dataset 2013 and benign files we collected as the datasets

• Calculated the pairwise similarity among the malware and among the benign files (Jubatus, MinHash)

• Feature vector: counts of 4-grams of sequential API calls (a sketch follows the figure below)

– e.g., NtCreateFile_NtWriteFile_NtWriteFile_NtClose: n times; NtSetInformationFile_NtClose_NtClose_NtOpenMutant: m times

Experiment-1(1/3)

[Figure: pairwise similarity matrices computed separately for the malware set and the benign set; e.g., sim(A, B) = 0.8, sim(A, C) = 0.52, sim(B, C) = 1.0]
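As an illustration of the feature extraction and similarity computation described above, here is a minimal Python sketch. It is not the authors' implementation: the helper names are hypothetical, and the datasketch library's MinHash stands in for the MinHash implementation in Jubatus.

```python
# A minimal sketch (not the authors' code): 4-gram features from an API-call
# trace, and pairwise similarity estimated with MinHash. datasketch stands in
# here for the MinHash implementation in Jubatus.
from collections import Counter
from datasketch import MinHash

def api_4grams(api_calls):
    """Count 4-grams of sequential API calls, e.g.
    'NtCreateFile_NtWriteFile_NtWriteFile_NtClose': n times."""
    grams = ("_".join(api_calls[i:i + 4]) for i in range(len(api_calls) - 3))
    return Counter(grams)

def minhash_of(feature_counts, num_perm=128):
    """MinHash signature over the set of observed 4-grams (counts ignored here)."""
    sig = MinHash(num_perm=num_perm)
    for gram in feature_counts:
        sig.update(gram.encode("utf-8"))
    return sig

# Toy traces standing in for two samples' dynamic API logs.
trace_a = ["NtCreateFile", "NtWriteFile", "NtWriteFile", "NtClose", "NtOpenMutant"]
trace_b = ["NtCreateFile", "NtWriteFile", "NtWriteFile", "NtClose", "NtTerminateProcess"]
sim = minhash_of(api_4grams(trace_a)).jaccard(minhash_of(api_4grams(trace_b)))
print(f"estimated similarity: {sim:.2f}")
```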

Page 9: Improving accuracy of malware detection by filtering evaluation dataset based on its similarity


Grouping malware and benign files based on their similarities

Experiment-1(2/3)

[Figure: at each similarity threshold (0.0 - 1.0), malware and benign files are grouped by whether they have a similar counterpart in the same set]
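A minimal sketch of this grouping step, reusing the MinHash signatures from the previous sketch (an assumption, not the authors' code): a sample is "not unique" at a given threshold if at least one other sample in the same set is at least that similar.

```python
# Hypothetical helper: label each sample "unique" / "not unique" at a threshold.
def group_by_threshold(signatures, threshold):
    """signatures: MinHash signatures of one dataset (malware or benign)."""
    labels = []
    for i, sig in enumerate(signatures):
        has_similar_peer = any(
            sig.jaccard(other) >= threshold
            for j, other in enumerate(signatures)
            if j != i
        )
        labels.append("not unique" if has_similar_peer else "unique")
    return labels
```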

Page 10: Improving accuracy of malware detection by filtering evaluation dataset based on its similarity


Experiment-1(3/3)

[Figure: stacked bars (0% - 100%) showing, for similarity thresholds 0.8, 0.85, 0.9, 0.95, and 1.0, the share of benign files and of malware that have a similar counterpart ("not unique") versus none ("unique")]

It is more difficult to find similar benign files compared to malware.

Page 11: Improving accuracy of malware detection by filtering evaluation dataset based on its similarity


• How much does this difference affect the classification result?

• 50% of the malware/benign files are assigned to a training dataset and the rest to a testing dataset (Jubatus, AROW); see the sketch below

Experiment-2(1/3)

[Figure: the benign and malware sets are each split into a training half and a testing half; Jubatus is trained on the training half and classifies the testing half, yielding TPR (True Positive Rate) and FPR (False Positive Rate)]
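The sketch below illustrates the Experiment-2 protocol under stated assumptions. It is not the authors' setup: Jubatus' AROW is replaced with scikit-learn's PassiveAggressiveClassifier (an online linear learner in the same family), and X/y are assumed to be a matrix of 4-gram counts and labels (1 = malware, 0 = benign).

```python
# A minimal sketch of the Experiment-2 protocol (illustrative substitution):
# scikit-learn's PassiveAggressiveClassifier stands in for Jubatus' AROW.
import numpy as np
from sklearn.linear_model import PassiveAggressiveClassifier
from sklearn.model_selection import train_test_split

def evaluate_split(X, y, seed=0):
    # 50% of the malware/benign files for training, the rest for testing.
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.5, stratify=y, random_state=seed)
    clf = PassiveAggressiveClassifier(random_state=seed).fit(X_tr, y_tr)
    pred = clf.predict(X_te)

    y_te = np.asarray(y_te)
    tp = np.sum((pred == 1) & (y_te == 1))
    fn = np.sum((pred == 0) & (y_te == 1))
    fp = np.sum((pred == 1) & (y_te == 0))
    tn = np.sum((pred == 0) & (y_te == 0))
    return tp / (tp + fn), fp / (fp + tn)  # TPR, FPR
```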

Page 12: Improving accuracy of malware detection by filtering evaluation dataset based on its similarity


Experiment-2(2/3)

[Same setup as the previous slide]

Page 13: Improving accuracy of malware detection by filtering evaluation dataset based on its similarity


Accuracy declines when the trends of the training and testing data differ

Experiment-2(3/3)

TPR: 97.996% (not unique) → 81.297% (unique), a difference of -16.699 points

FPR: 0.624% (not unique) → 4.49% (unique), a difference of +3.866 points

Page 14: Improving accuracy of malware detection by filtering evaluation dataset based on its similarity


Experiment-3(1/6) – After a training

[Figure: scatter plot of benign(train), malware(train), benign(test), and malware(test) samples with the learned dividing line separating benign from malware]

Page 15: Improving accuracy of malware detection by filtering evaluation dataset based on its similarity


Experiment-3(2/6) – After a classification

[Figure: the same scatter plot after classification; test samples fall on either side of the dividing line]

Page 16: Improving accuracy of malware detection by filtering evaluation dataset based on its similarity


Experiment-3(2/6) – After a classification

[Figure: the same plot with misclassified test samples highlighted as FP and FN on either side of the dividing line]

Page 17: Improving accuracy of malware detection by filtering evaluation dataset based on its similarity


Experiment-3(3/6) – Low similarity data

[Figure: test samples with low similarity to the training data; such samples tend to be FN, or TP only accidentally]
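The filtering step evaluated in Experiment-3 can be sketched as below (an assumption about the mechanics, not the authors' code), reusing the MinHash signatures from the earlier sketches: a test sample is handed to the classifier only if its best similarity to any training sample reaches the threshold.

```python
# Hypothetical filter: keep only test samples similar enough to the training data.
def filter_by_training_similarity(test_sigs, train_sigs, threshold):
    """Return indices of test samples whose best MinHash similarity to any
    training sample is at least `threshold`; the rest stay unclassified."""
    kept = []
    for i, sig in enumerate(test_sigs):
        best = max(sig.jaccard(t) for t in train_sigs)
        if best >= threshold:
            kept.append(i)
    return kept
```

Only the kept samples are then classified and TPR/FPR recomputed on them; the filtered-out samples are the ones discussed in the Consideration slides.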

Page 18: Improving accuracy of malware detection by filtering evaluation dataset based on its similarity


Experiment-3(4/6) – Effect to TPR

[Figure: effect of the similarity threshold on TPR; bars: number of classified data (TP, FN, 0 - 1400), line: TPR (0.88 - 1.00); x-axis: threshold of similarity (0, 0.6 - 1.0)]

Page 19: Improving accuracy of malware detection by filtering evaluation dataset based on its similarity


Experiment-3(5/6) – Effect to FPR

[Figure: effect of the similarity threshold on FPR; bars: number of classified data (TN, FP, 0 - 2500), line: FPR (0.000 - 0.014); x-axis: threshold of similarity (0, 0.6 - 1.0)]

Page 20: Improving accuracy of malware detection by filtering evaluation dataset based on its similarity


Experiment-3(6/6)

Transition of the number of classified data

[Figure: the ratio of classified data to total testing data (0% - 120%), for malware and for benign software, as a function of the similarity threshold (0, 0.6 - 1.0)]

Page 21: Improving accuracy of malware detection by filtering evaluation dataset based on its similarity


• In a real scenario:

– we try to classify whether an unknown file/process is benign or not

• If we apply the filtering of Experiment-3:

– files are classified only if similar data has already been trained

– otherwise, files are not classified, which results in

• an FN if the file is malware

• a TN if the file is benign (which is fine as a result)

• Therefore the problem is the "TPR for unique malware" (unique malware is likely to go undetected)

Consideration(1/3)


Page 22: Improving accuracy of malware detection by filtering evaluation dataset based on its similarity


• If malware continues to have many variants, as it does today

– ML-based detection works well

• Having many variants ∝ use of malware generators/obfuscators

• We have to investigate

– Trends in the usage of the tools above

– The possibility of anti-machine-learning (detection-evasion) techniques

Consideration(2/3)


Page 23: Improving accuracy of malware detection by filtering evaluation dataset based on its similarity


• How to deal with unclassified (filtered) data

1. Using other feature vectors

2. Enlarging a training dataset (Unique → Not unique)

3. Using other methods besides ML

Consideration(3/3)


Page 24: Improving accuracy of malware detection by filtering evaluation dataset based on its similarity


• The similarity distributions of malware and benign files are different (Experiment-1)

• Accuracy declines if the trends of the training and testing data are different (Experiment-2)

• The TPR for unique malware declines when we remove low-similarity data (Experiment-3)

• Continual investigation of trends in malware and the related tools is required

• (It might also be necessary to develop technology to identify benign files)

Conclusion
