Challenges in Android Malware Detection - OWASP · Android Malware DetectionI The traditional...

34
Challenges in Android Malware Detection OWASP BeNeLux Days @ Belval Friday, 18 th March 2016 KEVIN ALLIX, [email protected] SnT, UniversitØ du Luxembourg

Transcript of Challenges in Android Malware Detection - OWASP · Android Malware DetectionI The traditional...

Page 1: Challenges in Android Malware Detection - OWASP · Android Malware DetectionI The traditional Antivirus method Collect supicious samples Analyze each sample (Static and/or dynamic

Challenges in Android MalwareDetectionOWASP BeNeLux Days @ BelvalFriday, 18th March 2016

KEVIN ALLIX, [email protected]

SnT, Université du Luxembourg

Page 2: Challenges in Android Malware Detection - OWASP · Android Malware DetectionI The traditional Antivirus method Collect supicious samples Analyze each sample (Static and/or dynamic

Contents

1 IntroductionAndroid in one minuteWith great market shares comes great risksThe Rise of the Malware

2 Machine Learning

3 Does it work?AntiVirusIn the WildData Leakage

4 Conclusion

Kevin Allix (SnT - uni.lu) OWASP 18-03-2016 1 / 23

Page 3: Challenges in Android Malware Detection - OWASP · Android Malware DetectionI The traditional Antivirus method Collect supicious samples Analyze each sample (Static and/or dynamic

Android

Android in one minute

A complete Software Stack

Linux based Kernel + custom (Non POSIX) Libc

Dalvik Virtual MachineUserland Apps written in Java, and compiled to Dalvik ByteCode

Self-contained Applications packaged in One file

Solid User Base (Billion)

Strong ecosystem (Millions of Apps)

+ alternative markets (AppChina, Amazon, Opera, GetJar, etc.)

Kevin Allix (SnT - uni.lu) OWASP 18-03-2016 2 / 23

Page 4: Challenges in Android Malware Detection - OWASP · Android Malware DetectionI The traditional Antivirus method Collect supicious samples Analyze each sample (Static and/or dynamic

With great market shares comes great risksAndroid is becoming Ubiquitous

A lot of device types (Media Players, STBs, DECT Phones, etc.)Huge amounts of personal data on each deviceConnected to both the phone network and the Internet

A target of choice for attackersKevin Allix (SnT - uni.lu) OWASP 18-03-2016 3 / 23

Page 5: Challenges in Android Malware Detection - OWASP · Android Malware DetectionI The traditional Antivirus method Collect supicious samples Analyze each sample (Static and/or dynamic

The Rise of the Malware

1

Bank Phishing Apps (2010)

Botnet (2010)

GPS tracking disguised as a game (2010)

SMS Trojan, SMS Leakage, Contacts Leakage, etc.

Without using any Exploit (i.e. without breaking thepermission-based security model)

1Malicious Mobile Threats Report 2010/2011, Juniper Networks, 2011

Kevin Allix (SnT - uni.lu) OWASP 18-03-2016 4 / 23

Page 6: Challenges in Android Malware Detection - OWASP · Android Malware DetectionI The traditional Antivirus method Collect supicious samples Analyze each sample (Static and/or dynamic

Research Question

Android Malware Detection

How can we detect Malware Applications?

Kevin Allix (SnT - uni.lu) OWASP 18-03-2016 5 / 23

Page 7: Challenges in Android Malware Detection - OWASP · Android Malware DetectionI The traditional Antivirus method Collect supicious samples Analyze each sample (Static and/or dynamic

Android Malware Detection I

The traditional Antivirus method

Collect supicious samples

Analyze each sample (Static and/or dynamic analysis)

Extract a signature

What I’m trying to do

Given a set of known malwareAnd given a set of known goodware

Use Data Mining to detect unknown malware samples

Kevin Allix (SnT - uni.lu) OWASP 18-03-2016 6 / 23

Page 8: Challenges in Android Malware Detection - OWASP · Android Malware DetectionI The traditional Antivirus method Collect supicious samples Analyze each sample (Static and/or dynamic

Android Malware Detection II

Machine-Learning Android Malware : A Recipe

Extract a Feature Vector from each known Malware sample;

Extract a Feature Vector from each known Goodware sample;

Extract a Feature Vector from an unknown Android App;

Add some Machine Learning Magic.

Kevin Allix (SnT - uni.lu) OWASP 18-03-2016 7 / 23

Page 9: Challenges in Android Malware Detection - OWASP · Android Malware DetectionI The traditional Antivirus method Collect supicious samples Analyze each sample (Static and/or dynamic

Machine Learning IFeature Vector. . .

A Feature is just a characteristic, a property, a trait

Example for Human Beings: Age, Gender, Height, Weight, Skincolor, Eye color, Hair color, etc.

Can you spot correlations between those variables?

Can you spot variables that would allow to guess the variableGender?

→ Machine Learning finds correlations between variables

Machine Learning will spot that on average, men are taller thanwomen

Kevin Allix (SnT - uni.lu) OWASP 18-03-2016 8 / 23

Page 10: Challenges in Android Malware Detection - OWASP · Android Malware DetectionI The traditional Antivirus method Collect supicious samples Analyze each sample (Static and/or dynamic

Machine Learning II

Feature MatrixExample with 2 Features and One class:

Height Weight Gender185.42 70.3 male172.72 60.3 female185.42 70.3 male157.48 49.9 female180.34 68.0 male170.18 68.0 female172.72 70.3 male

156.845 48.9 female...

......

Kevin Allix (SnT - uni.lu) OWASP 18-03-2016 9 / 23

Page 11: Challenges in Android Malware Detection - OWASP · Android Malware DetectionI The traditional Antivirus method Collect supicious samples Analyze each sample (Static and/or dynamic

Back to Android Malware

What Features to detect Malware ?→Put everything you can think of that may be statistically different formalware.

!TROLL ALERT! Have no idea at all ?

You don’t know what you’re doing ?

YOLO ?"Deep Learning" is made for you !

Kevin Allix (SnT - uni.lu) OWASP 18-03-2016 10 / 23

Page 12: Challenges in Android Malware Detection - OWASP · Android Malware DetectionI The traditional Antivirus method Collect supicious samples Analyze each sample (Static and/or dynamic

Back to Android Malware

What Features to detect Malware ?→Put everything you can think of that may be statistically different formalware.

!TROLL ALERT! Have no idea at all ?

You don’t know what you’re doing ?

YOLO ?

"Deep Learning" is made for you !

Kevin Allix (SnT - uni.lu) OWASP 18-03-2016 10 / 23

Page 13: Challenges in Android Malware Detection - OWASP · Android Malware DetectionI The traditional Antivirus method Collect supicious samples Analyze each sample (Static and/or dynamic

Back to Android Malware

What Features to detect Malware ?→Put everything you can think of that may be statistically different formalware.

!TROLL ALERT! Have no idea at all ?

You don’t know what you’re doing ?

YOLO ?"Deep Learning" is made for you !

Kevin Allix (SnT - uni.lu) OWASP 18-03-2016 10 / 23

Page 14: Challenges in Android Malware Detection - OWASP · Android Malware DetectionI The traditional Antivirus method Collect supicious samples Analyze each sample (Static and/or dynamic

Two Families of featuresStatic Analysis

+Can be fast+Can be relatively simple

−Blind to many things

Dynamic Analysis

+Can see more things (like downloaded code)

−Can see more things (so much data)

−Cannot be fast−Exercising apps ? Fuzzing a GUI is highly inefficient, and notnecessarily effective

Given the cost in time and CPU of dynamic Analysis, most researchersgo the static Analysis way

Kevin Allix (SnT - uni.lu) OWASP 18-03-2016 11 / 23

Page 15: Challenges in Android Malware Detection - OWASP · Android Malware DetectionI The traditional Antivirus method Collect supicious samples Analyze each sample (Static and/or dynamic

Evaluation

But features are just the first step

Now you need to evaluate the performance of your malwaredetecor. . .That’s incredibly hard to do properly

A few examples of issues. . .

Kevin Allix (SnT - uni.lu) OWASP 18-03-2016 12 / 23

Page 16: Challenges in Android Malware Detection - OWASP · Android Malware DetectionI The traditional Antivirus method Collect supicious samples Analyze each sample (Static and/or dynamic

What is a Malware?Remember the scary Juniper graph?

I can do scary graphs as well

GooglePlay Anzhi AppChina fdroid

Malware Goodware

21%79% 73%

27%54%46%

2%

98%

(Malware == detected by at least 1 Antivirus)

That one is slightly less scarry

GooglePlay Anzhi AppChina fdroid

Malware Goodware

1%

99%33%

67%32%

68%0%

100%

(Malware == detected by at least 10 Antivirus)

Kevin Allix (SnT - uni.lu) OWASP 18-03-2016 13 / 23

Page 17: Challenges in Android Malware Detection - OWASP · Android Malware DetectionI The traditional Antivirus method Collect supicious samples Analyze each sample (Static and/or dynamic

What is a Malware?Remember the scary Juniper graph?

I can do scary graphs as well

GooglePlay Anzhi AppChina fdroid

Malware Goodware

21%79% 73%

27%54%46%

2%

98%

(Malware == detected by at least 1 Antivirus)

That one is slightly less scarry

GooglePlay Anzhi AppChina fdroid

Malware Goodware

1%

99%33%

67%32%

68%0%

100%

(Malware == detected by at least 10 Antivirus)

Kevin Allix (SnT - uni.lu) OWASP 18-03-2016 13 / 23

Page 18: Challenges in Android Malware Detection - OWASP · Android Malware DetectionI The traditional Antivirus method Collect supicious samples Analyze each sample (Static and/or dynamic

What is a Malware?Remember the scary Juniper graph?

I can do scary graphs as well

GooglePlay Anzhi AppChina fdroid

Malware Goodware

21%79% 73%

27%54%46%

2%

98%

(Malware == detected by at least 1 Antivirus)

That one is slightly less scarry

GooglePlay Anzhi AppChina fdroid

Malware Goodware

1%

99%33%

67%32%

68%0%

100%

(Malware == detected by at least 10 Antivirus)Kevin Allix (SnT - uni.lu) OWASP 18-03-2016 13 / 23

Page 19: Challenges in Android Malware Detection - OWASP · Android Malware DetectionI The traditional Antivirus method Collect supicious samples Analyze each sample (Static and/or dynamic

Ground-truthTo do Machine Learning, we need:

A set of known MalwareA set of known Goodware

There are a few (small) sets of known Malware.Interestingly, there is no set of known Goodware.

Using AntiVirus Products

Not every AV agree

Well. . . All AVs Disagree

Some flag Adware

Some Don’tSome Do Sometimes

→ AVs do NOT share a common definition of what is a Malware

Kevin Allix (SnT - uni.lu) OWASP 18-03-2016 14 / 23

Page 20: Challenges in Android Malware Detection - OWASP · Android Malware DetectionI The traditional Antivirus method Collect supicious samples Analyze each sample (Static and/or dynamic

Ground-truthTo do Machine Learning, we need:

A set of known MalwareA set of known Goodware

There are a few (small) sets of known Malware.Interestingly, there is no set of known Goodware.

Using AntiVirus Products

Not every AV agree

Well. . . All AVs Disagree

Some flag Adware

Some Don’tSome Do Sometimes

→ AVs do NOT share a common definition of what is a Malware

Kevin Allix (SnT - uni.lu) OWASP 18-03-2016 14 / 23

Page 21: Challenges in Android Malware Detection - OWASP · Android Malware DetectionI The traditional Antivirus method Collect supicious samples Analyze each sample (Static and/or dynamic

Ground-truthTo do Machine Learning, we need:

A set of known MalwareA set of known Goodware

There are a few (small) sets of known Malware.Interestingly, there is no set of known Goodware.

Using AntiVirus Products

Not every AV agree

Well. . . All AVs Disagree

Some flag Adware

Some Don’tSome Do Sometimes

→ AVs do NOT share a common definition of what is a Malware

Kevin Allix (SnT - uni.lu) OWASP 18-03-2016 14 / 23

Page 22: Challenges in Android Malware Detection - OWASP · Android Malware DetectionI The traditional Antivirus method Collect supicious samples Analyze each sample (Static and/or dynamic

AVs

!TROLL ALERT! Increase your performance

By choosing the definition that makes your detector look good.

Kevin Allix (SnT - uni.lu) OWASP 18-03-2016 15 / 23

Page 23: Challenges in Android Malware Detection - OWASP · Android Malware DetectionI The traditional Antivirus method Collect supicious samples Analyze each sample (Static and/or dynamic

AVs

!TROLL ALERT! Increase your performanceBy choosing the definition that makes your detector look good.

Kevin Allix (SnT - uni.lu) OWASP 18-03-2016 15 / 23

Page 24: Challenges in Android Malware Detection - OWASP · Android Malware DetectionI The traditional Antivirus method Collect supicious samples Analyze each sample (Static and/or dynamic

In-the-Lab vs in-the-Wild

Size does matter

Malware Detectors are often tested on very small datasets

Their performance may be over-estimated

By a Whole lot

Kevin Allix (SnT - uni.lu) OWASP 18-03-2016 16 / 23

Page 25: Challenges in Android Malware Detection - OWASP · Android Malware DetectionI The traditional Antivirus method Collect supicious samples Analyze each sample (Static and/or dynamic

In-the-Lab vs in-the-Wild

Size does matter

Malware Detectors are often tested on very small datasets

Their performance may be over-estimated

By a Whole lot

Kevin Allix (SnT - uni.lu) OWASP 18-03-2016 16 / 23

Page 26: Challenges in Android Malware Detection - OWASP · Android Malware DetectionI The traditional Antivirus method Collect supicious samples Analyze each sample (Static and/or dynamic

Data Leakage: The Time issue

One slight methodology problem. . .

We don’t know the future.

Yes, I learned that during my PhD

"Science is a slow process"

Kevin Allix (SnT - uni.lu) OWASP 18-03-2016 17 / 23

Page 27: Challenges in Android Malware Detection - OWASP · Android Malware DetectionI The traditional Antivirus method Collect supicious samples Analyze each sample (Static and/or dynamic

Data Leakage: The Time issue

One slight methodology problem. . .

We don’t know the future.Yes, I learned that during my PhD

"Science is a slow process"

Kevin Allix (SnT - uni.lu) OWASP 18-03-2016 17 / 23

Page 28: Challenges in Android Malware Detection - OWASP · Android Malware DetectionI The traditional Antivirus method Collect supicious samples Analyze each sample (Static and/or dynamic

Data Leakage: The Time issue

One slight methodology problem. . .

We don’t know the future.Yes, I learned that during my PhD

"Science is a slow process"

Kevin Allix (SnT - uni.lu) OWASP 18-03-2016 17 / 23

Page 29: Challenges in Android Malware Detection - OWASP · Android Malware DetectionI The traditional Antivirus method Collect supicious samples Analyze each sample (Static and/or dynamic

The Time Issue: [Back To The Future Style]

Testing an approach in an historically Incoherent way tells us:a) How this approach would perform Now on Malware from thePastorb) How this approach would have performed in the Past with(then-)Present Malware if it has had access to the (then-)Future

However, it does Not tell us how it would perform Now on PresentMalware

Does your brain hurt ?

Kevin Allix (SnT - uni.lu) OWASP 18-03-2016 18 / 23

Page 30: Challenges in Android Malware Detection - OWASP · Android Malware DetectionI The traditional Antivirus method Collect supicious samples Analyze each sample (Static and/or dynamic

Use-Cases I

Filtering / Finding brand new Malware

Kevin Allix (SnT - uni.lu) OWASP 18-03-2016 19 / 23

Page 31: Challenges in Android Malware Detection - OWASP · Android Malware DetectionI The traditional Antivirus method Collect supicious samples Analyze each sample (Static and/or dynamic

Use-Cases II

Cleaning Markets

Kevin Allix (SnT - uni.lu) OWASP 18-03-2016 20 / 23

Page 32: Challenges in Android Malware Detection - OWASP · Android Malware DetectionI The traditional Antivirus method Collect supicious samples Analyze each sample (Static and/or dynamic

State of the Art ?

Nearly everyone does the time in-coherent way

Knowing the Future helps a lot!

I guess those two things are unrelated. . .

History Matters!

History should be taken into account when evaluating a Malwaredetector;

Approaches whose evaluation ignores History may actuallyperform badly where we need them most;

Kevin Allix (SnT - uni.lu) OWASP 18-03-2016 21 / 23

Page 33: Challenges in Android Malware Detection - OWASP · Android Malware DetectionI The traditional Antivirus method Collect supicious samples Analyze each sample (Static and/or dynamic

Conclusion

Automatic Malware Detection? We’re not quite there. . .

What is needed ?

Dependability, Dependability, DependabilityIncrease trust in Machine Learning-based malware detectors ?→ Predicting performance where it cannot be assessed yet→ Explanation

PracticalityHow to tune an approach to match its user needs ?

Kevin Allix (SnT - uni.lu) OWASP 18-03-2016 22 / 23

Page 34: Challenges in Android Malware Detection - OWASP · Android Malware DetectionI The traditional Antivirus method Collect supicious samples Analyze each sample (Static and/or dynamic

Thank You!

Questions?Kevin Allix (SnT - uni.lu) OWASP 18-03-2016 23 / 23