Finding the Linchpins of the Dark Web: A Study on Topologically...

30
Finding the Linchpins of the Dark Web: A Study on Topologically Dedicated Hosts on Malicious Web Infrastructures Zhou Li, Indiana University Bloomington Sumayah Alrwais, Indiana University Bloomington Yinglian Xie, MSR Silicon Valley Fang Yu, MSR Silicon Valley XiaoFeng Wang, Indiana University Bloomington 1

Transcript of Finding the Linchpins of the Dark Web: A Study on Topologically...

Page 1: Finding the Linchpins of the Dark Web: A Study on Topologically …index-of.co.uk/Blackhat/zhou-oakland13-slides.pdf · 2019-03-07 · A Study on Topologically Dedicated Hosts on

Finding the Linchpins of the Dark Web: A Study on Topologically Dedicated Hosts

on Malicious Web Infrastructures

Zhou Li, Indiana University Bloomington Sumayah Alrwais, Indiana University Bloomington

Yinglian Xie, MSR Silicon Valley Fang Yu, MSR Silicon Valley

XiaoFeng Wang, Indiana University Bloomington

1

Page 2: Finding the Linchpins of the Dark Web: A Study on Topologically …index-of.co.uk/Blackhat/zhou-oakland13-slides.pdf · 2019-03-07 · A Study on Topologically Dedicated Hosts on

The Big Web

2

Page 3: Finding the Linchpins of the Dark Web: A Study on Topologically …index-of.co.uk/Blackhat/zhou-oakland13-slides.pdf · 2019-03-07 · A Study on Topologically Dedicated Hosts on

The Dangerous Web

3

Page 4: Finding the Linchpins of the Dark Web: A Study on Topologically …index-of.co.uk/Blackhat/zhou-oakland13-slides.pdf · 2019-03-07 · A Study on Topologically Dedicated Hosts on

4

What bothers you?

Phishing

Scam

Drive-by Download

Theft

Bot

Page 5: Finding the Linchpins of the Dark Web: A Study on Topologically …index-of.co.uk/Blackhat/zhou-oakland13-slides.pdf · 2019-03-07 · A Study on Topologically Dedicated Hosts on

5

Who is behind?

Individual Hacker Criminal Organization

Page 6: Finding the Linchpins of the Dark Web: A Study on Topologically …index-of.co.uk/Blackhat/zhou-oakland13-slides.pdf · 2019-03-07 · A Study on Topologically Dedicated Hosts on

6

Malicious Web Infrastructure

Redirector

Phishing Server

Scam Server

Exploit Server

Compromised Site

SEO Site

SPAM Link

Page 7: Finding the Linchpins of the Dark Web: A Study on Topologically …index-of.co.uk/Blackhat/zhou-oakland13-slides.pdf · 2019-03-07 · A Study on Topologically Dedicated Hosts on

7

Our Work - I

What is the topological view of malicious Web infrastructure?

Build redirection graph

Page 8: Finding the Linchpins of the Dark Web: A Study on Topologically …index-of.co.uk/Blackhat/zhou-oakland13-slides.pdf · 2019-03-07 · A Study on Topologically Dedicated Hosts on

8

Our Work - II

What are the linchpins?

Topologically Dedicated Malicious Hosts: All redirection

paths are malicious

Stay in “dark” side, far from “bright” side

Page 9: Finding the Linchpins of the Dark Web: A Study on Topologically …index-of.co.uk/Blackhat/zhou-oakland13-slides.pdf · 2019-03-07 · A Study on Topologically Dedicated Hosts on

9

Our Work - III

How can we find the linchpins?

Topologically detector

Not relying on semantics of attacks

Study of Traffic Direction System (TDS)

Landscape, Lifetime, Parking

Page 10: Finding the Linchpins of the Dark Web: A Study on Topologically …index-of.co.uk/Blackhat/zhou-oakland13-slides.pdf · 2019-03-07 · A Study on Topologically Dedicated Hosts on

10

Outline

Build graph Find the linchpins

Study Traffic Direction System (TDS)

Page 11: Finding the Linchpins of the Dark Web: A Study on Topologically …index-of.co.uk/Blackhat/zhou-oakland13-slides.pdf · 2019-03-07 · A Study on Topologically Dedicated Hosts on

11

Data Collection

Dynamic Crawler Firefox extension

Redirection: JavaScript, iframe, Http(302), Meta-refresh, …

Data Source Feed Type Start End # Doorway

URLs Microsoft Malicious 3/2012 8/2012 1,558,690

WarningBird[1] Malicious 3/2012 5/2012 358,232 Twitter Search Mostly good 3/2012 8/2012 1,613,924

Alexa Mostly good 2/2012 8/2012 2,040,720

[1] WarningBird: Detecting Suspicious URLs in Twitter Stream, Lee et.al, NDSS’12

5.5M URLs, 7 Months

<script> var a = “<iframe src =…>”; document.write(a); </script>

Client-side redirection

Page 12: Finding the Linchpins of the Dark Web: A Study on Topologically …index-of.co.uk/Blackhat/zhou-oakland13-slides.pdf · 2019-03-07 · A Study on Topologically Dedicated Hosts on

12

Building Redirection Graph Data Labeling

Malicious (Forefront)

Legitimate (Whitelist)

Unknown

Node Constructing Node: Hostname-IP Cluster (HIC)

Edge Constructing

Link 2 nodes if there is an URL redirection

2M Nodes, 9M Edges

URL

URL URL

Hostname URL

URL URL

Hostname

HIC

Page 13: Finding the Linchpins of the Dark Web: A Study on Topologically …index-of.co.uk/Blackhat/zhou-oakland13-slides.pdf · 2019-03-07 · A Study on Topologically Dedicated Hosts on

13

Building Redirection Graph (Cond.)

Non-dedicated Malicious (Mixed)

Legitimate (Totally good)

>70% malicious paths through 15k dedicated hosts

Dedicated Malicious (Totally bad)

Page 14: Finding the Linchpins of the Dark Web: A Study on Topologically …index-of.co.uk/Blackhat/zhou-oakland13-slides.pdf · 2019-03-07 · A Study on Topologically Dedicated Hosts on

14

Outline

Build graph Find the linchpins

Study Traffic Direction System (TDS)

Page 15: Finding the Linchpins of the Dark Web: A Study on Topologically …index-of.co.uk/Blackhat/zhou-oakland13-slides.pdf · 2019-03-07 · A Study on Topologically Dedicated Hosts on

15

Finding the Linchpins is Challenging

Serve for different attacks and different sources

Hide by cloaking

Mixed with legitimate nodes

Page 16: Finding the Linchpins of the Dark Web: A Study on Topologically …index-of.co.uk/Blackhat/zhou-oakland13-slides.pdf · 2019-03-07 · A Study on Topologically Dedicated Hosts on

16

Properties of Dedicated Hosts

Non-dedicated Malicious (Mixed)

Dedicated Malicious (Totally bad)

“Dark” World

“Bright” World

80.40%

Legitimate (Totally good)

2.25%

Page 17: Finding the Linchpins of the Dark Web: A Study on Topologically …index-of.co.uk/Blackhat/zhou-oakland13-slides.pdf · 2019-03-07 · A Study on Topologically Dedicated Hosts on

17

Topological Detector

Can we leverage this topological feature? Use known bad and good as seed

Detect dedicated hosts

Expand the result using topology

Work on different attacks and different sources

YES!

Page 18: Finding the Linchpins of the Dark Web: A Study on Topologically …index-of.co.uk/Blackhat/zhou-oakland13-slides.pdf · 2019-03-07 · A Study on Topologically Dedicated Hosts on

18

Hunting Dedicated Hosts Score Propagation using PageRank

GS=10

BS=10

GS=8.65

GS=7.5

BS=4.4

BS=0

BS=3.89

GS=0

GS=0 BS=4.4

Assign score to good seed

Assign score to bad seed

Score Propagation

Choose nodes with high

bad score and low

good score

Good Bad

Page 19: Finding the Linchpins of the Dark Web: A Study on Topologically …index-of.co.uk/Blackhat/zhou-oakland13-slides.pdf · 2019-03-07 · A Study on Topologically Dedicated Hosts on

19

Evaluation

0%

20%

40%

60%

80%

100%

0% 20% 40% 60% 80% 100%

# De

tect

ed P

aths

/ #

All

Mal

icio

us P

aths

#Selected Paths / #Seed Paths

selected seed onlytotal

5% bad hosts, 7x expansion rate

0.025% FPR, 0.34% FDR

0.000%

0.005%

0.010%

0.015%

0.020%

0.025%

0.030%

0.0%

0.5%

1.0%

1.5%

2.0%

2.5%

3.0%

0% 20% 40% 60% 80% 100%

FPR

FDR

# selected paths / # seed paths

FDR

FPR

Using x% of known malicious hosts as bad seed How many more malicious paths can be identified?

Page 20: Finding the Linchpins of the Dark Web: A Study on Topologically …index-of.co.uk/Blackhat/zhou-oakland13-slides.pdf · 2019-03-07 · A Study on Topologically Dedicated Hosts on

20

Evaluation (Cond.)

Detect new hosts 6k new malicious hostnames identified

Detection result can serve as new seed 12x expansion rate if rolling back the detected result

Path sharing across different sources 56% malicious paths from WarningBird feed overlapped

with Microsoft feed

Page 21: Finding the Linchpins of the Dark Web: A Study on Topologically …index-of.co.uk/Blackhat/zhou-oakland13-slides.pdf · 2019-03-07 · A Study on Topologically Dedicated Hosts on

21

Outline

Build graph Find the linchpins

Study Traffic Direction System (TDS)

Page 22: Finding the Linchpins of the Dark Web: A Study on Topologically …index-of.co.uk/Blackhat/zhou-oakland13-slides.pdf · 2019-03-07 · A Study on Topologically Dedicated Hosts on

22

Traffic Direction System (TDS)

Underground traffic brokers who buy traffic from generators (e.g., malicious doorways) and sell to consumers (e.g., exploit servers)

>50% of the malicious paths go through TDS.

Spam

Blackhat SEO Phishing

Exploit Server

TDS

Page 23: Finding the Linchpins of the Dark Web: A Study on Topologically …index-of.co.uk/Blackhat/zhou-oakland13-slides.pdf · 2019-03-07 · A Study on Topologically Dedicated Hosts on

23

TDS Landscape

71% TDS use Sutra TDS kit.

26% Dynamic DNS, 14% Free Domain Providers.

Inbound traffic: 97% from doorway, 6% from non-doorway

redirectors.

Outbound traffic of Active TDS: 49% to exploit servers, 3%

to scam sites.

Page 24: Finding the Linchpins of the Dark Web: A Study on Topologically …index-of.co.uk/Blackhat/zhou-oakland13-slides.pdf · 2019-03-07 · A Study on Topologically Dedicated Hosts on

24

TDS Lifetime

Query Passive DNS Database from SIE

Lifetime: interval of host when A record bounding to bad IP

Result 65 days (exclude DDNS/ Free Domain Providers) Much longer than exploit servers, 2.5 hours[1].

[1] Manufacturing Compromise: The Emergence of Exploit-as-a-Service, Chris et. al., CCS’12

Page 25: Finding the Linchpins of the Dark Web: A Study on Topologically …index-of.co.uk/Blackhat/zhou-oakland13-slides.pdf · 2019-03-07 · A Study on Topologically Dedicated Hosts on

25

TDS Parking

Even suspended TDSes monetized by attackers 20% TDS hosts parked. Continue to receive high volume of traffic after

parking. 62% traffic to ad-networks.

0

0.2

0.4

0.6

0.8

1

0.00001 0.001 0.1 10 1000 100000Traffic Per day (log scale)

Regular parking TDS parking

10x more traffic than legitimate ones

Page 26: Finding the Linchpins of the Dark Web: A Study on Topologically …index-of.co.uk/Blackhat/zhou-oakland13-slides.pdf · 2019-03-07 · A Study on Topologically Dedicated Hosts on

26

Conclusion

Dedicated Hosts are critical in Malicious Web Infrastructure Serve >70% malicious paths.

Detect Dedicated Hosts only using Topological information. No need to know semantics of specific attacks. 7x expansion rate, <1% FPR & FDR

TDS Hosts deserve in-depth study Key role in malicious Web infrastructure Long lifetime (65 days) Traffic monetized even after suspended

Page 27: Finding the Linchpins of the Dark Web: A Study on Topologically …index-of.co.uk/Blackhat/zhou-oakland13-slides.pdf · 2019-03-07 · A Study on Topologically Dedicated Hosts on

27

Page 28: Finding the Linchpins of the Dark Web: A Study on Topologically …index-of.co.uk/Blackhat/zhou-oakland13-slides.pdf · 2019-03-07 · A Study on Topologically Dedicated Hosts on

28

Validation

Forefront reporting

Content and URL clustering

Safebrowsing reporting

Page 29: Finding the Linchpins of the Dark Web: A Study on Topologically …index-of.co.uk/Blackhat/zhou-oakland13-slides.pdf · 2019-03-07 · A Study on Topologically Dedicated Hosts on

Dedicated Malicious Hosts

29

Compromised Site

SEO Site

Social Network Link

Email Link

Redirector

Phishing Site

Scam Site

Exploit Site

Page 30: Finding the Linchpins of the Dark Web: A Study on Topologically …index-of.co.uk/Blackhat/zhou-oakland13-slides.pdf · 2019-03-07 · A Study on Topologically Dedicated Hosts on

30

Limits of Prior Works

Code Analysis [Cova’10, Curtsinger’10]

Obfuscate Code

Code Execution [Provos’08, Wang’06]

Detect Emulator

URL Pattern [Zhang’11, John’11]

Regenerate URL Pattern Redirection Chain [Li’12, Lee’12, Lu’11]

Depend on Semantics of Attacks