Blacklisting and Blocking Sources of Malicious Traffic

Athina Markopoulou, University of California, Irvine. Joint work with Fabio Soldo, Anh Le @ UC Irvine and Katerina Argyraki @ EPFL.

Transcript of Blacklisting and Blocking Sources of Malicious Traffic

Page 1: Blacklisting and Blocking Sources of Malicious Traffic

Blacklisting and Blocking Sources of Malicious Traffic

Athina Markopoulou
University of California, Irvine

Joint work with Fabio Soldo, Anh Le @ UC Irvine and Katerina Argyraki @ EPFL

Page 2: Blacklisting and Blocking Sources of Malicious Traffic

Outline

• Motivation
– Malicious Internet Traffic: Attack and Defense

• Two Defense Mechanisms
– Proactive: Predictive Blacklisting
– Reactive: Source-Based Filtering

• Conclusion

Page 3: Blacklisting and Blocking Sources of Malicious Traffic

Malicious Traffic on the Internet

• Compromising systems
– scanning, worms, website attacks
– phishing, social engineering attacks, ...

• Launching attacks
– spam
– click fraud
– Denial-of-Service attacks, ...

• Botnets
– large groups of compromised hosts, remotely controlled

Page 4: Blacklisting and Blocking Sources of Malicious Traffic

The solution requires many components

• Monitoring and detection of malicious activity
– in the network and/or at hosts
– signature-based, behavioral analysis

• Mitigation
– at the hosts: remove malicious code
– in the network: block, rate-limit, scrub malicious traffic

• Internet architecture

Page 5: Blacklisting and Blocking Sources of Malicious Traffic

Defense at the edge of the network

[Figure: four networks (Network 1 to Network 4), each with logging, an IDS and a firewall at its edge router.]

Our focus is on (1) blacklisting and (2) blocking malicious traffic.

Page 6: Blacklisting and Blocking Sources of Malicious Traffic

Dshield Dataset

• 6 months of IDS+firewall logs from Dshield.org (May-Oct 2008)
– ~600 contributing networks, 60M+ source IPs, 400M+ logs

[Figure: contributing networks submit their logs to Dshield.org.]

• Log fields: Time, Victim ID (contributor), Src IP, Dst IP, Src Port, Dst Port, Protocol, Flags

• Pros: huge amount of data, diverse sample, used by many researchers
• Cons: no detailed information on alerts, may include errors

Page 7: Blacklisting and Blocking Sources of Malicious Traffic

Outline

• Background
– Malicious Internet Traffic: Attack and Defense

• Two Defense Mechanisms
– Proactive: Predictive Blacklisting
– Reactive: Source-Based Filtering

• Conclusion

Page 8: Blacklisting and Blocking Sources of Malicious Traffic

Predictive Blacklisting

• Problem definition:
– Given past logs of malicious activity collected at various locations
– Predict sources likely to send malicious traffic to each victim network in the future.

• Blacklist:
– list of "worst" (e.g. top-100) attack sources

• Prediction vs. Detection

Page 9: Blacklisting and Blocking Sources of Malicious Traffic

Data analysis
Superposition of several behaviors

[Figure: number of alerts by source ("attacker") IP and day.]

Page 10: Blacklisting and Blocking Sources of Malicious Traffic

A multi-level prediction model

• Different predictors capture different patterns in the dataset:
– Model temporal dynamics
– Model spatial correlation between victims/attackers

• Combine different predictors

• Formulate as a Recommendation Systems problem
– in particular collaborative filtering

Page 11: Blacklisting and Blocking Sources of Malicious Traffic

Recommender systems: example

Netflix: you rate movies and you get suggestions

Page 12: Blacklisting and Blocking Sources of Malicious Traffic

Formulating Predictive Blacklisting as a Recommendation System (CF)

[Figure: two sparse matrices side by side. Recommendation System: rating matrix R over users (columns) and items (rows), where each entry is a user rating and "?" marks an unknown entry. Predictive Blacklisting: the analogous matrix over attackers (columns) and victims (rows), where each entry is an attack volume.]

Goal: predict rating matrix: $r_{a,v}(t)$
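The mapping from logs to a rating matrix can be sketched as follows. This is a minimal illustration, not the authors' code: the `(src_ip, victim_id)` tuple layout and the function name `build_rating_matrix` are assumptions, and the "rating" is taken to be the number of log entries for a pair within one time window.

```python
from collections import defaultdict

def build_rating_matrix(logs):
    """Count log entries per (attacker, victim) pair in one time window.

    `logs` is an iterable of (src_ip, victim_id) tuples (hypothetical layout).
    Pairs that never appear are left out, mirroring the '?' entries of R.
    """
    ratings = defaultdict(int)
    for src_ip, victim_id in logs:
        ratings[(src_ip, victim_id)] += 1
    return dict(ratings)

logs = [("1.2.3.4", "v1"), ("1.2.3.4", "v1"), ("5.6.7.8", "v2")]
R = build_rating_matrix(logs)
```

Storing only observed pairs keeps the matrix sparse, which matters when there are tens of millions of source IPs.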

Page 13: Blacklisting and Blocking Sources of Malicious Traffic

Predictor I: (attacker, victim) pair
Temporal dynamics

$r^{TS}_{a,v}(t)$

Data analysis: attacks from the same source within short time

Page 14: Blacklisting and Blocking Sources of Malicious Traffic

Predictor I: (a, v) time series

$r^{TS}_{a,v}(t)$

• Data analysis: repeated attacks within short time periods
• Prediction:
– Use EWMA model to capture this temporal trend
– Accounts for the short memory of attack sources.
– Computationally efficient
– Includes as special case t=1

[Equation: predicted activity as a function of past activity at time t' ≤ t.]
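The slide's equation did not survive extraction, so the sketch below uses the textbook exponentially weighted moving average, $\sum_{t' \le t} \alpha (1-\alpha)^{t-t'} r_{a,v}(t')$, as an assumption. The parameter name `alpha` and its default value are illustrative only.

```python
def ewma_forecast(history, alpha=0.9):
    """Forecast the next value of one (attacker, victim) time series.

    `history` holds past activity, oldest first. Recent observations get
    geometrically larger weights, which models a short memory.
    """
    t = len(history)
    return sum(alpha * (1 - alpha) ** (t - 1 - i) * r
               for i, r in enumerate(history))

recent = ewma_forecast([0, 0, 4], alpha=0.5)   # attack seen in the last slot
stale = ewma_forecast([4, 0, 0], alpha=0.5)    # same attack, two slots ago
```

With `alpha=1` the forecast reduces to the most recent observation, so a purely reactive "last window" predictor is one setting of the same model.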

Page 15: Blacklisting and Blocking Sources of Malicious Traffic

Predictor II: similar victims
spatial correlation

• Data analysis: victims share common attackers.
– [Katti et al, IMC 2005], [Zhang et al, Usenix Security 2008]

• Our approach:

[Figure: victims linked by their common attackers.]

Page 16: Blacklisting and Blocking Sources of Malicious Traffic

Predictor II: similar victims
defining similarity

• Similarity of victims u,v captures:
– the number of common attackers
– and when they are attacked

• Our approach:

Common attackers (columns) per victim (rows):

      a1  a2  a3  a4
v1     1   1   0   0
v2     1   1   0   0
v3     1   1   1   0
v4     0   0   1   1
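One simple way to score "number of common attackers" is cosine similarity between binary victim vectors. The sketch below is an assumption, not the similarity used in the talk: it ignores the "when they are attacked" component, and the assignment of the example rows to v1..v4 is read from the garbled slide.

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two equal-length attack vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# Rows of the slide's example (columns a1..a4); row labels are an assumption.
victims = {
    "v1": [1, 1, 0, 0],
    "v2": [1, 1, 0, 0],
    "v3": [1, 1, 1, 0],
    "v4": [0, 0, 1, 1],
}
sim_12 = cosine_similarity(victims["v1"], victims["v2"])
sim_13 = cosine_similarity(victims["v1"], victims["v3"])
sim_14 = cosine_similarity(victims["v1"], victims["v4"])
```

Victims v1 and v2 share all their attackers and score highest, while v1 and v4 share none and score zero.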

Page 17: Blacklisting and Blocking Sources of Malicious Traffic

Predictor II: similar victims
k-nearest neighbors (kNN)

$r^{KNN}_{a,v}(t)$

• Traditional kNN: "trust" your peers
– Identify k most similar victims ("neighbors") + predict your rating based on theirs

• New challenges due to time varying ratings

• Our approach:

[Equation, with labeled terms: predicted activity; sum over the neighborhood of v; time series forecast given past logs; similarity between time-varying vectors.]
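The labeled terms suggest a similarity-weighted sum of the neighbors' time series forecasts. The sketch below assumes exactly that form, normalized by the total similarity; the normalization, the function name `knn_forecast` and the dictionary layouts are assumptions.

```python
def knn_forecast(target, forecasts, similarity, k=2):
    """Predict one attacker's activity at `target` from its k nearest victims.

    forecasts:  {victim: time series forecast of the attacker's activity}
    similarity: {(target, victim): similarity score}
    """
    neighbors = sorted((u for u in forecasts if u != target),
                       key=lambda u: similarity[(target, u)],
                       reverse=True)[:k]
    total = sum(similarity[(target, u)] for u in neighbors)
    if total == 0:
        return 0.0
    return sum(similarity[(target, u)] * forecasts[u] for u in neighbors) / total

forecasts = {"v1": 0.0, "v2": 4.0, "v3": 2.0, "v4": 10.0}
similarity = {("v1", "v2"): 1.0, ("v1", "v3"): 0.5, ("v1", "v4"): 0.0}
prediction = knn_forecast("v1", forecasts, similarity, k=2)
```

Here v4 has the largest forecast but zero similarity to v1, so it does not enter the neighborhood.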

Page 18: Blacklisting and Blocking Sources of Malicious Traffic

Predictor III: Attackers-Victims
Co-clustering

• Data analysis:
– groups of attackers consistently target the same group of victims.
– this behavior often persists over time

• We used the Cross-Association (CA) method to automatically identify dense clusters of victims-attackers.

Page 19: Blacklisting and Blocking Sources of Malicious Traffic

Predictor III: Attackers-Victims
Prediction

$r^{EWMA-CA}_{a,v}(t)$

• Intuition:
– pairs (a,v) in dense clusters are more likely to occur
– use the density of the cluster as the predictor

[Equation: prediction based on the density of the cluster containing (a,v).]

• EWMA-CA: further weight by persistence over time

Page 20: Blacklisting and Blocking Sources of Malicious Traffic

A multi-level prediction model
Summary

• Different predictors capture different patterns:
– Temporal trends
  • EWMA TS of (attacker, victim)
– Neighborhood models:
  • KNN: Similarity of victims
  • EWMA CA: Interaction of attackers-victims

• Combine different predictors

Page 21: Blacklisting and Blocking Sources of Malicious Traffic

Combining different predictors

• Weighted Average
– with weights proportional to the accuracy of each predictor on a pair (a,v).
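A weighted average with accuracy-proportional weights can be sketched as below. How accuracy is measured for a pair (a,v) is not stated on the slide, so the `accuracies` values here are made-up inputs and the function name `combine` is hypothetical.

```python
def combine(predictions, accuracies):
    """Weighted average of predictor outputs for one (attacker, victim) pair.

    predictions: {predictor name: predicted rating}
    accuracies:  {predictor name: non-negative accuracy score}
    """
    total = sum(accuracies[name] for name in predictions)
    if total == 0:
        # No predictor has any track record: fall back to a plain average.
        return sum(predictions.values()) / len(predictions)
    return sum(accuracies[name] * predictions[name]
               for name in predictions) / total

predictions = {"TS": 4.0, "KNN": 2.0, "EWMA-CA": 0.0}
accuracies = {"TS": 0.5, "KNN": 0.25, "EWMA-CA": 0.25}
combined = combine(predictions, accuracies)
```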

Page 22: Blacklisting and Blocking Sources of Malicious Traffic

Performance Analysis
Baseline Blacklisting Techniques

• Local Worst Offender List (LWOL)
– Most prolific local attackers
– Reactive but not proactive

• Global Worst Offender List (GWOL)
– Most prolific global attackers
– Might contain irrelevant attackers
– Non-prolific attackers are elusive to GWOL

• Collaborative Blacklisting (HPB)
– [J. Zhang, P. Porras, J. Ullrich, "Highly Predictive Blacklisting", USENIX Security 2008]
– Also implemented and offered as a service (HPB) by Dshield.org
– Methodology: Use link-analysis on the victims similarity graph to predict future attacks
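The two worst-offender baselines can be sketched in a few lines. "Most prolific" is taken here to mean the largest number of log entries, which is an assumption; the `(src_ip, victim_id)` log layout is hypothetical as well.

```python
from collections import Counter

def lwol(logs, victim, n):
    """Local Worst Offender List: top-n sources seen by one victim."""
    counts = Counter(src for src, v in logs if v == victim)
    return [src for src, _ in counts.most_common(n)]

def gwol(logs, n):
    """Global Worst Offender List: top-n sources across all victims."""
    counts = Counter(src for src, _ in logs)
    return [src for src, _ in counts.most_common(n)]

logs = [("a", "v1"), ("a", "v1"), ("b", "v1"),
        ("c", "v2"), ("c", "v2"), ("c", "v2")]
local_list = lwol(logs, "v1", 1)
global_list = gwol(logs, 1)
```

In this toy log the global list for v1 would contain "c", a source that never contacted v1, which illustrates the "might contain irrelevant attackers" point.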

Page 23: Blacklisting and Blocking Sources of Malicious Traffic

Performance Analysis
total hit count

• 60 days of Dshield logs, 5 days training, 1 day testing, BL length=1000
• The combined method
– significantly improves the hit count (up to 70%, 57% on avg)
– exhibits less variation over time

[Figure: total hit count over time for the combined method, HPB and GWOL.]
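The metric can be sketched as follows, assuming a "hit" is a blacklisted source that actually appears in the victim's logs during the test window and that the total sums hits over all victims. The function names and data layout are hypothetical.

```python
def hit_count(blacklist, test_sources):
    """Number of blacklisted sources that attack during the test window."""
    return len(set(blacklist) & set(test_sources))

def total_hit_count(blacklists, test_logs):
    """Sum of per-victim hit counts.

    blacklists: {victim: list of predicted sources}
    test_logs:  {victim: set of sources seen in the test window}
    """
    return sum(hit_count(bl, test_logs.get(v, set()))
               for v, bl in blacklists.items())

blacklists = {"v1": ["a", "b", "c"], "v2": ["c", "d"]}
test_logs = {"v1": {"b", "c", "e"}, "v2": {"a"}}
total = total_hit_count(blacklists, test_logs)
```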

Page 24: Blacklisting and Blocking Sources of Malicious Traffic

Predicting Attacks
what is the best we can do?

• Local Upper Bound: # IPs in training & test window of a particular contributor
• Global Upper Bound: # IPs in training window of any contributor

[Figure: example attack matrices for training (day t1) and test (day t2); for victim vi, LocalUB(vi)=3 and GlobalUB=5.]
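The two bounds can be sketched with set operations. The reading assumed here is that the local bound counts sources seen by a victim in both its own training and test windows, and the global bound counts sources in the victim's test window that appear in the training window of any contributor. The data layout is hypothetical.

```python
def local_upper_bound(train, test, victim):
    """Sources seen by `victim` in both its training and test windows."""
    return len(train.get(victim, set()) & test.get(victim, set()))

def global_upper_bound(train, test, victim):
    """Test-window sources of `victim` seen in anyone's training window."""
    seen_anywhere = set().union(*train.values())
    return len(seen_anywhere & test.get(victim, set()))

train = {"v1": {"a", "b"}, "v2": {"c"}}
test = {"v1": {"a", "c", "d"}}
local_ub = local_upper_bound(train, test, "v1")
global_ub = global_upper_bound(train, test, "v1")
```

Source "c" is predictable for v1 only by looking at v2's logs, which is the gain that collaboration can offer; source "d" is new to everyone and cannot be predicted from past logs.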

Page 25: Blacklisting and Blocking Sources of Malicious Traffic

Predicting Attacks
room for improvement

• Collaboration helps!
• Large gap from prior methods

[Figure label: Our method (|BL|=1000).]

Page 26: Blacklisting and Blocking Sources of Malicious Traffic

Performance Analysis
robustness to random errors

Robustness achieved by diverse methods

E.g. an attacker may send traffic to a single victim (detected by the temporal predictor) or to several victims (detected by spatial behavior); or he can limit his attack activity

Page 27: Blacklisting and Blocking Sources of Malicious Traffic

Predictive Blacklisting as a Recommendation System
Summary

• Contributions
– Combined predictors that capture different patterns in the data
– Significant improvement with simple techniques
  • still room for further improvement
– New formulation as a recommender system (collaborative filtering) problem
  • paves the way to powerful techniques:
  • e.g., capture global structure (latent factors), joint spatio-temporal models

• References
– F. Soldo, A. Le, A. Markopoulou, "Predictive Blacklisting as an Implicit Recommendation System", IEEE INFOCOM 2010 and on arXiv.org
– In the news: MIT Technology Review, Slashdot, ACM TechNews

Page 28: Blacklisting and Blocking Sources of Malicious Traffic

How to use a list of malicious sources?

• A policy decision:
– E.g. scrub, give lower priority, block, monitor, do nothing …

• One option is to block (filter) malicious sources
– when: during flooding attacks by million-node botnets
– where: at firewalls or at the routers

Page 29: Blacklisting and Blocking Sources of Malicious Traffic

Outline

• Background
– Malicious Internet Traffic: Attack and Defense

• Two Defense Mechanisms
– Proactive: Predictive Blacklisting
– Reactive: Optimal Source-Based Filtering

• Conclusion

Page 30: Blacklisting and Blocking Sources of Malicious Traffic

Filtering at the routers

• Access Control Lists (ACLs)
– Match a packet header against rules, e.g. source and destination IP addresses
– Source-based filter: ACL that denies access to a source IP/prefix

• Filters implemented in TCAM
– Can keep up with high speeds
– Limited resource

• There are fewer filters than attack sources
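The matching logic of a source-based filter can be illustrated in software with Python's standard `ipaddress` module. This is only a model of the rule semantics, not of a TCAM or of any router's ACL syntax; the function name `is_blocked` and the example prefixes are made up.

```python
import ipaddress

def is_blocked(src_ip, filters):
    """True if the packet's source address falls inside any filtered prefix."""
    addr = ipaddress.ip_address(src_ip)
    return any(addr in net for net in filters)

# Two filters: one /31 prefix covering two addresses, and one /24 prefix.
filters = [ipaddress.ip_network("10.0.0.2/31"),
           ipaddress.ip_network("192.0.2.0/24")]

blocked = is_blocked("10.0.0.3", filters)
allowed = is_blocked("10.0.0.7", filters)
```

A single prefix filter covers many sources at once, which is why prefixes are attractive when the filter budget is small, at the price of also blocking any legitimate source inside the prefix.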

Page 31: Blacklisting and Blocking Sources of Malicious Traffic

Filter Selection at a Single Router
tradeoff: number of filters vs. collateral damage

• Filter an attack source A.B.C.D
• Filter a prefix A.B.C.*

[Figure: attackers and legitimate users send traffic through the ISP edge router, over an access link of capacity C, to the victim V.]

Page 32: Blacklisting and Blocking Sources of Malicious Traffic

Optimal Source-Based Filtering

Design a family of filter selection algorithms that:

• take as input:
– a blacklist of malicious (bad) sources
– a whitelist of legitimate (good) sources
– a constraint on the number of filters Fmax
– a constraint on the access bandwidth C
– the operator's policy

• optimally select which source IP prefixes to filter
– so as to optimize the operator's objective
– subject to the constraints

[Figure: the IP address space from 0 to 2^32-1, with a single address A.B.C.D and a prefix A.B.C.* marked.]

So far, heuristically done (through ACLs or rate limiters).

Page 33: Blacklisting and Blocking Sources of Malicious Traffic

Optimal Source-Based Filtering
A General Framework

• [l,r]: range in the IP space
• p/l: prefix p of length l
• Fmax: number of filters (<<N)
• (indicator variable): whether we block range [l,r] or not
• $w_i$: weight assigned to source IP address i
• (cost term): cost of blocking a range [l,r]

Page 34: Blacklisting and Blocking Sources of Malicious Traffic

Optimal Source-Based Filtering
Expressing Operator's Policy

• Assignment of weights Wi is the operator's knob:
– indicates volume of traffic sent, or importance assigned by the operator
– Wi>0 (good source i), Wi<0 (bad source i), Wi=0 (indifferent)

• Objective function

[Equation: the objective is expressed through two terms, the cost of good sources in range [l,r] and the cost of bad sources in range [l,r].]
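The two per-range terms can be sketched as sums of weights. Since the slide's objective function was lost, this only shows how a signed weight assignment splits into a "good" cost and a "bad" cost for a range; addresses are plain integers and the function name `range_costs` is hypothetical.

```python
def range_costs(weights, l, r):
    """Split the weights of sources inside [l, r] into good and bad totals.

    weights: {address as int: W_i}, with W_i > 0 for good sources
             and W_i < 0 for bad sources.
    Returns (good_cost, bad_cost), both non-negative.
    """
    inside = [w for ip, w in weights.items() if l <= ip <= r]
    good_cost = sum(w for w in inside if w > 0)
    bad_cost = sum(-w for w in inside if w < 0)
    return good_cost, bad_cost

weights = {2: -5, 3: -1, 4: 2, 7: -3, 9: 4}
good, bad = range_costs(weights, 0, 7)
```

Blocking the range [0, 7] here removes 9 units of bad traffic at the price of 2 units of good traffic; the operator's weights decide whether that trade is worthwhile.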

Page 35: Blacklisting and Blocking Sources of Malicious Traffic

Filter Selection Algorithms
Problem Overview

• RANGE-based: filter IP or range [l,r]
  [Soldo, El Defrawy, Markopoulou, Van De Merwe, Krishnamurthy: ITA'09]
– FILTER-ALL-RANGE
– FILTER-SOME-RANGE
– FILTER-ALL-DYNAMIC-RANGE

• PREFIX-based: filter IP source or prefix
  [Soldo, Markopoulou, Argyraki: INFOCOM'09, arXiv.org]
– FILTER-ALL: block all malicious sources
– FILTER-SOME: block some malicious sources
– FILTER-ALL-DYNAMIC: BL varies over time
– FLOODING: bandwidth constraint at access router
– DISTRIBUTED-FLOODING: filters at multiple routers

Page 36: Blacklisting and Blocking Sources of Malicious Traffic

Filter Selection Algorithms
Algorithms Overview

• RANGE-based: filter IP or range [l,r]
  [Soldo, El Defrawy, Markopoulou, Van De Merwe, Krishnamurthy: ITA'09]
– FILTER-ALL-RANGE
– FILTER-SOME-RANGE
– FILTER-ALL-DYNAMIC-RANGE

• PREFIX-based: filter IP source or prefix
  [Soldo, Markopoulou, Argyraki: INFOCOM'09, arXiv.org]
– FILTER-ALL: O(N)
– FILTER-SOME: O(N)
– FILTER-ALL-DYNAMIC: O(N)
– FLOODING: NP-hard, pseudo-polynomial alg. $O(C^2 N)$ + heuristic
– DISTRIBUTED-FLOODING: distributed solution

following a dynamic programming approach

Page 37: Blacklisting and Blocking Sources of Malicious Traffic

Longest Common Prefix Tree of a BL

• LCP-Tree(BL): binary tree, leaves are addresses in BL, intermediate nodes are their longest common prefixes
• It can be found from the full binary tree of IP prefixes
• E.g. for BL={10.0.0.2, 10.0.0.3, 10.0.0.7}, the LCP-Tree(BL) is:

10.0.0.0/29 (3 bad, 5 good addresses)
    10.0.0.2/31 (0 good, 2 bad addresses)
        10.0.0.2/32
        10.0.0.3/32
    10.0.0.7/32

• Finding a set of filters:
– no need to look for all possible sets of prefixes
– sufficient to look only for prunings of the LCP tree
– lends itself to a dynamic programming approach
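The LCP tree of the example blacklist can be built with a short recursive sketch. It is an illustration under assumptions rather than the authors' implementation: addresses are IPv4 integers, nodes are nested `(prefix, left, right)` tuples, and the function names are hypothetical.

```python
import ipaddress

def lcp(addrs):
    """Longest common prefix of a non-empty list of IPv4 addresses (ints)."""
    lo, hi = min(addrs), max(addrs)
    length = 32 - (lo ^ hi).bit_length()        # number of leading bits shared
    base = lo >> (32 - length) << (32 - length)  # zero out the differing bits
    return ipaddress.IPv4Network((base, length))

def lcp_tree(addrs):
    """Return (prefix, left, right); leaves have left = right = None."""
    addrs = sorted(set(addrs))
    node = lcp(addrs)
    if len(addrs) == 1:
        return (str(node), None, None)
    bit = 1 << (31 - node.prefixlen)  # first bit on which the addresses differ
    left = [a for a in addrs if not a & bit]
    right = [a for a in addrs if a & bit]
    return (str(node), lcp_tree(left), lcp_tree(right))

bl = ["10.0.0.2", "10.0.0.3", "10.0.0.7"]
tree = lcp_tree([int(ipaddress.ip_address(a)) for a in bl])
```

For this blacklist the root is 10.0.0.0/29, its left child is 10.0.0.2/31 with the two /32 leaves below it, and its right child is the leaf 10.0.0.7/32.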

Page 38: Blacklisting and Blocking Sources of Malicious Traffic

Filter-All-Prefix
Problem Statement

• Given: a blacklist BL, weight $w_i$ (for each good IP i), Fmax filters
• choose: prefixes p/l ($x_{p/l}$)
• so as to: filter all bad addresses and minimize collateral damage

Page 39: Blacklisting and Blocking Sources of Malicious Traffic

Filter-All-Prefix
Dynamic Programming Algorithm

[Equation: the cost of the optimal allocation of F filters within a prefix p, computed from the optimal costs within its left subtree sL and right subtree sR.]

– F-n ≥ 1 filters within left subtree
– n ≥ 1 filters within right subtree
– n=1,…,F-1: means that we want to block all malicious sources (leaves)
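The recursion over the LCP tree can be sketched as below. The slide's equation is lost, so the exact form is an assumption: each node either spends one filter on its whole prefix, or splits its budget with n ≥ 1 filters on the right subtree and F-n ≥ 1 on the left. Nodes are hypothetical `(cost, left, right)` tuples, where `cost` is the collateral damage of blocking that prefix.

```python
from functools import lru_cache

def filter_all_cost(tree, fmax):
    """Minimum collateral damage to block every leaf with at most fmax filters."""
    @lru_cache(maxsize=None)
    def z(node, f):
        cost, left, right = node
        best = cost                      # one filter on the whole prefix
        if left is not None:
            for n in range(1, f):        # n filters right, f - n filters left
                best = min(best, z(left, f - n) + z(right, n))
        return best
    return z(tree, fmax)

# LCP tree of BL = {10.0.0.2, 10.0.0.3, 10.0.0.7}; each good address is
# assumed to have weight 1, so blocking 10.0.0.0/29 costs 5.
leaf = (0, None, None)
tree = (5, (0, leaf, leaf), leaf)

one_filter = filter_all_cost(tree, 1)
two_filters = filter_all_cost(tree, 2)
```

With one filter the only option is the /29 prefix, which blocks 5 good addresses. With two filters, 10.0.0.2/31 and 10.0.0.7/32 cover the whole blacklist at no collateral damage.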

Page 40: Blacklisting and Blocking Sources of Malicious Traffic

Filter-All-Prefix
DP Algorithm: Example

Fmax = 4, N = 10

[Figure: LCP tree of the blacklist; labeled prefixes include 0/1, 32/5, 57/6 and 58/6.]

Page 41: Blacklisting and Blocking Sources of Malicious Traffic

Filter-Some-Prefix

Fmax = 4, N = 10

[Figure: LCP tree of the blacklist; labeled prefixes include 32/5, 57/6, 58/6 and 3/6.]

Page 42: Blacklisting and Blocking Sources of Malicious Traffic

Filter-All-Prefix-Dynamic
Time-varying case

Fmax = 4, N = 10

Need to be (re)computed: O(Fmax log(N))

[Figure: LCP tree of the blacklist annotated with per-node costs.]

Page 43: Blacklisting and Blocking Sources of Malicious Traffic

FLOODING
Problem Statement

• Given: a blacklist BL, a whitelist WL, a weight of address = traffic volume generated, a constraint on the link capacity C, and Fmax filters
• choose: source IP prefixes, $x_{p/l}$
• so as to: minimize the collateral damage and fit the total traffic within the link capacity C

Page 44: Blacklisting and Blocking Sources of Malicious Traffic

FLOODING
DP Algorithm

• FLOODING is NP-hard
– reduction from knapsack with cardinality constraint (1.5KP)

• An optimal pseudo-polynomial dynamic programming algorithm solves the problem in $O((C F_{max})^2 N)$
– similar to the previous DP but solves a 2-dimensional KP
– the LCP-Tree includes both good and bad addresses
– DP extended to take into account the capacity constraint

• A heuristic, by adjusting the granularity (ΔC>1) of C

Page 45: Blacklisting and Blocking Sources of Malicious Traffic

Distributed Flooding
filters at several routers

• Deploy filters at several routers
– increase total filter budget

• Each router (u) has its own:
– view of good/bad traffic
– capacity in incoming link
– filter budget

• Filtering at several routers:
– not only which prefix to block
– but also on which router

• Solution:
– can be solved in a distributed way
– outperforms independent decisions

[Figure: attackers send traffic through several routers toward the victim.]

Page 46: Blacklisting and Blocking Sources of Malicious Traffic

Evaluation using Dshield data
FLOODING vs. rate limiting

• Attack sources, from the point of view of a single victim in Dshield
• Good sources: [Kohler et al. TON'06, Barford et al. PAM'06]
• Before attack: good traffic was C/10 < C
• During attack: bad traffic is 10C

[Figure: axis label CD/N.]

Optimal filter selection preserves the good traffic and drops the bad.

Page 47: Blacklisting and Blocking Sources of Malicious Traffic

Intuition why optimization helps
compared to non-optimized filtering

• Malicious sources are clustered in the IP address space
• Malicious sources are not co-located with legitimate sources

• Filtering can block IP prefixes with malicious sources, without penalizing (many) legitimate sources.

Page 48: Blacklisting and Blocking Sources of Malicious Traffic

Evaluation using Dshield data (2)
FILTER-ALL-PREFIX vs. generic clustering algorithms

• Malicious addresses:
– attacking 2 specific victim networks (most and least clustered) in Dshield dataset

• Good addresses generated:
– using a multifractal [Kohler et al. TON'06, Barford et al. PAM'06]

Optimal filter selection outperforms generic clustering

Page 49: Blacklisting and Blocking Sources of Malicious Traffic

Evaluation using Dshield data (3)
DISTRIBUTED-FLOODING: the value of coordination

[Figure: axis labels D/N and C.]

Coordination among routers helps

Page 50: Blacklisting and Blocking Sources of Malicious Traffic

Optimal Source-Based Filtering
Summary

• Framework for optimal filter selection
– defined various filtering problems
– designed efficient algorithms to solve them

• Leads to significant improvements on real datasets
– Compared to non-optimized filter selection, to generic clustering, or to uncoordinated routers
– because of clustering of malicious sources

Page 51: Blacklisting and Blocking Sources of Malicious Traffic

Outline

• Background
– Malicious Internet Traffic: Attack and Defenses

• Two Defense Mechanisms
– Proactive: Blacklisting as a Recommendation System
– Reactive: Filtering as an Optimization Problem

• Conclusion
– Parts of a larger system that collects and analyzes data from multiple sensors and takes appropriate action

Page 52: Blacklisting and Blocking Sources of Malicious Traffic

Thank you!

[email protected]
http://newport.eecs/uci.edu/~athina