Identification of Attack Nodes from Traffic Matrix Estimation

24
International Trusted Internet Workshop 2005 1 Identification of Attack Nodes from Traffic Matrix Estimation Yuichi Ohsita Osaka University

Transcript of Identification of Attack Nodes from Traffic Matrix Estimation

Page 1: Identification of Attack Nodes from Traffic Matrix Estimation

International Trusted Internet Workshop 2005 1

Identification of Attack Nodesfrom Traffic Matrix Estimation

Yuichi OhsitaOsaka University

Page 2: Identification of Attack Nodes from Traffic Matrix Estimation

International Trusted Internet Workshop 2005 2

What is DDoS (Distributed Denial-of Service)?One of the most serious problems

Number of DDoS attacks is increasingSerious economic loss

Overview of DDoSAn attacker hacks remote hosts and installs attack tools The hosts attack the same server at the same time

attacker

Hacked hosts

server

Page 3: Identification of Attack Nodes from Traffic Matrix Estimation

International Trusted Internet Workshop 2005 3

Necessity and difficulty of identification of attack nodes

Attackers are highly distributedAttacker can generate as high rate attack as a single point defense cannot deal with.

We must block attack packets at distributed pointsTo effectively block attacks, we should block on the paths from attackers to the victim.

We need to identify attack nodesProblem: Identification of attack nodes is difficult

Attackers can easily spoof the source address

Page 4: Identification of Attack Nodes from Traffic Matrix Estimation

International Trusted Internet Workshop 2005 4

Existing methods to identify attack nodes

Existing methodsWhen forwarding a packet, the router sends identification information to the destination

ICMP traceback, Packet Marking Method.Each router stores packet digests

Hash-based tracebackProblem

These methods cannot work with legacy routers

Page 5: Identification of Attack Nodes from Traffic Matrix Estimation

International Trusted Internet Workshop 2005 5

Goal of our research

Problem of traditional methodUnable to work with legacy routers

We must implement all or most of routers.

Our goalIdentification of attack nodes which can work with legacy routers

Method using information which can be obtained from legacy routers

We can obtain statistics of link loads through SNMP

Page 6: Identification of Attack Nodes from Traffic Matrix Estimation

International Trusted Internet Workshop 2005 6

We can identify the attack sources that are increasing the traffic to the victimTraffic between each source and each destination can be estimated from link loads by traffic matrix estimation.

Traffic matrix estimation is a method proposed for traffic engineering

Identification of attack nodes by monitoring traffic

Rapidly increase

attacker

Page 7: Identification of Attack Nodes from Traffic Matrix Estimation

International Trusted Internet Workshop 2005 7

Overview of our method

Monitored network

Monitoring nodes

1. Collecting link loaddata from every router

2. Estimation of theincrease in traffic

•we modify the existing method to estimate traffic between source and destination (traffic matrix)

3. Identification ofattack sources

Page 8: Identification of Attack Nodes from Traffic Matrix Estimation

International Trusted Internet Workshop 2005 8

Existing method to estimate traffic matrixMethod using the gravity model

Typical estimation method which can estimate very fast.Traffic between a source and a destination is assumed to be proportional to the total traffic at the source and at the destination.

A

B

C

1 Mbps

2 Mbps

3 Mbps

3 Mbps

2 Mbps

1 Mbps

Traffic from A to B is estimated as

Mbps25.03Mbps1Mbps

1MbpsMbps1

Con trafficegress totalBon trafficegress TotalBon trafficegress Total

on trafficingressTotal

=+

×=

A

Page 9: Identification of Attack Nodes from Traffic Matrix Estimation

International Trusted Internet Workshop 2005 9

Problem of existing method using the gravity modelThe impact of the attack traffic is distributed among the edge links that have legitimate traffic to the victim.

A

B

C

1 Mbps+1Mbps

2 Mbps

3 Mbps

3 Mbps

2 Mbps

1 Mbps+1Mbps

Even when traffic from A to Bincreased by 1Mbps

Traffic from A to B is estimated as

Mbps8.03Mbps2Mbps

2MbpsMbps2 =+

×

The estimation results:

Traffic from A to B: increased by 0.55 Mbps

Traffic from C to B: increased by 0.45 Mbps

Our methodEstimation method focusing not on the total rate but on the increase in traffic.

We can eliminate the effect of the amount of legitimate traffic

Problem of existing estimation method

Page 10: Identification of Attack Nodes from Traffic Matrix Estimation

International Trusted Internet Workshop 2005 10

Steps to estimate the increase in traffic

Calculation of the increase in traffic on each link

Estimation of the increase in traffic between source and destination

Estimation by gravity modelModification of the result by using statistics of internal links

Estimation of the average link loads

XXG −=

GX

X Average link loads of legitimate traffic

Loads on each link

Increase in traffic on each link

Page 11: Identification of Attack Nodes from Traffic Matrix Estimation

International Trusted Internet Workshop 2005 11

Estimating the increase using the gravity modelEstimating the increase in traffic from to as

A

B

C

Increaseby 1Mbps

decrease by 0.2 Mbps

Increase by 0.2Mbps

Increase by 0.1Mbps

decrease by 0.1 Mbps

Increase by 1Mbps

The increase in traffic from A to B is estimated as

⎪⎪⎪⎪⎪

⎪⎪⎪⎪⎪

<<×−

>>×

<

(others)0

0) 0,(

0) 0,(

outj

in

)0(:

out

outjin

outj

in

)0(:

out

outjin

out

out

ggg

gg

ggg

gg

i

gkk

i

i

gkk

i

k

k

Mbps9.00.1Mbps1Mbps

Mbps1Mbps1 =+

×

Increase in egress traffic on link

Increase in ingress traffic on link

inkg

outkg

k

k

i j

Page 12: Identification of Attack Nodes from Traffic Matrix Estimation

International Trusted Internet Workshop 2005 12

Relation between link loads and traffic of flows

The total amount of traffic on the link is the sum of the traffic of flows that are passing the link

Page 13: Identification of Attack Nodes from Traffic Matrix Estimation

International Trusted Internet Workshop 2005 13

The total amount of traffic on the link is the sum of the traffic of flows that are passing the link

AFG =

F

G

A

Relation between link loads and traffic of flows

Routing matrix whose entry defined as

Increase in traffic on each links

Increase in traffic between each source and each destination

⎩⎨⎧

=(others)0

)link traverse to from (traffic1),,(

kjia kji

kjia ),,(

Page 14: Identification of Attack Nodes from Traffic Matrix Estimation

International Trusted Internet Workshop 2005 14

We adjust the increase in traffic estimated by the gravity model to satisfy

The gravity model uses only statistics on edge links

How to adjust the increaseWe obtain the final result as

AFG =

)'(' 1 AFGAFF −+= −F

'F1−A

G

Using the traffic statistics on the internal links

Increase in traffic on each link

Pseudo-inverse of routing matrix

Increase in traffic estimated by the gravity model

Page 15: Identification of Attack Nodes from Traffic Matrix Estimation

International Trusted Internet Workshop 2005 15

Assumption and constraint for estimating average of legitimate trafficWe assume that the average rate of legitimate traffic is basically estimated by the weighted average of the monitored traffic rate

We must estimate the average of the legitimate traffic without the effect of sudden and rapid increase

This causes difficulties in the identification of the increaseWe should update the average by satisfying

Our method assumes the situation covered by AFG =

nnn XXX )1(1 αα −+=+

nX

nX

AFG =

)10( <<α

Page 16: Identification of Attack Nodes from Traffic Matrix Estimation

International Trusted Internet Workshop 2005 16

Steps to estimate average of legitimate traffic

We extract the element not increasing rapidly from estimated traffic

We define as a vector whose element is0, in the case that traffic from i to j increase rapidlyOtherwise, the estimated increase in traffic from i to j

We can eliminate the effect of rapid increaseWe update as

is the routing matrixWe can update the average by satisfying

nF̂

nnnn XFAXX )1()ˆ(1 αα −++=+ )10( <<αA

),(̂ jif

nX

AFG =

Page 17: Identification of Attack Nodes from Traffic Matrix Estimation

International Trusted Internet Workshop 2005 17

Assumption for identification of attack nodes

Attack nodes are the sources increasing the traffic on the victim

When an attack starts, the traffic sharply increases from the attackers to the victim.The larger the increase is, the more serious the impact on the network resources is.

The total rate of attack traffic can be estimated from the increase of the egress traffic to the victim.

Setting a static threshold to the increase in traffic is not sufficient.

When the number of attackers is large, the impact is serious even if the rate from each attacker is not so large.

Page 18: Identification of Attack Nodes from Traffic Matrix Estimation

International Trusted Internet Workshop 2005 18

Steps to identify attack nodesEstimate total attack rate

We identify the source of the largest estimated increase as attack source

The identification of another attack node is continued until the sum of estimated increase of identified attack nodes is larger than

γμ −−= outoutout~ gg

out~g

outgoutμ

γ

out~g

parameter indicating the variation in the rate of the legitimate traffic

the average of the last values of

Increase in traffic on the link connected to the victimoutgJ

Page 19: Identification of Attack Nodes from Traffic Matrix Estimation

International Trusted Internet Workshop 2005 19

EvaluationWe evaluate our method by simulation

TopologyThe backbone topology of Abilene

Legitimate traffic pattern Traffic monitored at the gateway of Osaka University

We made 110 groups of packets based on a 16 bit prefix of the source address.We calculated the aggregated traffic rate for each group at a 60 seconds interval.

Page 20: Identification of Attack Nodes from Traffic Matrix Estimation

International Trusted Internet Workshop 2005 20

Metrics used for evaluationFalse-positive

Cases where a source not generating attack traffic is erroneously identified as an attack source.

False-negativeCases where an attack source cannot be identified.

False-positive rate

False-negative ratenodesattack of #

positve-false of #

fficattack tra generatingnot sources of #negative-false of #

Page 21: Identification of Attack Nodes from Traffic Matrix Estimation

International Trusted Internet Workshop 2005 21

Number of attack nodes vs. false-positive, false-negativeWe simulate attacks changing the number of attack nodes from 1 to 5We injected attack packets at 16 different times.The total rate of attack traffic is 1000 packets/sec irrespective of the number of attack sourcesWe set to 200 Packets/secOur method can accurately identify attack sources regardless of the number of attack nodes

γ

4 (0.05)12 (0.15)54 (0.04)3 (0.04)43 (0.02)0 (0.00)30 (0.00)0 (0.00)22 (0.01)0 (0.00)1

# of False-positives(false-positive rate)

# of False-negatives(false-negative rate)

# of attack nodes

Page 22: Identification of Attack Nodes from Traffic Matrix Estimation

International Trusted Internet Workshop 2005 22

Our method can reduce the number of false-positives by setting γ to a larger value.A large γ causes many false-negatives.

vs. false-positive, false-negative

Pro

babi

lity

0

0.2

0.4

0.6

0.8

1

0 400 800 1200 1600

False-positive rate

False-negative rate

γ (Packets/sec)

γ

Page 23: Identification of Attack Nodes from Traffic Matrix Estimation

International Trusted Internet Workshop 2005 23

γ vs. attack rate from unidentified attack nodesWe simulated our method, changing the attack rate.We injected attack packets at 16 different times.Number of attack nodes is 4.The total rate of attack traffic from unidentified attack sources is closely related to γ

We can set γ adequately by defining the maximum attack rate that does not affect the network resources.

Pro

babi

lity

0

0.2

0.4

0.6

0.8

1

0 200 400 600 800 1000 γ (Packets/sec)

Atta

ck ra

te fr

om u

nide

ntifi

ed

atta

ck n

odes

(Pac

kets

/sec

)

0

200

400

600

800

1000Attack rate (maximum)

Attack rate (average)

False-positive rate

Page 24: Identification of Attack Nodes from Traffic Matrix Estimation

International Trusted Internet Workshop 2005 24

ConclusionWe propose a method to identify attack nodes by estimating the increase in traffic between sources and destinations

Our method can work with legacy routersThe increase is estimated from link loads which can be obtained through SNMP

Our method can distinguish attack nodes from legitimate clients

We use the increase to identify attack nodesSimulation results show that our method can accurately identify attack nodes