Entropy in IP Darkspace Data

Tanja Zseby

Cooperative Association for Internet Data Analysis (CAIDA) and

Fraunhofer Institute for Open Communication Systems (FOKUS)

CERT FloCon, January 2012

IP Darkspace

• Globally routable IP address space
– announced by routing
– but no hosts attached
→ all traffic destined to darkspace is unsolicited

• UCSD telescope
– /8 darkspace
– Used for different analyses (security, outages, etc.)

• Other IP darkspace monitors:
– Internet Motion Sensor, Team Cymru Darknet Project, iSink, …

Scanning

Backscatter

Analysis of Darkspace Data

• Detection of incidents
– Scanning activities
– Backscatter
– Misconfigurations
– Network outages

→ Analysis (patterns, scope, ...)
→ Early warning
→ "Cleaning up" address space

DSA related work

• General Analysis Techniques
– Brownlee. One-way Traffic Monitoring with iatmon. To appear at PAM 2012
– Ahmed et al. Characterising anomalous events using change-point correlation on unsolicited network traffic. In Identity and Privacy in the Internet Age, 2009.

• Security and Misconfigurations
– Wustrow et al. Internet background radiation revisited. IMC 2010
– Aben. Conficker. ISOI 2009
– Moore et al. Code-Red: a case study on the spread and victims of an Internet worm. IMW 2002

• Network Outages
– Dainotti et al. Analysis of Country-wide Internet Outages Caused by Censorship, IMC 2011

• Darkspace Construction
– Janies, Collins. Darkspace Construction and Maintenance, FloCon 2011

• IPv6 Darkspace
– Huston. IPv6 Background Radiation, NANOG50, 2010
– Ford et al. Initial Results from an IPv6 Darknet, 2006

…and others.

Metrics and Techniques

[Figure: two packet classification variants. (1) Classification rules (feature combinations) → packet classification → packet count per class. (2) Classification rules (selected features) → packet classification → packet count per class and distributions for selected features. Below: time series of packet counts for selected feature combinations, with classes C1, C2, C3 counted over time intervals T1, T2, T3 along the time axis t.]

Example Metrics

• Time series of packet counts
– Overall packet count
– Packets to a specific port
– Packets with specific TCP flags

• Source groups based on source behavior
– Packet features (e.g. SYNs to specific port)
– Inter-arrival times (IATs)

• Distributions
– IP addresses, port numbers
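As a rough illustration of the first metric, the sketch below counts packets per time interval for one selected feature combination (plain TCP SYNs to port 445). The `Packet` record, its field names, the rule `is_syn_to_445`, and the one-hour bin width are assumptions made for this example; they are not taken from the tooling used in the talk.

```python
from collections import Counter
from dataclasses import dataclass


# Hypothetical packet record; the field names are assumptions for this sketch.
@dataclass(frozen=True)
class Packet:
    ts: float        # arrival time in seconds
    dport: int       # destination port
    tcp_flags: str   # e.g. "S" for SYN, "SA" for SYN-ACK


def packet_count_series(packets, predicate, bin_seconds=3600):
    """Return {bin_index: packet_count} for packets matching `predicate`."""
    counts = Counter()
    for p in packets:
        if predicate(p):
            counts[int(p.ts // bin_seconds)] += 1
    return dict(counts)


def is_syn_to_445(p):
    # One example "classification rule": plain SYNs to destination port 445.
    return p.tcp_flags == "S" and p.dport == 445


packets = [
    Packet(ts=10.0, dport=445, tcp_flags="S"),
    Packet(ts=20.0, dport=445, tcp_flags="S"),
    Packet(ts=30.0, dport=80, tcp_flags="SA"),
    Packet(ts=3700.0, dport=445, tcp_flags="S"),
]
series = packet_count_series(packets, is_syn_to_445)
overall = packet_count_series(packets, lambda p: True)
print(series, overall)
```

Each additional rule yields another time series, which is one reason the number of metrics grows quickly when many feature combinations are of interest.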

Challenges

• Large amount of data
– Many repetitions/boring events (TCP SYNs, …)
– Whole distributions → huge amount of data

• Selection of suitable classification rules
– Separate known events from new/interesting packets
– Feature selection difficult
– Features of interest may change
– High analysis effort
– Detection of different events requires various metrics

Problem Statement

• Goal: detect and classify "events of interest"
– New vulnerabilities (increased scanning)
– New victims of attacks (increased backscatter)
– Misconfigurations
– Network outages

• Ideal: comprehensive metric
– Capture all events of interest

• Conditions
– Keep storage requirements low

Characteristics of DS Events

• Hostscans (new vulnerability)
– Many new sources (attackers) send to a specific destination port

• Backscatter (from DoS attacks with spoofed addresses)
– Several sources (victims) send a lot of data to many destination addresses using a specific source port

• Misconfiguration (configuration of wrong destination IP)
– Several sources send to a specific destination IP and specific destination port

• Outages
– Source IPs from outage region are missing → fewer source IPs

• DDoS (to a destination IP in darkspace)
– Many new sources (bots or spoofed) send to a specific destination IP and specific destination port

• Portscan
– One or several hosts send to a specific destination IP and many destination ports

Expected Effects on Distributions

        Hostscan            Backscatter         Misconfig  Outage                   DDoS (rare)         Portscan (rare)
sIP     random (attackers)  specific (victims)  specific   specific (some missing)  random (attackers)  specific (attackers)
dIP     random              random              specific   depends                  specific            specific
sPort   random*             specific            depends    depends                  random*             random*
dPort   specific            random*             specific   depends                  specific            random

Distinction of specific/random → entropy!

* assuming random sPort selection by attack tools

Sample Entropy

“You should call it entropy, […] …no one really knows what entropy really is, so in a debate you will always have the advantage.”
John von Neumann’s suggestion to Claude Shannon, according to Max Jammer, “Dictionary of the History of Ideas: Entropy”

Sample Entropy

Definition from [LaCD05]:

$$H(X) = -\sum_{i=1}^{N} \frac{n_i}{S} \log_2 \frac{n_i}{S}$$

Histogram: $X = \{n_i,\ i = 1, \dots, N\}$, where feature value $i$ occurs $n_i$ times
Total number of observations: $S = \sum_{i=1}^{N} n_i$

[LaCD05] Lakhina, Crovella, Diot: Mining Anomalies Using Traffic Feature Distributions. SIGCOMM 2005
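The definition can be sketched in a few lines of Python. This assumes the usual form of sample entropy from [LaCD05], H(X) = -Σ (n_i/S) log2(n_i/S), where n_i is the number of occurrences of feature value i and S is the total number of observations; the function name `sample_entropy` is chosen for this example only.

```python
import math
from collections import Counter


def sample_entropy(values):
    """Sample entropy (in bits) of the empirical distribution of `values`."""
    histogram = Counter(values)        # n_i: occurrences of each distinct value
    total = sum(histogram.values())    # S: total number of observations
    if total == 0:
        return 0.0                     # convention for an empty interval
    return sum(-(n / total) * math.log2(n / total) for n in histogram.values())


# Example: destination ports seen in one interval (made-up values).
dports = [445, 445, 445, 80]
print(sample_entropy(dports))
```

Only the histogram of the current interval has to be held in memory; once the interval closes, a single number per feature remains.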

Related Work

Entropy-based anomaly detection:

• Lee/Xiang 2001
– Information Theoretic Measures for Anomaly Detection

• Feinstein/Schnackenberg 2003
– Detection of DDoS attacks based on source IP entropy

• Lakhina et al. 2005
– Detection of scanning, DDoS, outages based on combinations of entropy from addresses and ports

Entropy Example

• All packets equal: entropy is minimal, H(X) = 0
• Each packet different: entropy is maximal, H(X) = log2 N

[Figure: two frequency histograms over feature value i, labeled H(X) = max and H(X) = min]
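Both extremes follow from the sample entropy definition. The short derivation below assumes the form $H(X) = -\sum_i (n_i/S)\log_2(n_i/S)$ from [LaCD05], with $n_i$ the count of feature value $i$ and $S$ the total number of observations.

```latex
% All packets equal: a single value carries all S observations (N = 1, n_1 = S)
H(X) = -\frac{S}{S}\log_2\frac{S}{S} = -1 \cdot \log_2 1 = 0

% Each packet different: N distinct values, each seen once (n_i = 1, S = N)
H(X) = -\sum_{i=1}^{N} \frac{1}{N}\log_2\frac{1}{N}
     = N \cdot \frac{1}{N}\log_2 N
     = \log_2 N
```

For example, 1024 packets with 1024 distinct source ports give $\log_2 1024 = 10$ bits, while 1024 packets from one source port give 0 bits.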

Expected Entropy Patterns

        Hostscan   Backscatter  Misconfig  Outage    DDoS (rare)  Portscan (rare)
sIP     random     specific     specific   specific  random       specific
dIP     random**   random**     specific   depends   specific     specific
sPort   random*    specific     depends    depends   random*      random*
dPort   specific   random*      specific   depends   specific     random

* assuming random sPort selection by attack tools
** dIP already has high entropy in "normal" operation
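One way to read the pattern table mechanically is as a lookup: given a random/specific label per feature, list every event type whose expected pattern is consistent with it. The sketch below is an illustration only. The names `PATTERNS` and `classify` are hypothetical, "depends" is treated as a wildcard, and how an observed entropy value is turned into "random" or "specific" (for instance relative to a baseline interval) is left open because the slides do not specify it.

```python
# Expected patterns transcribed from the table.
# "random" ~ dispersed values (high entropy), "specific" ~ concentrated
# values (low entropy); None stands for "depends" and matches anything.
PATTERNS = {
    "Hostscan":    {"sIP": "random",   "dIP": "random",   "sPort": "random",   "dPort": "specific"},
    "Backscatter": {"sIP": "specific", "dIP": "random",   "sPort": "specific", "dPort": "random"},
    "Misconfig":   {"sIP": "specific", "dIP": "specific", "sPort": None,       "dPort": "specific"},
    "Outage":      {"sIP": "specific", "dIP": None,       "sPort": None,       "dPort": None},
    "DDoS":        {"sIP": "random",   "dIP": "specific", "sPort": "random",   "dPort": "specific"},
    "Portscan":    {"sIP": "specific", "dIP": "specific", "sPort": "random",   "dPort": "random"},
}
FEATURES = ("sIP", "dIP", "sPort", "dPort")


def classify(observed):
    """Return all event types whose expected pattern is consistent with `observed`."""
    matches = []
    for event, pattern in PATTERNS.items():
        if all(pattern[f] is None or pattern[f] == observed[f] for f in FEATURES):
            matches.append(event)
    return matches


hostscan_like = {"sIP": "random", "dIP": "random", "sPort": "random", "dPort": "specific"}
backscatter_like = {"sIP": "specific", "dIP": "random", "sPort": "specific", "dPort": "random"}
print(classify(hostscan_like))
print(classify(backscatter_like))
```

Because the Outage column is mostly "depends", it matches any observation with a specific sIP, so a lookup of this kind can return more than one candidate and would need further evidence to separate them.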

Analysis

• Time periods
– Nov 2008
– Jan/Feb 2011
– Oct 2011

• Calculation of sample entropy
– sIP, dIP, sPort, dPort
– Time intervals: 1 hour

• Tools: SiLK, R
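The analysis in the talk was done with SiLK and R. Purely as an illustration of the computation, the Python sketch below bins records into one-hour intervals and computes the sample entropy of sIP, dIP, sPort and dPort per interval. The record layout (dicts with a `ts` timestamp in seconds plus the four feature keys) and the function names are assumptions for this example.

```python
import math
from collections import Counter, defaultdict

FEATURES = ("sIP", "dIP", "sPort", "dPort")


def entropy(counter):
    """Sample entropy in bits of a non-empty histogram."""
    total = sum(counter.values())
    return sum(-(n / total) * math.log2(n / total) for n in counter.values())


def hourly_entropy(records, interval=3600):
    """Return {interval_start: {feature: entropy}} for the given records.

    `records` is an iterable of dicts with keys 'ts', 'sIP', 'dIP',
    'sPort', 'dPort' (a hypothetical layout chosen for this sketch).
    """
    histograms = defaultdict(lambda: {f: Counter() for f in FEATURES})
    for r in records:
        start = int(r["ts"] // interval) * interval
        for f in FEATURES:
            histograms[start][f][r[f]] += 1
    return {
        start: {f: entropy(h[f]) for f in FEATURES}
        for start, h in sorted(histograms.items())
    }


# Made-up records: four sources probing port 445 in the first hour,
# one packet in the second hour.
records = [
    {"ts": 5,    "sIP": "198.51.100.1", "dIP": "10.0.0.1", "sPort": 1025, "dPort": 445},
    {"ts": 15,   "sIP": "198.51.100.2", "dIP": "10.0.0.2", "sPort": 1026, "dPort": 445},
    {"ts": 25,   "sIP": "198.51.100.3", "dIP": "10.0.0.3", "sPort": 1027, "dPort": 445},
    {"ts": 35,   "sIP": "198.51.100.4", "dIP": "10.0.0.4", "sPort": 1028, "dPort": 445},
    {"ts": 3605, "sIP": "198.51.100.1", "dIP": "10.0.0.1", "sPort": 1025, "dPort": 80},
]
result = hourly_entropy(records)
print(result)
```

This version keeps every interval's histograms in memory until the end for simplicity; a streaming variant could emit the four entropy values as soon as an interval closes and discard its histograms.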

NOV 2008

[Figure: Nov 2008, time series of H(sIP), H(dIP), H(dPort), H(sPort) and packet count (#pkts, ×10^8)]

[Figure: the same Nov 2008 time series with intervals A and B marked]

Classification of Event B

[Table: expected entropy patterns, repeated from the "Expected Entropy Patterns" slide]

Distributions: sIP, dPort

[Figure: rank plots (rank on log scale vs. #pkts, ×10^6) of sIP and dPort for intervals A and B. Annotated values, in the order they appear on the slide:]

– sIP: H(sIP)=5.97, # unique sIPs: 206,159; H(sIP)=9.36, # unique sIPs: 421,563
– dPort: H(dPort)=8.15, top ports: 1434, 445, 3072, ..; H(dPort)=6.59, top ports: 445, 62997, 137, ..

JAN/FEB 2011

[Figure: Jan-Feb 2011, time series of H(sIP), H(dIP), H(dPort), H(sPort) and packet count (#pkts, ×10^8)]

[Figure: Jan 2011, the same series]

[Figure: the same series with intervals A and B marked]

Classification of Event B

[Table: expected entropy patterns, repeated from the "Expected Entropy Patterns" slide]

Distributions: sIP, dIP

[Figure: rank plots (rank on log scale vs. #pkts, ×10^6) of sIP and dIP for intervals A and B]

– A: H(sIP)=10.97, # unique sIPs: 3,022,603; H(dIP)=14.4
– B: H(sIP)=16.40, # unique sIPs: 23,733,290 (many more sIPs); H(dIP)=0.59 (a lot of packets to one IP)

Distributions: sPort, dPort

[Figure: rank plots (rank on log scale vs. #pkts, ×10^6) of sPort and dPort for intervals A and B]

– A: H(sPort)=8.42; H(dPort)=3.19, top dPort: 445
– B: H(sPort)=10.43 (sPorts dispersed); H(dPort)=0.22, top dPort: 80 (a lot of packets to one port)

[Figure: time series of H(sIP), H(dIP), H(dPort), H(sPort) and packet count (#pkts, ×10^8) with intervals A and C marked]

Classification of Event C

[Table: expected entropy patterns, repeated from the "Expected Entropy Patterns" slide]

Distributions: sIP, dIP

[Figure: rank plots (rank on log scale vs. #pkts, ×10^6) of sIP and dIP for intervals A and C]

– A: H(sIP)=10.97, # unique sIPs: 3,022,603; H(dIP)=14.4
– C: H(sIP)=6.05 (a lot of packets from few sIPs); H(dIP)=15.60

Distributions: sPort, dPort

[Figure: rank plots (rank on log scale vs. #pkts, ×10^6) of sPort and dPort for intervals A and C]

– A: H(sPort)=8.42; H(dPort)=3.19, top dPort: 445
– C: H(sPort)=4.70, top sPorts: 80, 9021 (a lot of packets from one port); H(dPort)=8.14 (dPorts dispersed)

OCT 2011

[Figure: Oct 2011, time series of H(sIP), H(dIP), H(dPort), H(sPort) and packet count (#pkts, ×10^8) with intervals A and B marked]

Classification of Event B

[Table: expected entropy patterns, repeated from the "Expected Entropy Patterns" slide]

Distributions: sIP, dPort

[Figure: rank plots (rank on log scale vs. #pkts, ×10^6) of sIP and dPort for intervals A and B. Annotated values for the two intervals, in the order they appear on the slide:]

– H(sIP)=10.37; H(dPort)=3.43
– H(sIP)=5.55 (a lot of packets from few sIPs); H(dPort)=8.14 (dPorts dispersed)

Oct 2011

[Figure: Oct 2011, panels labeled SYN-ACKs (#pkts, ×10^6), all packets, and H]

Discussion

• Entropy
– Good indicator for new incidents in darkspace
– Comprehensive metric to detect and classify different incidents

• Future considerations:
– Detection of slow and small changes
  • Outages were not visible with current time interval
  • Stealth scanning
  • Check fine-grained time intervals
– Time interval vs. calculation effort
– Entropy calculation effort compared to other methods
– Problems with nested events
– Combination with other metrics (geolocation, source groups, …)
– Combination with other DS monitors

CAIDA Workshop on Darkspace Analysis

• May 2012, San Diego

• Objectives
– Bring community together
– Share experiences
– Share data, results
– Establish global distributed DS network

• Participation by invitation
– If interested contact me

Thank You!

Contact: tanja@caida.org