Automatically Inferring the Evolution of Malicious Activity on the Internet
David Brumley Carnegie Mellon University
Shobha Venkataraman AT&T Research
Oliver Spatscheck AT&T Research
Subhabrata Sen AT&T Research
2
Evil is constantly on the move
Labeled IPs from SpamAssassin, IDS logs, etc.: <ip1,+> <ip2,+> <ip3,+> <ip4,->
[Diagram: network hierarchy with a Tier 1 provider, a spam haven, and networks A, E, K]
Our Goal: Characterize regions changing from bad to good (Δ-good) or good to bad (Δ-bad)
3
Research Questions
Given a sequence of labeled IPs:
1. Can we identify the specific regions on the Internet that have changed in malice?
2. Are there regions on the Internet that change their malicious activity more frequently than others?
4
Challenges
1. Infer the right granularity
Previous work: Fixed granularity
Per-IP granularity (e.g., Spamcop)
Per-IP often not interesting
[Diagram: network hierarchy with Tier 1 and Tier 2 providers, a spam haven, and DSL and CORP networks]
5
Challenges
1. Infer the right granularity
Previous work: Fixed granularity
BGP granularity (e.g., Network-Aware clusters [KW’00])
[Diagram: network hierarchy with Tier 1 and Tier 2 providers, a spam haven, and DSL and CORP networks]
6
Challenges
1. Infer the right granularity
Our work: Infer granularity
Coarse granularity
Medium granularity
Well-managed network: fine granularity
[Diagram: network hierarchy with Tier 1 and Tier 2 providers, a spam haven, and DSL and CORP networks]
7
Challenges
1. Infer the right granularity
2. We need online algorithms
Fixed-memory device on a high-speed link
[Diagram: network hierarchy with Tier 1 and Tier 2 providers, a spam haven, and DSL and SMTP networks]
8
Research Questions
Given a sequence of labeled IPs:
1. Can we identify the specific regions on the Internet that have changed in malice? We present: Δ-Change
2. Are there regions on the Internet that change their malicious activity more frequently than others? We present: Δ-Motion
9
Background
1. IP Prefix trees
2. TrackIPTree Algorithm
10
IP Prefixes: i/d denotes all IP addresses covered by the first d bits of i
Ex: 8.1.0.0/16 covers 8.1.0.0-8.1.255.255
Ex: 1.2.3.4/32 covers 1 host (all bits)
[Diagram: network hierarchy with Tier 1 and Tier 2 providers, a spam haven, and DSL and CORP networks, annotated with the two example prefixes]
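The i/d notation can be checked with Python's standard `ipaddress` module. The sketch below is only an illustration (the helper name `prefix_range` is made up here): it prints the first address, last address, and size of a prefix. A range of 8.1.0.0-8.1.255.255 corresponds to fixing the first 16 bits, while a /24 on the same base address fixes 24 bits and covers only 256 addresses.

```python
import ipaddress

def prefix_range(prefix: str):
    """Return (first address, last address, number of addresses) for i/d."""
    net = ipaddress.ip_network(prefix)
    return str(net.network_address), str(net.broadcast_address), net.num_addresses

for p in ("8.1.0.0/16", "8.1.0.0/24", "1.2.3.4/32"):
    print(p, prefix_range(p))
```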
11
An IP prefix tree is formed by masking each bit of an IP address.
Whole Net: 0.0.0.0/0
0.0.0.0/1 128.0.0.0/1
0.0.0.0/2 64.0.0.0/2 128.0.0.0/2 192.0.0.0/2
128.0.0.0/3 160.0.0.0/3
128.0.0.0/4 144.0.0.0/4
...
0.0.0.0/31
One Host: 0.0.0.0/32 0.0.0.1/32
12
A k-IPTree Classifier [VBSSS’09] is an IP tree with at most k leaves, each leaf labeled with good (“+”) or bad (“-”).
0.0.0.0/0
0.0.0.0/1 128.0.0.0/1
0.0.0.0/2 64.0.0.0/2 128.0.0.0/2 192.0.0.0/2
128.0.0.0/3 160.0.0.0/3
128.0.0.0/4 144.0.0.0/4
...
0.0.0.0/31
0.0.0.0/32 0.0.0.1/32
6-IPTree leaf labels: + - + - + +
Ex: 64.1.1.1 is bad
Ex: 1.1.1.1 is good
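As a rough illustration of how such a classifier is used, the sketch below labels an IP by its longest matching leaf prefix. The six leaves and their labels are an assumed example chosen to agree with the two cases on the slide (64.1.1.1 bad, 1.1.1.1 good); they are not the exact tree from the talk, and `classify` is a hypothetical helper.

```python
import ipaddress

# Hypothetical 6-leaf IPTree: the leaves partition the address space.
LEAVES = {
    "0.0.0.0/2": "+",
    "64.0.0.0/2": "-",
    "128.0.0.0/4": "+",
    "144.0.0.0/4": "-",
    "160.0.0.0/3": "+",
    "192.0.0.0/2": "+",
}

def classify(ip: str, leaves=LEAVES) -> str:
    """Return the label of the longest leaf prefix that contains ip."""
    addr = ipaddress.ip_address(ip)
    best = None  # (prefix length, label)
    for prefix, label in leaves.items():
        net = ipaddress.ip_network(prefix)
        if addr in net and (best is None or net.prefixlen > best[0]):
            best = (net.prefixlen, label)
    if best is None:
        raise ValueError(f"no leaf covers {ip}")
    return best[1]

print(classify("64.1.1.1"), classify("1.1.1.1"))
```

A linear scan keeps the sketch short; a real implementation would walk a binary trie one bit at a time.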
13
TrackIPTree Algorithm [VBSSS’09]
In: stream of labeled IPs ... <ip4,+> <ip3,+> <ip2,+> <ip1,->
Out: k-IPTree
[Diagram: TrackIPTree maintains a tree over levels /1, /16, /17, /18 with leaf labels + - -]
14
Δ-Change Algorithm
1. Approach
2. What doesn’t work
3. Intuition
4. Our algorithm
15
Goal: identify online the specific regions on the Internet that have changed in malice.
Epoch 1 IP stream s1, Epoch 2 IP stream s2, ...
T1 for epoch 1: tree over /0, /1, /16, /17, /18 with leaf labels + - +
T2 for epoch 2: tree over /0, /1, /16, /17, /18 with leaf labels + + -
Δ-Bad: A change from good to bad
Δ-Good: A change from bad to good
16
Goal: identify online the specific regions on the Internet that have changed in malice.
T1 for epoch 1: tree over /0, /1, /16, /17, /18 with leaf labels + - +
T2 for epoch 2: tree over /0, /1, /16, /17, /18 with leaf labels + + -
False Positive: Misreporting that a change occurred
False Negative: Missing a real change
17
Goal: identify online the specific regions on the Internet that have changed in malice.
Idea: divide time into epochs and diff
• Use TrackIPTree on labeled IP stream s1 to learn T1
• Use TrackIPTree on labeled IP stream s2 to learn T2
• Diff T1 and T2 to find Δ-Good and Δ-Bad
T1 for epoch 1: tree over /0, /1, /16, /17, /18 with leaf labels + - -
T2 for epoch 2: tree over /0, /1, /16 with leaf label -
Different Granularities! ✗
18
Goal: identify online the specific regions on the Internet that have changed in malice.
Δ-Change Algorithm Main Idea: Use classification errors between T_{i-1} and T_i to infer Δ-Good and Δ-Bad
19
Δ-Change Algorithm
• TrackIPTree updates T_{i-2} to T_{i-1} on stream S_{i-1}, and T_{i-1} to T_i on stream S_i
• A fixed tree is annotated with classification error on S_{i-1}, giving T_{old,i-1}
• The same fixed tree is annotated with classification error on S_i, giving T_{old,i}
• Compare (weighted) classification error between T_{old,i-1} and T_{old,i} (note both based on same tree)
• Output: Δ-Good and Δ-Bad
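The annotation step can be sketched as follows, under simplifying assumptions: the fixed tree is represented only by its labeled leaves, accuracy is a plain unweighted fraction, and the prefixes, streams, and function name `annotate` are all made up for illustration. The point is that both epochs are scored against the same prefixes, so the per-prefix numbers are directly comparable.

```python
import ipaddress
from collections import defaultdict

def annotate(leaves, stream):
    """Score a fixed set of labeled leaf prefixes on one epoch's stream.

    Returns {prefix: (number of IPs seen, fraction classified correctly)}.
    """
    nets = {p: ipaddress.ip_network(p) for p in leaves}
    seen = defaultdict(int)
    correct = defaultdict(int)
    for ip, label in stream:
        addr = ipaddress.ip_address(ip)
        matches = [p for p, n in nets.items() if addr in n]
        if not matches:
            continue  # IP not covered by any leaf
        leaf = max(matches, key=lambda p: nets[p].prefixlen)
        seen[leaf] += 1
        correct[leaf] += int(leaves[leaf] == label)
    return {p: (seen[p], correct[p] / seen[p]) for p in seen}

# Toy fixed tree and two toy epochs (illustrative data only).
fixed = {"10.0.0.0/9": "+", "10.128.0.0/9": "-"}
s_prev = [("10.0.0.1", "+"), ("10.0.0.2", "+"), ("10.200.0.1", "-")]
s_cur = [("10.0.0.1", "-"), ("10.0.0.2", "-"), ("10.200.0.1", "-")]

t_old_prev = annotate(fixed, s_prev)
t_old_cur = annotate(fixed, s_cur)
print(t_old_prev)
print(t_old_cur)
```

In this toy run the accuracy of 10.0.0.0/9 falls from 1.0 to 0.0 while 10.128.0.0/9 stays at 1.0, which is the kind of gap the comparison step looks for.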
20
Comparing (Weighted) Classification Error
T_{old,i-1}: /16 (IPs: 200, Acc: 40%); (IPs: 150, Acc: 90%); (IPs: 110, Acc: 95%); (IPs: 40, Acc: 80%); (IPs: 50, Acc: 30%)
T_{old,i}: /16 (IPs: 170, Acc: 13%); (IPs: 100, Acc: 10%); (IPs: 80, Acc: 5%); (IPs: 20, Acc: 20%); (IPs: 70, Acc: 20%)
Δ-Change Somewhere
21
Comparing (Weighted) Classification Error
T_{old,i-1}: /16 (IPs: 200, Acc: 40%); (IPs: 150, Acc: 90%); (IPs: 110, Acc: 95%); (IPs: 40, Acc: 80%); (IPs: 50, Acc: 30%)
T_{old,i}: /16 (IPs: 170, Acc: 13%); (IPs: 100, Acc: 10%); (IPs: 80, Acc: 5%); (IPs: 20, Acc: 20%); (IPs: 70, Acc: 20%)
Insufficient Change
22
Comparing (Weighted) Classification Error
T_{old,i-1}: /16 (IPs: 200, Acc: 40%); (IPs: 150, Acc: 90%); (IPs: 110, Acc: 95%); (IPs: 40, Acc: 80%); (IPs: 50, Acc: 30%)
T_{old,i}: /16 (IPs: 170, Acc: 13%); (IPs: 100, Acc: 10%); (IPs: 80, Acc: 5%); (IPs: 20, Acc: 20%); (IPs: 70, Acc: 20%)
Insufficient Traffic
23
Comparing (Weighted) Classification Error
T_{old,i-1}: /16 (IPs: 200, Acc: 40%); (IPs: 150, Acc: 90%); (IPs: 110, Acc: 95%); (IPs: 40, Acc: 80%); (IPs: 50, Acc: 30%)
T_{old,i}: /16 (IPs: 170, Acc: 13%); (IPs: 100, Acc: 10%); (IPs: 80, Acc: 5%); (IPs: 20, Acc: 20%); (IPs: 70, Acc: 20%)
Δ-Change Localized
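One way to read these four slides is as a per-node test with two cut-offs: a node needs enough traffic in both epochs, and its accuracy must shift by enough between them. The sketch below applies that reading to the numbers on the slides. The thresholds `MIN_IPS` and `MIN_SHIFT`, the node names, and the pairing of nodes across the two trees by position are assumptions made for illustration; the talk does not give the actual criteria or weights.

```python
from dataclasses import dataclass

@dataclass
class NodeStats:
    ips_prev: int    # IPs seen at this node in epoch i-1
    acc_prev: float  # accuracy in epoch i-1
    ips_cur: int     # IPs seen at this node in epoch i
    acc_cur: float   # accuracy in epoch i

MIN_IPS = 50      # hypothetical traffic threshold
MIN_SHIFT = 0.25  # hypothetical accuracy-shift threshold

def verdict(n: NodeStats) -> str:
    if min(n.ips_prev, n.ips_cur) < MIN_IPS:
        return "insufficient traffic"
    if abs(n.acc_prev - n.acc_cur) < MIN_SHIFT:
        return "insufficient change"
    return "change"

# Slide numbers, paired by position (node names are made up).
NODES = {
    "root /16": NodeStats(200, 0.40, 170, 0.13),
    "inner": NodeStats(150, 0.90, 100, 0.10),
    "leaf a": NodeStats(110, 0.95, 80, 0.05),
    "leaf b": NodeStats(40, 0.80, 20, 0.20),
    "leaf c": NodeStats(50, 0.30, 70, 0.20),
}

for name, stats in NODES.items():
    print(name, "->", verdict(stats))
```

Under these assumed thresholds the change is visible at the root, is ruled out for two of the smaller nodes, and is pinned to the node whose accuracy drops from 95% to 5%.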
24
Evaluation
1. What are the performance characteristics?
2. Are we better than previous work?
3. Do we find cool things?
25
Performance
In our experiments, we:
– let k=100,000 (k-IPTree size)
– processed 30-35 million IPs (one day’s traffic)
– used a 2.4 GHz processor
Identified Δ-Good and Δ-Bad in <22 min using <3 MB memory
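To put these figures in perspective, a quick back-of-the-envelope calculation gives the implied processing rate. Because the run time is stated as under 22 minutes, the values below are lower bounds; the helper name is made up for this sketch.

```python
def ips_per_second(num_ips: int, minutes: float) -> float:
    """Average number of labeled IPs processed per second."""
    return num_ips / (minutes * 60)

low = ips_per_second(30_000_000, 22)
high = ips_per_second(35_000_000, 22)
print(f"at least {low:,.0f} to {high:,.0f} IPs per second")
```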
26
changes. Our experiments were run on a 2.4 GHz Sparc64-VI core. Our current (unoptimized) implementation takes 20-22 minutes to process a day’s trace (around 30-35 million IP addresses) and requires less than 2-3 MB of memory storage.

We note that the ground truth in our data provides labels for the individual IP addresses, but does not tell us the prefixes that have changed. Thus, our ground truth allows us to confirm that the learned IPTree has high accuracy, but we cannot directly measure false positive rate and false negative rate of the change-detection algorithms. Thus, our experimental results instead demonstrate that our algorithm can find small changes in prefix behaviour very early on real data, and can do so substantially better than competing approaches. Our operators were previously unaware of most of these Δ-change prefixes, and as a consequence, our summarization makes it easy for operators to both note changes in behaviour of specific entities, as well as observe trends in malicious activity. (See footnote 7.)

4.1 Comparisons with Alternate Approaches

We first compare Δ-Change with previous approaches and direct extensions to previous work. We compare two different possible alternate approaches with Δ-Change: (1) using a fixed set of network-based prefixes (i.e., network-aware clusters, see Sec. 2.2) instead of a customized IPTree, (2) directly differencing the IPTrees instead of using Δ-Change. We focus here on only spam data for space reasons.

Network-aware Clusters. As we described in Section 3.2, our change-detection approach has no false positives: every change we find will indeed be a change in the input data stream. Thus, we only need to demonstrate that Δ-Change finds substantially more Δ-changes than network-aware clusters (i.e., has a lower false negative rate), and therefore, is superior at summarizing changes in malicious activity to the appropriate prefixes for operator attention.

We follow the methodology of [29] for labeling the prefixes of the network-aware clusters optimally (i.e., we choose the labeling that minimizes errors), so that we can test the best possible performance of network-aware clusters against Δ-Change. We do this allowing the network-aware clusters multiple passes over the IP addresses (even though Δ-Change is allowed only a single pass), as detailed in [29]. We then use these clusters in place of the learned IPTree in our change-detection algorithms.

We first compare Δ-change prefixes identified by the network-aware clustering and Δ-Change. This comparison cannot be directly on the prefixes output by the two approaches, as slightly different prefixes may reflect the same underlying change in the data stream, e.g., network-aware clusters might identify a /24 while Δ-Change identifies a /25. In order to account for such differences, we group together prefixes into distinct subtrees, and match a group from the network-aware clustering to the appropriate group from Δ-Change if at least 50% of the volume of changed IPs in network-aware clustering was accounted for in Δ-Change. In our results, network-aware clustering identified no Δ-change prefixes that were not identified by Δ-Change; otherwise, we would have to do the reverse matching as well. Furthermore, this is what allows us to compare the number of Δ-changes that were identified by both algorithms, otherwise we would not be able to make this comparison.

Fig. 11(a) shows the results of our comparison for 37 days. Network-aware clustering typically finds only a small fraction of the Δ-change prefixes discovered by Δ-Change, ranging from 10% to 50%. On average, Δ-Change finds over 2.5 times as many Δ-change prefixes as network-aware clusters. We compare also the number of IPs in Δ-change prefixes identified by the network-aware clustering and Δ-Change in Fig. 11(b). The Δ-change prefixes discovered by Δ-Change typically account for a factor of 3-5× as many IP addresses as those discovered by the network-aware clustering. It indicates that network-aware clustering does not discover many changes that involve a substantial volume of the input data. On many days, especially on days with changes, the fraction of IP addresses not identified by network-aware clusters, however, is still smaller than the fraction of prefixes that it does not identify. This indicates that network-aware clustering identifies the larger, coarser changes, but misses the fine-grained changes.

Network-aware clusters perform so poorly because the prefix granularity required to identify Δ-changes typically does not appear at all in routing tables. Indeed, as our analysis in Section 4.2 shows, a large number of Δ-change prefixes come from hosting providers, many of which do not even appear in BGP prefix tables.

Possible Differencing of IPTrees. We now show that the possible differencing approach described in Section 2.2

Footnote 7: As discussed in Section 1, our evaluation focuses exclusively on changes in prefix behaviour, since prior work [28, 29] already finds persistent malicious behaviour.

Figure 11. Comparing the Δ-Change algorithm with network-aware clusters on the spam data: Δ-Change always finds more prefixes and covers more IPs. (a) Δ-change prefixes: number of Δ-change prefixes vs. interval in days. (b) IPs in Δ-change prefixes: IPs in Δ-change prefixes (log scale) vs. interval in days. Series in both panels: Δ-Change, Network-aware.

2.5x as many changes on average!
How do we compare to network-aware clusters? (By Prefix)
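The 50% matching rule from the excerpt can be illustrated with a small sketch. It assumes each changed IP carries a traffic volume and that a group from one method is matched when at least half of that volume falls inside the other method's prefixes. The function names, addresses, and volumes are hypothetical, and the grouping of prefixes into subtrees is left out.

```python
import ipaddress

def covered_fraction(changed_ips, other_prefixes):
    """Fraction of changed-IP volume that falls inside other_prefixes.

    changed_ips: {ip string: volume}; other_prefixes: list of prefix strings.
    """
    nets = [ipaddress.ip_network(p) for p in other_prefixes]
    total = sum(changed_ips.values())
    if total == 0:
        return 0.0
    covered = sum(
        volume
        for ip, volume in changed_ips.items()
        if any(ipaddress.ip_address(ip) in n for n in nets)
    )
    return covered / total

def groups_match(changed_ips, other_prefixes, threshold=0.5):
    return covered_fraction(changed_ips, other_prefixes) >= threshold

# Toy case: a network-aware /24 whose changed IPs mostly sit in one /25.
nac_changed = {"192.0.2.10": 30, "192.0.2.20": 30, "192.0.2.200": 40}
print(groups_match(nac_changed, ["192.0.2.0/25"]))    # 60% of volume covered
print(groups_match(nac_changed, ["192.0.2.128/25"]))  # 40% of volume covered
```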
Spam
27
Figure 15. (a) shows sizes of Δ-change prefixes (fraction of prefixes vs. prefix length). (b) shows Case Study 2: Δ-good prefixes with drop in Internet-wide spamming activity during the Grum takedown (Δ-good prefixes vs. day, for Δ-Change and Network-Aware; arrow indicates when the takedown started). There is a sharp increase in Δ-good prefixes after the takedown. (c) shows Case Study 3: Δ-change prefixes in New Botnet Activity (Δ-change prefixes vs. day). A and B mark the Δ-bad prefixes discovered when over 22,000-36,000 new bot IPs appeared in the feed.

of the changing malicious activity, since there is likely to be a diversity of malicious activity when new threats emerge.

Summary. Table 1(b) (Fig. 12) summarizes the different prefixes for parameter values of 0.05% and 0.01%, categorized by the type of change they have undergone. As in Section 4.2, the number of prefixes discovered increases sharply when this parameter is increased. However, note that in this experiment, there are very significant numbers of Δ-good prefixes discovered as well: over 56% of all the prefixes discovered are Δ-good, unlike the spam data. This is primarily because the active IP address space changes very little, while bot IP addresses appear in the feed for much shorter durations (e.g., this may be as bots get cleaned, or bot signatures get outdated). A former bot IP would then generate mostly legitimate traffic (its malicious traffic would drop, but its legitimate activity remains the same, and so it would get labelled as legitimate), and the corresponding IP regions thus become Δ-good.

Case Study 3: New Botnet Activity. Our case study illustrates the value of discovering Δ-bad prefixes internal to a large ISP’s prefix blocks. Figure 15(c) shows the time series of the Δ-change prefixes discovered over two months of our data set. The highlighted days (A and B) mark two sharp increases in the number of Δ-change prefixes discovered. These correspond to days with dramatic increases in the number of new bot IPs seen in the data feed: 22.1 & 28.6 thousand at the two days marked as A and 36.8 thousand at B. Further analysis showed that on days marked A, nearly all of these new bot IPs are from the DNSChanger botnet [8], and are responsible for 19 & 31 Δ-bad prefixes. On day B, these new bot IPs are from Sality [25] and Conficker [6], and 66 Δ-bad prefixes correspond to the new IPs from Sality and Conficker. By contrast, network-aware clusters were only able to discover 5-12 prefix blocks as Δ-bad during these events. These Δ-bad prefixes come from smaller regional ISPs and the tier-1 ISP’s dial-up and DSL blocks; most of these prefixes had little to no botnet activity (as identified by the vendor) earlier. Thus, in these two instances, Δ-Change effectively reduces the workload for operators from manually investigating over 22,000-36,000 new bot IPs to investigating 19-66 new IP prefixes, a drop of two orders of magnitude.

Figure 16. ROC curve for Δ-Motion’s accuracy (true positive rate vs. false positive rate; series: TreeMotion).

4.4 Structural Analysis of IP Dynamics

Our earlier results demonstrate that there is constant change in the Internet’s malicious activity. We now explore the structure underlying these changes with the Δ-Motion algorithm, focusing our analysis on the spam dataset due to space. We use a snapshot of the change-IPTree W generated by Δ-Motion, 60 days into the dataset; W’s high predictive accuracy indicates it can distinguish frequently-changing regions well, as shown by the ROC curve in Fig. 16. We use W to classify every IP in our data set as “change” or “non-change”, and then aggregate the IPs by country and owning company. We define freq-ratio to be the fraction of the total IPs of that entity that are marked as change IPs, and analyze the freq-ratio of different aggregations.

Table 3 (Fig. 17) shows a breakdown for the origin of the frequently changing IPs. Together, these countries account for 90% of the data seen at our mail servers. We note that countries like China, Korea, Russia [23], which are known to harbor a lot of spammers, actually change very infrequently, while countries like US and Canada change 3-4 times more frequently. This makes sense, as countries where ISPs aggressively fight spammer infestations are likely to experience a more frequent change in malicious activity. Table 4 shows a breakdown by ISP type. Once again, hosting providers have a substantially higher ratio than the other cat-

Grum botnet takedown
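The freq-ratio defined in the excerpt is a per-entity fraction, which can be sketched in a few lines. The input format, the function name, and the toy entities below are assumptions for illustration; they are not data from the paper.

```python
from collections import Counter

def freq_ratio(classified):
    """Per-entity fraction of IPs marked as change IPs.

    classified: iterable of (entity, is_change) pairs, one pair per IP.
    """
    total = Counter()
    change = Counter()
    for entity, is_change in classified:
        total[entity] += 1
        if is_change:
            change[entity] += 1
    return {entity: change[entity] / total[entity] for entity in total}

# Made-up toy input: two entities with four classified IPs each.
toy = [
    ("entity_a", True), ("entity_a", True), ("entity_a", False), ("entity_a", False),
    ("entity_b", True), ("entity_b", False), ("entity_b", False), ("entity_b", False),
]
ratios = freq_ratio(toy)
print(ratios)
```

The same function works for any aggregation key, so grouping by country or by owning company only changes what is passed as the entity.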
28
Botnets
22.1 and 28.6 thousand new DNSChanger bots appeared
38.6 thousand new Conficker and Sality bots
29
Caveats and Future Work
“For any distribution on which an ML algorithm works well, there is another on which it works poorly.” – The “No Free Lunch” Theorem
Our algorithm is efficient and works well in practice.
....but a very powerful adversary could fool it into having many false negatives. A formal characterization is future work.
30
Conclusion
Δ-Change and Δ-Motion: two new online algorithms for capturing how malice evolves on the Internet
– Scalable
– Discovers right IP granularity
– Finds cool changes
31
Questions?