Dynamics of Malicious Software on the Internet · a new virus or worm. The system also needs a...

32
A thesis submitted in partial satisfaction of the requirements for the degree of Master of Computer Science and Engineering in the Graduate School of the University of Aizu Dynamics of Malicious Software on the Internet by m5081224, Tatehiro Kaiwa Febrary 2006

Transcript of Dynamics of Malicious Software on the Internet · a new virus or worm. The system also needs a...

Page 1: Dynamics of Malicious Software on the Internet · a new virus or worm. The system also needs a large memory for signatures. Further, The system needs the time to detect the viruses

A thesis submitted in partial satisfaction of the

requirements for the degree of

Master of Computer Science and Engineering

in the Graduate School of

the University of Aizu

Dynamics of Malicious Software

on the Internet

by

m5081224, Tatehiro Kaiwa

Febrary 2006

Page 2: Dynamics of Malicious Software on the Internet · a new virus or worm. The system also needs a large memory for signatures. Further, The system needs the time to detect the viruses

The thesis titled

Dynamics of Malicious Softwareon the Internet

by

m5081224, Tatehiro Kaiwa

is reviewed and approved by:

Main referee

Professor Date

Atsushi Kara

Associate Professor Date

Hiroshi Toyoizumi

Professor Date

Toshiaki Miyazaki

The University of Aizu

Febrary 2006

Page 3: Dynamics of Malicious Software on the Internet · a new virus or worm. The system also needs a large memory for signatures. Further, The system needs the time to detect the viruses

Dedicated to professor Kara and Toyoizumi

who makes me write such a document.

iii

Page 4: Dynamics of Malicious Software on the Internet · a new virus or worm. The system also needs a large memory for signatures. Further, The system needs the time to detect the viruses

Contents

Chapter 1 Introduction 11.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11.2 Scale-free Network . . . . . . . . . . . . . . . . . . . . . . . . . . . 2

Chapter 2 Observation of Wild Worm 32.1 Activity of Worm . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32.2 Arrival Intervals of Worm . . . . . . . . . . . . . . . . . . . . . . . . 3

Chapter 3 Simulation 63.1 Network Topology in Simulation . . . . . . . . . . . . . . . . . . . . 63.2 Simulation Model of Worm Spread Dynamics . . . . . . . . . . . . . 93.3 Fitting Model to Observation . . . . . . . . . . . . . . . . . . . . . . 103.4 Local Network Structure Inference . . . . . . . . . . . . . . . . . . . 10

Chapter 4 Mathematical Model of Outbreak 15

Chapter 5 Hub Defense Strategy 17

Chapter 6 Conclusion 20

References 21

iv

Page 5: Dynamics of Malicious Software on the Internet · a new virus or worm. The system also needs a large memory for signatures. Further, The system needs the time to detect the viruses

List of Figures

Figure 1.1 Example of power-law degree distribution. . . . . . . . . . . . 2

Figure 1.2 Degree distribution of the e-mail network [1]. . . . . . . . . . 2

Figure 2.1 Observational image. Each worm infecting PC sends an e-mailmessage with its own copy attached to other addresses independently.We can observe only the e-mails reaching our mail server. . . . . . . . 4

Figure 2.2 Arrival intervals of e-mail with some specific worms attachedobserved on the mail server in University of Aizu. Time 0 means thetime on which the first e-mail attached each worms arrived at Univer-sity of Aizu. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4

Figure 3.1 This is GUI of BRITE. We can change some parameters whenwe create something. For example, Node Placement, Growth Type, thenumber of nodes and so on. . . . . . . . . . . . . . . . . . . . . . . . 6

Figure 3.2 Example of a Scale-free Network. The number of nodes is 30and the number of links per newly added node is 2. There are a few hubnodes and the almost other nodes only have a few link. Some colorednodes are hub nodes. . . . . . . . . . . . . . . . . . . . . . . . . . . 7

Figure 3.3 These figures show a relationship between a distribution of thenumber of links of each node and the number of nodes. The numberof nodes is 10000 and the number of links per newly added node is 20. 8

Figure 3.4 Image of a community sending e-mail messages with a wormattached. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9

Figure 3.5 Result of simulation compared with observed data of Netsky.P.Time 0 means the time on which a first attack arrives at each observingpoint. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11

Figure 3.6 Figure of arrival intervals. An infected node sends about fivee-mail messages per a second,

�is 5. The number of neighbors of

this observation node is 20. The mean of links the neighbors have is1156.19. The figure set time 0 as the start of simulation. . . . . . . . . 11

Figure 3.7 Figure of arrival intervals. An infected node send five e-mailmessages per a second,

�is 5. The number of neighbors of this obser-

vation node is 20. The mean of links the neighbors have is 619.5. Thefigure set time 0 as the start of simulation. . . . . . . . . . . . . . . . 12

v

Page 6: Dynamics of Malicious Software on the Internet · a new virus or worm. The system also needs a large memory for signatures. Further, The system needs the time to detect the viruses

Figure 3.8 Figure of arrival intervals. An infected node send five e-mailmessages per a second,

�is 5. The number of neighbors of this obser-

vation node is 20. The mean of links the neighbors have is 645.5. Thefigure set time 0 as the start of simulation. . . . . . . . . . . . . . . . 12

Figure 3.9 Figure of the number of infected nodes. The click probability �

is 0.01. After 40 hours this simulation started, outbreak is occurred. . 13Figure 3.10 The graph shows the number of infected nodes with time. The

� is 0.1 and 0.01. All nodes except an initial infected node is liable toinfect and the number of total nodes is 10000. . . . . . . . . . . . . . 14

Figure 5.1 Effect of changing the number of immune hub nodes with pa-rameter � =0.01. Each h is 0, 500, 1000, 3000 and 5000. . . . . . . . . 18

Figure 5.2 Comparison between a case of that we make some nodes se-lected randomly immune and some case of that we make hub nodesimmune. A parameter � is 3000. . . . . . . . . . . . . . . . . . . . . 19

vi

Page 7: Dynamics of Malicious Software on the Internet · a new virus or worm. The system also needs a large memory for signatures. Further, The system needs the time to detect the viruses

List of Tables

Table 2.1 Information of some worms. . . . . . . . . . . . . . . . . . . . 3

vii

Page 8: Dynamics of Malicious Software on the Internet · a new virus or worm. The system also needs a large memory for signatures. Further, The system needs the time to detect the viruses

Acknowledgment

I wish to thank Dr. Kara for his support and encouragement. I wish to express myappreciation to Dr. Toyoizumi for his excellent advice and diligent efforts to guide methrough this project.

viii

Page 9: Dynamics of Malicious Software on the Internet · a new virus or worm. The system also needs a large memory for signatures. Further, The system needs the time to detect the viruses

Abstract

In this paper, we present a way to build a model how computer worms propagatein mail network, and then propose an effective defense method against a mass-mailingworm. In mail network, two nodes are connected when e-mails are exchanged betweenthem. As some papers pointed out already, mail network can be successfully regardedas a Scale-free Network. The mass-mailing worm spread can be regarded as the propa-gation in such Scale-free network. The From-field of an e-mail messages a worm sendsoften varies and/or is spoofed. The log data of e-mails reveal neither the mail networkstructure nor the worm dynamics. We use arrival intervals of worms to analyze themail network structure and the worm dynamics. Also, we show that we can effectivelystops the worm outbreak by regulating worm activity on hubs on the mail network.

Page 10: Dynamics of Malicious Software on the Internet · a new virus or worm. The system also needs a large memory for signatures. Further, The system needs the time to detect the viruses
Page 11: Dynamics of Malicious Software on the Internet · a new virus or worm. The system also needs a large memory for signatures. Further, The system needs the time to detect the viruses

Chapter 1

Introduction

1.1 Introduction

The most popular way to spread malicious software programs is commonly referredto as computer worms like Blaster [2] and Netsky [3]. There are some methods that aworm uses for its propagation. One of the popular methods is to send e-mail messageswith its own copy attached. The e-mail has some trap which, when activated, executesthe copy of malicious software. For example, Swen [4] sends an e-mail message whosecontents claim to be patches for Microsoft Internet Explore, or delivery failure noticesfrom qmail. Although a lot of new viruses appear in few days, most of them are notnuisance because there are effective filters quickly available. However it is expectedthat in future the worms will be more virulent and, thus, result in significantly greaterdamage.

Currently, there are some paper [5, 6, 7] analyzing spread of malicious mobile code.Then, there are a lot of anti-virus software that have various effective countermeasuresagainst various computer viruses and worms. The popular method is to use signatures.The system achieves almost 100% detection rate if the system has already the signaturefor the target. However, the system has some weak points. The system cannot detecta new virus or worm. The system also needs a large memory for signatures. Further,The system needs the time to detect the viruses and worms.

The purpose of this research is to analyze and estimate how a computer virus or wormpropagates in a mail network and to analyze and estimate an effective method againsta mass-mailing worm. To estimate that, we focus attention on the arrival intervals ofe-mail message with a specific computer worm attached. Because the From-field of ane-mail message a worm sends often varies and/or is spoofed, it is only arrival intervalsthat we can obtain from received e-mails. As the data of arrival time of many e-mailmessages with each computer worms attached are observed at the mail server in Uni-versity of Aizu.

Also, we have to consider a structure of the e-mail network in the Internet to researcha propagation of a worm. In mail network, two nodes are connected when e-mails areexchanged between them. As some papers [8, 9] pointed out already, mail network canbe successfully regarded as a Scale-free [8, 9] Network. Figure 1.2 shows degree dis-tribution of the e-mail constracted from log files of the e-mail server at Kiel University.Thus, we regard the structure of mail network as Scale-free.

1

Page 12: Dynamics of Malicious Software on the Internet · a new virus or worm. The system also needs a large memory for signatures. Further, The system needs the time to detect the viruses

1.2 Scale-free Network

Scale-free Networks are networks with power-law degree distribution (See Figure1.1). Recently, some papers prove what various networks such as the Internet, propa-gation of an epidemic and a genome have Scale-free Network structure. In Scale-freeNetwork, some nodes called hub nodes exhibit extremely high connectivity and almostall other nodes exhibit low connectivity. In this paper, we define Scale-free Network asa network with power-law degree distribution and a network containing hub nodes. Asthere are a small number of hub nodes, a Scale-free Network has generally two charac-teristics. The first characteristic is that a Scale-free Network has tolerance to randomattack to most nodes but it has weakness to attack to the hub nodes. The second char-acteristic is a Scale-free Network has also a characteristic of Small-world [10].

Network grows daily. When a new node belongs to a network, the node does notconnect to another randomly selected one. In real networks linking is never random.A subtle law of preferential attachment governs evolution of a network. Scale-freeNetwork is such a network.

������������ ����������

� ��� ������� �

Figure 1.1: Example of power-law degree distribution.

Figure 1.2: Degree distribution of the e-mail network [1].

2

Page 13: Dynamics of Malicious Software on the Internet · a new virus or worm. The system also needs a large memory for signatures. Further, The system needs the time to detect the viruses

Chapter 2

Observation of Wild Worm

2.1 Activity of Worm

Most mass-mailing worms work as follows. A mass-mailing worm is executed atonce, some e-mail messages with its own copy per unit of time. An amount of timeelapses before the recipient reads the e-mail message. After that, he executes mailer orsomething to read some e-mail messages, in which the e-mail message with attachmentfile of the worm is contained. At this time, the mass-mailing worm program runs if heexecutes the attachment file. Then, the worm will use the recipient’s address book,inbox and/or other e-mail addresses saved in recipient’s HDD to spread its own copiesto other e-mail addresses. In addition to that, there are some worms which will infectother malicious programs, which are Trojan horse, back door, and so on.

2.2 Arrival Intervals of Worm

Because the From-field of an e-mail message a worm sends often varies and/or isspoofed, it is difficult to identify the PC a mass-mailing worm has infected and a correctdata we can obtain from received e-mails is arrival intervals. To obtain arrival intervalsof e-mail messages with a worm attached, we use a data observed on a mail server inUniversity of Aizu. The data contains what kind of worm an e-mail message has andwhen the e-mail message arrives. When we can observe arrival of one or more e-mailmessages with a worm attached, there are one or more infected PCs having an e-mailaddress of a student of University of Aizu. (See Figure 2.1) We can observe arrivaltime of all e-mail messages with a known worm attached like Figure 2.1.

Virus Name discovered (Symantec) First arrival observed at u-aizu

Netsky.P Mar. 21, 2004 Mar. 23, 2004 - 01:37:20Swen.A Sep. 18, 2003 Sep. 18, 2003 - 23:07:31

Table 2.1: Information of some worms.

Figure 2.2 shows arrival intervals of e-mail messages with each worm attached. Weobserve them on the mail server in University of Aizu. Table 2.1 shows the date on

3

Page 14: Dynamics of Malicious Software on the Internet · a new virus or worm. The system also needs a large memory for signatures. Further, The system needs the time to detect the viruses

���������������� �

���������������� �

���������������� �

�����������������! #"�!

$%&'()

$

%

&

'(

)

*+�,��! #"��.-�����0/�213 � 3�4"�5���5��67�

1+������+8���91+�: ;�,�,

Figure 2.1: Observational image. Each worm infecting PC sends an e-mail messagewith its own copy attached to other addresses independently. We can observe only thee-mails reaching our mail server.

20 40 60 80hour

2000

4000

6000

8000

10000

12000

14000

16000

seconds

Netsky.P

Swen.A

Figure 2.2: Arrival intervals of e-mail with some specific worms attached observed onthe mail server in University of Aizu. Time 0 means the time on which the first e-mailattached each worms arrived at University of Aizu.

4

Page 15: Dynamics of Malicious Software on the Internet · a new virus or worm. The system also needs a large memory for signatures. Further, The system needs the time to detect the viruses

which each worm is discovered by Symantec corp. and first arrival time observed onthe mail server in University of Aizu.

5

Page 16: Dynamics of Malicious Software on the Internet · a new virus or worm. The system also needs a large memory for signatures. Further, The system needs the time to detect the viruses

Chapter 3

Simulation

3.1 Network Topology in Simulation

In order to simulate the dynamics underlying mass-mailing worm’s propagation, weneed to create an e-mail address network. There are some papers [1, 10, 11] discussedan e-mail network or social network, we assume that an e-mail network are a Scale-Free network. We use a BRITE tool [12] to create a Scale-free Network which we usein this simulation. Figure 3.1 is GUI of BRITE.

Figure 3.1: This is GUI of BRITE. We can change some parameters when we createsomething. For example, Node Placement, Growth Type, the number of nodes and soon.

6

Page 17: Dynamics of Malicious Software on the Internet · a new virus or worm. The system also needs a large memory for signatures. Further, The system needs the time to detect the viruses

Now, we suppose that an e-mail address network obeys a Scale-free Network, andwe employ Barabasi-Albert (BA) model [8] to create a Scale-free Network. Creatinga Scale-free Network with BRITE, we obtain that a network obeys a power-law withexponent -3. It is too hard to assess whether that exponent value is similar to a networka virus or worm use actually. However, being the number of link distributions ofcomputer and social networks lie in the range of -2.0 to -3.4 [13], the exponent value is-3 we assume. This Figure 3.2 is an example of a Scale-free Network with BA model.

Then, the Figures 3.3 show a relationship between a degree of links of each node and

Figure 3.2: Example of a Scale-free Network. The number of nodes is 30 and thenumber of links per newly added node is 2. There are a few hub nodes and the almostother nodes only have a few link. Some colored nodes are hub nodes.

the number of nodes. We can see that the relationship obeys power-law because thisdouble-logarithmic plot is similar to liner. The number of nodes exhibits a power-lawwith exponent -3.

7

Page 18: Dynamics of Malicious Software on the Internet · a new virus or worm. The system also needs a large memory for signatures. Further, The system needs the time to detect the viruses

100 200 300 400 500 600 700the num of links

200

400

600

800

1000

thenumofnodes

100 150 200 300 500 700

the num of links

1

5

10

50

100

500

1000

thenumofnodes

Figure 3.3: These figures show a relationship between a distribution of the number oflinks of each node and the number of nodes. The number of nodes is 10000 and thenumber of links per newly added node is 20.

8

Page 19: Dynamics of Malicious Software on the Internet · a new virus or worm. The system also needs a large memory for signatures. Further, The system needs the time to detect the viruses

3.2 Simulation Model of Worm Spread Dynamics

As discussed in Section 2.2, we assume that we can observe the arrival time. Thereis a community and each of PC has an e-mail address (See Figure 3.4). If two nodesshare address each other, the nodes are considered to have a link between them. Then,there is a student belonging to the community in University of Aizu. As the student be-longs to the community with the e-mail address of the university, all e-mail messagesexcluding e-mail messages with a worm attached to him is observed the mail server inthe university. We set the number of nodes in the community is 10000. We assume

���������������

������ �

��� ��� ��� ������� � �����������! �"���������!��"��� � ����#!� �$������%��&!����'��(&)�!*+&�#�#-,.� ��� ��& �!� �� � �/,�0��� ��� � &!����'��(&)�!*+&�#�#-,.� ��� 0

1!����!�/, � ������! 324�65!�

7!&)�!*98!�/,:�!�/,

Figure 3.4: Image of a community sending e-mail messages with a worm attached.

that the connectivity between the nodes in the community obeys power-law with BAmodel. We also need to introducing at least three statuses at each node, that we callsusceptible, infected and immune. Susceptible means that the node has a probabilitywith which the node is potentially infected by a virus or worm. Susceptible containswhen a user has not yet checked for new e-mails. Infected means that the user hasopened the attachment and the virus has successfully infected the PC sending copiesof itself to all of the neighboring nodes. In the simulation, we assume that a node, onceinfected, remains in this status forever. Immune means that the virus cannot infect thenode. Immune contains when the user protect own PC with using some ways such asan anti-virus software, a patch, or no open the attachment. Once immunized, the noderemains in this status forever. A node a worm infects at once sends some e-mail mes-sages attached its own copy to other e-mail addresses per unit of time. We assume thateach interval of e-mail messages which a worm sends follow an independent exponen-tial distribution with mean ;-< �

. We assume that a worm select a link in an infectednode randomly. After receiving the infected mail, the nodes which are liable to infectbe infected with a click probability � . This parameter � is a probability with which arecipient receiving an e-mail messages with a worm attached executes the worm pro-gram. The probability � has just an influence on a speed of propagation. If a worm

9

Page 20: Dynamics of Malicious Software on the Internet · a new virus or worm. The system also needs a large memory for signatures. Further, The system needs the time to detect the viruses

infect all neighbors of an observer, � have no influence on any data of the observationnode. In this simulation, we assume that all nodes have a same � . We suppose thata susceptible node infected at once becomes an infected node and the node holds theown state ever. Then the node attacks the other susceptible nodes that connect the nodesuch as the other infected nodes. Because we may assume that an interval at which auser execute mailer program have large enough time than the transmission delay. Thus,we suppose that the transmission delay is negligible in this simulation.

3.3 Fitting Model to Observation

To fit our spread model of a worm to observed arrival intervals of real worms, wepay attention to the part where arrival intervals stop decreasing. It means that most ofneighbors of an observer are infected by that time. In our simulation correspondingly,most of neighbors of an observer have to be infected by the time. And we also haveto consider decrease of arrival intervals from first arrival of real worm. Because aparameter

�is an estimated value empirically, we cannot change

�. Thus, we need

to change � and the number of the e-mail addresses a node has of other community.Let � is the number of links per newly added node. Because all nodes have manye-mail addresses, it is wrong that � is lower value. Though we estimate the numberof the e-mail addresses more, we cannot obtain fitting model. On the other hands, asthere is a restriction least upper bound of the number of nodes, we cannot estimatethat � is higher value because a higher value � makes our network topology a meshtopology. From results of many simulations, we decide that’s parameters. Assumingthat the number of the e-mail each node has is ten times the number of links connectedto the community, that � is 20 and that � is 0.01, we obtain Figure 3.5. Figure 3.5shows the result of simulation compared with observed data of Netsky.P. We can seethat by adjusting our model we can make both the simulated dynamics and the real oneto be similar. About 10 hours, arrival intervals of observed data become larger thansimulation. This is because some infected neighbors stop to attack in the observation.

3.4 Local Network Structure Inference

We can estimate a network structure around an observer from observed data.Figure 3.6, Figure 3.7 and Figure 3.8 are results of Simulation. Each node has some

neighbors and the number of neighbors is similar.In Figure 3.6, we can see that there are some infected neighbors in early time thanothers. We can estimate that is occurred by some infected hub nodes connected to theobserver. On the other hand, we can see that the arrival intervals decrease slowly thanothers. The reason why that is that the number of infected neighbors increase moreslowly than others. If there are some infected neighbors when first attack arrives, de-crease of arrival intervals is more slowly.

Comparing Figure 3.7 with Figure 3.8, we can see that there is a difference of out-break time in same community.

Figure 3.7 shows arrival intervals of a node of which the number of neighbors are in-fected at short times is large. On the other hand, a case of a node of which the number

10

Page 21: Dynamics of Malicious Software on the Internet · a new virus or worm. The system also needs a large memory for signatures. Further, The system needs the time to detect the viruses

10 20 30 40 50 60timeHhourL

1000

2000

3000

4000

5000

6000

7000

8000

ArrivalIntervals

HsecL

Simulation

Netsky.P

Figure 3.5: Result of simulation compared with observed data of Netsky.P. Time 0means the time on which a first attack arrives at each observing point.

0 20 40 60 80 100 120 140timeHhourL

1000

2000

3000

4000

5000

6000

7000

8000

ArrivalInterval

HsecL

2.5

5.

7.5

10.

12.5

15.

17.5

20.

thenumofInfectedNeighbors

Figure 3.6: Figure of arrival intervals. An infected node sends about five e-mail mes-sages per a second,

�is 5. The number of neighbors of this observation node is 20.

The mean of links the neighbors have is 1156.19. The figure set time 0 as the start ofsimulation.

11

Page 22: Dynamics of Malicious Software on the Internet · a new virus or worm. The system also needs a large memory for signatures. Further, The system needs the time to detect the viruses

0 20 40 60 80 100 120 140timeHhourL

1000

2000

3000

4000

5000

6000

7000

8000

ArrivalInterval

HsecL

2.5

5.

7.5

10.

12.5

15.

17.5

20.

thenumofInfectedNeighbors

Figure 3.7: Figure of arrival intervals. An infected node send five e-mail messages pera second,

�is 5. The number of neighbors of this observation node is 20. The mean of

links the neighbors have is 619.5. The figure set time 0 as the start of simulation.

0 20 40 60 80 100 120 140timeHhourL

1000

2000

3000

4000

5000

6000

7000

8000

ArrivalInterval

HsecL

2.5

5.

7.5

10.

12.5

15.

17.5

20.

thenumofInfectedNeighbors

Figure 3.8: Figure of arrival intervals. An infected node send five e-mail messages pera second,

�is 5. The number of neighbors of this observation node is 20. The mean of

links the neighbors have is 645.5. The figure set time 0 as the start of simulation.

12

Page 23: Dynamics of Malicious Software on the Internet · a new virus or worm. The system also needs a large memory for signatures. Further, The system needs the time to detect the viruses

of neighbors are infected increase slowly is Figure 3.8.

20 40 60 80 100 120 140timeHhourL

2000

4000

6000

8000

10000

thenumof

infection

nodes

Figure 3.9: Figure of the number of infected nodes. The click probability � is 0.01.After 40 hours this simulation started, outbreak is occurred.

From Figure 3.9, we can see an outbreak time approximately. We can see that the in-fected neighbors of each observation node increase after this time from Figure 3.6, 3.7and 3.8. Thus, we can see that a virus may also occur an outbreak in the Internet whenwe observe a decrease of arrival intervals. Therefore, we have to do a countermeasureagainst a virus or worm before observed arrival intervals start to decrease.

Figure 3.10 shows a relationship between the number of all infected nodes and time.We can see that the graph is similar to logistic curve [5] and that a part when a virusoccur outbreak. Moreover, we can see that a change of the parameter � makes theoutbreak time earlier or later.

13

Page 24: Dynamics of Malicious Software on the Internet · a new virus or worm. The system also needs a large memory for signatures. Further, The system needs the time to detect the viruses

20 40 60 80 100 120 140timeHhourL

2000

4000

6000

8000

10000

the

num

of

infected

nodes

c=0.1 c=0.01

Figure 3.10: The graph shows the number of infected nodes with time. The � is 0.1and 0.01. All nodes except an initial infected node is liable to infect and the number oftotal nodes is 10000.

14

Page 25: Dynamics of Malicious Software on the Internet · a new virus or worm. The system also needs a large memory for signatures. Further, The system needs the time to detect the viruses

Chapter 4

Mathematical Model of Outbreak

Now we will consider a mathematical model [14] of worm dynamics on scale-freenetwork. We will evaluate the total number of nodes a worm infects. Two nodes areconsidered to have a link when they share their email addresses. Then, we suppose thata computer virus or a worm propagates in the network the links of e-mail addressesgenerate. We assume that the number of links of each node is independently andidentically distributed. And let the number of one of any nodes’ links is

�, and let the

number of other links of a node connects a link chosen arbitrarily is���

. Because theprobability that those nodes with many links are chosen is high, the probability densityof���

is as follows. (using the probability density of�

):���������� � ����������� ������� � (4.1)

Especially, the expectation���������

is as follows:

���������� �� � "!�������#$�% �&�'��� �

(4.2)

�����)(*��������+� (4.3)

In a long-tailed distribution such as a power-law distribution, a relation between�&�'�����

and�������

is�&�'�����-,$�&�'���

. As a tail become more longer,�&�'�&���

also become morelarger. In case that a worm infects a node, we propose that other nodes will be infectedwith probability . when the node is connected to the first infected node. And then,we evaluate the total number of infected nodes when the propagation occurs a set ofrandom nodes, and let the size of this initial set be / . This problem is called thepercolation problem, and it is a well-known problem in physics. Now, we propose thatwe random choice a link in the network and assume that the propagation occurs fromthe link. Let the total outbreak size in that case is / � . We assume that the networksize is enough large and there is no infection loop. Then, we can show a relationshipbetween / and / � as follows:

/ ;10 2�3 54 / ��6 387 (4.4)

15

Page 26: Dynamics of Malicious Software on the Internet · a new virus or worm. The system also needs a large memory for signatures. Further, The system needs the time to detect the viruses

where � is the number of infection links which the first chosen node has. / and / �are independently and identically distributed. On the other hand, when the number ofinfection links of a node which is connected to the first chosen link is � � , we obtain arecursive equation as follows:

/ ;10 2 ��� 4�3 54 / ��6 3 � (4.5)

Now, both � and � � are integers which is greater than zero. The expectation of/ ��6 3 ,�&� / ��� is less than infinity because the number of links a node has is finite. Then

the expectation of � and � � , �&� � � and�&� � ��� are both less than infinity because the

number of links is also not infinity. As / ��6 4 7 / ��6 ( 7������ 7 / ��6 2 be a sequence of � iidrandom variables distributed as random variable / � , we obtain:

����� 54 � �� ������ ����� � (4.6)

The equation (4.6) is Wald’s equation [15]. Evaluating the expectation of the equation(4.4) and (4.5) with using the equation (4.6), we obtain:��� / �� ; 0 ��� � � ��� / � � 7 (4.7)��� / ���� ; 0 � �&� � ����� ; ��� / � � � (4.8)

It follows that: �&� / �� ; 0 ��� � �� � �&� � � � (4.9)

; 0 . �������� � . ��������� � (4.10)

If we estimate . �&�'���� � . ��������� 7 (4.11)

we can obtain the outbreak size E[S]. On the other hand, from (4.10), we can seethat the outbreak size

�&� / � diverge when�&� � ��� . ��������� ������ �������� � � �

. Thus weneed to control

�&� � ��� to prevent the outbreak. Making��� � � � decrease, we need to

decrease the variance of � or we increase the mean of � . Since it is impossible toincrease

�&� � � , we have to decrease the variance of � when we consider an effectivecountermeasure.

16

Page 27: Dynamics of Malicious Software on the Internet · a new virus or worm. The system also needs a large memory for signatures. Further, The system needs the time to detect the viruses

Chapter 5

Hub Defense Strategy

We consider that an effective method with the node degree distribution against a viruspropagation. A Scale-free topology is known more commonly as a topology have asusceptibility to attacked to some hub nodes. In addition to that, we have learned anapproach with mathematic in chapter 4. In this chapter, we simulate propagation ofa worm in follow condition. Now, we consider a community having 10000 e-mailaddresses. Let � is the number of immune hub nodes. We assume that an e-mailaddress corresponds with a node in the community and that each e-mail address hassome links connecting other addresses in the community and that the degree of thelinks obeys power-law with BA model. Then, the number of links per newly addednode is 20 and each node has 9 times links its own has. We assume that each infectednode sends 5 e-mails per a second to own neighbors. Figure 5.1 shows that an effectof when we set the number of immune hub nodes changing. In early time of this, wecan see that this defense method is effective even though the number of immune hubnodes is small.

17

Page 28: Dynamics of Malicious Software on the Internet · a new virus or worm. The system also needs a large memory for signatures. Further, The system needs the time to detect the viruses

25 50 75 100 125 150timeHhourL

2000

4000

6000

8000

10000

thenumofinfectednodes

h=0

h=500

h=1000

h=3000

h=5000

5 10 15 20timeHhourL

5

10

15

20

25

thenumofinfectednodes h=0

h=500

h=1000

h=3000

h=5000

Figure 5.1: Effect of changing the number of immune hub nodes with parameter� =0.01. Each h is 0, 500, 1000, 3000 and 5000.

On the other hand, we simulate a case of that we make some nodes selected ran-domly immune. Let � is the number of the immune nodes. Figure 5.2 shows compari-son between a case of that we make some nodes selected randomly immune and somecase of that we make hub nodes immune.

18

Page 29: Dynamics of Malicious Software on the Internet · a new virus or worm. The system also needs a large memory for signatures. Further, The system needs the time to detect the viruses

25 50 75 100 125 150timeHhourL

2000

4000

6000

8000

10000

thenumofinfectednodes h=0

h=500

h=1000

h=3000

r=3000

25 50 75 100 125 150timeHhourL

250

500

750

1000

1250

1500

1750

2000

thenumofinfectednodes

h=0

h=500

h=1000

h=3000

r=3000

Figure 5.2: Comparison between a case of that we make some nodes selected randomlyimmune and some case of that we make hub nodes immune. A parameter � is 3000.

From Figure 5.2, we can see that we should protect hub nodes if we protect thesame number of nodes.

19

Page 30: Dynamics of Malicious Software on the Internet · a new virus or worm. The system also needs a large memory for signatures. Further, The system needs the time to detect the viruses

Chapter 6

Conclusion

We derive a way to build a model of worm spread dynamics using Scale-free Network.From result of simulation, we found that arrival intervals also decrease slowly if thenumber of infected neighbors of an observer is increase slowly, and the rate of thisdecrease is depend on the link structure of the neighbors.

On the other hand, we confirm an effect of propagation of a worm by the number ofimmune hub nodes. Especially in early time, we can obtain enough advantage eventhough the number of immune hub nodes is small. However unfortunately, we cannotsee a great effect as time goes by. However, the e-mail network in the Internet may bedifferent from this paper. Therefore, we need to research how many links the e-mailaddress in the Internet has or we need to consider a method estimating that.

20

Page 31: Dynamics of Malicious Software on the Internet · a new virus or worm. The system also needs a large memory for signatures. Further, The system needs the time to detect the viruses

References

[1] Holger Ebel, Lutz-Ingo Mielsch, and Stefan Bornholdt, “Scale-free topology ofe-mail networks,” Feb 2002.

[2] Symantec, “W32.blaster.worm,http://securityresponse.symantec.com/avcenter/venc/data/w32.blaster.worm.html,”Tech. Rep.

[3] Symantec, “W32.netsky.p@mm,http://securityresponse.symantec.com/avcenter/venc/data/[email protected],”.

[4] Symantec, “W32.swen.a@mm,http://securityresponse.symantec.com/avcenter/venc/data/[email protected],”.

[5] Hiroshi Toyoizumi and Atsushi Kara, “Predators: Good will mobile codes com-bat against computer viruses,” In Proceedings of the New Security ParadigmsWorkshop, pp. 13–21, September 2002.

[6] Hiroshi Toyoizumi, “Performance evaluation of defense strategies against com-puter virus,” Proceedings Of The Queueing Symposium Stochastic Models AndTheir Applications, pp. 267–274, January 2004.

[7] Carey Nachenberg, “Computer virus-antivirus coevolution.,” Commun. ACM,vol. 40, no. 1, pp. 46–51, 1997.

[8] Albert-Lazzio Barabasi and Reka Albert, “Emergence of scaling in random net-works,” Science, vol. 286, no. 5439, pp. 509–512, October 1991.

[9] A.-L. Barabasi, Linked, Plume Books, 2003.

[10] D. J. Watts and S. H. Strogatz, “Collective dynamics of ’small-world’ networks.,”Nature, vol. 393, no. 6684, pp. 440–442, June 1998.

[11] Gueorgi Kossinets and Duncan J. Watts, “Empirical analysis of an evolving socialnetwork,” Science, vol. 311, no. 5757, pp. 88–90, January 2006.

[12] Alberto Medina, Anukool Lakhina, Ibrahim Matta, and John Byers, “BRITE:Universal topology generation from a user’s perspective,” Tech. Rep. 2001-003,January 2001.

21

Page 32: Dynamics of Malicious Software on the Internet · a new virus or worm. The system also needs a large memory for signatures. Further, The system needs the time to detect the viruses

[13] R.Albert and A.-L. Barabasi, “Statistical mechanics of complex networks.,” Rev.Mod. Phys. 74, pp. 47–97, 2002.

[14] Hiroshi Toyoizumi and Tatehiro Kaiwa, “Observation and modeling method ofdynamics of computer virus spread,” QoS Workshop “Reality of QoS in the In-ternet and Latest Measuring or Estimating Technologies.”, November 2005.

[15] Sheldom M. Ross, Applied Probability Model With Optimization Applications,Dover Pubns, 1992.

22