Modeling Malware Spreading Dynamics
Michele Garetto (Politecnico di Torino – Italy)
Weibo Gong (University of Massachusetts – Amherst – MA)
Don Towsley (University of Massachusetts – Amherst – MA)
INFOCOM 2003
2
Outline
- Motivation
- Modeling framework: Interactive Markov Chains
- Analysis and simulation of malware propagation dynamics
  - Percolation problem
  - Transient behavior
- Conclusions
3
Motivation
- The Internet is an easy and powerful mechanism for propagating malicious software programs: the “malware” (email viruses, worms, …)
- Future malware activity is expected to be more prevalent and virulent, resulting in significantly greater damage and economic losses
4
Motivation
- The dynamics of malware propagation are still not well understood
- We would like to:
  - Predict the temporal evolution of an infection process that starts propagating on a network
  - Design and evaluate effective countermeasures
  - Assess the defensibility and vulnerability of different network architectures
- We need to develop mathematical methodologies that are able to capture the spreading characteristics of malware
5
Our contribution
- A flexible modeling framework based on Interactive Markov Chains, able to capture the probabilistic nature of malware propagation
- Application of this framework to the case of email viruses:
  - Identification of a “percolation problem”
  - Investigation of the impact of the underlying network topology
- Analytical bounds and approximations validated through extensive simulations
6
The “Interactive Markov Chain” (IMC) Modeling Framework
- Global structure (the network): each node is represented by a Markov chain whose state transitions are influenced by the status of its neighbors
- Local structure (can vary from node to node): locally, each node is a Markov chain
[Figure: global network structure with example local chains; state labels: normal, alert, failed; secure, insecure, emergency]
7
Computational complexity issue
- The whole system evolves according to a global Markov chain G, whose state space dimension (#G) is equal to the product of the local chain dimensions (#L): $\#G = (\#L)^N$ for N nodes with identical local chains
- The solution of the global Markov chain is feasible only for small systems
- Example: 20 nodes with binary status (0 = not infected, 1 = infected) give $2^{20}$ states!
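To make the growth of the global state space concrete, here is a minimal sketch that evaluates $(\#L)^N$ for a few system sizes. The function name `global_state_space_size` is hypothetical and the sketch assumes every node has the same number of local states.

```python
def global_state_space_size(local_states: int, nodes: int) -> int:
    """Number of states of the global chain, assuming identical local chains."""
    return local_states ** nodes


# Binary status (not infected / infected) for growing network sizes.
for n in (5, 10, 20, 40):
    print(n, global_state_space_size(2, n))
```

Already at 20 binary nodes the global chain has more than a million states, which is why the exact solution is limited to small systems.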
8
How can we study very large systems (thousands of nodes)?
- Discrete event simulations of the model
  - How many runs? How long?
  - Could be computationally too expensive
  - Do not help to understand the system dynamics
- Analytical bounds and approximations
  - Quick prediction of the system behavior
  - Gross-level approximations can be sufficient
  - Provide insights into the inner dynamics
9
E-mail virus propagation
- The virus propagates as an attachment to e-mail messages
- Requires human assistance:
  - a random time elapses before the recipient reads the message
  - the “click” probability
- The virus makes use of the recipient’s address book to send copies of itself
10
IMC model: global structure
- We consider the network graph induced by email address books
  - Each node stands for an email address
  - Edges represent social or business relationships
- The resulting graph is expected to have “small world” properties:
  - small characteristic path length
  - high clustering coefficient
11
IMC model: local structure
- 3 statuses for each node:
  - S (Susceptible): the node can be infected by the virus
  - I (Infected): the node has been infected by the virus
  - M (Immune): the node can no longer be infected by the virus
- Discrete time model, with state variables:
  - probability that node “j” is susceptible at time k
  - probability that node “j” is infected at time k
  - probability that node “j” is immune at time k
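The three-state discrete-time model can be explored with a small Monte Carlo sketch. The dynamics below are an assumption made for illustration, not the exact model of the talk: a susceptible node with at least one infected neighbor reads the message in a given step with probability `read_prob`, then becomes infected with probability `click_prob` and immune otherwise, and infected or immune nodes never change state. The names `ring_lattice`, `simulate`, `read_prob`, and `click_prob` are hypothetical.

```python
import random


def ring_lattice(n, k):
    """Ring of n nodes, each linked to its k nearest neighbors on each side (k < n / 2)."""
    return {v: {(v + d) % n for d in range(-k, k + 1) if d != 0} for v in range(n)}


def simulate(adj, seed_node, click_prob, read_prob, steps, rng):
    """Return the number of infected nodes after each synchronous time step.

    Assumed dynamics (a simplification for illustration):
    - S -> I with probability click_prob once the message is read,
    - S -> M otherwise,
    - I and M are absorbing.
    """
    state = {v: "S" for v in adj}
    state[seed_node] = "I"
    history = []
    for _ in range(steps):
        new_state = dict(state)
        for v in adj:
            if state[v] != "S":
                continue
            exposed = any(state[u] == "I" for u in adj[v])
            if exposed and rng.random() < read_prob:
                new_state[v] = "I" if rng.random() < click_prob else "M"
        state = new_state
        history.append(sum(1 for s in state.values() if s == "I"))
    return history


adj = ring_lattice(101, 1)
print(simulate(adj, 0, click_prob=0.7, read_prob=0.5, steps=20, rng=random.Random(0)))
```

Averaging `history` over many runs gives an estimate of the mean number of infected nodes as a function of time, at the cost of the long simulation runs mentioned earlier.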
12
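IMC model
[Figure: local Markov chain of node j with states S, I, M; labels: influence w_ij from neighbor i to node j, transition probabilities c_j and 1 - c_j]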
13
Virus propagation model
The numerical solution of the system requires knowledge of the joint probabilities of neighboring nodes, which are unknown.
14
Virus propagation model
Fundamental questions about malware propagation dynamics:
- What is the final size of the infection outbreak? How many nodes (on average) will be reached by the virus at the end?
- How fast is the malware propagation? What is the (average) number of infected nodes as a function of time?
15
The “small-world” model of Watts and Strogatz
A ring lattice with additional random shortcuts
Parameters:
- N = number of nodes
- S = number of shortcuts (sets the shortcut density)
- k = lattice connectivity (number of neighbors on each side of a node)
Example: N = 24, S = 4, k = 3
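The construction above can be sketched in a few lines. This is a minimal illustration assuming shortcuts are added on top of the ring lattice (no rewiring) between uniformly chosen pairs of nodes that are not already linked; the function name `small_world` is hypothetical.

```python
import random


def small_world(n, k, shortcuts, rng):
    """Ring lattice (k neighbors on each side) plus random shortcut edges.

    Assumptions: 2 * k < n, shortcuts are added without rewiring, and a
    shortcut never duplicates an existing edge or forms a self-loop.
    """
    adj = {v: set() for v in range(n)}
    for v in range(n):
        for d in range(1, k + 1):
            u = (v + d) % n
            adj[v].add(u)
            adj[u].add(v)
    added = 0
    while added < shortcuts:
        a, b = rng.randrange(n), rng.randrange(n)
        if a != b and b not in adj[a]:
            adj[a].add(b)
            adj[b].add(a)
            added += 1
    return adj


# Parameters of the example: N = 24, S = 4, k = 3.
graph = small_world(24, 3, 4, random.Random(1))
edges = sum(len(neighbors) for neighbors in graph.values()) // 2
print(edges)  # 24 * 3 lattice edges + 4 shortcuts = 76
```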
16
What is the final size of the infection outbreak ?
Not all of the susceptible nodes necessarily receive a copy of the virus !
site percolation problem
(node occupation probability = click probability)
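The mapping to site percolation can be illustrated with a small Monte Carlo sketch: every node is marked as "occupied" (its user would click) with probability equal to the click probability, and the outbreak is the set of occupied nodes connected to the initially infected one. The sketch assumes the seed node is always infected and uses a plain ring lattice as a stand-in topology; `ring_lattice`, `outbreak_size`, and `mean_outbreak_size` are hypothetical names.

```python
import random
from collections import deque


def ring_lattice(n, k):
    """Ring of n nodes, each linked to its k nearest neighbors on each side (k < n / 2)."""
    return {v: {(v + d) % n for d in range(-k, k + 1) if d != 0} for v in range(n)}


def outbreak_size(adj, seed_node, click_prob, rng):
    """Final number of infected nodes for one random realization."""
    # Decide in advance which nodes would click on the attachment.
    occupied = {v: rng.random() < click_prob for v in adj}
    occupied[seed_node] = True  # assumption: the seed is infected by definition
    seen = {seed_node}
    queue = deque([seed_node])
    while queue:
        v = queue.popleft()
        for u in adj[v]:
            if u not in seen and occupied[u]:
                seen.add(u)
                queue.append(u)
    return len(seen)


def mean_outbreak_size(adj, click_prob, runs, rng):
    """Average outbreak size over random seeds and random occupation patterns."""
    nodes = list(adj)
    total = 0
    for _ in range(runs):
        total += outbreak_size(adj, rng.choice(nodes), click_prob, rng)
    return total / runs


lattice = ring_lattice(1000, 3)
for c in (0.2, 0.5, 0.8):
    print(c, mean_outbreak_size(lattice, c, 200, random.Random(0)))
```

Nodes that would not click block the paths through them, which is why some susceptible nodes never receive a copy of the virus.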
17
Site percolation problem on the small world graph: exact asymptotic result [Moore, Newman 2000]
[Figure: average number of infected sites (log scale, 0.01 to 10000) versus click probability c (0 to 0.8); one curve for each shortcut density: 0.1, 0.01, 0.001, 0.0001]
18
Transient analysis of an infection process: bounds
Probability that a node has been reached by the virus:
- Lower bound: property of joint probabilities
- Upper bound: theory of Associated Random Variables
19
Analytical bounds
Infinite one-dimensional lattice (click probability = 1)
[Figure: average number of infected nodes (0 to 8000) versus time (0 to 2500); curves: simulation, upper bound, lower bound; for k = 1, k = 10, k = 100]
20
Transient analysis of an infection process: approximation
Linear mixing of lower bound and upper bound
- k = local connectivity of the nodes
- s = self-influence probability
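As a rough illustration of linear mixing, the sketch below forms a convex combination of two bound curves sampled at the same time points. Which bound the mixing coefficient multiplies is not recoverable from the transcript, so weighting the upper bound by `m` and the lower bound by `1 - m` is an assumption; the function name `mix_bounds` and the sample values are hypothetical.

```python
def mix_bounds(lower, upper, m):
    """Convex combination of two curves sampled at the same time points.

    Assumption: m weights the upper bound and (1 - m) the lower bound.
    """
    if not 0.0 <= m <= 1.0:
        raise ValueError("mixing coefficient must lie in [0, 1]")
    if len(lower) != len(upper):
        raise ValueError("bounds must have the same length")
    return [m * u + (1.0 - m) * l for l, u in zip(lower, upper)]


# Made-up bound values, for illustration only.
lower_bound = [0.0, 0.1, 0.2, 0.4]
upper_bound = [0.0, 0.3, 0.6, 0.9]
print(mix_bounds(lower_bound, upper_bound, 0.25))
```

By construction the mixed curve always lies between the two bounds, so only the single coefficient has to be fitted against simulation.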
21
Fitting of mixing coefficient M(k,s)
Infinite one-dimensional lattice
[Figure: mixing coefficient M (0 to 0.7) versus connectivity k (1 to 1000); curves for s = 0, s = 1/3, s = 2/3, s = 0.9]
22
Approximate analysis on the small world graph: the impact of topology
(2000 nodes, click probability = 1)
[Figure: average number of infected nodes (0 to 2000) versus time (0 to 1000); model approximation and simulation; curves: k = 10, no shortcuts; k = 10, 2 shortcuts; k ~ geom(10), no shortcuts; k = 10, 20 shortcuts; fully-connected graph]
23
Combining transient analysis and percolation on general topologies
Probability not to be reached by the virus = initial immunization
overestimate of the spreading rate of the virus
Upper bound of the reaching probability on general topologies:
Global upper bound for the infection process
24
Transient analysis and percolation on power-law random graphs
10000 nodes, GLP algorithm (Bu 2002): power-law node degree, small-world properties (click probability = 0.5)
One initially infected node with degree 10
[Figure: average number of infected nodes (0 to 6000) versus time (0 to 4000); curves: simulation, model approximation + percolation bound; for initial connectivity m = 1, m = 2, m = 4]
25
Conclusions
- We have proposed an analytical framework to study the dynamics of malware propagation on a network
- We have obtained useful bounds and approximations to study an infection process on a general topology
- The approach is suitable for analyzing a wide range of “dynamic interactions on networks” (routing protocols, p2p, …)
26
The End
Thanks…
27
Site percolation on a given small-world graph
[Figure: reaching probability (0.1 to 1) versus node index (0 to 999); curves: simulation, upper bound (h = 0), upper bound (h = 8), approximation, lower bound (h = 8), lower bound (h = 0)]