Transcript of Chapter 1: Foundation
Source: my.fit.edu/~gmarin/CSE5636/CharacterizeActivitySection6.pdf
Characterizing Activity 6-1
Characterizing Activity
Dr. G. A. Marin
Characterizing Activity 6-2
Activity Profiling
Collecting statistics that summarize the kinds of activities that occur regularly on the network.
- Get a description that can be used to identify deviations from normal behavior.
- Develop profiles of machine behavior.
- Group machines into activity clusters.
Characterizing Activity 6-3
Characterize Machine Activity
- Count the number of SYNs to each port (by machine): services sought.
- Count SYN/ACKs from each port: services provided.
- Count SYNs by port (to destination machines): requests for service.
- Keep a list of all IP addresses interacting with each machine in our network? External machines only? Others?
Characterizing Activity 6-4
Activity Profile
An activity profile for a machine is a vector of counts or probabilities. Each count is associated with a specific activity.
E.g., TCP SYN packets sent to a specific port. Note that counts are generally time sensitive, so vectors should be collected by hour of day. Thus, consider the activity vector to be a vector of counts or probabilities relative to a given time period on a given day of the week.
Characterizing Activity 6-5
Cluster Analysis
Developed largely in biological and physical sciences to classify items or individuals into groups. “Clusters” are those with similar characteristics that seem to belong to a single group. Cluster analysis is the commonly used term for using procedures to identify groups in data.
Characterizing Activity 6-6
Our Goal
- Divide machines into clusters based on their activity vectors (either network side or system side).
- Characterize the activity of machines that are similar (belong to one cluster).
- Compare new data (or process behavior) with behavior in existing clusters.
- Alarm if deviation exceeds a threshold.
Characterizing Activity 6-7
What is a Cluster?
Characterizing Activity 6-8
Multivariate Data Matrix (n x p)

    x_11  x_12  ...  x_1p
    x_21  x_22  ...  x_2p
    ...   ...   ...  ...
    x_n1  x_n2  ...  x_np

Each row represents the elements of a particular machine, such as the number of SYNs received by port and the number of SYN/ACKs sent by port.
Characterizing Activity 6-9
Create Proximity Matrix
Consider scaling the vector (or row) elements, and perhaps weighting them if some are deemed more important than others.
- Divide by the range so that each element is between zero and one.
- Divide by the standard deviation if greater variance implies less significance.
For discrete or continuous numerical values, compute the distance between each pair of vectors using, for example, Euclidean distance. This results in an n x n proximity matrix.
Characterizing Activity 6-10
Create the proximity matrix:

    d_11  d_12  ...  d_1n
    d_21  d_22  ...  d_2n
    ...   ...   ...  ...
    d_n1  d_n2  ...  d_nn

where each element d_ij represents the distance between the activity vector for process i and the activity vector for process j. One might use, for example, the usual Euclidean distance:

    d_ij = sqrt( (x_i1 - x_j1)^2 + (x_i2 - x_j2)^2 + ... + (x_ip - x_jp)^2 )
Characterizing Activity 6-11
Example: SYNs to 4 particular ports and SYN/ACKs from the same ports.

    M1:  514  127  934  729   84    0  205  403
    M2:  648   47  864    0   31   92  610    0
    M3:  950   54  988  721   49   52  584  693
    M4:  283  102    0    0   53   34    0  576
    M5:    3  119  764  492   88   35  665  225
Characterizing Activity 6-12
Proximity Matrix Using Euclidean Distance

          M1    M2    M3    M4    M5
    M1     0   948   656  1289   626
    M2   948     0  1053  1122   398
    M3   656  1053     0   608   614
    M4  1289  1122   608     0  1244
    M5   626   398   614  1244     0
Characterizing Activity 6-13
Ordered Similarity List
1. M2 and M5: 398
2. M3 and M4: 608
3. M3 and M5: 614
4. M1 and M5: 626
5. M1 and M3: 656
6. M1 and M2: 948
7. M2 and M3: 1053
8. M2 and M4: 1122
9. M4 and M5: 1244
10. M1 and M4: 1289
What are the clusters?
We’ll return to this. Next, we look at classifying machines by system-call activity.
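One possible way to answer "What are the clusters?" is to run a single-linkage (nearest-neighbor) agglomeration directly on the proximity matrix above. This is an illustrative sketch only, not the official answer the lecture defers; the matrix values are copied from this slide.

```python
# Single-linkage hierarchical clustering applied to the 5x5 proximity
# matrix of machines M1..M5 (slide 6-12).
D = [
    [0,    948,  656, 1289,  626],
    [948,    0, 1053, 1122,  398],
    [656, 1053,    0,  608,  614],
    [1289, 1122,  608,   0, 1244],
    [626,  398,  614, 1244,    0],
]

def single_linkage(dist):
    """Repeatedly merge the two closest clusters; cluster distance is the
    minimum pairwise distance between their members (nearest neighbor)."""
    clusters = [frozenset([i]) for i in range(len(dist))]
    merges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = min(dist[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merged = clusters[a] | clusters[b]
        merges.append((d, sorted(x + 1 for x in merged)))  # 1-based labels
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return merges

for d, members in single_linkage(D):
    print(f"merge at distance {d}: {members}")
```

The first two merges are (M2, M5) at 398 and (M3, M4) at 608, matching the top of the ordered similarity list; where to cut the resulting tree is the judgment call the later slides return to.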
Characterizing Activity 6-14
Benign System Call Traces
[Bar chart: system-call frequency (0 to 25,000) for benign traces, one bar per call, from accept, CloseHandle, closesocket, CloseWindowStation, ... through WSACleanup, WSARecv, WSAStartup. Y-axis: System Call Frequency.]
Characterizing Activity 6-15
Viral System Call Traces
[Bar chart: system-call frequency (0 to 600) for viral traces, over the same set of system calls as the benign chart. Y-axis: System Call Frequency.]
Characterizing Activity 6-16
Process Activity Matrix
    x_11  x_12  ...  x_1r
    x_21  x_22  ...  x_2r
    ...   ...   ...  ...
    x_n1  x_n2  ...  x_nr

where each element of the matrix is x_ij, with i representing the ith process (i >= 1) and j representing the jth system call, numbered in any convenient order.
Characterizing Activity 6-17
AGAIN... Create the proximity matrix:

    d_11  d_12  ...  d_1n
    d_21  d_22  ...  d_2n
    ...   ...   ...  ...
    d_n1  d_n2  ...  d_nn

where each element d_ij represents the distance between the activity vector for process i and the activity vector for process j. One might use, for example, the usual Euclidean distance:

    d_ij = sqrt( (x_i1 - x_j1)^2 + (x_i2 - x_j2)^2 + ... + (x_ir - x_jr)^2 )
Characterizing Activity 6-18
Measures of Distance (numerical data)
    Euclidean distance:   d_ij = ( Sum_{k=1..r} (x_ik - x_jk)^2 )^(1/2)

    City block distance:  d_ij = Sum_{k=1..r} |x_ik - x_jk|

    Minkowski distance:   d_ij = ( Sum_{k=1..r} |x_ik - x_jk|^m )^(1/m),  m >= 1

    Angular distance:     d_ij = (1 - phi_ij) / 2,  with
                          phi_ij = Sum_{k=1..r} x_ik x_jk / ( Sum_k x_ik^2 * Sum_k x_jk^2 )^(1/2)
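The four measures above can be written out directly. A minimal sketch for two count vectors of equal length; the example vectors are the first two process rows used later in this chapter.

```python
from math import sqrt

# The four distance measures from this slide, for count vectors of length r.
def euclidean(xi, xj):
    return sqrt(sum((a - b) ** 2 for a, b in zip(xi, xj)))

def city_block(xi, xj):
    return sum(abs(a - b) for a, b in zip(xi, xj))

def minkowski(xi, xj, m):
    assert m >= 1
    return sum(abs(a - b) ** m for a, b in zip(xi, xj)) ** (1 / m)

def angular(xi, xj):
    # (1 - cosine similarity) / 2; lies in [0, 1] for non-negative counts
    phi = sum(a * b for a, b in zip(xi, xj)) / sqrt(
        sum(a * a for a in xi) * sum(b * b for b in xj))
    return (1 - phi) / 2

x1, x2 = [179, 11, 6, 226], [160, 163, 70, 67]   # two rows from slide 6-19
print(round(euclidean(x1, x2)))                  # 230
print(abs(minkowski(x1, x2, 2) - euclidean(x1, x2)) < 1e-9)   # True
```

As the check shows, Minkowski with m = 2 reduces to the Euclidean distance, and angular distance is zero for parallel vectors regardless of their magnitudes.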
Characterizing Activity 6-19
To illustrate the use of the proximity matrix we take system call data from five of the processes represented in Figures 1 and 2. We select the following 4 (of the 58 total) system call counts for illustrative purposes only:
1. close handle
2. create file
3. find first file
4. register query

The example process activity matrix (greatly abbreviated) and proximity matrix:

    Process 1:  179   11    6  226
    Process 2:  160  163   70   67
    Process 3:   30    0    1    2
    Process 4:   70   30    0  101
    Process 5:  407    0    0    4

      0  230  270  167  318
    230    0  229  178  310
    270  229    0  111  377
    167  178  111    0  352
    318  310  377  352    0
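As a sketch, the proximity matrix on this slide can be recomputed directly from the abbreviated process-activity matrix; the results agree with the slide's entries to within a unit of rounding.

```python
from math import sqrt

# Recompute the 5x5 proximity matrix from the abbreviated
# process-activity matrix (four system-call counts per process).
X = [
    [179,  11,  6, 226],   # Process 1
    [160, 163, 70,  67],   # Process 2
    [ 30,   0,  1,   2],   # Process 3
    [ 70,  30,  0, 101],   # Process 4
    [407,   0,  0,   4],   # Process 5
]

def euclid(u, v):
    return sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

# Round to whole numbers, as the slide does.
P = [[round(euclid(u, v)) for v in X] for u in X]
for row in P:
    print(row)
```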
Characterizing Activity 6-20
Importance of Scaling
- The proximity matrix leads to this ordering of distances:
  d_34, d_14, d_24, d_23, d_12, d_13, d_25, d_15, d_45, d_35.
- The close distance between process 3 and process 4 may simply be due to the small total number of system calls in these two cases.
- Another approach would be to normalize by the percentage of calls of each type:

    Process 3 = [ 90.9    0    3.0   6.1 ]
    Process 4 = [ 34.8  14.9    0   50.2 ]

This would result in a different ordering, affected not by the total number of system calls but only by the percentages of the various types. What if we want some of both?
Characterizing Activity 6-21
A New Activity Representation
The original activity vectors:

    Process 1:  179   11    6  226
    Process 2:  160  163   70   67
    Process 3:   30    0    1    2
    Process 4:   70   30    0  101
    Process 5:  407    0    0    4

Add a first element, which is total activity scaled between 0 and 100, and represent the other elements as percentages of the total for that vector:

    M :=  27.6  42.4   2.6   1.4  53.6
          30.1  34.8  35.4  15.2  14.6
           2.2  90.9    0    3.0   6.1
          13.2  34.8  14.9    0   50.2
          26.9  99.0    0     0    1.0
Characterizing Activity 6-22
New Proximity Matrix

    D =    0      53.397  72.546  20.735  77.327
          53.397    0     73.484  46.95   76.165
          72.546  73.484    0     73.784  26.659
          20.735  46.95   73.784    0     83.379
          77.327  76.165  26.659  83.379    0

Now the closest two activity vectors are 1 and 4 instead of 3 and 4. This is simply due to what we have determined is important (and how we measure the distance between two vectors)!
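The new representation and the matrix D can be rebuilt in a few lines. A sketch; note the slide rounds M to one decimal place before computing D, so values computed from the raw counts land close to, but not exactly on, the slide's entries.

```python
from math import sqrt

# Build the "new activity representation" (slide 6-21) and its
# proximity matrix D (slide 6-22) from the raw counts.
X = [
    [179,  11,  6, 226],
    [160, 163, 70,  67],
    [ 30,   0,  1,   2],
    [ 70,  30,  0, 101],
    [407,   0,  0,   4],
]

grand_total = sum(map(sum, X))   # 1527 calls across all five processes

# First element: the row's share of all activity, scaled 0..100.
# Remaining elements: each count as a percent of the row total.
M = [[100 * sum(row) / grand_total] + [100 * c / sum(row) for c in row]
     for row in X]

def euclid(u, v):
    return sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

D = [[euclid(u, v) for v in M] for u in M]
print(round(D[0][3], 2))    # distance between processes 1 and 4,
                            # close to the slide's 20.735
```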
Characterizing Activity 6-23
Measures for Categorical Data
Suppose each vector contains fields like the following:
- Layer 4 protocol (TCP, UDP, ICMP)
- Layer 3 protocol (IP, IPX, OSI, APPN)
- Layer 2 protocol (Ethernet, Token Ring, ATM)

We can compute similarity measures (similar to the proximity idea):

    s_ij = (1/r) Sum_{k=1..r} s_ijk,

where each s_ijk = 1 if x_i agrees with x_j in the kth element (and 0 otherwise).
= =∑
Characterizing Activity 6-24
Clustering Example: Hierarchical Agglomerative Method

We return to the vectors that led to matrix D:

    D =    0      53.397  72.546  20.735  77.327
          53.397    0     73.484  46.95   76.165
          72.546  73.484    0     73.784  26.659
          20.735  46.95   73.784    0     83.379
          77.327  76.165  26.659  83.379    0

The smallest positive value is 20.735; thus, form the two-member cluster (1,4) = (14). Compute the nearest-neighbor distances:

    d_(14)2 = min[ d_12, d_42 ] = d_42 = 46.95
    d_(14)3 = min[ d_13, d_43 ] = d_13 = 72.546
    d_(14)5 = min[ d_15, d_45 ] = d_15 = 77.327
Characterizing Activity 6-25
Clustering Step 2
We compute the new proximity matrix:

            (14)     2      3      5
    (14)      0
    2       47.0     0
    3       72.5   73.5     0
    5       77.3   76.2   83.4     0

The smallest value is between (14) and 2, so we add 2 to this cluster and form (142).
Characterizing Activity 6-26
Clustering Step 3

    d_(142)3 = min[ d_(14)3, d_23 ] = 72.5
    d_(142)5 = min[ d_(14)5, d_25 ] = 76.2

            (142)    3      5
    (142)     0
    3       72.5     0
    5       76.2   83.4     0

Form the clusters (1423) and (5).
Characterizing Activity 6-27
Clustering Step 4
    d_(1423)5 = min[ d_(142)5, d_35 ] = 76.2

Next cluster will be (14235).
Characterizing Activity 6-28
Choosing Clusters
Dendrogram (heights at which clusters merge):

    1 and 4 merge at 20.7
    2 joins (14) at 47.0
    3 joins (142) at 72.5
    5 joins (1423) at 76.2

Cutting the tree between 47.0 and 72.5 gives CLUSTERS: {1,2,4}, {3}, {5}.
Characterizing Activity 6-29
Using Principal Components Analysis to Reduce the Number of Variables
(Reduce dimension of activity vectors.)
Characterizing Activity 6-30
Correlation Coefficients
Recall that for two random variables X and Y their covariance is
Cov(X,Y) = E(XY)-E(X)E(Y).
Their correlation coefficient is
    rho(X,Y) = Cov(X,Y) / ( sigma_X sigma_Y ).
Characterizing Activity 6-31
Estimated Correlation Matrix
An original n x r matrix of observations

    x_11  x_12  ...  x_1r
    x_21  x_22  ...  x_2r
    ...   ...   ...  ...
    x_n1  x_n2  ...  x_nr

represents values from random variables X_{i,j}, where i implies, for example, the ith machine, or process, and j implies, for example, counts for a particular system call or port number access. Each column contains n values of a RV S_j giving, say, counts of the jth system call per process.

We form the r x r estimated covariance matrix C = [ c_ij ] as

    c_ij = (1/n) Sum_{k=1..n} x_ki x_kj - ( (1/n) Sum_k x_ki ) ( (1/n) Sum_k x_kj ).
Characterizing Activity 6-32
Example correlation matrix
For the matrix:

    Process 1:  179   11    6  226
    Process 2:  160  163   70   67
    Process 3:   30    0    1    2
    Process 4:   70   30    0  101
    Process 5:  407    0    0    4

    C =   1.719x10^4   -873.56      -144.88     -1.55x10^3
          -873.56      3.853x10^3   1.667x10^3    23.4
          -144.88      1.667x10^3    750.24      -22.4
          -1.55x10^3     23.4        -22.4      6.757x10^3
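The matrix C above can be reproduced with the divide-by-n estimator the slides use. A sketch in plain Python:

```python
# Recompute the estimated covariance matrix C of slide 6-32 using the
# slides' "divide by n" estimator: c_ij = E[S_i S_j] - E[S_i] E[S_j].
X = [
    [179,  11,  6, 226],
    [160, 163, 70,  67],
    [ 30,   0,  1,   2],
    [ 70,  30,  0, 101],
    [407,   0,  0,   4],
]
n, r = len(X), len(X[0])
col = lambda j: [row[j] for row in X]
mean = lambda v: sum(v) / len(v)

C = [[mean([a * b for a, b in zip(col(i), col(j))])
      - mean(col(i)) * mean(col(j))
      for j in range(r)] for i in range(r)]

print(round(C[0][0], 2), round(C[0][1], 2))   # 17189.36 -873.56
```

These match the slide's 1.719x10^4 and -873.56.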
Characterizing Activity 6-33
The mathcad computation...

    X :=  179   11    6  226
          160  163   70   67
           30    0    1    2
           70   30    0  101
          407    0    0    4

    i := 1, 2 .. 4        j := 1, 2 .. 4
    MX_i := mean( X<i> )      SX_i := stdev( X<i> )

    C_i,j := ( X<i> . X<j> ) / 5 - MX_i . MX_j
Characterizing Activity 6-34
Principal Components Analysis
It is possible to find new random variables

    Y_1 = a_11 S_1 + a_12 S_2 + ... + a_1r S_r
    Y_2 = a_21 S_1 + a_22 S_2 + ... + a_2r S_r
    ...
    Y_r = a_r1 S_1 + a_r2 S_2 + ... + a_rr S_r

such that each Y_i accounts for a decreasing amount of the variance in the random variables S_i, and each pair Y_i, Y_j is uncorrelated. Once we account for "most" of the original variance, we can drop the remaining Y_i from consideration (reduce the problem dimension).
Characterizing Activity 6-35
Finding the A Matrix
The coefficients a_ij are the "eigenvectors" of the covariance matrix C. That is, each column [a_ij]^T of the matrix A^T satisfies:

    C [A^T]_i = lambda_i [A^T]_i,   where the lambda_i are the "eigenvalues."

The eigenvalues are found by solving the equations

    det( lambda_i I - C ) = 0.

The eigenvectors are found by solving the equations ( lambda_i I - C ) V_i = 0. However, many tools exist to find these.
Characterizing Activity 6-36
“Simple” eigenvalue example

    Let C = [ 1  2 ].  Then det( lambda I - C ) = det [ lambda-1     -2    ]
            [ 1  4 ]                                  [   -1     lambda-4  ]

    = (lambda - 1)(lambda - 4) - 2 = lambda^2 - 5 lambda + 2.

If we set this equal to zero we find lambda = (5 +/- sqrt(17)) / 2, so lambda ~= 4.562 and lambda ~= 0.438. These are the eigenvalues, and we need the corresponding eigenvectors. If v = [v_1; v_2] is an eigenvector, then (lambda I - C) v = 0. For lambda = 4.562 we get the simultaneous equations

    3.562 v_1 - 2 v_2 = 0
    -v_1 + 0.562 v_2 = 0.

These are homogeneous equations which do not have unique solutions. The second equation yields v_1 = 0.562 v_2. Simply set v_2 = 1 to get v_1 = 0.562. Then we normalize by dividing each by sqrt( v_1^2 + v_2^2 ) = 1.147. This gives us v_1 = 0.49 and v_2 = 0.872.
Characterizing Activity 6-37
Eigenvalues continued
Thus, corresponding to the eigenvalue 4.562 we have the eigenvector [0.490; 0.872]. Similarly, we find that corresponding to 0.438 we have the eigenvector [0.963; -0.270]. If C really had been a covariance matrix, we would write the matrix A's rows using the transpose of these two vectors. Thus,

    A = [ 0.490   0.872 ]
        [ 0.963  -0.270 ]

Notice that the first row corresponds to the largest eigenvalue, and we continue in order of decreasing eigenvalue size.
Characterizing Activity 6-38
Good news: Mathcad demo

    C := [ 1  2 ]     e := eigenvals(C)     e = [ 0.438 ]
         [ 1  4 ]                               [ 4.562 ]

    v := eigenvecs(C)     v = [ -0.963   -0.49  ]
                              [  0.27    -0.872 ]

The columns of v are the negatives of what we found, which does not matter. The first column corresponds to the first eigenvalue found by eigenvals, etc.
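The same demo can be run with numpy instead of Mathcad; a sketch, sorting so the largest eigenvalue comes first as the slides require for A:

```python
import numpy as np

# numpy equivalent of the Mathcad demo: eigen-decomposition of the
# 2x2 example matrix C = [[1, 2], [1, 4]].
C = np.array([[1.0, 2.0],
              [1.0, 4.0]])

evals, evecs = np.linalg.eig(C)     # columns of evecs are eigenvectors
order = np.argsort(evals)[::-1]     # sort largest eigenvalue first
evals, evecs = evals[order], evecs[:, order]

print(np.round(evals, 3))           # [4.562 0.438]
# Eigenvectors are unique only up to sign/scale, as the slide notes.
print(np.round(np.abs(evecs[:, 0]), 3))
```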
Characterizing Activity 6-39
Understanding Y Values
Recall that we're looking for new random variables Y_1, Y_2, ..., Y_r that are orthogonal and may capture most of the variance of the original S_1, S_2, ..., S_r with a much-reduced number of Y's. The defining equations are given on slide number 34. Each row of the original X matrix represents one "realization" of the original random variables S_i; thus, one row of the X matrix results in one estimate of the Y's. In practice, to estimate the Y's, we first normalize the x-values by subtracting the column mean from each (as we shall see). Then we transpose each row of this normalized X matrix, NX, so that the rows become column vectors prior to multiplying by the A matrix.

Ultimately the matrix equation Y(r x n) = A(r x r) x (NX^T)(r x n) results in the columns of Y. Each column is an estimate of the random variables Y_i arising from the corresponding normalized column of (NX^T)(r x n).
Characterizing Activity 6-40
Estimating Y values
In our original system call example: each row of the original X matrix contains a set of estimates of the system call random variables S_i, i = 1, 2, .. 4, for a single process. Each x_ij is the ith sample of system call j.

From each entry we subtract the column mean (the estimated mean for S_j)

    MX_j = (1/5) Sum_{i=1..5} x_ij

to obtain the normalized matrix NX = [ x_ij - MX_j ].

We estimate the Y matrix as

    Y = [ y_ij ](4x5) = A x (NX)^T.
Characterizing Activity 6-41
Find Eigen-vectors/values

    E := eigenvecs(C)

    E =    0.013       -0.073   -0.139   -0.987
           0.401       -0.912    0.05     0.065
          -0.916       -0.401    0.022    0.015
          -1.553x10^-3 -0.045   -0.989    0.143

    eval := eigenvals(C)

    eval = [ 21.803, 4.517x10^3, 6.538x10^3, 1.747x10^4 ]   (smallest ... largest)

    A :=  -0.987    0.065    0.015    0.143
          -0.139    0.05     0.022   -0.989
          -0.073   -0.912   -0.401   -0.045
           0.013    0.401   -0.916   -1.553x10^-3

(The rows of A are the eigenvectors of C, ordered from largest eigenvalue to smallest.)
Characterizing Activity 6-42
Checking A for Orthogonality

    A^T . A =   0.999         6.84x10^-4   -4.98x10^-4   -4.052x10^-4
                6.84x10^-4    0.999         4.71x10^-4    2.622x10^-4
               -4.98x10^-4    4.71x10^-4    1.001        -1.455x10^-4
               -4.052x10^-4   2.622x10^-4  -1.455x10^-4   1.001

This is approximately as expected for an orthogonal matrix (ones on the diagonal and zeros elsewhere).
Characterizing Activity 6-43
Finding Y
Original observations of S1, S2, S3, S4:

    X =   179   11    6  226
          160  163   70   67
           30    0    1    2
           70   30    0  101
          407    0    0    4

    i := 1, 2 .. 5      j := 1, 2 .. 4
    NX_i,j := X_i,j - MX_j          MX = [ 169.2  40.8  15.4  80 ]

    NX =    9.8   -29.8   -9.4   146
           -9.2   122.2   54.6   -13
         -139.2   -40.8  -14.4   -78
          -99.2   -10.8  -15.4    21
          237.8   -40.8  -15.4   -76

    Y := A . NX^T

    Y =    9.127    15.983   123.368   99.98   -248.46
        -147.453    21.447    94.134   -7.859    39.731
          23.662  -132.084    56.656   22.322    29.446
          -3.439    -1.111    -4.859    8.453     0.955
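The whole pipeline of slides 6-40 through 6-44 can be sketched end to end: build C, take its eigenvectors as the rows of A (largest eigenvalue first), form NX, compute Y = A NX^T, and verify the round trip back to X.

```python
import numpy as np

# PCA pipeline sketch for the 5x4 system-call example.
X = np.array([[179,  11,  6, 226],
              [160, 163, 70,  67],
              [ 30,   0,  1,   2],
              [ 70,  30,  0, 101],
              [407,   0,  0,   4]], dtype=float)

means = X.mean(axis=0)                 # [169.2  40.8  15.4  80.]
NX = X - means
C = (X.T @ X) / len(X) - np.outer(means, means)   # divide-by-n covariance

evals, evecs = np.linalg.eigh(C)       # ascending order for symmetric C
A = evecs[:, ::-1].T                   # rows = eigenvectors, largest first

Y = A @ NX.T                           # rows of Y estimate Y1..Y4
print(np.allclose(Y.T @ A + means, X))             # True: full reconstruction

# Drop Y4 (smallest eigenvalue) and reconstruct with 3 variables:
Y3 = Y.copy()
Y3[3, :] = 0
X3 = Y3.T @ A + means
print(np.round(evals[::-1][:3].sum() / evals.sum(), 3))   # 0.999
print(np.round(np.abs(X3 - X).max(), 2))   # worst-case 3-variable error
```

The signs of individual rows of A may differ from the slides (eigenvectors are unique only up to sign), but the reconstruction and the 0.999 variance fraction are unaffected.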
Characterizing Activity 6-44
Reducing Variables:
We write

    MEANS =  169.2  40.8  15.4  80
             169.2  40.8  15.4  80
             169.2  40.8  15.4  80
             169.2  40.8  15.4  80
             169.2  40.8  15.4  80

and it follows that

    Y^T . A + MEANS =  178.915   11.062    5.955  226.077
                       160.071  162.927   70.095   67.02
                        30.152   -0.093    1.053    2.001
                        70.092   29.938    0.033  101.052
                       406.77     0.165   -0.135    3.85

This is the reconstruction of the X matrix. Recall that

    Y =    9.127    15.983   123.368   99.98   -248.46
        -147.453    21.447    94.134   -7.859    39.731
          23.662  -132.084    56.656   22.322    29.446
          -3.439    -1.111    -4.859    8.453     0.955

(5 estimates of Y1 in the 1st row, ..., 5 estimates of Y4 in the 4th row) and

    X =   179   11    6  226
          160  163   70   67
           30    0    1    2
           70   30    0  101
          407    0    0    4

But Y was created so that Y4 has the least influence on the original data matrix X.
Characterizing Activity 6-45
Checking reduction

    Y =    9.127    15.983   123.368   99.98   -248.46
        -147.453    21.447    94.134   -7.859    39.731
          23.662  -132.084    56.656   22.322    29.446
          -3.439    -1.111    -4.859    8.453     0.955

Replace Y with YRows3 (the 4th row zeroed out):

    YRows3 =    9.127    15.983   123.368   99.98   -248.46
             -147.453    21.447    94.134   -7.859    39.731
               23.662  -132.084    56.656   22.322    29.446
                0         0         0        0         0

    YRows3^T . A + MEANS =  178.96    12.441    2.805  226.071
                            160.085  163.372   69.077   67.018
                             30.215    1.856   -3.397    1.994
                             69.982   26.548    7.776  101.065
                            406.757   -0.218    0.74     3.851

This is a reconstruction of the X matrix using only 3 variables, which account for

    ( eval_1 + eval_2 + eval_3 ) / Sum_{i} eval_i = 0.999

of the variance or "energy" (the three largest eigenvalues over the total).
Characterizing Activity 6-46
Estimate of X with 2 variables
Originally:

    X :=  179   11    6  226
          160  163   70   67
           30    0    1    2
           70   30    0  101
          407    0    0    4

    YRows2^T . A + MEANS =  180.687   34.021   12.293  227.136
                            150.443   42.911   16.112   61.075
                             34.351   53.526   19.321    4.543
                             71.612   46.906   16.727  102.07
                            408.907   26.637   12.547    5.176
Characterizing Activity 6-47
Reduction to 3 Variables
Notice that the estimates of the three random variables Y1, Y2, Y3 can be used to recover the original rows of the X matrix to a considerable degree. The reduction to only two random variables leads to much greater errors.
On the next slide we plot the 5 original points using the new Y-variables and use theseto make a visual clustering decision. This reduced number of variables could also beused for other classifications, such as determining which of the original rows (processes)seem to be malicious.
Characterizing Activity 6-48
Plot Using Three Dimensions: Perhaps Cluster P1, P3, P4?
[3-D scatter plot of the five processes P1..P5 in the new Y-coordinates.]
Characterizing Activity 6-49
Within and Between
Suppose n outcomes of random variables X_1 and X_2 are written d_1, d_2, ..., d_n. We want to divide the data into two groups representing outcomes from each of the RVs. Using whatever means, we select a subset of the points and relabel them as x_11, x_12, ..., x_1n1 and x_21, x_22, ..., x_2n2, with n_1 + n_2 = n.

The total variance (dispersion) estimated from the data is

    V = (1/n) Sum_{i=1..n} ( d_i - d̄ )^2 = (1/n) Sum_i d_i^2 - ( (1/n) Sum_i d_i )^2.

The total sum-of-squares is defined as T = nV. It can be shown that

    T = Sum_{k=1,2} Sum_{m=1..n_k} ( x_km - x̄_k )^2 + Sum_{k=1,2} n_k ( x̄_k - d̄ )^2 = W + B,

where W is called the "within" sum of squares and B is called the "between" sum of squares.
Characterizing Activity 6-50
Within/Between Example
Data: 1,5,2,4,3,6, and we choose groups 1,2,3 and 4,5,6.

    X1 := [1; 2; 3]          X2 := [4; 5; 6]
    MX1 := mean(X1) = 2      MX2 := mean(X2) = 5
    MT := ( Sum X1 + Sum X2 ) / 6 = 3.5

    SST := (X1 - MT).(X1 - MT) + (X2 - MT).(X2 - MT) = 17.5
    W := (X1 - MX1).(X1 - MX1) + (X2 - MX2).(X2 - MX2) = 4
    B := 3 (MX1 - MT)^2 + 3 (MX2 - MT)^2 = 13.5
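The same decomposition can be checked in a few lines of Python:

```python
# Within/between decomposition for the data 1,5,2,4,3,6 split into
# groups {1,2,3} and {4,5,6} (slide 6-50).
data = [1, 5, 2, 4, 3, 6]
g1, g2 = [1, 2, 3], [4, 5, 6]

mean = lambda v: sum(v) / len(v)
mt = mean(data)                               # overall mean 3.5

T = sum((d - mt) ** 2 for d in data)          # total sum of squares
W = (sum((x - mean(g1)) ** 2 for x in g1) +
     sum((x - mean(g2)) ** 2 for x in g2))    # within groups
B = (len(g1) * (mean(g1) - mt) ** 2 +
     len(g2) * (mean(g2) - mt) ** 2)          # between groups

print(T, W, B)    # 17.5 4.0 13.5, and T == W + B
```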
Characterizing Activity 6-51
Optimization Criterion
A commonly used criterion for determining appropriate groups is to divide the data in such a way as to minimize the "within-group" sum of squares, W.

This is equivalent to maximizing the "between-group" sum of squares, B.

This generally requires knowing the correct number of groups.
Characterizing Activity 6-52
Data Vectors
Our problem is more challenging because the data that we collect are vectors x_i with dimension r > 1. If we divide these into g groups, then the members of the kth group are the vectors x_k1, x_k2, ..., x_kn_k, and each vector has dimension r. (Think of these as our system-call vectors, each with r variables or counts.) The equation for the total sum of squares becomes

    T = Sum_{k=1..g} Sum_{m=1..n_k} ( x_km - x̄ )( x_km - x̄ )^T

and, similarly,

    W = Sum_{k=1..g} Sum_{m=1..n_k} ( x_km - x̄_k )( x_km - x̄_k )^T
    B = Sum_{k=1..g} n_k ( x̄_k - x̄ )( x̄_k - x̄ )^T.

Each of these matrices is r x r because the original vectors are represented as r x 1 column vectors. Again we have T = W + B. Groups are formed to minimize Trace(W).
Characterizing Activity 6-53
Notation
The vector

    x_11 = [ c_11; c_12; ...; c_1r ]

contains the counts of "calls" of type 1 through r made by process 1 of group 1. x_12 represents the vector of calls made by process 2 of group 1. Similarly, x_mn_m represents the vector of calls made by the final process, n_m, of group m. x̄ is now a vector of means:

    x̄ = [ c̄_1; c̄_2; ...; c̄_r ],

where each c̄_i is the mean count for each system call type.
Characterizing Activity 6-54
Notation Continued…
[ ]
211 11 12 11 111
2T 12 11 12 12 12 2
11 11 11 12 1
21 11 1 12 1 1
It follows that
...
... ... . The diagonal
...contains the square of the counts for each system call made
r
rr
r r r r
c c c c ccc c c c c cx x c c c
c c c c c c
⎡ ⎤⎢ ⎥⎢ ⎥= =⎢ ⎥⎢ ⎥⎢ ⎥⎣ ⎦
i
( )( )
( ) ( ) ( )
T
x1 1
2 2 2
1 1 2 21 1 1
by process 1.
Thus, , in this case is a matrix with
diag( ) , ,..., . Where
is the total number of processes (total across all grou
kng
r r km kmk m
n n n
i i ir ri i i
T x x x x
T c c c c c c n
= =
= = =
= − −
⎡ ⎤= − − −⎢ ⎥⎣ ⎦
∑∑
∑ ∑ ∑ps).
Characterizing Activity 6-55
Instructive Example:
We begin with the data matrix

    X =  1 2 3
         2 3 1
         3 1 2
         4 5 6
         5 6 4
         6 4 5
         7 8 9
         8 9 7
         9 7 8

We divide naturally into 3 groups:

    1 2 3      4 5 6      7 8 9
    2 3 1  ,   5 6 4  ,   8 9 7
    3 1 2      6 4 5      9 7 8
Characterizing Activity 6-56
Notation
The vectors x_11, x_12, x_13 represent the row-data of the data matrix, but are written as column vectors:

    x_11 = [1; 2; 3]    x_12 = [2; 3; 1]    x_13 = [3; 1; 2].

Similarly, x_21 = [4; 5; 6], x_22 = [5; 6; 4], ..., x_33 = [9; 7; 8]. Thus,

    X = [ x_11^T; x_12^T; ...; x_33^T ].

The overall mean is x̄ = [5; 5; 5].
Characterizing Activity 6-57
Total Sum of Squares
Recall

    T = Sum_{k=1..g} Sum_{m=1..n_k} ( x_km - x̄ )( x_km - x̄ )^T = Sum_{k=1..3} Sum_{m=1..3} ( x_km - x̄ )( x_km - x̄ )^T

Mathcad computation:

    XI :=  1 2 3        XI^T =  1 2 3 4 5 6 7 8 9
           2 3 1                2 3 1 5 6 4 8 9 7
           3 1 2                3 1 2 6 4 5 9 7 8
           4 5 6
           5 6 4        xmean_i := mean( XI<i> )      xmean = [5; 5; 5]
           6 4 5
           7 8 9        j := 1, 2 .. 3
           8 9 7        XV_1,j := XI^T<j>
           9 7 8        XV_2,j := XI^T<j+3>
                        XV_3,j := XI^T<j+6>

    Example:  XV_2,1 = [4; 5; 6]

    T := Sum_{k=1..3} Sum_{m=1..3} ( XV_k,m - xmean )( XV_k,m - xmean )^T
Characterizing Activity 6-58
Total Matrix Result
    T := Sum_{k=1..3} Sum_{m=1..3} ( XV_k,m - xmean )( XV_k,m - xmean )^T

    T =  60 51 51
         51 60 51
         51 51 60
Characterizing Activity 6-59
Within Matrix Result
    W := Sum_{k=1..3} Sum_{m=1..3} ( XV_k,m - gmean_k )( XV_k,m - gmean_k )^T

    gmean_1 := [ mean(XI^T<1>); mean(XI^T<2>); mean(XI^T<3>) ]
    gmean_2 := [ mean(XI^T<4>); mean(XI^T<5>); mean(XI^T<6>) ]
    gmean_3 := [ mean(XI^T<7>); mean(XI^T<8>); mean(XI^T<9>) ]

    Example:  XV_1,1 = [1; 2; 3]

    W =   6 -3 -3
         -3  6 -3        Trace W = 18.
         -3 -3  6
Characterizing Activity 6-60
Between Matrix Result

    B := Sum_{m=1..3} 3 ( gmean_m - xmean )( gmean_m - xmean )^T

    B =  54 54 54       T =  60 51 51       W =   6 -3 -3
         54 54 54            51 60 51            -3  6 -3
         54 54 54            51 51 60            -3 -3  6
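The matrix-valued T, W, B of the last three slides can be reproduced in plain Python; a sketch:

```python
# T, W, B for the 9x3 instructive example, split into three groups
# of three rows (slides 6-58 .. 6-60).
X = [[1, 2, 3], [2, 3, 1], [3, 1, 2],
     [4, 5, 6], [5, 6, 4], [6, 4, 5],
     [7, 8, 9], [8, 9, 7], [9, 7, 8]]
groups = [X[0:3], X[3:6], X[6:9]]

def vmean(rows):
    return [sum(c) / len(rows) for c in zip(*rows)]

def outer(u, v):
    return [[a * b for b in v] for a in u]

def madd(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

xbar = vmean(X)                                   # overall mean [5, 5, 5]
T = [[0.0] * 3 for _ in range(3)]
W = [[0.0] * 3 for _ in range(3)]
B = [[0.0] * 3 for _ in range(3)]

for g in groups:
    gbar = vmean(g)
    dg = [a - b for a, b in zip(gbar, xbar)]
    B = madd(B, [[len(g) * e for e in row] for row in outer(dg, dg)])
    for x in g:
        dt = [a - b for a, b in zip(x, xbar)]
        dw = [a - b for a, b in zip(x, gbar)]
        T = madd(T, outer(dt, dt))                # total scatter
        W = madd(W, outer(dw, dw))                # within-group scatter

print(T)    # diagonal 60, off-diagonal 51
print(W)    # diagonal 6, off-diagonal -3; Trace W = 18
print(B)    # all entries 54
```

Elementwise, T = W + B, exactly as the slides claim.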
Characterizing Activity 6-61
Optimization Algorithms
So... we want to partition n machines (or ports or whatever) into g groups in a way that minimizes Trace(W). How?
* In theory we could compute Trace(W) for each possible partition.
* BUT the number of partitions of n objects into g groups is:

      N(n, g) = (1/g!) Sum_{m=1..g} (-1)^(g-m) C(g, m) m^n

* N(2,5) = 15,  N(10,3) = 9330,  N(50,4) = 5.3x10^28 ...
* Algorithms have been developed to search for the optimum value of clustering criteria by picking an initial partition, rearranging it in some way, and keeping the new arrangement only if the criteria are improved.
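The partition-count formula (a Stirling number of the second kind) is easy to evaluate exactly, which confirms the slide's figures:

```python
from math import comb, factorial

# Number of ways to partition n objects into g non-empty groups
# (Stirling number of the second kind).
def npartitions(n, g):
    return sum((-1) ** (g - m) * comb(g, m) * m ** n
               for m in range(g + 1)) // factorial(g)

print(npartitions(5, 2))     # 15
print(npartitions(10, 3))    # 9330
print(npartitions(50, 4))    # about 5.3x10^28
```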
Characterizing Activity 6-62
Hill Climbing Algorithms
• Initial partition of n objects into g groups• Move each object into a different group and recompute criterion• Keep the change that most improves the criterion.• Repeat until no improvement from moving a single object.
Characterizing Activity 6-63
K-Means Algorithm
A hill-climbing algorithm in which the change is made by relocating objects into the group whose mean is closest to the object. Under common conditions it results in minimizing Trace W. We return to the process analysis matrix for 5 processes as an example:

    X :=  179   11    6  226
          160  163   70   67
           30    0    1    2
           70   30    0  101
          407    0    0    4
Suppose that we use the 10 points on the next slide as an example. Note that their coordinates are labeled (Y2,Y4) from real data.
Characterizing Activity 6-64
From Transformed System Call Data
                  Y2       Y4
    Process 1     4.5     37.7
    Process 2  -147.7    -73.7
    Process 3     2.5    -19.3
    Process 4   -14.9     23.1
    Process 5    43.3   -278.2
    Process 6  -100.2    -98.5
    Process 7    27.5   -210.3
    Process 8  -150.8   -150.1
    Process 9    51.2   -225.4
    Process 10   11.2      1.7
Characterizing Activity 6-65
Kmeans Example with 3 Groups
    Y :=    4.5    37.7
         -147.7   -73.7
            2.5   -19.3
          -14.9    23.1
           43.3  -278.2
         -100.2   -98.5
           27.5  -210.3
         -150.8  -150.1
           51.2  -225.4
           11.2     1.7

    Y^T =   4.5  -147.7   2.5  -14.9   43.3  -100.2   27.5  -150.8   51.2  11.2
           37.7   -73.7 -19.3   23.1 -278.2   -98.5 -210.3  -150.1 -225.4   1.7

    (Y^T)<1> = [4.5; 37.7]    (Y^T)<5> = [43.3; -278.2]    (Y^T)<8> = [-150.8; -150.1]

Pick 3 points (value-pairs) that are furthest apart.
Characterizing Activity 6-66
Place Process 2 Begin step 1 to place process 2..
    g1mean := YT<1>      g2mean := YT<5>      g3mean := YT<8>

    dist1 := | YT<2> - g1mean | = 188.613
    dist2 := | YT<2> - g2mean | = 279.824
    dist3 := | YT<2> - g3mean | = 76.463    (minimum)

Process 2 belongs in group 3.
Characterizing Activity 6-67
Define Group Member Vectors
    group1 := [1 0 0 0 0 0 0 0 0 0]^T      (process 1)
    group2 := [0 0 0 0 1 0 0 0 0 0]^T      (process 5)
    group3 := [0 1 0 0 0 0 0 1 0 0]^T      (processes 2 and 8)

Recalculate the group 3 mean:

    g3mean := (1/2) ( YT . group3 ) = [ -149.25; -111.9 ]
Characterizing Activity 6-68
Add next process
Begin step 3 to assign process 3 to a group. Result: group 1.

    dist1 := | YT<3> - g1mean | = 57.035    (minimum)
    dist2 := | YT<3> - g2mean | = 262.095
    dist3 := | YT<3> - g3mean | = 177.772

    group1 := group1 + [0 0 1 0 0 0 0 0 0 0]^T
    g1mean := (1/2) ( YT . group1 ) = [ 3.5; 9.2 ]
Characterizing Activity 6-69
Assign process 4

    dist1 := | YT<4> - g1mean | = 23.06     (minimum)
    dist2 := | YT<4> - g2mean | = 306.87
    dist3 := | YT<4> - g3mean | = 190.46

    group1 := group1 + [0 0 0 1 0 0 0 0 0 0]^T
    ng := Sum group1 = 3
    g1mean := (1/ng) ( YT . group1 ) = [ -2.633; 13.833 ]
    g2mean = [ 43.3; -278.2 ]       g3mean = [ -149.25; -111.9 ]
Continue in this manner assigning processes 6,7,9,10.
Characterizing Activity 6-70
After all 10 points assigned
    group1 = [1 0 1 1 0 0 0 0 0 1]^T      (processes 1, 3, 4, 10)
    group2 = [0 0 0 0 1 0 1 0 1 0]^T      (processes 5, 7, 9)
    group3 = [0 1 0 0 0 1 0 1 0 0]^T      (processes 2, 6, 8)

    g1mean = [ 0.825; 10.8 ]
    g2mean = [ 40.667; -237.967 ]
    g3mean = [ -132.9; -107.433 ]
Characterizing Activity 6-71
    i := 1, 2 .. 10
    dist1_i := | YT<i> - g1mean |
    dist2_i := | YT<i> - g2mean |
    dist3_i := | YT<i> - g3mean |
Now check to see if each process remains closer to its current group mean than to either of the other group means.
Characterizing Activity 6-72
More iterations?
    Process    dist1      dist2      dist3     group
       1       27.15     278.029    199.856      1
       2      170.88     249.931     36.837      3
       3       30.147    221.973    161.557      1
       4       19.964    266.915    175.963      1
       5      292.105     40.319    245.373      2
       6      148.837    198.228     33.898      3
       7      222.703     30.64     190.551      2
       8      221.086    210.666     46.269      3
       9      241.512     16.397    218.653      2
      10       13.8      241.471    180.762      1

If any process is closer to a different group mean, it gets moved into that group and the group mean is recomputed. Continue until no change is needed, as in this case.
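The procedure above can be sketched as a standard Lloyd-style k-means loop, seeded with points 1, 5 and 8 as on slide 6-65. (The slide assigns points one at a time and updates means as it goes; batch iteration reaches the same grouping here.)

```python
from math import dist   # Euclidean distance (Python 3.8+)

# k-means on the ten (Y2, Y4) points of slide 6-64.
pts = [(4.5, 37.7), (-147.7, -73.7), (2.5, -19.3), (-14.9, 23.1),
       (43.3, -278.2), (-100.2, -98.5), (27.5, -210.3),
       (-150.8, -150.1), (51.2, -225.4), (11.2, 1.7)]
means = [pts[0], pts[4], pts[7]]        # seeds: processes 1, 5, 8

while True:
    groups = [[], [], []]
    for p in pts:
        g = min(range(3), key=lambda k: dist(p, means[k]))
        groups[g].append(p)             # assign to nearest group mean
    new = [tuple(sum(c) / len(g) for c in zip(*g)) for g in groups]
    if new == means:                    # no mean moved: converged
        break
    means = new

print([len(g) for g in groups])         # [4, 3, 3]
print([round(c, 3) for c in means[0]])  # [0.825, 10.8]
```

The final groups are processes {1, 3, 4, 10}, {5, 7, 9} and {2, 6, 8}, with the same group means the slide reports.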
Characterizing Activity 6-73
Graph of Original PVA Values
[Scatter plot of the ten (Y2, Y4) points, Y2 from -200 to 100 and Y4 from -300 to 100. Group 1 = P1, P3, P4, P10 near the origin; Group 2 = P5, P7, P9 at lower right; Group 3 = P2, P6, P8 at left.]
Characterizing Activity 6-74
Example Use of Process Clusters
- First collect data and determine process clusters.
- Determine the vector of means for each cluster.
- Compute the distance of each process from the mean vector for its cluster.
- Let Di be the RV for the distance of a process in cluster i from its mean. Estimate the distribution empirically and choose a threshold di such that P[Di > di] = 0.05 (or your choice). Or use Chebyshev's Inequality (next slide).
- For each new process, first determine the closest cluster mean (suppose group i). Compute the distance of the process to the group i mean. Flag it as suspicious if the distance is greater than di.
Characterizing Activity 6-75
Chebyshev’s Inequality
If X is a random variable with mean mu and standard deviation sigma, then

    P[ |X - mu| >= k sigma ] <= 1/k^2,  for k > 0.

Note that the distribution of X need not be known. The estimate is conservative; that is, often the probability is much less than 1/k^2.
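Applied to the thresholding scheme of the previous slide: to guarantee P[D > d] <= 0.05 without knowing D's distribution, pick k with 1/k^2 = 0.05 and set d = mu + k*sigma. A sketch; the sample distances below are hypothetical, for illustration only.

```python
from math import sqrt

# Hypothetical within-cluster distances for one cluster (illustration).
dists = [12.0, 9.5, 14.2, 11.1, 8.7, 13.4, 10.9, 9.8]
mu = sum(dists) / len(dists)
sigma = sqrt(sum((x - mu) ** 2 for x in dists) / len(dists))

k = sqrt(1 / 0.05)             # 1/k^2 = 0.05  =>  k ~ 4.472
threshold = mu + k * sigma     # Chebyshev guarantees P[D > threshold] <= 0.05
print(round(k, 3))             # 4.472

flagged = [x for x in dists if x > threshold]
print(flagged)                 # [] -- nothing in-sample is this far out
```

Because Chebyshev holds for any distribution, the resulting threshold is deliberately loose; an empirically estimated quantile would usually flag more.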
Characterizing Activity 6-76
Choosing Number of Groups
References:
- B. S. Everitt, S. Landau, and M. Leese, Cluster Analysis, 4th Ed., Oxford University Press, NY, 2001.
- R. B. Calinski and J. Harabasz, "A dendrite method for cluster analysis," Communications in Statistics, 3, 1-27, 1974.
- R. O. Duda and P. E. Hart, Pattern Classification and Scene Analysis, Wiley, NY, 1973.
Characterizing Activity 6-77
A Statistical Method for Profiling Network Traffic*
* Paper: David Marchette, published in Proceedings of the Workshop on Intrusion Detection and Network Monitoring, Santa Clara, CA, April 1999.
* “Two clustering methods described and applied to NETWORK data. These allow the clustering of machines into ‘activity groups’, which consist of machines which tend to have similar activity profiles. In addition these methods allow the user to determine whether current activity matches these profiles and hence to determine when there is ‘abnormal’ activity on the network. A method for visualizing the clusters is described, and the approaches are applied to a data set consisting of a months worth of data from 993 machines.”
Characterizing Activity 6-78
Example: Counts of Incoming Telnets (1999)
Characterizing Activity 6-79
Possible Approach
- Tabulate incoming telnet sessions for the current day and compare with activity for the previous two months. Counts can be normalized as probabilities.
- Examine abnormal activity closely.
- This can only be done for major services and a limited number of machines.
- Marchette suggests using clustering to group a large number of machines in order to find activity abnormal for the cluster.
Characterizing Activity 6-80
Example
- Counts kept for the first 1024 ports in both TCP and UDP.
- Separate counts for ports > 1023.
- Normalized by total counts to produce probability (activity) vectors of dimension 2050 (1024 + 1 + 1024 + 1) from data for 993 machines.
- Eliminating ports with prob < 0.2 leaves vectors of length 61.
- Data plotted with pixel values ~ probabilities.
- K-means algorithm used for clustering.
Characterizing Activity 6-81
Clusters from port counts of 993 machines created with k-means algorithm.
Characterizing Activity 6-82
Idea for “flagging” inbound packets as abnormal.
Use the destination address to determine the appropriate cluster profile. Look at the activity probability vector (dim 2050) for that cluster. Pick a threshold, and if P(dest_port) <= threshold, flag this packet as abnormal. Record the source address as a possible attacker.
Characterizing Activity 6-83
Marchette Results
“There were [actually] 27 source IPs that were determined to be attackers against one or more of the 993 machines in the data set.”Total number of records analyzed: 1,757,206.
Characterizing Activity 6-84
Assignment: Read this paper.
Intrusion detection and response: An empirical analysis of NATE: Network Analysis of Anomalous Traffic Events
September 2002
Proceedings of the 2002 workshop on New security paradigms
This paper presents results of an empirical analysis of NATE (Network Analysis of Anomalous Traffic Events), a lightweight, anomaly based intrusion detection tool. Previous work was based on the simulated Lincoln Labs data set. Here, we show that NATE can operate under the constraints of real data inconsistencies. In addition, new TCP sampling and distance methods are presented. Differences between real and simulated data are discussed in the course of the analysis.
Carol Taylor and Jim Alves-Foss
Note: Available in ACM Digital Library