Prediction Methods for Mitigating Computer Security Threats


Errin W. Fulp

Department of Computer Science


Outline

Overview of data mining methods

Machine learning tools, techniques, and tasks

Preprocessing, data mining, and interpretation

Prediction or knowledge discovery

When applied to computer security

Large data sets and rare events (at least we hope...)

Methods for addressing each concern

Example application, function discovery in computer networks

Who is doing what in a computer network?

Identify the application based on the pattern of interactions


What is Data Mining?

Extracting hidden patterns from data

Can be used to uncover existing hidden patterns

...but it cannot uncover patterns not already in the data

Typically two major objectives

Knowledge discovery - determine facts about the data

Forecasting or predictions - predict future events

Both are relevant to computer security


Steps in the Process

Standard data-oriented view of Knowledge Discovery in Databases

Data → (selection) → Target Data → (preprocessing) → Preprocessed Data → (transformation) → Transformed Data → (data mining) → Patterns → (interpretation) → Knowledge

Let's divide it into a process-oriented view

Preprocessing → (transformed data) → Data Mining → (patterns) → Interpretation


Preprocessing Data

Once the objective is determined, assemble the data

Again, can only uncover existing patterns

Clean the data, removing noise and accounting for missing data

Remove unwanted data that hinders data analysis... but what counts as noise with regard to security?

Do we really want to remove outliers?

Reduce and transform data into important feature vectors
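A minimal Python sketch of this reduction step, assuming the whitespace-delimited layout of the log excerpt below (host, facility, level, tag, time, then a free-text message). The `LogRecord` class and the choice of `(time, tag)` as the feature vector are illustrative assumptions, picked because the figure below plots tag number against time.

```python
from dataclasses import dataclass


@dataclass
class LogRecord:
    # Hypothetical record type; field names follow the excerpt's columns.
    host: str
    facility: str
    level: str
    tag: int
    time: int      # Unix timestamp in seconds
    message: str


def parse_line(line: str) -> LogRecord:
    # Split off the first five fields; everything after them is the message.
    host, facility, level, tag, time, message = line.split(maxsplit=5)
    return LogRecord(host, facility, level, int(tag), int(time), message)


def to_feature_vector(rec: LogRecord) -> tuple:
    # Reduce the record to its numeric (time, tag) pair.
    return (rec.time, rec.tag)


line = ("198.129.8.6 kern alert 1 1171062692 "
        "kernel raid5: Disk failure on sde1, disabling device")
rec = parse_line(line)
print(rec.host, rec.level, to_feature_vector(rec))
# prints: 198.129.8.6 alert (1171062692, 1)
```

Lines with fewer than six fields raise `ValueError` here; a real preprocessor would have to decide whether such lines are noise or evidence.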

Host Facility Level Tag Time Message

198.129.8.6 local7 notice 189 1171061732 sysstat

198.129.8.6 kern info 6 1171061732 kernel md: using maximum available idle IO bandwidth

198.129.8.6 cron info 78 1171061733 crond 2500 (root) CMD (/usr/lib/sa/sa1 1 1)

198.129.8.6 auth info 38 1171062445 rsh(pam_unix) 2215 session opened for user by (uid=0)

198.129.8.6 auth info 38 1171062445 in.rshd 2216 [email protected] as root: cmd=/root/temps

198.129.8.6 daemon info 30 1171062590 smartd 88 Device: /dev/twe0 SMART Prefailure Attribute

198.129.8.18 syslog info 46 1171062590 syslogd restart.

198.129.7.282 daemon info 30 1171062590 ntpd 2555 synchronized to 198.129.149.218, str

198.129.8.6 auth notice 37 1171062590 sshd(pam_unix) 12430 auth failure; logname=el-fork-o

198.129.8.6 kern info 6 1171062590 kernel md: using 512k, over a total of 12287936 blocks.

198.129.8.6 cron info 78 1171062601 crond 2500 (root) CMD (/usr/lib/sa/fork-it 1 1)

198.129.8.6 kern alert 1 1171062692 kernel raid5: Disk failure on sde1, disabling device

preprocessing

[Figure: tag number versus time (seconds, ×10^9) for host 198.129.146.158]

transformation

tag    Encoding (e)    Sequence    f (base 10)

148    2               2

148    2               22

158    2               222

40     1               2221

158    2               22212       239

188    2               22122       233

188    2               21222       215

88     1               12221       160

158    2               22212       239

188    2               22122       233
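The transformation can be read as a sliding-window encoding of the tag stream. A minimal Python sketch, assuming (1) the per-tag encodings are simply looked up from the table, since the slide does not state the rule that produced them, (2) a window of the five most recent encodings, and (3) f is the window read as a base-3 number. Assumption (3) reproduces the f values in the table, e.g. 22212 in base 3 is 2*81 + 2*27 + 2*9 + 1*3 + 2 = 239, and under it identical sequences always map to the same f.

```python
# Encodings copied from the table; the rule that produced them is not stated.
ENCODING = {148: 2, 158: 2, 188: 2, 40: 1, 88: 1}


def encode_stream(tags, window=5, base=3):
    """Return (tag, e, sequence, f) rows for a stream of message tags.

    The sequence holds the last `window` encodings; f is that sequence read
    as a base-`base` number once the window is full (None before that).
    """
    rows = []
    seq = ""
    for tag in tags:
        e = ENCODING[tag]
        seq = (seq + str(e))[-window:]  # slide the window forward by one
        f = int(seq, base) if len(seq) == window else None
        rows.append((tag, e, seq, f))
    return rows


tags = [148, 148, 158, 40, 158, 188, 188, 88, 158, 188]
for row in encode_stream(tags):
    print(row)
```

Each f is a single integer summarizing a short history of events, which turns a variable-length log into fixed-size values suitable for the mining step.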


Types of Data Mining

Preprocessing → (transformed data) → Data Mining → (patterns) → Interpretation

Data Mining methods: Classification, Clustering, Regression, Rule Learning


Classification

Arrange data into predefined groups, learned from training data

Learn a model (classifier) from labeled training data

Examples include k-nearest neighbor and support vector machines

Typically training is slow, but classification is fast

When applied to security (specifically IDS) [CBK]

1 Cluster the training data using a clustering algorithm

2 For new data, the distance to the closest cluster is the anomaly score

Assumption: normal data instances belong to specific cluster(s) in the data, while anomalous instances do not; normal instances lie close to a cluster centroid.

Can also perform semi-supervised training
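A minimal Python sketch of the two-step clustering-based anomaly score described above, assuming numeric feature vectors and plain k-means (Lloyd's algorithm) as the clustering algorithm; the slide does not name one. Seeding with the first k points is a simplification for reproducibility, and the training data is made up.

```python
import math


def kmeans(points, k, iters=100):
    """Plain k-means (Lloyd's algorithm) on a list of equal-length tuples."""
    # Simplification: seed with the first k points so results are reproducible.
    centroids = [tuple(p) for p in points[:k]]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            # Assign each point to its nearest centroid.
            nearest = min(range(k), key=lambda j: math.dist(p, centroids[j]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its cluster (keep it if empty).
        updated = [
            tuple(sum(axis) / len(c) for axis in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if updated == centroids:
            break
        centroids = updated
    return centroids


def anomaly_score(x, centroids):
    """Step 2: distance from x to the closest cluster centroid."""
    return min(math.dist(x, c) for c in centroids)


# Hypothetical two-feature training data with two groups of normal behavior.
train = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
centroids = kmeans(train, k=2)
print(sorted(centroids))                 # [(0.5, 0.5), (10.5, 10.5)]
print(anomaly_score((1, 1), centroids))  # small: near a cluster
print(anomaly_score((5, 5), centroids))  # large: far from both clusters
```

Training (the clustering loop) is the expensive part; scoring a new instance only needs k distance computations, matching the "slow training, fast classification" point above.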


Clustering

Arrange data into groups, but the groups are not predefined

No labeled training data required, so no separate training phase...

[Figure: Attack Graph and its Cluster Representation (two matrix plots, axes 5 to 40)]

Examples of clustering methods include

k-means clustering and fuzzy clustering

Clustering methods can have difficulty with high-dimensional data [CBK]
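k-means assigns each point to exactly one cluster; fuzzy clustering instead gives each point a degree of membership in every cluster. A minimal Python sketch of the standard fuzzy c-means membership step for fixed centroids, assuming the usual fuzzifier m > 1 (m = 2 here, a common default). A full fuzzy c-means would alternate this step with a centroid update.

```python
import math


def fuzzy_memberships(x, centroids, m=2.0):
    """Fuzzy c-means membership of point x in each cluster, centroids held fixed.

    u_i = 1 / sum_k (d_i / d_k) ** (2 / (m - 1)), with fuzzifier m > 1.
    """
    d = [math.dist(x, c) for c in centroids]
    zeros = d.count(0.0)
    if zeros:
        # x lies exactly on one or more centroids: split membership among them.
        return [1.0 / zeros if di == 0.0 else 0.0 for di in d]
    p = 2.0 / (m - 1.0)
    return [1.0 / sum((di / dk) ** p for dk in d) for di in d]


centroids = [(0.0, 0.0), (10.0, 0.0)]
print(fuzzy_memberships((2.0, 0.0), centroids))  # mostly cluster 0
print(fuzzy_memberships((5.0, 0.0), centroids))  # [0.5, 0.5]
```

Memberships always sum to 1; a point far from every centroid still gets memberships summing to 1, so an anomaly score would use the distances rather than the memberships alone.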


Regression

Fit a model to the data with the least error

Useful for forecasting and prediction

As applied to security, regression typically has two steps

1 Fit a regression model to the data

2 For each test instance, the residual determines the anomaly score

Anomalies present in the training data can reduce the robustness of the model
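A minimal Python sketch of the two regression steps, assuming a single numeric feature, an ordinary least-squares line as the regression model, and the absolute residual as the anomaly score. The data is hypothetical.

```python
def fit_line(xs, ys):
    """Step 1: ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def residual_scores(xs, ys, slope, intercept):
    """Step 2: absolute residual of each test instance is its anomaly score."""
    return [abs(y - (slope * x + intercept)) for x, y in zip(xs, ys)]


# Hypothetical training data, e.g. load (y) against time step (x).
slope, intercept = fit_line([0, 1, 2, 3, 4], [1, 3, 5, 7, 9])
print(slope, intercept)                                     # 2.0 1.0
print(residual_scores([5, 5], [11, 20], slope, intercept))  # [0.0, 9.0]
```

Because least squares weights every training point equally, a single anomalous training point pulls the fitted line toward itself, which is the robustness concern noted above.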


Association Rule Learning

Searches for relationships between variables

Learn rules that capture normal behavior; any test instance not covered by a rule is an anomaly (one-class) [EEGPP06, LSM98]

For multi-class

Learn rules from training data using a rule-learning algorithm; each rule has a confidence value

For each test instance, find the best rule; the inverse of its confidence is the anomaly score

if UDP is AVERAGE ∧ TCP is AVERAGE then ICMP is AVERAGE

if SYN is AVERAGE ∧ FIN is AVERAGE then ICMP is AVERAGE

if ICMP is AVERAGE ∧ UDP is AVERAGE ∧ TCP is AVERAGE ∧ SYN is AVERAGE then FIN is AVERAGE

if UDP is AVERAGE ∧ FIN is AVERAGE then SYN is AVERAGE

if UDP is AVERAGE ∧ SYN is AVERAGE then ICMP is AVERAGE

if SYN is AVERAGE then ICMP is AVERAGE

if ICMP is AVERAGE ∧ FIN is AVERAGE then SYN is AVERAGE

if UDP is AVERAGE ∧ TCP is AVERAGE ∧ SYN is AVERAGE ∧ FIN is AVERAGE then ICMP is AVERAGE

if UDP is AVERAGE ∧ SYN is AVERAGE then FIN is AVERAGE

if ICMP is AVERAGE ∧ TCP is AVERAGE ∧ SYN is AVERAGE then FIN is AVERAGE

if ICMP is AVERAGE ∧ SYN is AVERAGE then FIN is AVERAGE

...
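A minimal Python sketch of the multi-class scoring scheme, assuming each instance is a set of attribute=value items, confidence(X → Y) = support(X ∪ Y) / support(X), and that a rule "covers" an instance when the instance contains both sides of the rule. Instances covered by no rule get an infinite score, matching the one-class view above. The training data and the HIGH value are made up for illustration; a real system would mine the rules with an algorithm such as Apriori rather than listing them by hand.

```python
from math import inf


def support(itemset, transactions):
    # Fraction of transactions that contain every item in itemset
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def confidence(antecedent, consequent, transactions):
    # confidence(X -> Y) = support(X u Y) / support(X)
    s = support(antecedent, transactions)
    return support(antecedent | consequent, transactions) / s if s else 0.0


def anomaly_score(instance, rules):
    # rules: list of (antecedent, consequent, confidence) triples
    covering = [c for a, q, c in rules if (a | q) <= instance and c > 0]
    # Inverse confidence of the best covering rule; uncovered -> infinite score
    return 1.0 / max(covering) if covering else inf


train = [
    {"UDP=AVERAGE", "TCP=AVERAGE", "ICMP=AVERAGE"},
    {"UDP=AVERAGE", "TCP=AVERAGE", "ICMP=AVERAGE"},
    {"UDP=AVERAGE", "TCP=AVERAGE", "ICMP=AVERAGE"},
    {"UDP=AVERAGE", "TCP=AVERAGE", "ICMP=HIGH"},
]
lhs = {"UDP=AVERAGE", "TCP=AVERAGE"}
rules = [
    (lhs, {"ICMP=AVERAGE"}, confidence(lhs, {"ICMP=AVERAGE"}, train)),  # 0.75
    (lhs, {"ICMP=HIGH"}, confidence(lhs, {"ICMP=HIGH"}, train)),        # 0.25
]
print(anomaly_score({"UDP=AVERAGE", "TCP=AVERAGE", "ICMP=AVERAGE"}, rules))
print(anomaly_score({"UDP=AVERAGE", "TCP=AVERAGE", "ICMP=HIGH"}, rules))
print(anomaly_score({"UDP=HIGH", "TCP=AVERAGE", "ICMP=AVERAGE"}, rules))
```

The common pattern scores about 1.33, the rarer ICMP=HIGH pattern scores 4.0, and the uncovered instance scores infinity.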


Interpreting the Results

Final step of the process, evaluate the patterns discovered

Not all patterns are valid, and some are valid only for a limited time period

Standard measures: accuracy, precision, recall, and F-score

Unbalanced test sets are a concern

Overfitting: the model fits the training data very well but predicts poorly

It finds patterns in the training set that are not present in the test set

[Figure: data points with an overfit model and the correct model]
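A short Python sketch of the standard measures, with a made-up unbalanced test set to show why imbalance is a concern: a detector that never flags an attack still reaches 99% accuracy, while its precision, recall, and F-score are all zero. Reporting a ratio with a zero denominator as 0 is a convention assumed here.

```python
def evaluate(y_true, y_pred, positive=1):
    """Accuracy, precision, recall, and F-score for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    # Convention used here: a ratio with a zero denominator is reported as 0.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_score = (2 * precision * recall / (precision + recall)
               if precision + recall else 0.0)
    return accuracy, precision, recall, f_score


# Hypothetical unbalanced test set: 10 attacks (1) among 1000 instances,
# scored against a detector that never raises an alert.
y_true = [1] * 10 + [0] * 990
y_pred = [0] * 1000
print(evaluate(y_true, y_pred))  # (0.99, 0.0, 0.0, 0.0)
```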


When Applied to Computer Security

Two major issues...

Large data sets

Rare events


Security and Large Data Sets

Security typically involves large data sets

Sendmail “11,500 system calls per message” [WGZ08]

1998 MIT network data: 7 weeks of traffic is about 5 million connections

Must be processed quickly and accurately

Data oriented solutions

Discretization, feature selection [FFH08], feature construction

(principal component analysis) [WGZ04], and sampling [PP07]

Method oriented solutions

Parallel data mining (high-performance data mining)
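As one concrete data-oriented example, a minimal Python sketch of reservoir sampling (Algorithm R), a standard one-pass technique for drawing a uniform sample from a stream too large to store. It is a generic illustration of sampling, not the adaptive sampling algorithm of [PP07]; the stream of integers stands in for connection records.

```python
import random


def reservoir_sample(stream, k, rng):
    """Algorithm R: uniform sample of k items from a stream of unknown length.

    Uses O(k) memory and one pass, so the full data set never has to be stored.
    """
    sample = []
    for i, item in enumerate(stream):
        if i < k:
            sample.append(item)
        else:
            # Keep item i with probability k / (i + 1), replacing a random slot.
            j = rng.randint(0, i)  # randint is inclusive at both ends
            if j < k:
                sample[j] = item
    return sample


rng = random.Random(42)
connections = range(100_000)  # stand-in for a stream of connection records
sample = reservoir_sample(connections, 1000, rng)
print(len(sample))  # 1000
```

Uniform sampling keeps the sample small but, by design, keeps rare events only in proportion to their frequency, which leads directly into the rare-event problem.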


Security and Rare Events

Rare event processing is often required

We hope security events are infrequent...

Are there enough examples for supervised learning?

Black swan theory (hard to predict, high consequence, and easy to

see afterwards)

Bulk anomalies (worms) are the opposite... [CBK]

Standard approaches do not work well with rare events [JAK01]

Normal events may be similar, but rare events are often different

Many techniques attempt to model normal behavior and look for deviations

Over-sample the rare class, down-sample the large class, or create artificial cases
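A minimal Python sketch of the first two rebalancing options, random over-sampling and random down-sampling, assuming two classes (the rare label versus everything else) with the rare class smaller. Generating artificial cases (for example, interpolating between rare instances) is not shown. The data is hypothetical.

```python
import random


def rebalance(data, labels, rare_label, rng, mode="oversample"):
    """Return a shuffled list of (x, y) pairs with equal class sizes.

    Assumes two classes: rare_label versus everything else, with the rare
    class smaller than the other one.
    """
    rare = [(x, y) for x, y in zip(data, labels) if y == rare_label]
    common = [(x, y) for x, y in zip(data, labels) if y != rare_label]
    if mode == "oversample":
        # Duplicate randomly chosen rare instances (with replacement).
        extra = [rng.choice(rare) for _ in range(len(common) - len(rare))]
        balanced = common + rare + extra
    elif mode == "undersample":
        # Keep a random subset of the common class (without replacement).
        balanced = rare + rng.sample(common, len(rare))
    else:
        raise ValueError(f"unknown mode: {mode}")
    rng.shuffle(balanced)
    return balanced


# Hypothetical data: 5 attacks (label 1) among 100 records.
data = list(range(100))
labels = [1 if i < 5 else 0 for i in data]
rng = random.Random(0)
over = rebalance(data, labels, 1, rng, "oversample")
under = rebalance(data, labels, 1, rng, "undersample")
print(len(over), sum(y for _, y in over))    # 190 95
print(len(under), sum(y for _, y in under))  # 10 5
```

Over-sampling repeats the same few rare examples, so it can encourage overfitting to them; down-sampling discards most of the normal data.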


Rare Events in Other Areas

Insurance risk modeling [PRA00]

E-commerce and web mining, “Online merchants convert an

average of 2%-3% of their site visitors into buyers”

Churn analysis, “number of customers that end relationship with a

company in a given period” [NGK+06]

Hardware faults, for example new disk failures [AWG+93]

Airline No-Show predictions [LHC03]


Example Security Application: Who is Doing What?

Given a computer network, discover what computers are doing

Specifically what applications or types of applications

Identifying an application is important for two reasons

Management of network resources

Compliance with security policies

However, current methods do not always work

Port numbers are unreliable

Payloads can be encrypted

Current in-the-dark methods can be defeated


A New Approach

Given a set of computer network trace data, is it possible to identify the application protocols (e.g., HTTP, AIM, DNS) that hosts are using, based on interaction patterns?

Three different views of the same network: Physical, Logical, and Application


Motifs

A motif is a pattern of interconnections occurring in complex

networks at numbers that are significantly higher than those in

randomized networks

Motifs have been applied to several complex networks

Gene regulation, neural networks, ecosystem food webs, electronic

circuits (forward logic chips, digital fractional multipliers), and

World Wide Web

Certain motifs can be linked to specific functions
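A minimal Python sketch of the counting half of motif detection for the simplest case: connected 3-node subgraphs of an undirected graph, which are either a path (2 edges) or a triangle (3 edges). The motifs used in this work may be directed and also include order 4; that, and the comparison against randomized networks needed to call a pattern significant, are not shown. The graph is hypothetical.

```python
from itertools import combinations


def order3_motifs(edges):
    """Count connected 3-node subgraphs of an undirected graph by type.

    A 'path' has 2 of the 3 possible edges; a 'triangle' has all 3.
    Brute force over all node triples, so only suitable for small graphs.
    """
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    counts = {"path": 0, "triangle": 0}
    for a, b, c in combinations(sorted(adj), 3):
        n_edges = (b in adj[a]) + (c in adj[a]) + (c in adj[b])
        if n_edges == 2:
            counts["path"] += 1
        elif n_edges == 3:
            counts["triangle"] += 1
    return counts


# Hypothetical interaction graph: a server talking to three clients,
# two of which also talk to each other.
edges = [("server", "c1"), ("server", "c2"), ("server", "c3"), ("c1", "c2")]
print(order3_motifs(edges))  # {'path': 2, 'triangle': 1}
```

To test significance, the same counts would be computed on many randomized versions of the graph and the real count compared with their distribution.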


Applying this Idea to Application Identification

Parse data → Construct application graphs → Create motif profiles → Nearest neighbor classification → Interpret results → Evolutionary attribute weighting

(Slide annotations: "ugggh... time consuming", "easy", "time consuming", "talk to grad student..."; stage label: Preprocessing)

Collect data, parse into connection information

Find all order 3 and 4 motifs and build motif profiles

k-nearest-neighbor classification (for training and testing)

Interpret results, possibly weight features to improve performance


Initial Experiments

Sources of data

Dartmouth College campus wireless network, Fall 2003

OSDI Conference 2006

Lawrence Berkeley National Lab 2004/2005

Create a profile per application

Application x profile = 1.000 0.662 0.650 0.632 0.585

Application y profile = 0.900 0.672 0.50 0.772 0.85

Given new application, find best matching profile
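A minimal Python sketch of best-profile matching as 1-nearest-neighbor with a weighted Euclidean distance, using the two example profiles above. The unlabeled profile and the weight vector are hypothetical; the weights stand in for whatever the evolutionary attribute weighting step would learn, which is not implemented here.

```python
import math


def weighted_distance(a, b, weights):
    """Weighted Euclidean distance between two motif profiles."""
    return math.sqrt(sum(w * (ai - bi) ** 2 for ai, bi, w in zip(a, b, weights)))


def best_match(profile, known, weights=None):
    """1-nearest-neighbor: name of the known profile closest to `profile`."""
    if weights is None:
        weights = [1.0] * len(profile)
    return min(known, key=lambda name: weighted_distance(profile, known[name], weights))


# The two example profiles from the slide.
known = {
    "x": [1.000, 0.662, 0.650, 0.632, 0.585],
    "y": [0.900, 0.672, 0.50, 0.772, 0.85],
}
new = [0.90, 0.67, 0.50, 0.77, 0.60]  # hypothetical unlabeled profile
print(best_match(new, known))                           # x
print(best_match(new, known, weights=[1, 1, 1, 1, 0]))  # y
```

Dropping the weight on the last motif flips the decision, which shows why tuning feature weights can change, and potentially improve, classification results.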


Motif Profile Results

[Figure: motif profile results for AIM, DNS, HTTP, Kazaa, MSDS, Netbios, and SSH]

Results are very good compared to traditional graph statistics

However, there is a problem with AIM and SSH...

So what is the problem...?


So What is the Problem?


For Further Reading I

[AWG+93] Chidanand Apte, Sholom M. Weiss, and Gordon Grout.

Predicting defects in disk drive manufacturing: A case study.

In Proceedings of the IEEE CAIA93, pages 212–218, 1993.

[CBK] Varun Chandola, Arindam Banerjee, and Vipin Kumar.

Anomaly detection: A survey.

To appear in ACM Computing Surveys, September 2009.

[EEGPP06] Aly ElSemary, Janica Edmonds, Jesus Gonzalez-Pino, and Mauricio Papa.

Applying data mining of fuzzy association rules to network intrusion detection.

In Proceedings of the IEEE Workshop on Information Assurance, 2006.

[FFH08] Errin W. Fulp, Glenn A. Fink, and Jereme N. Haack.

Predicting computer system failures using support vector machines.

In Proceedings of the Workshop on Analysis of System Logfiles, 2008.

[JAK01] Mahesh V. Joshi, Ramesh C. Agarwal, and Vipin Kumar.

Mining needle in a haystack: classifying rare classes via two-phase rule induction.

In SIGMOD ’01: Proceedings of the 2001 ACM SIGMOD International Conference on Management of Data, pages 91–102, 2001.

[LHC03] Richard D. Lawrence, Se June Hong, and Jacques Cherrier.

Passenger-based predictive modeling of airline no-show rates.

In Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 397–406, 2003.


For Further Reading II

[LSM98] Wenke Lee, Salvatore J. Stolfo, and Kui W. Mok.

Mining audit data to build intrusion detection models.

In Proceedings of the International Conference on Knowledge Discovery and Data Mining, 1998.

[NGK+06] Scott A. Neslin, Sunil Gupta, Wagner Kamakura, Junxiang Lu, and Charlotte H. Mason.

Defection detection: Measuring and understanding the predictive accuracy of customer churn models.

Journal of Marketing Research, 43:204–211, 2006.

[PP07] Animesh Patcha and Jung-Min Park.

An adaptive sampling algorithm with applications to denial-of-service attack detection.

In Proceedings of the IEEE International Conference on Computer Communications and Networks, pages 11–16, 2007.

[PRA00] Edwin P. D. Pednault, Barry K. Rosen, and Chidanand Apte.

Handling imbalanced data sets in insurance risk modeling.

Technical Report RC-21731, IBM, 2000.

[WGZ04] Wei Wang, Xiaohong Guan, and Xiangliang Zhang.

A novel intrusion detection method based on principle component analysis in computer security.

In Proceedings of the International Symposium on Neural Networks, pages 657–662, 2004.

[WGZ08] Wei Wang, Xiaohong Guan, and Xiangliang Zhang.

Processing of massive audit data streams for real-time anomaly intrusion detection.

Computer Communications, 31(1):58–72, 2008.

