BUILDING A USAGE PROFILE FOR ANOMALY DETECTION

BUILDING A USAGE PROFILE OF A SYSTEM FOR DETECTING THREATS ON THE SYSTEM

October, 2016

Nathanael Asaam [email protected]

Abstract

This paper investigates how to build usage profiles of a system using behavior models. Such behavior models are at the heart of machine learning and evolutionary computing. Other methods of building usage profiles include statistical models such as time series models, univariate models, and mean and standard deviation models. The aim of building these usage profiles is to detect unusual behavior on the system. This paper uses regression to determine the usage profile of a system by studying the relationship between the relevant system variables that are used to formulate the profile. The dependent and independent variables for the usage profile can be determined from an audit trail. Additionally, the paper applies hidden Markov models to study the states a computer system can fall into, and the transitions between those states, in order to predict unusual behavior in the system. Unusual behavior in this case may be a particular state, a transition from one state to another, or the manner in which a particular state transition occurred. The usage profile is composed of the usage profile equation, a mean and standard deviation model that captures average usage and its standard deviation, and a Markov chain model that captures the states of the system and its state transitions. With this profile it becomes possible to detect anomalies on the system. Using linear and nonlinear programming, the usage profile equation can be maximized to determine the states and points at which the system is optimal, which can help improve the system’s usage. Also, using the differential coefficient of the usage profile equation and other statistical models such as the mean and standard deviation model, a threat profile of the system can be developed. When the threat profile equation is minimized using linear and nonlinear programming, it will help prevent threats on the system. The benefit of this research is its application to the development of anomaly threat detection systems and risk analysis systems that can be used for performing computer security risk assessments and analysis.

INTRODUCTION

If a usage profile of a system can be built, it will become possible to detect unusual behavior on

the system. The method for building such usage profiles involves determining factors of the

system that are critical to the system. These factors can be seen as critical system variables that

affect the system’s usage. The other thing to consider is determining the way in which you can

obtain an abstract representation of the usage profile. The abstract representation of the usage

profile can be achieved by the application of behavior models such as statistical models, machine

learning models and cognitive based models.

The first goal of this research paper is to investigate techniques for building a usage profile of a

computer system. The aim of building the usage profile is to be able to have a working model

that describes the system’s behavior. The second goal of the research is to be able to detect

unusual behavior on the system. Unusual behavior will be detected as deviation from the usage

profile model built. The last goal is to be able to build an anomaly threat detection system and a

risk analysis system that can be used for detecting threats and performing risk analysis on a

system.

RESEARCH MODEL AND METHODOLOGY

This paper investigates threat detection using application of machine learning, regression, linear

and nonlinear programming and calculus. The research model of this paper is inspired by

application of regression, machine learning and statistical models and the methodology for

threat detection is inspired by linear and nonlinear programming, calculus and multithreading.

These fields of study are mainly related to computer science, discrete mathematics and

operations research.

BEHAVIOR MODELS

The behavior models that can be used for building a usage profile include machine learning

models, statistical models, cognitive based models, user intention based models and computer

immunology models. Examples of statistical models that can be used are time series models,

univariate models and the moments (mean and standard deviation) model. Machine learning

models that can be used include neural networks, Bayesian networks, hidden Markov models

and genetic algorithms.

SYSTEM THREATS

The intended threat analysis and detection is expected to produce three types of system logs: system errors, system threats and usage rates, each categorized by the magnitude and characteristics of an instance of the threat model. A security expert must audit these logs to analyze changes in the computer system that fit or deviate from the current usage profile, so that a more appropriate instance of the usage profile can be projected for future use.

PROBLEM DEFINITION

If the normal usage or behavior of a computer system can be represented by an abstract model,

then this abstract model can be used to detect threats on the system. The threats on the system

can be detected as deviations from the abstract model which is the behavior of the system. The

main problems this paper seeks to investigate are listed below.

• Representing the normal usage or behavior of a system with an abstract model.

• Determining activities and occurrences on the system that are deviations from the system’s normal behavior or usage.

• Representing these activities or deviations with an abstract model.

• Preventing such activities or occurrences from occurring on the system.

In this paper the system’s normal behavior is known as the usage profile and the deviations from

the system’s normal behavior are known as the threat profile of the system.

RESEARCH QUESTIONS

The main questions to be investigated are listed below.

• What are the best and most efficient techniques for modelling a system’s normal behavior or usage?

• What are the best and most efficient techniques for the design and implementation of a threat detection system?

• How can we build a risk analysis system for performing risk assessment of a computer system?

OBJECTIVES

The main objectives of this research are as follows.

• Representing a system’s normal functioning with an abstract model.

• Design and implementation of a threat detection system.

• Design and implementation of a mobile security audit framework.

LITERATURE REVIEW

This section reviews the major topics that constitute this research and the work done in these areas. The

topics that will be reviewed are intrusion detection systems, behavior encryption, risk analysis,

information security awareness and practices, ways of mitigating risks on social networking sites

and application of Markov chain models for anomaly detection.

INTRUSION DETECTION SYSTEMS

There are two types of intrusion detection systems: signature based (or knowledge based) threat detection systems and anomaly based threat detection systems. Signature based threat detection systems detect threats by directly mapping incidents against a database of known threats, called the threat signatures. Anomaly threat detection systems detect threats based on deviation from an observed behavior pattern of the system for which the threat detection system has been built. They are also known as behavior threat detection systems. The threat signatures of a knowledge based threat detection system must be constantly updated with newly identified threats. Since new threats may not yet be in the threat signatures, such systems can miss them, which sometimes compromises the correctness of detection.

Anomaly threat detection systems, by contrast, are better at detecting previously unknown threats, but they sometimes give false alarms because system incidents are not mapped against a database of known threats. Anomaly based threat detection systems are typically built using artificial intelligence technologies. Besides this, intrusion detection systems are classified by the purpose for which they are built and by whether they deal with threats actively or passively. By purpose, there are host based and network based threat detection systems. Active threat detection systems are configured to block or prevent attacks, while passive systems are configured to monitor, detect and raise alerts about threats.

BEHAVIOR ENCRYPTION

Behavior algorithms are applied to safeguard information on computing devices such as mobile

phones and laptops. These algorithms are the basis for building systems that study and encrypt

user behavior on computing devices in order to ensure the security of the computing device. A

study into mobile platform security reports that behavior encryption systems have been designed

and built focusing on mobile platforms. Results from this study indicate that behavior

encryption application systems are effective at ensuring mobile platform security. It must be

stated that cryptographic study into how to encrypt the usage profile of a system can fall under

behavior encryption. This will help in securing the information that embodies the usage profile.

It is necessary because if the usage profile can be known, then it is possible to launch an attack

on the system. It must be stated that the usage profile may consist of user behavior and network

behavior. As such, if such information is compromised, then a user can be impersonated or the

network can be compromised.

RISK ANALYSIS

Computer risk analysis is also called risk assessment. It involves the process of analyzing and

interpreting risk. To analyze risk, the scope and methodology have to be determined first.

Later, information is collected and analyzed before interpreting the risk analysis results.

Determining the scope can be described as identifying the system to be analyzed for risk and

parts of the system that will be considered. Also, the analytical method that will be used with its

detail and formality must be planned. The boundary, scope and methodology used during risk

assessment determine the total amount of work effort needed in risk management,

and the type and usefulness of the assessment’s results.

Risk has many components including assets, threats, likelihood of threat occurrence,

vulnerabilities, safeguards and consequences. Risk management includes risk acceptance, which takes

place after several risk analyses. Normally, after risk has been analyzed and safeguards

implemented, the remaining or residual risk that must be tolerated to keep the system functional must

be accepted by management. This may be due to constraints on the system such as ease of use,

or features of the system for which strict safeguards would cause the organization operational

problems. As such, risk acceptance, like the selection of safeguards, should take into account

various factors besides those addressed in the risk assessment. In addition, risk acceptance

should take into account the limitations of the risk assessment.

INFORMATION SECURITY AWARENESS AND PRACTICES

A paper entitled “A study of information security awareness and practices in Saudi Arabia” discusses information security awareness and practices in that country. The paper emphasizes that information is under constant threat from cyber vandals. It reports that Saudi Arabia is rated poorly in terms of information security, which it attributes to the country’s highly suppressed, patriarchal and tribal culture.

The paper examined the level of information security awareness among the general public in the country using an anonymous online survey based on instruments the Malaysian Security Organization produced. In all, 633 persons responded to the survey, and the analysis confirmed that information security awareness is low in the country, which the paper again relates to the country’s highly suppressed, patriarchal and tribal nature.

PROTOCOL FOR MITIGATING RISKS ON SOCIAL NETWORKING SITES

According to an academic paper entitled, “Protocol for mitigating the risk of hijacking social

networking sites”, hackers can hijack a user’s session on social networking sites, impersonate the

victim and take over his session. The paper deals with this risk by presenting a security

authentication protocol for mitigating the risk.

The protocol takes into account that users of social networking sites connect to the sites

using several platforms and connection speeds. To cater for mobile devices and tablets using Wi-Fi

connections, a novel Self-Configuring Repeatable Hash Chains (SCRHC) protocol was developed to

prevent the hijacking of session cookies. This protocol supports three levels of caching, making it

possible to trade storage space for enhanced performance and reduced workload.
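The source does not describe how SCRHC works internally, so the sketch below is not that protocol. It only illustrates the general idea of a one-way hash chain that such a protocol builds on: the server stores the end of the chain, and each request reveals the previous link, so a captured token cannot be replayed. The function and class names, the seed value and the choice of SHA-256 are assumptions made for illustration.

```python
import hashlib

def build_hash_chain(seed, length):
    """Return [h(seed), h(h(seed)), ...] with `length` links, using SHA-256."""
    chain = []
    value = seed
    for _ in range(length):
        value = hashlib.sha256(value).digest()
        chain.append(value)
    return chain

class ChainVerifier:
    """Server side: remembers only the most recently accepted chain value."""

    def __init__(self, anchor):
        self.current = anchor

    def accept(self, token):
        # A valid token is the preimage of the value the server currently holds.
        if hashlib.sha256(token).digest() == self.current:
            self.current = token
            return True
        return False

# Hypothetical session secret; a real system would use a random value.
chain = build_hash_chain(b"hypothetical-session-secret", 5)
verifier = ChainVerifier(chain[-1])      # server is given the end of the chain

first_ok = verifier.accept(chain[-2])    # client reveals the previous link
replay_ok = verifier.accept(chain[-2])   # replaying the same token is rejected
second_ok = verifier.accept(chain[-3])   # the next request uses the next link back
print(first_ok, replay_ok, second_ok)
```

Because each token is used once and the hash cannot be inverted, an attacker who observes one token cannot derive the next one. How SCRHC repeats, configures or caches its chains is left to the cited paper.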

APPLICATION OF MARKOV CHAIN MODELS FOR ANOMALY DETECTION

According to a research paper on the application of Markov chain models for anomaly detection, the temporal behavior of a system can be modelled using Markov chains. The technique was used to represent the temporal profile of the normal behavior of a computer and network system. The Markov chain model of the normal profile was learnt from historic data. The observed behavior was then analyzed to infer the probability that the Markov chain model of the normal profile supports the observed behavior. A low probability of support indicates anomalous behavior that may result from intrusive activities. The technique was implemented and tested on audit data of a Sun Solaris system. The testing results show that the technique clearly distinguished intrusive activities from normal activities. According to the paper, two primary sources of data have been used to capture activities in a computer or network system for intrusion detection: network traffic and audit data.
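The idea of learning a Markov chain from historic data and scoring observed behavior by its probability of support can be sketched as follows. This is a minimal illustration, not the implementation used in the cited paper; the event names, the additive smoothing and the use of an average log-probability as the support score are all assumptions.

```python
import math
from collections import defaultdict

def learn_transitions(sequence, smoothing=1e-3):
    """Estimate P(next | current) from a historic event sequence."""
    states = sorted(set(sequence))
    counts = defaultdict(lambda: defaultdict(float))
    for cur, nxt in zip(sequence, sequence[1:]):
        counts[cur][nxt] += 1.0
    probs = {}
    for cur in states:
        # Additive smoothing keeps unseen transitions at a small non-zero probability.
        total = sum(counts[cur].values()) + smoothing * len(states)
        probs[cur] = {nxt: (counts[cur][nxt] + smoothing) / total for nxt in states}
    return probs

def average_log_support(probs, observed, floor=1e-6):
    """Mean log-probability of the observed transitions under the learned model."""
    logs = []
    for cur, nxt in zip(observed, observed[1:]):
        p = probs.get(cur, {}).get(nxt, floor)  # unknown states fall back to the floor
        logs.append(math.log(p))
    return sum(logs) / len(logs)

# Hypothetical audit events recorded during normal use.
historic = ["login", "read", "read", "write", "logout"] * 20
model = learn_transitions(historic)

normal_score = average_log_support(model, ["login", "read", "write", "logout"])
odd_score = average_log_support(model, ["login", "write", "login", "write"])
print(normal_score, odd_score)
```

A sequence whose score falls well below the scores seen for normal behavior would be treated as anomalous. The threshold that separates the two has to be chosen from data and is not specified here.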

RESEARCH MODEL AND METHODOLOGY

RESEARCH MODEL

Assume that the normal usage ($Y$) of a system such as a smart phone, laptop or wireless network can be represented by a mathematical function

$Y = f(X_i, C_i)$

where $X_i$ represents system variables, such as the number of functions or the number of authentications, and $C_i$ represents system constants, such as the maximum or minimum number of authentications. When a change in $Y$ is beyond the standard deviation determined from the data set of our usage, that change indicates a threat. To investigate this threat, machine learning algorithms, mathematical functions and behavior based intrusion detection systems will be studied to determine $Y$ in terms of a number of variables that represent $Y$ appropriately.
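The standard deviation rule described above can be sketched in a few lines. This is a minimal illustration under stated assumptions: the sample data are hypothetical, a "change in $Y$" is interpreted as the distance of an observed value from the mean of the historic usage, and the multiplier `k` is an added parameter that defaults to one standard deviation.

```python
from statistics import mean, stdev

def build_profile(samples):
    """Summarize historic usage values of Y by their mean and standard deviation."""
    return {"mean": mean(samples), "std": stdev(samples)}

def is_threat(y, profile, k=1.0):
    """Flag y when it deviates from the mean by more than k standard deviations."""
    return abs(y - profile["mean"]) > k * profile["std"]

# Hypothetical data: authentications observed per day during normal use.
daily_authentications = [10, 12, 11, 9, 13, 10, 12]
profile = build_profile(daily_authentications)

typical_day = is_threat(11.5, profile)   # close to the average usage
unusual_day = is_threat(30, profile)     # far outside the observed spread
print(profile, typical_day, unusual_day)
```

A larger `k` lowers the number of false alarms at the cost of missing smaller deviations, so its value is a tuning decision rather than something fixed by the model.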

The expected usage model of the network to be investigated includes the following

components: Host Usage Model, Server Usage Model, Device Usage Model, Port Usage Model,

Network Usage Model, Session Usage Model, Authentication Usage Model, Memory Usage

Model, CPU Usage Model, Battery Usage Model and Program Usage Model. These components

are expected to be derived from the variables listed below.

• Average number of application software that run on the mobile system while using the system

• Average number of system processes that run on the mobile system while using the system

• Average number of authentications in the mobile system.

• Average number of user actions that happen on the mobile system

• Average time a user spends before his session expires.

• Average time the mobile facility or resource functions each day.

• Number of paired ports communicating on the network

• Average amount of memory space used on devices while the network is being operated.

• Average CPU time spent on a single device on the network

• Average life span of a single device battery on the network.

METHODOLOGY

The list below details activities or processes that will be followed to represent a computer system

with an abstract mathematical model and analyze changes in the system. It is hoped that

following these processes will arrive at design and implementation of a normal usage model, a

threat detection system and a mobile security audit framework.

• Machine Learning Algorithms & Behavior Based Intrusion Systems: Investigate machine learning

algorithms, mathematical functions, and behavior based intrusion detection systems in order to

determine the extent to which the normal usage of a mobile system can be represented by the

research model.

• Audit trails: Analyze audit trails in order to formulate a set of independent and dependent

variables and their associated data set that will help in modelling the usage model of a mobile

system.

• Normal Usage Model: Apply the knowledge gained from the machine learning algorithms and

behavior based intrusion detection systems study and the audit trails analysis to model and

represent the normal usage of a mobile system such as a smart phone, laptop or wireless network.

• Threat Modelling: Study differential equations of the normal usage model and its applications in

order to model, detect and prevent threats.

• Boolean Calculus: Apply Boolean algebra and the calculus of Boolean functions to design and

implement the hardware and software that make up the Normal Usage and Threat Detection

Systems.

• Use programming as a tool to experiment with representations of the normal usage and threat models

to aid the design and implementation of a mobile security audit framework.

• Employ a questionnaire to collect information about the usage of computers and mobile phones.

• Threat Detection Systems: Develop an anomaly based threat detection system to demonstrate

the effectiveness of the research model. The goal is to measure the effectiveness of the threat

detection system developed, at preventing threats on a computer system.

Machine Learning Algorithms & Behavior Based Intrusion Systems

Machine learning techniques and algorithms will be investigated to determine the extent to which an

expert system that learns a computer system’s usage can be built. Since the expected usage

model is a mathematical model, various mathematical modelling techniques will be applied to

determine the normal usage model.

When deviations from these mathematical models are analyzed it can lead to design and

implementation of behavior based intrusion detection systems. As such, a thorough study into

design and implementation of behavior based intrusion detection systems will be done.

Audit Trail Analysis

It is expected that computer security audit reports will be sampled and analyzed to arrive at a

set of dependent and independent variables and their data set. These variables and their

associated data set can be used to formulate the normal usage model.

Normal Usage Model

An investigation into applying the knowledge gained from the machine learning study, the

mathematical modelling study, the behavior based intrusion detection system study and the

audit trail analysis will be done. It is hoped that this will answer the question of how to

represent the normal functioning of a computer system with an abstract mathematical model.

Threat Modelling

Differential equations of the normal usage model will be investigated to know the extent to

which deviations from the normal usage models can be analyzed. An abstract mathematical

model of these deviations will be formulated. These abstract models are derivatives of the

normal usage model.

Boolean Calculus

A study into representing the normal usage model with a Boolean function will be done. It is

hoped that analyzing these Boolean functions will aid in building hardware that implements the expected

usage system. Differential equations of these Boolean functions will be studied to analyze

changes in the system that indicate deviation from the normal usage model.
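One standard tool from the calculus of Boolean functions that fits this description is the Boolean difference: the derivative of a Boolean function $f$ with respect to one input is the exclusive OR of $f$ evaluated with that input false and with it true, and it is true exactly when changing that input changes the output. The sketch below assumes this is the kind of analysis intended. The two-input usage model and its input names are hypothetical.

```python
def boolean_difference(f, inputs, index):
    """Boolean derivative of f with respect to inputs[index] at the given point.

    Returns True when flipping that single input flips the output of f.
    """
    low = list(inputs)
    low[index] = False
    high = list(inputs)
    high[index] = True
    return f(*low) != f(*high)   # XOR of the two evaluations

# Hypothetical Boolean usage model: usage counts as normal when the user
# is authenticated and the session has not expired.
def normal_usage(authenticated, session_expired):
    return authenticated and not session_expired

# With a live session, the authentication flag decides the outcome.
sensitive_to_auth = boolean_difference(normal_usage, (True, False), 0)

# For an unauthenticated user, session expiry makes no difference.
sensitive_to_expiry = boolean_difference(normal_usage, (False, False), 1)
print(sensitive_to_auth, sensitive_to_expiry)
```

Inputs with a true Boolean difference are the ones whose change moves the system into or out of the normal usage model, which is where a detector would need to look.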

Experimenting Usage and Threat Models

Programming will be used as a tool to experiment with various usage and threat models. These usage

and threat models are expected to be derived from a computer system. This experiment will lead

to the design and implementation of a normal usage system, a threat detection system and a risk

analysis system. These systems are expected to be components of a mobile security audit

framework.

Computer Usage Survey

A questionnaire for obtaining information about computer and smart phone usage will be

employed. It is expected that this will give an idea about various statistics that make up a

computer or smart phone’s usage. These statistics will be a guideline for sampling experimental

data of a computer system’s usage when experimenting with the usage and threat models.

Threat Detection Systems

It is hoped that an anomaly based threat detection system will be developed to demonstrate the

effectiveness of the research model for modelling system usage and threats. The

effectiveness of the threat detection system developed at preventing threats on a computer

system will also be measured. In this project, the threat detection system that will be developed

is for ecommerce sites.

USAGE PROFILE OF A SYSTEM AND THREATS ASSOCIATED WITH THE SYSTEM

Building a usage profile of a system requires determining an abstract model that represents the

system’s usage appropriately. In this paper, we will determine the usage profile by using

statistical models and machine learning models. The statistical model that will be used is the

moments (mean and standard deviation) model. The machine learning model that will be used

is the hidden Markov model. Additionally, we will be determining the usage profile by trying to

find out the relationship between dependent and independent variables that make up the

system. As such, we will be using regression to determine the relationship between the

dependent and independent variables. A study into the application of differentiation also gives

insight into the kind of threats associated with the system.

MATHEMATICAL MODELLING TECHNIQUES

The mathematical relation that represents the usage of a system can be determined using

regression analysis. Regression analysis is a field of statistics that employs the least squares

method to determine the relationship between a dependent and one or more independent

variables given the data set for these variables. The least squares method determines the

relationship by minimizing the sum of the squared errors between the observed and predicted values. Additionally,

differential equations will be used to model threats in the system given the usage equation.

SIMPLE LINEAR REGRESSION

Simple linear regression involves a dependent variable and a single independent variable. The goal is to find a linear relationship between the two variables. The relationship is typically of the form $y = b_0 + b_1 x$, where $y$ is the dependent variable, $x$ is the independent variable, $b_1$ is the slope of the line and $b_0$ is the y-intercept. The relationship between the dependent and independent variables can be determined using the least squares method. First, the sums of the dependent and independent variables ($\sum y$, $\sum x$) and the sum of their products ($\sum xy$) are determined. Secondly, the sums of the squares of the dependent and independent variables ($\sum y^2$, $\sum x^2$) are determined.

The slope of the best-fit line is calculated as the sample size multiplied by the sum of products of the dependent and independent variables, minus the product of the sums of the dependent and independent variables, all divided by the sample size multiplied by the sum of squares of the independent variable minus the square of the sum of the independent variable. This is given as

$b_1 = \dfrac{n\sum xy - \sum x \sum y}{n\sum x^2 - (\sum x)^2}$

where $n$ is the sample size.

The y-intercept of the best-fit line is calculated as the product of the sum of the dependent variable and the sum of squares of the independent variable, minus the product of the sum of the independent variable and the sum of products of the dependent and independent variables, all divided by the sample size multiplied by the sum of squares of the independent variable minus the square of the sum of the independent variable. This is given as

$b_0 = \dfrac{\sum y \sum x^2 - \sum x \sum xy}{n\sum x^2 - (\sum x)^2}$

where $n$ is the sample size.
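The slope and intercept formulas follow from minimizing the sum of squared errors of the line $y = b_0 + b_1 x$. The short derivation below is a standard one added for reference; the symbol $S$ for the sum of squared errors and the index $i$ over the $n$ samples are notation introduced here.

```latex
\begin{align*}
S(b_0, b_1) &= \sum_{i=1}^{n} (y_i - b_0 - b_1 x_i)^2 \\
\frac{\partial S}{\partial b_0} = 0 &\;\Rightarrow\; n b_0 + b_1 \sum x = \sum y \\
\frac{\partial S}{\partial b_1} = 0 &\;\Rightarrow\; b_0 \sum x + b_1 \sum x^2 = \sum xy \\
\text{Solving the two equations:}\quad
b_1 &= \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - (\sum x)^2}, \qquad
b_0 = \frac{\sum y \sum x^2 - \sum x \sum xy}{n\sum x^2 - (\sum x)^2}
\end{align*}
```

Both coefficients share the same denominator, $n\sum x^2 - (\sum x)^2$, which is zero only when every sample has the same value of $x$.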

Finally, the correlation coefficient of the predictive relationship is calculated as the sample size multiplied by the sum of products of the dependent and independent variables, minus the product of the sums of the dependent and independent variables, all divided by the square root of a product of two factors. The first factor is the sample size multiplied by the sum of squares of the independent variable, minus the square of the sum of the independent variable. The second factor is the sample size multiplied by the sum of squares of the dependent variable, minus the square of the sum of the dependent variable. This is given as

$r = \dfrac{n\sum xy - \sum x \sum y}{\sqrt{\left(n\sum x^2 - (\sum x)^2\right)\left(n\sum y^2 - (\sum y)^2\right)}}$

where $n$ is the sample size.
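The three least squares formulas for the slope, the intercept and the correlation coefficient can be computed directly from the five sums. The sketch below is a plain transcription of those formulas; the function name and the sample data (authentication counts that follow a line with slope 2 and intercept 3 exactly) are assumptions chosen for illustration.

```python
import math

def simple_linear_regression(xs, ys):
    """Return (b0, b1, r) for the least squares line y = b0 + b1 * x."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)

    denom = n * sum_x2 - sum_x ** 2          # shared denominator of b0 and b1
    b1 = (n * sum_xy - sum_x * sum_y) / denom
    b0 = (sum_y * sum_x2 - sum_x * sum_xy) / denom
    r = (n * sum_xy - sum_x * sum_y) / math.sqrt(
        denom * (n * sum_y2 - sum_y ** 2)
    )
    return b0, b1, r

# Hypothetical samples: x is the number of authentications, y the usage.
xs = [1, 2, 3, 4, 5]
ys = [5, 7, 9, 11, 13]
b0, b1, r = simple_linear_regression(xs, ys)
print(b0, b1, r)
```

The function divides by zero when all values of `xs` (or all values of `ys`, for `r`) are equal, so real data should be checked for that case first.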

MULTIPLE LINEAR REGRESSION

Multiple linear regression problems involve a dependent variable and two or more independent

variables. Using the least squares method, the goal is to find the linear relationship between the

variables involved. The relationships are of the form $y = b_0 + b_1 x_1 + b_2 x_2 + \ldots + b_n x_n$, where $n$ is the

number of independent variables, $x_1, x_2, \ldots, x_n$ are the independent variables and $y$ is

the dependent variable.

One way to approach a multiple linear problem is to reduce the multiple linear model to simple linear forms, determining $y = b_0 + b_1 x$ for each independent variable in turn using the least squares method. This reproduces the coefficients of the full model only when the independent variables are uncorrelated with one another; in general the coefficients must be determined jointly by applying the least squares method to all independent variables at once. Either way, the result is the set $b$ made up of $b_1, b_2, \ldots, b_n$, which contains all the regression coefficients associated with the predicted regression function.
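A joint least squares fit of all coefficients can be sketched by building and solving the normal equations. This is a minimal, dependency-free illustration and not a method taken from the paper; the helper names, the use of Gaussian elimination and the sample data (generated from the coefficients 1, 2 and 3) are assumptions.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):                   # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def multiple_linear_regression(rows, ys):
    """Return [b0, b1, ..., bk] minimizing the squared error of y = b0 + sum(bj * xj)."""
    design = [[1.0] + list(row) for row in rows]     # leading 1 carries the intercept
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * y for d, y in zip(design, ys)) for i in range(k)]
    return solve_linear_system(xtx, xty)

# Hypothetical samples of two system variables and the resulting usage.
rows = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 6)]
ys = [9, 8, 19, 18, 29]
coefficients = multiple_linear_regression(rows, ys)
print(coefficients)
```

The normal equations become ill-conditioned when independent variables are nearly collinear, so a numerical library routine would be a safer choice for real audit data.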

NONLINEAR REGRESSION

Nonlinear regression problems involve finding a nonlinear relationship between a dependent

variable and one or more independent variables. Because nonlinear graphs are difficult to

analyze, they can be represented mathematically as linear models before they are analyzed. This

makes it possible to use linear regression techniques to analyze such relationships.

One way to represent a nonlinear relationship with a linear model is to take logs on both sides of the relationship equation. This reduces the nonlinear relationship to a linear one. An example is the equation $y^2 = x^2/(xy)$. To reduce this relationship to a linear relation we take logs on both sides.

The resulting relationship is $2\log y = 2\log x - \log x - \log y$. When simplified, this becomes $\log y = (\log x)/3$. In this form, the $\log y$ term represents the dependent variable and the $\log x$ term represents the independent variable. Let $K = \log y$ and $P = \log x$. It follows that $K = P/3$, which is the linear form of our nonlinear relation.
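The log transformation can be tried numerically by fitting a straight line to the logged data. The sketch below assumes a power-law relationship of the form $y = a x^b$, which includes the example above with $a = 1$ and $b = 1/3$; the function name and the sample points are hypothetical.

```python
import math

def fit_power_law(xs, ys):
    """Fit y = a * x**b by least squares on P = log x and K = log y."""
    ps = [math.log(x) for x in xs]
    ks = [math.log(y) for y in ys]
    n = len(ps)
    sum_p, sum_k = sum(ps), sum(ks)
    sum_pk = sum(p * k for p, k in zip(ps, ks))
    sum_p2 = sum(p * p for p in ps)

    denom = n * sum_p2 - sum_p ** 2
    slope = (n * sum_pk - sum_p * sum_k) / denom          # exponent b
    intercept = (sum_k * sum_p2 - sum_p * sum_pk) / denom  # log a
    return math.exp(intercept), slope

# Points that satisfy y**3 == x, i.e. log y = (log x) / 3.
xs = [1, 8, 27, 64]
ys = [1, 2, 3, 4]
a, b = fit_power_law(xs, ys)
print(a, b)
```

The transformation only works when every `x` and `y` is positive, and the fit minimizes the error in log space rather than in the original units.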

SINGLE VARIABLE CALCULUS REVIEW (DIFFERENTIATION)

Assume a system with exactly three major system variables. If sampling each of these variables

helps us to arrive at exactly one micro usage equation of our system that best represents the

behavior or functioning of that feature of our system, then we can use differential equations of

the three micro models to analyze and detect threats. Below are some examples of calculus

basics for our usage profile and threat modelling.

$Y = 2X + 3$ is a linear function that represents our first micro usage model, where $X$ is the number of authentications. $Y = 3X^2 + 2X + 6$ is a quadratic function that represents our second micro usage model, where $X$ is the number of hosts on the system’s wireless network. $Y = 40/X + 5$ is a reciprocal (rational) function that represents our third micro usage model, where $X$ is the number of applications on a host on the system’s wireless network. For each micro usage model, the differential coefficient can be computed using the laws of differentiation given below.

THEOREM 1: $\frac{d}{dx}(C) = 0$, where $C$ is a constant.

THEOREM 2: To compute $\frac{d}{dx} f(X_i, C_i)$, first simplify $f(X_i, C_i)$ into a sum of terms of the form $aX_i^k$. The derivative of the first term is the product of its exponent $k$ and its coefficient $a$, multiplied by the system variable $X_i$ raised to the power $k - 1$, that is, $kaX_i^{k-1}$. Repeating this step for every term of $f(X_i, C_i)$ and adding the results gives the derivative. The final result is the sum of the derivatives of all the terms.

From the calculus basics reviewed above, the corresponding differential coefficients of the

three micro models are 2, 6X + 2, and -40/X^2 respectively. If the average usage and

standard deviations of our micro models are computed, then we can analyze changes in our

system by comparing the values of our usage equations and their derivatives with the

average usage and its corresponding standard deviation. This comparison helps us determine

threats.
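
The term-by-term rule can be sketched in Java, the language used for the experiment later in this paper. This is only an illustration, not the implementation from the appendices: the class name PowerRuleSketch and the choice to store each term as a {coefficient, exponent} pair are assumptions made for the example.

```java
/** Illustrative sketch: a usage equation stored as rows of {coefficient, exponent}. */
final class PowerRuleSketch {

    /** Applies the power rule to each term: c * X^n becomes (n * c) * X^(n - 1). */
    static double[][] differentiate(double[][] terms) {
        double[][] result = new double[terms.length][2];
        for (int i = 0; i < terms.length; i++) {
            double c = terms[i][0];
            double n = terms[i][1];
            if (n == 0.0) {
                // Theorem 1: a constant term differentiates to zero.
                result[i][0] = 0.0;
                result[i][1] = 0.0;
            } else {
                result[i][0] = n * c;
                result[i][1] = n - 1.0;
            }
        }
        return result;
    }

    /** Evaluates the sum of all terms at the given value of X. */
    static double evaluate(double[][] terms, double x) {
        double sum = 0.0;
        for (double[] t : terms) {
            sum += t[0] * Math.pow(x, t[1]);
        }
        return sum;
    }

    public static void main(String[] args) {
        double[][] linear = {{2, 1}, {3, 0}};             // Y = 2X + 3
        double[][] quadratic = {{3, 2}, {2, 1}, {6, 0}};  // Y = 3X^2 + 2X + 6
        double[][] reciprocal = {{40, -1}, {5, 0}};       // Y = 40/X + 5
        double x = 2.0;
        System.out.println(evaluate(differentiate(linear), x));     // 2.0
        System.out.println(evaluate(differentiate(quadratic), x));  // 14.0
        System.out.println(evaluate(differentiate(reciprocal), x)); // -10.0
    }
}
```

At X = 2 the three derivatives 2, 6X + 2 and -40/X^2 evaluate to 2, 14 and -10, which is what the sketch prints.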

INTEGRATION REVIEW

Assume the functions y = 3, y = 4X + 2 and y = 9X^2 + 3 are threat model equations. Based on these

three functions we will do an introductory review of integration, the branch of calculus

that reverses differentiation. The integrals of the functions are

respectively 3X + C, 2X^2 + 2X + C and 3X^3 + 3X + C, where C is the constant of integration,

treated here as a system constant. Computing the integral can be tricky, so two rules are defined

below to aid quick computation of the integrals of such functions.

THEOREM 3:

If a function is a constant such as a rational number, the integral is the product

of the variable x and that constant, plus a system constant c to be

determined from a known pair of x and y values.

THEOREM 4:

If a function is not a constant, the integral is computed term by term. For each term of the form

a * x^n, divide the constant a by the exponent plus one and raise x to the exponent plus one,

which gives (a/(n+1)) * x^(n+1). This rule applies when n is not equal to -1. The integral of the

function is the sum of these results over every term, plus the corresponding system constant c.
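
A minimal Java sketch of Theorems 3 and 4 follows, under the same assumption as before that a function is stored as {coefficient, exponent} pairs. The class name TermIntegralSketch and the helper solveConstant are hypothetical names introduced for illustration.

```java
/** Illustrative sketch: term-by-term integration of a sum of power terms. */
final class TermIntegralSketch {

    /** Each term c * x^n becomes (c / (n + 1)) * x^(n + 1); n = -1 is rejected. */
    static double[][] integrate(double[][] terms) {
        double[][] result = new double[terms.length][2];
        for (int i = 0; i < terms.length; i++) {
            double c = terms[i][0];
            double n = terms[i][1];
            if (n == -1.0) {
                throw new IllegalArgumentException("x^-1 integrates to a logarithm, not a power term");
            }
            result[i][0] = c / (n + 1.0);
            result[i][1] = n + 1.0;
        }
        return result;
    }

    /** Evaluates the integrated terms at x and adds the system constant. */
    static double evaluate(double[][] integral, double x, double systemConstant) {
        double sum = systemConstant;
        for (double[] t : integral) {
            sum += t[0] * Math.pow(x, t[1]);
        }
        return sum;
    }

    /** Recovers the system constant from one known (x, y) pair. */
    static double solveConstant(double[][] integral, double x, double y) {
        return y - evaluate(integral, x, 0.0);
    }

    public static void main(String[] args) {
        double[][] threatModel = {{9, 2}, {3, 0}};       // y = 9X^2 + 3
        double[][] integral = integrate(threatModel);    // 3X^3 + 3X
        double c = solveConstant(integral, 1.0, 10.0);   // assumed known point (1, 10)
        System.out.println(evaluate(integral, 2.0, c));
    }
}
```

The known point (1, 10) used to fix the constant is made up for the example; in practice it would come from a sampled pair of values.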

BUILDING THE USAGE PROFILE

To build a usage profile, we use a mathematical model that captures the behavior of the system

and a Markov chain model that captures various states and transitions in the system. The

mathematical model is made up of a usage equation, composed of a dependent variable and independent

variables, and a statistical model that captures average usage and its standard deviation. The

usage equation of the system can be summarized as Y = f(Xi, Ci), where Y is our system’s usage,

Xi are the various independent variables that constitute the normal usage or behavior of the

system, and Ci are the system constants.

In order to determine the usage equation, it is essential to keep the method simple and

the variables simple in abstraction and minimal in quantity. This makes it easy to appropriately

represent the system’s usage with a usage equation. Then we use regression to determine the

mathematical equation that describes the usage of the system. In this paper, we break down the

usage of a system into various micro usage models that describe smaller parts of the system.

Once we have determined the usage equations of these micro usage models and their

associated mean and standard deviation models, we have built the equation part of the usage

profile of the system. What remains is to study the various states and state transitions

in the system, which are also part of the usage profile.

AUTHENTICATION USAGE MODEL

The authentication usage model represents the usage of an authentication system. The

independent variables that must be sampled to determine the usage of an authentication system

are the average data transmitted during an authentication (x1) and the average network speed

for a single authentication (x2). The average data transmitted is the average of request and

response data for a single authentication and the average network speed is the average upload

and download speed for a single authentication. The dependent variable that must be sampled

is the time taken for an authentication (y).

The goal of modelling the dependent and independent variables is to arrive at a

mathematical relationship between y and the two independent variables x1 and x2. It is

expected that the relationship will be Y = c1(x2/x1) + c2, where c1 and c2 are system constants. In

addition to that, some system constants that will aid threat analysis must be determined. These

are the total number of valid authentications, the expected authentications within a time frame,

the minimum authentications within a time frame and the maximum authentications within a

time frame.

The mathematical relationship between y, x1 and x2 is the normal usage model of the

authentication system. After this relationship has been determined, various occurrences that

deviate from this relationship can be used to analyze threats. For instance, an occurrence whose

usage differs from the average usage is a candidate threat, and any occurrence that indicates a change

outside an acceptable threshold is a threat. The acceptable threshold is a range within which

changes in the system are deemed normal. The range runs from the average usage minus its

standard deviation to the average usage plus its standard deviation.
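
The threshold test can be sketched as follows. The class name UsageBandSketch is hypothetical, and the use of the population standard deviation (dividing by n rather than n - 1) is an assumption, since the paper does not say which form is intended.

```java
/** Illustrative sketch: flag usage values that fall outside mean +/- one standard deviation. */
final class UsageBandSketch {

    static double mean(double[] samples) {
        double sum = 0.0;
        for (double s : samples) {
            sum += s;
        }
        return sum / samples.length;
    }

    /** Population standard deviation (divides by n); an assumption for this sketch. */
    static double standardDeviation(double[] samples) {
        double m = mean(samples);
        double squared = 0.0;
        for (double s : samples) {
            squared += (s - m) * (s - m);
        }
        return Math.sqrt(squared / samples.length);
    }

    /** True when the usage value lies below mean - sd or above mean + sd. */
    static boolean isThreat(double usage, double mean, double sd) {
        return usage < mean - sd || usage > mean + sd;
    }

    public static void main(String[] args) {
        // Made-up authentication times in seconds, used only to exercise the sketch.
        double[] times = {2, 4, 4, 4, 5, 5, 7, 9};
        double m = mean(times);
        double sd = standardDeviation(times);
        System.out.println(isThreat(8.0, m, sd));
        System.out.println(isThreat(6.5, m, sd));
    }
}
```

For the sample above the mean is 5 and the standard deviation is 2, so the acceptable range is 3 to 7.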

SESSION USAGE MODEL

A session usage model represents a single user’s behavior before his session expires. To

determine the mathematical model for a user’s session, two main independent variables must

be sampled. These are size of session data accumulated (x1), and number of user actions (x2).

The dependent variable that must be sampled is time spent before session expires (y). The

session usage model is expected to be made up of two micro usage models. The mathematical

representations of the micro usage models are expected to be Y = c1x1 + c2 and Y = c3x2 + c4,

where c1, c2, c3 and c4 are system constants.

In addition to the two mathematical functions, some system constants that will aid threat

analysis must be determined. These include average user actions, average size of data

accumulated, and average time spent. These constants can be determined from the data set used to

determine the usage model.

The two mathematical relationships represent the session usage model. Both are linear

functions. It is expected that as user actions increase, the time spent also increases. It is also

expected that as the data accumulated increases, the time spent also increases.
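
A linear micro usage model of this kind can be estimated by ordinary least squares. The sketch below assumes that method; the class name SimpleRegressionSketch and the sample values are made up for illustration and do not come from the experiment.

```java
/** Illustrative sketch: fit Y = slope * x + intercept by ordinary least squares. */
final class SimpleRegressionSketch {

    /** Returns {slope, intercept} for the sampled pairs (x[i], y[i]). */
    static double[] fit(double[] x, double[] y) {
        if (x.length != y.length || x.length < 2) {
            throw new IllegalArgumentException("need at least two paired samples");
        }
        int n = x.length;
        double meanX = 0.0;
        double meanY = 0.0;
        for (int i = 0; i < n; i++) {
            meanX += x[i] / n;
            meanY += y[i] / n;
        }
        double sxy = 0.0;
        double sxx = 0.0;
        for (int i = 0; i < n; i++) {
            sxy += (x[i] - meanX) * (y[i] - meanY);
            sxx += (x[i] - meanX) * (x[i] - meanX);
        }
        if (sxx == 0.0) {
            throw new IllegalArgumentException("x values must not all be equal");
        }
        double slope = sxy / sxx;
        double intercept = meanY - slope * meanX;
        return new double[] {slope, intercept};
    }

    public static void main(String[] args) {
        double[] userActions = {1, 2, 3, 4};   // hypothetical x2 samples
        double[] timeSpent = {5, 7, 9, 11};    // hypothetical y samples
        double[] model = fit(userActions, timeSpent);
        System.out.println("slope=" + model[0] + " intercept=" + model[1]);
    }
}
```

A positive fitted slope corresponds to the expectation that time spent grows with user actions or with accumulated session data.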

MEMORY USAGE MODEL

The memory usage model represents the usage of memory space in a system. The independent

variables that must be sampled are number of application programs running (x1), and the

number of system processes running (x2). The dependent variable that must be sampled is the

amount of memory space being used (y). The mathematical relationship between x1, x2, and y is

expected to be y=c1x1+c2x2+c3 where c1 is the average memory space for programs, c2 is the

average memory space for processes and c3 is the average memory being used when no process

or program is running.

In addition to these, some system constants that aid threat analysis must be determined.

These include the minimum and maximum memory space for programs and the minimum and

maximum memory space for processes. The mathematical relationship between x1, x2, and y is

the memory usage model. When determined, the memory usage model can be used to analyze

changes in the memory usage that indicate threats in the system.

CPU USAGE MODEL

The CPU usage model represents CPU usage in a system. The independent variables that must

be sampled are the number of application programs running (x1), and number of system

processes running (x2). The dependent variable that must be sampled is amount of CPU power

being used (y). The mathematical relationship between x1, x2, and y is expected to be

y=c1x1+c2x2+c3 where c1 is the average CPU power being used for programs, c2 is the average

CPU power being used for processes and c3 is average CPU power being used when no process

or program is running. In addition to these, some system constants that aid threat analysis must

be determined. These include the minimum and maximum CPU power for programs and the

minimum and maximum CPU power for processes. The mathematical relationship between x1,

x2 and y is the CPU usage model. When determined, the CPU usage model can be used to analyze

changes in the CPU usage that indicate threats in the system.

PROGRAM USAGE MODEL

To determine the program usage model, the dependent and independent variables that must be

sampled are the time spent using the program (y) and the number of functions used (x). In addition,

two constants must also be determined: the minimum and the maximum number of

functions used. The relationship between y and x determined after sampling various x and y

values is the program usage model denoted by y=f(x).

HOST USAGE MODEL

The host usage model is composed of four independent variables: memory usage (x1), session

usage (x2), CPU usage (x3), and program usage (x4), derived from their respective usage models.

The dependent variable that must be sampled is the time spent on the host (y). Any relationship

determined between the dependent and the independent variables is the host usage model. The

resulting host usage model is denoted y = f(x1, x2, x3, x4).

BATTERY USAGE MODEL

The battery usage model is made up of the average CPU usage, the average memory usage and

the average session usage in the system. These are the independent variables.

The dependent variable is the battery lifespan. The independent variables are derived from their

respective micro usage models.

DEVICE USAGE MODEL

The device usage model is made up of a battery usage model, a host usage model, and the time

spent on the device. The usage models that make up the device usage model compute the

average micro usage and try to relate that with the time spent on the device. The time spent on

the device is the dependent variable.

SERVER USAGE MODEL

The server usage model is made up of the CPU time being used, the memory space being used

and the number of processes running. These variables are used to form two different micro usage

models. As such, there are two dependent variables, CPU time and memory space. The

independent variable for both micro usage models is the number of processes running.

PORT USAGE MODEL

The port usage model is made up of the time elapsed during communication, number of

programs that use the port and the number of paired ports. The number of paired ports is the

dependent variable and the remaining variables are the independent variables.

NETWORK USAGE MODEL

The network usage model is made up of average port usage, average server usage, average host

usage, the average size of data transmitted on the network, and time spent on the network. The

first three variables are the independent variables. The remaining two are the dependent

variables. As such, two micro usage models make up the network usage model.

AGGRESSIVE USAGE DETECTOR

This model is a utility that detects aggressive behavior on a system. It is modelled just like the

various micro usage models. Various factors that determine aggressive behavior during system

usage are used to determine the mathematical representation of this utility. Aggressive behavior

includes aggressive use of major system resources, and aggressive use of system components

with limited resources.

The average aggressive behavior and its standard deviation are determined. Any system

occurrence that matches the average aggressive behavior, or that lies within one standard

deviation above or below that average, is considered a threat and must be halted, reported

with an alert, or stored for audit purposes.

FALSE ALARM DETECTOR

The false alarm detector is a utility that detects normal system usage that otherwise may be seen

as a threat. Occurrences that meet the criteria for false alarms are normal usage that seems to put

the entire usage of the system into a false state of vibration or anarchy. Such usage occurrences

are therefore prioritized as normal optimal usage. The remedy for the vibrations that such usage

occurrences cause is to delay other normal usage occurrences in the system.

The state and magnitude of other system occurrences plus the state and magnitude of

the normal optimal usage determine the impact of the perceived anarchy. To make the system

for which this utility is developed more convenient to use, the average delay time

and its standard deviation must be determined. This utility is part of the normal usage. The utility is

modelled just like the aggressive usage detector.

SPECIAL PARAMETERS OF THE USAGE PROFILE

This section discusses special parameters of our usage profile. These parameters include the

average usage, the usage standard deviation, the minimum usage, the maximum usage and the

most frequent usage value recorded.

The average usage is the predicted average usage after the normal usage model function

has been determined. The usage standard deviation is the standard deviation of the predicted

normal usage function. The minimum and maximum usage values are the minimum and

maximum usage predicted using the usage equation. These parameters together with usage

rates, threat model constants and other usage constants are used in analyzing and detecting

threats.

BUILDING THE THREAT PROFILE OF THE SYSTEM

To build a threat profile of a system we use the derivatives of the usage profile equations. When we

study these derivatives, we arrive at occurrences in the system that

deviate from our usage profile. The differential coefficient of the usage profile is known as a

threat model. Threats in the system occur as a result of changes in the usage profile that are

beyond a certain acceptable threshold called the standard deviation of the usage profile. A threat

model on the other hand is an abstract representation of this change in our system that is beyond

the acceptable threshold. In addition to that, any state of the system that is unusual or any state

transition in the system that is unusual is also a threat. We will look at how to detect and prevent

such unusual states and transitions in the system using hidden Markov models.

Integration can be performed on a threat model to determine the source of the threat.

Integration is a reverse operation for differentiation in calculus. A threat model that can perform

integration operations can be called a novel self integrating data structure. The sections that

follow will look at how to analyze and prevent threats using the usage profile. Also, how to

determine the sources of these threats using a novel self integrating threat model will be

discussed.

THREAT ANALYSIS AND DETECTION

To do threat analysis in a system and abort processes that initiated those threats, linear and

nonlinear programming techniques can be used. The goal here is to minimize the threat

occurrence frequency and the overall impacts associated with the threat and optimize the usage

equation. In addition to these two goals, there are some constants that aid threat analysis. These

constants are associated with the usage profile and the threats in the system.

Examples of these constants may be the rate at which usage is increasing with respect to

a particular usage variable or the rate at which the threat impact and frequency increase with

respect to a particular variable in the usage profile and other special parameters associated with

the usage profile equation.

The average usage, its standard deviation and the threat model equation make up the

threat model. The average usage and standard deviation are constants in the threat model. Using

the threat model equation, the average usage and standard deviation, threat analysis can be

done using linear and nonlinear programming. The goal is to minimize threats using the threat

model equation as the objective function and the average usage and standard deviation as

constraints. Other parameters that may be used as constraints include the rate at which usage

is increasing with respect to a particular usage variable or the rate at which the threat impact

and frequency are increasing with respect to a particular usage variable.

THREAT PREDICTION

This section discusses how to predict threats in a system. The network usage model discussed in

this chapter and its associated threat model will be used to demonstrate how to predict or detect

a threat in a system. As discussed in the previous section, threats can be detected using linear and

nonlinear programming. The network usage model equation and its associated threat model

equation are the objective functions.

The constraints that will be used are the average network usage and its standard

deviation, and other parameters such as the rate at which the network threat increases with

respect to other network usage profile components such as average host usage, average server

usage, average port usage, average time the network operates, average data transmitted on the

network. The goal of the linear or nonlinear programming is to optimize the usage such that

usage is within the range of the average usage minus its standard deviation and the average

usage plus its standard deviation. These are the lower and upper bounds of our objective

function. Every combination of system variables whose usage is within this usage range

minimizes threat in the system.

Since the average port, host and server usage are derived from their corresponding usage

profile models, the linear and nonlinear programming analysis will be done independently for

each of them. When a threat is predicted in a system, the chance of it being accurate depends

on the usage value at that instant and whether it is within the range of acceptable usage.

This range is constructed using the average usage and its standard deviation. Any usage value that is

less than the average usage minus its standard deviation is a threat. Also, a usage value that is

greater than the average usage plus its standard deviation is a threat. That means that any

predicted threat at a point where the predicted usage is within the usage range has a high chance

of being false.

In addition to that, the actual and predicted usage values can be used to determine the

chance that the predicted threat is accurate. If the difference between them is high, there is a

chance that the predicted usage may be wrong. Since the predicted usage and the threat models

are derived from the usage profile equation, there is a chance the predicted threat is also false.

Finally, the closer the correlation coefficient of the usage profile equation is to zero, the higher

the chance that the predicted usage and its associated threat values are wrong. In this paper, usage

model functions with a correlation coefficient of 0.6 and above are taken to indicate that the

predicted usage values and predicted threat values are reasonably accurate. These values are

obtained from the usage model function and the threat model function respectively, which are

modeled using the relevant system variables that make it possible to model system usage and

system threats.
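
The correlation check can be sketched with the Pearson correlation coefficient between actual and predicted usage values. The class name CorrelationSketch is hypothetical, and comparing the magnitude of the coefficient against the 0.6 cutoff (so that a strong negative relationship also passes) is an assumption layered on top of the text.

```java
/** Illustrative sketch: Pearson correlation and the 0.6 cutoff used in this paper. */
final class CorrelationSketch {

    /** Pearson correlation coefficient between two equally sized samples. */
    static double pearson(double[] a, double[] b) {
        if (a.length != b.length || a.length < 2) {
            throw new IllegalArgumentException("need at least two paired samples");
        }
        int n = a.length;
        double meanA = 0.0;
        double meanB = 0.0;
        for (int i = 0; i < n; i++) {
            meanA += a[i] / n;
            meanB += b[i] / n;
        }
        double sab = 0.0;
        double saa = 0.0;
        double sbb = 0.0;
        for (int i = 0; i < n; i++) {
            sab += (a[i] - meanA) * (b[i] - meanB);
            saa += (a[i] - meanA) * (a[i] - meanA);
            sbb += (b[i] - meanB) * (b[i] - meanB);
        }
        return sab / Math.sqrt(saa * sbb);
    }

    /** Assumption: the magnitude of r is compared against the cutoff. */
    static boolean isTrusted(double r, double cutoff) {
        return Math.abs(r) >= cutoff;
    }

    public static void main(String[] args) {
        double[] actual = {1, 2, 3, 4};      // made-up actual usage values
        double[] predicted = {2, 4, 6, 8};   // made-up predicted usage values
        double r = pearson(actual, predicted);
        System.out.println(r + " trusted=" + isTrusted(r, 0.6));
    }
}
```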

RISK ANALYSIS IN A SYSTEM

To do risk analysis in a system, the frequency at which threats in the system occur and the impact

they have on the system must be known. When a frequency table is constructed for all threats

and their associated impacts are stored, it becomes easy to analyze risks associated with a system.

When a threat is predicted, the likelihood of the threat occurring in the system can be computed

using the threat frequencies.

The impacts various threats have can also be determined based on the types of threats

and other parameters such as the number of such threats, the speed at which they occurred and

the resources they affected or damaged. Risk in a system is computed as the product of the

likelihood of threat occurrence and the impact that threat occurrence has on the system. These

concepts are the basics for developing a risk analysis system using the techniques we have

discussed so far.
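
The risk computation can be sketched as below. The class name RiskSketch, the threat names and the impact scale are all hypothetical; the likelihood is assumed to be the relative frequency of a threat in the frequency table, which is one plausible reading of the text.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative sketch: risk = likelihood of a threat * impact of that threat. */
final class RiskSketch {

    /** Relative frequency of one threat type in the frequency table. */
    static double likelihood(Map<String, Integer> frequencies, String threat) {
        int total = 0;
        for (int count : frequencies.values()) {
            total += count;
        }
        if (total == 0) {
            return 0.0;
        }
        return frequencies.getOrDefault(threat, 0) / (double) total;
    }

    /** Risk as the product of likelihood and impact. */
    static double risk(double likelihood, double impact) {
        return likelihood * impact;
    }

    public static void main(String[] args) {
        // Made-up frequency table; the threat names are placeholders.
        Map<String, Integer> frequencies = new LinkedHashMap<>();
        frequencies.put("port scan", 6);
        frequencies.put("brute force", 3);
        frequencies.put("malware", 1);
        double p = likelihood(frequencies, "brute force");
        System.out.println(risk(p, 10.0)); // impact on an assumed 0 to 10 scale
    }
}
```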

PROPERTIES AND METHODS OF THE NOVEL SELF INTEGRATING STRUCTURE

Useful properties or characteristics of the data structure that represents our threat model

include, to mention a few, the names of network software or host application software, the version

numbers of network and host software, license information that includes the date the software was

purchased or released and the number of years until renewal, and the IP address and MAC address of

a host on a network.

The methods of such a large simulation object may include a method for computing

the integral of a threat model, another for computing the differential coefficient of the predictive

usage profile equation, and a method for computing the differential equation of a network or host

threat model. These are mostly the methods needed for performing the major

calculus operations that support the calculus simulation on a network to detect threats

and their sources on a wireless network. Besides these, it may be necessary to implement

methods that retrieve hidden network identity such as IP and MAC addresses on a local area network.

INTERPRETATION OF THREAT MODEL INTEGRALS

Since the novel self integrating data structure is a programmed threat model, it is important to

discuss the meaning of its integrals. The integrals represent the source of the original threat.

For example, integrating the threat model may identify the function, software, host

or network from which the threat originated. With properties like software name, version

number, IP and MAC addresses it becomes easier to pinpoint the source of the threat.

If the integral of a threat model looks like the usage profile equation of a function of the

system under examination, then that function from the system under examination can be

predicted as the source of the threat. Similarly, if the integral is similar to the usage profile

equation of a software, host, or network that forms part of the system which is being

investigated, then that threat can be predicted to be from that software, host or network.

APPLICATION OF HIDDEN MARKOV MODELS FOR STUDYING VARIOUS STATES IN THE SYSTEM

AND THREATS THAT CAN BE DETECTED

Hidden Markov models are machine learning models that are used to model states in a system,

the sequence in which they occur and the associated probabilities for each state transition.

When a system has a set of states in which it usually falls and it can be established

that each new state depends on the previous state, then hidden Markov models can be

used to learn the state transitions that usually happen in the system. It must be stated that the

sequence in which states occur in a system can be characterized by a parametric random process.

Also, the probability associated with each state transition is assumed not to depend on the time

at which the transition occurs in the system.

For computer systems which have occurrences that happen based on a parametric

random process, these occurrences can be seen as the set of states in the system. Some of these

occurrences may be the point at which the system is at its optimal, maximum or minimum usage,

or the point at which the system attains average usage. It must be stated that, if the various

states of the system’s usage are determined from the usage profile which is made up of the usage

equation and the mean and standard deviation model, then any state that is unusual is seen as

a threat. The usage states can be the minimum, average and maximum usage. Also the various

state transitions from one of these states to another can be determined. As such, any state which

is less than the minimum usage or greater than the maximum usage can be seen as an unusual

behavior. Also any transition from one of these states to another which is not captured as a

normal state transition can be seen as a threat.
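
The idea of flagging state transitions that were not captured as normal can be sketched as follows. Because the usage states here (minimum, average, maximum) are observed directly, the sketch estimates a plain Markov chain transition matrix from counts; a full hidden Markov model would add hidden states and emission probabilities. The class name TransitionSketch, the state encoding and the probability cutoff are assumptions.

```java
/** Illustrative sketch: estimate transition probabilities and flag unusual transitions. */
final class TransitionSketch {

    /** Returns p[i][j], the observed fraction of moves from state i that went to state j. */
    static double[][] estimate(int[] sequence, int stateCount) {
        double[][] p = new double[stateCount][stateCount];
        for (int i = 0; i + 1 < sequence.length; i++) {
            p[sequence[i]][sequence[i + 1]] += 1.0;
        }
        for (int from = 0; from < stateCount; from++) {
            double rowTotal = 0.0;
            for (int to = 0; to < stateCount; to++) {
                rowTotal += p[from][to];
            }
            if (rowTotal > 0.0) {
                for (int to = 0; to < stateCount; to++) {
                    p[from][to] /= rowTotal;
                }
            }
        }
        return p;
    }

    /** A transition is treated as unusual when its learned probability is below the cutoff. */
    static boolean isUnusual(double[][] p, int from, int to, double minProbability) {
        return p[from][to] < minProbability;
    }

    public static void main(String[] args) {
        // Assumed encoding: 0 = minimum usage, 1 = average usage, 2 = maximum usage.
        int[] learned = {1, 1, 2, 1, 0, 1, 1, 2, 1};
        double[][] p = estimate(learned, 3);
        System.out.println(isUnusual(p, 0, 2, 0.05)); // minimum -> maximum was never observed
        System.out.println(isUnusual(p, 2, 1, 0.05)); // maximum -> average was observed
    }
}
```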

Additionally, when the set of threat types that happen in the system is determined, it

becomes possible to study the sequence in which these threats occur in the system and the

various transitions between the threats using hidden Markov models. Also, the various usage

points, including the optimal, the minimum and the average usage, and how the system moves

between them can be studied using hidden Markov models. Because various occurrences and

threats can be studied using hidden Markov models, it becomes possible to predict the next

occurrence or threat that will happen on a host or a computer network.

Threat sources can also be predicted using threat models. When threat models are

integrated, they give a general idea about the source of the threat. With such knowledge and

ability, the next threat or occurrence that has a higher likelihood of happening on a host or

network can be predicted by applying hidden Markov models. As such, occurrences can

be prevented if they are estimated to be disastrous. Also, if for instance, for some reason the

optimal or minimal usage must be reached, it becomes possible to study ways of optimizing the

transition from the current state or predicted next state to the required state. This makes it

possible to move from a particular usage point to the desired usage point.

This approach to threat detection and usage optimization makes it possible to build

anomaly based intrusion detection systems that are correct, prompt and increase optimal use of

the system. The anomaly based intrusion detection systems built using these techniques are

correct because the threat models come from usage models that are built using similar

approaches and the threat prediction and prevention mechanisms are designed using robust

techniques developed using these approaches. Also, there are likely to be fewer false

alarms since the threats predicted on hosts or networks come from threat models designed from

such robust methods.

An example of a kind of cyber security threat that this approach can be used to model is

a network problem where a student is determined or predicted to be sending threatening or

socially unacceptable emails to colleagues. Typically, his identity is hidden on the network on

which he sends the emails. As such, it is difficult to determine the likelihood that he will send

such threatening emails on a particular day or hour so that he can be identified and

held accountable. Using hidden Markov models, a usage profile of the email system could be

developed. This may make it possible to estimate the day or hour in which he is likely to

send such threatening emails so that his identity can be found and the problem solved.

EXPERIMENTING WITH THE USAGE PROFILE AND THE THREATS ASSOCIATED WITH IT

In this chapter, we discuss the experiment that was conducted to determine the usage of a

computer system. We also discuss how to simulate the threat and usage models with the hope

of developing a threat detection system. The experiment was conducted by implementing the

usage models in Java. The implementation uses an interface that captures what makes up a usage

model. Each micro usage model implements the interface. The interface is given in Appendix B.

Additionally, a multithreading object was implemented for learning the system’s usage,

monitoring the system, and determining the relationship that describes the usage model. This

object is given in Appendix A.

The interface is made up of eight functions: computeval() for computing the usage value

based on system variables, findchange() for finding any change in the usage model,

learnsys(int t) for learning the system within a time frame, findrelationship() for determining

the relationship that represents the usage model, monitor(int t) for monitoring the system,

showalarm(String info) for displaying alarms, haltprocess() for halting system processes that

are threats, and predictvals() for predicting system usage values.
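
The shape of such an interface might look like the sketch below. The eight method names and the parameters of learnsys, monitor and showalarm come from the description above; the interface name UsageModel and every return type are assumptions, since Appendix B is not reproduced in this section.

```java
/** Illustrative sketch of the usage model interface; return types are assumed. */
interface UsageModel {

    /** Computes the usage value from the current system variables. */
    double computeval();

    /** Reports whether a change in the usage model has been found. */
    boolean findchange();

    /** Learns the system for t seconds by capturing samples. */
    void learnsys(int t);

    /** Determines the relationship that represents the usage model. */
    String findrelationship();

    /** Monitors the system for t seconds. */
    void monitor(int t);

    /** Displays an alarm carrying the given information. */
    void showalarm(String info);

    /** Halts system processes that are identified as threats. */
    void haltprocess();

    /** Predicts system usage values from the learned relationship. */
    double[] predictvals();
}
```

Each micro usage model, such as the authentication or memory usage model, would then be a class that implements this interface.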

The multithreading object implements the run method of the Thread class. The run method

implemented performs three main functions. It calls the learnsys method of the usage model

when the system’s usage needs to be learnt. It also calls the findrelationship method of the usage

model when the system usage profile equation needs to be determined. It also calls the monitor

method of the usage model when the system has to be monitored for various activities in the

system.

During the experiment the various independent and dependent variables of the usage models

were sampled. Each sample was captured for the particular usage model using random

numbers. The samples were captured within a time frame. The time was in seconds and refers

to the t variable that the learnsys method of the usage model takes. Each second a sample is

captured for each usage model. Then after the samples have been captured, the relationship

that describes the usage model for the system was determined using the mathematical

modelling techniques reviewed in the previous chapter.

Also, after the usage profile had been built by learning the system and capturing the samples that

help determine the various usage models, the system was monitored within a time frame. The

time was set in seconds. This time refers to the t variable that the monitor method of the usage

model takes. This was used to analyze the system in order to have an idea about the activities

going on in the system. Based on the usage profile built, activities that deviate from the

usage profile are flagged as incidents that indicate a threat in the system.

Additionally, during the experiment processes that are independent were grouped first and

executed together. Processes that depended on other processes had to wait for those processes

to finish first. This was important because it makes it possible to determine the usage

model of micro usage models that form part of other usage models first. Later these micro usage

models can be used to determine the usage model of the parts of the system that depend on the

micro usage models. This can be seen in the main method in Appendix C.

It must be stated that because the usage model for authentications was determined to be a

rational function, it must be reduced to a linear form as part of the experiment. The original

function is Y = c1(x2/x1) + c2. Because the logarithm of a sum does not split into a sum of

logarithms, logs are taken after moving c2 to the left-hand side, which gives

log (Y - c2) = log c1 + log x2 - log x1. Since log c1 is a constant, let us denote it by k.

Additionally, let B = log (Y - c2), let j1 = log x1 and let j2 = log x2. Therefore, the linear form of

the usage for authentication is B = j2 - j1 + k, where B is the dependent variable and j2 and j1

are the independent variables. When c2 is not known in advance, the substitution z = x2/x1 gives

Y = c1 z + c2, which is already linear in z. When the transformed variables are sampled,

Y = c1(x2/x1) + c2 can be determined.

The cpu and the memory usage models, on the other hand, are multiple linear relations. The original
relations are of the form y = c1x1 + c2x2 + c3, where x1 and x2 are the independent variables. The
original relations must be reduced to their simple linear form. To do this, determine y = b0 + bx for
each independent variable. The sum of the various b0 equals c3. Each b corresponds to the
coefficient of the independent variable for which y = b0 + bx was determined. For
example, the b of the y = b0 + bx determined for x1 equals c1 and that for x2 equals c2. When
x1, x2, and y are sampled and the various y = b0 + bx determined, y = c1x1 + c2x2 + c3 can be
determined completely.
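The simple fit y = b0 + bx used for each independent variable can be sketched with ordinary least squares. The class SimpleLinearFit is a hypothetical illustration and is not the usage_modeller class of the simulation; it assumes at least two samples with distinct x values.

```java
// Hypothetical helper, not part of the paper's appendices.
final class SimpleLinearFit {
    final double b0; // intercept
    final double b;  // slope

    // Fits y = b0 + b*x by least squares over the paired samples.
    SimpleLinearFit(double[] x, double[] y) {
        int n = x.length;
        double meanX = 0;
        double meanY = 0;
        for (int i = 0; i < n; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;

        double sxy = 0; // sum of (x - meanX)(y - meanY)
        double sxx = 0; // sum of (x - meanX)^2
        for (int i = 0; i < n; i++) {
            sxy += (x[i] - meanX) * (y[i] - meanY);
            sxx += (x[i] - meanX) * (x[i] - meanX);
        }
        this.b = sxy / sxx;
        this.b0 = meanY - b * meanX;
    }

    double predict(double x) {
        return b0 + b * x;
    }
}
```

One such fit would be built per independent variable from the sampled values of that variable and y.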

CHALLENGES WITH PROJECT

This section of the report mentions the challenges encountered during the research. First of all,
it is important to state that audit data was not easy to obtain, so the system variables
enumerated in this research were chosen by carefully selecting critical aspects of a computer
system that were judged suitable for use.

Secondly, it is important to note that during the simulation of the usage and threat
models, the Multithreading object discussed earlier did not work as expected. As such, system
incidents were captured while learning the system’s usage by a method that
captures the information every second. This method uses the static sleep method of the Thread
class: a while loop that runs for t seconds makes the system wait for a second before
capturing the next set of sample data.
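The sampling loop described above can be sketched as follows. The class PeriodicSampler and the probe parameter are assumptions made for illustration; the interval is a parameter so that the sketch can be tried with a short wait, whereas the experiment waited one second (1000 milliseconds) between samples.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.DoubleSupplier;

// Hypothetical helper, not part of the paper's appendices.
final class PeriodicSampler {
    // Captures t samples from the probe, sleeping intervalMillis between captures.
    static List<Double> sample(DoubleSupplier probe, int t, long intervalMillis) {
        List<Double> samples = new ArrayList<>();
        int timer = 0;
        while (timer < t) {
            samples.add(probe.getAsDouble()); // capture the current reading
            try {
                Thread.sleep(intervalMillis); // static method: pauses the current thread
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // keep the interrupt visible to callers
                break;
            }
            timer++;
        }
        return samples;
    }
}
```

Calling sample(probe, 60, 1000) would correspond to one minute of learning at one sample per second.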

TOOLS AND COMPUTER PACKAGES

This chapter discusses the tools and computer packages that were used throughout this research
project. We will also look at the programming languages, database platforms and development
frameworks that can be used to develop an anomaly-based intrusion detection system for ecommerce
sites using the concepts we have discussed in this paper. The simulation was implemented in
Java as a console-based simulation. Java was chosen for its object-oriented concepts such as
encapsulation, inheritance, interfaces, objects, and polymorphism.

To implement an intrusion detection system using the results of this research, the following
tools will be essential: Bootstrap, CodeIgniter, the MySQL Database Management System, SQLite,
SQLyog, and Eclipse. These tools are well suited for intrusion detection systems developed for
ecommerce sites. The implementation will use PHP and the Android platform. PHP is
for the desktops and laptops that connect to the ecommerce sites and Android is for mobile
phones that use the ecommerce sites.

Bootstrap and CodeIgniter are web development frameworks. Bootstrap is for frontend
development and CodeIgniter is a backend framework for PHP developers. For Android, Eclipse
can be used as the IDE. MySQL and SQLyog are for the database
servers that will run on the ecommerce site as part of the intrusion detection system
implementation. SQLite is for the databases that run on the Android implementations that form
part of the intrusion detection system developed for the ecommerce website.

With all these tools, frameworks and packages, developers are ready to develop
intrusion detection systems for ecommerce sites using the concepts in this research paper. It is
expected that the micro usage models discussed will be integral libraries
implemented in PHP and Android as part of an implementation for ecommerce sites or any group
of web or mobile application systems.

CONCLUSION AND DISCUSSION

To end this discussion, it is worth mentioning that the normal usage models and threat models
experimented with in this paper represent a computer system and its associated threats. These
threats can be analyzed periodically and audited as part of a computer security audit. This will
fuel the development of a risk analysis system. A risk analysis system, a threat detection system and
a normal usage system developed from experimenting with the usage and threat models will make up
a mobile security audit framework that can be used for maintaining cyber security on computer
systems. When practices and processes for maintaining this framework are drafted and adhered
to, it will be easier to maintain cyber security on various computer systems.

Additionally, it can be established that using the differential equations technique, the
novel self-integrating data structure, and the linear and nonlinear programming techniques, threats
on a system can be analyzed and detected. To halt such threats, the intrusion detection system
developed using the techniques stated above must possess certain qualities. These qualities
include correctness, promptness, and ease of use. Correctness means how well the intrusion
detection system can detect threats. This is important because correctness affects the rate at
which predicted threats turn out to be true or false. Promptness relates to the time it takes to
detect or halt a threat, and ease of use relates to how well the intrusion detection system
supports convenient use of the computer system for which it is developed.

The techniques we have discussed make it possible to achieve correctness, promptness
and ease of use. The usage model function with its associated average usage and standard
deviation makes it possible to ensure correctness of the intrusion detection system. This is
because the statistical data sampled for developing the intrusion detection system is within the
range of the acceptable usage. The average usage and standard deviation are computed using
statistical models. One such model used in this research is the moments model, or mean and
standard deviation model. With this statistical model and the usage model equation, it becomes
possible to ensure correctness of the intrusion detection system.

To achieve promptness, multithreading is applied to analyze, predict, detect and halt
threats. All threat alarms and detection must use multithreading. Multithreading is a
programming concept that allows several threads of execution to run on the computer at the same
time. This concept makes it possible to predict multiple threats, perform multiple threat analyses
and halt or raise alarms for multiple threats in a computer system. This makes the threat detection
system prompt. Multithreading may be optimized to halt, prevent or raise alarms for threats with
high magnitude and impact. Prioritizing the detection of such threats also uses multithreading.
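One possible way to prioritize threats of high magnitude is a priority queue that a worker thread drains, highest magnitude first. This is only a sketch under assumptions: the classes ThreatQueueDemo and Threat and the magnitude field are hypothetical, and handling a threat is reduced to recording its name.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.PriorityBlockingQueue;

// Hypothetical demo, not part of the paper's appendices.
final class ThreatQueueDemo {
    static final class Threat {
        final String name;
        final double magnitude; // assumed score: larger means more severe

        Threat(String name, double magnitude) {
            this.name = name;
            this.magnitude = magnitude;
        }
    }

    // Handles the detected threats on a worker thread, most severe first,
    // and returns the names in the order they were handled.
    static List<String> handleInPriorityOrder(List<Threat> detected) {
        PriorityBlockingQueue<Threat> queue = new PriorityBlockingQueue<>(
                11, Comparator.comparingDouble((Threat t) -> t.magnitude).reversed());
        queue.addAll(detected);

        List<String> handled = new ArrayList<>();
        Thread worker = new Thread(() -> {
            Threat next;
            while ((next = queue.poll()) != null) {
                handled.add(next.name); // stand-in for halting or raising an alarm
            }
        }, "threat worker");
        worker.start();
        try {
            worker.join(); // join makes the worker's writes visible to this thread
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return handled;
    }
}
```

In a running detector the queue would be filled by the monitoring threads while the worker keeps draining it, so a severe threat that arrives late is still handled ahead of minor ones that are waiting.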

Ease of use is achieved using the mean and standard deviation model. Without that
model, there is no acceptable range of usage. That is, the average usage and its
standard deviation prevent a rigid usage model and as such make usage convenient. Periodic
audits also ensure that the normal usage function and its mean and standard deviation model
are up to date. Applying machine learning techniques such as data mining also ensures that
the usage model and its associated mean and standard deviation model are up to date with
actual usage of the system.
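One way to keep the mean and standard deviation current as new audit samples arrive is an incremental update. The sketch below uses Welford's method, which is a choice made here for illustration and is not stated in the paper; the class RunningStats is hypothetical.

```java
// Hypothetical helper, not part of the paper's appendices.
final class RunningStats {
    private long n;      // number of samples seen so far
    private double mean; // running mean
    private double m2;   // running sum of squared deviations from the mean

    // Folds one new usage sample into the running mean and deviation (Welford's update).
    void add(double x) {
        n++;
        double delta = x - mean;
        mean += delta / n;
        m2 += delta * (x - mean);
    }

    double mean() {
        return mean;
    }

    // Population standard deviation of the samples seen so far.
    double stdDev() {
        return n > 0 ? Math.sqrt(m2 / n) : 0.0;
    }
}
```

Each audited sample is passed to add, so the acceptable range follows actual usage without storing or re-reading the earlier samples.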

With monitoring mechanisms, intrusion detection systems developed using the normal
usage model and its associated mean and standard deviation model help ensure ease of use. This
is because an administrator monitoring a network or computer system using an intrusion
detection system can prevent relevant threats with a click. The administrator can also use several
configurations of the intrusion detection system to halt or prevent normal threats. All these
mechanisms, together with the mean and standard deviation model of the normal usage, make
the computer or network system being monitored easy to use.

It is expected that using Boolean algebra and the calculus of Boolean functions, the normal
usage model can have a hardware representation. How to implement this hardware
representation is a subject for further research. These
concepts are related to concepts from computer organization and architecture such as logic
gates, multipliers, and the design of arithmetic and logic units, and to concepts from embedded
systems such as the architecture of various embedded system implementations. These architectures
include hardware-only implementations and hardware/software implementations.

The two utilities that compose the usage model are essential for improving convenient
usage and preventing false alarms. This makes the intrusion detection system correct and prompt
at preventing threats. These utilities can be modelled mathematically as linear and quadratic
functions. For instance, the process that causes aggressive usage can be modelled as quadratic
or linear. If the process involves a download or data transfer on a network, then the size of the
data being transferred or downloaded determines whether the mathematical model is linear or
quadratic.

If the size of the data is huge, the function is quadratic. If it is small, the function is linear.
The aggressive behavior and false alarm are analogous to public road transportation. When a car
wants to overtake another on a highway, the car or truck ahead of it must wait for the overtaking
car. This prevents anarchy on the highway. At the same time, overtaking may look like
speeding and may create a false impression of anarchy on the road.

If a process has a high chance of completing quickly and has indicated that it wants to

commence, then it is expected that the state of the system usage does not change. Any other

process that changes the state of the system usage such that the commencement of the new

process leads to vibrations or anarchy is aggressive and must be stopped.

The implication of this research is its application to critical systems such as medical
systems, banking systems and systems for monitoring incidents on a country’s road network.
Usage of banking and medical systems may pose social threats such as theft and increased
mortality rates. To prevent such threats, frameworks for auditing these systems must be used
periodically to secure human life and the savings and investments of citizens and organizations.

Incidents on a country’s road network can lead to prolonged court cases and health and
work hazards. Building incident monitoring systems for such road networks can reduce these
health and work hazards and the prolonged court cases in a country. The concepts discussed in
this paper may be beneficial to the design and implementation of such systems.

APPENDIX A

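import java.util.*;

// Runs one phase (learn, find relationship, or monitor) of a usage model on its own thread.
class LearnSysProcess extends Thread {
    String process_name;
    model usage;
    int time;  // learning time passed to learnsys
    int time2; // monitoring time passed to monitor
    int N;     // phase selector: 1 = learn, 2 = find relationship, 3 = monitor

    LearnSysProcess(String name, model x, int t, int n, int t2) {
        super(name); // name this thread itself so that join() and isAlive() track the running work
        process_name = name;
        usage = x;
        time = t;
        N = n;
        time2 = t2;
    }

    public void run() {
        try {
            switch (N) {
                case 1: // learn system
                    usage.learnsys(time);
                    break;
                case 2: // find relationship
                    usage.findrelationship();
                    break;
                case 3: // monitor system
                    usage.monitor(time2);
                    break;
            }
        } catch (Exception e) {
            e.printStackTrace(); // report failures instead of silently discarding them
        }
    }

    // start() is inherited from Thread. Overriding it to launch a second, inner Thread
    // leaves this object unstarted, so join() returns at once and isAlive() is always false.
}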

APPENDIX B

public interface model {
    public double computeval();
    public double findchange();
    public void learnsys(int t);        // learn the usage for t seconds
    public Object findrelationship();
    public void monitor(int t);         // monitor the system for t seconds
    public void showalarm(String info); // report an incident
    public void haltprocess();
    public void predictvals();
}

APPENDIX C

class normal_usage implements model{
    double usageVal; // a system's usage value at a point in time
    network_usage net_usag; // network behavior
    user_behavior u_bev; // user behavior
    device_usage device_usag; // average behavior of a mobile device on the network
    memory_usage memory_usag;
    cpu_usage cpu_usag;
    program_usage program_usag;
    host_usage host_usag;
    double net_usag_time; // how long the system functions
    usage_modeller math_modeller;

    normal_usage(){
        net_usag=new network_usage();
        u_bev=new user_behavior();
        device_usag=new device_usage();
        memory_usag=new memory_usage();
        cpu_usag=new cpu_usage();
        program_usag=new program_usage();
        host_usag=new host_usage();
        math_modeller=new usage_modeller(4);
    }

    public static void main(String args[]) throws InterruptedException{
        normal_usage usage=new normal_usage();
        int min=1;
        int time=min*60; // learning time in seconds; learnsys sleeps one second per iteration
        LearnSysProcess p1,p2,p3,p4,p5,p6,p7,p8,p9,p10,p11,p12,p13;

        // independent micro usage models are learned together first
        p1=new LearnSysProcess("learn session_usage",usage.u_bev.x1,time,1,0);
        p1.start();
        p2=new LearnSysProcess("learn auth_usage",usage.u_bev.x2,time,1,0);
        p2.start();
        p5=new LearnSysProcess("learn memory_usage",usage.memory_usag,time,1,0);
        p5.start();
        p6=new LearnSysProcess("learn cpu_usage",usage.cpu_usag,time,1,0);
        p6.start();
        p10=new LearnSysProcess("learn server",usage.net_usag.server_usag,time,1,0);
        p10.start();
        p11=new LearnSysProcess("learn port_usage",usage.net_usag.port_usag,time,1,0);
        p11.start();
        p8=new LearnSysProcess("learn program_usage",usage.program_usag,time,1,0);
        p8.start();

        // wait for the independent models before learning the models that depend on them
        p1.join();
        p2.join();
        p5.join();
        p6.join();
        p10.join();
        p11.join();
        p8.join();

        p3=null;
        // check if processes p1 and p2 are complete then
        if(!p1.isAlive()&&!p2.isAlive()){
            p3=new LearnSysProcess("learn user_behavior",usage.u_bev,time,1,0);
            p3.start();
        }
        p4=null;
        p7=null;
        // check if processes p1,p5, and p6 are complete then
        if(!p1.isAlive()&&!p5.isAlive()&&!p6.isAlive()){
            usage.device_usag.battery_usag.set_memory_usage(usage.memory_usag);
            usage.device_usag.battery_usag.set_session_usage(usage.u_bev.x1);
            usage.device_usag.battery_usag.set_cpu_usage(usage.cpu_usag);
            p4=new LearnSysProcess("learn battery_usag",usage.device_usag.battery_usag,time,1,0);
            p4.start();
            p4.join();
            p7=new LearnSysProcess("learn device_usag",usage.device_usag,time,1,0);
            p7.start();
        }
        // p3 and p7 are only created when their prerequisites completed
        if(p3!=null){
            p3.join();
        }
        if(p7!=null){
            p7.join();
        }
        // check if process p5 is complete then
        if(!p5.isAlive()){
            usage.host_usag.set_memory_usage(usage.memory_usag);
        }
        // check if process p6 is complete then
        if(!p6.isAlive()){
            usage.host_usag.set_cpu_usage(usage.cpu_usag);
        }
        // check if process p8 is complete then
        if(!p8.isAlive()){
            usage.host_usag.set_program_usage(usage.program_usag);
        }
        // check if process p1 is complete then
        if(!p1.isAlive()){
            usage.host_usag.set_session_usage(usage.u_bev.x1);
        }
        p9=null;
        // check if processes p1,p5,p6, and p8 are complete then
        if(!p1.isAlive()&&!p5.isAlive()&&!p6.isAlive()&&!p8.isAlive()){
            usage.net_usag.set_host_usage(usage.host_usag);
            p9=new LearnSysProcess("learn host_usage",usage.net_usag.host_usag,time,1,0);
            p9.start();
            p9.join();
        }
        p12=null;
        // check if processes p9,p10 and p11 are complete then
        if(p9!=null&&!p9.isAlive()&&!p10.isAlive()&&!p11.isAlive()){
            p12=new LearnSysProcess("learn network_usage",usage.net_usag,time,1,0);
            p12.start();
            p12.join();
        }
        p13=null;
        // check if processes p9,p10,p11,p1,p2 and p3 are complete then
        if(p9!=null&&p3!=null&&!p9.isAlive()&&!p10.isAlive()&&!p11.isAlive()
                &&!p1.isAlive()&&!p2.isAlive()&&!p3.isAlive()){
            p13=new LearnSysProcess("learn system usage",usage,time,1,0);
            p13.start();
        }

        // queue two samples of four x values and one y value each
        double x_vals[]=new double[4];
        x_vals[0]=9;
        x_vals[1]=7;
        x_vals[2]=54;
        x_vals[3]=43;
        usage.math_modeller.sample_x_set(x_vals);
        usage.math_modeller.sample_y(87);
        usage.math_modeller.queue_sample();
        x_vals[0]=9;
        x_vals[1]=7;
        x_vals[2]=54;
        x_vals[3]=43;
        usage.math_modeller.sample_x_set(x_vals);
        usage.math_modeller.sample_y(87);
        usage.math_modeller.queue_sample();
        usage.math_modeller.end_modeller=true;

        // print the mean of each x variable over the queued samples
        usage_stats ustats=usage.math_modeller.get_usage_stats();
        x_set avg_xset=usage.math_modeller.average_x_set;
        x_vals=avg_xset.x_set;
        System.out.println("mean 1: "+x_vals[0]);
        System.out.println("mean 2: "+x_vals[1]);
        System.out.println("mean 3: "+x_vals[2]);
        System.out.println("mean 4: "+x_vals[3]);
    }

    public double computeval(){
        return 0;
    }

    public double findchange(){
        return 0;
    }

    public void learnsys(int t){
        int timer=1;
        while(t>=timer){
            net_usag.learnsys(t);
            try {
                Thread.sleep(1000); // wait one second before capturing the next set of sample data
            }catch (InterruptedException e){
                e.printStackTrace();
            }
            timer++;
            if(timer==t){
                Thread.yield(); // hint that other threads may run; yield throws no checked exception
            }
        }
    }

    public Object findrelationship(){
        return null;
    }

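    public void monitor(int t){
        int timer=0;
        while(timer<t){
            try{
                net_usag.monitor(t);
            }
            catch(Exception e){
                e.printStackTrace(); // report failures instead of silently discarding them
            }
            timer++; // advance inside the loop so that monitoring ends after t rounds
        }
    }

    public void showalarm(String info){
        System.out.println(info);
    }

    public void haltprocess(){
    }

    public void predictvals(){
    }
}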
