Computer System Intrusion Detection: A Survey

Anita K. Jones & Robert S. Sielken

Presented by Peixian Li (Rick)

For CS551/651 Computer Security

Overview

• Why IDS

• IDS Overview

• Anomaly Detection

• UNM Pattern Matching

• Misuse Detection

• Extended to Networked Systems

• Conclusion

Why IDS?

• In defending network resources, we have

– Firewalls

– Encryption technology

– Authentication devices

– Vulnerability checking tools

– Others …

Why IDS? -2-

• But computer systems are still susceptible

– Due to unknown system flaws

– Due to known system flaws that are left in place rather than removed, for reasons of functionality or cost

– Due to social engineering tricks

• A recent news item

– An 18-year-old boy broke into an e-commerce web site

– Credit information of thousands of customers was stolen

– Including Bill Gates'

Why IDS? -3-

• Based on the fact that

– Penetrations always exist

• We need

– A second line of defense

– A mechanism to detect penetrations and attempted intrusions

– This takes the form of an Intrusion Detection System

• Even when attempts are guaranteed to fail

– An IDS can still help us find potential vulnerabilities

Approaches

• Anomaly Detection

– Defines and characterizes correct static form and acceptable dynamic behavior

– Detects anomalous changes or behaviors, which may not be intrusions

• Misuse Detection

– Characterizes known ways to penetrate a system as patterns

– Monitors for explicit patterns which are known to be intrusions

Anomaly vs. Misuse

• Anomaly Detection

– May have a high rate of false alarms

– Can detect novel attacks

– Normal databases are relatively more stable

• Misuse Detection

– May miss novel attacks

– Complexity grows as the number of well-known attacks grows

– Difficult to keep the patterns updated as the catalog of attacks grows

Three Generations

• First Generation

– The emphasis was on single computer systems

– O/S audit records were post-processed

• Second Generation

– Extended and scaled to address distributed systems

– More sophisticated

– Primitive real-time alerts became possible

Three Generations -2-

• The Third Generation

– Further extended to address loosely coupled networks, such as a LAN

• Two Primary Challenges

– Tracking users as they move through nodes

– Managing the data as the size of the network scales up

What Makes A Good IDS ?

• Manage the volume of data, communications, and processing in large scale networks

• Increase coverage, i.e. miss as few intrusions as possible

• Decrease false alarms

• Detect intrusion in progress

• React in real-time

Basic Components

• Focus

– Which entity, or which elements of the entity, we try to focus on

– Definitions of events or behavior of interest

• Representation

– How to represent signatures effectively and efficiently

Basic Components -2-

• Initial Database

– Initial behavior profile or normal database

– Which can characterize behavior of interest

– Which can represent entity of interest

• Detection Algorithm

– Statistical processing techniques for divining the difference between normal and anomalous behavior (effective and efficient)

Anomaly Detection

• Static

– Assumes that a portion of the system remains constant

– System code and a portion of system data

– Represented as a binary bit string or a set of such strings

– A single bit change counts as an anomaly

• Dynamic

– Assumes that the system's behavior is stable

– Includes a definition of behavior

– Represented as a sequence of distinct events

– Deviations are judged against an empirical threshold

Static Anomaly Detection

• How does it work?

– Defines the desired state of the system using static bit strings

– Archives a representation of the state

– Periodically compares the current state and the archived state

– Any difference signals an error

Signature

• Storing and comparing the actual bit-string representation is quite costly

• A compressed representation is called a signature

• Signatures include checksums, message-digest algorithms and hash functions

• Meta-data: knowledge about the structure
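The archive-and-compare cycle described above can be sketched in a few lines. This is only an illustration, assuming SHA-256 as the signature function and a hypothetical in-memory dictionary of named bit strings in place of real files; the function names are made up for this example and are not taken from Tripwire or any other tool.

```python
import hashlib

def signature(data):
    """Compress a bit string into a fixed-size signature (SHA-256, an assumed choice)."""
    return hashlib.sha256(data).hexdigest()

def archive_state(objects):
    """Archive one signature per protected object instead of the raw bit strings."""
    return {name: signature(data) for name, data in objects.items()}

def compare_state(archived, current):
    """Return the names of objects that changed, appeared, or disappeared."""
    changed = []
    for name in sorted(set(archived) | set(current)):
        if name not in archived or name not in current:
            changed.append(name)
        elif signature(current[name]) != archived[name]:
            changed.append(name)
    return changed

# Hypothetical protected objects: the contents are placeholders, not real files.
baseline = {"login": b"trusted code", "passwd": b"root:x:0:0"}
archived = archive_state(baseline)

# Changing a single character of one object is enough to change its signature.
tampered = dict(baseline, login=b"trusted codf")
print(compare_state(archived, baseline))
print(compare_state(archived, tampered))
```

Because only signatures are archived, the check can tell that an object changed but not what changed, which matches the static view that any difference is an error.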

Some Actual Systems

• Tripwire

– A file integrity checker

– Uses signatures as well as Unix file meta-data

• Virus Checkers

– Use the actual bit strings inserted by the virus

– Strings are short, thus uncompressed

• Self-Nonself

– Unlike Tripwire, the Self-Nonself signatures are for unwanted string values

Dynamic Anomaly Detection

• Before Running

• For each individual entity, the IDS creates a base profile to characterize normal, acceptable behavior

– Entities can be: users, workstations, remote hosts, or applications

– Behaviors can be: preferred choices, resources consumed, representative sequences of actions

Dynamic Anomaly Detection -2-

• Two ways to build up base profiles

– By synthetically running the system

• Can it represent the real system?

– By observing normal user behavior over a sufficiently long time

• Can we be sure that no intrusion took place during that period?

Dynamic Anomaly Detection -3-

• When Running

• Observes events related or attributed to the entity

• Incrementally builds a current profile

• Some systems operate in real time or near real time, directly observing events as they occur

Dynamic Anomaly Detection -4-

• Static detection does not care about the degree of the difference

• Dynamic detection does care

• Comparison is based on empirically determined thresholds

• Only mismatches that exceed the thresholds result in an alert
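A threshold comparison of this kind might look like the sketch below. The slides do not say which statistic is used, so the profile here is simply the mean and standard deviation of one measured quantity, and the distance is the number of standard deviations from the mean. The function names, the sample values, and the threshold of 3.0 are all assumptions chosen for illustration.

```python
from statistics import mean, stdev

def build_profile(observations):
    """Base profile: mean and sample standard deviation of past observations."""
    return {"mean": mean(observations), "stdev": stdev(observations)}

def deviation(profile, value):
    """Degree of difference, in standard deviations from the profile mean."""
    if profile["stdev"] == 0:
        return 0.0 if value == profile["mean"] else float("inf")
    return abs(value - profile["mean"]) / profile["stdev"]

def is_anomalous(profile, value, threshold=3.0):
    """Alert only when the deviation exceeds the empirically chosen threshold."""
    return deviation(profile, value) > threshold

# Hypothetical history, e.g. CPU seconds consumed per session by one user.
history = [10, 12, 11, 9, 13, 10, 11, 12]
profile = build_profile(history)

print(is_anomalous(profile, 12))   # a small mismatch stays under the threshold
print(is_anomalous(profile, 40))   # a large mismatch exceeds it
```

Lowering the threshold catches more deviations at the cost of more false alarms, which is why the slides describe it as something determined empirically.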

UNM Pattern Matching

• Focus

– Individual application and its behavior

– E.g. Sendmail

• Representation

– Uses privileged system call sequences to represent an application's behavior

– E.g. (open, read, mmap), (read, mmap, mmap)…

– Sequence length is usually between 3 and 6
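The representation above can be illustrated by sliding a window of length k over a system call trace and collecting each distinct window. The trace below is a made-up example, and the function names are hypothetical. A plain Python set stands in for the normal database; it is a simplification chosen for brevity.

```python
def windows(trace, k):
    """All length-k sequences of consecutive system calls in a trace."""
    return [tuple(trace[i:i + k]) for i in range(len(trace) - k + 1)]

def build_normal_db(traces, k=3):
    """Collect the distinct length-k sequences seen in normal runs."""
    db = set()
    for trace in traces:
        db.update(windows(trace, k))
    return db

# Hypothetical trace of system calls from one normal run.
trace = ["open", "read", "mmap", "mmap", "close"]
normal_db = build_normal_db([trace], k=3)

for seq in windows(trace, 3):
    print(seq)
```

With k = 3 this trace yields (open, read, mmap), (read, mmap, mmap) and (mmap, mmap, close), the same shape as the examples on the slide.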

UNM Pattern Matching -2-

• Initial Database

– Built either by synthetically running the application or by observing its real execution

– Normal sequences are stored as a forest in the normal database to save space

• Detection Algorithm

– Largest Minimum Hamming Distance (LMHD)

– Normalized LMHD

– Local Frame Count (LFC)
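The slides name the three measures without defining them, so the sketch below rests on an assumed reading: for each observed sequence, take the smallest Hamming distance to any normal sequence; the largest of those minima is the LMHD; dividing by the sequence length k normalizes it; and the local frame count is the largest number of unseen sequences inside any run of `frame` consecutive sequences. The function names, the frame size, and the sample data are hypothetical.

```python
def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))

def min_hamming(seq, normal_db):
    """Distance from one observed sequence to its closest normal sequence."""
    return min(hamming(seq, normal) for normal in normal_db)

def largest_min_hamming(seqs, normal_db):
    """LMHD: the worst (largest) of the per-sequence minimum distances."""
    return max(min_hamming(seq, normal_db) for seq in seqs)

def normalized_lmhd(seqs, normal_db, k):
    """LMHD divided by the sequence length k, giving a value in [0, 1]."""
    return largest_min_hamming(seqs, normal_db) / k

def max_local_frame_count(seqs, normal_db, frame=4):
    """Largest count of unseen sequences within any `frame` consecutive sequences."""
    misses = [0 if seq in normal_db else 1 for seq in seqs]
    if len(misses) <= frame:
        return sum(misses)
    return max(sum(misses[i:i + frame]) for i in range(len(misses) - frame + 1))

# Hypothetical normal database and observed sequences, both with k = 3.
normal_db = {
    ("open", "read", "mmap"),
    ("read", "mmap", "mmap"),
    ("mmap", "mmap", "close"),
}
observed = [
    ("open", "read", "mmap"),
    ("read", "mmap", "exec"),
    ("mmap", "exec", "setuid"),
]

print(largest_min_hamming(observed, normal_db))
print(normalized_lmhd(observed, normal_db, 3))
print(max_local_frame_count(observed, normal_db))
```

The linear scan over the database is the simplest possible lookup; a tree-structured database such as the forest mentioned above would serve the same comparison with less storage.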

UNM Pattern Matching -3-

[Figure: growth of the normal database. x-axis: total seqs scanned (0 to 24,915); y-axis: total seqs in DB (0 to 900)]

UNM Pattern Matching -4-

1. sendmail

2. ftpd

3. lpr

4. ps

UNM Pattern Matching -5-

[Figure: anomaly measures for cases 1 to 8, axis range 0 to 70. Series: % Anomalous, Largest Min HD, Maximum LFC]

Misuse Detection

• Remember known intrusion techniques

• Monitor the system for any occurrence of those known techniques

• Intrusion scenario – A description of a fairly precisely known kind of intrusion, usually a sequence of actions

Rule-Based vs. State-Based

• Rule-Based

– Encodes scenarios as a set of rules, where rules reflect the sequence of actions

– Fact base is a collection of assertions based on accumulated data

– Rule base contains the rules that describe known intrusion scenarios

– Rule-fact binding

– Rule fires

• State-Based

– Attribute-value pairs characterize system states of interest

– Actions are defined as transitions between states

– Monitors the actions and then changes the state

– If a compromised state is reached, an intrusion has happened
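Both styles can be sketched side by side. Everything concrete below is hypothetical: the rule name, the fact keys, the state names, and the action names are invented for illustration and do not come from any particular misuse detection system.

```python
# Rule-based: a rule base of scenario conditions evaluated against a fact base.
RULES = {
    "password_guessing": lambda facts: (
        facts.get("failed_logins", 0) >= 5 and facts.get("login_succeeded", False)
    ),
}

def fired_rules(facts):
    """Bind each rule to the fact base and return the names of the rules that fire."""
    return sorted(name for name, condition in RULES.items() if condition(facts))

# State-based: actions move the monitored system from one state to another.
TRANSITIONS = {
    ("start", "copy_shell"): "shell_copied",
    ("shell_copied", "chmod_setuid"): "setuid_shell",
    ("setuid_shell", "exec_shell"): "compromised",
}

def first_compromise(actions, state="start"):
    """Return the index of the action that reaches the compromised state, or None."""
    for index, action in enumerate(actions):
        # Actions that match no transition leave the state unchanged.
        state = TRANSITIONS.get((state, action), state)
        if state == "compromised":
            return index
    return None

print(fired_rules({"failed_logins": 6, "login_succeeded": True}))
print(first_compromise(["ls", "copy_shell", "chmod_setuid", "exec_shell"]))
print(first_compromise(["copy_shell", "ls"]))
```

The rule-based half reasons over accumulated facts, while the state-based half follows the order of actions, so only the latter cares about which step came first.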

Extended to Networked Systems

• New situations

– Cooperative intrusions are more frequent

– Intruders use multiple nodes in an attempt to

• Run actions in parallel to make the intrusion faster

• Distribute actions to disguise their activities

• New elements in Network IDS

– Include network traffic as part of behavior

– Data sharing and communication

Centralized vs. Decentralized

• Centralized Analysis

– Audit data is collected on individual systems

– Reported to some centralized location

– Intrusion detection analysis is performed there

– Doesn't work well for large networks due to the sheer volume of data

– Needs data translation in heterogeneous systems

• Decentralized Analysis

– Distributed audit data collection

– Distributed intrusion detection analysis

– Works well for large networks because less data is shared between different components

– Can eliminate the translation problem by grouping homogeneous systems

Partition

• In a decentralized system, the entire system is divided into smaller domains for the purpose of communication

• Partitioning can be based on

– Geography

– Administrative control

– Collections of similar software platforms

– Anticipated types of intrusions

• Still centralized within a domain
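The partitioning idea can be sketched as follows: hosts are grouped into domains by some key, each domain analyzes its own audit records, and only the resulting alerts leave the domain. The host names, the record fields, and the suspicion test are hypothetical, and partitioning by platform is just one of the criteria listed above.

```python
from collections import defaultdict

def partition(hosts, key):
    """Group hosts into domains according to a partitioning key."""
    domains = defaultdict(list)
    for host in hosts:
        domains[key(host)].append(host)
    return dict(domains)

def domain_alerts(domain_hosts, audit, is_suspicious):
    """Centralized analysis within one domain: scan its records, keep only alerts."""
    alerts = []
    for host in domain_hosts:
        for record in audit.get(host["name"], []):
            if is_suspicious(record):
                alerts.append((host["name"], record))
    return alerts

# Hypothetical hosts and per-host audit records.
hosts = [
    {"name": "a", "platform": "unix"},
    {"name": "b", "platform": "unix"},
    {"name": "c", "platform": "vms"},
]
audit = {
    "a": [{"event": "login", "failures": 0}, {"event": "login", "failures": 9}],
    "b": [{"event": "login", "failures": 1}],
    "c": [{"event": "login", "failures": 0}],
}

domains = partition(hosts, key=lambda host: host["platform"])
shared = {
    name: domain_alerts(members, audit, lambda record: record["failures"] >= 5)
    for name, members in domains.items()
}
print(shared)
```

In this example four audit records are examined but only one alert is shared between domains, which is the data reduction that makes the decentralized approach scale.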

Vulnerabilities

• Intrusion detection software itself is not inherently survivable and also needs protection

• Initialization will be flawed if intrusions are present

• Audit data must be available in a timely manner

• The IDS should not compete for resources with the rest of the system

Conclusion

• Why IDS

• The generations of IDS

• What makes a good IDS

• Basic components of an IDS

• Different approaches used in IDS

• Examine how UNM pattern matching works

• How IDS is extended for networked systems

• What the vulnerabilities of IDS are