Detecting Unknown Network Attacks using Language Models
Konrad Rieck and Pavel Laskov
DIMVA 2006, July 13/14
Berlin, Germany
The zero-day problem
‣ How to distinguish normal from unknown?
GET /scripts/..%%35c../..%%35c../..%%35c../..%%35c %%35c../winnt/system32/cmd.exe?/c+dir+c:\ HTTP/1.0
Host: www
Connection: close
‣ Cast intrusion detection as a linguistic problem
‣ Apply machine learning methods
GET /dimva06/john/martin.html
Accept: */*
Accept-Language: en
Host: www
Connection: keep-alive
N-gram models
Connection payload: get▯/index.html
Bytes: g e t ▯ / i n ⋯
2-grams: ge et t▯ ▯/ /i in nd ⋯
3-grams: get et▯ t▯/ ▯/i ind nde /in ⋯
n-grams ⋯
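As a sketch of this construction, byte n-grams and their relative frequencies can be extracted with a sliding window (illustrative Python; function names are mine, not the authors' implementation):

```python
from collections import Counter

def ngram_freqs(payload: bytes, n: int) -> Counter:
    """Relative frequencies of all byte n-grams in a connection payload."""
    grams = Counter(payload[i:i + n] for i in range(len(payload) - n + 1))
    total = sum(grams.values())
    return Counter({g: c / total for g, c in grams.items()})

freqs = ngram_freqs(b"get /index.html", 2)  # 2-grams: b"ge", b"et", b"t ", ...
```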
N-grams in attacks
[Figure: comparison of the Nimda IIS attack and normal HTTP traffic; y-axis: frequency difference of 4-grams, roughly −0.02 to 0.05]
GET /scripts/..%%35c../..%%35c../..%%35c../..%%35c %%35c../winnt/system32/cmd.exe?/c+dir+c:\ HTTP/1.0
Frequency differences to 4-grams in normal HTTP: attack-specific 4-grams such as %%35, 35c., 5c.. and c../ are over-represented, while normal 4-grams such as Acce and cept are under-represented.
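The frequency differences shown above can be reproduced in spirit by subtracting n-gram frequency maps (hypothetical sketch; the payloads below are abbreviated stand-ins, not the evaluation data):

```python
from collections import Counter

def ngram_freqs(payload: bytes, n: int = 4) -> Counter:
    """Relative frequencies of byte n-grams in a payload."""
    grams = Counter(payload[i:i + n] for i in range(len(payload) - n + 1))
    total = sum(grams.values())
    return Counter({g: c / total for g, c in grams.items()})

attack = b"GET /scripts/..%%35c../..%%35c../winnt/system32/cmd.exe HTTP/1.0"
normal = b"GET /dimva06/john/martin.html HTTP/1.0"
fa, fn = ngram_freqs(attack), ngram_freqs(normal)
diff = {g: fa[g] - fn[g] for g in fa.keys() | fn.keys()}
# attack-typical 4-grams such as b"%%35" receive a positive difference
```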
Geometric representation
‣ A simple example
[Figure: two 3-D scatter plots over the 4-gram dimensions Acce, GET▯ and %%35, labelled "HTTP pipelining" and "Similarity of connections"]
‣ Huge feature space
‣ 256^n dimensions
‣ Geometric representation of connections
Similarity measures
‣ Distances, kernel functions, ... e.g.
‣ Minkowski: $d_k(x, y) = \sqrt[k]{\sum_{w \in L} |\phi_w(x) - \phi_w(y)|^k}$
‣ Manhattan: $d(x, y) = \sum_{w \in L} |\phi_w(x) - \phi_w(y)|$
‣ Efficient computation not trivial
‣ Sparse representation of n-gram frequencies
‣ Linear-time algorithms (cf. DIMVA 2006 paper)
with $x, y \in \{0, \ldots, 255\}^*$, $L = \{0, \ldots, 255\}^n$ and $\phi_w(x)$ = frequency of $w$ in sequence $x$
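With a sparse representation, the Manhattan distance needs only one pass over the non-zero entries of both connections, which is the idea behind the linear-time algorithms (minimal sketch, not the paper's implementation):

```python
def manhattan(phi_x: dict, phi_y: dict) -> float:
    """Manhattan distance between two sparse n-gram frequency maps.
    Time is linear in the number of non-zero entries."""
    dist = 0.0
    for w in phi_x.keys() | phi_y.keys():
        dist += abs(phi_x.get(w, 0.0) - phi_y.get(w, 0.0))
    return dist

d = manhattan({"ge": 0.5, "et": 0.5}, {"ge": 0.5, "t/": 0.5})
```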
Anomaly detection
‣ Detection of outliers in feature space
‣ Exploration of geometry between connections
‣ No training phase - no labels required
‣ Anomaly detection (AD) methods
‣ e.g. Spherical AD, Cluster AD, Neighborhood AD
[Figure: toy data in [0,1]² with outliers, shown four times: raw, and scored by spherical, cluster and neighborhood anomaly detection]
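As a toy illustration of the spherical idea (a hypothetical centroid-distance stand-in, much simpler than the quarter-sphere SVM used in the talk):

```python
import math

def centroid_scores(points):
    """Score each point by its Euclidean distance to the centroid of all
    points; large scores flag outliers (naive spherical anomaly detection)."""
    dim = len(points[0])
    centroid = [sum(p[i] for p in points) / len(points) for i in range(dim)]
    return [math.dist(p, centroid) for p in points]

scores = centroid_scores([(0.1, 0.1), (0.15, 0.12), (0.2, 0.1), (0.9, 0.9)])
# the last point is the clear outlier
```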
Experiments
‣ Open questions
‣ Do n-gram models capture semantics sufficient for detection of unknown attacks?
‣ Can anomaly detection reliably operate at low false-positive rates?
‣ How does this approach compare to classical signature-based intrusion detection?
Evaluation data
‣ PESIM 2005 data set
‣ Real network traffic to servers at our laboratory
‣ HTTP: reverse proxies of web sites
‣ FTP: local file sharing, e.g. photos, media
‣ SMTP: retransmission flavored with spam
‣ Attacks injected by a penetration-testing expert (e.g. using Metasploit)
‣ DARPA 1999 data set as reference
‣ Statistical preprocessing
‣ Extraction of 30 independent samples comprising 1000 incoming connection payloads per protocol
Method comparison
‣ Comparison of anomaly detection methods
‣ Criterion: AUC0.01 - area under the ROC curve within [0, 0.01]
‣ Results averaged over n-gram lengths [1,7]
Protocol Best method AUC0.01
HTTP Spherical (qsSVM) 0.781
FTP Neighborhood (Zeta) 0.746
SMTP Cluster (Single-linkage) 0.756
Bottom line: Different protocols require different anomaly detection methods
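The AUC0.01 criterion above can be sketched as a trapezoidal area under the ROC curve, clipped to false-positive rates below 0.01 and normalised by that limit (illustrative; the ROC points below are made up, not the evaluation results):

```python
def auc_limited(fpr, tpr, limit=0.01):
    """Area under an ROC curve restricted to fpr <= limit,
    normalised so that a perfect detector scores 1.0."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(zip(fpr, tpr), zip(fpr[1:], tpr[1:])):
        if x0 >= limit:
            break
        x1c = min(x1, limit)
        # linearly interpolate the true-positive rate at the clipped edge
        y1c = y0 + (y1 - y0) * (x1c - x0) / (x1 - x0) if x1 > x0 else y1
        area += (x1c - x0) * (y0 + y1c) / 2.0
    return area / limit

score = auc_limited([0.0, 0.005, 0.01, 1.0], [0.0, 0.6, 0.8, 1.0])
```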
N-gram lengths
‣ How does one choose the optimal n-gram length?
[Figure: distribution of the optimal n-gram length per attack (n = 1-7, y-axis 0-40%) for HTTP, FTP and SMTP]
‣ No single n fits all: variable-length models required
Variable-length models
Connection payload: get▯/index.html
Words (delimiters: CR LF TAB ▯ , . : / &): get index html
Combined n-grams, n = {1,2,3,...}:
1-grams: g e t ▯ / i n ⋯
2-grams: ge et t▯ ▯/ /i in nd ⋯
3-grams: get et▯ t▯/ ▯/i ind nde /in ⋯
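Extracting the word model from a payload amounts to splitting at the delimiter set shown above (Python sketch; the delimiter string mirrors the slide):

```python
import re

# Delimiters from the slide: CR LF TAB space , . : / &
DELIMITERS = "\r\n\t ,.:/&"

def word_model(payload: str) -> list:
    """Split a connection payload into words at protocol delimiters."""
    return [w for w in re.split("[" + re.escape(DELIMITERS) + "]+", payload) if w]

word_model("get /index.html")  # ["get", "index", "html"]
```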
Comparison with Snort
‣ Language models vs. Snort
‣ Combined n-gram (1-7) and word models
‣ Snort: Version 2.4.2 with default rules
[Figure: ROC curves (false-positive rate 0-0.01 vs. true-positive rate 0-1) for HTTP, FTP and SMTP traffic, each comparing the best combined n-gram model, the word model and Snort]
Conclusions and outlook
‣ Language models for intrusion detection
‣ Characteristic patterns in normal traffic and attacks
‣ Unsupervised anomaly detection with high accuracy
‣ Detection of ~80% of unknown network attacks
‣ Future perspective
‣ From in vitro to in vivo: real-time application
‣ Language models as prototypes for signatures?
Outwit language models
‣ Approaches
‣ Red herring: denial-of-service with random traffic patterns
‣ Creeping poisoning: careful subversion of the normal traffic model
‣ Mimicry attacks: adaptation of attacks to mimic normal traffic
‣ Conclusions
‣ (1) Worse for signature-based intrusion detection
‣ (2,3) Require profound insider knowledge
Questions?