Machine Learning for Network Anomaly Detection

Matt Mahoney

Network Anomaly Detection

• Network – Monitors traffic to protect connected hosts

• Anomaly – Models normal behavior to detect novel attacks (some false alarms)

• Detection – Was there an attack?

Host Based Methods

• Virus Scanners

• File System Integrity Checkers (Tripwire, DERBI)

• Audit Logs

• System Call Monitoring – Self/Nonself (Forrest)

Network Based Methods

• Firewalls

• Signature Detection (SNORT, Bro)

• Anomaly Detection (eBayes, NIDES, ADAM, SPADE)

User Modeling

• Source address – unauthorized users of authenticated services (telnet, ssh, pop3, imap)

• Destination address – IP scans

• Destination port – port scans

Frequency Based Models

• Used by SPADE, ADAM, NIDES, eBayes, etc.

• Anomaly score = 1/P(event)

• Event probabilities estimated by counting
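The counting approach above can be sketched in a few lines. This is a minimal illustration, not the method used by SPADE, ADAM, NIDES, or eBayes. The add-one smoothing, the function names, and the port data are assumptions made so that unseen events receive a finite score.

```python
from collections import Counter


def fit_counts(events):
    """Count how often each event value occurs in the training data."""
    return Counter(events), len(events)


def anomaly_score(event, counts, total):
    """Return 1/P(event), with P estimated by counting.

    Assumption: add-one smoothing, with one extra slot reserved for
    "any unseen value", so that unseen events get a finite score.
    """
    p = (counts[event] + 1) / (total + len(counts) + 1)
    return 1.0 / p


# Hypothetical training data: destination ports seen in normal traffic.
counts, total = fit_counts([80] * 8 + [25] * 2)
scores = {port: anomaly_score(port, counts, total) for port in (80, 25, 31337)}
```

Rare events score higher than common ones, and a port never seen in training scores highest.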

Attacks on Public Services

PHF – exploits a CGI script bug on older Apache web servers

GET /cgi-bin/phf?Qalias=x%0a/usr/bin/ypcat%20passwd
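Decoding the request shows why it is dangerous: %0a is a URL-encoded newline, so the Qalias parameter carries a second line containing a shell command. The sketch below uses only Python's standard urllib.parse; the variable names are illustrative.

```python
from urllib.parse import parse_qs, urlsplit

# Request target from the PHF example, joined into a single line.
request_target = "/cgi-bin/phf?Qalias=x%0a/usr/bin/ypcat%20passwd"

# Extract and percent-decode the Qalias parameter.
query = urlsplit(request_target).query
alias = parse_qs(query)["Qalias"][0]

# The decoded value spans two lines; the second is the injected command.
lines = alias.split("\n")
print(lines)  # ['x', '/usr/bin/ypcat passwd']
```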

Buffer Overflows

• 1988 Morris Worm – fingerd

• 2003 SQL Sapphire Worm

char buf[100];

gets(buf);

[Figure: stack diagram labeled buf (0 to 100), Exploit code, and Return Address]

TCP/IP Denial of Service Attacks

• Teardrop – overlapping IP fragments

• Ping of Death – IP fragments reassemble to > 64K

• Dosnuke – urgent data in NetBIOS packet

• Land – identical source and destination addresses

Protocol Modeling

• Attacks exploit bugs

• Bugs are most common in the least tested code

• Most testing occurs after delivery

• Therefore unusual data is more likely to be hostile

Protocol Models

• PHAD, NETAD – Packet Headers (Ethernet, IP, TCP, UDP, ICMP)

• ALAD, LERAD – Client TCP application payloads (HTTP, SMTP, FTP, …)

Time Based Models

• Training and test phases

• Values never seen in training are suspicious

• Score = t/p = tn/r, where
  – t = time since last anomaly
  – n = number of training examples
  – r = number of allowed values
  – p = r/n = fraction of values that are novel

Example tn/r

• Training: 0000111000 n/r = 10/2

• Testing: 01223
  – 0: no score
  – 1: no score
  – 2: tn/r = 6 x 10/2 = 30
  – 2: tn/r = 1 x 10/2 = 5
  – 3: tn/r = 1 x 10/2 = 5
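The arithmetic in this example can be reproduced with a short sketch. The function names are hypothetical, and t_initial (the time since the last anomaly when testing starts) is an assumption supplied by the caller. It is set to 3 here only because that value yields t = 6 at the first "2", as in the example. The model is not updated during testing, so the repeated "2" is scored again.

```python
def train(values):
    """Return n (number of training examples) and the set of allowed values."""
    return len(values), set(values)


def score_sequence(test, n, allowed, t_initial):
    """Score each test value with t*n/r; allowed values get None (no score).

    t_initial is the time since the last anomaly when testing starts.
    The allowed set is fixed during testing, so a repeated novel value
    is scored each time it appears.
    """
    r = len(allowed)
    t = t_initial
    scores = []
    for value in test:
        t += 1
        if value in allowed:
            scores.append(None)
        else:
            scores.append(t * n / r)
            t = 0  # an anomaly just occurred, so the clock restarts
    return scores


n, allowed = train("0000111000")
# Assumption: t_initial=3 is chosen to reproduce t = 6 at the first "2".
scores = score_sequence("01223", n, allowed, t_initial=3)
print(scores)  # [None, None, 30.0, 5.0, 5.0]
```

Because t resets after every anomaly, a burst of novel values produces one high score followed by low ones.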

PHAD – Fixed Rules

• 34 packet header fields
  – Ethernet (address, protocol)
  – IP (TOS, TTL, fragmentation, addresses)
  – TCP (options, flags, port numbers)
  – UDP (port numbers, checksum)
  – ICMP (type, code, checksum)

• Global model

LERAD – Learns Conditional Rules

• Models inbound client TCP (addresses, ports, flags, 8 words in payload)

• Learns conditional rules

If port = 80 then word1 = GET, POST (n/r = 10000/2)

LERAD Rule Learning

• If word1 = GET then port = 80 (n/r = 2/1)

• word1 = GET, HELO (n/r = 3/2)

• If address = Marx then port = 80, 25 (n/r = 2/2)

Address  Port  Word1  Word2
Hume     80    GET    /
Marx     80    GET    /index.html
Marx     25    HELO   Pascal
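The n/r values for these three rules can be checked against the training table with a small sketch. The function rule_stats and the dictionary layout are hypothetical, chosen for illustration. Here n is the number of rows that satisfy the rule's condition, and r is the number of distinct values the predicted attribute takes in those rows.

```python
# The three training rows from the table, one dictionary per row.
TRAINING = [
    {"address": "Hume", "port": 80, "word1": "GET", "word2": "/"},
    {"address": "Marx", "port": 80, "word1": "GET", "word2": "/index.html"},
    {"address": "Marx", "port": 25, "word1": "HELO", "word2": "Pascal"},
]


def rule_stats(rows, antecedent, consequent):
    """Return (n, allowed) for the rule "if antecedent then consequent in allowed".

    antecedent maps attribute names to required values; an empty
    dictionary means the rule applies to every row.
    """
    matching = [
        row for row in rows
        if all(row[attr] == value for attr, value in antecedent.items())
    ]
    allowed = {row[consequent] for row in matching}
    return len(matching), allowed


n1, allowed1 = rule_stats(TRAINING, {"word1": "GET"}, "port")     # n/r = 2/1
n2, allowed2 = rule_stats(TRAINING, {}, "word1")                  # n/r = 3/2
n3, allowed3 = rule_stats(TRAINING, {"address": "Marx"}, "port")  # n/r = 2/2
```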

LERAD Rule Learning

• Randomly pick rules based on matching attributes

• Select nonoverlapping rules with high n/r on a sample

• Train on full training set (new n/r)

• Discard rules that discover novel values in last 10% of training (known false alarms)
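The first step, picking rules from matching attributes, might look like the sketch below. It is an illustration under stated assumptions, not LERAD's actual implementation. The assumptions are that a candidate comes from a pair of training rows, that one matching attribute (chosen at random) becomes the predicted attribute, and that at most max_conditions of the remaining matches become conditions. The function name and parameters are hypothetical. The later steps (selection on a sample, full training, validation on the last 10%) are not shown.

```python
import random


def candidate_rule(row_a, row_b, rng, max_conditions=2):
    """Propose one rule from the attributes on which two rows agree.

    Returns (antecedent, consequent, allowed), or None if the rows
    share no attribute value.
    """
    matching = [k for k in row_a if k in row_b and row_a[k] == row_b[k]]
    if not matching:
        return None
    rng.shuffle(matching)
    consequent = matching[0]
    antecedent = {k: row_a[k] for k in matching[1:1 + max_conditions]}
    return antecedent, consequent, {row_a[consequent]}


hume = {"address": "Hume", "port": 80, "word1": "GET", "word2": "/"}
marx = {"address": "Marx", "port": 80, "word1": "GET", "word2": "/index.html"}

rng = random.Random(0)  # fixed seed so the sketch is repeatable
rule = candidate_rule(hume, marx, rng)
```

For these two rows the only matching attributes are port and word1, so the candidate is either "if word1 = GET then port = 80" or "if port = 80 then word1 = GET".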

DARPA/Lincoln Labs Evaluation

• 1 week of attack-free training data

• 2 weeks with 201 attacks

[Figure: network diagram labeled SunOS, Solaris, Linux, NT, Router, Internet, Sniffer, and Attacks]

Attacks out of 201 Detected at 10 False Alarms per Day

[Bar chart: attacks detected (axis 0 to 140) for PHAD, ALAD, LERAD, and NETAD]

Problems with Synthetic Traffic

• Attributes are too predictable: TTL, TOS, TCP options, TCP window size, HTTP, SMTP command formatting

• Too few sources: Client addresses, HTTP user agents, ssh versions

• Too “clean”: no checksum errors, fragmentation, garbage data in reserved fields, malformed commands

Real Traffic is Less Predictable

[Plot: r (number of values) versus time, with curves for Synthetic and Real traffic]

Mixed Traffic: Fewer Detections, but More are Legitimate

[Bar chart: Total and Legitimate detections (axis 0 to 140) for PHAD, ALAD, LERAD, and NETAD]

Project Status

• Philip K. Chan – Project Leader

• Gaurav Tandon – Applying LERAD to system call arguments

• Rachna Vargiya – Application payload tokenization

• Mohammad Arshad – Network traffic outlier analysis by clustering

Further Reading

• Learning Nonstationary Models of Normal Network Traffic for Detecting Novel Attacks by Matthew V. Mahoney and Philip K. Chan, Proc. KDD.

• Network Traffic Anomaly Detection Based on Packet Bytes by Matthew V. Mahoney, Proc. ACM-SAC.

• http://cs.fit.edu/~mmahoney/dist/

http://cs.fit.edu/~mmahoney/paper4.pdf




