Dynamic Application-Layer Protocol Analysis for Network Intrusion Detection

Holger Dreger, TU München

Anja Feldmann, T-Labs / TU Berlin

Michael Mai, TU München

Vern Paxson, ICSI / LBNL

Robin Sommer, ICSI

Presented by: Jim Spadaro

NIDS: State-of-the-Art

• Protocol-specific traffic analysis
    • Semantic context for (much) better detection quality
• How to decide which protocol to analyze?
    • Relies on well-known port numbers
        • As in, HTTP if and only if TCP port 80
        • (or, um, maybe 8080 and 8000 and ...)
• And if it's not on a well-known port?
    • Perhaps use byte-level signatures to flag what protocol it appears to be
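A minimal sketch of the port-based selection described above, to make its blind spot concrete. The table contents and the function name `analyzer_for_port` are illustrative assumptions, not taken from any particular NIDS.

```python
# Hypothetical port-to-protocol table; the entries mirror the ports named on the slides.
WELL_KNOWN_PORTS = {
    21: "ftp",
    25: "smtp",
    80: "http",
    8000: "http",
    8080: "http",
    6667: "irc",
}


def analyzer_for_port(dst_port):
    """Return the protocol a purely port-based NIDS would assume, or None."""
    return WELL_KNOWN_PORTS.get(dst_port)


print(analyzer_for_port(80))     # http
print(analyzer_for_port(31337))  # None: this connection gets no protocol analysis
```

Any service moved to a port missing from the table is simply not analyzed, and anything that is not HTTP but runs on port 80 is parsed as HTTP.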

Problem

• Applications use arbitrary ports!
    • Benign reasons
        • Lack of user privileges, obfuscation, multiple versions
        • Adversarial applications (maybe not so benign), e.g., Skype bypassing firewalls
    • Malicious intent
        • Evasion of security monitoring
            • IRC botnets on ports other than 666x/tcp
            • Pirate FTP servers on ports other than 21/tcp
• How to distinguish these?

Structure

• Prevalence of the problem

• Approach for dynamic analysis in NIDS

• Applications of new capabilities

• Performance evaluation

Prevalence of the Problem

• Data
    • 24-hour full packet trace from MWN
    • 3.2 TB of data in 6.3 billion packets, 137M TCP connections
    • Successful TCP connections: ~78%
    • Successful TCP connections on unprivileged ports: ~4%
• UCB: University of California, Berkeley, 45,000
• MWN: Munich Scientific Network, 50,000
• LBNL: Lawrence Berkeley National Laboratory, 13,000

Existing NIDS Solutions

• None known to fully address the problem
• Bro, Snort, Dragon, and IntruShield all rely on port-based protocol analysis
    • Some can use signatures to detect inappropriate protocol use
• Such detection is helpful, but has drawbacks
    • Does not distinguish benign off-port traffic from malicious:
        • Can only stop BitTorrent completely, not detect illegal file sharing
        • Can only turn off off-port IRC completely, not detect botnets

Protocol Detection - Alternatives

• Statistical approach
    • E.g., packet size distribution
        • Suitable for separating interactive/bulk traffic, e.g., distinguishing chat from file transfers
• Detect protocol patterns
    • Signatures (already implemented)
        • Relatively easy to implement: most NIDS have signature-matching infrastructure, e.g., Linux netfilter l7-filter
        • Very general signatures, not completely accurate
• Maybe: protocol detection by plausibility heuristics
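As a rough illustration of signature-based detection on the first payload bytes, the sketch below matches a few deliberately loose regular expressions. The patterns and the name `match_signatures` are assumptions made for this example; they are not the signatures used in the study or in l7-filter.

```python
import re

# Illustrative, deliberately loose patterns (assumptions, not production signatures).
SIGNATURES = {
    "http": re.compile(rb"^(GET|POST|HEAD) \S+ HTTP/1\.[01]\r\n"),
    "smtp": re.compile(rb"^(HELO|EHLO) \S+\r\n", re.IGNORECASE),
    "irc": re.compile(rb"^(NICK|USER|PASS) [^\r\n]+\r\n", re.IGNORECASE),
    "ftp": re.compile(rb"^(USER|PASS) [^\r\n]+\r\n", re.IGNORECASE),
}


def match_signatures(payload):
    """Return the sorted names of all signatures matching the start of the payload."""
    return sorted(name for name, pat in SIGNATURES.items() if pat.match(payload))


print(match_signatures(b"GET /index.html HTTP/1.1\r\nHost: example.org\r\n\r\n"))  # ['http']
print(match_signatures(b"USER anonymous\r\n"))  # ['ftp', 'irc']
```

The second call matches two protocols at once, which is the kind of ambiguity that general signatures produce.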

Protocol Detection: Signatures

• Most (but not all) successful connections trigger expected signature

• FTP: high percentage of false negatives ~ 21.7%

• “Other port” matches: needs further investigation

Protocol           HTTP      IRC      FTP       SMTP
Port (Succ.)       93,429K   75.9K    151.7K    1,447K
Signature          94,326K   74.0K    125.3K    1,416K
  expected port    92,228K   71.5K    98.0K     1,415K
  other port       2,126K    2.5K     27.3K     0.3K

Protocol Signatures: Well-known Ports

• Some connections trigger more than one signature
    • Signature too general
• Some inappropriate use of well-known ports

Port    HTTP          IRC       SMTP         Other     No match
80      92,228,291    59        0            41,086    1,158,977
666x    1,217         71,650    0            4,238     524
25      459           2         1,415,428    195       31,889

Observations

• Imprecision of signatures:
    • False negatives highlight need for refined signatures and/or more context
    • False positives (e.g., multiple matches for a single connection) highlight limits in discriminating power
    • Certain protocols are difficult to make signatures for
        • Telnet: many legitimate initial byte patterns
• Problem is real:
    • If we just believe port numbers, numerous misidentifications

Goals

• Detection scheme independent
    • Currently predominantly uses signatures
        • However, flexibility is maintained to allow other approaches, like heuristics
• Dynamic analysis
    • Some protocol detection schemes need more data than others
    • Analyzers should be disabled upon detecting a false positive
• Modularity
    • Eases dealing with multiple network substacks
        • IP-within-IP tunnels
• Efficiency
    • Improvements must retain performance
• Customizability
    • Result must easily adapt to specific needs

Approach for Dynamic Analysis

• Dynamic data path enhances flexibility and accuracy
    • Example: a packet is received on port 80/tcp, but really carries data for an IRC session
        • A traditional NIDS will still examine the packet as HTTP
        • Dynamic analysis can switch the analysis to IRC even though it was initialized for HTTP
• Approach uses a PIA
    • Protocol Identification Analyzer

Dynamic Data Path

• How can this be done?
    • Associate each connection with a tree structure
        • Each node represents an analyzer
        • Links represent data channels, with the parent node's output channels connecting to its children's input channels
    • The PIA instantiates the initial analyzers
        • Each analyzer can insert or remove other analyzers on its input and output channels
    • Thus, each analyzer can add further analyzers if it needs their functionality
        • If the analyzer cannot determine which analyzer is needed, another PIA can be instantiated
        • An analyzer that cannot analyze the data it is given can remove its subtree from the tree
    • Allows siblings on the tree to be run in parallel

Analyzer Tree Example

• Example of an analyzer tree for an email connection:
    • The IP analyzer determines the connection is TCP
    • The TCP analyzer determines the connection looks like email
    • Analyzers for SMTP, POP, and IMAP are instantiated to analyze the data
    • Any analyzers that determine that they cannot analyze the data can remove themselves
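The per-connection analyzer tree and the email example can be sketched as follows. The `Analyzer` class, its method names, and the greeting checks are hypothetical and chosen only for illustration; this is not Bro's actual API.

```python
class Analyzer:
    """One node of a per-connection analyzer tree (hypothetical API)."""

    def __init__(self, name, accepts=lambda data: True):
        self.name = name
        self.accepts = accepts  # predicate: can this analyzer parse the data?
        self.parent = None
        self.children = []

    def add_child(self, child):
        child.parent = self
        self.children.append(child)
        return child

    def remove(self):
        """Detach this analyzer, and with it its whole subtree, from the tree."""
        if self.parent is not None:
            self.parent.children.remove(self)
            self.parent = None

    def deliver(self, data):
        """Pass data to the children; a child that rejects the data is removed."""
        for child in list(self.children):  # iterate over a copy: children may be removed
            if child.accepts(data):
                child.deliver(data)
            else:
                child.remove()

    def names(self):
        """Names of this node and all descendants, depth-first."""
        out = [self.name]
        for child in self.children:
            out.extend(child.names())
        return out


ip = Analyzer("IP")
tcp = ip.add_child(Analyzer("TCP"))
tcp.add_child(Analyzer("SMTP", lambda d: d.startswith(b"220 ")))
tcp.add_child(Analyzer("POP3", lambda d: d.startswith(b"+OK")))
tcp.add_child(Analyzer("IMAP", lambda d: d.startswith(b"* OK")))

ip.deliver(b"220 mail.example.org ESMTP\r\n")
print(ip.names())  # ['IP', 'TCP', 'SMTP']
```

Only the analyzer that can make sense of the server greeting stays attached; the other two candidates drop out after the first chunk of data.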

Technical Issues

• Byte streams vs. packet streams
    • Protocols over TCP vs. others
    • Resolved by having both kinds of input channel for each analyzer
• Starting an analyzer mid-connection
    • Resolved by buffering the start of each stream (default 4 KB)
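A minimal sketch of the buffering idea: keep the first bytes of each stream so that an analyzer attached later can be replayed what it missed. `StreamBuffer` and its methods are hypothetical names; only the 4 KB default comes from the slide.

```python
class StreamBuffer:
    """Keeps the first `limit` bytes of a stream so late analyzers can catch up."""

    def __init__(self, limit=4096):  # 4 KB default, as stated on the slide
        self.limit = limit
        self.head = bytearray()
        self.analyzers = []

    def feed(self, data):
        """Record data while there is room, then forward it to attached analyzers."""
        room = self.limit - len(self.head)
        if room > 0:
            self.head.extend(data[:room])
        for analyzer in self.analyzers:
            analyzer(data)

    def attach(self, analyzer):
        """Start an analyzer mid-connection: replay the buffered head first.

        If the stream already exceeded `limit`, the replay is incomplete.
        """
        if self.head:
            analyzer(bytes(self.head))
        self.analyzers.append(analyzer)


seen = []
buf = StreamBuffer()
buf.feed(b"NICK bot\r\n")      # arrives before any analyzer is attached
buf.attach(seen.append)        # analyzer starts late and gets the replay
buf.feed(b"JOIN #chan\r\n")
print(seen)  # [b'NICK bot\r\n', b'JOIN #chan\r\n']
```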

Implementation

• Implemented in Bro NIDS
    • New "Protocol Identification Analyzer" (PIA) implements protocol detection and buffering
    • Stock Bro has a modular design suited to implementing the PIA
    • Required changing Bro's notion of one-to-one static binding from transport analyzer to application analyzer(s)
• Running in three large environments: MWN, UCB, and LBNL

Implementation

• PIA examines the first few KB of each connection for efficiency
    • Shown to be sufficient for protocol detection
• Can activate analyzers in four ways:
    • Signatures
    • Connection port
    • Each analyzer can register a detection function
        • Allows arbitrary heuristics
    • Using a prediction table
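The four activation mechanisms could be combined roughly as below. Everything here (the `Conn` record, the table contents, the `candidates` function) is an assumption for illustration rather than the PIA's real interface.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Conn:
    dst_ip: str
    dst_port: int
    head: bytes  # first bytes of the originator's payload


# Hypothetical tables, one per activation mechanism.
SIGNATURES = {"http": re.compile(rb"^GET \S+ HTTP/1\.[01]\r\n")}
PORTS = {80: "http", 21: "ftp"}
DETECTORS = {"irc": lambda head: head.upper().startswith(b"NICK ")}
PREDICTIONS = {("10.0.0.5", 40001): "ftp-data"}


def candidates(conn):
    """Collect every analyzer that any of the four mechanisms would activate."""
    found = set()
    for name, pattern in SIGNATURES.items():   # 1. signatures
        if pattern.match(conn.head):
            found.add(name)
    if conn.dst_port in PORTS:                  # 2. connection port
        found.add(PORTS[conn.dst_port])
    for name, detect in DETECTORS.items():      # 3. registered detection functions
        if detect(conn.head):
            found.add(name)
    predicted = PREDICTIONS.get((conn.dst_ip, conn.dst_port))  # 4. prediction table
    if predicted is not None:
        found.add(predicted)
    return found


# IRC spoken on port 80: both the port-based HTTP guess and the IRC heuristic fire.
print(sorted(candidates(Conn("192.0.2.1", 80, b"NICK x\r\n"))))  # ['http', 'irc']
```

Activating several candidates is acceptable in this design because an analyzer that turns out to be wrong can be disabled later.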

Deployment Trade-Offs

• Protocol detection signatures
    • Loose signatures affordable
        • False positives fixed later
    • But too loose means slower
        • Analyzer is more expensive than pattern-matching
• Improve accuracy with bidirectional signatures
    • Server must respond with the same protocol
    • Prevents attacker from intentionally triggering slow analyzers
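A bidirectional signature can be sketched as a pair of patterns, one per direction, that must both match. The HTTP patterns and the name `is_http` are illustrative assumptions.

```python
import re

# Hypothetical signature pair: one pattern per direction of the connection.
HTTP_REQUEST = re.compile(rb"^(GET|POST|HEAD) \S+ HTTP/1\.[01]\r\n")
HTTP_RESPONSE = re.compile(rb"^HTTP/1\.[01] \d{3} ")


def is_http(orig_head, resp_head):
    """Match only if the client request AND the server reply both look like HTTP."""
    return bool(HTTP_REQUEST.match(orig_head) and HTTP_RESPONSE.match(resp_head))


print(is_http(b"GET / HTTP/1.1\r\n", b"HTTP/1.1 200 OK\r\n"))          # True
print(is_http(b"GET / HTTP/1.1\r\n", b":irc.example.org NOTICE x\r\n"))  # False
```

A client alone cannot force the expensive HTTP analyzer to start by sending a fake request, because the server's reply has to agree.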

Deployment Trade-Offs

• At what point should an analyzer remove itself?
    • Real-world traffic is not perfect
        • Implementations can stretch protocol bounds
    • Should not parse the whole stream
        • Defeats the purpose of protocol analysis
    • Resolution: analyzer should never disable itself
        • Generate Bro events on protocol violations
        • Allow user-level policy script to disable analyzer if necessary, e.g., after a certain number of violations
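The policy-level decision could look like the sketch below, written in Python rather than Bro's scripting language. The class `ViolationPolicy`, its method names, and the threshold of 3 are assumptions; the slide only says "a certain number of violations".

```python
from collections import Counter


class ViolationPolicy:
    """User-level policy: disable an analyzer after too many protocol violations."""

    def __init__(self, max_violations=3):  # threshold is an assumed value
        self.max_violations = max_violations
        self.counts = Counter()
        self.disabled = set()

    def on_violation(self, conn_id, analyzer):
        """Handle one violation event; return True if the analyzer is now disabled."""
        key = (conn_id, analyzer)
        self.counts[key] += 1
        if self.counts[key] >= self.max_violations:
            self.disabled.add(key)
        return key in self.disabled


policy = ViolationPolicy(max_violations=2)
print(policy.on_violation("conn-1", "http"))  # False: first violation is tolerated
print(policy.on_violation("conn-1", "http"))  # True: threshold reached
```

Keeping the threshold in policy rather than in the analyzer lets each site decide how much protocol sloppiness it tolerates.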

New Capabilities

• In summary, can now:
    • Reliably detect connections on non-standard ports
        • Includes protocols that use others as transport, e.g., distinguish Kazaa, BitTorrent, SOAP, etc. over HTTP
    • Inspect payload of FTP transfers
    • Detect IRC-based bots
• This has successfully worked in the field

Reliable Real-Time Protocol Detection on non-Standard Ports

• 1 day at UC Berkeley (MWN similar)
• Connections on non-standard ports mainly HTTP
    • UCB: split between real HTTP (e.g., Apache) and Gnutella
    • MWN: similar, but more P2P (BitTorrent), also some FTP
    • Open HTTP proxies detected and closed
    • Open SMTP relay detected and closed

                Internal    Remote
FTP servers     6           17
HTTP servers    568         54,830
IRC servers     2           33
SMTP servers    8           8

Payload Inspection of FTP Data Transfers

• FTP data transfers use arbitrary ports
    • Identify based on prior PORT, PASV
        • Dynamically added to prediction table
• Check connection payload using libmagic
    • Actual file type == expected file type?
        • E.g., could find rootkit tarball sent as .jpg
    • Determined using file analyzer
• Extension: use same mechanism for SMTP (mail attachments)
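Both steps can be sketched in a few lines: predict the data connection from a PASV reply, then compare the payload's leading bytes with the file extension. The PASV reply format (`227 ... (h1,h2,h3,h4,p1,p2)`, port = p1*256 + p2) follows the FTP specification; the tiny magic-byte table stands in for libmagic, and all function names are hypothetical.

```python
import re

PASV_REPLY = re.compile(rb"^227 .*\((\d+),(\d+),(\d+),(\d+),(\d+),(\d+)\)")


def expect_from_pasv(reply, predictions):
    """Parse a PASV reply and register the announced endpoint as an FTP data channel."""
    m = PASV_REPLY.match(reply)
    if not m:
        return None
    h1, h2, h3, h4, p1, p2 = (int(g) for g in m.groups())
    endpoint = (f"{h1}.{h2}.{h3}.{h4}", p1 * 256 + p2)
    predictions[endpoint] = "ftp-data"
    return endpoint


# Simplified stand-in for libmagic: a few well-known leading byte sequences.
MAGIC = {b"\xff\xd8\xff": "jpg", b"\x1f\x8b": "gz", b"\x89PNG\r\n\x1a\n": "png"}


def sniff_type(payload):
    """Return the type implied by the payload's first bytes, or None if unknown."""
    for magic, kind in MAGIC.items():
        if payload.startswith(magic):
            return kind
    return None


def type_mismatch(filename, payload):
    """True if the payload's recognised type contradicts the file extension."""
    ext = filename.rsplit(".", 1)[-1].lower()
    actual = sniff_type(payload)
    return actual is not None and actual != ext


table = {}
print(expect_from_pasv(b"227 Entering Passive Mode (192,0,2,10,156,65)\r\n", table))
# ('192.0.2.10', 40001)
print(type_mismatch("holiday.jpg", b"\x1f\x8b\x08\x00"))  # True: gzip data named .jpg
```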

Detecting IRC Based Botnets

• Idea
    • Botnet communication often uses IRC
    • Botnet detector on top of IRC analyzer
        • Check nicknames
        • Check channel names
        • Check contact to identified bot-servers
• Key consideration: must analyze IRC dialog seen off-port
    • Because lots of benign IRC runs off-port too...
• > 100 bots found at MWN+UCB
    • MWN employs auto-blocking based on detector
• Not as adept at detecting custom protocols
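The three checks listed above can be sketched as a small function layered on parsed IRC lines. The nickname pattern, channel names, and server address are invented placeholders; the indicators actually used by the detector are not given on the slides.

```python
import re

# Hypothetical indicators; a real deployment would maintain and tune these lists.
BOT_NICK = re.compile(r"^[A-Z]{2,3}\|\d{4,}$")   # e.g. "USA|482913"
BOT_CHANNELS = {"#owned", "#scan"}
BOT_SERVERS = {"203.0.113.7"}


def bot_indicators(server_ip, irc_line):
    """Return which of the three checks (nick, channel, server) fire for one IRC line."""
    hits = []
    parts = irc_line.strip().split()
    if len(parts) >= 2:
        command, arg = parts[0].upper(), parts[1]
        if command == "NICK" and BOT_NICK.match(arg):
            hits.append("nick")
        if command == "JOIN" and arg.lower() in BOT_CHANNELS:
            hits.append("channel")
    if server_ip in BOT_SERVERS:
        hits.append("server")
    return hits


print(bot_indicators("198.51.100.2", "NICK USA|482913"))  # ['nick']
print(bot_indicators("203.0.113.7", "JOIN #scan"))        # ['channel', 'server']
print(bot_indicators("198.51.100.2", "NICK alice"))       # []
```

Because the checks operate on the parsed IRC dialog rather than on the port, they apply equally to IRC sessions found on non-standard ports.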

Performance Evaluation

Systems compared: Stock-Bro, PIA-Bro, PIA-Bro-M4K

Run times, in the order they appear on the slide:

Config-A (Standard; Standard + sigs): 3335s, 3843s, 3254s, 3778s
Config-B (All TCP pkts): 3584s, 3496s
Config-C (All TCP pkts + sigs; All TCP pkts + sigs + reass.): 4446s, 4436s, 4488s, 3716s, 4795s

Performance

• New framework does not add significant additional overhead
    • Performance cost is about 13.8% between PIA-Bro-M4K and Stock-Bro
• Protocol detection (signature matching on all packets) is expensive but doable
    • Solutions:
        • Specialized hardware
        • Load balancing possible

Summary

• Network traffic resists classification by port

• General framework for dynamic protocol analysis
    • Use signatures to pre-filter for efficiency
    • Use application parsing to make high-quality decisions
• Accurate enough for auto-blocking of bots at large-scale network
    • Plus detection of illicit relays and servers
    • Integrated into Development Release 1.2 of Bro

Questions?
