Intrusion Detection Processor with Packet Content Matching
04/19/23
JC Ho
ECE 594
Topics
Background
Algorithm and Data Structure
Memory Architecture
Processor Design
Background
String Matching Algorithms
Boyer-Moore – Good for single-pattern matching
Wu-Manber – Best average-case performance
Aho-Corasick – O(n) worst-case performance
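For reference, a minimal software sketch of Aho-Corasick follows. It uses plain, uncompressed dictionary nodes rather than the compressed layout discussed later, and the names `build_automaton` and `search` are illustrative, not taken from the slides.

```python
from collections import deque

def build_automaton(patterns):
    """Build goto, failure, and output tables for a list of byte patterns."""
    goto = [{}]   # goto[state][byte] -> next state
    fail = [0]    # failure pointer per state
    out = [[]]    # patterns recognized on entering each state
    for pat in patterns:
        state = 0
        for b in pat:
            if b not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][b] = len(goto) - 1
            state = goto[state][b]
        out[state].append(pat)
    # Breadth-first pass: failure pointers always point to shallower states.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for b, nxt in goto[state].items():
            f = fail[state]
            while f and b not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(b, 0)
            out[nxt] = out[nxt] + out[fail[nxt]]
            queue.append(nxt)
    return goto, fail, out

def search(data, automaton):
    """Return (start_offset, pattern) for every match in data."""
    goto, fail, out = automaton
    state = 0
    hits = []
    for i, b in enumerate(data):
        while state and b not in goto[state]:
            state = fail[state]
        state = goto[state].get(b, 0)
        for pat in out[state]:
            hits.append((i - len(pat) + 1, pat))
    return hits
```

Each input byte is consumed exactly once and failure pointers only move toward the root, which is where the linear worst-case bound comes from.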
Data Structure for Aho-Corasick
Unoptimized – 1028 bytes per node, 53MB
Bitmap Compression – 41 bytes per node, 2.8MB
Path Compression – 20 bytes per node (average), 1.1MB
Data structure size is reduced without the rules (rules are stored separately)
Aho-Corasick with Bitmap Compression adaptation
– Separation of signature and rules database in different storage units
– Smaller next node, failure, and rules pointers
    24 bits each
Result
– 41 bytes per node
– Same performance
[Figure: node layout: 32 byte bitmap | next node pointer | failure pointer | rules pointer]
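The 41-byte figure is 32 bytes of bitmap plus three 24-bit (3-byte) pointers. A minimal sketch of that layout is shown below. The field order follows the slide's figure, while the big-endian byte order and the names `pack_node` and `unpack_node` are assumptions made for illustration.

```python
def pack_node(bitmap, next_ptr, fail_ptr, rules_ptr):
    """Pack a 256-bit bitmap and three 24-bit pointers into 41 bytes."""
    if len(bitmap) != 32:
        raise ValueError("bitmap must be 32 bytes (256 bits)")
    fields = (next_ptr, fail_ptr, rules_ptr)
    if any(not 0 <= p < (1 << 24) for p in fields):
        raise ValueError("pointers must fit in 24 bits")
    # Assumption: big-endian pointer encoding; the slides do not specify one.
    return bytes(bitmap) + b"".join(p.to_bytes(3, "big") for p in fields)

def unpack_node(raw):
    """Inverse of pack_node: return (bitmap, next_ptr, fail_ptr, rules_ptr)."""
    if len(raw) != 41:
        raise ValueError("node must be 41 bytes")
    ptrs = [int.from_bytes(raw[32 + 3 * i:35 + 3 * i], "big") for i in range(3)]
    return raw[:32], ptrs[0], ptrs[1], ptrs[2]
```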
Considerations
Complete or partial match
[Figure: packet examples labeled complete signature, partial signature (two cases), and no match]
Considerations—Cont.
Case 1:
– Failure pointers eventually go to the root
– Tag as safe
[Figure: no match]
Considerations—Cont.
Case 2:
– Easy to handle
– Start from the beginning of packet
– Failure pointers go back to the root
– Mark root node visited
– Beginning of signature eventually goes to the right path
– Traverse entire path and tag as full match
[Figure: complete signature]
Considerations—Cont.
Case 3:
– Similar to Case 2
– Beginning of the signature eventually goes down the right path
– Mark root node visited
– When end of packet reached, tag as partial match
[Figure: partial signature]
Considerations—Cont.
Case 4:
– Very different from cases 2 and 3
– Needs to start from the middle of the data structure
– Needs to find the first instance of the first byte in the data structure
– Traverse the path of the signature to reach the leaf, mark as partial match since root is not visited
[Figure: partial signature]
Considerations—Cont.
Result
– Case 4 can be the general case
– Cases 1, 2, and 3 are special situations of case 4
– Start from the middle of the data structure every time for each packet
– Cases 1, 2, and 3 will eventually be redirected back to the root and will operate as if they started from the root
Memory Architecture
Guarantee worst-case performance
On-chip storage for data structure
Similar to cache design
Wide word reference
For ASIC design, memory reference can use node addressable scheme to reduce pointer size further
Memory Architecture—Cont.
Node size = Line width
– 64 bytes in theory
– 41 bytes in reality
– Remaining bytes are not constructed
[Figure: 64-byte line, bytes 0 to 63, with the node in bytes 0 to 40; Address 23:6]
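With one node per 64-byte line, the low 6 bits of a 24-bit address are always zero for a node reference, so bits 23:6 alone identify the node. The sketch below illustrates that mapping. The helper names are hypothetical, and reading the "node addressable scheme" as storing only the 18-bit line index is an interpretation of the slides, not something they state explicitly.

```python
LINE_BYTES = 64   # line width from the slide
NODE_BYTES = 41   # bytes of each line actually holding node data
ADDR_BITS = 24    # pointer width from the data structure slide

def line_index(addr):
    """Address bits 23:6 select the 64-byte line holding a node."""
    return (addr & ((1 << ADDR_BITS) - 1)) >> 6

def node_address(index):
    """Byte address of the line with the given index."""
    return index << 6

# Bytes left unused in each line under this layout.
UNUSED_BYTES = LINE_BYTES - NODE_BYTES
# Index width if pointers stored only bits 23:6 (assumed interpretation).
INDEX_BITS = ADDR_BITS - 6
```

Under these assumptions a pointer needs 18 bits instead of 24, at the cost of 23 unused bytes per line.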
Processor Design
Preprocessing
Load Data
Effective Memory Address Resolution
Address Check
Signature Storage Unit Access
Bitmap Processing
Next Node Address Calculation
Data Check
Match Check
Next Round Preparation
Post-processing
Processor Design—Preprocessing
Multiple packets are buffered
Contents are loaded to queues on-chip
Each byte of the content is accessed sequentially
Head and tail pointers required for enqueue and dequeue
Start and end pointers required to indicate start and end of packet
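One way to picture the four pointers is a circular byte buffer, sketched below. The class name `PacketQueue`, the `count` field, and the restriction to one packet per queue at a time are simplifying assumptions for this sketch, not details from the slides.

```python
class PacketQueue:
    """Circular byte queue holding one packet at a time (sketch assumption)."""

    def __init__(self, capacity):
        self.buf = bytearray(capacity)
        self.head = 0      # slot of the next byte to dequeue
        self.tail = 0      # slot where the next byte is enqueued
        self.count = 0     # bytes currently stored
        self.start = None  # slot of the packet's first byte
        self.end = None    # slot of the packet's last byte

    def load_packet(self, payload):
        if self.count:
            raise RuntimeError("previous packet not drained")
        if not 0 < len(payload) <= len(self.buf):
            raise ValueError("payload must be non-empty and fit the queue")
        self.start = self.tail
        for b in payload:
            self.buf[self.tail] = b
            self.tail = (self.tail + 1) % len(self.buf)
        self.count = len(payload)
        self.end = (self.tail - 1) % len(self.buf)

    def dequeue(self):
        """Return (byte, is_start, is_end) for the byte at the head."""
        if not self.count:
            raise IndexError("queue is empty")
        pos = self.head
        self.head = (pos + 1) % len(self.buf)
        self.count -= 1
        return self.buf[pos], pos == self.start, pos == self.end
```

Comparing the dequeue position against `start` and `end` gives the later stages the start-of-packet and end-of-packet indications they rely on.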
Processor Design—Preprocessing Cont.
Packets are assumed to be independent
Data from the same packet always occupies the same queue
Number of queues is proportional to number of stages in data path
Size of queues can be inversely proportional to number of queues
Processor Design—Core
Load data
– A counter determines from which queue data is loaded
– 1 byte is loaded from a different queue each cycle
– No data dependency in the data path
– Counter value is passed to the pipeline register along with data byte to keep track of queue
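The load stage can be sketched in software as a round-robin interleaver. The function name `interleave`, the assignment of packets to queues by index, and the choice to skip empty queues (where hardware would presumably insert an idle slot) are assumptions made for this example.

```python
from collections import deque

def interleave(packets, num_queues):
    """Yield (queue_id, byte, is_start, is_end), one byte per simulated cycle."""
    queues = [deque() for _ in range(num_queues)]
    for i, pkt in enumerate(packets):
        # A whole packet always lands in a single queue.
        q = queues[i % num_queues]
        last = len(pkt) - 1
        for j, b in enumerate(pkt):
            q.append((b, j == 0, j == last))
    counter = 0  # selects the queue to load from in this cycle
    while any(queues):
        q = queues[counter]
        if q:
            b, is_start, is_end = q.popleft()
            # The counter value travels with the byte to identify its queue.
            yield counter, b, is_start, is_end
        counter = (counter + 1) % num_queues
```

Because consecutive bytes in the pipeline come from different packets, the next node address computed for one byte is not needed until that queue's turn comes around again.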
Processor Design—Core Cont.
Effective memory address resolution
– Check start pointer to determine whether this is the starting byte of a packet
– Starting byte of a packet
    Use byte to index into a table to find the address of the first instance of this byte in data structure
    Reset all flags associated with this queue
– Not the starting byte
    Use the next node address computed from previous byte
Processor Design—Core Cont.
Address check
– Determine if effective address is root node
    Set root flag (RF)
Processor Design—Core Cont.
Signature storage unit access
– Bitmap loaded into 8 bitmap registers (BMR0-7), each 32 bits
– Next node pointer loaded to next node register (NNR), 24 bits
– Failure pointer loaded to failure register (FR)
– Rules pointer loaded to rules register (RR)
Processor Design—Core Cont.
Bitmap processing
– 8 independent popcount units count the 1’s in BMR0-7
– Bits 0-4 of the current data byte are used to load a bit from each BMR
– Bits 5-7 of the current data byte are used to select the proper bit and load the value of this BMR to PCR
– Check if bit is 1 and set BMF (flag) value
Processor Design—Core Cont.
Next node address calculation
– If (BMF = 0), next node address = FR
– If (BMF = 1)
    Perform popcount on PCR up to the proper bit (based on bits 0-4 of current byte)
    Sum all popcount values up to the proper bit
    Next node address = (this sum * node size) + NNR
– Use saturated add
– Value is stored back to NNR
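The bitmap lookup and address calculation can be sketched together as below. Several details are assumptions because the slides leave them open: bit 0 is treated as the least significant bit of each register, the popcount covers only bits strictly below the selected bit (so the first child sits at NNR), the node size is one 64-byte line, and saturation clamps to the 24-bit maximum. The function name is hypothetical.

```python
NODE_SIZE = 64             # assumption: one node per 64-byte line
ADDR_MAX = (1 << 24) - 1   # assumption: saturate at the 24-bit limit

def next_node_address(bmr, nnr, fr, byte, node_size=NODE_SIZE):
    """bmr is a list of eight 32-bit ints (BMR0-7); returns (address, BMF)."""
    reg = byte >> 5      # bits 5-7 select the bitmap register
    bit = byte & 0x1F    # bits 0-4 select the bit inside it
    pcr = bmr[reg]       # value of the selected register
    bmf = (pcr >> bit) & 1
    if not bmf:
        return fr, 0     # no child for this byte: follow the failure pointer
    # Count children that precede this one across the whole bitmap.
    below = sum(bin(r).count("1") for r in bmr[:reg])
    below += bin(pcr & ((1 << bit) - 1)).count("1")
    return min(below * node_size + nnr, ADDR_MAX), 1
```

For a node whose children are reached by bytes 0x41, 0x61, and 0x62 and whose NNR is 0x1000, this sketch places them at 0x1000, 0x1040, and 0x1080 respectively.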
Processor Design—Core Cont.
Data check
– Check end pointer to determine if current byte is end of packet
    Set end flag (EF)
– Check NNR value to determine if leaf node is reached
    Set match flag (MF)
Processor Design—Core Cont.
Match check
– Case 2: If (RF = 1) and (MF = 1)
    Set complete match flag (CMF)
– Case 4: If (RF = 0) and (MF = 1)
    Set partial match flag (PMF)
– Case 3: If (EF = 1) and (current node != root node) and (NNR != FR)
    Set PMF
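The three conditions above reduce to two boolean expressions. The sketch below applies them literally as stated on the slide; the function name `match_check` and the `at_root` parameter (standing in for "current node is the root node") are illustrative.

```python
def match_check(rf, mf, ef, at_root, nnr, fr):
    """Return (CMF, PMF) from the flags and registers named on the slide."""
    # Case 2: the root was visited and a leaf was reached.
    cmf = bool(rf and mf)
    # Case 4: a leaf was reached without visiting the root.
    case4 = bool(not rf and mf)
    # Case 3: the packet ended partway down a signature path.
    case3 = bool(ef and not at_root and nnr != fr)
    return cmf, case4 or case3
```

Taken literally, the conditions allow both flags to be set for the same byte, for example when a complete match lands on the last byte of a packet; the slides do not say how that combination is resolved.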
Processor Design—Core Cont.
Next round preparation
– Route NNR value back to load data stage
– If (CMF = 1)
    Set flush flag (FF) to signal to preprocessing unit to load new packet to this queue
– If ignore flag (IF) is set
    Ignore processing result
– Reset CMF, PMF, EF
Processor Design—Post-processing
If (CMF = 1) or (PMF = 1)
– Use RR value to access rules database
– Perform actions according to rule
If (EF = 1) and (CMF = 0) and (PMF = 0)
– Release packet to router
If (FF = 1)
– Set IF to invalidate subsequent data from this queue
– Reset FF
Preliminary Results
2MB signature storage unit
– 3.6 ns access time using CACTI
– Assume storage unit access is critical path
– Translates to 250 MHz conservatively
Supports up to 2 Gbps
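The arithmetic behind these figures can be checked with a short calculation. It assumes one byte is processed per clock cycle, which matches the load stage described earlier; the function names are illustrative.

```python
def max_clock_mhz(access_ns):
    """Upper bound on clock rate if one storage access must fit in a cycle."""
    return 1e3 / access_ns

def throughput_gbps(clock_mhz, bytes_per_cycle=1):
    """Line rate sustained when bytes_per_cycle bytes are consumed per cycle."""
    return clock_mhz * 1e6 * bytes_per_cycle * 8 / 1e9
```

A 3.6 ns access time bounds the clock at roughly 278 MHz, so 250 MHz leaves some margin, and 250 MHz at one byte per cycle gives 2 Gbps.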
Conclusion
Algorithm is optimized for hardware implementation
Memory requirements can be met by current technology
Implementation is feasible