Gigabit Rate Packet Pattern-Matching Using TCAM
Fang Yu and Randy H. Katz, UC Berkeley
T. V. Lakshman, Bell Laboratories, Lucent Technologies
Motivation
Numerous malicious probes and worms
End-host based solution is not sufficient
  It is hard for all end users to apply patches quickly
  Worms can contaminate millions of hosts within hours
Network based solution: network intrusion detection systems (NIDS)
  Perform packet scanning for complicated worm patterns in the network
  Stop worms from reaching end hosts
  Easy to manage for network administrators
Pattern Matching for NIDS
Thousands of complicated patterns
  Patterns have variable lengths
  Patterns with correlation
    “abc” followed by “cde” within 3 bytes
  Patterns with negation
    “user” not followed by “|0a|” within 50 bytes
Require packet payload scanning
  Not supported by most current network devices, which support packet header processing only
Current Pattern Matching Schemes
Software based solutions
  Speed is slow
FPGA solutions
  Build a large DFA or NFA for all patterns
  Build a KMP based search engine for each pattern
Bloom filters
  One Bloom filter for each pattern length
  Not scalable when pattern lengths vary dramatically
Ternary-CAM (TCAM)
Fully associative memory
  Compares the input string with all the entries in parallel
  If multiple entries match, reports the index of the first match
  Each cell takes one of three logic states: ‘0’, ‘1’, and ‘?’ (don’t care)
Current TCAM technology
  Fast match time: 4 ns
  Size: 1-2 MB
  Width configurable
    1024 entries * 1024 bytes width
    2048 entries * 512 bytes width
[Figure: the input 192.128.101.100 is compared against the TCAM entries 168.100.???.???, 192.128.???.???, and 192.128.101.???, and a match is reported. Labels: entry, cell, width.]
Pattern Matching with TCAM
Put all the patterns into the TCAM
  Assume patterns are less than or equal to the TCAM width
  If a pattern is shorter than the TCAM width, pad it with ‘?’
  Order the patterns by length, longest first
    When matching entry ABC, report matching of both pattern ABC and AB
Shift one byte each time
[Figure: a 4-byte window slides over the input A B C D E F one byte at a time and is looked up in a TCAM holding the entries C D E F, A B C ?, and A B ? ?. The first window, A B C D, matches entry A B C ?.]
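The lookup described on this slide can be imitated in software. The following is a minimal Python sketch, not the hardware behaviour itself: the helper names build_tcam, tcam_lookup, and scan are hypothetical, the patterns are the ones from the figure, and padding the end of the input with a NUL byte is an assumption made here so that the last windows can still be looked up.

```python
from typing import List, Optional, Tuple

WILDCARD = "?"


def build_tcam(patterns: List[str], width: int) -> List[str]:
    """Pad each pattern to the TCAM width with '?' and place longer patterns first."""
    if any(len(p) > width for p in patterns):
        raise ValueError("every pattern must fit in the TCAM width")
    ordered = sorted(patterns, key=len, reverse=True)
    return [p + WILDCARD * (width - len(p)) for p in ordered]


def tcam_lookup(entries: List[str], window: str) -> Optional[int]:
    """Return the index of the first matching entry, or None on a miss.

    A real TCAM compares all entries in parallel; the loop only mimics
    the 'report the first match' rule.
    """
    for index, entry in enumerate(entries):
        if all(e == WILDCARD or e == c for e, c in zip(entry, window)):
            return index
    return None


def scan(entries: List[str], data: str, width: int) -> List[Tuple[int, str]]:
    """Shift one byte at a time and collect (offset, matched entry) pairs."""
    hits = []
    for pos in range(len(data)):
        # Assumption: pad the tail of the input with NUL so the window is full width.
        window = data[pos:pos + width].ljust(width, "\0")
        index = tcam_lookup(entries, window)
        if index is not None:
            hits.append((pos, entries[index]))
    return hits


entries = build_tcam(["CDEF", "AB", "ABC"], width=4)
hits = scan(entries, "ABCDEF", width=4)
```

Because ABC? is stored before AB??, a hit on ABC? can be taken to imply AB as well; the sketch leaves that bookkeeping to the caller.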
Analysis
Scan speed: 4 ns per TCAM lookup, shifting one byte at a time
  8 bits / 4 ns = 2 Gbps worst-case scan rate
Limitation: all the patterns must be shorter than or equal to the TCAM width
  Set the TCAM width >= the longest pattern’s length
  Pad all short patterns to the TCAM width
  Wastes TCAM resources
Can we set the TCAM width smaller and cut long patterns into smaller patterns?
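The arithmetic behind the scan rate, and the cost of padding, can be checked with a few lines of Python. This is only an illustrative sketch: the function names and the pattern lengths in the example are hypothetical, not taken from the slides.

```python
from typing import List


def scan_rate_gbps(lookup_ns: float, bytes_per_lookup: int = 1) -> float:
    """Bits consumed per nanosecond, which is numerically the rate in Gbps."""
    return 8 * bytes_per_lookup / lookup_ns


def padding_waste(pattern_lengths: List[int], width: int) -> float:
    """Fraction of TCAM cells holding '?' padding when each pattern uses one entry."""
    if any(length > width for length in pattern_lengths):
        raise ValueError("patterns must not exceed the TCAM width")
    used = sum(pattern_lengths)
    total = width * len(pattern_lengths)
    return 1 - used / total


rate = scan_rate_gbps(lookup_ns=4.0)            # one byte per 4 ns lookup
waste = padding_waste([4, 2, 3], width=4)       # hypothetical pattern lengths
```

With a 4 ns lookup and a one-byte shift the rate is 2 Gbps, as stated on the slide; the waste figure grows quickly when the width is set by one very long pattern.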
Long Patterns
Cut long patterns into smaller patterns
  TCAM width w = 4 bytes
  DEFGABCDL is split into DEFG, ABCD, and L
  Short partial patterns, many TCAM hits
Pad the last partial pattern with the tail of the second-to-last partial pattern
  DEFGABCDL is split into DEFG, ABCD, and BCDL
[Figure: DEFGABCDL split as DEFG | ABCD | L, and as DEFG | ABCD | BCDL with the last piece overlapping the previous one.]
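The splitting rule can be written down directly. The sketch below is one possible reading of it in Python; split_pattern is a hypothetical helper, and returning the byte offset of each piece is an addition made here so that the distance between consecutive pieces can be derived.

```python
from typing import List, Tuple


def split_pattern(pattern: str, width: int) -> List[Tuple[int, str]]:
    """Split a pattern into width-sized pieces, returned as (offset, piece) pairs.

    If the last piece would be shorter than the width, it is moved back so
    that it is padded with the tail of the previous piece.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    offsets = list(range(0, len(pattern), width))
    if len(pattern) > width and len(pattern) % width != 0:
        offsets[-1] = len(pattern) - width  # overlap with the previous piece
    return [(offset, pattern[offset:offset + width]) for offset in offsets]


pieces = split_pattern("DEFGABCDL", 4)
```

A pattern shorter than the width is returned as a single piece and would still be padded with ‘?’ in the TCAM.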
Concatenate Partial Patterns into Long Patterns
Patterns: ABCDABCD, DEFGABCDL, DEFGDEF, DEF
Input: D E F G A B C D L
TCAM entries:
  D E F G
  A B C D
  B C D L
  G D E F
  D E F ?
Matching Table
  Prefix Index   Suffix Index   Distance   Matched Long Pattern Index
  1(ABCD)        1(ABCD)        4          ABCDABCD
  2(DEFG)        1(ABCD)        4          3(DEFGABCD)
  2(DEFG)        3(GDEF)        3          DEFGDEF
  3(DEFGABCD)    1(ABCD)        4          ABCDABCD
  3(DEFGABCD)    2(BCDL)        1          DEFGABCDL
Partial Hit List (PHL)
  Position   Matched entry
  1          2(DEFG)
  5          3(DEFGABCD)
[Figure: animation of the input being shifted through the TCAM one byte at a time. DEFG hits at position 1 and is recorded in the PHL; ABCD hits at position 5 and combines with DEFG into DEFGABCD; BCDL hits one position later and completes DEFGABCDL.]
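The interplay of TCAM hits, the partial hit list, and the matching table can be traced with a small simulation. This is a sketch under stated assumptions, not the authors' implementation: the matching table is keyed by entry contents instead of indices, a combined partial result replaces the plain entry in the PHL (as the slide's PHL suggests), and the names ALSO_REPORTS, FIRST_PIECES, and scan are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

WIDTH = 4
TCAM = ["DEFG", "ABCD", "BCDL", "GDEF", "DEF?"]  # first match wins

# (prefix held in the PHL, newly matched entry, distance) -> combined result
MATCHING_TABLE: Dict[Tuple[str, str, int], str] = {
    ("ABCD", "ABCD", 4): "ABCDABCD",
    ("DEFG", "ABCD", 4): "DEFGABCD",
    ("DEFG", "GDEF", 3): "DEFGDEF",
    ("DEFGABCD", "ABCD", 4): "ABCDABCD",
    ("DEFGABCD", "BCDL", 1): "DEFGABCDL",
}
LONG_PATTERNS = {"ABCDABCD", "DEFGABCDL", "DEFGDEF"}
SHORT_PATTERNS = {"DEF"}               # complete patterns that fit in one entry
FIRST_PIECES = {"ABCD", "DEFG"}        # entries that can start a long pattern
ALSO_REPORTS = {"DEFG": ["DEF"]}       # assumption: a DEFG hit implies DEF


def tcam_lookup(window: str) -> Optional[str]:
    """Return the first matching entry with its '?' padding removed, or None."""
    for entry in TCAM:
        if all(e == "?" or e == c for e, c in zip(entry, window)):
            return entry.rstrip("?")
    return None


def scan(data: str) -> List[Tuple[int, str]]:
    """Return (position, pattern) reports; positions are 1-based window starts."""
    reports: List[Tuple[int, str]] = []
    phl: List[Tuple[int, str]] = []
    max_distance = max(distance for _, _, distance in MATCHING_TABLE)
    for pos in range(1, len(data) + 1):
        window = data[pos - 1:pos - 1 + WIDTH].ljust(WIDTH, "\0")
        entry = tcam_lookup(window)
        if entry is None:
            continue  # TCAM miss: no memory lookup
        phl = [(p, name) for p, name in phl if pos - p <= max_distance]
        if entry in SHORT_PATTERNS:
            reports.append((pos, entry))
        reports.extend((pos, implied) for implied in ALSO_REPORTS.get(entry, []))
        partials = []
        for p, name in phl:  # one matching-table lookup per PHL item
            combined = MATCHING_TABLE.get((name, entry, pos - p))
            if combined is None:
                continue
            if combined in LONG_PATTERNS:
                reports.append((pos, combined))
            else:
                partials.append((pos, combined))
        if partials:
            phl.extend(partials)
        elif entry in FIRST_PIECES:
            phl.append((pos, entry))
    return reports


reports = scan("DEFGABCDL")
```

For the slide's input the sketch reports DEF at position 1 and DEFGABCDL at position 6, the position at which BCDL hits.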
Correlated Patterns
One pattern after another
  E.g. “ABCD” followed by “DEF” within 10 bytes
  The matching result of “ABCD” has to be in the PHL for 10 positions
[Figure: the input A B C D A D E F G is scanned against a TCAM holding the entries D E F G, A B C D, A ? ? ?, D E F ?, and A A B ?; ABCD hits first and the pattern D E F hits later in the input.]
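Keeping a hit in the PHL for a bounded number of positions can be sketched as follows. This is an illustrative Python sketch only: plain string comparison stands in for the TCAM, correlated_matches is a hypothetical name, and "within N bytes" is assumed here to mean that the second pattern starts at most N bytes after the end of the first.

```python
from typing import List, Tuple


def correlated_matches(data: str, first: str, second: str,
                       within: int) -> List[Tuple[int, int]]:
    """Return (start of first, start of second) pairs that satisfy the window."""
    matches: List[Tuple[int, int]] = []
    phl: List[int] = []  # start offsets of recent hits on `first`
    for pos in range(len(data)):
        # Expire hits whose window has closed.
        phl = [start for start in phl if pos - (start + len(first)) <= within]
        if data.startswith(second, pos):
            for start in phl:
                if pos >= start + len(first):  # second must come after first
                    matches.append((start, pos))
        if data.startswith(first, pos):
            phl.append(pos)
    return matches


found = correlated_matches("ABCDADEFG", "ABCD", "DEF", within=10)
```

The PHL grows with the window size, which is why long windows matter for the worst-case analysis.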
Matching Process
TCAM reports a miss
  No extra memory lookup
TCAM reports a hit
  If it is a partial pattern
    For every item in the PHL, one memory lookup into the matching table to see whether it generates a valid pattern
Examples based on statistical analysis
  n = 2000, m_i = 200 bytes, w = 4 bytes: associate hit rate is 2.2e-5, PHL size is 8.8e-5
  w = 8 bytes: associate hit rate is 2.6e-15, PHL size is 2.08e-14
Associate hit rate: \sum_i (\lceil m_i/w \rceil - 1) / (2^8)^w
PHL size: w \cdot \sum_i (\lceil m_i/w \rceil - 1) / (2^8)^w
Malicious Attack?
Window: distance between two correlated patterns
After matching a pattern, what is the probability of matching another one j positions later (window size j)?
When j = 1, the probability is:
  1 - \frac{(2^{8(m-1)})!}{(2^{8(m-1)})^n \, (2^{8(m-1)} - n)!}
  E.g., for n = 1000 and m = 4, it is 0.029
When j increases, the probability increases. If j = m, the probability is 1
Worst-case PHL size is at least: window size / m
[Figure: for the input A B C D A A G G, pattern A B C D is followed by pattern B C D A one position later (j = 1), and by pattern A A G G four positions later (j = m).]
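The quoted value of 0.029 for n = 1000 and m = 4 can be reproduced numerically. The sketch below assumes the j = 1 expression is the birthday-problem probability over N = 2^{8(m-1)} equally likely values of the m-1 overlapping bytes; overlap_probability is a hypothetical name, and the factorials are evaluated through a sum of logarithms to avoid huge integers.

```python
import math


def overlap_probability(n: int, m: int) -> float:
    """1 - N! / (N^n * (N - n)!) with N = 2^(8(m-1)), computed in log space."""
    space = 2 ** (8 * (m - 1))
    if n > space:
        return 1.0  # more patterns than distinct values: a collision is certain
    # N! / (N^n (N-n)!) equals the product over k < n of (1 - k/N).
    log_no_collision = sum(math.log1p(-k / space) for k in range(n))
    return 1.0 - math.exp(log_no_collision)


p = overlap_probability(n=1000, m=4)
```

Under these assumptions the result rounds to 0.029; a larger pattern set or a shorter m raises it quickly.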
Simulation Results on ClamAV
ClamAV virus signature database
  Version 0.15, which contains simple patterns only
  1768 patterns, varying from 6 bytes to 2189 bytes
[Figure: histogram of the number of patterns (0 to 400) versus pattern length in bytes (log scale, 1 to 10000).]
Effect of TCAM Width
Total TCAM space
  Increases as w increases, because of padding
Mapping table size
  Decreases as w increases, because of fewer partial patterns
[Figure: TCAM space (KB) and mapping table size (MB), both on log scales, versus TCAM width (in bytes) from 4 to 1024.]
TCAM space consumed: \sum_i w \cdot \lceil m_i/w \rceil
Memory space for mapping table: \sum_i w \cdot (\lceil m_i/w \rceil - 1)^2
PHL Size on Real Data
For each packet, record the average and maximum PHL size
  Avg: mean of the average PHL size over all packets
  AvgMax: mean of the maximum PHL sizes
  Max: maximum PHL size over all packets

TCAM Width    MIT Dump                    Berkeley Dump
(bytes)       Avg      AvgMax   Max       Avg      AvgMax   Max
4             0.042    0.27     4         0.03     0.48     4
8             4.8e-6   5.6e-4   8         1.e-6    1.9e-5   7
16            0        0        0         4.3e-7   5.8e-6   3
Simulation Results on Snort
Snort system (v2.1.2) has 1991 rules
  1039 simple patterns
  527 correlated patterns
    Up to 7 sub-patterns
Set the TCAM width to 128 bytes
  Patterns fit into a TCAM size of 295 KB

Window Size   MIT Dump                    Berkeley Dump
              Avg      AvgMax   Max       Avg      AvgMax   Max
20            0.5523   2.7683   8         0.4702   1.5765   12
40            0.9881   3.5376   14        0.6500   1.8661   18
60            1.3151   3.9960   14        0.7313   1.9652   23
80            1.5491   4.2158   16        0.7587   2.0373   24
100           1.6867   4.3485   18        0.7661   2.0740   25
120           1.7725   4.4475   18        0.7669   2.0768   25
140           1.8308   4.5722   19        0.7669   2.0768   25
160           1.8800   4.6643   19        0.7669   2.0768   25
180           1.9244   4.7386   19        0.7669   2.0768   25
200           1.9662   4.8079   20        0.7669   2.0768   25
Conclusions
High-speed pattern matching is essential for building effective defenses against viruses
Multiple pattern matching with TCAM
  Achieves multi-gigabit rates
  Searches for thousands, or tens of thousands, of patterns in parallel
  Supports long patterns, correlated patterns, and also patterns with negation and wildcards
  Can be extended to support higher rates with larger TCAMs
Backup Slides
Long Patterns
What if a pattern is longer than the width of the TCAM?
  Split it into multiple partial patterns
  For example, TCAM width k = 4 bytes

Pattern index   Pattern content
1               ABCD ABCD
2               DEFG ABCD L
3               DEFG DEF
4               DEF

TCAM entries: D E F G, A B C D, D E F ?, L ? ? ?
Short partial patterns (e.g. L ? ? ?), many TCAM hits
Statistical Analysis
Assume a random input string and independent patterns
  Number of patterns: n
  Pattern size: m_i bytes for pattern i
  TCAM width: w
  Total entries for partial items in TCAM: \sum_i (\lceil m_i/w \rceil - 1)
  Associate hit rate is \sum_i (\lceil m_i/w \rceil - 1) / (2^8)^w
  Ignoring the dependency between neighboring positions, PHL size is w \cdot \sum_i (\lceil m_i/w \rceil - 1) / (2^8)^w
Example
  n = 2000, m_i = 200 bytes, w = 4 bytes: associate hit rate is 2.2e-5, PHL size is 8.8e-5
  w = 8 bytes: associate hit rate is 2.6e-15, PHL size is 2.08e-14
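The example figures can be recomputed from the definitions on this slide. The Python sketch below assumes the number of partial items per pattern is the number of w-byte pieces minus one, with the piece count rounded up; the function names are hypothetical.

```python
import math
from typing import List


def partial_entries(lengths: List[int], w: int) -> int:
    """Total partial items: one fewer than the number of w-byte pieces per pattern."""
    return sum(math.ceil(m / w) - 1 for m in lengths)


def associate_hit_rate(lengths: List[int], w: int) -> float:
    """Chance that a random w-byte window equals one of the partial items."""
    return partial_entries(lengths, w) / 2 ** (8 * w)


def phl_size(lengths: List[int], w: int) -> float:
    """Expected PHL size, ignoring dependency between neighbouring positions."""
    return w * associate_hit_rate(lengths, w)


lengths = [200] * 2000  # n = 2000 patterns of m_i = 200 bytes each
rate_w4 = associate_hit_rate(lengths, 4)
rate_w8 = associate_hit_rate(lengths, 8)
size_w8 = phl_size(lengths, 8)
```

For w = 4 the rate comes out near 2.28e-5, which the slide quotes as 2.2e-5 (and 8.8e-5 for the PHL size is four times that quoted value); for w = 8 the sketch agrees with 2.6e-15 and 2.08e-14 to the precision shown.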
Synthesized “Worst-case” Packets
Four sets of synthesized data
  1, 10, and 100 randomly inserted virus patterns per packet
[Figure: maximum partial hit list size (0 to 20) versus TCAM width (16 to 1024), for 1, 10, and 100 patterns per packet.]
[Figure: average PHL size (0 to 0.3) versus TCAM width in bytes (16 to 1024), for 1, 10, and 100 patterns per packet.]
[Figure: AvgMax PHL size (0 to 5) versus TCAM width (16 to 1024), for 1, 10, and 100 patterns per packet.]
Memory Lookup Process
TCAM reports a miss
  No extra memory lookup
  Memory lookup process is idle
TCAM reports a hit
  One memory lookup in the combined pattern table
  Lookups in the matching table if the PHL is not empty
[Figure: timeline of TCAM lookup time and memory lookup time over input positions 1 to 10, with each position marked as a hit or a miss; the memory lookup process performs lookups after hits and is idle after misses.]
Effects of Memory Ratio on Scan Rate
Scan ratio
  Total scanning time (including memory lookups) vs. the time spent on TCAM lookups only
  E.g., scan ratio = 2 means total scanning rate = TCAM access rate / 2
Memory ratio
  Ratio of SRAM to TCAM access times
[Figure: scan ratio (1 to 1.9) versus % of packets (0.6 to 1), with one curve per memory ratio (0.2, 0.4, 0.6, 0.8, 1).]