A Scalable and Reconfigurable Search Memory Substrate for
High Throughput Packet Processing
Sangyeun Cho and Rami Melhem
Dept. of Computer Science, University of Pittsburgh
Feb. 6 ’07 – CCW-21
Lookup ops in packet processing
Packet forwarding
• Given an IP address
• Look up in a table (IP table) a matching prefix
• Make sure the chosen prefix is the longest: LPM (Longest Prefix Matching)
Rule-based packet filtering
• Given a set of packet fields
• Look up in a rule database matching entries
Deep packet inspection
• Given a string in packet payload
• Look up in a signature database matching entries
Lookup performance scalability
Lookup performance must match increasing line speeds
• For OC-768, up to 104M packets must be processed per second
• Network traffic has doubled every year [McKeown03]
• Router capacity doubles every 18 months
Capacity pressure
• Routing tables (~200K prefixes in a core router) are growing [RIS]
• # of firewall rules increases; 100K rules are practical [Baboescu04]
• IPv6
Power and thermal issues are already a critical limiting factor in network processing device design [McKeown03]
Two conventional lookup solutions
• Software methods (tries, hash table, …)
• Hardware methods (TCAM, Bloom filter, …)
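As a rough sanity check on the OC-768 packet rate quoted above, the rate is line speed divided by packet size. The nominal 40 Gb/s line rate and the 48-byte packet size used below are assumptions chosen for illustration; the slide does not state which values it used.

```python
def packets_per_second(line_rate_bps: float, packet_bytes: int) -> float:
    """Packets per second when back-to-back packets of one size fill the link."""
    return line_rate_bps / (packet_bytes * 8)


def time_budget_ns(pps: float) -> float:
    """Time available to finish one lookup if every packet needs one."""
    return 1e9 / pps


# Assumed values: nominal 40 Gb/s link, 48-byte packets.
rate = packets_per_second(40e9, 48)
print(f"{rate / 1e6:.1f} Mpps, {time_budget_ns(rate):.2f} ns per packet")
```

Under these assumptions the per-packet budget is under 10 ns, which is why the number of dependent memory accesses per lookup matters so much.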
IP lookup using a trie
Consider an IP address: 0 1 0 0 0 1 1 0
“flexibility”
high memory capacity requirement
high memory bandwidth requirement
not SCALABLE
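A minimal sketch of longest-prefix matching with a binary trie, to make the memory cost concrete. The node layout, the function names, and the sample prefix set are illustrative assumptions, not the structure used in any particular router. Each visited node stands for one dependent memory access.

```python
class TrieNode:
    def __init__(self):
        self.children = {}   # "0" or "1" -> TrieNode
        self.prefix = None   # set when a stored prefix ends at this node


def trie_insert(root, prefix):
    """Insert a prefix given as a bit string such as "0100"."""
    node = root
    for bit in prefix:
        node = node.children.setdefault(bit, TrieNode())
    node.prefix = prefix


def trie_lookup(root, address):
    """Return (longest matching prefix or None, number of nodes visited)."""
    node, best, visited = root, root.prefix, 1
    for bit in address:
        node = node.children.get(bit)
        if node is None:
            break
        visited += 1
        if node.prefix is not None:
            best = node.prefix   # remember the deepest match seen so far
    return best, visited


# Sample prefixes (assumed for illustration).
PREFIXES = ["110100", "110101", "110111", "01000", "01100", "01101",
            "11011", "0100", "0110", "1101", "10", "0"]
root = TrieNode()
for p in PREFIXES:
    trie_insert(root, p)

print(trie_lookup(root, "01000110"))
```

For the address 0 1 0 0 0 1 1 0, the walk passes the nodes for 0 and 0100 before settling on 01000, so a single lookup touches several nodes one after another.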
IP lookup using TCAM
Consider an IP address: 0 1 0 0 0 1 1 0
110100* 110101* 110111* 01000* 01100* 01101* 11011* 0100* 0110* 1101* 10* 0*
sort before storing
choose the first among the matched
high bandwidth, constant time lookup
TCAMs are relatively small, expensive
power consumption very high
not SCALABLE
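The TCAM behavior on this slide can be emulated in software as a rough sketch: store each prefix as a value/mask pair, keep entries sorted longest first, and return the first match. The function names and the 8-bit width are assumptions for illustration. Real TCAM hardware compares all entries in parallel; the loop below is only a sequential stand-in.

```python
def ternary_entry(prefix, width=8):
    """Turn a bit-string prefix into (value, mask); mask bits are 1 where the bit matters."""
    care = len(prefix)
    value = int(prefix, 2) << (width - care) if prefix else 0
    mask = ((1 << care) - 1) << (width - care)
    return value, mask


def tcam_build(prefixes, width=8):
    """Sort before storing: longest prefixes first."""
    ordered = sorted(prefixes, key=len, reverse=True)
    return [(p,) + ternary_entry(p, width) for p in ordered]


def tcam_lookup(table, address):
    """Choose the first among the matched entries."""
    addr = int(address, 2)
    for prefix, value, mask in table:
        if (addr & mask) == value:
            return prefix
    return None


table = tcam_build(["110100", "110101", "110111", "01000", "01100", "01101",
                    "11011", "0100", "0110", "1101", "10", "0"])
print(tcam_lookup(table, "01000110"))
```

Because the table is ordered by prefix length, the first hit is also the longest match, which is what makes the constant-time lookup possible in hardware.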
CA-RAM – a hybrid approach
Can we do better than the existing conventional schemes?
• Flexibility and search performance
• Exploit optimized RAM designs
CA-RAM combines hashing w/ hardware parallel matching
CA-RAM design goals
• High lookup performance
• Low power consumption
• Smaller chip area per stored datum
• Straightforward system-level integration
CA-RAM – Content Addressable RAM
Separate match logic and memory
Match logic for a single row, not every row
Allows the use of dense RAM technology
Enables highly reconfigurable match logic
Keep keys sorted in each row, not in entire array
[Figure: placement of match logic and memory cells in a conventional CAM/TCAM vs. CA-RAM]
Very simple, yet efficient
Use hashing to store keys in a particular row
To look up, hash the key and retrieve one row
Perform matching on entire row in parallel
Achieve full content addressability w/o paying overhead!
[Figure: the index generator maps the key to one memory row (Keyi1, Keyi2, …); match processors 1, 2, … compare the keys in that row]
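The lookup flow on this slide (hash the key, fetch one row, match the whole row at once) can be sketched as follows. The class name, the modulo hash, and the exact-match comparison are assumptions made for illustration; the slides do not specify the hash function.

```python
class CARAMSketch:
    """Toy model: a RAM of fixed-size rows plus match logic applied to one row."""

    def __init__(self, num_rows, slots_per_row):
        self.rows = [[] for _ in range(num_rows)]
        self.slots_per_row = slots_per_row

    def index(self, key):
        # Hypothetical index generator: a simple modulo hash.
        return key % len(self.rows)

    def insert(self, key, data):
        """Store (key, data) in the hashed row; return False on bucket overflow."""
        row = self.rows[self.index(key)]
        if len(row) >= self.slots_per_row:
            return False
        row.append((key, data))
        return True

    def lookup(self, key):
        row = self.rows[self.index(key)]                 # one memory access
        hits = [data for k, data in row if k == key]     # parallel in hardware
        return hits[0] if hits else None


cam = CARAMSketch(num_rows=4, slots_per_row=2)
cam.insert(5, "port A")
cam.insert(9, "port B")      # hashes to the same row as key 5
print(cam.lookup(9))
```

The list comprehension stands in for the match processors, which would examine every slot of the fetched row in the same cycle.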
Pipelined CA-RAM operation
[Figure: a search key flows through the index generator, the selected memory row (Keyj1, Keyj2, Keyj3), and match processors 1 to 3 to produce the result]
Step 1: Index generation
Step 2: Memory access
Step 3: Key matching
Step 4: Result forwarding
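A small sketch of how the four steps overlap when lookups are pipelined. It assumes one cycle per stage and one new lookup issued every cycle; neither assumption comes from the slide, and the function names are hypothetical.

```python
STAGES = ["index generation", "memory access", "key matching", "result forwarding"]


def pipeline_schedule(num_lookups):
    """Map each cycle to the (lookup id, stage) pairs active in that cycle."""
    schedule = {}
    for lookup in range(num_lookups):
        for offset, stage in enumerate(STAGES):
            schedule.setdefault(lookup + offset, []).append((lookup, stage))
    return schedule


def total_cycles(num_lookups):
    """Cycles to drain the pipeline: fill latency plus one cycle per extra lookup."""
    return num_lookups + len(STAGES) - 1 if num_lookups else 0


for cycle, work in sorted(pipeline_schedule(3).items()):
    print(cycle, work)
```

Once the pipeline is full, one result completes per cycle under these assumptions, even though each lookup still takes four steps end to end.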
Dealing w/ bucket overflows
Careful design of hash function
Increase bucket size
• Reduce load factor (α); α = # of occupied entries / # of total entries
Use “chaining”; store overflows in subsequent rows
• Multiple accesses per lookup
Use a small overflow CAM, accessed in parallel
• Similar to popular “victim caching” in computer architecture
Use two-level hashing and employ multiple CA-RAM banks
……
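To illustrate the chaining option and the load factor, here is a minimal sketch in which an overflowing key spills into the next row and a lookup counts how many rows it reads. The class name, the modulo hash, and the insert-only assumption (no deletions) are illustrative choices, not details from the slides.

```python
class ChainedRows:
    def __init__(self, num_rows, slots_per_row):
        self.rows = [[] for _ in range(num_rows)]
        self.slots = slots_per_row
        self.spilled = 0          # keys stored outside their home row

    def home(self, key):
        return key % len(self.rows)   # hypothetical hash

    def insert(self, key):
        for step in range(len(self.rows)):
            row = self.rows[(self.home(key) + step) % len(self.rows)]
            if len(row) < self.slots:
                row.append(key)
                self.spilled += step > 0
                return True
        return False                  # every row is full

    def lookup(self, key):
        """Return (found, number of row accesses)."""
        for step in range(len(self.rows)):
            row = self.rows[(self.home(key) + step) % len(self.rows)]
            if key in row:
                return True, step + 1
            if len(row) < self.slots:
                # Row has room, so the key cannot have spilled past it
                # (valid only because this sketch never deletes keys).
                return False, step + 1
        return False, len(self.rows)

    def load_factor(self):
        occupied = sum(len(row) for row in self.rows)
        return occupied / (len(self.rows) * self.slots)


t = ChainedRows(num_rows=4, slots_per_row=2)
for k in (1, 5, 9):               # all three hash to row 1
    t.insert(k)
print(t.lookup(9), t.spilled, t.load_factor())
```

A spilled key costs an extra row access on lookup, which is the trade-off the "multiple accesses per lookup" bullet refers to.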
CA-RAM reconfig. opportunities
Reconfigurable match logic allows:
Adapting key size to apps
• Same hardware to support multiple apps or standards
……
Adapting key size
[Figure: reconfigurable match logic selects key bits for matching from the fetched row (Keyi1, Keyi2, Keyi3, …) and outputs match information]
Adapting key size is straightforward
Will benefit supporting multiple apps/standards
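One way to picture key-size adaptation is to treat a fetched row as a flat bit string and let a configuration parameter decide where key boundaries fall. The function names and the 16-bit sample row are assumptions for illustration only.

```python
def split_row(row_bits, key_bits):
    """Slice a row (bit string) into consecutive keys of key_bits each."""
    if len(row_bits) % key_bits:
        raise ValueError("row width must be a multiple of the key width")
    return [row_bits[i:i + key_bits] for i in range(0, len(row_bits), key_bits)]


def match_row(row_bits, key_bits, search_key):
    """Return the slot positions whose key equals the search key."""
    return [slot for slot, key in enumerate(split_row(row_bits, key_bits))
            if key == search_key]


row = "0100011011010010"                 # one 16-bit row (sample data)
print(match_row(row, 8, "11010010"))     # row viewed as two 8-bit keys
print(match_row(row, 4, "1101"))         # same row viewed as four 4-bit keys
```

The stored bits are unchanged between the two calls; only the configured key width differs, which is the sense in which the same hardware can serve multiple apps.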
CA-RAM reconfig. opportunities
Reconfigurable match logic allows:
Adapting key size to apps
• Same hardware to support multiple apps or standards
Binary and ternary matching
• Some apps require ternary matching, some don’t
……
Supporting binary/ternary matching
[Figure: reconfigurable match logic compares the search key against stored keys (Keyi1, Keyi2, …) and their masks (Maski1, Maskj1), considering mask bits or not, and outputs match information]
Developed configurable comparator
T-matching requires 2 bits / 1 symbol
Supporting different types of matching in different bit positions feasible
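A sketch of a comparator that can be switched between binary and ternary matching. Each ternary symbol is stored as two bits, a value bit and a mask bit; this particular encoding and the function names are assumptions, since the slide only states that ternary matching needs 2 bits per symbol.

```python
def encode_ternary(pattern):
    """Encode a pattern over '0', '1', '*' as (value, mask); mask bit 0 means don't care."""
    value = mask = 0
    for symbol in pattern:
        value = (value << 1) | int(symbol == "1")
        mask = (mask << 1) | int(symbol != "*")
    return value, mask


def matches(search, value, mask, use_mask=True):
    """Ternary compare when use_mask is True, plain binary compare otherwise."""
    if not use_mask:
        return search == value
    return ((search ^ value) & mask) == 0


value, mask = encode_ternary("0100****")
print(matches(0b01000110, value, mask))                   # ternary mode
print(matches(0b01000110, value, mask, use_mask=False))   # binary mode
```

In binary mode the mask bits are simply ignored, so an application that needs only exact matching could in principle reuse that storage for other purposes.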
CA-RAM reconfig. opportunities
Reconfigurable match logic allows:
Adapting key size to apps
• Same hardware to support multiple apps or standards
Binary and ternary matching
• Some apps require ternary matching, some don’t
Storing data and keys in a CA-RAM module
• Cuts # of memory accesses for IP lookup by half
……
Simult. key matching & data access
[Figure: reconfigurable match logic matches the search key against stored keys (Keyi1, Keyi2, …), bypasses the embedded data (Datai1, Dataj1), and outputs match information & data]
Data access follows TCAM lookup
CA-RAM supports data embedding
Cuts memory traffic & latency by half
CA-RAM reconfig. opportunities
Reconfigurable match logic allows:
Adapting key size to apps
• Same hardware to support multiple apps or standards
Binary and ternary matching
• Some apps require ternary matching, some don’t
Storing data and keys in a CA-RAM module
• Cuts # of memory accesses for IP lookup by half
Providing range checking capabilities
• Beneficial for rule-based packet filtering
……
Supporting range checking
[Figure: reconfigurable match logic matches the key and checks the range for each stored entry (Keyi1/Rangei1, Keyj1/Rangej1) against the search key, and outputs match information]
(Range checking causes troubles)
(Entries must be expanded)
CA-RAM can support range checking efficiently
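To show why entries must be expanded when only prefix-style ternary patterns are available, the sketch below splits an integer range into the minimal set of aligned ternary patterns and contrasts it with a direct range comparison. The function names and the 4-bit example range are assumptions for illustration.

```python
def range_to_patterns(lo, hi, width):
    """Cover [lo, hi] with aligned power-of-two blocks, each written as a ternary pattern."""
    patterns = []
    while lo <= hi:
        # Largest block that is aligned at lo ...
        size = (lo & -lo) if lo else (1 << width)
        # ... and that still fits inside the remaining range.
        while size > hi - lo + 1:
            size >>= 1
        fixed = width - (size.bit_length() - 1)      # number of non-wildcard bits
        bits = format(lo >> (width - fixed), f"0{fixed}b") if fixed else ""
        patterns.append(bits + "*" * (width - fixed))
        lo += size
    return patterns


def in_range(x, lo, hi):
    """Direct range check: one stored entry, two comparisons."""
    return lo <= x <= hi


print(range_to_patterns(1, 14, 4))
print(in_range(9, 1, 14))
```

A single range such as 1 to 14 over 4 bits turns into six ternary patterns, whereas a match processor that compares against bounds directly needs one entry.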
Evaluation
We implemented a CA-RAM design (w/ reconfigurability) and evaluated its power and area advantages over state-of-the-art TCAMs
We experimented with real routing tables to estimate the load factor and the average memory accesses per lookup
Comparing CA-RAM and TCAM
[Figure: two bar charts comparing 16T SRAM-based TCAM, 8T DRAM-based TCAM, 6T DRAM-based TCAM, and DRAM-based ternary CA-RAM. Left: per-cell area (µm²) @ 130nm CMOS. Right: power (W) for 4.5Mb @ 143MHz]
CA-RAM area advantage 4.5x~11x
CA-RAM power advantage 4x~14x
Mapping a large IP routing table
Consider multiple design points:
[Figure: design points A through F; array labels: 2,048 rows (32 entries) and 4,096 rows (64 entries); load factors as listed: (α = 0.47), (α = 0.40), (α = 0.36), (α = 0.36), (α = 0.24), (α = 0.36)]
Mapping a large IP routing table
[Figure: spilled entries (%) and average memory access latency (AMAL) under “uniform” and “skewed” traffic, for Design A through Design F (α = 0.47, 0.40, 0.36, 0.36, 0.24, 0.36)]
With a properly chosen α, CA-RAM achieves near-constant AMAL
Mapping a large IP routing table
[Figure: area and power of TCAM vs. CA-RAM (Design B)]
CA-RAM advantageous over TCAM
Conclusions
Compared w/ software methods
• Fewer memory accesses; higher lookup performance
Compared w/ TCAM
• Higher density, matching that of DRAM; supports large lookup tables
• Exceeds the speed of TCAM
• Low power – a critical advantage for cost-effective system design
Reconfigurability
• Can accommodate apps having different key/record sizes, binary vs. ternary searching requirements, range checking, …
• Can adopt new standards much more easily, e.g., IPv6
CA-RAM components
[Figure: CA-RAM components: index generator, memory array holding keys (Keyi1, Keyi2, …), match processors 1, 2, …, and result bus; dimension labels: 2^R rows, C bits, N bits]