A Scalable and Reconfigurable Search Memory Substrate for High Throughput Packet Processing
Sangyeun Cho and Rami Melhem
Dept. of Computer Science
University of Pittsburgh
Feb. 6 ’07 – CCW-21
Lookup ops in packet processing
Packet forwarding
• Given an IP address
• Look up in a table (IP table) a matching prefix
• Make sure the chosen prefix is the longest: LPM (Longest Prefix Matching)
Rule-based packet filtering
• Given a set of packet fields
• Look up in a rule database matching entries
Deep packet inspection
• Given a string in packet payload
• Look up in a signature database matching entries
Lookup performance scalability
Lookup performance must match increasing line speeds
• For OC-768, up to 104M packets must be processed per second
• Network traffic has doubled every year [McKeown03]
• Router capacity doubles every 18 months
Capacity pressure
• Routing tables (~200K prefixes in a core router) are growing [RIS]
• # of firewall rules increases; 100K rules are practical [Baboescu04]
• IPv6
Power and thermal issues are already a critical limiting factor in network processing device design [McKeown03]
Two conventional lookup solutions
• Software methods (tries, hash table, …)
• Hardware methods (TCAM, Bloom filter, …)
IP lookup using a trie
Consider an IP address: 0 1 0 0 0 1 1 0
“flexibility”
high memory capacity requirement
high memory bandwidth requirement
not SCALABLE
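A trie-based longest prefix match for the address above can be sketched as follows. This is only an illustration: the prefix set, the `next_hop` strings, and all function names are assumptions, since the slide's own trie figure is not recoverable. The `visited` count stands in for the number of node (memory) accesses, which is what drives the bandwidth requirement.

```python
class TrieNode:
    """One node of a binary trie; next_hop is set only where a prefix ends."""
    def __init__(self):
        self.children = {}
        self.next_hop = None


def insert(root, prefix, next_hop):
    node = root
    for bit in prefix:
        node = node.children.setdefault(bit, TrieNode())
    node.next_hop = next_hop


def longest_prefix_match(root, address):
    """Walk the trie bit by bit, remembering the last prefix seen."""
    node, best, visited = root, root.next_hop, 0
    for bit in address:
        node = node.children.get(bit)
        if node is None:
            break
        visited += 1  # one node access per address bit consumed
        if node.next_hop is not None:
            best = node.next_hop
    return best, visited


# Hypothetical prefix set, chosen only for illustration.
root = TrieNode()
for prefix in ["0", "0100", "01000", "10", "1101"]:
    insert(root, prefix, next_hop="via " + prefix + "*")

result = longest_prefix_match(root, "01000110")
print(result)
```

Each consumed address bit costs one dependent node access here, which is the kind of sequential memory traffic the slide flags as a scalability problem.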
IP lookup using TCAM
Consider an IP address: 0 1 0 0 0 1 1 0
TCAM entries (sort before storing):
110100*, 110101*, 110111*, 01000*, 01100*, 01101*, 11011*, 0100*, 0110*, 1101*, 10*, 0*
Choose the first among the matched
high bandwidth, constant time lookup
TCAMs are relatively small, expensive
power consumption very high
not SCALABLE
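The sort-then-pick-first behavior can be emulated in software as below. This is a sketch, not a model of real TCAM hardware: a TCAM compares all entries in parallel and a priority encoder picks the first hit, whereas the loop here scans sequentially. The function name `tcam_lookup` is an assumption; the prefixes are the ones listed on the slide.

```python
PREFIXES = [
    "110100*", "110101*", "110111*", "01000*", "01100*", "01101*",
    "11011*", "0100*", "0110*", "1101*", "10*", "0*",
]


def tcam_lookup(entries, address):
    """Return (index, entry) of the first matching entry, or None."""
    for index, entry in enumerate(entries):
        if address.startswith(entry.rstrip("*")):
            return index, entry
    return None


# Sort longest prefix first so that the first match is the longest match.
# sorted() is stable, so equal-length prefixes keep their original order.
entries = sorted(PREFIXES, key=len, reverse=True)

print(tcam_lookup(entries, "01000110"))
```

For the slide's address 01000110, the entries 01000*, 0100*, and 0* all match; because the table is sorted by prefix length, the first hit is the longest one.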
CA-RAM – a hybrid approach
Can we do better than the existing conventional schemes?
• Flexibility and search performance
• Exploit optimized RAM designs
CA-RAM combines hashing w/ hardware parallel matching
CA-RAM design goals
• High lookup performance
• Low power consumption
• Smaller chip area per stored datum
• Straightforward system-level integration
CA-RAM – Content Addressable RAM
Separate match logic and memory
Match logic for a single row, not every row
Allows the use of dense RAM technology
Enables highly reconfigurable match logic
Keep keys sorted in each row, not in entire array
[Figure: match logic and memory cells in a conventional CAM/TCAM vs. CA-RAM]
Very simple, yet efficient
Use hashing to store keys in a particular row
To look up, hash the key and retrieve one row
Perform matching on entire row in parallel
Achieve full content addressability w/o paying overhead!
[Figure: key → index generator → memory rows (Keyi1, Keyi2, …; Keyj1, Keyj2, …) → match processors 1, 2, …]
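The hash-then-match flow on this slide can be sketched in software as follows. The row count, the row capacity, the modulo hash in `index_of`, and the integer keys are all assumptions made for illustration; the list comprehension stands in for the match processors, which compare every key in the fetched row at once in hardware.

```python
NUM_ROWS = 8        # hypothetical number of memory rows
KEYS_PER_ROW = 4    # hypothetical row capacity


def index_of(key):
    """Stand-in for the index generator (a simple modulo hash)."""
    return key % NUM_ROWS


def store(memory, key):
    """Place a key in the row selected by the hash; fail if the row is full."""
    row = memory[index_of(key)]
    if len(row) >= KEYS_PER_ROW:
        return False
    row.append(key)
    return True


def lookup(memory, key):
    """Fetch one row, then compare the search key against every slot."""
    row_index = index_of(key)
    row = memory[row_index]  # a single memory access
    matches = [slot for slot, stored in enumerate(row) if stored == key]
    return (row_index, matches[0]) if matches else None


memory = [[] for _ in range(NUM_ROWS)]
for key in [3, 11, 19, 42, 7]:
    store(memory, key)

print(lookup(memory, 19), lookup(memory, 27))
```

Only the one row selected by the hash is read and compared, which is why match logic is needed for a single row rather than for every row.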
Pipelined CA-RAM operation
Step 1: Index generation
Step 2: Memory access
Step 3: Key matching
Step 4: Result forwarding
[Figure: search key → index generator → index → row (Keyj1, Keyj2, Keyj3) → match processors 1–3 → result]
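The overlap of the four steps can be visualized with a small schedule generator. It assumes, purely for illustration, that every stage takes exactly one cycle and that there are no stalls; the slide does not state stage latencies. The name `pipeline_schedule` is hypothetical.

```python
STAGES = ["index generation", "memory access", "key matching", "result forwarding"]


def pipeline_schedule(num_lookups):
    """For each cycle, list which lookup occupies each stage (None if idle)."""
    total_cycles = num_lookups + len(STAGES) - 1
    schedule = []
    for cycle in range(total_cycles):
        occupancy = []
        for stage in range(len(STAGES)):
            lookup = cycle - stage  # lookup k enters stage s at cycle k + s
            occupancy.append(lookup if 0 <= lookup < num_lookups else None)
        schedule.append(occupancy)
    return schedule


schedule = pipeline_schedule(3)
for cycle, occupancy in enumerate(schedule):
    print(cycle, occupancy)
```

Under these assumptions, three lookups finish in six cycles instead of twelve, and once the pipeline is full one result is forwarded per cycle.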
Dealing w/ bucket overflows
Careful design of hash function
Increase bucket size
• Reduce load factor (α); α = # of occupied entries / # of total entries
Use “chaining”; store overflows in subsequent rows
• Multiple accesses per lookup
Use a small overflow CAM, accessed in parallel
• Similar to popular “victim caching” in computer architecture
Use two-level hashing and employ multiple CA-RAM banks
……
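The "chaining" option, where overflows go to subsequent rows, can be sketched as below, together with the load factor and the resulting number of row accesses per lookup. The table size, the modulo hash, and the key set are assumptions for illustration; the sketch also assumes keys are never deleted, which is what lets a failed search stop at the first row that is not full.

```python
def build_table(keys, num_rows, row_size):
    """Insert keys; a key whose home row is full spills to the next row."""
    rows = [[] for _ in range(num_rows)]
    spilled = 0
    for key in keys:
        home = key % num_rows  # hypothetical hash function
        for offset in range(num_rows):
            row = rows[(home + offset) % num_rows]
            if len(row) < row_size:
                row.append(key)
                if offset > 0:
                    spilled += 1
                break
        else:
            raise ValueError("table is full")
    return rows, spilled


def accesses_for(rows, row_size, key):
    """Number of row accesses needed to find key, or None if it is absent."""
    num_rows = len(rows)
    home = key % num_rows
    for offset in range(num_rows):
        row = rows[(home + offset) % num_rows]
        if key in row:
            return offset + 1
        if len(row) < row_size:
            return None  # a non-full row means nothing spilled past it
    return None


KEYS = [0, 4, 8, 1, 6, 3]
rows, spilled = build_table(KEYS, num_rows=4, row_size=2)
load_factor = len(KEYS) / (4 * 2)  # occupied entries / total entries
average_accesses = sum(accesses_for(rows, 2, k) for k in KEYS) / len(KEYS)
print(spilled, load_factor, average_accesses)
```

In this toy table, key 8 spills from row 0 into row 1, so finding it costs two accesses while every other key costs one. A lower load factor makes such spills rarer.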
CA-RAM reconfig. opportunities
Reconfigurable match logic allows:
Adapting key size to apps
• Same hardware to support multiple apps or standards
……
Adapting key size
Adapting key size is straightforward
Will benefit supporting multiple apps/standards
[Figure: rows of keys (Keyi1, Keyi2, Keyi3; Keyj1, Keyj2, Keyj3) feed the reconfigurable match logic, which selects key bits for matching and outputs match information]
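One way to picture key-size adaptation: the same fetched row is interpreted as a sequence of keys whose width is a configuration parameter. The sketch below uses bit strings and a hypothetical `match_row` function; the 16-bit row contents are made up for illustration.

```python
def match_row(row_bits, key_width, search_key):
    """Split a fetched row into key_width-bit keys; return matching slots."""
    if len(row_bits) % key_width != 0:
        raise ValueError("row length must be a multiple of the key width")
    if len(search_key) != key_width:
        raise ValueError("search key must be exactly key_width bits")
    keys = [row_bits[i:i + key_width] for i in range(0, len(row_bits), key_width)]
    return [slot for slot, key in enumerate(keys) if key == search_key]


ROW = "1010" "0110" "1111" "0110"  # one hypothetical 16-bit row

print(match_row(ROW, 4, "0110"))      # row treated as four 4-bit keys
print(match_row(ROW, 8, "11110110"))  # same row treated as two 8-bit keys
```

The stored bits do not change between the two calls; only the configured key width does.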
CA-RAM reconfig. opportunities
Reconfigurable match logic allows:
Adapting key size to apps
• Same hardware to support multiple apps or standards
Binary and ternary matching
• Some apps require ternary matching, some don’t
……
Supporting binary/ternary matching
Developed configurable comparator
T-matching requires 2 bits / 1 symbol
Supporting different types of matching in different bit positions feasible
[Figure: rows of keys and masks (Keyi1, Maski1, Keyi2; Keyj1, Maskj1, Keyj2) and the search key feed the reconfigurable match logic, which considers mask bits or not and outputs match information]
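A configurable comparator of this kind can be sketched with bitwise operations. The sketch assumes an 8-bit key and a mask polarity where 1 means "compare this bit" and 0 means "don't care"; neither is specified on the slide. Storing a key bit plus a mask bit per position is one reading of "2 bits / 1 symbol".

```python
WIDTH = 8  # hypothetical key width in bits


def configurable_match(stored, mask, search, ternary=True):
    """Compare stored and search keys; in ternary mode, ignore masked-out bits."""
    care = mask if ternary else (1 << WIDTH) - 1
    return ((stored ^ search) & care) == 0


# Entry for the prefix 01000*: five significant bits, three don't-care bits.
STORED = 0b01000000
MASK = 0b11111000
SEARCH = 0b01000110

print(configurable_match(STORED, MASK, SEARCH, ternary=True))
print(configurable_match(STORED, MASK, SEARCH, ternary=False))
```

The same stored entry matches the search key in ternary mode and fails in binary mode, because the low three bits differ and only ternary mode ignores them.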
CA-RAM reconfig. opportunities
Reconfigurable match logic allows:
Adapting key size to apps
• Same hardware to support multiple apps or standards
Binary and ternary matching
• Some apps require ternary matching, some don’t
Storing data and keys in a CA-RAM module
• Cuts # of memory accesses for IP lookup by half
……
Simult. key matching & data access
Data access follows TCAM lookup
CA-RAM supports data embedding
Cuts memory traffic & latency by half
[Figure: rows of keys and data (Keyi1, Datai1, Keyi2; Keyj1, Dataj1, Keyj2) and the search key feed the reconfigurable match logic, which matches the key & bypasses the data and outputs match information & data]
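The access-count difference can be illustrated with two toy lookups: one where a match array yields an index that is then used to read a separate data memory, and one where each row holds (key, data) pairs. The function names, keys, and "port" strings are assumptions for illustration, and each array read is counted as one access.

```python
def tcam_style_lookup(keys, data_memory, search_key):
    """Search the match array, then read the data memory at the match index."""
    accesses = 1  # the search itself
    if search_key not in keys:
        return None, accesses
    index = keys.index(search_key)
    accesses += 1  # the follow-up data access
    return data_memory[index], accesses


def caram_style_lookup(row, search_key):
    """Fetch one row of (key, data) pairs; the data comes with the match."""
    accesses = 1  # a single row fetch
    for key, data in row:
        if key == search_key:
            return data, accesses
    return None, accesses


KEYS = ["0100", "0110", "1101"]
DATA = ["port 1", "port 2", "port 3"]

print(tcam_style_lookup(KEYS, DATA, "0110"))
print(caram_style_lookup(list(zip(KEYS, DATA)), "0110"))
```

Both return the same data, but the embedded layout needs one access where the separate layout needs two.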
CA-RAM reconfig. opportunities
Reconfigurable match logic allows:
Adapting key size to apps
• Same hardware to support multiple apps or standards
Binary and ternary matching
• Some apps require ternary matching, some don’t
Storing data and keys in a CA-RAM module
• Cuts # of memory accesses for IP lookup by half
Providing range checking capabilities
• Beneficial for rule-based packet filtering
……
Supporting range checking
(Range checking causes trouble)
(Entries must be expanded)
CA-RAM can support range checking efficiently
[Figure: rows of keys and ranges (Keyi1, Rangei1; Keyj1, Rangej1) and the search key feed the reconfigurable match logic, which matches the key & checks the range and outputs match information]
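To see why entry expansion is costly, the sketch below converts a numeric range into ternary patterns using the standard range-to-prefix decomposition, and contrasts it with a direct comparison against stored bounds. Reading the parenthetical notes as referring to prefix/ternary matching is an assumption, as is modelling a CA-RAM range check as a simple low/high comparison; the slide gives no implementation detail. All names are hypothetical.

```python
def range_to_patterns(low, high, width):
    """Cover [low, high] with the fewest aligned ternary patterns (x = don't care)."""
    patterns = []
    while low <= high:
        # Largest power-of-two block aligned at low...
        size = (low & -low) if low > 0 else (1 << width)
        # ...shrunk until it fits inside the remaining range.
        while size > high - low + 1:
            size >>= 1
        free_bits = size.bit_length() - 1
        fixed_bits = width - free_bits
        head = format(low >> free_bits, "b").zfill(fixed_bits) if fixed_bits else ""
        patterns.append(head + "x" * free_bits)
        low += size
    return patterns


def range_entry_matches(entry, key, value):
    """Direct check: the key must match and the value must lie within bounds."""
    stored_key, low, high = entry
    return stored_key == key and low <= value <= high


patterns = range_to_patterns(1, 14, width=4)
print(patterns)
print(range_entry_matches(("tcp", 1, 14), "tcp", 9))
```

The single range 1 to 14 on a 4-bit field needs six ternary patterns, whereas the direct check keeps it as one entry with two bounds.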
Evaluation
We implemented a CA-RAM design (w/ reconfigurability) and evaluated its power and area advantages over state-of-the-art TCAMs
We experimented with real routing tables to estimate the load factor and the average memory accesses per lookup
Comparing CA-RAM and TCAM
[Figure: two bar charts comparing 16T SRAM-based TCAM, 8T DRAM-based TCAM, 6T DRAM-based TCAM, and DRAM-based ternary CA-RAM. One shows cell area (µm²) @ 130nm CMOS, annotated 4.5x and 11x; the other shows power (W), 4.5Mb @ 143MHz, annotated 4x and 14x]
CA-RAM area advantage 4.5x~11x
CA-RAM power advantage 4x~14x
Mapping a large IP routing table
Consider multiple design points:
[Figure: design points A–F, with labels “2,048 rows (32 entries)” and “4,096 rows (64 entries)” and load factors α = 0.47, 0.40, 0.36, 0.36, 0.24, 0.36]
Mapping a large IP routing table
[Figure: bar charts over Designs A–F (annotated α = 0.47, 0.40, 0.36, 0.36, 0.24, 0.36): “Spilled entries” (0% to 40%) and “Average memory access latency” (0 to 2.5) under “Uniform” and “Skewed” traffic]
With a properly chosen load factor α, CA-RAM achieves near-constant AMAL (average memory access latency)
Mapping a large IP routing table
[Figure: bar chart of area and power, TCAM vs. CA-RAM (Design B), on a 0 to 1.2 scale]
CA-RAM advantageous over TCAM
Conclusions
Compared w/ software methods
• Fewer memory accesses; higher lookup performance
Compared w/ TCAM
• Higher density, matching that of DRAM, for large lookup tables
• Exceeds the speed of TCAM
• Low power – a critical advantage for cost-effective system design
Reconfigurability
• Can accommodate apps having different key/record sizes, binary vs. ternary searching requirements, range checking, …
• Can adopt new standards much more easily, e.g., IPv6
CA-RAM components
[Figure: index generator, memory rows of keys (Keyi1, Keyi2, …; Keyj1, Keyj2, …), match processors 1, 2, …, and result bus; dimension labels: C bits, 2^R rows, N bits]