ECE 526 – Network Processing Systems Design
Network Processor Architecture and Scalability
Chapters 13, 14: D. E. Comer
Ning Weng ECE 526 2
NP Architectures
• Last class:
─ Key requirements of a network processor: flexibility and scalability
─ Optimized instruction set and parallel processing using multiprocessors
• This class:
─ Internal organization of an NP:
  • Computation, storage, and communication
  • Operating support
  • Content addressable memory (CAM)
─ NP scaling issues
NP Architectures
• NP architecture characteristics
─ Computation
  • Processor hierarchy
  • Special-purpose functional units
─ Storage
  • Memory hierarchy
  • Content addressable memory (CAM)
─ Communication
  • Internal buses
  • External interfaces
─ Operating support
  • Concurrent/parallel execution support
  • Programming models
  • Dispatch mechanisms
Processor Functionality
Processor Pyramid
Packet Flow through Hierarchy
• Accommodating tasks of different complexity and frequency
─ Low level: simple and frequent processing
─ High level: occasional and complex processing
• Computation scaling
─ Faster processors
─ More concurrent threads
─ More processors
─ More processor types
Memory Hierarchy
• Different memory technologies are combined to trade off performance, cost, and area
• Conventional approach:
─ Registers + cache + off-chip DRAM
  • Exploits locality: temporal and spatial
─ Optimized for the average case
─ Transparent to the programmer
• Network processors:
─ Registers, scratch pad, control store, onboard RAM, CAM/TCAM, SRAM, and SDRAM
─ Specialized for network processing applications
  • Little temporal locality
─ Explicit to the application developer
  • Changes how programs are written
  • Gives more control
─ The memory hierarchy is not "cached" but used explicitly
Memory Technology
• Characterized by access latency and area
─ SRAM: 2-10 ns, 4-6 transistors per bit
─ DRAM: 50-70 ns, 1 or 3 transistors per bit
• What data should be stored where?
─ Instruction data
─ Packet data: header, payload, and metadata
─ Temporary data: data structures allocated on the stack
─ Application data: persistent data, e.g., routing table, rule file
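The latency gap above matters because a network processor has a fixed time budget per packet. The sketch below compares that budget with the time taken by a chain of dependent memory accesses at the slide's SRAM and DRAM latencies. It assumes a 10 Gbps link carrying back-to-back 40-byte packets, and the count of 10 accesses is a hypothetical figure chosen for illustration.

```python
# Latency ranges from the slide, in nanoseconds.
SRAM_NS = (2, 10)
DRAM_NS = (50, 70)


def packet_time_ns(packet_bytes: int, line_rate_gbps: float) -> float:
    """Arrival interval of back-to-back packets (1 Gbps = 1 bit per ns)."""
    return packet_bytes * 8 / line_rate_gbps


def dependent_access_time_ns(accesses: int, latency_ns: float) -> float:
    """Time for accesses that must complete one after another (no overlap)."""
    return accesses * latency_ns


if __name__ == "__main__":
    budget = packet_time_ns(40, 10)  # assumed: 40-byte packets on a 10 Gbps link
    print(f"per-packet budget: {budget} ns")
    for name, (fast, slow) in (("SRAM", SRAM_NS), ("DRAM", DRAM_NS)):
        lo = dependent_access_time_ns(10, fast)  # hypothetical: 10 accesses
        hi = dependent_access_time_ns(10, slow)
        print(f"{name}: 10 dependent accesses take {lo}-{hi} ns")
```

Under these assumptions the budget is 32 ns: a single DRAM access (50-70 ns) already exceeds it, while several SRAM accesses fit. A single processing thread therefore could not keep up at this rate if it had to wait on DRAM for every packet.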
Memory Size Example
Consider a network system that processes IP datagrams. Assume the system executes 5,000 instructions per packet, each instruction occupies 4 bytes, 10% of instructions access a 4-byte value in memory, each datagram consists of 1500 bytes, a lookup examines 10 4-byte values on average in an IP routing table, and a datagram arrives and leaves in an Ethernet frame. Compute the total number of memory locations accessed to process one datagram. Assume no memory caching.
─ Instruction memory:
─ Packet memory:
─ Application memory:
─ Temporary memory:
Total:
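The blanks above can be filled in differently depending on how each access is counted. The sketch below is one possible accounting, not an official solution. It assumes that one memory location holds a 4-byte word, that every instruction is fetched once, that the 10% of data-accessing instructions touch temporary (stack) memory, that the routing lookup's 10 values live in application memory, and that the stored frame is the datagram plus a 14-byte Ethernet header (CRC not stored), written once on arrival and read once on departure.

```python
import math

WORD_BYTES = 4  # assumption: one memory location holds one 4-byte word


def memory_locations_per_datagram(instructions=5000,
                                  data_access_percent=10,
                                  datagram_bytes=1500,
                                  ethernet_header_bytes=14,
                                  lookup_values=10):
    """Count 4-byte memory locations accessed while processing one datagram."""
    # Each instruction is fetched once from instruction memory.
    instruction = instructions
    # Frame = datagram + Ethernet header, rounded up to whole words;
    # it is written on arrival and read again on departure.
    frame_words = math.ceil((datagram_bytes + ethernet_header_bytes) / WORD_BYTES)
    packet = 2 * frame_words
    # Routing-table values examined by the lookup.
    application = lookup_values
    # Data accesses by 10% of the instructions, assumed to hit stack memory.
    temporary = instructions * data_access_percent // 100
    return {
        "instruction": instruction,
        "packet": packet,
        "application": application,
        "temporary": temporary,
        "total": instruction + packet + application + temporary,
    }


if __name__ == "__main__":
    for region, count in memory_locations_per_datagram().items():
        print(f"{region:>12}: {count}")
```

Changing a single assumption, such as storing the 4-byte CRC or counting packet bytes instead of words, changes the packet and total figures, which is why the parameters are exposed as arguments.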
Memory Scaling
• Memory access time: raw access speed
─ Technology dependent
─ Important for random access
• Memory bandwidth
─ Important for overall system performance
─ Scales with
  • Multiple ports
  • Multiple banks
  • Wider buses
─ Limited by
  • Pins and package cost
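The scaling options above can be seen in the usual peak-bandwidth product of bus width, clock rate, transfers per clock, and number of independent ports. The sketch below is a simplified model under that assumption; the 32-bit, 200 MHz memory is hypothetical. Banks are not modeled: interleaving across banks behind one interface mainly helps sustained bandwidth approach this peak rather than raising the peak itself.

```python
def peak_bandwidth_gbps(bus_width_bits: int, clock_mhz: float,
                        transfers_per_clock: int = 1, ports: int = 1) -> float:
    """Theoretical peak bandwidth in Gbit/s.

    Assumes every port completes one transfer of bus_width_bits on every
    transfer edge with no stalls, which real memories never fully reach.
    """
    bits_per_second = bus_width_bits * clock_mhz * 1e6 * transfers_per_clock * ports
    return bits_per_second / 1e9


if __name__ == "__main__":
    base = peak_bandwidth_gbps(32, 200)                 # hypothetical baseline
    wider = peak_bandwidth_gbps(64, 200)                # doubled bus width
    two_ports = peak_bandwidth_gbps(32, 200, ports=2)   # second independent port
    print(base, wider, two_ports)
```

Both the wider bus and the extra port double the peak, but each also needs more data pins on the package, which is the limit named on the slide.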
Content Addressable Memory
• Does not use an address to locate content
• A CAM takes content as input, in a query-style format
• Organized as an array of slots
• Combination of mechanisms
─ Random access storage
─ Exact-match pattern search
• Rapid search enabled by parallel hardware
Lookup using Conventional CAM
• Given
─ Pattern for which to search
─ Known as the key
• CAM returns
─ First slot that matches the key, or
─ All slots that match the key
• Algorithm (hardware performs all slot comparisons in parallel)
for each slot do {
  if (key == slot) {
    declare key matches slot;
  } else {
    declare key does not match slot;
  }
}
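The two result modes listed above (first matching slot, or all matching slots) can be modeled in software as below. This is a minimal sketch: the loop stands in for comparisons that real CAM hardware performs simultaneously, and the slot contents are hypothetical 32-bit values.

```python
from typing import List, Optional


def cam_first_match(slots: List[int], key: int) -> Optional[int]:
    """Index of the first slot whose content equals key, or None."""
    # A real CAM compares the key with every slot in the same cycle;
    # this loop only models the result, not the parallel hardware.
    for index, content in enumerate(slots):
        if content == key:
            return index
    return None


def cam_all_matches(slots: List[int], key: int) -> List[int]:
    """Indices of every slot whose content equals key."""
    return [index for index, content in enumerate(slots) if content == key]


if __name__ == "__main__":
    slots = [0x0A000001, 0x0A000002, 0x0A000001]  # hypothetical entries
    print(cam_first_match(slots, 0x0A000001))  # first matching slot
    print(cam_all_matches(slots, 0x0A000001))  # all matching slots
    print(cam_first_match(slots, 0x0A000003))  # no match
```

A software loop takes time proportional to the number of slots, whereas the CAM answers in roughly constant time; that difference is the reason the hardware exists.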
Ternary CAM (TCAM)
• Regular CAM
─ Binary values: 0 and 1
─ Requires the key to match all of the content in a slot
─ Not flexible
• TCAM
─ Ternary values: 0, 1, and don't care
─ Implemented by masking entries
─ Well suited to flow classification in network processors
TCAM Lookup
• Each slot has a bit mask
• Hardware uses the mask to decide which bits to test
• Algorithm
for each slot do {
  if ((key & mask) == (slot & mask)) {
    declare key matches slot;
  } else {
    declare key does not match slot;
  }
}
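The masked comparison above can be sketched as follows, returning the first matching slot. As a hypothetical illustration, the table holds IPv4 prefixes ordered from most to least specific, so that the first match is also the longest matching prefix; this ordering trick is a common way to use a TCAM for routing lookups, not something stated on the slide.

```python
import ipaddress
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TcamSlot:
    value: int  # stored bits
    mask: int   # 1 = compare this bit, 0 = don't care


def tcam_lookup(slots: List[TcamSlot], key: int) -> Optional[int]:
    """Return the index of the first slot that matches key under its mask."""
    for index, slot in enumerate(slots):
        if (key & slot.mask) == (slot.value & slot.mask):
            return index
    return None


def ip(text: str) -> int:
    """Convert a dotted-quad IPv4 address to a 32-bit integer."""
    return int(ipaddress.IPv4Address(text))


# Hypothetical table: more specific entries first.
TABLE = [
    TcamSlot(ip("10.1.0.0"), 0xFFFF0000),  # 10.1.0.0/16
    TcamSlot(ip("10.0.0.0"), 0xFF000000),  # 10.0.0.0/8
    TcamSlot(0, 0x00000000),               # default: every bit is don't care
]

if __name__ == "__main__":
    for addr in ("10.1.2.3", "10.9.9.9", "192.168.1.1"):
        print(addr, "-> slot", tcam_lookup(TABLE, ip(addr)))
```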
Partial Matching using TCAM
• The key matched slot 1
• The packet belongs to flow ID 00.02
• Here the "additional information" is stored in each slot
Classification using TCAM
• Flexibility: "additional information" stored in a separate memory
• Extracting values from fields in headers
• Concatenating the values into a contiguous string
• Using that string as the key for a TCAM lookup
• Storing the classification associated with each slot
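The classification steps above can be sketched end to end: extract header fields, concatenate them into a key, run a masked TCAM lookup, and use the matching slot number to index a separate table holding the classification. The choice of fields (IPv4 protocol and TCP/UDP destination port), the rules, and the class names are all hypothetical, and the parser ignores IP fragments and performs no validity checks.

```python
import struct
from typing import List, Optional, Tuple

# Hypothetical rules over a 24-bit key: protocol (8 bits) || destination port (16 bits).
TCAM: List[Tuple[int, int]] = [  # (value, mask); mask bit 1 = care
    ((6 << 16) | 80, 0xFFFFFF),   # slot 0: TCP, destination port 80
    ((17 << 16) | 53, 0xFFFFFF),  # slot 1: UDP, destination port 53
    (6 << 16, 0xFF0000),          # slot 2: TCP, any port
    (0, 0x000000),                # slot 3: matches everything
]
# "Additional information" kept in a separate memory, indexed by slot number.
CLASSIFICATION = ["web", "dns", "other-tcp", "default"]


def build_key(packet: bytes) -> int:
    """Extract protocol and destination port and concatenate them into one key."""
    header_len = (packet[0] & 0x0F) * 4  # IPv4 IHL field, in bytes
    protocol = packet[9]                 # IPv4 protocol field
    (dst_port,) = struct.unpack_from("!H", packet, header_len + 2)
    return (protocol << 16) | dst_port


def classify(packet: bytes) -> Optional[str]:
    key = build_key(packet)
    for slot, (value, mask) in enumerate(TCAM):
        if (key & mask) == (value & mask):
            return CLASSIFICATION[slot]
    return None


def make_packet(protocol: int, dst_port: int) -> bytes:
    """Test helper: 20-byte IPv4 header plus source and destination ports."""
    ip_header = bytes([0x45, 0, 0, 24, 0, 0, 0, 0, 64, protocol, 0, 0,
                       10, 0, 0, 1, 10, 0, 0, 2])
    return ip_header + struct.pack("!HH", 12345, dst_port)


if __name__ == "__main__":
    for proto, port in ((6, 80), (17, 53), (6, 443), (17, 123)):
        print(proto, port, "->", classify(make_packet(proto, port)))
```

Keeping the class names outside the TCAM means a classification can be changed without rewriting TCAM entries, which is the flexibility the slide refers to.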
Communication
• Internal interfaces: channels between processing elements and memories
─ Internal bus
─ Hardware FIFO: sequential access
─ Transfer register: random access
─ Onboard shared memory: shared random access
• External interfaces
─ Memory interfaces: access to larger off-chip memory
─ Direct I/O interfaces: e.g., access to link interfaces
─ Bus interfaces: access to other devices, e.g., a control CPU
─ Switching fabric interface
  • Access to the switching fabric
  • Several standards (e.g., CSIX by the NP Forum)
Communication Cost Example
• Consider a second-generation network system that forwards IP datagrams. The system has 16 interfaces, each connected to an OC-192 line (data rate 10 Gbps). These 16 interfaces are interconnected by a shared communication channel. Packet sizes range from 40 bytes to 1500 bytes. What aggregate bandwidth is needed on the communication channel for each of the two design scenarios?
─ Every bit of a packet transfers through the shared communication channel.
─ Only a 4-byte packet memory address transfers through the shared communication channel.
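The two scenarios can be worked out with the sketch below. It assumes that each packet crosses the shared channel exactly once, that all 16 lines run at the nominal 10 Gbps with framing overhead ignored, and that in the address-only scenario the worst case is back-to-back minimum-size (40-byte) packets, since that maximizes packets per second.

```python
INTERFACES = 16
LINE_RATE_BPS = 10e9  # OC-192, rounded to 10 Gbps as on the slide
ADDRESS_BYTES = 4


def full_packet_bandwidth_bps(interfaces=INTERFACES, line_rate_bps=LINE_RATE_BPS):
    """Scenario 1: every bit of every packet crosses the channel once."""
    return interfaces * line_rate_bps


def address_only_bandwidth_bps(packet_bytes, interfaces=INTERFACES,
                               line_rate_bps=LINE_RATE_BPS,
                               address_bytes=ADDRESS_BYTES):
    """Scenario 2: one address per packet; the packet rate depends on size."""
    packets_per_second = interfaces * line_rate_bps / (packet_bytes * 8)
    return packets_per_second * address_bytes * 8


if __name__ == "__main__":
    print(f"all bits:          {full_packet_bandwidth_bps() / 1e9:.0f} Gbps")
    print(f"addresses, 40 B:   {address_only_bandwidth_bps(40) / 1e9:.2f} Gbps")
    print(f"addresses, 1500 B: {address_only_bandwidth_bps(1500) / 1e9:.3f} Gbps")
```

Under these assumptions, scenario 1 needs 160 Gbps and scenario 2 needs 16 Gbps in the worst case (about 0.43 Gbps with 1500-byte packets). The channel must be sized for the minimum-size case.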
NP Operating Support
• Programming model: interrupt- or event-driven vs. thread-based
• Parallel and concurrent execution support
• Dispatch mechanism: how threads are initiated
Summary
• NPs scale through
─ Heterogeneous multiprocessors structured hierarchically
─ Mixed memory technologies explicitly available to the programmer
─ Different communication mechanisms
─ Operating support, which is important for achieving high system performance
• NP scaling is limited by
─ Physical space: chip area (less than 400 mm²)
─ Pin limits and packaging technology
─ Power consumption and heat dissipation
For Next Class and Reminder
• Read Comer: chapters 15 and 16
• Homework solution online by Friday
• Midterm: 10/6
• Project
─ Topic finalized by 10/5 (group leader, email me)
─ Proposal presentation: 10/22