ECE 526 – Network Processing Systems Design Network Processor Architecture and Scalability Chapter...


Page 1: ECE 526 – Network Processing Systems Design

Network Processor Architecture and Scalability
Chapters 13, 14: D. E. Comer

Page 2: NP Architectures

Ning Weng, ECE 526

• Last class:
  ─ Key requirement of network processor: flexibility and scalability
  ─ Optimized instruction set and parallel processing using multiprocessors
• This class:
  ─ Internal organization of NP:
    • Computation, storage and communication
    • Operating support
    • Content addressable memory (CAM)
  ─ NP scaling issues

Page 3: NP Architectures

• NP architecture characteristics
  ─ Computation
    • Processor hierarchy
    • Special-purpose functional units
  ─ Storage
    • Memory hierarchy
    • Content addressable memory (CAM)
  ─ Communication
    • Internal buses
    • External interfaces
  ─ Operation support
    • Concurrent/parallel execution support
    • Programming models
    • Dispatch mechanisms

Page 4: Processor Functionality

Page 5: Processor Pyramid

Page 6: Packet Flow through Hierarchy

• Accommodating tasks of different complexity and frequency
  ─ Low level: simple and frequent processing
  ─ High level: occasional and complex processing
• Computation scaling
  ─ Faster processor
  ─ More concurrent threads
  ─ More processors
  ─ More processor types

Page 7: Memory Hierarchy

• Different memory technologies used for performance, cost and area
• Conventional approach:
  ─ Register + cache + off-chip DRAM
    • Exploiting locality: temporal and spatial
  ─ Optimized for average case
  ─ Transparent to programmer
• Network processors:
  ─ Register, scratch pad, control store, onboard RAM, CAM/TCAM, SRAM and SDRAM
  ─ Specialized for network processing applications
    • Little temporal locality
  ─ Explicit to application developer
    • Difficult to program
    • More control
  ─ Memory hierarchy is not “cached” but used explicitly

Page 8: Memory Technology

• Characterized by access latency, area
  ─ SRAM: 2-10 ns, 4-6 transistors per bit
  ─ DRAM: 50-70 ns, 1 or 3 transistors per bit
• What data should be stored where?
  ─ Instruction data
  ─ Packet data: header, payload and meta-data
  ─ Temporary data: data structures allocated on the stack
  ─ Application data: persistent data, e.g., routing table, rule file

Page 9: Memory Size Example

Consider a network system that processes IP datagrams. Assume the system executes 5,000 instructions per packet, each instruction occupies 4 bytes, 10% of instructions need to access a 4-byte value in memory, each datagram consists of 1500 bytes, a lookup examines 10 4-byte values on average in an IP routing table, and a datagram arrives and leaves in an Ethernet frame. Compute the total number of memory locations accessed to process one datagram. Assume no memory caching.

─ Instruction memory:
─ Packet memory:
─ Application memory:
─ Temporary memory:
Total:
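One possible way to set up the memory size calculation is sketched below. The slide leaves the answers blank, so this is only one reading of the problem, not the official solution. Assumptions not stated on the slide: accesses are counted in 4-byte words, the Ethernet frame adds a 14-byte header and a 4-byte FCS, the frame is written once on arrival and read once on departure, and the 10% data accesses are counted as temporary memory separate from packet and routing-table accesses. The function and constant names are hypothetical.

```python
import math

# Values stated on the slide.
INSTRUCTIONS_PER_PACKET = 5000
DATA_ACCESS_PERCENT = 10
DATAGRAM_BYTES = 1500
LOOKUP_WORDS = 10
WORD_BYTES = 4

# Assumptions (not from the slide).
ETHERNET_OVERHEAD_BYTES = 14 + 4  # header + FCS; no preamble, no VLAN tag
PACKET_PASSES = 2                 # written once on arrival, read once on departure


def memory_accesses():
    """Count 4-byte memory accesses per datagram under the assumptions above."""
    # One instruction fetch per executed instruction (no caching).
    instruction = INSTRUCTIONS_PER_PACKET
    # Whole frame, rounded up to full words, moved in and out of packet memory.
    frame_words = math.ceil((DATAGRAM_BYTES + ETHERNET_OVERHEAD_BYTES) / WORD_BYTES)
    packet = PACKET_PASSES * frame_words
    # Routing-table lookup.
    application = LOOKUP_WORDS
    # 10% of instructions touch a 4-byte value.
    temporary = INSTRUCTIONS_PER_PACKET * DATA_ACCESS_PERCENT // 100
    total = instruction + packet + application + temporary
    return {
        "instruction": instruction,
        "packet": packet,
        "application": application,
        "temporary": temporary,
        "total": total,
    }


if __name__ == "__main__":
    for name, count in memory_accesses().items():
        print(f"{name}: {count}")
```

Changing any assumption (for example, counting only header accesses in packet memory) changes the packet term but leaves the structure of the calculation the same.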

Page 10: Memory Scaling

• Memory access time: raw access speed
  ─ Technology dependent
  ─ Important for random access
• Memory bandwidth
  ─ Important for overall system performance
  ─ Scales with
    • Multiple ports
    • Multiple banks
    • Wider bus
  ─ Limited by
    • Pins and package cost
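As a rough, idealized illustration of why more ports, more banks, or a wider bus raise bandwidth, the sketch below assumes every bank and port delivers one bus-width word per access time, with no bank conflicts or refresh overhead. Real memories fall short of this bound. The function name and the example bus widths are hypothetical; the 60 ns access time is picked from the DRAM range quoted on the Memory Technology slide.

```python
def peak_bandwidth_gbps(bus_width_bits, access_time_ns, banks=1, ports=1):
    """Idealized upper bound on memory bandwidth in Gbps.

    Bits per nanosecond equals gigabits per second, so no unit
    conversion is needed.
    """
    return bus_width_bits * banks * ports / access_time_ns


if __name__ == "__main__":
    # Single 32-bit DRAM bank at 60 ns per access.
    print(peak_bandwidth_gbps(32, 60))
    # Four interleaved banks on a 64-bit bus at the same access time.
    print(peak_bandwidth_gbps(64, 60, banks=4))
```

Under this model a single narrow DRAM bank stays well below 1 Gbps, which is why scaling relies on parallelism and bus width rather than raw access time alone.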

Page 11: Content Addressable Memory

• Not using address to locate content
• CAM uses content as input in a query-style format
• Organized as array of slots
• Combination of mechanisms
  ─ Random access storage
  ─ Exact-match pattern search
• Rapid search enabled with parallel hardware

Page 12: Lookup using Conventional CAM

• Given
  ─ Pattern for which to search
  ─ Known as key
• CAM returns
  ─ First slot that matches key, or
  ─ All slots that match key
• Algorithm

  for each slot do {
      if (key == slot) {
          declare key matches slot;
      } else {
          declare key does not match slot;
      }
  }
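The CAM algorithm can be mimicked in software as a minimal sketch. A hardware CAM compares all slots in parallel, while this loop visits them one by one, so it models only the result, not the speed. The function name cam_lookup and the first_only flag are hypothetical.

```python
def cam_lookup(slots, key, first_only=True):
    """Return the index of the first matching slot, or all matching indices.

    Returns None (first_only=True) or an empty list (first_only=False)
    when no slot holds the key.
    """
    matches = [index for index, slot in enumerate(slots) if slot == key]
    if first_only:
        return matches[0] if matches else None
    return matches


if __name__ == "__main__":
    slots = [0x1A, 0x2B, 0x1A, 0x3C]
    print(cam_lookup(slots, 0x1A))                    # first matching slot
    print(cam_lookup(slots, 0x1A, first_only=False))  # every matching slot
    print(cam_lookup(slots, 0xFF))                    # no match
```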

Page 13: Ternary CAM (TCAM)

• Regular CAM
  ─ Binary value: 0 and 1
  ─ Requiring key to match all the content in one slot
  ─ Not flexible
• TCAM
  ─ Ternary value: 0, 1 and don’t care
  ─ Implemented using masking of entries
• Good for network processor flow classification

Page 14: TCAM Lookup

• Each slot has bit mask
• Hardware uses mask to decide which bits to test
• Algorithm

  for each slot do {
      if ((key & mask) == (slot & mask)) {
          declare key matches slot;
      } else {
          declare key does not match slot;
      }
  }
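A minimal software sketch of the masked comparison follows, assuming a mask bit of 1 means "compare this bit" and 0 means "don't care", and that the first matching slot wins. The TcamSlot type, the function name, and the example 32-bit values are hypothetical. As with a conventional CAM, hardware tests all slots at once; the loop here is sequential.

```python
from typing import NamedTuple, Optional, Sequence


class TcamSlot(NamedTuple):
    value: int
    mask: int  # 1 bits are compared, 0 bits are "don't care"


def tcam_lookup(slots: Sequence[TcamSlot], key: int) -> Optional[int]:
    """Return the index of the first slot whose masked bits equal the key's."""
    for index, slot in enumerate(slots):
        if (key & slot.mask) == (slot.value & slot.mask):
            return index
    return None


if __name__ == "__main__":
    slots = [
        TcamSlot(value=0x0A000000, mask=0xFF000000),  # top 8 bits must match
        TcamSlot(value=0xC0A80100, mask=0xFFFFFF00),  # top 24 bits must match
    ]
    print(tcam_lookup(slots, 0xC0A80105))
    print(tcam_lookup(slots, 0x0A010203))
    print(tcam_lookup(slots, 0x08080808))
```

Setting every mask to all ones reduces this to the exact-match behavior of a regular CAM.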

Page 15: Partial Matching using TCAM

• Key matched slot 1
• Packet belonging to flow ID: 00.02
• Here “additional information” stored in each slot

Page 16: Classification using TCAM

• Flexibility: “additional information” stored in separate memory
• Extracting values from fields in headers
• Forming values in contiguous string
• Using a key for TCAM lookup
• Storing classification in slot
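The classification steps can be sketched as follows: extract header fields, pack them into one contiguous key, look the key up with masked matching, and use the matching slot number to index a separate table that holds the classification. The choice of fields (8-bit protocol, 16-bit destination port), the rules, and all names here are hypothetical and serve only to illustrate the flow.

```python
def build_key(protocol, dst_port):
    """Pack an 8-bit protocol and a 16-bit destination port into a 24-bit key."""
    return ((protocol & 0xFF) << 16) | (dst_port & 0xFFFF)


# TCAM slots as (value, mask) pairs; a mask bit of 0 means "don't care".
SLOTS = [
    (build_key(6, 80), 0xFFFFFF),  # TCP (protocol 6) to port 80, all bits compared
    (build_key(17, 0), 0xFF0000),  # any UDP (protocol 17) packet, port ignored
]

# Separate memory indexed by slot number, holding the classification.
FLOW_TABLE = ["web", "udp"]


def classify(protocol, dst_port):
    """Return the classification of the first matching slot, or None."""
    key = build_key(protocol, dst_port)
    for index, (value, mask) in enumerate(SLOTS):
        if (key & mask) == (value & mask):
            return FLOW_TABLE[index]
    return None


if __name__ == "__main__":
    print(classify(6, 80))
    print(classify(17, 53))
    print(classify(6, 22))
```

Keeping the classification in a separate table means the rules' results can change without rewriting the TCAM entries themselves.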

Page 17: Communication

• Internal interfaces: channels between processing elements, memories
  ─ Internal bus
  ─ Hardware FIFO: sequential access
  ─ Transfer register: random access
  ─ Onboard shared memory: shared random access
• External interfaces
  ─ Memory interfaces: accesses to larger off-chip memory
  ─ Direct I/O interfaces: e.g., access to link interfaces
  ─ Bus interfaces: accesses to other devices, e.g., control CPU
  ─ Switching fabric interface
    • Access to switching fabric
    • Several standards (e.g., CSIX by NP Forum)

Page 18: Communication Cost Example

• Consider a second-generation network system that forwards IP datagrams. The system has 16 interfaces, each connected to an OC-192 line (data rate is 10 Gbps). These 16 interfaces are interconnected with a shared communication channel. The packet size is in the range of 40 bytes to 1500 bytes. What aggregate bandwidth is needed on the communication channel for the two design scenarios:
  ─ Every bit of a packet transfers through the shared communication channel.
  ─ Only a 4-byte packet memory address transfers through the shared communication channel.
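One way to work the communication cost example is sketched below. It is not the official solution. Assumptions not stated on the slide: every interface receives back-to-back packets at the full 10 Gbps, each packet (or its address) crosses the shared channel exactly once, and framing overhead is ignored. The function names are hypothetical.

```python
INTERFACES = 16
LINE_RATE_BPS = 10_000_000_000  # 10 Gbps per interface, as given on the slide
ADDRESS_BYTES = 4


def full_packet_bandwidth_bps():
    """Scenario 1: every bit of every packet crosses the shared channel."""
    return INTERFACES * LINE_RATE_BPS


def address_only_bandwidth_bps(packet_bytes):
    """Scenario 2: only a 4-byte address per packet crosses the channel.

    Each interface receives LINE_RATE_BPS / (8 * packet_bytes) packets per
    second, and each packet contributes 8 * ADDRESS_BYTES bits.
    """
    return INTERFACES * LINE_RATE_BPS * ADDRESS_BYTES / packet_bytes


if __name__ == "__main__":
    print(full_packet_bandwidth_bps() / 1e9, "Gbps, full packets")
    print(address_only_bandwidth_bps(40) / 1e9, "Gbps, addresses, 40-byte packets")
    print(address_only_bandwidth_bps(1500) / 1e9, "Gbps, addresses, 1500-byte packets")
```

Under these assumptions the first scenario needs 160 Gbps regardless of packet size, while the second peaks at 16 Gbps for minimum-size packets and drops to roughly 0.43 Gbps for 1500-byte packets, since fewer packets arrive per second.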

Page 19: NP Operating Support

• Programming model: interrupt, event vs. thread based
• Parallel and concurrent execution support
• Dispatch mechanism: how threads are initiated

Page 20: Summary

• NP scaling by
  ─ Heterogeneous multiprocessors structured hierarchically
  ─ Mixed memory technologies explicitly available to programmer
  ─ Different communication mechanisms
  ─ Operating support important to achieve high system performance
• NP scaling limited by
  ─ Physical space: chip area (less than 400 mm²)
  ─ Pin limits and packaging technology
  ─ Power consumption and heat dissipation

Page 21: For Next Class and Reminder

• Read Comer: chapters 15 and 16
• Homework solution on-line by Friday
• Midterm: 10/6
• Project
  ─ Topic finalized 10/5 (group leader email me)
  ─ Proposal presentation 10/22