ENTS689L: Packet Processing and Switching
Anatomy of an IP Router
Vahid Tabatabaee
Fall 2007
References
- Panos C. Lekkas, “Network Processors: Architectures, Protocols, and Platforms”, McGraw-Hill.
- James Aweya, “IP Router Architectures: An Overview”, Nortel Networks, Ottawa, Canada.
- Florian Brodersen, Alexander Klinetschek, “Anatomy of a High Performance IP Router”, Communication Network Seminar 2003/04, Hasso-Plattner-Institute, University of Potsdam, Jan. 2004.
- Steve Kohalmi, Tim Hale, “Anatomy of an IP Service Edge Switch”, Quarry Technologies, 2002.
- Cisco Systems CRS-1 router documents.
Basic IP Router Components
- Network interfaces
- Processing modules
- Buffering modules
- Interconnection unit (switch fabric)
The processing and buffering modules may be replicated either fully or partially on the network interfaces.
Functions labeled in the diagram:
- Path computation and routing table maintenance
- Packet forwarding and packet processing (may cache the routing table)
- Transfer of packets between ingress and egress interface (line) cards
Basic Functions of a Router
- Route processing (routing protocols: OSPF, RIP, …): the slow path, or control plane
  - Path computation
  - Routing table maintenance
  - Reachability propagation
- Packet forwarding: the fast path, or data plane
Packet Forwarding
- IP packet validation
  - Version number
  - Header length field
  - Checksum
- Destination IP address parsing and table lookup
  - Local delivery in the network
  - Unicast delivery to an output port
  - Multicast delivery to a set of output ports
- Packet lifetime control
  - Adjust the time-to-live (TTL) field.
  - A packet with a positive TTL is delivered to a local address.
  - A packet delivered to output ports has its TTL decremented and rechecked before forwarding.
- Packet fragmentation
  - Check whether the packet size is larger than the MTU of the network. If so, fragment the packet.
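The validation and lifetime-control steps above can be sketched in a few lines of Python. This is only an illustrative sketch, not how a line card implements them: the function names (`ipv4_checksum`, `forward_check`, `make_header`) and the test addresses are assumptions made for the example, and only the header checks named on the slide are covered.

```python
import struct


def ipv4_checksum(header: bytes) -> int:
    """Return the 16-bit one's-complement checksum of an IPv4 header."""
    if len(header) % 2:
        header += b"\x00"
    total = sum(struct.unpack(f"!{len(header) // 2}H", header))
    while total >> 16:                      # fold carries back into 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def forward_check(header: bytearray) -> bool:
    """Validate an IPv4 header and age it; True means it may be forwarded."""
    if len(header) < 20:
        return False
    version, ihl = header[0] >> 4, header[0] & 0x0F
    if version != 4 or ihl < 5 or len(header) < ihl * 4:
        return False
    hdr_len = ihl * 4
    if ipv4_checksum(bytes(header[:hdr_len])) != 0:   # valid headers sum to zero
        return False
    if header[8] <= 1:                      # TTL would reach zero: do not forward
        return False
    header[8] -= 1                          # decrement the TTL
    header[10:12] = b"\x00\x00"             # clear, then recompute the checksum
    struct.pack_into("!H", header, 10, ipv4_checksum(bytes(header[:hdr_len])))
    return True


def make_header(ttl: int) -> bytearray:
    """Build a hypothetical 20-byte header (10.0.0.1 to 10.0.0.2) for testing."""
    hdr = bytearray(struct.pack("!BBHHHBBH4s4s", 0x45, 0, 40, 0, 0, ttl, 6, 0,
                                bytes([10, 0, 0, 1]), bytes([10, 0, 0, 2])))
    struct.pack_into("!H", hdr, 10, ipv4_checksum(bytes(hdr)))
    return hdr
```

Because the TTL changes on every hop, the checksum has to be rewritten on every hop as well, which is part of why this small set of operations is worth moving into hardware.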
First Generation of Routers
Similar to a typical computer layout.
All functionality is implemented in software.
Single CPU, single Memory, Single Bus!
Problems with first generation routers
- Processing speed is limited by the single CPU.
- The CPU must process both the packets destined to the router itself and the packets passing through it.
- Major packet processing tasks such as table lookups are memory-intensive operations and cannot be made faster by simple processor upgrades.
- A software implementation is inefficient, since the work is a small set of operations repeated on every packet.
- Slow path and fast path are implemented on the same CPU, so the slow path can influence the fast path.
- The routing table has grown from 20,000 entries in 1994 to 260,000 entries today. Source: http://bgp.potaroo.net/
- Moving data from one interface to another can be time consuming and often takes longer than the packet processing itself.
Problems with first generation routers (continued)
- The routing table lookup speed cannot be improved if we use traditional memories.
- The conventional bus structure for the interconnection is very inefficient:
  - Every packet has to cross the bus at least twice.
  - The whole packet (not just the header) is transferred.
How fast should a router be?
- An OC-48 link has a data rate of 2.488 Gbps.
- Packet rate is more important than data rate.
- The bottleneck is set by the minimum packet size, which depends on the technology.
- E.g., Packet-over-SONET (PoS): a 40-byte minimum IP packet + 6 bytes of PPP/HDLC overhead, carried in the 2.405 Gbps left after SONET transport overhead:
  - 2.405 Gbps / (8 x 46) = about 6.53 MPPS
- The aggregate packet rate for a 16-port system:
  - 16 x 6.53 = 104.48 MPPS
  - One decision every 9.57 ns
- SDRAM access takes about 10 ns for sequential locations and, in practice, around 20-50 ns.
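The arithmetic above is easy to reproduce. The short Python sketch below uses the slide's own inputs; the constant and function names are chosen here for illustration, and the results agree with the slide's figures up to rounding.

```python
LINK_PAYLOAD_BPS = 2.405e9   # OC-48 rate available to packets, as used on the slide
MIN_FRAME_BYTES = 40 + 6     # minimum IP packet plus PPP/HDLC overhead
PORTS = 16


def packets_per_second(rate_bps: float, frame_bytes: int) -> float:
    """Worst-case packet rate when every frame has the minimum size."""
    return rate_bps / (8 * frame_bytes)


per_port_pps = packets_per_second(LINK_PAYLOAD_BPS, MIN_FRAME_BYTES)
aggregate_pps = PORTS * per_port_pps
budget_ns = 1e9 / aggregate_pps      # time available per forwarding decision

print(f"per port : {per_port_pps / 1e6:.2f} MPPS")
print(f"aggregate: {aggregate_pps / 1e6:.2f} MPPS")
print(f"budget   : {budget_ns:.2f} ns per packet")
```

The per-packet budget comes out below the 20-50 ns practical SDRAM access time quoted above, which is the point of the slide: a single memory lookup per packet is already too slow at this aggregate rate.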
What is the solution?
- Take advantage of parallelism:
  - NICs became more intelligent and took over most packet forwarding.
  - ASICs in the NICs (line cards) perform packet classification and forwarding.
  - Most packets do not go to the CPU card (control card).
- Switching interface:
  - Use a switching element to pass packets between line cards directly and simultaneously.
Modern Switch Based Architecture
Classification and forwarding decisions are done in line cards.
High speed interconnection mechanism (switching) between the line cards.
This provides a fast data path.
Standard CPU (RISC processor) is used for the control plane (slow path).
Hardware and/or software implementation for classification and forwarding in the line card.
Functional Blocks in a Modern Switch
- The PHY interface
  - Responsible for transmitting and receiving information
  - Conversion of the bit stream from digital form to analog signal and vice versa
- Switch fabric
  - The router has a bus or a backplane.
  - The switch fabric reads a packet from an input port and routes it to the output port.
- Packet processing
  - Fast path (data path): handles all operations that are executed in real time on packets (e.g., framing/parsing, classification, modification, compression/encryption, queueing).
  - Slow path (control path): operations executed outside the packet flow (e.g., address resolution, route calculation, update of the routing table, …).
- Host processing
  - Network management, configuring devices, diagnostics
  - Implemented in software on a CPU
Line cards in Modern Switches
Line card handles packet processing such as:
Classification
Forwarding
Traffic Policing and shaping
Monitoring and Statistics
Data Path Diagram
[Figure: ingress and egress data path. Line card: Optics, CDR & Serdes, Framer/Mapper, Network Processor, Traffic Manager, Switch Interface. Switch card: Switching Element, Scheduling Element. The figure also carries the label "Packet Processing Units". Source: Light Reading Report]
Data Path Functions
- Network Processor
  - Parse
  - Identify flow
  - Determine egress port
  - Mark QoS parameters
  - Append TM or SF header
- Ingress Traffic Manager
  - Police
  - Manage congestion (WRED)
  - Queue packets in class-based VOQs
  - Segment packets into switch cells
- Switch Fabric
  - Queue cells in class-based VOQs
  - Flow-control the TM per class-based VOQ
  - Schedule class-based VOQs to egress ports
- Egress Traffic Manager
  - Reassemble cells into packets
  - Shape outgoing traffic
  - Schedule egress traffic
[Figure: incoming packets pass through WRED/discard, segmentation + header, and the TM scheduler on the ingress line card; SF flow control and the SF arbiter in the switch fabric; reassembly, class-based queueing of outgoing packets, and the egress scheduler & shaper on the egress line card.]
Switch Card to Line Card Connection
- This connection passes through the backplane.
- A serdes (serializer-deserializer) is used for this connection.
- Each serdes signal runs over two wires and two pins (differential-mode signal).
- The speed is usually around 3.125 Gbps.
- The links use line coding (8b/10b encoding).
- The actual data rate is therefore around 2.5 Gbps.
- There are attempts to provide 10 Gbps serdes.
How many serdes do we need?
- How fast should the connection between the switch card and the line card be?
- The line speed is not enough:
  - Switch fabric throughput is less than 100% due to contention.
  - The network processor, traffic manager, and switch fabric each add their own headers.
  - There is also the cell tax.
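To make the header overhead and the cell tax concrete, here is a small Python sketch. The function name `cell_overhead` is hypothetical, and the sizes in the usage lines (52-byte cell payload, 8 + 12 bytes of headers per cell) are borrowed from the worked example later in this deck rather than from any particular product.

```python
import math


def cell_overhead(packet_bytes: int, cell_payload_bytes: int,
                  cell_header_bytes: int) -> tuple[int, float]:
    """Return (number of cells, bytes on the fabric link per packet byte)."""
    cells = math.ceil(packet_bytes / cell_payload_bytes)
    fabric_bytes = cells * (cell_payload_bytes + cell_header_bytes)
    return cells, fabric_bytes / packet_bytes


# A 46-byte frame fits in a single cell; a 53-byte frame spills into a second,
# almost empty cell. That padding is the cell tax.
for size in (46, 53, 1500):
    cells, expansion = cell_overhead(size, 52, 20)
    print(f"{size:5d} bytes -> {cells:3d} cells, expansion x{expansion:.2f}")
```

The worst case is a packet one byte larger than a multiple of the cell payload, which is why the fabric link has to run faster than the line rate even before contention is considered.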
Speedup
[Figure: line card elements, switch interface, and switch card (switching element and scheduling element), annotated with the rates RL, RTM, and RSF and with the overheads added along the path: traffic manager header, fragmentation (cell tax), and switch fabric header.]
- Effective speedup = RSF/RTM
- In commercial systems, speedup usually refers to RSF/RL.
- A higher speedup factor:
  - Increases system design complexity.
  - Increases power consumption.
  - Creates signal integrity issues.
- The required speedup factor is around 2.
Redundancy
- We have spare switch cards and control cards in the system.
- The redundancy models:
  - Passive redundancy (N:1): one inactive switch card in the system takes over after a failure.
  - Passive redundancy (1:1, N:N): each active switch card has one inactive backup card.
  - Load-sharing redundancy (N-1): all cards are active, and when a failure happens performance degrades gracefully.
  - Active redundancy (1+1): two sets of fabrics carry the same traffic.
Source: www.idt.com/content/switchblock.jpg
Example
- In a 16-port 10 Gbps switch with 2x speedup and N:N redundancy, how many 2.5 Gbps serdes do we need?
- We need 20 Gbps of active and 20 Gbps of redundant data rate for each line card.
- This means 40 / 2.5 = 16 serdes for each line card.
- For 16 line cards we need 16 x 16 = 256 serdes in this system.
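The count can be checked with a few lines of Python. The sketch below starts from the 3.125 Gbps serdes signaling rate and the 8b/10b coding mentioned earlier; the function name and the `redundancy_factor` parameter (2 for N:N, since every active link has a backup) are assumptions made for this illustration.

```python
import math

SERDES_LINE_GBPS = 3.125
SERDES_DATA_GBPS = SERDES_LINE_GBPS * 8 / 10   # 8b/10b coding leaves 2.5 Gbps


def serdes_per_line_card(port_gbps: float, speedup: float,
                         redundancy_factor: int, serdes_gbps: float) -> int:
    """Serdes needed to carry one line card's traffic to the switch cards."""
    required_gbps = port_gbps * speedup * redundancy_factor
    return math.ceil(required_gbps / serdes_gbps)


per_card = serdes_per_line_card(10, 2, 2, SERDES_DATA_GBPS)
total = 16 * per_card
print(per_card, total)
```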
Example
- What is the effective speedup of this system for 40-byte IP packets if the traffic manager header is 12 bytes, the switch fabric header is 8 bytes, and the cell payload is 52 bytes?
- Solution:
  - The earlier OC-48 example showed that there can be 6.53 MPPS (40-byte packets) on an OC-48 line.
  - Similarly, on an OC-192 line there can be up to 9.622 Gbps / (8 x 46) = 26.15 MPPS.
  - Each packet is encapsulated in one cell, since 40 + 6 < 52.
  - The maximum number of cells that a line card can send over its 8 active 2.5 Gbps serdes is (2.5 x 8 Gbps) / ((52 + 8 + 12) x 8) = 34.722 million cells per second.
  - The effective speedup is 34.722 / 26.15 = 1.33.
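The same calculation as a Python sketch, under the slide's assumption that every minimum-size packet fits in exactly one cell. The function and parameter names are illustrative only.

```python
def effective_speedup(fabric_bps: float, cell_bytes: int,
                      line_bps: float, frame_bytes: int) -> float:
    """Ratio of the cell rate the fabric link can carry to the worst-case
    packet rate arriving from the line (one cell per packet assumed)."""
    cells_per_second = fabric_bps / (8 * cell_bytes)
    packets_per_second = line_bps / (8 * frame_bytes)
    return cells_per_second / packets_per_second


speedup = effective_speedup(
    fabric_bps=8 * 2.5e9,      # 8 active serdes at 2.5 Gbps each
    cell_bytes=52 + 8 + 12,    # cell payload + SF header + TM header
    line_bps=9.622e9,          # OC-192 rate used on the slide
    frame_bytes=40 + 6,        # minimum IP packet + PPP/HDLC overhead
)
print(f"effective speedup: {speedup:.2f}")
```

So a nominal 2x speedup on the links shrinks to roughly a third above line rate once headers and cell padding are accounted for.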
Traces Per Serdes
- Typical LVDS speed is 1.25 Gbps.
- For 2.5 Gbps we need 2 channels.
- LVDS is differential, i.e., 2 traces per channel.
- LVDS is unidirectional, i.e., 2 channels for full duplex.
- Full-duplex 2.5 Gbps using LVDS therefore requires 2 x 2 x 2 = 8 traces.
- In the previous example we will have 256 x 8 = 2048 traces on the backplane.
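A short Python sketch of the trace count, following the three factors of two listed above. The names are hypothetical, and the default channel rate is the 1.25 Gbps LVDS figure from the slide.

```python
import math

LVDS_CHANNEL_GBPS = 1.25


def traces_per_serdes(serdes_gbps: float,
                      channel_gbps: float = LVDS_CHANNEL_GBPS) -> int:
    """Backplane traces for one full-duplex serdes built from LVDS channels."""
    channels = math.ceil(serdes_gbps / channel_gbps)   # channels per direction
    wires_per_channel = 2                              # differential pair
    directions = 2                                     # full duplex
    return channels * wires_per_channel * directions


backplane_traces = 256 * traces_per_serdes(2.5)
print(backplane_traces)
```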
A sample Router (Cisco CRS-1)
The line card chassis
8 service cards and 8 physical layer interface module cards
16-Slot Single-Shelf System
[Figure labels: Physical Layer Interface Module; Routing Processor (control plane); Switching Card]
- The distributed route processor (DRP) is an optional component that provides enhanced routing capabilities.
- The DRP contains two symmetric multiprocessors (SMPs), each of which performs routing functions.
- Processor-intensive tasks (such as BGP speakers and IS-IS) can be offloaded from the route processors (RPs) to the DRPs.
Multishelf Systems
- 2 to 72 line card shelves
- 1 to 8 fabric card shelves
How to handle packet processing?
- Off-the-shelf CPU
  - This would usually be a RISC processor.
  - In low-end systems it could be a CISC processor.
- ASIC
  - A specialized high-performance ASIC handles packet processing.
  - An ideal approach for companies such as IBM and Intel, since they manufacture integrated circuits.
Off-the-shelf CPU Systems
- Packet processing is implemented in software running on the CPU.
- Modifications, upgrades, and debugging are accomplished by simple software updates and downloads.
- Update time is much shorter, which is good for both user and developer.
- Not very efficient: many clock cycles are spent on tasks not related to packet processing.
- The fastest off-the-shelf CPUs can handle about 1 gigabit per second.
- The trend is toward deeper packet processing (more on this later).
Memory Bottleneck
- The pipelined architecture of CPUs enables them to execute billions of instructions per second.
- However, to keep the pipeline full they must continuously fetch data from memory and store it back.
- This can be done with a very sophisticated multi-level hierarchy of different memory technologies and interleaved memory banks.
- That comes at a prohibitive cost in money, design complexity, and power consumption.
- Hence a typical processor pipeline often ends up empty, which reduces system throughput.
- Network traffic behaves statistically very differently from local traffic on a computer bus. It does not have the same spatial and temporal locality, so a typical processor's cache system is not effective.
Sub-optimal Instruction Set
- Packet processing requires specific bit-level operations.
- These operations must be performed at wire speed.
- They are not available as standard instructions on off-the-shelf CPUs.
- Hence, we have to combine multiple standard instructions to obtain the intended functionality.
Packet processing with ASIC
- An ASIC typically delivers higher performance.
- An ASIC is not programmable:
  - Adding new functionality requires a new design.
  - Adding a new protocol requires a new design.
  - A new design is costly for both the vendor and the user.
- ASIC design is very time consuming:
  - The design cycle takes 12 to 18 months.
  - If we need some modification, we may need to recode the whole design.
  - Many start-up failures are due to these delays.
ASIC development is costly
- Expensive and time-consuming to change.
- To test an ASIC you need to design a system around it.
- Expensive development tools (design and verification).
- Requires ASIC designers (much more expensive).
- Tape-out of a chip costs around a million dollars.
So is there a middle ground solution?
- Can we have a technology that:
  - Has the flexibility of programmable processors
  - Has the high speed of ASICs
- The solution is called a network processor!
- Network processors are programmable like CPUs, but their performance is close to that of ASICs.
Network Processors value proposition
- Shorter time to market:
  - Instead of 18 months, it takes about 6 months to complete the development cycle of the packet processing part.
- Longer time in market:
  - New features can be embedded into a deployed network-processor-based product.
  - Increased time in market reduces the cost of product ownership over the life of the product.
- Just-in-time delivery of new features:
  - We can modify the design and add new features in the field without penalizing the customer.
- Greater focus on other issues of business management:
  - Most functions are already coded in a standard way.
  - Developers can focus on differentiating features.
Packet Processing Stages
1. Remove link layer headers and decrypt:
  - Ethernet
  - PPP (Point-to-Point Protocol) frame
  - PPP over ATM
  - PPP over Ethernet over ATM
2. Identify ingress subscriber:
  - Extract information about the owner of the packet from the link layer protocol header.
3. Filtering:
  - Permit or deny specific traffic flows, based on various attributes of the IP and higher-layer headers.
4. Traffic classification:
  - Allows different traffic management, QoS, security, and routing policies to be applied to different types of flows.
5. Traffic metering, marking & policing:
  - Control the Peak and Committed Information Rates.
  - Determine the PHB in the DiffServ model (changing the priority).
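Metering against a committed rate is commonly done with a token bucket. The Python sketch below is a single-rate illustration only; the class name, parameters, and the numbers in the usage lines are assumptions, and a meter that also tracks a peak rate would typically pair this bucket with a second one.

```python
class TokenBucketMeter:
    """Single-rate token bucket: packets that find enough tokens conform."""

    def __init__(self, rate_bps: float, burst_bytes: int) -> None:
        self.rate_bytes_per_s = rate_bps / 8
        self.burst_bytes = burst_bytes
        self.tokens = float(burst_bytes)    # the bucket starts full
        self.last_time = 0.0

    def conforms(self, packet_bytes: int, now: float) -> bool:
        """Refill for the elapsed time, then try to pay for the packet."""
        elapsed = now - self.last_time
        self.last_time = now
        self.tokens = min(self.burst_bytes,
                          self.tokens + elapsed * self.rate_bytes_per_s)
        if packet_bytes <= self.tokens:
            self.tokens -= packet_bytes
            return True                     # in profile: forward unchanged
        return False                        # out of profile: re-mark or drop


# Hypothetical 8 kbps committed rate with a 1500-byte burst allowance.
meter = TokenBucketMeter(rate_bps=8000, burst_bytes=1500)
first = meter.conforms(1500, now=0.0)       # uses the whole initial burst
second = meter.conforms(100, now=0.0)       # bucket is empty at this instant
third = meter.conforms(100, now=0.5)        # 0.5 s later, 500 bytes refilled
```

Whether a non-conforming packet is dropped (policing) or given a lower-priority marking is a policy decision outside the meter itself.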
Packet Processing Stages (continued)
6. Custom routing policies:
  - Direct some traffic through specific paths (internet, VPN, specific destination).
  - A Virtual Private Routed Network allows users to network in privacy over their own routed network using their own private addresses.
  - Send suspicious traffic to explicit locations for special processing.
7. NAT & NAPT (Network Address [Port] Translation):
  - Address translation at the source if the user is using a private address space.
  - Static one-to-one with NAT and dynamic many-to-one with NAPT.
8. Route table look-up:
  - Best-matching-prefix look-up on the destination IP address.
9. Enforcing the PHB / per-flow treatment (link sharing):
  - Priority, WRR, and WFQ scheduling; WRED (weighted random early detection).
10. Egress-side processing (QoS, filtering, encryption, NAT, egress subscriber identification, traffic classification, link sharing)
11. Statistics collection
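The best-matching-prefix look-up of stage 8 can be illustrated with a small Python sketch that keeps one hash table per prefix length and probes from the longest length down. This is a software illustration under assumed names (`RouteTable`, `add`, `lookup`, the port labels) and handles IPv4 only; hardware implementations typically rely on tries or TCAMs instead.

```python
import ipaddress


class RouteTable:
    """IPv4 longest-prefix match using one dictionary per prefix length."""

    def __init__(self) -> None:
        self.routes: dict[int, dict[int, str]] = {}   # length -> {network: hop}

    def add(self, prefix: str, next_hop: str) -> None:
        net = ipaddress.ip_network(prefix)
        bucket = self.routes.setdefault(net.prefixlen, {})
        bucket[int(net.network_address)] = next_hop

    def lookup(self, address: str):
        addr = int(ipaddress.ip_address(address))
        for length in sorted(self.routes, reverse=True):   # longest first
            mask = (0xFFFFFFFF << (32 - length)) & 0xFFFFFFFF
            hop = self.routes[length].get(addr & mask)
            if hop is not None:
                return hop
        return None                                        # no matching route


table = RouteTable()
table.add("0.0.0.0/0", "port0")      # default route
table.add("10.0.0.0/8", "port1")
table.add("10.1.0.0/16", "port2")
```

With these three routes, an address inside 10.1.0.0/16 matches all of them, and the look-up returns the hop of the most specific one.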
Deep Packet Processing
- In deep packet processing we need to look at the contents of the packet, not just the header.
- Why do we need deep packet processing?
  - Deep packet inspection for firewalls and intrusion detection systems
  - Shaping or discarding P2P traffic
  - Server load balancing: distribution of traffic among servers based on the web destination
  - Network monitoring and analysis
Packet Processing Implementation issues
- We need multiple table look-ups for each packet.
- Access to the whole packet, not just the IP header, is necessary.
- There can be tens of thousands of simultaneously active subscribers comprising millions of application flows.
- In a fully loaded Gigabit Ethernet connection, about 1.5 million packets per second must be processed.
- Modern general-purpose processors are optimized for numeric computation rather than for processing packets.
- Memory read and write speeds become bottlenecks.
- Caching and high-speed memory burst capability do not help, since packet processing requires:
  - Large tables
  - Short entries
  - Random access queries
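The figure of about 1.5 million packets per second follows from the minimum Ethernet frame size. A brief Python sketch of the calculation, with constant names chosen here for illustration:

```python
LINK_BPS = 1e9            # Gigabit Ethernet
MIN_FRAME_BYTES = 64      # minimum Ethernet frame, including the FCS
PREAMBLE_BYTES = 8        # preamble plus start-of-frame delimiter
INTERFRAME_GAP_BYTES = 12

bytes_per_packet_slot = MIN_FRAME_BYTES + PREAMBLE_BYTES + INTERFRAME_GAP_BYTES
gige_pps = LINK_BPS / (8 * bytes_per_packet_slot)
per_packet_ns = 1e9 / gige_pps

print(f"{gige_pps / 1e6:.3f} Mpps, {per_packet_ns:.0f} ns per packet")
```

Every table look-up, classification step, and queueing decision for a packet has to fit inside that per-packet time if the link is to be served at wire speed.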
How do network processors do this?
- Specialized circuitry and micro-engines perform all generic packet processing functions.
- They also usually embed a major programmable module, usually a tailor-made RISC CPU (and sometimes more than one), for:
  - A real-time operating system
  - Handshake communication with other parts of the system
Network Processor Categories
- Platform network processors. Objectives:
  - Handle most packet processing functions.
  - Minimize the number of components and the hardware cost.
  - Optimize the trade-off between performance and flexibility.
  - Accelerate the software development cycle.
- Peripheral network processors
  - Designed to optimize a specific function
  - Examples: compressor chips, IP security
The other side of the argument
- Every single task can be done at wire speed.
- What about multiple tasks at the same time?
- What is a realistic scenario to consider?
- The challenge of benchmarking
- Programming complexity