Network Processors Evolution and Current Trends
May 1, 2008
Nazar Zaidi RMI Corporation, USA
INCC - May 1, 2008 2
Network Processors: Evolution & Trends
• Overview of Network Processing
• Drivers & Demands for Network Processing
• What is a “Network Processor”?
• Characteristics of Network Processors
• Few Examples & Trends
• Acknowledgements: Primary source of the material is www….
  – Pretty pictures are courtesy of RMI’s marketing team
Enterprise Network
[Figure: an individual information customer connects over the Internet and/or intranet, through Internet access and a router/firewall, to the enterprise tiers: Internet Access, Firewall/Routing, Caching & Load-Balancing, Access, Application, Database, Storage.]
Mobile Communication and Information Convergence
[Figure: 3G Interfaces. The GERAN (BTS, BSC) and UTRAN (Node B, RNC) radio access networks connect to the core network circuit switching domain (MSC, MGW, PSTN) and the core network packet switching domain (SGSN, GGSN, IP). Interface labels shown: Um, Uu, Abis, Iub, Iur, Iur-g, A, Gb, Iu-CS, Iu-PS, Gn.]
Interfaces:
• Uu – Between Node B and User Equipment
• Iub – Between Node B and RNC
• Iur – Between RNCs
• Iu-CS – Between RNC and Circuit Switched Core Network
• Iu-PS – Between RNC and Packet Switched Core Network
• Gn – Between SGSN and GGSN
• Iur-g – Between BSC and RNC
Network elements:
• GERAN – GSM EDGE Radio Access Network (2G/2.5G)
• UTRAN – Universal Terrestrial Radio Access Network (3G)
• BTS – Base Transceiver Station (Base Station in 2G)
• Node B – Base Station in 3G
• RNC – Radio Network Controller (3G)
• BSC – Base Station Controller (2G/2.5G)
• SGSN – Serving GPRS Support Node
• GGSN – Gateway GPRS Support Node
• MGW – Circuit Switched Media Gateway
• MSC – Mobile Switching Center
• PS – Packet Switch
• CS – Circuit Switch
Note – MSC and MGW have a one-to-many relationship
“Bit Shuffling” “Information Shuffling”
Service Delivery Networks
WIRED
WIRELESS
Traffic Growth
[Figure: “Trends”, normalized growth since 1990 on a log scale (1 to 100000), 1990 to 2010. Curves: Network Traffic (2x / 12 months), User Traffic (2x / 12 months), Compute Capacity (2x / 18 months).]
Source: Prorating Data from Nick McKeown (Stanford University)
Wireless Traffic Growth
[Figures: wireless traffic growth charts. Sources: Qualcomm, Oppenheimer, May 2007; Qualcomm, Jefferies Conference, Oct ’07.]
• Consumer and service provider demand, as well as the increase in available bandwidth, are driving data traffic growth in the infrastructure
• Processors need to evolve to meet the needs of content traffic growth in the emerging infrastructure
Today’s Networks
• Continuously growing traffic
• Higher rates
• Rich set of features
  • Security
    – VPN/IPSec, SSL, Firewalls
  • Application Awareness
  • Deep Packet Inspection
  • Traffic Engineering – QoS/SLA, etc.
More processing/inspection/decision making needed at much higher rates
[Figure: two log-scale charts. Moore’s Law vs. network bandwidth (GHz and Gbps), 1990 to 2007; processor clock frequency (MHz) vs. mobile data rate (Mbps), 1996 to 2006.]
Early “Network Processors”
• Interface Message Processor (IMP)
  – Connected computers to ARPANET
    • Honeywell DDP-516 minicomputer
Wikipedia: Leonard Kleinrock and the first IMP. Taken from http://www.lk.cs.ucla.edu/personal_history.html
Processing Requirements
Data Plane Requirements
• Low latency/low packet loss/high throughput
• High performance packet processing
• High I/O bandwidth
• Variety of high-speed I/O interfaces
• Low latency/high bandwidth memory access
Control Plane Requirements
• High performance
• High I/O bandwidth
• Large memory capacity
• Low-latency memory access
• Power performance
[Figure: control path above the data path, which runs from ingress to egress.]
Processing Elements used in Networking
• General Purpose Processors
  – Control Plane Applications
  – Dataplane (albeit “lower” performance) applications
  – Examples
    • x86 – Intel, AMD
    • PowerPC – Freescale, AMCC
    • MIPS, ARM – Various Vendors
• General Purpose Processor + Co-Processor
  – Security Co-Processors, Search Engines, Traffic Managers, TOE, etc.
• ASICs
  – Forwarding, Policing, Shaping, QoS/Traffic Management, etc.
• Single/Multi-Core System-on-a-chip (SOC)
  – Traditionally referred to as Network Processor
Motivation for Multiple Cores
• A new 64-byte packet arrives roughly every 64 ns on a 10G port
  – Including IPG and Preamble
• With zero overhead for packet reception and transmission functions, a 1 GHz processor provides a 64-cycle budget per packet
Processor (GHz)   Cycles   Inst/Cycle (IPC)   Instructions
3                 192      0.75               144
3                 192      1.25               240
5                 320      0.75               240
5                 320      1.25               400
10                640      0.75               480
10                640      1.25               800
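The table above can be reproduced with a few lines of arithmetic. The sketch below is illustrative only; the function names are made up for this example. It uses the slide's rounded 64 ns inter-arrival time. With the standard 8-byte preamble and 12-byte inter-packet gap, a 64-byte frame occupies 84 bytes on the wire, which is 67.2 ns at 10 Gbps, so the 64 ns figure is a slight underestimate of the real budget.

```python
# Illustrative sketch: per-packet cycle and instruction budget on one core.
# Assumption: the slide's rounded 64 ns inter-arrival time is the budget.

SLIDE_INTERARRIVAL_NS = 64  # rounded figure quoted on the slide


def wire_time_ns(frame_bytes=64, overhead_bytes=20, rate_gbps=10):
    """Wire time of one frame plus preamble (8 B) and inter-packet gap (12 B)."""
    return (frame_bytes + overhead_bytes) * 8 / rate_gbps


def instruction_budget(freq_ghz, ipc, interarrival_ns=SLIDE_INTERARRIVAL_NS):
    """Return (cycles, instructions) available per packet on a single core."""
    cycles = freq_ghz * interarrival_ns  # GHz * ns = cycles
    return cycles, cycles * ipc


if __name__ == "__main__":
    print(f"exact wire time: {wire_time_ns():.1f} ns")
    for freq in (3, 5, 10):
        for ipc in (0.75, 1.25):
            cycles, instructions = instruction_budget(freq, ipc)
            print(f"{freq:>2} GHz  {cycles:>4} cycles  IPC {ipc}  {instructions:.0f} instructions")
```

Passing `interarrival_ns=wire_time_ns()` to `instruction_budget` gives the slightly larger budget for the exact wire time.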
Motivation for Multiple Cores
• IPC (Instructions/Cycle) is affected by memory subsystem performance
  – DRAM latency does not scale with processor frequency
  – Cycles are wasted waiting for memory
    • The percentage of cycles lost is higher at higher clock frequencies
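One way to see why the loss grows with clock frequency is a simple stall model. This model is an assumption made for illustration and does not come from the slides: DRAM latency is fixed in nanoseconds, a fixed fraction of instructions miss to DRAM, and every miss stalls the core for the full latency. The parameter values are hypothetical.

```python
# Illustrative stall model (assumed, not from the slides).

def effective_ipc(freq_ghz, base_ipc=1.25, misses_per_instr=0.01, dram_latency_ns=60.0):
    """Effective IPC when each DRAM miss stalls the core for the full latency."""
    # The same latency in ns costs more cycles at a higher clock.
    stall_cycles = dram_latency_ns * freq_ghz
    cycles_per_instr = 1.0 / base_ipc + misses_per_instr * stall_cycles
    return 1.0 / cycles_per_instr


def loss_percent(freq_ghz, base_ipc=1.25, **model_params):
    """Percentage of the stall-free IPC lost to memory waits."""
    achieved = effective_ipc(freq_ghz, base_ipc, **model_params)
    return 100.0 * (1.0 - achieved / base_ipc)


if __name__ == "__main__":
    for freq in (1, 3, 5, 10):
        print(f"{freq:>2} GHz: effective IPC {effective_ipc(freq):.2f}, "
              f"loss {loss_percent(freq):.0f}%")
```

Under these assumptions the fraction of cycles lost rises steadily with frequency, which is the effect the slide describes.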
[Figure: “Trends”, normalized growth since 1990 on a log scale (1 to 100000), 1990 to 2010. Curves: Network Traffic (2x / 12 months), User Traffic (2x / 12 months), Compute Capacity (2x / 18 months), DRAM Access Time (1.1x / 18 months).]
Source: Prorating Data from Nick McKeown (Stanford University)
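The rates quoted in the trends chart compound very differently over two decades. The sketch below simply exponentiates the quoted factors. It assumes each rate holds constant from 1990 onward, which is a simplification; the function and dictionary names are made up for this example.

```python
# Illustrative compounding of the growth rates quoted in the chart.

def growth(years, factor, period_months):
    """Normalized growth after `years` if a quantity multiplies by `factor`
    every `period_months` months."""
    return factor ** (years * 12 / period_months)


RATES = {
    "network traffic": (2.0, 12),
    "compute capacity": (2.0, 18),
    "DRAM access time": (1.1, 18),
}

if __name__ == "__main__":
    for year in (2000, 2010):
        elapsed = year - 1990
        for name, (factor, period) in RATES.items():
            print(f"{year}  {name:<18} {growth(elapsed, factor, period):12.1f}x")
```

The widening gap between the traffic curve and the other two is the motivation for the parallelism discussed on the following slides.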
Frequency is not the answer
• A theoretical 10 GHz processor with a “reasonable” IPC provides a per-packet budget of only hundreds of instructions
• Not sufficient for today’s packet processing functions
• Higher Frequency of Operation for CPU will result in “hot” CPUs
• System Power constraints alone prevent pushing the operating frequency
Multiple Cores/Processing Elements
• Packet Processing lends itself very well to parallel processing
  – Work on multiple packets/flows simultaneously
• N processing elements increase the cycle budget by a factor of N
  – Referring to earlier 10G, 64-byte packet example:
    • 1 CPU = 64 ns budget
    • 2 CPUs = 128 ns budget
    • 8 CPUs = 512 ns budget
  – Not accounting for other overheads
• Pipelining of functions achieves similar results
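The budget scaling above, together with one common way of spreading flows across CPUs, can be sketched as follows. The 5-tuple hashing is an assumption added for illustration, since the slides do not say how packets are distributed, and the function names are hypothetical.

```python
# Illustrative sketch: per-packet time budget with N CPUs, plus a
# hypothetical flow-to-CPU mapping based on a hash of the 5-tuple.
import zlib

PACKET_INTERVAL_NS = 64  # slide's figure for 64-byte packets on a 10G port


def per_packet_budget_ns(num_cpus, interval_ns=PACKET_INTERVAL_NS):
    """With N CPUs each working on a different packet, every packet may
    take up to N arrival intervals (ignoring distribution overheads)."""
    return num_cpus * interval_ns


def cpu_for_flow(src_ip, dst_ip, src_port, dst_port, proto, num_cpus):
    """Pin a flow to one CPU so packets of the same flow stay in order."""
    key = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}|{proto}".encode()
    return zlib.crc32(key) % num_cpus


if __name__ == "__main__":
    for n in (1, 2, 8):
        print(f"{n} CPU(s): {per_packet_budget_ns(n)} ns per packet")
    print("flow goes to CPU", cpu_for_flow("10.0.0.1", "10.0.0.2", 1234, 80, "tcp", 8))
```

Hashing per flow rather than per packet keeps each flow on a single CPU, at the cost of uneven load when a few flows dominate the traffic.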
Multi-threading Improves Throughput
[Figure: execution timelines for a single CPU with a single thread and a single CPU with four threads (Thread 1 to Thread 4). With one thread the CPU is stalled during each memory latency between processing bursts; with four threads, other threads process during those latencies, giving a performance improvement.]
• Networking Applications benefit significantly from multi-threaded architectures
  – Memory latencies are more effectively hidden
  – Cache-miss penalties are reduced or eliminated
  – Branch mis-predict penalties avoided
• Better computation density and power performance ratio
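The latency-hiding effect can be approximated with a simple utilization model. This is an assumption for illustration, not a description of any specific processor: each thread alternates between a burst of processing and a memory wait, and the core switches to another ready thread at no cost. The cycle counts are hypothetical.

```python
# Illustrative latency-hiding model (assumed, not from the slides).

def core_utilization(threads, compute_cycles, memory_latency_cycles):
    """Fraction of cycles the core spends processing with `threads` threads."""
    busy_fraction_one_thread = compute_cycles / (compute_cycles + memory_latency_cycles)
    # Extra threads fill the stall time until the core is fully busy.
    return min(1.0, threads * busy_fraction_one_thread)


def speedup(threads, compute_cycles, memory_latency_cycles):
    """Throughput relative to a single thread on the same core."""
    single = core_utilization(1, compute_cycles, memory_latency_cycles)
    return core_utilization(threads, compute_cycles, memory_latency_cycles) / single


if __name__ == "__main__":
    for t in (1, 2, 4, 8):
        print(f"{t} thread(s): utilization {core_utilization(t, 20, 60):.2f}, "
              f"speedup {speedup(t, 20, 60):.1f}x")
```

In this model the gain saturates once the threads together cover the memory latency, so adding threads beyond that point does not help.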
Network Processor Attributes
• Number of Cores and/or Processing Elements
  – Homogeneous vs. Heterogeneous Resources
• Hardware Accelerators and “off-load” capabilities
• Types of Interfaces
  – Memory, 1G, 10G, SPI, PCI, HT etc.
• Programmability
  – General Purpose Programming
  – Changing Standards
  – New Features
  – Bug Fixes
• Ease of programming
  – Very critical for adoption
  – Code Maintenance
Few Examples
Intel – IXP2800
Source: Microprocessor Report
IXP - MicroEngine
Source: Microprocessor Report
IBM - PowerNP
Source: IBM Journal of Research & Development, Vol47, No.2/3
AMCC – nP3700
Source: AMCC Website
Agere PayloadPlus
• FPP = Fast Pattern Processor
• RSP = Routing Switch Processor
Source: Hot Chips 2001
Motorola - C5
Source: Motorola Website
EZchip – NP1/2
Function                        Type of TOP
Packet parse                    TOPparse
Lookup and classify             TOPsearch
Forwarding and QoS decisions    TOPresolve
Packet modify                   TOPmodify
Source: EZchip Website
TOP = Task Optimized Processor
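The four functions above form a pipeline in which each stage hands its result to the next. The sketch below only illustrates that staging idea in plain Python. The function names borrow the TOP names for readability, but the code, the dictionary-based packet, and the lookup table are hypothetical stand-ins and do not represent EZchip's programming model.

```python
# Hypothetical four-stage packet pipeline: parse -> search -> resolve -> modify.

def top_parse(packet):
    """Parse: extract the fields the later stages need."""
    return {"dst": packet["dst"], "payload": packet["payload"]}


def top_search(ctx, table):
    """Lookup and classify: find the next hop for the destination."""
    ctx["next_hop"] = table.get(ctx["dst"])
    return ctx


def top_resolve(ctx):
    """Forwarding decision: drop on a lookup miss, otherwise forward."""
    ctx["action"] = "forward" if ctx["next_hop"] is not None else "drop"
    return ctx


def top_modify(ctx):
    """Modify: attach the chosen next hop, or return None for a drop."""
    if ctx["action"] != "forward":
        return None
    return {"dst": ctx["dst"], "via": ctx["next_hop"], "payload": ctx["payload"]}


def process(packet, table):
    """Run one packet through all four stages in order."""
    return top_modify(top_resolve(top_search(top_parse(packet), table)))


if __name__ == "__main__":
    routes = {"10.0.0.1": "port2"}
    print(process({"dst": "10.0.0.1", "payload": b"hello"}, routes))
    print(process({"dst": "10.9.9.9", "payload": b"hello"}, routes))
```

In hardware the stages would work on different packets at the same time; the sequential calls here only show the order of the functions.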
Cisco - Toaster 3
Source: Microprocessor Report
Toaster TMC
Source: Microprocessor Report
Xelerated – Xelerator X10q
Source: Microprocessor Report
Bay Microsystems - Chesapeake
Source: CommDesign
Broadcom - 1250
Source: Microprocessor Report
Freescale - 8641D
Source: Freescale Website
PASemi - PWRficient PA6T-1682M
Source: Microprocessor Report
Broadcom - 1480
Source: Broadcom Website
Multiple BRCM 1400 Connectivity
Source: Microprocessor Report
Cavium – Octeon Plus
Source: Cavium Networks’ website
RMI - XLR
• Up to 8 MIPS64™ XLR™ Cores
  – Up to 1.2GHz Operation
  – 32 Fine Grain Threads
  – 32KB/32KB L1 DedicatedPlus™ cache
  – Unaligned memory access acceleration
  – 8-banked 2MB L2 Cache, 8 trans/clk
  – Fully cache-coherent design
• Full speed Interconnects
  – Fast Messaging Network
  – Memory Distributed Interconnect
  – I/O Distributed Interconnect
• Quad 800MHz DDR2 Controllers
• Autonomous Security Engines
  – Quad Parallel CryptoCores
  – 10Gbps Bulk Encryption + RSA
  – Single pass multi-operations
  – DES/3DES/AES/GCM/ARC4, Kasumi f8
  – SHA-1/256/384/512, MD5, Kasumi f9
• Dual 10Gbps Ports
  – SPI-4.2 and Native 10GE interfaces
  – SPI-4.2 Pass-Through
• Quad 1GE Ports
• Native HyperTransport Port
• Native QDR SRAM/LA-1 Port
[Figure: XLR block diagram. Eight cores (Core 1 to Core 8), each with four threads (32 in total), a 32KB instruction cache and a 32KB data cache; an eight-banked 2MB level-2 cache; the Fast Messaging Network, Memory Distributed Interconnect and I/O Distributed Interconnect; packet distribution engines; a security engine with RSA; a DMA engine; memory and I/O bridges; dual-channel DRAM interfaces; a QDR SRAM/LA-1 interface; HT and PCI-X; four RGMII ports; two XGMII/SPI-4.2 ports; general purpose I/O.]
SUN - Niagara 2
• 8 SPARC Cores, 4MB shared L2 cache
• 64-bit SPARC V9 instruction set
• Each SPC has the following features:
  • Supports concurrent execution of 8 threads
  • 1 load/store, 2 Integer execution units
  • 1 Floating point and Graphics unit
  • 8-way, 16 KB I$; 32 Byte line size
  • 4-way, 8 KB D$; 16 Byte line size
  • 64-entry fully associative ITLB
  • 128-entry fully associative DTLB
  • MMU supports 8K, 64K, 4M, 256M page sizes; Hardware Table walk
  • Advanced Cryptographic unit
• Two 10G Ethernet (XAUI) ports on chip
• Cryptographic coprocessor – 1 per Core
• On-chip PCI-Express, Ethernet, and FBDIMM memory interfaces are SerDes based
Tilera – TILE64
Source: Microprocessor Report
Freescale – MultiCore SOC Roadmap
Source: Microprocessor Report
Differentiating NP Attributes
• Processing Elements
  – Special/Fixed function
  – General Purpose CPUs
  – Multi-threading
• Instruction Set
  – Custom
  – General Purpose
    • MIPS, PowerPC
    • RISC, VLIW
• Memory Subsystem
  – Bus, Ring, Mesh
  – Coherent
  – Buffer forwarding
• Acceleration Functions
Network Processors
[Figure: positioning chart of programmability versus throughput, placing ASICs, FPGAs, custom IP, NPUs + accelerators, IXP4xx, IXP28xx, general purpose processors, control plane processors, and the RMI XLR/XLS processor.]
The company and product names shown above may be trademarks or servicemarks of their respective owners.
Summary
• General-purpose processors were the early network processors
• Today’s networks are “intelligent” with very high bandwidth requiring significantly more packet processing
• Network Processors are multi-engine/core, multi-threaded SOCs with varying degree/ease of programmability and hardware acceleration
• Recent NPs are general-purpose coherent multi-processor, multi-threaded SOCs employing general purpose programming techniques