Network Processors
Harsh Chilwal
Evolution: Cellular phone generation
– 1G: 900 MHz, voice
– 2G: 900 MHz / 1800 MHz, voice
– 2.5G: 900-1800 MHz, voice, tiny Internet
– 3G: 900-1800-1900 MHz, smart phone, full web service
[Figure: data rate axis with marks at 12 kb/s, 170 and 1000]
Evolution: 3G cellular phones
[Figure: mobile stations (MS), base stations (BS) and a base station controller (BSC) attached to the network. Network of 100 MS and 10 BS; link rates labelled 12 Kb/second, 5 Mb/second and 100 Mb/second]
Evolution: 3G cellular phones
[Figure: same hierarchy with 500 MS and 100 BS; link rates labelled 1 Mb/second, 500 Mb/second and 50 Gbit/second]
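The labels of the second figure are consistent with simple fan-in aggregation. The sketch below assumes 500 mobile stations per base station and 100 base stations per controller; the slide does not state how the counts map onto the hierarchy, so that mapping and the function name are assumptions made for illustration.

```python
def aggregate_mbps(per_source_mbps, fan_in):
    """Aggregate rate of a link that carries fan_in sources of equal rate."""
    return per_source_mbps * fan_in

# Assumed mapping (not stated on the slide): 500 MS per BS, 100 BS per BSC.
ms_mbps = 1.0                              # per mobile station, from the figure
bs_mbps = aggregate_mbps(ms_mbps, 500)     # traffic leaving one base station
bsc_mbps = aggregate_mbps(bs_mbps, 100)    # traffic leaving the controller

print(f"BS uplink:  {bs_mbps:.0f} Mb/s")
print(f"BSC uplink: {bsc_mbps / 1000:.0f} Gb/s")
```

Under these assumptions the controller has to handle tens of Gb/s, which is the kind of load the deck uses to motivate network processors.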
Evolution: Networks
[Figure: bandwidth (Mb/s, axis from 0.1 to 100,000) versus year (1980 to 2005), with points labelled NP]
– DS0: 64 Kb/s
– DS1: 1.5 Mb/s (x24)
– DS3: 44 Mb/s (x28)
– OC12: 622 Mb/s (x12)
– OC192: 10 Gb/s (x16)
– OC768: 40 Gb/s (x4)
DS = Digital signal, OC = Optical carrier
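The rate ladder can be reproduced with a few lines of arithmetic. This is a sketch using the standard nominal line rates (DS1 at 1.544 Mb/s, DS3 at 44.736 Mb/s, OC-1 at 51.84 Mb/s), which are not on the slide; the function names are illustrative. The slide's multipliers appear to count tributaries per step, so the raw rate ratios come out slightly higher for the lower steps because of framing overhead.

```python
# Standard nominal line rates in Mb/s (not taken from the slide).
DS0 = 0.064
DS1 = 1.544
DS3 = 44.736
OC1 = 51.84

def oc_rate_mbps(n):
    """Line rate of OC-n, which is n times the OC-1 rate."""
    return n * OC1

ladder = [
    ("DS0", DS0), ("DS1", DS1), ("DS3", DS3),
    ("OC12", oc_rate_mbps(12)),
    ("OC192", oc_rate_mbps(192)),
    ("OC768", oc_rate_mbps(768)),
]

def step_ratios(rates):
    """Ratio of each rate to the one before it in the ladder."""
    return [(b[0], b[1] / a[1]) for a, b in zip(rates, rates[1:])]

ratios = step_ratios(ladder)
for name, ratio in ratios:
    print(f"{name}: x{ratio:.1f} over the previous step")
```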
Networking Trends
– Increasing networking traffic.
– New sophisticated protocols are being introduced at a rapid pace.
– Need to support new applications in order to provide new services.
– Convergence of voice and data networks is introducing a lot of changes in the communication industry.
– Increasing time-to-market (TTM) pressures.
– Decreasing product life cycles.
General Purpose Processor based Software Router
Benefits
– Flexible for upgrading the system
– Easy to support additional interfaces
– Quick to develop new products with short TTM
The core processor performs all the routing functions
Drawbacks
– Not able to scale up for higher bandwidths; maximum up to OC-12 speeds only
– Can support complex network operations (traffic engineering, QoS, etc.) only with a major reduction in performance
ASIC based Routers
» Benefits
– Provide wire-speed performance at high speeds
» Drawbacks
– Lack flexibility; difficult to meet changing market needs/demands
– Long design cycles increase TTM and reduce PLC (product life cycle)
– A change or failure in the design involves more risk
– Need to replace the ASIC to provide new functionality
– Complex network operations are still executed in software
Network Processor based boxes
– Promise to provide both performance and flexibility
– Comprise many packet processing elements supporting multiple threads
– Achieve higher performance by pipelining and parallel processing, both in terms of threads and packet processing elements
– Bring in flexibility through software programming
– Easy to add features
Network Processor
Basic Architecture of Network Processors
Basic architecture (contd.)
[Figure: block diagram with Dispatcher, CP1, CP2, CP3, CP4, Merger, look-aside co-processors, RISC, Com-Engine and multiple streams]
Intro: Systems and Protocols: Relation with Standards
Systems
– IETF / ForCES WG: data / forwarding plane, control plane
– NPF:
  – Service layer: system wide, no awareness where things are
  – Functional layer: awareness where things are
  – Operational layer: interface management
Protocols
– ITU-T/ANSI/ATM Forum: ATM
– IEEE: Ethernet
– IETF/Protocols: IPv4, MPLS, PPP/L2TP, IPv6, MIBs
OSI Network Architecture
[Figure: hosts A and B, each with the seven-layer stack, connected through the network. The data is shown at each layer with that layer's header label]
– 7 Application: DATA, AH
– 6 Presentation (Pre.): DATA, PH
– 5 Session: DATA, SH
– 4 Transport: DATA, TH
– 3 Network: DATA, NH
– 2 Data Link: DATA, DH
– 1 Physical: DATA, PH
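The header labels in the OSI figure can be read as encapsulation on the way down the sender's stack and decapsulation on the way up the receiver's stack. The following is a minimal sketch of that idea, assuming each layer prepends exactly one header and that the physical layer adds none; the function names are made up for illustration.

```python
# Header labels as shown on the slide. Treating the physical layer as
# adding no header is an assumption made for this sketch.
LAYERS = [
    (7, "Application", "AH"),
    (6, "Presentation", "PH"),
    (5, "Session", "SH"),
    (4, "Transport", "TH"),
    (3, "Network", "NH"),
    (2, "Data Link", "DH"),
]

def encapsulate(data):
    """Sender side: walk down the stack, each layer prepends its header."""
    pdu = [data]
    for _, _, header in LAYERS:
        pdu.insert(0, header)
    return pdu

def decapsulate(pdu):
    """Receiver side: walk up the stack, each layer strips its own header."""
    pdu = list(pdu)
    for _, name, header in reversed(LAYERS):
        if pdu[0] != header:
            raise ValueError(f"{name} layer expected {header}, got {pdu[0]}")
        pdu.pop(0)
    return pdu[0]

frame = encapsulate("DATA")
print(frame)
print(decapsulate(frame))
```

A router between A and B would only strip and re-add the headers up to layer 3, which is where most network processor work takes place.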
Typical Applications
– WAN/LAN switching and routing, multi-service switches, multi-layer switches, aggregators
– Web caching, load balancing, web switching, content based load balancers
– QoS solutions
– VoIP gateways
– 2.5G and 3G wireless infrastructure equipment
– Security: firewall, VPN, encryption, access control
– Storage solutions
– Residential gateways
Software Framework
Scene setting - why specs are not enough
– Two NPU vendors want to promote their solution with some ‘numbers’
– Both chip architectures comprise
  – RISC engines
  – Hardware support engines
  – Various types of interfaces
  – Support for internal and external memory
– They report the following data
  – Aggregate MIPS
  – Max number of lookups per second
  – ...
Commonalities in building blocks
Commonalities in specifications
Commonalities in interpretation?
Specifications
                 NPU A       NPU B
Aggregate MIPS   1000        6000
Lookups/s        50M         100M
#Counters        32K         4M
Speedgrade       10Gbps      10Gbps
Performance      wirespeed   wirespeed
Test scenario
– What is measured? Performance in packets per second versus a forwarding information base (FIB) that is increased in size.
– The start application is IPv4.
– Next, counters are added for per-flow billing purposes.
– Next, load balancing is introduced as an additional feature.
– Finally, encryption becomes an additional requirement for 2% of the data that is being forwarded.
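To make the first two steps of the scenario concrete, here is a small sketch of IPv4 forwarding with per-flow counters. The FIB entries, port names and function names are hypothetical, and the linear scan stands in for whatever lookup structure a real device uses; it only shows why each added feature costs extra memory references per packet.

```python
import ipaddress
from collections import Counter

# Hypothetical FIB: prefix -> next hop. Values are illustrative only.
FIB = {
    ipaddress.ip_network("0.0.0.0/0"): "port0",
    ipaddress.ip_network("10.0.0.0/8"): "port1",
    ipaddress.ip_network("10.1.0.0/16"): "port2",
}

# Per-flow byte counters, keyed by (source, destination).
flow_bytes = Counter()

def lookup(dst):
    """Longest-prefix match by linear scan (one 'reference' per FIB entry)."""
    addr = ipaddress.ip_address(dst)
    best = max((net for net in FIB if addr in net), key=lambda net: net.prefixlen)
    return FIB[best]

def forward(src, dst, length):
    """IPv4 forwarding plus the per-flow counter update added in step two."""
    flow_bytes[(src, dst)] += length   # extra memory reference per packet
    return lookup(dst)

print(forward("192.0.2.1", "10.1.2.3", 100))
print(forward("192.0.2.1", "10.1.2.3", 60))
print(flow_bytes[("192.0.2.1", "10.1.2.3")])
```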
Performance curves
[Figure: performance (Mpps; ticks 10, 20, 30) versus FIB size (K entries; ticks 50, 100, 150) for NPU A and NPU B. Application: IPv4]
Performance curves
[Figure: same axes. Application: IPv4 + counters. Callout: requires more memory references]
Performance curves
[Figure: same axes. Application: IPv4 + counters + load balancing. Callout: requires even more memory references]
Performance curves
[Figure: same axes. Application: IPv4 + counters + load balancing + encryption. Callouts: no extra references and resources available; A does not have sufficient resources]
Architecture A
[Figure: block diagram with OC-192 POS interfaces, key extract, LU, count, hash, sched, internal memories (Int. mem), 3 MIPS cores and external buffer memory. Annotation: IPv4 + counters + LB + crypto]
Architecture B
[Figure: block diagram with 10GE interfaces, LB, 10 MIPS cores, IMEM, memory interface and external buffer memory. Annotation: IPv4 + counters + LB + crypto]
Specifications - revisited
                 NPU A       NPU B
Aggregate MIPS   1000        6000
Lookups/s        50M         100M
#Counters        32K         4M
Speedgrade       10Gbps      10Gbps
Performance      wirespeed   wirespeed
Lookup width     128-bit     32-bit
I/F technology   POS         Ethernet
Power            12W         20W
Core frequency   300MHz      600MHz
Cost (USD)       800         1500
So
– No clear value statement could be made in favor of either NPU solution
  – NPU A achieves higher throughput but with limited flexibility
  – NPU B achieves lower throughput but is more flexible
– Were the provided specs accurate?
  – Yes.
  – The devices performed up to spec.
  – Although NPU B looks better on paper at first sight, more resources have to be consumed for less performant results.
  – There is a cost associated with flexibility
– Were the provided specs relevant?
  – No. They represent granular maximum performances.
  – For ‘real world’ applications,
    – some resources could not be maximally consumed
    – some resources were over consumed
Benchmarking considerations
– Processor core metrics are not always relevant for networking applications
  – They might be relevant for NPU B, since functionality relies almost totally on those cores.
  – They are definitely not relevant for NPU A, since there is extensive additional hardware support for specific functions.
GRANULARITY
Highly granular specifications, data or benchmarking information can give a misleading picture of the actual performance capabilities of the device under test (DUT). Since network processing devices are designed with specific applications in mind, benchmarks must exist for those specific applications.
Benchmarking considerations
– External factors affect NPD performance (where you don’t always suspect it)
  – A forwarding application relies on FIB lookups to determine the destination of a packet
  – The size of the FIB table can influence performance in many ways
    – Usage of multiple memory banks
    – Increasing number of hash collisions
EXTERNAL FACTORS
Benchmarks should include parameters that take into account external factors that are relevant to the particular applications that are being benchmarked.
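The effect of FIB size on hash collisions can be illustrated with a small simulation. This is a sketch under stated assumptions: the table size, the use of CRC32 as the hash and the random destination addresses are all hypothetical and do not model either NPU.

```python
import random
import zlib

def count_collisions(num_entries, num_buckets, seed=1):
    """Insert random 32-bit keys and count inserts that hit an occupied bucket."""
    rng = random.Random(seed)            # fixed seed keeps the run repeatable
    occupied = bytearray(num_buckets)
    collisions = 0
    for _ in range(num_entries):
        addr = rng.getrandbits(32)       # synthetic destination address
        bucket = zlib.crc32(addr.to_bytes(4, "big")) % num_buckets
        if occupied[bucket]:
            collisions += 1              # would need an extra memory reference
        else:
            occupied[bucket] = 1
    return collisions

BUCKETS = 1 << 18                        # assumed hash table size
results = {k: count_collisions(k * 1000, BUCKETS) for k in (50, 100, 150)}
for k, c in results.items():
    print(f"{k}K entries: {c} collisions")
```

With a fixed table, the collision count grows faster than the number of entries, so the per-packet lookup cost is not constant across the FIB sizes used in the test scenario.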
Benchmarking considerations
– Interfaces present performance boundary conditions
  – Ethernet applications require inter-frame gaps that result in more relaxed pps numbers
INTERFACES
Benchmarks should also specify the types of interfaces that are being used, since those interfaces by themselves have an impact on maximum performance figures.
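The relaxation that Ethernet framing brings can be worked out directly. The sketch below uses the standard Ethernet values of an 8-byte preamble, a 12-byte minimum inter-frame gap and a 64-byte minimum frame, none of which are given on the slide; the function names are illustrative.

```python
PREAMBLE_BYTES = 8     # preamble plus start-of-frame delimiter
IFG_BYTES = 12         # minimum inter-frame gap
MIN_FRAME_BYTES = 64   # minimum Ethernet frame, including FCS

def ethernet_max_pps(line_rate_bps, frame_bytes=MIN_FRAME_BYTES):
    """Maximum frames per second once preamble and inter-frame gap are counted."""
    wire_bits = (frame_bytes + PREAMBLE_BYTES + IFG_BYTES) * 8
    return line_rate_bps / wire_bits

def raw_max_pps(line_rate_bps, frame_bytes=MIN_FRAME_BYTES):
    """Hypothetical upper bound if frames could be sent back to back."""
    return line_rate_bps / (frame_bytes * 8)

rate = 10e9  # 10 Gb/s
print(f"with gap and preamble: {ethernet_max_pps(rate) / 1e6:.2f} Mpps")
print(f"back to back:          {raw_max_pps(rate) / 1e6:.2f} Mpps")
```

At 10 Gb/s this gives roughly 14.88 Mpps for minimum-size Ethernet frames against 19.53 Mpps without the per-frame overhead, so the same "10Gbps wirespeed" claim implies different packet rates on different interface types.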
Benchmarking considerations
– Combinations of applications or minor extensions have a completely different impact on both network processing devices
  – NPU A has a lot of well engineered hardware support that can offer additional services BUT fails almost completely when additional computing resources are required
  – NPU B is very ‘soft’; performance degrades slowly when additional services are requested and shows no abrupt peaks in the performance curves.
HEADROOM
Benchmarks should combine applications as they occur in the real world to give a ‘sense’ of the headroom that is available to support real world scenarios. It is, however, very hard to define a metric for headroom.
CommBench – A Telecommunications Benchmark for NPs
CommBench
– HPAs: RTR, FRAG, DRR, TCP
– PPAs: CAST, ZIP, REED, JPEG
Benchmark Characteristics – Code & Computational Kernel Sizes
Benchmark Characteristics – Computational Complexity
Na,l – number of instructions per byte required for application a operating on a packet of length l
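One way to use an instructions-per-byte figure is to estimate the processing power needed to keep up with a link. This is a sketch of that arithmetic only; the function name and the example value of 2 instructions per byte are hypothetical and are not CommBench measurements.

```python
def required_mips(instr_per_byte, link_rate_bps):
    """MIPS needed to process every byte of a link at the given complexity."""
    bytes_per_second = link_rate_bps / 8
    return instr_per_byte * bytes_per_second / 1e6

# Hypothetical complexity of 2 instructions per byte on a 10 Gb/s link.
mips = required_mips(2.0, 10e9)
print(f"{mips:.0f} MIPS")
```

Comparing such an estimate with a device's aggregate MIPS gives a rough feel for headroom, with the caveat from the earlier slides that core metrics alone say little when much of the work is done by dedicated hardware.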
Benchmark Characteristics – Instruction Set Characteristics
Benchmark Characteristics – Memory Hierarchy
Example System: Cisco Toaster 10000
– Almost all data plane operations execute on the programmable XMC
– Pipeline stages are assigned tasks, e.g. classification, routing, firewall, MPLS
  – Classic SW load balancing problem
– External SDRAM shared by common pipe stages
Example System: IXP 2400
– XScale core replaces StrongARM
– Microengines
  – Faster
  – More: 2 clusters of 4 microengines each
– Local memory
– Next neighbor routes added between microengines
– Hardware to accelerate CRC operations and random number generation
– 16 entry CAM
[Figure: block diagram with ME0 to ME7, Scratch/Hash/CSR, MSF unit, DDR DRAM controller, XScale core, QDR SRAM controller and PCI]
References
– Patrick Crowley et al., Network Processor Design.
– Tilman Wolf and Mark Franklin, CommBench - A Telecommunications Benchmark for Network Processors. Proceedings of IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS). http://www.ecs.umass.edu/ece/wolf/papers/commbench.pdf
– Network Processing Forum - Benchmarking
– www.wipro.com/pdf_files/networkprocessors_wipro_solPPT.pdf
– http://intrage.insatlse.fr/~etienne/netpro.ppt