NetPro.ppt

Network Processors

Harsh Chilwal

Evolution : Cellular phone generation

Generation   Frequency bands        Services
1G           900 MHz                Voice
2G           900 MHz, 1800 MHz      Voice
2.5G         900-1800 MHz           Voice, tiny Internet
3G           900-1800-1900 MHz      Smart phone, full web service

Data Rate axis labels: 12 kb/s, 170, 1000

Evolution : 3G cellular phones

[Figure: mobile stations (MS) connect to base stations (BS), which connect through a base station controller (BSC) to the network. Labels: 100 MS, 10 BS; data rates 12 Kb/second, 5 Mb/second, 100 Mb/second.]

Evolution : 3G cellular phones

[Figure: same topology. Labels: 500 MS, 100 BS; data rates 1 Mb/second, 500 Mb/second, 50 Gbit/second; NP labels added.]
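The data rates on the second 3G slide look consistent with simple aggregation. The sketch below assumes that each of 500 mobile stations offers 1 Mb/second to one base station, and that each of 100 base stations forwards its full load to the controller. That reading is an assumption made here for illustration; the slide does not state it, and the function and constant names are hypothetical.

```python
def aggregate_rate_bps(per_source_rate_bps: float, source_count: int) -> float:
    """Total offered load when every source sends at its full rate."""
    return per_source_rate_bps * source_count


# Figures taken from the slide labels; how they combine is an assumption.
MS_RATE_BPS = 1e6   # 1 Mb/second per mobile station
MS_PER_BS = 500     # "500 MS"
BS_PER_BSC = 100    # "100 BS"

bs_rate_bps = aggregate_rate_bps(MS_RATE_BPS, MS_PER_BS)
bsc_rate_bps = aggregate_rate_bps(bs_rate_bps, BS_PER_BSC)

print(f"per base station: {bs_rate_bps / 1e6:.0f} Mb/s")
print(f"per controller:   {bsc_rate_bps / 1e9:.0f} Gb/s")
```

Under these assumptions the base station sees 500 Mb/second and the controller 50 Gbit/second, matching the other two labels on the slide.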

Evolution : Networks

[Figure: Bandwidth (Mb/s, log scale from 0.1 to 100,000) versus Year (1980 to 2005), with an NP label.]

Signal   Rate     Step label
DS0      64K
DS1      1.5M     x24
DS3      44Mb     x28
OC12     622Mb    x12
OC192    10Gb     x16
OC768    40Gb     x4

DS = Digital signal, OC = Optical carrier

Networking Trends

Increasing networking traffic.
New sophisticated protocols are being introduced at a rapid pace.
Need to support new applications to provide new services.
Convergence of voice and data networks is introducing a lot of changes in the communication industry.
Increasing TTM pressures.
Decreasing product life cycles.

General Purpose Processor based Software Router

Benefits
– Flexible for upgrading the system
– Easy to support additional interfaces
– Quick to develop new products with short TTM
– The core processor performs all the routing functionalities
Drawbacks
– Not able to scale up for higher bandwidths; maximum up to OC-12 speeds only
– Can support complex network operations (traffic engineering, QoS, etc.) only with a major reduction in performance

ASIC based Routers

» Benefits
– Provide wire-speed performance at high speeds
» Drawbacks
– Lack flexibility; difficult to meet changing market needs/demands
– Long design cycles increase TTM and reduce PLC
– A change in design or a failure in design involves more risk
– Need to replace the ASIC to provide new functionality
– Complex network operations are still executed in software

Network Processor based boxes

Promise to provide both performance and flexibility
Comprise many packet processing elements supporting multiple threads
Achieve higher performance by pipelining and parallel processing, both in terms of threads and packet processing elements
Bring in flexibility through software programming
Easy to add features

Network Processor

Basic Architecture of Network Processors

Basic architecture (contd.)

[Figure labels: Multiple Streams, Dispatcher, RISC Com-Engine, Look-A-Side Co-processors (CP1, CP2, CP3, CP4), Merger]
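The dispatcher and merger named in the architecture figure can be illustrated in software. The following is a minimal sketch, assuming the dispatcher tags packets with a sequence number, several engines process them in parallel, and the merger restores the original order. The per-packet work (a TTL decrement) and all names are hypothetical, and a thread pool only stands in for hardware processing elements.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def process_packet(packet: dict) -> dict:
    """Hypothetical per-packet work done by one processing element."""
    out = dict(packet)
    out["ttl"] -= 1
    return out


def dispatch_and_merge(packets: list, num_engines: int = 4) -> list:
    """Dispatch packets to parallel engines, then merge results in order."""
    with ThreadPoolExecutor(max_workers=num_engines) as pool:
        # Dispatcher: hand each packet to the pool, remembering its sequence number.
        pending = {
            pool.submit(process_packet, pkt): seq
            for seq, pkt in enumerate(packets)
        }
        # Engines may finish in any order.
        finished = [(pending[f], f.result()) for f in as_completed(pending)]
    # Merger: restore the original packet order using the sequence numbers.
    finished.sort(key=lambda item: item[0])
    return [pkt for _, pkt in finished]


stream = [{"id": i, "ttl": 64} for i in range(8)]
merged = dispatch_and_merge(stream)
print([p["id"] for p in merged])
```

The sequence number is what lets the merger keep packets of a stream in order even when the engines complete out of order.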

Intro: Systems and Protocols: Relation with Standards

Systems
IETF / ForCES WG: Data / Forwarding Plane, Control Plane
NPF: Service Layer
– System Wide: no awareness where things are
– Functional Layer: awareness where things are
– Operational Layer: interface management

Protocols
ITU-T/ANSI/ATM Forum: ATM
IEEE: Ethernet
IETF: IPv4, MPLS, PPP/L2TP, IPv6, MIBs

OSI Network Architecture

[Figure: hosts A and B each run the seven-layer stack and communicate across the network. On the sending side each layer adds its own header to the DATA it receives from the layer above.]

Layer   Name           Header added to DATA
7       Application    AH
6       Presentation   PH
5       Session        SH
4       Transport      TH
3       Network        NH
2       Data Link      DH
1       Physical       PH
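The header stacking in the OSI figure can be sketched as nested wrapping. This is a minimal illustration, assuming each layer from 7 down to 2 prepends its header to whatever the layer above handed it, and that the receiver strips headers in the reverse order. The header strings and the `|` separator are placeholders chosen here; real protocols use binary headers, and the data link layer usually adds a trailer as well.

```python
# Headers as labeled in the figure, from layer 7 (Application) down to layer 2 (Data Link).
HEADERS = ["AH", "PH", "SH", "TH", "NH", "DH"]


def encapsulate(data: str) -> str:
    """Sender side: each lower layer wraps the unit from the layer above."""
    unit = data
    for header in HEADERS:
        unit = f"{header}|{unit}"
    return unit


def decapsulate(unit: str) -> str:
    """Receiver side: strip headers from layer 2 up to layer 7."""
    for header in reversed(HEADERS):
        prefix = header + "|"
        if not unit.startswith(prefix):
            raise ValueError(f"expected header {header}")
        unit = unit[len(prefix):]
    return unit


frame = encapsulate("DATA")
print(frame)               # DH|NH|TH|SH|PH|AH|DATA
print(decapsulate(frame))  # DATA
```

A router that works at layer 3 would only need to strip and re-add the outermost headers, which is part of why header processing dominates many forwarding workloads.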

Typical Applications

WAN/LAN Switching and Routing, Multi-service Switches, Multi-layer switches, Aggregators
Web caching, Load balancing, Web switching, Content based load balancers
QoS solutions
VoIP Gateways
2.5G and 3G wireless infrastructure equipment
Security - Firewall, VPN, Encryption, Access control
Storage solutions
Residential Gateways

Software Framework

Scene setting - why specs are not enough

2 NPU vendors want to promote their solution with some ‘numbers’
Both chip architectures comprise
– RISC engines
– Hardware support engines
– Various types of interfaces
– Support for internal and external memory
They report the following data
– Aggregate MIPS
– Max number of lookups per second
– ...

Commonalities in building blocks
Commonalities in specifications
Commonalities in interpretation?

Specifications

                 NPU A       NPU B
Aggregate MIPS   1000        6000
Lookups/s        50M         100M
#Counters        32K         4M
Speedgrade       10Gbps      10Gbps
Performance      wirespeed   wirespeed

Test scenario

What is measured? Performance in packets per second versus a forwarding information base (FIB) that is increased in size.
Start application is IPv4.
Next, counters are added for per flow billing purposes.
Next, load balancing is introduced as an additional feature.
Finally, encryption becomes an additional requirement for 2% of the data that is being forwarded.
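The first two steps of the test scenario, an IPv4 FIB lookup followed by a per flow counter update, can be sketched as follows. This is an illustration only, assuming a longest-prefix-match lookup over a table keyed by prefix length and a flow defined as a (source, destination) pair. The class and method names are hypothetical and are not taken from either NPU.

```python
import ipaddress
from collections import Counter


class Fib:
    """Toy forwarding information base with per-flow packet counters."""

    def __init__(self):
        self.routes = {}           # prefix length -> {network: next hop}
        self.counters = Counter()  # (src, dst) -> packets forwarded

    def add_route(self, prefix: str, next_hop: str) -> None:
        net = ipaddress.ip_network(prefix)
        self.routes.setdefault(net.prefixlen, {})[net] = next_hop

    def lookup(self, dst: str):
        """Longest prefix match: try the most specific prefix lengths first."""
        for plen in sorted(self.routes, reverse=True):
            candidate = ipaddress.ip_network(f"{dst}/{plen}", strict=False)
            next_hop = self.routes[plen].get(candidate)
            if next_hop is not None:
                return next_hop
        return None

    def forward(self, src: str, dst: str):
        next_hop = self.lookup(dst)
        # The counter update is one more memory reference per packet.
        self.counters[(src, dst)] += 1
        return next_hop


fib = Fib()
fib.add_route("0.0.0.0/0", "C")
fib.add_route("10.0.0.0/8", "A")
fib.add_route("10.1.0.0/16", "B")
print(fib.forward("192.0.2.1", "10.1.2.3"))
```

Each feature added to the scenario adds work of this kind on every packet, which is why the same device can look very different as the application grows.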

Performance curves

IPv4
[Figure: Performance (Mpps; ticks 10, 20, 30) versus FIB (K entries; ticks 50, 100, 150) for NPU A and NPU B]

Performance curves

IPv4 + counters
[Figure: same axes, NPU A and NPU B. Annotation: Requires more memory references]

Performance curves

IPv4 + counters + Load balancing
[Figure: same axes, NPU A and NPU B. Annotation: Requires even more memory references]

Performance curves

IPv4 + counters + Load balancing + encryption
[Figure: same axes, NPU A and NPU B. Annotations: No extra references and resources available; A does not have sufficient resources]

Architecture A

[Figure: block diagram. Labels: OC-192 POS interfaces, Key extract, LU, Hash, Count, Sched, 3 MIPS cores, Int. mem (three blocks), External Buffer Mem. Application stages overlaid: IPv4, + counters, + LB, + crypto]

Architecture B

[Figure: block diagram. Labels: 10GE interfaces, LB, 10 MIPS cores, IMEM, Memory interface, External Buffer Mem. Application stages overlaid: IPv4, + counters, + LB, + crypto]

Specifications - revisited

                 NPU A       NPU B
Aggregate MIPS   1000        6000
Lookups/s        50M         100M
#Counters        32K         4M
Speedgrade       10Gbps      10Gbps
Performance      wirespeed   wirespeed
Lookup width     128-bit     32-bit
I/F technology   POS         Ethernet
Power            12W         20W
Core frequency   300MHz      600MHz
Cost (USD)       800         1500

So

No clear value statement could be made in favor of either NPU solution
– NPU A achieves higher throughput but with limited flexibility
– NPU B achieves lower throughput but is more flexible
Were the provided specs accurate?
– Yes.
– The devices performed up to spec.
– Although NPU B looks better on paper at first sight, more resources have to be consumed for less performant results.
– There is a cost associated with flexibility
Were the provided specs relevant?
– No. They represent granular maximum performances.
– For ‘real world’ applications, some resources could not be maximally consumed while other resources were over consumed

Benchmarking considerations

Processor core metrics are not always relevant for networking applications
– They might be relevant for NPU B, since functionality relies almost totally on those cores.
– They are definitely not relevant for NPU A, since there is extensive additional hardware support for specific functions.

GRANULARITY
Highly granular specifications, data or benchmarking information can offer a misleading picture of the actual performance capabilities of the DUT. Since Network Processing Devices are designed with specific applications in mind, benchmarks must exist for those specific applications.

Benchmarking considerations

External factors affect NPD performance (where you don’t always suspect it)
– A forwarding application relies on FIB lookups to determine the destination of a packet
– The size of the FIB table can influence performance in many ways, such as the usage of multiple memory banks and an increasing number of hash collisions

EXTERNAL FACTORS
Benchmarks should include parameters that take into account external factors that are relevant to the particular applications that are being benchmarked.
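The effect of FIB size on hash collisions can be shown with a small simulation. The sketch below assumes a hash table with a fixed number of buckets (65,536, a value chosen here and not taken from the slides), random 32-bit keys standing in for FIB entries, and a simple modulo hash. It counts how many entries land in a bucket that is already occupied.

```python
import random


def count_collisions(num_entries: int, num_buckets: int = 2**16, seed: int = 1) -> int:
    """Count entries that hash to an already occupied bucket."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    occupied = set()
    collisions = 0
    for _ in range(num_entries):
        key = rng.getrandbits(32)   # stand-in for a FIB key
        bucket = key % num_buckets  # hypothetical hash function
        if bucket in occupied:
            collisions += 1
        else:
            occupied.add(bucket)
    return collisions


for entries in (50_000, 100_000, 150_000):
    print(entries, count_collisions(entries))
```

Every collision costs extra memory references to resolve, so a benchmark that reports lookups per second without stating the table size hides one of the main factors behind the result.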


Benchmarking considerations

Interfaces present performance boundary conditions
– Ethernet applications require inter frame gaps that result in more relaxed pps numbers

INTERFACES
Benchmarks should also specify the types of interfaces that are being used, since those interfaces have an impact all by themselves on maximum performance figures.
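The point about inter frame gaps can be made concrete with a short calculation. The sketch assumes a 10 Gb/s line rate and minimum-size 64-byte Ethernet frames, and uses the standard Ethernet per-frame overhead of 8 bytes of preamble and start-of-frame delimiter plus a 12-byte minimum inter frame gap. The comparison case with no overhead is hypothetical and does not model POS framing.

```python
PREAMBLE_AND_SFD_BYTES = 8   # Ethernet preamble plus start-of-frame delimiter
INTER_FRAME_GAP_BYTES = 12   # minimum Ethernet inter frame gap


def max_pps(line_rate_bps: float, frame_bytes: int, overhead_bytes: int = 0) -> float:
    """Maximum packets per second for back-to-back frames of one size."""
    bits_per_frame = (frame_bytes + overhead_bytes) * 8
    return line_rate_bps / bits_per_frame


ethernet_pps = max_pps(
    10e9, 64, PREAMBLE_AND_SFD_BYTES + INTER_FRAME_GAP_BYTES
)
no_overhead_pps = max_pps(10e9, 64)  # hypothetical interface with no gap

print(f"with Ethernet overhead: {ethernet_pps / 1e6:.2f} Mpps")
print(f"with no overhead:       {no_overhead_pps / 1e6:.2f} Mpps")
```

With the gap and preamble the device has to sustain about 14.88 Mpps rather than 19.53 Mpps, so the same "10Gbps wirespeed" claim implies a different packet rate depending on the interface.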


Benchmarking considerations

Combinations of applications or minor extensions have a completely different impact on both network processing devices
– NPU A has a lot of well engineered hardware support that can offer additional services BUT fails almost completely when additional computing resources are required
– NPU B is very ‘soft’; performance degrades slowly when additional services are requested and shows no abrupt peaks in the performance curves.

HEADROOM
Benchmarks should combine applications as they occur in the real world to give a ‘sense’ of headroom that is available to support real world scenarios. It is however very hard to define a metric for headroom.


CommBench – A Telecommunication Benchmark For NPs

CommBench

HPAs (header processing applications): RTR, FRAG, DRR, TCP
PPAs (payload processing applications): CAST, ZIP, REED, JPEG

Benchmark Characteristics – Code & Computational Kernel Sizes

Benchmark Characteristics – Computational Complexity

N_a,l – number of instructions per byte required for application a operating on a packet of length l
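The complexity metric can be worked through with numbers. The sketch below computes instructions per byte from an instruction count and a packet length, then estimates the processing rate needed to keep up with a given link. The instruction count, packet length and link rate are made-up values for illustration, and the second function is an extrapolation added here rather than part of the slide.

```python
def instructions_per_byte(instruction_count: int, packet_length_bytes: int) -> float:
    """N_a,l: instructions executed per byte of a packet of length l."""
    return instruction_count / packet_length_bytes


def required_mips(n_al: float, line_rate_bps: float) -> float:
    """Instruction rate (in MIPS) needed to process every byte on the link."""
    bytes_per_second = line_rate_bps / 8
    return n_al * bytes_per_second / 1e6


# Hypothetical application: 2000 instructions to handle one 64-byte packet.
n_al = instructions_per_byte(2000, 64)
print(n_al)                        # 31.25
print(required_mips(n_al, 100e6))  # 390.625
```

Header processing applications tend to have a cost that is fixed per packet, so their instructions per byte fall as packets get longer, while payload processing cost grows with packet length.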

Benchmark Characteristics – Instruction Set Characteristics

Benchmark Characteristics – Memory Hierarchy

Example System: Cisco Toaster 10000

Almost all data plane operations execute on the programmable XMC
Pipeline stages are assigned tasks – e.g. classification, routing, firewall, MPLS
– Classic SW load balancing problem
External SDRAM shared by common pipe stages
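The load balancing problem mentioned for the pipeline comes from the fact that a pipeline runs only as fast as its slowest stage. The sketch below illustrates this with made-up per-packet stage times in microseconds. The stage names follow the slide, but the numbers and function names are hypothetical and say nothing about the actual device.

```python
def bottleneck_stage(stage_times_us: dict) -> str:
    """Name of the stage with the longest per-packet processing time."""
    return max(stage_times_us, key=stage_times_us.get)


def pipeline_throughput_pps(stage_times_us: dict) -> float:
    """A full pipeline emits one packet per slowest-stage time."""
    return 1e6 / max(stage_times_us.values())


# Hypothetical, unbalanced assignment of tasks to stages.
unbalanced = {"classification": 0.5, "routing": 0.25, "firewall": 1.0, "MPLS": 0.25}
# Same work, with the firewall task split across two stages.
balanced = {"classification": 0.5, "routing": 0.25,
            "firewall-1": 0.5, "firewall-2": 0.5, "MPLS": 0.25}

print(bottleneck_stage(unbalanced), pipeline_throughput_pps(unbalanced))
print(pipeline_throughput_pps(balanced))
```

In this example, splitting the slowest task doubles throughput without any stage running faster, which is the kind of decision the programmer has to make when assigning tasks to pipeline stages.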

Example System: IXP 2400

XScale core replaces StrongARM
Microengines
– Faster
– More: 2 clusters of 4 microengines each
Local memory
Next neighbor routes added between microengines
Hardware to accelerate CRC operations and random number generation
16 entry CAM

[Figure labels: ME0 to ME7, Scratch/Hash/CSR, MSF Unit, DDR DRAM controller, XScale Core, QDR SRAM controller, PCI]

References

Network Processor Design, Patrick Crowley et al.
CommBench - A Telecommunications Benchmark for Network Processors, Tilman Wolf and Mark Franklin. Proceedings of IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS). http://www.ecs.umass.edu/ece/wolf/papers/commbench.pdf
Network Processing Forum - Benchmarking. http://www.npforum.org/pressroom/SA_CDC2002_20020903.ppt
http://www.wipro.com/pdf_files/networkprocessors_wipro_solPPT.pdf
http://intrage.insatlse.fr/~etienne/netpro.ppt












