Network Processors

Network Processors Harsh Chilwal

Page 1: Network Processors

Network Processors

Harsh Chilwal

Page 2: Network Processors

Evolution: Cellular phone generation

[Figure: data rate by cellular generation; data-rate axis marks 12 kb/s, 170 and 1000]

– 1G: 900 MHz, voice
– 2G: 900/1800 MHz, voice
– 2.5G: 900-1800 MHz, voice and tiny Internet
– 3G: 900-1800-1900 MHz, smart phone, full web service

Page 3: Network Processors

Evolution: 3G cellular phones

[Figure: mobile stations (MS) connect to base stations (BS), which connect through a base station controller (BSC) to the network. Rates shown: 12 Kb/s, 5 Mb/s and 100 Mb/s; 100 MS, 10 BS]

Page 4: Network Processors

Evolution: 3G cellular phones

[Figure: the same MS, BS, BSC and network topology at higher load. Rates shown: 1 Mb/s, 500 Mb/s and 50 Gbit/s; 500 MS, 100 BS; some network elements are labeled NP (network processor)]
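
The page 4 numbers compose by simple multiplication. A minimal sketch, assuming (our reading of the figure, not stated on the slide) that each BS serves 500 MS at 1 Mb/s each and that one BSC aggregates all 100 BS:

```python
def aggregate_bps(per_link_bps: float, links: int) -> float:
    """Peak aggregate rate when `links` sources each send at `per_link_bps`."""
    return per_link_bps * links


MS_RATE_BPS = 1e6   # one mobile station at 1 Mb/s
MS_PER_BS = 500     # assumption: all 500 MS attach to one BS
BS_PER_BSC = 100    # assumption: all 100 BS attach to one BSC

bs_uplink = aggregate_bps(MS_RATE_BPS, MS_PER_BS)   # 500 Mb/s per BS
bsc_uplink = aggregate_bps(bs_uplink, BS_PER_BSC)   # 50 Gb/s at the BSC

print(f"BS:  {bs_uplink / 1e6:.0f} Mb/s")
print(f"BSC: {bsc_uplink / 1e9:.0f} Gb/s")
```

The result is an upper bound: it assumes every station transmits at full rate at the same time.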

Page 5: Network Processors

Evolution: Networks

[Figure: link bandwidth (Mb/s, log scale from 0.1 to 100,000) versus year, 1980 to 2005, with an NP marker]

Signal    Rate       Step
DS0       64 kb/s
DS1       1.5 Mb/s   x24
DS3       44 Mb/s    x28
OC-12     622 Mb/s   x12
OC-192    10 Gb/s    x16
OC-768    40 Gb/s    x4

DS = Digital Signal, OC = Optical Carrier
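
The step sizes in the table follow from how the hierarchies are built. A small sketch computing nominal line rates, using the standard SONET relation OC-n = n × 51.84 Mb/s and the standard T-carrier rates; the constant and function names are ours:

```python
OC1_MBPS = 51.84   # SONET OC-1 (STS-1) line rate
DS0_KBPS = 64      # one digital voice channel
DS1_MBPS = 1.544   # 24 x DS0 (1.536 Mb/s) plus 8 kb/s of framing
DS3_MBPS = 44.736  # 28 x DS1 plus multiplexing overhead


def oc_rate_mbps(n: int) -> float:
    """Nominal line rate of an OC-n link in Mb/s."""
    return n * OC1_MBPS


for n in (12, 192, 768):
    print(f"OC-{n}: {oc_rate_mbps(n):,.2f} Mb/s")

# Step sizes between successive optical carriers in the table
print(round(oc_rate_mbps(192) / oc_rate_mbps(12), 6))   # x16
print(round(oc_rate_mbps(768) / oc_rate_mbps(192), 6))  # x4
```

So the slide's "10Gb" and "40Gb" are rounded OC-192 (about 9.95 Gb/s) and OC-768 (about 39.8 Gb/s) rates.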

Page 6: Network Processors

Networking Trends
– Increasing network traffic.
– New, sophisticated protocols are being introduced at a rapid pace.
– New applications must be supported to provide new services.
– The convergence of voice and data networks is introducing many changes in the communication industry.
– Increasing time-to-market (TTM) pressures.
– Decreasing product life cycles.

Page 7: Network Processors

General Purpose Processor based Software Router

The core processor performs all the routing functions.

Benefits
– Flexible when upgrading the system
– Easy to support additional interfaces
– Quick to develop new products, giving a short TTM

Drawbacks
– Does not scale to higher bandwidths; maximum of about OC-12 speeds
– Can support complex network operations (e.g., traffic engineering, QoS) only with a major reduction in performance

Page 8: Network Processors

ASIC based Routers

» Benefits
– Provide wire-speed performance at high speeds

» Drawbacks
– Lack flexibility; difficult to meet changing market needs and demands
– Long design cycles increase TTM and reduce the product life cycle (PLC)
– Design changes or design failures carry more risk
– The ASIC must be replaced to provide new functionality
– Complex network operations are still executed in software

Page 9: Network Processors

Network Processor based boxes

Promises to provide both performance and flexibility.
– Comprises many packet processing elements, each supporting multiple threads
– Achieves higher performance through pipelining and parallel processing, both across threads and across packet processing elements
– Brings in flexibility through software programmability
– Easy to add features
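
Why the parallelism is needed shows up in the per-packet cycle budget. A back-of-envelope sketch, assuming minimum-size 40-byte packets arriving back to back on a 10 Gb/s link and, for illustration only, a 600 MHz engine clock; framing overhead is ignored:

```python
def packets_per_second(link_bps: float, packet_bytes: int) -> float:
    """Packet arrival rate for back-to-back packets (framing ignored)."""
    return link_bps / (packet_bytes * 8)


def cycle_budget(clock_hz: float, link_bps: float, packet_bytes: int, engines: int = 1) -> float:
    """Cycles each packet may use when `engines` work on different packets in parallel."""
    return engines * clock_hz / packets_per_second(link_bps, packet_bytes)


pps = packets_per_second(10e9, 40)  # 31.25 Mpps
print(f"{pps / 1e6:.2f} Mpps")
for n in (1, 8, 16):
    print(n, "engine(s):", round(cycle_budget(600e6, 10e9, 40, n), 1), "cycles/packet")
```

A single engine gets only about 19 cycles per packet; spreading packets over several engines, with threads to hide memory latency, multiplies that budget.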

Page 10: Network Processors

Network Processor

Page 11: Network Processors

Basic Architecture of Network Processors

Page 12: Network Processors

Basic architecture (contd.)

[Figure: Dispatcher and Merger around a RISC Com-Engine processing multiple streams, with look-aside co-processors CP1, CP2, CP3 and CP4]
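
The dispatcher/merger arrangement can be sketched in a few lines. This is a hypothetical model, not any vendor's API: the dispatcher tags packets with sequence numbers and deals them round-robin to engines, and the merger restores arrival order regardless of which engine finishes first. The packet fields and the TTL-decrement "work" are illustrative.

```python
import heapq
from itertools import cycle


def dispatcher(packets, n_engines):
    """Tag each packet with its arrival sequence number and deal it round-robin to engine queues."""
    queues = [[] for _ in range(n_engines)]
    for (seq, pkt), queue in zip(enumerate(packets), cycle(queues)):
        queue.append((seq, pkt))
    return queues


def engine(queue):
    """Hypothetical per-packet work: decrement the TTL field."""
    return [(seq, {**pkt, "ttl": pkt["ttl"] - 1}) for seq, pkt in queue]


def merger(engine_outputs):
    """Merge per-engine outputs (each already in sequence order) back into arrival order."""
    return [pkt for _, pkt in heapq.merge(*engine_outputs)]


packets = [{"id": i, "ttl": 64} for i in range(10)]
# Collect engine results in reverse order to mimic engines finishing out of order
outputs = [engine(q) for q in reversed(dispatcher(packets, 4))]
merged = merger(outputs)
print([p["id"] for p in merged])
```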

Page 13: Network Processors

Intro: Systems and Protocols: Relation with Standards

Systems
– IETF ForCES WG: data/forwarding plane and control plane
– NPF: service layer (system wide, no awareness of where things are), functional layer (awareness of where things are), operational layer (interface management)

Protocols
– ITU-T/ANSI/ATM Forum: ATM
– IEEE: Ethernet
– IETF: IPv4, IPv6, MPLS, PPP/L2TP, MIBs

Page 14: Network Processors

OSI Network Architecture

[Figure: DATA from host A travels down the seven layers, across the network, and up the seven layers at host B; on the way down each layer prepends its own header]

Layer            Header added
7 Application    AH
6 Presentation   PH
5 Session        SH
4 Transport      TH
3 Network        NH
2 Data Link      DH
1 Physical
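
Encapsulation can be illustrated with a toy model in which headers are just labeled string prefixes. Real headers are binary structures; the "|" separator and the function names are purely illustrative.

```python
# Headers prepended by layers 7 (Application) down to 2 (Data Link), as labeled in the figure
HEADERS = ["AH", "PH", "SH", "TH", "NH", "DH"]


def send(data: str) -> str:
    """Host A: each layer, moving down the stack, prepends its header."""
    frame = data
    for header in HEADERS:
        frame = f"{header}|{frame}"
    return frame


def receive(frame: str) -> str:
    """Host B: each layer, moving up the stack, strips the header its peer added."""
    for header in reversed(HEADERS):
        prefix = f"{header}|"
        if not frame.startswith(prefix):
            raise ValueError(f"expected {header} header")
        frame = frame[len(prefix):]
    return frame


wire = send("DATA")
print(wire)           # DH|NH|TH|SH|PH|AH|DATA
print(receive(wire))  # DATA
```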

Page 15: Network Processors

Typical Applications
– WAN/LAN switching and routing, multi-service switches, multi-layer switches, aggregators
– Web caching, load balancing, web switching, content-based load balancers
– QoS solutions
– VoIP gateways
– 2.5G and 3G wireless infrastructure equipment
– Security: firewall, VPN, encryption, access control
– Storage solutions
– Residential gateways

Page 16: Network Processors

Software Framework

Page 17: Network Processors

Scene setting - why specs are not enough

Two NPU vendors want to promote their solutions with some ‘numbers’.

Both chip architectures comprise:
– RISC engines
– Hardware support engines
– Various types of interfaces
– Support for internal and external memory

They report the following data:
– Aggregate MIPS
– Maximum number of lookups per second
– ...

Commonalities in building blocks
Commonalities in specifications
Commonalities in interpretation?

Page 18: Network Processors

Specifications

                 NPU A       NPU B
Aggregate MIPS   1000        6000
Lookups/s        50M         100M
#Counters        32K         4M
Speed grade      10 Gbps     10 Gbps
Performance      wire speed  wire speed

Page 19: Network Processors

Test scenario
What is measured? Performance in packets per second versus a forwarding information base (FIB) that is increased in size.
– The starting application is IPv4 forwarding.
– Next, counters are added for per-flow billing purposes.
– Next, load balancing is introduced as an additional feature.
– Finally, encryption becomes an additional requirement for 2% of the data being forwarded.
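
The starting IPv4 application boils down to a longest-prefix-match (LPM) lookup in the FIB for every packet. One simple LPM scheme, chosen here for clarity and not claimed to be what NPU A or NPU B use, keeps one hash table per prefix length and probes from /32 down to /0; the probe count stands in for memory references. The class name is hypothetical.

```python
import ipaddress
from typing import Dict, Optional, Tuple


class Fib:
    """LPM forwarding table: one hash table per prefix length, probed from /32 down to /0."""

    def __init__(self) -> None:
        self.tables: Dict[int, Dict[int, str]] = {length: {} for length in range(33)}

    def add(self, prefix: str, next_hop: str) -> None:
        net = ipaddress.IPv4Network(prefix)
        self.tables[net.prefixlen][int(net.network_address)] = next_hop

    def lookup(self, addr: str) -> Tuple[Optional[str], int]:
        """Return (next hop or None, number of hash-table probes performed)."""
        a = int(ipaddress.IPv4Address(addr))
        probes = 0
        for length in range(32, -1, -1):
            table = self.tables[length]
            if not table:
                continue  # no prefixes of this length, so no memory reference
            probes += 1
            mask = (0xFFFFFFFF << (32 - length)) & 0xFFFFFFFF
            hop = table.get(a & mask)
            if hop is not None:
                return hop, probes
        return None, probes


fib = Fib()
fib.add("0.0.0.0/0", "default")
fib.add("10.0.0.0/8", "A")
fib.add("10.1.0.0/16", "B")
print(fib.lookup("10.1.2.3"))     # ('B', 1)
print(fib.lookup("10.2.3.4"))     # ('A', 2)
print(fib.lookup("192.168.1.1"))  # ('default', 3)
```

In this scheme, more distinct prefix lengths mean more probes per packet, so the contents of the FIB, not only its size, shape throughput.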

Page 20: Network Processors

Performance curves

[Figure: performance (Mpps, 10 to 30) versus FIB size (50K to 150K entries) for NPU A and NPU B. Workload: IPv4]

Page 21: Network Processors

Performance curves

[Figure: performance (Mpps, 10 to 30) versus FIB size (50K to 150K entries) for NPU A and NPU B. Workload: IPv4 + counters]

Requires more memory references
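
Per-flow billing counters add a read-modify-write per counter for every forwarded packet. A minimal sketch; the 5-tuple flow key and the tally of four memory references per packet (one read and one write each for a packet counter and a byte counter) are modeling assumptions, not details from the slides:

```python
from collections import defaultdict
from typing import DefaultDict, Tuple

# (source IP, destination IP, protocol, source port, destination port)
Flow = Tuple[str, str, int, int, int]


class FlowCounters:
    def __init__(self) -> None:
        self.packets: DefaultDict[Flow, int] = defaultdict(int)
        self.octets: DefaultDict[Flow, int] = defaultdict(int)
        self.memory_refs = 0  # modeled reads + writes to the counter table

    def count(self, flow: Flow, length: int) -> None:
        self.packets[flow] += 1       # read-modify-write: 1 read + 1 write
        self.octets[flow] += length   # read-modify-write: 1 read + 1 write
        self.memory_refs += 4


counters = FlowCounters()
web = ("10.0.0.1", "10.0.1.1", 6, 40000, 80)
counters.count(web, 100)
counters.count(web, 200)
print(counters.packets[web], counters.octets[web], counters.memory_refs)  # 2 300 8
```

In this model the counter table size bounds how many flows can be billed individually, which is where the 32K versus 4M counters in the specification table matter.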

Page 22: Network Processors

Performance curves

[Figure: performance (Mpps, 10 to 30) versus FIB size (50K to 150K entries) for NPU A and NPU B. Workload: IPv4 + counters + load balancing]

Requires even more memory references
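
Load balancing is commonly done by hashing a packet's flow identifier so that all packets of one flow take the same path and stay in order. A sketch of that idea, assuming a 5-tuple key and using CRC-32 (`zlib.crc32`) as the hash; the slides do not say which hash or key either NPU uses, and the function names are hypothetical:

```python
import socket
import struct
import zlib


def flow_hash(src: str, dst: str, proto: int, sport: int, dport: int) -> int:
    """CRC-32 over the packed 5-tuple (IPv4 addresses, protocol, ports)."""
    key = socket.inet_aton(src) + socket.inet_aton(dst) + struct.pack("!BHH", proto, sport, dport)
    return zlib.crc32(key)


def pick_next_hop(next_hops, src, dst, proto, sport, dport):
    """Map a flow onto one of the equal-cost next hops; same flow, same hop."""
    return next_hops[flow_hash(src, dst, proto, sport, dport) % len(next_hops)]


hops = ["A", "B", "C", "D"]
first = pick_next_hop(hops, "10.0.0.1", "10.0.1.1", 6, 40000, 80)
again = pick_next_hop(hops, "10.0.0.1", "10.0.1.1", 6, 40000, 80)
print(first == again)  # True: a flow always maps to the same next hop
spread = {pick_next_hop(hops, "10.0.0.1", "10.0.1.1", 6, 40000 + i, 80) for i in range(100)}
print(sorted(spread))
```

Reading the next-hop set is one more table access per packet, on top of the FIB and counter accesses.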

Page 23: Network Processors

Performance curves

[Figure: performance (Mpps, 10 to 30) versus FIB size (50K to 150K entries) for NPU A and NPU B. Workload: IPv4 + counters + load balancing + encryption]

No extra references and resources available

NPU A does not have sufficient resources

Page 24: Network Processors

Architecture A

[Figure: Architecture A, running IPv4 + counters + LB + crypto. OC-192 POS interfaces; hardware key extraction, lookup (LU), hash and counter (Count) engines with internal memories; 3 MIPS cores; scheduler (Sched); external buffer memory]

Page 25: Network Processors

Architecture B

[Figure: Architecture B, running IPv4 + counters + LB + crypto. 10G Ethernet (10GE) interfaces; load-balancing (LB) block; 10 MIPS cores; internal memory (IMEM); memory interface; external buffer memory]

Page 26: Network Processors

Specifications - revisited

                 NPU A       NPU B
Aggregate MIPS   1000        6000
Lookups/s        50M         100M
#Counters        32K         4M
Speed grade      10 Gbps     10 Gbps
Performance      wire speed  wire speed
Lookup width     128-bit     32-bit
I/F technology   POS         Ethernet
Power            12 W        20 W
Core frequency   300 MHz     600 MHz
Cost (USD)       800         1500

Page 27: Network Processors

So
No clear value statement could be made in favor of either NPU solution:
– NPU A achieves higher throughput but with limited flexibility
– NPU B achieves lower throughput but is more flexible

Were the provided specs accurate?
– Yes. The devices performed up to spec.
– Although NPU B looks better on paper at first sight, it consumes more resources for less performant results.
– There is a cost associated with flexibility.

Were the provided specs relevant?
– No. They represent granular maximum performance figures.
– For ‘real world’ applications, some resources could not be maximally consumed while others were over-consumed.

Page 28: Network Processors

Benchmarking considerations
Processor core metrics are not always relevant for networking applications:
– They might be relevant for NPU B, since its functionality relies almost entirely on those cores.
– They are definitely not relevant for NPU A, since it has extensive additional hardware support for specific functions.

GRANULARITY
Highly granular specifications, data or benchmarking information can give a misleading picture of the actual performance capabilities of the DUT (device under test). Since network processing devices are designed with specific applications in mind, benchmarks must exist for those specific applications.

Page 29: Network Processors

Benchmarking considerations

External factors affect NPD performance (where you don’t always suspect it):
– A forwarding application relies on FIB lookups to determine the destination of a packet.
– The size of the FIB table can influence performance in many ways, e.g. through the use of multiple memory banks and an increasing number of hash collisions.

EXTERNAL FACTORS
Benchmarks should include parameters that take into account the external factors relevant to the particular applications being benchmarked.
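
The hash-collision effect can be estimated with the standard balls-into-bins expectation. Assuming a uniform hash with m buckets and n FIB entries, the expected number of occupied buckets is m(1 - (1 - 1/m)^n), so the expected number of entries that land in an already-occupied bucket is n minus that. The bucket count below is hypothetical; the FIB sizes match the performance curves.

```python
def expected_collisions(n_keys: int, n_buckets: int) -> float:
    """Expected number of keys that land in an already-occupied bucket (uniform hash)."""
    occupied = n_buckets * (1 - (1 - 1 / n_buckets) ** n_keys)
    return n_keys - occupied


BUCKETS = 2 ** 18  # hypothetical hash table size (262,144 buckets)
for n in (50_000, 100_000, 150_000):
    c = expected_collisions(n, BUCKETS)
    print(f"{n:>7} entries: ~{c:,.0f} collisions ({c / n:.1%} of entries)")
```

Each collision costs extra probes (memory references) on lookup, so throughput can drop as the FIB grows even though the nominal lookups/s figure is unchanged.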

Page 30: Network Processors

Benchmarking considerations

Interfaces present performance boundary conditions:
– Ethernet applications require inter-frame gaps, which result in more relaxed pps numbers.

INTERFACES
Benchmarks should also specify the types of interfaces being used, since those interfaces by themselves have an impact on maximum performance figures.
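
The effect is easy to quantify. On Ethernet every frame is also preceded by an 8-byte preamble and start-of-frame delimiter and followed by a minimum 12-byte inter-frame gap, so a minimum 64-byte frame occupies 84 bytes of wire time. A small sketch (names are ours):

```python
PREAMBLE_SFD_BYTES = 8   # preamble + start-of-frame delimiter
MIN_IFG_BYTES = 12       # minimum inter-frame gap


def ethernet_max_pps(link_bps: float, frame_bytes: int) -> float:
    """Maximum frames per second on a full-duplex Ethernet link for a given frame size."""
    wire_bits = (frame_bytes + PREAMBLE_SFD_BYTES + MIN_IFG_BYTES) * 8
    return link_bps / wire_bits


print(f"{ethernet_max_pps(10e9, 64) / 1e6:.2f} Mpps")  # 14.88 Mpps at 10 GbE
print(f"{10e9 / (64 * 8) / 1e6:.2f} Mpps")            # 19.53 Mpps if frames were back to back
```

A POS interface has different per-packet overhead, so the same "10 Gbps" rating maps to a different maximum pps.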

Page 31: Network Processors

Benchmarking considerations
Combinations of applications, or minor extensions, have a completely different impact on the two network processing devices:
– NPU A has a lot of well-engineered hardware support that can offer additional services, BUT fails almost completely when additional computing resources are required.
– NPU B is very ‘soft’; performance degrades slowly when additional services are requested, and the performance curves show no abrupt peaks.

HEADROOM
Benchmarks should combine applications as they occur in the real world to give a ‘sense’ of the headroom available to support real-world scenarios. It is, however, very hard to define a metric for headroom.

Page 32: Network Processors

CommBench – A Telecommunication Benchmark for NPs

CommBench
– HPAs (header-processing applications): RTR, FRAG, DRR, TCP
– PPAs (payload-processing applications): CAST, ZIP, REED, JPEG
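
DRR, one of the CommBench header-processing kernels, is deficit round robin scheduling. A compact sketch of the textbook algorithm, not the CommBench source; queues hold packet sizes in bytes, and the quantum is a free parameter:

```python
from collections import deque
from typing import Deque, List, Tuple


def drr(queues: List[Deque[int]], quantum: int, rounds: int) -> List[Tuple[int, int]]:
    """Deficit round robin over queues of packet sizes; returns (queue, size) in send order."""
    deficits = [0] * len(queues)
    sent = []
    for _ in range(rounds):
        for i, queue in enumerate(queues):
            if not queue:
                continue
            deficits[i] += quantum                  # credit earned this round
            while queue and queue[0] <= deficits[i]:
                size = queue.popleft()              # send head packet if credit covers it
                deficits[i] -= size
                sent.append((i, size))
            if not queue:
                deficits[i] = 0                     # an emptied queue keeps no credit
    return sent


schedule = drr([deque([500, 500, 500]), deque([1500])], quantum=500, rounds=3)
print(schedule)  # [(0, 500), (0, 500), (0, 500), (1, 1500)]
```

Both queues receive 1500 bytes over three rounds even though their packet sizes differ, which is the fairness property DRR is designed for.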

Page 33: Network Processors

Benchmark Characteristics – Code & Computational Kernel Sizes

Page 34: Network Processors

Benchmark Characteristics – Computational Complexity

N_{a,l}: number of instructions per byte required for application a operating on a packet of length l
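
Two consequences follow directly from this definition. If application a needs N_{a,l} instructions per byte, processing one packet of length l bytes costs I_{a,l} instructions, and keeping up with a link of R bits/s carrying such packets back to back requires IPS_{a,l} instructions per second:

```latex
I_{a,l} = N_{a,l}\, l
\qquad
\mathrm{IPS}_{a,l} = N_{a,l}\,\frac{R}{8}
```

For example, a hypothetical N_{a,l} = 10 instructions/byte at R = 10 Gb/s gives 10 × 1.25 × 10^9 = 1.25 × 10^10 instructions/s, i.e. 12,500 MIPS.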

Page 35: Network Processors

Benchmark Characteristics – Instruction Set Characteristics

Page 36: Network Processors

Benchmark Characteristics – Memory Hierarchy

Page 37: Network Processors

Example System: Cisco Toaster 10000

– Almost all data plane operations execute on the programmable XMC
– Pipeline stages are assigned tasks, e.g. classification, routing, firewall, MPLS
– Classic SW load-balancing problem
– External SDRAM is shared by common pipe stages
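
Mapping tasks onto pipeline stages is a load-balancing problem: the slowest stage sets the packet rate. One classic formulation, offered as an illustration rather than Cisco's actual method, splits an ordered task list into k contiguous stages so that the heaviest stage is as light as possible, using a binary search on the stage limit with a greedy feasibility check. The per-task cycle costs below are hypothetical.

```python
from typing import List


def partition(costs: List[int], limit: int) -> List[List[int]]:
    """Greedily pack ordered task costs into stages whose load stays within `limit`."""
    stages, current, load = [], [], 0
    for cost in costs:
        if current and load + cost > limit:
            stages.append(current)
            current, load = [], 0
        current.append(cost)
        load += cost
    if current:
        stages.append(current)
    return stages


def min_bottleneck(costs: List[int], n_stages: int) -> int:
    """Smallest per-stage load limit that fits the tasks into at most `n_stages` stages."""
    lo, hi = max(costs), sum(costs)
    while lo < hi:
        mid = (lo + hi) // 2
        if len(partition(costs, mid)) <= n_stages:
            hi = mid
        else:
            lo = mid + 1
    return lo


# Hypothetical cycles per packet: classification, routing, firewall, MPLS
tasks = [30, 50, 40, 20]
limit = min_bottleneck(tasks, 2)
print(limit, partition(tasks, limit))  # 80 [[30, 50], [40, 20]]
```

With two stages the first stage limits throughput to one packet per 80 cycles in this example; a real mapping must also account for contention on the shared external SDRAM.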

Page 38: Network Processors

Example System: IXP 2400
– XScale core replaces StrongARM
– Microengines
   – Faster
   – More: 2 clusters of 4 microengines each
   – Local memory
   – Next-neighbor routes added between microengines
– Hardware to accelerate CRC operations and random number generation
– 16-entry CAM

[Figure: IXP2400 block diagram. Microengines ME0 to ME7; XScale core; Scratch/Hash/CSR; MSF unit; DDR DRAM controller; QDR SRAM controller; PCI]
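
A small CAM lets a microengine check quickly whether a key (for example a flow or queue identifier) is already held locally. The sketch below is a software model of a small fully associative table with least-recently-used replacement; the LRU policy, class and method names are assumptions for illustration, and this is not the IXP2400 instruction interface.

```python
from collections import OrderedDict
from typing import Hashable, Tuple


class SmallCam:
    """Fully associative lookup table with LRU replacement (a model, not hardware)."""

    def __init__(self, size: int = 16) -> None:
        self.size = size
        self.slots: "OrderedDict[Hashable, int]" = OrderedDict()  # tag -> entry index, LRU first

    def lookup(self, tag: Hashable) -> Tuple[bool, int]:
        """Return (hit, entry index). On a miss the tag is installed, evicting the LRU entry if full."""
        if tag in self.slots:
            self.slots.move_to_end(tag)                # mark as most recently used
            return True, self.slots[tag]
        if len(self.slots) < self.size:
            index = len(self.slots)                    # next free entry
        else:
            _, index = self.slots.popitem(last=False)  # reuse the LRU entry
        self.slots[tag] = index
        return False, index


cam = SmallCam(size=4)
for tag in "abcd":
    cam.lookup(tag)       # four misses fill entries 0..3
print(cam.lookup("a"))    # (True, 0)
print(cam.lookup("e"))    # (False, 1): "b" was least recently used
```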

Page 39: Network Processors

References

– Patrick Crowley et al., Network Processor Design.
– Tilman Wolf and Mark Franklin, "CommBench - A Telecommunications Benchmark for Network Processors," Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS). http://www.ecs.umass.edu/ece/wolf/papers/commbench.pdf
– Network Processing Forum, Benchmarking.
– www.wipro.com/pdf_files/networkprocessors_wipro_solPPT.pdf
– http://intrage.insatlse.fr/~etienne/netpro.ppt