High Speed Router Design - ecse.rpi.edu

54
Shivkumar Kalyanaraman Rensselaer Polytechnic Institute 1 High Speed Router Design Shivkumar Kalyanaraman Rensselaer Polytechnic Institute [email protected] http://www.ecse.rpi.edu/Homepages/shivkuma Based in part on slides of Nick McKeown (Stanford), S. Keshav (Cornell)

Transcript of High Speed Router Design - ecse.rpi.edu

Page 1: High Speed Router Design - ecse.rpi.edu

Shivkumar KalyanaramanRensselaer Polytechnic Institute

1

High Speed Router Design

Shivkumar KalyanaramanRensselaer Polytechnic Institute

[email protected] http://www.ecse.rpi.edu/Homepages/shivkuma

Based in part on slides of Nick McKeown (Stanford), S. Keshav (Cornell)

Page 2: High Speed Router Design - ecse.rpi.edu

Shivkumar KalyanaramanRensselaer Polytechnic Institute

2

Overview

IntroductionEvolution of High-Speed RoutersHigh Speed Router Components:

Lookup AlgorithmClassificationSwitching

Page 3: High Speed Router Design - ecse.rpi.edu

Shivkumar KalyanaramanRensselaer Polytechnic Institute

3

What do switches/routers look like?

Access routerse.g. ISDN,ADSL

Core routere.g. OC48c POS

Core ATM switch

Page 4: High Speed Router Design - ecse.rpi.edu

Shivkumar KalyanaramanRensselaer Polytechnic Institute

4

Basic Architectural Components

PolicingOutput

SchedulingSwitching

Routing

CongestionControl

ReservationAdmissionControl

Control

Datapath:per-packet processing

Page 5: High Speed Router Design - ecse.rpi.edu

Shivkumar KalyanaramanRensselaer Polytechnic Institute

5

Basic Architectural Components:Forwarding Decision

ForwardingDecision

ForwardingDecision

ForwardingDecision

ForwardingTable

ForwardingTable

Interconnect

OutputScheduling

1.

2.

3.

ForwardingTable

Page 6: High Speed Router Design - ecse.rpi.edu

Shivkumar KalyanaramanRensselaer Polytechnic Institute

6

Per-packet processing in an IP Router

1. Accept packet arriving on an incoming link.2. Lookup packet destination address in the

forwarding table, to identify outgoing port(s).3. Manipulate packet header: e.g., decrement

TTL, update header checksum.4. Send (switch) packet to the outgoing port(s).5. Classify and buffer packet in the queue.6. Transmit packet onto outgoing link.

Page 7: High Speed Router Design - ecse.rpi.edu

Shivkumar KalyanaramanRensselaer Polytechnic Institute

7

Lookup and Forwarding Engine

headerpayload

Router

Destination Address

Dest-network PortForwarding Table

Routing Lookup Data Structure

65.0.0.0/8128.9.0.0/16

149.12.0.0/19

31

7

Packet

Outgoing Port

Page 8: High Speed Router Design - ecse.rpi.edu

Shivkumar KalyanaramanRensselaer Polytechnic Institute

8

Example Forwarding Table

7142.12.0.0/19

1128.9.0.0/16

365.0.0.0/ 8

Outgoing PortDestination IP Prefix

Prefix length

IP prefix: 0-32 bits

142.12.0.0/19

0 224

65.255.255.255128.9.16.14

128.9.0.0/1665.0.0.0/8

232-165.0.0.0

Page 9: High Speed Router Design - ecse.rpi.edu

Shivkumar KalyanaramanRensselaer Polytechnic Institute

9

Prefixes can Overlap

128.9.16.0/21 128.9.172.0/21

Longest matching prefix 128.9.176.0/24

Routing lookup: Find the longest matching prefix (aka the most specific route) among all prefixes that match the destination address.

0 232-1

128.9.0.0/16142.12.0.0/1965.0.0.0/8

128.9.16.14

Page 10: High Speed Router Design - ecse.rpi.edu

Shivkumar KalyanaramanRensselaer Polytechnic Institute

10

Difficulty of Longest Prefix Match

8

24

Prefix Values

Pref

ix

128.9.0.0/16 142.12.0.0/1965.0.0.0/8

128.9.16.14

128.9.172.0/21128.9.176.0/24

128.9.16.0/21

2-dimensional search: - Prefix Length- Prefix Value

32

Leng

th

Page 11: High Speed Router Design - ecse.rpi.edu

Shivkumar KalyanaramanRensselaer Polytechnic Institute

11

Lookup Rates Required

12540.0OC768c2002-03

31.2510.0OC192c2000-01

7.812.5OC48c1999-00

1.940.622OC12c1998-99

40B packets (Mpps)

Line-rate (Gbps)

LineYear

DRAM: 50-80 ns, SRAM: 5-10 ns

31.25 Mpps ⇒ 33 ns

Page 12: High Speed Router Design - ecse.rpi.edu

Shivkumar KalyanaramanRensselaer Polytechnic Institute

12

Update Rates RequiredRecent BGP studies show that updates can be:

Bursty: several 100s of routes updated/withdrawn => insert/delete operationsFrequent: Average 100+ updates per second

Need data structure to be efficient in terms of lookup as well as update (insert/delete) operations.

Page 13: High Speed Router Design - ecse.rpi.edu

Shivkumar KalyanaramanRensselaer Polytechnic Institute

13

0

10000

20000

30000

40000

50000

60000

70000

80000

90000

100000

Size of the Forwarding Table

95 96 97 98 99 00Year

Num

ber

of P

refi

xes

10,000/year

Renewed ExponentialGrowth

Renewed growth due to multi-homing of enterprise networks!

Source: http://www.telstra.net/ops/bgptable.html

Page 14: High Speed Router Design - ecse.rpi.edu

Shivkumar KalyanaramanRensselaer Polytechnic Institute

14

Global routing table vs Moore's law since 1999Potential Hyper-Exponential Growth!

50000

90000

100000

110000

01/99

Pref

ixes

Global prefixesMoore's law

Double growth

160000

150000

140000

130000

120000

80000

70000

60000

04/99 07/99 10/99 01/00 04/00 07/00 10/00 01/01 04/01

Page 15: High Speed Router Design - ecse.rpi.edu

Shivkumar KalyanaramanRensselaer Polytechnic Institute

15

Trees and Tries

Binary Search Tree

< >

< > < >

log2 N

N entries

Binary Search Trie

0 1

0 1 0 1

111010

Page 16: High Speed Router Design - ecse.rpi.edu

Shivkumar KalyanaramanRensselaer Polytechnic Institute

16

Trees and TriesMultiway tries

16-ary Search Trie

0000, ptr 1111, ptr

0000, 0 1111, ptr

000011110000

0000, 0 1111, ptr

111111111111

Page 17: High Speed Router Design - ecse.rpi.edu

Shivkumar KalyanaramanRensselaer Polytechnic Institute

17

Lookup: Multiway TriesTradeoffs

Degree ofTree

# MemReferences

# Nodes(x106)

Total Memory(Mbytes)

FractionWasted (%)

2 48 1.09 4.3 494 24 0.53 4.3 738 16 0.35 5.6 8616 12 0.25 8.3 9364 8 0.17 21 98256 6 0.12 64 99.5

Table produced from 215 randomly generated 48-bit addresses

Page 18: High Speed Router Design - ecse.rpi.edu

Shivkumar KalyanaramanRensselaer Polytechnic Institute

18

Routing Lookups in Hardware

Prefix length

Num

ber

Most prefixes are 24-bits or shorter

Page 19: High Speed Router Design - ecse.rpi.edu

Shivkumar KalyanaramanRensselaer Polytechnic Institute

19

Routing Lookups in Hardware14

2.19

.6.1

4

Prefixes up to 24-bits14

2.19

.614

1 Next Hop

24

Next Hop

142.19.6

224 = 16M entries

Page 20: High Speed Router Design - ecse.rpi.edu

Shivkumar KalyanaramanRensselaer Polytechnic Institute

20

Routing Lookups in Hardware12

8.3.

72.4

4

Prefixes up to 24-bits

128.

3.72

44

1 Next Hop

128.3.72

24 0 Pointer

8

Prefixes above 24-bits

Next Hop

Next Hop

Next Hop

offs

etba

se

Page 21: High Speed Router Design - ecse.rpi.edu

Shivkumar KalyanaramanRensselaer Polytechnic Institute

21

1.

Basic Architectural Components: Interconnect

ForwardingDecision

ForwardingTable

ForwardingDecision

ForwardingTable

ForwardingDecision

ForwardingTable Interconnect

2.Output

Scheduling

3.

Page 22: High Speed Router Design - ecse.rpi.edu

Shivkumar KalyanaramanRensselaer Polytechnic Institute

22

Most Ethernet switches and cheap packet routersBottleneck can be CPU, host-adaptor or I/O busWhat is costly? Bus ? Memory? Interface? CPU?

Shared Backplane

Line Interface

CPUMemory

CPU BufferMemory

LineInterface

DMA

MAC

LineInterface

DMA

MAC

LineInterface

DMA

MAC

First-Generation IP Routers

Page 23: High Speed Router Design - ecse.rpi.edu

Shivkumar KalyanaramanRensselaer Polytechnic Institute

23

Second-Generation IP RoutersCPU Buffer

Memory

LineCard

DMA

MAC

LocalLocalBufferBuffer

MemoryMemory

LineCard

DMA

MAC

LocalLocalBufferBuffer

MemoryMemory

LineCard

DMA

MAC

LocalLocalBufferBuffer

MemoryMemory

Port mapping intelligence in line cardsHigher hit rate in local lookup cacheWhat is costly? Bus ? Memory? Interface? CPU?

Page 24: High Speed Router Design - ecse.rpi.edu

Shivkumar KalyanaramanRensselaer Polytechnic Institute

24

Third-Generation Switches/Routers

LineCard

MAC

LocalBuffer

Memory

CPUCard

LineCard

MAC

LocalBuffer

Memory

Switched Backplane

Line Interface

Line Interface

Line Interface

Line Interface

Line Interface

Line Interface

Line Interface

Line InterfaceCPU

Memory

Third generation switch provides parallel paths (fabric)What’s costly? Bus? Memory, CPU?

Page 25: High Speed Router Design - ecse.rpi.edu

Shivkumar KalyanaramanRensselaer Polytechnic Institute

25

Fourth-Generation Switches/RoutersClustering and Multistage

1 2 3 4 5 6 7 8 9 10 111213 1415 16

17 1819 20 21 22 23 2425 26 2728 29 30 31 32

13 14 15 16 17 18

19 20 21 22 23 24

25 26 27 28 29 30

31 32 21

1 2 3 4 5 6

7 8 9 10 11 12

Page 26: High Speed Router Design - ecse.rpi.edu

Shivkumar KalyanaramanRensselaer Polytechnic Institute

26

Circuit switchA switch that can handle N calls has N logical inputs and N logical outputs

N up to 200,000Moves 8-bit samples from an input to an output port

Recall that samples have no headersDestination of sample depends on time at which it arrives at the switch

In practice, input trunks are multiplexedMultiplexed trunks carry frames = set of samples

Goal: extract samples from frame, and depending on position in frame, switch to output

each incoming sample has to get to the right output line and the right slot in the output frame

Page 27: High Speed Router Design - ecse.rpi.edu

Shivkumar KalyanaramanRensselaer Polytechnic Institute

27

Call blockingCan’t find a path from input to outputInternal blocking

slot in output frame exists, but no pathOutput blocking

no slot in output frame is availableOutput blocking is reduced in transit switches

need to put a sample in one of several slots going to the desired next hop

Page 28: High Speed Router Design - ecse.rpi.edu

Shivkumar KalyanaramanRensselaer Polytechnic Institute

28

Multiplexors and demultiplexorsMost trunks time division multiplex voice samplesAt a central office, trunk is demultiplexed and distributed to active circuitsSynchronous multiplexor

N input linesOutput runs N times as fast as input

123

N

MUX

123De-

MUX …1 2 3 … NN

Page 29: High Speed Router Design - ecse.rpi.edu

Shivkumar KalyanaramanRensselaer Polytechnic Institute

29

Switching: what does a switch do?Transfers data from an input to an output

many ports (density), high speeds Eg: Crossbar

Page 30: High Speed Router Design - ecse.rpi.edu

Shivkumar KalyanaramanRensselaer Polytechnic Institute

30

Time division switchingKey idea: when de-multiplexing, position in frame determines output trunkTime division switching interchanges sample position within a frame: time slot interchange (TSI)

Page 31: High Speed Router Design - ecse.rpi.edu

Shivkumar KalyanaramanRensselaer Polytechnic Institute

31

Space division switchingEach sample takes a different path through the switch, depending on its destination

Page 32: High Speed Router Design - ecse.rpi.edu

Shivkumar KalyanaramanRensselaer Polytechnic Institute

32

CrossbarSimplest possible space-division switchCrosspoints can be turned on or off, long enough to transfer a packet from an input to an outputInternally nonblocking

but need N2 crosspointstime to set each crosspoint grows quadratically

Page 33: High Speed Router Design - ecse.rpi.edu

Shivkumar KalyanaramanRensselaer Polytechnic Institute

33

Multistage crossbarIn a crossbar during each switching time only one cross-point per row or column is activeCan save crosspoints if a cross-point can attach to more than one input line (why?)This is done in a multistage crossbarNeed to rearrange connections every switching time

Page 34: High Speed Router Design - ecse.rpi.edu

Shivkumar KalyanaramanRensselaer Polytechnic Institute

34

Multistage crossbarCan suffer internal blocking

unless sufficient number of second-level stages

Number of crosspoints < N2

Finding a path from input to output requires a depth-first-searchScales better than crossbar, but still not too well

120,000 call switch needs ~250 million crosspoints

Page 35: High Speed Router Design - ecse.rpi.edu

Shivkumar KalyanaramanRensselaer Polytechnic Institute

35

Packet switchesIn a circuit switch, path of a sample is determined at time of connection establishmentNo need for a sample header--position in frame usedIn a packet switch, packets carry a destination field or label

Need to look up destination port on-the-flyDatagram switches

lookup based on entire destination address (longest-prefix match)

Cell or Label-switcheslookup based on VCI or Labels

Page 36: High Speed Router Design - ecse.rpi.edu

Shivkumar KalyanaramanRensselaer Polytechnic Institute

36

Blocking in packet switchesCan have both internal and output blockingInternal

no path to outputOutput

trunk unavailableUnlike a circuit switch, cannot predict if packets will block (why?)If packet is blocked => must either buffer or drop

Page 37: High Speed Router Design - ecse.rpi.edu

Shivkumar KalyanaramanRensselaer Polytechnic Institute

37

Dealing with blocking in packet switches

Over-provisioninginternal links much faster than inputs

Buffersat input or output

Backpressureif switch fabric doesn’t have buffers, prevent packet from entering until path is available

Parallel switch fabricsincreases effective switching capacity

Page 38: High Speed Router Design - ecse.rpi.edu

Shivkumar KalyanaramanRensselaer Polytechnic Institute

38

Switch Fabrics: Buffered crossbarWhat happens if packets at two inputs both want to go to same output?Can defer one at an input bufferOr, buffer cross-points: complex arbiter

Page 39: High Speed Router Design - ecse.rpi.edu

Shivkumar KalyanaramanRensselaer Polytechnic Institute

39

Switch fabric elementGoal: towards building “self-routing” fabricsCan build complicated fabrics from a simple element

Routing rule: if 0, send packet to upper output, else to lower output

If both packets to same output, buffer or drop

Page 40: High Speed Router Design - ecse.rpi.edu

Shivkumar KalyanaramanRensselaer Polytechnic Institute

40

BanyanSimplest self-routing recursive fabric

What if two packets both want to go to the same output?

output blocking

Page 41: High Speed Router Design - ecse.rpi.edu

Shivkumar KalyanaramanRensselaer Polytechnic Institute

41

Blocking in Banyan S/ws: SortingCan avoid blocking by choosing order in which packets appear at input portsIf we can

present packets at inputs sorted by outputremove duplicates remove gapsprecede banyan with a perfect shuffle stagethen no internal blocking

For example: [X, 010, 010, X, 011, X, X, X]:Sort => [010, 011, 011, X, X, X, X, X]Remove dups => [010, 011, X, X, X, X, X, X]Shuffle => [010, X, 011, X, X, X, X, X]Need sort, shuffle, and trap networks

Page 42: High Speed Router Design - ecse.rpi.edu

Shivkumar KalyanaramanRensselaer Polytechnic Institute

42

Sorting using MergingBuild sorters from merge networksAssume we can merge two sorted listsSort pairwise, merge, recurse

Page 43: High Speed Router Design - ecse.rpi.edu

Shivkumar KalyanaramanRensselaer Polytechnic Institute

43

Non-Blocking Batcher-Banyan

3

7

5

2

6

0

1

4

7

2

3

5

6

1

0

4

7

5

2

3

1

0

6

4

7

0

5

1

3

4

2

6

7

4

5

6

0

3

1

2

7

6

4

5

3

2

0

2

7

6

5

4

3

2

1

0

000001010011100101110111

Batcher Sorter Self-Routing Network

• Fabric can be used as scheduler. •Batcher-Banyan network is blocking for multicast.

Page 44: High Speed Router Design - ecse.rpi.edu

Shivkumar KalyanaramanRensselaer Polytechnic Institute

44

1.

Basic Architectural Components: Queuing, Classification

ForwardingDecision

ForwardingTable

ForwardingDecision

ForwardingTable

ForwardingDecision

ForwardingTable

OutputScheduling

3.

Interconnect2.

Page 45: High Speed Router Design - ecse.rpi.edu

Shivkumar KalyanaramanRensselaer Polytechnic Institute

45

Queuing:Two basic techniques

Input Queueing Output Queueing

Usually a non-blockingswitch fabric (e.g. crossbar)

Usually a fast bus

Page 46: High Speed Router Design - ecse.rpi.edu

Shivkumar KalyanaramanRensselaer Polytechnic Institute

46

Queuing:Output Queueing

Individual Output Queues Centralized Shared Memory

Memory b/w = (N+1).R

1

2

N

Memory b/w = 2N.R

1

2

N

Page 47: High Speed Router Design - ecse.rpi.edu

Shivkumar KalyanaramanRensselaer Polytechnic Institute

47

Input QueueingHead of Line Blocking

Del

ay

Load58.6% 100%

Page 48: High Speed Router Design - ecse.rpi.edu

Shivkumar KalyanaramanRensselaer Polytechnic Institute

48

Solution: Input Queueing w/Virtual output queues (VOQ)

Page 49: High Speed Router Design - ecse.rpi.edu

Shivkumar KalyanaramanRensselaer Polytechnic Institute

49

Input QueuesVirtual Output Queues

Del

ay

Load100%

Page 50: High Speed Router Design - ecse.rpi.edu

Shivkumar KalyanaramanRensselaer Polytechnic Institute

50

Packet Classification

Action

--------

---- ----

--------

Predicate ActionClassifier (Policy Database)

Packet Classification

Forwarding Engine

Incoming Packet

HEADER

Page 51: High Speed Router Design - ecse.rpi.edu

Shivkumar KalyanaramanRensselaer Polytechnic Institute

51

Multi-field Packet Classification

Field 1 Field 2 … Field k A ction

Rule 1 152.163.190.69/21 152.163.80.11/32 … UDP A1

Rule 2 152.168.3.0/24 152.163.0.0/16 … TCP A2

… … … … … …

Rule N 152.168.0.0/16 152.0.0.0/8 … ANY An

Given a classifier with N rules, find the action associated with the highest priority rule matching an incoming packet.

Page 52: High Speed Router Design - ecse.rpi.edu

Shivkumar KalyanaramanRensselaer Polytechnic Institute

52

Prefix matching: 1-d range problem

0 232-1

128.9/16

128.9.16.14

128.9.16/20 128.9.176/20

128.9.19/24

128.9.25/24

Most specific route = “longest matching prefix”

Page 53: High Speed Router Design - ecse.rpi.edu

Shivkumar KalyanaramanRensselaer Polytechnic Institute

53

Classification: 2D Geometry problemField #1 Field #2 Data

R5 R4

R3

R2R1

R7

P2

Fiel

d #2

R6

P1

e.g. (128.16.46.23, *)e.g. (144.24/16, 64/24)

Field #1

Page 54: High Speed Router Design - ecse.rpi.edu

Shivkumar KalyanaramanRensselaer Polytechnic Institute

54

Summary

High speed routers: lookup, switching, classification, buffer managementLookup: Range-matching, tries, multi-way triesSwitching: circuit s/w, crossbar, batcher-banyan, Queuing: input/output queuing issuesClassification: Multi-dimensional geometry problem