conf-mmcn-2002

8/6/2019 conf-mmcn-2002

Efficient Implementation of Packet Scheduling Algorithm on Network Processor

Weidong Shi

Xiaotong Zhuang

Indrani Paul

Karsten Schwan

Plan

Motivation

DWCS QoS Packet Scheduler

Intel IXP Network Processor

Design Challenges

Hierarchically Indexed Linear Queue (HILQ)

Results

Conclusions

Motivation

Real-time media gateway that supports thousands of concurrent media streams

Schedule packets at wire speed: 100 Mbps or even 1000 Mbps

Exploit state-of-the-art architectural features to speed up scheduling throughput

DWCS: Dynamic Window-Constrained Scheduling

Real-time packet scheduler that ensures QoS on a per-stream basis

Limits the number of late packets for each stream over a finite window of arrivals

Per-stream loss-tolerance constraint x/y: in a window of y packets, at most x packets can be late or missing

Scheduling is feasible when certain conditions are met

DWCS Scheduler

while TRUE:
    Find stream i with highest priority (use a precedence table)
    Service packet at head of stream i
    Adjust loss-tolerance for i according to some rules
    Deadline(i) = Deadline(i) + Inter-packet gap(i)
    For each stream j missing its deadline:
        While deadline is missed:
            Adjust loss-tolerance for j according to some rules
            Drop head packet of stream j if droppable
            Deadline(j) = Deadline(j) + Inter-packet gap(j)

DWCS Packet Ordering Rules

Precedence among pairs of packets

1. Earliest Deadline First (EDF).

2. Equal deadlines: order lowest window-constraint (x/y) first.

3. Equal deadlines and zero window-constraints: order highest window-denominator (y) first.

4. Equal deadlines and equal non-zero window-constraints: order lowest window-numerator (x) first.

5. All other cases: First-Come-First-Serve.
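The five precedence rules can be read as a pairwise comparison function. The Python sketch below is one possible encoding, not the authors' implementation; the `Head` record, its field names, and the use of an arrival counter for the First-Come-First-Serve tie-break are assumptions, and y is assumed to be greater than zero.

```python
from dataclasses import dataclass
from fractions import Fraction
from functools import cmp_to_key


@dataclass
class Head:
    """Head-of-line packet of one stream (hypothetical record for this sketch)."""
    stream: str
    deadline: int
    x: int        # window-numerator
    y: int        # window-denominator, assumed > 0
    arrival: int  # arrival order, used only for the FCFS rule


def precedence(a: Head, b: Head) -> int:
    """Return a negative number if a is served before b, positive if after."""
    # Rule 1: earliest deadline first.
    if a.deadline != b.deadline:
        return -1 if a.deadline < b.deadline else 1
    wa, wb = Fraction(a.x, a.y), Fraction(b.x, b.y)
    # Rule 2: equal deadlines, lowest window-constraint x/y first.
    if wa != wb:
        return -1 if wa < wb else 1
    # Rule 3: equal deadlines and zero window-constraints, highest y first.
    if wa == 0 and a.y != b.y:
        return -1 if a.y > b.y else 1
    # Rule 4: equal deadlines and equal non-zero window-constraints, lowest x first.
    if wa != 0 and a.x != b.x:
        return -1 if a.x < b.x else 1
    # Rule 5: all other cases, first come first served.
    return (a.arrival > b.arrival) - (a.arrival < b.arrival)


heads = [
    Head("a", deadline=10, x=1, y=2, arrival=0),
    Head("b", deadline=5, x=3, y=4, arrival=1),
    Head("c", deadline=10, x=0, y=3, arrival=2),
    Head("d", deadline=10, x=0, y=5, arrival=3),
    Head("e", deadline=10, x=2, y=4, arrival=4),
]
order = [h.stream for h in sorted(heads, key=cmp_to_key(precedence))]
print(order)  # ['b', 'd', 'c', 'a', 'e']
```

Stream b wins on deadline; d and c have zero window-constraints and are ordered by larger y; a and e share the constraint 1/2 and are ordered by smaller x.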

Heap-Based DWCS

Intel IXP Network Processor

Designed for software routers

Multiple RISC cores in a single chip

Simultaneous multi-threading

Shared memory architecture

Packet-level parallelism

Load/store architecture with big data transfer size

Design Challenges

A QoS packet scheduler is hard to parallelize

Simultaneous multi-threading is good for throughput but not for latency

The heap-based implementation requires too many memory accesses per scheduled packet

The heap-based implementation on the IXP shows poor scalability

[Figure: receive threads, scheduler, transmit threads]

Latency Distribution of SRAM Access

[Figure: x-axis: Number of Cycles (19 to 35); y-axis: Percentage of SRAM Access (0 to 70)]

Hierarchically Indexed Linear Queue

One segment corresponds to a fixed time window

A newly arrived packet is placed in a segment based on its deadline

A transmit thread keeps a pointer to the entry whose packet should be put on the wire next. It sweeps through all entries of a segment and jumps to the next segment when its time comes.

[Figure: 1 ms segment with many entries; transmit pointer]
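The segment idea can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the IXP implementation: the class name `SegmentedQueue`, microsecond deadlines, a circular array of 100 segments of 1 ms each, and first-in-first-out order inside a segment are all choices made for the sketch (the slides order entries inside a segment by the DWCS rules, which is left out here).

```python
from collections import deque


class SegmentedQueue:
    """Linear queue split into fixed-width time segments (hypothetical sketch)."""

    def __init__(self, num_segments: int = 100, segment_us: int = 1000):
        self.segment_us = segment_us
        self.segments = [deque() for _ in range(num_segments)]
        self.current = 0  # segment the transmit pointer is sweeping

    def segment_of(self, deadline_us: int) -> int:
        # The deadline alone selects the segment; no search or comparison needed.
        # Assumption: segments are reused circularly.
        return (deadline_us // self.segment_us) % len(self.segments)

    def enqueue(self, packet, deadline_us: int) -> None:
        self.segments[self.segment_of(deadline_us)].append(packet)

    def drain_segment(self) -> list:
        """Send every entry of the current segment, then jump to the next one."""
        sent = list(self.segments[self.current])
        self.segments[self.current].clear()
        self.current = (self.current + 1) % len(self.segments)
        return sent


q = SegmentedQueue()
q.enqueue("p1", deadline_us=2500)
q.enqueue("p0", deadline_us=300)
q.enqueue("p2", deadline_us=2900)
first, second, third = q.drain_segment(), q.drain_segment(), q.drain_segment()
print(first, second, third)  # ['p0'] [] ['p1', 'p2']
```

Insertion costs a constant number of memory operations regardless of how many streams are active, which is the property a heap cannot offer.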

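Hierarchically Indexed Linear Queue

Inside a segment, the position of a packet is determined according to the DWCS rules

[Figure: entries of a segment, with axes labeled x and y; packets with similar loss tolerance x/y are grouped together, and within a group packets are placed in order of increasing window-numerator (x = 1, 2, 3, ..., 19, 20)]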

Hierarchically Indexed Linear Queue

Example: assume that the maximum possible x is 20, the maximum possible y is 30, and there are 50 loss-tolerance regions and 2000 cells.

Entry position: p = tab(x, y) + x

Each entry stores a pointer to a buffer of packet pointers

Hierarchically Indexed Linear Queue

Speed up packet search with a multi-level bitmap

[Figure: level 1 vectors and level 2 vectors, each with bit positions 0 to 31, indexing a linear queue of N = 2^n entries]
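A multi-level bitmap finds the next occupied entry with one word read per level instead of a scan over the whole queue. The Python sketch below shows a two-level version with 32-bit words; the class name `TwoLevelBitmap`, the depth of two levels, and the 1024-entry size are assumptions for illustration, since the slides do not give the exact layout.

```python
WORD = 32  # bits per vector, matching the 0..31 bit positions in the figure


def lowest_set_bit(v: int) -> int:
    """Index of the least significant set bit of a non-zero integer."""
    return (v & -v).bit_length() - 1


class TwoLevelBitmap:
    """Occupancy index over up to WORD * WORD queue entries (hypothetical sketch)."""

    def __init__(self, n_entries: int = WORD * WORD):
        if not 0 < n_entries <= WORD * WORD:
            raise ValueError("two levels of 32-bit words cover at most 1024 entries")
        self.level2 = [0] * ((n_entries + WORD - 1) // WORD)
        self.level1 = 0  # bit w is set exactly when level2[w] is non-zero

    def set(self, idx: int) -> None:
        w, b = divmod(idx, WORD)
        self.level2[w] |= 1 << b
        self.level1 |= 1 << w

    def clear(self, idx: int) -> None:
        w, b = divmod(idx, WORD)
        self.level2[w] &= ~(1 << b)
        if self.level2[w] == 0:
            self.level1 &= ~(1 << w)

    def first_set(self):
        """Lowest occupied entry index, or None if the queue is empty."""
        if self.level1 == 0:
            return None
        w = lowest_set_bit(self.level1)
        return w * WORD + lowest_set_bit(self.level2[w])


bm = TwoLevelBitmap()
bm.set(700)
bm.set(70)
print(bm.first_set())  # 70
bm.clear(70)
print(bm.first_set())  # 700
```

On hardware with a find-first-set instruction, each level costs one memory read plus one instruction, so the lookup cost depends on the number of levels rather than on the number of queued packets.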

Hierarchically Indexed Linear Queue

[Figure: transmission pointer over level 0: segment 0, segment 1, ..., segment 99]

Results

Memory accesses per scheduled packet

                         No. of active streams
                             10      50     100     200     500    1000    2000
Memory access #   Heap    45.86   73.73   85.73   97.73  113.59  125.59  137.58
per stream        HILQ    19.8    14.36   13.68   13.34  13.135  13.068  13.034
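The gap between the two rows is easier to see as a ratio. The short Python sketch below recomputes it from the figures in the table; the variable names are arbitrary and nothing beyond the table values is assumed.

```python
streams = [10, 50, 100, 200, 500, 1000, 2000]
heap = [45.86, 73.73, 85.73, 97.73, 113.59, 125.59, 137.58]
hilq = [19.8, 14.36, 13.68, 13.34, 13.135, 13.068, 13.034]

# Ratio of heap to HILQ memory accesses for each stream count.
ratios = [h / q for h, q in zip(heap, hilq)]

for n, r in zip(streams, ratios):
    print(f"{n:5d} streams: heap needs {r:.2f}x the memory accesses of HILQ")
```

The ratio grows from a little over 2x at 10 streams to more than 10x at 2000 streams, because the heap count keeps rising with the number of streams while the HILQ count levels off near 13.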

Results

Scheduling cycle scalability

[Figure: x-axis: No. of Active Streams per Segment in System (0 to 2500); y-axis: Scheduling Delay per Stream in microengine cycles (0 to 4000)]

Results

Throughput scalability

[Figure: x-axis: Number of Active Streams (0 to 600); y-axis: Throughput in Mbps (0 to 1200); series: 512-byte packets, 256-byte packets]

Conclusions

HILQ-based DWCS significantly reduces memory accesses per scheduled packet compared with the heap-based implementation.

HILQ is able to service thousands of streams at high network speeds.

HILQ achieves its performance by optimizing the scheduling algorithm and exploiting certain architectural attributes.
