conf-mmcn-2002
Transcript of conf-mmcn-2002
-
8/6/2019 conf-mmcn-2002
1/19
Efficient Implementation of
Packet Scheduling Algorithm
on Network Processor
Weidong Shi
Xiaotong Zhuang
Indrani Paul
Karsten Schwan
-
Plan
Motivation
DWCS QoS Packet Scheduler
Intel IXP Network Processor
Design Challenges
Hierarchically Indexed Linear Queue (HILQ)
Results
Conclusions
-
Motivation
Real-time media gateway to
support thousands of concurrent media streams
schedule packets at wire speed, 100 Mbps
or even 1000 Mbps
exploit state-of-the-art architectural
features to speed up scheduling throughput
-
DWCS: Dynamic Window-Constrained Scheduling
real-time packet scheduler that ensures QoS on a per-stream basis
limits the number of late packets for each stream over a finite window of arrivals
per-stream loss-tolerance constraint: in a window of y packets, at most x packets can be late or missing
scheduling is feasible when certain conditions are met
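To make the x/y loss-tolerance constraint concrete, here is a minimal Python sketch that checks a per-stream history of packet outcomes against it. It assumes the window slides over every run of y consecutive packets; the slides do not say whether windows slide or are fixed, and the function name `meets_window_constraint` is hypothetical.

```python
from typing import Sequence


def meets_window_constraint(late: Sequence[bool], x: int, y: int) -> bool:
    """Return True if no run of y consecutive packets has more than x late ones.

    late[i] is True when packet i was late or missing.
    Assumption (not from the slides): windows slide one packet at a time.
    """
    if y <= 0 or x < 0:
        raise ValueError("need y > 0 and x >= 0")
    if len(late) < y:
        # Fewer than y packets seen so far: treat them as one partial window.
        return sum(late) <= x
    count = sum(late[:y])          # late packets in the first full window
    if count > x:
        return False
    for i in range(y, len(late)):
        # Slide the window by one: add the new packet, drop the oldest.
        count += late[i] - late[i - y]
        if count > x:
            return False
    return True
```

With x = 1 and y = 3, a history with one late packet in every three passes, while two late packets in a row does not.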
-
DWCS Scheduler
while TRUE:
    Find stream i with highest priority (use a precedence table)
    Service packet at head of stream i
    Adjust loss-tolerance for i according to some rules.
    Deadline(i) = Deadline(i) + Inter-packet gap(i)
    For each stream j missing its deadline:
        While deadline is missed:
            Adjust loss-tolerance for j according to some rules.
            Drop head packet of stream j if droppable
            Deadline(j) = Deadline(j) + Inter-packet gap(j)
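One iteration of the loop above could be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' implementation: the slide leaves the loss-tolerance adjustment as "some rules", so it is passed in as caller-supplied callbacks; "deadline is missed" is read as `deadline < now`; and the names `Stream`, `dwcs_step`, `priority_key`, `adjust_on_service` and `adjust_on_miss` are all hypothetical.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Deque, List, Optional, Tuple


@dataclass
class Stream:
    name: str
    deadline: int                      # deadline of the head packet
    gap: int                           # inter-packet gap, assumed > 0
    queue: Deque[str] = field(default_factory=deque)
    droppable: bool = True


def dwcs_step(
    streams: List[Stream],
    now: int,
    priority_key: Callable[[Stream], object],
    adjust_on_service: Callable[[Stream], None],
    adjust_on_miss: Callable[[Stream], None],
) -> Tuple[Optional[str], List[str]]:
    """Run one pass of the loop body; return (serviced packet, dropped packets)."""
    serviced = None
    ready = [s for s in streams if s.queue]
    if ready:
        # Smallest key = highest priority (stands in for the precedence table).
        best = min(ready, key=priority_key)
        serviced = best.queue.popleft()
        adjust_on_service(best)
        best.deadline += best.gap
    dropped: List[str] = []
    for s in streams:
        # Assumption: a deadline is "missed" when it lies before the current time.
        while s.deadline < now:
            adjust_on_miss(s)
            if s.droppable and s.queue:
                dropped.append(s.queue.popleft())
            s.deadline += s.gap
    return serviced, dropped
```

Passing the adjustment rules as callbacks keeps the sketch honest about what the slide omits; a real scheduler would update x and y inside them.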
-
DWCS Packet Ordering Rules
Precedence among pairs of packets
1. Earliest Deadline First (EDF).
2. Equal deadlines, order lowest window
constraint (x/y) first.
3. Equal deadlines and zero window-constraints,
order highest window-denominator first (y).
4. Equal deadlines and equal non-zero window-
constraints, order lowest window-numerator
first (x).
5. All other cases: First-Come-First-Serve.
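The five rules above can be expressed as a single sort key, sketched here in Python. The record type `Head` and the functions `precedence_key` and `pick_next` are hypothetical names; the sketch assumes y > 0 and uses the arrival sequence number for the first-come-first-serve fallback.

```python
from dataclasses import dataclass
from fractions import Fraction
from typing import Iterable, Tuple


@dataclass(frozen=True)
class Head:
    stream: str
    deadline: int
    x: int        # window-numerator
    y: int        # window-denominator, assumed > 0
    arrival: int  # arrival sequence number, used for rule 5


def precedence_key(p: Head) -> Tuple[int, Fraction, int, int]:
    constraint = Fraction(p.x, p.y)          # rule 2: lowest x/y first
    # The third field only matters when deadlines and x/y are equal, so both
    # packets are then either zero-constraint (rule 3) or non-zero (rule 4).
    tie = -p.y if p.x == 0 else p.x          # rule 3: highest y; rule 4: lowest x
    return (p.deadline, constraint, tie, p.arrival)   # rule 1 first, rule 5 last


def pick_next(heads: Iterable[Head]) -> Head:
    """Return the head packet that the rules order first."""
    return min(heads, key=precedence_key)
```

Using `Fraction` avoids floating-point ties such as 1/3 versus 2/6 comparing unequal.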
-
Heap Based DWCS
-
Intel IXP Network Processor
designed for software routers
multiple RISC cores on a single chip
simultaneous multi-threading
shared memory architecture
packet-level parallelism
load/store architecture with big data transfer size
-
Design Challenges
QoS packet scheduler is hard to parallelize
simultaneous multi-threading is good for throughput but not for latency
heap-based implementation requires too many memory accesses per scheduled packet
heap-based implementation on IXP shows bad scalability
[Figure: receive threads, scheduler, transmit threads]
-
Latency Distribution of SRAM Access
[Figure: histogram; x-axis: number of cycles (19 to 35); y-axis: percentage of SRAM accesses (0 to 70)]
-
Hierarchically Indexed Linear Queue
one segment corresponds to a fixed time window
a newly arrived packet is put into a segment based on its deadline
a transmit thread keeps a pointer to the entry whose packet should be put on the wire next; it sweeps through all entries of a segment and jumps to the next segment when its time comes
[Figure: 1 ms segment with many entries; transmit pointer]
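Mapping a deadline to a segment can be sketched in a few lines of Python. The 1 ms segment length comes from the slide, and 100 segments matches the segment 0 to segment 99 labels drawn later in the deck; treating the segments as a ring that wraps around is an assumption, as are the names `segment_index` and `enqueue`.

```python
from typing import List

SEGMENT_US = 1000    # one segment covers a 1 ms time window
NUM_SEGMENTS = 100   # assumed ring of segments 0..99


def segment_index(deadline_us: int) -> int:
    """Map a deadline in microseconds to the segment that covers it."""
    return (deadline_us // SEGMENT_US) % NUM_SEGMENTS


def enqueue(segments: List[List[str]], packet: str, deadline_us: int) -> int:
    """Append a packet to the segment for its deadline; return that segment's index."""
    idx = segment_index(deadline_us)
    segments[idx].append(packet)
    return idx
```

Because the index is computed directly from the deadline, insertion costs a constant number of operations regardless of how many streams are active, which is the contrast with the heap the slides draw.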
-
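Hierarchically Indexed Linear Queue
Inside a segment, the position of a packet is determined according to the DWCS rules
[Figure: a segment divided into regions of packets with similar loss tolerance x/y; within a region, packets are ordered by increasing window-numerator x (1, 2, 3, ..., 19, 20); y ranges over 1, 2, 3, ..., 30]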
-
Hierarchically Indexed Linear Queue
Example: assume that the maximum possible x is 20, the maximum possible y is 30, and there are 50 loss-tolerance regions, 2000 cells.
Entry position p = tab(x,y) + x
Each entry stores a pointer to a buffer of packet pointers
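The slide does not define tab(x,y), so the following Python sketch is only one plausible reading: it assumes the 2000 cells are split into 50 equal regions of 40 cells, that a region is chosen by quantizing x/y, and that x never exceeds y. The constants mirror the example on the slide; the quantization itself is hypothetical.

```python
MAX_X = 20
MAX_Y = 30
REGIONS = 50
CELLS = 2000
REGION_SIZE = CELLS // REGIONS   # 40 cells per region (assumed equal-sized)


def tab(x: int, y: int) -> int:
    """Hypothetical base offset of the region holding loss tolerance x/y."""
    if not (0 <= x <= MAX_X and 0 < y <= MAX_Y and x <= y):
        raise ValueError("x or y out of range")
    # Quantize x/y into one of 50 regions with integer arithmetic;
    # x == y would give region 50, so clamp it to the last region.
    region = min((x * REGIONS) // y, REGIONS - 1)
    return region * REGION_SIZE


def entry_position(x: int, y: int) -> int:
    """Entry position p = tab(x, y) + x, as on the slide."""
    return tab(x, y) + x
```

Under these assumptions lower loss tolerances land in lower cells and, within a region, packets with a smaller x come first, which matches the ordering rules for equal deadlines.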
-
Hierarchically Indexed Linear Queue
speed up packet search with multi-level bit map
[Figure: level 1 vector and level 2 vectors, each with bits 0 to 31, indexing the linear queue; 2^n = N]
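A two-level version of such a bit map can be sketched in Python as below. It assumes 32-bit vectors, as the 0 to 31 labels in the figure suggest, so one level 1 word plus 32 level 2 words cover 1024 queue entries; the class and method names are hypothetical, and the bit scan is done in software here where a network processor would likely use a dedicated instruction.

```python
from typing import List, Optional

WORD_BITS = 32


def lowest_set_bit(word: int) -> int:
    """Index of the least significant set bit of a non-zero word."""
    return (word & -word).bit_length() - 1


class TwoLevelBitmap:
    """Bit i of level1 is set iff level2[i] has at least one bit set."""

    def __init__(self) -> None:
        self.level1 = 0
        self.level2: List[int] = [0] * WORD_BITS

    def set(self, index: int) -> None:
        if not 0 <= index < WORD_BITS * WORD_BITS:
            raise IndexError(index)
        hi, lo = divmod(index, WORD_BITS)
        self.level2[hi] |= 1 << lo
        self.level1 |= 1 << hi

    def clear(self, index: int) -> None:
        if not 0 <= index < WORD_BITS * WORD_BITS:
            raise IndexError(index)
        hi, lo = divmod(index, WORD_BITS)
        self.level2[hi] &= ~(1 << lo)
        if self.level2[hi] == 0:
            self.level1 &= ~(1 << hi)

    def find_first(self) -> Optional[int]:
        """Return the lowest occupied entry index, or None if all are empty."""
        if self.level1 == 0:
            return None
        hi = lowest_set_bit(self.level1)
        return hi * WORD_BITS + lowest_set_bit(self.level2[hi])
```

Finding the next occupied entry then takes two word reads instead of a scan over up to 1024 entries.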
-
Hierarchically Indexed Linear Queue
[Figure: transmission pointer over Level 0; segment 0, segment 1, ..., segment 99]
-
Results
memory accesses per scheduled packet

No. of active streams   10      50      100     200     500      1000     2000
Heap                    45.86   73.73   85.73   97.73   113.59   125.59   137.58
HILQ                    19.8    14.36   13.68   13.34   13.135   13.068   13.034

(Heap and HILQ rows: memory accesses per stream)
-
Results
scheduling cycle scalability
[Figure: x-axis: No. of Active Streams per Segment in System (0 to 2500); y-axis: Scheduling Delay per Stream in microengine cycles (0 to 4000)]
-
Results
throughput scalability
[Figure: x-axis: Number of Active Streams (0 to 600); y-axis: Throughput in Mbps (0 to 1200); series: 512 byte packet, 256 byte packet]
-
Conclusions
HILQ-based DWCS significantly reduces memory accesses per scheduled packet compared with the heap-based implementation.
HILQ is able to service thousands of streams at high network speeds.
HILQ achieves its performance by optimizing the scheduler algorithm and exploiting certain architectural attributes.