conf-mmcn-2002


Transcript of conf-mmcn-2002

  • 8/6/2019 conf-mmcn-2002

    1/19

    Efficient Implementation of Packet Scheduling Algorithm on Network Processor

    Weidong Shi

    Xiaotong Zhuang

    Indrani Paul

    Karsten Schwan

  • 2/19

    Plan

    Motivation

    DWCS QoS Packet Scheduler

    Intel IXP Network Processor

    Design Challenges

    Hierarchically Indexed Linear Queue (HILQ)

    Results

    Conclusions

  • 3/19

    Motivation

    Real-time media gateway that supports thousands of concurrent media streams

    Schedule packets at wire speed: 100 Mbps or even 1000 Mbps

    Exploit state-of-the-art architectural features to speed up scheduling throughput

  • 4/19

    DWCS: Dynamic Window-Constrained Scheduling

    Real-time packet scheduler that ensures QoS on a per-stream basis

    Limits the number of late packets for each stream over a finite window of arrivals

    Per-stream loss-tolerance constraint x/y: in a window of y packets, at most x packets can be late or missing

    Scheduling is feasible when certain conditions are met
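    The x/y constraint can be checked mechanically over a stream's history. This is a minimal sketch, not code from the paper: it assumes the history is given as a list of late-or-missing flags (the hypothetical parameter `late_flags`) and verifies that no run of y consecutive packets contains more than x flagged packets. Windows shorter than y at the start of the stream are checked as well.

```python
from collections import deque


def satisfies_window_constraint(late_flags, x, y):
    """Return True if no y consecutive packets contain more than x late ones.

    late_flags is a hypothetical per-packet history: truthy = late or missing.
    """
    if not 0 <= x <= y or y <= 0:
        raise ValueError("need 0 <= x <= y and y > 0")
    window = deque()            # flags of the last (up to) y packets
    late_in_window = 0
    for late in late_flags:
        window.append(bool(late))
        late_in_window += bool(late)
        if len(window) > y:     # slide the window forward by one packet
            late_in_window -= window.popleft()
        if late_in_window > x:
            return False
    return True
```

    For example, with x/y = 1/3 the history late-on-time-on-time-late passes, while two late packets in a row fail immediately.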

  • 5/19

    DWCS Scheduler

    while TRUE:
        Find stream i with the highest priority (use a precedence table)
        Service the packet at the head of stream i
        Adjust loss-tolerance for i according to some rules
        Deadline(i) = Deadline(i) + Inter-packet gap(i)
        For each stream j missing its deadline:
            While the deadline is missed:
                Adjust loss-tolerance for j according to some rules
                Drop the head packet of stream j if droppable
                Deadline(j) = Deadline(j) + Inter-packet gap(j)
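    One pass of the loop above can be sketched as runnable code. This is an illustration under stated assumptions, not the authors' implementation: the slide only says loss tolerance is adjusted "according to some rules", so `finish_packet` uses a hypothetical rule (each packet consumes one slot of the current window of y' packets; a dropped packet also consumes one unit of x'). Precedence is simplified to deadline, then current x'/y'; the names `Stream`, `schedule_once`, `xc` and `yc` are hypothetical.

```python
from collections import deque
from dataclasses import dataclass, field
from fractions import Fraction


@dataclass
class Stream:
    name: str
    deadline: int           # deadline of the head packet
    gap: int                # inter-packet gap, assumed > 0
    x: int                  # original window numerator
    y: int                  # original window denominator, assumed > 0
    packets: deque = field(default_factory=deque)

    def __post_init__(self):
        self.xc, self.yc = self.x, self.y   # current constraint x'/y'

    def finish_packet(self, late):
        # HYPOTHETICAL loss-tolerance rule: every packet uses up one slot of
        # the current window; a late (dropped) packet also uses up one unit
        # of tolerance. A fresh window starts when the old one is used up.
        if late:
            self.xc -= 1
        self.yc -= 1
        if self.yc == 0:
            self.xc, self.yc = self.x, self.y


def schedule_once(streams, now):
    """One pass of the DWCS loop; returns (serviced packet, dropped packets)."""
    ready = [s for s in streams if s.packets]
    if not ready:
        return None, []
    # Simplified precedence: earliest deadline, then lowest current x'/y'.
    i = min(ready, key=lambda s: (s.deadline, Fraction(s.xc, s.yc)))
    sent = i.packets.popleft()
    i.finish_packet(late=False)             # simplification: counted as on time
    i.deadline += i.gap
    dropped = []
    for j in streams:
        while j.packets and j.deadline < now:   # j is missing its deadline
            if j.xc > 0:                        # droppable: tolerance remains
                dropped.append(j.packets.popleft())
                j.finish_packet(late=True)
            j.deadline += j.gap
    return sent, dropped
```

    A stream whose x' has reached 0 keeps its late head packet (it is not droppable), but its deadline still advances, mirroring the inner while loop of the pseudocode.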

  • 6/19

    DWCS Packet-Ordering Rules

    Precedence among pairs of packets:

    1. Earliest Deadline First (EDF).

    2. Equal deadlines: order lowest window-constraint (x/y) first.

    3. Equal deadlines and zero window-constraints: order highest window-denominator (y) first.

    4. Equal deadlines and equal non-zero window-constraints: order lowest window-numerator (x) first.

    5. All other cases: First-Come-First-Serve.
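    The five rules translate directly into a pairwise comparator. This is a minimal sketch assuming each packet carries its deadline, its stream's current x and y (with y > 0), and an arrival sequence number for the FCFS tie-break; `Pkt` and `dwcs_precedes` are hypothetical names. Rule 2 compares x/y by cross-multiplication to avoid floating-point error.

```python
from dataclasses import dataclass
from functools import cmp_to_key


@dataclass
class Pkt:
    deadline: int
    x: int        # current window numerator
    y: int        # current window denominator, assumed > 0
    arrival: int  # arrival order, used for the FCFS tie-break


def _sign(v):
    return (v > 0) - (v < 0)


def dwcs_precedes(a, b):
    """Negative if a is served before b, positive if after, 0 if tied."""
    if a.deadline != b.deadline:              # rule 1: earliest deadline first
        return _sign(a.deadline - b.deadline)
    lhs, rhs = a.x * b.y, b.x * a.y           # rule 2: lowest x/y first
    if lhs != rhs:
        return _sign(lhs - rhs)
    if a.x == 0:                              # equal ratios, so b.x == 0 too
        if a.y != b.y:                        # rule 3: highest y first
            return _sign(b.y - a.y)
    elif a.x != b.x:                          # rule 4: lowest x first
        return _sign(a.x - b.x)
    return _sign(a.arrival - b.arrival)       # rule 5: first come, first served
```

    Sorting with `sorted(pkts, key=cmp_to_key(dwcs_precedes))` then yields the DWCS service order; a heap-based scheduler would use the same comparison to order its heap.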

  • 7/19

    Heap-Based DWCS

  • 8/19

    Intel IXP Network Processor

    Designed for software routers

    Multiple RISC cores on a single chip

    Simultaneous multithreading

    Shared-memory architecture

    Packet-level parallelism

    Load/store architecture with large data-transfer sizes

  • 9/19

    Design Challenges

    A QoS packet scheduler is hard to parallelize

    Simultaneous multithreading is good for throughput but not for latency

    A heap-based implementation requires too many memory accesses per scheduled packet

    A heap-based implementation on the IXP shows poor scalability

    [Figure: receive threads, scheduler, and transmit threads]
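    Why a heap costs many memory accesses per scheduled packet can be seen by counting element reads in one dequeue. This is a generic binary min-heap written for illustration, not the paper's heap-based DWCS; `heap_pop_counting` is a hypothetical name, and array reads are only a rough stand-in for SRAM accesses.

```python
def heap_pop_counting(heap):
    """Pop the minimum of a non-empty binary min-heap stored in a list.

    Returns (minimum, reads), where reads counts element reads from the
    heap array: a rough stand-in for the memory accesses one dequeue costs.
    """
    top = heap[0]
    last = heap.pop()            # detach the last element
    reads = 2                    # heap[0] and the last element
    if not heap:
        return top, reads
    n, i = len(heap), 0
    while True:                  # move the hole at index i down one level per pass
        c = 2 * i + 1            # left child
        if c >= n:
            break
        child = heap[c]
        reads += 1
        if c + 1 < n:
            right = heap[c + 1]
            reads += 1
            if right < child:
                c, child = c + 1, right
        if last <= child:
            break
        heap[i] = child          # move the smaller child up into the hole
        i = c
    heap[i] = last
    return top, reads
```

    Each level of sift-down reads two children, so the read count grows with log2 of the number of queued entries: popping from a sorted 15-entry heap takes 8 reads, from a 1023-entry heap 20.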

  • 10/19

    Latency Distribution of SRAM Access

    [Figure: SRAM access latency distribution; x-axis: number of cycles (19 to 35); y-axis: percentage of SRAM accesses (0 to 70)]

  • 11/19

    Hierarchically Indexed Linear Queue

    One segment corresponds to a fixed time window

    A newly arrived packet is put into a segment based on its deadline

    A transmit thread keeps a pointer to the entry whose packet should be put on the wire next; it sweeps through all entries of a segment and jumps to the next segment when that segment's time comes

    [Figure: 1 ms segments, each with many entries, and the transmit pointer]
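    The segment level described above can be sketched as a ring of time segments with a single transmit pointer. This is a hypothetical illustration, not the paper's code: the class name, the ring size, the 1000 µs default width, the choice to collect packets left behind in a `late` list when the pointer jumps, and the pointer stepping back for packets inserted behind it are all assumptions.

```python
class SegmentedLinearQueue:
    """Hypothetical sketch of HILQ's segment level: a ring of fixed-width time
    segments, each a row of FIFO entries swept by one transmit pointer."""

    def __init__(self, num_segments, entries_per_segment, segment_us=1000):
        self.n, self.e, self.width = num_segments, entries_per_segment, segment_us
        self.rows = [[[] for _ in range(self.e)] for _ in range(self.n)]
        self.seg = 0      # absolute index of the segment being swept
        self.entry = 0    # transmit pointer within that segment
        self.late = []    # packets left behind when the pointer jumped on

    def enqueue(self, pkt, deadline_us, entry):
        seg = max(deadline_us // self.width, self.seg)   # never behind the pointer
        if seg >= self.seg + self.n:
            raise ValueError("deadline is beyond the ring's time horizon")
        self.rows[seg % self.n][entry].append(pkt)
        if seg == self.seg and entry < self.entry:
            self.entry = entry   # pointer already passed this entry: step back

    def transmit(self, now_us):
        """Return the next packet to put on the wire at now_us, or None."""
        while self.seg < now_us // self.width:   # a later segment's time has come
            for bucket in self.rows[self.seg % self.n]:
                self.late.extend(bucket)
                bucket.clear()
            self.seg += 1
            self.entry = 0
        row = self.rows[self.seg % self.n]
        while self.entry < self.e:               # sweep the current segment
            if row[self.entry]:
                return row[self.entry].pop(0)
            self.entry += 1
        return None
```

    Enqueueing costs one index computation, and the pointer only moves forward within a segment, so the per-packet work does not depend on the number of queued packets the way a heap's does.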

  • 12/19

    Hierarchically Indexed Linear Queue

    [Figure: entries inside a segment grouped into regions of packets with similar loss tolerance x/y; within a region, packets are ordered by increasing window-numerator x (1, 2, 3, …, 19, 20)]

    Inside a segment, the position of a packet is determined according to the DWCS rules

  • 13/19

    Hierarchically Indexed Linear Queue

    Example: assume that the maximum possible x is 20, the maximum possible y is 30, and there are 50 loss-tolerance regions (2000 cells)

    Entry position p = tab(x, y) + x

    Each entry stores a pointer to a buffer of packet pointers
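    The formula p = tab(x, y) + x can be made concrete with the example numbers. The slides do not define `tab`, so this sketch assumes a hypothetical precomputed table: 2000 cells split evenly into 50 regions of 40 cells, with x/y mapped to one of 50 equal-width slices of [0, 1] (lowest loss tolerance first) and `tab` returning that region's first cell. Adding x inside the region then orders equal-ratio packets by lowest x first, as in rule 4.

```python
MAX_X, MAX_Y = 20, 30                 # from the slide's example
REGIONS, CELLS = 50, 2000             # 50 loss-tolerance regions, 2000 cells
CELLS_PER_REGION = CELLS // REGIONS   # 40 cells, enough for x = 0..MAX_X


def build_tab():
    """HYPOTHETICAL tab(x, y): first cell of the region holding ratio x/y."""
    tab = {}
    for y in range(1, MAX_Y + 1):
        for x in range(0, min(MAX_X, y) + 1):
            region = min(REGIONS - 1, (x * REGIONS) // y)   # slice of [0, 1]
            tab[(x, y)] = region * CELLS_PER_REGION
    return tab


TAB = build_tab()


def entry_position(x, y):
    """Entry position p = tab(x, y) + x inside a segment."""
    return TAB[(x, y)] + x
```

    Under these assumptions a lookup costs one table read plus an addition, and every position stays below 2000. Packets whose ratios fall in the same region are only approximately ordered (rule 3's "highest y first" is not captured), and packets mapped to the same entry share its FIFO buffer.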

  • 14/19

    Hierarchically Indexed Linear Queue

    [Figure: multi-level bit vectors indexing the linear queue of N = 2^n entries; Level 1 and Level 2 vectors, each with bits 0 to 31]

    Speed up packet search with a multi-level bitmap
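    A two-level bitmap finds the first occupied queue entry with two word reads instead of a linear scan. This sketch assumes 32-bit vectors as in the figure and N = 32 × 32 = 1024 entries; which level sits on top, and the names `TwoLevelBitmap` and `first_set`, are assumptions. Python's `x & -x` isolates the lowest set bit, standing in for a hardware find-first-set instruction.

```python
class TwoLevelBitmap:
    """Hypothetical two-level index over 32 * 32 = 1024 queue entries."""

    WORD = 32

    def __init__(self):
        self.top = 0                    # bit w set <=> leaf[w] is non-zero
        self.leaf = [0] * self.WORD     # bit b of leaf[w] <=> entry 32*w + b occupied

    def set(self, i):
        w, b = divmod(i, self.WORD)
        self.leaf[w] |= 1 << b
        self.top |= 1 << w

    def clear(self, i):
        w, b = divmod(i, self.WORD)
        self.leaf[w] &= ~(1 << b)
        if self.leaf[w] == 0:           # last entry in this word: clear summary bit
            self.top &= ~(1 << w)

    def first_set(self):
        """Lowest occupied entry index, or None if the queue is empty."""
        if self.top == 0:
            return None
        w = (self.top & -self.top).bit_length() - 1    # first non-empty word
        leaf = self.leaf[w]
        return w * self.WORD + (leaf & -leaf).bit_length() - 1
```

    Adding a third level of the same shape would extend the index to 32^3 entries while keeping the search at three word reads.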

  • 15/19

    Hierarchically Indexed Linear Queue

    [Figure: the transmission pointer moving across Level 0 segments 0, 1, …, 99]

  • 16/19

    Results

    Memory accesses per scheduled packet

    No. of active streams     10      50      100     200     500      1000     2000
    Heap                      45.86   73.73   85.73   97.73   113.59   125.59   137.58
    HILQ                      19.8    14.36   13.68   13.34   13.135   13.068   13.034
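    The scaling difference can be quantified from the table itself. This short calculation uses only the published numbers; `growth_per_doubling` is a hypothetical helper that divides each column-to-column change by log2 of the stream-count ratio.

```python
import math

streams = [10, 50, 100, 200, 500, 1000, 2000]
heap = [45.86, 73.73, 85.73, 97.73, 113.59, 125.59, 137.58]
hilq = [19.8, 14.36, 13.68, 13.34, 13.135, 13.068, 13.034]


def growth_per_doubling(ns, costs):
    """Change in accesses per doubling of n between consecutive table columns."""
    pairs = list(zip(ns, costs))
    return [(c1 - c0) / math.log2(n1 / n0)
            for (n0, c0), (n1, c1) in zip(pairs, pairs[1:])]


print([round(g, 2) for g in growth_per_doubling(streams, heap)])
print([round(g, 2) for g in growth_per_doubling(streams, hilq)])
```

    The heap column grows by about 12 accesses every time the stream count doubles (logarithmic growth), while the HILQ increments shrink toward zero. The published values are matched to within about 0.01 by 6 + 12·log2(n) for the heap and 13 + 68/n for HILQ; this is a curve fit to the table, not a cost model stated in the slides.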

  • 17/19

    Results

    [Figure: scheduling-cycle scalability; x-axis: number of active streams per segment in system (0 to 2500); y-axis: scheduling delay per stream (microengine cycles, 0 to 4000)]

  • 18/19

    Results

    [Figure: throughput scalability; x-axis: number of active streams (0 to 600); y-axis: throughput (Mbps, 0 to 1200); series: 512-byte packets and 256-byte packets]
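    The packet sizes in this plot set the scheduler's time budget at wire speed. This small calculation shows how long one packet occupies the link at the 100 Mbps and 1000 Mbps targets from the motivation slide; the processor clock passed to `cycle_budget` is a hypothetical parameter, since the slides do not state one, and framing overhead is ignored.

```python
def wire_time_ns(packet_bytes, link_mbps):
    """Nanoseconds a packet of packet_bytes occupies a link of link_mbps.

    Ignores framing overhead (e.g. Ethernet preamble and inter-frame gap).
    """
    return packet_bytes * 8 * 1000 / link_mbps


def cycle_budget(packet_bytes, link_mbps, clock_mhz):
    """Cycles available per packet at a hypothetical clock rate clock_mhz."""
    return wire_time_ns(packet_bytes, link_mbps) * clock_mhz / 1000
```

    At 1000 Mbps a 512-byte packet lasts 4096 ns and a 256-byte packet 2048 ns, so halving the packet size halves the time available to schedule each packet.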

  • 19/19

    Conclusions

    HILQ-based DWCS significantly reduces memory accesses per scheduled packet compared with a heap-based implementation.

    HILQ is able to service thousands of streams at high networking speeds.

    HILQ achieves its performance by optimizing the scheduler algorithm and exploiting certain architectural attributes.