Indirect Adaptive Routing on Large Scale Interconnection Networks (isca09.cs.columbia.edu/pres/20.pdf)


  • 1

    Indirect Adaptive Routing on Large Scale Interconnection Networks

    Nan Jiang, William J. Dally

    Computer Systems Laboratory

    Stanford University

    John Kim

    Korea Advanced Institute of Science and Technology

  • 2

    Overview

    •  Indirect adaptive routing (IAR)
       –  Allows adaptive routing decisions to be based on local and remote congestion information

    •  Main contributions
       –  Three new IAR algorithms for large scale networks
       –  Steady state and transient performance evaluations
       –  Impact of network configurations
       –  Cost of implementation

  • 3

    Presentation Outline

    •  Background
       –  The dragonfly network
       –  Adaptive routing

    •  Indirect adaptive routing algorithms

    •  Performance results

    •  Implementation considerations

  • 4

    The Dragonfly Network

    •  High radix network
       –  High radix routers
       –  Small network diameter

    •  Each router
       –  Three types of channels
       –  Directly connected to a few other groups

    •  Each group
       –  Organized by a local network
       –  Large number of global channels (GC)

    •  Large network with a global diameter of one

    [Figure labels: Group 0, Group 1, Group 2, …; Global Network; Local Network; Router 0, Router 1, Router 2; p0 and p1 attached to Router 0]

  • 5

    Routing on the Dragonfly

    •  Minimal routing (MIN)
       1.  Source local network
       2.  Global network
       3.  Destination local network

    •  Some adversarial traffic congests the global channels
       –  Example: each group i sends all packets to group i+1

    •  Oblivious solution: Valiant’s algorithm (VAL)
       –  Poor performance on benign traffic

    [Figure labels: Group 0, Group 1, Group 2, …; Router 0, Router 1, Router 2; p0, p1; Congestion]
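To make the congestion argument concrete, the sketch below counts the expected load on each global channel under the adversarial pattern above (group i sends to group i+1), once with MIN and once with VAL. It is a back-of-the-envelope illustration, not the authors' simulator. It assumes one global channel per ordered pair of groups and that VAL picks its intermediate group uniformly from the groups other than source and destination; the function names and the `nodes_per_group` parameter are made up for this example. The figures 33 groups and 32 nodes per group follow from the experimental setup (1,056 nodes in 33 groups).

```python
from collections import Counter
from fractions import Fraction

def min_loads(num_groups, nodes_per_group):
    """Load per global channel when group i sends everything to group i+1 via MIN."""
    loads = Counter()
    for src in range(num_groups):
        dst = (src + 1) % num_groups
        # Every node in the source group uses the single src -> dst channel.
        loads[(src, dst)] += Fraction(nodes_per_group)
    return loads

def val_loads(num_groups, nodes_per_group):
    """Expected load per global channel for the same traffic routed via VAL."""
    loads = Counter()
    for src in range(num_groups):
        dst = (src + 1) % num_groups
        # Assumption: intermediate group chosen uniformly, excluding src and dst.
        mids = [m for m in range(num_groups) if m not in (src, dst)]
        share = Fraction(nodes_per_group, len(mids))
        for mid in mids:
            loads[(src, mid)] += share   # first global hop
            loads[(mid, dst)] += share   # second global hop
    return loads

worst_min = max(min_loads(33, 32).values())
worst_val = max(val_loads(33, 32).values())
print(worst_min, worst_val)
```

Under these assumptions the busiest channel carries the traffic of 32 nodes with MIN but only 64/31 (about 2.06) nodes' worth with VAL, at the price of two global hops per packet, which is why VAL does poorly on benign traffic.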

  • 6

    Adaptive Routing

    •  Choose between the MIN path and a VAL path at the packet source [Singh'05]
       –  Decision metric: path delay
       –  Delay: product of path distance and path queue depth

    •  Measuring path queue length is unrealistic

    •  Use local queue lengths to approximate path queue lengths
       –  Requires stiff backpressure

    [Figure labels: Source Router; queues q0, q1, q2, q3; MIN GC; VAL GC; Congestion]
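The decision rule on this slide can be sketched in a few lines. The delay estimate is the product of path distance and queue depth, as stated above; the default hop counts (3 for MIN, 5 for VAL), the tie-break toward MIN, and the function names are assumptions made for this illustration.

```python
def path_delay(hops, queue_depth):
    """Delay estimate: product of path distance and queue depth."""
    return hops * queue_depth

def choose_route(q_min, q_val, hops_min=3, hops_val=5):
    """Pick 'MIN' or 'VAL' for one packet at the source router.

    q_min and q_val are the local queue depths used as a stand-in for the
    queue depth along each path (the approximation named on the slide).
    """
    if path_delay(hops_min, q_min) <= path_delay(hops_val, q_val):
        return "MIN"   # assumption: ties go to the shorter path
    return "VAL"

idle = choose_route(q_min=0, q_val=0)
congested = choose_route(q_min=10, q_val=2)
```

The rule only works if congestion on a remote global channel shows up in `q_min` quickly, which is the stiff-backpressure requirement noted above.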

  • 7

    Adaptive Routing: Worst Case Traffic

    [Plot: packet latency (simulation cycles, 100 to 450) versus throughput (flit injection rate, 0 to 0.5); curves: Valiant's, Minimal, Adaptive]

  • 8

    Indirect Adaptive Routing

    •  Improve routing decisions through remote congestion information

    •  Previous method:
       –  Credit round trip [Kim et al., ISCA’08]

    •  Three new methods:
       –  Reservation
       –  Piggyback
       –  Progressive

  • 9

    Credit Round Trip (CRT)

    •  Delay the return of local credits from the congested router

    •  Creates the illusion of stiffer backpressure

    •  Drawbacks
       –  Remote congestion is still inferred through local queues
       –  Information not up to date

    [Figure labels: Source Router; Congestion; Delayed Credits; Credits; MIN GC; VAL GC]

    [Kim et al., ISCA’08]
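A toy model shows why delaying credits looks like stiffer backpressure to the upstream router. This is only a sketch of the idea, not the mechanism from the cited paper: it assumes the downstream router forwards each flit immediately, and `round_trip`, `extra_delay`, and the send schedule are hypothetical values.

```python
def outstanding_credits(send_times, now, round_trip, extra_delay=0):
    """Credits the upstream router is still waiting for at time `now`.

    A flit sent at time t holds its credit until t + round_trip + extra_delay.
    """
    hold = round_trip + extra_delay
    return sum(1 for t in send_times if t <= now < t + hold)

sends = range(0, 100, 4)   # hypothetical: one flit every 4 cycles
normal = outstanding_credits(sends, now=100, round_trip=20)
delayed = outstanding_credits(sends, now=100, round_trip=20, extra_delay=40)
```

With identical traffic, the upstream router is waiting on 4 credits in the normal case and 14 when credits are delayed, so its local queue estimate rises even though the congestion is remote.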

  • 10

    Reservation (RES)

    •  Each global channel tracks the number of incoming MIN packets

    •  Each injected packet creates a reservation flit

    •  Routing decision is based on the reservation outcome

    •  Drawbacks
       –  Reservation flit flooding
       –  Reservation delay

    [Figure labels: Source Router; Congestion; RES Flit; RES Failed; MIN GC; VAL GC]
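The reservation handshake can be sketched as a counter per global channel. The slide does not say when a reservation fails, so the fixed `threshold` below is an assumption, as are the class and method names.

```python
class GlobalChannel:
    """Tracks how many MIN packets have reserved this global channel."""

    def __init__(self, threshold):
        self.threshold = threshold   # assumption: reservations beyond this fail
        self.reserved = 0

    def try_reserve(self):
        """Handle one reservation flit; return True if the reservation succeeds."""
        if self.reserved < self.threshold:
            self.reserved += 1
            return True
        return False

    def release(self):
        """Call once a reserved MIN packet has crossed the channel."""
        if self.reserved > 0:
            self.reserved -= 1

def route_with_reservation(channel):
    """Route minimally if the reservation succeeds, otherwise fall back to VAL."""
    return "MIN" if channel.try_reserve() else "VAL"

gc = GlobalChannel(threshold=2)
decisions = [route_with_reservation(gc) for _ in range(3)]
```

Each packet pays for one reservation flit and waits for its outcome before leaving the source, which corresponds to the two drawbacks listed above.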

  • 11

    Piggyback (PB)

    •  Congestion broadcast
       –  Piggybacking on each packet
       –  Send on idle channels

    •  Congestion data compression

    •  Drawbacks
       –  Consumes extra bandwidth
       –  Congestion information not up to date (broadcast delay)

    [Figure labels: Source Router; Congestion; GC Busy; GC Free; MIN GC; VAL GC]
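A rough sketch of the piggyback idea: each router compresses its global channel state to one busy/free bit per channel, broadcasts the bits within its group, and every router consults its copy of the table when routing. The queue-depth `threshold` used for compression and all names here are assumptions, not details taken from the slides.

```python
def compress(queue_depths, threshold):
    """One bit per global channel: True means busy."""
    return [depth > threshold for depth in queue_depths]

class GroupCongestionTable:
    """A router's (possibly stale) view of every global channel in its group."""

    def __init__(self, routers, channels_per_router):
        self.busy = {r: [False] * channels_per_router for r in range(routers)}

    def update(self, router, bits):
        """Apply busy/free bits that arrived piggybacked on a packet."""
        self.busy[router] = list(bits)

    def choose(self, router, channel):
        """Route minimally unless the MIN global channel is marked busy."""
        return "VAL" if self.busy[router][channel] else "MIN"

table = GroupCongestionTable(routers=8, channels_per_router=4)
table.update(3, compress([1, 9, 0, 2], threshold=4))
```

The table is only as fresh as the last broadcast, which is the "not up to date" drawback above.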

  • 12

    Progressive (PAR)

    •  MIN routing decisions at the source are not final

    •  VAL decisions are final

    •  Switch to VAL when encountering congestion

    •  Drawbacks
       –  Needs an additional virtual channel to avoid deadlock
       –  Adds extra hops

    [Figure labels: Source Router; Congestion; MIN GC; VAL GC]
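The two rules above (MIN may be revised, VAL may not) amount to a one-way switch. The sketch below assumes the packet gets one chance to re-evaluate at each router it visits in the source group; that granularity and the function name are assumptions for illustration.

```python
def progressive_route(initial, congested_at_hop):
    """Return the routing decision in force after each router visited.

    `congested_at_hop` holds one boolean per router: True if that router
    sees congestion on the packet's minimal path.
    """
    decision = initial
    trace = []
    for congested in congested_at_hop:
        if decision == "MIN" and congested:
            decision = "VAL"   # one-way switch: VAL is never revisited
        trace.append(decision)
    return trace

switched = progressive_route("MIN", [False, True, False])
```

A packet that switches after its first hop has already travelled toward the minimal global channel, which is where the extra hops come from.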

  • 13

    Experimental Setup

    •  Fully connected local and global networks
       –  33 groups
       –  1,056 nodes

    •  10 cycle local channel latency

    •  100 cycle global channel latency

    •  10-flit packets
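The group and node counts above are consistent with the usual dragonfly sizing relations. In the sketch below, `p` (terminals per router), `a` (routers per group) and `h` (global channels per router) are parameters the slides do not state; p = 4, a = 8, h = 4 is one choice that reproduces 33 groups and 1,056 nodes when every pair of groups shares exactly one global channel.

```python
def dragonfly_size(p, a, h):
    """Return (groups, nodes, router_radix) for a maximum-size dragonfly.

    Assumes fully connected local and global networks with one global
    channel between every pair of groups.
    """
    groups = a * h + 1            # each group reaches a*h other groups
    nodes = p * a * groups        # p terminals on each of a routers per group
    radix = p + (a - 1) + h       # terminal + local + global channels
    return groups, nodes, radix

groups, nodes, radix = dragonfly_size(p=4, a=8, h=4)
print(groups, nodes, radix)
```

The three terms in `radix` correspond to the three types of channels each router has.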

  • 14

    Steady State Traffic: Uniform Random

    [Plot: packet latency (simulation cycles, 100 to 300) versus throughput (flit injection rate, 0 to 0.9); curves: Piggyback, Credit Round Trip, Progressive, Reservation, Minimal]

  • 15

    Steady State Traffic: Worst Case

    [Plot: packet latency (simulation cycles, 100 to 450) versus throughput (flit injection rate, 0 to 0.5); curves: Piggyback, Credit Round Trip, Progressive, Reservation, Valiant's]

  • 16

    Transient Traffic: Uniform Random to Worst Case

    [Plot: Average Packet Latency per Cycle - UR to WC; packet latency (100 to 500) versus cycles after transition (0 to 100); curves: Progressive, Piggyback]

    [Plot: % Packets Routing Non-minimally per Cycle - UR to WC; % of packets routing non-minimally (0 to 100) versus cycles after transition (0 to 100); curves: Progressive, Piggyback]

  • 17

    Network Configuration Considerations

    •  Packet size
       –  RES requires long packets to amortize reservation flit cost
       –  Routing decision is made on a per-packet basis

    •  Channel latency
       –  Affects information delay (CRT, PB)
       –  Affects packet delay (PAR, RES)

    •  Network size
       –  Affects information bandwidth overhead (RES, PB)

    •  Global diameter greater than one
       –  Need to exchange congestion information on the global network
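The packet-size point for RES is simple arithmetic. Assuming each packet sends exactly one single-flit reservation (an assumption; the slides do not give the reservation flit size), the share of traffic spent on reservations shrinks as packets get longer:

```python
from fractions import Fraction

def reservation_overhead(packet_flits, reservation_flits=1):
    """Fraction of all flits sent that are reservation flits."""
    return Fraction(reservation_flits, packet_flits + reservation_flits)

# Packet sizes taken from elsewhere in the deck: 1, 2, 4, 8 and 10 flits.
overheads = {size: reservation_overhead(size) for size in (1, 2, 4, 8, 10)}
for size, share in overheads.items():
    print(f"{size:2d}-flit packets: {float(share):.1%} reservation traffic")
```

Under this assumption a 1-flit packet spends half of its flits on the reservation, while a 10-flit packet spends 1/11.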

  • 18

    Cost Considerations

    •  Credit round trip
       –  Credit delay tracker for every local channel

    •  Reservation
       –  Reservation counter for every global channel
       –  Additional buffering at the injection port to store packets waiting for reservation

    •  Piggyback
       –  Global channel lookup table for every router
       –  Increase in packet size

    •  Progressive
       –  Extra virtual channel for deadlock avoidance

  • 19

    Conclusion

    •  Three new indirect adaptive routing algorithms for large scale networks

    •  Performance and design evaluation of the algorithms

    •  Best algorithm?
       –  Piggyback performed the best under steady state traffic
       –  Progressive responded fastest to transient changes
       –  Network configuration affects the performance of some algorithms
       –  Cost of implementation

  • 20

    Thank You!

    • Questions?

  • 21

    Adaptive Routing: Uniform Traffic

    [Plot: packet latency (simulation cycles, 100 to 300) versus throughput (flit injection rate, 0 to 0.9); curves: VAL, MIN, Adaptive]

  • 22

    Transient Traffic: Worst Case to Uniform Random

  • 23

    Transient Traffic: Worst Case 1 to Worst Case 10

  • 24

    1000 Random Permutation Traffic

    [Plots: five histograms, one each for PB, CRT, PAR, RES and VAL; % of 1K permutations (0 to 30) versus packet latency (axis ticks at 200 and 300)]

  • 25

    Effect of Packet size on RES: Worst Case Traffic

    [Plot: latency (simulation cycles, 0 to 550) versus throughput (flit injection rate, 0 to 0.5); curves: 1 Flit, 2 Flits, 4 Flits, 8 Flits]

  • 26

    Large local network: Uniform Random

    [Plot: packet latency (simulation cycles, 0 to 400) versus throughput (flit injection rate, 0 to 0.9); curves: PB, CRT, MIN, PAR, RES]

  • 27

    Large local network: Worst Case

    [Plot: packet latency (simulation cycles, 0 to 600) versus throughput (flit injection rate, 0 to 0.5); curves: PB, CRT, PAR, RES, VAL]