Fast-Response Dynamic Routing


  • 7/29/2019 Fast-Response Dynamic Routing


    Fast-Response Dynamic Routing Balancing for High-Speed Interconnection Networks*

    D. Lugones, D. Franco, and E. Luque.

    Department of Computer Architecture and Operating Systems.

    Universitat Autònoma of Barcelona, [email protected]

    {daniel.franco, emilio.luque}

    * This work was funded by the MEC-Spain under contract TIN2007-64974

    Abstract. Communication requirements in High Performance Computing systems demand the use of high-speed interconnection networks to connect processing nodes. However, when communication load is unevenly distributed across the network resources, message congestion appears. Congestion spreading increases latency and reduces network throughput, causing significant performance degradation. Fast-Response Dynamic Routing Balancing (FR-DRB) is a method developed to perform a uniform balancing of communication load over the interconnection network. FR-DRB distributes the message traffic based on a gradual and load-controlled path expansion. The method monitors network message latency and decides the number of alternative paths to be used between each source-destination pair for message delivery. FR-DRB performance has been compared with other routing policies under a representative set of traffic patterns commonly created by parallel scientific applications. Experimental results show a significant improvement in latency and throughput.


    Interconnection networks play a principal role in today's High Performance Computing (HPC) systems, which are a very important platform for solving scientific problems that require ever-larger computational speed. Hence, an efficient design of the interconnection network becomes critical for developing more powerful techniques that deliver messages as fast as possible. As HPC system size increases, the interconnection network becomes a bottleneck. Nowadays, network cost and power consumption are much higher than those of the processors [1]. To address this issue, the number of network components is reduced. However, this reduction pushes network throughput close to the saturation point, because the network must fulfill the same communication requirements using fewer resources (switches and links). When communication load is unevenly distributed across the network, some resources may be idle while others are heavily congested (hot-spots). If congestion is not efficiently controlled, those resources may become saturated. As a consequence, message latency rises considerably and global system performance is degraded. This situation is even worse in lossless networks, because congestion is quickly propagated to the whole network by the flow control mechanism [3].

    Therefore, current design trends demand efficient congestion control techniques that improve network throughput using a suitable amount of resources and reduce the congestion caused by adverse traffic [1].

    This can be achieved by adaptive routing techniques which

    dynamically manage existing network resources to reduce



    Typically, adaptive congestion control mechanisms perform three basic tasks: network traffic monitoring, congestion detection, and congestion control. In traffic monitoring, parameters such as point-to-point message latency [9], buffer occupancy level [3], or link speed-down (also called backpressure) [1] are evaluated in order to detect and notify the onset of congestion. After the notification is received, network end nodes or switches perform some action to avoid performance degradation.

    Message Throttling (MT) is probably the most popular control action, because of its low cost and easy implementation. MT stops (or reduces) injection until the packets belonging to the congested area are delivered to their destinations. Message throttling keeps buffer occupancy in switches bounded. However, latency still increases dramatically, because packets must wait at the source nodes until congestion disappears, so performance is degraded.

    Other congestion control approaches are based on buffer management in switch ports [3]. In these cases, packet flows are locally reallocated at switches to avoid contention. However, good performance is not achieved, because congestion sources are not controlled and local reallocation is not enough to reduce the traffic demand on the oversubscribed switch.

    Finally, congestion control techniques based on adaptive routing algorithms modify their behavior according to traffic conditions to avoid congestion. Such policies deal with congestion by sending messages from source to destination through alternative paths. Thus, the congested area is avoided and message injection is sustained. Therefore, the

    978-1-4244-5012-1/09/$25.00 ©2009 IEEE

    global system performance is improved, because traffic load is fairly distributed over the network resources. Some examples are HSAM [18], RECN-DD [3], PIPD [15], DRB [9] and [7], GOAL [16], and other methods presented in [2], [12], [15], and [4].

    Some disadvantages of adaptive routing mechanisms are the overhead resulting from information monitoring and path changing, and the need to guarantee both deadlock freedom [2] and in-order packet delivery.

    As mentioned above, the information about congestion is analyzed by the routing algorithm in order to perform some corrective action. In this case, information about the past is used to decide the immediate future behavior of the routing algorithm. Hence, a fast response is mandatory for the monitoring and notification activities, so that the routing algorithm is provided with up-to-date congestion information. It is also important that the algorithm is robust with respect to the information it uses (i.e., the algorithm should make appropriate decisions even when the monitoring information is not very accurate). This issue raises a tradeoff: good decisions require more information from the system, but more information means more traffic overhead. Therefore, the amount of information needed and the overhead required to gather and process it must be balanced. Consequently, an efficient routing algorithm has to extract the smartest behavior from the information it has, and it must also provide a fast response time (i.e., it must be able to detect critical situations rapidly).

    In this paper, we present the Fast-Response Dynamic Routing Balancing algorithm (FR-DRB), a new routing policy that uses several alternative paths simultaneously to increase the effective bandwidth available between source-destination pairs for message delivery. Our proposal prevents network congestion and fulfils the features mentioned above. In FR-DRB we apply the concept of communication load balancing to perform a uniform traffic load distribution over the network resources. Distribution is accomplished by a dynamic path expansion, which is controlled according to the congestion level of each source-destination path. The monitoring phase is achieved by measuring the total latency registered by the messages along their path. The notification phase is accomplished by acknowledge messages (Acks), which are generated according to the congestion level. In order to address the tradeoff between monitoring overhead and response speed, the FR-DRB mechanism generates Acks only when network traffic is low: the destination node sends the Ack only if the message latency does not exceed a threshold latency value. Meanwhile, the source node monitors the time that the user message is delayed in the network by using a watchdog timer. When the watchdog timer expires, FR-DRB immediately expands the source-destination paths in order to obtain greater bandwidth, and no Ack is generated at the destination node. Thus, FR-DRB eliminates monitoring overhead when the network is working near saturation.


    The idea of using a watchdog timer is common to several systems and contexts. In [9], the watchdog is used in combination with polling in the communication processor that delivers messages to the receiver thread. Furthermore, descriptions of some congestion control techniques using time windows are presented in [6] and [11].

    FR-DRB is based on DRB [7], a former algorithm aimed at providing load balancing in current technologies. FR-DRB extends that functionality while considering important design goals not included in the former version: fast response to congestion, robustness, and reduced notification overhead. In addition, FR-DRB is in line with current approaches used in commercial interconnects (e.g., InfiniBand), unlike the proposal presented in [9], which places additional requirements on the network components (i.e., local adaptivity and acknowledge generation in switches).

    The rest of this paper is organized as follows. Section 3 presents a complete description of the FR-DRB policy. Section 4 presents the performance evaluation, which compares FR-DRB with other routing methods and also measures its response time. Finally, Section 5 presents conclusions.


    FR-DRB defines the Metapath as the set of possible alternative paths between each source-destination pair. Metapath Configuration defines how to create the alternative paths used to expand single paths, and when to use them according to the congestion level.

    Congestion detection is accomplished by watchdog timers and Ack messages. If a timer exceeds its limit value (timer expiration), the metapath is configured and new alternative paths are selected. Hence, the effective bandwidth available between source-destination pairs is increased when the network is congested. In addition, the latency undergone by the messages is recorded by the messages themselves. If this latency is lower than the threshold, it is sent back to the sender node in an acknowledge message (Ack) that stops the timer and provides the sender node with latency information. Otherwise, for higher network latency values, the watchdog timer at the sender reaches its time limit, indicating that latency is high. In this case, the acknowledge message is not generated.

    Each alternative path in the metapath is created by using two intermediate nodes (INs), which are surrounding neighbors of the source and destination nodes, respectively. These INs act as message scattering and gathering areas for the source and destination nodes. INs are selected by FR-DRB for each source-destination pair in the user application. A three-step path (Multi-Step Path, MSP) is then built by selecting two INs: IN1, which is a neighbor of the source node, and IN2, which is a neighbor of the destination node. Thus, the alternative paths created by FR-DRB are built around the original path, and the

    latency information is used to decide the number of alternative paths over which messages will be distributed. The basic phases of FR-DRB are shown in Fig. 1. Fig. 1 (a), detection and notification: congestion is detected according to packet latency and the buffer occupancy state in switches. In the case of a non-congested path, notification is achieved by Ack packets; otherwise, watchdog timer expiration is used to notify the source nodes about congestion. Fig. 1 (b), metapath configuration: a set of surrounding nodes is provided for each source and each destination node. Fig. 1 (c) shows an example of a metapath: a set of multi-step paths (MSPs) defined by a set of intermediate node pairs.

    Next, a detailed description of the three algorithm components (Monitoring Activity, Dynamic Metapath Configuration, and MultiStepPath Selection) is provided.

    3.1 Monitoring Activity.

    Traffic load monitoring is accomplished at two different network elements: the sender node and the intermediate switches. At the sender node, a watchdog timer registers the time that a user message spends traveling to the destination node plus the return time of the Ack message. Meanwhile, the message latency is accumulated at intermediate switches. The watchdog timer is started (start signal) when the message is injected into the network, and it is stopped (stop signal) when the Ack arrives at the source, or when the timer exceeds a specified time limit. This limit is calculated according to:

    Where:
    - LZL(DATA) is the zero-load latency of the data packet.
    - LZL(ACK) is the zero-load latency of the Ack packet.
    - Th_lat is the threshold latency value.

    The zero-load latency is defined as the minimal average latency accumulated by a packet in the network, assuming that the packet does not contend for resources with other packets [15]. Thus, zero-load latency is determined by physical network constraints such as the distance between nodes (hops), link bandwidth, and packet size.
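    The zero-load latency can be estimated directly from these physical parameters. The Python sketch below assumes a simple model in which each hop adds a fixed router delay and the packet is serialized once at link bandwidth (T0 = H·t_r + S/b). It further assumes that the watchdog time limit is LZL(DATA) + LZL(ACK) + Th_lat; the limit equation is not recoverable from this transcript, so that sum, the function names, and all numbers are illustrative assumptions rather than the authors' formula.

```python
def zero_load_latency(hops: int, router_delay_s: float,
                      packet_bits: int, link_bps: float) -> float:
    """Contention-free latency of one packet.

    Assumed model (not taken from the paper): every hop adds a fixed
    router delay, and the packet is serialized once at link bandwidth.
    """
    return hops * router_delay_s + packet_bits / link_bps


def watchdog_limit(lzl_data: float, lzl_ack: float, th_lat: float) -> float:
    """Hypothetical watchdog limit: zero-load round trip plus the threshold."""
    return lzl_data + lzl_ack + th_lat


if __name__ == "__main__":
    # Illustrative parameters only: 10 hops, 100 ns per switch, 2.5 Gb/s links.
    data = zero_load_latency(10, 100e-9, 2048 * 8, 2.5e9)  # 2 KB data packet
    ack = zero_load_latency(10, 100e-9, 64, 2.5e9)         # header-only Ack
    th_lat = 5e-6                                          # tolerated extra delay
    print(f"LZL(DATA) = {data * 1e6:.3f} us, LZL(ACK) = {ack * 1e6:.3f} us")
    print(f"T_limit   = {watchdog_limit(data, ack, th_lat) * 1e6:.3f} us")
```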

    When the timer reaches the time limit (expiration), metapath configuration is invoked using this value as a parameter. Timer activity is shown in the Watchdog Timer function in Table 1(a).
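    The counting loop of Table 1(a) can equivalently be expressed with absolute deadlines, which avoids incrementing a counter on every tick. The following Python sketch is one such event-driven reading, assuming one watchdog per injected message; the class name, the per-message bookkeeping, and the `on_expire` callback (standing in for the call to metapath configuration) are hypothetical.

```python
import heapq
from typing import Callable, Dict, List, Tuple


class WatchdogTable:
    """Per-message watchdog timers at a source end node (event-driven sketch).

    Instead of incrementing a counter in a loop, each injected message gets an
    absolute deadline; expired timers are collected whenever the clock advances.
    """

    def __init__(self, t_limit: float, on_expire: Callable[[int], None]) -> None:
        self.t_limit = t_limit
        self.on_expire = on_expire            # e.g. triggers metapath configuration
        self.active: Dict[int, float] = {}    # msg_id -> deadline of a running timer
        self.heap: List[Tuple[float, int]] = []

    def start(self, msg_id: int, now: float) -> None:
        """Start signal: the message has just been injected."""
        deadline = now + self.t_limit
        self.active[msg_id] = deadline
        heapq.heappush(self.heap, (deadline, msg_id))

    def stop(self, msg_id: int) -> bool:
        """Stop signal: the Ack arrived. Returns False if the timer had expired."""
        return self.active.pop(msg_id, None) is not None

    def advance(self, now: float) -> None:
        """Fire every running timer whose elapsed time exceeds t_limit."""
        while self.heap and self.heap[0][0] < now:
            deadline, msg_id = heapq.heappop(self.heap)
            if self.active.get(msg_id) == deadline:  # ignore already-stopped timers
                del self.active[msg_id]
                self.on_expire(msg_id)


if __name__ == "__main__":
    expired: List[int] = []
    wd = WatchdogTable(t_limit=5.0, on_expire=expired.append)
    wd.start(1, now=0.0)
    wd.start(2, now=1.0)
    wd.stop(1)          # Ack for message 1 arrives in time
    wd.advance(10.0)    # message 2 never gets an Ack
    print(expired)      # [2]
```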

    On the other hand, latency information is registered by the FR-DRB switch and is transported as the message travels from source to destination node, as shown in the Traffic Load Monitoring function (pseudocode of Table 1(b)). The time that a message waits in switch buffers when it gets blocked by other messages is known as contention latency. This is the latency value recorded in the message.

    Latency information is evaluated when a message arrives at its destination. If the latency value is lower than a threshold, an Ack message is generated and sent back to the sender node in order to stop the watchdog timer. Otherwise, if the accumulated latency is higher than the threshold, the Ack message is not generated, because the watchdog timer should already have invoked the metapath configuration module. Thus, Ack messages are not injected when the network is near saturation.
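    The switch-side accumulation and the destination-side Ack rule can be summarized in a few lines. The following Python sketch assumes that each switch reports how long the message waited in its buffer (zero when the output port was free); the `Message` and `Ack` types and the function names are illustrative, not part of any real switch API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Message:
    msp_id: int                       # which multi-step path carried the message
    contention_latency: float = 0.0   # queueing time accumulated in switch buffers


@dataclass
class Ack:
    msp_id: int
    latency: float                    # header-only packet carrying the latency


def record_switch_wait(msg: Message, queue_wait: float) -> None:
    """At each switch: add the time the message was blocked in the buffer."""
    msg.contention_latency += queue_wait


def on_destination_arrival(msg: Message, th_lat: float) -> Optional[Ack]:
    """Send an Ack only for non-congested paths; otherwise stay silent so
    that the sender's watchdog expiration signals the congestion."""
    if msg.contention_latency > th_lat:
        return None
    return Ack(msp_id=msg.msp_id, latency=msg.contention_latency)


if __name__ == "__main__":
    msg = Message(msp_id=3)
    for wait in (0.0, 1.5, 0.0, 0.5):  # per-switch waits, e.g. in microseconds
        record_switch_wait(msg, wait)
    print(on_destination_arrival(msg, th_lat=3.0))  # Ack(msp_id=3, latency=2.0)
    print(on_destination_arrival(msg, th_lat=1.0))  # None
```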

    Fig. 1. FR-DRB phases: (a) Latency Detection and Notification; (b) Metapath Configuration; (c) MultiStepPath Selection.

    Ack messages have higher priority in the routing unit, and their size is less than 1% of the data message, because only a header with the latency value is transported. The threshold value must be set according to the latency that the user's application can tolerate. For example, the threshold can be set 50% above the zero-load latency. This value implies that the average link throughput is reduced by 33% with respect to its nominal value; in this case, the path is considered congested. From this point of view, latency works as a saturation index. When latency goes beyond the threshold, the monitoring module assumes that path performance is poor and allows FR-DRB to improve it.

    The goal of recording latency in messages is to identify the network's local traffic at any moment in order to provide routing adaptivity. By using this local information, the effect of other messages (sent by other sources) is taken into account. Consequently, by means of this distributed mechanism, a global and collective effect of mutual influences is achieved.
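    The 33% figure follows from treating effective throughput as inversely proportional to latency, the same B = 1/L relation used for path selection in Section 3.3. Under that assumption, a threshold 50% above the zero-load latency gives:

$$
L_{th} = 1.5\,L_{ZL}
\quad\Rightarrow\quad
\frac{B_{th}}{B_{ZL}} = \frac{1/L_{th}}{1/L_{ZL}} = \frac{1}{1.5} \approx 0.667
$$

    that is, a throughput reduction of 1 - 0.667 ≈ 33% with respect to the nominal value.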

    3.2 Metapath Configuration.

    FR-DRB executes the dynamic metapath configuration using the information gathered in the monitoring phase. The objective of this configuration is to determine, for each source-destination pair, the type and size of the metapath according to the message latency or the timer information. This is achieved by selecting intermediate nodes. The INs build a path that differs from the original one. The IN configuration takes into account the latency values at any moment, together with the topological characteristics of the interconnection network. INs are selected according to their distance to the source (or destination) node: INs at a 1-hop distance are considered first, then INs at a 2-hop distance, and so on. This metapath expansion is performed gradually by including more surrounding neighbors in the metapath configuration. Thus, the traffic load is fairly distributed over the network resources. The metapath configuration phase is shown in Table 2. If the metapath average latency is larger than the threshold value, the metapath size is increased; otherwise, it is decreased.
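    The gradual expansion described above, together with the decision rule of Table 2, can be sketched as follows. The sketch assumes a 2D mesh with (x, y) coordinates, measures IN distance in hops as Manhattan distance, and represents the metapath size as a hypothetical expansion level (INs up to that many hops from the source and destination, with level 0 meaning the original single path). It computes the metapath latency as the inverse of the summed MSP bandwidths, which is our reading of step 2 of Table 2, and it treats a watchdog expiration as an immediate request to expand. All names and the level abstraction are illustrative assumptions.

```python
from itertools import product
from typing import List, Sequence, Tuple

Node = Tuple[int, int]  # (x, y) position in a width x height mesh


def ring(center: Node, distance: int, width: int, height: int) -> List[Node]:
    """Mesh nodes exactly `distance` hops (Manhattan) away from `center`."""
    cx, cy = center
    nodes = []
    for dx in range(-distance, distance + 1):
        rem = distance - abs(dx)
        for dy in ((rem, -rem) if rem else (0,)):
            x, y = cx + dx, cy + dy
            if 0 <= x < width and 0 <= y < height:
                nodes.append((x, y))
    return nodes


def candidate_msps(src: Node, dst: Node, level: int,
                   width: int, height: int) -> List[Tuple[Node, Node]]:
    """(IN1, IN2) pairs: IN1 within `level` hops of src, IN2 within `level` of dst."""
    if level == 0:
        return [(src, dst)]  # original path only
    in1 = [n for d in range(1, level + 1) for n in ring(src, d, width, height)]
    in2 = [n for d in range(1, level + 1) for n in ring(dst, d, width, height)]
    return list(product(in1, in2))


def metapath_latency(msp_latencies: Sequence[float]) -> float:
    """Latency(P*) = (sum_i Latency(MSP_i)^-1)^-1, i.e. 1 / total bandwidth."""
    return 1.0 / sum(1.0 / lat for lat in msp_latencies)


def configure_metapath(level: int, msp_latencies: Sequence[float], th_lat: float,
                       max_level: int, timer_expired: bool = False) -> int:
    """Return the new expansion level for one source-destination pair."""
    if timer_expired:
        return min(level + 1, max_level)  # no Ack came back: widen at once
    lat = metapath_latency(msp_latencies)
    if lat > th_lat:
        return min(level + 1, max_level)  # add INs: more alternative paths
    if lat < th_lat:
        return max(level - 1, 0)          # constrict the metapath
    return level
```

    For example, two MSPs that each report a latency of 4.0 time units give a metapath latency of 2.0; with Th_lat = 1.0, the expansion level grows by one and the next ring of neighbors becomes eligible as INs.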

    3.3 MultiStepPath Selection.

    Each time a message is injected into the network, the MultiStepPath Selection module is invoked to perform the traffic load distribution by selecting one multi-step path. Consequently, messages are distributed among the MSPs in proportion to their bandwidth (the inverse of their recorded latency), so the paths with the lowest latency values receive the largest number of messages.

    Given a source node with N alternative paths, let L_Ci (i = 1...N) be the latency recorded in path C_i (if no latency has been recorded yet, the zero-load latency is used), and let B_Ci be the corresponding bandwidth, calculated as B_Ci = 1/L_Ci. Then the alternative path C_x is selected for the next injection with probability:

    $$P(C_x) = \frac{B_{C_x}}{\sum_{i=1}^{N} B_{C_i}}$$

    Paths are selected according to both their latency and their length. If paths are long in hops, the message transmission time can be high enough to degrade performance, so the shortest and least loaded paths are selected. The pseudocode in Table 3 shows the MultiStepPath selection phase.
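    The selection rule P(C_x) = B_Cx / Σ B_Ci, with B_Ci = 1/L_Ci, amounts to a weighted random choice. The Python sketch below draws an MSP index with the standard library's `random.choices`; it assumes that the caller substitutes the zero-load latency for paths with no recorded latency, and, unlike the full policy, it weighs only latency and not path length. Function names are illustrative.

```python
import random
from typing import List, Optional, Sequence


def selection_probabilities(latencies: Sequence[float]) -> List[float]:
    """P(C_x) = B_Cx / sum_i B_Ci, where B_Ci = 1 / L_Ci."""
    bandwidths = [1.0 / lat for lat in latencies]
    total = sum(bandwidths)
    return [b / total for b in bandwidths]


def select_msp(latencies: Sequence[float],
               rng: Optional[random.Random] = None) -> int:
    """Index of the MSP to use for the next injected message."""
    rng = rng or random.Random()
    weights = selection_probabilities(latencies)
    return rng.choices(range(len(latencies)), weights=weights, k=1)[0]


if __name__ == "__main__":
    # Three MSPs; the first has the lowest recorded latency.
    lat = [1.0, 2.0, 4.0]
    print(selection_probabilities(lat))  # 4/7, 2/7 and 1/7 as floats
    rng = random.Random(42)
    picks = [select_msp(lat, rng) for _ in range(7000)]
    print([picks.count(i) for i in range(3)])  # roughly 4000 / 2000 / 1000
```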

    As explained above, when the message is injected into the

    network, a watchdog timer is started to count the time that

    Metapath Configuration (MSP, Th_Lat);
    /* Executed in source nodes each time a Latency(MSP) arrives or a Timer expires */
    Variables
        Latencies_MSP: Vector[1..Number_of_MSP] of integer;
        Threshold Th_Lat;
    Begin
        1. Receive a Latency or a Timer Limit;
        2. Calculate the Metapath Latency (P*):
           Latency(P*) = ( Σ Latency(MSP_i)^-1 )^-1
        3. If (Latency(P*) > Th_Lat)
               Increase the number of INs to provide new alternative paths.
           ElseIf (Latency(P*) < Th_Lat)
               Decrease the number of INs to constrict the metapath.
           EndIf
    End Metapath Configuration

    Table 2. Metapath Configuration Code

    (a) Watchdog Timer (start, stop: signals)   /* FR-DRB end node */
        Wait for start signal to arrive;
        Repeat
            Increase timer
            If (timer > T_limit)
                Call Metapath Configuration (Table 2)
                Reset watchdog timer
            If (stop signal arrives)
                Reset watchdog timer
        End Repeat

    (b) Traffic Load Monitoring (Msg M, Th_Lat, MSP)   /* FR-DRB switch */
    Begin
        1. For each step of message M:
           1.1. Accumulate latency (queue time) to calculate the MSP latency.
           1.2. Continue to the next intermediate node or to the final destination.
        2. When the message arrives at its final destination:
           2.1. If (Latency(MSP) > Th_Lat)
                    do not send an acknowledge message
                else
                    send Latency(MSP) back to the source node in an acknowledge message.
        3. When the acknowledge message arrives at the source node:
           3.1. Reset the watchdog timer (stop signal).
           3.2. Deliver Latency(MSP) to the Metapath Configuration function (MSP, Latency(MSP)).
    End Monitoring

    Table 1. Traffic Load Monitoring and Timer functions

    the acknowledge message takes to arrive. If the timer exceeds the time limit, it can be deduced that latency is high. Therefore, path selection can be performed before the Ack message arrives, providing a fast response to congestion.

    3.4 Putting All Components Together.

    All the functionality and operations performed by the FR-DRB algorithm are shown in Fig. 2. When a source node injects a message into the interconnection network, a MultiStepPath (MSP) is selected according to the respective latencies of the alternative paths; the path with the lowest latency is selected with the highest probability.

    The message is then injected into the network and, concurrently, the watchdog timer is started to measure the message trip time. When the message leaves the source node, it is forwarded to the destination node through intermediate switches. Contention suffered in switch buffers (queuing latency) is recorded and stored in the message itself. When the message arrives at its destination, it is delivered to the user. Then, latency information is sent back to the sender in the Ack header only if Recorded Latency < Threshold. Otherwise, monitoring activity is finished and the acknowledge message is not generated. Meanwhile, the watchdog timer located at the source node runs in parallel with the sending of the message. If the Ack message arrives before the watchdog expires, the latency value is delivered to the metapath configuration module, which configures the metapath by selecting the alternative paths to be used according to the latency value. However, if the watchdog exceeds the time limit, the FR-DRB algorithm uses the latency threshold to configure the metapath.

    Current switches in HPC systems are not mere packet forwarders: they are endowed with smart capabilities to evaluate and adapt the communication load according to network conditions [7]. For instance, InfiniBand (IBA) switches, the most widely used technology in today's HPC clusters [18], provide features for buffer monitoring and multipath selection, as required by the FR-DRB policy. IBA also provides watchdog timers to fulfill congestion control requirements [5].

    FR-DRB operations are performed concurrently with packet delivery. As shown in Fig. 2, a message is forwarded without any overhead when the output port is free (thick arrows); latency accumulation is performed only when messages are waiting in a buffer. Hence, this operation does not delay the send/receive primitives. Likewise, MultiStepPath selection and metapath configuration are performed concurrently with load injection, so messages are not delayed either.

    Deadlock freedom is ensured by having a separate escape channel for each phase. As we adopt two intermediate nodes, one escape channel is used (if required) from Src to IN1, another from IN1 to IN2, and a third from IN2 to Dst. Hence, each phase defines a virtual network, and packets change virtual network at each intermediate node. Although each virtual network relies on a different escape channel, they all share the same adaptive channel(s). Thus, our current FR-DRB implementation uses four virtual channels.

    The use of adaptive routing algorithms can cause out-of-order delivery of packets. If the user application requires in-order packet delivery, FR-DRB reorders packets at the destination node by using the well-known sliding window protocol, as is the case for other routing policies such as [15].
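    A receiver-side reorder buffer of the kind used by sliding window protocols can be sketched as follows. It assumes that every packet of a source-destination flow carries a sequence number and that the sender never has more than `window` packets outstanding; the class and its parameters are illustrative, not the authors' implementation.

```python
from typing import Dict, List


class ReorderBuffer:
    """Receiver side of a sliding window: release packets in sequence order."""

    def __init__(self, window: int) -> None:
        self.window = window
        self.expected = 0                    # next sequence number to deliver
        self.pending: Dict[int, bytes] = {}  # early packets held back

    def receive(self, seq: int, payload: bytes) -> List[bytes]:
        """Accept one packet and return whatever can now be delivered in order."""
        if not self.expected <= seq < self.expected + self.window:
            return []  # duplicate or outside the window: discard
        self.pending[seq] = payload
        delivered = []
        while self.expected in self.pending:
            delivered.append(self.pending.pop(self.expected))
            self.expected += 1
        return delivered


if __name__ == "__main__":
    rb = ReorderBuffer(window=4)
    # Packets arrive out of order because they followed different MSPs.
    for seq, data in [(1, b"B"), (0, b"A"), (3, b"D"), (2, b"C")]:
        print(seq, rb.receive(seq, data))
```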


    In order to assess FR-DRB performance, we analyze how the latency and throughput metrics are improved by the monitoring activity and by the multipath configuration and selection mechanisms. The latency metric represents the elapsed time from the generation of a packet at the source node until it is completely delivered at the destination node. The throughput metric represents the traffic load accepted by the network versus the traffic load offered by the sender nodes. Both metrics give a global, average description of network performance. In addition, network latency maps and latency-over-time charts are provided to evaluate the transient response of the mechanism.

    The evaluation methodology is divided into two major parts. The first part performs a network response analysis under the hot-spot traffic pattern to evaluate the FR-DRB transient behavior and traffic load distribution under extreme conditions. This specific pattern establishes some fixed destinations in order to increase the traffic in a particular network area, causing saturated paths. In addition, the remaining network nodes inject uniform load in order to create background traffic over the network. In the second part, we evaluate the proposed technique using well-known communication patterns: Butterfly, Perfect Shuffle, and Matrix Transpose. These patterns are a collection of benchmarks that describe the conditions commonly created by parallel scientific applications (a further description of these patterns is provided in [2] and [14]).

    The FR-DRB operations and modules, together with the network components, were modeled [8] using the standard simulation and modeling tool OPNET Modeler [13]. OPNET

    Multistep Path Selection ()
    /* Executed in the source node each time a message is injected */
    Begin
        1. Build the Probability Density Function (PDF) of the MultiStepPath bandwidths (B_Ci).
        2. Select a MultiStep Path using the PDF.
        3. Inject the message into the network:
           3.1. Build a message header:
                3.1.1. Concatenate the INs headers.
           3.2. Inject the message.
           3.3. Start the timer.
    End MultiStepPath Selection

    Table 3. FR-DRB MSP Selection Code

    provides a Discrete Event Simulator (DES) engine. This environment allows network component behavior to be defined with a Finite State Machine (FSM) approach, and it supports detailed specification of protocols, applications, and queuing policies. The simulations were conducted for three InfiniBand-like networks using the most popular topologies in HPC systems (mesh, torus, and fat-tree), as reported by the Top 500 supercomputer list [18]. In all cases, virtual cut-through switching and credit-based flow control were assumed.

    In order to carry out a comparative analysis, we have implemented five routing policies. Valiant's routing algorithm [17] is an oblivious routing protocol aimed at achieving full load balancing. This mechanism performs two phases. In the first phase, an intermediate node (IN) is randomly selected and the packet is forwarded to this node. After the IN is reached, the packet is sent to its destination following the dimension-order routing (DOR) approach [2].
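    For reference, the Valiant baseline just described can be sketched on a 2D mesh. The Python code below assumes (x, y) node coordinates, uses XY dimension-order routing for both phases (the text only specifies DOR for the second phase, so the first-phase routing is an assumption), and joins the two DOR legs at the random intermediate node. Names are illustrative.

```python
import random
from typing import List, Optional, Tuple

Node = Tuple[int, int]


def dor_path(src: Node, dst: Node) -> List[Node]:
    """XY dimension-order routing on a 2D mesh: correct X first, then Y."""
    x, y = src
    path = [src]
    while x != dst[0]:
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    while y != dst[1]:
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path


def valiant_path(src: Node, dst: Node, width: int, height: int,
                 rng: Optional[random.Random] = None) -> List[Node]:
    """Two-phase Valiant routing through a uniformly random intermediate node."""
    rng = rng or random.Random()
    inter = (rng.randrange(width), rng.randrange(height))
    return dor_path(src, inter) + dor_path(inter, dst)[1:]  # IN listed once


if __name__ == "__main__":
    print(dor_path((0, 0), (2, 1)))  # [(0, 0), (1, 0), (2, 0), (2, 1)]
    p = valiant_path((0, 0), (3, 3), 8, 8, random.Random(7))
    print(len(p) - 1, "hops vs. 6 for DOR")
```

    Because the random detour is usually not on a minimal path, the Valiant route is typically longer than the DOR route, which is the latency cost discussed in the hot-spot results.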

    We have also implemented the Turn model [12], an adaptive method that allows several possible minimal paths between source and destination. At each switch, this policy tries to forward packets through any free (or less loaded) output link among those belonging to a minimal path; thus, local adaptivity is provided. Finally, in order to evaluate the FR-DRB response time and the impact of timer expiration, we have set the watchdog time limit to two different values: a fixed value related to the saturation point, and infinity, which implies no watchdog expiration (as in the former DRB method [7]).

    4.1 Hotspot Analysis.

    Latency and throughput results obtained for the 1024-node mesh network are presented in Fig. 3 (a) and (b). FR-DRB shows the same behavior as the other routing policies at low loads and, consequently, it does not overload the network. However, at higher loads, FR-DRB improves throughput by 94% and reduces latency by 96% with respect to DOR. The improvement stems from the fact that FR-DRB is a method with a fast response time and low overhead: the FR-DRB mechanism starts as soon as the watchdog timer surpasses the time limit, without waiting for the Ack message, which may arrive much later, as is the case for DRB. Performance improvements are larger at higher loads, which implies that FR-DRB distributes the traffic load better. Hence, independently of the original spatial distribution, the load that each switch perceives is similar, and the latency experienced by messages is uniform.

    In addition, Fig. 3 (c) shows the network latency surface under a quadruple hot-spot pattern. This pattern is designed to analyze network performance under heavy load. In this case, a deterministic routing algorithm (DOR) was used. As DOR does not perform any load balancing, Fig. 3 (c) is useful to see the impact of the hot-spot on the network, because this is the worst congestion case. We show the average contention latency by means of the latency surface, in which each grid point (x-y coordinates) represents the average latency in the buffers of the network switches (Figs. 3(d), 3(e), 3(f), and 3(g)). The effective load distribution of each algorithm is also shown by the contour lines projected at the base of the charts. The latency reduction accomplished by FR-DRB is 99% with respect to DOR. Fig. 3 (d) shows that Valiant's algorithm distributes the traffic load better than the Turn model (Fig. 3(e)). However, its average message latency is worse, because the path length is doubled (on average). Thus, Valiant's algorithm performs a suitable load distribution at the expense of a latency rise. The local adaptivity of the Turn model, on the other hand, improves latency behavior, but it lacks suitable load balancing. FR-DRB outperforms the algorithms mentioned above because it provides a global load distribution while minimizing message latency (Fig. 3(g)). Therefore, network nodes using FR-DRB can adapt themselves to network conditions, avoiding hot-spots by using the available free bandwidth.

    Finally, Fig. 3 (h) shows the latency experienced by messages over time. The hot-spot lasts from 2 to 2.4 seconds. With deterministic routing, the latency peak reaches approximately 45 ms. FR-DRB reduces this peak almost 7 times, due to the multipath selection feature and the watchdog timer module, both described in Section 3.

    Fig. 2. FR-DRB algorithm: Monitoring, Metapath Configuration and MSP Selection

    Fig. 3. Performance results for Hot-spot pattern in the mesh network

    4.2 Benchmark Traffic Analysis.

    We present the charts of latency reduction and throughput improvement for the three traffic patterns defined above (Butterfly, Perfect Shuffle, and Matrix Transpose). Fig. 4(a) and (b) show the performance improvements in a 1024-node torus network for the Valiant, Turn model, DRB, and FR-DRB policies. Results are presented as percentages [%], all relative to deterministic routing (DOR).

    We also present, in Fig. 5(a) and (b), the performance improvements achieved in a 64-node network arranged in a fat-tree topology (4-ary 2-tree), which is widely used in today's datacenters. In this topology, routing algorithms differ from those used in tori. Routing in fat-trees consists of two phases: an adaptive upward phase and a deterministic downward phase. Because adaptive routing is used in the ascending phase, several output ports are possible at each switch, and the final choice depends on the selection function. The impact of the selection function on performance has been studied previously in [4]. We implemented the First Free (FF) and Cyclic Priority (CP) selection functions to perform a comparative analysis. The FF selection function selects the first free physical link, and the CP selection function uses a round-robin algorithm to choose a different physical link each time a packet is forwarded.
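The two selection functions described above can be sketched as follows. This is an illustration under assumptions, not the authors' implementation: each switch is assumed to expose a per-port "free" flag for its upward ports, the names `first_free` and `CyclicPriority` are hypothetical, and returning `None` when every port is busy (so the packet waits) is an assumed policy.

```python
from typing import Optional, Sequence


def first_free(free: Sequence[bool]) -> Optional[int]:
    """First Free (FF): pick the lowest-numbered free upward port."""
    for port, is_free in enumerate(free):
        if is_free:
            return port
    return None  # assumed: all candidate ports busy, packet waits


class CyclicPriority:
    """Cyclic Priority (CP): round-robin over upward ports, starting just
    after the port chosen last time, so consecutive packets use different
    physical links when they are free."""

    def __init__(self, num_ports: int) -> None:
        self.num_ports = num_ports
        self.last = num_ports - 1  # first search starts at port 0

    def select(self, free: Sequence[bool]) -> Optional[int]:
        for offset in range(1, self.num_ports + 1):
            port = (self.last + offset) % self.num_ports
            if free[port]:
                self.last = port
                return port
        return None  # assumed: all candidate ports busy
```

With four free upward ports, FF always returns port 0, whereas five successive `CyclicPriority(4).select` calls return ports 0, 1, 2, 3, 0. This illustrates why FF tends to concentrate upward traffic on low-numbered links while CP spreads it.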



    Fig. 4. Performance results obtained for persistent patterns in a Torus topology network

    Fig. 5. Performance results for persistent patterns in a Fat-Tree topology network



    Latency reductions and throughput improvements are also presented as percentages [%] relative to deterministic routing, in which packets are delivered through the same statically assigned path and no adaptivity or randomization is provided. Experiments show that FR-DRB achieves lower latencies (up to 80%) and higher throughput (up to 100%) than the other methods in both topologies. Latency improvements grow as the load increases. This gain allows networks using the FR-DRB mechanism to carry heavier communication loads or, in cost-bounded systems, allows fewer network resources to be used for a given communication load, because those resources are handled more efficiently. The latency improvements of FR-DRB result from the monitoring strategy and from the multipath configuration and selection mechanism described in this paper.
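The paper does not spell out how the percentages are computed. A natural reading, assumed here, is the relative change with respect to DOR, as in the sketch below (function names are hypothetical). Under this reading, an 80% latency reduction means latency falls to one fifth of DOR's, and a 100% throughput improvement means throughput doubles.

```python
def latency_reduction_pct(lat_dor: float, lat_policy: float) -> float:
    """Percentage latency reduction relative to deterministic routing.

    Positive values mean the policy is faster than DOR (assumed formula).
    """
    return 100.0 * (lat_dor - lat_policy) / lat_dor


def throughput_improvement_pct(thr_dor: float, thr_policy: float) -> float:
    """Percentage throughput improvement relative to deterministic routing."""
    return 100.0 * (thr_policy - thr_dor) / thr_dor
```

For illustration only (these are not the paper's measurements), a latency drop from 10 ms to 2 ms gives `latency_reduction_pct(10.0, 2.0) == 80.0`, and a throughput rise from 0.25 to 0.5 of capacity gives `throughput_improvement_pct(0.25, 0.5) == 100.0`.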


    In this paper, we proposed the Fast-Response Dynamic Routing Balancing (FR-DRB) policy to deal with congestion in high-speed interconnection networks. FR-DRB controls the performance degradation produced by packet contention for network resources. Congestion control is accomplished by distributing the communication load over several alternative paths. FR-DRB monitors latency along the path connecting source and destination nodes. When latency rises significantly, source nodes start sending messages concurrently through new, less-loaded alternative paths. Because all source nodes are aware of the latency state, a global latency reduction is achieved, as shown in the experiments. Source nodes are also provided with a watchdog timer that yields a faster response when the network is congested. Furthermore, the watchdog limits acknowledgment message generation, which reduces overhead when the network is near the saturation point.
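The monitor-and-expand behaviour summarised above can be illustrated with a small per source-destination controller. This is a sketch under stated assumptions, not the FR-DRB algorithm of Section 3. The class name, the two latency thresholds, the one-path-at-a-time step, and the rule that the watchdog expands the metapath when no feedback arrives within a timeout are all hypothetical. Acknowledgment suppression is not modelled.

```python
from dataclasses import dataclass


@dataclass
class MetapathController:
    """Hypothetical controller: widens or narrows the set of alternative
    paths for one source-destination pair from monitored latency."""
    max_paths: int = 4             # assumed upper bound on alternative paths
    high_latency: float = 10.0     # assumed expand threshold (ms)
    low_latency: float = 2.0       # assumed contract threshold (ms)
    watchdog_timeout: float = 5.0  # assumed timeout (ms)
    active_paths: int = 1          # start with the single default path
    last_feedback: float = 0.0     # time of the last latency report

    def on_latency_feedback(self, now: float, latency: float) -> int:
        """Adjust the path count gradually from a latency report."""
        self.last_feedback = now
        if latency > self.high_latency:
            self.active_paths = min(self.active_paths + 1, self.max_paths)
        elif latency < self.low_latency:
            self.active_paths = max(self.active_paths - 1, 1)
        return self.active_paths

    def on_tick(self, now: float) -> int:
        """Watchdog: with no feedback within the timeout, assume congestion
        and expand without waiting for an acknowledgment."""
        if now - self.last_feedback > self.watchdog_timeout:
            self.active_paths = min(self.active_paths + 1, self.max_paths)
            self.last_feedback = now  # re-arm the timer
        return self.active_paths
```

Using two thresholds rather than one adds hysteresis, so the path count does not oscillate when latency hovers near a single cut-off. The watchdog lets a source react even when congestion delays the very acknowledgments that would report it.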

    FR-DRB was developed to meet the design objectives of parallel-computer interconnection networks: all-to-all connectivity and low latency between any pair of nodes under any communication load. The experiments performed to validate the FR-DRB policy show substantial gains over deterministic routing (up to 80% lower latency and up to 100% higher throughput), and the reduced congestion allows the network to be used at higher loads. FR-DRB is therefore useful for persistent and bursty communication patterns, which are the patterns that produce the worst hot-spot situations.


    [1] Baydal, E. "A Family of Mechanisms for Congestion Control in Wormhole Networks," IEEE TPDS, vol. 16, pp. 772-784, 2005.

    [2] Duato, J., Yalamanchili, S., Ni, L. Interconnection Networks: An Engineering Approach. Morgan Kaufmann, 2002.

    [3] Garcia, P.J., et al. "RECN-DD: A Memory-Efficient Congestion Management Technique for Advanced Switching," in ICPP, pp. 23-32, 2006.

    [4] Gilabert, F., Gómez, M., et al. "On the Influence of the Selection Function on the Performance of Fat-Trees," at Euro-Par, vol. 4128, pp. 864-873, 2006.

    [5] IBTA, InfiniBand Architecture Specification, Volume 1, Release 1.2.1,

    [6] Jain, R. "Congestion Control in Computer Networks: Issues and Trends," IEEE Network, vol. 4, no. 3, pp. 24-30,


    [7] Lugones, D., Franco, D., and Luque, E. "Dynamic Routing Balancing on InfiniBand Networks," Journal of Comp. Sci. & Tech. (JCS&T), vol. 8, no. 2, pp. 104-110, 2008.

    [8] Lugones, D., Franco, D., and Luque, E. "Modeling Adaptive Routing Protocols in High Speed Interconnection Networks," at OPNETWORK 2008, Washington, USA, 2008. Available at:

    [9] Lugones, D., Franco, D., and Luque, E. "Dynamic and Distributed Multipath Routing Policy for High-Speed Cluster Networks," in 9th IEEE/ACM International Symposium on Cluster Computing and the Grid (CCGRID '09), pp. 396-403, 18-21 May 2009.

    [10] Maquelin, O., et al. "Polling Watchdog: Combining Polling and Interrupts for Efficient Message Handling," at ISCA, pp. 179-188, 1996.

    [11] Mo, J. and Walrand, J. "Fair End-to-End Window-Based Congestion Control," IEEE/ACM Trans. Netw., pp. 556-567,


    [12] Ni, L., Glass, C. "The Turn Model for Adaptive Routing," in ISCA, pp. 278-287, 1992.

    [13] OPNET Technologies, "Opnet Modeler: Accelerating Network R&D," June 2008.

    [14] Petrini, F., Hoisie, A., Feng, Wu-chun, Graham, R. "Performance Evaluation of the Quadrics Interconnection Network," at IPDPS, pp. 1698-1706, Apr. 2001.

    [15] Shihang, Y., Min, G., Awan, I. "An Enhanced Congestion Control Mechanism in InfiniBand Networks for High Performance Computing Systems," at AINA, IEEE Computer Society, vol. 1, pp. 845-850, 2006.

    [16] Singh, A., Dally, W., Towles, B., Gupta, A.K. "Globally Adaptive Load-Balanced Routing on Tori," IEEE Comp. Arch. Letters, vol. 3, p. 69, 2004.

    [17] Valiant, L.G., Brebner, G.J. "Universal Schemes for Parallel Communication," ACM STOC, Milwaukee, pp. 263-277,


    [18] Vishnu, A., Koop, M., Moody, A., Mamidala, A., Narravula, S., Panda, D. "Hot-Spot Avoidance With Multi-Pathing Over InfiniBand: An MPI Perspective," in Proceedings of CCGRID, IEEE Computer Society, pp. 479-486, 2007.

    [19] Top500 Supercomputers Site, Interconnect Family share for 11/2008, Nov. 2008,