Fast-Response Dynamic Routing


Fast-Response Dynamic Routing Balancing for High-Speed

Interconnection Networks*

D. Lugones, D. Franco, and E. Luque.

Department of Computer Architecture and Operating Systems.

Universitat Autònoma of Barcelona, [email protected]

{daniel.franco, emilio.luque}@uab.es.

* This work was funded by the MEC-Spain under contract TIN2007-64974

Abstract: Communication requirements in High Performance Computing systems demand the use of high-speed interconnection networks to connect processing nodes. However, when communication load is unevenly distributed across the network resources, message congestion appears. Congestion spreading increases latency and reduces network throughput, causing significant performance degradation. Fast-Response Dynamic Routing Balancing (FR-DRB) is a method developed to balance communication load uniformly over the interconnection network. FR-DRB distributes message traffic through a gradual, load-controlled path expansion. The method monitors network message latency and decides how many alternative paths to use between each source-destination pair for message delivery. FR-DRB performance has been compared with other routing policies under a representative set of traffic patterns commonly created by parallel scientific applications. Experimental results show a significant improvement in latency and throughput.

1. INTRODUCTION.

Interconnection networks play a principal role in today's High Performance Computing (HPC) systems, which are a very important platform for solving scientific problems that require ever greater computational speed. Hence, an efficient design of the interconnection network becomes critical to devising more powerful techniques that deliver messages as fast as possible. As HPC system size increases, the interconnection network becomes a bottleneck. Nowadays, network cost and power consumption are much higher than those of the processors [1]. To address this issue, the number of network components is reduced. However, this reduction pushes network throughput close to the saturation point, because the network must fulfill the same communication requirements with fewer resources (switches and links). When communication load is unevenly distributed across the network, some resources may be idle while others are heavily congested (hot-spots). If congestion is not efficiently controlled, those resources may reach saturation. As a consequence, message latency rises considerably and global system performance is degraded. This situation is even worse in lossless networks, because congestion is quickly propagated to the whole network by the flow control mechanism [3].

Therefore, current design trends demand efficient congestion control techniques that improve network throughput using a suitable amount of resources and reduce the congestion caused by adverse traffic [1]. This can be achieved by adaptive routing techniques, which dynamically manage existing network resources to reduce congestion.

2. RELATED WORK.

Typically, adaptive congestion control mechanisms perform three basic tasks: network traffic monitoring, congestion detection, and congestion control.

In traffic monitoring, parameters such as point-to-point message latency [9], buffer occupancy level [3], or link slow-down (also called backpressure) [1] are evaluated in order to detect and notify the onset of congestion. After notification is received, some action is performed by network end nodes or switches to avoid performance degradation.

Message Throttling (MT) is probably the most popular control action because of its low cost and easy implementation. MT stops (or reduces) injection until the packets belonging to the congested area are delivered to their destinations. Message throttling is used to keep buffer occupancy in switches bounded. However, latency still increases sharply, because packets must wait at the source nodes until congestion disappears, so performance is degraded.

Other congestion control approaches are based on buffer management at switch ports [3]. In these cases, packet flows are locally reallocated at switches to avoid contention. However, good performance is not achieved, because the congestion sources are not controlled and local reallocation is not enough to reduce the traffic demand on the oversubscribed switch.

Finally, congestion control techniques based on adaptive routing algorithms modify their behavior according to traffic conditions to avoid congestion. Such policies handle congestion by sending messages from source to destination through alternative paths. Thus, the congested area is avoided and message injection is sustained. Therefore, global system performance is improved because traffic load is fairly distributed over the network resources. Some examples are HSAM [18], RECN-DD [3], PIPD [15], DRB [9] and [7], GOAL [16], and other methods presented in [2], [12], [15], and [4].

978-1-4244-5012-1/09/$25.00 ©2009 IEEE

Some disadvantages of adaptive routing mechanisms are the overhead resulting from information monitoring and path changing, and the need to guarantee both deadlock freedom [2] and in-order packet delivery.

As mentioned above, the information about congestion is analyzed by the routing algorithm in order to perform some corrective action. In this case, information about the past is used to decide the immediate future behavior of the routing algorithm. Hence, the monitoring and notification activities must respond quickly in order to provide the routing algorithm with up-to-date congestion information. It is also important that the algorithm is robust with respect to the information available to it (i.e., the algorithm should make appropriate decisions even though monitoring information is not always very accurate). This issue raises a tradeoff: good decisions require more information from the system, but more information means more traffic overhead. Therefore, the amount of information needed and the overhead required to gather and process it must be balanced. Consequently, an efficient routing algorithm has to extract the smartest behavior from the information it has, and it must also provide a fast response time (i.e., it must be able to detect critical situations rapidly).

In this paper, we present the Fast-Response Dynamic Routing Balancing algorithm (FR-DRB), a new routing policy that uses several alternative paths simultaneously to increase the effective bandwidth available between source-destination pairs for message delivery.

Our proposal prevents network congestion and fulfills the requirements mentioned above. In FR-DRB we apply the concept of communication load balancing to distribute traffic load uniformly over the network resources. Distribution is accomplished by a dynamic path expansion, which is controlled according to the congestion level in each source-destination path. The Monitoring phase is carried out by measuring the total latency registered by messages along their path. The Notification phase is carried out by acknowledge messages (Acks), which are generated according to the congestion level. To address the tradeoff between monitoring overhead and response speed, the FR-DRB mechanism generates Acks only when network traffic is low: the destination node sends the Ack only if message latency does not exceed a threshold latency value. Meanwhile, the source node uses a watchdog timer to monitor how long the user message is delayed in the network. When the watchdog times out, FR-DRB immediately starts expanding the source-destination paths in order to achieve greater bandwidth, and no Ack is generated at the destination node. Thus, FR-DRB eliminates monitoring overhead when the network is working near saturation.

The idea of using a watchdog timer is common to several systems and contexts. In [9], the watchdog is used in combination with polling in the communication processor that delivers messages to the receiver thread. Furthermore, congestion control techniques based on time windows are described in [6] and [11].

FR-DRB is based on DRB [7], an earlier algorithm aimed at providing load balancing in current technologies. FR-DRB extends that functionality while addressing important design goals not covered by the earlier version: fast response to congestion, robustness, and reduced notification overhead. In addition, FR-DRB is in line with current approaches used in commercial interconnects (e.g., InfiniBand), unlike the proposal presented in [9], which places additional requirements on the network components (i.e., local adaptivity and acknowledge generation in switches).

The rest of this paper is organized as follows. Section 3 presents a complete description of the FR-DRB policy. Section 4 shows the performance evaluation, which compares FR-DRB with other routing methods and measures its response time. Finally, Section 5 presents conclusions.

3. FAST-RESPONSE DYNAMIC ROUTING BALANCING

FR-DRB defines the Metapath as the set of possible alternative paths between each source-destination pair. Metapath Configuration defines how to create the alternative paths used to expand single paths, and when to use them according to the congestion level.

Congestion detection is accomplished by watchdog timers and Ack messages. If a timer exceeds its limit value (timer expiration), a metapath is configured and new alternative paths are selected. Hence, the effective bandwidth available between source-destination pairs is increased when the network is congested. Also, the latency experienced by the messages is recorded by the messages themselves. If the latency is lower than the threshold, it is sent back to the sender node in an acknowledge message (Ack), which stops the timer and provides the sender node with latency information. Otherwise, for higher network latency values, the watchdog timer on the sender reaches its time limit, indicating that latency is high. In this case, the acknowledge message is not generated.

Each alternative path in the metapath is created by using two intermediate nodes (INs), which are surrounding neighbors of the source and destination nodes respectively. These INs act as message scattering and gathering areas around the source and destination nodes. INs are selected by FR-DRB for each source-destination pair in the user application. A three-step path (Multi-Step Path, MSP) is then built by selecting two INs: IN1, which is a neighbor of the source node, and IN2, which is a neighbor of the destination node. Thus, the alternative paths created by FR-DRB are built around the original path, and the latency information is used to decide the number of alternative paths over which messages will be distributed.

The basic phases of FR-DRB are shown in Fig. 1. Fig. 1(a), detection and notification: congestion is detected according to packet latency and the buffer occupancy state in switches. For a non-congested path, notification is carried out by Ack packets. Otherwise, watchdog timer expiration is used to notify the source nodes about congestion. Fig. 1(b), metapath configuration: a set of surrounding nodes is provided for each source node and for each destination node. Fig. 1(c) shows an example of a Metapath: a set of Multi-Step Paths (MSPs) defined by a set of intermediate node pairs.

Next, a detailed description of the three algorithm components (Monitoring Activity, Dynamic Metapath Configuration, and MultiStepPath Selection) is provided.
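The construction of multi-step paths from intermediate nodes can be illustrated with a minimal Python sketch. It assumes a 2D mesh addressed by (x, y) coordinates, uses Manhattan distance as the hop count, and pairs every IN1 with every IN2; the function names and the all-pairs choice are illustrative assumptions, since the text does not specify how IN pairs are chosen.

```python
from itertools import product

def neighbors_within(node, hops, width, height):
    """Nodes of a width x height mesh at Manhattan distance 1..hops from node."""
    x0, y0 = node
    found = []
    for x, y in product(range(width), range(height)):
        distance = abs(x - x0) + abs(y - y0)
        if 1 <= distance <= hops:
            found.append((x, y))
    return found

def build_metapath(src, dst, hops, width, height):
    """Return the direct path plus every (src, IN1, IN2, dst) candidate.

    Assumption: all IN1/IN2 combinations are candidates; the paper only
    states that IN1 surrounds the source and IN2 surrounds the destination.
    """
    in1_set = [n for n in neighbors_within(src, hops, width, height) if n != dst]
    in2_set = [n for n in neighbors_within(dst, hops, width, height) if n != src]
    msps = [(src, in1, in2, dst)
            for in1 in in1_set for in2 in in2_set if in1 != in2]
    return [(src, dst)] + msps
```

Calling `build_metapath` with a larger `hops` value enlarges the candidate set, which mirrors the idea of expanding the metapath with neighbors that lie farther from the end nodes.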

3.1 Monitoring Activity.

Traffic load monitoring is carried out at two different network elements: the sender node and the intermediate switches. At the sender node, a watchdog timer registers the time that the user's message spends traveling to the destination node plus the return time of the Ack message. Meanwhile, the message latency is also accumulated at intermediate switches. The watchdog timer is started (start signal) when the message is injected into the network, and it is stopped (stop signal) when the Ack arrives at the source or when the timer exceeds a specified time limit. This limit is calculated from the following quantities:

- LZL(DATA) is the zero-load latency of the data packet.
- LZL(ACK) is the zero-load latency of the Ack packet.
- Th_lat is the threshold latency value.

The zero-load latency is defined as the minimal average latency accumulated by a packet in the network, assuming that the packet does not contend for resources with other packets [15]. Thus, zero-load latency is determined by physical network constraints such as the distance between nodes (hops), link bandwidth, and packet size.
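As a rough illustration of how these quantities could be computed, the sketch below models zero-load latency as a per-hop delay plus the packet serialization time, and combines the three quantities listed for the watchdog limit by simple addition. Both the latency model and the additive combination are assumptions made for this example; the text names the inputs but this sketch should not be read as the authors' formula.

```python
def zero_load_latency(hops, packet_bytes, link_bytes_per_s, per_hop_delay_s):
    """Contention-free latency: per-hop delay plus serialization time.

    Assumed model; the text only says that hops, link bandwidth and
    packet size determine the zero-load latency.
    """
    return hops * per_hop_delay_s + packet_bytes / link_bytes_per_s

def watchdog_limit(l_zl_data, l_zl_ack, th_lat):
    """Hypothetical time limit: zero-load round trip plus tolerated latency."""
    return l_zl_data + l_zl_ack + th_lat

# Example: 4 hops, 1000-byte packet, 1 GB/s links, 100 ns per hop.
data_latency = zero_load_latency(4, 1000, 1e9, 100e-9)
```

Under this reading, the timer expires only after the message and its Ack have had time to complete a contention-free round trip and the tolerated extra latency has also elapsed.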

When the timer reaches the time limit (expiration), metapath configuration is invoked using this value as a parameter. Timer activity is shown in the Watchdog Timer function in Table 1(a).

On the other hand, latency information is registered by the FR-DRB switch and carried along as the message travels from the source to the destination node, as shown in the Traffic Load Monitoring function (pseudocode in Table 1(b)). The time that a message waits in a switch's buffers when it is blocked by other messages is known as contention latency. This is the latency value recorded in the message.

Latency information is evaluated when a message arrives at its destination. If the latency value is lower than a threshold, an Ack message is generated and sent back to the sender node in order to stop the watchdog timer. Otherwise, if the accumulated latency is higher than the threshold, the Ack message is not generated, because the watchdog timer should already have invoked the metapath configuration module. Thus, Ack messages are not injected when the network is near saturation.

Ack messages have higher priority in the routing unit, and their size is less than 1% of the data message, because only a header with the latency value is transported.

The threshold value must be set according to the latency that the user's application can tolerate. For example, the threshold can be set to 50% more than the zero-load latency. This value implies that average link throughput is reduced by 33% with respect to the nominal value. In this case, the path is considered congested. From this point of view, latency works as a saturation index. When latency goes beyond the threshold, the monitoring module assumes that the path's performance is poor and allows FR-DRB to improve it. The goal of recording latency in messages is to identify the network's local traffic at any moment in order to provide routing adaptivity. By using this local information, the effect of other messages (sent by other sources) is taken into account. Consequently, this distributed mechanism achieves a global, collective effect of mutual influences.

Fig. 1. FR-DRB phases: (a) Latency Detection and Notification, (b) Metapath Configuration, (c) MultiStepPath Selection
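The 33% figure quoted for a threshold of 1.5 times the zero-load latency can be checked with a short calculation, assuming that bandwidth is taken as the inverse of latency, as Section 3.3 does for path bandwidth. The symbols $L_{ZL}$ and $B_{ZL}$ (zero-load latency and the corresponding nominal bandwidth) are notation introduced here for the check.

```latex
\begin{align*}
L_{th} &= 1.5\, L_{ZL} \\
\frac{B_{th}}{B_{ZL}} &= \frac{1/L_{th}}{1/L_{ZL}} = \frac{L_{ZL}}{1.5\, L_{ZL}} = \frac{1}{1.5} \approx 0.667 \\
1 - \frac{B_{th}}{B_{ZL}} &\approx 0.333
\end{align*}
```

That is, a path whose latency has grown to the threshold delivers roughly two thirds of its nominal throughput, a reduction of about 33%.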

3.2 Metapath Configuration.

FR-DRB executes the dynamic metapath configuration using the information gathered in the monitoring phase. The objective of this configuration is to determine, for each source-destination pair, the type and size of the metapath according to the message latency or the timer information. This is achieved by selecting intermediate nodes. The INs build a path that is different from the original one. The IN configuration takes into account the latency values at any moment, together with the topological characteristics of the interconnection network. INs are selected according to their distance to the source (or destination) node: INs at 1-hop distance are considered first, then INs at 2-hop distance, and so on. This metapath expansion is performed gradually by including more surrounding neighbors in the metapath configuration. Thus, the traffic load is fairly distributed over the network resources. The metapath configuration phase is shown in Table 2. If the average metapath latency is larger than the threshold value, the metapath size is increased; otherwise, it is decreased.
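The expand-or-shrink decision can be sketched in Python as follows. The aggregate latency uses the inverse-sum form given in Table 2; changing the metapath by one path per step and the bounds `1` and `max_paths` are assumptions for this example, since the text only says that expansion is gradual.

```python
def metapath_latency(msp_latencies):
    """Aggregate latency of paths used in parallel: (sum of 1/L_i)^-1."""
    return 1.0 / sum(1.0 / lat for lat in msp_latencies)

def configure_metapath(msp_latencies, th_lat, num_paths, max_paths):
    """Return the number of active paths after one configuration step.

    Assumptions: one path is added or removed per step, at least one path
    stays active, and max_paths caps the expansion.
    """
    latency = metapath_latency(msp_latencies)
    if latency > th_lat and num_paths < max_paths:
        return num_paths + 1   # congested: open one more alternative path
    if latency < th_lat and num_paths > 1:
        return num_paths - 1   # below threshold: constrict the metapath
    return num_paths
```

For instance, two paths with latency 2.0 each combine to an aggregate latency of 1.0, which reflects the doubled bandwidth available when both are used in parallel.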

3.3 MultiStepPath Selection.

Each time a message is injected into the network, the MultiStepPath Selection module is invoked to distribute the traffic load by selecting one multi-step path. Messages are distributed among the MSPs according to the latency information, so the paths with the lowest latency values receive the largest number of messages.

Given a source node with N alternative paths, let $L_{C_i}$ (i = 1...N) be the latency recorded in path $C_i$ (if no latency has been recorded yet, the zero-load latency is used), and let $B_{C_i}$ be the corresponding bandwidth, calculated as $B_{C_i} = 1/L_{C_i}$. Then the alternative path $C_x$ will be selected for the next injection with probability:

$$P(C_x) = \frac{B_{C_x}}{\sum_{i=1}^{N} B_{C_i}}$$

Paths are selected according to their latency and also their length. If paths are long in hops, the message transmission time could be high enough to degrade performance, so the shortest and least loaded paths are selected. The pseudocode in Table 3 shows the MultiStepPath selection phase.
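A minimal Python sketch of this probabilistic selection is given below. It takes the bandwidth of each path as the inverse of its recorded latency and draws one path index with the resulting weights; the function names and the use of `random.choices` are choices made for this example, not part of the original method description.

```python
import random

def selection_probabilities(latencies):
    """Probability of each path: its bandwidth 1/L over the sum of bandwidths."""
    bandwidths = [1.0 / lat for lat in latencies]
    total = sum(bandwidths)
    return [b / total for b in bandwidths]

def select_path(latencies, rng=random):
    """Draw one path index; lower-latency paths are chosen more often."""
    probs = selection_probabilities(latencies)
    return rng.choices(range(len(latencies)), weights=probs, k=1)[0]

# Example: three paths with recorded latencies 1, 2 and 4 (arbitrary units).
probs = selection_probabilities([1.0, 2.0, 4.0])
```

With these example latencies the first path receives four sevenths of the messages on average, the second two sevenths, and the third one seventh.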

As explained above, when the message is injected into the network, a watchdog timer is started to count the time that the acknowledge message takes to arrive. If the timer exceeds the time limit, it can be deduced that latency is high. Therefore, path selection can be performed before the Ack message arrives, providing a fast response to congestion.

(a)
Watchdog Timer (start, stop: signals): /* FR-DRB Endnode */
  Wait for start signal to arrive;
  Repeat
    Increase timer
    If (timer > T_limit)
      Call Metapath Configuration (Table 2)
      Reset watchdog timer
    If (stop signal arrives)
      Reset watchdog timer
  End repeat

(b)
Traffic Load Monitoring (Msg M, Th_Lat, MSP) /* FR-DRB Switch */
Begin
  1. For each step of message M,
     1.1. Accumulate latency (queue time) to calculate MSP latency.
     1.2. Continue to next intermediate node or to final destination.
  2. When the message arrives at the final destination,
     2.1. If (Latency(MSP) > Th_Lat) do not send acknowledge message
          else Latency(MSP) is sent back to the source node in an acknowledge message.
  3. When the acknowledge message arrives at the source node:
     3.1. Reset watchdog timer (stop signal).
     3.2. Latency(MSP) is delivered to the Metapath Configuration function (MSP, Latency(MSP)).
End Monitoring

Table 1. Traffic Load Monitoring and Timer functions

Metapath Configuration (MSP, Th_Lat);
/* Executed in source nodes each time a Latency(MSP) arrives or a Timer expires */
Variables
  Latencies_MSP: Vector[1..Number_of_MSP] of integer;
  Threshold Th_Lat;
Begin
  1. Receive a Latency or a Timer Limit;
  2. Calculate the Metapath Latency (P*):
     Latency(P*) = ( Σ Latency(MSP)^-1 )^-1
  3. If (Latency(P*) > Th_Lat)
       Increase the number of INs to provide new alternative paths.
     ElseIf (Latency(P*) < Th_Lat)
       Decrease the number of INs to constrict the metapath.
     EndIf
End Metapath Configuration

Table 2. Metapath Configuration Code

3.4 Putting All Components Together.

All the functionality and operations performed by the FR-DRB algorithm are shown in Fig. 2. When a source node injects a message into the interconnection network, a MultiStepPath (MSP) is selected according to the respective latencies of the alternative paths. The path with the lowest latency is selected with the highest probability.

The message is then injected into the network and, concurrently, the watchdog timer is started to measure the message trip time. When the message leaves the source node, it is forwarded to the destination node through intermediate switches. Contention suffered in the switches' buffers (queuing latency) is recorded and stored in the message itself. When the message arrives at its destination, it is delivered to the user. Then, latency information is sent back to the sender in the Ack header only if Recorded Latency < Threshold. Otherwise, monitoring activity ends and the acknowledge message is not generated. Meanwhile, the watchdog timer located at the source node runs while the message is in flight. If the Ack message arrives before the watchdog expires, the latency value is delivered to the metapath configuration module. This module configures the metapath by selecting the alternative paths to be used according to the latency value. However, if the watchdog exceeds the time limit, the FR-DRB algorithm uses the latency threshold to configure the metapath.
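The interplay between the Ack and the watchdog at the source node can be sketched as a small Python class. It tracks a single in-flight message and returns the latency sample that would be handed to metapath configuration: the recorded latency if the Ack arrives first, or a configured expiry value if the timer runs out. The class name, the tick-based timing, and the single-message restriction are simplifications assumed for this example.

```python
class SourceMonitor:
    """Tracks one in-flight message and yields at most one latency sample."""

    def __init__(self, time_limit, expiry_latency):
        self.time_limit = time_limit
        # Value reported on expiry; the text mentions the latency threshold.
        self.expiry_latency = expiry_latency
        self.elapsed = None  # None means the watchdog is idle

    def on_inject(self):
        """Start signal: the message enters the network."""
        self.elapsed = 0.0

    def on_tick(self, dt):
        """Advance the timer; return the expiry sample if the limit is passed."""
        if self.elapsed is None:
            return None
        self.elapsed += dt
        if self.elapsed > self.time_limit:
            self.elapsed = None  # reset after invoking configuration
            return self.expiry_latency
        return None

    def on_ack(self, recorded_latency):
        """Stop signal: return the latency carried by the Ack, if still waiting."""
        if self.elapsed is None:
            return None  # timer already expired or no message in flight
        self.elapsed = None
        return recorded_latency
```

A caller would pass any non-None return value to the metapath configuration step, so the source reacts either to a timely Ack or to the timeout, but never to both for the same message.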

Current switches in HPC systems are not just network cops, since they are endowed with smart capabilities to evaluate and adapt communication load in accordance with network conditions [7]. For instance, InfiniBand (IBA) switches, the most widely used technology in today's HPC clusters [18], provide features for buffer monitoring and multipath selection, as required by the FR-DRB policy. IBA also provides watchdog timers to fulfill congestion control requirements [5].

FR-DRB operations are performed concurrently with packet delivery. As shown in Fig. 2, a message is forwarded without any overhead when the output port is free (thick arrows). Otherwise, latency accumulation is performed only while the messages are waiting in the buffer. Hence, this operation does not delay the send/receive primitives. Also, MultiStepPath selection and metapath configuration are performed concurrently with load injection, so the messages are not delayed either.

Deadlock freedom is ensured by having a separate escape channel for each phase. As we adopt two intermediate nodes, one escape channel is used (if required) from Src to IN1, another from IN1 to IN2, and a third from IN2 to Dst. Hence, each phase defines a virtual network, and packets change virtual network at each intermediate node. Although each virtual network relies on a different escape channel, they all share the same adaptive channel(s). Thus, our current FR-DRB implementation uses four virtual channels.

The use of adaptive routing algorithms can cause out-of-order delivery of packets. If the user application requires in-order packet delivery, FR-DRB reorders packets at the destination node by using the well-known sliding window protocol, as is the case for other routing policies such as [15].
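Receiver-side reordering of this kind can be illustrated with a simplified Python sketch. It buffers packets by sequence number within a fixed window and releases them in order; the class name, the zero-based sequence numbers, and the policy of dropping packets outside the window are assumptions of this example rather than details given in the text.

```python
class ReorderBuffer:
    """Releases packets to the application in sequence-number order."""

    def __init__(self, window):
        self.window = window
        self.next_seq = 0    # next sequence number owed to the application
        self.pending = {}    # out-of-order packets waiting for their turn

    def receive(self, seq, payload):
        """Store one packet; return the payloads now deliverable in order."""
        if seq < self.next_seq or seq >= self.next_seq + self.window:
            return []  # duplicate or outside the window: dropped in this sketch
        self.pending[seq] = payload
        ready = []
        while self.next_seq in self.pending:
            ready.append(self.pending.pop(self.next_seq))
            self.next_seq += 1
        return ready
```

A packet that arrives early over a faster alternative path simply waits in `pending` until the packets before it have been delivered.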

4. FR-DRB PERFORMANCE EVALUATION

In order to assess FR-DRB performance, we analyze how the latency and throughput metrics are improved by the monitoring activity and by the multipath configuration and selection mechanisms. The latency metric represents the time elapsed from the generation of a packet at the source node until it is completely delivered at the destination node. The throughput metric represents the traffic load accepted by the network versus the traffic load offered by the sender nodes. Both metrics give a global, average description of network performance. In addition, network latency maps and latency-over-time charts are provided to evaluate the transient response of the mechanism.

The evaluation methodology is divided into two major parts. The first part analyzes the network response under the hot-spot traffic pattern, to evaluate the transient behavior of FR-DRB and the traffic load distribution in extreme conditions. This pattern establishes some fixed destinations in order to increase the traffic in a particular network area, causing saturated paths. In addition, the remaining network nodes inject uniform load in order to create background traffic over the network.

In the second part, we evaluate the proposed technique using well-known communication patterns: Butterfly, Perfect Shuffle, and Matrix Transpose. These patterns are a collection of benchmarks that describe the conditions commonly created by parallel scientific applications (further description of these patterns is provided in [2] and [14]).

The FR-DRB operations and modules, together with the network components, were modeled [8] using the standard simulation and modeling tool OPNET Modeler [13]. OPNET provides a Discrete Event Simulator (DES) engine. This environment allows the behavior of network components to be defined with a Finite State Machine (FSM) approach, and it supports detailed specification of protocols, applications, and queuing policies. The simulations were conducted for three InfiniBand-like networks using the most popular topologies in HPC systems (mesh, torus, and fat-tree), according to the Top 500 supercomputer list [18]. In all cases, virtual cut-through switching and credit-based flow control were assumed.

Multistep Path Selection ()
/* Executed in the source node each time a message is injected */
Begin
  1. Build Probability Density Function (PDF) of MultiStepPath bandwidths (B_Ci's).
  2. Select MultiStep Path using the PDF.
  3. Inject Message in the network
     3.1 Build a message header.
         3.1.1 Concatenate INs headers.
     3.2 Inject message
     3.3 Start timer
End MultiStepPath Selection

Table 3. FR-DRB MSP Selection Code

In order to carry out a comparative analysis, we have implemented five routing policies. Valiant's routing algorithm [17] is an oblivious routing protocol aimed at achieving full load balancing. This mechanism works in two phases. In the first phase, an intermediate node (IN) is randomly selected and the packet is forwarded to this node. After the IN is reached, packets are sent to the destination following the dimension order routing (DOR) approach [2]. We have also implemented the Turn model [12], an adaptive method that allows several possible minimal paths between source and destination. At each switch, this policy tries to forward packets through any free (or less loaded) output link among those belonging to a minimal path. Thus, local adaptivity is provided. Finally, in order to evaluate the FR-DRB response time and the impact of timer expiration, we have set the watchdog time limit to two different values: a fixed value related to the saturation point, and infinity, which implies no watchdog expiration (as in the former DRB method [7]).
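For readers unfamiliar with the baseline policies, the two-phase Valiant scheme can be sketched on a 2D mesh as follows. The sketch routes X first and then Y for dimension-order routing, and it assumes that the first phase (source to intermediate node) also uses DOR, which the description above leaves open; function names are illustrative.

```python
import random

def dor_path(src, dst):
    """Dimension-order route on a 2D mesh: correct X first, then Y."""
    x, y = src
    path = [src]
    while x != dst[0]:
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    while y != dst[1]:
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path

def valiant_path(src, dst, width, height, rng=random):
    """Two-phase route: to a random intermediate node, then on to dst.

    Assumption: both phases use DOR; only the second is stated in the text.
    """
    inter = (rng.randrange(width), rng.randrange(height))
    first = dor_path(src, inter)
    second = dor_path(inter, dst)
    return first + second[1:]  # drop the repeated intermediate node
```

Because the intermediate node is drawn uniformly from the whole mesh, the resulting route is on average considerably longer than the minimal one, which is the latency cost of this form of load balancing.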

4.1 Hotspot Analysis.

Latency and throughput results obtained for the 1024-node mesh network are presented in Fig. 3(a) and (b). FR-DRB shows the same behavior as the other routing policies at low loads, and consequently it does not overload the network. However, at higher loads FR-DRB improves throughput by 94% and reduces latency by 96% relative to DOR. The improvement stems from the fact that FR-DRB is a method with a fast response time and low overhead. The FR-DRB mechanism starts as soon as the watchdog timer surpasses the time limit, without waiting for the Ack message, which may arrive much later, as is the case for DRB. Performance improvements are larger at higher loads. This implies that FR-DRB distributes the traffic load better. Hence, independently of the original spatial distribution, the load that each switch perceives is similar, and the latency experienced by messages is uniform.

In addition, Fig. 3(c) shows the network latency surface under a quadruple hot-spot pattern. This pattern is designed to analyze network performance under heavy load. In this case, a deterministic routing algorithm (DOR) was used. As DOR does not perform any load balancing, Fig. 3(c) is useful to see the impact of hot-spots in the network, because this is the worst congestion case. We show the average contention latency by means of the latency surface, in which each grid point (xy coordinates) represents the average latency in the buffers of network switches (Figs. 3(d), 3(e), 3(f), and 3(g)). The effective load distribution of each algorithm is shown by the contour lines projected at the base of the charts. The latency reduction achieved by FR-DRB is 99% with respect to DOR. Fig. 3(d) shows that the Valiant algorithm distributes the traffic load better than the Turn model (Fig. 3(e)). However, its average message latency is worse because path length is doubled (on average). Thus, the Valiant algorithm achieves a suitable load distribution at the expense of a latency rise. The local adaptivity of the Turn model improves latency behavior, but it lacks suitable load balancing. FR-DRB outperforms the algorithms mentioned above because it provides a global load distribution while minimizing message latency (Fig. 3(g)). Therefore, network nodes using FR-DRB can adapt themselves to the network condition, avoiding hot-spots by using the free available bandwidth.

Finally, Fig. 3(h) shows the latency experienced by messages over time. The hot-spot lasts from 2 to 2.4 seconds. With deterministic routing, the latency peak reaches approximately 45 ms. FR-DRB reduces this peak almost 7 times thanks to the multipath selection feature and the watchdog timer module, both described in Section 3.

Fig. 2. FR-DRB algorithm: Monitoring, Metapath Configuration and MSP Selection

Fig. 3. Performance results for Hot-spot pattern in the mesh network, panels (a) to (h)

4.2 Benchmark Traffic Analysis.

We present the charts of latency reduction and throughput improvement for the three traffic patterns defined above (Butterfly, Perfect Shuffle, and Matrix Transpose). Fig. 4(a) and (b) show performance improvements in a 1024-node torus network for the Valiant, Turn model, DRB, and FR-DRB policies. Results are presented as percentages [%], all relative to deterministic routing (DOR).

We also present, in Fig. 5(a) and (b), the performance improvements achieved in a 64-node network arranged in a fat-tree topology (4-ary 2-tree), which is widely used in today's datacenters. In such a topology, routing algorithms differ from those used in tori. Routing in fat-trees is composed of two phases: an adaptive upward phase and a deterministic downward phase. As adaptive routing is used in the ascending phase, several output ports are possible at each switch, and the final choice depends on the selection function. The impact of the selection function on performance has been previously studied in [4]. We have implemented the First Free (FF) and Cyclic Priority (CP) selection functions for a comparative analysis. The FF selection function selects the first free physical link, and the CP selection function uses a round-robin algorithm to choose a different physical link each time a packet is forwarded.
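The two selection functions can be sketched in a few lines of Python. Each takes a list of booleans describing which upward links are busy; the representation of link state and the choice to skip busy links in the round-robin variant are assumptions made for this illustration.

```python
def first_free(link_busy):
    """First Free: index of the first free upward link, or None if all busy."""
    for port, busy in enumerate(link_busy):
        if not busy:
            return port
    return None

class CyclicPriority:
    """Cyclic Priority: round-robin, starting after the last chosen port."""

    def __init__(self, num_ports):
        self.num_ports = num_ports
        self.last = -1  # no port chosen yet

    def select(self, link_busy):
        """Return the next free port in cyclic order, or None if all busy."""
        for offset in range(1, self.num_ports + 1):
            port = (self.last + offset) % self.num_ports
            if not link_busy[port]:
                self.last = port
                return port
        return None
```

First Free concentrates traffic on low-numbered links whenever they are available, while Cyclic Priority spreads successive packets over different links, which is why the choice of selection function affects performance.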

Fig. 4. Performance results obtained for persistent patterns in a Torus topology network, panels (a) and (b)

Fig. 5. Performance results for persistent patterns in a Fat-Tree topology network, panels (a) and (b)



Latency reductions and throughput improvements are also presented as percentages [%]. These results are relative to deterministic routing, in which packets are delivered through the same statically assigned path and no adaptivity or randomization is provided. Experiments show that FR-DRB achieves lower latencies (up to 80%) and higher throughput (up to 100%) than the other methods in both topologies. When load increases, latency improvements also increase. This gain allows networks using the FR-DRB mechanism to carry heavier communication load; alternatively, in cost-bounded systems, our policy allows fewer network resources to be used for a given communication load, because those resources are handled more efficiently. The latency improvements shown by FR-DRB are due to the monitoring strategy and to the multipath configuration and selection mechanism described in this paper.

5. CONCLUSIONS

In this paper, we proposed the Fast-Response Dynamic Routing Balancing policy to deal with congestion in high-speed interconnection networks. FR-DRB controls the performance degradation produced by packet contention for network resources. Congestion control is accomplished by distributing the communication load over several alternative paths. FR-DRB monitors latency on the path connecting source and destination nodes. When the latency value increases significantly, source nodes start sending messages concurrently through new, different, and less loaded alternative paths. As all source nodes are aware of the latency state, a global latency reduction is achieved, as shown in the experiments. Source nodes are also provided with a watchdog timer that leads to a faster response time when the network is congested. Furthermore, the watchdog limits acknowledgment message generation, which reduces the overhead when the network is near the saturation point.
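The source-side behavior summarized above can be sketched as a small controller. This is only an illustration under stated assumptions: the class name `SourcePathController`, the two latency thresholds, the one-path-at-a-time step, the path limit, and the contraction rule are hypothetical choices and are not taken from the FR-DRB specification.

```python
from dataclasses import dataclass


@dataclass
class SourcePathController:
    """Hypothetical per-destination state kept at a source node."""
    high_threshold: float     # latency above this opens one more path (assumed)
    low_threshold: float      # latency below this closes one path (assumed)
    watchdog_timeout: float   # time without an ACK before assuming congestion
    max_paths: int = 4        # assumed upper bound on alternative paths
    active_paths: int = 1     # start with the single original path
    last_ack_time: float = 0.0

    def on_ack(self, now: float, reported_latency: float) -> None:
        """Adjust the path count from the latency reported by an ACK."""
        self.last_ack_time = now
        if reported_latency > self.high_threshold:
            self._expand()
        elif reported_latency < self.low_threshold:
            self._contract()

    def on_tick(self, now: float) -> None:
        """Watchdog check: a long silence is treated as congestion."""
        if now - self.last_ack_time > self.watchdog_timeout:
            self._expand()
            self.last_ack_time = now  # re-arm the watchdog

    def _expand(self) -> None:
        self.active_paths = min(self.max_paths, self.active_paths + 1)

    def _contract(self) -> None:
        self.active_paths = max(1, self.active_paths - 1)


# Hypothetical trace in arbitrary time and latency units.
ctrl = SourcePathController(high_threshold=10.0, low_threshold=2.0,
                            watchdog_timeout=5.0)
trace = []
ctrl.on_ack(now=1.0, reported_latency=12.0)   # high latency: expand
trace.append(ctrl.active_paths)
ctrl.on_tick(now=3.0)                         # watchdog not expired
trace.append(ctrl.active_paths)
ctrl.on_tick(now=7.0)                         # no ACK for 6 units: expand
trace.append(ctrl.active_paths)
ctrl.on_ack(now=8.0, reported_latency=1.0)    # low latency: contract
trace.append(ctrl.active_paths)
```

The watchdog branch lets the source react without waiting for an acknowledgment, which is the fast-response aspect described in the text. Time is passed in explicitly so the sketch stays deterministic.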

FR-DRB has been developed to fulfill the design objectives for parallel-computer interconnection networks. These objectives are all-to-all connection and low latency between any pair of nodes for any communication load in the network. Experiments performed to validate the FR-DRB policy have revealed substantial improvements in latency and throughput, and congestion is reduced, allowing the use of the network at higher loads. Therefore, FR-DRB is useful for persistent and bursty communication patterns, which are those that can produce the worst hot-spot situations.

REFERENCES

[1] Baydal, E. "A Family of Mechanisms for Congestion Control in Wormhole Networks," IEEE TPDS, vol. 16, pp. 772-784, 2005.

[2] Duato, J., Yalamanchili, S., Ni, L. Interconnection Networks: An Engineering Approach. Morgan Kaufmann, 2002.

[3] Garcia, P.J., et al. "RECN-DD: A Memory-Efficient Congestion Management Technique for Advanced Switching," in ICPP, pp. 23-32, 2006.

[4] Gilabert, F., Gómez, M., et al. "On the Influence of the Selection Function on the Performance of Fat-Trees," at Euro-Par, vol. 4128, pp. 864-873, 2006.

[5] IBTA, InfiniBand Architecture Specification, Volume 1, Release 1.2.1, http://www.infinibandta.org/specs/.

[6] Jain, R. "Congestion control in computer networks: issues and trends," IEEE Network, vol. 4, no. 3, pp. 24-30, 1990.

[7] Lugones, D., Franco, D., and Luque, E. "Dynamic Routing Balancing On InfiniBand Networks," Journal of Comp. Sci. & Tech. (JCS&T), vol. 8, no. 2, pp. 104-110, 2008.

[8] Lugones, D., Franco, D., and Luque, E. "Modeling Adaptive Routing Protocols in High Speed Interconnection Networks," at OPNETWORK 2008, Washington, USA, 2008. Available at: https://aomail.uab.es/~dlugones/opnet.html

[9] Lugones, D., Franco, D., and Luque, E. "Dynamic and Distributed Multipath Routing Policy for High-Speed Cluster Networks," in 9th IEEE/ACM International Symposium on Cluster Computing and the Grid (CCGRID '09), pp. 396-403, 18-21 May 2009.

[10] Maquelin, O., et al. "Polling watchdog: Combining polling and interrupts for efficient message handling," at ISCA, pp. 179-188, 1996.

[11] Mo, J. and Walrand, J. "Fair end-to-end window-based congestion control," IEEE/ACM Trans. Netw., pp. 556-567, 2000.

[12] Ni, L., Glass, C. "The Turn model for Adaptive Routing," in ISCA, pp. 278-287, 1992.

[13] OPNET Technologies, Opnet Modeler Accelerating Network R&D, June 2008, http://opnet.com.

[14] Petrini, F., Hoisie, A., Feng, W., Graham, R. "Performance evaluation of the Quadrics interconnection network," at IPDPS, pp. 1698-1706, Apr. 2001.

[15] Y. Shihang, G. Min, I. Awan, "An Enhanced Congestion Control Mechanism in InfiniBand Networks for High Performance Computing Systems," at AINA, IEEE Computer Society, vol. 1, pp. 845-850, 2006.

[16] Singh, A., Dally, W., Towles, B., Gupta, A.K. "Globally Adaptive Load-Balanced Routing on Tori," IEEE Comp. Arch. Letters, vol. 3, pp. 69, 2004.

[17] Valiant, L.G., Brebner, G.J. "Universal Schemes for Parallel Communication," ACM STOC, Milwaukee, pp. 263-277, 1981.

[18] A. Vishnu, M. Koop, A. Moody, A. Mamidala, S. Narravula, D. Panda, "Hot-Spot Avoidance With MultiPathing Over InfiniBand: An MPI Perspective," in Proceedings of the CCGRID, IEEE Computer Society, pp. 479-486, 2007.

[19] Top500 Supercomputers Site, Interconnect Family share for 11/2008, Nov. 2008, http://www.top500.org.
