7/29/2019 Fast-Response Dynamic Routing
Fast-Response Dynamic Routing Balancing for High-Speed
Interconnection Networks*
D. Lugones, D. Franco, and E. Luque.
Department of Computer Architecture and Operating Systems.
Universitat Autònoma of Barcelona, [email protected]
{daniel.franco, emilio.luque}@uab.es.
* This work was funded by the MEC-Spain under contract TIN2007-64974
Abstract: Communication requirements in High Performance Computing systems demand the use of high-speed interconnection networks to connect processing nodes. However, when communication load is unevenly distributed across the network resources, message congestion appears. Congestion spreading increases latency and reduces network throughput, causing significant performance degradation. Fast-Response Dynamic Routing Balancing (FR-DRB) is a method developed to distribute communication load uniformly over the interconnection network. FR-DRB distributes message traffic through a gradual, load-controlled path expansion. The method monitors network message latency and decides how many alternative paths to use for message delivery between each source-destination pair. FR-DRB performance has been compared with other routing policies under a representative set of traffic patterns commonly created by parallel scientific applications. Experimental results show a significant improvement in latency and throughput.
1. INTRODUCTION.
Interconnection networks play a principal role in today's High Performance Computing (HPC) systems, which are a very important platform for solving scientific problems that require ever-larger computational speed. Hence, efficient interconnection network design becomes critical to conceive more powerful techniques for delivering messages as fast as possible. As HPC system size increases, the interconnection network becomes a bottleneck. Nowadays, network cost and power consumption are much higher than those of the processors [1]. To address this issue, the number of network components is reduced. However, this reduction pushes network throughput near the saturation point, because the network must fulfill the same communication requirements using fewer resources (switches and links). When communication load is unevenly distributed across the network, some resources may be idle while others are heavily congested (hot-spots). If congestion is not efficiently controlled, those resources may reach saturation. As a consequence, message latency rises considerably and global system performance is degraded. This situation is even worse in lossless networks, because the flow control mechanism quickly propagates congestion to the whole network [3].
Therefore, current design trends demand efficient congestion control techniques that improve network throughput using a suitable amount of resources and reduce the congestion caused by adverse traffic [1]. This can be achieved by adaptive routing techniques, which dynamically manage existing network resources to reduce congestion.
2. RELATED WORK.
Typically, adaptive congestion control mechanisms perform three basic tasks: network traffic monitoring, congestion detection, and congestion control. In traffic monitoring, parameters such as point-to-point message latency [9], buffer occupancy level [3], or link speed-down (also called backpressure) [1] are evaluated in order to detect and notify the onset of congestion. After the notification is received, network end nodes or switches perform some action to avoid performance degradation.
Message Throttling (MT) is probably the most popular control action because of its low cost and easy implementation. MT stops (or reduces) injection until the packets belonging to the congested area are delivered to their destinations. Message throttling is used to keep buffer occupancy in switches bounded. However, latency still increases dramatically, because packets must wait at the source nodes until congestion disappears, so performance is degraded.

Other congestion control approaches are based on buffer management in switch ports [3]. In these cases, packet flows are locally reallocated at switches to avoid contention. However, good performance is not achieved, because congestion sources are not controlled and local reallocation is not enough to reduce the traffic demand on the oversubscribed switch.

Finally, congestion control techniques based on adaptive
routing algorithms modify their behavior according to the traffic condition to avoid congestion. Such policies handle congestion by sending messages from source to destination through alternative paths. Thus, the congested area is avoided and message injection is maintained. Therefore, global system performance is improved, because traffic load is evenly distributed over the network resources. Some examples are HSAM [18], RECN-DD [3], PIPD [15], DRB [7], [9], GOAL [16], and other methods presented in [2], [12], [15], and [4].

978-1-4244-5012-1/09/$25.00 ©2009 IEEE

Some disadvantages of adaptive routing mechanisms are the overhead of monitoring information, the cost of path changes, and the need to guarantee both deadlock freedom [2] and in-order packet delivery.
As mentioned above, the routing algorithm analyzes congestion information in order to perform some corrective action. In this case, information about the past is used to decide the immediate future behavior of the routing algorithm. Hence, fast monitoring and notification are mandatory to provide the routing algorithm with up-to-date congestion information. It is also important for the algorithm to be robust with respect to the information it uses (i.e., the algorithm should make appropriate decisions even when monitoring information is not very accurate). This raises a tradeoff: better decisions require more information from the system, but more information means more traffic overhead. Therefore, the amount of information needed must be balanced against the overhead required to gather and process it. Consequently, an efficient routing algorithm has to extract the smartest behavior from the information it has, and it must also provide a fast response time (i.e., it must be able to detect critical situations rapidly).
In this paper, we present the Fast-Response Dynamic Routing Balancing algorithm (FR-DRB), a new routing policy that uses several alternative paths simultaneously to increase the effective bandwidth available for message delivery between source-destination pairs. Our proposal prevents network congestion and fulfills the features mentioned above. In FR-DRB, we apply the concept of communication load balancing to distribute traffic load uniformly over the network resources. Distribution is accomplished by a dynamic path expansion, which is controlled according to the congestion level of each source-destination path. The monitoring phase is achieved by measuring the total latency registered by the messages along their path. The notification phase is accomplished by acknowledge messages (Acks), which are generated according to the congestion level. To address the tradeoff between monitoring overhead and response speed, FR-DRB generates Acks only when network traffic is low: the destination node sends the Ack only if message latency does not exceed a threshold latency value. Meanwhile, the source node monitors the time that the user message is delayed in the network by using a watchdog timer. When the watchdog times out, FR-DRB immediately expands the source-destination paths in order to obtain greater bandwidth, and no Ack is generated at the destination node. Thus, FR-DRB eliminates monitoring overhead when the network is working near saturation.
The idea of using a watchdog timer is common to several systems and contexts. In [9], the watchdog is used in combination with polling in the communication processor that delivers messages to the receiver thread. Furthermore, a description of some congestion control techniques using time windows is presented in [6] and [11].
FR-DRB is based on DRB [7], a former algorithm aimed at providing load balancing in current technologies. However, FR-DRB extends its functionality while considering important design goals not included in the former version: fast response to congestion, robustness, and reduced notification overhead. In addition, FR-DRB is in line with the approaches used in current commercial interconnects (e.g., InfiniBand), unlike the proposal presented in [9], which imposes additional requirements on the network components (i.e., local adaptivity and acknowledge generation in switches).
The rest of this paper is organized as follows. Section 3 presents a complete description of the FR-DRB policy. Section 4 shows the performance evaluation conducted to compare FR-DRB with other routing methods and to measure its response time. Finally, Section 5 presents conclusions.
3. FAST-RESPONSE DYNAMIC ROUTING BALANCING
FR-DRB defines the metapath as the set of possible alternative paths between each source-destination pair. Metapath configuration defines how to create the alternative paths used to expand single paths, and when to use them according to the congestion level.
Congestion detection is accomplished by watchdog timers and Ack messages. If a timer exceeds its limit value (timer expiration), the metapath is configured and new alternative paths are selected. Hence, the effective bandwidth available between source-destination pairs is increased when the network is congested. In addition, the latency undergone by each message is recorded by the message itself. If this latency is lower than the threshold, it is sent back to the sender node in an acknowledge message (Ack), which stops the timer and provides the sender with latency information. Otherwise, for higher network latency values, the watchdog timer at the sender reaches its time limit, indicating that latency is high. In this case, the acknowledge message is not generated.

Each alternative path in the metapath is created by using two intermediate nodes (INs), which are surrounding neighbors of the source and destination nodes respectively. These INs act as message scattering and gathering areas for the source and destination nodes. FR-DRB selects INs for each source-destination pair in the user application. A three-step path (Multi-Step Path, MSP) is then built by selecting two INs: IN1, a neighbor of the source node, and IN2, a neighbor of the destination node. Thus, the alternative paths created by FR-DRB are built around the original path, and latency information is used to decide the number of alternative paths over which messages will be distributed.

The basic phases of FR-DRB are shown in Fig. 1. Fig. 1(a), detection and notification: congestion is detected according to packet latency and the buffer occupancy state in switches. For a non-congested path, notification is achieved by Ack packets; otherwise, watchdog timer expiration notifies the source nodes about congestion. Fig. 1(b), metapath configuration: a set of surrounding nodes is provided for each source and each destination node. Fig. 1(c) shows an example of a metapath: a set of multi-step paths (MSPs) defined by a set of intermediate node pairs.

Next, a detailed description of the three algorithm components (Monitoring Activity, Dynamic Metapath Configuration, and MultiStepPath Selection) is provided.
3.1 Monitoring Activity.
Traffic load monitoring is performed at two different network elements: the sender node and the intermediate switches. At the sender node, a watchdog timer registers the time that the user message spends traveling to the destination node plus the return time of the Ack message. Meanwhile, message latency is accumulated at intermediate switches. The watchdog timer is started (start signal) when the message is injected into the network, and it is stopped (stop signal) when the Ack arrives at the source or when the timer exceeds a specified time limit. This limit is calculated according to:

$$T_{limit} = L_{ZL}(DATA) + Th\_lat + L_{ZL}(ACK)$$

Where:
- L_ZL(DATA) is the zero-load latency of the data packet.
- L_ZL(ACK) is the zero-load latency of the Ack packet.
- Th_lat is the threshold latency value.
Zero-load latency is defined as the minimum average latency accumulated by a packet in the network, assuming that the packet does not contend for resources with other packets [15]. Thus, zero-load latency is determined by physical network constraints such as the distance between nodes (hops), link bandwidth, and packet size.
When the timer reaches the time limit (expiration), metapath configuration is invoked using this value as a parameter. Timer activity is shown in the Watchdog Timer function in Table 1(a).
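To make the time limit concrete, the sketch below computes it from zero-load latencies. It is a minimal illustration under stated assumptions, not the authors' implementation: the zero-load model (a fixed per-hop switch delay plus one serialization of the packet, as in virtual cut-through), the names `LinkModel`, `per_hop_delay`, `bandwidth`, and the example values are hypothetical, and the limit is assumed to be the sum of the two zero-load latencies and the threshold, the reading implied by the three terms defined above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LinkModel:
    """Hypothetical link/switch parameters; units only need to be consistent."""
    per_hop_delay: float  # fixed switch traversal time per hop
    bandwidth: float      # link bandwidth (bits per time unit)


def zero_load_latency(hops: int, packet_bits: int, link: LinkModel) -> float:
    """Contention-free latency: per-hop switch delay plus one serialization
    of the packet (virtual cut-through pipelines the body across hops)."""
    return hops * link.per_hop_delay + packet_bits / link.bandwidth


def watchdog_limit(hops: int, data_bits: int, ack_bits: int,
                   th_lat: float, link: LinkModel) -> float:
    """T_limit = L_ZL(DATA) + Th_lat + L_ZL(ACK) (assumed additive form).

    The Ack is assumed to return over a path with the same hop count.
    """
    l_data = zero_load_latency(hops, data_bits, link)
    l_ack = zero_load_latency(hops, ack_bits, link)
    return l_data + th_lat + l_ack


if __name__ == "__main__":
    link = LinkModel(per_hop_delay=0.1, bandwidth=8000.0)  # e.g. microseconds
    print(watchdog_limit(hops=4, data_bits=16384, ack_bits=128,
                         th_lat=1.0, link=link))
```

With these example numbers the data packet dominates the limit (about 2.45 of roughly 3.86 time units), while the small Ack contributes little, which is consistent with Acks being less than 1% of the data message size.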
On the other hand, latency information is registered by the FR-DRB switch and transported with the message as it travels from source to destination node, as shown in the Traffic Load Monitoring function (pseudocode in Table 1(b)). The time that a message waits in switch buffers when it is blocked by other messages is known as contention latency; this is the latency value recorded in the message.

Latency information is evaluated when a message arrives at its destination. If the latency value is lower than the threshold, an Ack message is generated and sent back to the sender node in order to stop the watchdog timer. Otherwise, if the accumulated latency is higher than the threshold, no Ack message is generated, because the watchdog timer should already have invoked the metapath configuration module. Thus, Ack messages are not injected when the network is near saturation.
Ack messages have higher priority in the routing unit, and their size is less than 1% of the data message, because they only transport a header with the latency value. The threshold value must be set according to the latency that the user application can tolerate. For example, the threshold can be set to 50% above the zero-load latency. This value implies that average link throughput is reduced by 33% with respect to the nominal value, and in this case the path is considered congested. From this point of view, latency works as a saturation index. When latency goes beyond the threshold, the monitoring module assumes that path performance is poor and lets FR-DRB improve it.

The goal of recording latency in messages is to identify the local network traffic at any moment in order to provide routing adaptivity. By using this local information, the effect of other messages (sent by other sources) is taken into account. Consequently, this distributed mechanism achieves a global, collective effect of mutual influences.

Fig. 1. FR-DRB phases: (a) Latency Detection and Notification, (b) Metapath Configuration, (c) MultiStepPath Selection.
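The 33% figure in the threshold example follows if, as the selection rule in Section 3.3 also assumes (B = 1/L), the throughput a path delivers is taken to be inversely proportional to the latency its messages experience. This is a worked derivation under that assumption, not an additional result:

```latex
\[
L_{th} = 1.5\,L_{ZL}
\;\Longrightarrow\;
\frac{B_{th}}{B_{ZL}} = \frac{1/L_{th}}{1/L_{ZL}} = \frac{L_{ZL}}{1.5\,L_{ZL}} = \frac{2}{3},
\qquad
1 - \frac{2}{3} = \frac{1}{3} \approx 33\%.
\]
```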
3.2 Metapath Configuration.
FR-DRB executes the dynamic metapath configuration using the information gathered in the monitoring phase. The objective of this configuration is to determine, for each source-destination pair, the type and size of the metapath according to the message latency or the timer information. This is achieved by selecting intermediate nodes. INs build a path that is different from the original one. The IN configuration takes into account the current latency values together with the topological characteristics of the interconnection network. INs are selected according to their distance to the source (or destination) node: INs at 1-hop distance are considered first, then INs at 2-hop distance, and so on. This metapath expansion is performed gradually by including more surrounding neighbors in the metapath configuration. Thus, traffic load is evenly distributed over the network resources. The metapath configuration phase is shown in Table 2. If the average metapath latency is larger than the threshold value, the metapath size is increased; otherwise, it is decreased.
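The gradual, ring-by-ring expansion described above can be sketched for a 2-D mesh as follows. This is a hypothetical illustration, not the FR-DRB code: the `Metapath` class, the `radius` that grows or shrinks by one hop per update, and the cap `max_radius` are assumptions; only the combination of MSP latencies (Latency(P*) in Table 2) and the grow-above / shrink-below threshold rule come from the text.

```python
from itertools import product
from typing import List, Tuple

Node = Tuple[int, int]


def ring(center: Node, k: int, width: int, height: int) -> List[Node]:
    """Mesh nodes exactly k hops (Manhattan distance) from center."""
    cx, cy = center
    nodes = []
    for dx in range(-k, k + 1):
        rest = k - abs(dx)
        for dy in sorted({-rest, rest}):
            x, y = cx + dx, cy + dy
            if 0 <= x < width and 0 <= y < height:
                nodes.append((x, y))
    return nodes


def metapath_latency(msp_latencies: List[float]) -> float:
    """Latency(P*) = (sum_i Latency(MSP_i)^-1)^-1, as in Table 2."""
    return 1.0 / sum(1.0 / lat for lat in msp_latencies)


class Metapath:
    """Hypothetical metapath state for one source-destination pair."""

    def __init__(self, src: Node, dst: Node, width: int, height: int,
                 max_radius: int) -> None:
        self.src, self.dst = src, dst
        self.width, self.height = width, height
        self.max_radius = max_radius
        self.radius = 0  # 0 means only the original path is used

    def _neighborhood(self, center: Node) -> List[Node]:
        # 1-hop INs are included before 2-hop INs, and so on
        return [n for k in range(self.radius + 1)
                for n in ring(center, k, self.width, self.height)]

    def in_pairs(self) -> List[Tuple[Node, Node]]:
        """(IN1, IN2) pairs; each defines an MSP src -> IN1 -> IN2 -> dst.
        The pair (src, dst) degenerates to the original path."""
        return list(product(self._neighborhood(self.src),
                            self._neighborhood(self.dst)))

    def update(self, latency: float, th_lat: float) -> None:
        """Grow the metapath above the threshold, shrink it below."""
        if latency > th_lat and self.radius < self.max_radius:
            self.radius += 1
        elif latency < th_lat and self.radius > 0:
            self.radius -= 1
```

In a 4 × 4 mesh with opposite corners as source and destination, one expansion step takes the metapath from the single original path to 3 × 3 = 9 candidate MSPs, and a later low-latency report shrinks it back.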
3.3 MultiStepPath Selection.
Each time a message is injected into the network, the MultiStepPath Selection module is invoked to distribute traffic load by selecting one multi-step path. Consequently, messages are distributed among the MSPs in inverse proportion to their recorded latency, so the paths with the lowest latency values receive the largest number of messages.
Given a source node with N alternative paths, let L_Ci (i = 1...N) be the latency recorded in path C_i (if no latency has been recorded yet, the zero-load latency is used), and let B_Ci be the corresponding bandwidth, calculated as B_Ci = 1/L_Ci. Then the alternative path C_x will be selected for the next injection with probability:

$$U(C_x) = \frac{B_{C_x}}{\sum_{i=1}^{N} B_{C_i}}$$
Paths are selected according to both their latency and their length. If paths are long in hops, message transmission time can be high enough to degrade performance, so the shortest and least loaded paths are selected. The pseudocode in Table 3 shows the MultiStepPath Selection phase.
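A minimal sketch of the selection rule, assuming paths are identified by arbitrary hashable keys and that each source keeps the last recorded latency per MSP (None until the first Ack arrives, in which case the zero-load latency is used, as the text specifies). The function names are hypothetical; the weighted draw uses Python's standard `random.Random.choices`.

```python
import random
from typing import Dict, Hashable, Mapping, Optional


def selection_probabilities(latency: Mapping[Hashable, float]) -> Dict[Hashable, float]:
    """U(C_x) = B_Cx / sum_i B_Ci, with B_Ci = 1 / L_Ci."""
    bandwidth = {path: 1.0 / lat for path, lat in latency.items()}
    total = sum(bandwidth.values())
    return {path: b / total for path, b in bandwidth.items()}


def select_msp(recorded: Mapping[Hashable, Optional[float]],
               zero_load: Mapping[Hashable, float],
               rng: Optional[random.Random] = None) -> Hashable:
    """Pick the MSP used for the next injection."""
    rng = rng or random.Random()
    # Fall back to zero-load latency for paths with no recorded latency yet
    effective = {p: (lat if lat is not None else zero_load[p])
                 for p, lat in recorded.items()}
    probs = selection_probabilities(effective)
    paths = list(probs)
    return rng.choices(paths, weights=[probs[p] for p in paths], k=1)[0]
```

For example, with recorded latencies of 1 and 3 time units, the two paths receive 75% and 25% of the injections. Because zero-load latency grows with hop count, a longer MSP without contention still receives a smaller share than a shorter one, which matches the preference for short, lightly loaded paths.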
As explained above, when the message is injected into the network, a watchdog timer is started to count the time that the acknowledge message takes to arrive. If the timer exceeds a time limit, it can be deduced that latency is high. Therefore, path selection can be performed before the Ack message arrives, providing a fast response to congestion.

Metapath Configuration (MSP, Th_Lat)
/* Executed in source nodes each time a Latency(MSP) arrives or a timer expires */
Variables
    Latencies_MSP: Vector[1..Number_of_MSP] of integer;
    Threshold Th_Lat;
Begin
  1. Receive a Latency or a Timer Limit;
  2. Calculate the Metapath Latency (P*):
         Latency(P*) = ( Σ_i Latency(MSP_i)^-1 )^-1
  3. If (Latency(P*) > Th_Lat)
         Increase the number of INs to provide new alternative paths;
     ElseIf (Latency(P*) < Th_Lat)
         Decrease the number of INs to constrict the metapath;
     EndIf
End Metapath Configuration

Table 2. Metapath Configuration Code

(a) Watchdog Timer (start, stop: signals)    /* FR-DRB end node */
Wait for start signal to arrive;
Repeat
    Increase timer;
    If (timer > T_limit)
        Call Metapath Configuration (Table 2);
        Reset watchdog timer;
    If (stop signal arrives)
        Reset watchdog timer;
End Repeat

(b) Traffic Load Monitoring (Msg M, Th_Lat, MSP)    /* FR-DRB switch */
Begin
  1. For each step of message M:
     1.1. Accumulate latency (queue time) to calculate the MSP latency;
     1.2. Continue to the next intermediate node or to the final destination.
  2. When the message arrives at its final destination:
     2.1. If (Latency(MSP) > Th_Lat)
              Do not send an acknowledge message;
          Else
              Send Latency(MSP) back to the source node in an acknowledge message.
  3. When the acknowledge message arrives at the source node:
     3.1. Reset the watchdog timer (stop signal);
     3.2. Deliver Latency(MSP) to the Metapath Configuration function (MSP, Latency(MSP)).
End Monitoring

Table 1. Traffic Load Monitoring and Timer functions
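The interplay between Table 1(a) and Table 1(b) can be sketched as an event-driven model. This is an illustration under assumptions, not the authors' simulator: time is passed in explicitly instead of a free-running Repeat loop, an Ack arriving after its timer has expired is ignored, and expiry is reported to the configuration callback as `None` so that the callback decides how to expand the metapath (the text says the threshold is used in that case). All names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple


@dataclass
class Message:
    msg_id: int
    msp: str
    latency: float = 0.0  # contention latency accumulated in switch buffers


def switch_forward(msg: Message, queue_time: float) -> None:
    """Table 1(b), step 1.1: add the time spent queued at this switch."""
    msg.latency += queue_time


def destination_ack(msg: Message, th_lat: float) -> Optional[float]:
    """Table 1(b), step 2.1: latency to carry in an Ack, or None (no Ack)."""
    return None if msg.latency > th_lat else msg.latency


class WatchdogSource:
    """Table 1(a) plus Table 1(b) step 3, one timer per in-flight message."""

    def __init__(self, t_limit: float,
                 configure: Callable[[str, Optional[float]], None]) -> None:
        self.t_limit = t_limit
        self.configure = configure  # stands in for Metapath Configuration
        self.pending: Dict[int, Tuple[str, float]] = {}

    def on_inject(self, msg: Message, now: float) -> None:
        self.pending[msg.msg_id] = (msg.msp, now)  # start signal

    def on_ack(self, msg_id: int, latency: float) -> None:
        entry = self.pending.pop(msg_id, None)  # stop signal
        if entry is not None:  # ignore Acks for already-expired timers
            self.configure(entry[0], latency)

    def on_tick(self, now: float) -> None:
        expired = [mid for mid, (_, t0) in self.pending.items()
                   if now - t0 > self.t_limit]
        for mid in expired:
            msp, _ = self.pending.pop(mid)
            self.configure(msp, None)  # None signals timer expiration
```

A message whose queuing delay stays under the threshold reports its latency through the Ack path, while a heavily delayed message produces no Ack and is reported only when its timer expires.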
3.4 Putting All Components Together.
All the functionality and operations performed by the FR-DRB algorithm are shown in Fig. 2. When a source node injects a message into the interconnection network, a MultiStepPath (MSP) is selected according to the respective latencies of the alternative paths; the path with the lowest latency is selected with the highest probability.
The message is then injected into the network and, concurrently, the watchdog timer is started to measure the message trip time. When the message leaves the source node, it is forwarded to the destination node through intermediate switches. Contention suffered at switch buffers (queuing latency) is recorded and stored in the message itself. When the message arrives at its destination, it is delivered to the user. Then, latency information is sent back to the sender in the Ack header only if Recorded Latency < Threshold. Otherwise, monitoring activity is finished and no acknowledge message is generated. Meanwhile, the watchdog timer at the source node runs in parallel with the message delivery. If the Ack message arrives before the watchdog expires, the latency value is delivered to the metapath configuration module, which configures the metapath by selecting the alternative paths to be used according to that latency value. However, if the watchdog exceeds the time limit, FR-DRB uses the latency threshold to configure the metapath.
Current switches in HPC systems are not just simple forwarding devices, since they are endowed with smart capabilities to evaluate and adapt communication load according to network conditions [7]. For instance, InfiniBand (IBA) switches, the most widely used technology in today's HPC clusters [18], provide features aimed at buffer monitoring and multipath selection, as required by the FR-DRB policy. IBA also provides the watchdog timers needed to fulfill congestion control requirements [5].
FR-DRB operations are performed concurrently with packet delivery. As shown in Fig. 2, a message is forwarded without any overhead when the output port is free (thick arrows); latency accumulation is performed only while messages are waiting in the buffer. Hence, this operation does not delay the send/receive primitives. Likewise, MultiStepPath selection and metapath configuration are performed concurrently with load injection, so messages are not delayed either.

Deadlock freedom is ensured by having a separate escape channel for each phase. As we adopt two intermediate nodes, one escape channel is used (if required) from Src to IN1, another from IN1 to IN2, and a third from IN2 to Dst. Hence, each phase defines a virtual network, and packets change virtual network at each intermediate node. Although each virtual network relies on a different escape channel, they all share the same adaptive channel(s). Thus, our current FR-DRB implementation uses four virtual channels (three escape channels plus one shared adaptive channel).

The use of adaptive routing algorithms can cause out-of-order packet delivery. If the user application requires in-order packet delivery, FR-DRB reorders packets at the destination node by using the well-known sliding window protocol, as other routing policies such as [15] do.
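The receiver side of the sliding-window reordering mentioned above can be sketched as a small reorder buffer. This is a generic sketch, not FR-DRB's implementation: it assumes per-flow sequence numbers starting at 0 and a window size `window` that the sender respects, so no packet arrives more than `window` positions ahead of the next expected one. The class name is hypothetical.

```python
from typing import Any, Dict, List


class ReorderBuffer:
    """Receiver side of a sliding window: releases packets in sequence order."""

    def __init__(self, window: int) -> None:
        self.window = window
        self.next_seq = 0               # next sequence number owed to the user
        self.held: Dict[int, Any] = {}  # out-of-order packets waiting for a gap

    def receive(self, seq: int, payload: Any) -> List[Any]:
        """Accept one packet; return the payloads now deliverable in order."""
        if seq < self.next_seq or seq in self.held:
            return []  # duplicate of something already delivered or held
        if seq >= self.next_seq + self.window:
            raise ValueError("packet outside the receive window")
        self.held[seq] = payload
        delivered = []
        while self.next_seq in self.held:
            delivered.append(self.held.pop(self.next_seq))
            self.next_seq += 1
        return delivered
```

Packets that overtake each other on different MSPs are held until the gap is filled and then released to the user in sequence.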
4. FR-DRB PERFORMANCE EVALUATION
In order to assess FR-DRB performance, we analyze how the latency and throughput metrics are improved by the monitoring activity and by the multipath configuration and selection mechanisms. The latency metric represents the elapsed time from the generation of a packet at the source node until it is completely delivered at the destination node. The throughput metric represents the traffic load accepted by the network versus the traffic load offered by the sender nodes. Both metrics give a global, average description of network performance. In addition, network latency maps and latency-over-time charts are provided to evaluate the transient response of the mechanism.
The evaluation methodology is divided into two major parts. The first part analyzes the network response under the hot-spot traffic pattern, to evaluate FR-DRB's transient behavior and traffic load distribution under extreme conditions. This pattern establishes some fixed destinations in order to increase traffic in a particular network area, causing saturated paths. In addition, the remaining network nodes inject uniform load in order to create background traffic over the network.

In the second part, we evaluate the proposed technique using well-known communication patterns: Butterfly, Perfect Shuffle, and Matrix Transpose. These patterns form a collection of benchmarks that describe conditions commonly created by parallel scientific applications (further description of these patterns is provided in [2] and [14]).
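For reference, the benchmark permutations can be written as bit operations on n-bit node addresses (n = 10 for the 1024-node networks), following the definitions commonly used in the interconnection-network literature: Perfect Shuffle rotates the address left by one bit, Butterfly swaps the most and least significant bits, and Matrix Transpose swaps the upper and lower halves of the address. The hot-spot generator is a hypothetical simplification (a fixed set of hot sources always targets the hot-spot destinations, and all other nodes send uniformly), not the exact workload used in these experiments.

```python
import random
from typing import Collection, Optional, Sequence


def perfect_shuffle(src: int, n_bits: int) -> int:
    """Rotate the n-bit address left by one bit."""
    msb = (src >> (n_bits - 1)) & 1
    return ((src << 1) & ((1 << n_bits) - 1)) | msb


def butterfly(src: int, n_bits: int) -> int:
    """Swap the most and least significant address bits."""
    msb = (src >> (n_bits - 1)) & 1
    lsb = src & 1
    if msb == lsb:
        return src
    return src ^ ((1 << (n_bits - 1)) | 1)


def matrix_transpose(src: int, n_bits: int) -> int:
    """Swap the upper and lower halves of the address."""
    if n_bits % 2:
        raise ValueError("matrix transpose needs an even number of address bits")
    half = n_bits // 2
    low = src & ((1 << half) - 1)
    high = src >> half
    return (low << half) | high


def hotspot_destination(src: int, n_nodes: int, hot_sources: Collection[int],
                        hotspots: Sequence[int],
                        rng: Optional[random.Random] = None) -> int:
    """Hot sources target a hot-spot node; others pick a uniform dst != src."""
    rng = rng or random.Random()
    if src in hot_sources:
        return rng.choice(hotspots)
    dst = rng.randrange(n_nodes - 1)
    return dst if dst < src else dst + 1  # skip src itself
```

Each of the three benchmark functions is a permutation of the node set, so every node sends to exactly one destination and receives from exactly one source; the hot-spot generator deliberately breaks that balance.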
The FR-DRB operations and modules, together with the network components, were modeled [8] using the standard simulation and modeling tool OPNET Modeler [13]. OPNET provides a Discrete Event Simulator (DES) engine. This environment allows network component behavior to be defined with a Finite State Machine (FSM) approach, and it supports detailed specification of protocols, applications, and queuing policies. The simulations were conducted for three InfiniBand-like networks using the most popular topologies in HPC systems (mesh, torus, and fat-tree), according to the Top 500 supercomputer list [18]. In all cases, virtual cut-through switching and credit-based flow control were assumed.

Multistep Path Selection ()
/* Executed in the source node each time a message is injected */
Begin
  1. Build the Probability Density Function (PDF) of the MultiStepPath bandwidths (B_Ci);
  2. Select a MultiStepPath using the PDF;
  3. Inject the message into the network:
     3.1. Build a message header:
          3.1.1. Concatenate the IN headers;
     3.2. Inject the message;
     3.3. Start the timer.
End MultiStepPath Selection

Table 3. FR-DRB MSP Selection Code
To perform a comparative analysis, we implemented five routing policies: deterministic DOR, Valiant, the Turn model, DRB, and FR-DRB. Valiant's routing algorithm [17] is an oblivious routing protocol aimed at achieving full load balancing. It performs two phases. In the first phase, an intermediate node (IN) is randomly selected and the packet is forwarded to this node. After the IN is reached, packets are sent to the destination following the dimension-order routing (DOR) approach [2]. We also implemented the Turn model [12], an adaptive method that allows several possible minimal paths between source and destination. At each switch, this policy tries to forward packets through any free (or less loaded) output link among those belonging to a minimal path; thus, local adaptivity is provided. Finally, in order to evaluate FR-DRB's response time and the impact of timer expiration, we set the watchdog time limit to two different values: a fixed value related to the saturation point, and infinity, which implies no watchdog expiration (as in the former DRB method [7]).
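The two baseline routes can be sketched on a 2-D mesh as follows. DOR here is XY routing (X dimension first, then Y); for Valiant's algorithm the text only states that the second phase follows DOR, so using DOR for the first phase as well is an assumption of this sketch. Function names are hypothetical.

```python
import random
from typing import List, Optional, Tuple

Node = Tuple[int, int]


def dor_path(src: Node, dst: Node) -> List[Node]:
    """Dimension-order (XY) routing: correct X first, then Y."""
    x, y = src
    path = [src]
    while x != dst[0]:
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    while y != dst[1]:
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path


def valiant_path(src: Node, dst: Node, width: int, height: int,
                 rng: Optional[random.Random] = None) -> Tuple[List[Node], Node]:
    """Phase 1: src -> random IN; phase 2: IN -> dst (both via DOR here)."""
    rng = rng or random.Random()
    inter = (rng.randrange(width), rng.randrange(height))
    first = dor_path(src, inter)
    second = dor_path(inter, dst)
    return first + second[1:], inter  # do not repeat the IN itself
```

Because the random IN usually lies off the minimal path, the Valiant route is typically longer than the DOR route between the same endpoints; this extra length is the latency cost discussed for Valiant in Section 4.1.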
4.1 Hotspot Analysis.
Latency and throughput results obtained for the 1024-node mesh network are presented in Fig. 3(a) and (b). FR-DRB shows the same behavior as the other routing policies at low loads, so it does not overload the network. However, at higher loads, FR-DRB improves throughput by 94% and reduces latency by 96% relative to DOR. This improvement stems from the fact that FR-DRB has a fast response time and low overhead: the FR-DRB mechanism starts as soon as the watchdog timer surpasses the time limit, without waiting for the Ack message, which may arrive much later, as is the case for DRB. Performance improvements are larger at higher loads, which implies that FR-DRB distributes the traffic load better. Hence, independently of the original spatial distribution, the load that each switch perceives is similar, and the latency experienced by messages is uniform.
In addition, Fig. 3(c) shows the network latency surface under a quadruple hot-spot pattern, designed to analyze network performance under heavy load. In this case, a deterministic routing algorithm (DOR) was used. As DOR does not perform any load balancing, Fig. 3(c) is useful to see the impact of the hot-spot on the network, because this is the worst congestion case. We show the average contention latency by means of the latency surface, in which each grid point (x-y coordinates) represents the average latency in the buffers of the network switches (Figures 3(d), 3(e), 3(f), and 3(g)). The effective load distribution of each algorithm is shown by the contour lines projected at the base of the charts. The latency reduction accomplished by FR-DRB is 99% with respect to DOR. Fig. 3(d) shows that the Valiant algorithm distributes the traffic load better than the Turn model (Fig. 3(e)). However, its average message latency is worse, because path length is doubled (on average). Thus, the Valiant algorithm achieves a suitable load distribution at the expense of a latency rise. Conversely, the local adaptivity of the Turn model improves latency behavior, but it lacks suitable load balancing. FR-DRB outperforms the algorithms mentioned above because it provides a global load distribution while minimizing message latency (Fig. 3(g)). Therefore, network nodes using FR-DRB can adapt to the network condition, avoiding hot-spots by using the free available bandwidth.

Finally, Fig. 3(h) shows the latency experienced by messages over time. The hot-spot lasts from 2 to 2.4 seconds. With deterministic routing, the latency peak reaches approximately 45 ms; FR-DRB reduces this peak almost 7 times, thanks to the multipath selection feature and the watchdog timer module, both described in Section 3.

Fig. 2. FR-DRB algorithm: Monitoring, Metapath Configuration and MSP Selection

Fig. 3. Performance results for Hot-spot pattern in the mesh network (panels (a) to (h))
4.2 Benchmark Traffic Analysis.
We present charts of latency reduction and throughput improvement for the three traffic patterns defined above (Butterfly, Perfect Shuffle, and Matrix Transpose). Fig. 4(a) and (b) show the performance improvements in a 1024-node torus network for the Valiant, Turn model, DRB, and FR-DRB policies. Results are presented in percent [%], all relative to deterministic routing (DOR).
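The text does not spell out how the percentages are computed; assuming the conventional relative-change definitions against the DOR baseline, for a policy X at a given offered load they would read:

```latex
\[
\text{Latency reduction}_X\,[\%] = 100\cdot\frac{L_{DOR}-L_X}{L_{DOR}},
\qquad
\text{Throughput improvement}_X\,[\%] = 100\cdot\frac{T_X-T_{DOR}}{T_{DOR}}.
\]
```

Under this reading, a throughput improvement of 100% means that the policy accepts twice the load accepted by DOR.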
We also present, in Fig. 5(a) and (b), the performance improvements achieved in a 64-node network arranged in a fat-tree topology (4-ary 2-tree), which is widely used in today's data centers. In such a topology, routing algorithms differ from torus algorithms. Routing in fat-trees is composed of two phases: an adaptive upward phase and a deterministic downward phase. As adaptive routing is used in the ascending phase, several output ports are possible at each switch, and the final choice depends on the selection function. The impact of the selection function on performance has been previously studied in [4]. We implemented the First Free (FF) and Cyclic Priority (CP) selection functions to perform a comparative analysis. The FF selection function selects the first free physical link, and the CP selection function uses a round-robin algorithm to choose a different physical link each time a packet is forwarded.
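A minimal sketch of the two selection functions for the upward phase, assuming the candidate upward ports of a switch are indexed 0..k-1 (k = 4 in a 4-ary tree) and their busy state is known. Whether CP skips busy ports is not stated in the text, so this version simply rotates. Names are hypothetical.

```python
from typing import Optional, Sequence


def first_free(busy: Sequence[bool]) -> Optional[int]:
    """First Free (FF): lowest-indexed upward port that is not busy."""
    for port, is_busy in enumerate(busy):
        if not is_busy:
            return port
    return None  # all upward ports busy: the packet waits


class CyclicPriority:
    """Cyclic Priority (CP): round robin over the upward ports."""

    def __init__(self, n_ports: int) -> None:
        self.n_ports = n_ports
        self._next = 0

    def select(self) -> int:
        port = self._next
        self._next = (self._next + 1) % self.n_ports
        return port
```

Under light load, FF keeps choosing port 0 because it is usually free, concentrating traffic on the same upward link, whereas CP spreads consecutive packets across all upward links.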
Fig. 4. Performance results obtained for persistent patterns in a Torus topology network (panels (a) and (b))
Fig. 5. Performance results for persistent patterns in a Fat-Tree topology network (panels (a) and (b))
Latency reduction and throughput improvement results are also presented in percent [%]. These results are relative to deterministic routing, in which packets are delivered through the same statically assigned path and no adaptivity or randomization is provided. Experiments show that FR-DRB achieves lower latencies (up to 80%) and higher throughput (up to 100%) than the other methods in both topologies. As load increases, the latency improvements also increase. This gain allows heavier communication loads in networks using the FR-DRB mechanism; alternatively, in cost-bounded systems, our policy allows fewer network resources to be used for a given communication load, because those resources are handled more efficiently. The latency improvements shown by FR-DRB come from the monitoring strategy and from the multipath configuration and selection mechanisms described in this paper.
5. CONCLUSIONS
In this paper, we proposed the Fast-Response Dynamic Routing Balancing (FR-DRB)
policy to deal with congestion in high-speed interconnection networks. FR-DRB
controls the performance degradation produced by packet contention for network
resources. Congestion control is accomplished by distributing the communication
load over several alternative paths. FR-DRB monitors the latency of the path
connecting each source and destination node. When latency rises significantly,
source nodes start sending messages concurrently through new, different, and less
loaded alternative paths. Because all source nodes are aware of the latency state,
a global latency reduction is achieved, as shown in the experiments. Source nodes
are also provided with a watchdog timer that yields a faster response when the
network is congested. Furthermore, the watchdog limits the generation of
acknowledgment messages, which reduces overhead when the network is near its
saturation point.
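The source-side behaviour summarized above can be illustrated with a minimal sketch for a single source-destination pair. This is not the authors' implementation: the class and method names, the two latency thresholds, and the rule that releases a path when latency falls are all hypothetical assumptions. The watchdog is modeled only as an external "expired" event that the source treats as a congestion signal; the acknowledgment-limiting effect of the watchdog is not modeled.

```python
from typing import List, Sequence


class SourcePathController:
    """Hypothetical FR-DRB-style controller for one source-destination pair.

    high_latency / low_latency are illustrative thresholds, not values from
    the paper, and the contraction rule is an assumption.
    """

    def __init__(self, paths: Sequence[str], high_latency: float, low_latency: float):
        if not paths:
            raise ValueError("at least one path is required")
        self.paths: List[str] = list(paths)  # original path first, then alternatives
        self.high_latency = high_latency
        self.low_latency = low_latency
        self.active = 1  # number of paths currently in use
        self._rr = 0     # round-robin cursor over the active paths

    def _expand(self) -> None:
        # Gradual expansion: open at most one extra path per congestion signal.
        if self.active < len(self.paths):
            self.active += 1

    def on_latency_report(self, latency: float) -> None:
        """Handle a latency measurement for the current path set."""
        if latency > self.high_latency:
            self._expand()
        elif latency < self.low_latency and self.active > 1:
            # Assumed contraction rule: release one path once congestion clears.
            self.active -= 1

    def on_watchdog_expired(self) -> None:
        """No latency report arrived before the (hypothetical) watchdog deadline:
        treat it as congestion and expand without waiting any longer."""
        self._expand()

    def next_path(self) -> str:
        """Spread successive messages round-robin over the active paths."""
        idx = self._rr % self.active
        self._rr = idx + 1
        return self.paths[idx]
```

For example, with paths `["p0", "p1", "p2"]`, all messages use `p0` until a latency report above `high_latency` arrives; after that, messages alternate between `p0` and `p1`, and a watchdog expiry opens `p2` as well. Opening one path per signal mirrors the gradual, load-controlled expansion the paper describes, although the exact expansion rule here is illustrative.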
FR-DRB has been developed to fulfill the design objectives for parallel-computer
interconnection networks: all-to-all connectivity and low latency between any pair
of nodes under any communication load. Experiments performed to validate the FR-DRB
policy show substantial improvements in latency and throughput; congestion is
reduced, allowing the network to be used at higher loads. Therefore, FR-DRB is
useful for persistent and bursty communication patterns, which are the patterns
that can produce the worst hot-spot situations.
REFERENCES
[1] Baydal, E., "A Family of Mechanisms for Congestion Control in Wormhole
Networks," IEEE TPDS, vol. 16, pp. 772-784, 2005.
[2] Duato, J., Yalamanchili, S., and Ni, L., Interconnection Networks: An
Engineering Approach. Morgan Kaufmann, 2002.
[3] Garcia, P.J., et al., "RECN-DD: A Memory-Efficient Congestion Management
Technique for Advanced Switching," in ICPP, pp. 23-32, 2006.
[4] Gilabert, F., Gómez, M., et al., "On the Influence of the Selection Function
on the Performance of Fat-Trees," in Euro-Par, vol. 4128, pp. 864-873, 2006.
[5] IBTA, InfiniBand Architecture Specification, Volume 1, Release 1.2.1,
http://www.infinibandta.org/specs/.
[6] Jain, R., "Congestion Control in Computer Networks: Issues and Trends," IEEE
Network, vol. 4, no. 3, pp. 24-30, 1990.
[7] Lugones, D., Franco, D., and Luque, E., "Dynamic Routing Balancing on
InfiniBand Networks," Journal of Comp. Sci. & Tech. (JCS&T), vol. 8, no. 2,
pp. 104-110, 2008.
[8] Lugones, D., Franco, D., and Luque, E., "Modeling Adaptive Routing Protocols
in High Speed Interconnection Networks," in OPNETWORK 2008, Washington, USA, 2008.
Available at: https://aomail.uab.es/~dlugones/opnet.html
[9] Lugones, D., Franco, D., and Luque, E., "Dynamic and Distributed Multipath
Routing Policy for High-Speed Cluster Networks," in 9th IEEE/ACM International
Symposium on Cluster Computing and the Grid (CCGRID '09), pp. 396-403,
18-21 May 2009.
[10] Maquelin, O., et al., "Polling Watchdog: Combining Polling and Interrupts for
Efficient Message Handling," in ISCA, pp. 179-188, 1996.
[11] Mo, J., and Walrand, J., "Fair End-to-End Window-Based Congestion Control,"
IEEE/ACM Trans. Netw., pp. 556-567, 2000.
[12] Ni, L., and Glass, C., "The Turn Model for Adaptive Routing," in ISCA,
pp. 278-287, 1992.
[13] OPNET Technologies, "OPNET Modeler: Accelerating Network R&D," June 2008,
http://opnet.com.
[14] Petrini, F., Hoisie, A., Feng, W., and Graham, R., "Performance Evaluation of
the Quadrics Interconnection Network," in IPDPS, pp. 1698-1706, Apr. 2001.
[15] Y. Shihang, G. Min, and I. Awan, "An Enhanced Congestion Control Mechanism in
InfiniBand Networks for High Performance Computing Systems," in AINA, IEEE
Computer Society, vol. 1, pp. 845-850, 2006.
[16] Singh, A., Dally, W., Towles, B., and Gupta, A.K., "Globally Adaptive
Load-Balanced Routing on Tori," IEEE Comp. Arch. Letters, vol. 3, pp. 69, 2004.
[17] Valiant, L.G., and Brebner, G.J., "Universal Schemes for Parallel
Communication," in ACM STOC, Milwaukee, pp. 263-277, 1981.
[18] Vishnu, A., Koop, M., Moody, A., Mamidala, A., Narravula, S., and Panda, D.,
"Hot-Spot Avoidance With Multi-Pathing Over InfiniBand: An MPI Perspective," in
Proceedings of CCGRID, IEEE Computer Society, pp. 479-486, 2007.
[19] Top500 Supercomputers Site, "Interconnect Family Share for 11/2008,"
Nov. 2008, http://www.top500.org.