A Fault-Tolerant Network Architecture for Modular Datacenter
International Journal of Software Engineering and Its Applications
Vol. 6, No. 2, April, 2012
restriction on the scalability of MDCN. Intra-container networks can therefore adopt
complex topologies that would be considered unsuitable for traditional DCNs.
In this work, we present SCautz (Shipping Container Kautz network), a novel
hierarchical intra-container network structure. SCautz consists of a base physical
Kautz topology, built by interconnecting servers' NIC ports, and a small number of
redundant COTS (commodity off-the-shelf) switches. The base topology adopts the
server-centric approach: servers take charge of routing traffic and work with the
switches to bypass failed servers, achieving graceful performance degradation.
The basic idea of SCautz is driven by the strict fault-tolerance demands that the
MDC's service-free operating model places on the MDCN, and its design follows the
scale-out principle of datacenter construction. Results from theoretical analysis and
simulations show that SCautz is more viable for MDCN for the following reasons:
First, SCautz's base topology can offer network capacity as high as BCube [5] for
one-to-x (e.g., one-to-one, one-to-all) and all-to-all traffic.
Second, we propose a fault-tolerant routing algorithm called SCRouting+, which
leverages switches and peer servers connected to the same switch to bypass failed
servers. SCautz can thus maintain its throughput for one-to-x traffic, and its
performance for all-to-all traffic degrades smoothly, much more slowly than the
MDC's computation and storage capacity do.
Third, the extra cost of the redundant switches is very low. Theoretical analysis shows
that a typical SCautz-based container with 1280 servers needs only 160 switches.
The rest of the paper is organized as follows. Section 2 discusses related work.
Section 3 presents the SCautz architecture. Section 4 describes routing in SCautz.
Section 5 evaluates SCautz through simulations, and Section 6 concludes the paper.
2. Related Work
As MDCs have gained popularity, modular datacenter networks (MDCN) have attracted
growing interest from cloud providers, hardware vendors and academia. To address the
fatal drawbacks of traditional architectures in supporting cloud data-intensive
computing, many novel datacenter networks (DCN) have been proposed.
VL2 [6] and PortLand [7] organize switches into sophisticated Clos and fat-tree
structures respectively, in which any two servers can communicate with each other at
the maximum rate of their network-interface cards (NICs). Since their routing
intelligence is placed on switches, VL2 and PortLand are switch-centric DCNs, while
DCell [8], BCube and CamCube [9] are server-centric DCNs, because their routing
intelligence is placed on servers. DCell proposes a new recursive structure for high
scalability, BCube leverages low-end COTS switches to implement an intra-container
network based on the Hypercube topology [10], and CamCube designs a direct-connect
3D torus topology, which has been adopted by the Content Addressable Network (CAN)
overlay [11]. Because its servers are always equipped with multiple NICs, a
server-centric DCN is more effective than a switch-centric one in supporting
data-intensive applications and dealing with failures. Moreover, since DCell has a
network performance bottleneck at its lower hierarchy and CamCube mainly studies the
flexibility of its routing API for cloud applications, BCube better offers high
uniform network capacity and achieves graceful performance degradation.
In server-centric MDCN, failures of both servers and switches decrease the overall
performance of a container. For example, BCube's incomplete structure makes its
throughput for one-to-x traffic patterns drop evidently, and its ABT (aggregate
bottleneck throughput) for all-to-all traffic degrades faster than computation and
storage do. Furthermore, switch failures degrade BCube's performance even more
significantly: its ABT shrinks by more than 50% in the presence of 20% switch
failures [5].
SCautz proposes a novel hierarchical network structure, modeled on the undirected
Kautz graph, to avoid the above problems. The Kautz graph achieves a near-optimal
tradeoff between node degree and diameter, and has better bisection width and
bottleneck degree. However, it was considered unsuitable for mega datacenters,
because it is hard to deploy incrementally without violating the original structure.
For MDCN, the number of servers in a container is fixed, and the interior network
will not be changed during its whole lifecycle, so this restriction no longer
exists. Through simulations and comparisons, we show that SCautz is more viable for
MDCN.
3. SCautz Architecture
SCautz comprises two types of components: servers with multiple NICs and COTS
switches. The servers interconnect their NICs to form a physical Kautz topology as
SCautz's base network structure, denoted UK(d,k). The switches use their low-speed
(1 Gbps) ports to connect a specific number of servers, and reserve their high-speed
(10 Gbps) ports for the inter-container network.
3.1. Preliminaries
To define the base undirected Kautz topology of SCautz, we first introduce the
directed Kautz graph. Let A = {0, 1, ..., d} be an alphabet of d+1 letters, and let
the Kautz identifier space be the set of strings of length k over A in which any two
consecutive letters differ.
Figure 1. The Kautz Graph and its Undirected Structure
Definition 1 (Kautz graph [13]). The vertex set V(K(d,k)) and edge set E(K(d,k)) of
the Kautz graph K(d,k) are:
V(K(d,k)) = { x1 x2 ... xk | xi in A, xi != xi+1 for 1 <= i <= k-1 },
E(K(d,k)) = { (x1 x2 ... xk, x2 ... xk y) | y in A, y != xk }.
The Kautz graph K(d,k) is d-regular (in out-degree), its diameter is k, and it has
(d+1)d^(k-1) vertices and (d+1)d^k edges. SCautz's base undirected Kautz structure
UK(d,k) is obtained by omitting the directions of the edges while keeping the
parallel links between vertex pairs of the form (abab...), e.g., (01,10), (21,12)
and (02,20). So UK(d,k) is 2d-regular, unlike the general undirected Kautz graph.
Figure 1 shows K(2,2) and UK(2,2).
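These counts are easy to verify by brute-force enumeration. The following Python
sketch (our own illustration, not part of the paper) generates K(d,k) directly from
Definition 1:

```python
from itertools import product

def kautz_vertices(d, k):
    # Strings of length k over an alphabet of d+1 letters in which
    # consecutive letters differ.
    alphabet = range(d + 1)
    return [v for v in product(alphabet, repeat=k)
            if all(v[i] != v[i + 1] for i in range(k - 1))]

def kautz_edges(d, k):
    # Directed edges (x1...xk) -> (x2...xk, a) for every letter a != xk.
    return [(v, v[1:] + (a,))
            for v in kautz_vertices(d, k)
            for a in range(d + 1) if a != v[-1]]

d, k = 2, 2
print(len(kautz_vertices(d, k)), len(kautz_edges(d, k)))  # 6 12
```

The printed counts match the closed forms (d+1)d^(k-1) = 6 vertices and
(d+1)d^k = 12 edges for K(2,2).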
3.2. SCautz Structure
The complete structure of SCautz with redundant switches is denoted as
SCautz(d,k,t), or SCautz for short, and is defined as follows.
Definition 3. Let SCautz(d,k,t) be the complete SCautz network with base topology
UK(d,k) and a redundant switch structure. Its nodes (N), switches (S_right and
S_left), clusters (C_right and C_left) and links (E), in which the links comprise
the links directly connecting servers (E_server) and the links connecting servers
and switches (E_switch), are as follows:
N = { x1 x2 ... xk | xi in A, xi != xi+1 },
S_right = S_left = { s1 s2 ... st | si in A, si != si+1 },
C_right(s) = { X in N | the rightmost t letters of X equal s }, for s in S_right,
C_left(s) = { X in N | the leftmost t letters of X equal s }, for s in S_left,
E = E_server + E_switch, where E_server = E(UK(d,k)) and E_switch connects each
server to the two switches whose identifiers match its rightmost and leftmost t
letters.
The definition of SCautz(d,k,t)'s nodes is the same as in K(d,k), and t is the
length of a switch's identifier. According to the two rules for grouping servers,
the switches are divided into two categories, S_right and S_left: the servers whose
rightmost (resp. leftmost) substrings of length t are identical to a certain
switch's identifier connect to the corresponding S_right (resp. S_left) switch. So t
determines the number of servers in one cluster and the total number of switches.
The n servers connected to the same switch form a cluster; hence all clusters also
fall into two categories: the clusters built around S_right (resp. S_left) switches
are denoted C_right (resp. C_left), and each server is a member of one C_right and
one C_left simultaneously. For example, in SCautz(2,4,2) the switch S_right = 10
connects with four servers (1010, 2010, 0210 and 1210), forming the cluster
C_right = {10}; the switch S_left = 02 also connects with four servers (0201, 0202,
0210 and 0212), forming the cluster C_left = {02}; and server 0210 is a member of
both clusters {10} and {02}, as shown in Figure 2. In the rest of the paper, we do
not distinguish between a switch S and its cluster C, and refer to both by their
common identifier, e.g., C_right = S_right = 10 and C_left = S_left = 02. The links
in SCautz include those building UK(d,k) and those connecting switches with their
servers. All links in SCautz are undirected, and the physical cables are
full-duplex, so each link can be defined in two equivalent ways.
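The cluster membership rule can be checked mechanically. The sketch below (our own
illustration; function names are not from the paper) enumerates the K(2,4) servers
and reproduces the running example:

```python
from itertools import product

def servers(d, k):
    # All K(d,k) identifiers as strings over the alphabet {0, ..., d}.
    letters = "0123456789"[: d + 1]
    return ["".join(s) for s in product(letters, repeat=k)
            if all(s[i] != s[i + 1] for i in range(k - 1))]

def cluster_right(switch_id, d, k):
    # Servers whose rightmost t letters equal the S_right switch identifier.
    return sorted(s for s in servers(d, k) if s.endswith(switch_id))

def cluster_left(switch_id, d, k):
    # Servers whose leftmost t letters equal the S_left switch identifier.
    return sorted(s for s in servers(d, k) if s.startswith(switch_id))

# The SCautz(2,4,2) example from the text:
print(cluster_right("10", 2, 4))  # ['0210', '1010', '1210', '2010']
print(cluster_left("02", 2, 4))   # ['0201', '0202', '0210', '0212']
```

Server 0210 appears in both clusters, as the text states.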
Figure 2. The Cluster Structures of Two Types in SCautz(2,4,2)
If clusters are treated as virtual nodes and the duplicate links between any pair of
clusters are temporarily ignored, we can easily obtain the following Theorem 1 and
prove it according to Definition 3. SCautz(2,4,2) is shown in Figure 3, including
the two full higher-level logical structures and the corresponding partial physical
structures of servers. Note that the arrows on the links in Figure 3 are only used
to better exhibit SCautz's logical cluster structures.
Figure 3. SCautz(2,4,2)'s Two Full Higher-level Logical Structures and
Partial Physical Structures
THEOREM 1. All the C_right (resp. C_left) clusters form a logical Kautz structure,
denoted K_right(d,t) (resp. K_left(d,t)).
In SCautz, R(X) and L(X) represent the right-neighbors and left-neighbors of a
server X, reached by one L-shift and one R-shift operation respectively. The
right-neighbor and left-neighbor clusters of C_right(X) and C_left(X) are defined in
Definition 4. C_right(X) (resp. C_left(X)) denotes the cluster that server X belongs
to via its S_right (resp. S_left) switch, while P_right(X) (resp. P_left(X)) denotes
the peer servers in the same cluster C_right(X) (resp. C_left(X)) as server X.
Definition 4. For any server X, let CR_right(X), CL_right(X), CR_left(X) and
CL_left(X) be the neighbor clusters of C_right(X) and C_left(X):
CR_right(X) = { C_right(Y) | Y in R(X) }, CL_right(X) = { C_right(Y) | Y in L(X) },
CR_left(X) = { C_left(Y) | Y in R(X) }, CL_left(X) = { C_left(Y) | Y in L(X) }.
Therefore, a server as a member of its C_right cluster has d right-neighbor clusters
and one left-neighbor cluster, while as a member of its C_left cluster it has d
left-neighbor clusters and one right-neighbor cluster. Take server 1210 in
SCautz(2,4,2) as an example: its two right-neighbors 2101 and 2102 lie in the two
right-neighbor clusters 01 and 02, while its two left-neighbors 0121 and 2121 both
lie in the single cluster 21; the symmetric statements hold for its C_left
membership. Combining the hybrid structure of SCautz(d,k,t) and the above
definitions, we obtain the following key properties about any server X, its clusters
and their neighbors.
Property 1. Each server X in the cluster C_right(X) has d right-neighbor servers
R(X), and these d servers are evenly distributed across d different right-neighbor
clusters. Moreover, a cluster C_right has d right-neighbor clusters, and the servers
in this C_right together connect to m servers in each right-neighbor cluster.
Property 2. Each server X in the cluster C_right(X) has d left-neighbor servers
L(X), and these d servers all lie in the same cluster. Moreover, a cluster C_left
has d left-neighbor clusters, and the servers whose relevant substrings are
identical together connect to all the servers in one left-neighbor cluster.
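These neighbor relations can be checked directly on the running SCautz(2,4,2)
example. A small Python sketch (our own illustration, not the paper's notation):

```python
D = 2             # Kautz degree d
ALPHABET = "012"  # d+1 letters
T = 2             # switch identifier length t

def right_neighbors(x):
    # One L-shift: drop the leftmost letter, append any letter != last.
    return {x[1:] + a for a in ALPHABET if a != x[-1]}

def left_neighbors(x):
    # One R-shift: drop the rightmost letter, prepend any letter != first.
    return {a + x[:-1] for a in ALPHABET if a != x[0]}

def c_right(x):
    # Identifier of the S_right cluster of x (rightmost T letters).
    return x[-T:]

x = "1210"
print(sorted(right_neighbors(x)))                # ['2101', '2102']
print({c_right(y) for y in right_neighbors(x)})  # d = 2 right-neighbor clusters
print({c_right(y) for y in left_neighbors(x)})   # one left-neighbor cluster
```

For server 1210 this reproduces the example above: the d right-neighbors fall into
the two clusters 01 and 02, while both left-neighbors fall into the single
cluster 21.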
Therefore, we obtain the following lemmas.
Lemma 1. If t = k-1, then m = 1. That is, all the servers in a cluster C_right
together connect with only one server in each right-neighbor cluster.
Lemma 2. If t <= k-2, then m = d^(k-t-1) >= d. Thus all the servers in one cluster
together connect with at least d corresponding servers in each right-neighbor
cluster.
Figure 4. The Cluster Interconnection Structures in SCautz(2,4,3) and
SCautz(2,4,2)
Therefore, if t <= k-2, there are d node-disjoint paths and m*d edge-disjoint paths
from a cluster to each of its d right-neighbor clusters. Take SCautz(2,4,3) and
SCautz(2,4,2) as examples, shown in Figure 4. For SCautz(2,4,3), there are two
servers in cluster 010 and their neighbor servers are distributed in the two
neighbor-clusters 101 and 102; but according to Lemma 1, each of the two servers
connects to only one server in each neighbor cluster, so if server 0101 fails, all
the links between clusters 010 and 101 are broken. For SCautz(2,4,2), according to
Lemma 2, there are two node-disjoint paths and four edge-disjoint paths between
neighboring clusters, so it is more reliable than SCautz(2,4,3). Thus, we always let
t <= k-2 in this paper.
Lemma 3. Similarly, there are d node-disjoint paths and m*d edge-disjoint paths from
a cluster to each of its d left-neighbor clusters. It is easy to see that the
logical Kautz structures formed by the C_right and C_left clusters are isomorphic,
so the corresponding properties for C_left can be derived and are not listed here.
SCautz is server-centric and its routing intelligence is implemented on servers.
Considering the limits on the number of servers' Ethernet NIC slots and COTS
switches' low-speed ports, we pick SCautz(4,5,3) as a typical structure for MDCN.
SCautz(4,5,3)
supports 1280 servers using only 160 COTS switches. Each server needs to be equipped
with 10 Ethernet ports, of which 8 are used for constructing the Kautz topology and
2 for connecting to the two types of COTS switches. Multi-port (dual-port,
quad-port) Ethernet NICs have become COTS components, and a COTS switch is generally
equipped with tens (e.g., 24) of 1 GigE ports and several (e.g., 4) 10 GigE ports.
SCautz uses the switches' 1 GigE ports to communicate with the servers in the same
cluster and reserves the high-speed 10 GigE ports for the inter-container network.
Thus, SCautz is a practical approach for the intra-container network of MDCN.
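The sizing above follows directly from the Kautz counting formulas; a short sketch
(our own arithmetic check) for the SCautz(4,5,3) container:

```python
def kautz_size(d, k):
    # Number of vertices in K(d,k): strings of length k over d+1 letters
    # with distinct consecutive letters.
    return (d + 1) * d ** (k - 1)

d, k, t = 4, 5, 3                  # the typical SCautz(4,5,3) container
servers = kautz_size(d, k)         # 5 * 4^4 servers in UK(4,5)
switches = 2 * kautz_size(d, t)    # one S_right set and one S_left set
ports_per_server = 2 * d + 2       # 2d Kautz links + 2 switch uplinks

print(servers, switches, ports_per_server)  # 1280 160 10
```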
4. Routing in SCautz
Based on SCautz's hierarchical structure, we propose a suite of routing algorithms
that effectively utilize the redundant resources. In this section, we first
introduce the regular routing method in a fault-free UK(d,k); then we analyze its
disadvantages in dealing with node faults; finally, we present a fault-tolerant
routing algorithm, SCRouting+, to achieve graceful performance degradation.
4.1. Routing in Kautz graph
UK(d,k) is a complete undirected Kautz structure. For the directed Kautz graph, Fiol
[12] proposed a shortest-path routing algorithm from source X to destination Y using
the L-shift operation of Definition 2: find the largest suffix of X that coincides
with a prefix of Y (this overlap is denoted the R-string); then at each hop move to
the node whose suffix coinciding with a prefix of Y is one letter longer than the
previous hop's, until the destination Y is reached, obtaining the R-path. In the
same way, an L-path can be computed using R-shift operations, also defined below.
Definition 2. Let L-shift and R-shift denote the shift operations on X = x1 x2 ... xk:
L-shift(X, a) = x2 ... xk a (a != xk), R-shift(X, a) = a x1 ... xk-1 (a != x1).
Combining Fiol's [12] and Pradhan's [13] ideas, we design a routing algorithm for
UK(d,k), called SCRouting. Let |R-string| and |L-string| denote the lengths of the
R-string and L-string. SCRouting first compares |R-string| and |L-string|: if
|R-string| > |L-string|, the R-path is picked by performing L-shifts; otherwise, the
L-path is picked to route packets.
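The R-path construction can be sketched in a few lines of Python (our own
illustration of Fiol's L-shift routing, assuming string identifiers):

```python
def r_path(x, y):
    # Fiol-style L-shift routing in K(d,k): find the longest suffix of x
    # that is also a prefix of y, then shift in y's remaining letters,
    # one L-shift per hop.
    k = len(x)
    m = next(m for m in range(k, -1, -1) if x[k - m:] == y[:m])
    path, cur = [x], x
    for letter in y[m:]:
        cur = cur[1:] + letter
        path.append(cur)
    return path

print(r_path("12", "01"))      # ['12', '20', '01'] in K(2,2)
print(r_path("0212", "1201"))  # ['0212', '2120', '1201'] in K(2,4)
```

The two printed paths are exactly the examples used later in this section.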
4.2. Fault-tolerant Routing in SCautz
In UK(d,k), there are either d parallel R-paths or d parallel L-paths between any
pair of servers. Generally, the Kautz graph uses one R-path (or L-path) for data
transmission. If the path breaks down, it is discarded and replaced by another one
from the remaining d-1 R-paths (or L-paths). The reason for not finding a sub-path
to bypass the failed links or nodes is that such a sub-path may need up to k hops.
For example, if node 20 fails, the path 12->20->01 is no longer valid, and a new
path 12->21->10->01 from 12 to 01 is computed, as shown in Figure 5. In this way,
although the destination is still reachable, the capacity shrinks. For one-to-one
traffic, the spare paths are always longer than the primitive one, so the delay of
single-path routing increases; and since only d-1 parallel paths are left, the
throughput of multi-path routing decreases by 1/d. For one-to-x traffic, even a
single link or server failure invalidates
all the paths through it, so the network capacity and reliability degrade severely.
To remedy these deficiencies, we propose a fault-tolerant routing algorithm,
SCRouting+, based on SCautz's hybrid structure. It can handle faults in the paths
generated by SCRouting in UK(d,k). SCRouting+ uses a surviving peer server in the
same cluster as the unreachable server to bypass the failed link or server: for an
R-path it utilizes the peer servers in C_right clusters, while for an L-path it
utilizes those in C_left clusters.
Figure 5. Fault-tolerant Routing in Kautz
Figure 6. SCRouting+ fault-tolerant Routing in SCautz
Let R_i(X) (resp. L_i(X)) denote the i-th right-neighbor (resp. left-neighbor)
server of X, reached by i L-shift (resp. R-shift) operations; for example, R_2(X)
means X's right-neighbor's right-neighbor, i.e., its second right-neighbor. Lemma 4
is then obtained and easily proved.
Lemma 4. For any two servers X and Y in the same logical C_right cluster, the
right-neighbors R_1(X) and R_1(Y) obtained with the same appended letter are again
in the same cluster. Moreover, if the m rightmost letters of X and Y are identical
and their m+1 rightmost letters are different, then the m+1 rightmost letters of
R_1(X) and R_1(Y) are identical and their m+2 rightmost letters differ, so in
particular R_1(X) != R_1(Y). The symmetric statement holds for L_1 in the logical
C_left structure.
According to Lemma 4, if a server detects that its next hop is unreachable,
SCRouting+ picks an idle peer server as the new next hop from among its cluster
peers, choosing one whose suffix (resp. prefix) of length t coincides with its own
and whose suffix (resp. prefix) of length t+1 does not. The right-neighbor (resp.
left-neighbor) of that peer is then a cluster peer of the failed server rather than
the failed server itself, so SCRouting+ bypasses the failed hop and rejoins the
original path at the hop after it. Moreover, the new fault-tolerant path is only one
hop longer than the original one, and does not affect the other parallel paths. For
example, if server 2120 is down, the sub-path 0212->2120->1201 of some path becomes
invalid. SCRouting+ constructs the sub-path 0212->1012->0120->1201 to bypass 2120,
as shown in Figure 6, instead of discarding the whole path and computing a new,
longer one as the regular method does.
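The bypass step can be sketched as a small search over cluster peers. The following
Python toy (our own illustration for SCautz(2,4,2); helper names are not the
paper's) reconstructs the detour from the example above:

```python
ALPHABET = "012"
T = 2  # switch identifier length in SCautz(2,4,2)

def right_neighbors(x):
    # One L-shift: x1x2...xk -> x2...xk a, for every letter a != xk.
    return [x[1:] + a for a in ALPHABET if a != x[-1]]

def cluster_right(x):
    # All K(2,4) servers whose rightmost T letters equal x's.
    suffix = x[-T:]
    return [a + b + suffix
            for a in ALPHABET for b in ALPHABET
            if a != b and b != suffix[0]]

def bypass(x, failed, after):
    # SCRouting+-style detour: replace sub-path x -> failed -> after with
    # x -> peer -> q -> after, where peer sits behind x's S_right switch
    # and q is a cluster peer of the failed server.
    for peer in sorted(p for p in cluster_right(x) if p != x):
        for q in right_neighbors(peer):
            if q != failed and q[-T:] == failed[-T:] and after in right_neighbors(q):
                return [x, peer, q, after]
    return None

print(bypass("0212", "2120", "1201"))  # ['0212', '1012', '0120', '1201']
```

The result is the one-hop-longer detour from the text: the first hop crosses the
S_right switch of cluster 12, and the rest follows ordinary L-shift links.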
5. Simulations
In this section, we conduct simulations to evaluate the behavior of SCautz and SCRouting+
on fault-tolerance. First, we analyze the performance of SCautzs base topology on handlingvarious patterns of traffic and compare the results to several representative BCubes. And
then we test the performance decline of SCautz and BCube when failures happen and
increase.In these simulations, we use SCautz(4,5,3) as a typical intra-container network of MDCN,
whose base Kautz topology is UK(4,5) and t=3. There are servers equippedwith 5 dual-port NICs and COTS switches with 24 1 GigE ports and 410 GigE ports. For comparisons, we pick two full BCube structures (BCube(32,1),
BCube(4,4))[6]
and one partial BCube (BCube(8,3))[3]
, in which the partial BCube(8,3) uses
2 complete BCube(8,2) with full layer-4 switches ( ). So there are 1024 servers in allthree BCubes but with 64, 1280, 704 switches in BCube(32,1), BCube(4,4) and BCube(8,3)
respectively.
5.1. Performance of UK(4,5)
We assume the bandwidth of each server's NIC port is 1 Gbps and that intermediate
servers relay traffic without delay. We summarize the key results in Table 1.
Table 1. Key Simulation Results of UK(4,5) and BCube

            UK(4,5)   BCube(32,1)   BCube(4,4)   BCube(8,3)
ave_path    4.38      1.94          3.75         3.51
1-to-1      4         2             5            4
1-to-all    4         2             5            4
ABT         1168.95   1057.03       1365.33      1170.29
From the simulations and comparisons, we see that UK(4,5) offers throughput for
one-to-x traffic and all-to-all traffic as high as BCube(8,3) does. UK(4,5)'s ABT
and per-server throughput are a little lower than BCube(4,4)'s because of its longer
average path length, which directly affects the ABT. When computing path lengths for
BCube, we treat the switches as dumb crossbars as [5] does (unlike [14, 15]), so the
two hops traversing a switch are counted as one. In addition, BCube(4,4) needs an
order of magnitude more switches. The results illustrate that UK(4,5) alone is able
to accelerate various traffic patterns as effectively as BCube when a container is
fault-free.
5.2 Fault-tolerance Evaluation
Since either a link or a server failure makes one hop in a path unreachable, we
assume in our simulations that all faults are caused by servers or switches, and
that server failures also reduce computation and storage capacity.
As shown in Figure 7, when one server fails, the per-server throughput of
BCube(32,1), BCube(4,4) and BCube(8,3) drops by 50%, 20% and 25% respectively for
one-to-x traffic. Using the switches, the SCRouting+ algorithm bypasses the failed
server with one extra hop and keeps the original path valid, so SCautz(4,5,3)
retains the same throughput as a fault-free container.
In Figure 8, when 10% and 20% of servers fail, the overall computation capacity
drops by 10% and 20% correspondingly, while BCube's ABT drops by 15.3% and 25.23%,
shown by the polyline labeled BCube(8,3). In contrast, SCautz(4,5,3) loses only
6.91% and 13.74% of its throughput respectively, degrading much more slowly than
computation and storage do. In addition, BCube's ABT shrinks by more than 50% when
20% of switches fail, but switch failures have no such impact on SCautz(4,5,3).
Figure 7. Throughput Degradation for One-to-one Traffic
Figure 8. ABT Degradation for all-to-all Traffic
5.3 Fault-tolerance Analysis
From the above simulations, we can see that SCautz(4,5,3) is able to leverage its
redundant switches to maintain per-server throughput for one-to-x traffic and to
roughly halve the ABT decrease compared with BCube, which evidently improves its
reliability. Switch faults have little impact on SCautz but make BCube's ABT drop
sharply, because the switches in SCautz are mainly used to tolerate faults, while
the switches in BCube sit between every pair of servers and participate in
forwarding every network packet.
An effective scheme for an SCautz-based container to deal with frequent and
increasing failures follows directly: first let the fault-free container's SCautz
base topology function on its own, and then leverage the switches to tolerate faults
as they accumulate. Thus SCautz retains the merits of its original base structure
and achieves graceful performance degradation.
6. Conclusion
The MDC's distinct service-free model poses stricter demands on the fault tolerance
of the datacenter network. Following the scale-out design principle, we propose a
novel hierarchical intra-container network structure for MDC, named SCautz. SCautz
consists of a base physical Kautz topology and hundreds of redundant COTS switches.
Its base topology, UK(d,k), effectively accelerates one-to-x traffic and offers high
network throughput for all-to-all traffic, performing as well as BCube. Besides,
each switch of the two types, together with a specific number of servers, forms a
cluster, and the clusters build two higher-level logical Kautz structures. Thus,
SCautz retains its throughput for one-to-x traffic in the presence of failures and
achieves more graceful performance degradation, roughly halving the ABT decrease
compared with BCube.
In this paper, we have shown through theoretical analysis and simulation that SCautz
is able to meet the strict requirements of MDCN. In future work, we will study how
to design an inter-container network that interconnects SCautz-based containers into
mega-datacenters. Moreover, we need to design novel load-balanced routing algorithms
to process the bursty network flows of data-intensive applications [16, 17], so that
map-reduce-like applications do not miss their strict deadlines for fetching
intermediate results from worker nodes [18].
Acknowledgements
This work is supported in part by the National Basic Research Program of China (973)
under Grant No. 2011CB302600, the National Natural Science Foundation of China
(NSFC) under Grant No. 60903205, the Foundation for the Author of National Excellent
Doctoral Dissertations of PR China (FANEDD) under Grant No. 200953, and the Research
Fund for the Doctoral Program of Higher Education (RFDP) under Grant No.
20094307110008.
References
[1] J. R. Hamilton, "An Architecture for Modular Data Centers", Proceedings of the
Biennial Conference on Innovative Data Systems Research (CIDR), (2007) January 7-10,
Asilomar, California, USA.
[2] K. V. Vishwanath, A. Greenberg and D. A. Reed, "Modular data centers: how to
design them?", Proceedings of LSAP, (2009) June 10, Munich, Germany.
[3] A. B. Letaifa, A. Haji, M. Jebalia and S. Tabbane, "State of the Art and
Research Challenges of new services architecture technologies: Virtualization, SOA
and Cloud Computing", International Journal of Grid and Distributed Computing
(IJGDC), 3, 68 (2010).
[4] P. Chakraborty, D. Bhattacharyya, N. Y. Sattarova and S. Bedaj, "Green
computing: Practice of Efficient and Eco-Friendly Computing Resources",
International Journal of Grid and Distributed Computing (IJGDC), 2, 33 (2009).
[5] C. Guo, G. Lu, et al., "BCube: a high performance, server-centric network
architecture for modular data centers", Proceedings of SIGCOMM '09, (2009) August
17-21, Barcelona, Spain.
[6] A. Greenberg, J. R. Hamilton, et al., "VL2: a scalable and flexible data center
network", Proceedings of SIGCOMM '09, (2009) August 17-21, Barcelona, Spain.
[7] R. N. Mysore, A. Pamboris, et al., "PortLand: a scalable fault-tolerant layer 2
data center network fabric", Proceedings of SIGCOMM '09, (2009) August 17-21,
Barcelona, Spain.
[8] C. Guo, H. Wu, et al., "DCell: a scalable and fault-tolerant network structure
for data centers", Proceedings of SIGCOMM '08, (2008) August 17-22, Seattle,
Washington, USA.
[9] H. Abu-Libdeh, P. Costa, et al., "Symbiotic routing in future data centers",
Proceedings of SIGCOMM '10, (2010) August 30-September 3, New Delhi, India.
[10] H. Sim, J.-C. Oh and H.-O. Lee, "Multiple Reduced Hypercube (MRH): A New
Interconnection Network Reducing Both Diameter and Edge of Hypercube", International
Journal of Grid and Distributed Computing (IJGDC), 3, 19 (2010).
[11] M. O. Balitanas and T. Kim, "Using Incentives for Heterogeneous peer-to-peer
Network", International Journal of Advanced Science and Technology (IJAST), 14, 23
(2010).
[12] M. A. Fiol and A. S. Llado, "The partial line digraph technique in the design
of large interconnection networks", IEEE Trans. Computers, 41, 848 (1992).
[13] D. K. Pradhan and S. M. Reddy, "A fault-tolerant communication architecture for
distributed systems", IEEE Trans. Computers, 32, 863 (1982).
[14] G. Praveen and P. Vijayrajan, "Analysis of Performance in the Virtual Machines
Environment", International Journal of Advanced Science and Technology (IJAST), 32,
53 (2011).
[15] H. Wu, G. Lu, D. Li, et al., "MDCube: a high performance network structure for
modular data center interconnection", Proceedings of CoNEXT '09, (2009) December,
Rome, Italy.
[16] M. Al-Fares, S. Radhakrishnan, B. Raghavan, N. Huang and A. Vahdat, "Hedera:
Dynamic Flow Scheduling for Data Center Networks", Proceedings of the 7th USENIX
Conference on Networked Systems Design and Implementation (NSDI '10), (2010).
[17] C. Raiciu, S. Barre, A. Greenhalgh, D. Wischik and M. Handley, "Improving
datacenter performance and robustness with multipath TCP", Proceedings of SIGCOMM
'11, (2011) August 15-19, Toronto, Ontario, Canada.
[18] C. Wilson and H. Ballani, "Better never than late: meeting deadlines in
datacenter networks", Proceedings of SIGCOMM '11, (2011) August 15-19, Toronto,
Ontario, Canada.
Authors
Feng Huang
He received the B.Sc. degree (with honors) in computer science from the College of
Computer, National University of Defense Technology (NUDT), Changsha, China, in
2001. He is now a Ph.D. student at the National Lab for Parallel and Distributed
Processing, NUDT. His research interests include cloud computing, datacenter
networks, grid computing, virtual machine technology and data-intensive
applications.