2008 23rd International Symposium on Computer and Information Sciences (ISCIS), Istanbul, Turkey, 27-29 October 2008

ABSTRACT

A variety of file replication algorithms have been proposed for Data Grids in the literature. Most of these algorithms ignore real-time Grid applications, which are emerging in many disciplines of science and engineering today. Thus, it is important to study and improve the real-time performance of these file replication algorithms. Based on this motivation, this study evaluates the performance of four replication algorithms: two from the literature and two new ones. For this evaluation, a process-oriented, discrete-event-driven simulator is developed. A detailed set of simulation studies is conducted using the simulator, and the results obtained are presented to elaborate on the real-time performance of these replication algorithms.

1. INTRODUCTION

Data Grids, which are composed of geographically distributed storage, computing, and networking resources, are envisaged to run a variety of scientific simulation and experiment applications [1]. Common to such applications is that they involve a large number of data-intensive jobs and require the efficient management and transfer of terabytes and petabytes of information [1], [2]. A major problem identified for peta-scale data-intensive computing is how to schedule jobs and their related data in an effort to minimize jobs' completion times and the bandwidth/storage space consumed by file transfers [3]-[6]. In order to alleviate these problems, data are dynamically replicated on multiple storage systems guided by a replication algorithm [3]-[6].

Data Grids are also considered for running applications with real-time requirements [7], [8]. A common characteristic of such applications is that data produced or stored in one component of the system need to be transferred across a network of limited resources to another component (or components) while respecting associated real-time attributes. Achieving such data transfers in a timely manner and servicing as many data transfer requests as possible in a distributed environment is a nontrivial problem known as the real-time data dissemination problem [9]-[10].

In [11], several basic algorithms, namely caching, cascading, and fast spread, were chosen to evaluate the impact of file replication on real-time data distribution in a multi-tier Grid environment. According to [11], these file replication algorithms are promising in that they deliver most of the requested files on time to their destination machines. However, they also leave some room for performance improvement in terms of meeting more deadlines.

The motivation for this study stems from the following observation. Fast spread and caching were originally proposed in [5] for best-effort multi-tier Data Grid environments, in which file transfers have no deadlines associated with them. Note that, in a multi-tier computing model, the system resources are hierarchically grouped into a few tiers. According to these two algorithms' original descriptions, sharing of files among machines that belong to the same tier is not allowed. For example, suppose that a machine in Tier-2 has requested a file with a deadline in a Data Grid with three tiers (Tier-0 is the root tier and Tier-2 is the lowest one). This file will be fetched either from a related Tier-1 machine or from the Tier-0 machine. In terms of real-time performance, the chance of meeting this deadline declines as the distance between the source and destination machines grows. That is, in this example, using the Tier-1 machine as the source gives a greater chance of meeting the deadline than having the Tier-0 machine as the source. On the other hand, a Tier-2 machine holding the requested file is not allowed to be the source under this computing model.

However, if peer-to-peer communication among the machines in the same tier is allowed, it may be possible to choose a Tier-2 machine as the source, which is usually closer than both the Tier-0 and Tier-1 machines. As a result, it may be possible to meet more deadlines. Based on this rationale, fast spread and caching are modified so that they now allow peer-to-peer file sharing. The impact of peer-to-peer file sharing on the performance of these algorithms is evaluated in this study using a process-oriented, discrete-event-driven simulator. The results show that peer-to-peer file sharing has a positive impact on their real-time performance.

2. A MULTI-TIER DATA GRID MODEL

The Data Grid is modeled as shown in Figure 1. According to this model, the Data Grid has three different views: a network view, a computing view, and a file management view.

Network view: The network interconnects the machines (Tier-0, Tier-1, and Tier-2 machines) in the Data Grid. As shown in the figure, the Internet interconnects a number of other networks (or subnetworks). In the model, the Internet is composed only of routers and links. On the other hand, each subnetwork includes not only routers and links but also machines, each of which is connected to a router/switch.

Impact of Peer-to-Peer Communication on Real-time Performance of File Replication Algorithms

Mustafa Mujdat Atanak and Atakan Doğan
Anadolu University, Department of Electrical and Electronics Engineering, 26470 Eskisehir, Turkey
+ (90) (222) 335 0580
{mmatanak, atdogan}@anadolu.edu.tr
978-1-4244-2881-6/08/$25.00 ©2008 IEEE

In order to support real-time computing on the Data Grid, the network infrastructure must be capable of assuring end-to-end delay bounds for the data flows. The problem of providing end-to-end delay bounds (or QoS guarantees in general) for applications has been the subject of many studies in the Grid community (e.g., [12], [13]) as well as the network community (e.g., [14], [15]). In particular, the Integrated Services model [16] proposed two sorts of service: guaranteed and predictive. Later, [17] described the network element behavior required to deliver guaranteed delay and bandwidth in the Internet. Furthermore, the Resource Reservation Protocol (RSVP) [18] complements Integrated Services by enabling resource reservations on the routers along the path. Based on [14]-[18] and related studies, the network is assumed to support guaranteed end-to-end delay bounds, where link bandwidths can be reserved for the transmission of files with deadlines and then released afterwards.

Computing view: The machines M= {M1, M2, ..., Mm} in the Data Grid are assumed to be organized in tiers, which results in a multi-tier computing model inspired by the Large Hadron Collider (LHC) computing model [19], [20]. Each machine Mi is associated with a limited storage capacity Ci.

In Figure 1, the Data Grid is modeled to have three tiers. Tier 0 is the unique source where all data files are produced and initially stored. The Tier 1 machines correspond to a few national centers around a country. Each Tier 1 machine (and the Tier 0 machine) has a number of related Tier 2 machines, each of which models a workgroup at a university or research laboratory.

In the tiered computing model adopted, file transfer requests can only come from the Tier 2 machines. On the other hand, the Tier 1 machines play the role of the intermediate replica servers on which the files can be replicated under a specific file replication algorithm.

The Data Grid is used for running real-time distributed applications which generate a set of requests R = {R1, R2, ..., Rr} for data files, where X = {X1, X2, ..., Xq} denotes the set of q unique data files. Each request Rk is associated with one of the q files, Xk, to be transferred from a source machine to a destination machine Mk, and a deadline value Dk by which file Xk must be delivered to its destination Mk. Thus, request Rk is summarized by the tuple {Xk, Mk, Dk}.
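For illustration, the request tuple {Xk, Mk, Dk} can be represented as follows (a minimal Python sketch; the field names are ours, not the simulator's):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Request:
    """A file transfer request R_k = {X_k, M_k, D_k} (illustrative names)."""
    file_id: int     # logical file X_k
    dest: str        # destination Tier-2 machine M_k
    deadline: float  # time D_k by which X_k must reach M_k

r = Request(file_id=7, dest="T2-13", deadline=125.0)
```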

File management view: In order to facilitate dynamic file replication in the multi-tier Data Grid, the following structures are assumed to be in place in the system: Replica Catalog (RC), Replica Manager (RM), Local Replica Manager (LRM), and Local Replica Catalog (LRC).

The RC is a centralized database which stores the mappings from logical file names to physical file names in the Grid. The logical file names (X) are unique identifiers for the data files in the Grid, whereas the physical file names represent their physical locations. When the RC is queried by the RM with a logical file name (Xk), the RC responds to the RM with one or more available physical file names corresponding to that logical file name. In addition to responding to the RM's queries, the RC also receives logical file name update messages from the LRMs and consolidates them into the database.
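The RC's role can be sketched as a simple mapping from logical names to physical copies (illustrative Python; the method names are hypothetical, not taken from the paper):

```python
from collections import defaultdict

class ReplicaCatalog:
    """Minimal sketch of the RC: logical file name -> set of physical copies."""
    def __init__(self):
        self._copies = defaultdict(set)

    def publish(self, machine, logical_names):
        # Consolidate an LRC update received from a machine's LRM.
        for name in logical_names:
            self._copies[name].add(machine)

    def unpublish(self, machine, logical_name):
        # A file was deleted from the machine's local storage.
        self._copies[logical_name].discard(machine)

    def lookup(self, logical_name):
        # Respond to an RM query with all known physical locations.
        return set(self._copies[logical_name])
```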

The RM is a centralized service which accepts the file transfer requests each with a deadline from the LRMs of the Tier 2 machines. Upon receiving a request Rk, the RM adds this request to a queue of waiting requests and calls for a file transfer scheduler (FTS) algorithm for the queued requests. FTS runs as follows:

1. Sort the queued requests in increasing order of their urgency values and delete those with zero or negative urgency from the queue.

2. For each queued request, starting from the most urgent one:

o Query the RC and obtain a set of physical file names related to the logical file name Xk.

o Choose a physical file among the ones returned by the RC based on the following criteria: (1) The number of hops between source and destination machines is the smallest. (2) All links along the path have enough available bandwidth so that the requested file can be made available at its destination before its deadline.

Figure 1. Multi-tier Data Grid computing model.


o If a deadline-satisfying path has been found, try to reserve just enough storage capacity on the related Tier-1 and/or Tier-2 machines to hold the requested file.

o If the storage reservations are successful, make bandwidth reservations on the links along the path.

o Start the file transfer from a source machine to the destination.

Note that the FTS reconsiders all queued requests whenever the transmission of a request completes, since a completed request releases the bandwidth reserved for it on its links.
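One pass of the FTS loop above can be sketched as follows (a simplified Python sketch; `query_rc`, `hops`, and `try_reserve` are hypothetical stand-ins for the RC query, routing, and reservation machinery, and urgency is approximated here by raw deadline slack):

```python
def fts_pass(queue, now, query_rc, hops, try_reserve):
    """One simplified pass of the file transfer scheduler (FTS).

    queue: list of dicts with keys "file", "dest", "deadline"
    query_rc(file): candidate source machines holding the file
    hops(src, dst): hop count between two machines
    try_reserve(src, dst, deadline): True if storage and link bandwidth
        can be reserved so the file arrives before the deadline
    """
    # Step 1: drop requests whose slack is already non-positive, then
    # consider the most urgent (smallest slack) requests first.
    live = [r for r in queue if r["deadline"] - now > 0]
    live.sort(key=lambda r: r["deadline"] - now)

    started = []
    for req in live:
        # Step 2: query the RC and prefer the source with the fewest hops
        # whose storage and bandwidth reservations succeed.
        for src in sorted(query_rc(req["file"]), key=lambda s: hops(s, req["dest"])):
            if try_reserve(src, req["dest"], req["deadline"]):
                started.append((req["file"], src, req["dest"]))  # start transfer
                break
    return started
```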

The LRM runs on all machines in all tiers and is responsible for the following activities:

1. Honoring a file transfer request: If the LRM is deployed on a Tier 2 machine, upon receiving a request, it checks its Local Replica Catalog (LRC), which holds the logical file names stored on this machine, to see if it owns the requested file. If so, it provides the file locally. Otherwise, the Tier 2 LRM sends a request to the RM for remote file access. If the LRM runs on a Tier 0 or Tier 1 machine, it receives requests only from the RM. After receiving a request message from the RM, the LRM starts the transmission of the requested file to the destination machine.

2. Keeping its Local Replica Catalog up-to-date: Each LRM must update its LRC whenever a new file is written into or deleted from the local storage. Following the update, the LRM must publish the LRC content to the RC.

3. Executing a file replacement policy: The LRM must implement a file replacement policy to choose one or more files to delete when a new file needs to be copied into a local storage facility without enough available space.

4. Making storage space reservations: Once the RM requests a storage space reservation, the LRM reserves the requested capacity if there is enough room in its storage. Otherwise, it runs the file replacement algorithm to delete some files and then reserves the requested capacity. It is also possible that the LRM cannot make the reservation if all available capacity has already been reserved. The LRM informs the RM of the reservation status.

3. FILE REPLICATION ALGORITHMS

Caching [5]: The replica manager forwards all file transfer requests to the LRM of the Tier 0 machine. As a result, the Tier 1 machines do not function as replica servers and their LRMs are not required to publish their LRCs. Once a Tier 2 machine receives the file it has requested, it keeps the file in its local storage for possible future reference until the file is replaced.

Fast Spread [5]: During a file transfer from the Tier-0 machine to a Tier-2 machine, the requested file is copied into the storage of the Tier-2 machine as well as that of the Tier-1 machine along the path. Thus, the Tier-1 machines can now function as replica servers. As a result, a later request for the same file can be served from a Tier-1 machine instead of the Tier-0 machine.

Caching-PP: In the original form of caching described in [5], a machine is not allowed to get a file from a Tier-2 machine in the same subnetwork. Caching-PP enables peer-to-peer communication, whereby a Tier-2 machine can provide the requested file to another Tier-2 machine in the same subnetwork. Note that the Tier-1 machines still do not store any files, as in the original caching algorithm.

Fast Spread-PP: Fast Spread-PP also enables the peer-to-peer communication between the Tier-2 machines within the same subnetwork. Furthermore, the peer-to-peer communication among the Tier-1 machines is possible as well.
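The difference between the original algorithms and their -PP variants comes down to which replica holders are admissible sources for a given request. A minimal sketch of that admissibility rule (illustrative Python; the machine and subnetwork identifiers are hypothetical):

```python
def candidate_sources(dest, holders, tier, subnet, peer_to_peer):
    """Which holders may serve a request destined for `dest` (sketch).

    tier[m] is 0, 1, or 2; subnet[m] identifies a machine's subnetwork;
    peer_to_peer=True models the -PP variants described in the text.
    """
    allowed = []
    for m in holders:
        if tier[m] in (0, 1):
            # The hierarchy (Tier-0/Tier-1 replica servers) is always allowed.
            allowed.append(m)
        elif peer_to_peer and subnet[m] == subnet[dest]:
            # A Tier-2 peer may serve only machines in its own subnetwork.
            allowed.append(m)
    return allowed
```

With `peer_to_peer=False` only the Tier-0/Tier-1 machines qualify, reproducing the original hierarchical restriction; with `peer_to_peer=True` a same-subnetwork Tier-2 holder, typically the closest copy, also becomes eligible.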

In order to complement these file replication algorithms, a particular file replacement policy needs to be enforced by the LRMs. In this study, Least Recently Used (LRU) is assumed, based on its proven performance in [11].

Least Recently Used: The LRM keeps a history for each file in its LRC indicating the latest time at which that file was accessed. When the available storage space is not enough to hold a new file, the LRM sorts the files in increasing order of their latest access times into a list and then deletes files starting from the top of the list until it frees up enough storage space for the new file.
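The LRU policy just described can be sketched as follows (illustrative Python; sizes and timestamps are abstract units):

```python
def lru_evict(files, last_access, size, free_space, needed):
    """LRU replacement (sketch): delete least-recently-used files until
    `needed` units of space are free. Returns (victims, new_free_space)."""
    victims = []
    # Sort into increasing order of latest access time: oldest first.
    for f in sorted(files, key=lambda f: last_access[f]):
        if free_space >= needed:
            break
        free_space += size[f]
        victims.append(f)
    return victims, free_space
```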

4. SIMULATIONS

A simulator was developed to investigate the performance of the four file replication strategies together with LRU. The simulator was written in the C programming language using the CSIM 19 library [21]. The CSIM library allows the development of process-oriented discrete-event simulation programs. A CSIM program models a system as a collection of CSIM processes which interact with each other using the CSIM structures.

The simulator consists of three parts: the network, the request generator, and the heuristics.

4.1 Network

At the start of the simulation, an LHC-like tiered computing system is created. As described, there are three tiers. Tiers 0, 1, and 2 are assumed to include 1, 10, and 100 machines, respectively, for a total of 111 machines (85 in [5]). These machines are interconnected by a randomly generated network topology similar to the one in Figure 1. That is, the network is composed of an Internet with arbitrary topology and ten subnetworks with arbitrary/star topologies. Fifty routers/switches are assumed to be deployed in the network. The number of routers that each subnetwork and the Internet will have is randomly determined.

The bandwidth of a link connecting two routers in a subnetwork is randomly taken to be 1 Gbit/s, 2 Gbit/s, or 4 Gbit/s for each direction. On the other hand, an Internet backbone link between two routers is randomly assigned a bandwidth of 2.5 Gbit/s, 5 Gbit/s, or 10 Gbit/s for each direction. Finally, a bandwidth value uniformly distributed between 6 Gbit/s and 8 Gbit/s is assigned to a link between the Tier-0 machine and a router for both directions; between 4 Gbit/s and 6 Gbit/s for a link between a Tier-1 machine and a router; and between 2 Gbit/s and 4 Gbit/s for a link between a Tier-2 machine and a router. Note that full-duplex communication is allowed on all links. Furthermore, link bandwidth can be shared among multiple file transfer flows as reserved by the RM.

In the simulated Grid, the storage capacity of each machine is determined based on the relative capacity (R) index, which is the ratio of the total storage capacity of the Tier 1 and Tier 2 machines to the total size of all data files. It should be noted that a higher value of R allows more file replicas in the Grid. Thus, the performance of a particular file replication algorithm depends on the value of R. In the simulations, three different values for the relative capacity, 10.5%, 20.5%, and 20%, are assumed, and the corresponding storage capacities are determined as shown in Table 1.

Case   Tier 1 T1 (TB)    R1 (%)   Tier 2 T2 (TB)      R2 (%)   Overall R (%)
1      10 x 0.2 = 2      10       100 x 0.001 = 0.1   0.5      10.5
2      10 x 0.4 = 4      20       100 x 0.001 = 0.1   0.5      20.5
3      10 x 0.2 = 2      10       100 x 0.02 = 2      10       20

Table 1. Storage capacity configurations for Tier 1 and Tier 2 machines.

In Table 1, the relative capacity of the simulated Grid is computed as follows. The number of data files in a real Grid will be on the order of millions. However, it is not feasible to simulate the transfer of such a large number of files on a single computer. Thus, the number of files is scaled down and assumed to be 20000 during the simulations, each of size 1 GB. A fixed file size is also common in other studies, e.g., [4], [8], [10]. As a result, the total size of all files is 20 TB. Taking case 2 as an example, the total storage capacity of the Tier 1 machines, T1, is 4 TB, where Tier 1 has 10 machines each with a storage capacity of 0.4 TB; the total storage capacity of the Tier 2 machines, T2, is 0.1 TB, where Tier 2 includes 100 machines, each with a storage capacity of 1 GB. Thus, the total replication capacity is T = T1 + T2 = 4.1 TB, and the relative capacity R becomes 4.1/20 = 20.5%.
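The relative-capacity arithmetic of Table 1 can be checked directly (a short Python sketch mirroring the numbers in the text):

```python
FILE_SIZE_TB = 0.001                    # each file is 1 GB
TOTAL_DATA_TB = 20000 * FILE_SIZE_TB    # 20000 files -> 20 TB of data

def relative_capacity(n1, c1_tb, n2, c2_tb):
    """R = (T1 + T2) / (total data size), as a percentage."""
    return 100.0 * (n1 * c1_tb + n2 * c2_tb) / TOTAL_DATA_TB

print(round(relative_capacity(10, 0.2, 100, 0.001), 1))  # case 1 -> 10.5
print(round(relative_capacity(10, 0.4, 100, 0.001), 1))  # case 2 -> 20.5
print(round(relative_capacity(10, 0.2, 100, 0.02), 1))   # case 3 -> 20.0
```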

4.2 Request Generator

The request generator component is implemented as a CSIM process. During the simulations, requests are assumed to arrive at the Grid according to a Poisson process with a specific arrival rate. Furthermore, they are submitted only from the Tier 2 machines [4], [5], [6]. Three parameters are associated with each submitted request. First, the request is associated with a data file available in the Grid according to a particular file access pattern. However, as of now, no actual file access patterns for Grid applications are known. Thus, three commonly used file access patterns [3]-[6], namely random, geometric, and Zipf, are implemented in the simulator. Specifically, the geometric distribution with a file popularity parameter of 0.05 and the Zipf distribution with a file popularity parameter of 0.8 are assumed. Second, the request is assigned a randomly chosen Tier 2 destination machine. Finally, a deadline value is assigned to the request using the formula Dk = Ak + normal(µ, σ), where Ak is the arrival time of the request and normal(µ, σ) returns a normally distributed real number with mean µ and standard deviation σ.
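The request generation just described can be sketched as follows (illustrative Python using the standard library; only the simplest, random file access pattern is shown, the geometric and Zipf patterns being analogous draws from other distributions):

```python
import random

def generate_requests(n, rate, num_files, mu, sigma, num_dests, seed=1):
    """Sketch of the request generator: Poisson arrivals, a random file
    access pattern, a random Tier-2 destination, and a deadline
    D_k = A_k + normal(mu, sigma)."""
    rng = random.Random(seed)
    t, requests = 0.0, []
    for _ in range(n):
        t += rng.expovariate(rate)          # Poisson process: exp. inter-arrivals
        x = rng.randrange(num_files)        # random file access pattern
        dest = rng.randrange(num_dests)     # random Tier-2 destination machine
        deadline = t + rng.normalvariate(mu, sigma)
        requests.append({"arrival": t, "file": x, "dest": dest,
                         "deadline": deadline})
    return requests
```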

4.3 Heuristics

The four file replication algorithms and the replacement policy are implemented in another CSIM process, called file_transfer. For each submitted file request, one file_transfer process is created to take care of the transfer of the file.

4.4 Performance Results

Using the simulator developed, a set of simulation studies was conducted. First, a base set of results was established. In the base set, the four algorithms with LRU are evaluated in terms of satisfied requests for the random, geometric, and Zipf file access patterns under the following simulation parameters: number of data files = 20000, number of requests = 50000, (µ, σ) = (50, 25), and request arrival rate = 1 request/second. Later, each of these simulation parameters is varied individually to study its impact on the performance of the algorithms. The results of the simulation studies are presented in Tables 2-6, where each data point shown is the average of 10 simulation runs. Note that each run of the simulation creates a different Grid topology and request set under the given simulation parameters.

Table 2 shows the base set of results. As far as performance is concerned, the clear winner is Fast Spread-PP, followed by Fast Spread, Caching-PP, and Caching. Thus, it is evident that peer-to-peer file sharing has a positive impact on the real-time performance of Data Grid systems. According to Table 2, the distribution of the storage capacity is also very important. When the results for the relative capacities of 20.5% and 20% are compared, the latter leads to superior performance, thanks to the added capacity of the Tier-2 machines, which enables them to hold more files. In terms of the effect of the file access pattern on performance, all four algorithms show their best performance for the geometric access pattern, followed by the Zipf and random patterns. Under the geometric and Zipf access patterns, several files are requested more often than the others. This makes it more probable to find these popular files on the replica servers, which results in more requests being satisfied. The worse performance under the Zipf file access pattern can be attributed to the long tail of the Zipf distribution. That is, the number of different files requested under Zipf is larger than under the geometric distribution, and this adversely affects the replication performance.

Table 3 compares the performance of the algorithms when the number of files is increased from 20000 to 40000 while keeping the other simulation parameters unaltered and the relative capacity at 20%. All four algorithms performed worse for the random access pattern, while they performed the same for the geometric and Zipf access patterns. This is due to the fact that the probability of accessing the same file decreases for the random access pattern, while it stays approximately constant for the latter two.

Table 4 presents the performance of the algorithms when the number of requests is increased from 50000 to 75000 while keeping the other simulation parameters fixed. This simulation study is conducted to see how the algorithms react over a longer run. As shown in the table, all four algorithms showed stable performance.

Table 5 shows the performance of the algorithms when (µ, σ) = (100, 25) while keeping the other simulation parameters unchanged. Note that increasing µ from 50 to 100 relaxes the request deadlines. Thus, all four algorithms are expected to perform better when µ = 100, which is indeed the case in Table 5.

Table 6 shows the performance of the algorithms when the request arrival rate is increased from 1 request/sec to 2 requests/sec while keeping the other simulation parameters fixed. Increasing the request arrival rate is expected to lead to more resource contention among the requests. Consequently, fewer requests will be satisfied. According to Table 6, all four algorithms are significantly affected and experience decreases in their performance. For the random and Zipf file access patterns, the decrease in performance is more profound; for the geometric access pattern, it seems tolerable.


Base             Random                     Geometric                  Zipf
                 10.5%   20.5%   20%        10.5%   20.5%   20%        10.5%   20.5%   20%
Caching          39395   39395   41916      40096   40096   49392      39460   39460   43274
Fast Spread      39596   39791   42240      43333   43333   49568      42226   42514   47362
Caching-PP       39400   39400   42251      41976   41976   49617      39722   39722   46587
Fast Spread-PP   39666   39916   42468      43346   43346   49623      42285   42562   47620

Table 2. The performance of four replication algorithms under the base simulation parameters.

                 No of files = 20000 (Base)      No of files = 40000
                 Random   Geometric   Zipf       Random   Geometric   Zipf
Caching          41916    49392       43274      41899    49392       42939
Fast Spread      42240    49568       47362      42058    49568       46698
Caching-PP       42251    49617       46587      42072    49617       45852
Fast Spread-PP   42468    49623       47620      42169    49623       46921

Table 3. Impact of increasing the number of files in the Grid on the replication algorithms.

                 No of requests = 50000 (Base)   No of requests = 75000
                 Random   Geometric   Zipf       Random   Geometric   Zipf
Caching          41916    49392       43274      62520    74121       64569
Fast Spread      42240    49568       47362      63026    74404       70980
Caching-PP       42251    49617       46587      63001    74450       69428
Fast Spread-PP   42468    49623       47620      63248    74464       71238

Table 4. Impact of increasing the number of requests in the Grid on the replication algorithms.

                 (µ, σ) = (50, 25) (Base)        (µ, σ) = (100, 25)
                 Random   Geometric   Zipf       Random   Geometric   Zipf
Caching          41916    49392       43274      42274    49922       43874
Fast Spread      42240    49568       47362      42676    49994       48909
Caching-PP       42251    49617       46587      42681    49993       48050
Fast Spread-PP   42468    49623       47620      42913    49995       49177

Table 5. Impact of loosening the deadlines on the replication algorithms.

                 Arrival rate = 1 (Base)         Arrival rate = 2
                 Random   Geometric   Zipf       Random   Geometric   Zipf
Caching          41916    49392       43274      21733    43176       23706
Fast Spread      42240    49568       47362      22165    49243       31238
Caching-PP       42251    49617       46587      22161    49406       29519
Fast Spread-PP   42468    49623       47620      22310    49455       31159

Table 6. Impact of increasing the arrival rate on the replication algorithms.


5. CONCLUSION

The results presented in the previous section make it evident that peer-to-peer file sharing should be enabled in a multi-tier real-time Data Grid environment to boost the system's real-time performance: Fast Spread-PP and Caching-PP consistently outperformed their counterparts. Furthermore, the file access pattern of the tasks running on the Grid has a significant impact on real-time Grid performance; the algorithms performed best under the geometric access pattern, followed by the Zipf and random patterns. Finally, the distribution of the available storage space capacity is also very important for serving real-time applications well. These initial yet detailed results on the impact of peer-to-peer communication on the real-time performance of the replication algorithms motivate the development of more sophisticated replication algorithms that make better use of Grid resources, which will be the topic of future research.
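The three file access patterns compared in this study can be reproduced with simple request generators. A minimal sketch, assuming uniform, geometric, and Zipf distributions over file indices; the decay parameter p and the Zipf exponent of 1 below are illustrative assumptions, not the paper's exact simulation settings:

```python
import math
import random

NUM_FILES = 20000  # base number of files in the Grid (Table 3)
rng = random.Random(42)

def random_request():
    # Uniform access: every file is equally likely to be requested.
    return rng.randrange(NUM_FILES)

def geometric_request(p=0.001):
    # Geometric access: request probability decays exponentially with
    # file index (inverse-CDF sampling). p is an illustrative assumption.
    u = 1.0 - rng.random()  # u in (0, 1], avoids log(0)
    return int(math.log(u) / math.log(1.0 - p)) % NUM_FILES

# Zipf access: file at rank i is requested with probability ~ 1/i.
ZIPF_WEIGHTS = [1.0 / rank for rank in range(1, NUM_FILES + 1)]

def zipf_request():
    return rng.choices(range(NUM_FILES), weights=ZIPF_WEIGHTS, k=1)[0]

sample = [zipf_request() for _ in range(5)]
```

Under the geometric and Zipf generators most requests concentrate on a small set of popular files, which is why replicas placed for earlier requests keep satisfying later ones and the algorithms meet more deadlines than under the uniform pattern.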

6. REFERENCES

1. A. Chervenak, I. Foster, C. Kesselman, C. Salisbury, S. Tuecke, “The Data Grid: Towards an Architecture for the Distributed Management and Analysis of Large Scientific Datasets,” Journal of Network and Computer Applications, vol. 23, no. 3, pp. 187-200, 2000.

2. B. Allcock, J. Bester, J. Bresnahan, A. L. Chervenak, I. Foster, C. Kesselman, S. Mader, V. Nefedova, D. Quesnel, S. Tuecke, “Data Management and Transfer in High Performance Computational Grid Environments,” Parallel Computing Journal, 28(5), pp. 749-771, 2002.

3. K. Ranganathan and I. Foster, “Simulation Studies of Computation and Data Scheduling Algorithms for Data Grids,” Journal of Grid Computing, 1(1), pp. 63-74, 2003.

4. D. G. Cameron, A. P. Millar, C. Nicholson, R. Carvajal-Schiaffino, F. Zini and K. Stockinger, “Analysis of Scheduling and Replica Optimisation Strategies for Data Grids using OptorSim,” Journal of Grid Computing, vol. 2, no. 1, pp. 57-69, 2004.

5. K. Ranganathan and I. Foster, “Identifying Dynamic Replication Strategies for a High-Performance Data Grid,” Lecture Notes In Computer Science, vol. 2242, pp. 75-86, 2001.

6. M. Tang, B.-S. Lee, C.-K. Yeo, and X. Tang, “Dynamic Replication Algorithms for the Multi-tier Data Grid,” Future Generation Computer Systems, 21, pp. 775-790, 2005.

7. K. Keahey, T. Fredian, Q. Peng, D.P. Schissel, M. Thompson, I. Foster, M. Greenwald, and D. McCune, “Computational Grids in Action: The National Fusion Collaboratory,” Future Generation Computer Systems, 18:8, pp. 1005-1015, October 2002.

8. Y. Wang, F. De Carlo, D. Mancini, I. McNulty, B. Tieman, J. Bresnahan, I. Foster, J. Insley, P. Lane, G. von Laszewski, C. Kesselman, M.-H. Su, M. Thiebaux, “A High-Throughput X-ray Microtomography System at the Advanced Photon Source,” Review of Scientific Instruments, 72(4), pp. 2062-2068, April 2001.

9. M. Eltayeb, A. Doğan, F. Özgüner, “Concurrent Scheduling: Efficient Heuristics for Online Large-Scale Data Transfers in Distributed Real-Time Environments,” IEEE Transactions on Parallel and Distributed Systems, vol. 17, no. 11, pp. 1348-1359, 2006.

10. A. Doğan, “Performance of Real-Time Data Scheduling Heuristics Under Data Replacement Policies and Access Patterns in Data Grids,” Lecture Notes in Computer Science, 4330, pp. 884-893, 2006.

11. A. Doğan, “Performance of File Replication Policies for Real-time File Access in Data Grids,” Submitted to GridNets 2007.

12. I. Foster, C. Kesselman, C. Lee, B. Lindell, K. Nahrstedt, and A. Roy, “A Distributed Resource Management Architecture That Supports Advance Reservation and Co-allocation,” Int’l Workshop on Quality of Service, pp. 27-36, 1999.

13. R. J. Al-Ali, K. Amin, G. von Laszewski, O. F. Rana, D. W. Walker, M. Hategan, and N. Zaluzec, “Analysis and Provision of QoS for Distributed Grid Applications,” Journal of Grid Computing, vol. 2, no. 2, pp. 163-182, 2004.

14. A. Orda, “Routing with End-to-End QoS Guarantees in Broadband Networks,” IEEE/ACM Transactions on Networking, vol. 7, no. 3, June 1999.

15. R. A. Guerin and A. Orda, “Networks with Advance Reservations: The Routing Perspective,” Infocom, 2000.

16. RFC 1633, Integrated Services in the Internet Architecture: an Overview.

17. RFC 2212, Specification of Guaranteed Quality of Service.

18. RFC 2205, Resource Reservation Protocol (RSVP).

19. GridPP Collaboration, “GridPP: Development of the UK Computing Grid for Particle Physics,” Journal of Physics G: Nuclear and Particle Physics, 32, N1-N20, 2006.

20. P. Avery, “Data Grids: A New Computational Infrastructure for Data Intensive Science,” Philosophical Transactions of the Royal Society A: Mathematical, Physical & Engineering Sciences (360), pp. 1191-1209, 2002.

21. User’s Guide: CSIM19 Simulation Engine (C Version), http://www.mesquite.com.