Formal analysis of an agent-based optimisation strategy for Data Grids

D.G. Cameron^a, R. Carvajal-Schiaffino^b, C. Nicholson^c, K. Stockinger^d, F. Zini^e,∗, A.P. Millar^c, L. Serafini^e

a CERN, European Organisation for Nuclear Research, 1211 Geneva, Switzerland

b Universidad de Santiago de Chile, Av. Bernardo O’Higgins 3363, Santiago de Chile

c University of Glasgow, Glasgow, G12 8QQ, Scotland

d Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA

e ITC-irst, via Sommarive 18, 38050 Povo (Trento), Italy

∗ Corresponding author, tel: +39 0461 314 314, fax: +39 0461 302 040,

e-mail: [email protected]

Abstract

In a world-wide computational Grid, users typically want their jobs to be executed as fast as possible, while the goal of a Grid infrastructure is to assure specific quality of service for all users. In order to reconcile these apparently contrasting goals, the authors proposed an economy-based strategy to be used in a Data Grid for efficient access to and distribution of data replicas [20] needed by Grid jobs. The strategy is based on an economic model where data-seeking agents negotiate optimal prices for exchanging data files with data-storing agents. Data-storing agents also try to keep on their Grid sites the most useful files.

In previous work, the performance of the economic model has been empirically studied via simulations conducted using the Data Grid simulator OptorSim [11, 18]. In this paper the problem of efficient access to and distribution of replicas is introduced and the economic model is described. A formalisation of the auction protocol underlying the economic model is then presented and a formal proof of some of its properties (namely, that it is deadlock-free and always terminates) is provided. The auction protocol is modelled using Petri nets, a formal and graphical language that is well suited for modelling concurrent distributed systems.

Keywords: Data Grids, Petri nets, economy-based optimisation, auction protocol, formal analysis.

1 Introduction

The core idea from which Grid computing originates is to provide geographically distributed users with software and hardware infrastructures for the effective sharing of heterogeneous computing resources, data and services [27]. A typical example is the Data Grid [22], by means of which scientists are able to run jobs that transparently access and analyse large amounts of scientific data distributed around research centres worldwide.

The road towards a global, general purpose infrastructure has been undertaken in the last few years by several international projects and initiatives (see for example the EU DataGRID [29], GridPP [13], CrossGrid [8], GriPhyN [3], EGEE [2], Open Science Grid [6], NextGrid [5], CoreGRID [1]). Many of these projects are or have been working towards the realisation of specialised computational Grids, to be used by Virtual Organisations of scientists as the underlying infrastructure for their research and experiments. A feature of such collaborative scientific enterprises is that they require access to very large data collections, large scale computing resources and high performance services. The realisation of specialised Grids for virtual communities of so-called e-scientists is the subject of much research which has given significant results in the production of core Grid middleware services that are available as the basis for further application development.

A fundamental aspect of an effectively running Grid is the efficient exploitation of available resources and services. However, the term “efficiency” assumes different meanings from different perspectives because the Grid is a highly dynamic environment in which several self-interested actors co-exist. Each of them has its own goals to achieve, which can be in conflict with other actors’ goals. Grid users typically want their requests to the Grid to be satisfied as quickly as possible, while the goal of the Grid infrastructure is to provide the desired (or required) quality of service for all the users belonging to a virtual organisation.

As in most distributed systems, centralised global optimisation is impractical in a Grid due to the large number of different parameters and information sources. Therefore, in order to study how a good global behaviour can be achieved in a Grid, a multi-agent model seems to be the most appropriate. In such a model a number of optimisation agents, located in each Grid site, autonomously try to optimise the use of local resources by negotiating or interacting with agents located in other sites. In this way it is possible to study whether and how efficiency is an emergent property of the local optimisation decision making process performed by distributed optimisation agents.

In the last few years the research work of the authors has focused on strategies and algorithms for data access and replication in Data Grids modelled as multi-agent systems. Replication, which is the process of placing copies of data at different locations on the Grid, has been identified as an important way of improving the efficiency of Data Grids [29]. The Grid simulator OptorSim [7, 11] was developed with the aim of experimentation and validation of data access and replication algorithms. In OptorSim storage and computing resources are wrapped by intelligent agents that interact with the rest of the Grid in order to optimise the use and location of data files.

One of the proposed approaches for achieving efficient Data Grids was a data access and replication strategy based on the use of an economic model. In this strategy data-seeking agents trade with data-storing agents in order to negotiate optimal prices for the data files they require [20, 12]. The use of economic principles and market models as the basis for grid scheduling and resource allocation is very promising (see for example [25], [9], and [24]). The proposed economic model includes two components: an auction-based protocol used by the agents for selecting data files and a replication strategy which they use to calculate the values of files. The experimental analysis of the economic model, conducted using OptorSim, has shown that for certain file access patterns this model significantly outperforms more traditional replication and cache management strategies [18].

Simulation is a commonly used approach in distributed computing and it is also suitable for the analysis of a Grid environment. However, it has the disadvantages of requiring a large amount of time to study a particular optimisation strategy and of a lack of formal verification. A complementary approach to simulation is the use of formal methods in order to verify the correctness of the proposed strategies and analyse their global performance. Formal methods provide a rigorous mathematical framework for specifying, defining and verifying concurrent distributed computing systems such as a Grid environment.

The major research contribution of this paper is the use of Petri nets [33] to provide a formalisation of the auction protocol, which is the essential component of the economic model, and to prove some of its structural properties, such as termination and absence of deadlocks. In addition, with respect to previous work, the paper includes an improved presentation of the problem of distributed replica optimisation and of how the economy-based optimisation strategy addresses this problem.

The paper is organised as follows. Section 2 illustrates the problem of efficient replica access and distribution while Section 3 details the economic model approach to such a problem. In Section 4 the Petri net model of the auction protocol is presented along with the analysis of some of its properties. Related work is discussed in Section 5 and conclusions are included in Section 6.

2 Efficient replica access and distribution

In the adopted abstraction of a Data Grid, computing resources at a site are represented by Computing Elements (CEs) and storage is represented by Storage Elements (SEs). A job on the Grid will run on one or more Computing Elements and will typically require access to some data files, which are stored on Storage Elements.

Replication of data files is a key factor in improving the overall efficiency of a Data Grid. However, replication must be performed in a controlled manner to be effective. Therefore, the assumption is that replica optimisation is performed in a distributed way by a number of Optimisation Agents, which handle both the search for and storage of data. One agent is present on each Grid site and each agent performs local replica optimisation. This local optimisation should be such that global efficiency is achieved as the emergent behaviour of the system.

As previously stated, the needs of single users do not generally agree with the requirements of the virtual organisation that manages a Grid infrastructure. On the one hand, every single user would like to have the power of the Grid at their complete disposal whenever they need a job to be executed. On the other hand, the organisation must assure good quality of service for all its users. In terms of replica optimisation, this means every Optimisation Agent tries to achieve two goals G1 and G2, which concern the needs of single users and of their virtual organisation respectively.

G1: Minimisation of job execution cost. In general, users want their jobs to be executed with minimum cost. An Optimisation Agent therefore aims to minimise the execution cost of every job that is executed on its Grid site.

Cost here is a deliberately vague term. Current Data Grids operate on an egalitarian basis, so the cost would be simply the time a job takes to execute. In other environments it is likely jobs will incur some financial penalty for running. With this in mind, a job's execution cost function would take into account both the time taken to execute and the resulting financial penalty. The balance between these often competing requirements represents the priority prescribed to the job.

G2: Quality of service for virtual organisation. As the focus is on replication of data, assuring good quality of service means that a virtual organisation wants the storage in Grid sites to be profitably exploited. An Optimisation Agent therefore aims to keep local those files that are most useful for jobs being executed on the local or nearby sites.

An Optimisation Agent performs the following tasks in order to achieve its goals.

• For G1, whenever an Optimisation Agent is contacted to provide a file for a locally running job, it executes a task called Replica Selection. This task implies the discovery of the replica of the required file which has the minimum cost. Here, the replica with “minimum cost” is the one that minimises data access latency, i.e. the time needed by the job to read the file content. Data access latency is the sum of storage access time and network data transfer time. If a file is stored locally, network data transfer time is taken to be zero.

• For G2, the Optimisation Agent behaves as follows. Whenever it is requested to deliver a file which is not on the local site, either by a locally running job or another (remote) Optimisation Agent, it executes three tasks:

– Replication Decision (RD). This is the decision about whether or not the requested file should be replicated locally.

– Replica Selection (RS). If RD gives a positive answer, a remote replica has to be replicated locally. The criteria adopted for the selection are the same as those used above for goal G1.

– File Replacement (FR). When a remote replica has been selected, it could be the case that there is not enough space to accommodate the new file. One or more local files must therefore be replaced.
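
The replica selection criterion used for G1 can be illustrated with a minimal Python sketch. The class `Replica`, the functions `access_latency` and `select_replica`, the site names and the timing values are hypothetical and introduced only for illustration; they are not part of OptorSim.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Replica:
    site: str
    storage_access_time: float    # time to read the file from the SE
    network_transfer_time: float  # time to move the file to a remote site


def access_latency(replica: Replica, local_site: str) -> float:
    """Storage access time plus network transfer time (zero for a local replica)."""
    network = 0.0 if replica.site == local_site else replica.network_transfer_time
    return replica.storage_access_time + network


def select_replica(replicas: list[Replica], local_site: str) -> Replica:
    """Return the replica with the minimum data access latency."""
    if not replicas:
        raise ValueError("no replica available")
    return min(replicas, key=lambda r: access_latency(r, local_site))


# Illustrative values only.
replicas = [
    Replica("site_a", storage_access_time=2.0, network_transfer_time=5.0),
    Replica("site_b", storage_access_time=4.0, network_transfer_time=9.0),
]
best = select_replica(replicas, local_site="site_b")
```

In this example the local replica on `site_b` is chosen even though its storage is slower, because no network transfer is needed.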

3 Economy-based optimisation of data access and replication

There are several ways in which an Optimisation Agent can execute its tasks. Furthermore, replication decision, replica selection, and file replacement are in general correlated. For example, selection and replacement can take place only if a positive replication decision has been made. On the other hand, the outcome of the replication decision making process could vary depending on where remote replicas are located or which files are currently stored in the local site. In the following an economy-based strategy that uses specific algorithms for the three tasks will be discussed. For the sake of clarity, the algorithms will be presented separately; in particular, the focus will be on the algorithm used in the economic model to perform replica selection.

In the economic model, computing elements “purchase” the data files required to run their jobs, while storage elements try to invest in files that will be “profitable” in the future, that is, files which they can sell either to CEs or to other SEs. CEs try to minimise the file purchase cost, while SEs try to maximise their profits. All the interaction is done via intelligent optimisation agents which perform the required reasoning.

The adoption of an economic approach has two main motivations. The first is that replica optimisation decisions should be made in a distributed manner. It would be difficult to perform such complex optimisations in a centralised way, due to the large domain size and the possibility of a single point-of-failure interrupting future optimisation; by restricting optimisation to single grid sites which interact using economic mechanisms, the problem can be made manageable and an overall improvement in performance can result from the emergent marketplace behaviour. The second motivation is the highly dynamic environment of a Grid, in which the availability of resources can change without warning. By using an economic model, the dynamism of the market can be exploited to make informed decisions at job execution time.

3.1 Implementation of Optimisation Agent tasks in the economic model

The tasks described in the previous section that an Optimisation Agent must execute to achieve its goals are implemented in the economic model as follows.

3.1.1 Replication Decision

Replication Decision is the process undertaken by an Optimisation Agent to decide whether or not to replicate a file to its local SE. In the economic model an algorithm that implements conditional replication based on a prediction function is used. This algorithm bases replication decision making on the prediction of the future popularity of files. If the predicted popularity of the requested file is greater than the lowest predicted popularity among the files currently stored, then the file is replicated. The two prediction functions considered were:

Binomial prediction function. This algorithm values the potential new replica according to a prediction of the future use of local files, based on a binomial probability distribution. It assumes a binomial distribution of file requests and is described in detail in [19].

Zipf prediction function. This algorithm estimates future file popularity based on the assumption that file requests have a Zipf-like distribution [37].

The Optimisation Agent maintains a history of the file requests it has received since a time δt in the past. This history is used by the selected prediction function to assess the expected popularity of a given data file.
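
The decision rule can be illustrated with a minimal Python sketch. It deliberately uses raw request frequency within the window δt as a stand-in predictor, not the binomial or Zipf functions of [19] and [37]; the function names, the history format and the treatment of an empty SE are assumptions made for illustration.

```python
from collections import Counter


def predicted_popularity(history, now, delta_t):
    """Count requests per file in the window [now - delta_t, now].

    Stand-in predictor (assumption): plain request frequency, not the
    binomial or Zipf prediction functions discussed in the text.
    """
    return Counter(name for (t, name) in history if now - delta_t <= t <= now)


def should_replicate(requested, stored, history, now, delta_t):
    """Replicate if the requested file is predicted to be more popular
    than the least popular file currently stored."""
    popularity = predicted_popularity(history, now, delta_t)
    if not stored:
        return True  # assumption: an empty SE always accepts the file
    least = min(popularity[name] for name in stored)
    return popularity[requested] > least


# History entries are (timestamp, file name) pairs; values are illustrative.
history = [(1, "a"), (2, "b"), (3, "b"), (8, "c"), (9, "c"), (9, "a")]
decision = should_replicate("c", stored={"a", "b"}, history=history, now=10, delta_t=5)
```

Only the requests at times 8 and 9 fall inside the window, so file `c` (two requests) is judged more popular than the stored file `b` (none) and is replicated.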

3.1.2 Replica Selection

Replica selection is the process used by an Optimisation Agent to identify the “best” replica of a file. In the economic model an algorithm that implements auction-based selection is used. This algorithm involves an auction protocol such that file requests are propagated via a peer-to-peer network over the neighbourhood of the site that originates the auction. All the sites that receive the message and have a local copy of the file reply with a bid expressing the price for which they are willing to sell the file. The site that bids the lowest value wins the auction. Section 3.3 explains this process in detail.

3.1.3 File Replacement

File replacement is the process used by an Optimisation Agent in order to decide which local file(s) it should delete if there is not enough space for the local replication of a file. In the economic model the prediction function described above is used to calculate which of the currently stored deletable files are least popular and hence can be deleted to create space for the new replica.
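
A minimal Python sketch of this replacement step follows. The function `files_to_delete`, the layout of `stored` (file name mapped to size and a deletable flag) and the popularity values are hypothetical; the popularity scores are assumed to come from whichever prediction function is in use.

```python
def files_to_delete(stored, popularity, needed_space, free_space):
    """Pick the least popular deletable files until the new replica fits.

    stored: dict mapping file name -> (size, deletable flag).
    Returns the list of files to delete, or None if space cannot be freed.
    """
    candidates = sorted(
        (name for name, (size, deletable) in stored.items() if deletable),
        key=lambda name: popularity.get(name, 0),
    )
    victims = []
    for name in candidates:
        if free_space >= needed_space:
            break
        victims.append(name)
        free_space += stored[name][0]
    return victims if free_space >= needed_space else None


# Illustrative values only; file "c" is marked as not deletable.
stored = {"a": (10, True), "b": (20, True), "c": (30, False)}
popularity = {"a": 5, "b": 1, "c": 0}
victims = files_to_delete(stored, popularity, needed_space=25, free_space=0)
```

Here file `b` is deleted first because it is the least popular deletable file, followed by `a` to reach the required space.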

3.2 Optimisation agents

An Optimisation Agent has three separate components, each of which interacts with different parts of the Grid infrastructure.

Access Mediator (AM). This component processes file requests from jobs running on a CE. For each requested file, it starts an auction to identify the cheapest replica of the file (see Section 3.3 for details). The AM gathers bids for the file from local and remote Storage Brokers (SB), selects the winner of the auction and performs file payments.

P2P Mediator (P2PM). This is responsible for establishing and maintaining a peer-to-peer communication infrastructure between Grid sites. It propagates auction messages between AMs and SBs.

Storage Broker (SB). This component is responsible for listening for file request messages from the local P2P Mediator. If it can meet the request with a file stored in the corresponding SE, it responds immediately to the P2P Mediator. If the file is not stored in the corresponding SE, it may start a nested auction in order to obtain a local replica of the file and be able to reply to the parent auction.

AMs, P2PMs, and SBs are all players in the Grid, as well as CEs and SEs. The logic of the economy-based optimisation strategy is embedded into these agents and an optimisation agent can be seen as just a container for these agents.

The connections between these three components can be seen in Figure 1. Since a Grid site might not have any CEs or SEs, some of the above components could be absent from the Optimisation Agent for that site. In Figure 1, Site 4 contains both Computing and Storage Elements and thus the local Optimisation Agent consists of all these components. Other sites include only the components needed and thus the local Optimisation Agents are simpler.

3.3 Auction protocol

The goal of the auction protocol is to select the cheapest replica of a file needed by a job running on a computing element. In other words, the protocol is used by optimisation agents to perform the replica selection task. A procurement Vickrey auction [35] is used, where agents are interested in purchasing the lowest-cost available resource.

Vickrey auctions are second-price sealed-bid auctions. They involve a single negotiation round, in which each bidder submits a bid to the auctioneer. This reduces the communication overheads involved and thus the time to conduct the auction. For the auction to be valid, bidders must not be able to see each other’s bids. With procurement Vickrey auctions the winner is the agent that made the lowest bid, as this represents the available resource with the lowest cost, but the agent is paid the price of the second-lowest bid.
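
The winner and payment rule of a procurement Vickrey auction can be illustrated with a minimal Python sketch. The function name and bidder identifiers are hypothetical. The text does not specify what is paid when only one bid arrives; the sketch assumes the single bidder is paid its own bid.

```python
def procurement_vickrey(bids):
    """Return (winner, payment) for sealed bids, or None if there are no bids.

    bids: dict mapping bidder name -> asking price.
    The lowest bid wins and is paid the second-lowest price.
    """
    if not bids:
        return None  # auction has no winner
    ranked = sorted(bids.items(), key=lambda item: item[1])
    winner, lowest = ranked[0]
    # Assumption: with a single bid there is no second price, so pay the bid itself.
    payment = ranked[1][1] if len(ranked) > 1 else lowest
    return winner, payment


# Illustrative bids from three storage brokers.
result = procurement_vickrey({"sb1": 12.0, "sb2": 7.5, "sb3": 9.0})
```

With these bids `sb2` wins and is paid 9.0, the price asked by the second-lowest bidder `sb3`.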

An advantage of this type of auction over others is that the best strategy for the bidders is to bid their true valuation. Bidding greater than the true valuation is clearly a risky strategy: if an agent did this, then the likelihood that the agent would win would be reduced. Bidding less than the true value increases the chances of winning, but if the practice was commonplace then the subsequent payment would be reduced. Bidding with the true valuation will therefore maximise both the chance of winning the auction and the payment received for the resource.

Vickrey auctions are vulnerable to some potential problems. The decision rests critically on the fairness of the auctioneer. Since the bids are sealed, it is impossible for any bidder to assess the honesty of the auctioneer (it is perhaps for that reason that Vickrey auctions are uncommon). Another potential problem is with collusion. It is possible for a subset of bidders to form a cartel and send an artificially low bid. If precisely one bidder returns an artificially low bid, then that agent would win the auction without the risk of lost income.

Within controlled environments, such as simulations, these risks will not occur unless explicitly included in the code. For less controlled situations, neither problem is likely to be manifest. Because the auctioneer’s choice of winning agent directly affects performance, it is unlikely that a dishonest auctioneer would be engineered or, if it were, that it would go unnoticed. The risk of collusion among bidders is more real, but since the bids represent real resources, it is usually possible to check their validity.

The joint behaviour of a computing element and the corresponding optimisation agents while executing a Grid job is described in the algorithm on the left of Figure 2. For each file needed by the job an auction is conducted to optimally select a replica of that file. The algorithm on the right of Figure 2 refers to how an Optimisation Agent behaves when, during an auction, a request for a file is received. In the following the actions performed by an access mediator (auctioneer) and storage brokers (bidders) during an auction are explained in detail.

3.3.1 Access Mediator - the auctioneer

The access mediator wants to buy access to a data file. The SBs bid the price they are willing to sell the file for and the winning SB is paid the second-lowest bidding price by the AM. The price bid by an SB is simply the time required for the file to be accessed from the AM’s site. When an AM receives a request for a file, it starts an auction, and at the end of the auction returns the winning replica to the CE.

The AM first issues a CallForBids for the required file, which is propagated by the local P2P Mediator to the local SBs and other SBs via the P2P network. The message will reach a subset of Grid sites, the size of which depends on the topology of the network and the maximum distance for auction propagation (a parameter of the auction). Once it has issued a request for bids, the AM waits so that potential file sellers can bid for the file. After a certain time, the AM selects the auction winner (i.e. the SB that submitted the lowest bid). If the AM receives no bids whilst waiting, the auction will have no winner and the job that needs the file aborts.

The AM waits until the winning replica is ready on the site (see footnote 1). After any replication process on the winning site has completed, the AM is notified and the physical location of the winning replica is returned to the CE.

3.3.2 Storage Broker - the bidder

Once a storage broker receives a CallForBids message, it first checks if its SE stores the required file. If the file is present, it calculates a bid and replies with a BidReply message. The local P2PM gathers BidReply messages from all local SBs and forwards them to the P2PM on the site that started the auction. This P2PM will forward the messages to the corresponding AM.

If the file is not stored locally, the SB might start a nested auction. The nested auction is conducted in exactly the same way as described above but the purpose is to create, on the corresponding SE, a replica of the requested file. A nested auction is started if the Optimisation Agent decides that having a local replica of the file is economically beneficial, based on the Replication Decision process described in Section 3.1.1. When a nested auction has completed, the SB calculates a bid for the file and then takes part in the original (or parent) auction.
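
The bidder behaviour, including nested auctions, can be illustrated with a minimal Python sketch. It is a strong simplification: each site bids a fixed `price`, the Replication Decision is reduced to a set `worth_replicating`, messages are replaced by recursive calls, and `hops_left` stands in for the maximum propagation distance. All names and values are hypothetical.

```python
def bid_for(site, filename, grid, hops_left):
    """Return the price `site` bids for `filename`, or None if it does not bid."""
    info = grid[site]
    if filename in info["files"]:
        return info["price"]              # file already on the local SE
    if hops_left == 0 or filename not in info["worth_replicating"]:
        return None                       # no nested auction, hence no bid
    # Nested auction: ask the neighbours and keep the offers received.
    offers = {}
    for neighbour in info["neighbours"]:
        price = bid_for(neighbour, filename, grid, hops_left - 1)
        if price is not None:
            offers[neighbour] = price
    if not offers:
        return None                       # nested auction had no winner
    seller = min(offers, key=offers.get)  # lowest bid wins the nested auction
    info["files"].add(filename)           # replicate locally (transfer not modelled)
    info.setdefault("replicated_from", {})[filename] = seller
    return info["price"]                  # now able to bid in the parent auction


# Illustrative three-site grid: only s2 initially stores file "f".
grid = {
    "s1": {"files": set(), "price": 1.0, "worth_replicating": {"f"}, "neighbours": ["s2", "s3"]},
    "s2": {"files": {"f"}, "price": 3.0, "worth_replicating": set(), "neighbours": ["s1"]},
    "s3": {"files": set(), "price": 2.0, "worth_replicating": set(), "neighbours": ["s1"]},
}
bid = bid_for("s1", "f", grid, hops_left=2)
```

Site `s1` does not hold the file but considers it worth replicating, so it obtains a replica from `s2` through a nested auction and only then bids in the parent auction.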

There is an important difference between first level (parent) and nested auctions. The goal of the former is to select the cheapest replica required by some job executed in a CE, i.e. it tries to fulfil G1 described above. The best replica might be located either on the same site as the CE or on any remote site. The latter always aims to replicate a file to the local SE, as this will increase the SB’s expected future income (G2). In other words, the mechanism underlying the auction protocol performs long term optimisation, allowing automatic replication towards “data hot-spots”. Also, nested auctions allow replication to third party sites: sites where the file was not initially needed. Third party replication appears to provide a good mechanism for distributing required files amongst a neighbourhood of nearby SEs. It reduces the bottleneck caused by only considering close SEs (those located on the same site as the CE) for replication.

Footnote 1: This is because an SB can bid for a file even if the related SE does not store a replica of the requested file. See Section 3.3.2 for details.

3.4 Experimental results

The economy-based optimisation strategy has been thoroughly investigated using the Data Grid simulator OptorSim. While the interested reader is directed to [18] for extensive results, here only a summary is presented. Table 1 includes performance results of the economic model and other algorithms used to implement replication decision, replica selection and file replacement. For example, the economic model has been compared with strategies that use unconditional replication for replication decision, replica catalogue-based replica selection and a modified version of the classical LFU (Least Frequently Used) algorithm for file replacement.

These results show that for some very different grid configurations, with various numbers of jobs and file access patterns, both versions of the economic model perform well.

4 Petri net modelling and analysis

Petri nets (PN) are a mathematical formalism which is well suited for modelling systems whose dynamics are characterised by concurrency, synchronisation, mutual exclusion and conflict. These are typical features of distributed environments, such as Data Grids. The mathematical foundations of the formalism allow analysis of both correctness (i.e. logic) and efficiency (i.e. performance). Petri nets are a family of formalisms, ranging from low to high level, each of them best suited to a different purpose.

A PN model of a dynamic system consists of two parts: a net structure, which represents the static part of the system, and a marking, which represents a distributed overall state. The net structure is represented by a directed bipartite graph in which there are two kinds of nodes: places and transitions. They are represented pictorially as circles and rectangles respectively. Places correspond to local system states and transitions are used to describe events that modify the state of the system. Edges in the graph define two relations between local states and events: an edge from a place to a transition indicates the local state in which the event can occur; an edge from a transition to a place indicates the local state transformations induced by the event.

A marking is represented pictorially by tokens inside the places. The marking of a place is its state value, e.g. a token inside a place means that the system is in the local system state corresponding to that place.

A net system is a PN model with an initial marking, and the system behaviour is given by the evolution rules for the markings. Evolution rules depend on the structure and the status of the net. For example, at time t the system can move from local state ls1 to ls2 if there is a transition between the corresponding places p(ls1) and p(ls2) and there are tokens in p(ls1).
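
The evolution rule can be illustrated with a minimal Python sketch of the standard enabling and firing rule for place/transition nets. The place names `idle` and `working` follow the nets discussed in Section 4.1; the single transition and the initial marking are hypothetical.

```python
def enabled(marking, pre):
    """A transition is enabled if every input place holds enough tokens."""
    return all(marking.get(place, 0) >= n for place, n in pre.items())


def fire(marking, pre, post):
    """Return the marking reached by firing a transition (input marking unchanged)."""
    if not enabled(marking, pre):
        raise ValueError("transition not enabled")
    new_marking = dict(marking)
    for place, n in pre.items():
        new_marking[place] = new_marking.get(place, 0) - n   # consume tokens
    for place, n in post.items():
        new_marking[place] = new_marking.get(place, 0) + n   # produce tokens
    return new_marking


# Hypothetical transition moving one token from `idle` to `working`.
pre, post = {"idle": 1}, {"working": 1}
m0 = {"idle": 2, "working": 0}
m1 = fire(m0, pre, post)
```

Starting from two tokens in `idle`, one firing leaves one token in each place; with no token in `idle` the transition is not enabled.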

4.1 Auction modelling

This section presents how the auction protocol used in the economic model for replica selection can be modelled and analysed with Petri nets. The global behaviour of the Data Grid (with respect to the auction protocol) can be modelled by two nets (shown in Figure 3 and Figure 4), corresponding to the components whose pseudo-code is given in Figure 2.

The reader can observe that the instructions that appear in the two algorithms in Figure 2 are represented as transitions in the Petri nets shown in Figure 3 and Figure 4 (see footnote 2).

The algorithms in Figure 2 represent the behaviour of single instances of Grid components. In order to model a Grid with several CEs and SEs, multiple tokens can be used. For example, a Grid with K CEs and N SEs can be modelled using K tokens in the net in Figure 3 and N tokens in the net in Figure 4. Initially, all tokens are put in the place idle of the two nets and this represents the initial state of the Grid where all components are inactive. When a computing element starts execution a token moves to place working. An auction from the side of a CE and corresponding optimisation agent is modelled by a token moving along the left branch of the net in Figure 3. From the side of an optimisation agent receiving a CallForBids, an auction is modelled by the net in Figure 4.

The two components CE+OA and OA can be modelled separately because the communication between them is asynchronous. The transition CallForBids in Figure 3 represents the sending of asynchronous messages from Grid sites where a CE is requesting files to other Grid sites. The firing of this transition can be viewed as the firing of the transition BidRequestArrived in Figure 4.

4.2 Structural analysis

A Petri net without any initial marking models the structure of a system. Structural analysis makes it possible to find properties that are valid for all possible systems that could be obtained by setting an arbitrary marking. By means of structural analysis it is therefore possible to prove properties of the auction protocol that are valid independently of the topological structure of the grid. Petri nets can also be used to evaluate the performance of the auction protocol, as shown in [21].

The following structural properties are proven:

Property 1 the auction protocol is deadlock free;

Property 2 the auction protocol terminates.

The intended meaning of termination is that, whenever a file is needed by a CE, the protocol either returns a pointer to a replica of that file or the auction fails. If Properties 1 and 2 hold, it is assured that every Grid job concludes its execution (with possible abortion).

Footnote 2: Automated translation of pseudo-code representation into Petri nets has been the focus of active research [10, 26, 31].

Structural analysis is performed using linear algebra techniques [34]. A PN with m places and n transitions is represented by two m×n incidence matrices called Pre and Post. If Pre(i, j) = 1 then there is an edge from place i to transition j; if Post(i, j) = 1 then there is an edge from transition j to place i. The matrix C = Post − Pre is called the token flow matrix and represents the possible flows of tokens in the net. For example, if C(i, j) = −1 and C(k, j) = 1 then, when a token is in place i, transition j is enabled and the result of firing it is a token in place k.
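
The construction of the token flow matrix can be illustrated with a minimal Python sketch on a toy net with two places and two transitions. The toy net (a cycle between `idle` and `working` through hypothetical transitions `start` and `finish`) is an assumption for illustration and is much smaller than the nets in Figures 3 and 4.

```python
def token_flow_matrix(pre, post):
    """C = Post - Pre, with rows indexed by places and columns by transitions."""
    return [
        [post[i][j] - pre[i][j] for j in range(len(pre[0]))]
        for i in range(len(pre))
    ]


def fire(marking, pre, c, j):
    """Fire transition j: requires Pre(., j) tokens, then adds column j of C."""
    if any(marking[i] < pre[i][j] for i in range(len(marking))):
        raise ValueError("transition not enabled")
    return [marking[i] + c[i][j] for i in range(len(marking))]


# Places: 0 = idle, 1 = working. Transitions: 0 = start, 1 = finish.
PRE = [[1, 0],
       [0, 1]]
POST = [[0, 1],
        [1, 0]]
C = token_flow_matrix(PRE, POST)
m1 = fire([1, 0], PRE, C, 0)
```

Column 0 of `C` is (−1, 1): firing `start` removes a token from `idle` and adds one to `working`.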

The positive solutions of C^T · x = 0 are called p-semiflows. Intuitively, a p-semiflow represents the subset of places in a net that are reached during a particular type of execution of the system.

If a PN is structurally bounded then the maximum number of tokens in any place is finite and therefore the state space of the system modelled by that PN is finite. A sufficient condition for a PN to be structurally bounded is that all places are covered by p-semiflows and that the initial marking is finite.
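
The link between p-semiflows and boundedness can be illustrated with a minimal Python sketch. It checks a candidate vector x against C^T · x = 0 and derives the bound that follows from the invariance of the weighted token count x · M. The two-place matrix `C` and the initial marking are hypothetical toy values, not those of Tables 2 and 3.

```python
def is_p_semiflow(c, x):
    """True if x is non-negative, non-zero and satisfies C^T . x = 0."""
    if any(v < 0 for v in x) or not any(x):
        return False
    n_transitions = len(c[0])
    return all(
        sum(c[i][j] * x[i] for i in range(len(c))) == 0
        for j in range(n_transitions)
    )


def place_bounds(x, m0):
    """Upper bound on tokens per covered place: (x . M0) / x_i.

    Places with x_i == 0 are not covered by this semiflow and get None.
    """
    weighted = sum(xi * mi for xi, mi in zip(x, m0))
    return [weighted // xi if xi > 0 else None for xi in x]


# Toy token flow matrix: rows are places, columns are transitions.
C = [[-1, 1],
     [1, -1]]
x = [1, 1]
bounds = place_bounds(x, m0=[3, 0])
```

Since x = (1, 1) covers both places, no place can ever hold more than the three tokens of the initial marking.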

The positive solutions of C · x = 0 are called t-semiflows. A t-semiflow can be seen as a path through the transitions of a net corresponding to a particular type of circular execution of the system. A t-semiflow indicates the existence of a marking for which a cyclic behaviour of the system is possible. The existence of t-semiflows that cover all the execution paths is a necessary condition for a PN model to be able to return to its initial state. If this is the case, the net is said to be live.
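
The cyclic behaviour associated with a t-semiflow can be illustrated with a minimal Python sketch. It checks C · x = 0 for a candidate firing-count vector x and shows that applying those firing counts to a marking leaves it unchanged. The toy matrix `C` is hypothetical; the check does not by itself establish that the firings can be ordered so that each transition is enabled.

```python
def is_t_semiflow(c, x):
    """True if x is non-negative, non-zero and satisfies C . x = 0."""
    if any(v < 0 for v in x) or not any(x):
        return False
    return all(sum(row[j] * x[j] for j in range(len(x))) == 0 for row in c)


def marking_after(marking, c, x):
    """Marking reached after firing each transition j exactly x[j] times: M + C . x."""
    return [
        marking[i] + sum(c[i][j] * x[j] for j in range(len(x)))
        for i in range(len(marking))
    ]


# Toy token flow matrix: rows are places, columns are transitions.
C = [[-1, 1],
     [1, -1]]
x = [1, 1]
m0 = [2, 0]
m_end = marking_after(m0, C, x)
```

Firing both transitions once returns the toy net to its starting marking, which is the cyclic behaviour a t-semiflow describes.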

In order to perform the structural analysis an interactive menu-driven program called INA [4] is used. This tool is able to edit, reduce, execute and analyse many different kinds of Petri nets.

As shown in Figure 3, the net modelling the joint behaviour of CEs and corresponding OAs has 7 places and 9 transitions. Structural analysis applied to this net indicates one p-semiflow (Table 2, upper part) that covers all places. As there are K tokens in this net (corresponding to the K CEs in the Grid), the conclusion is that the net is bounded. This means that during the auction protocol any component CE+OA can only be in a finite number of states. In conclusion, this net properly models a real Data Grid executing a finite number of jobs which need to access a finite number of files.

The analysis also shows that all transitions are covered by one of the t-semiflows in Table 2, lower part. Each t-semiflow represents one of the three possible execution paths of the system. The conclusion is that the net is live.

The model of OAs when they are asked for a file during an auction (Figure 4) has 9 places and 12 transitions. The structural analysis shows the presence of one p-semiflow (Table 3, upper part). Moreover, there are four t-semiflows that cover all transitions (Table 3, lower part). As the number of SEs in the Grid is finite, the conclusion is that the net is also bounded and live.
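The coverage claims for both nets can be cross-checked against the semiflows listed in Tables 2 and 3. The sketch below only counts the distinct place and transition names appearing in those tables and compares them with the sizes quoted in the text; it does not recompute the semiflows, which requires the full net structure and a tool such as INA. The helper names are hypothetical.

```python
# Semiflows transcribed from Tables 2 and 3.
P_SEMIFLOWS_FIG3 = [
    "Idle + Working + ReadyForAuction + InWait + Timeout + "
    "ReadyForSelection + ReadyForProcessing",
]
T_SEMIFLOWS_FIG3 = [
    "StartJob + EndJob",
    "StartJob + AnotherFileNeeded + CallForBids + ReceiveBids + AuctionFailed",
    "StartJob + AnotherFileNeeded + CallForBids + ReceiveBids + "
    "ThereAreBids + SelectReplica + ProcessFile",
]
P_SEMIFLOWS_FIG4 = [
    "Idle + CallForBidsReceived + FileLookup + ReadyForBidding + "
    "ReadyForReplicationDecision + ReadyForAuction + InWait + Timeout + "
    "ReadyForSelection",
]
T_SEMIFLOWS_FIG4 = [
    "ReceiveCallForBids + ForwardCallForBids + ThereIsLocalReplica + BidReply",
    "ReceiveCallForBids + ForwardCallForBids + NoLocalReplica + "
    "LocalReplicationNo",
    "ReceiveCallForBids + ForwardCallForBids + NoLocalReplica + "
    "LocalReplicationYes + CallForBids + ReceiveBids + AuctionFailed",
    "ReceiveCallForBids + ForwardCallForBids + NoLocalReplica + "
    "LocalReplicationYes + CallForBids + ReceiveBids + ThereAreBids + "
    "SelectReplica&LocalReplication",
]


def support(semiflow):
    """Set of node names appearing in one semiflow written as 'A + B + C'."""
    return {name.strip() for name in semiflow.split("+")}


def covered(semiflows):
    """Union of the supports of several semiflows."""
    names = set()
    for semiflow in semiflows:
        names |= support(semiflow)
    return names


fig3 = (len(covered(P_SEMIFLOWS_FIG3)), len(covered(T_SEMIFLOWS_FIG3)))
fig4 = (len(covered(P_SEMIFLOWS_FIG4)), len(covered(T_SEMIFLOWS_FIG4)))
print("Figure 3 net: %d places, %d transitions covered" % fig3)   # 7, 9
print("Figure 4 net: %d places, %d transitions covered" % fig4)   # 9, 12
```

The counts agree with the sizes stated for the two nets (7 places and 9 transitions, 9 places and 12 transitions).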

Liveness of the two nets means that each of their transitions is potentially enabled in any global state. In other words, any Grid activity modelled by transitions in Figures 3 and 4 can eventually be executed starting from any reachable state of the system. Liveness implies deadlock-freeness, so the final result is that the two nets, as well as the auction protocol, are deadlock-free.
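For a bounded net, deadlock-freeness can also be checked by brute force: enumerate every reachable marking and look for one in which no transition is enabled. The sketch below does this for an assumed two-place toy net with K = 2 tokens. It is a complementary state-space check, not the structural technique used in this paper, and it only terminates when the net is bounded. All names are hypothetical.

```python
from collections import deque


def enabled(marking, pre, j):
    """True if transition j has enough tokens in each of its input places."""
    return all(marking[i] >= pre[i][j] for i in range(len(marking)))


def reachable_markings(m0, pre, post):
    """Breadth-first enumeration of all markings reachable from m0."""
    n_transitions = len(pre[0])
    start = tuple(m0)
    seen = {start}
    queue = deque([start])
    while queue:
        m = queue.popleft()
        for j in range(n_transitions):
            if enabled(m, pre, j):
                nxt = tuple(
                    m[i] - pre[i][j] + post[i][j] for i in range(len(m))
                )
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
    return seen


def dead_markings(markings, pre):
    """Markings in which no transition at all is enabled (deadlocks)."""
    n_transitions = len(pre[0])
    return [
        m for m in markings
        if not any(enabled(m, pre, j) for j in range(n_transitions))
    ]


# Assumed toy net: rows Idle, Working; columns StartJob, EndJob.
PRE = [[1, 0],
       [0, 1]]
POST = [[0, 1],
        [1, 0]]

states = reachable_markings([2, 0], PRE, POST)
print(sorted(states))             # [(0, 2), (1, 1), (2, 0)]
print(dead_markings(states, PRE)) # []
```

An empty list of dead markings shows deadlock-freeness only; liveness is a stronger property and would need a check per transition.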


Termination of the auction protocol can also be derived from the liveness of the two nets. In fact, liveness implies that the two PNs can always potentially move from local states S1 (first net) and S2 (second net), representing the starting of the protocol on the two sides, to local states S3 and S4, representing its ending.

5 Related work

This section gives a brief account of recently proposed economic approaches to Grid computing, and of some examples of the use of Petri nets for the analysis of Grid-like distributed systems and the modelling of Grid workflows.

There have been several economic approaches to Grid computing developed in recent years, mainly geared towards minimising the costs of job scheduling. The SPAWN system [36] provides a market mechanism for trading CPU time in a network of workstations. The POPCORN Project [32] provides an infrastructure for globally distributed computation with market mechanisms for buying and selling CPU time based on the same Vickrey auction protocol used in the economic model described in this paper. Nimrod-G [16] is an economy-driven resource broker for scheduling parametric computations in a typical Grid environment. Resource allocation is based on a deadline- and budget-constrained scheduling algorithm [15] with the goal of either minimising the runtime of jobs or minimising the cost of using Grid resources. The economic approach proposed in this paper is slightly different because it is geared less towards the cost-effective use of computational processing time and more towards optimisation of data location and storage.

Petri nets have not been extensively used so far to model and formally analyse Grid-like systems. However, there are a few works, focusing mainly on performance evaluation of the modelled system. In [23], stochastic Petri nets are used to model a large-scale generic concurrent system. Performance evaluation is performed and it is shown that such evaluation yields performance indices very close to those obtained by simulation with much less computational effort. However, the efficient analysis of Petri nets is based on the symmetry of the modelled systems. For this reason, the work in [23] is in general not adaptable to the modelling of Data Grids, whose topology is highly dependent on the organisational structure of a virtual community of users. In [14] the authors use Petri nets to model a number of Grid scenarios including a variable number of mobile agents which need to access distributed resources. They present experimental and analytical performance evaluation of agent migration strategies, to be used for optimising agents’ task completion times and the quality of service. Compared to these works, the use of Petri nets proposed in this paper is concerned with the problem of formal verification of a specific (auction-based) protocol for file selection in a Data Grid.

Petri nets have also been used recently to model Grid workflows in the context of eScience. Workflows in eScience applications involve connecting components or software modules to create a scientific application, which may then be scheduled on the resources provided by a computational Grid. The modelling of Grid workflows using Petri nets allows for the verification of their structural properties such as termination. For example, the ARION system [30] provides the basic e-services of search and retrieval of objects in scientific collections. The access to e-services is modelled using a workflow specification language based on an XML-based routing language, which is mapped to Petri nets. Petri nets are used in [28] to model the workflow used to build applications that will be executed on the Grid. In both cases the emphasis is on the verification of properties of workflows in which several e-services are used, without considering their optimisation. Complementarily, the focus of this paper is on the analysis of an optimisation strategy for the use of a single service, namely access to data files.

6 Conclusions

This paper has presented an economy-based strategy for efficient data selection and replication in Data Grids. The strategy is based on intelligent optimisation agents interacting via a peer-to-peer mechanism for trading data files needed by jobs running on Grid sites. Each agent performs local optimisation for its own site on the Grid, with the aim of achieving global efficiency as the emergent result. The proposed economic model is unique in that, unlike most other proposed Grid market economies, the agents trade for data rather than for the use of Grid resources. Since data is the most important part of a Data Grid, however, this focus is justified. The use of a peer-to-peer auction protocol enables a scalable decentralised marketplace and efficient communication between optimisation agents. The agents can dynamically adapt to market conditions and do not depend on the presence or absence of other agents.

The economy-based strategy has previously been shown in simulation studies to be effective at optimising the use of Grid resources. In this paper the main focus was on the auction protocol used by the strategy for optimised data selection. The behaviour of optimisation agents during auctions has been modelled and analysed by means of Petri nets and some structural properties of the protocol have been formally proven.

In particular, the boundedness of the net describing auctioneers was proven. Consequently, the net properly models a real Data Grid executing a finite number of jobs which need to access a finite number of files. Moreover, the fact that the nets modelling auctioneers and bidders are live implies that the auction protocol always terminates and there are no deadlocks, regardless of the Grid topology or number of sites. The formal correctness of the auction protocol is hence verified.

Petri net modelling has proven to be an effective tool for examining the properties of the agent-based economic model described in this paper. Both this and previous work using simulation techniques have shown the economic model to be valid and effective for optimising Data Grids. In the future, the work of the authors will aim both to increase the realism of the economic model (for example by more detailed modelling of Computing and Storage Elements) and to conduct further tests of validity - not only by formal proof, but also by empirical comparison with real Data Grid performance measurements.

References

[1] CoreGRID - the European research network on foundations, software infrastructures and applications for large scale distributed, grid and peer-to-peer technologies. http://www.coregrid.net/.

[2] EGEE: Enabling Grids for E-sciencE. http://public.eu-egee.org/.

[3] GriPhyN: The grid physics network. http://www.griphyn.org/.

[4] INA: Integrated Net Analyzer. http://www.informatik.hu-berlin.de/~starke/ina.html.

[5] NextGrid: Architecture for next generation grids. http://www.nextgrid.org/.

[6] Open science grid. http://www.opensciencegrid.org/.

[7] OptorSim - A replica optimiser simulation. http://cern.ch/edg-wp2/optimization/optorsim.html.

[8] The CrossGrid project. http://www.eu-crossgrid.org/.

[9] D. Abramson, R. Buyya, and J. Giddy. A computational economy for grid computing and its implementation in the Nimrod-G resource broker. Future Generation Computer Systems 18, 8 (2002), 1061–1074.

[10] G. Balbo and S. Donatelli. Understanding parallel programs behaviour through Petri Net models. Journal of Parallel and Distributed Computing 15, 3 (1992), 171–187.

[11] W.H. Bell, D.G. Cameron, L. Capozza, A.P. Millar, K. Stockinger, and F. Zini. Simulation of dynamic grid replication strategies in OptorSim. International Journal of High Performance Computing Applications 17, 4 (2003), 403–416.

[12] W.H. Bell, D.G. Cameron, R. Carvajal-Schiaffino, A.P. Millar, K. Stockinger, and F. Zini. Evaluation of an economy-based file replication strategy for a data grid. In Proceedings of 3rd IEEE International Symposium on Cluster Computing and the Grid (CCGrid 2003) (Tokyo, Japan, May 2003), IEEE CS-Press.

[13] D. Britton. GridPP: A project overview. In Proceedings of UK e-Science All Hands Conference (Nottingham, UK, September 2003), pp. 444–451.


[14] D. Bruneo, M. Scarpa, A. Zaia, and A. Puliafito. Communication paradigms for mobile grid users. In Proceedings of 3rd IEEE International Symposium on Cluster Computing and the Grid (CCGRID’03) (May 2003), IEEE-CS Press, pp. 669–677.

[15] R. Buyya, R. Murshed, and D. Abramson. A deadline and budget constrained cost-time optimization algorithm for scheduling task farming applications on global grids. In International Conference on Parallel and Distributed Processing Techniques and Applications (Las Vegas, NV, USA, June 2002).

[16] R. Buyya, H. Stockinger, J. Giddy, and D. Abramson. Economic models for management of resources in peer-to-peer and grid computing. In SPIE’s International Symposium on the Convergence of Information Technologies and Communications (ITCom 2001) (Denver, CO, USA, August 2001).

[17] D.G. Cameron, R. Carvajal-Schiaffino, A.P. Millar, C. Nicholson, K. Stockinger, and F. Zini. UK grid simulation with OptorSim. In UK e-Science All Hands Meeting (Nottingham, UK, September 2003), pp. 188–191.

[18] D.G. Cameron, R. Carvajal-Schiaffino, A.P. Millar, C. Nicholson, K. Stockinger, and F. Zini. Analysing scheduling and replica optimisation strategies for data grids with OptorSim. Journal of Grid Computing 2, 1 (2004), 57–69.

[19] L. Capozza, K. Stockinger, and F. Zini. Preliminary evaluation of revenue prediction functions for economically-effective file replication. Tech. Rep. DataGrid-02-TED-020724, CERN, Geneva, Switzerland, July 2002.

[20] M. Carman, F. Zini, L. Serafini, and K. Stockinger. Towards an economy-based optimisation of file access and replication on a data grid. In Workshop on Agent based Cluster and Grid Computing at International Symposium on Cluster Computing and the Grid (CCGrid 2002) (Berlin, Germany, May 2002), IEEE-CS Press, pp. 340–345.

[21] R. Carvajal-Schiaffino and F. Zini. Analysis of replica selection protocols for grid data access services. In Proceedings of 5th IEEE International Symposium on Cluster Computing and the Grid (CCGrid 2005) (Cardiff, UK, May 2005), IEEE CS-Press. Poster session.

[22] A. Chervenak, I. Foster, C. Kesselman, C. Salisbury, and S. Tuecke. The Data Grid: Towards an architecture for the distributed management and analysis of large scientific datasets. Journal of Network and Computer Applications 23, 3 (July 2000), 187–200.

[23] G. Ciardo, L. Cherkasova, V. Kotov, and T. Rokicki. Modeling a scalable high-speed interconnect with stochastic Petri Nets. In Sixth International Workshop on Petri Nets and Performance Models (PNPM’95) (Durham, NC, USA, October 1995), IEEE-CS Press, pp. 83–92.


[24] C. Ernemann, V. Hamscher, and R. Yahyapour. Economic Scheduling in Grid Computing: 8th International Workshop. In Job scheduling strategies for parallel processing (Edinburgh, Scotland, July 2002), vol. 2537 of Lecture Notes in Computer Science, Springer, pp. 128–152.

[25] T. Eymann, O. Ardaiz, M. Catalano, P. Chacin, I. Chao, F. Freitag, M. Gallegati, G. Giulioni, L. Joita, L. Navarro, D. Neumann, O. Rana, M. Reinicke, R. Carvajal-Schiaffino, B. Schnizler, W. Streitberger, D. Veit, and F. Zini. Catallaxy-based grid markets. Multiagent and Grid Systems, Special Issue on Smart Grid Technologies & Market Models. To appear.

[26] A. Ferscha. A Petri Net approach for performance oriented parallel program design. Journal of Parallel and Distributed Computing 15, 3 (1992), 188–206.

[27] I. Foster and C. Kesselman, Eds. The Grid 2: Blueprint for a new Computing infrastructure. Elsevier Science Publisher, 2004. 2nd edition.

[28] A. Hoheisel and U. Der. An XML-based framework for loosely coupled applications on grid environments. In Proceedings of The International Conference on Computational Science (ICCS 2003) (2003), vol. 2657 of Lecture Notes in Computer Science, Springer, pp. 245–254.

[29] W. Hoschek, J. Jean-Martinez, A. Samar, H. Stockinger, and K. Stockinger. Data management in an international data grid project. In IEEE/ACM International Workshop on Grid Computing (Grid 2000) (Bangalore, India, December 2000), pp. 77–90.

[30] C. Houstis, S. Lalis, M. Pitikakis, G. V. Vasilakis, K. Kritikos, and A. Smardas. A grid service-based infrastructure for accessing scientific collections: the case of the Arion system. The International Journal of High Performance Computing Applications 17, 3 (2003), 269–280.

[31] M. Ajmone Marsan, G. Balbo, G. Conte, S. Donatelli, and G. Franceschinis. Modelling with generalized stochastic Petri Nets. John Wiley & Sons, 1995.

[32] N. Nisan, S. London, O. Regev, and N. Camiel. Globally distributed computation over the Internet - the POPCORN project. In Proceedings of The International Conference on Distributed Computing Systems (ICDCS’98) (Amsterdam, The Netherlands, May 1998), IEEE CS-Press, pp. 592–601.

[33] C.A. Petri. Kommunikation mit Automaten. PhD thesis, Bonn, Germany, 1962.

[34] M. Silva, E. Teruel, and J. M. Colom. Linear algebraic and linear programming techniques for analysis of place/transition net systems. In Lecture Notes in Petri Nets I: Basic Models, vol. 1491 of Lecture Notes in Computer Science. Springer, 1998, pp. 309–373.


[35] W. Vickrey. Counterspeculation, auctions, and competitive sealed tenders. The Journal of Finance 16, 1 (March 1961), 8–37.

[36] C.A. Waldspurger, T. Hogg, B.A. Huberman, J.O. Kephart, and S. Stornetta. Spawn: A distributed computational economy. IEEE Transactions on Software Engineering 18, 2 (1992), 103–117.

[37] G.K. Zipf. Relative frequency as a determinant of phonetic change, 1929.


Strategy             CMS, 1000 jobs,  CMS, 1000 jobs,  CMS, 10000 jobs,  GridPP, 1000 jobs,
                     sequential       Zipf             sequential        sequential

LFU                  4.440            2.022            3.753             2080
Economic (binomial)  4.476            1.933            2.236             1412
Economic (Zipf)      4.602            2.007            2.690             1826

Table 1: Summary of simulation results from [18] and [17], giving the mean job time in seconds for each configuration. The scheduler used was QueueAccessCost, which accounts for both data location and job queues when scheduling jobs to sites.


Idle + Working + ReadyForAuction + InWait + Timeout + ReadyForSelection + ReadyForProcessing

StartJob + EndJob
StartJob + AnotherFileNeeded + CallForBids + ReceiveBids + AuctionFailed
StartJob + AnotherFileNeeded + CallForBids + ReceiveBids + ThereAreBids + SelectReplica + ProcessFile

Table 2: p-semiflow (above) and t-semiflows (below) for PN model in Figure 3.


Idle + CallForBidsReceived + FileLookup + ReadyForBidding + ReadyForReplicationDecision + ReadyForAuction + InWait + Timeout + ReadyForSelection

ReceiveCallForBids + ForwardCallForBids + ThereIsLocalReplica + BidReply
ReceiveCallForBids + ForwardCallForBids + NoLocalReplica + LocalReplicationNo
ReceiveCallForBids + ForwardCallForBids + NoLocalReplica + LocalReplicationYes + CallForBids + ReceiveBids + AuctionFailed
ReceiveCallForBids + ForwardCallForBids + NoLocalReplica + LocalReplicationYes + CallForBids + ReceiveBids + ThereAreBids + SelectReplica&LocalReplication

Table 3: p-semiflow (above) and t-semiflows (below) for PN model in Figure 4.


Figure 1: Components of the economy-based Data Grid model.


CE + OA

    while (AnotherFileNeeded())
        CallForBids();
        ReceiveBids();
        if (ThereAreBids())
            SelectReplica();
        else
            AuctionFailed();
        endif
        ProcessFile();
    endwhile

OA

    while ()
        ReceiveCallForBids();
        ForwardCallForBids();
        if (ThereIsLocalReplica())
            BidReply();
        else
            if (LocalReplicationYes())
                CallForBids();
                ReceiveBids();
                if (ThereAreBids())
                    SelectReplica&LocalReplication();
                    BidReply();
                endif
            endif
        endif
    endwhile

Figure 2: Algorithms of the CE+OA (left) and OA (right).


[Figure 3 diagram not reproduced. Places: Idle, Working, ReadyForAuction, InWait, Timeout, ReadyForSelection, ReadyForProcessing. Transitions: StartJob, EndJob, AnotherFileNeeded, CallForBids, ReceiveBids, ThereAreBids, AuctionFailed, SelectReplica, ProcessFile.]

Figure 3: PN model of Computing Elements and Optimisation Agents as auctioneers.


[Figure 4 diagram not reproduced. Places: Idle, CallForBidsReceived, FileLookup, ReadyForBidding, ReadyForReplicationDecision, ReadyForAuction, InWait, Timeout, ReadyForSelection. Transitions: ReceiveCallForBids, ForwardCallForBids, ThereIsLocalReplica, NoLocalReplica, BidReply, LocalReplicationYes, LocalReplicationNo, CallForBids, ReceiveBids, ThereAreBids, AuctionFailed, SelectReplica&LocalReplication.]

Figure 4: PN model of Optimisation Agents as bidders.


Authors’ biographies

David Cameron. David Cameron is currently a Fellow at CERN, the European Organisation for Nuclear Research, where he is working on distributed data management for ATLAS, one of the particle physics experiments under construction there. His research interests include management, cataloging and optimisation of large scale distributed data and the development of Grid technologies. He holds a PhD in particle physics from the University of Glasgow, studying replica management and optimisation as part of the European DataGRID project, and a degree in physics and astronomy, also from the University of Glasgow.

Ruben Carvajal-Schiaffino. Ruben Carvajal-Schiaffino is currently an assistant professor at the Mathematics and Computer Science Department of the Universidad de Santiago de Chile. His research interests include Petri nets and Grid Computing. He was a researcher at ITC-irst, working in the data replica optimization group of the European DataGRID project. He holds a PhD in Computer Science from the Università di Genova, studying distributed analysis of Petri nets.

Caitriana Nicholson. Caitriana Nicholson is a member of the Experimental Particle Physics group at the University of Glasgow, Scotland. Her research interests include data grid performance optimisation and distributed data management, as applied to particle physics experiments in particular. Caitriana gained a degree in Physics from the University of Glasgow and then studied for her PhD there, on the topic of file management for high-energy physics experiments. Her current research is on the development of an event-level metadata infrastructure for the ATLAS experiment at CERN, the European Organization for Nuclear Research.

Kurt Stockinger. Kurt Stockinger is a Computer Scientist with the Scientific Data Management Research Group of Berkeley Lab, Berkeley, California, USA. His research interests include database access optimization, multi-dimensional indexing for large-scale data warehouses and performance optimization of parallel and distributed systems (Data Grids). Previously, Kurt was leading the Optimization Task of the European DataGRID Project managed by CERN. He was also a visiting researcher at the California Institute of Technology where he worked on object-oriented databases for High Energy Physics applications. Kurt studied computer science and business administration at the University of Vienna, Austria, and the Royal Holloway College, University of London, England. He received a PhD in computer science and business administration from the University of Vienna, Austria, under supervision of CERN’s Database Group.

Floriano Zini. Floriano Zini is a researcher at ITC-irst, the Centre for Scientific and Technological Research of the Istituto Trentino di Cultura, in Trento, Italy. He holds an advanced degree in Computer Science from the University of Torino and a Ph.D. in Computer Science from the University of Genova. He has worked and published articles on inductive machine learning using genetic algorithms, application of logic programming to agent-based software prototyping, and optimisation of resource usage in computational and data grids. His current research work focuses on the study of economic models for the optimised allocation of services and resources in computational grids.

Paul Millar. Having completed his PhD thesis, Paul Millar joined the Experimental Particle Physics group within the Department of Physics and Astronomy at the University of Glasgow. His research interests include metadata, grid computing, distributed monitoring and distributed data management. Whilst a member of the PPE group, Dr Millar conducted research into distributed data management within the UK HEP Grid collaboration, GridPP. This work was in collaboration with the European DataGRID project, where he contributed towards their investigation of distributed data management. Within the EDG project, he helped develop a Grid simulator (OptorSim) with which the team conducted research into advanced algorithms for autonomous replica optimisation. Dr Millar continues his research at the University of Glasgow within the successor to the GridPP project. He has participated in and now heads an international collaboration researching various aspects of metadata within HEP Grid computing.

Luciano Serafini. Luciano Serafini has 15 years of experience in academic research and technology transfer in the areas of Knowledge Representation, Information Processing and Artificial Intelligence. Research areas include logic for distributed knowledge (since 1990, development of the logic of context information integration, automated reasoning, multi-agent systems, ontological reasoning for the semantic web). Member of the programme committees of several top level conferences and workshops in the research area. Partner responsible in European Projects (DataGRID, Aposdle). Organisation chair of the 2002 edition of the European Summer School on logic, language and information, and supervisor of several Master and PhD students at the Universities of Trento, Verona and Milano. Teaching activity includes courses in Database, Information systems, logic and Knowledge Representation, for the Master degree and Doctoral Degree at the University of Trento.
