
2168-7161 (c) 2015 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. Seehttp://www.ieee.org/publications_standards/publications/rights/index.html for more information.

This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citationinformation: DOI 10.1109/TCC.2015.2415792, IEEE Transactions on Cloud Computing


Market models for federated clouds

Ioan Petri, Javier Diaz-Montes, Mengsong Zou, Tom Beach, Omer Rana and Manish Parashar

Abstract—Multi-cloud systems have enabled resource and service providers to co-exist in a market where the relationship between clients and services depends on the nature of an application and can be subject to a variety of different Quality of Service (QoS) constraints. Deciding whether a cloud provider should host (or finds it profitable to host) a service in the long term would be influenced by parameters such as the service price, the QoS guarantees required by customers, the deployment cost (taking into account both the cost of resource provisioning and operational expenditure, e.g. energy costs) and the constraints over which these guarantees should be met. In a federated cloud system users can combine specialist capabilities offered by a limited number of providers, at particular cost bands – such as availability of specialist co-processors and software libraries. In addition, federation also enables applications to be scaled on-demand and restricts lock-in to the capabilities of a particular provider. We devise a market model to support federated clouds and investigate its efficiency in two real application scenarios: (i) energy optimisation in built environments and (ii) cancer image processing, both requiring significant computational resources to execute simulations. We describe and evaluate the establishment of such an application-based federation and identify a cost-decision based mechanism to determine when tasks should be outsourced to external sites in the federation. The following contributions are provided: (i) understanding the criteria for accessing sites within a federated cloud dynamically, taking into account factors such as performance, cost, user-perceived value, and specific application requirements; (ii) developing and deploying a cost-based federated cloud framework for supporting real applications over three federated sites at Cardiff (UK), Rutgers and Indiana (USA); (iii) a performance analysis of the application scenarios to determine how task submission could be supported across these three sites, subject to particular revenue targets.

Index Terms—Federated Clouds, Cloud Computing, Cost Models, Market Mechanism, CometCloud


1 INTRODUCTION

RESEARCH in cloud computing has led to a variety of mechanisms for the acquisition and use of resources, enabling ‘elastic’ and on-demand acquisition and use of such resources. The availability of cloud systems also provides application developers with the potential to change the way these applications interact with computational infrastructure (which, traditionally, has been static and must be known a priori). Applications such as simulations are carried out using specialist software (such as EnergyPlus [3] or Octave [4]) which requires significant computational resources and data management capability, and can generally be a time-consuming process. The users of these applications are also often interested in carrying out what-if scenarios by altering simulation parameters to determine various patterns within the solution space. Being able to utilize computational resources at external sites provides one option for reducing execution times of such applications, especially if local resources (i) do not support suitable computational, data storage or hosting capability; (ii)

I. Petri and O. F. Rana are with the School of Computer Science & Informatics, Cardiff University, UK. E-mail: [email protected], [email protected]
T. Beach is with the School of Engineering, Cardiff University, UK. E-mail: [email protected]
J. Diaz-Montes, M. Zou and M. Parashar are with the Cloud and Autonomic Computing Center, Rutgers University, NJ, USA. E-mail: [email protected], [email protected], [email protected]

are already being used for other applications and therefore likely to be unavailable over a particular simulation period; and/or (iii) are too expensive to acquire or use due to high operational costs. Cloud computing provides a useful alternative to enable users to conduct more complex simulations, which would be otherwise impossible due to limited availability of local resources. Most significantly, the elastic nature of cloud computing enables resources to be acquired dynamically (perhaps after carrying out an initial set of simulations), preventing the need to guess the number of required resources beforehand. Using this approach, a user may be able to carry out a what-if investigation (on a smaller data set or with a restricted parameter range) on local resources, before making use of cloud-based resources where the exact number of resources may grow with data volumes. This work focuses on understanding, from the application perspective, what factors need to be considered when integrating resources across multiple sites. In particular, our research question develops around the decision process involved when considering the utilization of remote resources over local ones (especially from a cost perspective) and how remote resources that are part of a federated cloud could be dynamically integrated and used alongside local ones.

Federation of cloud systems (and the use of “cloud bursting” techniques) enables connection of local infrastructure to a global marketplace where participants can transact (buy and sell) capacity on demand. This ability to scale out, on demand, provides one of


the unique benefits of cloud computing – although being able to undertake such scale-out across multiple providers/vendors remains a challenge. Accessing global services instead of incurring the costs associated with building new infrastructure (which may not be fully utilised and may only be needed to support peaks in workload over short time frames) can provide significant benefits. More importantly, organisations with spare capacity in their data centre can monetize that capacity by selling it to other providers through a marketplace, creating an additional source of revenue [17], [18], [32].

In this paper, we present an application-centric federation between Cardiff (UK), Rutgers (US) and Indiana (US) Universities. Using this environment, we make two main contributions: (i) showing how resources across multiple institutions can be federated to create a marketplace; (ii) showing how specialist capability located across multiple, distributed sites – where matching can take place between task requirements and such capability – can be effectively utilised in a particular application context. We use two specific applications as part of our framework: (i) EnergyPlus, which is used to calculate energy flow in a built environment, and (ii) Octave, used for cancer image processing, and show how a common framework integrating these applications can emerge, governed by specific cost models. We also present a decision function that determines where tasks should be computed based on a cost analysis. Our federation model uses CometCloud [28], [16] – an autonomic computing engine based on the Comet [29] decentralized coordination substrate – that supports heterogeneous and dynamic federated cloud/grid/HPC infrastructures. CometCloud supports the integration of public/private clouds and enables autonomic cloudbursts, i.e., dynamic scale-out to clouds to address extreme requirements such as heterogeneous and dynamic workloads and spikes in demand. CometCloud has been integrated with a variety of other systems, such as FutureGrid and OpenStack. Current work is focused on integrating CometCloud with IBM SoftLayer.
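To illustrate the kind of cost-based decision function described above, the sketch below picks the cheapest site that can still meet a deadline. The site names, throughputs and per-task prices are invented for illustration only; this is not the paper's actual decision model or data.

```python
# Hypothetical sketch of a cost-based outsourcing decision. Site names,
# throughputs and per-task prices are illustrative assumptions.

def choose_site(task_load, sites, deadline):
    """Pick the cheapest site that can finish `task_load` tasks by `deadline`.

    `sites` maps a site name to (throughput in tasks/hour, cost per task).
    Returns the chosen site name, or None if no site meets the deadline.
    """
    feasible = []
    for name, (throughput, cost_per_task) in sites.items():
        hours_needed = task_load / throughput
        if hours_needed <= deadline:
            feasible.append((cost_per_task * task_load, name))
    return min(feasible)[1] if feasible else None

sites = {
    "local":    (10.0, 0.00),   # free but slow
    "remote-a": (50.0, 0.05),   # faster, paid per task
    "remote-b": (80.0, 0.12),
}
print(choose_site(100, sites, deadline=4.0))   # local would need 10h -> outsource
```

Under these made-up figures, a tight deadline forces tasks out to the cheapest remote site, while a relaxed deadline keeps them on free local resources.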

The rest of this paper is organised as follows: Sections 1 and 2 outline the development and use of federated clouds, providing a key motivation for our work (and analysing several related approaches in this area). In Section 3 we present several cost models which we consider relevant for our work. Section 4 presents the model, explaining the methodological details of CometCloud and the aggregated CometCloud federation. The evaluation of our implemented system is presented in Section 7. We conclude and identify our contributions in Sections 9 and 8 respectively.

2 RELATED WORK

Several approaches have been reported in the literature to support cloud federation. Villegas et al. [19] proposed

a composition of cloud providers as an integrated (or federated) cloud environment in a layered service model. Assuncao et al. [20] described an approach for extending a local cluster to a cloud resource using different scheduling strategies. Along the same lines, Ostermann et al. [21] extended a grid workflow application development and computing infrastructure to include cloud resources and carried out experiments on the Austrian Grid and an academic cloud installation of Eucalyptus using a scientific workflow application. Similarly, Vazquez et al. [22] proposed an architecture for an elastic grid infrastructure using the GridWay meta-scheduler and extended grid resources to support Nimbus. Bittencourt et al. [23] proposed an infrastructure to manage the execution of service workflows in a hybrid system composed of both grid and cloud computing resources. Analogously, Riteau et al. [24] proposed a computing model where resources from multiple cloud providers are used to create large-scale distributed virtual clusters. They used resources from two experimental testbeds, FutureGrid in the United States and Grid'5000 in France, to achieve this. Goiri et al. [25] explore federation from the perspective of a profit-driven policy for outsourcing (a required capability) by attempting to minimise the use of the external resource (and thereby the price of resources paid to the external provider). Toosi et al. [18] focus on specifying reliable policies to enable providers to decide which incoming requests to select and prioritise, demonstrating that policies can have a significant impact on the providers' performance [26], [27]. Other significant work in cloud federation is the use of the in-network analysis capability offered in the GENI project (which makes use of OpenFlow and MiddleBox) to offer a range of services within a network, rather than at network end points. GENICloud1 enables "slices" across such network elements to be reserved by different applications/users, with commonly used scripts available for users to deploy on these slices. Such slices are supported over in-network general-purpose processors fronted by OpenFlow switches. A number of practical "test-bed" federations have been developed using these concepts, such as JGN-X (Japan), SAVI (Canada) and Ofelia (Europe). Such infrastructures provide a useful enabler for deploying real-world applications; however, they do not take the specific requirements of an application into account. Other related work in this context is the "OCCI" activity within the Open Grid Forum, where cloud interoperability has been the main focus. Similarly, commercial providers, such as IBM, also support cloud-based enterprise federation using their "GridIron" system – which makes use of Web services to support federation. There is also considerable interest in cloud federation and interoperability from the IEEE – as part of

1. http://groups.geni.net/geni/wiki/GENICloud


the "P2302: Standard for Intercloud Interoperability and Federation (SIIF)" effort2.

Several cost models currently exist to encourage greater use of cloud resources (ranging from auctions to spot pricing), primarily to enable providers to optimise the potential revenue they could generate from their resource pool. Kondo et al. [7] explore the adoption of cloud computing platforms and services by the scientific community, focusing on the performance and monetary cost-benefits for scientific applications. This study compares the performance and monetary cost-benefits of clouds versus desktop grid applications, based on computational size and storage requirements. When comparing the two technologies, it was found that hosts register at a rate of 124 cloud nodes per day and that the ratio of volunteer nodes needed to achieve the compute power of a small EC2 instance is about 2.83 active volunteer hosts to 1. Douglas et al. [6] describe how provenance information can be used to determine how much an application execution is likely to cost on public clouds (primarily focusing on Amazon EC2 instances). They extend their job submission system with cost and resource information based on previous executions to estimate the cost of executing scientific computing applications. A key assumption in this approach is that the backend computational infrastructure does not change over time and that execution times/costs previously recorded are likely to be representative when considered in the future. Generally, reserved resource instances are likely to be cheaper than on-demand instances. When making a reservation, a cloud consumer pays a provider before consumption, making it easier for the provider to plan its resource provisioning (and potentially energy usage). This, however, has the disadvantage of uncertainty about a consumer's future demand. Cost optimisation algorithms have been proposed for adjusting the cost/benefit tradeoff between reservation of resources and allocation of on-demand resources [8].
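The reservation trade-off described above can be reduced to a simple break-even calculation. The sketch below uses invented prices (a one-off upfront fee plus a discounted hourly rate versus a flat on-demand rate); real tariffs differ per provider.

```python
# Illustrative break-even analysis between reserved and on-demand
# instances. All prices and hours are made-up assumptions, not real tariffs.

def cheaper_option(expected_hours, on_demand_rate, upfront, reserved_rate):
    """Return 'reserved' or 'on-demand', whichever costs less for the
    expected usage, mirroring the reservation trade-off discussed above."""
    on_demand_cost = expected_hours * on_demand_rate
    reserved_cost = upfront + expected_hours * reserved_rate
    return "reserved" if reserved_cost < on_demand_cost else "on-demand"

# With a 70-unit upfront fee, reservation pays off above 1000 hours here:
#   70 + 0.03h < 0.10h  =>  h > 1000
print(cheaper_option(500,  0.10, 70.0, 0.03))   # on-demand
print(cheaper_option(2000, 0.10, 70.0, 0.03))   # reserved
```

The uncertainty the text mentions enters through `expected_hours`: if a consumer overestimates future demand, the upfront fee is sunk cost and on-demand would have been cheaper.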

Although these solutions enable participants to increase capacity and extend an existing marketplace, these efforts are unclear on when a local request should be sent to a remote site (in a federation) and on the types of policies that may be used to govern such a federation model. From a cost perspective, these studies leave undetermined how efficient specific cloud cost models are with regard to a real-world application, and are often more abstract in their description – focusing on general job/task characteristics rather than considering specific ones. In this work we describe various cost models that can be used to ensure greater performance and cost management both for users and providers by considering realistic application requirements.

2. http://standards.ieee.org/develop/project/2302.html

3 COST MODELS

Several studies investigate the problem of costs in cloud infrastructures [10], [11], [13], trying to propose a cost minimisation algorithm for making service placement decisions in clouds. Alongside these cost models, authors have proposed pricing schemes for maximising revenues [12]. In the field of dynamic cost models for cloud computing, Xu et al. [9] propose a framework for improving profits and reducing cost. The proposed framework enables the provider to lease its computing resources in the form of virtual machines to users, and to charge the users for the period these resources are used. The proposed scenario considers a cloud provider with fixed capacity using a spot price set according to market demand. A revenue maximisation solution is presented as a finite-horizon stochastic dynamic program, with stochastic demand arrivals/departures. The result shows that the optimal pricing policy exhibits time and utilisation monotonicity, and that the optimal revenue has a concave structure. The evaluation of the economic efficiency of a cloud system, with indicators for cost optimisation, has been addressed by developing a web tool where users can test various costing options. Various analyses are used in the internal cloud environment to compare capability with cost. The evaluation is undertaken with a suite of metrics and formulas considering the characteristics of virtualization and elasticity. Utilisation cost leverages the metric items defined in the cloud and takes workload, i.e. the number of VMs currently running, as input. Furthermore, dependencies among different cost metrics are identified and used in the calculation of utilisation cost. These models are embedded into a tool which can calculate cost without knowing the exact data of every cost item and provides a flexible way to analyse the effect of different metrics on the cost [14]. Understanding cloud computing cost models can help users to make informed decisions about which provider to use.

As identified previously, cloud providers have various cost models based on an expected demand/workload profile or determined business objectives. Such cloud cost models are listed below:

Consumption-based cost model – where clients only pay for the resources they use. This is particularly relevant in IaaS offerings such as Amazon. The advantage is on both sides, as providers can balance their costs and clients only pay for the resources they request, such as disk space, CPU time and network traffic.

Subscription-cost pricing model – where clients pay a subscription charge for using a service for a period of time – typically on a monthly basis. This subscription cost typically provides unlimited usage (subject to some "fair use" constraints) during the subscription period. This model is dominant in SaaS offerings such as IBM SmartCloud for Social Business [33].

Advertising-based cost model – where a client gets a


no charge or a heavily-discounted service, whereas the providers receive most of their revenue from advertisers. This model is quite common in cloud media services such as the free TV provider net2TV.

Market-based cost model – where a client is charged on a per-unit-time basis. The market price varies over time based on supply and demand, and the client can acquire the service at the current price for immediate use. Within this cost model, services can also be purchased through a bidding process at a reduced price. Amazon EC2 Spot Instances is an example of a market-based model. A number of variants to this model have been implemented using multiple concurrent auction markets – and have been the subject of significant research in cloud brokerage.

Group-buying cost model – where clients can acquire reduced-cost services only if there are enough clients interested in a deal. Profit can be achieved by both sides as long as there is a sufficient level of demand. An example of the group-buying cost model is Groupon [15]. A deal is valid over a certain period of time, during which purchase orders are accepted from users. Computation related to purchase orders is undertaken only after the deal is over; hence the estimated time of completion (ETC) is calculated assuming that the earliest start time is the end of the deal. Moreover, the deal is constrained by the number of clients interested in the service/product and the potential impact on the reputation of the provider. The following cases can be identified as part of the group-buying cost model:

• Sufficient demand – where the deal is accepted due to a sufficient number of clients accepting the offer. Subsequently, a start time is computed in order to grant an ETC to the users.

• Insufficient demand – where there are not enough clients interested in the deal. This leads to the following cases: (i) the provider can continue offering the service identified in the deal, accepting subsequent costs due to limited demand from clients, but gaining potential reputation benefits; or (ii) the provider cancels the deal, minimising costs but losing reputation, as purchase orders are cancelled and no computation is performed.
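The two demand cases above can be captured in a few lines. The sketch below is a hypothetical settlement routine; the threshold and the choice of whether to honour a short deal (trading cost against reputation) are illustrative, not prescribed by the model.

```python
# Hypothetical sketch of the group-buying deal outcomes described above.
# The client threshold and cancel/honour policy are illustrative assumptions.

def settle_deal(orders, min_clients, cancel_if_short=True):
    """Decide a deal's outcome once its time window closes.

    Returns 'accepted', 'cancelled', or 'honoured-at-loss'. The earliest
    start time (used for the ETC) is the end of the deal window, so any
    computation begins only after settlement.
    """
    if len(orders) >= min_clients:
        return "accepted"          # sufficient demand: schedule and grant ETC
    if cancel_if_short:
        return "cancelled"         # minimise cost, lose reputation
    return "honoured-at-loss"      # absorb cost, gain reputation

print(settle_deal(["c1", "c2", "c3"], min_clients=2))            # accepted
print(settle_deal(["c1"], min_clients=2))                        # cancelled
print(settle_deal(["c1"], min_clients=2, cancel_if_short=False)) # honoured-at-loss
```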

As cloud computing evolves, we envision that other cost models will emerge. These future models are expected to be market-oriented and to extend (pure) auction-oriented systems that attract consumers to acquire services and provide enough price flexibility. Other possibilities range from comparison sites and aggregation services to more complex group-buying models.

4 APPLICATION SCENARIOS

In this section we describe two application scenarios, demonstrating how CometCloud federation can greatly facilitate the scalability of these applications.

Fig. 1: EnergyPlus sensor application

Section 4.1 describes an application focusing on energy simulations within built environments and Section 4.2 describes a cancer imaging application. The first application is centered on the use of data from sensors embedded within a building, while the second focuses on the processing of image-based data requiring high-end computing resources.

4.1 EnergyPlus for real-time energy optimisation

EnergyPlus has been demonstrated to provide an efficacious tool for running energy simulations [1], [2]. In EnergyPlus, inputs and outputs are specified by means of ASCII (text) files. The following types of input are used: (i) the Input Data Dictionary (IDD), which describes the types (classes) of input objects and the data associated with each object; (ii) the Input Data File (IDF), which contains all the data for a particular simulation; and (iii) the Weather Data File (EPW), which contains all the data for the exterior climate of a building. From a computational perspective, depending on the size of a building (which influences the number and range of parameters being considered), EnergyPlus requires significant computational resources to execute. For relatively small building models (with a small number of surfaces, zones, and systems), which do not require large amounts of computer memory, processor speed is generally more significant than I/O. For large models, main memory and internal cache have a greater influence on reducing run time. If an energy model run will produce lots of hourly or time-step data, I/O access speed and latency also become important in reducing run time.
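A single EnergyPlus run over the IDF/EPW inputs above can be scripted as shown below. This is a sketch only: it assumes an `energyplus` binary on the PATH supporting the `-w` (weather file) and `-d` (output directory) flags of recent releases, and all file names are placeholders.

```python
# Sketch of driving EnergyPlus runs from Python for a what-if sweep.
# Assumes an `energyplus` binary with -w/-d flags; file names are placeholders.
import subprocess

def build_command(idf_file, epw_file, out_dir):
    """Assemble the command line for one simulation run."""
    return ["energyplus", "-w", epw_file, "-d", out_dir, idf_file]

def run_simulation(idf_file, epw_file, out_dir):
    """Run a single EnergyPlus simulation; raises if the run fails."""
    subprocess.run(build_command(idf_file, epw_file, out_dir), check=True)

# A parameter sweep then amounts to generating one IDF per what-if variant
# and dispatching the runs, locally or to federated workers.
print(build_command("building.idf", "cardiff.epw", "run-001"))
```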

When undertaking simulation-based optimisation with EnergyPlus in cloud-based infrastructures, a key aspect for both building (facility) managers and (cloud) infrastructure providers is to minimise time and costs. Hence, facility managers require simulation results within a given time frame, and a cloud provider/data centre manager is interested in reducing the costs associated with executing the simulation and possible penalties due to an inability to comply with deadlines. As a number of EnergyPlus instances need to be concurrently executed, there are two parameters that are particularly important to identify: (i) the complexity of the building model; (ii) the period to simulate. In


Fig. 2: Octave cancer image processing application

addition, the cloud system needs to support: (i) a completion time deadline: assuming that sensors generate readings every 15 minutes (for the particular building we consider in this case study), a new configuration of the building is constructed at this interval. As illustrated in Figure 1, the optimisation process needs to use the updated building configuration based on the latest readings as input and return optimised set points within this interval; (ii) results quality: an optimisation process, as identified in this study, is composed of a number of EnergyPlus simulations. Depending on the complexity of the building and the period to simulate, a time interval is associated with each optimisation process.
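The 15-minute completion deadline above amounts to a simple feasibility check: a round of simulations must fit between two sensor readings. The sketch below is an idealised version (per-run durations and worker counts are invented, and scheduling overheads are ignored).

```python
# Toy check of the completion-time deadline: an optimisation round of
# N EnergyPlus runs must fit within one sensor interval. All durations
# and worker counts are illustrative assumptions.

SENSOR_INTERVAL_MIN = 15.0   # minutes between sensor readings

def fits_deadline(num_runs, minutes_per_run, workers):
    """True if `num_runs` simulations finish within one sensor interval
    when spread over `workers` parallel workers (no overheads modelled)."""
    waves = -(-num_runs // workers)            # ceiling division
    return waves * minutes_per_run <= SENSOR_INTERVAL_MIN

print(fits_deadline(40, minutes_per_run=3.0, workers=8))   # 5 waves * 3 min = 15 -> True
print(fits_deadline(40, minutes_per_run=3.0, workers=4))   # 10 waves * 3 min = 30 -> False
```

In the federated setting, a failed check like the second case is exactly the situation where outsourcing part of the round to a remote site becomes attractive.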

4.2 Octave for Cancer Image Processing

Medical imaging identifies a technique or process used to create images of the human body (or parts and associated function) for clinical purposes (medical procedures seeking to reveal, diagnose, or examine disease). Measurement and recording techniques are designed to produce images with corresponding data that can be represented as a parameter graph or a map containing information about the measurement location. All these types of data need further analysis: patterns need to be observed and a diagnosis applied.

In cancer image processing, an image is split into parts which are computed separately on distributed machines (see Figure 2). The results are then merged, and patterns are observed and analysed. The image analysis system works with a specific image patch obtained based on previous medical evaluations. This image patch is then segmented into several closed bins at equal intervals, with high weights in the center and decreasing weights outwards. The segmentation of the patch has to be equally divided into segments, from which a historical annual histogram (HAH) is calculated and concatenated together. This reconstruction process over the various segments is undertaken by running multiple Octave simulation instances. This reconstruction has to satisfy a set of requirements: (1) it has to be scale- and rotation-invariant; (2) it has to capture the spatial configuration of image features; and

(3) has to be suitable for hierarchical searching in thesub-image retrieval [5]. As a result, the system ap-pears to have reached the practical level of a diagnosisassistance system of the second-opinion type in whichthe physician and the computer collaborate. Such acloud system can contribute to the efficiency of cancerscreening process as part of the diagnosis assistanceby providing an automatic screening system in whichthe computer checks the image before it is diagnosedby the physician, so that the number of images to bechecked by the physician is significantly reduced. Thiscan be extremely efficacious, as in the near future,with the use of computers in mass screening, thenumber of images taken per patient will be drasticallyincreased. In order to achieve the required level ofassistance in cancer screening it is necessary to ensurethe accuracy of tumor region recognition by testingand adopting new complex algorithms.
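The ring-based segmentation described above can be sketched as follows. This is an illustrative reconstruction, not the paper's code: the function name, ring count and bin count are our own assumptions. It shows why concatenated per-ring histograms are rotation invariant by construction.

```python
# Illustrative sketch: segment a grayscale patch into concentric,
# equal-width annular bins around the centre, compute one intensity
# histogram per ring, and concatenate them into a feature vector.
# Names and parameters are assumptions, not the paper's implementation.
import numpy as np

def annular_histograms(patch: np.ndarray, n_rings: int = 4, n_bins: int = 8) -> np.ndarray:
    """Concatenated per-ring intensity histograms of a 2-D grayscale patch."""
    h, w = patch.shape
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    yy, xx = np.mgrid[0:h, 0:w]
    r = np.hypot(yy - cy, xx - cx)                         # pixel distance from centre
    edges = np.linspace(0.0, r.max() + 1e-9, n_rings + 1)  # equal-width ring boundaries
    feats = []
    for i in range(n_rings):
        ring = patch[(r >= edges[i]) & (r < edges[i + 1])]
        hist, _ = np.histogram(ring, bins=n_bins, range=(0, 256))
        feats.append(hist)
    return np.concatenate(feats)

# Rotating a square patch only permutes pixels within rings, so the
# resulting feature vector is rotation invariant.
patch = (np.arange(32 * 32).reshape(32, 32) % 256).astype(np.uint8)
feat = annular_histograms(patch)
```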

5 CLOUD FEDERATION MARKETPLACE

We develop a CometCloud-based federation where resource exchange is undertaken in a marketplace. Using CometCloud we develop a common "coordination space" that is logically viewable across multiple sites in the federation (but physically distributed across sites). This coordination space is used as the basis to provide a variety of different market mechanisms, supporting users to submit requests and providers to respond with offers. This section describes how the federated market framework has been developed.

5.1 CometCloud Federation Setup

CometCloud is an autonomic computing engine based on the Comet [29] decentralized coordination substrate. It supports highly heterogeneous and dynamic cloud/grid/High Performance Computing infrastructures, enabling the integration of public/private clouds and autonomic cloudbursts, i.e., dynamic scale-out to clouds to address extreme requirements such as heterogeneous and dynamic workloads and spikes in demand. Conceptually, CometCloud is composed of a programming layer, a service layer and an infrastructure layer. The infrastructure layer uses a dynamic self-organizing overlay to interconnect distributed resources of various kinds and offer them as a single pool of resources. The service layer provides a range of services to support autonomics at the programming and application level. This layer supports a Linda-like [30] tuple space coordination model and provides a virtual shared-space abstraction as well as associative access primitives. Dynamically constructed transient spaces are also supported to allow applications to explicitly exploit context locality to improve system performance. Asynchronous (publish/subscribe) messaging and event services are also provided by this layer. The programming layer provides the basis for application development and management. It supports a range of paradigms including master/worker/Bag-of-Tasks: masters generate tasks and workers consume them. Masters and workers can communicate via the virtual shared space or using a direct connection. Scheduling and monitoring of tasks are supported by the application framework. The task consistency service handles lost/failed tasks.
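The Linda-like associative coordination described above can be illustrated with a toy, single-process sketch. This is our own simplified code, not CometCloud's API: `out` publishes a tuple into the space and `inp` takes the first tuple matching a template, with `None` fields acting as wildcards.

```python
# Toy, single-process sketch of a Linda-like tuple space (assumed,
# simplified; not the CometCloud API). Illustrates the associative
# put/take primitives used for master/worker coordination.
class TupleSpace:
    def __init__(self):
        self._tuples = []

    def out(self, tup: tuple) -> None:
        """Publish a tuple into the shared space."""
        self._tuples.append(tup)

    def inp(self, template: tuple):
        """Take (remove and return) the first matching tuple, or None.
        None fields in the template act as wildcards."""
        for i, tup in enumerate(self._tuples):
            if len(tup) == len(template) and all(
                    t is None or t == v for t, v in zip(template, tup)):
                return self._tuples.pop(i)
        return None

# A master publishes a task; any worker can take a tuple tagged "task".
space = TupleSpace()
space.out(("task", "EnergyPlus", 16))
task = space.inp(("task", None, None))
```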

Each site in the federation has a number of available workers and a master that receives requests: (i) locally – identifying tasks received from users at the same site; (ii) remotely – requests from users at another site. A user submits a request (EnergyPlus or Octave with configuration parameters) to the master node at one federation site. The master determines the number of rounds of simulation to run, where each round of simulation identifies a unique combination of parameter ranges. Each simulation is deployed on one CometCloud worker. The master must decide how many tasks to run based on cost-based decision criteria. We assume that there is one worker per compute/data access node. All workers are the same, hence the master needs to decide the number of workers allocated to local vs. external/remote requests. When one site has a high workload and is unable to process requests from local users within their deadlines, it negotiates the outsourcing of requests to other remote sites. This could range from two cloud systems sharing workload to a cloud outsourcing some of its workload to multiple other cloud systems. Conversely, this ability allows systems with a lower workload to utilise spare capacity by accepting outsourced tasks from other cloud systems. Practically, this process of task exchange is undertaken by the master nodes of the two clouds negotiating how many tasks are to be exchanged. Once this has been completed, the master node on the receiving cloud informs its workers (using CometSpace) about the number of tasks it is taking from a remote site, and the connection details of the request handler from where the tasks are to be fetched. Subsequently, when a worker comes to execute a task from an external cloud system, it connects to the request handler of the remote cloud to collect the task and any associated data.

5.2 Aggregated CometCloud Federation

In the CometCloud federation model, each site communicates with others to identify itself, negotiate the terms of interaction, discover available resources and advertise its own resources and capabilities. In this way, a federated management space is created at runtime and sites can join and leave at will. This federation model does not have any centralized component and users can access the federation from any site (making it more fault tolerant) – see Figure 3. Another key benefit of this model is that since each site can differentiate itself based on specialist capability, it is possible to schedule requests to take advantage of such capabilities.

Fig. 3: The overall Federation Management Space, where (M) denotes a master, (W) a worker, (IW) an isolated worker, (P) a proxy, and (R) a request handler.

The federation model is based on the Comet [29] coordination "spaces" (an abstraction based on the availability of a distributed shared memory that all users and providers can access and observe, enabling information sharing by publishing requests/offers to/for information to this shared memory). In particular, we have decided to use two kinds of spaces in the federation. First, we have a single federated management space used to create the actual federation and orchestrate the different resources. This space is used to exchange any operational messages for discovering resources, announcing changes at a site, routing users' requests to the appropriate site(s), or initiating negotiations to create ad-hoc execution spaces. On the other hand, we can have multiple shared execution spaces that are created on-demand to satisfy the computing needs of the users. Execution spaces can be created in the context of a single site to provision local resources or to support a cloudburst to public clouds or external high performance computing systems. Moreover, they can be used to create a private sub-federation across several sites. This can be useful when several sites have some common interest and decide to jointly target certain types of tasks as a specialized community.

As shown in Figure 3, each shared execution space is controlled by an agent that initially creates the space and subsequently coordinates access to resources for the execution of a particular set of tasks. Agents can act as a master node within the space to manage task execution, or delegate this role to a dedicated master (M) when some specific functionality is required. Moreover, agents deploy workers to actually compute the tasks. These workers can be in a trusted network and be part of the shared execution space, or they can be part of external resources, such as a public cloud, and therefore in a non-trusted network. The first type of workers are called secure workers (W) and can pull tasks directly from the space. Meanwhile, the second type of workers are called isolated workers (IW) and cannot interact directly with the shared space. Instead, they have to interact through a proxy (P) and a request handler (R) to be able to pull tasks from the space.

5.3 Decision Policies

A key policy is for each site to attempt to increase revenue by hosting/executing remote tasks without excessively compromising the execution time (starvation) of local tasks running on the system. For developing our policies we use various tuning parameters, such as c, the cost per second per node, and t, the average execution time per task.

General Policy: When a new job is submitted into the Comet space, the following action is taken depending on the type of task. For each task, we calculate the time to complete locally – i.e. ttc = ttc(s) + ttc(e), where ttc(s) is the time until the job can start and ttc(e) is the time needed for the job to execute.

Policy for Local Tasks: Local tasks are always accepted; they have a non-rejection policy, according to which tasks submitted by local users are never rejected, only queued.

Policy for Remote Tasks: The policy is to accept remote tasks as long as ttc < t, i.e. as long as the average execution time per task (t) remains higher than the time to complete locally (ttc), with an associated price of p = ttc(e) * c. Remote tasks have a rejection policy attached, according to which remote tasks can be rejected when the site cannot meet the deadline or when accepting remote tasks would affect the processing of local tasks. When outsourcing tasks, a site requests price quotations from all other connected sites and sends as many tasks as possible to the cheapest of these. This step is repeated until all tasks have been outsourced. Depending on the number of sites involved, this process could generate significant price-information traffic between sites. In our current prototype we consider two sites involved in the federation. To aid scalability, we also foresee the availability of specialist broker services which manage pricing information from multiple sites and can update this information periodically by polling site manager agents/nodes.

Market Policy: In this policy both local and remote tasks go to a common market for offers from every site interested in executing them. As in the previous cases, tasks are discriminated based on their origin to decide the offered price as well as the resources. Sites only place offers if ttc < t.
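The local and remote acceptance policies can be sketched as below; the function names and sample numbers are illustrative assumptions, not the deployed implementation.

```python
# Sketch of the site-level acceptance policies described above (assumed
# names and values). c: cost per second per node; t: average execution
# time per task.

def time_to_complete(wait_s: float, exec_s: float) -> float:
    """ttc = ttc(s) + ttc(e): time until the job can start plus execution time."""
    return wait_s + exec_s

def evaluate_task(origin: str, wait_s: float, exec_s: float,
                  t_avg: float, c: float):
    """Return (accepted, price). Local tasks are queued, never rejected.
    Remote tasks are accepted while ttc < t, priced at p = ttc(e) * c."""
    ttc = time_to_complete(wait_s, exec_s)
    if origin == "local":
        return True, 0.0            # non-rejection policy for local tasks
    if ttc < t_avg:
        return True, exec_s * c     # site can still take the remote task
    return False, 0.0               # rejection policy for remote tasks

# Example: remote task with 10 s queue wait and 20 s execution,
# t = 60 s average, c = 0.01 cost units per node-second.
accepted, price = evaluate_task("remote", 10.0, 20.0, 60.0, 0.01)
```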

6 IMPLEMENTATION OVERVIEW

In our CometCloud federated system, a site must have valid credentials (authorized SSH keys) and configured parameters such as IP address, ports and number of workers. To integrate with an external site, CometCloud provides gateways enabling requests to be forwarded to these infrastructures. As part of our federation framework we use three sites. The Cardiff site consists of 12 machines, each with 12 CPU cores. Each physical machine uses a KVM (Kernel-based Virtual Machine (VM)) hypervisor. Each VM runs Ubuntu Linux utilising one 3.2GHz core with 1GB of RAM and 10GB storage. In this deployment, the workers, master and request handler run on separate VMs: (i) workers are in charge of computing the actual tasks received from the request handler, (ii) the request handler is responsible for selecting and forwarding tasks to external workers at other federation sites and (iii) the master generates tasks based on user requests, submits tasks into CometSpace and collects results. The networking infrastructure is 1Gbps, with an average latency of 0.706ms. At this site, one EnergyPlus simulation takes on average 6 minutes to execute (5 min 57.935 sec). The Rutgers site is deployed on a cluster with 32 dedicated machines. Each node has 8 cores, 6GB memory, 146GB storage and a 1Gb network connection, with an average latency of 0.227ms. The Indiana site makes use of OpenStack, as part of the FutureGrid project. We use medium instances, each with 2 cores and 4GB of memory, the average latency of the cloud virtual network being 0.706ms. Each federation site has a site manager that controls one publisher and several subscriber processes, where the publisher puts tuples into the federated space and the subscribers retrieve the desired ones. Tuples can be used to exchange operational messages or to describe tasks that need to be computed. The site manager, including the publisher and subscriber processes, runs on a single machine with network access to the rest of the sites. The master and workers all run on independent machines.

7 EVALUATION

In this section we evaluate the status of the federated cloud when two different cost models are used. We seek to identify the advantages of each cost model and its appropriateness for application deployment. Application deployments on cloud infrastructures have particularities, especially when they illustrate realistic scenarios. With such variability in execution, cost models need to be adapted in order to ensure the requirements underpinning the application (i.e. quality of service, deadline, etc.) while maximizing the benefits for clients and providers. Hence, the focus of the experiments is to evaluate a multi-cloud federated system governed by different cost market models in the context of real applications. As previously mentioned, experimental data is taken from two applications: (i) building energy optimisation and (ii) cancer image processing, based on which we have determined the frequency of tasks, incoming load and job complexity. We evaluate two different cost model configurations: (a) a consumption-based cost model and (b) a "Groupon" cost model, under which sites have the option of outsourcing to remote sites.

7.1 Energy Simulation Application Scenario

A user request represents a job defined as [input, obj, deadline], where input identifies the input data, represented as [IDF, W, [param]]: IDF represents the building model to be simulated, W represents the weather file required for the simulation, and [param] defines a set of parameter ranges associated with the IDF file that need to be optimised, [param] = [ri → (xm, xn)]. The job objective obj encodes the properties of the optimisation process, objective : [outVarName, min/max], defining the name of the output variable to be optimised, outVarName, and the target of the optimisation process, min/max (min: minimising outVarName, max: maximising outVarName). Deadline is a parameter defining the time interval associated with the submitted job.

7.1.1 Consumption-based cost model

We consider a cost-based function f(X) : C → R, where C is a set of constraints (cost, deadline) and R is a set of decisions based on the existing constraints C. Each Master decides how to compute the received job based on a locally computed decision function. Based on a cost analysis the Master can decide: (i) where to compute the tasks – (a) locally or (b) remotely; (ii) how many iterations need to be run given the deadline received from the user. All costs have been calculated in pounds (£), derived from Amazon EC2 costs.

Cost(total) = Cost(T) + Cost(S) + Cost(E)    (1)

Cost of data transfer: Cost(T) – identifies the cost of transferring data from one site to another. This cost comprises the cost of transferring the input from one federation site to another and the cost of transferring the results between sites. S(IDF) represents the size of the IDF file (MB), S(W) the size of the weather file (MB), N the number of tuples exchanged between sites, S(T) the size of a tuple (MB) and C(MBt) the cost per megabyte of transfer (see Table 1).

Cost(T) = [(S(IDF) + S(W) + (N ∗ S(T))) ∗ C(MBt)]    (2)

Cost of storage: Cost(S) – identifies the cost associated with storing the data, comprising the input submitted by the user, input transferred between sites and results transferred between sites (see Table 1).

Cost(S) = [(T ∗ (W ∗ (S(DF) + S(W) + S(IDF)))) ∗ C(MBps)]    (3)

Cost of execution: Cost(E) – is the cost determined by the actual deployment of the EnergyPlus instances. This cost is related to the complexity of the objective submitted by the user (which influences the number of EnergyPlus instances that need to be run).

Cost(E) = [C(CPUs) ∗ S]    (4)

In the calculation of the total cost Cost(total) we have used the following parameters:

TABLE 1: Total cost parameters

Parameter   Description
S(IDF)      size of the IDF model (MB)
S(W)        size of the weather file (MB)
S(T)        size of a tuple (MB)
S(DF)       size of the resulting data file (MB)
C(MBt)      cost per megabyte of transfer (£)
C(MBps)     cost per megabyte of storage per second (£)
C(CPUs)     cost per CPU per second (£)
N           number of tuples exchanged
S           number of seconds of CPU time used
T           total length in seconds until the final answer is found
W           number of workers used on the remote system
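Equations (1)-(4) with the Table 1 parameters can be sketched as follows; this is an illustrative reconstruction, and the argument names and sample values below are our own assumptions, not the measured ones.

```python
# Illustrative reconstruction of Equations (1)-(4) over the Table 1
# parameters (assumed names and sample values).

def cost_transfer(s_idf, s_w, n, s_t, c_mbt):
    """Eq. (2): Cost(T) = (S(IDF) + S(W) + N*S(T)) * C(MBt)."""
    return (s_idf + s_w + n * s_t) * c_mbt

def cost_storage(t, w, s_df, s_w, s_idf, c_mbps):
    """Eq. (3): Cost(S) = T * W * (S(DF) + S(W) + S(IDF)) * C(MBps)."""
    return t * w * (s_df + s_w + s_idf) * c_mbps

def cost_execution(c_cpus, s):
    """Eq. (4): Cost(E) = C(CPUs) * S."""
    return c_cpus * s

def cost_total(p):
    """Eq. (1): Cost(total) = Cost(T) + Cost(S) + Cost(E)."""
    return (cost_transfer(p["s_idf"], p["s_w"], p["n"], p["s_t"], p["c_mbt"])
            + cost_storage(p["t"], p["w"], p["s_df"], p["s_w"], p["s_idf"], p["c_mbps"])
            + cost_execution(p["c_cpus"], p["s"]))

# Hypothetical job: 2 MB IDF, 1 MB weather file, 15 tuples of 0.2 MB,
# 100 s of remote storage on 2 workers, 5000 CPU-seconds.
params = {"s_idf": 2.0, "s_w": 1.0, "n": 15, "s_t": 0.2, "c_mbt": 0.01,
          "t": 100.0, "w": 2, "s_df": 1.0, "c_mbps": 0.001,
          "c_cpus": 0.001, "s": 5000.0}
```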

7.1.2 Experiments

In our experiments we consider the following parameters: (i) CPU time at the remote site, to determine the time spent by each worker computing tasks; (ii) storage time at the remote site, as the amount of time needed to store data remotely; and (iii) the amount of data transferred across sites.

TABLE 2: Input Parameters: Experiment 1

P1                P2     P3     P4     Deadline
{16,18,20,22,24}  {0,1}  {0,1}  {0,1}  1 hour

TABLE 3: Results: Experiment 1

                        Single Cloud   Federated Cloud
Nodes                   3              6
Cost                    £0             £7.46
Tasks                   38             38
Deadline                1 hour         1 hour
Tuples exchanged        -              15
CPU on remote site      -              5626.45 sec
Storage on remote site  -              1877.10 sec
Completed tasks         34/38          38/38 in 55 min 40 s

Experiment 1: Job completed: With the parameter ranges identified in Table 2, the federation site has two options: (i) run tasks on the local infrastructure (single cloud case) or (ii) outsource some tasks to a remote site (federated cloud case). In Table 3 we observe that in the single cloud case there is no cost associated with processing, as all tasks run locally. However, because the tasks generated from the job parameter ranges have a corresponding deadline of 1 hour, only 34 out of 38 can be completed. As these tasks represent EnergyPlus simulations, running only a subset of simulations can impact the quality of results. In the second part of this experiment some tasks are outsourced. From Table 3 we observe that the total number of nodes used to compute the tasks in the federation context is 6 workers, which has a direct impact on the total time consumed. In the federation context all tasks successfully complete in 55 minutes, by outsourcing 15 tasks to the remote site. We can conclude that a cost-effective decision is to process as many tasks as possible locally. This is only possible when the parameter ranges are small and, consequently, the derived tasks can be deployed exclusively on the local infrastructure; even then, at the local site only 34 out of 38 tasks are completed. When the parameter ranges are large, resulting in a higher number of tasks, the federation option can reduce cost and increase the quality of results, as more tasks can be completed (Table 3). The process of outsourcing has an associated cost of £7.46.

TABLE 4: Input Parameters: Experiment 2

P1                            P2     P3     P4     Deadline
{16,17,18,19,20,21,22,23,24}  {0,1}  {0,1}  {0,1}  1 hour

TABLE 5: Results: Experiment 2

                        Single Cloud   Federated Cloud
Nodes                   3              6
Cost                    £0             £7.90
Tasks                   72             72
Deadline                1 hour         1 hour
Tuples exchanged        -              15
CPU on remote site      -              5637.27 sec
Storage on remote site  -              1869.41 sec
Completed tasks         37/72          48/72

Experiment 2: Job uncompleted: In this experiment we increase the parameter ranges and consequently the number of tasks to be processed (but maintain the 1 hour deadline). We observe from Table 5 that in the single cloud context (3 workers) only 37 out of 72 tasks are completed within the deadline of 1 hour. On the other hand, when using a federation with 6 workers, 48 tasks are completed within the same deadline. This takes place by exchanging 15 tuples between the two federation sites, with increased costs for execution and storage. Contrary to Experiment 1, where most tasks were completed even in the single cloud case, in this experiment only 48 out of 72 tasks are completed even with federation. We can conclude that in some cases, depending on the user request, neither the single nor the federated cloud is suitable. However, there is a significant improvement in the number of tasks completed when using cloud federation – in the context of Experiment 2, 11 more tasks are completed. It must be noted that the percentage of tasks completed has a direct impact on the quality of results, which may translate into reducing the total cost of energy use within a building. In comparison with Experiment 1, there is an increase in the cost of task execution, due to the increased cost of using computational and storage resources remotely.

Fig. 4: Summary of experimental results for federated clouds

TABLE 6: Input Parameters: Experiment 3

P1                                           P2     P3     P4     Deadline
{14,15,16,17,18,19,20,21,22,23,24,25,26,27}  {0,1}  {0,1}  {0,1}  1 h 30 min

TABLE 7: Results: Experiment 3

                        Single Cloud   Federated Cloud
Nodes                   3              6
Cost                    £0             £10.70
Tasks                   112            112
Deadline                1 h 30 min     1 h 30 min
Tuples exchanged        -              22
CPU on remote site      -              7983.74 sec
Storage on remote site  -              2687.15 sec
Completed tasks         42/112         62/112

Experiment 3: Job uncompleted – parameter ranges extended: In this experiment we increase the parameter ranges, producing a greater number of tasks. In addition, we extend the associated deadline to 1 hour and 30 minutes. The input parameters of this experiment are provided in Table 6. We show in this experiment that the application represents a complex problem that requires significant computing resources. In the single cloud scenario we use 3 workers to compute the 112 tasks generated from the parameter ranges submitted by the user. The overall number of tasks completed with a single cloud is 42. Acknowledging that the quality of results is proportional to the number of tasks completed, we can conclude that in the context of this experiment the single cloud provides unsatisfactory results. However, when using the federation to outsource a percentage of the tasks, the number of tasks completed increases to 62. This was achieved by using a total of 6 workers at the federated site. In terms of the cost associated with outsourcing, it can be concluded that the higher the number of tasks to be processed, the higher the cost of storage and execution.


7.2 Cancer Image Processing Application Scenario

In this scenario, a task represents an Octave simulation and a job is formed by a number of simulations. Each job returns a result within a particular timeframe and has a certain quality of results. We consider three federated sites using a market policy and study how the total cost varies over time when executing tasks. In this policy both local and remote tasks go to a common market for offers from every site interested in executing them. As in the previous cases, tasks are discriminated based on their origin to decide the offered price as well as the resources. One federated site does not perform the actual computation of tasks; it is only responsible for receiving client requests and parsing the parameter ranges, giving bids based on the number of parameter combinations and sending the tasks to a local master/worker. We assign one local master, one local worker and one external worker to each site. Local workers only consume local tasks, while external workers can consume both local and external tasks. The cost of computing a task is calculated based on the type of the task, the task location/origin and the worker type, and is measured conventionally in monetary units (m.u.).

TABLE 8: Aggregated CometCloud Experiment Results

Experiment            Completed Tasks   Total Cost (m.u.)
Minimum Cost          472/1600          1458
Maximum Task Number   956/1600          3027

We have created 200 requests, each of them being parsed to generate eight sub-tasks, giving a total of 1600 tasks. The deadline for each task is 100 minutes and the normal completion time per task is 30-35 minutes. We also assign a completion cost to each task. Using a market policy, each site bids to consume tasks and, based on a decision function, the winner that computes the tasks is chosen. We use two different decision functions in our experiments:

Experiment 1: Minimum Cost Scenario – In the first experiment, we modified our decision function for determining the bidding winner to consider cost as the most important factor. In this case, the federated site that promises the lowest price for computing the tasks wins the bid.

Experiment 2: Maximum Number of Tasks – In contrast to the first experiment, the decision function now chooses the federated site that promises to finish the highest number of tasks before the deadline.
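The two winner-selection rules can be sketched as follows; the `Bid` record and the sample values are illustrative assumptions, not the experiment data.

```python
# Sketch of the two bid-selection decision functions (assumed record
# type and values).
from dataclasses import dataclass

@dataclass
class Bid:
    site: str
    price: float            # price offered for computing the job (m.u.)
    tasks_by_deadline: int  # tasks the site promises to finish in time

def winner_min_cost(bids):
    """Experiment 1: the cheapest offer wins."""
    return min(bids, key=lambda b: b.price)

def winner_max_tasks(bids):
    """Experiment 2: the offer completing most tasks before the deadline wins."""
    return max(bids, key=lambda b: b.tasks_by_deadline)

bids = [Bid("A", 100.0, 5), Bid("B", 140.0, 8), Bid("C", 120.0, 6)]
```

As Table 8 illustrates, the two rules pull in opposite directions: the cheapest site is rarely the one that can finish the most tasks.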

As illustrated in Table 8, the results of these two experiments show significant differences in total cost and number of completed tasks. In the minimum cost scenario, the final cost is about half that of the maximum number of tasks scenario, while in the maximum number of tasks scenario the number of completed tasks is almost twice that of the minimum cost scenario. Since the number of completed tasks can be correlated with the quality of results, we can conclude that different decision functions can significantly affect both the cost and the quality of results.

7.2.1 Groupon Cost Model

In this scenario we have three federated sites using a market policy. However, in this case one of the sites uses the groupon-like buying cost model (see Section 3) and offers services in bulk for a discounted price, while the others keep using a regular pricing model. From the user's perspective, the main difference is that in the groupon model client requests are processed after the deal is over, while in the regular pricing model client requests are executed as soon as possible. In this scenario we consider that each site has ten available workers to execute tasks. We have chosen the Rutgers site to be the groupon site – focusing on computing tasks during the period of time that a deal is active. This site periodically generates new groupon deals specifying the deal start time, deal end time, minimum number of required clients, and the discounted cost ratio. In this experiment, we generated three deals as described in Table 9.

TABLE 9: Groupon Deal Detail Information

ID   Start Time   End Time   Min Buyer Num   Discount
1    0            100        50              40%
2    140          190        30              60%
3    230          300        30              80%

In this scenario we have considered the following metrics:

• Price: The regular bidding price is calculated based on the operational costs and the provider's desired profit. The groupon price is calculated by applying a discount to the regular price. This discount assumes that the operational costs are lower when a large number of requests are processed together. In our experiments we consider the cost to be proportional to the time required to compute the workload.

RegularPrice = Profit + Cost    (5)
GrouponPrice = Profit + Cost − Discount    (6)
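A minimal sketch of Equations (5)-(6) follows; we assume (our own reading, not stated explicitly above) that the Table 9 discount ratio is applied to the regular price to obtain the Discount term.

```python
# Minimal sketch of Equations (5)-(6); the ratio-to-discount conversion
# is our own assumption about how Table 9's percentages map to Eq. (6).

def regular_price(profit: float, cost: float) -> float:
    """Eq. (5): RegularPrice = Profit + Cost."""
    return profit + cost

def groupon_price(profit: float, cost: float, discount: float) -> float:
    """Eq. (6): GrouponPrice = Profit + Cost - Discount."""
    return profit + cost - discount

def discount_from_ratio(profit: float, cost: float, ratio: float) -> float:
    """Discount implied by a deal ratio, e.g. 0.4 for the 40% deal in Table 9."""
    return regular_price(profit, cost) * ratio
```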

• Reputation: The reputation R of a site is calculated using the scores obtained after processing client requests, R = (1/n) * Σ_{i=1..n} Score_i, where each score is a value among -1, 0 and 1. A site obtains a score of 1 if all the tasks involving a client request


Fig. 5: Summary of experimental results of Maximize Price Influence: (a) task distribution and the price ($) clients paid over time (minutes) at the Rutgers (Groupon), Cardiff and Indiana sites; (b) total revenue ($) over time at the same sites.

Fig. 6: Summary of experimental results of Maximize Reputation Influence: (a) task distribution and the price ($) clients paid over time (minutes) at the Rutgers (Groupon), Cardiff and Indiana sites; (b) total revenue ($) over time at the same sites.

is completed within the deadline, 0 if the deadline is missed, and -1 when a groupon site cancels a deal due to an insufficient number of clients.

• Delay Time: For regular sites, this is the time between client request submission and the start of execution. For the groupon site, the delay time is generally longer because it also includes the time spent waiting for the end of the deal.

• Estimated Time of Completion (ETC): The ETC of a computation is the Delay Time plus the execution time required to complete the computation. For a user to even consider a site, the ETC proposed for the computation has to be lower than or equal to the computation deadline.

In this scenario we have 180 client requests, each of them involving the execution of four sub-tasks. We simulate the execution of tasks using real traces from previous executions of the cancer application. All client requests go to a common marketplace looking for offers from every site interested in executing them. Each interested site places offers and then the client decides which site executes its tasks. A decision function is applied to all valid offers (i.e. ETC < deadline) to decide which one is best for the client. Since we consider that a site only sends an offer when ETC < deadline, all offers a client receives are valid. Our decision function includes parameters such as price, reputation and delay time. The decision function to minimize is presented in Equation 7.

S = w(P) × Price + w(R) × (1 − Reputation) + w(D) × Delay    (7)

The relevance of each parameter can be specified using a weighting w(x). We use percentages to identify the importance of each parameter to the overall solution. Hence, the weights are values between 0 and 1, and w(P) + w(R) + w(D) = 1. Equation 7 includes Price, which is influenced by the time required to compute the workload (the longer the execution time, the higher the price). We also include the Reputation, as this identifies the reliability of a site or provider. The decision function uses 1 − Reputation because in the best-case scenario the reputation is 1. Finally, we have the Delay, which may be important when the client wants to start execution as soon as possible and is probably not interested in models such as groupon.
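The offer-validity check and the minimisation of Equation 7 can be sketched as follows. This is an illustrative sketch, not the paper's implementation: the `Offer` class, its field names, and the `select_site` helper are assumptions, and the default weights are those of Experiment 1 below.

```python
from dataclasses import dataclass

@dataclass
class Offer:
    site: str
    price: float       # price offered for executing the client's tasks
    reputation: float  # in [0, 1]; 1 is the best-case reputation
    delay: float       # delay time before execution starts
    etc: float         # estimated time of completion (delay + execution)

def score(offer, w_p, w_r, w_d):
    """Equation 7: S = w(P)*Price + w(R)*(1-Reputation) + w(D)*Delay.
    Lower is better; the three weights are assumed to sum to 1."""
    return w_p * offer.price + w_r * (1 - offer.reputation) + w_d * offer.delay

def select_site(offers, deadline, w_p=0.8, w_r=0.1, w_d=0.1):
    """Keep only valid offers (ETC <= deadline), then minimise the score."""
    valid = [o for o in offers if o.etc <= deadline]
    if not valid:
        return None
    return min(valid, key=lambda o: score(o, w_p, w_r, w_d))
```

For example, a cheap groupon offer with a long ETC loses to a pricier regular site whenever the client's deadline falls below the groupon ETC, which matches the behaviour reported for Experiment 1.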

Every time the groupon site wins an auction (the client buys the deal), it puts the client requests in a waiting list. Once the deal is over, the groupon site has to decide whether or not it will fulfill the deal and process all waiting requests. This decision is taken


based on the number of clients that bought the deal (WaitRequests), the minimum number of required buyers (MinRequiredRequests), and the reputation of the site. We identify the following cases:

• If there are enough clients (WaitRequests >= MinRequiredRequests), the deal is granted and the requests are processed as promised.

• If there are not enough clients (WaitRequests < MinRequiredRequests), the site evaluates how cancelling the deal will affect its reputation. In order to maintain a good reputation, if WaitRequests > 0.5 × MinRequiredRequests and Reputation < 0.5, the site will grant the deal and process the requests as promised, which may involve higher operational costs and affect the obtained profit. In any other case, the site cancels the deal, returns the money to the clients, and accepts a loss in reputation. In other words, the groupon site chooses between losing reputation and computing all tasks at higher cost and lower profit.
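The cases above can be collapsed into a small decision routine. The function and argument names below are illustrative shorthand for the paper's WaitRequests, MinRequiredRequests, and Reputation; the thresholds are the ones stated in the text.

```python
def fulfill_deal(wait_requests, min_required, reputation):
    """Decide whether the groupon site honours a deal.
    Returns True to process the waiting requests, False to cancel."""
    if wait_requests >= min_required:
        return True  # enough buyers: grant the deal as promised
    # Not enough buyers: grant anyway (at higher operational cost) only
    # when the site is close to the threshold AND its reputation is
    # already low, so a cancellation would be especially damaging.
    if wait_requests > 0.5 * min_required and reputation < 0.5:
        return True
    # Otherwise cancel, refund the clients, and take the reputation hit.
    return False
```

Under this rule, a site with a healthy reputation may prefer to cancel an under-subscribed deal, which is exactly the trade-off exercised in Experiment 2.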

7.2.2 Experiments
Experiment 1: Maximize Price Influence: In this experiment, we assigned the following weights to the decision function described in Equation 7: w(P) = 0.8, w(R) = 0.1, w(D) = 0.1. Thus, clients preferably choose the site with the lowest price for computing tasks and do not care much about reputation or delay in execution.

We can observe from the results shown in Table 10 that the groupon site is able to attract enough clients to meet the minimum requirements of the deal. This is also beneficial for clients, as they get lower prices, as shown in Figure 5a. Although the groupon site had the cheapest prices, we can observe in Figure 5a that some users chose regular sites: the estimated time of completion (ETC) proposed by the groupon site was unacceptable for those clients whose deadline was lower than that ETC. In contrast, all requests generated during deals 2 and 3 are processed by the groupon site. Figure 5b shows how a groupon site can potentially process more client requests and gain more benefit, even when the profit per task is lower than on regular sites. In this case, the revenue of the groupon site largely exceeds the revenue of the other two regular sites.

TABLE 10: Experiment 1 - Client requests processed at each site

Deal ID   Groupon site   Cardiff   Indiana
1         58             24        14
2         34             0         0
3         52             0         0

Experiment 2: Maximize Reputation Influence: We modified the weights of our decision function to w(P) = 0.4, w(R) = 0.5, w(D) = 0.1. In this case, clients consider reputation and price as the most important factors. We also modified the minimum number of buyers required for deal 2 from 30 to 40, which forces the site to cancel the deal. In this way we expect to see how reputation affects client decisions.

Figure 6 shows that the first deal is completed as in the previous experiment. However, the second deal is cancelled, as the groupon site was not able to attract enough clients (see Table 11). This means that all accepted requests are cancelled, the clients' money is returned, and the site's reputation is significantly affected. As a result, in the third deal clients avoid executing their tasks on the groupon site (see Table 11). All tasks are computed by the other, regular sites because clients prefer sites with higher reputation. The revenue of these sites also shows the importance of maintaining a good reputation (see Figure 6b).

TABLE 11: Experiment 2 - Client requests processed at each site

Deal ID   Groupon site     Cardiff   Indiana
1         58               21        17
2         25 (Cancelled)   3         4
3         0                31        24

From these experiments we can see that the winner-selection policy used by clients significantly affects the task prices they pay and the profit of regular and groupon sites. Moreover, a groupon site must carefully design its decision function for determining whether it should fulfill its promise of executing waiting tasks at the deal price, or cancel in pursuit of higher profit.

8 DISCUSSION

Significant progress has been made in the field of cloud computing over recent years. Various cloud provisioning models and a variety of different cloud services have been proposed. Providers, many of which are commercial, expose different cloud packages according to varying user requirements. As these providers continue to operate within a cloud market, it is important to consider the types of offerings they have (which can range from a single resource to resource bundles). Although a significant range of configuration options is available from many cloud providers, the economic models underpinning these offerings are limited. We demonstrate how a cloud provider may adopt different cost models and propose: (i) a consumption-based model and (ii) a group-buying (groupon) cost model. Whereas in a consumption-based model users pay only for the resources they use (over a particular time frame), ensuring equity between providers and consumers, in the context of a group-buying cost


model the focus shifts to clients, who can get significant discounts through a "deal" mechanism. We also present a cost-oriented approach in which cloud providers calculate the cost for a consumer based on specific application requirements. We evaluate our federated cloud system by developing relevant use-case scenarios based on two real applications: an EnergyPlus-based scenario and an image-processing workflow implemented using Octave, both with performance and real-time deployment requirements. These scenarios run on a federation framework that we have deployed across three sites.

Various approaches already exist in the context of cost models for clouds [10], [11], [12], [13]. We differentiate ourselves from these in two ways: (i) we abstract the type of cloud infrastructure being considered, so our work applies to both private and public clouds; and (ii) we explore different pricing techniques in the context of a real federated cloud deployment. We provide a methodology for developing federated clouds that can be used efficiently for running and deploying real applications. The contribution of this study centres on cost models and economic mechanisms for cloud environments.

These models and mechanisms are validated on a real infrastructure using CometCloud, a federation framework that facilitates the orchestration of geographically distributed resources and creates a multi-cloud environment where user requests are handled based on a decision function (influenced by resource availability, cost, performance, and other user-defined constraints such as access privileges). Hence, our approach can be generalised to other types of applications, or adapted to a different federation framework without significant modification.

9 CONCLUSIONS

In this paper, we have investigated the problem of cost-based cloud federation by devising a federation framework based on CometCloud. We have shown how our model can be deployed within two application scenarios, EnergyPlus for energy optimisation and Octave for cancer image processing, and identified the advantages of an aggregated tuple space (Comet space) for allocating resources based on a cost-based decision function. In our CometCloud-based model, users can submit jobs and retrieve results in real time, benefiting from the advantages of outsourcing tasks.

We have presented the design and implementation of the proposed approach and experimentally evaluated a number of scenarios for individual clouds and federated clouds. The experimental results have shown a number of benefits that federation provides with regard to task completion and cost optimisation. From our experiments, we conclude that it is essential to evaluate the trade-off between obtaining a high-quality solution at a lower cost (which may take longer to run on local resources) and outsourcing tasks to a remote site in a cloud federation at a higher cost (but with lower execution time). In the second part of our evaluation we have demonstrated how the groupon cost model can benefit both service providers and users. Our experiments have shown how the reputation of a site and the number of clients (within a group, using the same service) can be critical factors when using this model. Understanding how such a group-based model can be integrated within an application remains the next logical step of this work. Given a budget, an application user may speculate on how many concurrent simulations (jobs) to execute to achieve a discount from a cloud provider, which may subsequently influence the scheduling decisions made by a job manager.

Acknowledgements: The research presented in this work is supported in part by the US National Science Foundation (NSF) via grants OCI 1339036, OCI 1310283, DMS 1228203, IIP 0758566 and by an IBM Faculty Award. This project used resources from FutureGrid, supported in part by NSF OCI-0910812.

REFERENCES

[1] Fumo, N.; Mago, P.; Luck, R., "Methodology to Estimate Building Energy Consumption Using EnergyPlus Benchmark Models," Energy and Buildings, vol. 42, no. 12, pp. 2331-2337, 2010.

[2] Garg, V.; Chandrasen, K.; Tetali, S.; Mathur, J., "EnergyPlus Simulation Speedup Using Data Parallelization Concept," ASME Energy Sustainability Conference, New York: American Society of Mechanical Engineers, pp. 1041-1048, 2010.

[3] EnergyPlus. Available at: http://apps1.eere.energy.gov/buildings/energyplus/

[4] Octave. Available at: http://www.gnu.org/software/octave/

[5] X. Qi, R. H. Gensure, D. J. Foran, and L. Yang, "Content-based white blood cell retrieval on bright-field pathology images," Proceedings of SPIE Medical Imaging, 2013.

[6] G. Douglas, B. Drawert, C. Krintz and R. Wolski, "CloudTracker: Using Execution Provenance to Optimize the Cost of Cloud Use," Proc. GECON Conference, Sept. 16-18, 2014, Cardiff, UK. Springer.

[7] Kondo, D.; Javadi, B.; Malecot, P.; Cappello, F.; Anderson, D. P., "Cost-benefit analysis of Cloud Computing versus desktop grids," IPDPS 2009, IEEE International Symposium on Parallel & Distributed Processing, pp. 1-12, Rome, Italy, 23-29 May 2009.

[8] Chaisiri, S.; Bu-Sung Lee; Niyato, D., "Optimization of Resource Provisioning Cost in Cloud Computing," IEEE Transactions on Services Computing, vol. 5, no. 2, pp. 164-177, April-June 2012.

[9] Hong Xu; Baochun Li, "Dynamic Cloud Pricing for Revenue Maximization," IEEE Transactions on Cloud Computing, vol. 1, no. 2, pp. 158-171, July-December 2013.

[10] Jörn Altmann, Mohammad Mahdi Kashef, "Cost model based service placement in federated hybrid clouds," Future Generation Computer Systems, vol. 41, pp. 79-90, December 2014, ISSN 0167-739X.

[11] Van den Bossche, R.; Vanmechelen, K.; Broeckhove, J., "Cost-Optimal Scheduling in Hybrid IaaS Clouds for Deadline Constrained Workloads," 2010 IEEE 3rd International Conference on Cloud Computing (CLOUD), pp. 228-235, 5-10 July 2010.

[12] Juthasit Rohitratana, Jörn Altmann, "Impact of pricing schemes on a market for Software-as-a-Service and perpetual software," Future Generation Computer Systems, vol. 28, no. 8, pp. 1328-1339, October 2012, ISSN 0167-739X.

[13] Khanh-Toan Tran; Agoulmine, N.; Iraqi, Y., "Cost-effective complex service mapping in cloud infrastructures," 2012 IEEE Network Operations and Management Symposium (NOMS), pp. 1-8, 16-20 April 2012.


[14] Xinhui Li; Ying Li; Tiancheng Liu; Jie Qiu; Fengchun Wang, "The Method and Tool of Cost Analysis for Cloud Computing," IEEE International Conference on Cloud Computing (CLOUD '09), pp. 93-100, Miami, Florida, USA, 21-25 Sept. 2009.

[15] Liu Hong Wei; Zhu Hui, "Empirical research on the groupon technology acceptance mode," 6th International Conference on New Trends in Information Science and Service Science and Data Mining (ISSDM), pp. 356-360, 23-25 Oct. 2012.

[16] H. Kim and M. Parashar, "CometCloud: An Autonomic Cloud Engine," Cloud Computing: Principles and Paradigms, Wiley, 2011, pp. 275-297.

[17] I. Goiri, J. Guitart, and J. Torres, "Economic model of a cloud provider operating in a federated cloud," Information Systems Frontiers, pp. 1-17, 2011.

[18] Adel Nadjaran Toosi, Rodrigo N. Calheiros, Ruppa K. Thulasiram, and Rajkumar Buyya, "Resource Provisioning Policies to Increase IaaS Provider's Profit in a Federated Cloud Environment," Proceedings of the 2011 IEEE International Conference on High Performance Computing and Communications (HPCC '11), IEEE Computer Society, Washington, DC, USA, pp. 279-287, 2011.

[19] D. Villegas, N. Bobroff, I. Rodero, J. Delgado, et al., "Cloud federation in a layered service model," J. Comput. Syst. Sci., vol. 78, no. 5, pp. 1330-1344, 2012.

[20] M. D. de Assuncao, A. di Costanzo, and R. Buyya, "Evaluating the cost-benefit of using cloud computing to extend the capacity of clusters," Proc. 18th ACM Intl. Symp. on High Performance Distributed Computing (HPDC), pp. 141-150, 2009.

[21] S. Ostermann, R. Prodan, and T. Fahringer, "Extending grids with cloud resource management for scientific computing," Proc. 10th IEEE/ACM Intl. Conf. on Grid Computing, pp. 42-49, 2009.

[22] C. Vazquez, E. Huedo, R. Montero, and I. Llorente, "Dynamic provision of computing resources from grid infrastructures and cloud providers," Proc. Workshops at the Grid and Pervasive Computing Conference, pp. 113-120, 2009.

[23] L. F. Bittencourt, C. R. Senna, and E. R. M. Madeira, "Enabling execution of service workflows in grid/cloud hybrid systems," Network Operations and Management Symposium Workshop, pp. 343-349, 2010.

[24] P. Riteau, M. Tsugawa, A. Matsunaga, J. Fortes, and K. Keahey, "Large-scale cloud computing research: Sky computing on FutureGrid and Grid," ERCIM News, 2010.

[25] I. Goiri, J. Guitart, and J. Torres, "Characterizing Cloud federation for enhancing providers' profit," Proceedings of the 3rd International Conference on Cloud Computing, Miami: IEEE Computer Society, pp. 123-130, July 2010.

[26] E. Gomes, Q. Vo, and R. Kowalczyk, "Pure exchange markets for resource sharing in federated Clouds," Concurrency and Computation: Practice and Experience, vol. 23, 2011.

[27] B. Song, M. M. Hassan, and E.-N. Huh, "A novel Cloud market infrastructure for trading service," Proceedings of the 9th International Conference on Computational Science and Its Applications (ICCSA), Suwon: IEEE Computer Society, pp. 44-50, June 2009.

[28] CometCloud Project. http://www.cometcloud.org/. Last accessed: August 2013.

[29] L. Zhen and M. Parashar, "A computational infrastructure for grid-based asynchronous parallel applications," HPDC, pp. 229-230, 2007.

[30] N. Carriero and D. Gelernter, "Linda in context," Commun. ACM, vol. 32, no. 4, 1989.

[31] H. Kim, Y. el-Khamra, S. Jha, I. Rodero and M. Parashar, "Autonomic Management of Application Workflow on Hybrid Computing Infrastructure," Scientific Programming Journal, Jan. 2011.

[32] Anand, K. S. and R. Aron, "Group buying on the web: A comparison of price-discovery mechanisms," Management Science, vol. 49, pp. 1546-1562, 2003.

[33] IBM Smart Cloud. http://www.ibm.com/cloud-computing/us/en/

Ioan Petri is a Research Associate in the School of Computer Science & Informatics at Cardiff University. He holds a PhD in "Cybernetics and Statistics" and has worked in industry as a software developer at Cybercom Plenware. His research interests are cloud computing, peer-to-peer economics, and information communication technologies.

Javier Diaz-Montes is currently Assistant Research Professor at Rutgers University and a member of the Rutgers Discovery Informatics Institute (RDI2) and the US National Science Foundation (NSF) Cloud and Autonomic Computing Center. He received his PhD degree in Computer Science from the Universidad de Castilla-La Mancha (UCLM), Spain (Doctor Europeus, Feb. 2010). His research interests are in the area of parallel and distributed computing and include autonomic computing, cloud computing, grid computing, virtualization, and scheduling. He is a member of IEEE and ACM.

Mengsong Zou is currently a PhD student in Computer Science at Rutgers University, and a member of the Rutgers Discovery Informatics Institute (RDI2) and the US National Science Foundation (NSF) Cloud and Autonomic Computing Center. He received both his Bachelor's and Master's degrees in Computer Science from Huazhong University of Science and Technology, China. His current research interests lie in parallel and distributed computing, cloud computing, and scientific workflow management.

Tom Beach is a lecturer in Construction Informatics. He holds a PhD in Computer Science from Cardiff University. He specializes in High Performance, Distributed, and Cloud Computing. His recent work focuses on applications in the Architecture, Engineering and Construction domain. This work has included the use of cloud computing for the storage, security, management, coordination and processing of building data, and the development of rule-checking methodologies to allow this data to be tested against current building regulations and performance requirements.

Omer F. Rana is a Professor of Performance Engineering in the School of Computer Science & Informatics at Cardiff University and a member of Cardiff University's "Data Innovation Institute". He holds a PhD in "Neural Computing and Parallel Architectures" from Imperial College (University of London). His research interests include distributed systems and scalable data analysis.

Manish Parashar is Professor of Computer Science at Rutgers University. He is also the founding Director of the Rutgers Discovery Informatics Institute (RDI2), the NSF Cloud and Autonomic Computing Center (CAC) at Rutgers, and The Applied Software Systems Laboratory (TASSL), and is Associate Director of the Rutgers Center for Information Assurance (RUCIA). Manish received the IBM Faculty Award in 2008 and 2010, the Tewkesbury Fellowship from the University of Melbourne, Australia (2006), and the Enrico Fermi Scholarship, Argonne National Laboratory (1996). He is a Fellow of AAAS, Fellow of IEEE / IEEE Computer Society, and an ACM Distinguished Scientist.