

Accelerating Distributed Workflows With Edge Resources

Siddharth Ramakrishnan, Robert Reutiman, Abhishek Chandra, Jon Weissman

Dept. of Computer Science and Engineering

University of Minnesota, Twin Cities

Minneapolis, USA

Email: {ramak,reutiman,chandra,jon}@cs.umn.edu

Abstract—Distributed data-intensive workflow applications are increasingly relying on and integrating remote resources including community data sources, services, and computational platforms. Increasingly, these are made available as data, SAAS, and IAAS clouds. The execution of distributed data-intensive workflow applications can expose network bottlenecks between clouds that compromise performance. In this paper, we focus on alleviating network bottlenecks by using a proxy network. In particular, we show how proxies can eliminate network bottlenecks by smart routing and perform in-network computations to boost workflow application performance. A novel aspect of our work is the inclusion of multiple proxies to accelerate different workflow stages optimizing different performance metrics. We show that the approach is effective for workflow applications and broadly applicable.

Using Montage¹ as an exemplar workflow application, results obtained through experiments on PlanetLab showed how different proxies acting in a variety of roles can accelerate distinct stages of Montage. Our microbenchmarks also show that routing data through select proxies can accelerate network transfer for TCP/UDP bandwidth, delay, and jitter, in general.

Index Terms—Distributed computing; workflows; network systems; data-intensive computing

I. INTRODUCTION

Data-intensive distributed workflow applications represent an important and emerging class of applications. Such applications arise in a multitude of settings where data sources and computation are naturally distributed, in areas such as bioinformatics (e.g. GADU [42]), astronomy (e.g. Montage [23]), civil engineering (e.g. SCEC [16]), high-energy physics (e.g. LIGO [44]) to name a few. The distribution of data reflects the large volume of available community datasets collected via sensors, outputs of experimental processes, and so forth. Distributed computation arises from situations in which the computational stages are either pinned to resources, e.g. Web Services as in Taverna [46], or require large amounts of computing, e.g. dynamic Grid deployments such as Pegasus [17]. In other cases, a workflow is created as an orchestration of pre-deployed services. Platforms hosting workflow components span peer-to-peer, Grid, and now, Cloud systems.

The authors would like to acknowledge grant NSF/IIS-0916425 that supported this research.

¹ This research made use of Montage, funded by the National Aeronautics and Space Administration’s Earth Science Technology Office, Computation Technologies Project, under Cooperative Agreement Number NCC5-626 between NASA and the California Institute of Technology. Montage is maintained by the NASA/IPAC Infrared Science Archive.

Another feature of complex distributed workflow applications is that performance metrics may differ for different components. For example, a large-data transfer to one component requires a high bandwidth path, yet an interactive component that allows the user to view intermediate results and tune parameters may require low latency or low jitter (if visualization is used). Even beyond the application execution and computation, the diverse capabilities of end-user hosts, ranging from desktop machines to resource- and energy-constrained mobile devices, create bottlenecks for output delivery to users of these workflow applications. Such end-point bottlenecks are particularly critical in applications requiring visualization or real-time human input for execution. The interconnection of workflow components and resources may present network bottlenecks to workflow applications that live on the upper-end of data and compute requirements.

In this paper, we propose to utilize a proxy network that can accelerate components of a distributed workflow by alleviating network bottlenecks. We have developed a simple tool to identify such bottlenecks, and use the proxy network to route around them. This network also boosts application performance by performing in-network computation close to communicating components. To demonstrate the potential of proxies, we have performed experiments on PlanetLab, exploiting its resource and network diversity to emulate distributed clouds. We show that such proxies can accelerate distributed workflow applications such as Montage.

Our work differs from other efforts in several ways. As opposed to Web proxies [40] that accelerate point-to-point (client-server) interactions and mainly support data caching, our aim is to accelerate multi-node interactions while supporting other computational roles as well. Overlay networks [2], [20] strive to provide better network paths for network applications. While some of our techniques are related, our focus is on multi-stage workflow applications that are both compute- and data-intensive. Volunteer computing and data sharing systems [4], [14] provide the ability to tap into the aggregate CPU capacity or network bandwidth of idle donated resources, while our approach aims to exploit the characteristics of specific nodes on behalf of applications. In our own earlier work, we focused on single proxy nodes for accelerating cloud applications [47]. Here, we propose using multiple proxies at different points in the workflow execution to account for diverse performance requirements within a workflow application.

We present both microbenchmarks and results using Montage as an illustration of the potential benefits of our approach. Results for Montage show that proxies can help reduce the data download time from data sources and within the application stages via smart routing. Proxies also reduce the Montage output data delivery time to end-users by 65-80% through routing, compression and data transformation. Our microbenchmark results show that our approach has great promise, as large-data communication common to distributed data-intensive workflows can be accelerated significantly, by as much as 56% for TCP bandwidth. This has important implications for data-intensive applications decomposed across geographically separated clouds.

II. DISTRIBUTED WORKFLOWS

Distributed workflow applications are represented as directed graphs with nodes (or components) representing actions (typically computation or control), and arcs representing data-flow, control-flow, or both. Consider four exemplars taken from the domain of scientific computing: (1) Montage, an application for constructing science-grade astronomical mosaics, (2) LIGO, the laser interferometer gravitational-wave observatory designed to detect cosmic gravitational waves and to harness these waves for scientific research, (3) SCEC, a system for the comprehensive understanding of earthquakes, and (4) GADU, the Genome Analysis and Database Update system, designed to perform periodic high-throughput analysis of publicly available protein sequences using bioinformatics tools.
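As a minimal sketch of the graph representation described above, the following Python snippet models a workflow as a directed graph and derives one valid execution order. The stage names and arcs are hypothetical and only loosely inspired by an image-mosaic pipeline; they are not the actual Montage workflow.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical workflow: each key is a component (node) and each value is
# the set of components whose output it consumes (its incoming arcs).
workflow = {
    "fetch_images": set(),
    "reproject": {"fetch_images"},
    "fit_background": {"reproject"},
    "build_mosaic": {"reproject", "fit_background"},
}


def execution_order(graph):
    """Return one order in which every component runs after its inputs."""
    return list(TopologicalSorter(graph).static_order())


order = execution_order(workflow)
print(order)
```

A graph with a cycle would make `static_order()` raise `graphlib.CycleError`, which is one simple way to reject a malformed workflow description before placing its stages on nodes.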

Workflow applications may have many consumers (or end-users) as mentioned at the outset. This may arise when workflow applications are deployed as services made available to a wider community, e.g., many specific bioinformatics workflows, such as NCBI’s Blast tools [10] which allow on-line query submission, are routinely used by bioinformaticians. Another example is an emergency planning and response workflow system, e.g. hurricane evacuation, that has many potential consumers, from planners, to emergency personnel, to individuals. The upshot is that the resources allocated to the workflow may be multiplexed across many concurrent instantiations.

For data/compute-intensive or service-based workflows, components or stages may be placed on different nodes that may be geographically distributed, perhaps across multiple sites or clouds. This could be done for multiple reasons. First, many of these workflows require a large number of compute resources, for which they may need to exploit distributed, multi-site resources (e.g. Montage). Second, the input data sources may be geographically distributed, and some stages of the workflow may also require data from multiple sources. For example, imagine a workflow that integrated PubMed data sources or the GoogleEarth cloud, and so on. Placing computation stages closer to the data sources would help reduce the data dissemination cost. For dynamic workflow systems, such decisions about component/stage placement can be made at runtime vs. a static services-based mapping. In either case, however, network conditions between the stages, as well as those from the data sources and to the end-users, may change dynamically, thus compromising performance metrics. One solution could be to migrate components; however, this is a costly and complex remedy. Instead, we propose a light-weight proxy solution that can respond to network dynamics.

How is this relevant to the cloud? Increasingly, a cloud ecosystem is evolving to include many different types of clouds offering data, services, or raw resources. Large-scale scientific workflow applications will therefore likely span multiple clouds as their predecessors spanned multiple Grid sites. Thus, the proxy edge network will benefit applications spanning multiple geographically dispersed clouds.

III. PROXY NETWORK

Our proposed proxy network contains a configurable collection of edge nodes offering varying amounts of resources including bandwidth, storage, and computation. One strong appeal of the proxy network is that it offers nodes at many network locations that may provide improved performance with respect to distributed components and/or the end-user application (Figure 1). For example, it is well-known that Internet bottlenecks due to IP routing can be overcome by application-level routing. We envision that such a network could be realized in a variety of ways (volunteers and CDNs are two possibilities), and we do not assume a specific deployment model in this paper. Volunteer computing has been successfully used in both Grid and P2P communities provided sufficient incentives are offered. We are currently designing infrastructure that would enable secure execution of proxy code on volunteers. The issue of incentives is a topic of future work.

Figure 1. Proxy network accelerating a distributed workflow application. The figure shows how a set of proxies, identified using the network dashboard, could be inserted into the workflow to provide better data and communication paths, in order to accelerate the application. Several components of the workflow are deployed in geographically separated clouds.

Note that even though the use of a proxy between two elements introduces an additional communication hop at the high-level, other groups have shown the existence of beneficial network paths involving extra hops [32], [50]. Our goal is to assess the benefits of proxies from an application standpoint and across a diverse set of metrics.

The heart of the proxy architecture is the proxy node. Proxies provide “bottleneck relief” for distributed workflow applications. The proxy network consists of a large number of logically connected edge nodes that may assume a rich set of roles to boost the performance of distributed applications, including:

• Service interaction: A proxy may act as a client to a workflow component or to an external data source. This role allows a proxy with better network connectivity to improve performance. For example, a proxy may have much higher bandwidth to/from a workflow component relative to the end-user.
• Data Transformation: A proxy may carry out computations on data via a set of data operators. This role allows a proxy to filter, compress, merge, and mine data.
• Caching: A proxy may efficiently store and serve data to other nearby proxies that may consume the data later on. Proxies can also cache intermediate results from a component interaction that may be reused.
• Routing: A proxy may route data to another party as part of an application workflow. This role allows a proxy to efficiently send data to another proxy or application component for additional processing, caching, or component interactions. This role is particularly important if the application contains multiple interacting components which are all widely distributed, and there may be no single proxy that can efficiently orchestrate all of these interactions.

Network Dashboard: A key part of our architecture is a network dashboard that can provide performance statistics about the proxies in the proxy network (e.g., their bandwidth, latency, etc.). We envision this dashboard to be useful in the proxy selection and discovery process. As part of our system infrastructure, we have developed such a network dashboard tool² for PlanetLab that provides us with the networking statistics for potential proxy nodes in PlanetLab. Each proxy executes a resource monitor that collects and transmits monitoring data to the entry points and the dashboard control. The resource monitor periodically probes all proxies to measure the following quantities: bandwidth (for TCP streams and UDP datagrams), the delay in the arrival of successive datagrams, and the variation of this delay, also known as jitter.
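To make the delay and jitter quantities concrete, here is a minimal sketch of how a monitor might derive them from datagram arrival timestamps. The text does not specify the exact formula used by the dashboard, so the definition below (mean absolute change between consecutive inter-arrival delays) is an assumption, and the function names and sample timestamps are hypothetical.

```python
from statistics import mean


def inter_arrival_delays(arrival_times):
    """Gaps, in seconds, between successive datagram arrivals."""
    return [later - earlier
            for earlier, later in zip(arrival_times, arrival_times[1:])]


def jitter(arrival_times):
    """Mean absolute change between consecutive inter-arrival delays.

    This is one simple definition of jitter, assumed for illustration;
    it returns 0.0 when there are too few samples to measure variation.
    """
    delays = inter_arrival_delays(arrival_times)
    changes = [abs(b - a) for a, b in zip(delays, delays[1:])]
    return mean(changes) if changes else 0.0


# Hypothetical arrival timestamps (seconds) for five probe datagrams.
arrivals = [0.0, 1.0, 2.5, 3.5, 5.5]
print(inter_arrival_delays(arrivals))
print(jitter(arrivals))
```

Measurement tools often use a smoothed running estimate instead of a plain mean; the plain mean is used here only because it is easy to verify by hand.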

In Figure 1, we illustrate how a distributed workflow application can benefit from a proxy network. A simple example from image processing could be an application that retrieves images from an external data source (e.g. SkySurvey), performs in-network processing (e.g. Montage mosaic construction), and then returns the output to the end-user. Proxies could accelerate this application in the following ways: (1) they can provide a better network path to retrieve input data from external sources to the computation node(s), (2) they can provide a better network path to deliver output data from the computation node(s) to the end-user, (3) they can distill or compress output if the end-user device has reduced display capacity or a poor network path, and (4) they can perform in-network computations when proxy resources offer greater power than at the application control site. Proxies (in 3 and 4) allow customized processing on behalf of the application. We empirically illustrate all of these opportunities with Montage in Section IV. Proxies are not a new idea by themselves, but it is the synthesis of the diverse roles of proxies and the use of multiple proxies within the same application that is new.

² Network Dashboard Tool URL: http://netstat.cs.umn.edu

Proxy Selection: We have experimented with a communication-centric proxy selection algorithm. The algorithm probes the dashboard to determine the communication characteristics of a path A → B and candidate proxies for acceleration. The algorithm selects the best stable proxy, considering both performance and performance variance. We present results using Montage shortly.
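The selection criterion described above can be sketched as follows. This is not the authors' algorithm, whose details are not given here; it is a minimal illustration that assumes each candidate is scored by its mean measured bandwidth minus a penalty proportional to the standard deviation, and that a proxy is chosen only if it beats the direct path. The names `stable_score`, `select_proxy`, and `penalty`, and all sample values, are hypothetical.

```python
from statistics import mean, pstdev


def stable_score(samples, penalty=1.0):
    """Mean bandwidth reduced by a penalty on its variability (assumed rule)."""
    return mean(samples) - penalty * pstdev(samples)


def select_proxy(direct_samples, proxy_samples, penalty=1.0):
    """Return the best stable proxy name, or None to keep the direct path."""
    best_name = None
    best_score = stable_score(direct_samples, penalty)
    for name, samples in proxy_samples.items():
        score = stable_score(samples, penalty)
        if score > best_score:
            best_name, best_score = name, score
    return best_name


# Hypothetical bandwidth samples (Mbit/s) for path A -> B and via proxies.
direct = [10.0, 10.0, 10.0, 10.0]
candidates = {
    "proxy_a": [12.0, 12.0, 12.0, 12.0],  # modest gain, very steady
    "proxy_b": [30.0, 2.0, 30.0, 2.0],    # higher mean, but erratic
    "proxy_c": [8.0, 8.0, 8.0, 8.0],      # slower than the direct path
}
print(select_proxy(direct, candidates))  # prints proxy_a
```

With `penalty=0.0` the same data selects `proxy_b`, which shows how ignoring variance would favor a proxy whose average is high but whose performance is unpredictable.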

In this paper, we focus on evaluating the potential performance benefits offered by proxies. Programming proxies is outside the scope of this paper. More details of the proxy architecture can be found in [36]. In prior work, we have developed a generic programming framework for enabling proxies to act as a client for any http-based Web services [9]. The computing role can be confined to a predetermined set of trusted operators downloaded by the proxy, similar to the trust expressed by clients in cycle-harvesting systems such as Condor [15] or @Home networks with respect to executing non-local code. Arbitrary execution of non-local code can be enabled by various sandboxing technologies such as virtualization [8], language restriction (e.g. Java), and others (e.g. Google Native Client [49]). We also do not focus on the setup and maintenance of the proxy network, for which existing mechanisms, such as those used in peer-to-peer networks [41], [12] and cycle-sharing systems [31], [7] can be used.

IV. MONTAGE CASE STUDY

In this section, we demonstrate how proxies can be used to benefit a real-world distributed workflow application. We present a set of experiments using Montage (Figure 2), a software suite for creating image mosaics. Montage has been implemented in many different ways on different platforms. In our implementation, we have distributed the Montage stages across a slice in PlanetLab. We execute each compute-intensive stage of the Montage application on a different machine. The other stages of the Montage application are distributed over these machines. The output of each stage is sent to the next stage. Proxies are used to accelerate data transfer between stages of Montage that are executed on different physical machines. In this case study, we illustrate the benefit of the proxy architecture via two roles: service interaction (Sections IV-A and IV-C) and data transformation (Section IV-B).

We aim to create a mosaic of the sky by sourcing images of different quadrants directly from the SkySurvey data cloud [39]. Montage illustrates all of the potential optimizations illustrated earlier. We show that proxies can speed up the runtime of the Montage application by reducing the data transfer times from SkySurvey to the compute nodes. In addition, we show how proxies can benefit the delivery of output data to a user either for archival or for real-time visualization, where the user may be either on a desktop machine or on a resource-constrained mobile client. Finally, we show how proxies can accelerate the “internals” of the application (not just the input and output phases) by speeding up the data transfer between different stages of a distributed Montage workflow. In the results presented, we anonymize the specific PlanetLab nodes used (node A, B, etc.).

Figure 2. Drilling down into Montage

The remainder of the section thoroughly examines the benefits of using proxies to accelerate the Montage workflow, as shown in Figure 3.

A. Accelerating Inflow

Here, we show how proxies can be used to reduce the time to retrieve data from the SkySurvey servers to the nodes that compute a mosaic using Montage. For the sake of simplicity, we initially assume that each compute node performs all stages of the image mosaic operation. Thus, we concentrate on acceleration of the input delivery phase from the SkySurvey servers. This type of acceleration is applicable to any application that needs to retrieve data from a number of geographically dispersed locations, like blog analysis, distributed data mining, etc.

We performed experiments over PlanetLab nodes located across multiple continents. We selected a set of 4 PlanetLab nodes, and downloaded datasets of different sizes from the SkySurvey servers. The results show that proxies can speed up input delivery by as much as 60% (e.g. proxy node C, Figure 4(a)).

We proceed to investigate the stability of proxies over time, i.e. are the speedups repeatable over a long period? For this experiment, we download a dataset of a fixed size onto a set of PlanetLab nodes, a number of times over 24 hours. We compare the times to download data with and without proxies to accelerate the transfer. The results are plotted in Figure 4(b). We see that a number of proxies exhibit a sustained positive speedup over the duration of a day and the benefit appears to be insensitive to data size. Also, by the size of the confidence intervals, we see that the variation in this speedup is fairly small for a number of proxies, indicating that these speedups are fairly stable over time.

Figure 3. Accelerating Montage. Multiple clouds are shown: SkySurvey data cloud connecting to Montage component services running across distributed compute clouds.

(a) Speedup observed when proxies (A,B,C,D) route the transfer of different input image-sets for the Montage application

(b) The speedup observed over time, when proxies accelerate communication between PlanetLab nodes (y-axis) and the SkySurvey web-servers. This plot shows the average values (with 95% confidence intervals) by which the data transfer can be accelerated using proxies, observed over a 24 hour period.

Figure 4. Speedup and consistency seen when proxies are used to accelerate the transfer of data sets from the SkySurvey servers to different PlanetLab nodes outside the continental US.

To summarize, we have shown that proxies can be used to accelerate data transfer between node pairs. The speedup obtained is substantial and is stable over the period of a day.

The next question we posed: is finding a good proxy a “needle-in-the-haystack”, i.e., are good proxies rare? To answer this question, we did a detailed evaluation using our network dashboard. The first set of results shows the ubiquity of helpful proxies for large data transfers (Figure 5).

Approximately 1600 pairs of network endpoints were monitored. We calculated the number of alternate paths, formed by routing data through a single proxy, that are superior to the direct path. This analysis was carried out for TCP streams as it is the primary protocol used for large data transfers. On average, there exist a large number of alternate paths that may benefit a given pair of network endpoints. Furthermore, the benefit of these paths remains constant over a long duration, suggesting that these opportunities (the benefit of alternate paths) are long-lived. Looking at these aggregate values, one notes that about 80% of the alternate paths are faster than the direct ones by at least 10%. Specific paths can be accelerated far more, over 50% in our study.
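The alternate-path count described above can be illustrated with a short sketch. It assumes, purely for illustration, that the TCP bandwidth of a one-proxy path is limited by the slower of its two hops; the text does not state how relayed bandwidth was measured or estimated. The function name, the `min_gain` parameter, and the bandwidth table are all hypothetical.

```python
def better_alternate_paths(bw, src, dst, min_gain=0.10):
    """List proxies whose relayed path beats the direct path by min_gain.

    bw maps (sender, receiver) pairs to measured bandwidth. The relayed
    bandwidth is approximated as the minimum of the two hops (assumption).
    """
    direct = bw[(src, dst)]
    nodes = {a for a, _ in bw} | {b for _, b in bw}
    better = []
    for proxy in sorted(nodes - {src, dst}):
        if (src, proxy) in bw and (proxy, dst) in bw:
            relayed = min(bw[(src, proxy)], bw[(proxy, dst)])
            if relayed >= direct * (1.0 + min_gain):
                better.append(proxy)
    return better


# Hypothetical bandwidth measurements in Mbit/s.
bw = {
    ("A", "B"): 10.0,
    ("A", "P1"): 40.0, ("P1", "B"): 16.0,  # relayed: 16.0
    ("A", "P2"): 10.5, ("P2", "B"): 50.0,  # relayed: 10.5
    ("A", "P3"): 5.0, ("P3", "B"): 80.0,   # relayed: 5.0
}
print(better_alternate_paths(bw, "A", "B"))
```

Repeating this count per endpoint pair and per time interval, for several values of `min_gain`, would produce curves of the kind summarized in Figure 5.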

B. Accelerating Outflow

In addition to communication acceleration, proxies can also perform custom computation before delivering final results to the end-user. We now present results for optimizing the application’s output stage, namely, delivering the output data to the user. Here, we consider two scenarios: (i) where the user is on a wired desktop machine, and (ii) where the user is on a mobile device. Montage produces large image files, and so the critical requirement in both scenarios is to reduce the network overhead for data transmission. In addition, for the mobile client scenario, we would like to present the output image to the user in a size and resolution appropriate to the mobile device. More importantly, we would like to reduce the bandwidth and energy usage of the mobile client, as these are precious resources. We show how proxies can be used to achieve these goals. For these experiments the proxies were collocated with the data on the Montage output node.

Figure 5. For a set of 1600 pairs of network endpoints, we plot the number of alternate paths that could improve the TCP bandwidth. The data plotted was collected over a day, and is discretized at intervals of 2.5 hours. For each interval and each pair of network endpoints, approximately 40 paths were analyzed, leading to a total of 64k paths analyzed per interval. Each curve indicates the average TCP acceleration as a percentage. [Plot: x-axis Time (hours), 2.5 to 22.5; curves for acceleration > 0%, > 10%, > 15%, > 25%.]

Figure 6. This plot shows the download, compression, and decompression times when a proxy is used to losslessly compress the output fits files before uploading them to the desktop client.

Desktop Client: The Montage process outputs large image files with the fits³ [21] extension. This is the same file type used as input to the Montage process. Fits files can be converted into jpg files by a utility that comes with the Montage software suite. It may also be desirable to keep the output images in fits format to prevent image data loss and to reuse the images as input to another Montage execution later. A proxy could cache this data for later optimization, e.g. it might be useful to another Montage execution, avoiding some recomputation if regions of interest overlap. The drawback to keeping the images in fits format is their large size. Therefore it may be desirable to compress the fits images losslessly prior to transport to the desktop end client for storage.

For this experiment, we used the dashboard to locate a proxy that had BOTH sufficient computing capacity for compression AND good network connectivity for output delivery. The benefits of compression are obvious, but locating a proxy that also had high delivery bandwidth was a second constraint. This proxy compressed the images with a gzip algorithm and transferred them to a desktop client. The size savings ranged from 14.5-17.38% for the output images generated earlier (associated with the input images in Figure 4(a)). The fits images used as input for compression were 23.57MB, 34.85MB, 47.81MB, and 45.72MB in size. Figure 6 shows the download times for the images with and without compression using a well-connected proxy. There is also a tradeoff between space savings and download/compression time. On average it took 6.55 seconds to compress an image and 2.8 seconds to decompress. This overhead made compression more useful for the larger images.

³ Flexible Image Transport System
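The tradeoff noted above can be worked through with simple arithmetic. The sketch below uses the average compression and decompression times reported in the text; the link bandwidth (0.5 MB/s) and the 15% size saving are assumed values chosen for illustration (the saving lies within the reported 14.5-17.38% range), and the function names are hypothetical. It is not a reproduction of the measurements in Figure 6.

```python
COMPRESS_S = 6.55    # average compression time reported in the text
DECOMPRESS_S = 2.8   # average decompression time reported in the text


def direct_time(size_mb, bandwidth_mb_s):
    """Seconds to transfer the uncompressed file."""
    return size_mb / bandwidth_mb_s


def compressed_time(size_mb, bandwidth_mb_s, saving):
    """Seconds to compress, transfer the smaller file, and decompress."""
    transfer = size_mb * (1.0 - saving) / bandwidth_mb_s
    return COMPRESS_S + transfer + DECOMPRESS_S


def break_even_size_mb(bandwidth_mb_s, saving):
    """File size above which compression reduces end-to-end time."""
    return (COMPRESS_S + DECOMPRESS_S) * bandwidth_mb_s / saving


BANDWIDTH = 0.5  # MB/s, assumed link speed (not from the text)
SAVING = 0.15    # assumed fraction of bytes removed by compression

for size in (23.57, 34.85, 47.81, 45.72):  # fits sizes from the text, in MB
    gain = direct_time(size, BANDWIDTH) - compressed_time(size, BANDWIDTH, SAVING)
    print(f"{size:6.2f} MB: time saved by compressing = {gain:+.2f} s")
```

Under these assumed values the break-even size is roughly 31 MB, so only the smallest of the four images would be slower with compression; a faster link raises the break-even size, and a slower link lowers it.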

Figure 7. This plot shows the download and compression times when a proxy is used to compress (slightly lossy) and resize the output jpg files.

Mobile Client: It may also be desirable to view the image outputs remotely via a mobile device if the user is on the go. To this end we created a mobile application written for the Google Android platform [19]. The proxy reduces the resolution of the images to a width of 225 pixels, while preserving the aspect ratio of the image. This results in significant space savings while still producing an output image with a viewable resolution for a phone. The phone displays the image below the transfer button.

As before, the images were converted from fits format to jpg format using the utility that comes with Montage. They were converted using the highest quality settings initially. The resulting sizes were 392KB, 1136KB, 1272KB, and 565KB. These images were at full resolution and were used as input to the proxy. The resolutions ranged from 1504x2054 pixels to 2735x2191 pixels. These resolutions are too large for the typical desktop computer, let alone a mobile phone. The compression savings from resizing the images were extremely high, ranging from 93.36% to 98.42%. This results in significant space savings if multiple images are stored on the phone for later viewing. Figure 7 compares the download times for images with and without compression. For these downloads the link was set at optimal 3G/UMTS download speeds and latency. This was controlled through the settings for the Android emulator. The bandwidth a mobile phone receives typically isn’t optimal and can vary from location to location. The overall speedup varied from 11% slower for the smallest file to 60% faster for the largest file, indicating that compression is desirable for the larger images for overall download time. However, when we look at the breakdown of the download time, we see that compression can take up a significant portion (65-80%) of the total time. Since compression occurs at the proxy, the actual network transmission time for the image is much smaller (13-40% of that for the uncompressed image). Reducing the network transmission time is critical for a mobile client as it also saves precious bandwidth and power, showing the benefit of using proxies across all file sizes.

C. Accelerating Internal Stages

Finally, we show how proxies can be used to accelerate the data transfer between different stages of a distributed workflow application. Again, we consider the Montage application. The Montage application comprises a number of stages that are executed serially one after another. Montage has a few compute-intensive operations (mProjExec, mDiffExec, mFitExec), which take several seconds to minutes to complete. Parallelism in Montage can be obtained by running multiple instances of these operations on a partitioned input set, on different machines. Thus, a typical parallel execution of the Montage application will have a high fan-out and fan-in for certain stages. Planners such as Pegasus [17] are used to create workflows that can utilize the parallelism available in grid environments.

The results are shown in Figure 8. The input set, of size 36MB, is present on the machine that executes the first compute-intensive stage of Montage. This stage generates a set of intermediate files, totaling around 118MB. This data is transferred over the network to another machine that executes the next few stages of the Montage workflow. The transfer phase is accelerated by 42% if data is routed through a proxy. The intermediate files generated by the next few stages, totaling 369MB, are then transferred to another machine that executes the final few stages. When data is routed through a proxy, we see an improvement of 24% in the completion time of this phase. The final stage of Montage generates a single output fits image, of size 24MB. Thus, by using proxies to accelerate the network-intensive tasks of the workflow, we are able to reduce the total execution time of the workflow by 13%. This setup illustrates the most conservative improvement one can achieve by deploying the Montage application over a wide-area network. A deeper parallel execution of the stages that comprise Montage will further reduce the computation time, and thus, the networking benefits brought about by proxies will be even more pronounced.
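The relationship between per-phase transfer gains and the end-to-end gain can be sketched as follows. This is an illustrative model, not the authors' analysis: it assumes the stages run serially, it reads "accelerated by 42%" as a 42% reduction in that phase's time, and the phase durations in the usage note are hypothetical because the text does not report the raw timings.

```python
def total_time(compute_s, transfer_s, reductions=None):
    """Serial workflow time: compute stages plus (possibly reduced) transfers.

    `reductions[i]` is the fractional time reduction of transfer phase i
    when it is routed through a proxy (0.0 means no proxy benefit).
    """
    if reductions is None:
        reductions = [0.0] * len(transfer_s)
    if len(reductions) != len(transfer_s):
        raise ValueError("one reduction is needed per transfer phase")
    reduced = sum(t * (1.0 - r) for t, r in zip(transfer_s, reductions))
    return sum(compute_s) + reduced


def overall_improvement(compute_s, transfer_s, reductions):
    """Fractional reduction in total execution time due to the proxies."""
    base = total_time(compute_s, transfer_s)
    return (base - total_time(compute_s, transfer_s, reductions)) / base
```

With hypothetical compute stages of 300, 200, and 100 seconds and transfer phases of 100 and 150 seconds, `overall_improvement([300, 200, 100], [100, 150], [0.42, 0.24])` is about 9%. Halving the compute times (as deeper parallelism would) raises it to about 14%, which illustrates why the networking benefit becomes more pronounced as computation shrinks.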

The key point to stress is that different proxies were utilized by different phases of Montage (input, internal, output), and certain proxies were selected based on their effectiveness for multiple roles simultaneously (i.e. computing and communication for the output phase).

Figure 8. This plot shows the % improvement when proxies are used to accelerate communication between different stages of Montage running on different nodes. The y-axis shows the total execution time, in seconds.

D. Performance Diversity

As mentioned earlier, complex distributed workflows are characterized by different performance requirements at different stages, e.g. high bandwidth for large data transfers, low latency for response-sensitive components, low jitter for end-user visualization, and so on. We now present results showing that proxies can indeed optimize a wide set of metrics even for a single path.

Workflow stages that stream multimedia for remote visualization or other purposes often use the UDP protocol for data transfer. The use of a proxy can amplify the observed UDP bandwidth between two end-points. We present an example of one such experiment (of many examples) in Figure 9(b), where the alternate paths accelerate data transfer by 6-50%. Continuing with the benefits seen for the UDP protocol, one may reduce the network delay and jitter (Figures 9(c) and 9(d)) between the endpoints by using a proxy, to under 10% of the original values. This would benefit human-in-the-loop applications by reducing network delay and jitter for data transmissions. Thus, proxies can be used to reduce the sensitivity of the end user to the vagaries of the network. These observations mirror those seen for the TCP protocol: proxies could be used to improve the network performance and network QoS by a substantial amount that is sustained over time.

[Figure 9 panels; x-axis in each panel: Time (hours)]
(a) TCP bandwidth (KBps) (Higher is better). Series: Direct, Proxy L, Proxy E, Proxy I, Proxy J, Proxy A.
(b) UDP bandwidth (KBps) (Higher is better). Series: Direct, Proxy C, Proxy B, Proxy K, Proxy H.
(c) UDP delay (ms) (Lower is better). Series: Direct, Proxy C, Proxy K, Proxy H, Proxy D, Proxy F, Proxy B.
(d) UDP jitter (ms) (Lower is better). Series: Direct, Proxy F, Proxy B, Proxy K, Proxy H.

Figure 9. These plots show the networking benefit of using proxies to route data between a particular pair of nodes, for both the TCP and the UDP protocol. The solid line plots values for a direct network path between the endpoints. Other points on each plot indicate values that may be realized by routing data through certain proxies. The plot incorporates data sampled at different times over a day, with more recent values appearing on the right.

As stated earlier, for complex workflows different proxies may be appropriate to accelerate different parts of the same application. This opportunity has not been studied in the literature to our knowledge. We present one simple real example from PlanetLab to further illustrate this: a node S communicates with both node A and node B. In this experiment, these nodes are distributed across PlanetLab sites in the US. It is possible that the metric of interest for each communication (S→A and S→B) may be different. We present results in Table I that show not only that proxies can improve this communication (TCP b/w, UDP b/w, jitter, delay), but that in several cases, different proxies are needed. If one looks at S→A, we see different proxies are needed for different metrics (except E, which is best for both jitter and delay). The same is true for S→B. Furthermore, for the same metric (e.g. TCP or UDP b/w), S→A and S→B each require different proxies (C/D vs. F/G).
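The per-metric choice described above can be expressed as a small lookup over the Table I values. This is a sketch for illustration, not the selection mechanism used in the experiments: the data structure, the metric keys, and the function names are assumptions, and only the numbers come from Table I.

```python
# Values from Table I: (direct path, via best proxy, best proxy label).
TABLE_I = {
    "S->A": {
        "tcp_bw_kbps": (145, 172, "C"),
        "udp_bw_kbps": (67, 120, "D"),
        "jitter_ms": (10.9, 0.28, "E"),
        "delay_ms": (15, 0.089, "E"),
    },
    "S->B": {
        "tcp_bw_kbps": (33, 46, "F"),
        "udp_bw_kbps": (29, 75, "G"),
        "jitter_ms": (21.0, 0.28, "E"),
        "delay_ms": (33, 0.089, "E"),
    },
}

# Bandwidth should be maximized; jitter and delay should be minimized.
HIGHER_IS_BETTER = {"tcp_bw_kbps", "udp_bw_kbps"}


def choose_route(pair, metric):
    """Return the proxy label if it beats the direct path, else 'direct'."""
    direct, via_proxy, proxy = TABLE_I[pair][metric]
    if metric in HIGHER_IS_BETTER:
        better = via_proxy > direct
    else:
        better = via_proxy < direct
    return proxy if better else "direct"


def routing_plan():
    """Map every (pair, metric) in the table to its preferred route."""
    return {
        pair: {metric: choose_route(pair, metric) for metric in metrics}
        for pair, metrics in TABLE_I.items()
    }
```

Calling `routing_plan()` yields proxies C, D, E, and E for S→A and F, G, E, and E for S→B, so a workflow that cares about TCP bandwidth on one link and jitter on another would route through different proxies.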

V. RELATED WORK

Many cloud systems such as Amazon EC2/S3 and Google Apps provide virtual resources to third-party applications. There are many data clouds such as Google Earth and Sloan Digital Sky Survey [39] that serve useful data to end-users. These systems are optimized for data-intensive computing within a single cloud or data center; however, a challenge is transferring the underlying data and the results to or from these clouds.

A number of papers and scholarly articles, most notably [45], [27], advocate an intelligent network with computations on data streams in the network fabric itself. This scheme has been adapted for protocol conversion [45], caching [11], transcoding data [1], remote execution [29] and even routing optimization [34], [26]. Although our model is logically similar, we operate at the application layer of the protocol stack. We make no assumptions about the network, and treat it like a black box. We seek to address a different set of problems at the application layer of the protocol stack by exploiting the location and diversity of a proxy network comprised of volunteered resources. Other projects [5], [6] have also proposed in-network computing and caching. However, these works have focused on data query applications (as opposed to generalized workflows) and do not address the harder problem of optimized routing or the selection of proxies in general.

Volunteer edge computing and data sharing systems are best exemplified by Grid and peer-to-peer systems including Kazaa [37], BitTorrent [14], Globus [22], BOINC [4], and @home projects [33]. These systems provide the ability to tap into idle donated resources such as CPU capacity or aggregate network bandwidth, but they are not designed to exploit the characteristics of specific nodes on behalf of applications.

Table I. Benefit of different proxies for diverse metrics: this table shows that different proxies may optimize different metrics such as bandwidth, jitter, and delay, even for the same pair of end nodes.

Source-dest.   TCP b/w (KBps)          UDP b/w (KBps)          Jitter (ms)             Delay (ms)
pair           Direct  Via    Best     Direct  Via    Best     Direct  Via    Best     Direct  Via    Best
               path    proxy  proxy    path    proxy  proxy    path    proxy  proxy    path    proxy  proxy
S→A            145     172    C        67      120    D        10.9    0.28   E        15      0.089  E
S→B            33      46     F        29      75     G        21.0    0.28   E        33      0.089  E

Overlay networks can provide greater reliability and improve performance of the networking components. [2] details a scheme allowing distributed Internet applications to detect and recover from path outages, and re-route around failures. [20] supports this view and provides evidence of improved robustness and scalability of the network. [3], [30] improve client availability by using multi-homing and cooperative overlay networks to find and use a larger number of paths to the server.

Performance improvements using overlay networks are brought about by exploiting triangle inequality violations in networks and selecting better paths [32], [50], [38]. The results we observe in this paper are consistent with existing literature. Through our work, we demonstrate the usefulness of proxies in such a setting. [43] improves performance in grid environments by using store and cooperating forwarding techniques. They demonstrate how a "logical depot" can be used to accelerate network components of a distributed grid application. Similarly, we show how the network components of an HPC application (a Montage workflow) can be accelerated using volunteer donated resources. We differ from the existing implementations in the sense that our architecture is highly dynamic and comprised of donated resources.
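A triangle inequality violation occurs when the measured delay from a source to a destination exceeds the sum of the delays through some intermediate node. The sketch below shows one simple way to search a delay table for such one-hop detours. It is a generic illustration and not the method of any cited system: the function name is hypothetical, the delays in the usage note are invented, and it assumes relayed delay is the plain sum of the two legs, ignoring forwarding overhead at the relay.

```python
def best_one_hop_detours(delay_ms):
    """Find pairs whose best single-relay path beats the direct path.

    `delay_ms` maps (src, dst) tuples to a measured one-way delay.
    Returns {(src, dst): (relay, relayed_delay)} for each pair where
    delay(src, relay) + delay(relay, dst) < delay(src, dst).
    """
    nodes = sorted({node for pair in delay_ms for node in pair})
    detours = {}
    for (src, dst), direct in delay_ms.items():
        best = None
        for relay in nodes:
            if relay in (src, dst):
                continue
            first, second = (src, relay), (relay, dst)
            if first not in delay_ms or second not in delay_ms:
                continue  # no measurement for one of the two legs
            relayed = delay_ms[first] + delay_ms[second]
            if relayed < direct and (best is None or relayed < best[1]):
                best = (relay, relayed)
        if best is not None:
            detours[(src, dst)] = best
    return detours
```

For example, with hypothetical delays `{("S", "A"): 40, ("S", "P"): 10, ("P", "A"): 12, ("S", "Q"): 30, ("Q", "A"): 25}`, the function reports that S→A is faster through relay P (22 ms instead of 40 ms), while relay Q offers no benefit.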

Estimating network paths and forecasting future network conditions are addressed by [48]. We have used simple active probing techniques and network heuristics for prototyping and evaluation of network paths in our experiments. Existing tools [35], [18], [28] would give us a more accurate view of the network as a whole. Direct probing in a large network isn't scalable, and we advocate the use of passive or secondhand measurements [25]. [24] shows that it is possible to infer network conditions based on CDN redirections and [13] is an implementation of such a scheme. Such techniques can be easily used by end clients to identify proxies in close network proximity, without any point-to-point probing. We aim to integrate a number of these techniques into the proxy network.

Our approach is novel in that proxies may assume a diverse set of roles, unlike other systems in which network nodes either compute, route, serve data, or invoke services. Selecting multiple proxies and enabling them to assume different roles is key to the performance of distributed data-intensive workflow applications.

VI. CONCLUSION

In this paper, we demonstrated how a proxy network can benefit distributed data-intensive complex workflow applications by performing smart routing and in-network computations to boost performance. A novel aspect of our work is the inclusion of multiple proxies to accelerate different workflow stages optimizing different performance metrics. We show that the approach is effective for workflow applications and broadly applicable. The proxy network is a good fit for workflow applications running across geographically dispersed clouds.

To demonstrate the potential of proxies, we performed experiments on PlanetLab, exploiting its resource and network diversity. Using Montage as an application exemplar, we showed how input, output, and internal data transmission can be optimized, resulting in lower execution times as well as improved end-user metrics such as delay and energy usage. Our experiments also revealed that proxies can improve many aspects of network performance including latency, throughput, and jitter. Future work will focus on the development of automated proxy selection algorithms for workflow applications based on their assigned roles.

REFERENCES

[1] E. Amir, S. McCanne, and H. Zhang. An application level video gateway. In ACM Multimedia, pages 255–265, 1995.

[2] D. Andersen, H. Balakrishnan, F. Kaashoek, and R. Morris. Resilient overlay networks. In ACM SOSP, 2001.

[3] D. G. Andersen, H. Balakrishnan, M. F. Kaashoek, and R. N. Rao. Improving web availability for clients with monet. In Proc. 2nd Symposium on Networked Systems Design and Implementation (NSDI), 2005.

[4] D. P. Anderson. BOINC: A System for Public-Resource Computing and Storage. In Proceedings of the 5th ACM/IEEE International Workshop on Grid Computing, 2004.

[5] H. Andrade, T. M. Kurç, A. Sussman, and J. H. Saltz. Active proxy-g: optimizing the query execution process in the grid. In SC, pages 1–15, 2002.

[6] H. Andrade, T. M. Kurç, A. Sussman, and J. H. Saltz. Exploiting functional decomposition for efficient parallel processing of multiple data analysis queries. In IPDPS, page 81, 2003.

[7] A. Awan, R. Ferreira, S. Jagannathan, and A. Grama. Unstructured Peer-to-Peer Networks for Sharing Processor Cycles. Journal of Parallel Computing, 2005.


[8] P. Barham, B. Dragovic, K. Fraser, S. Hand, T. Harris, A. Ho, R. Neugebauer, I. Pratt, and A. Warfield. Xen and the Art of Virtualization. In Proceedings of the Nineteenth ACM Symposium on Operating Systems Principles, October 2003.

[9] A. Barker, J. B. Weissman, and J. van Hemert. Orchestrating Data-centric Workflows. In IEEE/ACM CCGrid International Symposium on Cluster Computing and the Grid, 2008.

[10] The Basic Local Alignment Search Tool (BLAST). http://www.ncbi.nlm.nih.gov/blast.

[11] C. M. Bowman, U. Manber, P. B. Danzig, M. F. Schwartz, D. R. Hardy, and D. P. Wessels. Harvest: A scalable, customizable discovery and access system. Technical report, Dept. of Computer Sc., University of Colorado, 1994.

[12] Y. Chawathe, S. Ratnasamy, L. Breslau, N. Lanham, and S. Shenker. Making Gnutella-like P2P Systems Scalable. In Proceedings of ACM SIGCOMM, Aug. 2003.

[13] D. Choffnes and F. Bustamante. On the effectiveness of measurement reuse for performance-based detouring. In INFOCOM 2009, IEEE, pages 693–701, April 2009.

[14] B. Cohen. Incentives build robustness in BitTorrent. In Proceedings of the First Workshop on the Economics of Peer-to-Peer Systems, June 2003.

[15] Condor project. http://www.cs.wisc.edu/condor/.

[16] E. Deelman, S. Callaghan, E. Field, H. Francoeur, R. Graves, N. Gupta, V. Gupta, T. H. Jordan, C. Kesselman, P. Maechling, J. Mehringer, G. Mehta, D. Okaya, K. Vahi, and L. Zhao. Managing Large-Scale Workflow Execution from Resource Provisioning to Provenance tracking: The CyberShake Example. In Proceedings of the Second IEEE International Conference on e-Science and Grid Computing (e-Science’06), 2006.

[17] E. Deelman, G. Singh, M.-H. Su, J. Blythe, Y. Gil, C. Kesselman, G. Mehta, K. Vahi, G. B. Berriman, J. Good, A. Laity, J. C. Jacob, and D. S. Katz. Pegasus: A Framework for Mapping Complex Scientific Workflows onto Distributed Systems. Scientific Programming Journal, 13(3):219–237, July 2005.

[18] A. Downey. Using pathchar to estimate internet link characteristics. In Proceedings of ACM SIGCOMM, pages 241–250, 1999.

[19] F. Ableson, C. Collins, and R. Sen. Unlocking Android: A Developer’s Guide. Manning, CT, USA, 2009.

[20] N. Feamster, H. Balakrishnan, J. Rexford, A. Shaikh, and J. V. D. Merwe. The case for separating routing from routers. In ACM SIGCOMM Workshop on Future Directions in Network Architecture. ACM Press, 2004.

[21] A Brief Introduction to the Flexible Image Transport System (FITS). http://fits.gsfc.nasa.gov/fits_intro.html.

[22] I. Foster, C. Kesselman, J. Nick, and S. Tuecke. The Physiology of the Grid: An Open Grid Services Architecture for Distributed Systems Integration. In Proceedings of the Global Grid Forum, June 2002.

[23] J. C. Jacob, D. S. Katz, T. Prince, B. G. Berriman, J. C. Good, A. C. Laity, E. Deelman, G. Singh, and M.-H. Su. The Montage Architecture for Grid-Enabled Science Processing of Large, Distributed Datasets. In Proceedings of the Earth Science Technology Conference, June 2004.

[24] A.-J. Su, D. R. Choffnes, A. Kuzmanovic, and F. E. Bustamante. Drafting behind akamai (travelocity-based detouring). In Proceedings of ACM SIGCOMM, pages 435–446, 2006.

[25] J. Kim, A. Chandra, and J. Weissman. OPEN: Passive Network Performance Estimation for Data-intensive Applications. Technical Report 08-041, Dept. of CSE, Univ. of Minnesota, 2008.

[26] L. Kleinrock. Nomadic computing. In Keynote Address: MOBICOM, 1995.

[27] T. Koponen, M. Chawla, B.-G. Chun, A. Ermolinskiy, K. H. Kim, S. Shenker, and I. Stoica. A data-oriented (and beyond) network architecture. ACM SIGCOMM Computer Communication Review, 37(4), 2007.

[28] K. Lai and M. Baker. Measuring link bandwidths using a deterministic model of packet delay. In Proceedings of ACM SIGCOMM, pages 283–294, 2000.

[29] M. T. Le, S. Seshan, F. Burghardt, and J. Rabaey. Software architecture of the infopad system. In Proceedings of the Mobidata Workshop on Mobile and Wireless Information Systems, 1994.

[30] Z. Li, P. Mohapatra, and C.-N. Chuah. Virtual multi-homing: On the feasibility of combining overlay routing with bgp routing. In Proc. of Networking 2005, pages 1348–1352, 2005.

[31] V. Lo, D. Zappala, D. Zhou, Y. Liu, and S. Zhao. Cluster Computing on the Fly: P2P Scheduling of Idle Cycles in the Internet. In Proceedings of the IEEE Fourth International Conference on Peer-to-Peer Systems, 2004.

[32] C. Lumezanu, Y. Baden, N. Spring, and B. Bhattacharjee. Triangle inequality and routing policy violations in the internet. In Proceedings of the 10th International Conference on Passive and Active Network Measurement, 2009.

[33] D. Molnar. The SETI@Home problem. ACM Crossroads, Sept. 2000.

[34] D. Mosberger and L. L. Peterson. Making paths explicit in the scout operating system. In USENIX OSDI, pages 153–167, 1996.

[35] A. Pasztor and D. Veitch. Active probing using packet quartets. In ACM SIGCOMM Internet Measurement Workshop, pages 293–305, 2002.

[36] S. Ramakrishnan, R. Reutiman, A. Chandra, and J. Weissman. Standing on the Shoulders of Others: Using Proxies to Opportunistically Boost Distributed Applications. Technical Report 10-012, Dept. of CSE, Univ. of Minnesota, Aug. 2010.

[37] S. Saroiu, K. P. Gummadi, R. J. Dunn, S. D. Gribble, and H. M. Levy. An Analysis of Internet Content Delivery Systems. In Proceedings of Symposium on Operating Systems Design and Implementation, Dec. 2002.

[38] S. Savage, T. Anderson, A. Aggarwal, D. Becker, N. Cardwell, A. Collins, E. Hoffman, J. Snell, A. Vahdat, G. Voelker, and J. Zahorjan. Detour: a case for informed internet routing and transport. IEEE Micro, 19:50–59, 1999.

[39] Sloan digital sky survey. http://www.sdss.org/.

[40] Squid proxy cache. http://www.squid-cache.org.

[41] I. Stoica, R. Morris, D. Karger, M. F. Kaashoek, and H. Balakrishnan. Pastry: Scalable, decentralized object location. In Proceedings of the ACM SIGCOMM, Aug 2001.

[42] D. Sulakhe, A. Rodriguez, M. Wilde, I. Foster, and N. Maltsev. Interoperability of GADU in Using Heterogeneous Grid Resources for Bioinformatics Applications. IEEE Transactions on Information Technology in Biomedicine, 12(2):241–246, Mar. 2008.

[43] M. Swany. Improving throughput for grid applications with network logistics. In SC ’04: Proceedings of the 2004 ACM/IEEE conference on Supercomputing, page 23. IEEE Computer Society, 2004.

[44] I. J. Taylor, E. Deelman, D. B. Gannon, and M. Shields, editors. Workflows for e-Science: Scientific Workflows for Grids. Springer-Verlag, Dec. 2006.

[45] D. L. Tennenhouse, J. M. Smith, W. D. Sincoskie, D. J. Wetherall, and G. J. Minden. A survey of active network research. IEEE Communications Magazine, 35:80–86, 1997.

[46] D. Turi, P. Missier, C. A. Goble, D. D. Roure, and T. Oinn. Taverna workflows: Syntax and semantics. In eScience, pages 441–448, 2007.

[47] J. Weissman and S. Ramakrishnan. Using proxies to accelerate cloud applications. In HotCloud’09: Proceedings of the 2009 conference on Hot topics in cloud computing, Berkeley, CA, USA, 2009. USENIX Association.

[48] R. Wolski, N. T. Spring, and J. Hayes. The network weather service: A distributed resource performance forecasting service for metacomputing. Journal of Future Generation Computing Systems, 15:757–768, 1999.

[49] B. Yee, D. Sehr, G. Dardyk, J. B. Chen, R. Muth, T. Ormandy, S. Okasaka, N. Narula, and N. Fullaga. Native client: a sandbox for portable, untrusted, x86 native code. In Proceedings of IEEE Security and Privacy, 2009.

[50] H. Zheng, E. K. Lua, M. Pias, and T. G. Griffin. Internet routing policies and round-trip-times. In PAM, 2005.