
An Optimal Offloading Partitioning Algorithm in Mobile Cloud Computing

Huaming Wu, Daniel Seidenstücker, Yi Sun, Carlos Martín Nieto, William Knottenbelt, and Katinka Wolter

Abstract—Mobile offloading migrates computation-intensive parts of applications from resource-constrained mobile devices onto remote resource-rich servers. Application partitioning plays a critical role in high-performance offloading systems: it splits the execution of an application between the mobile side and the cloud side so that the total execution cost is minimized. Through partitioning, the mobile device can benefit the most from offloading the application to a remote cloud. In this paper, we study how to effectively and dynamically partition a given application into local and remote parts while keeping the total cost as small as possible. For general tasks (i.e., arbitrary topological consumption graphs), we propose a new min-cost offloading partitioning (MCOP) algorithm that aims at finding the optimal application partitioning (determining which portions of the application to run on mobile devices and which portions on cloud servers) under different partitioning cost models and mobile environments. The simulation results show that the proposed algorithm has a stably low time complexity and can significantly reduce execution time and energy consumption by optimally distributing tasks between mobile devices and cloud servers, while adapting well to environmental changes.

Index Terms—Mobile device, mobile cloud computing, communication networks, offloading, cost graph, partitioning algorithm.


1 INTRODUCTION

Along with the maturity of mobile cloud computing, mobile cloud offloading is becoming a promising method to reduce execution time and prolong the battery life of mobile devices. Its main idea is to augment execution by migrating heavy computation from mobile devices to resourceful cloud servers and then receiving the results from them via wireless networks. Offloading is an effective way to overcome the resource and functionality constraints of mobile devices, since it releases them from intensive processing and increases the performance of mobile applications.

Offloading all computation components of an application to the remote cloud is not always necessary or effective. Especially for complex applications that can be divided into a set of interdependent parts, a mobile device should judiciously determine whether to offload computation and which portion of the application should be offloaded to the cloud. Offloading decisions must be made for all the parts, and the decision made for one part depends on the other parts. As mobile computing increasingly interacts with the cloud, a number of approaches have been proposed, e.g., MAUI [1] and CloneCloud [2], aiming at offloading some parts of the mobile application execution to the cloud. To achieve good performance, they particularly focus on the application partitioning problem, i.e., deciding which parts of an application should be offloaded to powerful servers in a remote cloud and which parts should be executed locally on mobile devices such that the total execution cost is minimized. Therefore, partitioning algorithms play a critical role in a high-performance offloading system, and their main goal is to keep the whole cost as small as possible.

• H. Wu, D. Seidenstücker, Y. Sun, C. M. Nieto and K. Wolter are with the Institut für Informatik, Freie Universität Berlin, Germany, 14195. Email: {huaming.wu, seided, yi.sun, carlosmn, katinka.wolter}@fu-berlin.de.

• W. Knottenbelt is with the Department of Computing, Imperial College London, UK. Email: [email protected].

The main costs for mobile offloading systems are the computational cost for local and remote execution, respectively, and the communication cost due to the extra communication between the mobile device and the remote cloud. Calculations can naturally be described as a graph in which vertices represent computational costs and edges reflect communication costs [3]. By partitioning the vertices of a graph, the calculation can be divided among the processors of local mobile devices and remote cloud servers. Traditional graph partitioning algorithms (e.g., [4], [5], [6] and [7]) cannot be applied directly to mobile offloading systems, because they only consider the weights on the edges of the graph, neglecting the weight of each node. Our research is situated in the context of resource-constrained mobile devices, in which there are often multi-objective partitioning cost functions, such as minimizing the total response time or energy consumption on mobile devices by offloading partial workloads to a cloud server.

In this paper, we explore how to deploy such an offloadable application in a more optimal way, by dynamically and automatically determining which parts of the application should be computed on the cloud server and which parts should be left on the mobile device to achieve a particular performance target (low latency, minimization of energy consumption, low response time, etc.) [8]. We study how to decompose the application into modules, distribute them between mobile devices and the cloud server, and effectively utilize the cloud resources. Whether or not to offload certain parts of an application to the cloud depends on the following factors: the CPU speed of the mobile device, the network bandwidth, the transmission data size, and the speed of the cloud server [9]. Considering these factors, we construct a weighted consumption graph (WCG) according to the estimated computational and communication costs, and further derive a new min-cost offloading partitioning (MCOP) algorithm designed especially for mobile offloading systems. The MCOP algorithm aims at finding the optimal cut that minimizes a given objective function (response time, energy consumption, or the weighted sum of time and energy) and can be applied to WCGs of arbitrary topology.

arXiv:1510.07986v1 [cs.DC] 27 Oct 2015

The remainder of this paper is organized as follows. We review related work in Section 2. Section 3 explores the partitioning challenges and process. Section 4 introduces the partitioning models, namely the topology, optimization, and partitioning cost models. An optimal partitioning algorithm for arbitrary topology is proposed and investigated in Section 5. Section 6 describes three different profilers that are used for collecting information. Section 7 gives evaluation and simulation results. Finally, the paper is summarized in Section 8.

2 RELATED WORK

Offloading becomes an attractive solution for meeting response time requirements on mobile systems as applications become increasingly complex [10]. Extending battery lifetime is also one of the most crucial design objectives of mobile devices because they are usually equipped with limited battery capacity. Many research efforts have been devoted to offloading computation to remote servers in order to shorten execution time or save energy.

Karthik et al. argued that offloading could potentially save energy and reduce execution time for mobile users, but not all applications are energy-efficient and time-saving when they are migrated to the cloud. It depends on whether the computational cost saved due to offloading outweighs the extra communication cost. A large amount of communication combined with a small amount of computation should preferably be performed locally on the mobile device, while a small amount of communication with a large amount of computation should preferably be executed remotely.

The partitioning algorithm introduced in [11] aims at reducing the response time of tasks on mobile devices. It finds the offloading and integrating points on a sequence of calls by depth-first search and a linear-time searching scheme, and can achieve low user-perceived latency while largely reducing the partitioning computation on the cloud. The offloading inference engine proposed in [12] can adaptively make decisions at runtime, dynamically partition an application, and offload part of the application execution to a powerful nearby surrogate. Some application partitioning solutions [13], [14], [15] depend heavily on programmers and middleware to partition the applications, which limits their use.

Partitioning technologies were adopted to identify offloaded parts for energy saving [1], [16], [17]. The energy cost of each function of the application was profiled. According to the profiling result, a cost graph was constructed, in which each node represented a function to be performed and each edge indicated the data to be transmitted. Finally, the server parts were executed on remote servers to reduce energy consumption. CloneCloud [2] used a combination of static analysis and dynamic profiling to partition applications automatically at a fine granularity while optimizing execution time and energy usage for a target computation and communication environment. However, this approach only considers limited input/environmental conditions in the offline pre-processing and needs to be bootstrapped for every new application built.

This work was motivated by the above works to investigate the partitioning problem in a mobile cloud computing environment, aiming at different objectives: minimum response time, minimum energy consumption, and the minimum of the weighted sum of time and energy. We explicitly consider the mobile nature of both user and application behaviors, and address how dynamic partitioning can handle these heterogeneity problems by taking the bandwidth as a variable. Thus, we greatly extend prior work [2] by considering dynamic partitioning of applications between weak devices and clouds, in order to better support applications running on diverse devices in different environments.

3 PARTITIONING PROBLEMS

3.1 Challenges

Application partitioning is very important for designing an adaptive, cost-effective, and efficient offloading system. Some critical issues concerning the partitioning problem include:

• Weighting: when choosing an application task to offload, we need to scale the weights of each application task regarding its resource utilization, such as memory, processing time, and bandwidth utilization [18]. The weights can vary for different mobile devices and in different running environments. Communication overhead is introduced by the remote communication between a mobile device and a cloud server.

• Real-Time Adaptability: since available network bandwidths vary in wireless environments, static partitioning algorithms proposed by previous works with a fixed bandwidth assumption are unsuitable for mobile platforms [19]. The partitioning algorithms should be adaptive to network and device changes. For example, an optimal partition for a high-bandwidth, low-latency network and a low-capacity client might not be a good partition for a high-capacity client with a bad network connection. Since the network condition is only measurable at run time, the partitioning algorithm should be a real-time online process [11].

• Partitioning Efficiency: making partitioning decisions for simple applications (e.g., an alarm clock) in real time is not difficult, but for complex applications (e.g., speech/face recognition) that contain a large number of methods [11], a highly efficient algorithm is required to perform real-time partitioning.

3.2 Application Partitioning Process

To address the above challenges, the workflow of an environment-adaptive application partitioning process is proposed in Fig. 1.

Fig. 1. Flowchart of an application partitioning process

It starts with profiling an application that can be split into multiple tasks, through static analysis and dynamic profiling technology [20]. We then construct a WCG of the mobile application as shown in Fig. 3(b). Based on the partitioning cost models, an elastic partitioning algorithm is applied to partition the application properly. Calling this algorithm yields preliminary partitioning results for response time or energy optimization. During the execution of the application, if the mobile environment changes and these changes meet or exceed a certain threshold, the application graph is re-partitioned according to the new parameters. The process thereby realizes condition-aware and environment-adaptive elastic partitioning. Here, the mobile environment includes the computing resources inside the device (battery level, CPU, memory, etc.) as well as the external environment, such as the network connection and the cloud's speed. After partitioning, the tasks that require remote execution are automatically offloaded to a cloud server and the rest are performed locally on the mobile device according to the partitioning results.

Therefore, whether or not to offload certain parts of an application to the cloud depends on the following factors: the CPU speed of the mobile device, the network bandwidth, the transmission data size, and the speed of the cloud server [9]. Considering these factors, we construct a WCG according to the estimated computational and communication costs, and further derive a new partitioning algorithm designed especially for mobile offloading systems.
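The re-partitioning trigger in this workflow can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `needs_repartition`, the monitored parameters, the relative-change criterion, and the 20% default threshold are all hypothetical choices. The text only states that re-partitioning happens when environment changes meet or exceed a certain threshold.

```python
def needs_repartition(old: dict, new: dict, threshold: float = 0.2) -> bool:
    """Return True if any monitored parameter changed by at least `threshold`.

    The change is measured relative to the old value (an assumption of this
    sketch); parameters whose old value is zero are compared absolutely.
    """
    for key, old_value in old.items():
        change = abs(new[key] - old_value)
        if old_value != 0:
            change /= abs(old_value)
        if change >= threshold:
            return True
    return False


# Hypothetical environment snapshots: bandwidth in MB/s, cloud speedup factor.
baseline = {"bandwidth": 2.0, "speedup": 3.0}
small_drift = {"bandwidth": 2.2, "speedup": 3.0}  # about 10% change
big_drop = {"bandwidth": 1.0, "speedup": 3.0}     # 50% change

keep_partition = not needs_repartition(baseline, small_drift)
redo_partition = needs_repartition(baseline, big_drop)
print(keep_partition, redo_partition)
```

In such a design the threshold trades responsiveness against the overhead of re-running the partitioning algorithm too often.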

3.3 Application Task Classification

Different applications emerge in a mobile device according to some process, and each consists of several tasks. Since not all application tasks are suitable for remote execution, they need to be weighed and distinguished as:

• Unoffloadable Tasks: some tasks must unconditionally be executed locally on the mobile device, either because transferring the relevant information would take tremendous time and energy or because these tasks must access local components (camera, GPS, user interfaces, accelerometer or other sensors, etc.) [1]. Tasks that might cause security issues when executed elsewhere (such as e-commerce) should not be offloaded either. Local processing consumes the battery power of the mobile device, but incurs no communication costs or delays.

• Offloadable Tasks: some application components are flexible tasks that can be processed either locally on the processor of the mobile device or remotely in a cloud infrastructure. Many tasks fall into this category, and the offloading decision depends on whether the communication costs outweigh the difference between local and remote costs [10].

We do not need to take offloading decisions for unoffloadable components. For offloadable ones, however, offloading all tasks of an application to the remote cloud is not necessary or effective under all circumstances, so it is worth considering what should be executed locally on the mobile device and what should be offloaded onto the remote cloud, based on available networks, response time, or energy consumption. The mobile device has to take an offloading decision based on the result of a dynamic optimization problem.
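The rule for offloadable tasks can be illustrated for a single task considered in isolation. The sketch below is only an illustration: the function name `should_offload` and the numbers are hypothetical, and the costs are in arbitrary units.

```python
def should_offload(local_cost: float, cloud_cost: float, comm_cost: float) -> bool:
    """Return True when offloading a single, isolated task pays off.

    Offloading is worthwhile only if the saving (local_cost - cloud_cost)
    exceeds the extra communication cost.
    """
    return (local_cost - cloud_cost) > comm_cost


# Heavy computation, little data to transfer: offloading pays off.
heavy_compute = should_offload(local_cost=9.0, cloud_cost=3.0, comm_cost=2.0)
# Light computation, much data to transfer: better kept local.
heavy_data = should_offload(local_cost=3.0, cloud_cost=1.0, comm_cost=5.0)
print(heavy_compute, heavy_data)
```

For interdependent tasks this per-task rule is not sufficient, because the communication cost of a task depends on where its neighbors run. This is why the decision is posed as a graph partitioning problem.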

4 PARTITIONING MODELS

In this section, we describe which assumptions are made, how WCGs for different types of applications are constructed, and how the optimization problem is defined.

4.1 Different Topologies

Applications can be partitioned at a flexible granularity and are not limited to a specific form. Previous works consider application partitioning at different levels of granularity: classes [21], objects [20], methods [1], components [7], [22], and threads [2]. Without loss of generality, we refer to application tasks in this paper. Application developers can choose the appropriate partitioning granularity according to the application.

Construction of WCGs is critical for application partitioning. A mobile application can be represented as a list of fine-grained tasks, forming different topologies as depicted in Fig. 2, where each node reflects an application task that is either executed on the mobile device or offloaded onto the cloud side for further execution.

Fig. 2. Task-flow graphs for different topologies: (a) One; (b) Linear; (c) Loop; (d) Tree; (e) Mesh.

(a) Only one active node: representing an entire application (without partitioning). Such a topology is often adopted by previous full offloading schemes such as [2], [23], [24], [25], and can also be viewed as an example of software as a service. In this case, the whole application is migrated to a remote server, involving a complete transfer of code and program state to the server [26]. The main drawbacks of this solution are inflexibility and coarse granularity.

(b) Linear topology: representing a sequential list of fine-grained tasks [11]. Each task is sequentially executed, with the output data generated by one task serving as the input of the next one [27].

(c) Loop-based topology: a loop-based application is one in which most of the functionality is given by iterating an execution loop, such as online social applications; we model their processing with a graph that consists of a cycle [28].

(d) Tree-based topology: representing a tree-based hierarchy of tasks [26]. The node at the top of the tree is the application entry node (i.e., the main module).

(e) Mesh-based topology: representing a lattice-based topology of tasks, e.g., a Java example of face recognition as depicted in [20].

Compared with the scheme that offloads the whole application into the cloud (i.e., Fig. 2(a)), an application partitioning scheme achieves a fine granularity for computation offloading by partitioning a topological consumption graph (CG) between local and remote execution. Different partitions can lead to different costs, and the total cost incurred due to offloading depends on multiple factors, such as device platforms, networks, clouds, and workloads. Therefore, the application may have different optimal partitions for different mobile environments and workloads.

4.2 Construction of Weighted Consumption Graphs

There are two types of costs in offloading systems: one is the computational cost of running the application tasks locally or remotely (including memory cost, processing time cost, and so on), and the other is the communication cost of the application tasks' interaction (associated with the movement of data and requisite messages). Even the same task can have a different cost on the mobile device and on the cloud in terms of execution time and energy consumption. As cloud servers, with their powerful configuration, usually execute much faster than mobile devices, offloading part of the computation to remote servers can save energy and improve performance [29]. However, when vertices are assigned to different sides, the interaction between them leads to extra communication cost. Therefore, we try to find the optimal assignment of vertices for graph partitioning and computation offloading by trading off the computational costs against the communication costs.

Call graphs are widely used to describe data dependencies within a computation, where each vertex represents a task and each edge represents the calling relationship from the caller to the callee. Figure 3(a) shows a CG example consisting of six tasks [13]. The computational costs are represented by vertices, while the communication costs are expressed by edges. We denote the dependency of an application's tasks and their corresponding costs as a directed acyclic graph $G = (V, E)$, where the set of vertices $V = (v_1, v_2, \cdots, v_N)$ denotes $N$ application tasks and an edge $e(v_i, v_j) \in E$ represents the frequency of invocation and data access between nodes $v_i$ and $v_j$, where vertices $v_i$ and $v_j$ are neighbors. Each task $v_i$ is characterized by five parameters:

• type: offloadable or unoffloadable task,

• $m_i$: the memory consumption of $v_i$ on a mobile device platform,

• $c_i$: the size of the compiled code of $v_i$,

• $in_{ij}$: the data size of input from $v_i$ to $v_j$,

• $out_{ji}$: the data size of output from $v_j$ to $v_i$.
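The five task parameters map naturally onto a small data structure. The following is a minimal sketch: the class names `Task` and `Call`, the field names, and the three-task application are hypothetical and serve only to show one possible encoding of a consumption graph.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    """A vertex of the consumption graph."""
    name: str
    offloadable: bool   # task type: offloadable or unoffloadable
    memory: float       # m_i, memory consumption on the mobile platform
    code_size: float    # c_i, size of the compiled code


@dataclass(frozen=True)
class Call:
    """An edge from caller v_i to callee v_j."""
    caller: str
    callee: str
    in_size: float      # in_ij, input data sent from v_i to v_j
    out_size: float     # out_ji, output data returned from v_j to v_i


# Hypothetical three-task application; all sizes are in KB.
tasks = [
    Task("ui", offloadable=False, memory=64.0, code_size=12.0),
    Task("detect", offloadable=True, memory=256.0, code_size=40.0),
    Task("classify", offloadable=True, memory=512.0, code_size=85.0),
]
calls = [
    Call("ui", "detect", in_size=300.0, out_size=4.0),
    Call("detect", "classify", in_size=20.0, out_size=1.0),
]

offloadable_names = [t.name for t in tasks if t.offloadable]
print(offloadable_names)
```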

We further construct a WCG as depicted in Fig. 3(b). Each vertex $v \in V$ is annotated with two cost weights via a 2-tuple $w(v) = \langle w_{local}(v), w_{cloud}(v) \rangle$, where $w_{local}(v)$ and $w_{cloud}(v)$ represent the computational cost of executing the task $v$ locally on the mobile device and remotely on the cloud, respectively. The $j$th vertex weight vector is the $j$th tuple. Each vertex is assigned one of the two values in its tuple, depending on which side of the partition it finally ends up in, i.e., on the label it is assigned [30]. The edge set $E \subset V \times V$ represents the communication cost amongst tasks. The weight of an edge $w(e(v_i, v_j))$ is denoted as:

$$w(e(v_i, v_j)) = \frac{in_{ij}}{B_{upload}} + \frac{out_{ji}}{B_{download}}, \tag{1}$$

which is the communication cost of transferring the input and return states when the tasks $v_i$ and $v_j$ are executed on different sides. It depends closely on the network bandwidths (upload bandwidth $B_{upload}$ and download bandwidth $B_{download}$) and on the transferred data.
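As a small worked example of Eq. (1), the sketch below computes an edge weight for two bandwidth settings. The function name `edge_weight` and all numeric values are hypothetical; with sizes in MB and bandwidths in MB/s the weight is a time in seconds.

```python
def edge_weight(in_size: float, out_size: float,
                b_upload: float, b_download: float) -> float:
    """Communication cost of one edge: upload the input, download the result."""
    if b_upload <= 0 or b_download <= 0:
        raise ValueError("bandwidths must be positive")
    return in_size / b_upload + out_size / b_download


# Same data sizes under a slow and a fast network (hypothetical values).
w_slow = edge_weight(in_size=4.0, out_size=1.0, b_upload=1.0, b_download=2.0)
w_fast = edge_weight(in_size=4.0, out_size=1.0, b_upload=8.0, b_download=8.0)
print(w_slow, w_fast)
```

The same edge is far cheaper to cut under the fast network, so the edge weights of the WCG have to be recomputed whenever the measured bandwidths change.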

A candidate offloading decision is described by one cut in the WCG, which separates the vertices into two disjoint sets, one representing tasks that are executed on the mobile device and the other one representing tasks that are offloaded to the remote server [31]. Hence, taking the optimal offloading decision is equivalent to partitioning the WCG such that an objective function is minimized [32].

The red dotted line in Fig. 3(b) is one possible partitioning cut, indicating the partitioning of the computational workload in the application between the mobile device and the cloud. $V_l$ and $V_c$ are sets of vertices, where $V_l$ is the local set in which tasks are executed locally and $V_c$ is the cloud set in which tasks are directly offloaded to the cloud. We have $V_l \cap V_c = \emptyset$ and $V_l \cup V_c = V$. Further, $E_{cut}$ is the set of edges along which the graph is cut into two parts.

4.3 Cost Models

Mobile application partitioning aims at finding the optimal partitioning solution that leads to the minimum execution cost, in order to make the best tradeoff between time/energy savings and transmission costs/delay.

The optimal partitioning decision depends on user requirements/expectations, device information, network bandwidth, and the application itself. Device information includes the execution speed of the device and the workloads on it when the application is launched. If the device computes very slowly and the aim is to reduce execution time, it is better to offload more computation to the cloud [33]. Network bandwidth affects data transmission for remote execution. If the bandwidth is very high, the cost in terms of data transmission will be low. In this case, it is better to offload more computation to the cloud.

Fig. 3. Construction of CG and WCG: (a) CG; (b) WCG.

The partitioning decision is made based on the cost estimation (computational and communication costs) before the program execution. On the basis of Fig. 3(b), we can formulate the partitioning problem as:

$$C_{total} = \sum_{v \in V} I_v \cdot w_{local}(v) + \sum_{v \in V} (1 - I_v) \cdot w_{cloud}(v) + \sum_{e(v_i, v_j) \in E} I_e \cdot w(e(v_i, v_j)), \tag{2}$$

where the total cost is the sum of the computational costs (local and remote) and the communication costs of the edges affected by the cut.

The cloud server node and the mobile device node must belong to different partitions. One possible solution of this partitioning problem gives an arbitrary tuple of partitions of the vertex set, $\langle V_l, V_c \rangle$, and the cut of the edge set, $E_{cut}$, in the following way:

$$I_v = \begin{cases} 1, & \text{if } v \in V_l \\ 0, & \text{if } v \in V_c \end{cases} \quad \text{and} \quad I_e = \begin{cases} 1, & \text{if } e \in E_{cut} \\ 0, & \text{if } e \notin E_{cut} \end{cases}. \tag{3}$$

We seek to find an optimal cut in the WCG such that some application tasks are executed on the mobile side and the remaining ones on the cloud side. The optimal cut maximizes or minimizes an objective function while satisfying the mobile device's resource constraints. The objective function expresses the general goal of a partition, for instance, minimizing the energy consumption, minimizing the amount of exchanged data, or completing the execution in less than a predefined time. We only perform the partitioning when it is beneficial. Not all applications can benefit from partitioning because of application-specific properties. The cost of running each application task on the mobile device and on the cloud server therefore needs to be estimated. Offloading makes sense only if the speedup of the cloud server outweighs the extra communication costs.

The communication time and energy costs for the mobile device vary according to the amount of data to be transmitted and the wireless network conditions. According to (2), the dynamic execution configuration of an elastic application can be decided based on different saving objectives with respect to response time and energy consumption. A task's offloading goals may change due to a change in environmental conditions.
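To make the objective concrete, the sketch below evaluates the total cost of Eq. (2) for a candidate partition and finds the optimal cut of a tiny graph by exhaustive enumeration. This brute-force search is not the MCOP algorithm; it takes exponential time and is only meant as a reference for very small graphs. The function names and the four-vertex WCG are hypothetical.

```python
from itertools import combinations


def total_cost(w_local, w_cloud, w_edge, local_set):
    """Evaluate the total cost of Eq. (2) for one candidate partition.

    I_v = 1 for vertices in local_set; an edge is cut (I_e = 1) when its
    endpoints lie on different sides.
    """
    computation = sum(
        w_local[v] if v in local_set else w_cloud[v] for v in w_local
    )
    communication = sum(
        w for (u, v), w in w_edge.items()
        if (u in local_set) != (v in local_set)
    )
    return computation + communication


def brute_force_partition(w_local, w_cloud, w_edge, unoffloadable):
    """Try every placement of the offloadable vertices (exponential time)."""
    offloadable = [v for v in w_local if v not in unoffloadable]
    best_cost, best_local = float("inf"), None
    for k in range(len(offloadable) + 1):
        for kept_local in combinations(offloadable, k):
            local_set = set(unoffloadable) | set(kept_local)
            cost = total_cost(w_local, w_cloud, w_edge, local_set)
            if cost < best_cost:
                best_cost, best_local = cost, local_set
    return best_cost, best_local


# Hypothetical WCG: vertex 1 is unoffloadable, costs in arbitrary units.
w_local = {1: 0, 2: 3, 3: 6, 4: 9}
w_cloud = {1: 0, 2: 1, 3: 2, 4: 3}
w_edge = {(1, 2): 10, (1, 3): 2, (2, 4): 1, (3, 4): 1}

all_local = total_cost(w_local, w_cloud, w_edge, {1, 2, 3, 4})
best_cost, best_local = brute_force_partition(w_local, w_cloud, w_edge, {1})
print(all_local, best_cost, sorted(best_local))
```

For this hypothetical graph the search keeps vertices 1 and 2 local at a cost of 11, compared with 18 for fully local execution: the expensive edge between vertices 1 and 2 makes it cheaper to keep vertex 2 on the device even though the cloud would compute it faster.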

4.3.1 Minimum Response Time

The communication cost depends on the size of the data transfer and the network bandwidth, while the computational cost is determined by the computation time. If the minimum response time is selected as the objective function, we can calculate the total time spent due to offloading as:

$$T_{total}(I) = \sum_{v \in V} I_v \cdot T_v^l + \sum_{v \in V} (1 - I_v) \cdot T_v^c + \sum_{e \in E} I_e \cdot T_e^{tr}, \tag{4}$$

where $T_v^l = F \cdot T_v^c$ is the computing time of task $v$ on the mobile device when it is executed locally; $F$ is the speedup factor, i.e., the ratio of the cloud server's execution speed to that of the mobile device (since the computation capacity of the cloud infrastructure is stronger than that of the mobile device, we have $F > 1$); $T_v^c$ is the computing time of task $v$ on the cloud server once it is offloaded; $T_e^{tr} = D_e^{tr}/B$ is the communication time between the mobile device and the cloud; $D_e^{tr}$ is the amount of data that is transmitted and received; and $B$ is the current wireless bandwidth.

In this scenario, the offloading decision engine selects the best partitioning candidate that minimizes the total response time. The aim of this cost model is to find the optimal application partitioning $I_{min} = \{I_v, I_e \mid I_v, I_e \in \{0, 1\}\}$, which satisfies $I_{min} = \arg\min_I T_{total}(I)$.

The saved response time in the partitioning scheme compared to the scheme without offloading is calculated as:

$$T_{save}(I) = \frac{T_{local} - T_{total}(I)}{T_{local}} \cdot 100\%, \tag{5}$$

where $T_{local} = \sum_{v \in V} T_v^l$ is the local time cost when all the application tasks are executed locally on the mobile device.

Besides, for a given application and a mobile device, the optimal partitioning results also change with the wireless network bandwidth and the speedup factor of the cloud server.
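The response-time model of Eqs. (4) and (5) can be sketched in a few lines. The function names and the three-task example are hypothetical; times are in seconds, data sizes in MB, and the bandwidth in MB/s.

```python
def total_time(t_cloud, data, local_set, speedup, bandwidth):
    """Total response time of Eq. (4) for one candidate partition.

    t_cloud[v] is T_v^c; the local time is T_v^l = speedup * T_v^c.
    data[(u, v)] is D_e^tr, so the transfer time is D_e^tr / bandwidth.
    """
    compute = sum(
        speedup * t if v in local_set else t for v, t in t_cloud.items()
    )
    transfer = sum(
        d / bandwidth for (u, v), d in data.items()
        if (u in local_set) != (v in local_set)
    )
    return compute + transfer


def saved_time_percent(t_cloud, data, local_set, speedup, bandwidth):
    """Saved response time of Eq. (5), in percent of the all-local time."""
    t_local = sum(speedup * t for t in t_cloud.values())
    t_total = total_time(t_cloud, data, local_set, speedup, bandwidth)
    return (t_local - t_total) / t_local * 100.0


# Hypothetical linear application 1 -> 2 -> 3; only task 1 stays local.
t_cloud = {1: 1.0, 2: 2.0, 3: 3.0}
data = {(1, 2): 4.0, (2, 3): 2.0}
local_set = {1}

fast = total_time(t_cloud, data, local_set, speedup=3.0, bandwidth=2.0)
slow = total_time(t_cloud, data, local_set, speedup=3.0, bandwidth=0.25)
saved = saved_time_percent(t_cloud, data, local_set, speedup=3.0, bandwidth=2.0)
print(fast, slow, round(saved, 2))
```

With the lower bandwidth, the same partition takes longer than running everything locally (24 s against 18 s), which illustrates why the optimal partition depends on the current bandwidth.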

4.3.2 Minimum Energy Consumption

Similarly, if the minimum energy consumption is chosen as the objective function, we can calculate the total energy consumed by the mobile device due to offloading as:

$$E_{total}(I) = \sum_{v \in V} I_v \cdot E_v^l + \sum_{v \in V} (1 - I_v) \cdot E_v^i + \sum_{e \in E} I_e \cdot E_e^{tr}, \tag{6}$$

where $E_v^l = P_m \cdot T_v^l$ is the energy consumed by task $v$ on the mobile device when it is executed locally; $E_v^i = P_i \cdot T_v^c$ is the energy consumed on the mobile device while task $v$ is offloaded to the cloud; and $E_e^{tr} = P_{tr} \cdot T_e^{tr}$ is the energy spent on the communication between the mobile device and the cloud. $P_m$, $P_i$ and $P_{tr}$ are the powers of the mobile device for computing, while being idle, and for sending or receiving data, respectively.

In this scenario, the offloading decision engine selects the best partitioning plan that minimizes the partitioning cost in terms of energy. The aim is to find the optimal application partitioning $I_{min} = \{I_v, I_e \mid I_v, I_e \in \{0, 1\}\}$, which satisfies $I_{min} = \arg\min_I E_{total}(I)$.

The saved energy compared to the scheme without offloading is:

$$E_{save}(I) = \frac{E_{local} - E_{total}(I)}{E_{local}} \cdot 100\%, \tag{7}$$

where $E_{local} = \sum_{v \in V} E_v^l$ is the local energy cost when all the application tasks are executed on the mobile device.
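The energy model of Eqs. (6) and (7) follows the same pattern as the time model. In the sketch below, the function names, the task times, and the power values are hypothetical; times are in seconds and powers in watts, so energies are in joules.

```python
def total_energy(t_local, t_cloud, t_transfer, local_set, p_m, p_i, p_tr):
    """Energy spent by the mobile device, Eq. (6), for one partition."""
    # Local tasks: the device computes at power P_m for T_v^l.
    computing = sum(p_m * t_local[v] for v in t_local if v in local_set)
    # Offloaded tasks: the device idles at power P_i for T_v^c.
    idling = sum(p_i * t_cloud[v] for v in t_cloud if v not in local_set)
    # Cut edges: the device transmits at power P_tr for T_e^tr.
    transferring = sum(
        p_tr * t for (u, v), t in t_transfer.items()
        if (u in local_set) != (v in local_set)
    )
    return computing + idling + transferring


def saved_energy_percent(e_total, e_local):
    """Saved energy of Eq. (7), in percent of the all-local energy."""
    return (e_local - e_total) / e_local * 100.0


# Hypothetical linear application 1 -> 2 -> 3.
t_local = {1: 3.0, 2: 6.0, 3: 9.0}
t_cloud = {1: 1.0, 2: 2.0, 3: 3.0}
t_transfer = {(1, 2): 2.0, (2, 3): 1.0}
p_m, p_i, p_tr = 1.0, 0.25, 1.5

e_local = total_energy(t_local, t_cloud, t_transfer, {1, 2, 3}, p_m, p_i, p_tr)
e_total = total_energy(t_local, t_cloud, t_transfer, {1}, p_m, p_i, p_tr)
e_saved = saved_energy_percent(e_total, e_local)
print(e_local, e_total, round(e_saved, 2))
```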

4.3.3 Minimum of the Weighted Sum of Time and Energy

If we combine both the response time and the energy consumption, we can design the cost model for partitioning as follows:

$$W_{total}(I) = \omega \cdot \frac{T_{total}(I)}{T_{local}} + (1 - \omega) \cdot \frac{E_{total}(I)}{E_{local}}, \tag{8}$$

where $0 \le \omega \le 1$ is a weighting parameter that indicates the relative importance of response time and energy consumption. A large $\omega$ favors response time, while a small $\omega$ favors energy consumption. In some special cases, performance can be traded for power consumption and vice versa [34]; the parameter $\omega$ can therefore express such preferences for different applications. $T_{total}(I)$ and $E_{total}(I)$ are the response time and energy consumption under the partitioning solution $I$, respectively. To eliminate the impact of the different scales of time and energy, they are divided by the local costs. If $T_{total}(I)/T_{local}$ is less than 1, the partitioning improves the application's performance, since the response time is shorter than for fully local execution. Similarly, if $E_{total}(I)/E_{local}$ is less than 1, the partitioning reduces the application's energy consumption.

In this scenario, the offloading decision engine selects the best partitioning plan that minimizes the weighted sum of time and energy. The aim is to find the optimal application partitioning $I_{min} = \{I_v, I_e \mid I_v, I_e \in \{0, 1\}\}$, which satisfies $I_{min} = \arg\min_I W_{total}(I)$.

The saved weighted sum of time and energy in the partitioning scheme compared to the scheme without offloading is calculated as:

$$W_{save}(I) = \left[ \omega \cdot \frac{T_{local} - T_{total}(I)}{T_{local}} + (1 - \omega) \cdot \frac{E_{local} - E_{total}(I)}{E_{local}} \right] \cdot 100\%. \tag{9}$$
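The weighted model of Eqs. (8) and (9) can be sketched as follows. The function names and the numbers are hypothetical: a partition that needs 10 s and 7.25 J is compared against 18 s and 18 J for fully local execution.

```python
def weighted_cost(t_total, t_local, e_total, e_local, omega):
    """Normalized weighted cost of Eq. (8)."""
    if not 0.0 <= omega <= 1.0:
        raise ValueError("omega must lie in [0, 1]")
    return omega * t_total / t_local + (1.0 - omega) * e_total / e_local


def weighted_saving_percent(t_total, t_local, e_total, e_local, omega):
    """Weighted saving of Eq. (9), in percent."""
    time_part = omega * (t_local - t_total) / t_local
    energy_part = (1.0 - omega) * (e_local - e_total) / e_local
    return (time_part + energy_part) * 100.0


results = {}
for omega in (0.0, 0.5, 1.0):
    cost = weighted_cost(10.0, 18.0, 7.25, 18.0, omega)
    saving = weighted_saving_percent(10.0, 18.0, 7.25, 18.0, omega)
    results[omega] = (cost, saving)
    print(omega, round(cost, 4), round(saving, 2))
```

Because the two weights sum to one, the two quantities are related by $W_{save}(I) = (1 - W_{total}(I)) \cdot 100\%$, so minimizing the weighted cost is the same as maximizing the weighted saving.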

5 PARTITIONING ALGORITHM FOR OFFLOADING

In this section, we introduce the min-cost offloading partitioning (MCOP) algorithm for WCGs of arbitrary topology. The MCOP algorithm takes a WCG as input, which represents an application's operations/calculations as the nodes and the communication between them as the edges. Each node has two costs: the first is the cost of performing the operation locally (e.g., on the mobile phone) and the second is the cost of performing it elsewhere (e.g., on the cloud). The weight of an edge is the communication cost to the offloaded computation. It is assumed that the communication cost between operations in the same location is negligible. The result contains information about the costs and reports which operations should be performed locally and which should be offloaded.

5.1 Steps

The MCOP algorithm can be divided into two steps as follows:

1) Unoffloadable Vertices Merging: An unoffloadable vertex is one that has special features that prevent it from being migrated outside of the mobile device, and it is therefore located only in the unoffloadable partition. Apart from this, we can choose any task to be executed locally according to our preferences or for other reasons. Then all vertices that are not going to be migrated to the cloud are merged into one, which is selected as the source vertex. By 'merging', we mean that these nodes are coalesced into one, whose weight is the sum of the weights of all merged nodes. Let $G$ represent the original graph after all the unoffloadable vertices are merged.

2) Coarse Partitioning: The target of this step is to coarsen $G$ to the coarsest graph $G_{|V|}$. To coarsen means to merge two nodes and reduce the node count by one. Therefore, the algorithm has $|V| - 1$ phases. In each phase $i$ (for $1 \le i \le |V| - 1$), the cut value, i.e., the partitioning cost in a graph $G_i = (V_i, E_i)$, is calculated. $G_{i+1}$ arises from $G_i$ by merging "suitable nodes", where $G_1 = G$. The partitioning result of the MCOP algorithm is the minimum cut among the cuts of all phases, together with the corresponding group lists for local and cloud execution.

Furthermore, each phase $i$ of the coarse partitioning consists of five steps:

1) Start with $A = \{a\}$, where $a$ is usually an unoffloadable node in $G_i$.

2) Iteratively add to $A$ the vertex that is the most tightly connected to $A$.

3) Let $s, t$ be the last two vertices (in order) added to $A$.

4) The graph cut of phase $i$ is between $V_i \setminus \{t\}$ and $\{t\}$.

5) $G_{i+1}$ arises from $G_i$ by merging vertices $s$ and $t$.

5.2 Merging

Definition: If $s, t \in V$ ($s \ne t$), then $s$ and $t$ can be merged as follows:

1) Nodes $s$ and $t$ are chosen.

2) Nodes $s$ and $t$ are substituted by a new node $x_{s,t}$. All edges that were previously incident to $s$ or $t$ are now incident to $x_{s,t}$ (except the edge between nodes $s$ and $t$ when they are connected).

3) Multiple edges are resolved by adding edge weights. The weights of the node $x_{s,t}$ are resolved by adding the weights of $s$ and $t$.

The merging function is used to merge two vertices into one new vertex, and is implemented as in Algorithm 1. For example, we can merge nodes 2 and 4 as shown in Fig. 4.
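The three merging rules can be sketched in a few lines. This is an illustrative reading of the definition, not the authors' code: the function name `merge`, the dictionary layout, and the label format of the merged vertex are assumptions, and the edge weights in the example are hypothetical.

```python
def merge(node_cost, edge_cost, s, t):
    """Coalesce vertices s and t; returns new (node_cost, edge_cost) dicts.

    node_cost maps name -> (w_local, w_cloud);
    edge_cost maps frozenset({u, v}) -> weight.
    """
    merged = f"{s},{t}"  # label of the new vertex x_{s,t} (naming is an assumption)
    new_nodes = {v: c for v, c in node_cost.items() if v not in (s, t)}
    new_nodes[merged] = (node_cost[s][0] + node_cost[t][0],
                         node_cost[s][1] + node_cost[t][1])

    new_edges = {}
    for pair, weight in edge_cost.items():
        if pair == frozenset((s, t)):
            continue  # the edge between s and t disappears
        renamed = frozenset(merged if v in (s, t) else v for v in pair)
        # Edges that become parallel after renaming are resolved by adding weights.
        new_edges[renamed] = new_edges.get(renamed, 0) + weight
    return new_nodes, new_edges


# Node weights <9, 3> and <12, 4> merge into <21, 7>, as in Fig. 4(c);
# the edge weights below are hypothetical.
nodes = {"1": (0, 0), "2": (9, 3), "3": (6, 2), "4": (12, 4)}
edges = {
    frozenset(("1", "2")): 4,
    frozenset(("1", "4")): 8,
    frozenset(("2", "3")): 1,
    frozenset(("3", "4")): 2,
    frozenset(("2", "4")): 5,
}
merged_nodes, merged_edges = merge(nodes, edges, "2", "4")
```

Returning fresh dictionaries instead of mutating the inputs keeps the original graph available, which is convenient when the same graph is partitioned under several cost models.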

5.3 Algorithmic Process

The algorithmic process is illustrated as the MinCut function in Algorithm 2; in each phase $i$, it calls the MinCutPhase function described in Algorithm 3. Since some tasks have to be executed locally, we first need to merge them into one node.

Fig. 4. Example of merging two nodes: (a) Step 1, (b) Step 2, (c) Step 3.

Algorithm 1 The Merging function
// This function takes s and t as vertices in the given graph and merges them into one
Function: $G' = \mathrm{Merge}(G, w, s, t)$
Input: $G$: the given graph, $G = (V, E)$
    $w$: the weights of edges and vertices
    $s, t$: two vertices in the previous graph that are to be merged
Output: $G'$: the new graph after merging two vertices
1: $x_{s,t} \Leftarrow s \cup t$
2: for all nodes $v \in V$ do
3:     if $v \notin \{s, t\}$ then
4:         $w(e(x_{s,t}, v)) = w(e(s, v)) + w(e(t, v))$
5:         // adding weights of edges
6:         $[w_{local}(x_{s,t}), w_{cloud}(x_{s,t})] = [w_{local}(s) + w_{local}(t), w_{cloud}(s) + w_{cloud}(t)]$
7:         // adding weights of nodes
8:         $E \Leftarrow E \cup e(x_{s,t}, v)$ // adding edges
9:     end if
10:     $E' \Leftarrow E \setminus \{e(s, v), e(t, v)\}$ // deleting edges
11: end for
12: $V' \Leftarrow V \setminus \{s, t\} \cup x_{s,t}$
13: return $G' = (V', E')$

The core of this algorithm is to make it easy to select the next vertex to be added to the set $A$, that is, the Most Tightly Connected Vertex (MTCV), which is defined as the vertex whose $\Delta(v)$ into $A$ is maximum, where $\Delta(v) = w(e(A, v)) - [w_{local}(v) - w_{cloud}(v)]$. Further, we have the total cost from partitioning:

$$C_{cut}(A - t, t) = C_{local} - \left[ w_{local}(t) - w_{cloud}(t) \right] + \sum_{v \in A \setminus t} w(e(t, v)), \qquad (10)$$

where $C_{local} = \sum_{v \in V} w_{local}(v)$ is the total of local costs, the cut value $C_{cut}(A - t, t)$ is the partitioning cost, $w_{local}(t) - w_{cloud}(t)$ is the gain of node $t$ from offloading, and $\sum_{v \in A \setminus t} w(e(t, v))$ is the total of extra communication costs due to offloading.
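The two quantities above are simple enough to evaluate by hand, and a small sketch makes the sign conventions explicit. The function names `delta` and `cut_cost` are illustrative assumptions; the numbers are the ones annotated in the case-study figures (Figs. 6 and 9).

```python
def delta(edge_weights_to_a, w_local, w_cloud):
    """Delta(v) = w(e(A, v)) - [w_local(v) - w_cloud(v)]."""
    return sum(edge_weights_to_a) - (w_local - w_cloud)


def cut_cost(c_local, w_local_t, w_cloud_t, edge_weights_t_to_a):
    """Eq. (10): cost of offloading t while the rest of A stays local."""
    return c_local - (w_local_t - w_cloud_t) + sum(edge_weights_t_to_a)


# First phase of the case study (Fig. 6).
delta_c = delta([8], w_local=3, w_cloud=1)   # 8 - (3 - 1)
phase1 = cut_cost(45, 15, 5, [5])            # 45 - (15 - 5) + 5

# Fourth phase (Fig. 9): t is the merged vertex {b, d, e, f} with weights <42, 14>.
phase4 = cut_cost(45, 42, 14, [4, 1])        # 45 - (42 - 14) + (4 + 1)
print(delta_c, phase1, phase4)
```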

Theorem 1. $cut(A - t, t)$ is always a minimum $s$-$t$ cut in the current graph, where $s$ and $t$ are the last two vertices added in the phase; an $s$-$t$ cut separates nodes $s$ and $t$ onto two different sides.

The run of each MinCutPhase function orders the vertices of the current graph linearly, starting with $a$ and ending with $s$ and $t$, according to the order of addition into $A$. We want to show that $C_{cut}(A - t, t) \le C_{cut}(H)$ for any arbitrary $s$-$t$ cut $H$.

Lemma 1. We define $H$ as an arbitrary $s$-$t$ cut, $A_v$ as the set of vertices added to $A$ before $v$, and $H_v$ as the cut of $A_v \cup \{v\}$ induced by $H$. A vertex $v \ne a$ is called active (with respect to $H$) when $v$ and the vertex added just before $v$ lie on different sides of $H$ [4]. For all active vertices $v$, we have $C_{cut}(A_v, v) \le C_{cut}(H_v)$.

Algorithm 2 The MinCut function
// This function performs an optimal offloading partition algorithm
Function: $[minCut, MinCutGroupsList] = \mathrm{MinCut}(G, w, SourceVertices)$
Input: $G$: the given graph, $G = (V, E)$
    $w$: the weights of edges and vertices
    $SourceVertices$: a list of vertices that are forced to be kept in one side of the cut
Output: $minCut$: the minimum sum of weights of edges and vertices among the cut
    $MinCutGroupsList$: two lists of vertices, one local list and one remote list
1: $w(minCut) \Leftarrow \infty$
2: for $i = 2 : length(SourceVertices)$ do
3:     // Merge all the source vertices (unoffloadable) into one
4:     $(G, w) = \mathrm{Merge}(G, w, SourceVertices(1), SourceVertices(i))$
5: end for
6: while $|V| > 1$ do
7:     $[cut(A - t, t), s, t] = \mathrm{MinCutPhase}(G, w)$
8:     if $w(cut(A - t, t)) < w(minCut)$ then
9:         $minCut \Leftarrow cut(A - t, t)$
10:         $MinCutGroupsList \Leftarrow (V \setminus \{t\}, \{t\})$ // t stands for all original vertices merged into it
11:     end if
12:     $(G, w) = \mathrm{Merge}(G, w, s, t)$
13:     // Merge the last two vertices (in order) into one
14: end while
15: return $minCut$ and $MinCutGroupsList$

Proof. As shown in Fig. 5, we use induction on the number of active vertices, $k$.

1) When $k = 1$, the claim is true.

2) Assume the inequality holds true up to $u$, that is, $C_{cut}(A_u, u) \le C_{cut}(H_u)$.

3) Suppose $v$ is the first active vertex after $u$. According to the assumption $C_{cut}(A_u, u) \le C_{cut}(H_u)$, we have:

$$\begin{aligned} C_{cut}(A_v, v) &= C_{cut}(A_u, v) + C_{cut}(A_v - A_u, v) \\ &\le C_{cut}(A_u, u) + C_{cut}(A_v - A_u, v) \quad (u \text{ is MTCV}) \\ &\le C_{cut}(H_u) + C_{cut}(A_v - A_u, v) \\ &\le C_{cut}(H_v). \end{aligned}$$

Since $t$ is always an active vertex with respect to $H$, by Lemma 1 we can conclude that $C_{cut}(A - t, t) \le C_{cut}(H)$, which says exactly that the cost of $cut(A - t, t)$ is at most as heavy as the cost of $cut(H)$. Therefore, Theorem 1 is proved.


Algorithm 3 The MinCutPhase function
// This function performs one phase of the partitioning algorithm
Function: $[cut(A - t, t), s, t] = \mathrm{MinCutPhase}(G_i, w)$
Input: $G_i$: the graph in phase $i$, i.e., $G_i = (V_i, E_i)$
    $w$: the weights of edges and vertices
    $SourceVertices$: a list of vertices that are forced to be kept in one side of the cut
Output: $s, t$: the last two vertices that are added to $A$
    $cut(A - t, t)$: the cut between $\{A - t\}$ and $\{t\}$ in phase $i$
1: $a \Leftarrow$ arbitrary vertex of $G_i$
2: $A \Leftarrow \{a\}$
3: while $A \ne V_i$ do
4:     $max = -\infty$
5:     $v_{max} = null$
6:     for $v \in V_i$ do
7:         if $v \notin A$ then
8:             // Performance gain through offloading the task v to the cloud
9:             $\Delta(v) \Leftarrow w(e(A, v)) - [w_{local}(v) - w_{cloud}(v)]$
10:             // Find the vertex that is the most tightly connected to A
11:             if $max < \Delta(v)$ then
12:                 $max = \Delta(v)$
13:                 $v_{max} = v$
14:             end if
15:         end if
16:     end for
17:     $A \Leftarrow A \cup \{v_{max}\}$
18:     $a \Leftarrow \mathrm{Merge}(G, w, a, v_{max})$
19: end while
20: $t \Leftarrow$ the last vertex (in order) added to $A$
21: $s \Leftarrow$ the second-to-last vertex (in order) added to $A$
22: return $cut(A - t, t)$, $s$, $t$

Fig. 5. The proof of Lemma 1: (a) the $s$-$t$ cut; (b) an arbitrary $s$-$t$ cut.

5.4 Computational Complexity

The running time of the MinCut algorithm is essentially the sum of the running times of the $|V| - 1$ runs of MinCutPhase, which is called on graphs with a decreasing number of vertices and edges. It therefore suffices to show that a single MinCutPhase needs at most $O(|V| \log |V| + |E|)$ time, which yields an overall running time, and hence a computational complexity of the MCOP algorithm, of $O(|V|^2 \log |V| + |V||E|)$.

As a comparison, linear programming (LP) solvers are widely used in schemes like [1] and [2]. The LP solver is based on branch and bound, which is an algorithm design paradigm for discrete and combinatorial optimization problems, as well as general real valued problems [35]. The number of its candidate solutions grows exponentially with the number of tasks, which means a higher time complexity of $O(2^{|V|})$.

Therefore, the MCOP algorithm has a much lower time complexity than the existing algorithms: it is polynomial in the number of tasks (roughly quadratic, up to a logarithmic factor, when the graph is sparse), and hence can find an optimal offloading strategy quickly.
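To get a feeling for the gap between the two bounds, one can tabulate them for a few graph sizes. This is only a back-of-the-envelope sketch: constant factors are ignored, the function names are hypothetical, and the edge count $|E| = 2|V|$ is an assumed sparse graph.

```python
import math


def mcop_steps(num_vertices, num_edges):
    """Operation-count estimate |V|^2 log|V| + |V||E| (constants ignored)."""
    return num_vertices ** 2 * math.log2(num_vertices) + num_vertices * num_edges


def exhaustive_steps(num_vertices):
    """Number of candidate local/cloud assignments, 2^|V|."""
    return 2 ** num_vertices


for n in (10, 20, 40, 80):
    m = 2 * n  # assumed sparse graph with |E| = 2|V|
    print(n, round(mcop_steps(n, m)), exhaustive_steps(n))
```

For very small graphs the exponential count can still be the smaller number; the advantage of the polynomial bound shows as soon as the task count reaches a few dozen.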

5.5 Case Study

Figure 6 shows the first phase, in which node $a$ is the starting point whose corresponding task will always be computed by the mobile device. We have $s = d$ and $t = f$, and the induced ordering $a, c, b, e, d, f$ of the vertices. Node $f$ is cut off from the graph. The first cut-of-the-phase corresponds to the partitions $\{a, c, b, e, d\}$ and $\{f\}$. Since the overall local cost is $C_{local} = \sum_{v \in V} w_{local}(v) = 45$, we can calculate the cut cost by using (10) as $C_{cut}(A - f, f) = 45 - (15 - 5) + 5 = 40$. At the end, we merge nodes $s = d$ and $t = f$ into one.

In Figs. 7-10, we repeat the same MinCutPhase process as in the first phase in Fig. 6. There are $|V| - 1 = 5$ phases, and at the end, all nodes are merged into one. Then, we compare all the cut values (40, 35, 29, 22 and 27 for phases 1 to 5); the minimum value identifies the phase that has the optimal partitioning cut. In this scenario, the minimum cut of the graph $G$ is the fourth cut-of-the-phase. The optimal cut is between $\{a, c\}$ and $\{b, d, e, f\}$, as depicted in Fig. 11, with the minimum cost of $C_{cut}(\{a, c\}, \{b, d, e, f\}) = 45 - (42 - 14) + (4 + 1) = 22$. Here, tasks $b, d, e, f$ are offloaded to the remote cloud server while tasks $a$ and $c$ are executed locally.
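The whole case study can be replayed with a compact sketch of the two functions. This is not the authors' Java code [41]; it is an illustrative Python reading of Algorithms 2 and 3, with hypothetical names (`min_cut_phase`, `mcop`). The node and edge weights were read off the annotations in Figs. 6-10 and should be treated as assumptions, since the figures themselves are not reproduced here.

```python
def min_cut_phase(cost, adj, start):
    """One phase: grow A from `start`, always adding the most tightly connected vertex.

    Returns (w(e(A - t, t)), s, t), where s and t are the last two vertices added.
    """
    order = [start]
    link = {v: adj[start].get(v, 0) for v in cost if v != start}  # w(e(A, v))
    w_last = 0
    while link:
        v_max = max(link, key=lambda v: link[v] - (cost[v][0] - cost[v][1]))
        w_last = link.pop(v_max)
        order.append(v_max)
        for v in link:
            link[v] += adj[v_max].get(v, 0)
    return w_last, order[-2], order[-1]


def mcop(node_cost, edges, source):
    """Return (minimum cost, set of tasks to offload); `source` always stays local."""
    cost = {frozenset([v]): c for v, c in node_cost.items()}
    adj = {v: {} for v in cost}
    for (u, v), w in edges.items():
        adj[frozenset([u])][frozenset([v])] = w
        adj[frozenset([v])][frozenset([u])] = w
    c_local = sum(w_local for w_local, _ in node_cost.values())
    start = frozenset([source])
    best_cost, best_cloud = float("inf"), set()
    while len(cost) > 1:
        w_cut, s, t = min_cut_phase(cost, adj, start)
        phase_cost = c_local - (cost[t][0] - cost[t][1]) + w_cut  # Eq. (10)
        if phase_cost < best_cost:
            best_cost, best_cloud = phase_cost, set(t)
        # Merge s and t: add node weights, add weights of parallel edges.
        merged = s | t
        cost[merged] = (cost[s][0] + cost[t][0], cost[s][1] + cost[t][1])
        adj[merged] = {}
        for old in (s, t):
            for v, w in adj.pop(old).items():
                if v not in (s, t):
                    del adj[v][old]
                    adj[merged][v] = adj[merged].get(v, 0) + w
                    adj[v][merged] = adj[merged][v]
            del cost[old]
        if start in (s, t):
            start = merged
    return best_cost, best_cloud


# Case-study graph; weights are assumptions reconstructed from Figs. 6-10.
NODES = {"a": (0, 0), "b": (9, 3), "c": (3, 1), "d": (12, 4), "e": (6, 2), "f": (15, 5)}
EDGES = {("a", "c"): 8, ("a", "b"): 4, ("b", "e"): 2, ("c", "d"): 1,
         ("b", "d"): 3, ("e", "d"): 4, ("d", "f"): 5}

min_cost, cloud_tasks = mcop(NODES, EDGES, "a")
print(min_cost, sorted(cloud_tasks))
```

Each current vertex is stored as a frozenset of original task names, so the offloaded group of the best phase can be read off directly. Note that the sketch always offloads at least one vertex; comparing the result with the no offloading cost $C_{local}$ is left to the caller.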

Fig. 11. The optimal cut in phase 4

6 PROFILING

Building the WCG is actually the bottleneck of the whole technique. It closely depends on profiling, i.e., the process of gathering the information required to make offloading decisions. Such information may consist of the computation and communication costs of the execution units (program profiler), the network status (network profiler), and the mobile device specific characteristics such as energy consumption (energy profiler). Profilers are needed to collect information about the device and network characteristics, which is a critical part of the partitioning algorithm: the more accurate and lightweight they are, the more correct decisions can be made, and the lower the overhead that is introduced [36]. In the following, we introduce all three types of profilers.

6.1 Program Profiler

Fig. 6. The 1st phase of MinCutPhase function. The induced ordering of the vertices is $a, c, b, e, s, t$, where $s = d$ and $t = f$. The 1st cut-of-the-phase corresponds to the partitions $\{a, c, b, e, d\}$ and $\{f\}$ with the cut value $C_{cut}(A - f, f) = 45 - (15 - 5) + 5 = 40$.

Fig. 7. The 2nd phase of MinCutPhase function. The induced ordering of the vertices is $a, c, b, s, t$, where $s = e$ and $t = \{df\}$. The 2nd cut-of-the-phase corresponds to the partitions $\{a, c, b, e\}$ and $\{d, f\}$ with the cut value $C_{cut}(A - \{d, f\}, \{d, f\}) = 45 - (27 - 9) + (1 + 3 + 4) = 35$.

Fig. 8. The 3rd phase of MinCutPhase function. The induced ordering of the vertices is $a, c, s, t$, where $s = b$ and $t = \{def\}$. The 3rd cut-of-the-phase corresponds to the partitions $\{a, b, c\}$ and $\{d, e, f\}$ with the cut value $C_{cut}(\{a, b, c\}, \{d, e, f\}) = 45 - (33 - 11) + (1 + 5) = 29$.

Fig. 9. The 4th phase of MinCutPhase function. The induced ordering of the vertices is $a, s, t$, where $s = c$ and $t = \{bdef\}$. The 4th cut-of-the-phase corresponds to the partitions $\{a, c\}$ and $\{b, d, e, f\}$ with the cut value $C_{cut}(\{a, c\}, \{b, d, e, f\}) = 45 - \{(42 - 14) - (1 + 4)\} = 22$.

Fig. 10. The 5th phase of MinCutPhase function. The induced ordering of the vertices is $s, t$, where $s = a$ and $t = \{bcdef\}$. The 5th cut-of-the-phase corresponds to the partitions $\{a\}$ and $\{b, c, d, e, f\}$ with the cut value $C_{cut}(\{a\}, \{b, c, d, e, f\}) = 45 - (45 - 15) + 12 = 27$.

A program profiler (static or dynamic) collects characteristics of applications, e.g., the execution time, the memory usage and the size of data. We can combine static analysis and dynamic profiling to construct the WCG of an application.

Static analysis obtains the control flow graph of an application by analyzing the bytecode, with nodes representing objects and edges representing relations between objects. By traversing the graph, we can get all the objects and the relations between them based on method invocations. Constructing call graphs by hand, without the help of analysis tools, would cost far more time and resources. Many tools and frameworks have been developed to generate the call graph of a given application, e.g., Spark [37], Cgc [5], and Soot [38], and this automation is a huge advantage.

Dynamic profiling is adopted to obtain the weights of the nodes and edges. Since there is a certain ratio of execution time to the total bytecode instruction count for Java programs, the execution time of objects can be estimated from the corresponding bytecode instruction count [39]. Data transmitted between tasks includes parameters and return values of method invocations. Combining Java bytecode rewriting with pretreatment information like the speedup factor F and the wireless bandwidth B, we can obtain the execution time for each task (node weight) and the transmission time for each invocation (edge weight). These weights can be dynamically assigned according to the different processing capabilities of the cloud server and the wireless bandwidth.

We take a face recognition application¹ as an example. By analyzing this application with Soot, the call graph can be constructed as a tree-based topology, as shown in Fig. 12. From the local estimated execution time, we get the remote estimated execution time by dividing by the speedup factor F. When offloading a task to the cloud, the communication cost incurred between the mobile device and the cloud is the size of the transferred data divided by the bandwidth. With the remote execution and transmission costs, we have all the information needed to build the WCG of this application.
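The two divisions described above can be sketched as a small helper. The function name `build_wcg_weights` and the unit convention (1 MB/s taken as 1000 KB/s) are assumptions; the two execution times are taken from Fig. 12, while the 600 KB transfer between these two methods is a hypothetical pairing chosen only for illustration.

```python
def build_wcg_weights(local_ms, data_kb, speedup_f, bandwidth_mb_per_s):
    """Turn profiled times and data sizes into WCG node and edge weights (in ms)."""
    # Node weight: <local time, remote time>, with remote = local / F.
    node_weights = {m: (t, t / speedup_f) for m, t in local_ms.items()}
    # 1 MB/s is taken as 1000 KB/s, i.e. 1 KB per ms, so KB / (MB/s) gives ms.
    edge_weights = {pair: kb / bandwidth_mb_per_s for pair, kb in data_kb.items()}
    return node_weights, edge_weights


local_ms = {"JPGFile.<init>": 75.2, "JPGFile.readImage": 77.7}
data_kb = {("JPGFile.<init>", "JPGFile.readImage"): 600.0}  # hypothetical edge

nodes, edges = build_wcg_weights(local_ms, data_kb,
                                 speedup_f=2, bandwidth_mb_per_s=1.0)
for method, (t_local, t_cloud) in nodes.items():
    print(f"{method}: <{t_local:.2f}, {t_cloud:.2f}>")
```

Because F and B enter only through these divisions, the weights can be recomputed cheaply whenever the profilers report a new bandwidth or server speed.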

6.2 Network Profiler

1. The face recognition application is built upon open source code, http://darnok.org/programming/face-recognition/, which implements the Eigenface face recognition algorithm.

Fig. 12. Call graph of a face recognition application. Each node is labeled with class name, method name and execution time: TestFaceRecognition.main (1555.3 ms), EigenFaceCreator.readFaceBudles (1464 ms), EigenFaceCreator.checkAgainst (137.8 ms), EigenFaceCreator.submitSet (722.2 ms), FaceBundle.submitFace (35.9 ms), EigenFaceCreator.readImage (80.7 ms), JPGFile.<init> (75.2 ms), FaceBundle.compute (37.2 ms), JPGFile.readImage (77.7 ms), EigenFaceCreator.computeBundle (516.5 ms), EigenFaceCreator.saveBundle (192 ms), EigenFaceCreator.submit (516.6 ms), Jama.Matrix.times (68.6 ms), Jama.Matrix.eig (2.2 ms), Jama.Matrix.transpose (33.0 ms). Each edge is labeled with the size of the transferred data, ranging from 0 KB to 19806 KB.

A network profiler collects information about wireless connection status and available bandwidth. It measures the network characteristics at initialization, and it continuously monitors environmental changes. Network throughput can be obtained by measuring the time taken to send a certain amount of data, as in [2]. Due to the mobile nature, the status of a wireless connection may change frequently (e.g., when the user moves to another location). Fresh information about a wireless connection is critical for the optimizer to make correct offloading decisions.

The profiler tracks several parameters for the WiFi and 3G interfaces, including the number of packets transmitted and received per second, and the receiving and transmitting data rates [36]. These measurements enable a better estimate of the network performance currently being achieved. We can use Speedtest² to measure the mobile network bandwidth.
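A throughput sample is simply the amount of data sent divided by the time it took. The sketch below adds an exponentially weighted moving average on top, which is an assumption of this example (the text only asks for fresh measurements, not for a particular smoothing); the class name `BandwidthEstimator` and the value of `alpha` are hypothetical.

```python
class BandwidthEstimator:
    """Keeps a smoothed bandwidth estimate (MB/s) from timed transfers."""

    def __init__(self, alpha=0.5):
        self.alpha = alpha      # weight of the newest sample (assumed value)
        self.estimate = None    # no measurement yet

    def update(self, bytes_sent, seconds):
        """Fold one timed transfer into the estimate and return it."""
        sample = bytes_sent / seconds / 1e6  # MB/s, with 1 MB = 10^6 bytes
        if self.estimate is None:
            self.estimate = sample
        else:
            self.estimate = self.alpha * sample + (1 - self.alpha) * self.estimate
        return self.estimate


estimator = BandwidthEstimator(alpha=0.5)
first = estimator.update(bytes_sent=3_000_000, seconds=1.0)   # 3 MB in 1 s
second = estimator.update(bytes_sent=1_000_000, seconds=1.0)  # link got slower
print(first, second)
```

A larger `alpha` reacts faster to a user moving between networks, at the price of a noisier estimate.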

6.3 Energy Profiler

There are two ways to estimate the energy consumption, namely software and hardware monitors. For example, MAUI [1] used a power meter attached to the smartphone's battery to build an energy profile. A power monitor (e.g., the Monsoon monitor) is a device that supplies a certain level of power to the mobile device and measures the energy consumed while data is transmitted from the mobile device to the cloud server.

We can also use PowerTutor³ to measure the power consumption of the applications. Although PowerTutor does not give results as accurate as those of a hardware power monitor, the results are still reasonable and useful because it gives detailed energy consumption information for each hardware component.

2. A free connection analysis tool, which shows real-time download and upload graphs and stores results both locally and on the Internet for sharing, http://www.speedtest.net/

3. PowerTutor is an application for Android phones that provides accurate, real-time power consumption estimates for power-intensive hardware components, http://powertutor.org/


7 EVALUATION

7.1 Setup

To evaluate the partitioning algorithm, we need to know three different kinds of values:

• Fixed Values: they are set by the mobile application developer and determined based on a large number of experiments. For example, the power consumption values $P_m$, $P_i$, and $P_{tr}$ are parameters specific to the mobile system. We use an HP iPAQ PDA with a 400-MHz Intel XScale processor, which has the following values: $P_m \approx 0.9$ W, $P_i \approx 0.3$ W, and $P_{tr} \approx 1.3$ W [40].

• Specific Values: such parameters represent some state of the mobile device, e.g., the size of transferred data, the value of the current wireless bandwidth B (for convenience, we assume $B_{upload} = B_{download}$) and the speedup factor F, which depends on the speeds of the current cloud server and the mobile device.

• Calculated Values: these values cannot be determined by application developers. For a given application, the computational cost is affected by input parameters and device characteristics, which can be measured using a program profiler. The communication cost is related to transmitting code/data via wireless interfaces such as WiFi or 3G, which can be tracked by a network profiler.
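The three kinds of values come together when a task's energy is estimated. The sketch below assumes the usual reading of the three power values ($P_m$ while computing, $P_i$ while idling during remote execution, $P_{tr}$ while transferring data); this interpretation, the function names, and the example task are assumptions of this sketch rather than a restatement of the energy model.

```python
P_M, P_I, P_TR = 0.9, 0.3, 1.3  # watts: values quoted above for the HP iPAQ


def energy_local(t_local_s):
    """Energy (J) when the device computes the task itself."""
    return P_M * t_local_s


def energy_offloaded(t_local_s, data_mb, speedup_f, bandwidth_mb_per_s):
    """Energy (J) when the device idles during cloud execution and pays for the transfer."""
    t_cloud_s = t_local_s / speedup_f               # remote execution time
    t_transfer_s = data_mb / bandwidth_mb_per_s     # time spent on the wireless link
    return P_I * t_cloud_s + P_TR * t_transfer_s


# Hypothetical task: 9 s locally, 6 MB to transfer, F = 3, B = 3 MB/s.
e_local = energy_local(9.0)                      # 0.9 * 9
e_cloud = energy_offloaded(9.0, 6.0, 3.0, 3.0)   # 0.3 * 3 + 1.3 * 2
print(round(e_local, 2), round(e_cloud, 2))
```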

The performance evaluation compares the proposed scheme with other existing schemes in terms of energy conservation efficiency and execution time. We compare the partitioning results with two other intuitive strategies that do not partition the application and, for ease of reference, we list all three kinds of offloading techniques:

• No Offloading (Local Execution): all computation tasks of an application run locally on the mobile device and there is no communication cost. This may be costly since, compared to the powerful computing capability at the cloud side, the mobile device is limited in processing speed and battery life.

• Full Offloading: all computation tasks of mobile applications (except the unoffloadable tasks) are moved from the local mobile device to the remote cloud for execution. This may significantly reduce the implementation complexity, which makes the mobile devices lighter and smaller. However, full offloading is not always the optimal choice since different application tasks may have different characteristics that make them more or less suitable for offloading [19].

• Partial Offloading (With Partitioning): with the help of the MCOP algorithm, all tasks including unoffloadable and offloadable ones are partitioned into two sets, one for local execution on the mobile device and the other for remote execution on a cloud server node. Before a task is executed, it may require a certain amount of data from other tasks. Thus, data migration via wireless networks is needed between tasks that are executed at different sides.

We define the saved cost in the partial offloading scheme compared to that in the no offloading scheme as the Offloading Gain, which can be formulated as:

$$\text{Offloading Gain} = \left( 1 - \frac{\text{Partial Offloading Cost}}{\text{No Offloading Cost}} \right) \cdot 100\%. \qquad (11)$$

The offloading gains in terms of time, energy and the weighted sum of time and energy are described in (5), (7) and (9), respectively.
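As a quick check of (11), the costs from the case study in Section 5.5 can be plugged in. The function name `offloading_gain` is an assumption; applying the same ratio to the full offloading cost is done here only for comparison and is not part of the definition above.

```python
def offloading_gain(partial_cost, no_offloading_cost):
    """Eq. (11): saved cost relative to purely local execution, in percent."""
    return (1 - partial_cost / no_offloading_cost) * 100.0


# Costs from the case study in Section 5.5.
no_offloading = 45    # all tasks local: C_local
full_offloading = 27  # everything except a offloaded (5th cut-of-the-phase)
partial = 22          # optimal cut found by MCOP (4th cut-of-the-phase)

gain_partial = offloading_gain(partial, no_offloading)
gain_full = offloading_gain(full_offloading, no_offloading)
print(f"partial: {gain_partial:.1f}%, full: {gain_full:.1f}%")
```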

7.2 Evaluation in Computational Complexity

We implement the MCOP algorithm in Java so that it can serve as a comparison to the theoretical results; the code can be found in [41].

As an example, we partition the WCG constructed in Fig. 12 under the condition of the speedup factor F = 2 and the bandwidth B = 1 MB/s, where the main and checkAgainst methods are assumed to be unoffloadable nodes. The optimal partitioning result is depicted in Fig. 13. The red nodes represent the application tasks that should be offloaded to the remote cloud and the blue nodes are the tasks that are supposed to be executed locally on the mobile device. The partitioning results will change as the wireless bandwidth B or the speedup factor F varies.

Fig. 13. Optimal partitioning result of the face recognition application when F = 2 and B = 1 MB/s. Each node is labeled with its <local, cloud> cost pair and each edge with its communication cost.

The running time of the Java implementation for different numbers of application tasks is depicted in Fig. 14. We compare it with the theoretical computational complexity $O(|V|^2 \log |V| + |V||E|)$ from Section 5.4. The two match each other well, which further supports that our partitioning algorithm has a much lower time complexity than the LP solver, which has exponential time complexity.

Fig. 14. Running time of the MCOP algorithm under different numbers of tasks (x-axis: number of tasks, 0 to 700; y-axis: running time in s, 0 to 70; curves: Simulation and Theory).


7.3 Evaluation in Dynamic Conditions

We build a graphical user interface (GUI) in MATLAB as shown in Fig. 15. The GUI is responsible for user interaction, such as receiving input parameters and displaying the application partitioning results.

Fig. 15. The user interface for demonstration

The user first inputs or selects the relevant parameters, such as Application Graph, Unoffloadable Nodes and Optimization Model. We can either use the predefined application graphs of “linear”, “loop”, “tree” and “mesh” or just choose “user” to input an arbitrary CG. Then, by clicking the “Graph” button, a WCG is constructed based on the above parameters. Further, by clicking the “Start Partitioning” button, the partitioning process begins by calling the MCOP partitioning algorithm. We can get partitioning results such as Partial Offloading Cost, No Offloading Cost, Full Offloading Cost and Offloading Gain. In addition, the optimal partitioning graph appears as in Fig. 16, which confirms the partitioning result in Fig. 11 with the minimum cost of 22. We get different results under different values of the speedup factor F and the wireless bandwidth B.

Fig. 16. An optimal partitioning result of using the MCOP algorithm. Node labels: a:<0,0>, b:<9,3>, c:<3,1>, d:<6,2>, e:<12,4>, f:<15,5>; edge weights: 4, 8, 3, 2, 1, 4, 5.

As depicted in Fig. 17, the speedup factor is set to F = 3. Since a low bandwidth results in much higher costs for data transmission, the full offloading scheme cannot benefit from offloading. Given a relatively large bandwidth, the response time or energy consumption obtained by the full offloading scheme slowly approaches that of the partial offloading scheme, because the optimal partition includes more and more tasks running on the cloud side until all offloadable tasks are offloaded to the cloud. With higher bandwidth, the two curves begin to coincide and only decrease, because all possible nodes are offloaded and the transmissions become faster. Both response time and energy consumption follow the same trend as the wireless bandwidth increases. Therefore, bandwidth is a critical condition for offloading: the mobile system can benefit a lot from offloading in high bandwidth environments, while with low bandwidths, the no offloading scheme is preferred.

As shown in Fig. 18, the bandwidth is fixed at B = 3 MB/s. It can be seen that offloading benefits from higher speedup factors. When F is very small, the full offloading scheme can reduce the energy consumption of the mobile device; however, it takes much more response time than the no offloading scheme. The partial offloading scheme that adopts the MCOP algorithm can effectively reduce execution time and energy consumption, while adapting to environmental changes.

From Figs. 17-18 we can tell that the full offloading scheme performs much better than the no offloading scheme under adequate wireless network conditions, because the execution cost of running methods on the cloud server is significantly lower than on the mobile device when the speedup factor F is large. The partial offloading scheme outperforms the no offloading and full offloading schemes and significantly improves the application performance: compared to the full offloading scheme, it avoids offloading tasks when the transmission costs between consecutive tasks are large, and offloads more appropriate tasks to the cloud server. In short, neither running all application tasks locally on the mobile terminal nor always offloading their execution to a remote server offers an efficient solution, whereas our partial offloading scheme does.
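The trend described above can be reproduced on a toy graph. The sketch below deliberately uses exhaustive search over all offloading sets instead of MCOP, which is feasible only because the graph has six tasks. The profile is hypothetical: local times and transfer sizes are chosen so that F = 3 and B = 1 MB/s reproduce the weights assumed for the case-study graph of Section 5.5.

```python
from itertools import combinations

# Hypothetical profile: local times in s and transfer sizes in MB.
LOCAL_S = {"a": 0, "b": 9, "c": 3, "d": 12, "e": 6, "f": 15}
DATA_MB = {("a", "c"): 8, ("a", "b"): 4, ("b", "e"): 2, ("c", "d"): 1,
           ("b", "d"): 3, ("e", "d"): 4, ("d", "f"): 5}


def best_partition(bandwidth, speedup, pinned=("a",)):
    """Exhaustively search all offloading sets; returns (cost, offloaded tasks)."""
    offloadable = [v for v in LOCAL_S if v not in pinned]
    best = (float("inf"), frozenset())
    for k in range(len(offloadable) + 1):
        for combo in combinations(offloadable, k):
            cloud = frozenset(combo)
            # Execution cost: remote time for offloaded tasks, local time otherwise.
            cost = sum(t / speedup if v in cloud else t for v, t in LOCAL_S.items())
            # Communication cost: only edges that cross the cut are paid for.
            cost += sum(mb / bandwidth for (u, v), mb in DATA_MB.items()
                        if (u in cloud) != (v in cloud))
            if cost < best[0]:
                best = (cost, cloud)
    return best


for b in (0.1, 1.0, 10.0):
    cost, cloud = best_partition(bandwidth=b, speedup=3.0)
    print(f"B={b} MB/s: cost={cost:.1f}, offloaded={sorted(cloud)}")
```

Under these assumptions the best partition moves from no offloading at low bandwidth, through a partial set, to offloading every offloadable task at high bandwidth, which is the behavior the comparison of the three schemes describes.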

We then compare the cost savings under three different cost models. For the model of minimum weighted time and energy, the weights of response time and energy consumption are both set to 0.5.

As shown in Fig. 19(a), when the bandwidth is low, the offloading gains for all three cost models are very small and almost coincide. That is because more time/energy is spent transferring the same data over a low-bandwidth network, which increases the execution time. As the bandwidth increases, the offloading gains first rise drastically and then increase more slowly. It can be concluded that, as the bandwidth increases, the optimal partition includes more and more tasks running on the cloud side until all the tasks are offloaded to the cloud. Among the partitioning cost models, the minimum energy consumption model has the largest offloading gain, followed by the minimum weighted sum of time and energy, while the response time benefits the least from offloading. Similarly, Fig. 19(b) demonstrates how the partitioning result varies as the speedup factor F changes. When F is small, the offloading gains for all three cost models are very low, since a small value means very little computational cost reduction from remote execution. As F increases, the offloading gains first rise drastically and then approach the same value. That is because the extra communication cost, which does not shrink as F grows, limits the benefit from offloading. From Fig. 19, it can be seen that the proposed MCOP algorithm is able to effectively reduce the application's energy consumption as well as its execution time. Further, it can adapt to environment changes to some extent and avoids a sharp decline in application performance once the bandwidth falls dramatically.

Fig. 17. Comparisons of different schemes (No Offloading, Full Offloading, Partial Offloading) under different wireless bandwidths when the speedup factor F = 3: (a) response time (s) and (b) energy consumption (J), both versus wireless bandwidth B (MB/s).

Fig. 18. Comparisons of different schemes under different speedup factors when the bandwidth B = 3 MB/s: (a) response time (s) and (b) energy consumption (J), both versus speedup factor F.

Fig. 19. Offloading gains (%) under different environment conditions when ω = 0.5, for the Minimum Response Time, Minimum Energy Consumption and Minimum Weighted Time and Energy models: (a) versus wireless bandwidth B (F = 3); (b) versus speedup factor F (B = 3 MB/s).

8 CONCLUSION AND FUTURE WORK

In this paper, we model applications under different scenarios as WCGs of arbitrary topology. To tackle the problem of dynamic partitioning in a mobile environment, we propose a new offloading partitioning algorithm (the MCOP algorithm) that finds the optimal application partitioning under different cost models to make the best trade-off between time/energy savings and transmission costs/delay. Contrary to the traditional graph partitioning problem, our algorithm is not restricted to balanced partitions but takes the infrastructure heterogeneity into account.

The MCOP algorithm provides a stably quadratic runtime complexity for determining which parts of the application tasks should be offloaded to the cloud server and which parts should be executed locally, in order to save the mobile device's energy and to reduce the application's execution time. Experimental results show that, as the environment changes (e.g., network bandwidth and cloud server performance), the proposed algorithm can effectively achieve the optimal partitioning result in terms of time and energy savings. Offloading benefits a lot from high bandwidths and large speedup factors, while low bandwidths favor the no offloading scheme.

Future work consists of integrating MCOP with other algorithms (e.g., one that creates a graph out of program parts, one that partitions an application into parts, and one that prepares the data of application parts so that, once they are offloaded, the cloud server is able to execute them) in an actual software deployment framework to automatically distribute software components on a cloud infrastructure.

REFERENCES

[1] E. Cuervo, A. Balasubramanian, D.-k. Cho, A. Wolman, S. Saroiu,R. Chandra, and P. Bahl, “Maui: making smartphones last longer withcode offload,” in Proceedings of the 8th international conference on Mobilesystems, applications, and services, pp. 49–62, ACM, 2010.

[2] B.-G. Chun, S. Ihm, P. Maniatis, M. Naik, and A. Patti, “Clonecloud:elastic execution between mobile device and cloud,” in Proceedings ofthe sixth conference on Computer systems, pp. 301–314, ACM, 2011.

[3] B. Hendrickson and T. G. Kolda, “Graph partitioning models forparallel computing,” Parallel computing, vol. 26, no. 12, pp. 1519–1534,2000.

[4] M. Stoer and F. Wagner, “A simple min-cut algorithm,” Journal of theACM (JACM), vol. 44, no. 4, pp. 585–591, 1997.

[5] K. Ali and O. Lhotak, “Application-only call graph construction,”in ECOOP 2012–Object-Oriented Programming, pp. 688–712, Springer,2012.

[6] Y. Boykov, O. Veksler, and R. Zabih, “Fast approximate energy min-imization via graph cuts,” Pattern Analysis and Machine Intelligence,IEEE Transactions on, vol. 23, no. 11, pp. 1222–1239, 2001.

[7] L. Yang, J. Cao, Y. Yuan, T. Li, A. Han, and A. Chan, “A frameworkfor partitioning and execution of data stream applications in mobilecloud computing,” ACM SIGMETRICS Performance Evaluation Review,vol. 40, no. 4, pp. 23–32, 2013.

[8] H. Wu and K. Wolter, “Software aging in mobile devices: Partialcomputation offloading as a solution,” in Software Reliability Engineer-ing Workshops (ISSREW), 2015 IEEE International Symposium on, IEEE,2015.

[9] Y. Liu and M. J. Lee, “An effective dynamic programming offloadingalgorithm in mobile cloud computing system,” in Wireless Communi-cations and Networking Conference (WCNC), 2014 IEEE, pp. 1868–1873,IEEE, 2014.

[10] K. Kumar, J. Liu, Y.-H. Lu, and B. Bhargava, “A survey of computationoffloading for mobile systems,” Mobile Networks and Applications,vol. 18, no. 1, pp. 129–140, 2013.

[11] Y. Zhang, H. Liu, L. Jiao, and X. Fu, “To offload or not to offload:an efficient code partition algorithm for mobile cloud computing,” inCloud Networking (CLOUDNET), 2012 IEEE 1st International Conferenceon, pp. 80–86, IEEE, 2012.

[12] X. Gu, K. Nahrstedt, A. Messer, I. Greenberg, and D. Milojicic,“Adaptive offloading for pervasive computing,” Pervasive Computing,IEEE, vol. 3, no. 3, pp. 66–73, 2004.

[13] I. Giurgiu, O. Riva, D. Juric, I. Krivulev, and G. Alonso, “Calling thecloud: enabling mobile phones as interfaces to cloud applications,” inMiddleware 2009, pp. 83–102, Springer, 2009.

[14] R. Kemp, N. Palmer, T. Kielmann, and H. Bal, “Cuckoo: a compu-tation offloading framework for smartphones,” in Mobile Computing,Applications, and Services, pp. 59–79, Springer, 2012.

[15] D. Kovachev, “Framework for computation offloading in mobilecloud computing,” IJIMAI, vol. 1, no. 7, pp. 6–15, 2012.

[16] D. Huang, P. Wang, and D. Niyato, “A dynamic offloading algorithmfor mobile computing,” Wireless Communications, IEEE Transactions on,vol. 11, no. 6, pp. 1991–1995, 2012.

[17] Z. Li, C. Wang, and R. Xu, “Computation offloading to save energy onhandheld devices: a partition scheme,” in Proceedings of the 2001 inter-national conference on Compilers, architecture, and synthesis for embeddedsystems, pp. 238–246, ACM, 2001.

[18] S. Ou, K. Yang, and A. Liotta, “An adaptive multi-constraint parti-tioning algorithm for offloading in pervasive systems,” in PervasiveComputing and Communications, 2006. PerCom 2006. Fourth AnnualIEEE International Conference on, pp. 10–pp, IEEE, 2006.

[19] L. Lei, Z. Zhong, K. Zheng, J. Chen, and H. Meng, “Challengeson wireless heterogeneous networks for mobile cloud computing,”Wireless Communications, IEEE, vol. 20, no. 3, 2013.

[20] J. Niu, W. Song, and M. Atiquzzaman, “Bandwidth-adaptive par-titioning for distributed execution optimization of mobile applica-tions,” Journal of Network and Computer Applications, vol. 37, pp. 334–347, 2014.

[21] E. Abebe and C. Ryan, “Adaptive application offloading using dis-tributed abstract class graphs in mobile environments,” Journal ofSystems and Software, vol. 85, no. 12, pp. 2755–2769, 2012.

[22] T. Verbelen, T. Stevens, F. De Turck, and B. Dhoedt, “Graph par-titioning algorithms for optimizing software deployment in mobilecloud computing,” Future Generation Computer Systems, vol. 29, no. 2,pp. 451–459, 2013.

[23] H. Wu, Q. Wang, and K. Wolter, “Tradeoff between performanceimprovement and energy saving in mobile cloud offloading systems,”in Communications Workshops (ICC), 2013 IEEE International Conferenceon, pp. 728–732, IEEE, 2013.

[24] M. Satyanarayanan, P. Bahl, R. Caceres, and N. Davies, “The case forvm-based cloudlets in mobile computing,” Pervasive Computing, IEEE,vol. 8, no. 4, pp. 14–23, 2009.

[25] H. Wu, Q. Wang, and K. Wolter, “Optimal cloud-path selection inmobile cloud offloading systems based on qos criteria,” InternationalJournal of Grid and High Performance Computing (IJGHPC), vol. 5, no. 4,pp. 30–47, 2013.

[26] V. Pandey, S. Singh, and S. Tapaswi, “Energy and time efficientalgorithm for cloud offloading using dynamic profiling,” WirelessPersonal Communications, pp. 1–15, 2014.

[27] M. Jia, J. Cao, and L. Yang, “Heuristic offloading of concurrent tasksfor computation-intensive applications in mobile cloud computing,”in Computer Communications Workshops (INFOCOM WKSHPS), 2014IEEE Conference on, pp. 352–357, IEEE, 2014.

[28] A.-C. OLTEANU and N. TAPUS, “Tools for empirical and operationalanalysis of mobile offloading in loop-based applications,” InformaticaEconomica, vol. 17, no. 4, pp. 5–17, 2013.

[29] R. Niu, W. Song, and Y. Liu, “An energy-efficient multisite offloadingalgorithm for mobile devices,” International Journal of DistributedSensor Networks, 2013.

[30] K. Sinha and M. Kulkarni, “Techniques for fine-grained, multi-sitecomputation offloading,” in Proceedings of the 2011 11th IEEE/ACMInternational Symposium on Cluster, Cloud and Grid Computing, pp. 184–194, IEEE Computer Society, 2011.

[31] B. Y.-H. Kao and B. Krishnamachari, “Optimizing mobile computa-tional offloading with delay constraints,” in Proc. of Global Communi-cation Conference (Globecom 14), pp. 8–12, 2014.

[32] C. Wang and Z. Li, “Parametric analysis for adaptive computationoffloading,” in ACM SIGPLAN Notices, vol. 39, pp. 119–130, ACM,2004.

[33] L. Yang and J. Cao, “Computation partitioning in mobile cloudcomputing: A survey,” ZTE Communications, vol. 4, pp. 003–, 2013.

[34] Y.-W. Kwon and E. Tilevich, “Energy-efficient and fault-tolerant dis-tributed mobile execution,” in Distributed Computing Systems (ICDCS),2012 IEEE 32nd International Conference on, pp. 586–595, IEEE, 2012.

[35] Wikipedia, “Branch and bound.” http://en.wikipedia.org/wiki/Branch and bound.

[36] S. Kosta, A. Aucinas, P. Hui, R. Mortier, and X. Zhang, “Thinkair:Dynamic resource allocation and parallel execution in the cloudfor mobile code offloading,” in INFOCOM, 2012 Proceedings IEEE,pp. 945–953, IEEE, 2012.

[37] O. Lhotak and L. Hendren, “Scaling java points-to analysis usingspark,” in Compiler Construction, pp. 153–169, Springer, 2003.

[38] “Soot: A framework for analyzing and transforming java and androidapplications,” in http://sable.github.io/soot/.

[39] W. Binder and J. Hulaas, “Using bytecode instruction counting asportable cpu consumption metric,” Electronic Notes in TheoreticalComputer Science, vol. 153, no. 2, pp. 57–77, 2006.

http://en.wikipedia.org/wiki/Branch_and_bound

http://en.wikipedia.org/wiki/Branch_and_bound

15

[40] K. Kumar and Y.-H. Lu, “Cloud computing for mobile users: Can of-floading computation save energy?,” Computer, vol. 43, no. 4, pp. 51–56, 2010.

[41] “Optimal partitioning algorithm.” https://github.com/carlosmn/work-offload.

https://github.com/carlosmn/work-offload

https://github.com/carlosmn/work-offload
