Architecture and robustness tradeoffs in speed-scaled queues with application to energy management

This article was downloaded by: [Laurentian University] On: 25 November 2014, At: 08:03 Publisher: Taylor & Francis Informa Ltd Registered in England and Wales Registered Number: 1072954 Registered office: Mortimer House, 37-41 Mortimer Street, London W1T 3JH, UK Click for updates International Journal of Systems Science Publication details, including instructions for authors and subscription information: http://www.tandfonline.com/loi/tsys20 Architecture and robustness tradeoffs in speed-scaled queues with application to energy management Tuan V. Dinh a , Lachlan L. H. Andrew a & Yoni Nazarathy b a Swinburne University of Technology, Victoria, Australia b The University of Queensland, Queensland, Australia Published online: 24 Jan 2013. To cite this article: Tuan V. Dinh, Lachlan L. H. Andrew & Yoni Nazarathy (2014) Architecture and robustness tradeoffs in speed-scaled queues with application to energy management, International Journal of Systems Science, 45:8, 1728-1739, DOI: 10.1080/00207721.2012.749435 To link to this article: http://dx.doi.org/10.1080/00207721.2012.749435 PLEASE SCROLL DOWN FOR ARTICLE Taylor & Francis makes every effort to ensure the accuracy of all the information (the “Content”) contained in the publications on our platform. However, Taylor & Francis, our agents, and our licensors make no representations or warranties whatsoever as to the accuracy, completeness, or suitability for any purpose of the Content. Any opinions and views expressed in this publication are the opinions and views of the authors, and are not the views of or endorsed by Taylor & Francis. The accuracy of the Content should not be relied upon and should be independently verified with primary sources of information. 
Taylor and Francis shall not be liable for any losses, actions, claims, proceedings, demands, costs, expenses, damages, and other liabilities whatsoever or howsoever caused arising directly or indirectly in connection with, in relation to or arising out of the use of the Content. This article may be used for research, teaching, and private study purposes. Any substantial or systematic reproduction, redistribution, reselling, loan, sub-licensing, systematic supply, or distribution in any form to anyone is expressly forbidden. Terms & Conditions of access and use can be found at http:// www.tandfonline.com/page/terms-and-conditions

International Journal of Systems Science, 2014
Vol. 45, No. 8, 1728–1739, http://dx.doi.org/10.1080/00207721.2012.749435

Architecture and robustness tradeoffs in speed-scaled queues with application to energy management

Tuan V. Dinh a,∗, Lachlan L. H. Andrew a and Yoni Nazarathy b

a Swinburne University of Technology, Victoria, Australia; b The University of Queensland, Queensland, Australia

(Received 16 May 2012; final version received 18 October 2012)

We consider single-pass, lossless, queueing systems at steady-state subject to Poisson job arrivals at an unknown rate. Service rates are allowed to depend on the number of jobs in the system, up to a fixed maximum, and power consumption is an increasing function of speed. The goal is to control the state dependent service rates such that both energy consumption and delay are kept low. We consider a linear combination of the mean job delay and energy consumption as the performance measure. We examine both the ‘architecture’ of the system, which we define as a specification of the number of speeds that the system can choose from, and the ‘design’ of the system, which we define as the actual speeds available. Previous work has illustrated that when the arrival rate is precisely known, there is little benefit in introducing complex (multi-speed) architectures, yet in view of parameter uncertainty, allowing a variable number of speeds improves robustness. We quantify the tradeoffs of architecture specification with respect to robustness, analysing both global robustness and a newly defined measure which we call local robustness.

Keywords: parameter uncertainty; robust design; controlled single server queue; speed scaling

1. Introduction

Performance analysis, design and control by means of stochastic queueing models (cf. Wolff 1989) has affected a variety of fields, including not only telecommunications and computing systems but also service engineering, manufacturing, logistics, health-care, road traffic and biological modelling. A typical queueing model abstracts unknown job arrival and service requirements by means of stochastic processes and distributions. The resulting dynamics of queue-length, workload or other performance processes are analysed, yielding performance measures that ultimately allow for better design and control of the system at hand. Design of the system often refers to an off-line specification of parameters, whereas control of the system typically refers to on-line decision making based on state measurements (e.g. setting service speeds). In this paper we shall use a third term, architecture selection, referring to the action of deciding which design and control parameters are available to the process.

Almost all of the queueing theoretic, performance analysis, design, control and architecture selection literature is based on the underlying assumption that the probability laws of arrival and service processes are precisely known. A few exceptions to this rule are mentioned later in this section. In practice, this assumption is often too strong, especially because obtaining precise a priori parameter estimates is not possible in many settings. Our contribution in this paper is in quantifying the effect of architecture selection on robustness. Here the property of robustness refers to the ability of the system to operate in a near-optimal manner even when estimates of parameter values are not precise, or even grossly incorrect. As this is generally a vague concept, one of the contributions of this paper is in proposing measures of robustness.

∗Corresponding author. Email: [email protected]

Our analysis focuses on a model that is applicable to computing systems operating in an energy aware speed-scaling environment. The model we consider is an M/G/1-PS queue with state dependent service rates. A Poisson stream of jobs arriving at rate $\lambda$ is served by a processor sharing (PS) regime that operates as follows: When there are $n$ jobs in the system, each job is served at a rate $s_n/n$, where $s_0 = 0$. The objective of design and control is to minimise the operating cost, defined as a linear combination of mean delay and mean energy consumption. High service rates generally imply low job delay yet typically incur higher computing energy costs, because the power consumption of devices is often a strictly convex, increasing function of the processing speed.

Wierman, Andrew, and Tang (2009) and Andrew, Lin, and Wierman (2010), in their studies of similar models and cost objectives, showed that when $\lambda$ is known, a single speed architecture ($s_1 = s_2 = \cdots$) yields comparable performance to an optimally tailored sequence of speeds. In that sense, a simple architecture can be sufficient. The pitfall mentioned in those studies is that in the more realistic setting in which $\lambda$ is unknown, multi-speed architectures are generally more robust. More precisely, for a fixed design arrival rate $\lambda_d$, a multi-speed architecture whose speeds are optimised for $\lambda_d$ greatly outperforms a single-speed architecture also optimised for $\lambda_d$ in cases where the actual arrival rate $\lambda_a$ differs from $\lambda_d$. Andrew et al. (2010) also discussed service fairness along with optimality and robustness, and argued that only two out of three objectives can be achieved at the same time, although progress towards achieving all three has since been made by Elahi, Williamson, and Woelfel (2012).

The robust multi-speed architecture in Andrew et al. (2010) generally allows each system occupancy, $n$, to have an arbitrary speed $s_n$ which is not subjected to an overall maximum bound. Such an architecture generally does not come without additional costs of manufacturing, device-footprint, control complexity and other application specific issues. The question then remains: How many speeds are required in order to allow for robust speed-scaled systems? Or equivalently: How does architecture selection affect the robustness of the system to parameter uncertainty?

In specifying an architecture, one aspect is the number of available speeds, and another is the ability of the control to adapt to the actual load. We consider two regimes: fixed allocation (FA) and adaptive allocation (AA). In both regimes, the set of available speeds is fixed at design time, yet the way states are mapped to speeds varies:

• Fixed allocation (FA): There is a fixed (design-time) mapping, setting $s_n$ to be one of the available speeds. In this case there is no run-time control calculation.

• Adaptive allocation (AA): It is assumed that the true arrival rate $\lambda_a$ is accurately estimated at run-time, hence allowing $s_n$ to be mapped to one of the available speeds in a way that optimises performance for the given $\lambda_a$.

It is clear that adaptive allocation provides greater robustness than fixed allocation, yet in many computing scenarios, this is not without additional design complexity. Note that our adaptive allocation scheme assumes that $\lambda_a$ is estimated perfectly and that the resulting system is in steady state with that $\lambda_a$. One may also consider adaptive control in the sense of estimating $\lambda_a$ and optimising the control in a time-varying environment, yet this is not the focus of our current work.

In this paper, we examine optimal designs for architectures with a finite available number of speeds subject to a fixed maximum. We compare robustness measures between the AA and FA regimes. We introduce two robustness measures: global robustness, applicable when nothing is known about $\lambda_a$, and local robustness, applicable when $\lambda_a \approx \lambda_d$. These measures of system architecture robustness are important in their own right and may be applied to similar models. We also show numerically an interesting counter-intuitive result: Having more speeds under the FA regime can be less globally robust as $\lambda_a$ becomes sufficiently large.

1.1. Related work

There has been extensive work on optimal control of birth–death processes and related models. Low (1974) considered a similar queue but used price policies to control the unknown arrival rate to maximise the long run average expected reward per unit time. George and Harrison (2001) considered the case where the arrival rate is known with state-dependent service rates and developed a numerical technique that imposes no upper bound on the service rate. Ata and Shneorson (2006) considered controlling both arrival and service rates to maximise the system objective, which is the average welfare in their model. Efrosinin and Semenova (2009) looked at a slightly different system where server reliability is uncertain. Control for an M/M/s system was discussed by Serfozo (1981) assuming arrival and service rates can be chosen upon arrival based on the current system occupancy. Jain, Sharma, and Shekhar (2005) investigated a queue-dependent multi-processor service system which also adopts dynamic service rates. The trade-off between delay and energy was also studied in Goseling, Boucherie, and van Ommeren (2009) in the context of a system with multiple queues. None of these papers deals with robustness properties in depth.

Research on speed scaling often studies the worst-case performance, rather than average performance. In this context, Chan et al. (2007), Lam, Lee, To, and Wong (2008) and Bansal, Chan, Lam, and Lee (2008) have considered the linear combination of delay and energy with an upper bound on the speed. Unlike the present paper, they make the usual assumption that all speeds below some upper bound are permissible and do not consider, as we do, architectures constrained to a small finite number of speeds.

1.2. Robustness, parameter uncertainty and adaptive control of queues

It appears that the field of performance analysis and control of queues in the face of parameter uncertainty is almost unstudied. For illustration, observe the annotated bibliography, Nazarathy and Pollet (2011), containing a comprehensive list of papers in the literature dealing with parameter estimation in queues. There are under 250 such publications, and almost none of them explicitly deals with control in view of uncertainty. An exception is Jain, Lim, and Shanthikumar (2010), dealing with robustness with respect to the probability laws of the underlying stochastic processes using advanced point process theory. A comprehensive survey of robust control methods in the context of operations research is in Lim, Shanthikumar, and Shen (2006), yet it appears that the robustness point of view has not yet been fully investigated in queues. Note though that one may view the general line of research of insensitivity (cf. Taylor 2011) as supplying robust results. Yet these are with respect to distributions and typically not with respect to unknown demand rates.

Our contribution is mainly conceptual and numerical, yet we believe it bears significant importance for computer system engineers as well as for future research on design and control of systems in view of parameter uncertainty. The remainder of the paper is organised as follows: Section 2 defines the model and objective function, and surveys related work. Section 3 presents the robustness measure results for both global and local robustness. The results are then summarised in Section 4 where further open questions are put forward.

2. Model and design framework

2.1. Model and notations

We consider an M/G/1-PS queue with state dependent service rates. Jobs arrive according to a Poisson process with rate $\lambda > 0$. Job sizes are finite mean i.i.d. random variables independent of the arrival process. Without loss of generality we assume the mean job size is 1. Let $Q(t)$ denote the number of jobs in the system at time $t$. The PS scheme is as follows: At time $t$, if $Q(t) = n$, each job is served at a rate $s_n/n$, where the sequence of speeds, $0 = s_0 < s_1 \le s_2 \le \cdots$, is a result of the design and control of the system.

The insensitivity of the M/G/1-PS, even under speed scaling (cf. Kelly 1979), allows us to ignore the actual shape of the job-size distribution, as it does not affect the law of the process $Q(t)$. In other words, the occupancy distribution of this queue is the same as that of an M/M/1 queue with the same arrival and state-dependent service rates. We therefore limit ourselves to performance objectives that depend only on the marginal occupancy distribution. The process $Q(t)$ is represented by an irreducible continuous time birth-death process on the state space $\{0, 1, \ldots\}$; see for example, Norris (1997). We assume $\lambda < \sup\{s_1, s_2, \ldots\}$ and hence $Q(t)$ is positive-recurrent with a unique stationary distribution, $(\pi_0, \pi_1, \ldots)$, $\pi_i = \lim_{t \to \infty} P(Q(t) = i)$, satisfying the partial balance equations, $\lambda \pi_i = s_{i+1} \pi_{i+1}$ and $\sum_{i=0}^{\infty} \pi_i = 1$.

In this model, speeds are constrained to be within the set $[0, \mu_{\max}]$. The number of unique speeds is specified by the architecture parameter, $K \in \{0, 1, 2, \ldots\} \cup \{\infty\}$. For finite $K$, the available set of speeds is $\mathcal{M} = \{0 = \mu_0, \mu_1, \ldots, \mu_K, \mu_{\max}\}$, with $\mu_i \le \mu_{i+1} \le \mu_{\max}$. Hence there are $K + 2$ available speeds. If $K = \infty$, any speed within $[0, \mu_{\max}]$ is allowed. We refer to the latter case as the continuum speed architecture, which is equivalent to the multi-speed architecture discussed in Wierman et al. (2009), except that the speeds are now constrained by an upper bound. In this case, we denote $s_n$ by $\mu_n$ for simplicity of the notation below.

Crabill (1972) showed that the optimal speeds are non-decreasing, hence the optimal policy would be a threshold policy characterised by thresholds $\theta_1, \ldots, \theta_K$. Thus, for finite $K$, our policies remain optimal if we assume the speeds are monotonically non-decreasing. The mapping of $s_n$ to $\mathcal{M}$ can then be specified by a non-decreasing sequence of integer thresholds such that $0 = \theta_0 < \theta_1 \le \theta_2 \le \cdots \le \theta_K < \theta_{K+1} = \infty$. For a given queue occupancy $n > 0$, let $J(n) = \max\{k : \theta_k < n\}$ with $J(0) = -1$. Now the speed-scaling mapping is given by: $s_n = \mu_{J(n)+1}$. For example, if $K = 5$, $\theta_3 = 17$ and $\theta_4 = 20$ then $s_{18} = s_{19} = s_{20} = \mu_4$ while $s_{17} = \mu_3$ and $s_{21} = \mu_5$.

The performance metric we consider is the average running cost per unit time. The running cost of a single job consists of two parts: the sojourn time in the system ($T_{\text{waiting}}$) and the energy consumed by processing it ($E$). Let $Z/\lambda$ denote the running cost for a single job. Then $Z/\lambda = \beta T_{\text{waiting}} + E$. The average running cost per job is then $\mathbb{E}[Z]/\lambda = \beta\,\mathbb{E}[T_{\text{waiting}}] + \mathbb{E}[E]$. The average running cost per unit time, which we will henceforth refer to as simply ‘cost’, is then obtained by multiplying both sides by $\lambda$ and applying Little’s law (Little 1961) to give

$$z = \mathbb{E}[Z] = \beta\,\mathbb{E}[N] + \mathbb{E}[P_N], \tag{1}$$

where $P_n$ denotes the power consumption rate when the occupancy is $n$. This objective has been studied previously in both the stochastic context (George and Harrison 2001; Wierman et al. 2009) and in worst-case contexts (Pruhs, Uthaisombut, and Woeginger 2008). The parameter $\beta$ indicates the relative cost of delay. This can be omitted by the appropriate choice of units, but we retain it to emphasise that the relative weights given to $N$ and $P_n$ are problem specific. Often $P_n$ is a strictly convex non-decreasing function of the speed, and we assume that

$$P_n = s_n^{\alpha}, \quad \alpha > 1. \tag{2}$$
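The threshold mapping $s_n = \mu_{J(n)+1}$ and the power law of Equation (2) can be sketched in a few lines of Python. This is only an illustration: the function names `speed` and `power` and the numerical values of the speeds and thresholds are assumptions chosen to reproduce the $K = 5$ example in the text, not values from the paper.

```python
from bisect import bisect_left


def speed(n, mu, theta, mu_max):
    """Service speed s_n = mu_{J(n)+1} for occupancy n.

    mu    -- [mu_1, ..., mu_K], the K intermediate speeds (non-decreasing)
    theta -- [theta_1, ..., theta_K], integer thresholds (non-decreasing)
    """
    if n == 0:
        return 0.0  # s_0 = mu_0 = 0: an empty system does no work
    speeds = [0.0] + list(mu) + [mu_max]  # mu_0, mu_1, ..., mu_K, mu_max
    # J(n) = max{k : theta_k < n} with theta_0 = 0, which for n > 0 equals
    # the number of thresholds theta_1..theta_K strictly below n.
    j = bisect_left(theta, n)
    return speeds[j + 1]


def power(s, alpha=3.0):
    """Power drawn at speed s under Equation (2), P = s**alpha."""
    return s ** alpha


# Hypothetical values: K = 5 with theta_3 = 17 and theta_4 = 20 as in the text.
mu = [0.1, 0.2, 0.3, 0.4, 0.5]
theta = [5, 10, 17, 20, 30]
for n in (17, 18, 20, 21, 31):
    print(n, speed(n, mu, theta, mu_max=1.0))
```

With these values, occupancies 18 to 20 map to $\mu_4$, occupancy 17 to $\mu_3$, occupancy 21 to $\mu_5$, and any occupancy above $\theta_5$ to $\mu_{\max}$.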

For a given architecture specification, the design variables can be cast as the vectors

$$\mu = (\mu_1, \ldots, \mu_K), \qquad \theta = (\theta_1, \ldots, \theta_K),$$

which may be taken to be infinite vectors if $K = \infty$. Then with $\beta$ and $\alpha$ fixed, the cost can be written as a function, denoted $z_K(\mu, \theta, \lambda)$, of the architecture $K$, the design variables and the load. Let $\rho_i = \lambda/\mu_i$ for $i = 1, \ldots, K + 1$ (with $\mu_{K+1} = \mu_{\max}$), let $\rho^{\bullet}_i = \prod_{j=1}^{i} \rho_j^{\theta_j - \theta_{j-1}}$ for $i = 1, \ldots, K$ (with the empty product $\rho^{\bullet}_0 = 1$), let $I = \{i : i \in \{1, 2, \ldots, K + 1\}, \rho_i \ne 1\}$ and let $\bar{I} = \{i : i \in \{1, 2, \ldots, K + 1\}, \rho_i = 1\}$. Then $z_K(\mu, \theta, \lambda)$ is

$$\pi_0 \Bigg[ \sum_{i \in I} \rho^{\bullet}_{i-1} \Bigg( \beta\, \frac{(\theta_{i-1} - \theta_i \rho_i^{\theta_i - \theta_{i-1}})(1 - \rho_i) + (\rho_i - \rho_i^{1 + \theta_i - \theta_{i-1}})}{(1 - \rho_i)^2} + \frac{\rho_i - \rho_i^{\theta_i - \theta_{i-1} + 1}}{1 - \rho_i}\, \mu_i^{\alpha} \Bigg) + \sum_{i \in \bar{I}} \rho^{\bullet}_{i-1} \Bigg( \beta\, \frac{\theta_i(\theta_i - 1) - \theta_{i-1}(\theta_{i-1} - 1)}{2} + (\theta_i - \theta_{i-1})\, \mu_i^{\alpha} \Bigg) \Bigg], \tag{3}$$

with the normalising constant

$$\pi_0 = \Bigg[ \sum_{i \in I} \rho^{\bullet}_{i-1}\, \frac{1 - \rho_i^{\theta_i - \theta_{i-1}}}{1 - \rho_i} + \sum_{i \in \bar{I}} \rho^{\bullet}_{i-1} (\theta_i - \theta_{i-1}) \Bigg]^{-1}. \tag{4}$$

This follows from straightforward (yet tedious) computations as detailed in Appendix 1. Observe that in practical situations, the right hand summations (over $\bar{I}$), in both Equations (3) and (4), remain empty.
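As a sanity check on Equations (3) and (4), the closed form can be compared with a direct truncated summation over the birth-death stationary distribution $\pi_n \propto \prod_{m=1}^{n} \lambda/s_m$. The sketch below is an illustration, not the authors' implementation: the function names, the truncation level `n_max` and the test parameters are assumptions, and the closed form takes the delay terms to be weighted by $\beta$, in line with Equation (1).

```python
import math


def cost_closed_form(mu, theta, lam, mu_max=1.0, beta=0.1, alpha=3.0):
    """Evaluate Equations (3)-(4); requires lam < mu_max."""
    K = len(mu)
    m = [0.0] + list(mu) + [mu_max]      # m[i] = mu_i, with mu_{K+1} = mu_max
    th = [0] + list(theta) + [math.inf]  # th[i] = theta_i, theta_{K+1} = inf
    rho_dot = 1.0                        # rho^bullet_{i-1}, empty product = 1
    bracket = 0.0                        # bracketed sum of Equation (3)
    norm = 0.0                           # 1 / pi_0 from Equation (4)
    for i in range(1, K + 2):
        rho, a, b = lam / m[i], th[i - 1], th[i]
        d = b - a
        # In the last block d is infinite and rho < 1, so rho**d and
        # b * rho**d both vanish.
        rho_d = 0.0 if math.isinf(d) else rho ** d
        b_rho_d = 0.0 if math.isinf(d) else b * rho_d
        if rho != 1.0:                   # index i belongs to I
            occ = ((a - b_rho_d) * (1 - rho) + rho - rho * rho_d) / (1 - rho) ** 2
            busy = (rho - rho * rho_d) / (1 - rho)
            norm += rho_dot * (1 - rho_d) / (1 - rho)
        else:                            # index i belongs to I-bar
            occ = (b * (b - 1) - a * (a - 1)) / 2
            busy = d
            norm += rho_dot * d
        bracket += rho_dot * (beta * occ + busy * m[i] ** alpha)
        rho_dot *= rho_d
    return bracket / norm


def cost_direct(mu, theta, lam, mu_max=1.0, beta=0.1, alpha=3.0, n_max=5000):
    """Truncated sum of (beta*n + s_n**alpha) * pi_n over n = 0..n_max."""
    speeds = [0.0] + list(mu) + [mu_max]
    weight, norm, acc = 1.0, 1.0, 0.0    # unnormalised pi_0 = 1
    for n in range(1, n_max + 1):
        s = speeds[sum(t < n for t in theta) + 1]  # s_n = mu_{J(n)+1}
        weight *= lam / s                          # partial balance
        norm += weight
        acc += (beta * n + s ** alpha) * weight
    return acc / norm


print(cost_closed_form([0.4], [3], 0.5), cost_direct([0.4], [3], 0.5))
```

For $K = 0$ the model reduces to an M/M/1 queue at speed $\mu_{\max}$, so with $\lambda = 0.5$, $\mu_{\max} = 1$ and $\beta = 0.1$ both functions should return $\beta \rho/(1-\rho) + \rho\,\mu_{\max}^{\alpha} = 0.6$.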

2.2. Design framework

In our framework, the design variables are optimised for a pre-determined arrival rate, $\lambda_d < \mu_{\max}$ (‘d’ stands for design), yet at runtime there is often another arrival rate $\lambda_a < \mu_{\max}$ (‘a’ stands for actual), where typically $\lambda_d \ne \lambda_a$. For $K < \infty$, in the FA case, let $\mu^*_{\mathrm{FA}}(\lambda_d)$ and $\theta^*_{\mathrm{FA}}(\lambda_d)$ denote the optimising design variables $\mu$ and $\theta$ of

$$\min_{\mu, \theta}\; z_K(\mu, \theta, \lambda_d),$$

subject to the coordinates of $\mu$ and $\theta$ being ordered. Hence given a design assumption of $\lambda_d$, the optimal design would be $(\mu^*_{\mathrm{FA}}(\lambda_d), \theta^*_{\mathrm{FA}}(\lambda_d))$.

In the AA case, use the fixed component $\mu^*_{\mathrm{FA}}(\lambda_d)$ as above and consider the optimisation

$$\min_{\theta}\; z_K(\mu^*_{\mathrm{FA}}(\lambda_d), \theta, \lambda_a).$$

Denote the optimiser as $\theta^*_{\mathrm{AA}}(\lambda_d, \lambda_a)$. Hence given a design assumption of $\lambda_d$, the optimal design remains $\mu^*_{\mathrm{FA}}(\lambda_d)$ as above, and further, based on actual measurements of $\lambda_a$, the optimal control is $\theta^*_{\mathrm{AA}}(\lambda_d, \lambda_a)$.

For a given architecture, solving the fixed allocation design problem or the adaptive allocation control problem involves optimisation of $z_K(\cdot)$. For $K < \infty$, we have implemented the optimisation using a Gauss–Seidel method with a local-search refinement. In case of $K = \infty$, we use dynamic-programming. More details are in Appendix 2.
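The difference between the FA design problem and the AA control problem can be illustrated for $K = 1$ with a plain exhaustive grid search. This is a stand-in chosen for brevity, not the Gauss–Seidel method with local-search refinement described above; the grids, the truncation level `n_max`, the loads and all function names are assumptions made for the sketch.

```python
def cost(mu1, theta1, lam, mu_max=1.0, beta=0.1, alpha=3.0, n_max=400):
    """Truncated-sum cost z_1 for a single intermediate speed (K = 1)."""
    weight, norm, acc = 1.0, 1.0, 0.0          # unnormalised pi_0 = 1
    for n in range(1, n_max + 1):
        s = mu1 if n <= theta1 else mu_max     # s_n = mu_{J(n)+1}
        weight *= lam / s                      # partial balance
        norm += weight
        acc += (beta * n + s ** alpha) * weight
    return acc / norm


MU_GRID = [0.05 * k for k in range(1, 21)]     # candidate speeds in (0, 1]
THETA_GRID = range(1, 31)                      # candidate thresholds


def design_fa(lam_d):
    """FA design: minimise over (mu1, theta1) jointly at the design load."""
    return min(((m, t) for m in MU_GRID for t in THETA_GRID),
               key=lambda p: cost(p[0], p[1], lam_d))


def control_aa(mu1, lam_a):
    """AA control: keep the speed, re-optimise the threshold at the actual load."""
    return min(THETA_GRID, key=lambda t: cost(mu1, t, lam_a))


lam_d, lam_a = 0.3, 0.8                        # hypothetical design and actual loads
mu_fa, theta_fa = design_fa(lam_d)
theta_aa = control_aa(mu_fa, lam_a)
z_fa = cost(mu_fa, theta_fa, lam_a)            # FA: design thresholds kept at run-time
z_aa = cost(mu_fa, theta_aa, lam_a)            # AA: thresholds adapted to lam_a
print(f"FA: mu1={mu_fa:.2f}, theta1={theta_fa}, cost at lam_a = {z_fa:.4f}")
print(f"AA: theta1={theta_aa}, cost at lam_a = {z_aa:.4f}")
```

Because the FA threshold is itself a member of the grid searched by `control_aa`, the AA cost at the actual load can never exceed the FA cost, which mirrors the observation that adaptive allocation is at least as robust as fixed allocation.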

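Note that in the continuum speed case ($K = \infty$) under FA, the thresholds are $\theta_i = i$, which allows Equations (3) and (4) to be represented as

$$z_K(\mu, \theta, \lambda) = \sum_{n=0}^{\infty} (\beta n + \mu_n^{\alpha})\, \pi_n = \sum_{n=0}^{\infty} \frac{(\beta n + \mu_n^{\alpha})\, \lambda^n}{\prod_{i=1}^{n} \mu_i} \Bigg/ \Bigg( 1 + \sum_{n=1}^{\infty} \frac{\lambda^n}{\prod_{i=1}^{n} \mu_i} \Bigg). \tag{5}$$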

2.3. Practical implication

The foregoing model was motivated by the study of a multi-tasking operating system on a processor with typical power saving features. Multi-tasking operating systems allow multiple jobs to be run in parallel. A simple but reasonable model is to treat the sharing policy as processor sharing. Hence, we model the dynamics as an M/G/1 processor sharing queue with a constant arrival rate $\lambda$ and occupancy-dependent service rate $s_n$ in the state $n$. As explained in Wierman et al. (2009), processor sharing is insensitive to the job size distribution even when the service rate depends on the occupancy. This greatly simplifies the evaluation of any performance metric depending only on the occupancy distribution, as considered here.

We are specifically interested in the case that speed variation is achieved using dynamic voltage and frequency scaling (DVFS) in a complementary metal-oxide-semiconductor (CMOS) integrated circuit. The processing speed of such systems is often assumed to be proportional to the clock frequency $f$, although issues such as cache performance have an increasing influence. The power consumption is often modelled as proportional to the cube of the clock frequency. This is because the dynamic power in CMOS is proportional to $V^2 f$, where $V$ is the scalable voltage, which was historically scaled proportional to clock frequency, as mentioned in Chandrakasan, Sheng, and Brodersen (1992). However, some CMOS devices do not scale $V$ this way, and their power is closer to quadratic in speed $f$ (Wierman et al. 2009). DVFS is becoming less important because Dennard’s scaling law (Dennard, Gaensslen, Rideout, Bassous, and LeBlanc 1974) is no longer used, resulting in less flexibility in scaling voltages. However, future architectures may still result in power that is super-linear in speed. Multiple processing core architectures are increasingly adopted by current processor designs, replacing DVFS (Kumar, Zyuban, and Tullsen 2005). In multi-core architectures, speed scaling of parallel workloads can be achieved by turning cores on and off. If cores are heterogeneous, this can also result in power $P_n$ being a convex function of the speed as considered here, according to Hofstee (2005).

Prior work such as George and Harrison (2001) or Wierman et al. (2009) assumed no upper limit of system speed. However, as technology limits always place an upper bound on the processing speed, we constrain all speeds $s_n$ to be in the range $[0, \mu_{\max}]$ for some given constant $\mu_{\max}$. Since our interest is to see the effect of the number of speeds, we constrain the total number of speeds to be $K + 2$, with speeds from $\mathcal{M}$. Our decision variables include the $K$ speeds $\mu_1$ to $\mu_K$.

In the CMOS situation we are modelling, different decision variables are decided on at different phases in the design process. We assume the $\mu_i$ are fixed properties of a given piece of hardware. We assume they must be chosen when that chip is designed, before it is known what load it will be subjected to. The thresholds $\theta_i$ are typically implemented in software in the operating system, and can be determined later based on a real-time estimate of the load, or of $\beta$. In contrast, $K$ might determine the size of a software-visible register that stores the current speed, or might determine the number of external pins required to signal this information; for compatibility reasons, this information may need to be held constant over an entire family of chips sharing the same instruction set architecture (ISA). For that reason, we concern ourselves more with robustness resulting from the choice of $K$ as opposed to robustness of the individual $\mu_i$.

Note that this objective places no cost on switching between speeds, as this switching cost is relatively small in comparison with the energy cost or the delay cost. This assumption is made because the typical switching time between speeds is of the order of $10^{-4}$ seconds, as shown in an experiment by Hartman (2008), while the job service time is often of the order of seconds.

3. Quantifying robustness

In this section we present the main contribution of our paper: quantifying the effects of parameter uncertainty due to discrepancies between $\lambda_d$ and $\lambda_a$ for various architecture specifications. We consider the error metrics $\Delta_{\mathrm{FA}}(\cdot)$ and $\Delta_{\mathrm{AA}}(\cdot)$ for fixed allocation and adaptive allocation, respectively:

$$\Delta_{\mathrm{FA}}(\lambda_a, \lambda_d, K) = z_K(\mu^*_{\mathrm{FA}}(\lambda_d), \theta^*_{\mathrm{FA}}(\lambda_d), \lambda_a) - z_{\infty}(\mu^*_{\mathrm{FA}}(\lambda_a), \theta^*_{\mathrm{FA}}(\lambda_a), \lambda_a), \tag{6}$$

$$\Delta_{\mathrm{AA}}(\lambda_a, \lambda_d, K) = z_K(\mu^*_{\mathrm{FA}}(\lambda_d), \theta^*_{\mathrm{AA}}(\lambda_d, \lambda_a), \lambda_a) - z_{\infty}(\mu^*_{\mathrm{FA}}(\lambda_a), \theta^*_{\mathrm{FA}}(\lambda_a), \lambda_a). \tag{7}$$

Both metrics compare the cost under a given architecture specification and design for a given $\lambda_d$ to the minimal possible cost, $z_{\infty}(\mu^*_{\mathrm{FA}}(\lambda_a), \theta^*_{\mathrm{FA}}(\lambda_a), \lambda_a)$, for this $\lambda_a$, attained with $K = \infty$ and $\lambda_d = \lambda_a$. The metrics capture the distance to the optimal cost, indicating the effect of parameter uncertainty on performance: a low $\Delta$ value implies ‘more robustness’.

We have calculated the error metrics for a variety of combinations of $\lambda_a$ and $\lambda_d$ and architecture specifications $K$ for the ‘FA’ and ‘AA’ cases. In all cases we used $\mu_{\max} = 1$ and $\alpha = 3$. Our results are best presented by fixing $\lambda_d$ at one of several values: 0.3 (design for light load), 0.5 (design for medium load) or 0.7 (design for moderately heavy load). We considered two values of the relative cost of delay, $\beta$: 0.01 and 0.1. The former places more importance on energy savings than the latter. We then varied $\lambda_a$ over a fine grid within $(0, 1)$ and evaluated the error metrics for each value. Note that each such evaluation requires carrying out two optimisations, one for $z_K(\cdot)$ and one for $z_{\infty}(\cdot)$, as outlined in Appendix 2.

Since we are looking at the error metric over the whole possible range $\lambda_a \in (0, \mu_{\max})$, assessment of $\Delta$ over that range may be viewed as measuring global robustness. This is in contrast to the local robustness defined later in this section.

3.1. Global robustness in fixed allocation and adaptive allocation designs

3.1.1. The FA case

Figure 1 shows the metric $\Delta_{\mathrm{FA}}(\lambda_a, \lambda_d, K)$ for architectures of increasing $K$, from $K = 0$ (‘Only $\mu_{\max}$’) and $K = 1$ (‘Two speeds’) up to a continuum of speeds, for a system that uses both the speeds and the thresholds optimised for $\lambda_d$. Note that all examples include $\mu_{\max}$ as one (the largest) of the available speeds; this is in contrast to the single-speed ‘gated static’ policy studied in Wierman et al. (2009), in which the single speed was optimised for the design load, and the ‘continuum’ case could use arbitrarily high speeds as the occupancy increases.

Recall that Wierman et al. (2009) observed substantially greater robustness when using a system designed with no constraints on the number of speeds than when using a system with a single optimally chosen speed. This was explained by the fact that, if the actual load is much higher than the design load, then the mean occupancy will be higher; since the speed is an increasing function of the occupancy, this increased occupancy causes the average speed used to be higher, as is appropriate for a higher load.

It is natural to expect that a similar conclusion would apply to the model studied here, in which the system is forced to include the maximum speed $\mu_{\max}$ as one available speed, and that increasing the number of available speeds will monotonically increase the robustness. When the actual load is low, this is indeed the case. In fact, having even a single additional speed (‘two speeds’) yields most of the benefit obtained from having a continuum of speeds.

However, as the actual load approaches $\mu_{\max}$, designs with more available speeds actually become monotonically less robust, in the sense that the penalty $\Delta_{\mathrm{FA}}(\lambda_a, \lambda_d, K)$ for mis-estimating the load increases. This is most apparent when the design load is low, and little weight ($\beta$) is given to delay.

[Figure 1: six panels, one for each combination of β ∈ {0.01, 0.10} and λd ∈ {0.30, 0.50, 0.70}. Horizontal axis: actual load (λa), from 0 to 1. Vertical axis: ΔFA(·), from 0 to 0.8. Curves: Only smax, Two speeds, Four speeds, Eight speeds, Continuum speeds.]

Figure 1. Fixed allocation (FA): The error metric, or the distance to the optimal cost, when a policy optimised for load λd is run with fixed thresholds at the actual load λa.

This paradox is explained by noting that fixed thresholds were used. When the load is almost μmax, the average processing speed must also be almost μmax. Thus the system in which μmax is the only speed available is optimal. If more speeds are available, then the system will need to maintain a higher occupancy N to cause speed μmax to be used.

If the second highest speed μK−1 is much less than μK, then the occupancy will be increased by almost θK, increasing the cost by almost βθK, as illustrated in Figure 2. This figure shows the system occupancy and cost at a very high λa (λa = 0.98μmax) for small λd, so that the second highest speeds are relatively small. The mean occupancies and costs are computed for two cases: (a) using only μmax (K = 0) and (b) the optimal settings for design load λd for a four-speed system (K = 3). When running only with μmax, the costs are constant regardless of λd. However, for the multiple-speed system, the occupancy is increased by roughly θK and the cost is increased by roughly βθK.
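The occupancy shift described above can be reproduced with a small numerical sketch. It is not the authors' code: the function names, the two-speed configuration (a slow speed of 0.1 below a threshold of 10), the truncation level and the normalisation μmax = 1 are all assumptions chosen for illustration. The only modelling ingredient taken from the text is that the queue is a birth-death chain whose service rate switches at the thresholds.

```python
import numpy as np


def stationary_distribution(lam, speeds, thresholds, n_max=5000):
    """Truncated stationary distribution of a threshold-controlled queue.

    speeds[i] is used while thresholds[i] < n <= thresholds[i + 1]; the last
    speed is used for every occupancy above the last threshold.
    thresholds[0] must be 0. All names here are illustrative assumptions.
    """
    n = np.arange(1, n_max + 1)
    # Index of the speed that serves each occupancy n >= 1.
    idx = np.searchsorted(thresholds, n, side="left") - 1
    mu = np.asarray(speeds, dtype=float)[idx]
    # Detailed balance pi_n = pi_{n-1} * lam / mu_n, accumulated in log space.
    log_pi = np.concatenate(([0.0], np.cumsum(np.log(lam / mu))))
    pi = np.exp(log_pi - log_pi.max())
    return pi / pi.sum()


def mean_occupancy(pi):
    """E[N] for a distribution over occupancies 0, 1, ..., len(pi) - 1."""
    return float(np.dot(np.arange(len(pi)), pi))


lam, mu_max, theta = 0.98, 1.0, 10          # assumed example values
only_max = stationary_distribution(lam, [mu_max], [0])
two_speed = stationary_distribution(lam, [0.1, mu_max], [0, theta])
shift = mean_occupancy(two_speed) - mean_occupancy(only_max)
print(f"E[N] increases by {shift:.2f} for a threshold of {theta}")
```

With a slow speed far below the arrival rate, the chain spends almost no time below the threshold, so the occupancy behaves like that of the single-speed queue shifted up by roughly the threshold.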

[Figure 2: two panels, both titled λa = 0.90μmax, β = 0.01, with horizontal axis design load (λd) from 0 to 0.1. Left panel: E[N], with curves E[N|K=0], E[N|K=3] and E[N|K=0] + θ3. Right panel: z(·), with curves zK=0, zK=3 and zK=0 + βθ3.]

Figure 2. System occupancies and general costs when actual load is very close to μmax (λa = 0.9μmax). This figure compares costs for K = 0 and K = 3 with low design loads.


[Figure 3: six panels, for β = 0.01 and β = 0.10 and for λd = 0.30, 0.50 and 0.70. Horizontal axis: actual load (λa); vertical axis: ΔAA(·). Curves: Two speeds, Four speeds, Eight speeds, Continuum speeds.]

Figure 3. Adaptive allocation (AA): Distance to the optimal cost when using the policy optimised for load λd under an actual load λa, with adjustable thresholds.

3.1.2. The AA case

Results for adaptive allocation are in Figure 3. Note that here the robustness monotonically increases as the number of available speeds increases, as intuition suggests. As expected, the penalty for mis-estimating the load is small near the design load, λd. However, it also tends to 0 when the actual load approaches μmax, even though this is far from λd. This is again because at such high loads, the system must nearly always operate at speed μmax, which is implemented in all designs regardless of λd. The resulting bi-modality raises the interesting question of what speeds are ‘maximally robust’ in the sense of minimising the maximum over λa of ΔAA(λa, λd, K). This optimum depends on β, but Figure 3 suggests that it corresponds roughly to designing the system to work at a load of about μmax/3. This is in contrast to the common practice of designing CPUs for desktop computers with a second speed that is very close to the maximum speed.

3.1.3. Comparison of FA and AA

Adjusting the thresholds θi occurs on a much slower time-scale than adjusting the actual speeds. Implementing dynamic thresholds incurs a non-negligible additional cost to software development. To decide whether or not to implement dynamic thresholds, it is important to quantify how much benefit they provide over using dynamic (state-dependent) speeds with static thresholds. Figure 4 shows this improvement for several parameter combinations. This suggests that dynamically adjusting the thresholds may not be justified unless the design load is well below the maximum load for which the system can be stable.

[Figure 4: single panel titled ‘Improvement of having switching flexibility’. Horizontal axis: actual load (λa); vertical axis: ΔFA(λa, λd, 3) − ΔAA(λa, λd, 3). Curves: β = 0.01, λd = 0.30; β = 0.01, λd = 0.70; β = 0.10, λd = 0.30; β = 0.10, λd = 0.70.]

Figure 4. Benefits in cost reduction of having adjustable thresholds at runtime.


[Figure 5: six panels with horizontal axis number of speeds (2, 4, 8, inf) and vertical axis R(·). Three panels are for β = 0.01, 0.05 and 0.10, each with curves for λd = 0.30, 0.50 and 0.70; three panels are for λd = 0.30, 0.50 and 0.70, each with curves for β = 0.01, 0.05 and 0.10.]

Figure 5. Local robustness of a fixed allocation (FA) as function of the number of speeds: R(.).

3.2. Quantitative measure of ‘local’ robustness

When the load is known approximately but not perfectly, the designer may still be interested in the sensitivity of costs to minor deviations of λa from λd. For this we introduce the notion of local robustness defined in terms of the curvature of the cost penalty Δ. More precisely, define the local robustness of the system with K + 2 speeds optimised for a load of λd as

\[
R(\lambda_d, K) = \left[ \left. \frac{d^2 \Delta_{FA}(\lambda_a, \lambda_d, K)}{d\lambda_a^2} \right|_{\lambda_a = \lambda_d} \right]^{-1}. \tag{8}
\]

It is sensible to use the second derivative here instead of the first, since the first derivative is approximately zero at the design load; if ΔFA were defined as

\[
z_K\big(\mu^*_{FA}(\lambda_d), \theta^*_{FA}(\lambda_d), \lambda_a\big) - z_K\big(\mu^*_{FA}(\lambda_a), \theta^*_{FA}(\lambda_a), \lambda_a\big),
\]

it would be exactly zero.

For a given setting, we estimate R(λd, K) by taking a least squares fit (Boyd and Vandenberghe 2004) of a parabola to the difference in costs over 40 equally spaced points in the interval [λd − μmax/50, λd + μmax/50]. The value of μmax/50 was chosen to be large enough to avoid numerical errors, yet small enough to allow an accurate estimate of the second derivative. Note that such numerical evaluation of the local robustness is computationally intensive, since each point used for the least squares fit requires solving an optimisation.
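The estimation procedure just described can be sketched as follows. This is a minimal illustration rather than the authors' implementation: `penalty` is a hypothetical callable standing in for the cost difference as a function of λa (in the paper each evaluation requires an optimisation), and the quadratic used in the example is invented purely so that the answer is known.

```python
import numpy as np


def local_robustness(penalty, lam_d, mu_max=1.0, n_points=40):
    """Estimate R = 1 / (second derivative of the penalty at lam_a = lam_d).

    `penalty` is a hypothetical callable lam_a -> cost difference.
    """
    half_width = mu_max / 50.0
    lam_a = np.linspace(lam_d - half_width, lam_d + half_width, n_points)
    values = np.array([penalty(x) for x in lam_a])
    # Least-squares parabola in the centred variable (lam_a - lam_d).
    c2, c1, c0 = np.polyfit(lam_a - lam_d, values, deg=2)
    # For c2*x^2 + c1*x + c0 the second derivative is 2*c2.
    return 1.0 / (2.0 * c2)


def toy_penalty(lam_a):
    """Invented example with known curvature 6 around a design load of 0.5."""
    return 3.0 * (lam_a - 0.5) ** 2


robustness = local_robustness(toy_penalty, lam_d=0.5)
print(robustness)
```

Centring the abscissa on λd keeps the fit well conditioned over such a narrow interval; the fitted quadratic coefficient is half the second derivative, hence the factor of 2.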

As in the global robustness cases described above, we consider light load, medium load or moderately heavy load. We also consider three values of β: 0.01 and 0.1 as analysed for global robustness as well as an intermediate value of 0.05. We generally expect that allowing more speeds in the architecture specification will make the system more ‘locally robust’. The question is how quickly local robustness improves with the complexity of the architecture. Answers are in Figure 5.

To identify how many speeds are enough, we seek the ‘knee’ of the curve of R(·) against the number of speeds K + 1. In the curve we label the axis with powers of 2, since this determines the number of bits required by the ISA. Note that the final point on the axis, labelled ‘inf’, is the continuum case when there is no constraint on the speeds. The results show that the knee is most pronounced for the smallest λd and appears to be at around six speeds (K = 5). Indeed, eight speeds (K = 7) give robustness almost indistinguishable from that of the continuum case. This suggests that it is sufficient that the ISA allocate three bits to the register specifying the current speed. In general, it appears that the local robustness is increasing in λd when there are few speeds, and decreasing in λd when there are many speeds. Similarly, the local robustness tends to increase as β increases.


4. Conclusion and outlook

This paper has identified and explained an anomaly in the performance of queues with speed scaling optimised for an inaccurate estimate of the load. This effect causes the performance of such a speed scaler operating at high load to degrade as the number of available speeds increases. Specifically, this is due to inappropriate choice of speed at runtime, making it difficult to quantify the improvement in robustness due to such an increase.

However, when the error in the load estimate is low, the performance does monotonically improve as the number of levels increases. This suggests that it may be possible to define a meaningful measure of ‘local robustness’ which could be used to determine the number of speeds required, as we have proposed in this paper.

This work assumed very simplified models. For example, the speed scaling function Pn only considers active power in CMOS circuits, whereas leakage power is becoming an increasing fraction of total power consumption (Kaxiras and Martonosi 2008). The form of Pn is also suitable for cases where the speed of processing is proportional to the clock speed, whereas current processors are heavily influenced by delays due to memory access. Nevertheless, we expect the qualitative insight to hold more generally: If the speed selection algorithm is based on inaccurate parameter estimates, then having a wider range of speeds available may be counter-productive.

Acknowledgements

This work was funded in part by the Australian Research Council (ARC) under Grant No. FT0991594.

Notes on contributors

Tuan V. Dinh earned his BEng (Hons) in Telecommunications and Internet Technology at Swinburne University of Technology, Australia in 2010. He worked as a research intern at CSIRO ICT, Australia for three months in 2010. He is currently a PhD candidate at the Centre for Advanced Internet Architecture, Swinburne University of Technology. His research field is energy efficiency for cluster computing.

Lachlan Andrew received the BSc, BE and PhD degrees in 1992, 1993 and 1997, from the University of Melbourne, Australia. Since 2008, he has been an Associate Professor at Swinburne University of Technology, Australia, and since 2010 he has been an ARC Future Fellow. From 2005 to 2008, he was a Senior Research Engineer in the Department of Computer Science at Caltech. Prior to that, he was a senior research fellow at the University of Melbourne and a Lecturer at RMIT, Australia. His research interests include energy-efficient networking and performance analysis of resource allocation algorithms. He was co-recipient of the best paper award at IEEE INFOCOM 2011, International Green Computing Conference 2012 and IEEE MASS 2007.

Yoni Nazarathy is a Lecturer in the School of Mathematics and Physics at the University of Queensland, Brisbane, Australia. His research is in the field of applied probability, statistics, queueing networks and control. He obtained his PhD from the University of Haifa, Israel in 2009. He then spent two years as a Post-Doctoral Researcher in the Netherlands, at the European Institute for Statistics, Probability, Stochastic Operations Research and their Applications (EURANDOM) and at CWI (Centrum Wiskunde & Informatica) Amsterdam. He then spent an additional year as a Lecturer at Swinburne University of Technology in Melbourne, where he remains an adjunct member of the Centre for Advanced Internet Architectures. Prior to his PhD studies, he was working as a developer of wireless network protocols and software.

References

Andrew, L.L.H., Lin, M., and Wierman, A. (2010), ‘Optimality, Fairness and Robustness in Speed Scaling Designs’, in Proceedings of the ACM SIGMETRICS, 14–18 June, New York, NY.

Ata, B., and Shneorson, S. (2006), ‘Dynamic Control of an M/M/1 Service System With Adjustable Arrival and Service Rates’, Management Science, 52, 1778–1791.

Bansal, N., Chan, H., Lam, T., and Lee, L. (2008), ‘Scheduling for Speed Bounded Processors’, Automata, Languages and Programming, Part I, 7–11 July, Reykjavik, Iceland.

Bertsekas, D.P. (1999), Nonlinear Programming, Belmont, MA: Athena Scientific.

Bertsekas, D.P. (2005), Dynamic Programming and Optimal Control (3rd ed.), Belmont, MA: Athena Scientific.

Boyd, S., and Vandenberghe, L. (2004), Convex Optimization, New York, NY: Cambridge University Press.

Chan, H., Chan, W., Lam, T., Lee, L., Mak, K., and Wong, P. (2007), ‘Energy Efficient Online Deadline Scheduling’, in Proceedings of the ACM Symposium on Discrete Algorithms, pp. 795–804.

Chandrakasan, A., Sheng, S., and Brodersen, R. (1992), ‘Low-Power CMOS Digital Design’, IEEE Journal of Solid-State Circuits, 27, 473–484.

Crabill, T.B. (1972), ‘Optimal Control of a Service Facility With Variable Exponential Service Times and Constant Arrival Rate’, Management Science, 18, 560–566.

Dennard, R., Gaensslen, F., Rideout, V., Bassous, E., and LeBlanc, A. (1974), ‘Design of Ion-Implanted MOSFET’s With Very Small Physical Dimensions’, IEEE Journal of Solid-State Circuits, 9, 256–268.

Efrosinin, D., and Semenova, O. (2009), ‘Optimal Control of M/M/1 Queueing System With Constant Retrial Rate and Non-Reliable Removable Server’, in Ultra Modern Telecommunications Workshops, 2009 (ICUMT’09), October, pp. 1–6.

Elahi, M., Williamson, C., and Woelfel, P. (2012), ‘Decoupled Speed Scaling: Analysis and Evaluation’, in Proceedings of the International Conference on Quantitative Evaluation of Systems (QEST), September, pp. 2–12.

George, J.M., and Harrison, J.M. (2001), ‘Dynamic Control of a Queue With Adjustable Service Rate’, Operations Research, 49, 720–731.

Goseling, J., Boucherie, R., and van Ommeren, J.K. (2009), ‘Energy Consumption in Coded Queues for Wireless Information Exchange’, in Workshop on Network Coding, Theory, and Applications, 2009 (NetCod’09), June, pp. 30–35.


Hartman, M. (2008), ‘Processor Energy Savings Through Adaptive Voltage Scaling’, Portable Design, 14, 38–41.

Hofstee, H.P. (2005), ‘Power Efficient Processor Architecture and the Cell Processor’, in Proceedings of the High-Performance Computer Architecture (HPCA’05), Washington, DC, USA: IEEE Computer Society, pp. 258–262.

Jain, A., Lim, A., and Shanthikumar, J. (2010), ‘On the Optimality of Threshold Control in Queues With Model Uncertainty’, Queueing Systems, 65, 157–174.

Jain, M., Sharma, G.C., and Shekhar, C. (2005), ‘Processor-Shared Service Systems With Queue-Dependent Processors’, Computers and Operations Research, 32, 629–645.

Kaxiras, S., and Martonosi, M. (2008), Computer Architecture Techniques for Power-Efficiency (1st ed.), San Rafael, CA: Morgan and Claypool Publishers.

Kelly, F.P. (1979), Reversibility and Stochastic Networks (Vol. 40), New York: Wiley.

Kumar, R., Zyuban, V., and Tullsen, D. (2005), ‘Interconnections in Multi-Core Architectures: Understanding Mechanisms, Overheads and Scaling’, in Proceedings of the Computer Architecture (ISCA’05), June, pp. 408–419.

Lam, T., Lee, L., To, I., and Wong, P. (2008), ‘Speed Scaling Functions for Flow Time Scheduling Based on Active Job Count’, in Algorithms - ESA 2008, pp. 647–659.

Lim, A., Shantikumar, J., and Shen, Z.M. (2006), ‘Model Uncertainty, Robust Optimization and Learning’, in Tutorials in Operations Research: Models, Methods, and Applications for Innovative Decision Making, eds. M.P. Johnson, B. Norman, and N. Secomandi, Maryland: INFORMS, pp. 66–94.

Little, J.D.C. (1961), ‘A Proof for the Queuing Formula: L = λW’, Operations Research, 9, 383–387.

Low, D. (1974), ‘Optimal Dynamic Pricing Policies for an M/M/s Queue’, Operations Research, 22, 545–561.

Nazarathy, Y., and Pollet, P.K. (2011), ‘Parameter and State Estimation in Queues and Related Stochastic Models: A Bibliography’, http://www.maths.uq.edu.au/∼pkp/papers/Qest/Qest.html.

Norris, J. (1997), Markov Chains, Cambridge, UK: Cambridge University Press.

Pruhs, K., Uthaisombut, P., and Woeginger, G. (2008), ‘Getting the Best Response for Your Erg’, ACM Transactions on Algorithms, 4, 38:1–38:17.

Serfozo, R. (1981), ‘Optimal Control of Random Walks, Birth and Death Processes, and Queues’, Advances in Applied Probability, 13, 61–83.

Taylor, P.G. (2011), ‘Insensitivity in Stochastic Models’, in Queueing Networks, eds. R.J. Boucherie and N.M. van Dijk, New York, NY: Springer, pp. 121–140.

Wierman, A., Andrew, L.L.H., and Tang, A. (2009), ‘Power-Aware Speed Scaling in Processor Sharing Systems’, in IEEE INFOCOM 2009 Conference, April, pp. 2007–2015.

Wolff, R. (1989), Stochastic Modeling and the Theory of Queues, Englewood Cliffs, NJ: Prentice Hall.

Appendix 1. Objective cost

The analysis in this appendix follows the notation and assumptions of Section 2.

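Proposition 1 For K < ∞, when it exists, the stationary distribution (π0, π1, . . .) of Q(t) is

\[
\pi_n = \pi_0 \, \rho^{\bullet}_{J(n)} \, \rho_{J(n)+1}^{\,n-\theta_{J(n)}}, \tag{A1}
\]

for n ≥ 1 and,

\[
\pi_0 = \left[ \sum_{i \in I} \rho^{\bullet}_{i-1} \, \frac{1-\rho_i^{\theta_i-\theta_{i-1}}}{1-\rho_i} + \sum_{i \notin I} \rho^{\bullet}_{i-1} \, (\theta_i - \theta_{i-1}) \right]^{-1}, \tag{A2}
\]

where the first sum collects the indices with ρi ≠ 1 and the second those with ρi = 1. Further the expected cost z = E[Z] is represented by Equation (3).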

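Proof: From the partial balance equations, the stationary probability of being in state n is

\[
\pi_n = \pi_{\theta_{i-1}} \, \rho_i^{\,n-\theta_{i-1}}, \qquad n \in \{\theta_{i-1}+1, \ldots, \theta_i\}.
\]

Induction on n yields Equation (A1). Now π0 is obtained from the normalisation $1 = \sum_{n=0}^{\infty} \pi_n$:

\[
\begin{aligned}
\pi_0 &= \left[ \sum_{n=0}^{\theta_1} \rho_1^{n} + \sum_{n=\theta_1+1}^{\theta_2} \rho_1^{\theta_1} \rho_2^{\,n-\theta_1} + \sum_{n=\theta_2+1}^{\theta_3} \rho_1^{\theta_1} \rho_2^{\theta_2-\theta_1} \rho_3^{\,n-\theta_2} + \cdots + \sum_{n=\theta_K+1}^{\theta_{K+1}} \rho_1^{\theta_1} \rho_2^{\theta_2-\theta_1} \cdots \rho_{K+1}^{\,n-\theta_K} \right]^{-1} \\
&= \left[ \sum_{n=0}^{\theta_1-1} \rho_1^{n} + \rho^{\bullet}_1 \sum_{n=\theta_1}^{\theta_2-1} \rho_2^{\,n-\theta_1} + \rho^{\bullet}_2 \sum_{n=\theta_2}^{\theta_3-1} \rho_3^{\,n-\theta_2} + \cdots + \rho^{\bullet}_K \sum_{n=\theta_K}^{\theta_{K+1}-1} \rho_{K+1}^{\,n-\theta_K} \right]^{-1}.
\end{aligned}
\]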

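The summations above simplify due to

\[
\sum_{n=\theta_{i-1}}^{\theta_i-1} \rho_i^{\,n-\theta_{i-1}} =
\begin{cases}
\dfrac{1-\rho_i^{\theta_i-\theta_{i-1}}}{1-\rho_i}, & \rho_i \neq 1, \\[2ex]
\theta_i - \theta_{i-1}, & \rho_i = 1,
\end{cases} \tag{A3}
\]

yielding Equation (A2). The expected cost is $z = \beta E[N] + E[P_N] = \sum_{n=0}^{\infty} (\beta n + P_n)\pi_n$, in which

\[
E[N] = \pi_0 \sum_{i=1}^{K+1} \rho^{\bullet}_{i-1} \sum_{n=\theta_{i-1}}^{\theta_i-1} n \, \rho_i^{\,n-\theta_{i-1}},
\]

and,

\[
E[P_N] = \pi_0 \sum_{i=1}^{K+1} \rho^{\bullet}_{i-1} \, \mu_i^{\alpha} \sum_{n=\theta_{i-1}+1}^{\theta_i} \rho_i^{\,n-\theta_{i-1}}.
\]

Note that

\[
\sum_{n=\theta_{i-1}}^{\theta_i-1} n \, \rho_i^{\,n-\theta_{i-1}} =
\begin{cases}
\dfrac{\big(\theta_{i-1}-\theta_i \rho_i^{\theta_i-\theta_{i-1}}\big)(1-\rho_i) + \big(\rho_i-\rho_i^{1+\theta_i-\theta_{i-1}}\big)}{(1-\rho_i)^2}, & \rho_i \neq 1, \\[2ex]
\tfrac{1}{2}\big[\theta_i(\theta_i-1) - \theta_{i-1}(\theta_{i-1}-1)\big], & \rho_i = 1.
\end{cases} \tag{A4}
\]

Equation (3) can be obtained by substituting Equations (A3) and (A4) into the expression for E[N] and E[P_N]. □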
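The closed forms (A3) and (A4) are easy to check against direct summation. The sketch below does so for one block of states; the function names and the test values are illustrative assumptions, with `lo` and `hi` playing the roles of θi−1 and θi.

```python
import math


def geometric_sum(rho, lo, hi):
    """Closed form (A3) for sum_{n=lo}^{hi-1} rho**(n - lo)."""
    if rho == 1.0:
        return float(hi - lo)
    return (1.0 - rho ** (hi - lo)) / (1.0 - rho)


def weighted_geometric_sum(rho, lo, hi):
    """Closed form (A4) for sum_{n=lo}^{hi-1} n * rho**(n - lo)."""
    if rho == 1.0:
        return 0.5 * (hi * (hi - 1) - lo * (lo - 1))
    m = hi - lo
    numerator = (lo - hi * rho ** m) * (1.0 - rho) + (rho - rho ** (1 + m))
    return numerator / (1.0 - rho) ** 2


def brute_force(rho, lo, hi, weighted):
    """Direct summation used as a reference."""
    return sum((n if weighted else 1) * rho ** (n - lo) for n in range(lo, hi))


for rho in (0.3, 1.0, 1.7):  # below, at, and above the critical value
    assert math.isclose(geometric_sum(rho, 4, 11), brute_force(rho, 4, 11, False))
    assert math.isclose(weighted_geometric_sum(rho, 4, 11), brute_force(rho, 4, 11, True))
```

Both branches of each formula are exercised, including the case ρi > 1, which is admissible for every block except the last.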


Appendix 2. Numerical optimisation algorithms

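B.1. K = 0

When K = 0, the system runs only with the maximum service rate μmax for all states si except for i = 0. In this case, the system is simply an M/M/1 queue with arrival rate λ and service rate μmax, and the cost is deterministic for a given λ:

\[
\begin{aligned}
z &= \beta E[N] + E[P_N] = \beta \sum_{n=0}^{\infty} n \pi_n + \sum_{n=1}^{\infty} \pi_n \mu_{\max}^{\alpha} \\
&= \frac{\beta\lambda}{\mu_{\max}-\lambda} + (1-\pi_0)\,\mu_{\max}^{\alpha} = \frac{\beta\lambda}{\mu_{\max}-\lambda} + \lambda \mu_{\max}^{\alpha-1}.
\end{aligned} \tag{B1}
\]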
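As a quick sanity check on (B1), the closed form can be compared with a direct summation over a truncated M/M/1 stationary distribution. The parameter values below (including α = 3) and the truncation level are assumptions made only for this example.

```python
import math

import numpy as np


def mm1_cost_closed_form(lam, mu_max, beta, alpha):
    """Right-hand side of (B1)."""
    return beta * lam / (mu_max - lam) + lam * mu_max ** (alpha - 1)


def mm1_cost_numeric(lam, mu_max, beta, alpha, n_max=2000):
    """Sum (beta*n + P_n) * pi_n over a truncated geometric distribution."""
    n = np.arange(n_max + 1)
    pi = (lam / mu_max) ** n
    pi = pi / pi.sum()
    # Power mu_max**alpha is drawn whenever the system is non-empty.
    power = np.where(n >= 1, mu_max ** alpha, 0.0)
    return float(np.sum((beta * n + power) * pi))


closed = mm1_cost_closed_form(lam=1.2, mu_max=2.0, beta=0.1, alpha=3.0)
numeric = mm1_cost_numeric(lam=1.2, mu_max=2.0, beta=0.1, alpha=3.0)
assert math.isclose(closed, numeric, rel_tol=1e-9)
```

The agreement relies on 1 − π0 = λ/μmax for the M/M/1 queue, which is the step that turns the middle expression of (B1) into the final one.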

B.2. 1 ≤ K < ∞

Recall that we want to solve:

\[
\min_{\mu,\theta} \; z_K(\mu, \theta, \lambda_d, \beta, \alpha).
\]

It is known that the optimal policy is of threshold type, as proved in Crabill (1972) and mentioned in George and Harrison (2001). Thus, we enforce that μ is monotonically non-decreasing. We can then use the Gauss–Seidel method [or block coordinate descent, described in Section 2.7 of Bertsekas (1999)] to numerically find the optimal settings (speeds and thresholds).

In general, the search starts with an initial feasible guess $(\mu^0, \theta^0)$. The search was performed in 2K dimensions, each dimension being an element of the speed vector μ or the threshold vector θ. Each iteration consists of the search for new values for all 2K variables, and is divided into 2K phases. Each phase is a single-dimension optimisation, holding the most recently computed values of the other variables. For each iteration, searches were performed for all speeds first and then thresholds. The iterations continue until the maximum change between $(\mu^k, \theta^k)$ and $(\mu^{k+1}, \theta^{k+1})$ is less than $\varepsilon = 10^{-10}$, indicating that $(\mu^k, \theta^k)$ has converged to $(\mu^*, \theta^*)$. The searches for speeds and thresholds were performed by different techniques.

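In particular, phases i = 1, . . . , K of iteration k calculate

\[
\mu_i^k = \operatorname*{argmin}_{\mu_{i-1}^k \le \mu_i^k \le \mu_{i+1}^{k-1}} z_K\big(\mu_1^k, \ldots, \mu_{i-1}^k, \mu_i^k, \mu_{i+1}^{k-1}, \ldots, \mu_K^{k-1};\; \theta_1^{k-1}, \ldots, \theta_K^{k-1}\big). \tag{B2}
\]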

This phase uses the new values $\mu_j^k$ ($j < i$) that were found in this iteration, while it retains $\mu_j^{k-1}$ ($j > i$) from the previous iteration. Using the assumption that the cost function is uni-modal (as numerical evidence suggests it is), golden section search was used to calculate $\mu_i^k$. The search range was bounded between the most recent adjacent speeds $[\mu_{i-1}^k, \mu_{i+1}^{k-1}]$, with the convention that $\mu_{K+1} = \mu_{\max}$ and $\mu_0 = 0$.

Phases i + K = K + 1, . . . , 2K calculate

\[
\theta_i^k = \operatorname*{argmin}_{\theta_{i-1}^k < \theta_i^k < \theta_{i+1}^{k-1}} z_K\big(\mu_1^k, \ldots, \mu_K^k;\; \theta_1^k, \ldots, \theta_{i-1}^k, \theta_i^k, \theta_{i+1}^{k-1}, \ldots, \theta_K^{k-1}\big). \tag{B3}
\]

Finding $\theta_i^k$ cannot use golden section search due to the shape of the cost as a function of $\theta_i^k$. Numerical errors sometimes cause it to enter an incorrect search range that does not include the global optimum. Since thresholds are naturally discrete, an exhaustive search was instead performed over a truncated state space for such cases. Again, the search range was bounded between $[\theta_{i-1}^k, \theta_{i+1}^{k-1}]$. For the search of $\theta_K^k$, a $\theta_{K+1}^k$ sufficiently large to be considered infinite was chosen.
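The phases (B2) and (B3) can be sketched as below. This is an illustrative reading of the procedure, not the authors' code: `toy_cost` is an invented smooth stand-in for z_K, `theta_inf` is an assumed stand-in for the 'sufficiently large' θK+1, and the stopping tolerance is loosened to 1e-6 for the toy problem.

```python
import math

INV_PHI = (math.sqrt(5.0) - 1.0) / 2.0  # reciprocal of the golden ratio


def golden_section_min(f, lo, hi, tol=1e-10):
    """Minimise a uni-modal function f on [lo, hi] by golden section search."""
    a, b = lo, hi
    c = b - INV_PHI * (b - a)
    d = a + INV_PHI * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc < fd:                      # minimum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - INV_PHI * (b - a)
            fc = f(c)
        else:                            # minimum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + INV_PHI * (b - a)
            fd = f(d)
    return 0.5 * (a + b)


def gauss_seidel(cost, mu0, theta0, mu_max, theta_inf=200, eps=1e-6, max_iter=100):
    """Block coordinate descent: speeds first, then thresholds, each iteration."""
    mu, theta = list(mu0), list(theta0)
    k_dim = len(mu)
    for _ in range(max_iter):
        previous = mu + theta
        for i in range(k_dim):           # speed phases, as in (B2)
            lo = mu[i - 1] if i > 0 else 0.0
            hi = mu[i + 1] if i < k_dim - 1 else mu_max
            mu[i] = golden_section_min(
                lambda x: cost(mu[:i] + [x] + mu[i + 1:], theta), lo, hi)
        for i in range(k_dim):           # threshold phases, as in (B3)
            lo = theta[i - 1] + 1 if i > 0 else 1
            hi = theta[i + 1] - 1 if i < k_dim - 1 else theta_inf - 1
            theta[i] = min(range(lo, hi + 1),
                           key=lambda t: cost(mu, theta[:i] + [t] + theta[i + 1:]))
        if max(abs(x - y) for x, y in zip(mu + theta, previous)) < eps:
            break
    return mu, theta


def toy_cost(mu, theta):
    """Invented stand-in for z_K with a known minimiser; NOT the queueing cost."""
    return ((mu[0] - 0.3) ** 2 + (mu[1] - 0.6) ** 2
            + 0.01 * (theta[0] - 3) ** 2 + 0.01 * (theta[1] - 8) ** 2)


mu_opt, theta_opt = gauss_seidel(toy_cost, [0.1, 0.2], [1, 2], mu_max=1.0)
print(mu_opt, theta_opt)
```

Bounding each coordinate by its current neighbours keeps the speeds non-decreasing and the thresholds strictly increasing throughout, so every intermediate point stays feasible.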

Unfortunately, the discrete nature of $\theta_i^k$ sometimes causes convergence to a local optimum. To overcome this problem, we introduced a heuristic to refine the results. When the Gauss–Seidel iterations return (μ, θ) and an estimate z of the optimal cost, an exhaustive search is then performed over all the neighbour points of θ. We used each point as an initial guess for new Gauss–Seidel iterations. If any of these searches returned a cost that is less than z, the process would be repeated with this new point at the centre of the search region. In all of these searches, μ was kept unchanged. This refinement process was repeated until it found a point with a lower value of z than all its neighbours. The final (μ, θ) and z were taken as (μ∗, θ∗) and z∗.
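The refinement loop can be sketched as follows. The text does not define 'neighbour', so this sketch assumes that each threshold may move by −1, 0 or +1; `local_search` is a hypothetical callable that takes a starting threshold vector and returns the threshold vector and cost reached from it (in the paper, new Gauss–Seidel iterations with μ held fixed).

```python
from itertools import product


def is_feasible(theta):
    """Thresholds must be strictly increasing positive integers."""
    return theta[0] >= 1 and all(a < b for a, b in zip(theta, theta[1:]))


def refine(local_search, theta, z):
    """Restart the local search from neighbours until none improves on z."""
    theta = tuple(theta)
    improved = True
    while improved:
        improved = False
        # Assumed neighbourhood: every threshold shifted by -1, 0 or +1.
        for delta in product((-1, 0, 1), repeat=len(theta)):
            start = tuple(t + d for t, d in zip(theta, delta))
            if start == theta or not is_feasible(start):
                continue
            new_theta, new_z = local_search(start)
            if new_z < z:
                # Re-centre the search on the improved point and repeat.
                theta, z, improved = tuple(new_theta), new_z, True
                break
    return theta, z


def toy_local_search(start):
    """Invented stand-in: returns the start itself with a toy cost."""
    return start, (start[0] - 5) ** 2 + (start[1] - 9) ** 2


best_theta, best_z = refine(toy_local_search, (1, 2), 65)
print(best_theta, best_z)
```

The loop ends only when no feasible neighbour start yields a lower cost, which mirrors the stopping rule stated above.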

An experiment was performed to test this heuristic. About a thousand combinations of λ, β and K were chosen. The procedure was applied to each combination with a thousand randomly generated initial feasible points. If the heuristic worked then, for each combination, it would return the same optimal settings for all the initial guesses. This occurred in 92.8% of cases, which suggests that this search is seldom stuck in poor local optima. For some extreme cases (specifically when λ is very small), there were differences among the optimal settings; however, in these cases the optimal costs were approximately the same, and on average 78.8% of initial feasible points converged to the lowest cost point.

B.3. K = ∞

The case K = ∞ refers to our continuum architecture, representing the case where no restriction on the number of speeds is applied. Speeds can be chosen freely within [0, μmax] for each state of system occupancy. A feasible policy μ = [μ1, μ2, . . .] maps the speed μi ∈ [0, μmax] to system state i, where system state refers to the number of jobs in the system. The optimal policy μ∗ is a feasible policy that minimises the average cost per unit time z. It is a classic infinite-horizon control problem for a state-dependent M/M/1 queue, which was numerically solved using dynamic programming with relative value iteration and uniformisation of the service rate.

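At iteration k, the relative value function for state i as the number of jobs in the system is, according to Bertsekas (2005),

\[
\begin{aligned}
h_k(i) ={}& \min_{\mu\in[0,\mu_{\max}]} \left[ \frac{z(i,\mu)}{\lambda+\mu_{\max}} + \frac{\mu}{\lambda+\mu_{\max}} h_{k-1}(i-1) + \frac{\mu_{\max}-\mu}{\lambda+\mu_{\max}} h_{k-1}(i) + \frac{\lambda}{\lambda+\mu_{\max}} h_{k-1}(i+1) \right] \\
&- \min_{\mu\in[0,\mu_{\max}]} \left[ \frac{z(S,\mu)}{\lambda+\mu_{\max}} + \frac{\mu}{\lambda+\mu_{\max}} h_{k-1}(S-1) + \frac{\mu_{\max}-\mu}{\lambda+\mu_{\max}} h_{k-1}(S) + \frac{\lambda}{\lambda+\mu_{\max}} h_{k-1}(S+1) \right],
\end{aligned}
\]

where z(i, μ) is the uniformised cost at state i with speed μ, given by $z(i,\mu) = \frac{\beta i + \mu^{\alpha}}{\lambda+\mu_{\max}}$, and S = 0 is a reference state. As k → ∞, $h_k$ converges to the $h^*$ satisfying

\[
z^* + h^*(i) = \min_{\mu\in[0,\mu_{\max}]} \left[ \frac{z(i,\mu)}{\lambda+\mu_{\max}} + \frac{\mu}{\lambda+\mu_{\max}} h^*(i-1) + \frac{\mu_{\max}-\mu}{\lambda+\mu_{\max}} h^*(i) + \frac{\lambda}{\lambda+\mu_{\max}} h^*(i+1) \right],
\]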


where the optimal cost is

\[
z^* = \min_{\mu\in[0,\mu_{\max}]} \left[ \frac{z(S,\mu)}{\lambda+\mu_{\max}} + \frac{\mu}{\lambda+\mu_{\max}} h^*(S-1) + \frac{\mu_{\max}-\mu}{\lambda+\mu_{\max}} h^*(S) + \frac{\lambda}{\lambda+\mu_{\max}} h^*(S+1) \right].
\]

With S = 0, the optimal cost can be evaluated as

\[
z^* = \frac{\mu_{\max}}{\lambda+\mu_{\max}} h^*(0) + \frac{\lambda}{\lambda+\mu_{\max}} h^*(1) \tag{B4}
\]

and the optimal speed in state i is the ‘argmin’ of the latest optimisation over μ in calculating $h_k(i)$. Thus, every state has its own speed. We started the iteration with $\{h_0(i)\} = 0$ and performed the search for a truncated state space at each iteration. The iteration stopped when $|h_k - h_{k-1}| < \varepsilon = 10^{-10}$.
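A compact sketch of relative value iteration for this problem is given below. Several choices are assumptions rather than details from the text: the continuous speed range is replaced by a finite grid, the state space is truncated at `n_states` with arrivals blocked at the top, the stage cost (βi + μ^α) is divided by the uniformisation rate λ + μmax exactly once, and the resulting gain is multiplied back by that rate so that the returned cost is per unit time. The parameter values, including α = 2, are for illustration only.

```python
import numpy as np


def relative_value_iteration(lam, mu_max, beta, alpha, speeds,
                             n_states=60, tol=1e-10, max_iter=200_000):
    """Average-cost optimal speeds on a truncated, uniformised queue.

    Returns (cost per unit time, speed chosen in each state 0..n_states).
    """
    speeds = np.asarray(speeds, dtype=float)
    rate = lam + mu_max                              # uniformisation rate
    i = np.arange(n_states + 1)[:, None]             # states as a column
    mu = speeds[None, :]                             # candidate speeds as a row
    stage = (beta * i + mu ** alpha) / rate          # uniformised stage cost
    h = np.zeros(n_states + 1)
    for _ in range(max_iter):
        down = np.concatenate(([h[0]], h[:-1]))[:, None]   # h(i-1)
        up = np.concatenate((h[1:], [h[-1]]))[:, None]     # h(i+1), blocked at top
        q = stage + (mu * down + (mu_max - mu) * h[:, None] + lam * up) / rate
        # Empty system: nothing to serve, so speed 0 and no power cost.
        q[0, :] = (mu_max * h[0] + lam * h[1]) / rate
        new_h = q.min(axis=1)
        gain = new_h[0]                              # value at reference state S = 0
        new_h = new_h - gain
        converged = np.max(np.abs(new_h - h)) < tol
        h = new_h
        if converged:
            break
    policy = speeds[q.argmin(axis=1)]
    policy[0] = 0.0
    return float(gain * rate), policy


# With a single admissible speed the result should match Equation (B1).
single_cost, _ = relative_value_iteration(0.5, 1.0, 0.1, 2.0, speeds=[1.0])
# A finer speed grid can only reduce the optimal cost.
grid_cost, grid_policy = relative_value_iteration(
    0.5, 1.0, 0.1, 2.0, speeds=np.linspace(0.0, 1.0, 21))
print(single_cost, grid_cost)
```

Restricting the grid to the single speed μmax reduces the problem to the K = 0 case, which gives a convenient check against the closed form (B1) before trusting the richer grid.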
