
Automatic Virtual Machine Configuration for Database Workloads

Ahmed A. Soror†∗ Umar Farooq Minhas† Ashraf Aboulnaga†

Kenneth Salem† Peter Kokosielis‡ Sunil Kamath‡
†University of Waterloo ‡IBM Toronto Lab

†{aakssoro, ufminhas, ashraf, kmsalem}@cs.uwaterloo.ca
‡{pkolosie, sunil.kamath}@ca.ibm.com

ABSTRACT

Virtual machine monitors are becoming popular tools for the deployment of database management systems and other enterprise software applications. In this paper, we consider a common resource consolidation scenario, in which several database management system instances, each running in a virtual machine, are sharing a common pool of physical computing resources. We address the problem of optimizing the performance of these database management systems by controlling the configurations of the virtual machines in which they run. These virtual machine configurations determine how the shared physical resources will be allocated to the different database instances. We introduce a virtualization design advisor that uses information about the anticipated workloads of each of the database systems to recommend workload-specific configurations offline. Furthermore, runtime information collected after the deployment of the recommended configurations can be used to refine the recommendation. To estimate the effect of a particular resource allocation on workload performance, we use the query optimizer in a new what-if mode. We have implemented our approach using both PostgreSQL and DB2, and we have experimentally evaluated its effectiveness using DSS and OLTP workloads.

Categories and Subject Descriptors
H.2.2 [Database Management]: Physical Design

General Terms
Algorithms, Design, Experimentation, Performance

Keywords
Virtualization, Virtual Machine Configuration, Resource Consolidation

∗Supported by an IBM CAS Fellowship.


1. INTRODUCTION

Virtual machine monitors are becoming popular tools for the deployment of database management systems and other enterprise software systems. Virtualization adds a flexible and programmable layer of software between "applications", such as database management systems, and the resources used by these applications. This layer of software, called the virtual machine monitor (VMM), maps the virtual resources perceived by applications to real physical resources. By managing this mapping from virtual resources to physical resources and changing it as needed, the VMM can transparently allow multiple applications to share resources and can change the allocation of resources to applications as needed.

There are many reasons for virtualizing resources. For example, some virtual machine monitors enable live migration of virtual machines (and the applications that run on them) among physical hosts. This capability can be exploited, for example, to simplify the administration of physical machines or to accomplish dynamic load balancing. One important motivation for virtualization is to support enterprise resource consolidation. Resource consolidation means taking a variety of applications that run on dedicated computing resources and moving them to a shared resource pool. This can improve the utilization of the physical resources, simplify resource administration, and reduce cost for the enterprise. One way to implement resource consolidation is to place each application in a virtual machine (VM) which encapsulates the application's original execution environment. These VMs can then be hosted by a shared pool of physical computing resources. This is illustrated in Figure 1.

When creating a VM for one or more applications, it is important to correctly configure this VM. One of the most important decisions when configuring a VM is deciding how much of the available physical resources will be allocated to this VM. Our goal in this paper is to automatically make this decision for virtual machines that host database management systems and compete against each other for resources.

As a motivating example, consider the following scenario. We created two Xen [2] VMs, each running an instance of PostgreSQL, and hosted them on the same physical server.1

On the first VM, we run a workload consisting of 1 copy of TPC-H query Q17 on a 10GB database. We call this Workload 1.

1 The full details of our experimental setup can be found in Section 6.


Figure 1: Resource consolidation using virtual machines.

Figure 2: Motivating example.

On the second VM, we run a workload on an identical 10GB TPC-H database consisting of 132 copies of a modified version of TPC-H Q18 (we modified the sub-query in Q18 so that it touches less data). We call this Workload 2. As an initial configuration, we allocate 50% of the available CPU capacity to each of the two VMs. When we apply our configuration technique, it recommends allocating 20% of the available CPU capacity to the VM running Workload 1 and 80% to the VM running Workload 2. Figure 2 shows the execution time of the two workloads under the initial and recommended configurations. Workload 1 suffers a slight degradation in performance (4%) under the recommended configuration as compared to the initial configuration. On the other hand, the recommended configuration boosts the performance of Workload 2 by 34%. This is because Workload 1 is very I/O intensive in our execution environment, so its performance is not sensitive to changes in CPU allocation. Workload 2, in contrast, is CPU intensive, so it benefits from the extra CPU allocation. This simple example illustrates the potential performance benefits that can be obtained by adjusting resource allocation levels based on workload characteristics.

Our approach to virtual machine configuration is to use information about the anticipated workloads of each database management system (DBMS) to determine an appropriate configuration for the virtual machine in which it runs. An advantage of this approach is that we can avoid allocating resources to DBMS instances that will obtain little benefit from them. For example, we can distinguish CPU intensive workloads from I/O intensive workloads and allocate more CPU to the former. Our technique is implemented as a virtualization design advisor, analogous to the physical design advisors currently available for most relational DBMSes. However, our virtualization design advisor differs from DBMS physical design advisors in two significant ways. First, it recommends a configuration for the virtual machine containing the DBMS, rather than for the DBMS itself. Second, our advisor recommends configurations for a set of virtual machines that are sharing physical resources, while most DBMS physical design tools guide the configuration of a single DBMS instance. Once the configured virtual machines are up and running, our advisor is also capable of collecting runtime information that allows it to refine its recommendations online.

The rest of this paper is organized as follows. Section 2 presents an overview of related work. In Section 3, we define the virtualization design problem. Section 4 describes our virtualization design advisor and presents a cost model calibration methodology that allows the design advisor to leverage the query optimizer cost models of the DBMSes being consolidated. In Section 5, we present an extension to the advisor that allows it to refine its recommendations using runtime performance measurements of the consolidated, virtualized DBMS instances. In Section 6, we present an experimental evaluation of our approach using PostgreSQL and DB2 as our target DBMSes. We conclude in Section 7.

2. RELATED WORK

There are currently several technologies for machine virtualization [2, 14, 17, 23], and our proposed virtualization design advisor can work with any of them. As these virtualization technologies are being more widely adopted, there is increasing interest in the problem of automating the deployment and control of virtualized applications, including database systems [9, 15, 16, 19, 24, 25]. Work on this problem varies in the control mechanisms that are exploited and in the performance modeling methodology and optimization objectives that are used. However, a common feature of this work is that the target applications are treated as black boxes that are characterized by simple models, typically governed by a small number of parameters. In contrast, the virtualization design advisor described in this paper is specific to database systems, and it attempts to exploit database system cost models to achieve its objectives. There is also work on application deployment and control, including resource allocation and dynamic provisioning, that does not exploit virtualization technology [3, 8, 21, 22]. However, this work also treats the target applications as black boxes.

The virtualization design problem that is considered here was posed, but not solved, in our previous work [18]. This paper builds on that previous work by proposing a complete solution to the problem in the form of a virtualization design advisor. We also incorporate quality of service constraints into the problem definition, and we present an empirical evaluation of the proposed solution.

There has been a substantial amount of work on the problem of tuning database system configurations for specific workloads or execution environments [26] and on the problem of making database systems more flexible and adaptive in their use of computing resources [1, 6, 10, 12, 20].


However, in this paper we are tuning the resources to the database system, rather than the other way around. Resource management and scheduling have also been addressed within the context of database systems [4, 5, 7, 11]. That work focuses primarily on the problem of allocating a fixed pool of resources to individual queries or query plan operators, or on scheduling queries or operators to run on the available resources. In contrast, our resource allocation problem is external to the database system, and hence our approach relies only on the availability of query cost estimates from the database systems.

3. PROBLEM DEFINITION

Our problem setting is illustrated in Figure 1. N virtual machines, each running an independent DBMS, are competing for a pool of physical resources. For each DBMS, we are given a workload description consisting of a set of SQL statements (possibly with a frequency of occurrence for each statement). We use Wi (1 ≤ i ≤ N) to represent the workload of the ith DBMS. In our problem setting, since we are making resource allocation decisions across workloads and not for one specific workload, it is important that the workloads represent the statements processed by the different DBMSes in the same amount of time. Thus, a longer workload represents a higher rate of arrival for SQL statements.

We assume that there are M different types of physical resources, such as memory, CPU capacity, or I/O bandwidth, that are to be allocated to the virtual machines. Our problem is to allocate a share, or fraction, of each physical resource to each of the virtual machines. We will use Ri = [ri1, . . . , riM], 0 ≤ rij ≤ 1, to represent the resource shares allocated to workload Wi's virtual machine. The shares are used to set configuration parameters in the virtual machines so that the resource allocations described by the shares are enforced by the virtual machine monitor.

We assume that each workload has an associated cost, which depends on the resources allocated to the virtual machine in which the workload runs. We use Cost(Wi, Ri) to represent the cost of workload Wi under resource allocation Ri. Our goal is to find a feasible set of resource allocations rij such that the total cost over all of the workloads is minimized. Specifically, we must choose rij (1 ≤ i ≤ N, 1 ≤ j ≤ M) such that

\[
\sum_{i=1}^{N} \mathrm{Cost}(W_i, R_i)
\]

is minimized, subject to \(r_{ij} \geq 0\) for all i, j and \(\sum_{i=1}^{N} r_{ij} = 1\) for all j. This problem was originally defined (but not solved) in [18], and was named the virtualization design problem.

In this paper, we have also considered a constrained version of the virtualization design problem. The constrained version is identical to the original problem, except for an additional requirement that the solution must satisfy quality of service (QoS) requirements imposed on one or more of the workloads. The QoS requirements specify the maximum increase in cost that is permitted for a workload under the chosen resource assignment. We define the cost degradation for a workload Wi under a resource assignment Ri as

\[
\mathrm{Degradation}(W_i, R_i) = \frac{\mathrm{Cost}(W_i, R_i)}{\mathrm{Cost}(W_i, [1, \ldots, 1])}
\]

where [1, . . . , 1] represents the resource assignment in which all of the available resources are allocated to Wi. In the constrained version of the virtualization design problem, a degradation limit Li is specified for each workload Wi, and the solution is required to obey the constraint

\[
\mathrm{Degradation}(W_i, R_i) \leq L_i
\]

for all workloads. The degradation limit Li can be specified to be infinite for workloads for which limiting degradation is not desired.

Figure 3: Virtualization design advisor.

We also introduce a mechanism for specifying relative priorities among the different workloads. A benefit gain factor Gi can be specified for each workload, indicating how important it is to improve the performance of this workload compared to other workloads. Each unit of cost improvement for the workload is considered to be worth Gi cost units. The default setting for the different workloads is Gi = 1, indicating that all workloads should be treated equally. Increasing Gi for a particular workload may cause it to get more than its fair share of resources since cost improvements to it are amplified. Incorporating this factor into our problem definition requires us to change the objective function being minimized to the following:

\[
\sum_{i=1}^{N} G_i \times \mathrm{Cost}(W_i, R_i)
\]

In this paper, we focus on the case in which the two resources to be allocated among the virtual machines are CPU time and memory, i.e., M = 2. Most virtual machine monitors currently provide mechanisms for controlling the allocation of these two resources to VMs, but it is uncommon for virtual machine monitors to provide mechanisms for controlling other resources, such as storage bandwidth. Nevertheless, our problem formulation and our virtualization design advisor can handle as many resources as the virtual machine monitor can control.
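To make these definitions concrete, the following sketch (illustrative only; `cost_fn` stands in for the Cost(Wi, Ri) estimator described in Section 4, and the other names are ours) evaluates the weighted objective and checks the share and degradation constraints for a candidate set of allocations:

```python
import numpy as np

def total_cost(allocations, workloads, cost_fn, gains):
    """Objective: sum over i of G_i * Cost(W_i, R_i)."""
    return sum(g * cost_fn(w, r) for w, r, g in zip(workloads, allocations, gains))

def is_feasible(allocations, workloads, cost_fn, limits):
    """Check the share constraints and the QoS degradation limits."""
    R = np.asarray(allocations, dtype=float)     # R[i][j] = r_ij, shape (N, M)
    if (R < 0).any():
        return False
    if not np.allclose(R.sum(axis=0), 1.0):      # each resource is fully divided
        return False
    full = np.ones(R.shape[1])                   # the allocation [1, ..., 1]
    for w, r, limit in zip(workloads, R, limits):
        # Degradation(W_i, R_i) = Cost(W_i, R_i) / Cost(W_i, [1, ..., 1])
        if cost_fn(w, r) / cost_fn(w, full) > limit:
            return False
    return True
```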

4. VIRTUALIZATION DESIGN ADVISOR

A high level overview of our virtualization design advisor is given in Figure 3. The advisor makes initial, static resource allocation recommendations based on the workload descriptions and performance goals. Two modules within the design advisor interact to make these recommendations: a configuration enumerator and a cost estimator.


Parameter              Description
random_page_cost       cost of non-sequential disk page I/O
cpu_tuple_cost         CPU cost of processing one tuple
cpu_operator_cost      per-tuple CPU cost for each WHERE clause
cpu_index_tuple_cost   CPU cost of processing one index tuple
shared_buffers         shared database buffer pool size
work_mem               amount of memory to be used by sorting and hashing operators
effective_cache_size   size of the file system's page cache

Figure 4: PostgreSQL optimizer parameters.

The configuration enumerator is responsible for directing the exploration of the space of possible configurations, i.e., allocations of resources to virtual machines. The configuration enumerator is described in more detail in Section 4.3. To evaluate the cost of a workload under a particular resource allocation, the advisor uses the cost estimation module. Given a workload Wi and a candidate resource assignment Ri, selected by the configuration enumerator, the cost estimation module is responsible for estimating Cost(Wi, Ri). Cost estimation is described in more detail in Section 4.1.

In addition to recommending initial virtual machine configurations, the virtualization design advisor can also adjust its recommendations dynamically based on observed workload costs to correct for any cost estimation errors made at the original recommendation phase. This online refinement is described in Section 5.

4.1 Cost Estimation

Given a workload Wi and a candidate resource allocation Ri, the cost estimator is responsible for estimating Cost(Wi, Ri). Our strategy for cost estimation is to leverage the cost models that are built into the database systems for query optimization. These models incorporate a wealth of information about query processing within the DBMS, and we would like to avoid reinventing this for the purpose of virtualization design.

A DBMS cost model can be described as a function CostDB(Wi, Pi, Di), where Wi is a SQL workload, Pi = [pi1, pi2, . . . , piL] is a vector of optimizer configuration parameters, and Di is the database instance. The parameters Pi are used to describe both the available computing resources and relevant parts of the DBMS configuration to the cost model. For example, Figure 4 lists the relevant configuration parameters used by PostgreSQL version 8.1.3.

There are two difficulties in directly applying the DBMS cost model for cost estimation for virtualization design. The first problem is the difficulty of comparing cost estimates produced by different DBMSes. This is required for virtualization design because the design advisor is required to assign resources to multiple database systems, each of which may use a different cost model. DBMS cost models are intended to produce estimates that can be used to compare the costs of alternative query execution strategies for a single DBMS and a fixed execution environment. In general, comparing cost estimates from different DBMSes may be difficult because they may have very different notions of cost.

Figure 5: Cost estimation for virtualization design.

For example, one DBMS's definition of cost might be response time, while another's may be total computing resource consumption. Even if two DBMSes have the same notion of cost, the cost estimates are typically normalized, and different DBMSes may normalize costs differently. The first of these two issues is beyond the scope of this paper, and fortunately it is often not an issue since many DBMS optimizers define cost as total resource consumption. For our purposes we will assume that this is the case. The normalization problem is not difficult to solve, but it does require that we renormalize the result of CostDB(Wi, Pi, Di) so that estimates from different DBMSes will be comparable.

The second problem is that the DBMS cost estimates depend on the parameters Pi, while the virtualization design advisor is given a candidate resource allocation Ri. Thus, to leverage the DBMS query optimizer, we must have a means of mapping the given candidate resource allocation to a set of DBMS configuration parameter values that reflect the candidate allocation. We use this mapping to define a new "what-if" mode for the DBMS query optimizer. Instead of generating cost estimates under fixed settings of Pi, we map a given Ri to the corresponding Pi, and we use the Pi to answer the question: "if the parameter settings were to be set in a particular way, what would be the cost of the optimal plan for the given workload?".

To address these problems, we construct cost estimators for virtualization design as shown in Figure 5. A calibration step is used to determine a set of DBMS cost model configuration parameters corresponding to the given candidate resource allocation Ri. Once these parameter values have been set, the DBMS cost model is then used to generate CostDB for the given workload. Finally, this cost is renormalized to produce the cost estimate required by the virtualization design advisor.
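The pipeline of Figure 5 can be summarized in a few lines. The sketch below is illustrative only; `calibrate_params`, `optimizer_cost`, and `renorm_factor` are hypothetical stand-ins for the per-DBMS calibration functions, the optimizer's what-if costing call, and the measured renormalization factor:

```python
def estimate_cost(workload, allocation, dbms):
    """Estimate Cost(W_i, R_i) in seconds by driving the DBMS optimizer in what-if mode."""
    # Map the candidate resource allocation R_i to optimizer parameters P_i
    # using the calibration functions Cal_ik designed for this DBMS.
    params = dbms.calibrate_params(allocation)
    # Ask the optimizer: under parameters P_i, what would the cost of the
    # optimal plan be for each statement in the workload?  (Nothing is executed.)
    raw_cost = sum(freq * dbms.optimizer_cost(stmt, params)
                   for stmt, freq in workload)
    # Renormalize the optimizer's unit-less cost into seconds so that
    # estimates from different DBMSes are comparable.
    return raw_cost * dbms.renorm_factor
```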

The calibration and renormalization steps shown in Figure 5 must be custom-designed for each type of DBMS for which the virtualization design advisor will be recommending designs. To test the feasibility of this approach, we have designed calibration and renormalization steps for both PostgreSQL and DB2. In the following, we describe how these steps were designed, using PostgreSQL as an illustrative example. The methodology for DB2 is very similar.

As has already been noted, we assume that the DBMS defines cost as total resource consumption and, as a result, the renormalization step is simple. For example, in PostgreSQL, all costs are normalized with respect to the time required for a single sequential I/O operation. We have chosen to express costs in units of seconds.


Thus, renormalization for PostgreSQL simply requires that we multiply CostDB by the number of seconds required for a sequential I/O operation. To determine this renormalization factor, we created a simple calibration program that sequentially reads 8 kilobyte (the PostgreSQL page size) blocks of data from the virtual machine's file system and reports the average time per block.
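A minimal version of such a calibration program might look like the following sketch (the block size matches the PostgreSQL page size mentioned above; the file path and block limit are assumptions, and a real measurement would also have to account for the operating system's page cache):

```python
import time

def seconds_per_sequential_block(path, block_size=8192, max_blocks=100_000):
    """Measure the average time to sequentially read one 8 KB block."""
    blocks = 0
    start = time.perf_counter()
    with open(path, "rb", buffering=0) as f:       # unbuffered sequential reads
        while blocks < max_blocks:
            if not f.read(block_size):             # stop at end of file
                break
            blocks += 1
    elapsed = time.perf_counter() - start
    return elapsed / max(blocks, 1)                # renormalization factor, in seconds
```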

The calibration of the optimizer configuration parameters Pi is more involved. We can distinguish two types of parameters. Prescriptive parameters define the configuration of the DBMS itself; changing the value of these parameters changes that configuration. For PostgreSQL, shared_buffers and work_mem are prescriptive parameters. Descriptive parameters, in contrast, exist only to characterize the execution environment. Changing these parameters affects the DBMS only indirectly, through the effect that they have on cost estimates. In PostgreSQL, parameters like cpu_tuple_cost, random_page_cost, and effective_cache_size are descriptive parameters.

Values for prescriptive parameters must be chosen to reflect the mechanisms or policies that determine the DBMS configuration. For example, if the PostgreSQL work_mem parameter will be left at its default value regardless of the amount of memory that our design advisor allocates to the virtual machine in which the DBMS will run, then the calibration procedure should simply assign that default value to the work_mem parameter. If, on the other hand, the DBMS's configuration will be tuned in response to the amount of memory that is allocated to the virtual machine, then the calibration procedure should model this tuning policy. For example, in our experiments our policy was to set shared_buffers to 1/16 of the memory available in the host virtual machine, and to set work_mem to 5MB regardless of the amount of memory available. Thus, our calibration procedure mimics these policies, setting shared_buffers according to the virtual machine memory allocation described by Ri and setting work_mem to 5MB regardless of Ri.
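This tuning policy can be modeled by a trivial function, sketched below (the 1/16 fraction and the fixed 5MB work_mem value are the policies stated above; the function and argument names are illustrative):

```python
def prescriptive_params(vm_memory_mb):
    """Model the DBMS tuning policy assumed by the calibration step."""
    return {
        "shared_buffers_mb": vm_memory_mb / 16,  # 1/16 of the VM's memory allocation
        "work_mem_mb": 5,                        # fixed at 5 MB regardless of memory
    }
```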

For each descriptive parameter pik, we wish to determine a calibration function Calik that will define a value for pik as a function of the candidate resource allocation Ri. To do this, we use the following basic methodology for each parameter pik (a code sketch of the loop follows the list):

1. Define a calibration query Q and a calibration database D such that CostDB(Q, Pi, D) is independent of all descriptive parameters in Pi except for pik.

2. Choose a resource allocation Ri, instantiate D, run Q under that resource allocation, and measure its execution time TQ.

3. Solve Renormalize(CostDB(Q, Pi, D)) = TQ for pik, and associate the resulting pik value with the resource allocation Ri. Here the Renormalize() function represents the application of the renormalization factor that was determined for the DBMS.

4. Repeat the two preceding steps for a variety of different resource allocations Ri, associating each with a value of pik.

5. Perform regression analysis on the set of (Ri, pik) value pairs to determine the calibration function Calik from resource allocations to pik values.
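A minimal sketch of Steps 2–5, assuming for simplicity that the resource being calibrated is described by a single scalar share R, and using hypothetical helpers `measure_runtime` (runs Q in a VM configured with allocation R and returns seconds) and `solve_for_param` (inverts the renormalized optimizer cost formula for the single unknown pik):

```python
import numpy as np

def calibrate_descriptive_param(allocations, measure_runtime, solve_for_param):
    """Steps 2-5: measure, solve for p_ik at each allocation, then fit Cal_ik."""
    points = []
    for R in allocations:                  # Steps 2 and 4: vary the resource allocation
        t_q = measure_runtime(R)           # measured execution time T_Q (seconds)
        p_ik = solve_for_param(R, t_q)     # Step 3: solve Renormalize(Cost_DB) = T_Q
        points.append((R, p_ik))
    # Step 5: regression.  For the PostgreSQL CPU parameters we observed a linear
    # relationship between p_ik and 1 / (CPU share), so fit p_ik = a / R + b.
    x = np.array([1.0 / R for R, _ in points])
    y = np.array([p for _, p in points])
    a, b = np.polyfit(x, y, 1)
    return lambda R: a / R + b             # calibration function Cal_ik
```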

A specific instance of this general methodology must be designed for each type of DBMS that will be considered by the virtualization design advisor. The primary design tasks are the design of the calibration queries Q and calibration database D (Step 1), the choice of resource allocations for which calibration measurements will be taken (Step 2), and the choice of the function to be fit to the calibration data (Step 5). The design of the calibration methodology demands deep expertise in the implementation of the target DBMS for the selection of calibration queries and database in Step 1. For example, it is important to ensure that the cost of the calibration queries is dependent only on the parameter that is being calibrated. It is also important to choose the calibration database in such a way that all optimizer assumptions are satisfied, so that the cost estimates it produces are accurate. For example, if the optimizer assumes a uniform data distribution then the calibration database should be uniformly distributed. The expertise required for designing the calibration methodology is not a major constraint on our approach, since this methodology need only be designed once for each type of DBMS.

In practice, the basic methodology can be refined and generalized in several ways. One improvement is to choose calibration queries in Step 1 that have minimal non-modeled costs. For example, one cost that is typically not modeled is the cost of returning the query result to the application.2 This cost can be minimized by choosing calibration queries that return few rows. Care is also required in defining the calibration database. For example, it should be just large enough to allow query execution times to be measured accurately. Larger databases will increase the run times of the calibration queries and hence the cost of calibration. Ideally, a single calibration database would be designed to be shared by all of the calibration queries so that it is not necessary to instantiate multiple databases during calibration.

Another potential problem with the basic methodology is that it may not be possible to choose a single query that isolates a particular cost model parameter in Step 1. In this case, one can instead identify a set of k queries that depend on k parameters (only). In Step 3 of the algorithm, a system of k equations is solved to determine values for the k parameters for a given resource allocation. As a simple example, consider the design of a calibration method for the cpu_tuple_cost parameter. PostgreSQL models the cost of a simple sequential table scan as a linear function of cpu_tuple_cost that involves no other cost model parameters. Thus, we could use a simple single-table selection query without predicates as our calibration query for cpu_tuple_cost in Step 1. However, such a query would potentially return many tuples, leading to a large unmodeled cost. To eliminate this problem, we could instead choose a select count(*) query without predicates, since such a query will return only a single row. However, the use of aggregation in the query introduces a second cost model parameter (cpu_operator_cost) into the query cost model. Thus, a second calibration query involving the same two parameters will be required. One possibility is to use another select count(*) query with a group by clause.

2 DBMS cost models ignore this cost because it is the same for all plans for a given query, and thus is irrelevant for the task of determining which plan is cheapest.


Figure 6: Variation in cpu_tuple_cost.

Figure 7: Variation in cpu_operator_cost.

The measured run times of these two queries will then define a system of two equations that can be solved to determine appropriate values for cpu_operator_cost and cpu_tuple_cost for each tested resource allocation.
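As a concrete illustration of this two-parameter case, the calibration reduces to solving a 2x2 linear system for each tested resource allocation. The coefficients and timings below are made-up placeholders, not values from the paper:

```python
import numpy as np

# Hypothetical renormalized cost models (in seconds) for the two calibration queries:
#   T1 = n1_tuples * cpu_tuple_cost + n1_ops * cpu_operator_cost + io_time1
#   T2 = n2_tuples * cpu_tuple_cost + n2_ops * cpu_operator_cost + io_time2
# Given the measured run times T1 and T2 under one resource allocation,
# solve for the two unknown parameters.
A = np.array([[1_000_000, 1_000_000],    # tuple / operator counts for query 1
              [1_000_000, 2_000_000]])   # ... and for query 2 (extra group-by work)
b = np.array([4.2 - 1.5,                 # measured time minus modeled I/O time
              5.1 - 1.5])
cpu_tuple_cost, cpu_operator_cost = np.linalg.solve(A, b)
```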

4.2 Optimizing the Calibration Process

One of our major concerns was: "How can we reduce the number of different virtual machines we need to realize and the number of calibration queries we need to run in order to calibrate the query optimizer?" If we have N CPU settings and M memory settings for the calibration experiments, a simplistic approach would be to realize N × M virtual machines and calibrate the parameters for each one. However, we could significantly reduce the calibration effort by relying on the observation that CPU, I/O, and memory optimizer parameters are independent of each other and hence can be calibrated independently. We have verified this observation experimentally on PostgreSQL and DB2.

For example, we have observed that the PostgreSQL CPU optimizer parameters vary linearly with 1/(allocated CPU fraction). This is expected since if the CPU share of a VM is doubled, its CPU costs would be halved. At the same time, the CPU parameters do not vary with memory since they are not describing memory. Thus, instead of needing N × M experiments to calibrate CPU parameters, we only need N experiments for the N CPU settings. Figures 6–8 show the linear variation of the three CPU parameters of the PostgreSQL optimizer with 1/(CPU share).

Figure 8: Variation in cpu_index_tuple_cost.

Figure 9: Objective function for two workloads not competing for CPU.

For each parameter, the figures show the value obtained from a VM that was given 50% of the available memory, the average value obtained from 7 different VMs with memory allocations of 20%–80%, and a linear regression on the values obtained from the VM with 50% of the memory. We can see from the figures that the CPU parameters do not vary much with memory, and that the linear regression is a very accurate approximation. Thus, in our calibration of PostgreSQL, we calibrate the CPU parameters at a 50% memory allocation, and we use a linear regression to model how parameter values vary with CPU allocation. We have found similar optimization opportunities for memory parameters, which can be calibrated at one CPU setting, and I/O parameters, which do not depend on CPU or memory and can be calibrated once.

We expect that for all database systems, the optimizer parameters describing one resource will be independent of the level of allocation of other resources, and we will be able to optimize the calibration process as we did for PostgreSQL. This requires expert knowledge of the DBMS, but it can be considered part of designing the calibration process for the DBMS, which is performed once by the DBMS expert and then used as many times as needed by users of the virtualization design advisor.

4.3 Configuration Enumeration

The shape of the objective function we are minimizing is fairly smooth and concave. For example, Figures 9 and 10 show the shape of this function for two workload mixes from the TPC-H benchmark running on PostgreSQL.


Figure 10: Objective function for two workloads competing for CPU.

// start with equal resource shares for all workloads
foreach i from 1 to N do
    Ri = [1/N, ..., 1/N]
    Ci = Gi × Cost(Wi, Ri)
end

// greedily shift resources until no more benefit
repeat
    MaxDiff = 0
    foreach j from 1 to M do
        MaxGainj = 0
        MinLossj = ∞
        foreach i from 1 to N do
            // who benefits most from an increase?
            // δ is a tunable algorithm parameter
            C′ = Gi × Cost(Wi, [ri1, ..., rij + δ, ..., riM])
            if ( Ci − C′ > MaxGainj ) then
                MaxGainj = Ci − C′
                igain = i
            end
            // who suffers least from a reduction?
            C′ = Gi × Cost(Wi, [ri1, ..., rij − δ, ..., riM])
            if ( C′ − Ci < MinLossj ) and
               ( C′ satisfies degradation limit Li ) then
                MinLossj = C′ − Ci
                ilose = i
            end
        end
        // maximum benefit from adjusting this resource?
        if ( igain ≠ ilose ) and
           ( MaxGainj − MinLossj > MaxDiff ) then
            MaxDiff = MaxGainj − MinLossj
            imaxgain = igain
            imaxlose = ilose
            jmax = j
        end
    end
    if ( MaxDiff > 0 ) then
        r[imaxgain, jmax] = r[imaxgain, jmax] + δ
        r[imaxlose, jmax] = r[imaxlose, jmax] − δ
    else
        done = true
until done

Figure 11: Greedy configuration enumeration.

In Figure 9, one workload is CPU intensive and the other is not, and in Figure 10 both workloads are CPU intensive and are competing for CPU. In both cases the shape of the cost function remains smooth and concave. We have also verified this for the case where we have N > 2 workloads. Hence, we adopt a greedy search as our search algorithm. Due to the nature of the objective function, greedy search is accurate and fast, and is not likely to terminate at a local minimum. We have observed that when the greedy search does terminate at a local minimum, this minimum is not far off from the global minimum.

Figure 11 illustrates our greedy configuration enumeration algorithm. Initially, the algorithm assigns a 1/N share of each resource to each of the N workloads. It then proceeds iteratively. In each iteration, it considers shifting a share δ (say, 5%) of some resource from one workload to another. The algorithm considers all such resource reallocations, and if it finds reallocations of resources that are beneficial according to the cost estimator, then it makes the most beneficial reallocation and iterates again. If no beneficial reallocations are found, the algorithm terminates, reporting the current resource allocations as the recommended allocations.

The algorithm is greedy in the sense that it always removes resources from the workload whose estimated cost will increase the least as a result of the reallocation, and always adds resources to the workload whose estimated cost will decrease the most as a result. If a workload has a performance degradation limit, Li, the algorithm will only take resources away from this workload if its performance after its resource level is reduced still remains within its degradation limit. If a workload has a benefit gain factor, Gi, the algorithm will multiply its cost by Gi for all levels of resource allocation. Since each iteration's reallocation affects only two workloads and the reallocation only occurs if those workloads see a combined net cost reduction, each iteration of the algorithm will decrease the total cost of the N workloads.

Unlike the cost model calibration procedure described in Section 4.1, the greedy search algorithm used for configuration enumeration does not require any access to the virtualization infrastructure and does not involve the execution of any database queries, since it is based on cost models. The algorithm does, however, call the DBMS query optimizer to estimate costs, and these calls can potentially be expensive. A simple way to reduce the number of optimizer calls is to cache the estimated costs computed in one iteration of the algorithm and reuse them in subsequent iterations. Since the algorithm changes the resource allocation of only two workloads in each iteration, there will be many opportunities for reusing cached costs.
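The caching idea can be as simple as memoizing the estimator on the (workload, allocation) pair. The sketch below is illustrative only; the dummy `estimate_cost` stands in for the what-if optimizer call of Section 4.1:

```python
from functools import lru_cache

def estimate_cost(workload_id, allocation):
    """Placeholder for the what-if optimizer call of Section 4.1."""
    return sum(1.0 / max(r, 1e-6) for r in allocation)   # dummy cost model

@lru_cache(maxsize=None)
def cached_cost(workload_id, allocation):
    # `allocation` must be hashable, e.g. a tuple of rounded shares, so that
    # repeated greedy iterations hit the cache instead of the optimizer.
    return estimate_cost(workload_id, allocation)

# Inside the greedy loop:
#   cost = cached_cost(i, tuple(round(r, 3) for r in R_i))
```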

5. ONLINE REFINEMENT

Our virtualization design advisor relies for cost estimation on the query optimizer, calibrated as described in the previous section. This enables the advisor to make resource allocation recommendations based on an informed and fairly accurate cost model without requiring extensive experimentation. However, the query optimizer – like any cost model – may have inaccuracies that lead to suboptimal recommendations.


When the virtual machines are configured as recommended by our advisor, we can observe the actual completion times of the different workloads in the different virtual machines, and we can refine the cost models used for making resource allocation recommendations based on these observations. After this, we can re-run the design advisor using the new cost models and obtain an improved resource allocation for the different workloads. This online refinement continues until the allocations of resources to the different workloads stabilize. We emphasize that the goal of online refinement is not to deal with dynamic changes in the nature of the workload, but rather to correct for any query optimizer errors that lead to suboptimal recommendations for the given workload. Next, we present two approaches to online refinement. The first is a basic approach that can be used when recommending allocations for one resource, and the second generalizes this basic approach to multiple resources.

5.1 Basic Online Refinement

Our basic online refinement approach refines the cost models used for recommending resource allocations for one resource. A fundamental assumption in this approach is that workload completion times are linear in the inverse of the resource allocation level. This means that the cost of workload Wi under resource allocation level ri can be given by:

\[
\mathrm{Cost}(W_i, [r_i]) = \frac{\alpha_i}{r_i} + \beta_i
\]

where αi and βi are the parameters of the linear model for workload Wi. To obtain the αi and βi representing the query optimizer cost model for Wi, we run a linear regression on multiple points representing the estimated costs for different 1/ri values that we obtain during the configuration enumeration phase. Subsequently, we refine the different cost models by adjusting αi and βi based on the observed costs (workload completion times).

Let the estimated cost for workload Wi at the resource level recommended by the design advisor be Esti. At runtime, we can observe the actual cost of running the workload, Acti. The difference between Esti and Acti guides our refinement process. One important observation is that refining the cost models of the different workloads will not lead to a different resource allocation recommendation unless the refinement process changes the slopes of the cost equations (i.e., the αi's). If a cost model underestimates the real cost we have to increase the slope, and if it overestimates the cost we have to reduce the slope. This will cause the resource allocation recommendation to move in the right direction. The magnitude of the slope change should be proportional to the observed error (the distance between Esti and Acti). The further Acti is from Esti, the higher the adjustment that is needed to correct the inaccuracy in resource allocation decisions. At the same time, the line for the adjusted cost model should pass through the observed actual point, Acti. These requirements lead us to the following heuristic for refining the cost model: scale the linear cost equation by Acti/Esti. Thus, the cost equation after refinement is given by:

\[
\mathrm{Cost}'(W_i, [r_i]) = \frac{\mathrm{Act}_i}{\mathrm{Est}_i} \cdot \frac{\alpha_i}{r_i} + \frac{\mathrm{Act}_i}{\mathrm{Est}_i} \cdot \beta_i
\]

After observing the actual completion times of all workloads and refining their cost equations, we re-run the virtualization design advisor using the new cost equations to obtain a new resource allocation recommendation. If the new recommendation is the same as the old recommendation, we stop the refinement process. Otherwise, we perform another iteration of online refinement. In the second iteration and beyond, we have multiple actual observed costs for each workload from the different iterations of online refinement, so we obtain the linear cost equation by running a linear regression based on these observed costs (without using optimizer estimates). To prevent the refinement process from continuing indefinitely, we place an upper bound on the number of refinement iterations. In our experiments, refinement always converges in one or two iterations.
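A minimal sketch of one refinement step for the single-resource case; `models` holds the fitted (αi, βi) pairs, `recommendation` the advisor's recommended shares, and `actual_costs` the measured completion times (all names are illustrative, not from the paper's implementation):

```python
import numpy as np

def fit_linear_model(shares, estimated_costs):
    """Fit Cost(W, [r]) = alpha / r + beta from (r, cost) samples."""
    alpha, beta = np.polyfit(1.0 / np.asarray(shares), np.asarray(estimated_costs), 1)
    return alpha, beta

def refine_once(models, recommendation, actual_costs):
    """Scale each workload's (alpha, beta) by Act_i / Est_i at the recommended share."""
    new_models = []
    for (alpha, beta), r, act in zip(models, recommendation, actual_costs):
        est = alpha / r + beta        # Est_i under the current cost model
        scale = act / est             # Act_i / Est_i
        new_models.append((scale * alpha, scale * beta))
    return new_models                 # re-run the design advisor with these models
```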

5.2 Generalized Online Refinement

The basic online refinement approach is sufficient if we are recommending allocations for one resource for which the assumption of a linear cost equation holds. In general, we may be recommending allocations for more than one resource. Moreover, it is not always the case that we can assume a linear cost equation for all resource allocation levels. We deal with each of these issues in turn in the following paragraphs.

To enable online refinement when we are recommending allocations for multiple resources, we extend our assumption of a linear cost equation to multiple dimensions. When recommending allocations for M resources, we assume that the cost of workload Wi given resource allocation Ri = [ri1, . . . , riM] can be given by:

\[
\mathrm{Cost}(W_i, R_i) = \sum_{j=1}^{M} \frac{\alpha_{ij}}{r_{ij}} + \beta_i
\]

As in the one-resource case, we obtain the αij's and βi representing the optimizer cost model for a workload by running a linear regression on estimated costs obtained during configuration enumeration. In this case, the regression is a multi-dimensional linear regression.

To refine the cost equation based on observed actual cost, we use the same reasoning that we used for the basic refinement approach, and therefore we scale the cost equation by Acti/Esti. Thus, the cost equation after refinement is given by:

\[
\mathrm{Cost}'(W_i, R_i) = \sum_{j=1}^{M} \frac{\mathrm{Act}_i}{\mathrm{Est}_i} \cdot \frac{\alpha_{ij}}{r_{ij}} + \frac{\mathrm{Act}_i}{\mathrm{Est}_i} \cdot \beta_i = \sum_{j=1}^{M} \frac{\alpha'_{ij}}{r_{ij}} + \beta'_i
\]

where \(\alpha'_{ij} = \frac{\mathrm{Act}_i}{\mathrm{Est}_i} \alpha_{ij}\) and \(\beta'_i = \frac{\mathrm{Act}_i}{\mathrm{Est}_i} \beta_i\).

After refining the cost equations based on observed costs, we re-run the design advisor using the new cost equations to get a new resource allocation recommendation. Since the linear cost equations may not hold for all resource allocation levels, we only allow the design advisor to change the allocation level of any resource for any workload by at most ∆max (e.g., 10%). This is based on the assumption that the linear cost equations will hold, but only within a restricted neighborhood around the original resource allocation recommendation. This local linearity assumption is much more constrained than the global linearity assumption made in Section 5.1, so we can safely assume that it holds for all workloads and all resources. However, the downside of using this more constrained assumption is that we can only make small adjustments in resource allocation levels.


This is sufficient for cases where the query optimizer cost model has only small errors, but it cannot deal with cases where the errors in the optimizer cost model are large. Dealing with situations where the optimizer cost model has large errors is a subject for future investigation.

If the newly obtained resource allocation recommendation is the same as the original recommendation, we stop the refinement process. If not, we continue to perform iterations of online refinement. When recommending allocations for M resources, we need M + 1 actual cost observations to be able to fit a linear model to the observed costs without using optimizer estimates. Thus, for the first M iterations of online refinement, we use the same procedure as the first iteration. We compute an estimated cost for each workload based on the current cost model of that workload, Esti. We then observe the actual cost of the workload, Acti, and scale the cost equation by Acti/Esti. For example, the cost equation after the second iteration would be as follows:

\[
\mathrm{Cost}''(W_i, R_i) = \sum_{j=1}^{M} \frac{\mathrm{Act}_i}{\mathrm{Est}_i} \cdot \frac{\alpha'_{ij}}{r_{ij}} + \frac{\mathrm{Act}_i}{\mathrm{Est}_i} \cdot \beta'_i = \sum_{j=1}^{M} \frac{\alpha''_{ij}}{r_{ij}} + \beta''_i
\]

where \(\alpha''_{ij} = \frac{\mathrm{Act}_i}{\mathrm{Est}_i} \alpha'_{ij}\) and \(\beta''_i = \frac{\mathrm{Act}_i}{\mathrm{Est}_i} \beta'_i\). This refinement approach retains some residual information from the optimizer cost model until we have sufficient observations to stop relying on the optimizer.

If refinement continues beyond M iterations, we fit an M-dimensional linear regression model to the observed cost points (of which there will be more than M), and we stop using optimizer estimates. Throughout the refinement process, when we run the virtualization design advisor to obtain a new resource allocation recommendation, we always restrict the change in the recommended level of any resource to be within ∆max of the original level recommended in the first iteration. This ensures that we are operating in a region within which we can assume a linear cost model. To guarantee that the refinement process terminates, we place an upper bound on the number of refinement iterations.
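Once more than M observations are available, the fit described above amounts to an ordinary least-squares regression on the transformed features 1/rij. A sketch, with the ∆max restriction applied to any new recommendation (function and parameter names are illustrative):

```python
import numpy as np

def fit_multidim_model(allocations, observed_costs):
    """Fit Cost(W_i, R_i) = sum_j alpha_ij / r_ij + beta_i by least squares.

    allocations:    array of shape (K, M), one resource-share vector per observation
    observed_costs: array of shape (K,), the measured completion times
    Requires K >= M + 1 observations.
    """
    R = np.asarray(allocations, dtype=float)
    X = np.hstack([1.0 / R, np.ones((R.shape[0], 1))])   # features 1/r_ij plus intercept
    coeffs, *_ = np.linalg.lstsq(X, np.asarray(observed_costs, dtype=float), rcond=None)
    alphas, beta = coeffs[:-1], coeffs[-1]
    return alphas, beta

def clamp_to_neighborhood(new_alloc, original_alloc, delta_max=0.10):
    """Keep each recommended share within +/- delta_max of the first recommendation."""
    new_alloc = np.asarray(new_alloc, dtype=float)
    original_alloc = np.asarray(original_alloc, dtype=float)
    return np.clip(new_alloc, original_alloc - delta_max, original_alloc + delta_max)
```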

6. EXPERIMENTAL EVALUATION

6.1 Experimental Setup

Environment: We conduct experiments using the DB2 V9 and PostgreSQL 8.1.3 database systems. The DB2 experiments use a machine with two 3.4GHz dual core Intel Xeon x64 processors and 1GB of memory, running RedHat Enterprise Linux 5. The PostgreSQL experiments use a machine with two 2.2GHz dual core AMD Opteron Model 275 x64 processors and 8GB of memory, running SUSE Linux 10.1. We use Xen as our virtualization environment [2], installing both database systems on Xen-enabled versions of their respective operating systems. The resource control capabilities required by our configuration advisor are available in all major virtualization environments, but we use Xen because of its growing popularity – most Linux distributions now come with full Xen support as a standard feature.

Workloads: We use queries from the TPC-H benchmark and an OLTP workload for our experiments. The two database systems have different TPC-H databases. For DB2, we use an expert-tuned implementation of the benchmark with scale factor 1 (1GB). With indexes, the size of this database on disk is 7GB. For PostgreSQL, we use the OSDL implementation of the benchmark [13], which is specifically tuned for PostgreSQL. For most of our experiments, we use a database with scale factor 1, which has a total size on disk with indexes of 4GB. In Section 6.5, we use a PostgreSQL database with scale factor 10, which has a total size on disk with indexes of 30GB. The OLTP workload is run only on DB2. This workload is modeled after a real customer workload for a credit card transaction database. The database consists of one table that has 112 character fields with a total width of 2318 bytes. This table starts empty, and the workload accessing it consists of M client threads concurrently inserting, then retrieving, then updating x rows each in this table. For our experiments, we use M = 40 clients and we vary x to get OLTP workloads of varying sizes.

Virtual Machines and Resource Allocation: The basic setup for our experiments is that we run N different workloads in N virtual machines that all share the same physical machine. The Xen virtual machine monitor (known as the hypervisor in Xen terminology), like all virtual machine monitors, provides mechanisms for controlling the allocation of resources to the different virtual machines. The Xen hypervisor allows us to control a virtual machine's CPU allocation by varying the CPU scheduling time slice of this machine. The hypervisor also allows us to control the amount of physical memory allocated to a virtual machine. Our virtualization design advisor uses these mechanisms provided by Xen to control the allocation of resources to the different virtual machines. We have observed that for the workloads used in our experiments, the amount of memory allocated to a virtual machine has only a minor effect on performance. We therefore focus our experimental evaluation on the effectiveness of our virtualization design advisor at deciding the CPU allocations of the different virtual machines. For most of our experiments, we give each virtual machine a fixed memory allocation of 512MB. We set the memory parameters of DB2 to 190MB for the buffer pool and 40MB for the sort heap (we do not use the DB2 self-tuning memory manager that automatically adjusts memory allocations). For PostgreSQL, we set the shared buffers to 32MB and the work memory to 5MB. When running experiments with PostgreSQL on the TPC-H database with scale factor 10, we give the virtual machine 6GB of memory, and we set the PostgreSQL shared buffers to 4GB and work memory to 5MB. To obtain the estimated workload completion times based on the query optimizer cost model, we only need to call the optimizer with its CPU and memory parameters set appropriately according to our calibration procedure, without needing to run a virtual machine. To obtain the actual workload completion times, we run the virtual machines individually, one after the other, on the physical machine, setting the virtual machine and database system parameters to the required values. We use a warm database for these runs. We have verified that the performance isolation capability of Xen ensures that running the virtual machines concurrently or one after the other yields the same workload completion times for our workloads.

Performance Metric: Without a virtualization design advisor, the simplest resource allocation decision is to allocate 1/N of the available resources to each of the N virtual machines sharing a physical machine. We call this the default resource allocation.


To measure performance, we determine the total execution time of the N workloads under this default allocation, Tdefault, and we also determine the total execution time under the resource allocation recommended by our advisor for the different workloads, Tadvisor. Our metric for measuring performance is the relative performance improvement over the default allocation, defined as

\[
\frac{T_{\mathrm{default}} - T_{\mathrm{advisor}}}{T_{\mathrm{default}}}
\]

For most of our experiments, we compute this metric based on the query optimizer cost estimates, but for some experiments we compute the performance improvement based on the actual run time of the queries.

6.2 Cost of Calibration and Search Algorithms

The cost of the query optimizer calibration process highly depends on the targeted DBMS. Calibrating the DB2 optimizer involved executing stand-alone programs to measure the following three resource related parameters: CPU speed, I/O bandwidth, and I/O overhead. The DB2 optimizer can determine all of its remaining resource related parameters from these three parameters. Calibrating CPU speed took 60 seconds for low CPU configurations and 20 seconds for high CPU configurations. Calibrating I/O parameters took 105 seconds. For both DB2 and PostgreSQL, calibrating the I/O parameters was done for only one CPU setting since we have observed that these parameters are independent of the virtual machine CPU configuration. In total, the DB2 calibration process for all CPU allocation levels to the virtual machine took less than 6 minutes. Calibrating the PostgreSQL optimizer involved executing SQL queries to calibrate CPU related parameters, and stand-alone programs to measure I/O related parameters. Calibrating CPU parameters took an average of 90 seconds for low CPU configurations and 40 seconds for high CPU configurations. Calibrating I/O parameters took 60 seconds. The entire PostgreSQL calibration process took less than 9 minutes.

The cost of the search algorithm used by the virtualization design advisor depends on whether we are doing the initial recommendation or online refinement. For the initial recommendation, the search algorithm needs to call the query optimizer multiple times for cost estimation. The algorithm converged in 8 or fewer iterations of greedy search, and it took less than 2 minutes. For online refinement, the search algorithm uses its own cost model and does not need to call the optimizer. Convergence still took 8 or fewer iterations of greedy search, but this always completed in less than 1 minute. These results show that the overhead of our design advisor is acceptable: a one-time calibration process that requires less than 10 minutes, and a search algorithm that typically takes less than 1 minute.
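The sketch below illustrates a greedy search of this general kind. It is a simplified rendering under our own assumptions (a fixed step size and a `cost(workload, share)` callback standing in for the calibrated optimizer calls), not the advisor's exact algorithm:

```python
def greedy_cpu_allocation(workloads, cost, step=0.05, max_iters=8):
    """Greedily shift CPU share between workloads as long as the total
    estimated completion time improves. `cost(w, share)` returns the
    estimated completion time of workload w when given `share` of the CPU."""
    n = len(workloads)
    shares = [1.0 / n] * n                      # start from the default allocation
    best = sum(cost(w, s) for w, s in zip(workloads, shares))
    for _ in range(max_iters):
        improved = False
        for i in range(n):                      # candidate receiver of extra CPU
            for j in range(n):                  # candidate donor
                if i == j or shares[j] - step <= 0:
                    continue
                trial = list(shares)
                trial[i] += step
                trial[j] -= step
                total = sum(cost(w, s) for w, s in zip(workloads, trial))
                if total < best:
                    best, shares, improved = total, trial, True
        if not improved:
            break
    return shares
```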

6.3 Sensitivity to Workload Resource Needs

In this set of experiments, we verify that our advisor can accurately respond to the different resource needs of different workloads. For this experiment, we examine the behavior of the 22 TPC-H queries for a database with scale factor 1, and we determine that Q18 is one of the most CPU intensive queries in the benchmark (i.e., its performance improves significantly if it is given more CPU), while Q21 is one of the least CPU intensive queries in the benchmark (i.e., its performance does not improve much if it is given more CPU). Thus, we use workloads consisting of multiple copies of Q18 and Q21, and we vary the resource needs of the workloads by varying the number of copies of the two query types. One subtle point to note here is that Q21 has much longer estimated and actual run times than Q18, so a virtual machine that is running one copy of Q18 will appear to be “less loaded” than a virtual machine that is running one copy of Q21, and hence the Q21 VM will be given more resources by our advisor. This would be the correct decision in this case, but we want to make sure that any variation in the resource allocation to the different workloads is due to variations in their resource needs, not simply due to their having different lengths. Thus, we use 25 copies of Q18 as our CPU intensive workload “unit”, which we refer to as C, and 1 copy of Q21 as our CPU non-intensive workload unit, which we refer to as I. To create workloads with different CPU needs, we combine different numbers of C and I units. Note that both C and I are decision support queries with fairly high CPU demands, even though the demands of C are greater than those of I. Hence, using C and I for our workloads leaves almost no slack for the advisor to improve performance. Our purpose in this section is to illustrate that the advisor can detect the different resource needs of the different workloads and improve performance even in this competitive environment.

Figure 12: Varying CPU intensity (DB2).

Figure 13: Varying CPU intensity (PostgreSQL).

In our first experiment, we use two workloads that run in two different virtual machines. The first workload consists of 5 C units and 5 I units (i.e., W1 = 5C + 5I). The second workload has k C units and (10 − k) I units for k = 0 to 10 (i.e., W2 = kC + (10 − k)I). As k increases, W2 becomes more CPU intensive while W1 remains unchanged. The relative sizes of the workloads remain unchanged due to the way we scale C and I to have the same size. Figures 12 and 13 show for DB2 and PostgreSQL, respectively, for different values of k, the amount of CPU allocated to W2 by our virtualization design advisor (on the left y-axis) and the estimated performance improvement of this allocation over the default allocation of 50% CPU to each workload (on the right y-axis). For small k, our design advisor gives most of the CPU to W1 because W1 is more CPU intensive. As k increases, our advisor is able to detect that W2 is becoming more CPU intensive and therefore gives W2 more CPU. Overall performance is improved over the default allocation except in the cases where the two workloads are similar to each other, so that the default allocation is optimal. The magnitude of the performance improvement is small because both workloads are fairly CPU intensive, so the performance degradation of W1 when more of the CPU is given to W2 is only slightly offset by the performance improvement in W2. The main point of this experiment is that the advisor is able to detect the different resource needs of different workloads and make the appropriate resource allocation decisions.

Figure 14: Varying workload size and resource intensity (DB2).

Figure 15: Varying workload size and resource intensity (PostgreSQL).

In our second experiment, we use two workloads that run in two different virtual machines. The first workload consists of 1 C unit (i.e., W3 = 1C). The second workload has k C units for k = 1 to 10 (i.e., W4 = kC). As k increases, W4 becomes longer compared to W3, and hence more resource intensive. The correct behavior in this case is to allocate more resources to W4. Figures 14 and 15 show for DB2 and PostgreSQL, respectively, the CPU allocated by our design advisor to W4 for different k and the performance improvement due to this allocation. Initially, when k = 1, both workloads are the same so they both get 50% of the resources. However, as k increases and W4 becomes more resource intensive, our search algorithm is able to detect that and allocate more resources to this workload, resulting in an overall performance improvement. The performance improvements in this figure are greater than those in the previous experiment since there is more opportunity due to the larger difference in the resource demands of the two workloads.

Figure 16: Varying workload size but not resource intensity (DB2).

Figure 17: Varying workload size but not resource intensity (PostgreSQL).

Our next experiment demonstrates that simply relying on the relative sizes of the workloads to make resource allocation decisions can result in poor decisions. For this experiment, we use one workload consisting of 1 C unit (i.e., W5 = 1C) and one workload consisting of k I units for k = 1 to 10 (i.e., W6 = kI). Here the goal is to illustrate that even though W6 may have a longer running time, the fact that it is not CPU intensive should lead our algorithm to conclude that giving it more CPU will not reduce the overall execution time. Therefore, the correct decision would be to keep more CPU with W5 even as W6 grows. Figures 16 and 17 show for DB2 and PostgreSQL, respectively, that our search algorithm does indeed give a lot less CPU to W6 than is warranted by its length. W6 has to be several times as long as W5 to get the same CPU allocation.

It is clear from these experiments that our virtualization design advisor behaves as expected, which validates our optimizer calibration process and our search algorithm. It is also clear that the advisor is equally effective for both DB2 and PostgreSQL, although the magnitudes of the improvements are higher for DB2.

Figure 18: Effect of Li.

6.4 Supporting QoS Metrics

In this section, we demonstrate the ability of our virtualization design advisor to make recommendations that are constrained in accordance with user-defined QoS parameters (the degradation limit, Li, and the benefit gain factor, Gi). For this experiment we use five identical workloads, W7–W11, each consisting of 1 unit of the C workload used in Section 6.3. The optimal allocation decision in this case is to split the resources equally between the workloads, but we set service degradation limits for two of the workloads to influence this decision. We vary the service degradation limit of W7, L7, from 1 to 4, and we give W8 a fixed degradation limit L8 = 2.5.
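A rough sketch of how such a limit can constrain the search is shown below. It assumes, purely for illustration, that degradation is measured as the ratio of a workload's estimated completion time under a candidate allocation to its time under the default 1/N allocation; a search such as the greedy loop sketched in Section 6.2 would simply skip candidate allocations for which this check fails:

```python
def violates_degradation_limits(workloads, shares, cost, limits):
    """Return True if any workload's estimated degradation under `shares`
    exceeds its limit L_i. Degradation is taken here as estimated time
    under the candidate allocation divided by estimated time under the
    default allocation (an assumption made for this sketch)."""
    n = len(workloads)
    default_share = 1.0 / n
    for i, w in enumerate(workloads):
        degradation = cost(w, shares[i]) / cost(w, default_share)
        if limits.get(i) is not None and degradation > limits[i]:
            return True
    return False
```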

Figure 18 shows the service degradation of all workloads for different values of L7. We can see that our virtualization design advisor is able to meet the constraints specified by L7 and L8 and limit the degradation that W7 and W8 suffer. This comes at the cost of a higher degradation for the other workloads, but that is expected since the Li parameters specify goals that are specific to particular workloads.

We also verified the ability of our advisor to increase the CPU allocation to workloads whose Gi is greater than 1. We omit the details due to lack of space.

6.5 Random Workloads

The experiments in the previous sections are fairly “controlled” in the sense that we know what to expect from the design advisor. In this section, we demonstrate the effectiveness of our advisor on random workloads for which we do not have prior expectations about what the final configuration should be. Our goal is to show that for these cases, the advisor will recommend resource allocations that are better than the default allocation.

Each experiment in this section uses 10 workloads. We run each workload in a separate virtual machine, and we vary the number of concurrently running workloads from 2 to 10. For each set of concurrent workloads, we run our design advisor and determine the CPU allocation to each virtual machine and the performance improvement over the default allocation of 1/N CPU share for each workload.

We present results for two random workload experiments.

Figure 19: CPU allocation for N workloads on TPC-H database.

Figure 20: CPU allocation for N OLTP + TPC-H workloads.

Figure 21: Performance improvement for N workloads on TPC-H database.


The first experiment uses queries on a TPC-H database with scale factor 10 stored in PostgreSQL. Using a database with scale factor 10 allows us to test our design advisor with long-running resource-intensive queries. For this experiment, each of the 10 workloads consists of a random mix of between 10 and 20 workload units. A workload unit can be either 1 copy of TPC-H query Q17 or 66 copies of a modified version of TPC-H Q18. We added a WHERE predicate to the sub-query that is part of the original Q18 so that the query touches less data, and therefore spends less time waiting for I/O. The number of copies of the modified Q18 in a workload unit is chosen so that the two workload units have the same completion time when running with 100% of the available CPU.

The second random workload experiment uses 10 workloads running on DB2 in different virtual machines. Of these workloads, 5 are OLTP workloads that touch x = 200 to x = 6000 rows of the table in the OLTP database (W2, W4, W6, W8, and W10). The other 5 workloads consist of up to 40 randomly chosen TPC-H queries.

Figures 19 and 20 show, for both of these experiments, the changes in CPU allocation to the different workloads as we introduce new workloads to the mix. It can be seen that our virtualization design advisor is identifying the nature of new workloads as they are introduced and is adjusting the resource allocations accordingly. The slopes of the different CPU allocation lines are not constant. It can also be seen that the advisor maintains the relative order of the workloads' CPU allocations even as new workloads are introduced. The fact that some workload is more resource intensive than another does not change due to the introduction of more workloads.

Figure 21 shows the actual performance improvement under different resource allocations for the experiment on the scale factor 10 TPC-H database. The figure shows the performance improvement under the resource allocation recommended by the virtualization design advisor, and under the optimal resource allocation obtained by exhaustively enumerating all feasible allocations and measuring performance in each one. The figure shows that our virtualization design advisor, using a properly calibrated query optimizer and a well-tuned database, can achieve near-optimal resource allocations.

6.6 Online Refinement

In some cases, the query optimizer cost model is inaccurate, so our resource allocation decisions are suboptimal and the actual performance improvement we obtain is significantly less than the estimated improvement. Most work on automatic physical database design ignores optimizer errors even if they result in suboptimal decisions. One of the unique features of our work is that we try to correct for optimizer errors through our online refinement process. In this section, we illustrate the effectiveness of this process using the OLTP + TPC-H workloads from the previous section. Since we are only allocating CPU to the different virtual machines, and since CPU is a resource for which a linear cost model is typically accurate, we use the basic online refinement approach that is described in Section 5.1.
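As a rough illustration of this kind of refinement (a sketch under our own assumptions, not the exact procedure of Section 5.1), one can refit, for each workload, a linear model of completion time against its CPU share from the completion times actually observed so far, and then re-run the search against the refined models instead of the optimizer estimates:

```python
def fit_linear_cost(observations):
    """Fit completion time as a linear function of CPU share,
    time(share) ~= a + b * share, from two observed (share, time) points.
    The linear form is an assumption made for this sketch."""
    (s1, t1), (s2, t2) = observations
    b = (t2 - t1) / (s2 - s1)
    a = t1 - b * s1
    return lambda share: a + b * share

def refine_allocation(workloads, observed, search):
    """One refinement step: rebuild each workload's cost model from the
    observed completion times, then re-run the search (e.g. the greedy
    loop sketched earlier) against the refined models."""
    models = {w: fit_linear_cost(obs) for w, obs in observed.items()}
    return search(workloads, cost=lambda w, share: models[w](share))
```

In practice, this measure-and-refit step would be repeated until the recommended allocations stop changing.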

We expect the query optimizer to be less accurate in modeling OLTP workloads than DSS workloads such as TPC-H. The optimizer cost model does not capture contention or update costs, which are significant factors in OLTP workloads.

Figure 22: CPU allocation for N OLTP + TPC-H workloads after online refinement.

Figure 23: Performance improvement for N OLTP + TPC-H workloads before and after online refinement.

Thus, for our OLTP workload, the optimizer tends to underestimate the CPU requirements. The OLTP workload is indeed less CPU intensive than the workloads from TPC-H since I/O is a much higher fraction of its work, but the query optimizer sees it as much less CPU intensive than it really is. It therefore leads the advisor to allocate a large portion of the CPU to the workloads from TPC-H. Implementing the CPU allocations recommended by the advisor results in the actual performance improvements shown in Figure 23. These recommendations are clearly inaccurate. However, if we run our online refinement process on the different sets of workloads, the process converges in at most two iterations and gives the CPU allocations shown in Figure 22. We have verified that these CPU allocations are the same as the optimal allocations obtained by performing an exhaustive search that finds the allocation with the lowest actual completion time. In these CPU allocations, the workloads from TPC-H are getting less CPU than before, even though they are longer and more resource intensive. The CPU taken from these workloads is given to the OLTP workloads and provides them with an adequate level of CPU. The resulting actual performance improvements, also shown in Figure 23, are much better than the improvements without online refinement. Thus, we are able to show that our advisor can provide effective recommendations for different kinds of workloads, giving us easy performance gains of up to 25%.


7. CONCLUSIONS

In this paper, we considered the problem of automatically configuring multiple virtual machines that are all running database systems and sharing a pool of physical resources. Our approach to solving this problem is implemented as a virtualization design advisor that takes information about the different database workloads and uses this information to determine how to split the available physical computing resources among the virtual machines. The advisor relies on the cost models of the database system query optimizers to enable it to predict workload performance under different resource allocations. We described how to calibrate and extend these cost models so that they can be used for this purpose. We also presented an approach that uses actual performance measurements to refine the cost models used for recommendation. This provides a means of correcting cost model inaccuracies. We conducted an extensive empirical evaluation of the virtualization design advisor, demonstrating its accuracy and effectiveness.

8. REFERENCES

[1] R. Agrawal, S. Chaudhuri, A. Das, and V. R. Narasayya. Automating layout of relational databases. In Proc. Int. Conf. on Data Engineering (ICDE), 2003.

[2] P. T. Barham, B. Dragovic, K. Fraser, S. Hand, T. L. Harris, A. Ho, R. Neugebauer, I. Pratt, and A. Warfield. Xen and the art of virtualization. In Proc. ACM Symposium on Operating Systems Principles (SOSP), 2003.

[3] M. Bennani and D. A. Menasce. Resource allocation for autonomic data centers using analytic performance models. In Proc. IEEE Int. Conf. on Autonomic Computing (ICAC), 2005.

[4] M. J. Carey, R. Jauhari, and M. Livny. Priority in DBMS resource scheduling. In Proc. Int. Conf. on Very Large Data Bases (VLDB), 1989.

[5] D. L. Davison and G. Graefe. Dynamic resource brokering for multi-user query execution. In Proc. ACM SIGMOD Int. Conf. on Management of Data, 1995.

[6] K. Dias, M. Ramacher, U. Shaft, V. Venkataramani, and G. Wood. Automatic performance diagnosis and tuning in Oracle. In Proc. Conf. on Innovative Data Systems Research (CIDR), 2005.

[7] M. N. Garofalakis and Y. E. Ioannidis. Multi-dimensional resource scheduling for parallel queries. In Proc. ACM SIGMOD Int. Conf. on Management of Data, 1996.

[8] A. Karve, T. Kimbrel, G. Pacifici, M. Spreitzer, M. Steinder, M. Sviridenko, and A. Tantawi. Dynamic placement for clustered web applications. In Proc. Int. Conf. on WWW, 2006.

[9] G. Khanna, K. Beaty, G. Kar, and A. Kochut. Application performance management in virtualized server environments. In Proc. IEEE/IFIP Network Operations and Management Symp. (NOMS), 2006.

[10] P. Martin, H.-Y. Li, M. Zheng, K. Romanufa, and W. Powley. Dynamic reconfiguration algorithm: Dynamically tuning multiple buffer pools. In Proc. Int. Conf. Database and Expert Systems Applications (DEXA), 2000.

[11] M. Mehta and D. J. DeWitt. Dynamic memory allocation for multiple-query workloads. In Proc. Int. Conf. on Very Large Data Bases (VLDB), 1993.

[12] D. Narayanan, E. Thereska, and A. Ailamaki. Continuous resource monitoring for self-predicting DBMS. In Proc. IEEE Int. Symp. on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems (MASCOTS), 2005.

[13] OSDL Database Test Suite 3. http://sourceforge.net/projects/osdldbt.

[14] M. Rosenblum and T. Garfinkel. Virtual machine monitors: Current technology and future trends. IEEE Computer, 38(5), 2005.

[15] P. Ruth, J. Rhee, D. Xu, R. Kennell, and S. Goasguen. Autonomic live adaptation of virtual computational environments in a multi-domain infrastructure. In Proc. IEEE Int. Conf. on Autonomic Computing (ICAC), 2006.

[16] P. Shivam, A. Demberel, P. Gunda, D. E. Irwin, L. E. Grit, A. R. Yumerefendi, S. Babu, and J. S. Chase. Automated and on-demand provisioning of virtual machines for database applications. In Proc. ACM SIGMOD Int. Conf. on Management of Data, 2007. Demonstration.

[17] J. E. Smith and R. Nair. The architecture of virtual machines. IEEE Computer, 38(5), 2005.

[18] A. A. Soror, A. Aboulnaga, and K. Salem. Database virtualization: A new frontier for database tuning and physical design. In Proc. Workshop on Self-Managing Database Systems (SMDB), 2007.

[19] M. Steinder, I. Whalley, D. Carrera, I. Gaweda, and D. M. Chess. Server virtualization in autonomic management of heterogeneous workloads. In Proc. IFIP/IEEE Int. Symp. on Integrated Network Mgmt., 2007.

[20] A. J. Storm, C. Garcia-Arellano, S. Lightstone, Y. Diao, and M. Surendra. Adaptive self-tuning memory in DB2. In Proc. Int. Conf. on Very Large Data Bases (VLDB), 2006.

[21] C. Tang, M. Steinder, M. Spreitzer, and G. Pacifici. A scalable application placement controller for enterprise data centers. In Proc. Int. Conf. on WWW, 2007.

[22] G. Tesauro, R. Das, W. E. Walsh, and J. O. Kephart. Utility-function-driven resource allocation in autonomic systems. In Proc. IEEE Int. Conf. on Autonomic Computing (ICAC), 2005.

[23] VMware. http://www.vmware.com/.

[24] X. Wang, Z. Du, Y. Chen, and S. Li. Virtualization-based autonomic resource management for multi-tier web applications in shared data center. Journal of Systems and Software, 2008.

[25] X. Wang, D. Lan, G. Wang, X. Fang, M. Ye, Y. Chen, and Q. Wang. Appliance-based autonomic provisioning framework for virtualized outsourcing data center. In Proc. IEEE Int. Conf. on Autonomic Computing (ICAC), 2007.

[26] G. Weikum, A. Monkeberg, C. Hasse, and P. Zabback. Self-tuning database technology and information services: from wishful thinking to viable engineering. In Proc. Int. Conf. on Very Large Data Bases (VLDB), 2002.
