Experiment-driven System Management

Shivnath Babu
Duke University

Joint work with Songyun Duan, Herodotos Herodotou, and Vamsidhar Thummala


Managing DBs in Small to Medium Business Enterprises (SMBs)

• Peter is a system admin in an SMB
– Manages the database (DB)
– SMB cannot afford a DBA
• Suppose Peter has to tune a poorly-performing DB
– Design advisor may not help
– Maybe the problem is with DB configuration parameters


Tuning DB Configuration Parameters

• Parameters that control
– Memory distribution
– I/O optimization
– Parallelism
– Optimizer’s cost model
• Number of parameters ~ 100
– 15-25 critical params depending on OLAP vs. OLTP
• Few holistic parameter tuning tools available
– Peter may have to resort to 1000+ page tuning manuals or rules of thumb from experts
– Can be a frustrating experience


Response Surfaces

• TPC-H 4 GB DB size, 1 GB memory, Query 18

[Figure: 2-dim projection of an 11-dim surface]


DBA’s Approach to Parameter Tuning

• DBAs run experiments
– Here, an experiment is a run of the DB workload with a specific parameter configuration
– Common strategy: vary one DB parameter at a time
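The one-parameter-at-a-time strategy can be sketched as below. This is only an illustration: the parameter names, the candidate values, and the synthetic `synthetic_workload` surface are all hypothetical stand-ins for a real DB and workload.

```python
from typing import Callable, Dict, List

Config = Dict[str, float]


def tune_one_at_a_time(
    run_workload: Callable[[Config], float],
    start: Config,
    candidates: Dict[str, List[float]],
) -> Config:
    """Vary one parameter at a time, keeping the best value found for each.

    run_workload returns a running time (lower is better) for a configuration.
    """
    best = dict(start)
    best_time = run_workload(best)
    for name, values in candidates.items():
        for value in values:
            trial = dict(best)
            trial[name] = value
            elapsed = run_workload(trial)  # one experiment
            if elapsed < best_time:
                best, best_time = trial, elapsed
    return best


def synthetic_workload(conf: Config) -> float:
    """Hypothetical response surface with an interaction between parameters."""
    a, b = conf["buffer_mb"], conf["work_mem_mb"]
    return (a - 600) ** 2 / 1e4 + (b - 40) ** 2 / 10 + abs(a / 100 - b / 10)


start = {"buffer_mb": 100.0, "work_mem_mb": 10.0}
tuned = tune_one_at_a_time(
    synthetic_workload,
    start=start,
    candidates={
        "buffer_mb": [100.0, 300.0, 600.0, 900.0],
        "work_mem_mb": [10.0, 20.0, 40.0, 80.0],
    },
)
print(tuned, synthetic_workload(tuned))
```

The sketch costs one experiment per candidate value, and because each parameter is fixed before the next one is varied, it can miss settings that are only good in combination.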


Experiment-driven Management

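[Figure: the experiment-driven management loop. A mgmt. task enters the loop: plan next set of experiments, conduct experiments on workbench, process output to extract information, then ask "Are more experiments needed?" If yes, plan the next set; otherwise, return the result.]

Goal: Automate this process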


Roadmap

• Use cases of experiment-driven mgmt.
– Query tuning, benchmarking, Hadoop, testing, …
• iTuned: Tool for DB conf parameter tuning
– End-to-end application of experiment-driven mgmt.
• .eX: Language and run-time system that brings experiment-driven mgmt. to users & tuning tools


What is an Experiment?

• Depends on the management task
– Pay some extra cost, get new information in return
– Even for a specific management task, there can be a spectrum of possible experiments


Uses of Experiment-driven Mgmt.

• DB conf parameter tuning
• MapReduce job tuning in Hadoop
• Server benchmarking
– Capacity planning
– Cost/perf modeling

Uses of Experiment-driven Mgmt.

• Tuning “problem queries”

[Figure: query plan annotated with <Estimated, Actual> cardinality at each operator, e.g., <100, 187>, <100, 436>, <2473, 7496>, <380459, 229739>, <65, 309>, <1629, 1615>]


Uses of Experiment-driven Mgmt.

• DB conf parameter tuning
• MapReduce job tuning in Hadoop
• Server benchmarking
– Capacity planning
– Cost/perf modeling
• Tuning “problem queries”
• Troubleshooting
• Testing
• Canary in the server farm (James Hamilton, Amazon)
• …


Roadmap

• Use cases of experiment-driven mgmt.
– Query tuning, benchmarking, Hadoop, testing, …
• iTuned: Tool for DB conf parameter tuning
– End-to-end application of experiment-driven mgmt.
• .eX: Language and run-time system that brings experiment-driven mgmt. to users & tuning tools


Problem Abstraction

• Unknown response surface: y = F(X)
– X = Parameters x1, x2, …, xm
• Each experiment gives a <Xi,yi> sample
– Set DB to conf Xi
– Run workload that needs tuning
– Measure performance yi at Xi
• Goal: Find high performance setting with low total cost of running experiments
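The three steps of a single experiment map naturally onto a small function. The following is a minimal sketch, assuming performance is measured as wall-clock running time (lower is better). The `Sample` type, the `apply_config` and `run_workload` callables, and the parameter name `shared_buffers_mb` are hypothetical and chosen only for illustration.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, List

Config = Dict[str, float]


@dataclass
class Sample:
    config: Config      # X_i: the parameter setting used
    performance: float  # y_i: measured running time in seconds


def run_experiment(
    apply_config: Callable[[Config], None],
    run_workload: Callable[[], object],
    config: Config,
) -> Sample:
    """Set the DB to `config`, run the workload, and measure its running time."""
    apply_config(config)
    started = time.perf_counter()
    run_workload()
    return Sample(config, time.perf_counter() - started)


def best_sample(samples: List[Sample]) -> Sample:
    """Return the best setting observed so far (smallest running time)."""
    return min(samples, key=lambda s: s.performance)


# Stand-ins for a real DB: a dict holds the "settings", a loop is the "workload".
db_settings: Config = {}
sample = run_experiment(
    db_settings.update,
    lambda: sum(range(10000)),
    {"shared_buffers_mb": 256.0},
)
print(sample)
```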


Example

[Figure: two plots of y against x1 (x1 from 4 to 12, y from 0 to 8), one of them showing Utility(X)]

• Goal: Compute the potential utility of candidate experiments
• Where to do the next experiment?


iTuned’s Adaptive Sampling Algorithm for Experiment Planning

// Phase I: Bootstrapping
– Conduct some initial experiments
// Phase II: Sequential Sampling
– Loop: Until stopping condition is reached
1. Identify candidate experiments to do next
2. Based on current samples, estimate the utility of each candidate experiment
3. Conduct the next experiment at the candidate with highest utility
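The two phases can be sketched as a short loop. This is a skeleton under stated assumptions, not iTuned's implementation: the stopping condition is a simple budget on the number of experiments, and the utility estimator used in the demo (distance to the nearest sampled point, which only explores) is a placeholder for a model-based estimate. All function names are hypothetical.

```python
import random
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, ...]
Sample = Tuple[Point, float]  # <X_i, y_i>


def adaptive_sampling(
    run_experiment: Callable[[Point], float],
    bootstrap_points: Sequence[Point],
    propose_candidates: Callable[[], List[Point]],
    estimate_utility: Callable[[Point, List[Sample]], float],
    max_experiments: int,
) -> List[Sample]:
    # Phase I: bootstrapping with some initial experiments.
    samples: List[Sample] = [(x, run_experiment(x)) for x in bootstrap_points]

    # Phase II: sequential sampling until the experiment budget is used up.
    while len(samples) < max_experiments:
        candidates = propose_candidates()                                 # step 1
        scored = [(estimate_utility(x, samples), x) for x in candidates]  # step 2
        _, next_point = max(scored, key=lambda pair: pair[0])
        samples.append((next_point, run_experiment(next_point)))          # step 3
    return samples


def surface(x: Point) -> float:
    """Hypothetical response surface with its minimum at x1 = 7."""
    return (x[0] - 7.0) ** 2


rng = random.Random(0)


def random_candidates() -> List[Point]:
    return [(rng.uniform(4.0, 12.0),) for _ in range(20)]


def distance_to_nearest_sample(x: Point, samples: List[Sample]) -> float:
    """Placeholder utility: prefer points far from anything already sampled."""
    return min(abs(x[0] - sx[0]) for sx, _ in samples)


samples = adaptive_sampling(
    surface, [(4.0,), (12.0,)], random_candidates,
    distance_to_nearest_sample, max_experiments=10,
)
best_point, best_y = min(samples, key=lambda s: s[1])
print(best_point, best_y)
```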


Utility of an Experiment

• Let <X1,y1>, …, <Xn,yn> be the samples from n experiments done so far
• Let <X*,y*> be the best setting so far (i.e., $y^* = \min_i y_i$)
– w.l.o.g. assuming minimization
• U(X), the utility of an experiment at X, is // y = F(X)
– y* - y if y* > y
– 0 otherwise
• However, U(X) poses a chicken-and-egg problem
– y will be known only after experiment is run at X
• Goal: Compute expected utility EU(X)


Expected Utility of an Experiment

• Suppose we have the probability density function of y (y is the perf at X)
– Prob(y = v | <Xi,yi> for i=1,…,n)
• Then,
$EU(X) = \int_{v=-\infty}^{v=+\infty} U(X)\, \mathrm{Prob}(y = v)\, dv$
$EU(X) = \int_{v=-\infty}^{v=y^*} (y^* - v)\, \mathrm{Prob}(y = v)\, dv$
• Goal: Compute Prob(y = v | <Xi,yi> for i=1,…,n)
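The step from the first integral to the second uses only the definition of U(X): the utility is y* - v when the outcome v is below y*, and 0 otherwise, so the part of the integral above y* vanishes. Written out (with Prob(y = v) standing for the conditional density given the n samples):

```latex
\begin{aligned}
EU(X) &= \int_{-\infty}^{+\infty} U(X)\,\mathrm{Prob}(y = v)\,dv \\
      &= \int_{-\infty}^{y^*} (y^* - v)\,\mathrm{Prob}(y = v)\,dv
         + \int_{y^*}^{+\infty} 0 \cdot \mathrm{Prob}(y = v)\,dv \\
      &= \int_{-\infty}^{y^*} (y^* - v)\,\mathrm{Prob}(y = v)\,dv
\end{aligned}
```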


Model: Gaussian Process Representation (GRS) of a Response Surface

• GRS models the response surface as: y(X) = g(X) + Z(X) (+ ε(X) for measurement error)
– E.g., g(X) = x1 - 2x2 + 0.1x1^2 (learned using common techniques)
– Z: Gaussian Process to capture regression residual


Primer on Gaussian Process

• Univariate Gaussian distribution
– G = N(μ, σ²)
– Described by mean μ, variance σ²
• Multivariate Gaussian distribution
– [G1, G2, …, Gn]
– Described by mean vector and covariance matrix
• Gaussian Process
– Generalizes multivariate Gaussian to arbitrary number of dimensions
– Described by mean and covariance functions


Model: Gaussian Process Representation (GRS) of a Response Surface

• GRS captures the response surface as: y(X) = g(X) + Z(X) (+ ε(X) for measurement error)
• If Z is a Gaussian process, then:
– [Z(X1),…,Z(Xn),Z(X)] is multivariate Gaussian
– Z(X) | Z(X1),…,Z(Xn) is a univariate Gaussian
– y(X) is a univariate Gaussian
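To see why conditioning on observed residuals yields a univariate Gaussian, it helps to work the smallest case, n = 1. The sketch below applies the standard conditioning formula for a zero-mean bivariate Gaussian. The function name and the numbers are illustrative assumptions, not values from the talk.

```python
from typing import Tuple


def condition_bivariate_gaussian(
    z1: float, var1: float, var2: float, cov12: float
) -> Tuple[float, float]:
    """Mean and variance of Z2 given Z1 = z1 for a zero-mean bivariate Gaussian."""
    mean = cov12 / var1 * z1
    variance = var2 - cov12 ** 2 / var1
    return mean, variance


# Residual observed at X1 is 2.0; how much does that tell us about Z(X)?
near = condition_bivariate_gaussian(z1=2.0, var1=1.0, var2=1.0, cov12=0.9)
far = condition_bivariate_gaussian(z1=2.0, var1=1.0, var2=1.0, cov12=0.1)
print("strongly correlated point:", near)
print("weakly correlated point:  ", far)
```

With a high covariance the conditional mean follows the observed residual closely and the variance shrinks. With a low covariance the observation says little and the variance stays near its prior value.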


Parameters of the GRS Model

• [Z(X1),…,Z(Xn)] is multivariate Gaussian
– Z(Xi) has zero mean
– $\mathrm{Covariance}(Z(X_i), Z(X_j)) \propto \exp\left(\sum_k -\theta_k\, |x_{ik} - x_{jk}|^{\gamma_k}\right)$
• Residuals at nearby points have higher correlation
• $\theta_k$, $\gamma_k$ learned from <X1,y1>, …, <Xn,yn>
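The covariance structure can be illustrated with a few lines of code. This sketch writes the two learned per-dimension parameters as `theta` and `gamma` (names assumed here) and computes only the exponential term, leaving out the proportionality constant. The parameter values are made up, not learned from data.

```python
import math
from typing import Sequence


def grs_correlation(
    xi: Sequence[float],
    xj: Sequence[float],
    theta: Sequence[float],
    gamma: Sequence[float],
) -> float:
    """exp(sum_k -theta_k * |x_ik - x_jk| ** gamma_k)."""
    exponent = sum(
        -t * abs(a - b) ** g for a, b, t, g in zip(xi, xj, theta, gamma)
    )
    return math.exp(exponent)


theta = (0.5, 0.1)   # hypothetical values
gamma = (2.0, 2.0)   # hypothetical values
x = (1.0, 4.0)
same = grs_correlation(x, x, theta, gamma)
nearby = grs_correlation(x, (1.2, 4.5), theta, gamma)
distant = grs_correlation(x, (3.0, 9.0), theta, gamma)
print(same, nearby, distant)
```

A larger `theta[k]` makes the correlation fall off faster along dimension k, which is one way to read it as a measure of how sensitive the residual is to that parameter.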


Use of the GRS Model

• Recall our goals to compute
– $EU(X) = \int_{v=-\infty}^{v=y^*} (y^* - v)\, \mathrm{Prob}(y = v)\, dv$
– Prob(y = v | <Xi,yi> for i=1,…,n)
• Lemma: Using the GRS, we can compute the mean μ(X) and variance σ²(X) of the Gaussian y(X)
• Theorem: EU(X) has a closed form that is a product of:
– Term that depends on (y* - μ(X))
– Term that depends on σ(X)
• It follows that settings X with high EU are either:
– Close to known good settings (for exploitation)
– In highly uncertain regions (for exploration)
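When y(X) is Gaussian with a given mean and standard deviation, the expected-utility integral has a well-known closed form, often called expected improvement. The sketch below uses that standard identity and checks it against a numerical integration. It is not confirmed to be the exact expression used in the talk, and the function names and example numbers are assumptions.

```python
import math


def normal_pdf(u: float) -> float:
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def normal_cdf(u: float) -> float:
    return 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))


def expected_utility(y_star: float, mu: float, sigma: float) -> float:
    """E[max(y* - y, 0)] for y ~ N(mu, sigma^2), assuming minimization."""
    if sigma <= 0.0:
        return max(y_star - mu, 0.0)
    u = (y_star - mu) / sigma
    return sigma * (u * normal_cdf(u) + normal_pdf(u))


def expected_utility_numeric(
    y_star: float, mu: float, sigma: float, steps: int = 20000
) -> float:
    """Trapezoidal approximation of the integral of (y* - v) * density(v) up to y*."""
    lo = mu - 10.0 * sigma  # the density is negligible below this point
    if y_star <= lo:
        return 0.0
    h = (y_star - lo) / steps

    def integrand(v: float) -> float:
        return (y_star - v) * normal_pdf((v - mu) / sigma) / sigma

    total = 0.5 * (integrand(lo) + integrand(y_star))
    total += sum(integrand(lo + i * h) for i in range(1, steps))
    return total * h


y_star = 4.2  # best running time seen so far (hypothetical)
for label, mu, sigma in [
    ("close to a known good setting", 4.0, 0.1),
    ("highly uncertain region", 6.0, 2.0),
    ("confidently bad region", 6.0, 0.1),
]:
    print(label, expected_utility(y_star, mu, sigma))
```

In the demo, the first two candidates get a clearly positive expected utility for different reasons (a good predicted mean versus a large predicted variance), while the third gets essentially none.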


Example

[Figure: three plots of y against x1 (x1 from 4 to 12, y from 0 to 8). Labels: unknown actual surface, μ(X), 4σ(X), y*, EU(X)]

• Settings X with high EU are either:
– Close to known good settings (high y* - μ(X))
– In highly uncertain regions (high σ(X))


Where to Conduct Experiments?

[Figure: clients connect through a middle tier to the production platform (DBMS + data). A standby platform (DBMS + data) is fed by write-ahead log (WAL) shipping. A test platform (DBMS + test data) is also shown.]


iTuned’s Solution

• Exploit underutilized resources with minimal impact on production workload
• DBA/User designates resources where experiments can be run
– E.g., production/standby/test
• DBA/User specifies policies that dictate when experiments can be run
– Separate regular use (home) from experiments (garage)
– Example: If CPU, mem, & disk utilization < 10% for past 15 mins, then resource can be used for experiments
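The example policy can be expressed as a small predicate over monitoring data. This is a sketch under assumptions: utilization readings arrive as `(timestamp, cpu %, mem %, disk %)` tuples, and the policy is only satisfied when the history covers the whole window. The function name and data layout are hypothetical.

```python
from typing import List, Tuple

# (timestamp in seconds, cpu %, mem %, disk %)
Reading = Tuple[float, float, float, float]


def can_run_experiments(
    readings: List[Reading],
    now: float,
    window_seconds: float = 15 * 60,
    threshold_percent: float = 10.0,
) -> bool:
    """True if CPU, mem, and disk stayed below the threshold for the whole window."""
    window_start = now - window_seconds
    # Require monitoring history that reaches back to the start of the window.
    if not readings or min(r[0] for r in readings) > window_start:
        return False
    recent = [r for r in readings if window_start <= r[0] <= now]
    return bool(recent) and all(
        value < threshold_percent for r in recent for value in r[1:]
    )


# One reading per minute for 15 minutes, all nearly idle.
idle = [(t * 60.0, 2.0, 5.0, 1.0) for t in range(16)]
# Same history, but with a CPU spike at minute 10.
busy = idle[:10] + [(600.0, 50.0, 5.0, 1.0)] + idle[11:]

print(can_run_experiments(idle, now=900.0))
print(can_run_experiments(busy, now=900.0))
```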


One Implementation of Home/Garage

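[Figure: the production platform (clients, middle tier, DBMS, data) sends its WAL to a standby machine via WAL shipping. On the standby machine, the home holds a DBMS that applies the WAL, and the garage holds a DBMS that serves as the workbench for experiments, with data shared via copy on write. iTuned (interface, engine, experiment planner & scheduler) sits alongside.]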


Overheads are Low

Operation in API              Time (seconds)   Description
Create Container              610              Create a new garage (one-time process)
Clone Container               17               Clone a garage from an already existing one
Boot Container                19               Boot garage from halt state
Halt Container                2                Stop garage and release resources
Reboot Container              2                Reboot the garage
Snapshot-R DB (5GB, 20GB)     7, 11            Create read-only snapshot of the database
Snapshot-RW DB (5GB, 20GB)    29, 62           Create read-write snapshot of the database


Empirical Evaluation (1)

• Cluster of machines with 2GHz processors and 3GB memory
• Two database systems: PostgreSQL & MySQL
• Various workloads
– OLAP: Mixes of heavy-weight TPC-H queries
  • Varying #queries, #query_types, and MPL
  • Scale factors 1 and 10
– OLTP: TPC-W and RUBiS
• Tuning of up to 30 configuration parameters


Empirical Evaluation (2)

• Techniques compared
– Default parameter settings shipped (D)
– Manual rule-based tuning (M)
– Smart Hill Climbing (S): State-of-the-art technique
– Brute-Force search (B): Run many experiments to find approximation to optimal setting
– iTuned (I)
• Evaluation metrics
– Quality: workload running time after tuning
– Efficiency: time needed for tuning


Comparison of Tuning Quality


iTuned’s Scalability Features (1)

• Identify important parameters quickly
• Run experiments in parallel
• Stop low-utility experiments early
• Compress the workload
• Work in progress:
– Apply database-specific knowledge
– Incremental tuning
– Interactive tuning


iTuned’s Scalability Features (2)

• Identify important parameters quickly
– Using sensitivity analysis with a few experiments

[Figure: #Parameters = 9, #Experiments = 10]


iTuned’s Scalability Features (3)


Roadmap

• Use cases of experiment-driven mgmt.
– Query tuning, benchmarking, Hadoop, testing, …
• iTuned: Tool for DB conf parameter tuning
– End-to-end application of experiment-driven mgmt.
• .eX: Language and run-time system that brings experiment-driven mgmt. to users & tuning tools


Back of the Envelope Calculation

• DBAs cost $300/day; Consultants cost $100/hr
• 1 day of experiments gives a wealth of info.
– TPC-H, TPC-W, RUBiS workloads; 10-30 conf. params
• Cost of running these experiments for 1 day on Amazon Web Services
– Server: $10/day
– Storage: $0.4/day
– I/O: $5/day
– TOTAL: about $15/day
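The arithmetic behind the comparison is easy to reproduce. The sketch below uses only the figures quoted above, converted to cents to avoid floating-point rounding. The variable names are illustrative.

```python
# Daily costs in cents, taken from the figures quoted above.
aws_cents = {"server": 1000, "storage": 40, "io": 500}
dba_cents_per_day = 30000
consultant_cents_per_hour = 10000

total_cents = sum(aws_cents.values())
dba_ratio = dba_cents_per_day / total_cents
consultant_minutes = 60 * total_cents / consultant_cents_per_hour

print(f"Experiments: ${total_cents / 100:.2f}/day")
print(f"One DBA day pays for about {dba_ratio:.1f} days of experiments")
print(f"One day of experiments costs about {consultant_minutes:.1f} consultant minutes")
```

The itemized costs add up to $15.40, which rounds to the $15/day quoted above.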


.eX: Power of Experiments to the People

• Users & tools express needs as scripts in eXL (eXperiment Language)
• .eX engine plans and conducts experiments on designated resources
• Intuitive visualization of results

[Figure: an eXL script and resources feed into .eX, which consists of a language processor and a run-time engine]


Current Focus of .eX

• Parts of an eXL script
1. Query: (approx.) response surface mapping, search
2. Expt. setup & monitoring
3. Constraints & optimization: resources, cost, time
• Automatically generate the experiment-driven workflow

[Figure: a table with columns I1, I2, …, O1, …, next to the experiment-driven loop: plan next set of experiments, conduct experiments on workbench, process output to extract information, and ask "Are more experiments needed?" If yes, repeat; otherwise, return the result.]


Summary

• Automated expt-driven mgmt: The time has come
– Need, infrastructure, & promise are all there
• We have built many tools around this paradigm
– http://www.cs.duke.edu/~shivnath/dotex.html
• Poses interesting questions and challenges
– Make it easy for users/admins to do expts
– Make experiments first-class citizens in systems