Experiment-driven System Management
Shivnath Babu
Duke University
Joint work with Songyun Duan, Herodotos Herodotou, and Vamsidhar Thummala
Managing DBs in Small to Medium Business Enterprises (SMBs)
• Peter is a system admin in an SMB
  – Manages the database (DB)
  – SMB cannot afford a DBA
• Suppose Peter has to tune a poorly-performing DB
  – Design advisor may not help
  – Maybe the problem is with DB configuration parameters
Tuning DB Configuration Parameters
• Parameters that control
  – Memory distribution
  – I/O optimization
  – Parallelism
  – Optimizer’s cost model
• Number of parameters ~ 100
  – 15-25 critical params depending on OLAP vs. OLTP
• Few holistic parameter tuning tools available
  – Peter may have to resort to 1000+ page tuning manuals or rules of thumb from experts
  – Can be a frustrating experience
Response Surfaces
• TPC-H 4 GB DB size, 1 GB memory, Query 18
[Figure: 2-dim projection of an 11-dim surface]
DBA’s Approach to Parameter Tuning
• DBAs run experiments
  – Here, an experiment is a run of the DB workload with a specific parameter configuration
  – Common strategy: vary one DB parameter at a time
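The one-parameter-at-a-time strategy can be sketched as follows. This is a minimal illustration, not the procedure any particular DBA follows: `run_workload`, the parameter names `buffer_mb` and `workers`, and the toy cost surface are all hypothetical stand-ins for a real workload run on a real database.

```python
from typing import Callable, Dict, List

Config = Dict[str, float]


def one_at_a_time(
    run_workload: Callable[[Config], float],
    baseline: Config,
    candidates: Dict[str, List[float]],
) -> Config:
    """Vary one parameter at a time; keep a value only if it lowers running time."""
    best = dict(baseline)
    best_time = run_workload(best)      # one experiment at the baseline
    for name, values in candidates.items():
        for value in values:
            trial = dict(best)
            trial[name] = value
            elapsed = run_workload(trial)   # one experiment per trial setting
            if elapsed < best_time:
                best, best_time = trial, elapsed
    return best


# Hypothetical stand-in for a workload run: a made-up cost surface.
def toy_workload(conf: Config) -> float:
    return (conf["buffer_mb"] - 512) ** 2 / 1e4 + (conf["workers"] - 4) ** 2


tuned = one_at_a_time(
    toy_workload,
    baseline={"buffer_mb": 128, "workers": 1},
    candidates={"buffer_mb": [128, 256, 512, 1024], "workers": [1, 2, 4, 8]},
)
```

Because each parameter is tuned with the others held fixed, this strategy can miss settings that only pay off when two parameters change together, which is one reason the response surface matters.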
Experiment-driven Management
[Figure: workflow loop. Input: mgmt. task. Plan next set of experiments, conduct experiments on workbench, process output to extract information, then ask "Are more experiments needed?"; if yes, repeat the loop, otherwise return the result.]
Goal: Automate this process
Roadmap
• Use cases of experiment-driven mgmt.
  – Query tuning, benchmarking, Hadoop, testing, …
• iTuned: Tool for DB conf parameter tuning
  – End-to-end application of experiment-driven mgmt.
• .eX: Language and run-time system that brings experiment-driven mgmt. to users & tuning tools
What is an Experiment?
• Depends on the management task
  – Pay some extra cost, get new information in return
  – Even for a specific management task, there can be a spectrum of possible experiments
Uses of Experiment-driven Mgmt.
• DB conf parameter tuning
• MapReduce job tuning in Hadoop
• Server benchmarking
  – Capacity planning
  – Cost/perf modeling
Uses of Experiment-driven Mgmt.
• Tuning “problem queries”
[Figure: <Estimated, Actual> cardinality annotations, e.g., <2473, 7496> and <380459, 229739>]
Uses of Experiment-driven Mgmt.
• DB conf parameter tuning
• MapReduce job tuning in Hadoop
• Server benchmarking
  – Capacity planning
  – Cost/perf modeling
• Tuning “problem queries”
• Troubleshooting
• Testing
• Canary in the server farm (James Hamilton, Amazon)
• …
Roadmap
• Use cases of experiment-driven mgmt.
  – Query tuning, benchmarking, Hadoop, testing, …
• iTuned: Tool for DB conf parameter tuning
  – End-to-end application of experiment-driven mgmt.
• .eX: Language and run-time system that brings experiment-driven mgmt. to users & tuning tools
Problem Abstraction
• Unknown response surface: y = F(X)
  – X = Parameters x1, x2, …, xm
• Each experiment gives a <Xi,yi> sample
  – Set DB to conf Xi
  – Run workload that needs tuning
  – Measure performance yi at Xi
• Goal: Find high performance setting with low total cost of running experiments
Example
[Figure: two plots of y (0 to 8) versus x1 (4 to 12), annotated "Utility(X): Where to do the next experiment?"]
• Goal: Compute the potential utility of candidate experiments
iTuned’s Adaptive Sampling Algorithm for Experiment Planning
// Phase I: Bootstrapping
– Conduct some initial experiments
// Phase II: Sequential Sampling
– Loop: Until stopping condition is reached
  1. Identify candidate experiments to do next
  2. Based on current samples, estimate the utility of each candidate experiment
  3. Conduct the next experiment at the candidate with highest utility
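The two phases can be sketched in Python as below. This is a minimal sketch under stated assumptions, not iTuned's implementation: bootstrap points and candidates are drawn uniformly at random, the stopping condition is a fixed experiment budget, and `distance_utility` is a placeholder utility (pure exploration) rather than the expected utility developed on the following slides. All function and parameter names are hypothetical.

```python
import random
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, ...]
Sample = Tuple[Point, float]


def adaptive_sampling(
    run_experiment: Callable[[Point], float],
    bounds: Sequence[Tuple[float, float]],
    utility: Callable[[Point, List[Sample]], float],
    n_bootstrap: int = 5,
    n_candidates: int = 50,
    max_experiments: int = 20,
    seed: int = 0,
) -> Sample:
    rng = random.Random(seed)

    def random_point() -> Point:
        return tuple(rng.uniform(lo, hi) for lo, hi in bounds)

    # Phase I: bootstrapping with a few initial experiments.
    samples: List[Sample] = []
    for _ in range(n_bootstrap):
        x = random_point()
        samples.append((x, run_experiment(x)))

    # Phase II: sequential sampling until the experiment budget is used up.
    while len(samples) < max_experiments:
        # 1. Identify candidate experiments.
        candidates = [random_point() for _ in range(n_candidates)]
        # 2. Estimate the utility of each candidate from the current samples.
        chosen = max(candidates, key=lambda c: utility(c, samples))
        # 3. Conduct the experiment at the candidate with the highest utility.
        samples.append((chosen, run_experiment(chosen)))

    return min(samples, key=lambda s: s[1])  # best setting, assuming minimization


def distance_utility(x: Point, samples: List[Sample]) -> float:
    # Placeholder: squared distance to the nearest point sampled so far.
    return min(sum((a - b) ** 2 for a, b in zip(x, xs)) for xs, _ in samples)


best_x, best_y = adaptive_sampling(
    run_experiment=lambda x: (x[0] - 8.0) ** 2,  # toy response surface
    bounds=[(4.0, 12.0)],
    utility=distance_utility,
)
```

Keeping `utility` as a pluggable function means the loop stays the same when the placeholder is swapped for a model-based estimate.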
Utility of an Experiment
• Let <X1,y1>, …, <Xn,yn> be the samples from n experiments done so far
• Let <X*,y*> be the best setting so far (i.e., y* = min_i yi)
  – w.l.o.g. assuming minimization
• U(X), the utility of an experiment at X, is   // y = F(X)
  – y* - y if y* > y
  – 0 otherwise
• However, U(X) poses a chicken-and-egg problem
  – y will be known only after experiment is run at X
• Goal: Compute expected utility EU(X)
Expected Utility of an Experiment
• Suppose we have the probability density function of y (y is the perf at X)
  – Prob(y = v | <Xi,yi> for i=1,…,n)
• Then, $EU(X) = \int_{v=-\infty}^{v=+\infty} U(X)\, \mathrm{Prob}(y = v)\, dv$
  $EU(X) = \int_{v=-\infty}^{v=y^*} (y^* - v)\, \mathrm{Prob}(y = v)\, dv$
• Goal: Compute Prob(y = v | <Xi,yi> for i=1,…,n)
Model: Gaussian Process Representation (GRS) of a Response Surface
• GRS models the response surface as: y(X) = g(X) + Z(X) (+ ε(X) for measurement error)
  – E.g., $g(X) = x_1 - 2x_2 + 0.1x_1^2$ (learned using common techniques)
  – Z: Gaussian Process to capture regression residual
Primer on Gaussian Process
• Univariate Gaussian distribution
  – G = N(μ, σ²)
  – Described by mean μ, variance σ²
• Multivariate Gaussian distribution
  – [G1, G2, …, Gn]
  – Described by mean vector and covariance matrix
• Gaussian Process
  – Generalizes multivariate Gaussian to arbitrary number of dimensions
  – Described by mean and covariance functions
Model: Gaussian Process Representation (GRS) of a Response Surface
• GRS captures the response surface as: y(X) = g(X) + Z(X) (+ ε(X) for measurement error)
• If Z is a Gaussian process, then:
  – [Z(X1),…,Z(Xn),Z(X)] is multivariate Gaussian
  – Z(X) | Z(X1),…,Z(Xn) is a univariate Gaussian
  – y(X) is a univariate Gaussian
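The conditioning step can be written out explicitly. What follows is the standard Gaussian conditioning identity, not a formula taken from the slides. The symbols c (covariance function), K, k, and z are introduced here for illustration; g is treated as already known, and measurement error is ignored.

```latex
\begin{aligned}
z &= [Z(X_1), \ldots, Z(X_n)]^\top, \qquad z_i = y_i - g(X_i) \\
K_{ij} &= c(X_i, X_j), \qquad k_i = c(X_i, X) \\
Z(X) \mid z &\sim \mathcal{N}\!\left(k^\top K^{-1} z,\; c(X, X) - k^\top K^{-1} k\right) \\
y(X) \mid z &\sim \mathcal{N}\!\left(g(X) + k^\top K^{-1} z,\; c(X, X) - k^\top K^{-1} k\right)
\end{aligned}
```

The variance term does not depend on the observed values z, only on where the samples were taken, so uncertainty shrinks near sampled settings regardless of what was measured there.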
Parameters of the GRS Model
• [Z(X1),…,Z(Xn)] is multivariate Gaussian
  – Z(Xi) has zero mean
  – $\mathrm{Covariance}(Z(X_i), Z(X_j)) \propto \exp\left(-\sum_k \theta_k |x_{ik} - x_{jk}|^{\gamma_k}\right)$
    • Residuals at nearby points have higher correlation
• $\theta_k$, $\gamma_k$ learned from <X1,y1>, …, <Xn,yn>
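A covariance of the form exp(-Σ_k θ_k |x_ik - x_jk|^γ_k) and the resulting conditional mean and variance of the residual Z(X) can be sketched in plain Python as below. This is an illustrative sketch only: θ_k and γ_k are passed in rather than learned, the `variance` argument stands in for the unspecified proportionality constant, and the small `solve` helper is a generic Gaussian elimination, not anything from iTuned.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def covariance(xi: Vector, xj: Vector, theta: Vector, gamma: Vector,
               variance: float = 1.0) -> float:
    """variance * exp(-sum_k theta_k * |x_ik - x_jk| ** gamma_k)."""
    s = sum(t * abs(a - b) ** g for a, b, t, g in zip(xi, xj, theta, gamma))
    return variance * math.exp(-s)


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def residual_posterior(xs: Sequence[Vector], zs: Sequence[float], x: Vector,
                       theta: Vector, gamma: Vector,
                       variance: float = 1.0) -> Tuple[float, float]:
    """Mean and variance of Z(x) given residuals zs observed at points xs."""
    K = [[covariance(a, b, theta, gamma, variance) for b in xs] for a in xs]
    k = [covariance(a, x, theta, gamma, variance) for a in xs]
    w = solve(K, k)  # w = K^{-1} k (K is symmetric)
    mean = sum(wi * zi for wi, zi in zip(w, zs))
    var = variance - sum(wi * ki for wi, ki in zip(w, k))
    return mean, max(var, 0.0)  # clamp tiny negative values from round-off


# Toy residuals at three settings of a single parameter x1.
xs = [(4.0,), (8.0,), (12.0,)]
zs = [1.0, -0.5, 2.0]
mean_at_sample, var_at_sample = residual_posterior(xs, zs, (8.0,), [0.1], [2.0])
mean_far_away, var_far_away = residual_posterior(xs, zs, (100.0,), [0.1], [2.0])
```

At a sampled setting the variance collapses toward zero, while far from every sample it returns to the prior variance, which is the "nearby points have higher correlation" behaviour in numeric form.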
Use of the GRS Model
• Recall our goals to compute
  – $EU(X) = \int_{v=-\infty}^{v=y^*} (y^* - v)\, \mathrm{Prob}(y = v)\, dv$
  – Prob(y = v | <Xi,yi> for i=1,…,n)
• Lemma: Using the GRS, we can compute the mean μ(X) and variance σ²(X) of the Gaussian y(X)
• Theorem: EU(X) has a closed form that is a product of:
  – Term that depends on (y* - μ(X))
  – Term that depends on σ(X)
• It follows that settings X with high EU are either:
  – Close to known good settings (for exploitation)
  – In highly uncertain regions (for exploration)
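When y(X) is Gaussian with mean μ(X) and standard deviation σ(X), the integral of (y* - v) Prob(y = v) up to y* has a well-known closed form, often called expected improvement: σ · (u·Φ(u) + φ(u)) with u = (y* - μ)/σ, where Φ and φ are the standard normal CDF and PDF. The sketch below implements that standard result; the slides do not show the theorem's exact expression, so treat this as the textbook formula rather than a transcription. The function name and arguments are hypothetical.

```python
import math


def expected_utility(mu: float, sigma: float, y_best: float) -> float:
    """E[max(y_best - y, 0)] for y ~ N(mu, sigma^2), assuming minimization."""
    if sigma <= 0.0:
        # No uncertainty: the utility is known exactly.
        return max(y_best - mu, 0.0)
    u = (y_best - mu) / sigma
    pdf = math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)  # standard normal PDF
    cdf = 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))         # standard normal CDF
    return sigma * (u * cdf + pdf)


# Same predicted mean (worse than the best so far), different uncertainty.
eu_confident = expected_utility(mu=5.0, sigma=1.0, y_best=3.0)
eu_uncertain = expected_utility(mu=5.0, sigma=2.0, y_best=3.0)
```

With the predicted mean held fixed, the more uncertain setting gets the higher expected utility, which is the exploration effect; lowering μ below y* raises it as well, which is the exploitation effect.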
Example
[Figure: three plots of y (0 to 8) versus x1 (4 to 12); labels: unknown actual surface, μ(X), 4σ(X), y*, EU(X)]
• Settings X with high EU are either:
  – Close to known good settings (high y* - μ(X))
  – In highly uncertain regions (high σ(X))
Where to Conduct Experiments?
[Figure: three platforms, each running a DBMS and serving clients: a production platform (data, middle tier), a standby platform (data, fed by Write Ahead Log (WAL) shipping), and a test platform (test data)]
iTuned’s Solution
• Exploit underutilized resources with minimal impact on production workload
• DBA/User designates resources where experiments can be run
  – E.g., production/standby/test
• DBA/User specifies policies that dictate when experiments can be run
  – Separate regular use (home) from experiments (garage)
  – Example: If CPU, mem, & disk utilization < 10% for past 15 mins, then resource can be used for experiments
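The example policy could be checked with a few lines of Python. This is only a sketch of the rule as stated on the slide; the `UtilizationSample` record, its field names, and the way samples are collected are assumptions, and a real policy engine would also need to verify that the monitoring data covers the whole window.

```python
from typing import Iterable, NamedTuple


class UtilizationSample(NamedTuple):
    minutes_ago: float  # age of the measurement
    cpu: float          # utilization as a fraction in [0, 1]
    mem: float
    disk: float


def can_run_experiments(
    samples: Iterable[UtilizationSample],
    threshold: float = 0.10,
    window_minutes: float = 15.0,
) -> bool:
    """True if CPU, mem, and disk utilization all stayed below the threshold
    in every sample taken within the window."""
    recent = [s for s in samples if s.minutes_ago <= window_minutes]
    if not recent:
        return False  # no evidence of idleness, so stay in "home" mode
    return all(max(s.cpu, s.mem, s.disk) < threshold for s in recent)


idle = [UtilizationSample(m, 0.03, 0.08, 0.01) for m in (1, 5, 10, 15)]
busy = idle + [UtilizationSample(7, 0.40, 0.08, 0.01)]
```

Here `can_run_experiments(idle)` allows the resource to be used as a garage, while the single CPU spike in `busy` keeps it reserved for regular use.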
One Implementation of Home/Garage
[Figure: the production platform (clients, middle tier, DBMS, data) ships WAL to a standby machine. On the standby machine, the home DBMS applies the WAL, and the garage DBMS is the workbench for experiments, using a copy-on-write copy of the data. iTuned (interface, engine, experiment planner & scheduler) connects to the standby machine.]
Overheads are Low
Operation in API           | Time (seconds) | Description
Create Container           | 610            | Create a new garage (one-time process)
Clone Container            | 17             | Clone a garage from an already existing one
Boot Container             | 19             | Boot garage from halt state
Halt Container             | 2              | Stop garage and release resources
Reboot Container           | 2              | Reboot the garage
Snapshot-R DB (5GB, 20GB)  | 7, 11          | Create read-only snapshot of the database
Snapshot-RW DB (5GB, 20GB) | 29, 62         | Create read-write snapshot of the database
Empirical Evaluation (1)
• Cluster of machines with 2GHz processors and 3GB memory
• Two database systems: PostgreSQL & MySQL
• Various workloads
  – OLAP: Mixes of heavy-weight TPC-H queries
    • Varying #queries, #query_types, and MPL
    • Scale factors 1 and 10
  – OLTP: TPC-W and RUBiS
• Tuning of up to 30 configuration parameters
Empirical Evaluation (2)
• Techniques compared
  – Default parameter settings shipped (D)
  – Manual rule-based tuning (M)
  – Smart Hill Climbing (S): State-of-the-art technique
  – Brute-Force search (B): Run many experiments to find approximation to optimal setting
  – iTuned (I)
• Evaluation metrics
  – Quality: workload running time after tuning
  – Efficiency: time needed for tuning
Comparison of Tuning Quality
iTuned’s Scalability Features (1)
• Identify important parameters quickly
• Run experiments in parallel
• Stop low-utility experiments early
• Compress the workload
• Work in progress:
  – Apply database-specific knowledge
  – Incremental tuning
  – Interactive tuning
iTuned’s Scalability Features (2)
• Identify important parameters quickly
  – Using sensitivity analysis with a few experiments
[Figure: #Parameters = 9, #Experiments = 10]
iTuned’s Scalability Features (3)
Roadmap
• Use cases of experiment-driven mgmt.
  – Query tuning, benchmarking, Hadoop, testing, …
• iTuned: Tool for DB conf parameter tuning
  – End-to-end application of experiment-driven mgmt.
• .eX: Language and run-time system that brings experiment-driven mgmt. to users & tuning tools
Back of the Envelope Calculation
• DBAs cost $300/day; Consultants cost $100/hr
• 1 day of experiments gives a wealth of info.
  – TPC-H, TPC-W, RUBiS workloads; 10-30 conf. params
• Cost of running these experiments for 1 day on Amazon Web Services
  – Server: $10/day
  – Storage: $0.4/day
  – I/O: $5/day
  – TOTAL: ~$15/day
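The arithmetic behind the total can be spelled out in a few lines. The prices are the ones quoted on the slide; the constant and function names are hypothetical, and amounts are kept in cents so the sums stay exact.

```python
# Daily costs from the slide, in cents.
SERVER_CENTS = 1000                 # $10/day
STORAGE_CENTS = 40                  # $0.4/day
IO_CENTS = 500                      # $5/day
DBA_CENTS_PER_DAY = 30000           # $300/day
CONSULTANT_CENTS_PER_HOUR = 10000   # $100/hr


def cloud_cost_cents(days: int = 1) -> int:
    """Cost of running experiments on rented resources for the given days."""
    return days * (SERVER_CENTS + STORAGE_CENTS + IO_CENTS)


def consultant_hours_equivalent(days: int = 1) -> float:
    """How many consultant hours the same money would buy."""
    return cloud_cost_cents(days) / CONSULTANT_CENTS_PER_HOUR


print(f"experiments: ${cloud_cost_cents() / 100:.2f}/day")  # 10 + 0.4 + 5 = 15.40
print(f"share of a DBA day: {cloud_cost_cents() / DBA_CENTS_PER_DAY:.1%}")
```

A full day of experiments therefore costs roughly a twentieth of one DBA day, or under ten minutes of a consultant's time.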
.eX: Power of Experiments to the People
• Users & tools express needs as scripts in eXL (eXperiment Language)
• .eX engine plans and conducts experiments on designated resources
• Intuitive visualization of results
[Figure: an eXL script and resources feed into .eX, which consists of a language processor and a run-time engine]
Current Focus of .eX
• Parts of an eXL script
  1. Query: (approx.) response surface mapping, search
  2. Expt. setup & monitoring
  3. Constraints & optimization: resources, cost, time
• Automatically generate the experiment-driven workflow
[Figure: inputs I1, I2, … and output O1, …; workflow loop: plan next set of experiments, conduct experiments on workbench, process output to extract information, then ask "Are more experiments needed?"; if yes, repeat, otherwise return the result]
Summary
• Automated expt-driven mgmt: The time has come
  – Need, infrastructure, & promise are all there
• We have built many tools around this paradigm
  – http://www.cs.duke.edu/~shivnath/dotex.html
• Poses interesting questions and challenges
  – Make it easy for users/admins to do expts
  – Make experiments first-class citizens in systems