
A New Perspective on Experimental Analysis of N-tier Systems: Evaluating Database Scalability, Multi-bottlenecks, and Economical Operation

(Invited Paper)

Simon Malkowski∗, Markus Hedwig†, Deepal Jayasinghe∗, Junhee Park∗, Yasuhiko Kanemasa‡, and Calton Pu∗

∗CERCS, Georgia Institute of Technology, Atlanta, GA 30332-0765, USA
{simon.malkowski, deepal, jhpark, calton}@cc.gatech.edu

†Chair of Information Systems Research, University of Freiburg, 79098 Freiburg, Germany
[email protected]

‡Fujitsu Laboratories Ltd., Kawasaki 211-8588, Japan - [email protected]

Abstract

Economical configuration planning, component performance evaluation, and analysis of bottleneck phenomena in N-tier applications are serious challenges due to design requirements such as non-stationary workloads, complex non-modular relationships, and global consistency management when replicating database servers, for instance. We have conducted an extensive experimental evaluation of N-tier applications, which adopts a purely empirical approach to the aforementioned challenges, using the RUBBoS benchmark. As part of the analysis of our exceptionally rich dataset, we have experimentally investigated database server scalability, bottleneck phenomena identification, and iterative data refinement for configuration planning. The experiments detailed in this paper comprise a full scale-out mesh with up to nine database servers and three application servers. Additionally, the four-tier system was run in a variety of configurations, including two database management systems (MySQL and PostgreSQL), two hardware node types (normal and low-cost), two replication strategies (wait-all and wait-first, which approximates primary/secondary), and two database replication techniques (C-JDBC and MySQL Cluster). Herein, we present an analysis survey of results mainly generated with a read/write mix pattern in the client emulator.

1. Introduction

Scaling N-tier applications in general and multi-tiered e-commerce systems in particular is notoriously challenging. Countless requirements, such as non-stationary workloads and non-modular dependencies, result in an inherently high degree of management complexity. We have adopted a purely empirical approach to this challenge through a large-scale experimental evaluation of scalability using an N-tier application benchmark (RUBBoS [1]). Our experiments cover scale-out scenarios with up to nine database servers and three application servers. The configurations were varied using two relational database management systems (MySQL and PostgreSQL), two database replication techniques (C-JDBC and MySQL Cluster), wait-all (aka write-all) and wait-first (primary/secondary approximation) replication strategies, browsing-only and read/write interaction mixes, workloads ranging from 1,000 to 13,000 concurrent users, and two different hardware node types.

The analysis of our extensive dataset produced several interesting findings. First, we documented detailed node-level bottleneck migration patterns among the database, application, and clustering middleware tiers as workload and the number of servers increased gradually. These bottlenecks were correlated with the overall system performance and used to explain the observed characteristics. Second, the identification of multi-bottlenecks [2] showed the performance effects of non-obvious phenomena that can arise in real systems. Third, an initial economic evaluation of our data, based on an iterative refinement process, revealed the usefulness of the data in deriving economical configuration plans.

The main contribution of this paper is twofold. From a technical perspective, we collected and evaluated a significant amount of data on scaling performance in N-tier applications. We present an evaluation survey and illustrate various interesting findings that are uniquely made possible through our empirical approach. From a high-level perspective, we demonstrate the potential of automated experiment creation and management systems for the empirical analysis of large distributed applications. Our results clearly show that important domain insights can be obtained through our approach. In contrast to traditional analytical approaches, our methodology is not founded on rigid assumptions, which significantly reduces the risk of overlooking unexpected performance phenomena.

The remainder of this paper is structured as follows. In Section 2 we establish some background on infrastructure and bottleneck definitions. Section 3 outlines the experimental setup and methods. Section 4 presents the analysis of our scale-out data. In Section 5 we introduce the use case of economical configuration planning. Related work is summarized in Section 6, and Section 7 concludes the paper.

2. Background

This section provides previously published background that is of particular importance for understanding our approach. Due to the space constraints of this paper, each subsection is limited to a brief overview. Interested readers should refer to the cited references for more comprehensive descriptions.

2.1. Experimental Infrastructure

The empirical dataset that is used in this work is part of an ongoing effort for the generation and analysis of N-tier performance data. We have already run a very high number of experiments over a wide range of configurations and workloads in various environments. A typical experimentation cycle (i.e., code generation, system deployment, benchmarking, data analysis, and reconfiguration) requires thousands of lines of code that need to be managed for each experiment. The experimental data output consists of system metric data points (i.e., network, disk, and CPU utilization) in addition to higher-level metrics (e.g., response times and throughput). The management and execution scripts contain a high degree of similarity, but the differences among them are critical due to the dependencies among the varying parameters. Maintaining these scripts is a notoriously expensive and error-prone process when done by hand.

In order to enable experimentation at this large scale, we used an experimental infrastructure created for the Elba project to automate N-tier system configuration management. The Elba approach [3] divides each automated staging iteration into steps such as converting

[Figure: 2x2 classification of multi-bottlenecks by resource usage dependence (independent vs. dependent) and resource saturation frequency (partially vs. fully saturated). Independent resources yield concurrent (partially saturated) or simultaneous (fully saturated) bottlenecks; dependent, partially saturated resources yield oscillatory bottlenecks; the dependent/fully saturated combination is not observed.]

Figure 1. Simple multi-bottleneck classification [2].

policies into resource assignments [4], automated code generation [5], benchmark execution, and analysis of results.

2.2. Single-bottlenecks vs. Multi-bottlenecks

The abstract definition of a system bottleneck (or bottleneck for short) can be derived from its literal meaning as the key limiting factor for achieving higher system throughput. Due to this intuition, similar formulations have often served as the foundation for bottleneck analysis in computer systems. Despite their popularity, however, these formulations are based on assumptions that do not necessarily hold in practice. Following queuing theory, the term bottleneck is often used synonymously with single-bottleneck. In the single-bottleneck case, the saturated resource typically exhibits a near-linearly load-dependent average resource utilization that saturates at one hundred percent past a certain workload. However, the characteristics of bottlenecks may change significantly if there is more than one bottleneck resource in the system, which is the case for many real N-tier applications with heterogeneous workloads. Therefore, we explicitly distinguish between single-bottlenecks and the umbrella term multi-bottlenecks [2], [6].

Because system resources may be causally dependent in their usage patterns, multi-bottlenecks introduce the classification dimension of resource usage dependence. Additionally, greater care has to be taken in classifying bottleneck resources according to their resource saturation frequency. Resources may saturate for the entire observation period (i.e., fully saturated) or for certain parts of the period (i.e., partially saturated). Figure 1 summarizes the classification that forms the basis of the multi-bottleneck definition. It distinguishes between simultaneous, concurrent, and oscillatory bottlenecks. In comparison to other bottlenecks, resolving oscillatory bottlenecks may be very challenging. Multiple resources form a combined bottleneck, which can only be resolved in union. Therefore, the addition of resources does not necessarily improve performance in complex N-tier systems. In fact, determining regions of complex multi-bottlenecks through modeling may be an intractable problem. Consequently, multi-bottlenecks require measurement-based experimental approaches that do not oversimplify system performance in their assumptions.
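As a concrete illustration of these two classification dimensions, the following sketch (our own, not the authors' tooling) labels a pair of per-second utilization traces. The 95% saturation threshold is an illustrative assumption, and resource usage dependence is approximated crudely by whether the saturation episodes of partially saturated resources overlap or alternate.

```python
# Sketch: classify two resources' saturation behavior from per-second
# utilization traces, following the two dimensions of Figure 1.
# Threshold and dependence heuristic are illustrative assumptions.

def saturation_frequency(trace, threshold=95.0):
    """Fraction of samples in which the resource is saturated."""
    saturated = [u >= threshold for u in trace]
    return sum(saturated) / len(saturated)

def classify_pair(trace_a, trace_b, threshold=95.0):
    """Rough multi-bottleneck classification for two resources."""
    freq_a = saturation_frequency(trace_a, threshold)
    freq_b = saturation_frequency(trace_b, threshold)
    if freq_a == 1.0 and freq_b == 1.0:
        return "simultaneous"          # both fully saturated
    # Partially saturated: do saturation episodes overlap or alternate?
    overlap = sum(1 for a, b in zip(trace_a, trace_b)
                  if a >= threshold and b >= threshold)
    if overlap == 0 and freq_a > 0 and freq_b > 0:
        return "oscillatory"           # saturation alternates between them
    return "concurrent"                # overlapping partial saturation

cpu  = [100, 100, 40, 30, 100, 100]    # toy traces, not measured data
disk = [20, 30, 100, 100, 25, 35]
print(classify_pair(cpu, disk))        # -> oscillatory
```

A real classifier would also have to test causal dependence between the resources; this heuristic only checks whether their saturation episodes coincide in time.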

3. Experimental Setting

3.1. Benchmark Applications

Among N-tier application benchmarks, RUBBoS has been used in numerous research efforts due to its significance as a model of real production systems. Readers familiar with this benchmark can skip to Table 1(a), which outlines the concrete choices of software components used in our experiments.

RUBBoS [1] is an N-tier e-commerce system modeled on bulletin-board news sites similar to Slashdot. The benchmark can be implemented as a three-tier (web server, application server, and database server) or four-tier (with the addition of clustering middleware such as C-JDBC) system. The benchmark places high load on the database tier. The workload consists of 24 different interactions (involving all tiers) such as register user, view story, and post comments. The benchmark includes two kinds of workload modes: browse-only and read/write interaction mixes. In this paper we use both the browse-only and read/write workloads for our experiments.

Typically, the performance of benchmark application systems depends on a number of configurable settings (both software and hardware). To facilitate the interpretation of experimental results, we chose configurations close to default values where possible. Deviations from standard hardware or software settings are spelled out when used. Each experiment trial consists of three periods: ramp-up, run, and ramp-down. In our experiments, the trials consist of an 8-minute ramp-up, a 12-minute run period, and a 30-second ramp-down. Performance measurements (e.g., CPU or network bandwidth utilization) are taken during the run period using Linux accounting log utilities (i.e., Sysstat) with a granularity of one second.
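The trial layout above can be expressed as a small windowing helper. This sketch is our own illustration, not the authors' scripts; it shows how run-period samples could be separated from ramp-up and ramp-down samples in a 1-second trace.

```python
# Sketch: keep only the 12-minute run-period samples from a full trial
# trace (8-min ramp-up, 12-min run, 30-s ramp-down). The helper is
# illustrative; the period lengths match the trial design above.

RAMP_UP_S, RUN_S, RAMP_DOWN_S = 8 * 60, 12 * 60, 30

def run_window(samples):
    """Filter (t_seconds_since_trial_start, value) pairs to the run period."""
    start, end = RAMP_UP_S, RAMP_UP_S + RUN_S
    return [v for t, v in samples if start <= t < end]

# One utilization sample per second for the whole trial:
trial = [(t, 50.0) for t in range(RAMP_UP_S + RUN_S + RAMP_DOWN_S)]
print(len(run_window(trial)))  # 720 run-period samples (12 min at 1 Hz)
```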

3.2. Database Replication Techniques

In this subsection we briefly introduce the two different database replication techniques that we have used in our experiments. Please refer to the cited sources for a comprehensive introduction.

C-JDBC [7] is an open-source database cluster middleware that provides a Java application with transparent access to a cluster of databases through JDBC. The database can be distributed and replicated among several nodes, and C-JDBC balances the queries among these nodes. C-JDBC also handles node failures and provides support for check-pointing and hot recovery. The C-JDBC server implements two update propagation strategies for replicated databases: wait-all (a write request is completed when all replicas respond) and wait-first (a write request is considered completed upon the first replica response). The wait-all strategy is equivalent to the commonly used write-all replication strategy, and the wait-first strategy is similar to the primary/secondary replication strategy.
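The latency difference between the two strategies can be captured in a toy model (ours, not C-JDBC code): a write completes after the slowest replica under wait-all, and after the fastest replica under wait-first. The replica response times are hypothetical.

```python
# Toy model of C-JDBC's two update propagation strategies: the write
# completion time is gated by the slowest replica (wait-all) or the
# fastest replica (wait-first). Latency values are hypothetical.

def write_latency(replica_latencies_ms, strategy):
    if strategy == "wait-all":
        return max(replica_latencies_ms)   # slowest replica gates the write
    if strategy == "wait-first":
        return min(replica_latencies_ms)   # fastest replica completes it
    raise ValueError(f"unknown strategy: {strategy}")

replicas = [12.0, 45.0, 18.0]
print(write_latency(replicas, "wait-all"))    # 45.0
print(write_latency(replicas, "wait-first"))  # 12.0
```

This also previews the scale-out results later in the paper: under wait-all, adding replicas can only raise the write completion time, whereas under wait-first it cannot.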

MySQL Cluster [8] is a real-time open-source transactional database designed for fast, always-on access to data under high-throughput conditions. MySQL Cluster utilizes a “shared nothing” architecture, which does not require any additional infrastructure and is designed to provide five-nines data availability with no single point of failure. In our experiments we used the in-memory version of MySQL Cluster, although it can be configured to use disk-based data as well. MySQL Cluster uses the NDBCLUSTER storage engine to enable running several nodes with MySQL servers in parallel.

3.3. Hardware Setup

The experiments used in this paper were run in the Emulab testbed [9] with two types of servers. Table 1(b) contains a summary of the hardware used in our experiments. Normal and low-cost nodes were connected over 1,000 Mbps and 100 Mbps links, respectively. The experiments were carried out by allocating each server to a dedicated physical node. In the initial setting all components were “normal” hardware nodes. As an alternative, database servers were also hosted on low-cost machines. Such hardware typically entails a compromise between cost advantage and performance loss, which may be hard to resolve without actual empirical data.

We use a four-digit notation #W/#A/#C/#D to denote the number of web servers, application servers, clustering middleware servers, and database servers. The database management system type is either “M” or “P” for MySQL or PostgreSQL, respectively. If the server node type is low-cost, the configuration is marked with an additional “L”. The notation is slightly different for MySQL Cluster. If MySQL Cluster is used, the third number (i.e., “C”) denotes the number of MySQL


(a) Software setup.
Web server: Apache 2.0.54
Application server: Apache Tomcat 5.5.17
Cluster middleware: C-JDBC 2.0.2
Database server: MySQL 5.0.51a / PostgreSQL 8.3.1 / MySQL Cluster 6.2.15
Operating system: Redhat FC4 (Kernel 2.6.12)
System monitor: Systat 7.0.2

(b) Hardware node setup.
Normal: Xeon 3GHz 64-bit processor, 2GB memory, 6 x 1Gbps network, 2 x 146GB 10,000rpm disks
Low-cost: PIII 600MHz 32-bit processor, 256MB memory, 5 x 100Mbps network, 13GB 7,200rpm disk

(c) Sample C-JDBC topology (1/2/1/2L).
[Diagram: one web server, two application servers, one clustering middleware server, and two low-cost database servers.]

(d) Detailed configuration settings.
Apache: KeepAlive Off, StartServers 1, MaxClients 300, MinSpareThreads 5, MaxSpareThreads 50, ThreadsPerChild 150
Tomcat: maxThreads 330 (total of all), minSpareThreads 5, maxSpareThreads 50, maxHeapSize 1,300MB
MySQL: storage engine MyISAM, max connections 500
PostgreSQL: max connections 150, shared buffers 24MB, max fsm pages 153,600, checkpoint segments 16
MySQL Cluster: storage engine NDBCLUSTER, max connections 500, indexMemory 200MB, dataMemory 1,750MB, numOfReplica 2/4
C-JDBC: minPoolSize 25, maxPoolSize 90, initPoolSize 30, LoadBalancer RAIDb-1 with LPRF scheduler

Table 1. Details of the experimental setup on the Emulab cluster.

servers and the fourth number (i.e., “D”) denotes the number of data nodes. A sample topology of a C-JDBC experiment with one web server, two application servers, one clustering middleware server, two low-cost database servers, and a non-specified database management system (i.e., 1/2/1/2L) is shown in Table 1(c). Unlike traditional system tuning work, we did not attempt to tune a large number of settings to find “the best a product can do”; rather, we made the minimal changes outlined in Table 1(d) to sustain high workloads.
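The #W/#A/#C/#D notation can be parsed mechanically. The following sketch is our own illustration (the helper and field names are not from the paper); it decodes a configuration code into its components.

```python
# Illustrative parser for the paper's #W/#A/#C/#D configuration notation.
# Helper and field names are our own, not from the authors' tooling.

import re

def parse_config(code):
    """E.g. '1/2/1/2L' -> per-tier server counts plus DBMS and node type."""
    m = re.fullmatch(r"(\d+)/(\d+)/(\d+)/(\d+)(M|P)?(L)?", code)
    if not m:
        raise ValueError(f"bad configuration code: {code}")
    web, app, mw, db = (int(g) for g in m.groups()[:4])
    return {
        "web_servers": web,
        "app_servers": app,
        "middleware_servers": mw,   # MySQL servers when MySQL Cluster is used
        "db_servers": db,           # data nodes when MySQL Cluster is used
        "dbms": {"M": "MySQL", "P": "PostgreSQL", None: "unspecified"}[m.group(5)],
        "low_cost": m.group(6) == "L",
    }

print(parse_config("1/2/1/2L"))
print(parse_config("1/3/1/9M")["dbms"])  # MySQL
```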

4. Scale-out Experiments

In this section we detail scale-out experiments in which we increase the number of database and application tier nodes gradually to find the bottlenecks at node level (for each configuration). In Subsection 4.1 we show experiments with MySQL running on normal servers. Subsection 4.2 describes experimental results for PostgreSQL on normal servers. These experiments show both commonalities and differences between the two database management systems. In Subsection 4.3 we discuss the results of running the same set of experiments with MySQL Cluster on normal nodes. Subsection 4.4 evaluates the change in system characteristics when using low-cost nodes for the deployment of database servers.

4.1. MySQL on normal Servers

For this section we collected data from RUBBoS scale-out experiments with MySQL database servers, C-JDBC, and normal hardware. The experiments start from 1/1/1/1M and go to 1/3/1/9M. For each configuration, we increase the workload (i.e., the number of concurrent users) from 1,000 up to 13,000 in steps of 1,000. Due to the amount of collected data, we use two kinds of simplified graphs to highlight the bottlenecks and the shifting of bottlenecks in these experiments. Figure 2(a) shows only the highest achievable throughput (z-axis) for MySQL C-JDBC experiments with normal nodes (read/write workload). Figure 3(a) shows the corresponding bottleneck map. We can see three groups in both figures. The first group consists of configurations 1/2/1/1M and 1/3/1/1M, which have identical maximum throughput. The second group consists of all single application server configurations. The third group has the highest maximum throughput level and consists of all other six configurations with at least two application servers and two database servers.
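Assuming the scale-out mesh is complete (every application-server count crossed with every database-server count and every workload step, as described above), the experiment grid can be enumerated mechanically. This sketch is our own illustration, not the Elba scripts.

```python
# Sketch: enumerate every (configuration, workload) pair of the MySQL
# C-JDBC scale-out mesh described above, assuming the mesh is complete.

def experiment_grid():
    for app_servers in range(1, 4):           # 1 to 3 application servers
        for db_servers in range(1, 10):       # 1 to 9 database servers
            for users in range(1000, 14000, 1000):  # 1,000..13,000 users
                yield f"1/{app_servers}/1/{db_servers}M", users

grid = list(experiment_grid())
print(len(grid))   # 3 app x 9 db x 13 workloads = 351 runs
print(grid[0])     # ('1/1/1/1M', 1000)
```

Even this simplified count conveys why automated experiment management (Section 2.1) is essential: each of the 351 runs needs deployment, benchmarking, and data collection.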

Full data replication provides the best support for read-only workloads, but it requires consistency management with the introduction of updates in a read/write mixed workload. In this subsection we use the C-JDBC clustering middleware to provide consistency management. Figures 2(a) and 2(b) show the


(a) C-JDBC MySQL “wait-all”. (b) C-JDBC MySQL “wait-first”. (c) C-JDBC PostgreSQL “wait-all”.

(d) C-JDBC PostgreSQL “wait-first”. (e) C-JDBC MySQL “wait-all”. (f) C-JDBC MySQL “wait-first”.

(g) MySQL Cluster with 2 data nodes. (h) MySQL Cluster with 4 data nodes.

Figure 2. Maximum system throughput for RUBBoS read/write workload with: (a)-(d) normal nodes; (e)-(f) low-cost nodes; and (g)-(h) normal nodes in the database tier.

maximum throughput achieved by the wait-all and wait-first strategies (see Subsection 3.2 for details), respectively. With respect to throughput, Figures 2(a) and 2(b) appear identical. This similarity is due to the delegation of transactional support in MySQL to the storage engine, which is MyISAM by default. MyISAM does not provide full transactional support; specifically, it does not use Write-Ahead Logging (WAL) to write changes and commit records to disk before returning a response. Since MySQL processes update transactions in memory if the server has sufficient hardware resources, the result is a very small difference between the wait-all and wait-first strategies due to the fast response. Both figures show the three aforementioned groups. Consequently, the analysis summarized in Figure 3(a) applies independently of the replication strategy. As indicated in the bottleneck map, all bottlenecks are CPU bottlenecks, and the clustering middleware (CM) eventually becomes the primary bottleneck. Consequently, increasing the number of database and application servers is effective only in cases with relatively small numbers of servers.

4.2. PostgreSQL on normal Servers

In this subsection we ran experiments similar to those in Subsection 4.1 on a different database management system, PostgreSQL. The comparison of the read/write workloads in Figures 2(a), 2(b), 2(c), and 2(d) reveals some differences between MySQL and PostgreSQL. While both MySQL configurations have similar throughput that peaks at around 1,500 operations per second, the PostgreSQL configurations only


Figure 3. MySQL bottleneck maps: (a) C-JDBC, normal nodes; (b) C-JDBC, low-cost nodes, wait-all; (c) C-JDBC, low-cost nodes, wait-first; (d) MySQL Cluster, normal nodes, 2 data nodes; (e) MySQL Cluster, normal nodes, 4 data nodes.

achieve two thirds of this throughput. Additionally, the wait-all strategy (Figure 2(c)) becomes a bottleneck for PostgreSQL. However, it is not easy to identify the underlying system bottleneck. Figure 4 shows potential resources that could be saturated in the 1/2/1/2P configuration. All three CPU utilization densities show normal-shaped densities that do not reach the critical high-percentile sector. From the throughput graphs, it is evident that the bottleneck cannot be resolved through application or database server replication.

As we increase the number of servers, hardware resources eventually become abundant. Through a detailed analysis, we found two explanations for the bottlenecks in the database and clustering middleware tiers. First, PostgreSQL uses WAL, and its log manager waits for the completion of the commit record written to disk. Because the log manager does not use CPU cycles during the waiting time, the CPU becomes idle. Second, the C-JDBC server serializes all write requests by default to avoid the occurrence of database deadlocks. Consequently, PostgreSQL receives serialized write queries, which limits the amount of parallel processing in PostgreSQL and leaves the CPU idle during I/O waiting times. This explanation was tested in a separate experiment with a single PostgreSQL server and C-JDBC removed, which showed saturated database CPU consumption.
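A back-of-the-envelope model makes the effect of serialized WAL commits concrete. This is our own toy calculation, not the paper's analysis, and the per-write cost figures are hypothetical.

```python
# Toy model: when writes are fully serialized behind WAL commits, no
# other write proceeds while the log manager waits for the commit
# record to reach disk, so CPU busy time is capped by the CPU share of
# each write. Cost figures below are hypothetical.

def cpu_utilization_serialized(cpu_ms_per_write, commit_io_ms):
    """CPU utilization under a fully serialized write stream."""
    return cpu_ms_per_write / (cpu_ms_per_write + commit_io_ms)

# Hypothetical costs: 2 ms of CPU work per write, 8 ms of commit I/O wait.
print(cpu_utilization_serialized(2.0, 8.0))  # 0.2 -> CPU idle 80% of the time
```

This matches the qualitative observation above: the database CPU looks underutilized even though the serialized write path is the actual bottleneck.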

4.3. MySQL Cluster on Normal Servers

In order to explicitly investigate the effects of different database replication technologies, we have conducted a setup of experiments similar to that discussed in Subsection 4.1 with MySQL Cluster. First, we ran the experiments without partitioning the database, which implies that we can have only two data nodes (DN) and one management node, while increasing the number of MySQL servers (SN) from one to four and varying the number of application servers between one and two. The summary of the results is contained in Figures 2(g) and 3(d). As illustrated in Figure 2(g), the maximum throughput that can be achieved is seemingly low compared to the same number of nodes with C-JDBC MySQL. The main reason is that the MySQL data nodes become the bottleneck when the workload increases (see Figure 3(d)). Therefore, we partitioned the database into two parts using the MySQL Cluster default partitioning mechanism and repeated the set of experiments with four data nodes. The performance summary is shown in Figures 2(h) and 3(e). The comparison of Figures 2(g) and 2(h) reveals that increasing the number of data nodes helps to increase the maximum throughput. However, as Figures 3(d) and 3(e) illustrate, splitting the database initially causes the bottlenecks to shift from the data nodes to the application servers. When the workload increases even further, the bottlenecks shift back to the data nodes. Consequently, our empirical analysis reveals that the system is not bottlenecked in the MySQL servers in the case of read/write workload.

4.4. MySQL on Low-cost Servers

In the following we analyze data from RUBBoS scale-out experiments with C-JDBC and MySQL on low-cost hardware. Because of the limited memory on low-cost machines, the throughput graphs in Figures 2(e) and 2(f) are dominated by disk and CPU utilization in the database tier. Initially, in all configurations with a single database server, the maximum throughput is limited by a shifting bottleneck between database disk and CPU. Figure 5 shows the corresponding database server CPU and disk utilization densities in the 1/2/1/1ML configuration. Note that no replication strategy is necessary in the single database server case. Both the CPU (Figure 5(a)) and the disk (Figure 5(b)) densities have two characteristic modes for higher workloads, which correspond to a constant shifting between resource saturation and underutilization. If


[Figure 4 panels: (a) first application server; (b) cluster middleware; (c) first database server. Each panel plots the probability density of CPU utilization over workload.]

Figure 4. PostgreSQL with C-JDBC on normal servers: densities with read/write workload, 1/2/1/2 configuration, and wait-all replication strategy.

combined into a single maximum utilization value in Figure 5(c), the density profile shows a stable saturation behavior. Both Figure 2(e) and Figure 2(f) show that this bottleneck can be resolved by replicating the database node, which increases the maximum throughput.

The single disk bottleneck becomes an oscillatory bottleneck for more than one database server with the wait-all replication strategy because write queries have to await the response of all MySQL servers (Figure 3(b)). An oscillatory bottleneck means that at any arbitrary point in time (usually) only one disk is saturated, and that the bottleneck location changes constantly. Once a third database server has been added, the CPU bottleneck is resolved completely, and the maximum throughput remains invariant to any further hardware additions. Although the saturated resources (i.e., database disks) are located in the database tier, the C-JDBC configuration prohibits the increase of maximum throughput through additional hardware in the backend. Figure 6(b) shows that in the case of nine database nodes, the bottleneck has been further distributed among the database disks. Only a slight mode at the high percentile values indicates an infrequent saturation. In order to visualize the overall resource saturation, Figure 6(c) shows the density for the maximum utilization of all nine database servers. This density has a large peak in the resource saturation sector (similar to Figure 5(c)), which is interpretable as a high probability of at least one saturated database disk at any arbitrary point during the experiment. Such bottlenecks are particularly hard to detect with average utilization values. Their cause is twofold: on the one hand, all write queries are dependent on all database nodes; on the other, the load balancer successfully distributes the load in case one database server is executing a particularly long query.
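The detection idea above, that per-node averages hide an oscillatory bottleneck which the per-sample maximum across all database disks exposes, can be shown with a few lines. The traces below are toy values, not measured data.

```python
# Sketch: averaging per-node utilization hides an oscillating disk
# bottleneck; the per-second maximum across all disks exposes it.
# Traces are toy values, not measured data.

def mean(xs):
    return sum(xs) / len(xs)

# Three disks, each saturated a third of the time, in rotation:
disk_traces = [
    [100, 10, 10, 100, 10, 10],
    [10, 100, 10, 10, 100, 10],
    [10, 10, 100, 10, 10, 100],
]

per_disk_averages = [mean(t) for t in disk_traces]    # all 40.0 -> looks idle
max_trace = [max(col) for col in zip(*disk_traces)]   # 100 at every second
print(per_disk_averages)
print(mean(max_trace))   # 100.0: some disk is always saturated
```

Each disk averages only 40% utilization, yet at every instant some disk is saturated, which is exactly the signature of the oscillatory bottleneck described above.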

Figure 5. Database server densities for read/write workload, C-JDBC, wait-all replication strategy, and 1/2/1/1ML. Panels: (a) database server CPU; (b) database server disk; (c) max of disk and CPU utilization. (Each panel plots probability density over utilization [%] and workload [#].)

Figure 6. Database server densities for read/write workload for 1/2/1/9ML, C-JDBC with wait-first and wait-all replication strategies [2]. Panels: (a) first database server disk with wait-first replication strategy; (b) first database server disk with wait-all replication strategy; (c) maximum disk utilization among all databases with wait-all replication strategy. (Each panel plots probability density over disk bandwidth utilization [%] and workload [#].)

In contrast to the wait-all case, the wait-first replication strategy enables growing maximum throughput when scaling out the database server (Figure 2(f)). Instead of oscillatory disk bottlenecks, the configurations with more than one database server have concurrent bottlenecks in the database disks (Figure 3(c)). After the database server has been replicated five times, the database disks are the only remaining bottleneck. The difference between oscillatory and concurrent bottlenecks is clearly visible in the comparison of Figures 6(a) and 6(b). Both show the 1/2/1/9ML configuration for read/write workload. While the wait-first replication strategy (Figure 6(a)) results in a clearly saturated resource, the wait-all replication strategy (Figure 6(b)) results in an infrequently utilized database disk. The actual system bottleneck only becomes apparent in conjunction with all other database disks (Figure 6(c)). Due to overhead, the maximum throughput growth diminishes once very high database replication states are reached. In comparison to the normal hardware results, the low-cost configurations reach an overall lower throughput level for read/write workloads. Please refer to our previous work [2] for a detailed bottleneck analysis of this particularly interesting scenario.
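A simple heuristic for telling the two bottleneck types apart from per-server utilization traces might look as follows. The saturation threshold and the 50% cutoff fractions are illustrative assumptions, not criteria taken from the paper:

```python
def classify_bottleneck(util, threshold=90.0):
    """Heuristic classifier for multi-server bottlenecks.

    util: list of time samples, each a list of per-server utilizations (%).
    Returns 'concurrent' if saturated servers usually overlap in time,
    'oscillatory' if usually exactly one server is saturated at a time,
    and 'none' if saturation is rare.
    """
    counts = [sum(u > threshold for u in row) for row in util]
    n = len(counts)
    if sum(c >= 1 for c in counts) / n < 0.5:
        return "none"            # saturation is rare: no bottleneck here
    if sum(c >= 2 for c in counts) / n > 0.5:
        return "concurrent"      # saturated servers usually overlap in time
    return "oscillatory"         # usually exactly one saturated server

# Wait-first-like trace: all three disks saturated together.
concurrent_trace = [[95.0, 96.0, 97.0] for _ in range(100)]
# Wait-all-like trace: a single saturated disk that rotates over time.
oscillatory_trace = [
    [95.0 if d == t % 3 else 30.0 for d in range(3)] for t in range(100)
]

print(classify_bottleneck(concurrent_trace))   # concurrent
print(classify_bottleneck(oscillatory_trace))  # oscillatory
```

The key design point is that the classification needs the joint saturation counts per time sample; any per-server or time-averaged summary collapses the two cases into the same statistic.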

5. Economical Configuration Planning

While the preceding section emphasized the potential and complexity of empirical analysis of N-tier systems, this section presents a specific use case. We show how these data may be employed in economically motivated configuration planning. In this scenario the empirically observed performance is transformed into economic key figures. This economic view of the problem is of particular interest given the emerging trend of cloud computing, which enables flexible resource management and offers transparent cost models. The cloud paradigm allows the adaptation of infrastructure according to demand [10] and the determination of cost-optimal infrastructure sizes for enterprise systems [11]. However, taking full advantage of cloud computing requires an even deeper understanding of the performance behavior of enterprise systems and its economic implications.

Any economic assessment requires at least a basic cost model and a simple revenue model. Our sample cost model assumes a constant charge for each server-hour. The revenue model is a simple implementation of an SLA. For every successfully processed request, the provider receives a constant revenue; for every unsuccessful request, the provider pays a penalty, whereby a successful request is defined as a request with a response time below a certain threshold. The total profit is defined as the total revenue minus the cost of operation.
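The profit model just described can be sketched as follows. All constants (server price, per-request revenue and penalty, response-time threshold) are hypothetical placeholders, not values from the paper:

```python
# Hypothetical SLA parameters -- placeholders for illustration only.
SERVER_COST_PER_HOUR = 0.50      # constant charge per server-hour
REVENUE_PER_REQUEST = 0.001      # earned for each successful request
PENALTY_PER_REQUEST = 0.002      # paid by the provider for each failed one
SLA_THRESHOLD_MS = 2000.0        # a request succeeds below this latency

def total_profit(response_times_ms, n_servers, hours):
    """Total profit = revenue from requests within the SLA threshold,
    minus penalties for late requests, minus the cost of operation."""
    ok = sum(1 for t in response_times_ms if t < SLA_THRESHOLD_MS)
    late = len(response_times_ms) - ok
    revenue = ok * REVENUE_PER_REQUEST - late * PENALTY_PER_REQUEST
    cost = n_servers * hours * SERVER_COST_PER_HOUR
    return revenue - cost

# Example: 10,000 requests, 200 of them over threshold, 5 servers, 1 hour.
times = [100.0] * 9800 + [3000.0] * 200
print(round(total_profit(times, n_servers=5, hours=1), 2))
```

Applied per (configuration, workload) pair, this turns raw response-time traces into the aggregated economic values plotted in Figure 7.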

Figure 7 shows the transformation of the empirical data into aggregated economic values. Instead of the number of machines in the infrastructure, the graph shows the cost incurred by operating a particular infrastructure. The z-axis shows the expected revenue when operating a particular infrastructure size at any given workload (y-axis). The data are taken from the previously introduced RUBBoS with MySQL Cluster setup with browse-only workload. The graph clearly shows the complex relationship between infrastructure size, system performance, and expected revenue. Based on a similar data aggregation, system operators are able to derive the optimal infrastructure size constrained by their system and profit model.
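Such a derivation can be illustrated with a toy table of profits per (infrastructure size, workload) pair. The numbers and the `optimal_size` helper are made up for illustration and do not reproduce the paper's measurements:

```python
# Hypothetical aggregated profits, indexed by database server count and
# workload (concurrent users). Real values would come from experiments.
profit = {
    3: {1000: 4.0, 2000: 6.5, 3000: 5.0, 4000: -1.0},
    5: {1000: 3.0, 2000: 7.0, 3000: 9.0, 4000: 6.0},
    9: {1000: 1.0, 2000: 5.0, 3000: 8.0, 4000: 8.5},
}

def optimal_size(workload):
    """Infrastructure size maximizing profit at the given workload."""
    return max(profit, key=lambda n: profit[n][workload])

for w in (1000, 2000, 3000, 4000):
    print(w, optimal_size(w), profit[optimal_size(w)][w])
```

In this toy table the profit-optimal size grows with workload, mirroring the qualitative trade-off the figure conveys: extra servers add cost at low load but prevent SLA penalties at high load.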

Figure 7. Revenue and cost analysis of a RUBBoS system under read/write workload.

Figure 8 shows a higher aggregation of the previous graph. The dashed and the dotted lines represent the revenue and the cost for the profit-optimal configuration, respectively. The solid line shows the resulting maximal profit for any configuration. The profit maximum lies at a workload of 3,000 users, whereas the maximum revenue is located at 4,000 users. However, the optimal points of operation strongly depend on the definition of the SLA model as well as on the cost of the infrastructure. Therefore, depending on these factors, the economical size of the infrastructure may vary strongly.

6. Related Work

Figure 8. Profit and cost analysis of a RUBBoS system under read/write workload.

Cloud computing, data center provisioning, server consolidation, and virtualization have become ubiquitous terminology in times of ever-growing complexity in large-scale computer systems. However, many hard problems remain to be solved on the way to an optimal balance between performance and availability. Specifically, scaling systems along these two axes is particularly difficult with regard to databases, where replication has been established as the common mechanism of choice. In fact, replication consistently falls short of real-world user needs, which leads to continuously emerging new solutions and causes real-world database clusters to remain small (i.e., fewer than 5 replicas) [12]. In this paper we address this shortcoming explicitly. A central evaluation problem is that scalability studies traditionally only address scaled load scenarios to determine best achievable performance (e.g., [13]). Concretely, experimentation methodologies with parallel scaling of load and resources mask system overhead in realistic production system conditions, which are typically over-provisioned. Consequently, the lack of a reliable body of knowledge on the aspects of management, capacity planning, and availability is one of the limiting factors for solution impact in the real world [12]. Moreover, it is well understood that transactional replication can exhibit unstable scale-up behavior [14] and that it is not sufficient to characterize replication performance in terms of peak throughput [12].

There exist many popular solutions to database replication; the most common commercial technique is master-slave replication. Some replication product examples are IBM DB2 DataPropagator, replication in Microsoft SQL Server, Sybase Replication Server, Oracle Streams, and replication in MySQL. Academic prototypes that use master-slave replication are Ganymed [15] and Slony-I [16]. The latter is a popular replication system supporting cascading and failover. However, in this work we focus on multi-master architectures. Some commercial products that use this technique are MySQL Cluster and DB2 Integrated Cluster. Academic multi-master prototypes are C-JDBC [7], which is used in this work, Tashkent [17], and Middle-R [18]. Further efforts focus on adaptive middleware to achieve performance adaptation in case of workload variations [19].

The evaluation of database replication techniques often falls short of real-world representativeness. There are two common approaches to evaluation with benchmarks. Performance is either tested through microbenchmarks [18] or through web-based distributed benchmark applications such as TPC-W, RUBiS, and RUBBoS [20]. These benchmark applications are modeled on real applications (e.g., Slashdot or eBay.com), offering real production system significance. Therefore, this approach has found a growing group of advocates (e.g., [3], [7], [13], [15], [17]).

Most recent literature on the topic of database replication deals with specific implementation aspects, such as transparent techniques for scaling dynamic content web sites [21], and the experimental evaluation in such works remains narrow in focus and is solely used to confirm expected properties.

7. Conclusion

In this paper we used an empirically motivated approach to automated experiment management, which features automatically generated scripts for running experiments and collecting measurement data, to study the RUBBoS N-tier application benchmark in an extensive configuration space. Our analysis produced various interesting findings, such as detailed node-level bottleneck migration maps and insights into the generation of economical configuration plans. Furthermore, our analysis showed cases with non-obvious multi-bottleneck phenomena that entail very characteristic performance implications.

This paper corroborates the increasing recognition of empirical analysis work in the domain of N-tier systems. Findings such as multi-bottlenecks demonstrate how actual experimentation is able to complement analytical system evaluation and provide higher levels of confidence in analysis results. More generally, our results suggest that further work on both the infrastructural and the analysis parts of our approach may lead to a new perspective on the analysis of N-tier systems with previously unseen domain insight.

Acknowledgment

This research has been partially funded by National Science Foundation grants ENG/EEC-0335622, CISE/CNS-0646430, CISE/CNS-0716484, AFOSR grant FA9550-06-1-0201, NIH grant U54 RR 024380-01, IBM, Hewlett-Packard, Wipro Technologies, and Georgia Tech Foundation through the John P. Imlay, Jr. Chair endowment. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation or other funding agencies and companies mentioned above.

References

[1] “RUBBoS: Bulletin board benchmark,” http://jmob.objectweb.org/rubbos.html.

[2] S. Malkowski, M. Hedwig, and C. Pu, “Experimental evaluation of n-tier systems: Observation and analysis of multi-bottlenecks,” in IISWC ’09.

[3] C. Pu, A. Sahai, J. Parekh, G. Jung, J. Bae, Y.-K. Cha, T. Garcia, D. Irani, J. Lee, and Q. Lin, “An observation-based approach to performance characterization of distributed n-tier applications,” in IISWC ’07, September 2007.

[4] A. Sahai, S. Singhal, V. Machiraju, and R. Joshi, “Automated generation of resource configurations through policies,” in Proceedings of the IEEE 5th International Workshop on Policies for Distributed Systems and Networks, 2004, pp. 7–9.

[5] G. Jung, C. Pu, and G. Swint, “Mulini: An automated staging framework for QoS of distributed multi-tier applications,” in WRASQ ’07.

[6] S. Malkowski, M. Hedwig, J. Parekh, C. Pu, and A. Sahai, “Bottleneck detection using statistical intervention analysis,” in DSOM ’07.

[7] E. Cecchet, J. Marguerite, and W. Zwaenepoel, “C-JDBC: Flexible database clustering middleware,” in USENIX ’04.

[8] “MySQL Cluster,” http://www.mysql.com/products/database/cluster/.

[9] “Emulab - Network Emulation Testbed,” http://www.emulab.net.

[10] M. Hedwig, S. Malkowski, and D. Neumann, “Taming energy costs of large enterprise systems through adaptive provisioning,” in ICIS ’09.

[11] M. Hedwig, S. Malkowski, C. Bodenstein, and D. Neumann, “Datacenter investment support system (DAISY),” in HICSS ’10.

[12] E. Cecchet, G. Candea, and A. Ailamaki, “Middleware-based database replication: The gaps between theory and practice,” in SIGMOD ’08.

[13] E. Cecchet, J. Marguerite, and W. Zwaenepoel, “Partial replication: Achieving scalability in redundant arrays of inexpensive databases,” Lecture Notes in Computer Science, vol. 3144, pp. 58–70, July 2004.

[14] J. Gray, P. Helland, P. O’Neil, and D. Shasha, “The dangers of replication and a solution,” in SIGMOD ’96.

[15] C. Plattner and G. Alonso, “Ganymed: Scalable replication for transactional web applications,” in Middleware ’04.

[16] “Slony-I,” http://www.slony.info.

[17] S. Elnikety, S. Dropsho, and F. Pedone, “Tashkent: Uniting durability with transaction ordering for high-performance scalable database replication,” SIGOPS Oper. Syst. Rev., vol. 40, no. 4, pp. 117–130, 2006.

[18] M. Patino-Martinez, R. Jimenez-Peris, B. Kemme, and G. Alonso, “Middle-R: Consistent database replication at the middleware level,” ACM Trans. Comput. Syst., vol. 23, no. 4, pp. 375–423, 2005.

[19] J. M. Milan-Franco, R. Jimenez-Peris, M. Patino-Martinez, and B. Kemme, “Adaptive middleware for data replication,” in Middleware ’04.

[20] C. Amza, A. Chanda, A. L. Cox, S. Elnikety, R. Gil, K. Rajamani, and W. Zwaenepoel, “Specification and implementation of dynamic web site benchmarks,” in 5th IEEE Workshop on Workload Characterization, 2002.

[21] C. Amza, A. L. Cox, and W. Zwaenepoel, “A comparative evaluation of transparent scaling techniques for dynamic content servers,” in ICDE ’05.