Computer Measurement Group, India 1
Guerrilla Capacity Assessment using USL
A Case Study
Prajakta Bhatt, Infosys December 2013
www.cmgindia.org
Contents
• Typical capacity assessment approach
• Its challenges – the need for USL
• Understanding scalability models
• Answering the WH questions (What, Where, How, When, etc.) on USL
• Assumptions & constraints of USL
• Detailed case study: capacity assessment of a real production DB server (Oracle based)
Introduction
• The key is effective capacity planning: the science and art of predicting the resources (s/w, h/w, connection infrastructure) required to handle additional future loads optimally. It helps to:
• Reduce cost (by avoiding over-sizing)
• Improve productivity (by avoiding under-sizing)
Can my app scale well to take 2x the existing load?
At what time do I need to upgrade my hardware?
Will this new CR interfere with existing app performance?
Traditional Capacity Assessment Approach
Gather NFRs
• Growth projections from Business
• Data from Prod logs
Test Scalability
• Collect throughput, utilization, and service demand data from load tests / prod logs using regression analysis techniques
Build Capacity Model
• Build analytical or simulation models using the service demands
Predict Capacity
• Extrapolate the model
• Predict capacity requirements for critical resources
• Challenges in traditional capacity modeling:
–Predicts capacity assuming theoretical linear scalability
–Uses complex statistical algorithms (like non-linear least squares regression)
–Unable to predict response times
• Any alternatives? USL: the Universal Scalability Law
What is USL?
• The Universal Scalability Law (USL) quantifies scalability for an application setup as a whole (both hardware and software together)
• It helps depict scalability behavior of systems realistically
• Other features:
Universal in Nature
• H/w - disk arrays, SANs, CPUs, multicores
• S/W – Virtual Users, Unix Processes, POSIX threads
• Certain N/W IO Types
Simple to Implement
• No time consuming computation of Service Demand
Fairly accurate predictions
• System Throughput
• System Utilization
• Concurrency at maximum Throughput
• Transaction Response times
Understanding Scalability Models – Perfect Scalability
• Ideally, Scalability should be perfectly linear.
• If there are N processors and X(N) is the load handled by N processors.
If for N = 1, X(N) = 5, Then with N = 10, desired X(N) = 50
• Hence, the capacity model depicting perfect linear scalability is: C(N) = N, i.e. X(N) = N · X(1)
Explaining non-linearity in graphs
• Based on measurements on real multi-processor systems we know that scalability is non-linear.
If for N = 1, X(N) = 5 Then at N = 10, X(N) < 50
• In 1967, Gene Amdahl first recognized that this theoretical linearity cannot be achieved, because certain portions of the workload can only be executed sequentially, and he accounted for this with the contention factor α.
E.g. when there are N processes, each of them competes for shared resources, resulting in contention at various layers, e.g. read/write lock contention, bus contention, etc.
Amdahl’s law implications
• Used in parallel computing to predict theoretical maximum speedup using multiple processors, given as:
Speedup, S(N) = N / (1 + α(N − 1))
Speedup, S(N), is defined as the ratio of the serial execution time to the parallel execution time.
α is the degree of contention, i.e. the part of the task that cannot be parallelized.
• Often used to find the maximum expected improvement to an overall system when only part of the system is improved.
E.g. Assume A & B are independent parts of a work task taking 75% & 25% of execution time.
Make B 5x faster: S(5) = 5 / (1 + 0.75·(5 − 1)) = 1.25; Make A 2x faster: S(2) = 2 / (1 + 0.25·(2 − 1)) = 1.6
Though B's speed-up factor is the greater one (5x), the better overall improvement is achieved by tuning A!
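As a minimal sketch (Python, using the slide's own illustrative numbers), the speedup formula above can be evaluated directly:

```python
def amdahl_speedup(n, alpha):
    """Amdahl's law: S(N) = N / (1 + alpha*(N - 1)),
    where alpha is the serial (non-improved) fraction of the task."""
    return n / (1.0 + alpha * (n - 1))

# Part B (25% of runtime) made 5x faster: the other 75% stays serial here.
print(round(amdahl_speedup(5, 0.75), 2))   # 1.25
# Part A (75% of runtime) made 2x faster: the other 25% stays serial here.
print(round(amdahl_speedup(2, 0.25), 2))   # 1.6
```

This reproduces the slide's conclusion: improving the larger part A yields the bigger overall gain.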
Introducing Scale-up in Amdahl’s law
• Amdahl’s law derives speedup achieved by executing task in a multi-processor environment
• In the real world, by adding more processors we actually try to get more work done rather than reduce response time, i.e. achieve more throughput while keeping response times reasonably constant; this is termed Scale-up.
• Hence we can apply the same Amdahl’s law to scale-up as well:
Scaleup, C(N) = N / (1 + α(N − 1))
• However, in the real world we see that throughput does not keep following this curve: beyond some point it drops off, or even becomes unpredictable.
Scalability – Effect of Coherence
• In 1993, Dr. Neil Gunther defined the Universal Scalability Law, which quantifies scalability quite close to that of realistic systems.
• In addition to the contention factor α (e.g. queuing for shared resources) addressed by Amdahl's law, USL adds a coherency factor β (the latency for shared data to become consistent) to capture this non-linearity.
• USL puts forward the Scaleup by accounting for both contention and coherence as:
Scaleup, C(N) = N / (1 + α(N − 1) + βN(N − 1))
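The scale-up formula above can be sketched as a one-line function (Python; the α, β values below are illustrative, not from the case study):

```python
def usl_capacity(n, alpha, beta):
    """Relative capacity C(N) = N / (1 + alpha*(N-1) + beta*N*(N-1))."""
    return n / (1.0 + alpha * (n - 1) + beta * n * (n - 1))

# Without contention or coherency cost, scaling is perfectly linear:
print(usl_capacity(10, 0.0, 0.0))   # 10.0
# With both costs present, capacity falls short of linear:
print(round(usl_capacity(10, 0.02, 0.0005), 2))
```

With β = 0 the function reduces to Amdahl's law; with β > 0 the curve eventually turns downward (retrograde scaling).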
Contention Vs. Coherency
• Meaning
– Contention (α): degree of contention because of shared data.
– Coherency (β): penalty incurred for maintaining consistency of shared data.
• Example in a DBMS
– Contention: one user process has to wait in a queue to get access to a table row (acquire a DB row lock).
– Coherency: even after the user process gets the DB row lock, it cannot directly update the table; it may have to check whether the data in its cache is stale, and if so, wait for its local copy to become consistent with the latest copy from another CPU's cache before updating. This additional processing is the coherency delay.
• Root cause
– Contention: the part of the program that is serial in nature (cannot be parallelized).
– Coherency: inter-process communication, which increases in proportion to the square of the concurrency.
• Dependent factor
– Contention: factor (N − 1). With N processes, in the worst case a process needs to wait for the other (N − 1) processes to finish before getting hold of the shared resource.
– Coherency: factor N(N − 1). With N processes, each process needs to communicate with the other (N − 1) processes, so inter-process communication scales as N(N − 1).
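The two dependent factors above grow at very different rates, which a tiny sketch (Python) makes concrete:

```python
def contention_term(n):
    """Worst-case waits behind the other N-1 peers (linear in N)."""
    return n - 1

def coherency_term(n):
    """Pairwise inter-process exchanges (quadratic in N)."""
    return n * (n - 1)

# The coherency term dominates at high concurrency:
for n in (2, 10, 100):
    print(n, contention_term(n), coherency_term(n))
```

At N = 100 the coherency term (9900) is two orders of magnitude larger than the contention term (99), which is why coherency delay eventually drives throughput downward.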
Quiz – 1
Bucket the items into their categories:
• Memory thrashing
• Wait to obtain a DB latch to modify a shared structure
• Wait for another thread to update a shared counter
• Cache-miss latency
Contention: memory thrashing; wait to obtain a DB latch to modify a shared structure
Coherency: cache-miss latency; wait for another thread to update a shared counter
Universal Scalability Law (USL) – A revised look
• Linear scalability – without contention and coherence, linear (perfect) scalability is achieved, i.e. C(N) = N
• Contention – The factor α represents the degree of contention because of shared data
• Coherence – The factor β represents the penalty incurred for maintaining consistency of shared data
USL gives the point of maximum throughput, beyond which performance actually degrades!
USL Application
Steps to apply:
1. Collect data points for load C(N) at various concurrency levels (N)
2. By definition, N/C(N) = 1 + α(N − 1) + βN(N − 1), a second-degree polynomial in N; hence transform the data to plot points (X, Y) as: X = (N − 1), Y = N/C(N) − 1
3. Then perform least-squares regression to fit the data to a polynomial of degree 2 (y = ax² + bx + c)
4. Do curve fitting with R² ~ 1, and calculate the values for α, β as: α = b − a, β = a
[Chart: non-linear scalability fit of (N/C(N)) − 1 vs. (N − 1); trendline y = 2E-05x² + 0.0006x, R² = 0.9906]
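The transform-and-fit steps above can be sketched in pure Python. The data here is synthetic, generated from assumed parameters α = 0.02, β = 0.0005 (not the case-study values), so the fit should recover them exactly:

```python
def fit_usl(ns, cs):
    """Estimate USL parameters from concurrency levels ns and measured
    relative capacities cs = C(N).  Transform x = N - 1, y = N/C(N) - 1,
    then least-squares fit y = a*x^2 + b*x (no intercept), so that
    beta = a and alpha = b - a, as on the slide."""
    xs = [n - 1 for n in ns]
    ys = [n / c - 1 for n, c in zip(ns, cs)]
    # Normal equations for [a, b] minimizing sum((a*x^2 + b*x - y)^2):
    s4 = sum(x ** 4 for x in xs)
    s3 = sum(x ** 3 for x in xs)
    s2 = sum(x ** 2 for x in xs)
    t2 = sum(y * x ** 2 for x, y in zip(xs, ys))
    t1 = sum(y * x for x, y in zip(xs, ys))
    det = s4 * s2 - s3 * s3
    a = (t2 * s2 - s3 * t1) / det
    b = (s4 * t1 - s3 * t2) / det
    return b - a, a  # alpha, beta

# Synthetic check: generate C(N) from known parameters and recover them.
def c_usl(n, alpha, beta):
    return n / (1 + alpha * (n - 1) + beta * n * (n - 1))

ns = [1, 2, 4, 8, 16, 32, 64]
cs = [c_usl(n, 0.02, 0.0005) for n in ns]
alpha, beta = fit_usl(ns, cs)
print(round(alpha, 4), round(beta, 5))  # 0.02 0.0005
```

Real measurements are noisy, so in practice one would check R² of the fit (as the slide does) rather than expect an exact recovery.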
Leveraging USL for predictions:
• Compute the scale-up value C(N) as per the USL formula
• Predict load (throughput/utilization) for different N points as: Xp(N) = N · C(N)
• Predict maximum concurrency as: Npmax = √((1 − α) / β)
• Predict response times using the user mix, throughput mix & Little's law:
R1 = N1/X1(N) − Z1, R2 = N2/X2(N) − Z2, R3 = N3/X3(N) − Z3
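The two prediction formulas above can be sketched directly (Python; the α, β and load values below are illustrative, not from the case study):

```python
import math

def n_max(alpha, beta):
    """Concurrency at peak throughput: Npmax = sqrt((1 - alpha) / beta)."""
    return math.sqrt((1 - alpha) / beta)

def response_time(n, x_n, z):
    """Little's law with think time Z: R = N / X(N) - Z."""
    return n / x_n - z

print(round(n_max(0.02, 0.0005), 1))   # 44.3
print(response_time(10, 2.0, 1.0))     # 4.0
```

In the slide's workflow, `response_time` would be applied per workload class (N1/X1, N2/X2, N3/X3) with each class's own think time Z.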
*Indicative* USL application for App Server Predictions
SDLC Stage: Testing Phase
1. From performance testing results, collect data points for various #virtual users (N), throughput X(N), and CPU utilization U(N)
2. Transform the data to plot points (X, Y) as: X = (N − 1), Y = N/C(N) − 1
3. Do curve fitting* and compute the regression coefficients a, b, c from a curve with R² close to 100%
4. Compute the contention & coherency parameters as: α = b − a, β = a
Note*: In Excel, the inversion transformation is necessary because Excel cannot fit a rational function by default. Also, to get around precision problems in Excel, serious capacity planners are advised to use statistical tools such as R, MATLAB, etc.
USL App Server Predictions vs. Actual Test Results
[Charts: predicted vs. actual throughput (Xpmax at Npmax) and utilization (Upmax at Npmax)]
Assumptions and Limitations of USL
• The selected workload mix must accurately represent business activities and should be constant.
• All the test conditions (application and database setup, code/configurations) should be the same; only the load may vary.
• For analysis and forecasts it is assumed that the impact on workload and utilization from any application/component other than the client application is negligible.
• As systems don't generally scale as well as they are supposed to, USL is best used for capacity planning as a best-case bound. It is better not to count on getting any more performance than the model indicates!
• Any major change (functional or at the configuration level) to the application, database, or infrastructure can cause the model to change.
• If C(1) is not directly available, choose it so that the other points don't show better-than-linear behavior and it stays within reasonable limits.
• It is observed that the forecast error margins are somewhere between 15% and 20%. However, this can be improved by having sufficient, accurate data points that depict system throughput and concurrency correctly, and by fitting the curve well (R² > 97%).
Database Capacity Assessment Case Study
• A major telecom client wanted a capacity assessment of its critical workforce application, which aids in effective planning, scheduling, and dispatching of work to technicians.
• The application was a combination of Java and .NET platforms, comprising: 58+ application servers for various interface transactions, 15+ virtual servers for background job processing, and 1 Oracle DB server (a Sun SPARC F25K system hosting 48 CPUs), handling 88 business transactions/sec across the various app servers.
• Capacity assessment challenges:
– Capacity projections after actual performance testing: not an option due to time/cost constraints
– No information about the workload mix for each transaction in the database
– No information available about DB service demands for each transaction
– Regression analysis on DB data (MVA): not an option, as no data was available on how a change in application throughput translates into a change in DB throughput
So a paper-based Guerrilla capacity assessment approach was employed to come up with the database projections.
Steps to apply USL
Choose Parameters
• Concurrency N – Average Active Sessions (AAS), Average Session Load (ASL)
• Load – Throughput - X(N), Utilization - U(N)
Collect Data
• Oracle AWR/ Statspack report
• Custom Queries
Consolidate Data
• Generate unified view of: ASL (N), Logical Reads X(N), and CPU Utilization U(N)
Model Data
• Feed the data into the USL tool (Excel-based, or a custom R script)
• Fit the curve well so that the error is low and the α, β values are realistic
Analyze Model
• Find Max Concurrency - Nmax
• Predict Max Load - Xmax, Umax
Step 1a: Choose Data – Throughput Parameter
• Various parameters could represent load on the system: physical reads, user commits + user rollbacks, execute count, session logical reads, CPU utilization
• Session logical reads was selected as the measure of throughput, as it is closely related to 'queries executed on the database': it gives the number of actual buffer reads along with physical reads.
• CPU utilization was selected as the measure of load on the system.
Step 1b: Choosing Data - Concurrency (N)
• We could use the Average Active Sessions (AAS) metric, readily available from the Oracle 10g AWR report.
• However since application DB was Oracle 9i which doesn’t have AWR reports, Average session load (ASL) concept was used:
ASL = (CPU time + time spent in wait events) / elapsed time, where:
CPU time = given by 'CPU used by this session' in the v$sysstat table; it represents the total amount of CPU used by all sessions, excluding background processes.
Time spent in wait events = the time taken by DB events that were not idle, i.e. excluding obvious idle events such as client message, dispatcher timer, lock element cleanup, etc.
Please note: the ASL value indicates the degree of concurrency the database can support. This is not to be confused with the actual number of concurrent users/connections supported by the database system; in real life, queues and other mechanisms exist that can support thousands of concurrent end users.
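The ASL formula above reduces to simple arithmetic per sampling interval; a minimal sketch (Python, with hypothetical timing values in seconds):

```python
def average_session_load(cpu_time_s, wait_time_s, elapsed_s):
    """ASL = (CPU time + non-idle wait time) / elapsed wall-clock time.
    All inputs are in seconds, for one sampling interval."""
    return (cpu_time_s + wait_time_s) / elapsed_s

# e.g. 180 s of CPU plus 240 s of non-idle waits over a 600 s interval:
print(average_session_load(180, 240, 600))  # 0.7
```

An ASL of 0.7 means that, on average, less than one session's worth of work was active during the interval.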
Quiz - 2
For applying USL to other databases, what other parameters could be employed?
• Concurrency – SQL Server: user connections, logical connections, #sessions, etc.; MySQL: queries executing per sec (Threads_running), etc.
• Load – SQL Server: transactions per sec, batch requests per sec, etc.; MySQL: queries received per sec (Questions), etc.
Step 2: Collect Data Details
• ASL & session logical reads can both be measured in two ways: using the Oracle Statspack report, or using custom queries on system tables like v$sysstat and v$system_event.
• Since Statspack reports on the client production environment were readily available only at one-hour intervals, a duration too coarse to capture system dynamics closely, custom queries on the system tables v$sysstat and v$system_event were executed every 10 minutes, and the CPU utilization for each interval was noted.
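Since v$sysstat counters are cumulative from instance startup, each 10-minute figure is the difference between two snapshots. A minimal sketch of that consolidation step (Python; the snapshot values are hypothetical, though 'session logical reads' and 'CPU used by this session' are real Oracle statistic names):

```python
# Hypothetical snapshots of cumulative v$sysstat counters taken 10 min apart.
snap_t0 = {"session logical reads": 1_200_000, "CPU used by this session": 45_000}
snap_t1 = {"session logical reads": 4_800_000, "CPU used by this session": 63_000}

def interval_delta(before, after):
    """v$sysstat values are cumulative since instance start, so the
    per-interval figure is the difference between two snapshots."""
    return {name: after[name] - before[name] for name in after}

print(interval_delta(snap_t0, snap_t1))
```

The same delta logic applies to the non-idle wait times pulled from v$system_event.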
Step 3: Consolidate Data
• Results from the SQL queries were consolidated to get a unified view of ASL (N), total logical reads (X), and average CPU utilization (U) for every 10 minutes.
(Databases 1 and 2 reside on the same physical database server; totals and CPU utilization are for that server.)

Date Time | ASL (N1) | Logical reads (X1) | ASL (N2) | Logical reads (X2) | Total N (N1+N2) | Total X (X1+X2) | CPU Utilization (%)
8/1/2012 9:18 | 0.468 | 1544544 | 6.18 | 35639445 | 6.648 | 37183989 | 19.47
8/1/2012 9:28 | 0.438 | 1057181 | 6.952 | 44029984 | 7.39 | 45087165 | 20.47
8/1/2012 9:38 | 0.396 | 905675 | 5.896 | 35182053 | 6.292 | 36087728 | 21.15
8/1/2012 9:48 | 0.799 | 1968244 | 5.593 | 30522881 | 6.392 | 32491125 | 23.16
.. | .. | .. | .. | .. | .. | .. | ..
8/1/2012 11:08 | 1.445 | 3521876 | 8.669 | 42440075 | 10.114 | 45961951 | 29.38
8/1/2012 11:18 | 4.022 | 2783427 | 8.668 | 53818447 | 12.69 | 56601874 | 27.62
.. | .. | .. | .. | .. | .. | .. | ..
8/1/2012 13:28 | 0.795 | 1937491 | 9.406 | 53881083 | 10.201 | 55818574 | 30.52
8/1/2012 13:38 | 0.687 | 4024107 | 8.404 | 49021605 | 9.091 | 53045712 | 28.99
Step 4: Model Data
This data was then fed into the USL tooling (an R script), and appropriate points were selected so that the USL curve fit well and realistic values for α, β were obtained.
Utilization model (N vs. U): R² = 97.78%, Nmax = 18.33
Throughput model (N vs. X): R² = 99.86%, Nmax = 17.64
Step 5: Analyze Model-Observations and Inferences
• The model is validated, as: curve-fitting efficiency (R²) > 97% for both models, and both the throughput and utilization capacity models indicate Nmax close to 18.
• The throughput capacity model shows a negative α (i.e. no contention), implying improved performance due to parallelization of tasks. This is as expected for a database, since the DB benefits from buffering of data: as the workload increases, data in the buffer becomes more locally available, so more throughput can be serviced without extra work, hence the notion of improved performance.
• In the utilization capacity model we see that at Nmax = 18, CPU utilization is ~30%. Hence the application is not able to scale beyond ~30% CPU utilization.
• This fact was also confirmed by an event that occurred on 31st July, 11 AM-12 PM: due to users directly logging onto some Transaction1 servers, the number of DB connections suddenly increased, leading to higher CPU utilization and poor DB response times.
Conclusion
• The DB hardware has sufficient capacity to handle additional workload, to the tune of a 100% increase.
• Hence, in its current state, it can easily support the workload projections for the next 1 year.
• However, a separate database tuning exercise is recommended to improve the scalability of the application and better utilize the underlying hardware.
• Thus, paper-based scalability assessment through USL helped analyze scalability issues in the application and aided effective capacity planning on real production systems.
References
• Average Session Load
• Interpreting Wait Events to Boost System Performance
• How to Quantify Scalability, Neil J. Gunther
• Forecasting MySQL Scalability with the Universal Scalability Law, Baron Schwartz and Ewen Fortune
• Guerrilla Capacity Planning, Neil J. Gunther, 2005