Cost Based Performance Modeling:
Addressing Uncertainties
Eugene Margulis
Telus Health Solutions
October, 2009
2
Outline
• Background
• What issues/questions are addressed by performance “stuff” ?
• Performance/capacity cost model
• Examples and demo
• Using the cost based model
• Cost Based Model and the development cycle
• Benefits
3
Areas addressed by Performance “Stuff”
•Key performance “competencies”:
–Validation
–Tracking
–Non-degradation
–Characterization
–Forecasting
–Business Mapping
–Optimization
• All these activities have a common goal...
4
Performance activities Goal
•Ability to articulate/quantify resource requirements for a given system behaviour on a given h/w, provided a number of constraints.
• Many ways of “getting” there – direct testing, load measurements, modelling, etc.
5
Reference System
[Diagram: reference system – components A, B, C connected to GUI clients and DISK]
Outputs: GUI, devices, data, reports
Inputs/Devices: events/data (periodic, streaming)
• General purpose OS (Unix, VxWorks, Windows, etc.)
• Multiprocessor / virtual instances (cloud)
• Non-HRT but may have a Hard Real Time component
• Has 3rd party code – binary only, no access
• Heterogeneous s/w – scripts, C, multiple JVMs
6
Real world “challenges” - uncertainties:
• Requirements / Behavior uncertainty:
– Performance Requirements are not well defined
– Load levels (S/M/L) or “KPI”s are speculated, not measured (cannot be measured)
• Code uncertainty:
– No Access to large portions of code / can’t rebuild/recompile
• H/W uncertainty:
– Underlying H/W architecture is not fixed (and can be very different)
• This is not strictly a Testing/Verification activity but rather an exploratory exercise where we need to discover/understand rather than verify
7
Additional Complications:
• Performance limits are multi-dimensional (CPU, Threading, IO, disk space)
• Designers in India, Testers in Vietnam, Architects in Canada, Customers in Spain (How to exchange information??)
• Need ability to articulate/communicate performance results efficiently
8
Examples of questions addressed by performance “stuff”
• Will timing requirements be met? All the time? Under what conditions? Can we guarantee it? What is relationship between latency and max rate?
• Will we have enough disk space (for how long)?
• What if we run the system on HW with 32 slow processors? (instead of 4 fast ones?) What would be max supported rate of events then?
• What if the amount of memory is reduced? What would be max supported rate of events then?
• What if some GUIs are in China? (increase RTT)
• Do we have enough spare capacity for an additional application X?
• Is our system performing better/worse compared to the last release (degradation)?
• What customer visible activity (not a process name/id, not an IP port, not a 3rd party DB) uses the most resources? (e.g. CPU? Memory? Heap? BW? Disk?)
• What if we have two times as many of type A devices? What is the max size of network we can support? How does performance map to Business Model?
9
How can these be answered?
• Yes, we can test it in the lab (at least some) ….
… but can we have the answers by
tomorrow ??
10
What we need...
• Lab testing alone does not address this (efficiently)
• Addressed by a combination of methods/approaches
• But need a common “framework” to drive this
11
What we really need...
• A flexible mapping between customer behaviour and performance/capacity metrics of the system (recall the performance goals)
• But there is a problem… There is a HUGE number of different behaviours – even in the simplest of systems…
12
Can we simplify the problem?
• Can we reduce the problem space and still have something useful/practical?
–Very few performance aspects are pass/fail (outside of HRT/military/etc.)
• Willing to trade-off accuracy for speed
–No need to be more accurate than the inputs
13
Transaction – an “atomic performance unit”
• System processes TRANSACTIONS
–80/20 Rule – 20% of TRANSACTIONS are responsible for 80% of “performance” during steady-state operations
–Focus on steady state (payload) - but other operation states can be defined
14
What is a TRANSACTION from performance perspective?
• What does the system do most of the time (payload)?
– Processes events of type X from device B (….transaction T1)
– Produces reports of type Y (… transaction T2)
– Updates GUI (… transaction T3)
– Processes logins from GUI (… transaction T4)
• How often does it do it?
– Processes events of type X from device B – on avg, 3 per sec.
– Produces reports of type Y – once per hour
– Updates GUI – once every 30 sec
– Processes logins from GUI – on demand, on avg 1 per 10 min.
• How much do we “pay” for it?
– CPU?
– Memory?
– Disk?
…
15
Cost Based Model
16
Performance/Capacity – 3+ way view
• Costs – the price in terms of resources “paid” per transaction
– E.g. 2% of CPU for every fault/sec
– E.g. 8% of CPU for every RAD Authentication/sec
• Resource Utilization – the price in terms of resources for the given behaviour:
– E.g. (2% of CPU per fault/sec * 10 faults/sec) + (8% of CPU per authentication/sec * 1 authentication/sec) = 28%
• Costs can be used directly to estimate latency impact (lower bound)
– E.g.: 2 AA/sec -> 16% CPU impact
– E.g.: a 3 sec burst of 10 AA/sec with only 10% CPU available -> 24 sec latency (at least!)
[Diagram: COST MODEL combines Costs, Behaviour, and HW/Latency + Other Constraints to produce Resource Requirements]
• Behaviour – transactions and frequencies
– E.g. faults, 10 faults/sec
– E.g. authentication, 1 authentication/sec
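The 3-way view above can be sketched directly in code. A minimal sketch (the cost numbers are the slide's examples; transaction names are just labels):

```python
# Sketch of the cost model: utilization is the sum over transactions of
# (cost per unit rate) x (rate). Costs below are the slide's examples.

COSTS = {                    # fraction of CPU consumed per 1 transaction/sec
    "fault": 0.02,           # 2% CPU per fault/sec
    "authentication": 0.08,  # 8% CPU per authentication/sec
}

def utilization(behaviour):
    """behaviour: {transaction: rate_per_sec} -> total CPU fraction."""
    return sum(COSTS[tx] * rate for tx, rate in behaviour.items())

def burst_latency_lower_bound(tx, rate, burst_sec, cpu_available):
    """Time to drain a burst when only cpu_available CPU is free.
    Work arrives at rate*COSTS[tx] CPU-sec per sec and is served at
    cpu_available CPU-sec per sec, so latency >= total_work/service_rate."""
    return burst_sec * rate * COSTS[tx] / cpu_available

u = utilization({"fault": 10, "authentication": 1})
print(round(u, 2))  # 0.28 -> the slide's 28% CPU

# Slide example: 3 sec burst of 10 AA/sec with only 10% CPU available
print(round(burst_latency_lower_bound("authentication", 10, 3, 0.10), 1))  # 24.0 s
```

The 24 s figure matches the slide: the burst generates 3 × 10 × 8% = 240 CPU%-seconds of work, drained at 10% CPU, so at least 24 seconds.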
17
Steps to build the Cost Model
• Behaviour
– Decompose system into mutually-orthogonal performance transactions
– Identify expected frequencies (ranges of frequencies) per transaction
• Costs
– Measure the incremental costs per transaction on a given h/w – one TX at a time
– Identify boundary conditions (Cpu? Threading? Memory? Heap?)
• Constraints
– Identify latency requirements and other constraints
• Build spreadsheet model
– COSTS x BEHAVIOR -> REQUIREMENTS (assume linearity at first)
– Calibrate based on combined tests
18
Identifying Transactions
• Identify the main end-to-end “workflows” through the system and their frequencies
• However, since workflows contain common portions they are not “orthogonal” from a performance perspective (resources/rates may not be additive)
• Identify common portions of the workflows
• The common portions are “transactions”
• A workflow is represented by a sequence of one or more transactions
19
Costs example
[Chart: vmstat CPU utilization (usr/sys/total), sampled every 5 min during the test run]
[Chart: CPU% and latency vs request rate; linear fit y = 0.048x + 0.006, R² = 0.9876]
Resources (CPU%/Latency) Measured for 2/4/6/8/10/12 requests/sec
LATENCY = exponential after 10 RPS => MAX RATE = 10 RPS
• Process is NOT CPU bound (there is lots of spare CPU% @ 10 RPS)
• (In this case it is limited by the size of a JVM’s heap)
Incremental CPU utilization = 4.8% of CPU per request
• Measured on Sun N440 (4 CPUs, 1.6 GHz each) – 6400 MHz total capacity
• COST = 4.8% * 6400 MHz = 307.2 MHz per request
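The incremental cost is the slope of a least-squares line through the (rate, CPU%) measurements. A sketch of that fit – the sample points below are invented to be consistent with the slide's ~4.8%-per-request slope, not the raw chart data:

```python
# Ordinary least-squares line through (request rate, CPU fraction) points,
# as in the slide's vmstat-based example (y = 0.048x + 0.006).

def linear_fit(xs, ys):
    """Return (slope, intercept) of the least-squares line."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

rates = [2, 4, 6, 8, 10, 12]                  # requests/sec tested
cpu   = [0.10, 0.20, 0.29, 0.39, 0.49, 0.58]  # measured CPU fraction (invented)

slope, intercept = linear_fit(rates, cpu)
cost_mhz = slope * 6400                       # 4 CPUs x 1.6 GHz = 6400 MHz
print(f"{slope:.3f} CPU fraction/request -> {cost_mhz:.0f} MHz/request")
```

With the slide's real data the slope is 0.048, giving the quoted 4.8% × 6400 MHz ≈ 307 MHz per request.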
20
Transaction Cost Matrix
ALMINS Cost:
SA       MHz   MaxR
0        12.0  125.0
10000    15.5  96.6
30000    16.5  90.8
60000    18.0  83.3
100000   20.0  75.0
200000   24.9  60.1
200000 24.9 60.1
• Transaction Costs
– Include resource cost (can be multiple resources)
– Can depend on additional parameters (e.g. “DB Insert" depends on the number of DB records)
– Can include MaxRate (if limited by a constraint other than the resource, e.g. CPU).
• Example of a transaction cost matrix (SA is a parameter this particular transaction depends on – the DB size)
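Since the cost varies with a parameter (SA, the DB size), a model can interpolate between the measured rows instead of re-testing every size. A hypothetical lookup over the matrix above:

```python
# Linear interpolation into the ALMINS cost matrix: cost in MHz-equivalent
# per transaction as a function of SA (DB size). Values are from the slide;
# the interpolation itself is an assumption (the deck only shows the points).
import bisect

SA_POINTS = [0, 10000, 30000, 60000, 100000, 200000]
MHZ_COST  = [12.0, 15.5, 16.5, 18.0, 20.0, 24.9]

def cost_mhz(sa):
    """MHz cost per transaction at DB size sa, clamped at the table edges."""
    if sa <= SA_POINTS[0]:
        return MHZ_COST[0]
    if sa >= SA_POINTS[-1]:
        return MHZ_COST[-1]
    i = bisect.bisect_right(SA_POINTS, sa)
    x0, x1 = SA_POINTS[i - 1], SA_POINTS[i]
    y0, y1 = MHZ_COST[i - 1], MHZ_COST[i]
    return y0 + (y1 - y0) * (sa - x0) / (x1 - x0)

print(cost_mhz(45000))  # 17.25 MHz per transaction, halfway between 30K and 60K
```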
21
Constraints / Resources
• CPU
– Overall CPU utilization is additive per transaction (most of the time)
– If not – then the transactions are not orthogonal – break them down further or use the worst case
• MEMORY / Java HEAPs
– If there is no virtual memory (e.g. vxWorks) then memory is additive; treat it like CPU
– If there is virtual memory – then it is much trickier; there is no concept of X% utilization, so direct testing is needed
– Heap sizes for each JVM – can be additive within each JVM
• DISK
– Additive; must take purging policies and retention periods into account
• IO
– Read/write rates are additive, but total capacity depends on %wait / service time and on the manufacturer, IO pattern, etc. Safe limits can be tested separately
• BW
– Additive
– “effective” BW depends on RTT
• Threading
– Identify the threading model for each TX – if the TX is single-threaded then scale w.r.t. the clock rate of a single HW thread; if multithreaded then scale w.r.t. the entire system. E.g.:
• Suppose a transaction X “costs” 1000 MHz and is executed on a 32 CPU system with 500 MHz per CPU
• If it is single-threaded – it will take NO LESS than 2 seconds
• If it is multi-threaded – it will take NO LESS than 1000/(32*500) ≈ 0.06 seconds
• Latency
– For “long” transactions – measure base latency, then scale using threading. Use RTT to compute the impact if relevant
– Measure MAX rate on different architectures – to calibrate
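The threading rule above can be written down directly; these are lower bounds only, since contention can only make real latency worse. Using the slide's example numbers:

```python
# The deck's threading rule as a sketch: a single-threaded transaction can
# only use one hardware thread, a fully parallel one can use them all.
# These are LOWER bounds on latency - contention can make it worse.

def latency_lower_bound(cost_mhz, cpus, mhz_per_cpu, single_threaded):
    """Seconds (at best) to execute cost_mhz MHz-seconds of work."""
    capacity = mhz_per_cpu if single_threaded else cpus * mhz_per_cpu
    return cost_mhz / capacity

# Slide example: 1000 MHz of work on a 32 CPU system, 500 MHz per CPU
print(latency_lower_bound(1000, 32, 500, True))   # 2.0 s (one 500 MHz thread)
print(latency_lower_bound(1000, 32, 500, False))  # 0.0625 s (32 x 500 MHz)
```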
22
Do we need to address everything???
• There are lots of constraints…
• There may be additional constraints based on 3rd party processing
– Addressing ALL of them in a single model may be impractical
• However – not all of them need to be addressed in every case for a useful model. For example:
– vxWorks, 1 CPU, 512MB of memory, no virtual memory, pre-emptive scheduling – focus on MEM
– Solaris, 8 CPUs, 32 h/w strands, 32G memory – focus on CPU/Threading
• Only model what is relevant for the system
23
Model / Example
Behaviour (transactions and rates):
Workflow   Rate (/sec)
AU         5
AUPE       7
RAD        0
PAMFTP     0
PAMTEL     0
PAMFTPC    0
PAMTELC    0
MGMUSR     0

Costs (per transaction):
Workflow   s-MHz
AU         111
AUPE       222
GET        333
RAD        777
PAMFTP     555
PAMTEL     444

COST MODEL -> Resource Requirements:
Constraint      Result
Audit Total     CPU% 64.2%
Security/AM     Total Security rate greater than AM Max
Security/PAM    OK
Sustainability  At least one rate is not sustainable
Alarm Rate      Composite alarm rate (INS+UPD) not sustainable
NOS Trigger     OK
CWD Clients     OK
Overall CPU     Unlikely Sustainable

Constraint         CPU   ES Disk  NES Disk  BW
Max Utilization    75%   80%      90%       80%
Max NEs Supported  623   800      4482      21836

Constraint         AEPS  RRPS
Max Utilization    80    5
Max NEs Supported  3154  4485

Projected Max NEs: 623
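The "Projected Max NEs" figure in the example is simply the tightest of the per-constraint maxima. A one-liner over the example's output values:

```python
# Per-constraint maxima from the slide's example output; the projected
# system-wide maximum is the minimum of these, and the key that attains
# it names the bottleneck resource.

max_ne = {
    "CPU": 623, "ES_disk": 800, "NES_disk": 4482, "BW": 21836,
    "AEPS": 3154, "RRPS": 4485,
}
bottleneck = min(max_ne, key=max_ne.get)
print(bottleneck, max_ne[bottleneck])  # CPU 623
```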
24
Model Hierarchy
• Transaction Model
– Cost and constraints per individual transaction w.r.t. a number of options/parameters
– E.g. 300Mhz to process an event
• System Model
– Composite Cost of executing a specific transaction load on a given h/w
– E.g. 35% cpu for 10 events/sec and 4 user queries/sec on N440
• Business Model
– Mapping of System model to Business metrics
– E.g. N440 can support up to 100 NE
25
Using model for scalability and bottleneck analysis
• Mapping between any behavior and capacity requirements
• Mapping the model to different processor architectures
• Can Quantify the impact of a Business request
• Can iterate over multiple “behaviors”
– Extends “What-if” analysis
– Enables operating envelope visualization
– Enables resource bottleneck identification
26
Using the Cost Based Model / Demo
27
Identifying resource allocation – by TRANSACTIONS / Applications
[Pie chart: disk allocation (GB) – Disk_NE_LOG 57, spare 37, Disk_PM 11, Disk_Alarm_DB 10, Disk_NE_Loads 7, Disk NE B&R 7, Disk_Alarms_REDO 5, Disk CACP 3, Disk_Security_Logs 1]
[Bar chart: CPU distribution by feature (500A x 15K cph) – features include Base CP, Queuing, Broadcasts, Collect Digits, Give IVR, Give RAN, MlinkScrPop, Hdx Intrinsics, CalByCall DB, Blue DB, RT Display, RT Data API, Reports; largest shares ~32%, 27%, 14%, 13%]
[Chart: RAM – Top 10 users: AppLd, IPComms, Base, IMF, Logs, OSI STACK, PP, GES, HIDiags, OTHER, FREE]
28
Compute operating envelope
Iterate over multiple behaviours – to compute operating envelope
[3D chart: operating envelope – max EVENT RATE (10000–50000) as a function of NRECORDS1 (25000–34000) and NUSERS (10–18)]
[Chart: operating envelope in the #NE1/#NE2 plane – per-constraint limits MAX_cpu, MAX_ed, MAX_ned, MAX_bw, MAX_aeps and the combined MAX]
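Computing an envelope is just a loop over behaviours: for each setting of the secondary parameters, find the largest event rate whose projected utilization stays within the limit. A sketch with invented figures (COST, LIMIT, and the parameter grids are illustrative, not from the slides):

```python
# Operating-envelope sketch: given linear per-unit CPU costs, the maximum
# sustainable event rate at each (users, records) point is whatever CPU
# headroom remains divided by the cost of one event/sec.
# All cost numbers here are invented for illustration.

COST = {"event": 0.004, "user": 0.01, "per_record": 1e-7}  # CPU fraction/unit
LIMIT = 0.75                                               # max CPU utilization

def max_event_rate(n_users, n_records):
    """Largest events/sec that keeps projected CPU under LIMIT."""
    fixed = COST["user"] * n_users + COST["per_record"] * n_records
    headroom = LIMIT - fixed
    return max(0.0, headroom / COST["event"])

# Iterate over behaviours - one envelope point per parameter combination
envelope = {(u, r): max_event_rate(u, r)
            for u in (10, 14, 18)
            for r in (25000, 34000)}
print(envelope[(10, 25000)])  # highest sustainable events/sec at that point
```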
29
Nice charts – but how accurate are they?
Models are from God…. Data is from the Devil (http://www.perfdynamics.com/)
• Initially WAY more accurate than the behavior data
• Within 10% of combined metrics – for an “established” model
• Less accurate as you extrapolate further from the measurements
• The model includes guesses as well as measurements
• The value is in establishing patterns rather than absolute numbers.
30
Projects where this was applied
• Call Centre Server (WinNT platform, C++)
• Optical Switch (VxWorks, C, Java)
• Network Management System (Solaris, Mixed, 3rd party, Java)
• Management Platform Application (Solaris, Mixed, 3rd party, Java)
• …
31
Addressing Uncertainties - recap
Uncertainty  Cost Based Model                 “Traditional”
Behavior     Forecast ANY behavior EARLY;     Worst Case Over-Engineering;
             Compute Operating Envelope       Tiger Team – LATE
Code         Treat as “black box”;            ??? KPI ??? BST ???
             No access needed;
             Costs w.r.t. behavior, not code
H/W          Forecast h/w impact EARLY;       Worst Case Over-Engineering;
             Small number of “pilot” tests;   Tiger Team – LATE
             Compute Operating Envelope
32
Cost Reduction
• Significantly reduces the number of tests needed to compute operating envelope.
– Suppose the system has 5 transactions defined, need to compute operating envelope with 10 “steps” for each transaction (e.g. 1 per sec, 2 per sec, ... 10 per sec).
– Using BST type “brute force” testing we would need to run 10 * 10 * 10 * 10 * 10 tests (one for each rate combination), 100,000 tests in total
– Using the model approach we would need to run 10+10+10+10+10 tests, 50 tests in total (there is additional work for calibration, model building, etc. but the total cost is much smaller than running 100K big system tests)
– Each individual test is much simpler than BST and can be automated
– H/w cost reduction – less reliance on BST h/w; using pilot tests we can map from one h/w platform to another
33
How does the Cost Model fit in the dev cycle?
34
Performance/Capacity – Typical Focus at the wrong places
[Diagram: dev cycle – Planning (KPI Definition, PLM) → Development → Product Verification (KPI Validation, PT/SV)]
• Uncertainty of expected customer scenarios at planning stage (at the time of KPI commitment – specifically for platform)
• Issues discovered late – expensive to fix (=tiger teams) or over-engineering
• No early capacity/performance estimates to customers
• No sensitivity analysis – what is the smallest/greatest contributor to resources? Under what conditions?
• Validation involves BST type of tests; expensive; small number of scenarios (S/M/L)
• No results portability: validation results are difficult/impossible to map/generalize to specific customer requirements
35
Performance/Capacity – Activities
Performance “Competency”                      With Cost Based Model           “Traditional”
Validation                                    Validate Model                  Validate Requirements; BST (S/M/L) ???
Tracking                                      Transaction Costs               ??? KPI ???
Non-Degradation                               w.r.t. Transaction              ??? KPI ???
Characterization / Forecasting / Sensitivity  w.r.t. Transaction              ??? Worst Case ???
Optimization                                  Proactive, focus on specific    Tiger Team
                                              Transaction/Behavior
Perf Info Communication / Portability         Model Based; Transaction Based  ??? KPI ???
36
Performance/Capacity – Key approaches
• All activities are focused on “transaction” metrics (these are “atomic” metrics and are much easier to deal with than “composite” metrics such as KPI, BST, etc.)
• All activities are flexible and proactive
• Start performance activities as early as possible and increase accuracy throughout the design cycle
37
Performance/Capacity – Model driven
• Identify key transactions throughout the dev cycle
• Quantify behaviour in terms of transactions
• Automate test/measurements per transaction (not all, but most important)
• Automate monitoring/measurement/tracking of transaction costs – as part of the sanity process (weekly? daily? – automated)
• Tight cooperation between testers/designers
• Model is developed in small steps and contains latest measurements and guesses
• Product verification – focus on model verification/calibration
– Runs the “official” test suite (automated) per transaction
– Runs combined “BST” (multiple transactions) – to calibrate the model
38
Automated Transaction Cost Tracking
• Approximately 40 performance/capacity CRs were raised prior to the system verification stage
• Identification of bottlenecks (and feed-back to design)
• Continuous capacity monitoring – load-to-load view
• Other metrics collected regularly
[Chart: CPU (%) per load (cj…ef) – TotCPU(vmstat), JavaCPU, OracleCPU, TotCPU(prstat), SysCPU(vmstat), PagingCPU, OtherCPU, MPSTDEV, SY/msec, CS/msec, Delay]
[Chart: delays in ms per load (1400co…1400gu) – PropD(ms), QueueD(ms), PubD(ms), ProcD(ms)]
39
Cost Based Approach – Responsibilities and Roles
[Diagram: roles mapped onto the cost model]
– Design focus – decompose into transactions
– Design Monitoring focus – track transaction costs (Costs)
– Business focus – quantify behavior (Behavior)
– Forecasting focus – estimate requirements, sensitivity analysis, what-if… (Resource Requirements)
– Validation focus – verify capacity as estimated
40
Benefits of using the model-driven performance engineering
41
Benefits – technical and others
• Communication across groups – everyone speaks the same language (well defined transactions/costs).
• “De-politicization” of performance eng – can’t argue/negotiate – the numbers and trade-offs are clear.
• Better requirements – quantifiable, PLM/Customer can see value in quantifying behaviour
• Documentation reduction – engineering guides are replaced by the model; the perf related documentation can focus on improvements, etc.
• Early problem detection - most performance problems are discovered before the official verification cycle
• Easy resource leak detection – easily traceable to code changes
• Reproducible/automated tests – same tests scripts used by design/PV
• Cost Reduction – less need for BST type of tests, less effort to run PV, reduced “over-engineering”
42
Things not discussed here…
43
Other issues to consider
• Tools
– Automation (!!!!)
– perf tracing/collection tools, transaction stat tools, transaction load, visualization, data archiving
– native, simple, ascii + excel
• Organization (info flow/responsibilities)
– good question; it would depend on the size and maturity of the project
– Best if driven by design rather than qa/verification
– Start slowly
• Performance Requirements definition
– trade-offs, customer traceable, never “locked”
• Performance documentation
– Is an ENG Guide necessary?
• Using LOADS instead of transactions
– possible if measurable directly
• Linear Regression instead of single-TX testing
– possibly for stable systems
44
Questions?
45
Appendix: useful links
http://technet.microsoft.com/en-us/commerceserver/bb608757.aspx
– Microsoft’s Transaction Cost Analysis
www.contentional.nl – mBrace – Transaction Aware Performance Modelling
www.spe-ed.com – Software Performance Engineering
www.perfdynamics.com – Performance Dynamics
www.cmg.org – CMG: Computer Measurement Group
46
Appendix: Good Test
47
Transaction cost testing
• How to measure workflow cost?
– For each workflow, run at least 4 test cases, each corresponding to a different rate of workflow execution.
• For example, for RAD1 run 4 test cases for 1, 3, 6 and 10 radius requests per second. The actual rate should result in CPU utilization between 20% and 75% for the duration of the test. If the resulting CPU is outside of these boundaries – modify the rate and rerun the test (the reason is that we want the results to represent sustainable scenarios, short term burst analysis is a separate issue).
– For each test collect and report CPU, memory and latency (as well as failure rate) before, during and after the test (about 5 min before, 5 min for test, 5 min after).
– Preserve all raw data (top/prstat, etc. outputs) for all tests – these may be required for further analysis.
[Chart: CPU% over time – CPU_bcg, CPU_tst, CPU_pp across T_R_start, T_E_start, T_E_end, T_PP_end, T_R_end; e.g. 10 RAD1 per second]
• Automate the test-case so that it is possible to run it after each sanity to track changes
• Data to report/publish
– Marginal CPU/resource per workflow rate
– I can help with details
48
Metrics to be recorded/collected during a test
[Chart: CPU% over time – CPU_bcg, CPU_tst, CPU_pp across T_R_start, T_E_start, T_E_end, T_PP_end, T_R_end]
Key metrics to collect during a test
T_R_start Time data recording started
T_E_start Time Event injection started. Assuming events are injected at a constant rate for the entire duration of the test
T_E_end Time Event injection ended
EPS Rate of event injection during the test (between T_E_start and T_E_end). Rate is constant during the test
T_PP_end Time Post-Processing ended
T_R_end Time Recording is ended
CPU_tst CPU% utilization during test
CPU_pp CPU% utilization during post-processing
CPU_bcg Background CPU% utilization
Enough samples must be collected to be able to produce a chart as below for all resources: CPU (total and by process) Memory (total and by process); Heap (for specific JVMs), IO, disk. The chart does not need to be included in the report but it must be available for analysis.
The application should also monitor/record its latency and failure rate – this is application specific, but it should be collected/recorded in such a way that it can be correlated with the resource chart. Avg latency and avg failure rate during the test are NOT sufficient – they do not show the trends.
Derived Metrics – to be included in performance report
mCPU_tst CPU_tst – CPU_bcg (marginal test cost)
mCPU_pp CPU_pp – CPU_bcg (marginal post-processing cost. If post-processing is not 0 then the EPS rate is not sustainable over long time)
mT_tst T_E_end – T_E_start (duration of the test/injection)
mT_pp T_PP_end – T_E_end (duration of post-processing –ideally this should be 0)
Ideally the resource utilization during the test is “flat” and returns to pre-test levels after the test is completed. To verify this compare the measurements before/after tests (points 1 and 5 on the chart) and at the beginning and at the end of the test (points 2 and 3 on the chart)
dCPU_bcg CPU_5 – CPU_1 (if >0 then resource is not fully released after test)
dCPU_tst CPU_3 – CPU_2 (if >0 then there may be a resource “leak” during the test)
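The derived metrics above reduce to a few subtractions over the measured plateau levels. A sketch (the sample plateau values are invented):

```python
# Derived metrics from the table above: marginal test and post-processing
# costs and durations. A non-zero post-processing phase (mT_pp > 0) means
# the tested event rate is not sustainable.

def derived(cpu_bcg, cpu_tst, cpu_pp, t_e_start, t_e_end, t_pp_end):
    m = {
        "mCPU_tst": cpu_tst - cpu_bcg,    # marginal test cost
        "mCPU_pp":  cpu_pp - cpu_bcg,     # marginal post-processing cost
        "mT_tst":   t_e_end - t_e_start,  # duration of the test/injection
        "mT_pp":    t_pp_end - t_e_end,   # duration of post-processing
    }
    m["sustainable"] = m["mT_pp"] == 0    # ideally post-processing is 0
    return m

# Invented sample: 5% background CPU, 48% during the test, injection
# from t=300 s to t=2100 s with no post-processing afterwards.
m = derived(cpu_bcg=5.0, cpu_tst=48.0, cpu_pp=20.0,
            t_e_start=300, t_e_end=2100, t_pp_end=2100)
print(m["mCPU_tst"], m["sustainable"])  # 43.0 True
```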
In this chart CPU is used as an example, but the same methodology applies to all resources – memory, heap, disk io, CPU, etc.
TOOLS
Any tool can be used to collect the metrics – as long as it can collect multiple periodic samples. As a rule of thumb collect about 100 samples for the pre-test idle, 200 samples per test and another 100 after the test. If you collect a sample once per 10 sec the overall test duration will be a bit more than 1 hour. The following are examples:
prstat –n700 for individual process CPU and memory (-n700 to collect up to 700 processes regardless of their cpu% - to make sure you get a complete memory picture)
TOP / ps These can be used instead of prstat
vmstat For global memory/cpu/kernel CPU
iostat If you suspect io issues
jstat -gc For individual JVM heap/GC metrics; look at OC and OU parameters.
49
Perfect Case - SUSTAINABLE
No post-processing: mT_pp = 0
Resource utilization is flat during the test: dCPU_tst = dMEM_tst = 0
All resources recover completely: dCPU_bcg = dMEM_bcg = 0
CPU% per 1 EPS: mCPU_tst / EPS
Memory specific: Process/system memory and heap may grow – if, logically, the events create objects that should stay in memory (e.g. you are doing discovery and adding new data structures).
Memory/heap may also grow initially after T_E_start but should stabilize before T_E_end – this represents the build-up of working sets. In this case memory may not be released fully upon completion of the test; run the test again – if the memory keeps increasing this may indicate a leak.
The overall CPU used in this case is the area under the utilization curve – the blue square
Make sure that the latency/success rate during the test is acceptable. It is possible that the resource profile looks perfect but the events are rejected due to the lack of resources.
[Chart: CPU% over time – CPU_bcg, CPU_tst, CPU_pp; e.g. 10 RAD1 per second]
50
Post-Processing – NOT SUSTAINABLE / BURST test
Post processing detected
mT_pp > 0 means that the system was not able to process events at the rate they arrived; this can be due to CPU utilization, threading, or other resource contention. In this case you may see that:
– the latency is continuously increasing between T_E_Start and T_E_end
– Memory (or old heap partition of a JVM) is continuously increasing between T_E_start and T_E_end and then starts decreasing during post-processing. This is because the events that cannot be processed in time must be stored somewhere. (see green line on the chart).
– The failure rate may increase towards the end of the test
Load unsustainable
This load is unsustainable over a long period of time – it may take hours or days, but the system/process will either run out of memory or be forced to drop outstanding events
May be acceptable for short burst/peaks
Although this rate is unsustainable for long time it may be acceptable for short bursts / peaks. The duration of post-processing and the rate of growth of the bounding factor (memory/heap/threads) will help determine the max duration of the burst.
CPU% per 1 EPS: The overall CPU used in this case is the area under the utilization curve – the blue square plus the pink square. It is possible to predict how much CPU would be used by 1 EPS if the bottleneck is removed (e.g. if threading is a bottleneck and we add more threads):
(mCPU_tst + mCPU_pp*mT_pp/mT_tst) / EPS
In case of post-processing it is important to determine what the boundary condition is:
If CPU utilization during the test is 90% or more then it is likely that we are bounded by CPU
If memory/heap of component A grows and component A has to pass events to component B, then B may be a bottleneck
If component B uses 1 full CPU (25% on a 4-CPU system) then it is likely single-threaded and threading is the issue
If component B does disk io, or other type of access that requires waiting then this can be the bottleneck
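The bottleneck-removed estimate above can be written as a small helper (the sample numbers are invented):

```python
# The slide's formula as code: fold the CPU burned during post-processing
# back into the test window before dividing by the event rate. This
# estimates CPU per 1 EPS as if the non-CPU bottleneck were removed.

def cpu_per_eps(mcpu_tst, mcpu_pp, mt_tst, mt_pp, eps):
    """(mCPU_tst + mCPU_pp * mT_pp / mT_tst) / EPS."""
    return (mcpu_tst + mcpu_pp * mt_pp / mt_tst) / eps

# Invented sample: 30% marginal CPU for an 1800 s test, plus 15% marginal
# CPU for 600 s of post-processing, at 10 events/sec:
print(cpu_per_eps(30.0, 15.0, 1800, 600, 10))  # 3.5 CPU% per event/sec
```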
[Chart: CPU% over time with post-processing – CPU_bcg, CPU_tst, CPU_pp across T_R_start, T_E_start, T_E_end, T_PP_end, T_R_end; memory/heap and latency shown alongside CPU%]