Slides Cost Based Performance Modelling


Transcript of Slides Cost Based Performance Modelling

Page 1: Slides Cost Based Performance Modelling

Cost Based Performance Modeling:

Addressing Uncertainties

Eugene Margulis

Telus Health Solutions

[email protected]

October, 2009

Page 2: Slides Cost Based Performance Modelling

2

Outline

• Background

• What issues/questions are addressed by performance “stuff” ?

• Performance/capacity cost model

• Examples and demo

• Using the cost based model

• Cost Based Model and the development cycle

• Benefits

Page 3: Slides Cost Based Performance Modelling

3

Areas addressed by Performance “Stuff”

•Key performance “competencies”:

–Validation

–Tracking

–Non-degradation

–Characterization

–Forecasting

–Business Mapping

–Optimization

• All these activities have a common goal...

Page 4: Slides Cost Based Performance Modelling

4

Performance activities Goal

• Ability to articulate/quantify resource requirements for a given system behaviour on a given h/w, subject to a number of constraints.

• Many ways of “getting” there – direct testing, load measurements, modelling, etc.

Page 5: Slides Cost Based Performance Modelling

5

Reference System

[Diagram: reference system – components A, B, C, GUI clients and a disk]

Outputs: GUI, devices, data, reports

Inputs/Devices: events/data (periodic, streaming)

• General purpose (Unix, VxWorks, Windows, etc)
• Multiprocessor / virtual instances (cloud)
• Non-HRT but may have a Hard Real Time component
• Has 3rd party code – binary only, no access
• Heterogeneous s/w – scripts, C, multiple JVMs

Page 6: Slides Cost Based Performance Modelling

6

Real world “challenges” - uncertainties:

• Requirements / Behavior uncertainty:

– Performance Requirements are not well defined

– Load levels (S/M/L) or “KPI”s are speculated, not measured (cannot be measured)

• Code uncertainty:

– No Access to large portions of code / can’t rebuild/recompile

• H/W uncertainty:

– Underlying H/W architecture is not fixed (and can be very different)

• This is not strictly a Testing/Verification activity but rather an exploratory exercise where we need to discover/understand rather than verify

Page 7: Slides Cost Based Performance Modelling

7

Additional Complications:

• Performance limits are multi-dimensional (CPU, Threading, IO, disk space)

• Designers in India, Testers in Vietnam, Architects in Canada, Customers in Spain (How to exchange information??)

• Need ability to articulate/communicate performance results efficiently

Page 8: Slides Cost Based Performance Modelling

8

Examples of questions addressed by performance “stuff”

• Will timing requirements be met? All the time? Under what conditions? Can we guarantee it? What is the relationship between latency and max rate?

• Will we have enough disk space (for how long)?

• What if we run the system on HW with 32 slow processors? (instead of 4 fast ones?) What would be max supported rate of events then?

• What if the amount of memory is reduced? What would be max supported rate of events then?

• What if some GUIs are in China? (increase RTT)

• Do we have enough spare capacity for an additional application X?

• Is our system performing better/worse compared to the last release (degradation)?

• What customer visible activity (not a process name/id, not an IP port, not a 3rd party DB) uses the most resources? (e.g. CPU? Memory? Heap? BW? Disk?)

• What if we have twice as many type A devices? What is the max size of network we can support? How does performance map to the Business Model?

Page 9: Slides Cost Based Performance Modelling

9

How can these be answered?

• Yes, we can test it in the lab (at least some)…

… but can we have the answers by tomorrow??

Page 10: Slides Cost Based Performance Modelling

10

What we need...

• Lab testing alone does not address this (efficiently)

• Addressed by a combination of methods/approaches

• But need a common “framework” to drive this

Page 11: Slides Cost Based Performance Modelling

11

What we really need...

• A flexible mapping between customer behaviour and the performance/capacity metrics of the system (recall the performance goals)

• But there is a problem… There is a HUGE number of different behaviours – even in the simplest of systems…

Page 12: Slides Cost Based Performance Modelling

12

Can we simplify the problem?

• Can we reduce the problem space and still have something useful/practical?

–Very few performance aspects are pass/fail (outside of HRT/military/etc.)

• Willing to trade-off accuracy for speed

– No need to be more accurate than the inputs

Page 13: Slides Cost Based Performance Modelling

13

Transaction – an “atomic performance unit”

• System processes TRANSACTIONS

– 80/20 Rule – 20% of TRANSACTIONS are responsible for 80% of “performance” during steady-state operation

–Focus on steady state (payload) - but other operation states can be defined

Page 14: Slides Cost Based Performance Modelling

14

What is a TRANSACTION from performance perspective?

• What does the system do most of the time (payload)?

– Processes events of type X from device B (….transaction T1)

– Produces reports of type Y (… transaction T2)

– Updates GUI (… transaction T3)

– Processes logins from GUI (… transaction T4)

• How often does it do it?

– Processes events of type X from device B – on avg, 3 per sec.

– Produces reports of type Y – once per hour

– Updates GUI – once every 30 sec

– Processes logins from GUI – on demand, on avg 1 per 10 min.

• How much do we “pay” for it?
– CPU?

– Memory?

– Disk?

Page 15: Slides Cost Based Performance Modelling

15

Cost Based Model

Page 16: Slides Cost Based Performance Modelling

16

Performance/Capacity – 3+ way view

• Costs – the price in terms of resources “paid” per transaction

– E.g. 2% of CPU for every fault/sec
– E.g. 8% of CPU for every RAD Authentication/sec

• Resource Utilization – the price in terms of resources for the given behaviour:

– E.g. (2% of CPU per fault/sec * 10 faults/sec) + (8% of CPU per authentication/sec * 1 authentication/sec) = 28%

• Costs can be used directly to estimate latency impact (lower bound)

– E.g.: 2 AA/sec -> 16% CPU impact
– A 3 sec burst of 10 AA/sec with only 10% CPU available -> 24 sec latency (at least!) – see the sketch below

[Diagram: Behaviour and Costs feed the COST MODEL, which, together with HW, latency and other constraints, produces the Resource Requirements]

• Behaviour – transactions and frequencies

– E.g. faults: 10 faults/sec; authentication: 1 authentication/sec
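A minimal sketch of the arithmetic above (the 2%/8% costs and the rates are the numbers on this slide; the function names and data layout are my own assumptions):

    # Costs: % of CPU consumed per unit transaction rate (per 1 tx/sec)
    costs = {"fault": 2.0, "authentication": 8.0}

    def utilization(behaviour):
        """CPU% for a behaviour given as {transaction: rate in tx/sec} (linear model)."""
        return sum(costs[tx] * rate for tx, rate in behaviour.items())

    def burst_latency_lower_bound(tx, rate, duration_s, cpu_headroom_pct):
        """Lower bound on the time needed to absorb a burst with only cpu_headroom_pct free."""
        work = costs[tx] * rate * duration_s     # %CPU-seconds of work in the burst
        return work / cpu_headroom_pct           # seconds, at best

    print(utilization({"fault": 10, "authentication": 1}))          # -> 28.0 (%)
    print(burst_latency_lower_bound("authentication", 10, 3, 10))   # -> 24.0 (s)

The 24 s figure is the lower bound quoted above: a 3 s burst of 10 AA/sec is 240 %CPU-seconds of work, which takes at least 24 s to drain with 10% spare CPU.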

Page 17: Slides Cost Based Performance Modelling

17

Steps to build the Cost Model

• Behaviour

– Decompose system into mutually-orthogonal performance transactions

– Identify expected frequencies (ranges of frequencies) per transaction

• Costs

– Measure the incremental costs per transaction on a given h/w – one TX at a time

– Identify boundary conditions (Cpu? Threading? Memory? Heap?)

• Constraints

– Identify latency requirements and other constraints

• Build spreadsheet model

– COSTS x BEHAVIOR -> REQUIREMENTS (assume linearity at first; see the sketch below)

– Calibrate based on combined tests
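A minimal sketch of that COSTS x BEHAVIOR -> REQUIREMENTS step as a matrix-vector product (the transaction names, resources and numbers are illustrative, not taken from the slides; a spreadsheet does the same thing cell by cell):

    import numpy as np

    transactions = ["event", "report", "gui_update"]
    resources = ["cpu_mhz", "disk_mb_per_day", "bw_kbps"]

    # cost[i][j] = amount of resource i consumed per unit rate of transaction j
    cost = np.array([
        [300.0,  50.0, 20.0],   # MHz per tx/sec
        [  1.0, 200.0,  0.0],   # MB/day per tx/sec
        [  4.0,   0.5,  8.0],   # kbps per tx/sec
    ])

    behaviour = np.array([10.0, 1/3600, 1/30])   # 10 events/sec, 1 report/hour, 1 GUI update/30 sec

    requirements = cost @ behaviour              # linear model; calibrate with combined tests
    for name, value in zip(resources, requirements):
        print(f"{name}: {value:.2f}")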

Page 18: Slides Cost Based Performance Modelling

18

Identifying Transactions

• Identify the main end-to-end “workflows” through the system and their frequencies

• However, since workflows contain common portions, they are not “orthogonal” from a performance perspective (resources/rates may not be additive)

• Identify common portions of the workflows

• The common portions are “transactions”

• A workflow is represented by a sequence of one or more transactions

Page 19: Slides Cost Based Performance Modelling

19

Costs example

[Charts: vmstat CPU trace (usr/sys/total) over the test period, and measured CPU% / latency vs request rate with a linear fit y = 0.048x + 0.006, R² = 0.9876]

Resources (CPU%/Latency) Measured for 2/4/6/8/10/12 requests/sec

LATENCY = exponential after 10 RPS => MAX RATE = 10 RPS

• Process is NOT CPU bound (there is lots of spare CPU% @ 10 RPS)

• (In this case it is limited by the size of a JVM’s heap)

Incremental CPU utilization = 4.8% of CPU per request

• Measured on Sun N440 (4 CPUs, 1.6 GHz each) – 6400 MHz total capacity

• COST = 4.8% * 6400 MHz = 307.2 MHz per request
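A sketch of how the incremental cost can be extracted from the single-transaction runs above (the CPU fractions below are illustrative; the fit on this slide came out at ~0.048 per request/sec):

    import numpy as np

    rates = np.array([2, 4, 6, 8, 10, 12])                      # requests/sec per test run
    cpu_frac = np.array([0.10, 0.20, 0.29, 0.39, 0.49, 0.58])   # measured CPU fraction per run

    slope, intercept = np.polyfit(rates, cpu_frac, 1)   # linear fit: cpu = slope*rate + intercept
    platform_mhz = 4 * 1600                             # 4 CPUs x 1.6 GHz = 6400 MHz

    cost_mhz = slope * platform_mhz
    print(f"incremental cost ~ {slope:.1%} CPU, i.e. {cost_mhz:.0f} MHz per request/sec")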

Page 20: Slides Cost Based Performance Modelling

20

Transaction Cost Matrix

ALMINS Cost:

SA        MHz    MaxR
0         12.0   125.0
10000     15.5    96.6
30000     16.5    90.8
60000     18.0    83.3
100000    20.0    75.0
200000    24.9    60.1

• Transaction Costs

– Include resource cost (can be multiple resources)

– Can depend on additional parameters (e.g. “DB Insert" depends on the number of DB records)

– Can include MaxRate (if limited by a constraint other than the resource, e.g. CPU).

• Example of a transaction cost matrix (SA is a parameter this particular transaction depends on – DB size)
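A sketch of looking up such a parameter-dependent cost by interpolating between the measured SA points (the arrays mirror the matrix above; the helper name is mine):

    import numpy as np

    sa_points   = np.array([0, 10000, 30000, 60000, 100000, 200000])   # DB size (SA)
    mhz_points  = np.array([12.0, 15.5, 16.5, 18.0, 20.0, 24.9])       # cost in MHz
    maxr_points = np.array([125.0, 96.6, 90.8, 83.3, 75.0, 60.1])      # max sustainable rate

    def almins_cost(sa):
        """Cost (MHz) and MaxRate of the ALMINS transaction for a given DB size."""
        return np.interp(sa, sa_points, mhz_points), np.interp(sa, sa_points, maxr_points)

    print(almins_cost(45000))   # interpolated between the 30000 and 60000 measurements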

Page 21: Slides Cost Based Performance Modelling

21

Constraints / Resources

• CPU
– Overall CPU utilization is additive per transaction (most of the time)
– If not – then transactions are not orthogonal – break down further or use the worst case

• MEMORY / Java HEAPs
– If there is no virtual memory (e.g. VxWorks) then additive; treat like CPU
– If there is virtual memory – then much trickier; there is no concept of X% utilization, need to do direct testing
– Heap sizes for each JVM – can be additive within each JVM

• DISK
– Additive; must take purging policies and retention periods into account

• IO
– Additive; read/write rates are additive, but total capacity depends on %waiting / svt and on the manufacturer, IO pattern, etc. Safe limits can be tested separately

• BW
– Additive
– “Effective” BW depends on RTT

• Threading
– Identify the threading model for each TX – if the TX is single-threaded then scale w.r.t. the clock rate of a single HW thread; if multi-threaded then scale w.r.t. the entire system (sketched below), e.g.:
• Suppose a transaction X “costs” 1000 MHz and is executed on a 32-CPU system with 500 MHz per CPU
• If it is single-threaded – it will take NO LESS than 1000/500 = 2 seconds
• If it is multi-threaded – it will take NO LESS than 1000/(32*500) ≈ 0.06 seconds

• Latency
– For “long” transactions – measure the base latency, then scale using threading. Use RTT to compute the impact if relevant
– Measure the MAX rate on different architectures – to calibrate
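A tiny sketch of the threading scaling above (the function name is mine; the numbers match the example):

    def latency_lower_bound(cost_mhz, cpus, mhz_per_cpu, single_threaded):
        """Best-case execution time of one transaction on the given hardware."""
        if single_threaded:
            return cost_mhz / mhz_per_cpu          # limited to one HW thread
        return cost_mhz / (cpus * mhz_per_cpu)     # perfect parallelism, still only a lower bound

    print(latency_lower_bound(1000, 32, 500, single_threaded=True))    # 2.0 s
    print(latency_lower_bound(1000, 32, 500, single_threaded=False))   # 0.0625 s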

Page 22: Slides Cost Based Performance Modelling

22

Do we need to address everything???

• There are lots of constraints…

• There may be additional constraints based on 3rd party processing
– Addressing ALL of them in a single model may be impractical

• However – not all of them need to be addressed in every case for a useful model. For example:
– VxWorks, 1 CPU, 512 MB of memory, no virtual memory, pre-emptive scheduling – focus on MEM
– Solaris, 8 CPUs, 32 h/w strands, 32 GB memory – focus on CPU/Threading

• Only model what is relevant for the system

Page 23: Slides Cost Based Performance Modelling

23

Model / Example

Behaviour – workflow rates (/sec):
AU 5, AUPE 7, RAD 0, PAMFTP 0, PAMTEL 0, PAMFTPC 0, PAMTELC 0, MGMUSR 0

Costs (s-MHz per workflow):
AU 111, AUPE 222, GET 333, RAD 777, PAMFTP 555, PAMTEL 444

COST MODEL output – Resource Requirements:

Constraint        Result
Audit Total CPU%  64.2%
Security/AM       Total Security rate greater than AM Max
Security/PAM      OK
Sustainability    At least one rate is not sustainable
Alarm Rate        Composite alarm rate (INS+UPD) not sustainable
NOS Trigger       OK
CWD Clients       OK
Overall CPU       Unlikely Sustainable

Constraint        CPU   Es Disk  Nes Disk  BW
Max Utilization   75%   80%      90%       80%
Max NE Supported  623   800      4482      21836

Constraint        AEPS  RRPS
Max Utilization   80    5
Max NE Supported  3154  4485

Projected Max NEs: 623

Page 24: Slides Cost Based Performance Modelling

24

Model Hierarchy

• Transaction Model

– Cost and constraints per individual transaction w.r.t. a number of options/parameters

– E.g. 300Mhz to process an event

• System Model

– Composite Cost of executing a specific transaction load on a given h/w

– E.g. 35% cpu for 10 events/sec and 4 user queries/sec on N440

• Business Model

– Mapping of System model to Business metrics

– E.g. N440 can support up to 100 NE

Page 25: Slides Cost Based Performance Modelling

25

Using model for scalability and bottleneck analysis

• Mapping between any behavior and capacity requirements

• Mapping the model to different processor architectures

• Can Quantify the impact of a Business request

• Can iterate over multiple “behaviors”

– Extends “What-if” analysis

– Enables operating envelope visualization

– Enables resource bottleneck identification

Page 26: Slides Cost Based Performance Modelling

26

Using the Cost Based Model / Demo

Page 27: Slides Cost Based Performance Modelling

27

Identifying resource allocation – by TRANSACTIONS / Applications

[Chart: disk allocation by application (GB) – NE_LOG 57, spare 37, PM 11, Alarm_DB 10, NE_Loads 7, NE B&R 7, Alarms_REDO 5, CACP 3, Security_Logs 1]

[Chart: CPU distribution by feature (500A x 15K cph) – Base CP, Queuing, Broadcasts, Collect Digits, Give IVR, Give RAN, MlinkScrPop, Hdx, Intrinsics, CalByCall DB, Blue DB, RT Display, RT Data API, Reports]

[Chart: RAM – top 10 users: AppLd, IPComms, Base, IMF, Logs, OSI STACK, PP, GES, HIDiags, OTHER, FREE]

Page 28: Slides Cost Based Performance Modelling

28

Compute operating envelope

Iterate over multiple behaviours – to compute operating envelope
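A sketch of that iteration (the transaction names, per-unit costs and the 75% CPU ceiling are illustrative assumptions, not taken from the charts below):

    cpu_cost = {"ne1_events": 0.06, "ne2_events": 0.09}   # %CPU per device of each type
    CPU_CEILING = 75.0                                     # max sustainable CPU%

    def max_ne2(ne1_count):
        """Largest NE2 population that still fits under the CPU ceiling for a given NE1 count."""
        spare = CPU_CEILING - cpu_cost["ne1_events"] * ne1_count
        return max(0, int(spare / cpu_cost["ne2_events"]))

    envelope = [(ne1, max_ne2(ne1)) for ne1 in range(0, 1201, 200)]
    print(envelope)   # each (NE1, NE2) pair is a point on the boundary of the envelope

In a full model the same loop is run against every constraint (CPU, disk, BW, AEPS, ...) and the binding one is reported per point, which is what the second chart below shows.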

[Chart: operating envelope surface – sustainable EVENT RATE as a function of NRECORDS and NUSERS]

[Chart: operating envelope – #NE1 vs #NE2, with the overall MAX and the individual limits MAX_cpu, MAX_ed, MAX_ned, MAX_bw, MAX_aeps]

Page 29: Slides Cost Based Performance Modelling

29

Nice charts – but how accurate are they?

Models are from God…. Data is from the Devil (http://www.perfdynamics.com/)

• Initially WAY more accurate than the behavior data

• Within 10% of combined metrics – for an “established” model

• Less accurate as you extrapolate further from measurements

• The model includes guesses as well as measurements

• The value is to establish patterns rather than absolute numbers.

Page 30: Slides Cost Based Performance Modelling

30

Projects where this was applied

• Call Centre Server (WinNT platform, C++)

• Optical Switch (VxWorks, C, Java)

• Network Management System (Solaris, Mixed, 3rd party, Java)

• Management Platform Application (Solaris, Mixed, 3rd party, Java)

• …

Page 31: Slides Cost Based Performance Modelling

31

Addressing Uncertainties - recap

Uncertainty – Cost Based Model vs “Traditional”:

• Behavior – Cost Based Model: Forecast ANY behavior EARLY, compute the operating envelope; “Traditional”: Worst-case over-engineering, Tiger Team – LATE

• Code – Cost Based Model: Treat as a “black box”, no access needed, costs w.r.t. behavior not code; “Traditional”: ??? KPI ??? BST ???

• H/W – Cost Based Model: Forecast h/w impact EARLY, small number of “pilot” tests, compute the operating envelope; “Traditional”: Worst-case over-engineering, Tiger Team – LATE

Page 32: Slides Cost Based Performance Modelling

32

Cost Reduction

• Significantly reduces the number of tests needed to compute operating envelope.

– Suppose the system has 5 transactions defined, need to compute operating envelope with 10 “steps” for each transaction (e.g. 1 per sec, 2 per sec, ... 10 per sec).

– Using BST type “brute force” testing we would need to run 10 * 10 * 10 * 10 * 10 tests (one for each rate combination), in total 100,000 tests

– Using the model approach we would need to run 10+10+10+10+10 tests, in total 50 tests (there will be additional work for calibration, model building, etc. but the total cost will be much smaller than running 100K big-system tests)

– Each individual test is much simpler than a BST and can be automated

– H/w cost reduction – less reliance on BST h/w, using pilot tests can map from one h/w platform to another

Page 33: Slides Cost Based Performance Modelling

33

How does the Cost Model fit in the dev cycle?

Page 34: Slides Cost Based Performance Modelling

34

Performance/Capacity – Typical Focus at the Wrong Places

[Diagram: development cycle – Planning, Development, Product Verification – with KPI Definition (PLM) at planning and KPI Validation (PT/SV) at verification]

• Uncertainty of expected customer scenarios at planning stage (at the time of KPI commitment – specifically for platform)

• Issues discovered late – expensive to fix (=tiger teams) or over-engineering

• No early capacity/performance estimates to customers

• No sensitivity analysis – what is the smallest/greatest contributor to resources? Under what conditions?

• Validation involves BST type of tests; expensive; small number of scenarios (S/M/L)

• No results portability: validation results are difficult/impossible to map/generalize to specific customer requirements


Page 35: Slides Cost Based Performance Modelling

35

Performance/Capacity – Activities

Performance “Competency” – With Cost Based Model vs “Traditional”:

• Validation – Cost Based Model: Validate Model; “Traditional”: Validate Requirements, BST (S/M/L) ???

• Tracking – Cost Based Model: Transaction Costs; “Traditional”: ??? KPI ???

• Non-Degradation – Cost Based Model: w.r.t. Transaction; “Traditional”: ??? KPI ???

• Characterization / Forecasting / Sensitivity – Cost Based Model: w.r.t. Transaction; “Traditional”: ??? Worst Case ???

• Optimization – Cost Based Model: Proactive, focus on specific Transaction/Behavior; “Traditional”: Tiger Team

• Perf Info Communication / Portability – Cost Based Model: Model Based, Transaction Based; “Traditional”: ??? KPI ???

Page 36: Slides Cost Based Performance Modelling

36

Performance/Capacity – Key approaches

• All activities are focused on “transaction” metrics (these are “atomic” metrics and are much easier to deal with than “composite” metrics such as KPI, BST, etc.)

• All activities are flexible and proactive

• Start performance activities as early as possible and increase accuracy throughout the design cycle

Page 37: Slides Cost Based Performance Modelling

37

Performance/Capacity – Model driven

• Identify key transactions throughout the dev cycle

• Quantify behaviour in terms of transactions

• Automate test/measurements per transaction (not all, but most important)

• Automate monitor/measurement/tracking of transaction costs – as part of sanity process (weekly? Daily? – automated)

• Tight cooperation between testers/designers

• Model is developed in small steps and contains latest measurements and guesses

• Product verification – focus on model verification/calibration
– Runs the “official” test suite (automated) per transaction
– Runs combined “BST” (multiple transactions) – to calibrate the model

Page 38: Slides Cost Based Performance Modelling

38

Automated Transaction Cost Tracking

• Approximately 40 performance/capacity CRs raised prior to the system verification stage

• Identification of bottlenecks (and feed-back to design)

• Continuous capacity monitoring – load-to-load view

• Other metrics collected regularly

[Chart: CPU (%) tracked load-to-load – TotCPU(vmstat), JavaCPU, OracleCPU, TotCPU(prstat), SysCPU(vmstat), PagingCPU, OtherCPU, MPSTDEV, SY/msec, CS/msec]

[Chart: delay (ms) tracked load-to-load – PropD, QueueD, PubD, ProcD]

Page 39: Slides Cost Based Performance Modelling

39

Cost Based Approach – Responsibilities and Roles

[Diagram: the cost model connecting Costs, Behavior and Resource Req, with the roles around it:]

• Validation focus – verify capacity as estimated
• Design monitoring focus – track transaction costs
• Forecasting focus – estimate requirements, sensitivity analysis, what-if...
• Business focus – quantify behavior
• Design focus – decompose into transactions

Page 40: Slides Cost Based Performance Modelling

40

Benefits of using the model-driven performance engineering

Page 41: Slides Cost Based Performance Modelling

41

Benefits – technical and others

• Communication across groups – everyone speaks the same language (well defined transactions/costs).

• “De-politicization” of performance eng – can’t argue/negotiate – the numbers and trade-offs are clear.

• Better requirements – quantifiable, PLM/Customer can see value in quantifying behaviour

• Documentation reduction – engineering guides are replaced by the model; the perf related documentation can focus on improvements, etc.

• Early problem detection - most performance problems are discovered before the official verification cycle

• Easy resource leak detection – easily traceable to code changes

• Reproducible/automated tests – the same test scripts are used by design/PV

• Cost Reduction – less need for BST type of tests, less effort to run PV, reduced “over-engineering”

Page 42: Slides Cost Based Performance Modelling

42

Things not discussed here…

Page 43: Slides Cost Based Performance Modelling

43

Other issues to consider

• Tools

– Automation (!!!!)

– perf tracing/collection tools, transaction stat tools, transaction load, visualization, data archiving

– native, simple, ascii + excel

• Organization (info flow/responsibilities)

– good question, would depend on size and maturity of the project

– Best if driven by design rather than QA/verification

– Start slowly

• Performance Requirements definition

– trade-offs, customer traceable, never “locked”

• Performance documentation

– Is ENG Guide necessary?

• Using LOADS instead of transactions

– possible if measurable directly

• Linear Regression instead of single TX testing

– possibly for stable systems


Page 44: Slides Cost Based Performance Modelling

44

Questions?

Page 45: Slides Cost Based Performance Modelling

45

Appendix: useful links

http://technet.microsoft.com/en-us/commerceserver/bb608757.aspx

– Microsoft’s Transaction Cost Analysis

www.contentional.nl – mBrace – Transaction Aware Performance Modelling

www.spe-ed.com – Software Performance Engineering

www.perfdynamics.com – Performance Dynamics

www.cmg.org – CMG: Computer Measurement Group

Page 46: Slides Cost Based Performance Modelling

46

Appendix: Good Test

Page 47: Slides Cost Based Performance Modelling

47

Transaction cost testing

• How to measure workflow cost?

– For each workflow, run at least 4 test cases, each corresponding to a different rate of workflow execution.

• For example, for RAD1 run 4 test cases at 1, 3, 6 and 10 radius requests per second. The actual rate should result in CPU utilization between 20% and 75% for the duration of the test. If the resulting CPU is outside of these boundaries – modify the rate and rerun the test (the reason is that we want the results to represent sustainable scenarios; short-term burst analysis is a separate issue).

– For each test collect and report CPU, memory and latency (as well as failure rate) before, during and after the test (about 5 min before, 5 min for test, 5 min after).

– Preserve all raw data (top/prstat, etc. outputs) for all tests – these may be required for further analysis.

[Chart: CPU% over time for a single-transaction test (e.g. 10 RAD1 per second) – background level CPU_bcg, test level CPU_tst and post-processing level CPU_pp, marked at T_R_start, T_E_start, T_E_end, T_PP_end and T_R_end]

• Automate the test-case so that it is possible to run it after each sanity to track changes

• Data to report/publish

– Marginal CPU/resource per workflow rate

– I can help with details

Page 48: Slides Cost Based Performance Modelling

48

Metrics to be recorded/collected during a test

[Chart: CPU% over time – CPU_bcg, CPU_tst and CPU_pp between T_R_start, T_E_start, T_E_end, T_PP_end and T_R_end]

Key metrics to collect during a test

T_R_start Time data recording started

T_E_start Time Event injection started. Assuming events are injected at a constant rate for the entire duration of the test

T_E_end Time Event injection ended

EPS Rate of event injection during the test (between T_E_start and T_E_end). Rate is constant during the test

T_PP_end Time Post-Processing ended

T_R_end Time Recording is ended

CPU_tst CPU% utilization during test

CPU_pp CPU% utilization during post-processing

CPU_bcg Background CPU% utilization

Enough samples must be collected to be able to produce a chart as above for all resources: CPU (total and by process); Memory (total and by process); Heap (for specific JVMs); IO; disk. The chart does not need to be included in the report but it must be available for analysis.

The application should also monitor/record its latency and failure rate – this is application specific, but it should be collected/recorded in such a way that it can be correlated with the resource chart. Average latency and average failure rate during the test are NOT sufficient – they do not show the trends.

Derived Metrics – to be included in performance report

mCPU_tst CPU_tst – CPU_bcg (marginal test cost)

mCPU_pp CPU_pp – CPU_bcg (marginal post-processing cost. If post-processing is not 0 then the EPS rate is not sustainable over long time)

mT_tst T_E_end – T_E_start (duration of the test/injection)

mT_pp T_PP_end – T_E_end (duration of post-processing –ideally this should be 0)

Ideally the resource utilization during the test is “flat” and returns to pre-test levels after the test is completed. To verify this compare the measurements before/after tests (points 1 and 5 on the chart) and at the beginning and at the end of the test (points 2 and 3 on the chart)

dCPU_bcg CPU_5 – CPU_1 (if >0 then resource is not fully released after test)

dCPU_tst CPU_3 – CPU_2 (if >0 then there may be a resource “leak” during the test)
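The derived metrics above can be computed mechanically from a sampled CPU% series; a rough sketch (the sample format and the 60 s windows used for points 2/3 are my assumptions):

    def derived_metrics(samples, t_r_start, t_e_start, t_e_end, t_pp_end, t_r_end):
        """samples: list of (timestamp_s, cpu_pct) covering the whole recording."""
        def avg(lo, hi):
            vals = [c for t, c in samples if lo <= t < hi]
            return sum(vals) / len(vals) if vals else 0.0

        cpu_bcg = avg(t_r_start, t_e_start)       # background before injection (point 1)
        cpu_tst = avg(t_e_start, t_e_end)         # during injection
        cpu_pp  = avg(t_e_end, t_pp_end)          # post-processing window

        return {
            "mCPU_tst": cpu_tst - cpu_bcg,                     # marginal test cost
            "mCPU_pp":  cpu_pp - cpu_bcg,                      # marginal post-processing cost
            "mT_tst":   t_e_end - t_e_start,                   # injection duration
            "mT_pp":    t_pp_end - t_e_end,                    # post-processing duration (ideally 0)
            "dCPU_bcg": avg(t_pp_end, t_r_end) - cpu_bcg,      # point 5 - point 1
            "dCPU_tst": avg(t_e_end - 60, t_e_end)
                        - avg(t_e_start, t_e_start + 60),      # point 3 - point 2
        }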


In this chart CPU is used as an example, but the same methodology applies to all resources – memory, heap, disk io, CPU, etc.

TOOLS

Any tool can be used to collect the metrics – as long as it can collect multiple periodic samples. As a rule of thumb collect about 100 samples for the pre-test idle, 200 samples per test and another 100 after the test. If you collect a sample once per 10 sec the overall test duration will be a bit more than 1 hour. The following are examples:

prstat –n700 for individual process CPU and memory (-n700 to collect up to 700 processes regardless of their cpu% - to make sure you get a complete memory picture)

top / ps These can be used instead of prstat

vmstat For global memory/cpu/kernel CPU

iostat If you suspect IO issues

jstat -gc For individual JVM heap/GC metrics; look at OC and OU parameters.
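A sketch of a periodic collector in the spirit of the tools above; it shells out to vmstat in its interval/count form and timestamps every sample so it can later be correlated with latency logs (the file name and cadence are arbitrary choices):

    import subprocess, time

    def collect_vmstat(interval_s=10, count=400, outfile="vmstat.log"):
        """Run `vmstat <interval> <count>` and prefix each sample line with a timestamp."""
        proc = subprocess.Popen(["vmstat", str(interval_s), str(count)],
                                stdout=subprocess.PIPE, text=True)
        with open(outfile, "w") as out:
            for line in proc.stdout:
                out.write(f"{time.time():.0f} {line}")

    # 400 samples x 10 s ~ a bit over an hour: pre-test idle, the test itself and the post-test window.

The same wrapper pattern applies to prstat, iostat or jstat.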


Page 49: Slides Cost Based Performance Modelling

49

Perfect Case - SUSTAINABLE

• No post-processing: mT_pp = 0

• Resource utilization is flat during the test: dCPU_tst = dMEM_tst = 0

• All resources recover completely: dCPU_bcg = dMEM_bcg = 0

• CPU% per 1 EPS: mCPU_tst / EPS

Memory specifics: Process/system memory and heap may grow – if, logically, the events create objects that should stay in memory (e.g. you are doing discovery and are adding new data structures).

Memory/heap may also grow initially after T_E_start but should stabilize before T_E_end – this represents the build-up of working sets. In this case memory may not be released fully upon completion of the test. In that case run the test again – if the memory keeps on increasing this may indicate a leak.

The overall CPU used in this case is the area under the utilization curve – the blue square

Make sure that latency/success rate during the test is acceptable. It is possible that the resource profile may look perfect but the events may be rejected due to the lack of resources.

[Chart: CPU% over time for the sustainable case (e.g. 10 RAD1 per second) – utilization rises from CPU_bcg to CPU_tst at T_E_start, stays flat, and drops back to CPU_bcg at T_E_end with no post-processing tail]

Page 50: Slides Cost Based Performance Modelling

50

Post-Processing – NOT SUSTAINABLE / BURST test

Post processing detected

mT_pp > 0 means that the system was not able to process events at the rate they arrive; this can be due to CPU utilization, or due to threading or other resource contention. In this case you may see that

– the latency is continuously increasing between T_E_Start and T_E_end

– Memory (or old heap partition of a JVM) is continuously increasing between T_E_start and T_E_end and then starts decreasing during post-processing. This is because the events that cannot be processed in time must be stored somewhere. (see green line on the chart).

– The failure rate may increase towards the end of the test

Load unsustainable

This load is unsustainable over a long period of time – it can be hours or days – but the system/process will either run out of memory or will be forced to drop outstanding events

May be acceptable for short burst/peaks

Although this rate is unsustainable for a long time it may be acceptable for short bursts / peaks. The duration of post-processing and the rate of growth of the bounding factor (memory/heap/threads) will help determine the max duration of the burst.

CPU% per 1 EPS: The overall CPU used in this case is the area under the utilization curve – the blue square and the pink square. It is possible to predict how much CPU would be used by 1 EPS if the bottleneck is removed (e.g. if threading is the bottleneck and we add more threads):

(mCPU_tst + mCPU_pp*mT_pp/mT_tst) / EPS
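For example (made-up numbers, only to show the arithmetic): with mCPU_tst = 40%, mCPU_pp = 30%, mT_pp = 60 s, mT_tst = 300 s and EPS = 10, the projected cost is (40 + 30*60/300)/10 = 4.6% CPU per event/sec once the non-CPU bottleneck is removed.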

In the case of post-processing it is important to determine what the boundary condition is:

– If CPU utilization during the test is 90% or more then it is likely that we are bounded by CPU

– If memory/heap of component A grows and component A has to pass events to component B, then B may be a bottleneck

– If component B uses 1 full CPU (25%) then it is likely single-threaded and threading is the issue

– If component B does disk IO, or another type of access that requires waiting, then this can be the bottleneck

[Chart: CPU% plus memory/latency/heap over time for the non-sustainable case – CPU at CPU_tst between T_E_start and T_E_end, then a CPU_pp post-processing tail until T_PP_end, while memory/latency/heap grow during injection and drain during post-processing]