Using Control Charts for Detecting and Understanding Performance Regressions in Large Software


Page 1: Using Control Charts for Detecting and Understanding Performance Regressions in Large Software

Using Control Charts for Detecting and Understanding Performance Regressions in Large Software

Thanh Nguyen

Supervisor: Ahmed E. Hassan

Queen’s University, Kingston, Canada

Page 2: Using Control Charts for Detecting and Understanding Performance Regressions in Large Software

Motivation

Page 3: Using Control Charts for Detecting and Understanding Performance Regressions in Large Software

Performance is a very important aspect of modern software engineering

Before: local, single user, asynchronous interaction

Now: network, massive multi-user, synchronous interaction

[Figure labels: Facebook, 50%, Canadian, 190]

Page 4: Using Control Charts for Detecting and Understanding Performance Regressions in Large Software

Performance testing is very important to ensure software performance

Local

Single user

Asynchronous interaction

Network

Massive multi-users

Synchronous interaction

Higher latency

Require fast servers

Require central storage

Need state synchronization

Need guaranteed response time

Need redundancy

Page 5: Using Control Charts for Detecting and Understanding Performance Regressions in Large Software

I focus on one kind of performance testing: regression load testing

Version 1 (baseline)

Version 1.1 (target)

Page 6: Using Control Charts for Detecting and Understanding Performance Regressions in Large Software

How do we detect a performance regression?

Applying load -> Version 1 -> CPU %, memory usage

Applying load -> Version 1.1 -> CPU %, memory usage

CPU %, memory usage of both versions -> detect regression

Page 7: Using Control Charts for Detecting and Understanding Performance Regressions in Large Software

Challenge in Performance Regression Testing

Layer 1 – Agent 1

Layer 1 – Agent 2

Layer 2 – Agent 1

Layer 2 – Agent 2

Layer 2 – Agent 3

Layer 2 – Agent 4

Layer 3 – Agent 1

Layer 4 – Agent 1

56 counters x 8 agents = 448 counters

56 counters x 2 agents = 112 counters

Layer 1 Layer 2

Lots of data

Not enough automation

Limited research

Page 8: Using Control Charts for Detecting and Understanding Performance Regressions in Large Software

Approach

Page 9: Using Control Charts for Detecting and Understanding Performance Regressions in Large Software

Data mining -> R&R (Reduce and Relate)

Page 10: Using Control Charts for Detecting and Understanding Performance Regressions in Large Software

Performance regression detection is similar to monitoring and forecasting the load on production servers.

Page 11: Using Control Charts for Detecting and Understanding Performance Regressions in Large Software

Disk Subsystem Capacity Management [Trubin and Merritt 2003]

Page 12: Using Control Charts for Detecting and Understanding Performance Regressions in Large Software

Proposed approach: use control charts to find performance regressions

Baseline performance counters -> determine the LCL, CL, and UCL (lower control limit, center line, upper control limit)

Target performance counters

[Figure: control chart of a performance counter; x-axis 0 to 25, y-axis 710 to 780]
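
The slides do not say how the LCL, CL, and UCL are derived from the baseline. As a minimal sketch, assuming the textbook Shewhart convention (center line at the baseline mean, limits at three standard deviations on either side), the limits could be computed as follows. The function name, the `k` parameter, and the sample values are illustrative assumptions, not taken from the talk.

```python
from statistics import mean, stdev


def control_limits(baseline, k=3.0):
    """Return (LCL, CL, UCL) for one counter from its baseline samples.

    Assumption: CL is the baseline mean and the limits sit k sample
    standard deviations away from it. The talk may use another rule.
    """
    cl = mean(baseline)
    spread = k * stdev(baseline)
    return cl - spread, cl, cl + spread


# Hypothetical baseline samples of one performance counter.
baseline_counter = [742, 748, 739, 751, 745, 744, 750, 741]
lcl, cl, ucl = control_limits(baseline_counter)
print(f"LCL={lcl:.2f} CL={cl:.2f} UCL={ucl:.2f}")
```

A percentile-based rule (for example, lower and upper percentiles of the baseline) would fit the same interface and may be more robust when a counter is not roughly normally distributed.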

Page 13: Using Control Charts for Detecting and Understanding Performance Regressions in Large Software

Using control charts to verify load test results

Baseline performance counters -> determine the LCL, CL, UCL

Target performance counters

[Figure: control chart of a performance counter; x-axis 0 to 25, y-axis 710 to 780]

Violation ratio

Reduce
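
The slide names the violation ratio without defining it. A minimal sketch, assuming it is the fraction of target samples that fall outside the baseline's control limits, which reduces a whole counter series to one number. The function name, limits, and sample values below are hypothetical.

```python
def violation_ratio(target, lcl, ucl):
    """Fraction of target samples outside the interval [lcl, ucl].

    Assumption: a sample "violates" when it lies strictly below the
    lower limit or strictly above the upper limit.
    """
    if not target:
        raise ValueError("target run has no samples")
    violations = sum(1 for value in target if value < lcl or value > ucl)
    return violations / len(target)


# Hypothetical target samples checked against hypothetical limits.
target_counter = [744, 752, 781, 779, 746, 783, 740, 778]
ratio = violation_ratio(target_counter, lcl=730.0, ucl=760.0)
print(f"violation ratio = {ratio:.0%}")
```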

Page 14: Using Control Charts for Detecting and Understanding Performance Regressions in Large Software

We can use the violation ratio to detect regression

Baseline performance counters

Target performance counters

[Figures: two control charts of a performance counter (x-axis 0 to 25; y-axes 710 to 780 and 680 to 780), labelled "Low violation ratio" and "High violation ratio"]

Relate
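
To illustrate how a low or high violation ratio could be turned into a decision, here is an end-to-end sketch for a single counter. It assumes mean plus or minus three standard deviations for the limits and a cut-off of 20% on the violation ratio. Both choices, and all sample values, are assumptions made for this example and are not stated in the slides.

```python
from statistics import mean, stdev


def violation_ratio(baseline, target, k=3.0):
    """Fraction of target samples outside limits derived from baseline."""
    cl = mean(baseline)
    spread = k * stdev(baseline)
    lcl, ucl = cl - spread, cl + spread
    return sum(1 for value in target if not lcl <= value <= ucl) / len(target)


def has_regression(baseline, target, threshold=0.2):
    """Flag the target run when its violation ratio exceeds threshold.

    The 0.2 default is a hypothetical cut-off chosen for illustration.
    """
    return violation_ratio(baseline, target) > threshold


baseline = [742, 748, 739, 751, 745, 744, 750, 741]
similar_run = [743, 747, 740, 749, 746, 745, 751, 742]
slower_run = [768, 772, 765, 774, 770, 746, 769, 771]

print(has_regression(baseline, similar_run))
print(has_regression(baseline, slower_run))
```

In practice the threshold would need to be calibrated on runs known to be free of regressions, since even two runs of the same version rarely produce identical counters.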

Page 15: Using Control Charts for Detecting and Understanding Performance Regressions in Large Software

Progress

Page 16: Using Control Charts for Detecting and Understanding Performance Regressions in Large Software

My three research questions

Is there a performance regression? Fairly confident that it works [ICPE 12]

Where does performance regression occur? Need more validation [APSEC 11]

What causes the regression? In progress

Page 17: Using Control Charts for Detecting and Understanding Performance Regressions in Large Software

Experiment setup

Baseline performance counters

Target performance counters

Number of out-of-control counters is small

Average violation ratio should be low

Page 18: Using Control Charts for Detecting and Understanding Performance Regressions in Large Software

Experiment setup

Baseline performance counters

Target performance counters

Number of out-of-control counters is large

Average violation ratio should be high
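
The two experiment slides rely on two summary measures per test run: the number of out-of-control counters and the average violation ratio. A minimal sketch of both, assuming a counter counts as out of control when its violation ratio exceeds a fixed cut-off. The cut-off, the counter names, and the ratios are hypothetical.

```python
from statistics import mean

# Hypothetical cut-off above which a counter is called out of control.
OUT_OF_CONTROL_THRESHOLD = 0.10


def summarize_run(ratios):
    """Summarize one test run from its per-counter violation ratios."""
    out_of_control = sorted(
        name for name, ratio in ratios.items()
        if ratio > OUT_OF_CONTROL_THRESHOLD
    )
    return {
        "out_of_control": out_of_control,
        "average_violation_ratio": mean(ratios.values()),
    }


# Hypothetical per-counter violation ratios for two target runs.
normal_run = {"cpu": 0.02, "memory": 0.04, "disk_io": 0.00, "net_io": 0.06}
problem_run = {"cpu": 0.70, "memory": 0.50, "disk_io": 0.04, "net_io": 0.36}

normal_summary = summarize_run(normal_run)
problem_summary = summarize_run(problem_run)
print(normal_summary)
print(problem_summary)
```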

Page 19: Using Control Charts for Detecting and Understanding Performance Regressions in Large Software

Number of out-of-control counters

[Figure: bar chart comparing Normal and Problem runs; y-axis 0 to 18; categories: Bad Query Limit, Extra print, Extra connection, Missing key index, Missing text index]

Page 20: Using Control Charts for Detecting and Understanding Performance Regressions in Large Software

Average violation ratio

[Figure: bar chart comparing Normal and Problem runs; y-axis 0% to 100%; categories: Bad Query Limit, Extra print, Extra connection, Missing key index, Missing text index]

Page 21: Using Control Charts for Detecting and Understanding Performance Regressions in Large Software

Where does performance regression occur?

Backend

Other backend

Front End

Page 22: Using Control Charts for Detecting and Understanding Performance Regressions in Large Software

Backend 50% stress

[Figure: bar chart of violation ratio (0% to 100%) per counter; bars labelled B, B, B, B, B, O]

Page 23: Using Control Charts for Detecting and Understanding Performance Regressions in Large Software

Frontend 50% stress

[Figure: bar chart of violation ratio (0% to 100%) per counter; bars labelled F, F, F, F, F, F, O]
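
One way to read the two stress charts is that the component whose counters show the highest violation ratios is the likely location of the regression. The sketch below ranks components by the average violation ratio of their counters. The grouping rule, the component and counter names, and the ratios are assumptions for illustration only.

```python
from collections import defaultdict
from statistics import mean


def rank_components(ratios):
    """Rank components by the mean violation ratio of their counters.

    ratios maps (component, counter) pairs to a violation ratio.
    Returns (component, mean ratio) pairs, highest first.
    """
    by_component = defaultdict(list)
    for (component, _counter), ratio in ratios.items():
        by_component[component].append(ratio)
    averages = {name: mean(values) for name, values in by_component.items()}
    return sorted(averages.items(), key=lambda item: item[1], reverse=True)


# Hypothetical ratios for a run in which the backend is stressed.
backend_stress = {
    ("backend", "cpu"): 0.80,
    ("backend", "disk_io"): 0.60,
    ("frontend", "cpu"): 0.10,
    ("frontend", "memory"): 0.10,
    ("other", "cpu"): 0.05,
}

ranking = rank_components(backend_stress)
print(ranking[0][0])
```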

Page 24: Using Control Charts for Detecting and Understanding Performance Regressions in Large Software

Obstacles

Page 25: Using Control Charts for Detecting and Understanding Performance Regressions in Large Software

Obstacle #1: Inputs are unstable

[Figure: CPU % over time (time points 1 to 6; y-axis 0 to 45) for Version 1.0 and Version 1.1]

Is there a performance regression?

Page 26: Using Control Charts for Detecting and Understanding Performance Regressions in Large Software

It is very difficult to maintain stable input across test runs

Applying load -> Version 1 -> CPU %, memory usage

Applying load -> Version 1.1 -> CPU %, memory usage

CPU %, memory usage of both versions -> detect regression

Randomization, cache, warm-up, background tasks

Page 27: Using Control Charts for Detecting and Understanding Performance Regressions in Large Software

Solution #1: Scale the counter according to the input

• Step 1: Determine α and β

• Step 2: [formula relating CPU% and Request/s; not preserved in this transcript]

May not work?
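
The formula on this slide did not survive in the transcript, so the following is only a sketch under stated assumptions: that α and β are the intercept and slope of a straight-line fit CPU% = α + β × (requests/s), and that scaling means rescaling each sample to the load level of a reference run. The function names, the `reference_load` parameter, and the sample values are hypothetical.

```python
from statistics import mean


def fit_line(load, counter):
    """Least-squares fit of counter = alpha + beta * load."""
    load_mean, counter_mean = mean(load), mean(counter)
    sxx = sum((x - load_mean) ** 2 for x in load)
    sxy = sum(
        (x - load_mean) * (y - counter_mean) for x, y in zip(load, counter)
    )
    beta = sxy / sxx
    alpha = counter_mean - beta * load_mean
    return alpha, beta


def scale_to_reference(load, counter, alpha, beta, reference_load):
    """Rescale each sample to the value expected at reference_load."""
    expected_at_reference = alpha + beta * reference_load
    return [
        y * expected_at_reference / (alpha + beta * x)
        for x, y in zip(load, counter)
    ]


# Hypothetical samples that follow CPU% = 5 + 0.2 * requests/s exactly.
requests_per_s = [100, 120, 140, 160]
cpu_percent = [25.0, 29.0, 33.0, 37.0]

alpha, beta = fit_line(requests_per_s, cpu_percent)
scaled = scale_to_reference(
    requests_per_s, cpu_percent, alpha, beta, reference_load=130
)
print(alpha, beta, scaled)
```

The "May not work?" caveat applies directly to this sketch: if the counter does not grow linearly with the request rate (for example, because of caching or saturation), a straight-line fit will scale the samples incorrectly.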

Page 28: Using Control Charts for Detecting and Understanding Performance Regressions in Large Software

Solution #1: Example of the effectiveness of scaling

Page 29: Using Control Charts for Detecting and Understanding Performance Regressions in Large Software

Obstacle #2: Multiple inputs

[Figure: density plot of two test runs, Version 1.0 and Version 1.1; x-axis CPU usage 10 to 100, y-axis density % 0 to 45]

IF … THEN…ELSE…

Page 30: Using Control Charts for Detecting and Understanding Performance Regressions in Large Software

Solution #2: Isolating the counters

[Figure: density plot of two test runs, Version 1.0 and Version 1.1; x-axis CPU usage 10 to 100, y-axis density % 0 to 45; annotation: local minima]
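
The slide gives no details on how the counters are isolated beyond the "local minima" annotation. As a rough sketch, assuming the idea is to split a multi-modal counter at the valley between two density peaks so that each group can be analysed separately, a histogram-based version could look like this. The bin width, the valley rule, the function names, and the sample values are all assumptions.

```python
def histogram(samples, bin_width, low, high):
    """Count samples per fixed-width bin over [low, high)."""
    n_bins = int((high - low) / bin_width)
    counts = [0] * n_bins
    for value in samples:
        index = min(int((value - low) / bin_width), n_bins - 1)
        counts[index] += 1
    return counts


def deepest_valley(counts):
    """Index of the bin lying deepest below the peaks on both sides.

    Returns None when no bin has a higher bin on each side, that is,
    when the histogram has a single mode.
    """
    best_bin, best_depth = None, 0
    for i in range(1, len(counts) - 1):
        depth = min(max(counts[:i]), max(counts[i + 1:])) - counts[i]
        if depth > best_depth:
            best_bin, best_depth = i, depth
    return best_bin


# Hypothetical CPU usage samples with two modes, near 20% and near 60%.
cpu_usage = [18, 19, 20, 20, 21, 22, 21, 20, 58, 59, 60, 61, 60, 62, 59, 60]
BIN_WIDTH, LOW, HIGH = 10, 0, 100

counts = histogram(cpu_usage, BIN_WIDTH, LOW, HIGH)
valley = deepest_valley(counts)
split_at = LOW + (valley + 0.5) * BIN_WIDTH  # centre of the valley bin
low_group = [value for value in cpu_usage if value < split_at]
high_group = [value for value in cpu_usage if value >= split_at]
print(counts, split_at, len(low_group), len(high_group))
```

Each group could then get its own control limits, so that a run that simply exercises one branch more often is not mistaken for a regression.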
