When Average is Not Average: Large Response Time Fluctuations in n-Tier Applications

Page 1: When Average is Not Average:  Large Response Time Fluctuations  in n-Tier Applications

When Average is Not Average: Large Response Time Fluctuations in n-Tier Applications

Qingyang Wang, Yasuhiko Kanemasa, Calton Pu, Motoyuki Kawaba

Page 2: When Average is Not Average:  Large Response Time Fluctuations  in n-Tier Applications

The 9th International Conference on Autonomic Computing (ICAC 2012)

Sept. 18, 2012

Outline

Background & Motivation
Analysis of the Large Response Time Fluctuations
  Transient local events
  Compounding of local response time increase
  Mix-transaction scheduling
Solution
  Transaction level scheduling
  Limiting concurrency in the bottleneck tier
Conclusion

Page 3: When Average is Not Average:  Large Response Time Fluctuations  in n-Tier Applications

Response Time is Important

Response time is an important performance factor for Quality of Service (e.g., SLAs for web-facing e-commerce applications).

Experiments at Amazon show that every 100 ms increase in page load time decreases sales by 1%.

Average response time may not be representative.

We will show concrete instances of this phenomenon.

Page 4: When Average is Not Average:  Large Response Time Fluctuations  in n-Tier Applications

Motivational Example

Response time and throughput of a ten-minute benchmark on a 3-tier application with increasing workloads.

What does the timeline graph look like?

Zoom in

Page 5: When Average is Not Average:  Large Response Time Fluctuations  in n-Tier Applications

Motivational Example

Average at every 100 ms time interval

Average at every 10 s time interval

Page 6: When Average is Not Average:  Large Response Time Fluctuations  in n-Tier Applications

Motivational Example

Statistical analysis of the response time distribution

Bimodal response time distribution

Averaging over a long time period hides wide-ranging response time variations.
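The effect of the averaging interval can be reproduced with a small sketch. The trace below is synthetic (an assumption for illustration, not data from the talk): most requests take about 50 ms, but a short burst once per second pushes response times to about 2 s. The helper name windowed_means is hypothetical.

```python
import random
from statistics import mean

def windowed_means(samples, window_s):
    """Average (timestamp_s, response_time_s) samples per fixed time window."""
    buckets = {}
    for t, rt in samples:
        buckets.setdefault(int(t // window_s), []).append(rt)
    return [mean(v) for _, v in sorted(buckets.items())]

# Synthetic trace (assumed, not measured): one request every 10 ms for 60 s.
# During the first 200 ms of each second, response times jump to about 2 s.
random.seed(0)
samples = []
for i in range(6000):
    t = i * 0.01
    in_burst = (t % 1.0) < 0.2
    rt = random.gauss(2.0, 0.1) if in_burst else random.gauss(0.05, 0.005)
    samples.append((t, rt))

fine = windowed_means(samples, 0.1)     # 100 ms windows
coarse = windowed_means(samples, 10.0)  # 10 s windows

print(f"100 ms windows: min={min(fine):.2f}s max={max(fine):.2f}s")
print(f"10 s windows:   min={min(coarse):.2f}s max={max(coarse):.2f}s")
```

With 100 ms windows the two modes stay visible as separate low and high values; with 10 s windows every window blends both modes into nearly the same average.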

Page 7: When Average is Not Average:  Large Response Time Fluctuations  in n-Tier Applications

Goal of This Research

Reveal the causes of large response time fluctuations in n-tier applications under high hardware utilization:
  Transient local events
  Compounding of local response time increase
  Mix-transaction scheduling

Show heuristics to mitigate large response time fluctuations.

Aim for more precise usage of response time as an index of application performance.

Page 9: When Average is Not Average:  Large Response Time Fluctuations  in n-Tier Applications

Experimental Setup (1): N-tier Application

RUBBoS benchmark
  Bulletin board system like Slashdot (www.slashdot.org)
  Typical 3-tier or 4-tier architecture
  Two types of workload
    Browsing only (CPU intensive)
    Read/Write mix
  24 web interactions

Page 10: When Average is Not Average:  Large Response Time Fluctuations  in n-Tier Applications

Experimental Setup (2): Hardware Configurations

Commodity servers with different levels of processing power

Page 11: When Average is Not Average:  Large Response Time Fluctuations  in n-Tier Applications

Experimental Setup (3): Software Configurations

Function                    Software
Web server                  Apache 2.0.54
Application server          Apache Tomcat 5.5.17
DB clustering middleware    C-JDBC 2.0.2
Database server             MySQL 5.0.51a
Java                        Sun jdk1.6.0_23
Operating system            Redhat FC4
System monitor              Sysstat 10.0.0.02, Collectl 3.5.1
Transaction monitor         Fujitsu SysViz

Page 12: When Average is Not Average:  Large Response Time Fluctuations  in n-Tier Applications

Experimental Setup (4): Sample Topology

Sample topology (1/2/1/2)

Notation
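The slide does not spell out the notation in this transcript. A minimal sketch, assuming the slash-separated digits count servers per tier in the same order as the software table (web server, application server, DB clustering middleware, database server), and that a three-digit form such as 1/2/1 omits the middleware tier. The function name and tier keys are hypothetical.

```python
def parse_topology(notation):
    """Split a slash-separated topology string into per-tier server counts."""
    counts = [int(part) for part in notation.split("/")]
    # Tier order is an assumption based on the software table in this talk.
    if len(counts) == 4:
        tiers = ["web", "app", "db_middleware", "db"]
    elif len(counts) == 3:
        tiers = ["web", "app", "db"]
    else:
        raise ValueError(f"unsupported topology: {notation!r}")
    return dict(zip(tiers, counts))

print(parse_topology("1/2/1/2"))
print(parse_topology("1/2/1"))
```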

Page 14: When Average is Not Average:  Large Response Time Fluctuations  in n-Tier Applications

Transient Local Events

Transient local events are pervasive in n-tier applications.

Last level cache misses

Java VM garbage collection

Page 15: When Average is Not Average:  Large Response Time Fluctuations  in n-Tier Applications

Negative Impact of Transient Local Events

High overhead caused by transient local events under high concurrency

1. Response time fluctuates slightly in a tier under high workload.

2. Concurrency increases as response time increases in the tier.

3. Overhead caused by transient local events increases as concurrency increases.

4. The fluctuation of response time is amplified due to the overhead.

Last level cache misses

JVM GC
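Step 2 of the loop above follows from Little's Law, which relates the mean number of requests in a tier to throughput and mean response time. The sketch below uses hypothetical numbers (1000 requests/s is an assumption, not a measurement from the talk) to show how a modest rise in response time multiplies the number of concurrent requests at constant throughput.

```python
def concurrency(throughput_rps, response_time_s):
    """Little's Law: mean requests in a tier = throughput x mean response time."""
    return throughput_rps * response_time_s

# Hypothetical tier serving 1000 requests/s at different response times.
for rt_ms in (10, 20, 50):
    n = concurrency(1000, rt_ms / 1000)
    print(f"response time {rt_ms} ms -> about {n:.0f} concurrent requests")
```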

Page 16: When Average is Not Average:  Large Response Time Fluctuations  in n-Tier Applications

Non-Linear CPU Overhead Caused by Last Level Cache Misses

“Ideal” CPU utilization

Non-linear increase of MySQL CPU utilization

Non-linear increase of MySQL CPU cache misses

Page 17: When Average is Not Average:  Large Response Time Fluctuations  in n-Tier Applications

Non-Linear Increase of JVM GC as Workload Increases

Negative impact of JVM GC:
  Consumes CPU resources
  Increases the waiting time of pending requests

CJDBC JVM GC time in 3 minutes

Response time and JVM GC at workload 5500

Page 19: When Average is Not Average:  Large Response Time Fluctuations  in n-Tier Applications

Compounding of Local Response Time Increase

Page 20: When Average is Not Average:  Large Response Time Fluctuations  in n-Tier Applications

Bottom-Up Response Time Fluctuation Amplification

Response time in each tier (workload 5400)

Small fluctuations

Large fluctuations

Number of concurrent requests in each tier (workload 5400)

Page 22: When Average is Not Average:  Large Response Time Fluctuations  in n-Tier Applications

Mix-Transaction Scheduling

Light transaction

Heavy transaction

Page 23: When Average is Not Average:  Large Response Time Fluctuations  in n-Tier Applications

Limitations of Inner-Tier Job Scheduling in n-Tier Applications

Delay of light transaction processing due to interference of heavy transactions.

[Figure: timelines for a heavy transaction and a light transaction, each labeled with Tomcat, MySQL, AJP call, AJP reply, and RTT]

Page 25: When Average is Not Average:  Large Response Time Fluctuations  in n-Tier Applications

Limitations of Inner-Tier Job Scheduling in n-Tier Applications

Delay of light transaction processing due to interference of heavy transactions.

[Figure: timelines for a heavy transaction and a light transaction, each labeled with Tomcat, MySQL, AJP call, AJP reply, and RTT, with two "Increase" annotations]

Page 26: When Average is Not Average:  Large Response Time Fluctuations  in n-Tier Applications

Limitations of Inner-Tier Job Scheduling in n-Tier Applications

Mix-transaction response time

Lightest transaction response time

Delay of light transaction processing due to interference of heavy transactions.
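The interference can be illustrated with a toy single-server queue that serves requests strictly in arrival order. This is a simplified sketch, not the authors' experimental setup; the workload numbers and the function name are hypothetical.

```python
def fifo_response_times(jobs):
    """Serve jobs one at a time in arrival order on a single server.

    jobs: list of (name, arrival_time, service_time), sorted by arrival.
    Returns {name: response_time}, where response time = finish - arrival.
    """
    clock = 0.0
    result = {}
    for name, arrival, service in jobs:
        start = max(clock, arrival)   # wait until the server is free
        clock = start + service
        result[name] = clock - arrival
    return result

# Hypothetical workload (times in ms): one heavy query arrives just before
# three light queries.
jobs = [("heavy", 0.0, 100.0), ("light1", 1.0, 5.0),
        ("light2", 2.0, 5.0), ("light3", 3.0, 5.0)]
rt = fifo_response_times(jobs)
for name, value in rt.items():
    print(f"{name}: {value:.0f} ms")
```

Each light query needs only 5 ms of service, yet its response time exceeds 100 ms because it waits behind the heavy query.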

Page 28: When Average is Not Average:  Large Response Time Fluctuations  in n-Tier Applications

Heuristic I: Transaction Level Scheduling

Heuristic (i): We need to grant higher priority to light transactions; schedule transactions in an upper tier that can distinguish light transactions from heavy ones.

[Figure: timelines for a heavy transaction and a light transaction, each labeled with Tomcat, MySQL, AJP call, AJP reply, and RTT]

Page 30: When Average is Not Average:  Large Response Time Fluctuations  in n-Tier Applications

Heuristic I: Transaction Level Scheduling

Cross-tier-priority based scheduling

Heuristic (i): We need to grant higher priority to light transactions; schedule transactions in an upper tier that can distinguish light transactions from heavy ones.
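A minimal sketch of the idea behind Heuristic (i), assuming the scheduling tier knows each request's service cost in advance. It is a toy non-preemptive priority queue, not the authors' cross-tier-priority implementation; the function name and workload are hypothetical.

```python
import heapq

def priority_response_times(jobs):
    """Non-preemptive single server that always picks the lightest waiting job.

    jobs: list of (name, arrival_time, service_time).
    Returns {name: response_time}.
    """
    pending = sorted(jobs, key=lambda j: j[1])  # by arrival time
    waiting = []  # min-heap keyed by service time (lighter = higher priority)
    clock, i, result = 0.0, 0, {}
    while i < len(pending) or waiting:
        # Admit every job that has arrived by now.
        while i < len(pending) and pending[i][1] <= clock:
            name, arrival, service = pending[i]
            heapq.heappush(waiting, (service, arrival, name))
            i += 1
        if not waiting:  # server idle: jump to the next arrival
            clock = pending[i][1]
            continue
        service, arrival, name = heapq.heappop(waiting)
        clock += service
        result[name] = clock - arrival
    return result

# Hypothetical workload (times in ms): all four requests are already queued
# at the upper tier when scheduling starts.
jobs = [("heavy", 0.0, 100.0), ("light1", 0.0, 5.0),
        ("light2", 0.0, 5.0), ("light3", 0.0, 5.0)]
rt = priority_response_times(jobs)
for name, value in rt.items():
    print(f"{name}: {value:.0f} ms")
```

The light requests finish within 15 ms, while the heavy request is delayed by only the 15 ms the light ones consume.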

Page 31: When Average is Not Average:  Large Response Time Fluctuations  in n-Tier Applications

Heuristic I: Transaction Level Scheduling

1/2/1 configuration, workload 5800

DBconn2 CTP Scheduling Case

Page 33: When Average is Not Average:  Large Response Time Fluctuations  in n-Tier Applications

Heuristic II: Limiting Concurrency in Bottleneck Tier

Java VM garbage collections

Last level cache misses

Heuristic (ii): We need to restrict the number of concurrent requests to avoid overhead caused by high concurrency in the bottleneck tier.

Page 34: When Average is Not Average:  Large Response Time Fluctuations  in n-Tier Applications

Heuristic II: Limiting Concurrency in Bottleneck Tier

Heuristic (ii): We need to restrict the number of concurrent requests to avoid overhead caused by high concurrency in the bottleneck tier.

1/2/1, MySQL L2 cache miss

1/2/1/2, CJDBC JVM GC

Page 35: When Average is Not Average:  Large Response Time Fluctuations  in n-Tier Applications

Heuristic II: Limiting Concurrency in Bottleneck Tier

1/2/1/2 configuration, workload 5500; CJDBC is the bottleneck.

DBconn24 case

DBconn2 case

Limiting concurrency in bottleneck tiers can mitigate the large fluctuations of end-to-end response time.

1/2/1/2, CJDBC
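One generic way to cap concurrency in front of a bottleneck tier is a counting semaphore, which behaves like a small connection pool. The sketch below is an illustration only: the limit of 2 and the 24 requests loosely echo the DBconn2 and DBconn24 labels, on the assumption that those labels refer to connection pool sizes, and the function name is hypothetical.

```python
import threading
import time

def run_with_limit(num_requests, limit, work_s=0.01):
    """Run num_requests threads, letting at most `limit` into the guarded tier."""
    gate = threading.BoundedSemaphore(limit)
    lock = threading.Lock()
    state = {"in_flight": 0, "peak": 0, "done": 0}

    def handle_request():
        with gate:  # blocks while `limit` requests are already inside
            with lock:
                state["in_flight"] += 1
                state["peak"] = max(state["peak"], state["in_flight"])
            time.sleep(work_s)  # stand-in for the bottleneck tier's work
            with lock:
                state["in_flight"] -= 1
                state["done"] += 1

    threads = [threading.Thread(target=handle_request) for _ in range(num_requests)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return state["peak"], state["done"]

peak, done = run_with_limit(num_requests=24, limit=2)
print(f"peak concurrency: {peak}, completed: {done}")
```

Requests beyond the limit wait at the gate instead of piling up inside the bottleneck tier, which is where the talk locates the concurrency-dependent overhead.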

Page 37: When Average is Not Average:  Large Response Time Fluctuations  in n-Tier Applications

Conclusion

Under high resource utilization:
  Average response time may not be representative of system performance.
  Beyond bursty workload, many system environmental conditions cause large response time fluctuations.

To reduce wide-ranging response time variations:
  Transaction level scheduling is useful.
  Concurrency settings of an n-tier application need to be optimized.

Ongoing work: more analysis of system environmental conditions

Page 38: When Average is Not Average:  Large Response Time Fluctuations  in n-Tier Applications

Thank You. Any Questions?

Qingyang [email protected]