Larson - Resource Mapping

38
Resource Mapping A Wait Time Based Methodology for Database Performance Analysis Prepared for Hotsos Symposium, 2005 Presented by Matt Larson Chief Technology Officer Confio Software

description

A Wait Time Based Methodology for Database Performance Analysis Presented by Matt Larson Chief Technology Officer Confio Software Prepared for Hotsos Symposium, 2005 Presentation Agenda 2  Former DBA consultant specializing in Oracle  Co-author of three Oracle books (Oracle software company  Co-author of two other database related performance tuning books 2 nd Edition, Oracle8 Server Unleashed) 3 Problems with Conventional Tuning Tools: Like the Drunk Under the Streetlight 4

Transcript of Larson - Resource Mapping

Page 1: Larson - Resource Mapping

Resource MappingA Wait Time Based Methodology for

Database Performance Analysis

Prepared for Hotsos Symposium, 2005Presented by Matt Larson

Chief Technology Officer Confio Software

Page 2: Larson - Resource Mapping

2

Presentation Agenda

Introduction Conventional Tuning vs. Wait-based

Tuning Foundation: Resource Mapping

Methodology 5 Key Steps of Applying RMM RMM Advantages Conclusion

Page 3: Larson - Resource Mapping

3

Who am I?

Former DBA consultant specializing in Oracle performance tuning

Co-author of three Oracle books (Oracle Development Unleashed, Oracle Unleashed 2nd Edition, Oracle8 Server Unleashed)

Co-author of two other database related books

CTO and founder of Oracle performance software company

Page 4: Larson - Resource Mapping

4

Problems with Conventional Tuning Tools: Like the Drunk Under the Streetlight

Page 5: Larson - Resource Mapping

5

Conventional Tuning

Art, not a science Ratio-based (cache hit ratios, etc.) Sometimes fruitless It’s “tuned” (I guess?) Different tuning/investigation process for

each DBA/DBA Team/Company

Page 6: Larson - Resource Mapping

6

Problems with Conventional Tuning Tools

Optimize systems, not business results Conventional tools:

• V$ Views: limited visibility & granularity• Statspack: averages across entire database• Explain Plan: deemphasizes how non-object

resources affect performance Incorrect Data hides real results

• System-wide averages• Event counters• Incomplete visibility

Page 7: Larson - Resource Mapping

7

What Problems are you Trying to Solve?

• I spend the whole week monitoring and optimizing Oracle configurations, but I have no demonstrable results to show for it - why?

• Will more hardware make my application run faster? By how much?

• Will the new application run efficiently on the production server?

• Why does one application keep impacting my SLA compliance?

• If I could make one (or 2, 3, or 4) changes to my database to have the biggest impact, what would they be?

Page 8: Larson - Resource Mapping

8

You know you are working on the wrong thing when… After spending an agonizing week tuning

Oracle buffers to minimize I/O operations, management typically rewards you with:

• A. An all expense paid vacation• B. A free lunch• C. A stale donut• D. Reward? Nobody even noticed!

Page 9: Larson - Resource Mapping

9

You know you have a visibility problem when… You measure database performance based

on:

• A. Increasing trends in user response time• B. Increasing system down time• C. Increasing help desk calls• D. Increasing decibel levels from irate users

Page 10: Larson - Resource Mapping

10

Your role is sub-essential to the business of your organization when… Your role in the rollout of a new

customer facing application results in:

• A. Keys to drive the CEO’s Porsche• B. Keys to use the executive restroom• C. A mop to use in the executive restroom• D. Your office has been moved to the

restroom

Page 11: Larson - Resource Mapping

11

You know you are accustomed to measuring the wrong thing when… You measure the commute time to work

based on:

• A. The time it takes to get there• B. Counting the times your wheels rotate• C. Monitoring your tachometer• D. The number of speeding tickets

Page 12: Larson - Resource Mapping

12

Wait-based Performance Tuning

Emerging best-practice for database tuning

Proponents include leading consultants, trainers and authors

Oracle is starting to build wait-based tuning tools into the database particularly in 10g

Tune by determining where processing time is spent

Page 13: Larson - Resource Mapping

13

Oracle 10g - Moving towards wait-based

Adding wait-based columns to existing views New wait-based views

Example: v$session_wait_history

• Provides the last 10 wait events for a session• Session ID, Username, Event, Wait_Time, etc.• Used to provide wait_time for only a few events

Page 14: Larson - Resource Mapping

14

DBA Success Stories using RMM

DBA solves a “Cold Case”. Problem unresolved for 1 year with traditional tools; Solution identified in 10 minutes during hands-on training

DBA ends “Crit Sit” 2 week situation ends quickly after identification of Library Cache pin wait and load locks. Metalink identifies Oracle bug, patch successfully applied

DBA saves $700K. 90% CPU capacity initiates expansion from 12 to 24 CPU server. DBA identifies parallel queries across 16 parallel threads as source of bottleneck. CPU eliminated as constraint, no new server required.

Page 15: Larson - Resource Mapping

15

RMM: Confio’s Underlying Methodology

Resource Mapping Methodology:

Three Key Principles of RMM1. SQL View: All statistics at SQL statement level2. Time View: Measure Time, not number of times a resource is utilized3. Full View: Separately measure every resource to isolate source of

problems

Resource Mapping

MethodologyDBFlashWait-Event

Analysis

General approach-

best practice

Rigorous, complete

requirements

Packaged product

implementation

Page 16: Larson - Resource Mapping

16

Blind Spot Blind SpotCPU 74%

Reads 1789327

Counters

CPU 38%

Reads 4955

Counters

Confio’s Resource Mapping Methodology• The principles of RMM can be illustrated by using the analogy that

data processing is like an assembly line. Data goes in one end, is subject to a series of changes, and comes out the other end as a finished product

• The assembly line (or SQL Statement) must be observed at the lowest level where a unit of work is being performed (SQL View Principle)

• Measurements are made with regard to time instead of counting how often an event occurs (Time View Principle)

• All resources system-wide must be monitored to get a full view of potential bottlenecks i.e. no blind spots (Full View Principle)

145 secondsTime

8726 seconds

TimeFollow a unit of work

through every operation

Page 17: Larson - Resource Mapping

17

Track SQL Time, Not System Counters

SQL 1

SQL 2

SQL 3

Resources I/O Network RedoLocks

• Watching Counters leads to wrong conclusions: Time is more relevant

• Total System Counters hide information: Need breakdown to individual SQLs

5 R

25 R

50 Reads

Total System Counter

80K Reads

30 Minutes

15M

5M

6 M

10 M

100 Minutes

35 A

50 A

50 A

125 Attempts

4 M

200 Minutes

5M

4 M

200 Minutes

5M

5K Packets 216K Writes

Page 18: Larson - Resource Mapping

18

RMM-compliant Performance Tools Oracle Tracing

• RMM compliant when wait events are traced• Shows SQL level statistics (SQLView), all events

(FullView) and events by time (TimeView)• Text-based, short-term technical reporting• Primarily used for reactive tuning

Confio DBFlash for Oracle• RMM compliant • 24/7 proactive monitoring• Graphical, long-term trend reporting• RMM-based Alerting

Page 19: Larson - Resource Mapping

19

Applying RMM for Business Results

Five Step Process focusing on what matters

1. Identify 2. Allocate 3. Quantify 4. Prioritize 5. Assign

Page 20: Larson - Resource Mapping

20

Step 1: Identify

Identify SQL Statements having largest impact • (SQL View and Time View

principles) Longest wait times = most

significant “pain points” for customers

Conversely, low cache hit ratios or high latch usage may not impose high wait times for users (so why fix them?)

SQL statements prioritized by Total Wait Time

Page 21: Larson - Resource Mapping

21

Step 2: Allocate

Allocate impact to real customers (internal or external)

Allocate wait time to Program, Session, Machine• SQL View principle makes

this connection Understanding database

customer and application

Programs Prioritized by Total Wait Time

Page 22: Larson - Resource Mapping

22

Step 3: Quantify How much is save in time/money if fixed? Enabled by Full View and Time View principles Soft dollar savings

• Data entry clerks• DBA time spent in problem resolution

Hard dollar savings• Reduce hardware upgrades• Meet SLA’s avoiding penality• Ensure business isn’t lost due to poor performing

or unavailable system

Quantifiable benefit ofTuning a specific statement

Page 23: Larson - Resource Mapping

23

Step 4: Prioritize

If last step properly executed, this step is fairly straight forward

Allow’s DBA to cut through the clutter of potential new projects, investigations, and trials.

Better justification for priorities. (e.g. We aren’t working on your problem since this other has a higher demonstrable business impact)

Page 24: Larson - Resource Mapping

24

Step 5: Assign

Assign the right people to the problem• Log_buffer waits• Network issues• Same query 10,000/hour

Enabled by Full View principle Avoid finger-pointing by accurately

assigning quickly

Page 25: Larson - Resource Mapping

25

Resource Mapping Methodology

RMM

Wait Based Tuning Network, Storage, Application, Web, etc.

Page 26: Larson - Resource Mapping

26

Web Server

Custom Biz Logic

Network

Database Server

Storage Box

Software Layers

Silo Monitoring

Each team uses their own tool to partially monitor their non-Oracle layers. No view across layers. Management has no clear view.

Web Team

Custom App Team

Network Team

Database/OS Teams

Storage/OS Teams

IT Management

Business Management

Sitescope

Often No Commercial Tools

HP Openview

Wait-based tuning

EMC Control Center

LIMITED VIEW

LIMITED VIEW

Page 27: Larson - Resource Mapping

27

The Solution - Integrated Vision

Web Server

Custom Biz Logic

Network

Database Server

Storage Box

Web Team

Custom App Team

Network Team

Database/OS Teams

Storage/OS Teams

All teams see a complete picture of all layers and dependencies. Enables more efficient “Umbrella” solution.

RMM across the stack

IT Management

Business Management

Page 28: Larson - Resource Mapping

28

RMM Achieved Business BenefitsRMM Does: Business Benefit:35% reduction in database capacity requirement

Reduce capital investment Avoid unnecessary additionsRecovers un-used capacity

Standardizes “expert” analysis ability across entire DBA team

Reduce training & consulting costs

Quantifies performance impact

Focus tuning efforts on biggest business impacts

Identifies problem Root Cause and resolution

Assign human resources and responsibility

Anticipates + resolves performance bottlenecks

Maintain SLA and end user performance

Page 29: Larson - Resource Mapping

29

Example 1: Problem Observed

Critical situation: Secure Service Center application performance unsatisfactory• Response time between 2400 and 9000

seconds• Very high network traffic (3x—4x normal),

indicating time-outs and user refreshes• “CritSit” declared: major effort to resolve

problem

Page 30: Larson - Resource Mapping

30

Observations using Resource Mapping Methods

Lib cache pin wait

Lib cache load lock

Notice scale: > 8000 secs

1: Identify accumulated Waits 2: Identify specific resources used

Page 31: Larson - Resource Mapping

31

Results

Library cache pin nearly unobservable

Library cache load lock no longer observable

Notice scale: < 1400 secs max

Page 32: Larson - Resource Mapping

32

Results

Response time improvement from 8000 seconds (worst case) to 900 seconds

Variance improvement:• Before: response time 2400 - 8000 sec• After: response time 800 - 900 sec

Page 33: Larson - Resource Mapping

33

Example 2: Performance Drain – Identify the Source

Slow response reported DBA and database focus

of delays Database problem?

No – SQL*Net Message identified as source of delay

2nd highest wait event

Page 34: Larson - Resource Mapping

34

RMM Drill Down identifies source of problem

Single application generates all SQL*Net Messages

App on same server as Oracle!

Answer: Misconfiguration – TCP/IP

used within server Change to IPC, eliminate

NIC traffic and 30% of wait time

Solution requires knowing: Which SQL, What Wait Time, Which Resource

Page 35: Larson - Resource Mapping

35

Example 3: Scattered Reads Situation: LINS06 database - Hourly profile identifies high

wait anomaly 3-10x higher than other periods – requires investigation

wait time42,000 seconds

10:00-11:00

Page 36: Larson - Resource Mapping

36

Drill Down to Key RMM Parameters

Db file scattered

reads

Db file scattered reads

Notice scale: > 6000 secs

Page 37: Larson - Resource Mapping

37

Conclusion

Look for what has an impact Resource Mapping is more that Wait

Time – Analysis must include: • SQL level granularity• Full Resource granularity

Isolating the SQL and Resource allows you to find and fix the Root Cause

DBAs can have an impact and be heroes!

Page 38: Larson - Resource Mapping

38

Thank you for coming

Matt Larson

Contact Information• [email protected]• 303-938-8282 ext. 110• Company website

www.confio.com