OOW13 JB KP ASH Deep Dive

Copyright © 2013, Oracle and/or its affiliates. All rights reserved. 1

Description: Joint session with JB from Oracle at OOW13 (Oracle OpenWorld 2013)

Transcript of OOW13 JB KP ASH Deep Dive

Page 1: OOW13 JB KP ASH Deep Dive


Page 2: OOW13 JB KP ASH Deep Dive

ASH Deep Dive: Advanced Performance Analysis Tips

John Beresniewicz, Oracle America

Kellyn Pot’vin, Enkitec

Page 3: OOW13 JB KP ASH Deep Dive


Safe Harbor Statement

The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle’s products remains at the sole discretion of Oracle.

Page 4: OOW13 JB KP ASH Deep Dive


Program Agenda

•  What is ASH?
•  How does ASH work?
•  How do we use ASH data?
•  Enterprise Manager: ASH Analytics
•  ASH in Action: Kellyn Pot’vin

Page 5: OOW13 JB KP ASH Deep Dive


What is ASH?

Page 6: OOW13 JB KP ASH Deep Dive


What is ASH?

•  Time-based sampling of foreground session state
   –  Highly multi-dimensional view of database activity, and therefore of DB Time
•  Observations of specific values of the (DB Time / time) function
   –  This function is called Average Active Sessions

An instrumentation mechanism that actualizes an important concept

Page 7: OOW13 JB KP ASH Deep Dive


Important Properties of ASH

•  Samples represent “snapshots” of session activity at the “same time”
   –  Not strictly true, since the sampler uses a latchless mechanism
•  Sampling times are independent of session activity
   –  Important, since otherwise sessions may be over- or under-sampled

Page 8: OOW13 JB KP ASH Deep Dive


Active Session Sampling
Time-based captures of state information for active sessions

[Figure: activity timelines for Session 1, Session 2 and Session 3, crossed by Sample_t1, Sample_t2 and Sample_t3]

Time  Session  State    Wait Class   SQL_ID         Object
----  -------  -------  -----------  -------------  ------
t1    1        ON CPU   null         53qkkf6yzc2x0  null
t1    2        WAITING  User I/O     0naxkcasaz162  EMP
t1    3        WAITING  User I/O     cs4qrt8kr3uhx  EMP
t2    3        WAITING  Application  4uh6zm2wg03mx  DEPT
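The sampling idea on this slide can be sketched in a few lines of Python. This is only an illustrative model: the `Interval` class, its fields and the timings are hypothetical and are not Oracle structures. Each sample emits one row per session that is active at the sample time, and nothing for idle sessions.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Interval:
    """One stretch of session activity (hypothetical model, not an Oracle structure)."""
    session: int
    start: float
    end: float
    state: str
    wait_class: Optional[str]


def sample(activity: List[Interval], t: float) -> List[Tuple]:
    """Return one (time, session, state, wait_class) row per session active at time t."""
    return [(t, iv.session, iv.state, iv.wait_class)
            for iv in activity if iv.start <= t < iv.end]


# Made-up activity arranged to reproduce the shape of the table above.
activity = [
    Interval(1, 0.5, 1.5, "ON CPU", None),
    Interval(2, 0.8, 1.2, "WAITING", "User I/O"),
    Interval(3, 0.9, 1.6, "WAITING", "User I/O"),
    Interval(3, 1.8, 2.4, "WAITING", "Application"),
]

# Three samples (t1, t2, t3); the third finds no active session and adds no rows.
rows = [row for t in (1.0, 2.0, 3.0) for row in sample(activity, t)]
for row in rows:
    print(row)
```

Four rows come out: three at t1 and one at t2, matching the table.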

Page 9: OOW13 JB KP ASH Deep Dive


ASH is Highly Multi-dimensional
Most of these represent useful investigative paths in some context

desc v$active_session_history

Name                           Null     Type
------------------------------ -------- ----------------
SAMPLE_ID                               NUMBER
SAMPLE_TIME                             TIMESTAMP(3)
IS_AWR_SAMPLE                           VARCHAR2(1)
SESSION_ID                              NUMBER
SESSION_SERIAL#                         NUMBER
SESSION_TYPE                            VARCHAR2(10)
FLAGS                                   NUMBER
USER_ID                                 NUMBER
. . .
93 rows selected

Page 10: OOW13 JB KP ASH Deep Dive


SQL Dimensions

SQL_ID                         VARCHAR2(13)
IS_SQLID_CURRENT               VARCHAR2(1)
SQL_CHILD_NUMBER               NUMBER
SQL_OPCODE                     NUMBER
SQL_OPNAME                     VARCHAR2(64)
FORCE_MATCHING_SIGNATURE       NUMBER
TOP_LEVEL_SQL_ID               VARCHAR2(13)
TOP_LEVEL_SQL_OPCODE           NUMBER
SQL_PLAN_HASH_VALUE            NUMBER
SQL_PLAN_LINE_ID               NUMBER
SQL_PLAN_OPERATION             VARCHAR2(30)
SQL_PLAN_OPTIONS               VARCHAR2(30)
SQL_EXEC_ID                    NUMBER
SQL_EXEC_START                 DATE
PLSQL_ENTRY_OBJECT_ID          NUMBER
PLSQL_ENTRY_SUBPROGRAM_ID      NUMBER
PLSQL_OBJECT_ID                NUMBER
PLSQL_SUBPROGRAM_ID            NUMBER
QC_INSTANCE_ID                 NUMBER
QC_SESSION_ID                  NUMBER
QC_SESSION_SERIAL#             NUMBER

Page 11: OOW13 JB KP ASH Deep Dive


Wait Event Dimensions

EVENT                          VARCHAR2(64)
EVENT_ID                       NUMBER
EVENT#                         NUMBER
SEQ#                           NUMBER
P1TEXT                         VARCHAR2(64)
P1                             NUMBER
P2TEXT                         VARCHAR2(64)
P2                             NUMBER
P3TEXT                         VARCHAR2(64)
P3                             NUMBER
WAIT_CLASS                     VARCHAR2(64)
WAIT_CLASS_ID                  NUMBER
WAIT_TIME                      NUMBER
SESSION_STATE                  VARCHAR2(7)
TIME_WAITED                    NUMBER

Page 12: OOW13 JB KP ASH Deep Dive


Application Dimensions
Instrumented applications can benefit greatly

SERVICE_HASH                   NUMBER
PROGRAM                        VARCHAR2(48)
MODULE                         VARCHAR2(48)
ACTION                         VARCHAR2(32)
CLIENT_ID                      VARCHAR2(64)
MACHINE                        VARCHAR2(64)
PORT                           NUMBER
ECID                           VARCHAR2(64)
CONSUMER_GROUP_ID              NUMBER
TOP_LEVEL_CALL#                NUMBER
TOP_LEVEL_CALL_NAME            VARCHAR2(64)
XID                            RAW(8)
REMOTE_INSTANCE#               NUMBER
TIME_MODEL                     NUMBER

Page 13: OOW13 JB KP ASH Deep Dive


How does ASH work?

Page 14: OOW13 JB KP ASH Deep Dive


ASH Key Architecture Concepts

•  In-memory ASH sampling:
   –  Dedicated background process: MMNL
   –  Circular SGA memory buffer: one writer, many readers
   –  Lean and robust mechanism: no locking or latching
   –  Default 1000 ms (1 sec) sampling interval
•  ASH sub-sampling to disk:
   –  Flush to AWR with snapshot or on emergency flush
   –  Default: 1-in-10 of the 1-sec samples are persisted
   –  Future: continuous sub-sampling

Session activity sampled efficiently into memory and onto disk

Page 15: OOW13 JB KP ASH Deep Dive

Reading / Writing in Opposite Directions

•  MMNL writes to the ASH circular buffer in one direction
•  Readers of V$ASH start at the current write pointer
•  Readers proceed through the buffer in the opposite direction to MMNL
•  Stop when current sample_id > last read sample_id
•  SELECT from V$ASH returns rows in recent-last order

[Figure: circular buffer with MMNL writing in one direction while reader SALLY moves from “SALLY start” to “SALLY finish” in the other]
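A toy model may help make the opposite-direction read concrete. It is a single-threaded Python sketch under stated assumptions: the `AshRing` class, its slot layout and the interleaving are invented for illustration and say nothing about the real SGA structures. The reader starts just behind the write pointer, walks backwards, and stops as soon as it meets a sample_id larger than the last one it read, which means the writer has lapped it.

```python
class AshRing:
    """Toy circular buffer: one writer moving forward, readers moving backward."""

    def __init__(self, capacity):
        self.slots = [None] * capacity
        self.next_write = 0  # slot the writer will fill next

    def write(self, sample_id, payload):
        self.slots[self.next_write] = (sample_id, payload)
        self.next_write = (self.next_write + 1) % len(self.slots)

    def read_backwards(self):
        """Yield entries newest-first, stopping if the writer has overtaken us."""
        pos = (self.next_write - 1) % len(self.slots)
        last_id = None
        for _ in range(len(self.slots)):
            entry = self.slots[pos]
            if entry is None or (last_id is not None and entry[0] > last_id):
                return  # empty slot, or a newer sample than the last one read
            yield entry
            last_id = entry[0]
            pos = (pos - 1) % len(self.slots)


# Quiet case: six writes into four slots, then a full read.
quiet = AshRing(4)
for sid in range(1, 7):
    quiet.write(sid, "row")
quiet_ids = [e[0] for e in quiet.read_backwards()]

# Busy case: the writer adds two samples while the reader is half-way through.
busy = AshRing(4)
for sid in range(1, 5):
    busy.write(sid, "row")
reader = busy.read_backwards()
seen = [next(reader)[0], next(reader)[0]]
busy.write(5, "row")
busy.write(6, "row")
seen += [e[0] for e in reader]
print(quiet_ids, seen)
```

In the busy case the reader returns only samples 4 and 3: the next slot it visits already holds sample 6, so it stops rather than return out-of-order data.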

Page 16: OOW13 JB KP ASH Deep Dive


Sampling Pseudo-code
(lean and mean, but there is a hole)

1) FOR ALL SESSION STATE OBJECTS
2)   IS SESSION CONNECTED?
       NO => NEXT SESSION
       YES:
3)   IS SESSION ACTIVE?
       NO => NEXT SESSION
       YES:
4)   MEMCPY SESSION STATE OBJ
5)   CHECK CONSISTENCY OF COPY WITH LIVE SESSION
6)   IS COPY CONSISTENT?
       YES: WRITE ASH ROW FROM COPY
       NO:  IF FIRST COPY, REPEAT STEPS 4-6
            ELSE => NEXT SESSION (NO ASH ROW WRITTEN)
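The pseudo-code above can be mimicked in Python to show where the “hole” comes from. This is a sketch under assumptions: sessions are plain dicts, the consistency check compares a hypothetical `seq` counter, and the `on_copy` hook exists only so the example can change the live session while it is being copied. None of these names come from Oracle.

```python
import copy


def sample_sessions(sessions, on_copy=None):
    """One sampler pass over all session state objects; returns the ASH rows written."""
    rows = []
    for s in sessions:                    # step 1
        if not s["connected"]:            # step 2
            continue
        if not s["active"]:               # step 3
            continue
        for attempt in (1, 2):            # first copy plus at most one retry
            snap = copy.copy(s)           # step 4: stand-in for the memcpy
            if on_copy:
                on_copy(s, attempt)       # lets the example mutate the live session
            if snap["seq"] == s["seq"]:   # steps 5-6: copy still matches live state
                rows.append(snap)
                break
        # leaving the retry loop without a break writes no row: the "hole"
    return rows


def churn(session, attempt):
    """Hypothetical workload: session 3 always changes, session 4 only on the first copy."""
    if session["sid"] == 3 or (session["sid"] == 4 and attempt == 1):
        session["seq"] += 1


sessions = [
    {"sid": 1, "connected": True, "active": True, "seq": 0},
    {"sid": 2, "connected": True, "active": False, "seq": 0},
    {"sid": 3, "connected": True, "active": True, "seq": 0},
    {"sid": 4, "connected": True, "active": True, "seq": 0},
]
rows = sample_sessions(sessions, on_copy=churn)
print([r["sid"] for r in rows])
```

Session 2 is skipped because it is idle, session 4 is captured on the retry, and session 3 is active but produces no row because its state changed during both copies.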

Page 17: OOW13 JB KP ASH Deep Dive


Default Settings

•  Sampling interval = 1000 ms = 1 sec
•  Disk filter ratio = 10 = 1 in 10 samples written to AWR
•  ASH buffer size:
   –  Min( Max(5% shared pool, 2% SGA), 2MB per CPU )
   –  Absolute max of 256MB

These are carefully chosen for maximum general utility

NOTE: the MMNL sampler session is not sampled
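The buffer-size rule is easy to misread, so here it is as arithmetic. The function below simply transcribes the formula from the slide; the function name, its parameters and the example sizes are assumptions made for illustration, and the real sizing logic may include details the slide does not show.

```python
MB = 1024 * 1024


def default_ash_buffer_bytes(shared_pool_bytes, sga_bytes, cpu_count):
    """Min(Max(5% shared pool, 2% SGA), 2MB per CPU), capped at 256MB."""
    by_memory = max(0.05 * shared_pool_bytes, 0.02 * sga_bytes)
    by_cpu = 2 * MB * cpu_count
    return int(min(by_memory, by_cpu, 256 * MB))


# 1 GB shared pool, 8 GB SGA, 16 CPUs: the per-CPU term (32 MB) is the smallest.
small = default_ash_buffer_bytes(1024 * MB, 8192 * MB, 16)

# 16 GB shared pool, 64 GB SGA, 256 CPUs: the absolute 256 MB cap applies.
large = default_ash_buffer_bytes(16384 * MB, 65536 * MB, 256)

print(small // MB, large // MB)
```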

Page 18: OOW13 JB KP ASH Deep Dive


Control Parameters

•  _ash_size : size of ASH buffer in bytes
   –  K/M notation works (e.g. 200M)
•  _ash_sampling_interval : in milliseconds
   –  Min = 100, Max = 10,000
•  _ash_disk_filter_ratio : every Nth sample to AWR
   –  MOD(sample_id, N) = 0 where N = disk filter ratio
•  _sample_all : samples idle and active sessions

(geeks want underscores)
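The disk filter rule, MOD(sample_id, N) = 0, can be shown directly. The helper name below is hypothetical; it only applies the modulus test from the slide to a list of sample ids.

```python
def persisted_sample_ids(sample_ids, disk_filter_ratio=10):
    """Keep the sample ids that satisfy MOD(sample_id, N) = 0."""
    return [sid for sid in sample_ids if sid % disk_filter_ratio == 0]


# Thirty consecutive in-memory samples, default ratio of 10.
kept = persisted_sample_ids(range(101, 131))
print(kept)
```

With the default ratio, three of the thirty samples survive, so each persisted row stands for roughly ten seconds of activity rather than one.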

Page 19: OOW13 JB KP ASH Deep Dive


V$ASH_INFO
New in 11.2 (but unfortunately undocumented)

desc v$ash_info

Name                           Null     Type
------------------------------ -------- --------------
TOTAL_SIZE                              NUMBER
FIXED_SIZE                              NUMBER
SAMPLING_INTERVAL                       NUMBER
OLDEST_SAMPLE_ID                        NUMBER
OLDEST_SAMPLE_TIME                      TIMESTAMP(9)
LATEST_SAMPLE_ID                        NUMBER
LATEST_SAMPLE_TIME                      TIMESTAMP(9)
SAMPLE_COUNT                            NUMBER
SAMPLED_BYTES                           NUMBER
SAMPLER_ELAPSED_TIME                    NUMBER
DISK_FILTER_RATIO                       NUMBER
AWR_FLUSH_BYTES                         NUMBER
AWR_FLUSH_ELAPSED_TIME                  NUMBER
AWR_FLUSH_COUNT                         NUMBER
AWR_FLUSH_EMERGENCY_COUNT               NUMBER
DROPPED_SAMPLE_COUNT                    NUMBER

Slide callouts: compute buffer time window size; compute average time per sample
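The two computations named in the callouts might look like the following. This is a guess at the intended arithmetic, not something the slide spells out: it assumes the buffer window is LATEST_SAMPLE_TIME minus OLDEST_SAMPLE_TIME, and that average time per sample is SAMPLER_ELAPSED_TIME divided by SAMPLE_COUNT. The row is a plain dict with made-up values, and the unit of SAMPLER_ELAPSED_TIME is deliberately left unstated.

```python
from datetime import datetime


def buffer_window_seconds(info):
    """Time span currently held in the in-memory ASH buffer (assumed formula)."""
    return (info["LATEST_SAMPLE_TIME"] - info["OLDEST_SAMPLE_TIME"]).total_seconds()


def avg_sampler_time_per_sample(info):
    """Sampler cost per sample, in whatever unit SAMPLER_ELAPSED_TIME uses (assumed formula)."""
    return info["SAMPLER_ELAPSED_TIME"] / info["SAMPLE_COUNT"]


# Illustrative values only; they do not come from a real system.
info = {
    "OLDEST_SAMPLE_TIME": datetime(2013, 9, 23, 8, 0, 0),
    "LATEST_SAMPLE_TIME": datetime(2013, 9, 23, 14, 30, 0),
    "SAMPLE_COUNT": 23400,
    "SAMPLER_ELAPSED_TIME": 4680000,
}

window = buffer_window_seconds(info)
per_sample = avg_sampler_time_per_sample(info)
print(window / 3600, per_sample)
```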

Page 20: OOW13 JB KP ASH Deep Dive


ASH is Robust when CPU-constrained

1.  ASH sampler is very efficient and does not lock
    –  Should complete a sample within a single CPU slice
2.  After sampling, the sampler computes the next scheduled sample time and sleeps until then
3.  Upon scheduled wake-up, it waits for CPU (runq) and samples again
    –  CPU-bound sample times are shifted by one runq, but intervals stay close to 1 second

(These are precisely the times when reliable data is necessary)

Page 21: OOW13 JB KP ASH Deep Dive


ASH Sampler and Run-queue
Sampling interval is consistent under CPU-starvation

[Figure: timeline with scheduled sample times S_t0, S_t1, S_t2 and actual sample times A_t0, A_t1, A_t2; each scheduled wake-up is followed by a run-queue wait, then the sample, then “sleep until next time”]
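A small calculation shows why run-queue delay shifts the samples without stretching the interval. The sketch assumes that each wake-up is scheduled from the previous scheduled time rather than from the actual sample time; that reading is an assumption, and the delays are invented numbers.

```python
def actual_sample_times(start, interval, runq_delays):
    """Actual sample time = scheduled wake-up time + time spent on the run queue."""
    times = []
    scheduled = start
    for delay in runq_delays:
        times.append(scheduled + delay)
        scheduled += interval  # ASSUMPTION: schedule advances from the prior schedule
    return times


# A CPU-starved host: every wake-up waits about 0.3 s for a CPU.
times = actual_sample_times(start=0.0, interval=1.0, runq_delays=[0.30, 0.35, 0.25, 0.30])
gaps = [round(b - a, 2) for a, b in zip(times, times[1:])]
print(gaps)
```

Every sample is about 0.3 seconds late, yet the gaps between samples stay within 0.1 seconds of the nominal interval, because only the difference between consecutive delays shows up in the gap.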

Page 22: OOW13 JB KP ASH Deep Dive


The ASH “Fix-up”

•  ASH column values may be unknown at sampling time
   –  TIME_WAITED: session is still waiting
   –  PLAN_HASH: session is still optimizing SQL
   –  GC events: event details unknown at event initiation
•  ASH “fixes up” data during subsequent sampling
   –  TIME_WAITED fixed up in first sample after event completes
   –  Long events: last sample gets correct TIME_WAITED (all others 0)
•  Querying V$ASH may return un-fixed rows
   –  Should not be a problem generally

A unique and very important feature
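The fix-up behaviour for TIME_WAITED can be imitated with a single wait event and a list of sample times. This is a simplified model with hypothetical names and millisecond units chosen for readability; it follows only the two rules stated on the slide.

```python
def run_sampler(event_start, event_end, sample_times):
    """Sample one wait event; fix up the last row once the event has completed."""
    rows = []
    pending = None  # last row written while the event was still in progress
    for t in sample_times:
        if pending is not None and event_end <= t:
            pending["time_waited"] = event_end - event_start  # the fix-up
            pending = None
        if event_start <= t < event_end:
            pending = {"sample_time": t, "time_waited": 0}
            rows.append(pending)
    return rows


# A 2800 ms wait (400 ms to 3200 ms) sampled once per second.
fixed = run_sampler(400, 3200, [1000, 2000, 3000, 4000])

# The same wait queried before the sampler has run again: rows are still un-fixed.
unfixed = run_sampler(400, 3200, [1000, 2000, 3000])

print([r["time_waited"] for r in fixed], [r["time_waited"] for r in unfixed])
```

The long event appears in three samples, only the last of which carries the full 2800 ms. A query issued before the next sampler pass sees zeros in all three.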

Page 23: OOW13 JB KP ASH Deep Dive


How do we use ASH data?

Page 24: OOW13 JB KP ASH Deep Dive


How do we use ASH data?

•  Estimate DB Time and Average Active Sessions
   –  For specific time intervals
   –  Decomposed and filtered by many ASH dimensions
•  Investigate tuning opportunities
   –  Excesses of DB Time in tunable areas
•  ASH Forensics
   –  Figure out “what happened to SID?”

Page 25: OOW13 JB KP ASH Deep Dive


ASH Math: Estimating DB Time from ASH

•  Each ASH row counts for :INTERVAL of active session time
•  Default for :INTERVAL is 1 second (1000 ms)
•  Therefore COUNT(*) = DB Time in seconds
•  This is what I call “ASH Math”
•  An estimate, because it is computed over a sample of true reality

Page 26: OOW13 JB KP ASH Deep Dive


ASH Math and DB Time
The count of sampled rows is an (unbiased) estimate of DB Time

[Figure: DB Time estimated by COUNT(ASH sampled rows)]
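“ASH Math” amounts to counting rows and multiplying by the sampling interval. The Python sketch below applies that to four illustrative rows; the dict layout and function names are assumptions, and the same counting can be grouped by any ASH dimension.

```python
from collections import Counter


def db_time_seconds(rows, interval_s=1.0):
    """Each sampled row stands for one sampling interval of active session time."""
    return len(rows) * interval_s


def db_time_by(rows, key, interval_s=1.0):
    """Break the estimate down by one dimension, e.g. wait class or SQL_ID."""
    counts = Counter(r[key] for r in rows)
    return {k: n * interval_s for k, n in counts.items()}


# Four sampled rows (illustrative values).
rows = [
    {"session": 1, "state": "ON CPU", "wait_class": None},
    {"session": 2, "state": "WAITING", "wait_class": "User I/O"},
    {"session": 3, "state": "WAITING", "wait_class": "User I/O"},
    {"session": 3, "state": "WAITING", "wait_class": "Application"},
]

total = db_time_seconds(rows)
by_class = db_time_by(rows, "wait_class")
print(total, by_class)
```

Four rows at the default interval give an estimated 4 seconds of DB Time, of which 2 seconds fall under User I/O.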

Page 27: OOW13 JB KP ASH Deep Dive


Computing Average Active Sessions

•  AAS = DELTA(DB TIME) / DELTA(elapsed_time)
   –  Over some time interval(s) of sampled workload
•  SUM(:sampling_interval) / [ MAX(sample_time) – MIN(sample_time) ]
   –  Normalized to common time units, e.g. seconds
•  COUNT(*) / [ MAX(sample_id) – MIN(sample_id) ]
   –  This works for the default sampling interval and a single time interval

The centerpiece measure for EM Activity charts and ASH Analytics
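The third formula, COUNT(*) divided by the sample_id range, can be checked on synthetic data. The sketch assumes the default 1-second interval with sample_id increasing by one per sample, as the slide requires; the function name and the rows are made up.

```python
def average_active_sessions(rows):
    """COUNT(*) / (MAX(sample_id) - MIN(sample_id)), assuming 1-second samples."""
    ids = [r["sample_id"] for r in rows]
    elapsed = max(ids) - min(ids)
    if elapsed == 0:
        raise ValueError("need rows from at least two distinct samples")
    return len(rows) / elapsed


# Two sessions active in every one of 601 consecutive samples (about ten minutes).
rows = [{"sample_id": sid, "session": s} for sid in range(0, 601) for s in (1, 2)]
aas = average_active_sessions(rows)
print(round(aas, 3))
```

The result is very slightly above 2 because 601 samples span only 600 intervals; the fencepost effect fades as the window grows.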

Page 28: OOW13 JB KP ASH Deep Dive


Bad ASH Math and TIME_WAITED
These mistakes are very common and very wrong

AVG(TIME_WAITED)
This does not estimate average event latencies, because sampling is biased toward longer events.

SUM(TIME_WAITED)
This does not compute total wait time in the database, since ASH does not contain all waits.
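Both mistakes show up in a deterministic toy workload. In the sketch below one session waits continuously, in a repeating pattern of three 100 ms waits followed by one 1000 ms wait, and is sampled once per second. The workload, the names and the simplification that every sampled row carries the full duration of the wait in progress are all assumptions made for illustration.

```python
from bisect import bisect_right
from itertools import accumulate

INTERVAL_MS = 1000

# Back-to-back waits in milliseconds: three short waits, then one long one, repeated.
durations = [100, 100, 100, 1000] * 11
starts = [0] + list(accumulate(durations))[:-1]  # start time of each wait

true_avg = sum(durations) / len(durations)

# Thirteen samples, one per second; each records the wait in progress at that instant.
sample_times = [INTERVAL_MS * k for k in range(1, 14)]
sampled = [durations[bisect_right(starts, t) - 1] for t in sample_times]

sampled_avg = sum(sampled) / len(sampled)         # the AVG(TIME_WAITED) mistake
sampled_sum = sum(sampled)                        # the SUM(TIME_WAITED) mistake
count_estimate = len(sampled) * INTERVAL_MS       # ASH Math: COUNT(*) x interval

print(true_avg, round(sampled_avg, 1), sampled_sum, count_estimate)
```

The true average latency is 325 ms, but the sampled average is about 792 ms, because ten of the thirteen samples land inside a long wait. The sum of sampled waits (10,300 ms) also misses the 13,000 ms the session actually spent waiting, while the row count times the interval recovers it.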

Page 29: OOW13 JB KP ASH Deep Dive


ASH Timing for Nano-operations

•  Some important operations are still too frequent and short-lived for timing
   –  E.g. no wait event for “bind” operations
•  A session-level bit vector is updated in binary fashion before/after such operations
   –  Much cheaper than timer calls
•  The session bit vector is sampled into ASH
•  “ASH Math” used to estimate time spent in un-timed transient operations

Magic trick: timing what cannot be timed
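The bit-vector trick reduces to counting the samples in which a given bit is set. In this sketch the bit positions and the sampled values are hypothetical; only the counting logic reflects the slide.

```python
# Hypothetical bit positions, chosen for illustration only.
IN_PARSE = 0b01
IN_BIND = 0b10


def estimated_seconds_in(sampled_flags, bit, interval_s=1.0):
    """ASH Math on a flag: samples with the bit set, times the sampling interval."""
    return sum(1 for flags in sampled_flags if flags & bit) * interval_s


# The session's bit vector as seen in six consecutive samples (made-up values).
sampled_flags = [0b00, 0b10, 0b11, 0b01, 0b10, 0b00]

bind_s = estimated_seconds_in(sampled_flags, IN_BIND)
parse_s = estimated_seconds_in(sampled_flags, IN_PARSE)
print(bind_s, parse_s)
```

No timer is ever started or stopped around the operation itself; the estimate comes entirely from how often the sampler happens to find the bit set.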

Page 30: OOW13 JB KP ASH Deep Dive


“ON CPU” and ASH

•  ASH session status ‘ON CPU’ is derived, not observed
   –  Session is in a database call
   –  Session is NOT in a wait event (idle or non-idle)
•  Un-instrumented waits => ‘ON CPU’
   –  These are bugs and should be rare, but have happened
•  Sessions on run queue may be ‘WAITING’ or ‘ON CPU’
   –  Depends on state prior to going onto run queue

ASH CPU and Time Model CPU don’t always agree

Page 31: OOW13 JB KP ASH Deep Dive


Enterprise Manager: ASH Analytics

Page 32: OOW13 JB KP ASH Deep Dive

Origin: EM Top Activity

•  Display AAS by wait class over time
•  5-minute Time Selector for details
•  Top SQL and Top Sessions
   –  Broken down by wait class
   –  Additional fact columns
•  User-selectable Top dimension

[Figure: EM Top Activity page, Average Active Sessions chart]

Page 33: OOW13 JB KP ASH Deep Dive

Top Activity: Design Issues

•  Top Lists not graphically comparable
   –  “% Activity” depends on sample count
•  Time Series by Wait Class only
   –  What about SQL, User, etc.?
•  Lots of wasted visual real estate

Page 34: OOW13 JB KP ASH Deep Dive


EM ASH Analytics

•  Logical extension of EM Top Activity page
•  Average Active Sessions (AAS) over time
   –  Decomposed by user-selectable ASH dimension (“parent” dimension)
•  “Top” Lists by two other user-selectable ASH dimensions
   –  With breakdown by “parent” dimension
•  ASH Analytics Loadmap
   –  AAS decomposed into Treemap of up to 3 ASH dimensions
   –  Investigate skew and/or balance of load over dimension combinations
   –  Investigate possible cause-effect relationships

Flexible multi-dimensional ASH-based performance analysis tool

Page 35: OOW13 JB KP ASH Deep Dive

EM ASH Analytics

•  Load (AAS) over time with time selector
   –  Selected time broken down by ASH dimension
•  2 “Top” lists by other dimensions
   –  Broken down by parent dimension also
•  4 Charts with a shared dimension
   –  Extremely powerful

[Figure: EM ASH Analytics page, Average Active Sessions chart]

Page 36: OOW13 JB KP ASH Deep Dive


Page 37: OOW13 JB KP ASH Deep Dive

EM ASH Analytics: Loadmap
ASH Treemap Visualization

•  Space-filling, scales well
•  Decompose load (AAS) by multiple ASH dimensions
•  Hierarchical decomposition
•  Some hierarchies natural, others investigative

Page 38: OOW13 JB KP ASH Deep Dive


Page 39: OOW13 JB KP ASH Deep Dive


Tanel Poder Consultant, Enkitec


Active Session History has radically changed the performance diagnosis of Oracle Databases, by design. With ASH you have detailed performance data always and immediately available… This translates to much faster problem solution times and also more accurate diagnosis…

Page 40: OOW13 JB KP ASH Deep Dive

About Us

I am…
•  Oracle ACE Director
•  Sr. Technical Consultant, Enkitec

Enkitec is…
•  Oracle Platinum Partner specializing in:
   –  Oracle Exadata
   –  Oracle Database, including RAC
   –  Oracle Database Performance Tuning
   –  Oracle APEX and so much more!

Page 41: OOW13 JB KP ASH Deep Dive

The Consultant’s Challenge

•  “Hybrid” workload environment:
   –  Transactional, ETL, Reporting
•  Upgraded to 11g in previous year
•  Consistent degradation since upgrade
   –  ETL down from 400 “businesses” per hour to 200-300
•  ETL code review and enhancement in the works
•  “What can you do for us now outside of that effort?”

Goal: Load 700 businesses per hour!!

Page 42: OOW13 JB KP ASH Deep Dive

Oracle Tools of the Trade

•  AWR Reports:
   –  First offered by onsite DBA, always available
   –  “Averaging” effect of large snapshot times hiding issues
•  ASH Reports:
   –  Help identify problem times with finer granularity
   –  Target reports to problem times, gives clearer picture
•  Enterprise Manager 12c
   –  Used to enhance ASH findings and do further research
   –  Top Activity, SQL Details, ASH Analytics

Page 43: OOW13 JB KP ASH Deep Dive

Why AWR Wasn’t the Answer

•  The “problem” was not visible
•  We expect to use CPU and to do I/O
•  Did not want to alter AWR snapshot timing but needed finer-grained time view
•  Problem not related to workload change or data volumes, ETL just degrading over time

Page 44: OOW13 JB KP ASH Deep Dive

Why ASH Was…

•  Exposed competing PL/SQL procedures
•  More definitive breakdown of data
•  Zero in on problem time
•  Session-level information
•  Interested in impacts, not frequencies

Page 45: OOW13 JB KP ASH Deep Dive

ASH Report Targets CPU Spike

•  Breakdown by the minute, by interval

[Figure: ASH report activity over time; CPU spikes in a four-minute period]

Page 46: OOW13 JB KP ASH Deep Dive

ASH Top SQL Exposes Oddities

•  STATS_ADMIN??
•  SQL Analyze??

Where does this SQL originate?

Page 47: OOW13 JB KP ASH Deep Dive

EM Exposes Problem SQL Profiles

•  EM Search SQL found multiple plans for critical ETL statements with vastly different performance (?)
•  Click through the bad plan to expose existence of a SQL Profile
•  Oops, profiles are supposed to fix plans!

Page 48: OOW13 JB KP ASH Deep Dive

What Caused This?

•  High-profile environment, very sensitive to change
•  Stats collection using custom wrapper over deprecated Oracle package dating back prior to 9i (DBMS_ADMIN)
•  Also using 11g stats collection (DBMS_STATS)
•  DBMS_ADMIN was deprecated for a reason!
•  Analysis of object stats providing poor data to CBO
•  Other automated maintenance window tasks were expensive and competing for resources at exactly the wrong time (i.e. ETL time)

Page 49: OOW13 JB KP ASH Deep Dive

Steps to Correct

•  Migrated to DBMS_STATS for all stats collection
   –  Disabled jobs using custom wrapper over DBMS_ADMIN
•  Removed SQL Profiles impacting bad ETL plans
•  Additional steps taken:
   –  Migrated select b-tree indexes to bitmap indexes, which also reclaimed much-needed disk space
   –  Continued to review ASH, AWR and session SQL performance for improvement

Page 50: OOW13 JB KP ASH Deep Dive

Victory within reach…

•  Throughput improvement after stats gathering changes

Page 51: OOW13 JB KP ASH Deep Dive

Where They Are Today…

•  With further physical and logical tuning:

750!! GOAL ACHIEVED

Page 52: OOW13 JB KP ASH Deep Dive


Page 53: OOW13 JB KP ASH Deep Dive


A SHORT SEQUENCE OF USING THE TOOL ON A REAL SYSTEM

JB’S ASH ANALYTICS ADVENTURE

Pages 54 to 65: OOW13 JB KP ASH Deep Dive

[Image-only slides; no text captured in this transcript]