Analysing and troubleshooting Parallel Execution IT Tage 2015

44
ANALYSING AND TROUBLESHOOTING PARALLEL EXECUTION Randolf Geist www.oracle-performance.de http://oracle-randolf.blogspot.com randolf.geist@oracle- performance.de

Transcript of Analysing and troubleshooting Parallel Execution IT Tage 2015

Page 1: Analysing and troubleshooting Parallel Execution IT Tage 2015

ANALYSING AND TROUBLESHOOTING

PARALLEL EXECUTIONRandolf Geist

www.oracle-performance.dehttp://oracle-randolf.blogspot.com

[email protected]

Page 2: Analysing and troubleshooting Parallel Execution IT Tage 2015

WHO AM I Independent consultant

Performance Troubleshooting In-house workshops

Cost-Based Optimizer Performance By Design

Oracle ACE Director

Member of OakTable Network

Page 3: Analysing and troubleshooting Parallel Execution IT Tage 2015

AGENDA Parallel Execution introduction

Major challenges Parallel unfriendly examples Distribution skew examples

How to measure distribution of work

Fixing work distribution skew

Page 4: Analysing and troubleshooting Parallel Execution IT Tage 2015

INTRODUCTION Oracle Database Enterprise Edition

includes the powerful Parallel Execution feature that allows spreading the processing of a single SQL statement execution across multiple worker processes

The feature is fully integrated into the Cost Based Optimizer as well as the execution runtime engine and automatically distributes the work across the so called Parallel Workers

Page 5: Analysing and troubleshooting Parallel Execution IT Tage 2015

INTRODUCTION Simple generic parallelization example

Task: Compute sum of 8 numbers

1+8=9, 9+7=16, 16+9=25,...

1+8+7+9+6+2+6+3= ???

n=8 numbers, 7 computation steps requiredSerial execution: 7 time units

Page 6: Analysing and troubleshooting Parallel Execution IT Tage 2015

INTRODUCTIONSimple generic parallelization example

4 workers

3+ time units

1 + 8 = 9

9 + 7 = 16

6 + 2 = 8

6 + 3 = 9

9 + 16 = 25

8 + 9 = 17

25 + 17 = 42

Coordinator

Page 7: Analysing and troubleshooting Parallel Execution IT Tage 2015

INTRODUCTION

Parallel Execution doesn’t mean “work smarter”

You’re actually willing to accept to “work harder”

Could also be called: “Brute force” approach

Page 8: Analysing and troubleshooting Parallel Execution IT Tage 2015

INTRODUCTION

So with Parallel Execution there might be the problem that it doesn’t work “hard

enough”

Page 9: Analysing and troubleshooting Parallel Execution IT Tage 2015

INTRODUCTION Two major challenges

Can the given task be divided into sub-tasks that can efficiently and independently be processed by the workers? (“Parallel Unfriendly”)

Can all assigned workers be kept busy all the time?

Page 10: Analysing and troubleshooting Parallel Execution IT Tage 2015

INTRODUCTION More reasons why Oracle Parallel

Execution might not reduce runtime as expected:

Parallel DML/DDL gotchas

“Downgrade” at execution time (less workers assigned than expected)

Overhead of Parallel Execution implementation

Limitations of Parallel Execution implementation

Page 11: Analysing and troubleshooting Parallel Execution IT Tage 2015

PARALLEL DML/DDLParallel DML / DDL gotchas

DML / DDL part can run parallel or serial

Query part can run parallel or serial

Page 12: Analysing and troubleshooting Parallel Execution IT Tage 2015

PARALLEL DDL / DML PLANS

Parallel CTAS but serial query-------------------------------------------------------------------------| Id | Operation | Name | TQ |IN-OUT| PQ Distrib |-------------------------------------------------------------------------| 0 | CREATE TABLE STATEMENT | | | | || 1 | PX COORDINATOR | | | | || 2 | PX SEND QC (RANDOM) | :TQ10001 | Q1,01 | P->S | QC (RAND) || 3 | LOAD AS SELECT | T4 | Q1,01 | PCWP | || 4 | PX RECEIVE | | Q1,01 | PCWP | || 5 | PX SEND ROUND-ROBIN| :TQ10000 | | S->P | RND-ROBIN ||* 6 | HASH JOIN | | | | || 7 | TABLE ACCESS FULL| T2 | | | || 8 | TABLE ACCESS FULL| T2 | | | |-------------------------------------------------------------------------

Page 13: Analysing and troubleshooting Parallel Execution IT Tage 2015

PARALLEL DDL / DML PLANS

Serial CTAS but parallel query--------------------------------------------------------------------------| Id | Operation | Name | TQ |IN-OUT| PQ Distrib |--------------------------------------------------------------------------| 0 | CREATE TABLE STATEMENT | | | | || 1 | LOAD AS SELECT | T4 | | | || 2 | PX COORDINATOR | | | | || 3 | PX SEND QC (RANDOM) | :TQ10002 | Q1,02 | P->S | QC (RAND) ||* 4 | HASH JOIN BUFFERED | | Q1,02 | PCWP | || 5 | PX RECEIVE | | Q1,02 | PCWP | || 6 | PX SEND HASH | :TQ10000 | Q1,00 | P->P | HASH || 7 | PX BLOCK ITERATOR | | Q1,00 | PCWC | || 8 | TABLE ACCESS FULL| T2 | Q1,00 | PCWP | || 9 | PX RECEIVE | | Q1,02 | PCWP | || 10 | PX SEND HASH | :TQ10001 | Q1,01 | P->P | HASH || 11 | PX BLOCK ITERATOR | | Q1,01 | PCWC | || 12 | TABLE ACCESS FULL| T2 | Q1,01 | PCWP | |--------------------------------------------------------------------------

Page 14: Analysing and troubleshooting Parallel Execution IT Tage 2015

INTRODUCTION Other reasons why Oracle Parallel

Execution might not scale as expected:

Parallel DML/DDL gotchas

“Downgrade” at execution time (less workers assigned than expected)

Overhead of Parallel Execution implementation

Limitations of Parallel Execution implementation

Page 15: Analysing and troubleshooting Parallel Execution IT Tage 2015

IMPLEMENTATION LIMITATIONS

“Parallel Forced Serial” Example

------------------------------------------------------------------------------| Id | Operation | Name | TQ |IN-OUT| PQ Distrib |------------------------------------------------------------------------------| 0 | SELECT STATEMENT | | | | || 1 | PX COORDINATOR FORCED SERIAL| | | | || 2 | PX SEND QC (RANDOM) | :TQ10003 | Q1,03 | P->S | QC (RAND) || 3 | HASH UNIQUE | | Q1,03 | PCWP | || 4 | PX RECEIVE | | Q1,03 | PCWP | || 5 | PX SEND HASH | :TQ10002 | Q1,02 | P->P | HASH ||* 6 | HASH JOIN BUFFERED | | Q1,02 | PCWP | || 7 | PX RECEIVE | | Q1,02 | PCWP | || 8 | PX SEND HASH | :TQ10000 | Q1,00 | P->P | HASH || 9 | PX BLOCK ITERATOR | | Q1,00 | PCWC | || 10 | TABLE ACCESS FULL | T2 | Q1,00 | PCWP | || 11 | PX RECEIVE | | Q1,02 | PCWP | || 12 | PX SEND HASH | :TQ10001 | Q1,01 | P->P | HASH || 13 | PX BLOCK ITERATOR | | Q1,01 | PCWC | || 14 | TABLE ACCESS FULL | T2 | Q1,01 | PCWP | |------------------------------------------------------------------------------

Page 16: Analysing and troubleshooting Parallel Execution IT Tage 2015

CHALLENGES Two major challenges

Can the given task be divided into sub-tasks that can efficiently and independently be processed by the workers? (“Parallel Unfriendly”)

Can all assigned workers be kept busy all the time?

Page 17: Analysing and troubleshooting Parallel Execution IT Tage 2015

PARALLEL UNFRIENDLYselect median(id) from t2;

-----------------------------------------------------------------------| Id | Operation | Name | TQ |IN-OUT| PQ Distrib |-----------------------------------------------------------------------| 0 | SELECT STATEMENT | | | | || 1 | SORT GROUP BY | | | | || 2 | PX COORDINATOR | | | | || 3 | PX SEND QC (RANDOM)| :TQ10000 | Q1,00 | P->S | QC (RAND) || 4 | PX BLOCK ITERATOR | | Q1,00 | PCWC | || 5 | TABLE ACCESS FULL| T2 | Q1,00 | PCWP | |-----------------------------------------------------------------------

Page 18: Analysing and troubleshooting Parallel Execution IT Tage 2015

PARALLEL UNFRIENDLYcreate table t3 parallel as select * from t2 where rownum <= 10000000;-----------------------------------------------------------------------------| Id | Operation | Name | TQ |IN-OUT| PQ Distrib |-----------------------------------------------------------------------------| 0 | CREATE TABLE STATEMENT | | | | || 1 | PX COORDINATOR | | | | || 2 | PX SEND QC (RANDOM) | :TQ20001 | Q2,01 | P->S | QC (RAND) || 3 | LOAD AS SELECT | T3 | Q2,01 | PCWP | || 4 | PX RECEIVE | | Q2,01 | PCWP | || 5 | PX SEND ROUND-ROBIN | :TQ20000 | | S->P | RND-ROBIN ||* 6 | COUNT STOPKEY | | | | || 7 | PX COORDINATOR | | | | || 8 | PX SEND QC (RANDOM) | :TQ10000 | Q1,00 | P->S | QC (RAND) ||* 9 | COUNT STOPKEY | | Q1,00 | PCWC | || 10 | PX BLOCK ITERATOR | | Q1,00 | PCWC | || 11 | TABLE ACCESS FULL| T2 | Q1,00 | PCWP | |-----------------------------------------------------------------------------

Page 19: Analysing and troubleshooting Parallel Execution IT Tage 2015

PARALLEL UNFRIENDLYcreate table t3 parallel as select * from (select a.*, lag(filler, 1) over (order by id) as prev_filler from t2 a);--------------------------------------------------------------------------------| Id | Operation | Name | TQ |IN-OUT| PQ Distrib |--------------------------------------------------------------------------------| 0 | CREATE TABLE STATEMENT | | | | || 1 | PX COORDINATOR | | | | || 2 | PX SEND QC (RANDOM) | :TQ20001 | Q2,01 | P->S | QC (RAND) || 3 | LOAD AS SELECT | T3 | Q2,01 | PCWP | || 4 | PX RECEIVE | | Q2,01 | PCWP | || 5 | PX SEND ROUND-ROBIN | :TQ20000 | | S->P | RND-ROBIN || 6 | VIEW | | | | || 7 | WINDOW BUFFER | | | | || 8 | PX COORDINATOR | | | | || 9 | PX SEND QC (ORDER) | :TQ10001 | Q1,01 | P->S | QC (ORDER) || 10 | SORT ORDER BY | | Q1,01 | PCWP | || 11 | PX RECEIVE | | Q1,01 | PCWP | || 12 | PX SEND RANGE | :TQ10000 | Q1,00 | P->P | RANGE || 13 | PX BLOCK ITERATOR | | Q1,00 | PCWC | || 14 | TABLE ACCESS FULL| T2 | Q1,00 | PCWP | |--------------------------------------------------------------------------------

Page 20: Analysing and troubleshooting Parallel Execution IT Tage 2015

PARALLEL UNFRIENDLYAll these examples have one thing in

common:

If the Query Coordinator (non-parallel part) needs to perform a significant part of the

overall work, Parallel Execution won’t reduce the runtime as expected

Page 21: Analysing and troubleshooting Parallel Execution IT Tage 2015

CHALLENGES Two major challenges

Can the given task be divided into sub-tasks that can efficiently and independently be processed by the workers? (“Parallel Unfriendly”)

Can all assigned workers be kept busy all the time?

Page 22: Analysing and troubleshooting Parallel Execution IT Tage 2015

ALL WORKERS BUSY

Page 23: Analysing and troubleshooting Parallel Execution IT Tage 2015

DATA DISTRIBUTION SKEW

Page 24: Analysing and troubleshooting Parallel Execution IT Tage 2015

AGENDA Parallel Execution introduction

Major challenges Parallel unfriendly examples Distribution skew examples

How to measure distribution of work

Fixing work distribution skew

Page 25: Analysing and troubleshooting Parallel Execution IT Tage 2015

MEASURE WORK DISTRIBUTION Extended SQL tracing

Generates one trace file per Parallel Worker process in database server trace directory

For cross-instance Parallel Execution this means files spread across more than one server

Each Parallel Worker trace file lists, among others, the number of rows produced per plan line and elapsed time

Page 26: Analysing and troubleshooting Parallel Execution IT Tage 2015

MEASURE WORK DISTRIBUTION Extended SQL tracing

Allows detecting data distribution skew

Usually requires to reproduce the issue

Tedious to collect and skim through dozens of trace files (no tool known that automates that job)

Page 27: Analysing and troubleshooting Parallel Execution IT Tage 2015

MEASURE WORK DISTRIBUTION DBMS_XPLAN.DISPLAY_CURSOR +

Rowsource statistics

Based on same internal code instrumentation as extended SQL trace

Feeds back rowsource statistics (rows produced on execution plan line level, timing, I/O stats) into Library Cache

Very useful for analyzing serial executions

Page 28: Analysing and troubleshooting Parallel Execution IT Tage 2015

MEASURE WORK DISTRIBUTION DBMS_XPLAN.DISPLAY_CURSOR +

Rowsource statistics

Not enabled by default, need to reproduce

Doesn’t cope with well multiple Parallel Execution Servers / more complex parallel plans or Cross Instance RAC execution

No information per Parallel Execution Server

=> Therefore not able to detect distribution skew

Page 29: Analysing and troubleshooting Parallel Execution IT Tage 2015

MEASURE WORK DISTRIBUTION Analyzing Data Distribution Skew

Oracle since a long time offers a special view called V$PQ_TQSTAT

It is populated after a successful Parallel Execution in the session of the Query Coordinator

It lists the amount of data send via the “Table Queues” / “Virtual Tables”

In theory this allows exact analysis which operations caused an uneven data distribution

Page 30: Analysing and troubleshooting Parallel Execution IT Tage 2015

MEASURE WORK DISTRIBUTION V$PQ_TQSTAT skewed execution example

DFO_NUMBER TQ_ID SERVER_TYP NUM_ROWS BYTES PROCESS ---------- ---------- ---------- ---------- ---------- ---------- 1 0 Producer 500807 54396692 P008 1 0 Producer 499193 54259853 P009 1 0 Consumer 500038 54332349 P010 1 0 Consumer 499962 54324196 P011 1 1 Producer 499826 57267724 P008 1 1 Producer 500174 57357253 P009 1 1 Consumer 499933 57304789 P010 1 1 Consumer 500067 57320188 P011 1 2 Producer 462069 52963939 P010 1 2 Producer 537931 61681312 P011 1 2 Consumer 500212 57346574 P008 1 2 Consumer 499788 57298677 P009 1 3 Producer 500038 57337041 P010 1 3 Producer 499962 57328456 P011 1 3 Consumer 0 48 P008 1 3 Consumer 1000000 114665449 P009 1 4 Producer 0 24 P008 1 4 Producer 1000000 116668401 P009 1 4 Consumer 1000000 116668425 QC

Page 31: Analysing and troubleshooting Parallel Execution IT Tage 2015

MEASURE WORK DISTRIBUTION V$PQ_TQSTAT

Allows detecting data distribution skew

Can only be queried from inside session

Doesn’t cope with well more complex parallel plans

No information about relevance of skew

Page 32: Analysing and troubleshooting Parallel Execution IT Tage 2015

REAL-TIME SQL MONITORING

Easy to identify whether all workers are kept busy all the time or not

Easy to identify if there was a problem with work distribution

Shows actual parallel degree used (“Parallel Downgrade”)

Supports RAC

Page 33: Analysing and troubleshooting Parallel Execution IT Tage 2015

REAL-TIME SQL MONITORING

Reports are not persisted and will be flushed from memory quite quickly on busy systems

No easy identification and therefore no systematic troubleshooting which plan operations cause a work distribution problem

Lacks some precision regarding Parallel Execution details

Page 34: Analysing and troubleshooting Parallel Execution IT Tage 2015

PARALLEL EXECUTION SKEW

Real-Time SQL Monitoring allows detecting Parallel Execution skew in the following way:

In the “Activity” tab The average active sessions will be less than the

DOP of the operation, allows to detect both temporal and data distribution skew

In the “Parallel” tab The DB Time recorded per Parallel Slave will

show an uneven distribution of active work time

Page 35: Analysing and troubleshooting Parallel Execution IT Tage 2015

PARALLEL EXECUTION SKEW

Real-Time SQL Monitoring “Activity” tab – skewed execution example

Page 36: Analysing and troubleshooting Parallel Execution IT Tage 2015

PARALLEL EXECUTION SKEW

Real-Time SQL Monitoring “Parallel” tab – skewed execution example

Page 37: Analysing and troubleshooting Parallel Execution IT Tage 2015

PARALLEL EXECUTION SKEW

One very useful approach is using Active Session History (ASH)

ASH samples active sessions once a second

Activity of Parallel Workers over time can easily be analyzed

From 11g on the ASH data even contains a reference to the execution plan line, so a relation between Parallel Worker activity and execution plan line based on ASH is possible

Page 38: Analysing and troubleshooting Parallel Execution IT Tage 2015

PARALLEL EXECUTION SKEW

Custom queries on ASH data required for detailed analysis

XPLAN_ASH tool runs these queries for a given SQL_ID

Advantage of ASH is the availability of retained historic ASH data via AWR on disk

Information can be extracted even for SQL executions as long ago as the retention configured for AWR

Page 39: Analysing and troubleshooting Parallel Execution IT Tage 2015

AGENDA Parallel Execution introduction

Major challenges Parallel unfriendly examples Distribution skew examples

How to measure distribution of work

Fixing work distribution skew

Page 40: Analysing and troubleshooting Parallel Execution IT Tage 2015

PARALLEL EXECUTION SKEW

Fixing Work Distribution Skew

Influence Parallel Distribution: Data volume estimates, PQ_DISTRIBUTE / PQ_[NO]MAP / PQ_SKEW (12c+) hint

Limit impact by influencing join order

Rewrite queries trying to limit or avoid skew

Remap skewed data changing the value distribution

Page 41: Analysing and troubleshooting Parallel Execution IT Tage 2015

PARALLEL EXECUTION SKEW

Fixing Work Distribution Skew

Automatic join skew detection in 12c (PQ_SKEW)

Supports only parallel HASH JOINs

Supports at present only simple probe row sources (no result of other joins supported)

Histogram showing popular values required (or forced via PQ_SKEW hint)

Page 42: Analysing and troubleshooting Parallel Execution IT Tage 2015

PARALLEL EXECUTION SKEW

----------------------------------------------------------------------------------| Id | Operation | Name | TQ |IN-OUT| PQ Distrib |----------------------------------------------------------------------------------| 0 | SELECT STATEMENT | | | | || 1 | SORT AGGREGATE | | | | || 2 | PX COORDINATOR | | | | || 3 | PX SEND QC (RANDOM) | :TQ10002 | Q1,02 | P->S | QC (RAND) || 4 | SORT AGGREGATE | | Q1,02 | PCWP | ||* 5 | HASH JOIN | | Q1,02 | PCWP | || 6 | PX RECEIVE | | Q1,02 | PCWP | || 7 | PX SEND HYBRID HASH | :TQ10000 | Q1,00 | P->P | HYBRID HASH|| 8 | STATISTICS COLLECTOR | | Q1,00 | PCWC | || 9 | PX BLOCK ITERATOR | | Q1,00 | PCWC | ||* 10 | TABLE ACCESS FULL | T_1 | Q1,00 | PCWP | || 11 | PX RECEIVE | | Q1,02 | PCWP | || 12 | PX SEND HYBRID HASH (SKEW)| :TQ10001 | Q1,01 | P->P | HYBRID HASH|| 13 | PX BLOCK ITERATOR | | Q1,01 | PCWC | ||* 14 | TABLE ACCESS FULL | T_2 | Q1,01 | PCWP | |----------------------------------------------------------------------------------

Page 43: Analysing and troubleshooting Parallel Execution IT Tage 2015

PARALLEL EXECUTION SKEW

Fixing Work Distribution Skew

More information:

http://oracle-randolf.blogspot.com/2014/08/parallel-execution-skew-summary.html

Page 44: Analysing and troubleshooting Parallel Execution IT Tage 2015

QUESTIONS & ANSWERS

Q & A