Introduction to Parallel Execution

Tuning & Tracing Parallel Execution (An Introduction) Doug Burns ([email protected])

Description

An overview presentation covering the use of Oracle's PX functionality, including some tips and traps. A detailed white paper is available at http://oracledoug.com/px.html

Transcript of Introduction to Parallel Execution

Page 1: Introduction to Parallel Execution

Tuning & Tracing Parallel Execution(An Introduction)

Doug Burns([email protected])

Page 2: Introduction to Parallel Execution

Introduction

• Introduction
• Parallel Architecture
• Configuration
• Dictionary Views
• Tracing and Wait Events
• Conclusion

Page 3: Introduction to Parallel Execution

• Parallel Query Option introduced in 7.1
– Now called Parallel Execution

• Parallel Execution splits a single large task into multiple smaller tasks, which are handled by separate processes running concurrently (a minimal example follows below)
– Full Table Scans
– Partition Scans
– Sorts
– Index Creation
– And others …
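As a minimal sketch (using a hypothetical SALES table, and assuming up-to-date statistics), a full table scan can be requested to run in parallel with a hint:

SELECT /*+ PARALLEL(sales, 4) */ COUNT(*)
FROM sales;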

Introduction

Page 4: Introduction to Parallel Execution

• A little history

• So why did so few sites implement PQO?

Introduction

- Lack of understanding
- Leads to horrible early experiences
- Community's resistance to change
- Not useful in all environments
- Needs time and effort applied to the initial design!

• Isn’t Oracle’s Instance architecture parallel anyway?

Page 5: Introduction to Parallel Execution

• Non-Parallel Architecture?

Introduction

Page 6: Introduction to Parallel Execution

Parallel Architecture

• Introduction
• Parallel Architecture
• Configuration
• Dictionary Views
• Tracing and Wait Events
• Conclusion

Page 7: Introduction to Parallel Execution

Parallel Architecture

[Diagram: Non-Parallel vs Parallel (Degree 2). Non-Parallel: the User Process talks to a single Server Process, which reads the whole EMP table for "select * from emp;". Parallel, Degree 2: the User Process talks to the Query Coordinator (QC), while Slave 0 reads the 1st half and Slave 1 reads the 2nd half of EMP for "select /*+ parallel(emp, 2) */ * from emp;".]

Page 8: Introduction to Parallel Execution

• The Degree of Parallelism (DOP) refers to the number of discrete threads of work

• The default DOP for an Instance is calculated as
– cpu_count * parallel_threads_per_cpu
– Used if I don’t specify a DOP in a hint or table definition (a quick check of these parameters is sketched below)

• The maximum number of PX slaves is:
– DOP * 2
– Plus the Query Coordinator
– But this is per Data Flow Operation
– And the slaves will be re-used
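As a quick sketch, the two parameters behind the default DOP can be checked directly (the values will obviously differ between systems):

SELECT name, value
FROM v$parameter
WHERE name IN ('cpu_count', 'parallel_threads_per_cpu');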

Parallel Architecture

Page 9: Introduction to Parallel Execution

Parallel Architecture

• Inter-process communication is through message buffers (also known as table queues)

• These can be stored in the shared pool or the large pool

[Diagram: Parallel query with a sort – "select * from emp order by name;". The User Process talks to the QC, which acts as the Ranger. One slave set (Slave 2 reading the 1st half, Slave 3 reading the 2nd half) scans EMP and passes rows through table queues to a second slave set (Slave 0 sorting A–P, Slave 1 sorting Q–Z), which returns the sorted ranges to the QC.]

Page 10: Introduction to Parallel Execution

Parallel Architecture

This slide intentionally left blank

Page 11: Introduction to Parallel Execution

• Methods of invoking Parallel Execution
– Table / Index Level

ALTER TABLE emp PARALLEL(DEGREE 2);

– Optimizer Hints

SELECT /*+ PARALLEL(emp) */ *
FROM emp;

– Statement Level

ALTER INDEX emp_idx_1 REBUILD
PARALLEL 8;

• Note: Using Parallel Execution implies that you will be using the Cost-based Optimiser

• As usual, appropriate statistics are vital

Parallel Architecture

Page 12: Introduction to Parallel Execution

Configuration

• Introduction
• Parallel Architecture
• Configuration
• Dictionary Views
• Tracing and Wait Events
• Conclusion

Page 13: Introduction to Parallel Execution

• parallel_automatic_tuning
– First introduced in Oracle 8i
– This is the first parameter you should set – to TRUE
• An alternative point of view – don’t use it!
• Deprecated in 10g, where the default is FALSE, but much of the same functionality is implemented
– Ensures that message queues are stored in the Large Pool rather than the Shared Pool
– It modifies the values of other parameters
– As well as the 10g default values, the following sections show the values when parallel_automatic_tuning is set to TRUE on previous versions (an example of setting it follows below)
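As a sketch for pre-10g versions – and assuming an spfile is in use, since this is a static parameter that only takes effect after an instance restart:

ALTER SYSTEM SET parallel_automatic_tuning = TRUE SCOPE=SPFILE;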

Configuration

Page 14: Introduction to Parallel Execution

• parallel_adaptive_multi_user
– First introduced in Oracle 8
– Default Value – FALSE (TRUE in 10g)
– Automatic Tuning Default – TRUE
– Designed for sites using PX for online usage
– As workload increases, new statements will have their degree of parallelism down-graded.

Configuration

– Effective Oracle by Design – Tom Kyte: ‘This provides the best of both worlds and what users expect from a system. They know that when it is busy, it will run slower.’

Page 15: Introduction to Parallel Execution

• parallel_max_servers
– Default – cpu_count * parallel_threads_per_cpu * 2 (if using automatic PGA management) * 5
• e.g. 1 CPU * 2 * 2 * 5 = 20 on my laptop
– The maximum number of parallel execution slaves available across all sessions in this instance.
– Watch out for the processes trap!

• parallel_min_servers
– Default – 0
– May choose to increase this if PX usage is constant, to reduce the overhead of starting and stopping slave processes (a quick check of these limits is sketched below).
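As a quick sketch, the slave pool limits can be checked against the processes parameter – a full pool of slaves plus your normal sessions must still fit within it:

SELECT name, value
FROM v$parameter
WHERE name IN ('parallel_min_servers', 'parallel_max_servers', 'processes');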

Configuration

More on this subject in tomorrow’s presentation

Page 16: Introduction to Parallel Execution

• parallel_execution_message_size
– Default Value – 2148 bytes
– Automatic Tuning Default – 4Kb
– Maximum size of a message buffer
– May be worth increasing to 8Kb, depending on wait event analysis.
– However, small increases in message size could lead to large increases in large pool memory requirements
– Remember that DOP * 2 relationship and multiple sessions (an example of changing it follows below)
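As a sketch only – this is a static parameter, so it needs an spfile and an instance restart, and the larger buffers must be allowed for when sizing the large pool:

ALTER SYSTEM SET parallel_execution_message_size = 8192 SCOPE=SPFILE;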

Configuration

Page 17: Introduction to Parallel Execution

• Ensure that standard parameters are also set appropriately
– large_pool_size
• Modified by parallel_automatic_tuning
• Calculation in Data Warehousing Guide
• Can be monitored using v$sgastat (sketch below)
– processes
• Modified by parallel_automatic_tuning
– sort_area_size
• For best results use automatic PGA management
• Be aware of _smm_px_max_size

• Metalink Note 201799.1 contains full details and guidance for setting all relevant parameters
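As a sketch, large pool usage by the PX message pool can be watched while parallel statements are running (component names vary slightly between versions):

SELECT pool, name, bytes
FROM v$sgastat
WHERE pool = 'large pool'
ORDER BY bytes DESC;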

Configuration

Page 18: Introduction to Parallel Execution

Dictionary Views

• Introduction
• Parallel Architecture
• Configuration
• Dictionary Views
• Tracing and Wait Events
• Conclusion

Page 19: Introduction to Parallel Execution

• Parallel-specific Dictionary Views

SELECT table_name FROM dict WHERE table_name LIKE 'V%PQ%' OR table_name LIKE 'V%PX%';

TABLE_NAME
------------------------------
V$PQ_SESSTAT
V$PQ_SYSSTAT
V$PQ_SLAVE
V$PQ_TQSTAT
V$PX_BUFFER_ADVICE
V$PX_SESSION
V$PX_SESSTAT
V$PX_PROCESS
V$PX_PROCESS_SYSSTAT

– Also GV$PQ_SESSTAT and GV$PQ_TQSTAT with INST_ID
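As a sketch, on RAC the GV$ variants add an INST_ID column so the figures can be compared per instance, e.g.:

SELECT inst_id, statistic, last_query, session_total
FROM gv$pq_sesstat;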

Dictionary Views

Page 20: Introduction to Parallel Execution

• v$pq_sesstat
– Provides statistics relating to the current session
– Useful for verifying that a specific query is using parallel execution as expected

SELECT * FROM v$pq_sesstat;

STATISTIC LAST_QUERY SESSION_TOTAL

------------------------------ ---------- -------------

Queries Parallelized 1 1

DML Parallelized 0 0

DDL Parallelized 0 0

DFO Trees 1 1

Server Threads 3 0

Allocation Height 3 0

Allocation Width 1 0

Local Msgs Sent 217 217

Distr Msgs Sent 0 0

Local Msgs Recv'd 217 217

Distr Msgs Recv'd 0 0

Dictionary Views

Page 21: Introduction to Parallel Execution

• v$pq_sysstat
– The instance-level overview
– Various values, including information to help set parallel_min_servers and parallel_max_servers
– v$px_process_sysstat contains similar information

SELECT * FROM v$pq_sysstat WHERE statistic LIKE 'Servers%';

STATISTIC VALUE

------------------------------ ----------

Servers Busy 0

Servers Idle 0

Servers Highwater 3

Server Sessions 3

Servers Started 3

Servers Shutdown 3

Servers Cleaned Up 0

Dictionary Views

Page 22: Introduction to Parallel Execution

• v$pq_slave
– Gives information on the activity of individual PX slaves
– v$px_process contains similar information

SELECT slave_name, status, sessions, msgs_sent_total, msgs_rcvd_total

FROM v$pq_slave;

SLAV STAT SESSIONS MSGS_SENT_TOTAL MSGS_RCVD_TOTAL

---- ---- ---------- --------------- ---------------

P000 BUSY 3 465 508

P001 BUSY 3 356 290

P002 BUSY 3 153 78

P003 BUSY 3 108 63

P004 IDLE 2 249 97

P005 IDLE 2 246 97

P006 IDLE 2 239 95

P007 IDLE 2 249 96

Dictionary Views

Page 23: Introduction to Parallel Execution

• v$pq_tqstat
– Shows the communication relationships between slaves
– Must be executed from a session that’s been using parallel operations – it refers to this session
– Example 1 – Attendance Table (25,481 rows)

break on dfo_number on tq_id

SELECT /*+ PARALLEL (attendance, 4) */ *
FROM attendance;

SELECT dfo_number, tq_id, server_type, process, num_rows, bytes
FROM v$pq_tqstat
ORDER BY dfo_number DESC, tq_id, server_type DESC, process;

DFO_NUMBER      TQ_ID SERVER_TYP PROCESS      NUM_ROWS      BYTES
---------- ---------- ---------- ---------- ---------- ----------
         1          0 Producer   P000             6605     114616
                      Producer   P001             6102     105653
                      Producer   P002             6251     110311
                      Producer   P003             6523     113032
                      Consumer   QC              25481     443612

Dictionary Views

Page 24: Introduction to Parallel Execution

• Example 2 - with a sort operation

SELECT /*+ PARALLEL (attendance, 4) */ *
FROM attendance
ORDER BY amount_paid;

DFO_NUMBER      TQ_ID SERVER_TYP PROCESS      NUM_ROWS      BYTES
---------- ---------- ---------- ---------- ---------- ----------
         1          0 Ranger     QC                372      13322
                      Producer   P004             5744     100069
                      Producer   P005             6304     110167
                      Producer   P006             6303     109696
                      Producer   P007             7130     124060
                      Consumer   P000            15351     261380
                      Consumer   P001            10129     182281
                      Consumer   P002                0        103
                      Consumer   P003                1        120
                    1 Producer   P000            15351     261317
                      Producer   P001            10129     182238
                      Producer   P002                0         20
                      Producer   P003                1         37
                      Consumer   QC              25481     443612

Dictionary Views

Page 25: Introduction to Parallel Execution

• So why the unbalanced slaves?
– Check the list of distinct values in amount_paid

SELECT amount_paid, COUNT(*)

FROM attendance

GROUP BY amount_paid

ORDER BY amount_paid

/

 

AMOUNT_PAID COUNT(*)

----------- ----------

200 1

850 1

900 1

1000 7

1150 1

1200 15340

1995 10129

4000 1

Dictionary Views

Page 26: Introduction to Parallel Execution

• v$px_session and v$px_sesstat
– Query to show slaves and physical reads

break on qcsid on server_set

SELECT stat.qcsid, stat.server_set, stat.server#, nam.name, stat.value
FROM v$px_sesstat stat, v$statname nam
WHERE stat.statistic# = nam.statistic#
AND nam.name = 'physical reads'
ORDER BY 1, 2, 3;

     QCSID SERVER_SET    SERVER# NAME                      VALUE
---------- ---------- ---------- -------------------- ----------
       145          1          1 physical reads               0
                               2 physical reads               0
                               3 physical reads               0
                    2          1 physical reads              63
                               2 physical reads              56
                               3 physical reads              61
                                 physical reads            4792

Dictionary Views

Page 27: Introduction to Parallel Execution

• v$px_process
– Shows parallel execution slave processes, status and session information

SELECT * FROM v$px_process;

SERV STATUS PID SPID SID SERIAL#

---- --------- ---------- ------------ ---------- ----------

P001 IN USE 18 7680 144 17

P004 IN USE 20 7972 146 11

P005 IN USE 21 8040 148 25

P000 IN USE 16 7628 150 16

P006 IN USE 24 8100 151 66

P003 IN USE 19 7896 152 30

P007 AVAILABLE 25 5804

P002 AVAILABLE 12 6772

Dictionary Views

Page 28: Introduction to Parallel Execution

• Monitoring the SQL being executed by slaves

set pages 0
column sql_text format a60

select p.server_name, sql.sql_text
from v$px_process p, v$sql sql, v$session s
where p.sid = s.sid and p.serial# = s.serial#
and s.sql_address = sql.address and s.sql_hash_value = sql.hash_value
/

– 9i Results

P001 SELECT A1.C0 C0,A1.C1 C1,A1.C2 C2,A1.C3 C3,A1.C4 C4,A1.C5 C5,

A1.C6 C6,A1.C7 C7 FROM :Q3000 A1 ORDER BY A1.C0

– 10g Results

P001 SELECT /*+ PARALLEL (attendance, 2) */ * FROM attendance

ORDER BY amount_paid

Dictionary Views

Page 29: Introduction to Parallel Execution

• Additional information in standard Dictionary Views– e.g. v$sysstat

SELECT name, value FROM v$sysstat WHERE name LIKE 'PX%';

NAME VALUE

---------------------------------------------- ----------

PX local messages sent 4895

PX local messages recv'd 4892

PX remote messages sent 0

PX remote messages recv'd 0

Dictionary Views

Page 30: Introduction to Parallel Execution

• Monitoring the adaptive multi-user algorithm
– We need to be able to check whether operations are being downgraded and by how much
– Downgraded to serial could be a particular problem!

SELECT name, value FROM v$sysstat WHERE name LIKE 'Parallel%'

NAME VALUE

---------------------------------------------------------------- ----------

Parallel operations not downgraded 546353

Parallel operations downgraded to serial 432

Parallel operations downgraded 75 to 99 pct 790

Parallel operations downgraded 50 to 75 pct 1454

Parallel operations downgraded 25 to 50 pct 7654

Parallel operations downgraded 1 to 25 pct 11873

• The same report again, with one statistic relabelled:

Parallel operations not downgraded                                   546353
P*ssed-off users                                                        432
Parallel operations downgraded 75 to 99 pct                             790
Parallel operations downgraded 50 to 75 pct                            1454
Parallel operations downgraded 25 to 50 pct                            7654
Parallel operations downgraded 1 to 25 pct                            11873

Dictionary Views

Page 31: Introduction to Parallel Execution

• Statspack
– Example Report (Excerpt)
– During an overnight batch operation
– Mainly Bitmap Index creation
– Slightly difficult to read – the statistic names are truncated

Parallel operations downgraded 1 0

Parallel operations downgraded 25 0

Parallel operations downgraded 50 7

Parallel operations downgraded 75 38

Parallel operations downgraded to 1

Parallel operations not downgrade 22

– With one stream downgraded to serial, the rest of the schedule may depend on this one job.

Dictionary Views

Page 32: Introduction to Parallel Execution

Tracing and Wait Events

• Introduction
• Parallel Architecture
• Configuration
• Dictionary Views
• Tracing and Wait Events
• Conclusion

Page 33: Introduction to Parallel Execution

• Tracing Parallel Execution operations is more complicated than standard tracing
– One trace file per slave (as well as the query coordinator)
– Potentially 5 trace files even with a DOP of 2
– May be in background_dump_dest or user_dump_dest (usually background_dump_dest – see the sketch below)
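As a quick sketch, both candidate directories can be identified before starting a test:

SELECT name, value
FROM v$parameter
WHERE name IN ('user_dump_dest', 'background_dump_dest');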

Tracing and Wait Events

• Optimizing Oracle Performance – Millsap and Holt: ‘The remaining task is to identify and analyze all of the relevant trace files. This task is usually simple …’

                                                        

Page 34: Introduction to Parallel Execution

• Much simpler in 10g
– Use trcsess to generate a consolidated trace file for the QC and all slaves

exec dbms_session.set_identifier('PX_TEST');

REM tracefile_identifier is optional, but might make things easier for you
alter session set tracefile_identifier='PX_TEST';

exec dbms_monitor.client_id_trace_enable('PX_TEST');

REM DO WORK

exec dbms_monitor.client_id_trace_disable('PX_TEST');

Generate the consolidated trace file and then run it through tkprof:

trcsess output=/ora/admin/TEST1020/udump/PX_TEST.trc clientid=PX_TEST /ora/admin/TEST1020/udump/*px_test*.trc /ora/admin/TEST1020/bdump/*.trc

tkprof /ora/admin/TEST1020/udump/PX_TEST.trc /ora/admin/TEST1020/udump/PX_TEST.out

Tracing and Wait Events

Page 35: Introduction to Parallel Execution

• This is what one of the slaves looks like

C:\oracle\product\10.2.0\admin\ORCL\udump>cd ../bdump

C:\oracle\product\10.2.0\admin\ORCL\bdump>more orcl_p000_2748.trc

<SNIPPED>

*** SERVICE NAME:(SYS$USERS) 2006-03-07 10:57:29.812

*** CLIENT ID:(PX_TEST) 2006-03-07 10:57:29.812

*** SESSION ID:(151.24) 2006-03-07 10:57:29.812

WAIT #0: nam='PX Deq: Msg Fragment' ela= 13547 sleeptime/senderid=268566527 passes=1 p3=0 obj#=-1 tim=3408202924

=====================

PARSING IN CURSOR #1 len=60 dep=1 uid=70 oct=3 lid=70 tim=3408244715 hv=1220056081 ad='6cc64000'

select /*+ parallel(test_tab3, 2) */ count(*)

from test_tab3

END OF STMT

Tracing and Wait Events

Page 36: Introduction to Parallel Execution

• Many more wait events and more time spent waiting
– The various processes need to communicate with each other
– Metalink Note 191103.1 lists the wait events related to Parallel Execution (a quick instance-wide check is sketched below)
– But be careful of what ‘Idle’ means
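As a sketch, an instance-wide feel for which PX events matter can be had from v$system_event before digging into individual trace files:

SELECT event, total_waits, time_waited
FROM v$system_event
WHERE event LIKE 'PX%'
ORDER BY time_waited DESC;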

Tracing and Wait Events

Page 37: Introduction to Parallel Execution

• Events indicating consumers or the QC are waiting for data from producers
– PX Deq: Execute Reply
– PX Deq: Table Q Normal

• Although these are considered idle events, if the waits are excessive it could indicate a problem in the performance of the slaves

• Investigate the slave trace files

Tracing and Wait Events

Page 38: Introduction to Parallel Execution

• Events indicating producers are quicker than consumers (or the QC)
– PX qref latch

• Try increasing parallel_execution_message_size as this might reduce the communications overhead

• Although it could make things worse if the consumer is just taking time to process the incoming data.

                                                        

Tracing and Wait Events

Page 39: Introduction to Parallel Execution

• Messaging Events
– PX Deq Credit: need buffer
– PX Deq Credit: send blkd

• Although there may be many waits, the time spent should not be a problem.

• If it is, perhaps you have an extremely busy server that is struggling to cope
– Reduce the DOP?
– Increase parallel_execution_message_size?
– Don’t use PX?

Tracing and Wait Events

Page 40: Introduction to Parallel Execution

• Query Coordinator waiting for the slaves to parse their SQL statements
– PX Deq: Parse Reply

• If there are any significant waits for this event, this may indicate you have shared pool resource issues.

• Or you’ve encountered a bug!

Tracing and Wait Events

Page 41: Introduction to Parallel Execution

• Partial Message Event
– PX Deq: Msg Fragment

• May be eliminated or improved by increasing parallel_execution_message_size

• Not an issue on recent tests

Tracing and Wait Events

Page 42: Introduction to Parallel Execution

• Example
– Excerpt from an overnight Statspack Report

Event                        Waits   Timeouts   Time (s)   Avg (ms)   Waits/txn
direct path read         2,249,666          0    115,813         51        25.5
PX Deq: Execute Reply      553,797     22,006     75,910        137         6.3
PX qref latch               77,461     39,676     42,257        546         0.9
library cache pin           27,877     10,404     31,422       1127         0.3
db file scattered read   1,048,135          0     25,144         24        11.9

– Direct Path Reads
• Sort I/O
• Read-ahead
• PX Slave I/O
• The average wait time – SAN!

Tracing and Wait Events

Page 43: Introduction to Parallel Execution

Event                        Waits   Timeouts   Time (s)   Avg (ms)   Waits/txn
direct path read         2,249,666          0    115,813         51        25.5
PX Deq: Execute Reply      553,797     22,006     75,910        137         6.3
PX qref latch               77,461     39,676     42,257        546         0.9
library cache pin           27,877     10,404     31,422       1127         0.3
db file scattered read   1,048,135          0     25,144         24        11.9

– PX Deq: Execute Reply
• Idle event – QC waiting for a response from slaves
• Some waiting is inevitable

– PX qref latch
• Largely down to the extreme use of Parallel Execution
• Practically unavoidable, but perhaps we could increase parallel_execution_message_size?

– Library cache pin?
• Need to look at the trace files

Tracing and Wait Events

Page 44: Introduction to Parallel Execution

Conclusion

• Introduction
• Parallel Architecture
• Configuration
• Dictionary Views
• Tracing and Wait Events
• Conclusion

Page 45: Introduction to Parallel Execution

• Plan / Test / Implement
– Asking for trouble if you don’t!

• Hardware
– It’s designed to suck the server dry
– Trying to squeeze a quart into a pint pot will make things slow down due to contention

• Tune the SQL first
– All the old rules apply
– The biggest improvements come from doing less unnecessary work in the first place
– Even if PX does make things go quickly enough, it’s going to use a lot more resources doing so

Conclusion

Page 46: Introduction to Parallel Execution

• Don’t use it for small, fast tasks
– They won’t go much quicker
– They might go slower
– They will use more resources

• Don’t use it for online work
– Not unless it’s a handful of users
– With a predictable maximum number of concurrent activities
– Who understand the implications and won’t go crazy when something takes four times as long as normal!
– It gives a false initial perception of high performance and isn’t scalable
– Okay, Tom, set parallel_adaptive_multi_user to TRUE

Conclusion

Page 47: Introduction to Parallel Execution

• The slower your I/O sub-system, the more benefit you are likely to see from PX
– But shouldn’t you fix the underlying problem?
– More on this in the next presentation

• Consider whether PX is the correct parallel solution for overnight batch operations
– A single stream of parallel jobs?
– Parallel streams of single-threaded jobs?
– Unfortunately, you’ll probably have to do some work to prove your ideas!

Conclusion

Page 48: Introduction to Parallel Execution

Tuning & Tracing Parallel Execution(An Introduction)

Doug Burns([email protected])

(oracledoug.blogspot.com)(doug.burns.tripod.com)