Oracle trace data collection errors: the story about oceans, islands, and rivers

Oracle Performance
Cary Millsap
Method R Corporation and Accenture Enkitec Group
@CaryMillsap
[email protected] · [email protected]
Dallas Oracle Users Group Meeting
Texas State Government Facility whose name must not be spoken
Richardson, Texas
5:00p–11:00p CST Thursday 22 January 2015
© 2015 Method R Corporation


Cary Millsap

[Figure: career timeline, 1985–2030, showing Oracle APS, Optimal Flexible Architecture, System Performance Group, hotsos, Method R, Method R Profiler, Method R Tools, and Method R Trace.]

http://amzn.to/173bpzg

1 What I’ll talk about today

When you execute a business task on a computer system, you create an experience.

…An experience between you and the machine.

The duration of this experience is called response time.

A sequence diagram helps you understand how that response time was consumed.

A profile is a useful aggregation of the sequence diagram.

A performance analyst looks at your time consumption to determine whether it is possible to reduce the response time of the experience.

…And, if so, then by how much.

The richest and easiest diagnostic information to obtain in this whole technology stack is available from the Oracle Database tier.

…Oracle’s extended SQL trace data.

But in almost 100% of first tries at using Oracle extended SQL trace data, people make a data collection mistake that complicates their analysis.

This is the story of that mistake.

2 Sequence diagrams and profiles


CALL-NAME                                              DURATION      %      CALLS      MEAN
------------------------------------------------- ------------- ------ ---------- ---------
db file sequential read                           59,081.406102  76.6% 10,013,394  0.005900
log buffer space                                   6,308.758563   8.2%      9,476  0.665762
free buffer waits                                  4,688.730190   6.1%    200,198  0.023420
EXEC                                               4,214.190000   5.5%     36,987  0.113937
log file switch completion                         1,552.471890   2.0%      1,853  0.837815
db file parallel read                                464.976815   0.6%      7,641  0.060853
log file switch (checkpoint incomplete)              316.968886   0.4%        351  0.903045
rdbms ipc reply                                      244.937910   0.3%      2,737  0.089491
undo segment extension                               140.267429   0.2%      1,411  0.099410
log file switch (private strand flush incomplete)    112.680587   0.1%        134  0.840900
17 others                                             23.367228   0.0%     58,126  0.000402
------------------------------------------------- ------------- ------ ---------- ---------
TOTAL (27)                                        77,148.755600 100.0% 10,332,308  0.007467
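The step from a sequence of timed calls to a profile like this one can be sketched in a few lines of Python. This is only an illustration of the aggregation idea, not how any Method R tool is implemented; the `profile` function and the sample `calls` list are hypothetical.

```python
from collections import defaultdict

def profile(calls):
    """Aggregate (name, duration_seconds) pairs into profile rows.

    Returns a list of (name, total_duration, percent, call_count, mean),
    sorted by descending total duration.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for name, duration in calls:
        totals[name] += duration
        counts[name] += 1
    grand_total = sum(totals.values())
    rows = []
    for name in sorted(totals, key=totals.get, reverse=True):
        pct = 100.0 * totals[name] / grand_total if grand_total else 0.0
        rows.append((name, totals[name], pct, counts[name], totals[name] / counts[name]))
    return rows

# Hypothetical calls, in the order a sequence diagram would show them.
calls = [
    ("db file sequential read", 0.006),
    ("EXEC", 0.002),
    ("db file sequential read", 0.004),
    ("log buffer space", 0.5),
]

for name, dur, pct, n, mean in profile(calls):
    print(f"{name:<25} {dur:10.6f} {pct:6.1f}% {n:6d} {mean:10.6f}")
```

The sequence diagram keeps the order of the calls; the profile deliberately throws the order away and keeps only totals, counts, and means per call name.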

Key question:

What is it you’re trying to optimize?

Slow department?

Then analyze the stream of experiences.

Verdict: clerk too slow between experiences.

CALL-NAME                   DURATION      % CALLS      MEAN
--------------------------- -------- ------ ----- ---------
SQL*Net message from client      137  87.3%     7 19.571429
everything else                   20  12.7%   142  0.140845
--------------------------- -------- ------ ----- ---------
TOTAL (2)                        157 100.0%   149  1.053691

Slow application?

THE ONE YOU’LL BE DOING MOST OF THE TIME.

Then analyze each experience separately.

Verdict: app is too chatty.

CALL-NAME                   DURATION      % CALLS     MEAN
--------------------------- -------- ------ ----- --------
SQL*Net message from client        8  57.1%     4 2.000000
some of this, some of that         6  42.9%    28 0.214286
--------------------------- -------- ------ ----- --------
TOTAL (2)                         14 100.0%    32 0.437500

CALL-NAME                   DURATION      % CALLS     MEAN
--------------------------- -------- ------ ----- --------
SQL*Net message from client       11  52.3%     4 2.750000
some of this, some of that        10  47.7%   113 0.088496
--------------------------- -------- ------ ----- --------
TOTAL (2)                         21 100.0%   117 0.179487

3 Oracle extended SQL trace data

Typical Oracle trace file for a connection pool

...
WAIT ... nam='SQL*Net message from client' ela= 1202689 ...
A sequence of trace lines explaining time consumption for Experience A
WAIT ... nam='SQL*Net message from client' ela= 4260917 ...
A sequence of trace lines explaining time consumption for Experience B
WAIT ... nam='SQL*Net message from client' ela= 5213365 ...
A sequence of trace lines explaining time consumption for Experience C
WAIT ... nam='SQL*Net message from client' ela= 2044420 ...
...
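To work with such a file programmatically, the first step is pulling the event name and elapsed time out of each WAIT line. Here is a minimal Python sketch. It assumes `ela` is recorded in microseconds, as it is in the trace excerpts in this deck; `parse_wait` is a hypothetical helper name, and the two sample lines are taken from trace excerpts elsewhere in this deck.

```python
import re

# Matches lines such as: WAIT #5: nam='...' ela= 141 ...
WAIT_RE = re.compile(r"^WAIT\b.*?nam='([^']*)' ela= *(\d+)")

def parse_wait(line):
    """Return (event_name, elapsed_seconds) for a WAIT line, else None."""
    m = WAIT_RE.match(line)
    if m is None:
        return None
    return m.group(1), int(m.group(2)) / 1_000_000

lines = [
    "WAIT #5: nam='SQL*Net message from client' ela= 8824256 driver id=1650815232 #bytes=1 p3=0 obj#=86815 tim=1313696213506522",
    "FETCH #5:c=0,e=5,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=1601196873,tim=1313696204682136",
]
for line in lines:
    print(parse_wait(line))
```

Lines that are not WAIT lines (the FETCH line here) return None, so a caller can filter them out or handle them separately.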

So, just ignore all the SQL*Net message from client.

Right?

BIG MISTAKE

What I didn’t mention before…

These experiences like A, B, and C can have SQL*Net message from client calls in them, too.

…That might dominate response times!

WAIT ... nam='SQL*Net message from client' ela= 1202689 ...
stuff for Experience A
WAIT ... nam='SQL*Net message from client' ela= 342
more stuff for Experience A
WAIT ... nam='SQL*Net message from client' ela= 1492
yet more stuff for Experience A
etc.
WAIT ... nam='SQL*Net message from client' ela= 4260917 ...
stuff for Experience B
WAIT ... nam='SQL*Net message from client' ela= 2928
more stuff for Experience B
etc.
WAIT ... nam='SQL*Net message from client' ela= 5213365 ...
stuff for Experience C
WAIT ... nam='SQL*Net message from client' ela= 855
more stuff for Experience C
etc.
WAIT ... nam='SQL*Net message from client' ela= 2044420 ...

It’s actually a common pattern.

Behold the network-abusing, chatty app…

CALL-NAME                     DURATION      %   CALLS     MEAN      MIN      MAX
--------------------------- ---------- ------ ------- -------- -------- --------
SQL*Net message from client 200.939935  99.5% 142,520 0.001410 0.000937 0.202835
SQL*Net message to client     0.526257   0.3% 142,520 0.000004 0.000000 0.000130
FETCH                         0.439933   0.2% 142,518 0.000003 0.000000 0.001000
PARSE                         0.000000   0.0%       2 0.000000 0.000000 0.000000
EXEC                          0.000000   0.0%       2 0.000000 0.000000 0.000000
--------------------------- ---------- ------ ------- -------- -------- --------
TOTAL (5)                   201.906125 100.0% 427,562 0.000472 0.000000 0.202835

…and the way it should behave.

CALL-NAME                     DURATION      %   CALLS     MEAN      MIN      MAX
--------------------------- ---------- ------ ------- -------- -------- --------
SQL*Net message from client   0.911041  36.5%      72 0.012653 0.000890 0.026857
SQL*Net more data to client   0.841897  33.7%   2,688 0.000313 0.000004 0.013287
FETCH                         0.744885  29.8%      70 0.010641 0.006999 0.012998
PARSE                         0.001000   0.0%       2 0.000500 0.000000 0.001000
SQL*Net message to client     0.000147   0.0%      72 0.000002 0.000001 0.000006
EXEC                          0.000000   0.0%       2 0.000000 0.000000 0.000000
--------------------------- ---------- ------ ------- -------- -------- --------
TOTAL (6)                     2.498970 100.0%   2,906 0.000860 0.000000 0.026857

4 Oceans, islands, rivers

Such trace files have islands of activity in an ocean of idleness.

But…

An island can have rivers.

So can a trace file.

5 How to cope with the problem

…the problem?

What percentage of this 2.3-sec experience is rivers?

Trace file with oceans. Find the 2.3-sec experience.

CALL-NAME                    DURATION      %  CALLS     MEAN      MIN       MAX
--------------------------- --------- ------ ------ -------- -------- ---------
SQL*Net message from client 31.018640  99.3% 10,003 0.003101 0.000023 20.121507
direct path read             0.110575   0.4% 10,000 0.000011 0.000004  0.020533
FETCH                        0.081993   0.3%  5,001 0.000016 0.000000  0.001000
SQL*Net message to client    0.008804   0.0% 10,003 0.000001 0.000000  0.000061
PARSE                        0.003999   0.0%      2 0.001999 0.000000  0.003999
EXEC                         0.001000   0.0%      2 0.000500 0.000000  0.001000
CLOSE                        0.000000   0.0%      2 0.000000 0.000000  0.000000
--------------------------- --------- ------ ------ -------- -------- ---------
TOTAL (7)                   31.225011 100.0% 35,013 0.000892 0.000000 20.121507

Trace file with no water at all. Doesn’t explain the 2.3-sec experience.

CALL-NAME                    DURATION      %  CALLS     MEAN      MIN       MAX
--------------------------- --------- ------ ------ -------- -------- ---------
direct path read             0.110575  53.6% 10,000 0.000011 0.000004  0.020533
FETCH                        0.081993  39.7%  5,001 0.000016 0.000000  0.001000
SQL*Net message to client    0.008804   4.3% 10,003 0.000001 0.000000  0.000061
PARSE                        0.003999   1.9%      2 0.001999 0.000000  0.003999
EXEC                         0.001000   0.5%      2 0.000500 0.000000  0.001000
CLOSE                        0.000000   0.0%      2 0.000000 0.000000  0.000000
--------------------------- --------- ------ ------ -------- -------- ---------
TOTAL (6)                    0.206371 100.0% 25,010 0.000008 0.000000  0.020533

Trace file with rivers, but no oceans. Explains the 2.3-sec experience exactly. 90.9% is rivers. Easy.

CALL-NAME                    DURATION      %  CALLS     MEAN      MIN       MAX
--------------------------- --------- ------ ------ -------- -------- ---------
SQL*Net message from client  2.072877  90.9% 10,001 0.000207 0.000023  0.016861
direct path read             0.110575   4.9% 10,000 0.000011 0.000004  0.020533
FETCH                        0.081993   3.6%  5,001 0.000016 0.000000  0.001000
SQL*Net message to client    0.008804   0.4% 10,003 0.000001 0.000000  0.000061
PARSE                        0.003999   0.2%      2 0.001999 0.000000  0.003999
EXEC                         0.001000   0.0%      2 0.000500 0.000000  0.001000
CLOSE                        0.000000   0.0%      2 0.000000 0.000000  0.000000
--------------------------- --------- ------ ------ -------- -------- ---------
TOTAL (7)                    2.279248 100.0% 35,011 0.000065 0.000000  0.020533
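The numbers in these three profiles can be checked against each other with a little arithmetic. The sketch below copies the durations from the profiles; the variable names are illustrative only.

```python
# Durations in seconds, copied from the profiles in this section.
snmfc_with_oceans = 31.018640   # SQL*Net message from client, 10,003 calls
snmfc_rivers_only = 2.072877    # SQL*Net message from client, 10,001 calls
experience_total = 2.279248     # TOTAL of the rivers-only profile

river_share = snmfc_rivers_only / experience_total
ocean_time = snmfc_with_oceans - snmfc_rivers_only
ocean_calls = 10_003 - 10_001

print(f"rivers: {river_share:.1%} of the experience")
print(f"oceans: {ocean_time:.6f} s in {ocean_calls} calls")
```

The ocean time works out to 28.945763 seconds in two calls, which matches the sum of the two long SQL*Net message from client waits (20.121507 s and 8.824256 s) that appear in the trace file excerpts later in this deck.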

To Oracle, it’s all just water.

It sees no difference between salt water and fresh water, between response-time SQL*Net message from client (SNMFC) and non-response-time SNMFC.

It’s all just SQL*Net message from client.

However, there are big SQL*Net message from client calls

…and little SQL*Net message from client calls.

This is a clue.

SQL*Net message from client?

if ela ≥ 1.00 sec then ocean (not response time)

otherwise river (response time)

It actually works pretty well.

[Figure: calls labeled as rivers and ocean.]

SQL*Net message from client

if ela ≥ :b then ocean (not response time)

otherwise river (response time)

Sometimes you have to fine-tune the boundary value.
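The rule is simple enough to state as code. This is a minimal sketch in Python; `classify` and its `boundary` parameter (standing in for `:b`) are hypothetical names, and the sample `ela` values are the ones from the connection pool trace excerpt earlier in this deck, converted from microseconds.

```python
SNMFC = "SQL*Net message from client"

def classify(name, ela_seconds, boundary=1.0):
    """Label one wait as 'ocean', 'river', or 'other'."""
    if name != SNMFC:
        return "other"
    return "ocean" if ela_seconds >= boundary else "river"

# ela values (microseconds) from the connection pool trace excerpt.
waits_us = [1202689, 342, 1492, 4260917, 2928, 5213365, 855, 2044420]
labels = [classify(SNMFC, us / 1_000_000) for us in waits_us]
print(labels)
```

With the default 1.00-second boundary, the four multi-second waits come out as oceans and the four sub-millisecond-to-millisecond waits as rivers. Passing a different `boundary` is the fine-tuning step.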

$ mrskew --rc=txnz.05 v11203_ora_26827.trc

     EXP-ID   DURATION       %   CALLS      MEAN       MIN       MAX
-----------  ---------  ------  ------  --------  --------  --------
          0  24.236626   27.5%     327  0.074118  0.050007  0.283979
      19547   2.212251    2.5%     807  0.002741  0.000000  0.049582
      27247   2.112561    2.4%     791  0.002671  0.000000  0.048360
      24221   1.927336    2.2%     267  0.007218  0.000000  0.048210
      16129   1.450686    1.6%     683  0.002124  0.000000  0.049147
      22289   0.997744    1.1%     643  0.001552  0.000000  0.045547
      29620   0.982700    1.1%     562  0.001749  0.000000  0.049281
       2843   0.967385    1.1%     655  0.001477  0.000000  0.048986
      33239   0.920264    1.0%     139  0.006621  0.000000  0.047733
      23031   0.917492    1.0%     647  0.001418  0.000000  0.049615
      17091   0.899165    1.0%     579  0.001553  0.000000  0.045020
      14701   0.864747    1.0%     123  0.007030  0.000000  0.049502
       6509   0.805075    0.9%     437  0.001842  0.000000  0.043662
        653   0.780152    0.9%     403  0.001936  0.000000  0.048553
      36583   0.773713    0.9%     484  0.001599  0.000000  0.030175
      26287   0.767064    0.9%     619  0.001239  0.000000  0.038591
       2333   0.750920    0.9%     103  0.007290  0.000000  0.045808
       9685   0.720571    0.8%     479  0.001504  0.000000  0.047614
      25107   0.718329    0.8%     115  0.006246  0.000000  0.043572
      28487   0.715467    0.8%     107  0.006687  0.000000  0.048749
 309 others  43.717756   49.5%   8,389  0.005211  0.000000  0.049996
-----------  ---------  ------  ------  --------  --------  --------
TOTAL (329)  88.238004  100.0%  17,359  0.005083  0.000000  0.283979
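A report grouped by experience, like the one above, depends on cutting the stream of calls into experiences at the oceans. The following Python sketch shows one way that grouping could be done. It is not a description of how mrskew works internally; `split_experiences` and the sample `events` (including the FETCH and EXEC durations) are hypothetical.

```python
SNMFC = "SQL*Net message from client"

def split_experiences(events, boundary):
    """Group (name, seconds) events into experiences.

    An SNMFC call of at least `boundary` seconds is treated as an ocean:
    it ends the current experience and is not counted in any experience.
    """
    experiences, current = [], []
    for name, seconds in events:
        if name == SNMFC and seconds >= boundary:
            if current:
                experiences.append(current)
            current = []
        else:
            current.append((name, seconds))
    if current:
        experiences.append(current)
    return experiences

events = [
    (SNMFC, 1.202689), ("FETCH", 0.010), (SNMFC, 0.000342), ("FETCH", 0.012),
    (SNMFC, 4.260917), ("EXEC", 0.020), (SNMFC, 0.002928), ("FETCH", 0.008),
    (SNMFC, 2.044420),
]
for i, exp in enumerate(split_experiences(events, boundary=1.0), start=1):
    total = sum(s for _, s in exp)
    rivers = sum(s for n, s in exp if n == SNMFC)
    print(f"experience {i}: {total:.6f} s, rivers {rivers / total:.1%}")
```

Rivers stay inside the experience they belong to, so each experience's total includes its own short SQL*Net message from client time.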

6 How to fix the problem

For connection pooling apps, the oceans-islands-rivers thing works pretty well.

But it’s not 100% reliable. For example, what if you have a river that’s bigger than one of your oceans?

If you know your app, you know where the experience boundaries are.

If you can instrument your app, it will automatically tell you where the experience boundaries are.

If you’re running code in an interactive development environment, it’s easy:

1. activate trace;
   1.1. There must be NO LATENCY here.
2. execute the code path for the experience;
   2.1. There must be NO LATENCY here.
3. deactivate trace;
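One way to keep the three steps back to back is to wrap them in code, so that no human think time can creep in between them. The Python sketch below is an assumption-laden illustration: `traced` and `RecordingCursor` are hypothetical names, and `cursor` stands for any object with an `execute(sql)` method, such as a database driver's cursor. The "context off" statement is the one visible in the trace file excerpts in this deck; "context forever, level 12" is the usual counterpart that activates tracing of waits and binds.

```python
from contextlib import contextmanager

@contextmanager
def traced(cursor):
    """Trace exactly the statements run inside the `with` block."""
    # Level 12 of event 10046 records both waits and binds.
    cursor.execute("alter session set events '10046 trace name context forever, level 12'")
    try:
        yield cursor
    finally:
        # Deactivate immediately after the code path, even if it raised.
        cursor.execute("alter session set events '10046 trace name context off'")

class RecordingCursor:
    """Stand-in for a real database cursor, used only to show the call order."""
    def __init__(self):
        self.statements = []

    def execute(self, sql):
        self.statements.append(sql)

cur = RecordingCursor()
with traced(cur) as c:
    c.execute("select count(*) from t")  # the code path for the experience
print(cur.statements)
```

Because activation, the code path, and deactivation run in one uninterrupted sequence, the resulting trace file should cover one experience with no ocean at either end.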

This is the best thing you can do:

Instrument your application so that the trace data explains exactly one user response time experience.

You can fix a trace file that accounts for more time than you want.

…E.g., if you’re stuck activating trace with dbms_monitor.session_trace_enable(:sid,:serial,true,true).

But fixing a trace file requires either

a) effort
b) tools

I’ll show you both.

Two types of trace file scoping problems:

1. Unwanted calls at the bottom
2. Unwanted calls at the top or middle

You can cut the bottom off a trace file.

…With, say, vi. No problem.

...
WAIT #0: nam='direct path read' ela= 7 file number=4 first dba=4665 block cnt=1 obj#=86815 tim=1313696204681916
WAIT #0: nam='direct path read' ela= 5 file number=4 first dba=4665 block cnt=1 obj#=86815 tim=1313696204681942
WAIT #0: nam='SQL*Net message to client' ela= 1 driver id=1650815232 #bytes=1 p3=0 obj#=86815 tim=1313696204681955
WAIT #0: nam='SQL*Net message from client' ela= 141 driver id=1650815232 #bytes=1 p3=0 obj#=86815 tim=1313696204682115
FETCH #5:c=0,e=5,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=1601196873,tim=1313696204682136
STAT #5 id=1 cnt=5000 pid=0 pos=1 obj=86814 op='TABLE ACCESS FULL T (cr=5003 pr=0 pw=0 time=23585 us cost=11 size=10075000 card=5000)'
WAIT #5: nam='SQL*Net message to client' ela= 1 driver id=1650815232 #bytes=1 p3=0 obj#=86815 tim=1313696204682254

*** 2011-08-18 14:36:53.506
WAIT #5: nam='SQL*Net message from client' ela= 8824256 driver id=1650815232 #bytes=1 p3=0 obj#=86815 tim=1313696213506522
CLOSE #5:c=0,e=22,dep=0,type=0,tim=1313696213506643
=====================
PARSING IN CURSOR #2 len=55 dep=0 uid=84 oct=42 lid=84 tim=1313696213506753 hv=2217940283 ad='0' sqlid='06nvwn223659v'
alter session set events '10046 trace name context off'
END OF STMT
PARSE #2:c=0,e=70,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=0,plh=0,tim=1313696213506752
EXEC #2:c=1000,e=354,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=0,plh=0,tim=1313696213507146␄

...
WAIT #0: nam='direct path read' ela= 7 file number=4 first dba=4665 block cnt=1 obj#=86815 tim=1313696204681916
WAIT #0: nam='direct path read' ela= 5 file number=4 first dba=4665 block cnt=1 obj#=86815 tim=1313696204681942
WAIT #0: nam='SQL*Net message to client' ela= 1 driver id=1650815232 #bytes=1 p3=0 obj#=86815 tim=1313696204681955
WAIT #0: nam='SQL*Net message from client' ela= 141 driver id=1650815232 #bytes=1 p3=0 obj#=86815 tim=1313696204682115
FETCH #5:c=0,e=5,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=1601196873,tim=1313696204682136
STAT #5 id=1 cnt=5000 pid=0 pos=1 obj=86814 op='TABLE ACCESS FULL T (cr=5003 pr=0 pw=0 time=23585 us cost=11 size=10075000 card=5000)'
WAIT #5: nam='SQL*Net message to client' ela= 1 driver id=1650815232 #bytes=1 p3=0 obj#=86815 tim=1313696204682254

# *** 2011-08-18 14:36:53.506
# WAIT #5: nam='SQL*Net message from client' ela= 8824256 driver id=1650815232 #bytes=1 p3=0 obj#=86815 tim=1313696213506522
# CLOSE #5:c=0,e=22,dep=0,type=0,tim=1313696213506643
# =====================
# PARSING IN CURSOR #2 len=55 dep=0 uid=84 oct=42 lid=84 tim=1313696213506753 hv=2217940283 ad='0' sqlid='06nvwn223659v'
# alter session set events '10046 trace name context off'
# END OF STMT
# PARSE #2:c=0,e=70,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=0,plh=0,tim=1313696213506752
# EXEC #2:c=1000,e=354,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=0,plh=0,tim=1313696213507146␄

But cutting calls out of the bottom of a trace file is almost never going to be enough.

Cutting calls out of the middle or the top requires magic.

*** 2011-08-18 14:36:21.576
*** SESSION ID:(23.42) 2011-08-18 14:36:21.576
*** CLIENT ID:() 2011-08-18 14:36:21.576
*** SERVICE NAME:(SYS$USERS) 2011-08-18 14:36:21.576
*** MODULE NAME:(SQL*Plus) 2011-08-18 14:36:21.576
*** ACTION NAME:() 2011-08-18 14:36:21.576

WAIT #8: nam='SQL*Net message to client' ela= 1 driver id=1650815232 #bytes=1 p3=0 obj#=-1 tim=1313696181576631

*** 2011-08-18 14:36:41.698
WAIT #8: nam='SQL*Net message from client' ela= 20121507 driver id=1650815232 #bytes=1 p3=0 obj#=-1 tim=1313696201698518
CLOSE #8:c=0,e=41,dep=0,type=1,tim=1313696201698681
=====================
PARSING IN CURSOR #7 len=352 dep=1 uid=84 oct=3 lid=84 tim=1313696201699956 hv=2904344320 ad='3e4f6d48' sqlid='f70vdzaqjtjs0'
SELECT /* OPT_DYN_SAMP */ /*+ ALL_ROWS IGNORE_WHERE_CLAUSE NO_PARALLEL(SAMPLESUB) opt_param('parallel_execution_enabled', 'false') NO_PARALLEL_INDEX(SAMPLESUB) NO_SQL_TUNE */ NVL(SUM(C1),:"SYS_B_0"), NVL(SUM(C2),:"SYS_B_1") FROM (SELECT /*+ NO_PARALLEL("T") FULL("T") NO_PARALLEL_INDEX("T") */ :"SYS_B_2" AS C1, :"SYS_B_3" AS C2 FROM "T" "T") SAMPLESUB
END OF STMT
PARSE #7:c=0,e=402,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=1,plh=0,tim=1313696201699946
...


If you delete the SQL*Net message from client line (ela= 20121507), then its 20.121507-second contribution to the 20.122009 seconds between calls will be unexplained.

(1,313,696,201.698681 - 0.000041) - 1,313,696,181.576631 = 20.122009
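The same arithmetic can be done in whole microseconds, which avoids any floating-point rounding. The values below are copied from the trace lines; the variable names are illustrative only.

```python
# All values are microseconds, copied from the trace lines.
prev_tim = 1313696181576631   # tim of the 'SQL*Net message to client' WAIT
close_tim = 1313696201698681  # tim of the CLOSE call (its end time)
close_e = 41                  # e (elapsed) of the CLOSE call
snmfc_ela = 20121507          # ela of the 'SQL*Net message from client' WAIT

# Time between the end of the previous call and the start of CLOSE.
gap = (close_tim - close_e) - prev_tim
print(gap / 1_000_000)                # seconds between the two calls
print((gap - snmfc_ela) / 1_000_000)  # seconds not covered by the WAIT
```

The gap is 20,122,009 microseconds, of which the SQL*Net message from client call explains all but 502 microseconds. Remove that one line and nearly the whole gap becomes unaccounted-for time.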


You can’t just delete the SQL*Net message from client line with ela= 20121507 (or set its ela value to 0). You must also subtract 20.121507 seconds from every *** line and tim value from there to the end of the file.
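The tim half of that adjustment can be sketched in Python as follows. This is a simplified illustration, not the algorithm mrcallrm uses: `remove_call` is a hypothetical helper, and it deliberately leaves the *** wall-clock lines untouched, although a complete fix would have to shift them by the same amount.

```python
import re

TIM_RE = re.compile(r"\btim=(\d+)")

def remove_call(lines, index):
    """Drop lines[index] (a WAIT line) and shift every later tim value back by its ela.

    Sketch only: the *** wall-clock lines are left untouched here, although
    they would need the same shift.
    """
    m = re.search(r"ela= *(\d+)", lines[index])
    if m is None:
        raise ValueError("line has no ela value")
    ela = int(m.group(1))
    shifted = [TIM_RE.sub(lambda t: f"tim={int(t.group(1)) - ela}", line)
               for line in lines[index + 1:]]
    return lines[:index] + shifted

trace = [
    "WAIT #8: nam='SQL*Net message to client' ela= 1 driver id=1650815232 #bytes=1 p3=0 obj#=-1 tim=1313696181576631",
    "WAIT #8: nam='SQL*Net message from client' ela= 20121507 driver id=1650815232 #bytes=1 p3=0 obj#=-1 tim=1313696201698518",
    "CLOSE #8:c=0,e=41,dep=0,type=1,tim=1313696201698681",
]
for line in remove_call(trace, 1):
    print(line)
```

After the removal, the CLOSE call's tim moves back by 20,121,507 microseconds, so the time between it and the preceding call shrinks to the few hundred microseconds that were never attributed to the removed WAIT.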

Cutting the top is just like cutting the middle, because of the *** lines.

Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
ORACLE_HOME = /app/oracle/product/11.2.0/db_1
System name: Linux
Node name: local-orcl
Release: 2.6.18-194.el5
Version: #1 SMP Mon Mar 29 20:06:41 EDT 2010
Machine: i686
Instance name: yyz
Redo thread mounted by this instance: 1
Oracle process number: 25
Unix process pid: 10358, image: oracle@local-orcl (TNS V1-V3)

*** 2011-08-18 14:36:21.576
*** SESSION ID:(23.42) 2011-08-18 14:36:21.576
*** CLIENT ID:() 2011-08-18 14:36:21.576
*** SERVICE NAME:(SYS$USERS) 2011-08-18 14:36:21.576
*** MODULE NAME:(SQL*Plus) 2011-08-18 14:36:21.576
*** ACTION NAME:() 2011-08-18 14:36:21.576

WAIT #8: nam='SQL*Net message to client' ela= 1 driver id=1650815232 #bytes=1 p3=0 obj#=-1 tim=1313696181576631

*** 2011-08-18 14:36:41.698
WAIT #8: nam='SQL*Net message from client' ela= 20121507 driver id=1650815232 #bytes=1 p3=0 obj#=-1 tim=1313696201698518
CLOSE #8:c=0,e=41,dep=0,type=1,tim=1313696201698681

$ mrcallrm --lines=26,35078 yyz_ora_10358.trc > yyz_ora_10358-fixed.trc

7 References

http://www.slideshare.net/carymillsap/how-to-find-and-fix-your
A free online presentation about how to instrument your application so it will automatically tell you where the experience boundaries are.

http://method-r.com/blogs/company-blog/214-finding-connection-pool-response-times-with-method-r-tools
“Connection pool response times with Method R Tools (Oceans, Islands, and Rivers),” a blog post explaining the oceans-islands-rivers metaphor.

https://motdcr3.eventbrite.com
“Mastering Oracle Trace Data free online class reunion,” to be held 11:00a–12:30p CST Thursday, February 10, 2015.

http://amzn.to/173bpzg
“The Method R Guide to Mastering Oracle Trace Data,” a textbook for the 1- to 2-day course that covers Method R Corporation software and methods.

http://method-r.com/software/mrtrace
A Method R extension for Oracle SQL Developer. Method R Trace collects trace data and retrieves it for you, automatically.

http://method-r.com/software/mrtools
A set of software tools for mining and manipulating Oracle extended SQL trace data. I use mrskew to report on durations of individual experiences recorded in extended SQL trace files. I use mrcallrm to eliminate calls from my trace data. It automatically ripples the required tim and *** line changes throughout a trace file.

http://method-r.com/courses/mastering-oracle-trace-data
“Mastering Oracle Trace Data,” a 1- to 2-day course that covers Method R Corporation software and methods.

8 Your turn

www.enkitec.com
method-r.com

Thank you