Post on 25-May-2015
Mining the AWR Repository for Capacity Planning, Visualization, and other Real World Stuff

Karl Arao
Oracle ACE, OCP-DBA, RHCE
karlarao@gmail.com

New CPUs and storage arrays are getting faster, but these resources are finite and come at a cost. Hence, capacity planning plays a very important role to ensure proper resources are available and to handle expected and unexpected workloads. Another critical matter for the DBAs and IT managers is justifying the expense of adding resources on the system. With guesswork you'll end up getting the most expensive hardware. With proper measurement, proper planning, and management of growth, you'll be able to get just the right hardware for your workload with allowance for a particular growth period. This will result in huge savings for the company and a happier IT shop.

AWR is a built-in data store that started in 10gR1 and is very much like a "Statspack on steroids". It has improved significantly in 11gR2, enabling you to have a far better workload information when going through all the AWR snapshots. From the AWR data samples, we could build amazing reports that will let us notice trends and make it possible to visualize data and use statistical methods for analysis. Even more surprising about the AWR data samples is we are able to define the database server's Capacity, Requirements, and Utilization in terms of CPU, IO, memory, and network, which are very important key metrics for Capacity Planning.

In this paper you will learn how to make use of the AWR, specifically the DBA_HIST views, to have a clear-cut measurement on resources to aid in Capacity Planning, Predictive Analysis, and Performance Firefighting.

PART 1 - Where it all started

I’m certain that DBAs or developers don’t have enough time to spare to read 108 AWR reports in a day, even more 108 AWR reports in just an hour. How about 108 AWR reports in 5 minutes just to answer the question what period is the bottleneck? And what is the bottleneck? Well before it will take so much of my time just to generate these AWR reports by hand, and it is a daunting and repetitive execution of awrrpt.sql. You will be overwhelmed by the manual AWR reports generation, especially when you start reading each of them and you only need to see particular sections of the 1000+ lines of performance data to correlate it to the problem at hand.

Definitely this will lead to longer analysis periods hence longer time for a problem to be solved and as a database consultant I must also be aware on how to optimize my troubleshooting time. You can argue that there are visual tools already available to help with the troubleshooting but what if you are only left with just a command line or an SQL*Plus session?
This scenario triggered me to mine on the source tables of the AWR report to cut out the unnecessary and present the performance data in more meaningful manner that will be easier for me to notice trends and even possible for me to visualize the data, or even possible to do some statistics out of it.

PART 2 - How to mine the AWR

AWR is much like “Statspack on steroids”; it is a wonderful data collector for Oracle and OS statistics. The underlying sources of the AWR report are the DBA_HIST views which have grown from 67 in Oracle version 10.1 to 108 in Oracle version 11.2.

The AWR report provides a single summary report based upon an interval of time. On the image below we can create an AWR report for SNAP_ID 335 to 339 that is an interval time from 6:20 to 7:01 AM. However when analyzing performance problems we are more interested to see what occurred during each of the samples (335, 336, 337, 338 and 339) within the specified interval. In that way we have a granular view of what’s going on and have a better view on the workload change that’s happening.

For the query output we are investigating the SYSSTAT statistic “physical reads” which is the total number of data blocks read from disk. It is also important to note this is a cumulative physical reads by all the database sessions since instance start. We are particularly interested on the delta of each SNAP_ID, that is end_value - start_value = delta, and transforming it to a more meaningful and readable output.

To transform the delta to a more meaningful output that we could easily understand we would apply the IO MB/s formula. See the example for SNAP_ID 338 below:

IO MB/s = ((delta * <block_size>) / 1024 / 1024) / <snap_duration_in_seconds>
        = ((5663126 * 8192) / 1024 / 1024) / 603
        = 73.37 MB/s

To validate the accuracy of the derived value we need to compare it with the actual AWR report on SNAP_ID 338 - 339. The image below shows the delta we used to derive the MB/s is correct.

Also a run of Automatic Database Diagnostic Monitor (ADDM) on SNAP_ID 338 - 339 shows that we are reaching the throughput of 74 MB/s that is really close to our derived value.
And checking it with the Enterprise Manager Performance page shows that the Disk IO is around our derived value.

The data shown above comes from query below:

SELECT * FROM
(
  SELECT s0.snap_id snap_id,
         TO_CHAR(s0.END_INTERVAL_TIME,'YY/MM/DD HH24:MI') TIME,
         s10t0.stat_name,
         s10t0.value start_value,
         s10t1.value end_value,
         (s10t1.value - s10t0.value) delta,
         round(((((s10t1.value - s10t0.value) * 8192)/1024/1024)
               / ((round(EXTRACT(DAY FROM s1.END_INTERVAL_TIME - s0.END_INTERVAL_TIME) * 1440
                       + EXTRACT(HOUR FROM s1.END_INTERVAL_TIME - s0.END_INTERVAL_TIME) * 60
                       + EXTRACT(MINUTE FROM s1.END_INTERVAL_TIME - s0.END_INTERVAL_TIME)
                       + EXTRACT(SECOND FROM s1.END_INTERVAL_TIME - s0.END_INTERVAL_TIME) / 60, 2))*60)
         ),2) as phyreads_mbps                      -- physical reads, diffed
  FROM dba_hist_snapshot s0,
       dba_hist_snapshot s1,
       dba_hist_sysstat s10t0,
       dba_hist_sysstat s10t1
  WHERE s0.dbid = 2607950532                        -- DBID
  AND s1.dbid = s0.dbid
  AND s10t0.dbid = s0.dbid
  AND s10t1.dbid = s0.dbid
  AND s0.instance_number = 1                        -- INSTANCE_NUMBER
  AND s1.instance_number = s0.instance_number
  AND s10t0.instance_number = s0.instance_number
  AND s10t1.instance_number = s0.instance_number
  AND s1.snap_id = s0.snap_id + 1
  AND s10t0.snap_id = s0.snap_id
  AND s10t1.snap_id = s0.snap_id + 1
  AND s10t0.stat_name = 'physical reads'
  AND s10t1.stat_name = s10t0.stat_name
)
WHERE snap_id in (335,336,337,338,339)
ORDER BY snap_id ASC;
You may have noticed that I used the SQL trick below, which has a similar effect to the LAG function. It joins each snapshot row to the row of the next SNAP_ID, so the query gets the start_value and end_value on a single row, making it possible to compute the delta and apply the performance formula. The view DBA_HIST_SNAPSHOT also acts as the ultimate reference of snapshot information: joining it to the other DBA_HIST views provides meaningful data on other subsystems or workload performance data.
AND s10t0.snap_id = s0.snap_id AND s10t1.snap_id = s0.snap_id + 1
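The pairing idea behind this join can be sketched outside the database. The following is a minimal Python sketch, not the author's script: only the delta for SNAP_ID 338 (5663126 blocks), the 8192-byte block size, and the 603-second duration come from the text; the cumulative values, the other durations, and the names `snapshots` and `read_mbps` are hypothetical.

```python
# Hypothetical cumulative "physical reads" counters per snapshot.
# Only the 338 -> 339 delta (5663126) and its 603 s duration are from the text.
BLOCK_SIZE = 8192  # bytes per data block, as hard-coded in the query

snapshots = [
    # (snap_id, cumulative_value, seconds_until_next_snapshot)
    (337, 1_000_000, 600),
    (338, 2_000_000, 603),
    (339, 7_663_126, 600),
]


def read_mbps(delta_blocks, block_size, seconds):
    """Convert a block-count delta into MB/s."""
    return (delta_blocks * block_size) / 1024 / 1024 / seconds


rows = []
# zip() pairs each snapshot with its successor, like joining on snap_id + 1
for (snap_id, start, secs), (_, end, _) in zip(snapshots, snapshots[1:]):
    delta = end - start
    rows.append((snap_id, start, end, delta,
                 round(read_mbps(delta, BLOCK_SIZE, secs), 2)))

for row in rows:
    print(row)
```

Each output row carries start_value, end_value, and delta side by side, which is what the self-join achieves in SQL.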
The query I’ve shown you is just one part of the story: it only gives the “IO Read MB/s”, an IO subsystem statistic. Ideally we should correlate the following subsystems of the database server to fully characterize the overall workload and performance:
1) Oracle: Oracle instance and database configuration
2) Operating System: CPU, memory, IO, and network
3) Application: SQLs and anything specific to the application
For the correlation we use the “3-circle analysis” technique [1], where each subsystem represents a circle and is diagnosed separately and then in combination. If the problem resides with the database server, the overlap of the 3 circles is the current performance problem. By doing this we get a clear correlation of the workload and performance across subsystems and can target our efforts to improve the overall response time. When mining the AWR, a query in a time series layout with only the relevant statistics shown side by side is very useful. Even when the statistics can’t be shown side by side, each bottleneck period relates to a particular SNAP_ID, so correlating the various performance data is always possible.
Having this gives us the following advantages:
- Quickly notice trends for performance diagnosis
- We have the beautiful set of workload and performance data now in our control
- We have lots of data points for statistical and predictive analysis
- Faster analysis ever!
As I go along with my research of mining the AWR I have created and collected some useful scripts that I have applied successfully on real world performance scenarios.

The chart below shows the categorical relationship of the scripts:

The table below shows the important details of the scripts and some reason behind how they are formatted and created:

IMPORTANT NOTE: Diagnostic Pack License is needed for the scripts

Script Name: awr_genwl
DBA_HIST views: DBA_HIST_SNAPSHOT, DBA_HIST_OSSTAT, DBA_HIST_SYS_TIME_MODEL, DBA_HIST_SYSSTAT
Data presented: AAS, CPU capacity, CPU requirements, Memory requirements, IO requirements, Logged on users, CPU Utilization
Description: This is the starting point. You first run this SQL to have an overview of the load of the database server. It clearly shows the relationship of the formula Utilization = Requirements / Capacity. The AAS column serves as a (golden) metric on finding the periods where the database could be having a bottleneck or just idle.

Script Name: awr_topevents
DBA_HIST views: DBA_HIST_SNAPSHOT, DBA_HIST_SYSTEM_EVENT, DBA_HIST_SYS_TIME_MODEL
Data presented: Event, Event Rank, Waits, Time, Avgwt (ms), DB Time %, AAS, Wait Class
Description: This is a version of "Top 5 Timed Events" but across SNAP_IDs with AAS metric. Coming from the awr_genwl, for the AAS to be more useful we must be aware about the components of AAS (much like drilling down on the time components) and have this kind of data over a period of time (across SNAP_IDs). Graphing this data will be much like the Enterprise Manager that outputs a nice graph and slicing the AAS components to different wait classes giving you a broad “historical” view which you could go back and drill down on the past load activity.

Script Name: awr_services
DBA_HIST views: DBA_HIST_SNAPSHOT, DBA_HIST_SERVICE_STAT
Data presented: Service Name, DB Time, DB CPU, Physical Reads, Logical Reads, AAS
Description: Service enables the grouping of common database connections or allowing the distribution of connections (e.g. RAC). This data is commonly seen on the Enterprise Manager to give us a classification of the application/module activity on the database. Showing this data in a time series manner and adding an AAS column will give us an idea if particular applications are driving most the workload of the database.

Script Name: awr_sysstat
DBA_HIST views: DBA_HIST_SNAPSHOT, DBA_HIST_OSSTAT, DBA_HIST_SYS_TIME_MODEL, DBA_HIST_SYSSTAT
Data presented: AAS, LIO/s, DB Block Changes/s, User Calls/s, Parses/s, Hard Parses/s, Sorts/s, Logon/s, SQL*NET to client MB, SQL*NET to dblink MB
Description: This is a version of "Load Profile" but across SNAP_IDs with AAS metric. Useful to quickly notice changes in the Oracle workload. You may add any other SYSSTAT statistic you want to monitor here.

Script Name: awr_topsqlx
DBA_HIST views: DBA_HIST_SNAPSHOT, DBA_HIST_SQLSTAT, DBA_HIST_SQLTEXT
Data presented: SQL_ID, Plan Hash Value, Module, Elapsed Time (s), Elapsed Time / exec (s), CPU Time (s), IO Time (s), App Time (s), Concurrency Time (s), Cluster Wait (s), LIO, PIO, Direct Writes, Rows, Exec, Parse Count, PX Exec, Time Rank, AAS, SQL_TEXT
Description: The “SQL section” of the AWR report is usually segregated into sections ordered by Elapsed Time, CPU Time, Gets, Reads, Executions, and Parse Calls. Having the data for a particular problematic SQL_ID spread over 1000+ lines of report makes it hard to find every detail about its performance. I feel there’s a better way to present the data. Here are the sections you'll get from the script, with a short description of each:

1) snap_id, time, instance, snap duration
The time period and snap_id can be used to show the SQLs for a given workload period. Let's say your usual work hours are 9am to 6pm: you could show only the SQLs for that period. There's a data range section at the bottom of the script that you can use to filter.

2) sql_id, plan_hash_value, module
Use this info if you want to know where the SQL was executed (SQL*Plus, OWB, Toad, etc.). You could also compare the plan_hash_value, but I suggest you use Kerry Osborne's awr_unstable_plans.sql script if you'd like to search for unstable plans.

3) total elapsed time, elapsed time per exec, cpu time, io time, app wait time, concurrency wait time, cluster wait time
These are the time components. Without tracing the SQL you can at least tell which time component is consuming the elapsed time of that particular SQL. Let's say your total elapsed time is 1000sec, with cpu time of 30sec and io time of 300sec: you would know that it is consuming significant IO, but you have to look for the other 670sec, which could be attributed to "other" wait events (like PX Deq Credit: send blkd, etc.).

4) LIOs, PIOs, direct writes, rows, executions, parse count, PX
Some other statistics about the SQL: whether you're incurring a lot of PIOs, how many times the SQL was executed in that period, and the number of PX servers spawned. Be careful with these numbers: if you have "executions" of, let's say, 8, you have to divide these values by 8, and the same goes for the time section. Only the "elapsed time per exec" is a per-execution value. This is for formatting reasons, because I can't fit them all on my screen.

5) AAS (Average Active Sessions), Time Rank, SQL type, SQL text
This is one of my favorites. It measures how the SQL is performing against the database server. I'm using the AAS and CPU count as my yardstick for a possible performance problem (I suggest reading Kyle Hailey's material about this):

if AAS < 1            -- Database is not blocked
   AAS ~= 0           -- Database basically idle
                      -- Problems are in the APP not DB
   AAS < # of CPUs    -- CPU available
                      -- Database is probably not blocked
                      -- Are any single sessions 100% active?
   AAS > # of CPUs    -- Could have performance problems
   AAS >> # of CPUs   -- There is a bottleneck

So having the AAS as another metric on the top SQL is good stuff. I've also added the "time rank" column to show the SQL's ranking among the top SQL. With the default settings the script shows time rank 1 to 5. This is also useful if you are looking for a particular SQL that is on rank #15 and you see an adhoc query at time rank #1 and #2 affecting the database performance.

This script can also show SQLs that span across SNAP_IDs. Order the output by SNAP_ID and filter on that particular SQL: if the SQL is still running and spans, let's say, 2 SNAP_IDs, then the exec count is 0 (zero) and the elapsed time per exec is 0 (zero). Only when the query finishes will you see these values populated. I've noticed this behavior and it's the same thing that is shown on the AWR reports. For that scenario see http://karlarao.tiddlyspot.com/#%5B%5BTopSQL%20on%20AWR%5D%5D
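The arithmetic in items 3 and 4 (the unaccounted "other" time and the per-execution division) can be sketched as follows. This is an illustrative Python sketch, not part of awr_topsqlx: the function name `sql_time_breakdown` and its keys are hypothetical, and the inputs are the 1000/30/300 second example from the text with 8 executions.

```python
def sql_time_breakdown(elapsed_s, cpu_s, io_s, executions,
                       app_s=0.0, concurrency_s=0.0, cluster_s=0.0):
    """Split a SQL's elapsed time into known components and the remainder."""
    accounted = cpu_s + io_s + app_s + concurrency_s + cluster_s
    # When a statement is still running across snapshots the exec count is 0,
    # so report 0 per execution instead of dividing by zero.
    per_exec = (lambda v: v / executions) if executions else (lambda v: 0.0)
    return {
        "other_s": elapsed_s - accounted,          # time in "other" waits
        "elapsed_per_exec_s": per_exec(elapsed_s),
        "cpu_per_exec_s": per_exec(cpu_s),
        "io_per_exec_s": per_exec(io_s),
    }


example = sql_time_breakdown(elapsed_s=1000, cpu_s=30, io_s=300, executions=8)
print(example)
```

With these inputs 670 seconds remain unaccounted for, and every total has to be divided by 8 to be comparable with the per-execution elapsed time.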

Script Name: awr_topsql
DBA_HIST views: DBA_HIST_SNAPSHOT, DBA_HIST_SQLSTAT, DBA_HIST_SQLTEXT
Data presented: SQL_ID, Plan Hash Value, Module, Elapsed Time (s), Elapsed Time / exec (s), CPU Time (s), Cluster Wait (s), LIO, PIO, Rows, Exec, Parse Count, PX Exec, Time Rank, AAS
Description: Similar columns to awr_topsqlx, but this time showing only the top 20 SQLs across SNAP_IDs.

Script Name: awr_unstable_plans (by Kerry Osborne)
DBA_HIST views: DBA_HIST_SNAPSHOT, DBA_HIST_SQLSTAT
Data presented: SQL_ID, Executions, Min/Max/Avg Etime, Avg LIO, STD_DEV
Description: This script finds SQL statements with plan instability. I like the clever use of standard deviation to show SQLs with variable elapsed time.

Script Name: awr_parm_mods (by Kerry Osborne)
DBA_HIST views: DBA_HIST_SNAPSHOT, DBA_HIST_PARAMETER, V$INSTANCE
Data presented: Parameter Name, Old Value, New Value
Description: This script shows all parameters (including hidden) that have been modified.

Script Name: awr_netwl
DBA_HIST views: DBA_HIST_SYSMETRIC_SUMMARY
Data presented: Network Minvalue (MB)/s, Network Maxvalue (MB)/s, Network Avgvalue (MB)/s, Network STD_DEV (MB)/s
Description: The data comes from the metric family of tables and shows “Network Traffic Volume Per Sec”. Keep in mind that metrics are different from sysstat values. With sysstat you just get the delta and the rate. With metrics the sampling is different: let's say the snap duration is 10mins, then the metric is sampled on a per 60sec interval (num_interval) and you get the max, min, avg, and std_dev of those samples.
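The difference between a metric summary and a sysstat-style rate can be illustrated with a small Python sketch. All sample values are invented, the dictionary keys are illustrative names rather than actual view columns, and using the sample standard deviation is an assumption.

```python
import statistics

# Hypothetical 60-second samples (MB/s) inside one 10-minute snapshot,
# with a single one-minute burst in the middle.
samples_mbps = [1.0, 1.0, 1.0, 1.0, 1.0, 10.0, 1.0, 1.0, 1.0, 1.0]

# Metric-style summary: statistics over the individual samples
summary = {
    "num_interval": len(samples_mbps),
    "minval": min(samples_mbps),
    "maxval": max(samples_mbps),
    "average": statistics.mean(samples_mbps),
    "std_dev": statistics.stdev(samples_mbps),  # assumption: sample std dev
}

# sysstat-style view: one delta over the whole snapshot turned into a rate
snap_seconds = 600
total_mb = sum(v * 60 for v in samples_mbps)
delta_rate_mbps = total_mb / snap_seconds

print(summary)
print(delta_rate_mbps)
```

The delta-based rate matches the average, but only the max and standard deviation reveal that one interval ran at ten times the typical throughput.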

Script Name: awr_est_gc_traffic (by John Kanagaraj)
DBA_HIST views: DBA_HIST_SNAPSHOT, DBA_HIST_SYSSTAT, DBA_HIST_DLM_MISC, V$DATABASE, V$PARAMETER
Data presented: Estimated Interconnect Traffic (KB)
Description: This script is ideal for RAC environments and shows the interconnect throughput of an instance. Very useful if you want to check whether the interconnect is being saturated.

Script Name: awr_iowl
DBA_HIST views: DBA_HIST_SNAPSHOT, DBA_HIST_OSSTAT, DBA_HIST_SYS_TIME_MODEL, DBA_HIST_SYSSTAT
Data presented: AAS, CPU IO WAIT, Utilization, OS Load, Single Block R/W IOPS, Multi Block R/W IOPS, R/W MB/s, Total R/W IOPS, R/W Ratio, HW Disk IOPS, HW # of Disks
Description: This script is ideal for monitoring the Oracle IO activity. Very useful for sizing and consolidating storage for Oracle databases. It can be used together with a storage monitoring tool to get a complete picture of IO performance. The last two columns use the formulas that storage engineers apply to determine the number of disks needed by the database:
HW Disk IOPS = (IOPS * Read Ratio) + (IOPS * Write Ratio * RAID penalty)
HW # of Disks = Total disk IOPS / IOPS per disk
Of course the “HW # of Disks” is not the final number. There are other factors (bandwidth, throughput, service time, etc.) that need to be considered to determine the right storage for a particular IO workload, but this can be your starting point. Benchmarking will also help a lot with the storage decisions.
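As a worked instance of the two disk formulas, here is a small Python sketch. The function names, the input numbers, and the choice to round the disk count up to a whole disk are assumptions for illustration only.

```python
import math


def hw_disk_iops(iops, read_ratio, write_ratio, raid_penalty):
    """Back-end disk IOPS once the RAID write penalty is applied."""
    return (iops * read_ratio) + (iops * write_ratio * raid_penalty)


def hw_disk_count(total_disk_iops, iops_per_disk):
    """Disks needed for the back-end IOPS (rounded up: an assumption)."""
    return math.ceil(total_disk_iops / iops_per_disk)


# Hypothetical workload: 1000 front-end IOPS, 50/50 read/write mix,
# a RAID write penalty of 4, and disks that each deliver 125 IOPS.
backend_iops = hw_disk_iops(1000, 0.5, 0.5, 4)
disks = hw_disk_count(backend_iops, 125)
print(backend_iops, disks)
```

The same front-end load needs far more back-end IOPS as the write ratio or the RAID penalty grows, which is why the R/W ratio column sits next to these two values.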

Script Name: awr_io_ts
DBA_HIST views: DBA_HIST_SNAPSHOT, DBA_HIST_FILESTATXS, DBA_HIST_TEMPSTATXS
Data presented: Tablespace R/W IOPS, Tablespace R/W latency
Description: This script shows the IO performance of the tablespaces. This is the same as what you see in AWR but across SNAP_IDs. The latency formula is as follows:
latency (ms) = (readtim / phy reads) * 10
Keep in mind that in this script the IOPS and latency values are aggregated from all the datafiles of the tablespace. So the numbers from this script may not represent the actual latency of any single datafile, but the textual trends of high latency (ms) numbers can warn you about particular workload periods that are worth probing with small duration samples.
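A short Python sketch of the latency formula shows why aggregation can mask a slow datafile. The function name and all numbers are hypothetical, and the sketch assumes readtim is expressed in centiseconds, which is what the factor of 10 suggests.

```python
def read_latency_ms(readtim_cs, phyrds):
    """Average read latency in ms; readtim assumed to be in centiseconds."""
    if phyrds == 0:
        return 0.0  # no reads in the interval, nothing to average
    return (readtim_cs / phyrds) * 10


# Two hypothetical datafiles in the same tablespace: (readtim, phy reads)
file_a = (100, 1000)   # busy and fast
file_b = (900, 100)    # quiet and slow

latency_a = read_latency_ms(*file_a)
latency_b = read_latency_ms(*file_b)

# Tablespace-level aggregation: sum the times and the reads, then divide
tablespace_latency = read_latency_ms(file_a[0] + file_b[0],
                                     file_a[1] + file_b[1])
print(latency_a, latency_b, round(tablespace_latency, 2))
```

Here the tablespace figure lands around 9 ms even though one file averages 90 ms, so a high aggregated value is a prompt to drill into awr_io_file rather than a final answer.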

Script Name: awr_io_file
DBA_HIST views: DBA_HIST_SNAPSHOT, DBA_HIST_FILESTATXS, DBA_HIST_TEMPSTATXS
Data presented: Datafile R/W IOPS, Datafile R/W latency
Description: This script shows the IO performance of the datafiles. This is the same as what you see in AWR but across SNAP_IDs. Keep in mind that the IOPS and latency values may be normalized (averaged out) if the snap interval is too long (60mins and above) compared to a 5 second or 10 minute snap interval. (see Appendix)

Script Name: r2toolkit [2]
DBA_HIST views: DBA_HIST_SNAPSHOT, DBA_HIST_DATABASE_INSTANCE, DBA_HIST_SYSSTAT, DBA_HIST_SYSTEM_EVENT, DBA_HIST_SYS_TIME_MODEL, DBA_HIST_OSSTAT, DBA_HIST_WR_CONTROL
Data presented: Y and X values that can be plotted for Linear Regression
Description: This is a performance toolkit that uses AWR data and Linear Regression to identify which metric/statistic is driving the database server’s workload. The data points can be very useful for capacity planning, letting you make informed decisions and avoid guesswork.

You can also do the same kind of mining with Statspack. Each DBA_HIST view has a counterpart Statspack view and you can achieve similar results:
DBA_HIST_SNAPSHOT = STATS$SNAPSHOT
DBA_HIST_OSSTAT = STATS$OSSTAT
DBA_HIST_SYS_TIME_MODEL = STATS$SYS_TIME_MODEL
DBA_HIST_SYSSTAT = STATS$SYSSTAT

The scripts mentioned are freely downloadable, and you will find more details on the math and performance formulas (rates, time, IOPS, CPU, latency, utilization, AAS) when you look into the SQL code. If you are serious about mining the AWR, I also suggest you take time to play further with the DBA_HIST tables and the underlying data. You’ll come away with a better understanding of how the data in the plain AWR report are derived.
PART 3 - Visualization
Average Active Sessions (AAS) has become my default (golden) metric for finding the periods where the database could be having a bottleneck or is just idle. Essentially AAS is the database load; this value should not go above the CPU count (NUM_CPUS in DBA_HIST_OSSTAT), and if it does, that means the database is working very hard or waiting a lot for something. Together, the AAS and CPU count are used as a yardstick for a possible performance problem [3]:

If AAS < 1            -- Database is not blocked
   AAS ~= 0           -- Database basically idle
                      -- Problems are in the APP not DB
   AAS < # of CPUs    -- CPU available
                      -- Database is probably not blocked
                      -- Are any single sessions 100% active?
   AAS > # of CPUs    -- Could have performance problems
   AAS >> # of CPUs   -- There is a bottleneck
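
The yardstick above can be turned into a small classifier. This is a minimal Python sketch under stated assumptions: the function name `classify_aas` is hypothetical, and the cut-offs for "~= 0" (`idle_threshold`) and ">>" (`severe_factor`) are arbitrary illustrative choices, since the rule of thumb does not define them.

```python
def classify_aas(aas, num_cpus, idle_threshold=0.05, severe_factor=2.0):
    """Map an AAS value to the rule-of-thumb categories.

    idle_threshold and severe_factor are illustrative assumptions,
    not values given by the rule of thumb itself.
    """
    if aas <= idle_threshold:
        return "basically idle"
    if aas < 1:
        return "not blocked"
    if aas < num_cpus:
        return "CPU available"
    if aas < num_cpus * severe_factor:
        # AAS equal to the CPU count is treated as this category too
        return "could have performance problems"
    return "bottleneck"


for value in (0.0, 0.5, 2.5, 5.0, 12.0):
    print(value, classify_aas(value, num_cpus=4))
```

Checks run from the quietest case to the busiest, so each AAS value falls into exactly one category even though the published rules overlap.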
Just like a doctor's, AAS could be your “stethoscope” when investigating performance problems, but it doesn’t stop there. For it to be more useful you must be aware of the components of AAS, much like drilling down on the time components, and have this kind of data over a period of time (across SNAP_IDs). Enterprise Manager produces these nice graphs on the “Performance” and “Top Activity” pages, slicing the AAS components into different “Wait Classes”, and it has a “Historical” view in which you can go back and drill down on the past load activity.
But what could be the problem?
I know some of you have encountered this Enterprise Manager error at some point. You are configured with a long AWR retention period (365 days to exaggerate it) but Enterprise Manager won’t let you go back farther, all because there was an instance shutdown between the date you want to go to and the date you are at now. Or it could be some other issue where Enterprise Manager really can’t just give you the visualization you need.

So what could be the alternative?

1) Use the Top Timed Events SQL (awr_topevents.sql) and focus on the AAS and wait class columns across SNAP_IDs
2) Or use the script together with Perfsheet! … a great tool for ad-hoc performance visualization [4]

To be consistent with the initial example we will focus on the same interval time 6:20 to 7:01 AM that is SNAP_ID 335-339. Note that the AAS during this period had a sudden spike that is on the range of 2.2 to 3.5

[Figure: awr_genwl.sql output]

The image below is a stacked area chart of the awr_topevents.sql using Perfsheet. It’s clear from the image that there’s a big spike on the database load… but we want to know more about it by drilling down on the AAS components.

[Figure: Stacked area chart of AAS]

Looking at the “textual trends” of awr_topevents.sql output just by looking at the AAS column we would easily know which AAS component is driving the workload of the database. For the particular SNAP_IDs of high activity, it’s evident that there’s a high User IO activity.

[Figure: awr_topevents.sql output]

Some more background

On the Enterprise Manager “Performance” and “Top Activity” Page you’ll see the AAS components are sliced into different wait classes. But, did you know that their data sources are different?

From the 2nd slide of Kyle Hailey’s presentation [3] on AAS (Average Active Sessions) it says that there are two ways to derive the value:

1) Time Statistics
2) Sampling

AAS on the Performance Page uses “Time Statistics” and is actually from v$system_event + CPU from time model. This is also what the script awr_topevents.sql is doing… it unions the output of “events” on DBA_HIST_SYSTEM_EVENT and the “CPU” from time model view DBA_HIST_SYS_TIME_MODEL and then filter only the top 5 and do this across the SNAP_IDs but for graphing purposes on the Perfsheet to make it look similar to the Enterprise Manager Performance Page I have to include all of the “events” so that all the AAS values will be counted. By the way, on 10g below the load chart is coming from v$system_event + v$sysstat “CPU used by this session”.

AAS on the Top Activity Page uses “Sampling” and by default is taking advantage of ASH (samples) and does it on a 15sec refresh rate… but as I have observed when you switch from the Real Time 15 sec refresh to Historical then it also starts to behave like the Performance Page (pulls data from v$system_event + CPU from time model).

So what’s the effect?

On a high CPU activity period you’ll notice that there will be a higher AAS on the Top Activity Page compared to Performance Page. Simply because ASH samples every second and it does that quickly on every active session (the only way to see CPU usage real time) while the time model CPU although it updates quicker (5secs I think) than v$sysstat “CPU used by this session” there could still be some lag time and it will still be based on Time Statistics (one of two ways to calculate AAS) which could be affected by averages.

If you want more info about the details around the Performance and Top Activity page this is worth reading - History of Session Load [5] and AAS investigation [14]

Now time for Perfsheet a la Enterprise Manager!

Finding the AAS component that’s driving the workload is a lot easier in graphics. The image below shows that we can create the same visualization like the Enterprise Manager broken down into “Wait Class”.

[Figure: Stacked area chart AAS components - wait class]
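
The two ways of deriving AAS discussed in the background section (time statistics and sampling) can be contrasted with a minimal Python sketch. The function names and all numbers are hypothetical, and the sketch only illustrates the arithmetic, not how Enterprise Manager or the scripts implement it.

```python
def aas_from_time_stats(wait_time_s, cpu_time_s, elapsed_s):
    """Time statistics: accumulated wait + CPU seconds over elapsed seconds."""
    return (wait_time_s + cpu_time_s) / elapsed_s


def aas_from_samples(active_session_counts):
    """Sampling: average number of active sessions seen per sample."""
    return sum(active_session_counts) / len(active_session_counts)


# Hypothetical 603-second snapshot with 1200 s of waits and 609 s of CPU
time_based = aas_from_time_stats(wait_time_s=1200, cpu_time_s=609,
                                 elapsed_s=603)

# Hypothetical active-session counts taken at four sample times
sample_based = aas_from_samples([3, 4, 2, 3])

print(time_based, sample_based)
```

Both approaches estimate the same quantity, but the time-based value averages over the whole interval while the sample-based value depends on what was active at each sample time, which is one reason the two pages can disagree.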

Even more, we have the data now in our control. So we could play around with the data and create interesting graphs. Below is broken down into “Wait Events”, aside from being more colorful it lets you see what wait event is mostly consuming the AAS.

[Figure: Stacked area chart AAS components - wait events]

Ooops, don’t get too excited.. important reminder… the 2-dimensional Stacked area chart that Enterprise Manager uses could hide important information and sometimes could be misleading [13] and it really helps to have another view and see the data clearly separated into their respective components, rather than being stacked… As you compare the above and below charts, you’ll know what I mean.. Wait Class and Wait Events in 3D area chart view could tell a more meaningful story.

Compare the wait class chart… above notice the blue (Other wait class) on the range of AAS of 1 while below it’s on the range of 0.1 (hidden between CPU and System IO)… that’s a big difference!

Then compare the wait event chart… notice the big difference on the chart? Above you can’t really tell what’s happening.. but on 3D you can see that only the db file sequential read and direct path read are on the AAS of 1.6 on SNAP_ID 335 and 336. Yes, you will also not be fooled when you look at the raw data… but visualization is much easier and the way to go but you must be able to sense and validate if it’s driving you to bad conclusions.

[Figure: 3D area chart AAS components - wait class]

[Figure: 3D area chart AAS components - wait events]

AAS throughout the AWR retention period!

On my test machine I have 365 days retention period. This enables me to have a data warehouse of performance data. You can see from the chart below (stacked area chart), that what we are focusing on (6:20 to 7:01 AM SNAP_ID 335-339) happens to be the highest load period from all the AAS samples for the lifetime of my test database. You could also see the period of shutdowns (negative value) and other time period where AAS went beyond my maximum CPU which could justify the drill down on the specific SNAP_IDs or time frame… from there you could use ASH, run the AWR report, run ADDM, or make use of your high caliber scripts!

PART 4 - Capacity Planning

Utilization is the ultimate metric!

Capacity planning plays a very important role to ensure proper resources are available and be able to handle expected and unexpected workloads. The primary principle is to ensure the application workload requirements will fit into the available capacity of the database server. And with this we need to have a facility for workload measurement [7]. Good thing the data collection process is already being done by AWR. We just need to extract and present it in a more meaningful and useful manner.

Measuring the workload will give us the following advantages and benefits [7]:
- Have enough capacity and not over buy
- Enable us to quantify the results of response time optimizations in the savings of system resources
- Enable us to quantify the benefit of workload reduction

On the Introduction to Oracle Server Consolidation paper [6] and Chapter 9 of Craig Shallahamer’s book [8] he explained in detail what information you need to get for you to be able to define the Database Server’s Capacity, Requirements, and Utilization.

Essentially what we care most in Capacity Planning is the database server utilization and it is represented by this formula

Utilization = Requirements / Capacity

As shown on the image below the “empty pitcher” represents the database server capacity while the “glass with water” and “another pitcher with beer” are the Oracle workload requirements. Typically the IT shop makes a decision to purchase the database server, that is they define the capacity. Then they start pouring the applications into the server. And of course, the application requirement may or may not fit nicely on the available database server capacity. And when this doesn’t occur nicely, there can be an excess of capacity, which means IT spent too much or it could be the other way around where the capacity is not enough for the requirements at hand.

The good thing here is, you are not guessing!

This simple and very useful concept can be applied as well in AWR. Using the awr_genwl.sql script the data is presented in a manner that we can easily abstract the performance statistics to the Utilization formula.

Having the data presented this way, we can easily apply filter to the data set and immediately find the workload periods with high workload requirements.
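
To make the Utilization = Requirements / Capacity idea concrete, here is a minimal Python sketch of computing utilization per snapshot and filtering for busy periods. The rows, the units (CPU seconds), and the field names `cpu_required_s` and `cpu_capacity_s` are hypothetical; `aas` and `oscpupct` borrow column names that appear in the document's filter examples, with invented values.

```python
# Hypothetical per-snapshot rows; every number here is made up.
workload = [
    {"snap_id": 335, "aas": 0.4, "cpu_required_s": 120.0, "cpu_capacity_s": 1200.0},
    {"snap_id": 336, "aas": 2.2, "cpu_required_s": 780.0, "cpu_capacity_s": 1200.0},
    {"snap_id": 337, "aas": 3.5, "cpu_required_s": 1080.0, "cpu_capacity_s": 1200.0},
    {"snap_id": 338, "aas": 0.9, "cpu_required_s": 300.0, "cpu_capacity_s": 1200.0},
]


def utilization_pct(required, capacity):
    """Utilization = Requirements / Capacity, expressed as a percentage."""
    return 100.0 * required / capacity


for row in workload:
    row["oscpupct"] = utilization_pct(row["cpu_required_s"],
                                      row["cpu_capacity_s"])

# Keep only the periods with high load: AAS above 1 and CPU above 50%
busy = [r["snap_id"] for r in workload if r["aas"] > 1 and r["oscpupct"] > 50]
print(busy)
```

Once requirements and capacity sit side by side on each row, finding the high-requirement periods is a one-line filter.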
And we could do other filtering as well…

AAS range
    aas > 1
Per SNAP_ID or range of SNAP_IDs
    id in (336)
    where id >= 336 and id <= 340
Oracle CPU Utilization
    oracpupct > 50
OS CPU Utilization
    oscpupct > 50
Particular Workload periods
    AND TO_CHAR(s0.END_INTERVAL_TIME,'D') >= 1    -- Day of week: 1=Sunday 7=Saturday
    AND TO_CHAR(s0.END_INTERVAL_TIME,'D') <= 7
    AND TO_CHAR(s0.END_INTERVAL_TIME,'HH24MI') >= 0900    -- Hour
    AND TO_CHAR(s0.END_INTERVAL_TIME,'HH24MI') <= 1800
    AND s0.END_INTERVAL_TIME >= TO_DATE('2010-jan-17 00:00:00','yyyy-mon-dd hh24:mi:ss')    -- Data range
    AND s0.END_INTERVAL_TIME <= TO_DATE('2010-aug-22 23:59:59','yyyy-mon-dd hh24:mi:ss')

CPU sizing recommendations

This data output can easily be used as input to CPU sizing of a database server.

The data points below came from an actual production server that needs to be migrated to a new machine. The server is a dual core machine that has been used for almost 8 years, and a couple of hardware errors have occurred. The management would like to know what the ideal machine would be and how many cores will be needed to handle the workload of the database.

The formula used to derive the “CPU core need” [9] is as follows:

core need = # of cores * utilization * 1.25

The data points were very useful to characterize the current utilization of the database server. Since it was collocated in a data center, we could opt to just upgrade to a newer model, but not the latest and the greatest, or we can virtualize it to a newer hardware.
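The core need formula above can be sketched in a few lines of Python. This is only an illustration: the utilization is assumed to be expressed as a fraction (0 to 1) rather than a percentage, the example figures are invented, and the function name `cpu_core_need` is hypothetical.

```python
def cpu_core_need(cores, utilization, headroom=1.25):
    """core need = # of cores * utilization * 1.25 (utilization as a fraction, 0..1)."""
    return cores * utilization * headroom


# Invented example: a dual core server peaking at 60% CPU utilization.
need = cpu_core_need(cores=2, utilization=0.60)
print(round(need, 2))
```

The 1.25 factor is kept as a default argument so the allowance can be changed without touching the formula itself.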
But notice the outlier (shown in red above) representing a SNAP period having high CPU utilization. Statistically summarizing the data will tell me that I’m most of the time below 10% CPU utilization, but we don’t want to ignore the outlier just like that because there might be a critical application process on that workload period.

Validating with the application owner, she confirmed that it was indeed an adhoc process that is being done once a year. Having this information, we can safely remove the outlier from the data points, and even if the adhoc process will run again on the new server we just have to make sure that it’s being run on an off-peak period so as not to affect the overall connected users.

Storage sizing recommendations

This data output can easily be used as input to storage sizing of a database server.
The data points below came from awr_genwl.sql as well, sizing storage for the same production system mentioned above. This shows the IOPS requirements needed to run the database on the new environment. This can be used together with a storage monitoring tool to have a complete picture of IO performance. Having the measured data easily transforms requirements to capacity.

Also take note that there are other factors (bandwidth, throughput, service time, etc.) that need to be considered to determine the right storage for a particular IO workload, but this can be your starting point. Benchmarking will also help a lot on the storage decisions.

For storage sizing purposes, I strongly recommend using awr_iowl.sql.
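One hedged way to turn measured IOPS samples into a sizing input is to compare the average with a high percentile and the peak. The sample values, the helper name `percentile`, and the choice of a nearest-rank 90th percentile are assumptions made for illustration; they are not something the awr_iowl.sql script prescribes.

```python
def percentile(values, pct):
    """Nearest-rank percentile: smallest sample with at least pct% of samples at or below it."""
    ordered = sorted(values)
    # Integer ceiling of pct * n / 100, so no floating point rounding is involved.
    rank = max(1, (pct * len(ordered) + 99) // 100)
    return ordered[rank - 1]


# Invented per-snapshot IOPS figures (reads + writes per second).
iops_samples = [120, 135, 150, 160, 180, 210, 240, 300, 420, 900]

avg_iops = sum(iops_samples) / len(iops_samples)
p90_iops = percentile(iops_samples, 90)
peak_iops = max(iops_samples)
print(avg_iops, p90_iops, peak_iops)
```

The gap between the average and the peak is the reason a single summary number is rarely enough for storage decisions.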
Real World Example

Diagnosing and Resolving GC Block Lost

The graph shown was a sudden slowdown on a client running 2 nodes of RAC, and it’s a period of month end processing so it’s the most critical week of the month. Interviewing the DBA, he would insist that they have not done any changes on the database environment… well, that is what the majority of customers will say about a performance problem, so the task of finding where/when/why it went wrong is all left to us.

So it’s a sudden slowdown, and I was thinking… if I can have the time series performance of both nodes plotted in one graph… that would answer a lot of questions. Coming from Tanel Poder’s seminar in Singapore, I was able to apply the things that I have learned. So I made use of Perfsheet and played around with the visualization, and I was able to achieve what I had envisioned.

On the image above you can see the where, when, and why. Most of the load is on the first node. The peaks are the particular periods we are interested in, and the wait events contributing to the peak are a suspect or possible culprit for the performance problem. Drilling down further on those peak periods and on the particular database sessions running critical modules that are slow, plus correlating it with the database advisors and OS statistics (CPU, memory, network), we were able to conclude that it was a network interconnect switch problem.

If it weren’t for this visualization the troubleshooting would have taken longer.

This is the image after replacing the network interconnect switch… this shows their normal workload.
Linear Regression of AAS and CPU on 2 node RAC

Mining the AWR backed by solid statistical analysis [10] [11] [12] lets you do forecasts that can guide you with targeted response time optimizations and workload reduction.

The graph shown below is a scatter plot of a production environment with 2 nodes of 11gR1 RAC running on 8-core HS21 blade servers on a DS4800 SAN. Notice the high coefficient of determination (R2) of .97 and .89 respectively, which shows a strong correlation between AAS and CPU utilization. Also, when CPU starts to queue at >80%, the AAS also shoots up!

The drill down shown below on the peak period with AAS value of 10 shows that the workload is driven by high load SQL greatly affecting the overall performance of the database. Also note that the large chunk of the AAS component being utilized is on “CPU”, hence you will see large LIOs and most of the elapsed time spent on CPU when looking at the SQL details on awr_topsqlx. Tuning the high load SQL will result in great workload reduction, response time optimization, and huge savings on system resources.

If the server’s workload is on the AAS value of 2.2, the CPU utilization, latency, and AAS component on “CPU” seem to be low. Also you will notice the top SQL from AAS of 10 is not there anymore.

[Figure: scatter plots of AAS vs. CPU utilization, panels "Node 1" and "Node 2"]
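The R2 figures quoted in this section come from a linear regression of AAS against CPU utilization. A minimal sketch of that calculation in plain Python follows; the sample points are invented and the helper name `linear_fit` is an assumption, not part of the author's toolkit.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept, returned together with R^2."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R^2 = 1 - (residual sum of squares / total sum of squares)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return slope, intercept, 1 - ss_res / ss_tot


# Invented sample points: CPU utilization (%) per snapshot and the matching AAS.
cpu_pct = [10, 20, 35, 50, 65, 80, 90]
aas = [0.5, 1.1, 1.9, 2.6, 3.4, 4.8, 6.0]

slope, intercept, r2 = linear_fit(cpu_pct, aas)
print(f"AAS ~ {slope:.3f} * cpu_pct + {intercept:.3f}, R^2 = {r2:.3f}")
```

Here AAS is treated as the dependent (y) value and CPU utilization as the independent (x) value; an R2 close to 1 means the fitted line explains most of the variation in AAS.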
The performance toolkit uses AWR data and Linear Regression to identify what metric/statistic is driving the database server’s workload based on AAS. The data points can be very useful for capacity planning, giving you informed decisions and completely avoiding guesswork!

The toolkit contains 7 sections, see brief description below:
- CREATE USER - creates the r2toolkit user
- DROP TABLES - drop the tables for a fresh start
- CREATE THE r2 TABLES - create the main tables
- POPULATE y data - y data is the "dependent value", variable whose value is to be predicted
- ANALYZE r2 VALUES - get the stat names with high r2 values, to have a more accurate analysis
- POPULATE x and residual data - x data is the "independent value", used to predict the value of y
- R2 REPORT - generate the textual report and r2 values with or w/o outliers

Drilling down on the peak workload... with AAS of 10

1) General Workload report
2) Tablespace IO report
3) Top Timed Events
4) Top 20 SQLs
6) Top 5 SQLs of SNAP_ID 8631.. which by the way got an AAS of 10

Now on the low workload period… with AAS of 2.2
1) General Workload report
2) Tablespace IO report
3) Top Timed Events
4) Top 20 SQLs
No entry – the top SQL from AAS of 10 is not here anymore
6) Top 5 SQLs on SNAP_ID 8582

References

[1] Craig Shallahamer - Oracle Performance Firefighting - Chapter 1
[2] r2project - http://karlarao.tiddlyspot.com/#r2project
[3] Kyle Hailey Seminar - AAS presentation
[4] Tanel Poder - Perfsheet http://www.tanelpoder.com/files/PerfSheet.zip
[5] History of session load - http://sites.google.com/site/youvisualize/active-session-history
[6] Craig Shallahamer - Introduction To Oracle Server Consolidation
[7] Andy Rivenes - Oracle Workload Measurement
[8] Craig Shallahamer - Oracle Performance Firefighting - Chapter 9
[9] Husnu Sensoy - Database Consolidation Best Practices
    http://husnusensoy.files.wordpress.com/2010/05/database-consolidation-best-practices.p
[10] Forecasting Oracle Performance
[11] Statistics Without Tears
[12] Neeraj Bahatia - Linear Regression Paper
[13] Neil Gunther & Tanel Poder - Multidimensional Visualization of Oracle Performance using Barry007
    http://arxiv.org/pdf/0809.2532
[14] AAS investigation http://goo.gl/5WaAg

Other references:
o http://karlarao.wordpress.com
o Storage IOPS, capacity, performance, cost - http://goo.gl/FCN0w
o http://karlarao.tiddlyspot.com/#Statistics
o http://karlarao.tiddlyspot.com/#OraclePerformance

Appendix - Average Latency Issue

The IO latency formula used in AWR is as follows:

latency (ms) = (readtim / phy reads) * 10

The images below show that latency values may be normalized if the snap interval is too long as compared to shorter snap intervals. Also read on this link for the effects of CPU scheduling issues on latency: http://www.freelists.org/post/oracle-l/Disk-Device-Busy-What-exactly-is-this,7
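The averaging effect described above can be illustrated with a short Python sketch. It assumes that readtim is a cumulative read time in centiseconds (which is what the factor of 10 in the formula implies) and uses invented per-interval deltas; the function name `avg_read_latency_ms` is hypothetical.

```python
def avg_read_latency_ms(readtim_cs, phy_reads):
    """latency (ms) = (readtim / phy reads) * 10, with readtim assumed to be in centiseconds."""
    if phy_reads == 0:
        return 0.0
    return readtim_cs / phy_reads * 10


# Invented deltas for three short snap intervals: (readtim, physical reads).
short_intervals = [(200, 1000), (9000, 1500), (180, 900)]
per_interval = [avg_read_latency_ms(t, r) for t, r in short_intervals]

# The same activity seen as one long snap interval: the latency spike is averaged away.
total_time = sum(t for t, _ in short_intervals)
total_reads = sum(r for _, r in short_intervals)
long_interval = avg_read_latency_ms(total_time, total_reads)
print(per_interval, round(long_interval, 2))
```

The middle short interval shows a clear latency spike, while the single long interval reports one blended figure that hides it.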