OS Truth, little white lies, and the Oracle Wait Interface
John Hurley
Senior DBA
Federal Reserve Bank of Cleveland
The Federal Reserve System may or may not use databases and/or may or may not use any commercial database products and/or specifically we do not endorse any or all hardware/software vendors. Shocking but true: I am still waiting to be invited to my first OMC meeting on fiscal policy.
You may be able to find the Grumpy Old DBA here:
Blog: grumpyolddba.blogspot.com
Twitter: @GrumpyOldDBA
President of Northeast Ohio Oracle Users Group: www.neooug.org
Great Lakes Oracle Conference, May 18-20 2015: www.neooug.org/gloc
Oracle Performance Tuning:
Cary Millsap and Method R provided a light at the end
of the tunnel.
“Optimizing Oracle response time is, for the most part, a
solved problem.”
Method R based on instrumentation provided via the
Oracle Wait Interface.
If followed diligently, it works extremely well 99+ percent
of the time.
COMMON GROUND
Ad hoc definition of Oracle Wait Interface:
• An Oracle provided tool set that helps debug
important code paths and record time waited to
identify bottlenecks throughout the life of an
Oracle database session. The Oracle kernel code
is deeply instrumented to record waits.
• So what does a wait look like?
1100+ wait events in 11.2
Events can be categorized:
• Input/Output
• Network
• Executing SQL/PLSQL ( on CPU or waiting to get back on CPU )
• Concurrency ( waiting for some other session to release a resource )
• Other categories
Wait = t1 – t0
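The Wait = t1 – t0 idea can be illustrated outside the database. What follows is only a sketch in bash, not Oracle's kernel instrumentation: take a timestamp before and after a call that may block, and report the difference. The helper name timed_wait and the use of sleep as a stand-in for a blocking call are assumptions made for illustration, and date +%s%N assumes GNU date.

```shell
#!/bin/bash
# Illustration only: time a blocking call the way an instrumented code path would.
# timed_wait is a hypothetical helper; it runs any command and prints elapsed ms.
timed_wait() {
    local t0 t1
    t0=$(date +%s%N)                  # nanoseconds since epoch before the call (GNU date)
    "$@" >/dev/null 2>&1              # the "wait": any command that may block
    t1=$(date +%s%N)                  # timestamp after the call returns
    echo $(( (t1 - t0) / 1000000 ))   # wait = t1 - t0, converted to milliseconds
}

# sleep stands in for an I/O, network, or lock wait.
elapsed_ms=$(timed_wait sleep 0.2)
echo "waited ${elapsed_ms} ms"
```

The measured time includes everything between the two timestamps, which is exactly why an uninstrumented stretch of code never shows up as a named wait.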
COMMON GROUND
Oracle code samples database system activity
continuously but licensing is required to
access some parts of monitored information.
Comprehensive diagnostics using low level
tracing can be kicked off for a session by using
an Oracle 10046 trace.
The Oracle Enterprise Manager ( or Grid
Control ) provides a graphical user interface
showing system activity based off sampling
instrumented and recorded data.
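A hedged sketch of kicking off such a trace for another session: the helper below only prints the SQL*Plus commands so they can be reviewed before being run. It assumes the DBMS_MONITOR package ( 10g and later ) and a suitably privileged account; the helper name gen_trace_sql and the SID / serial# values are made up for illustration. With waits => TRUE and binds => FALSE this requests the same information as a 10046 trace at level 8.

```shell
#!/bin/bash
# Sketch: emit the SQL*Plus commands that switch extended SQL trace (wait events
# included) on for one session. gen_trace_sql is a hypothetical helper name.
gen_trace_sql() {
    local sid=$1 serial=$2
    cat <<EOF
BEGIN
  DBMS_MONITOR.SESSION_TRACE_ENABLE(session_id => ${sid}, serial_num => ${serial}, waits => TRUE, binds => FALSE);
END;
/
EXIT
EOF
}

# Print the script for a made-up session (SID 123, serial# 4567) for review.
gen_trace_sql 123 4567
```

To actually run it, the output could be piped into something like sqlplus -s / as sysdba. DBMS_MONITOR.SESSION_TRACE_DISABLE with the same session_id and serial_num turns the trace off again.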
Oracle provides graphical tools to see what is going on
based on instrumentation:
• Database console 11g
• Grid control
• Cloud control
SYSTEM OVERVIEW
• 4 core Xeon processor, 32 GB memory, 10 GB SGA
• Runs 1000+ connected sessions from 10 am until 5 pm
• OEL 5.5
• Running EE 11.1 64 bit patched up to 11.1.0.7.6
• No RAC … Single instance database ( same ORACLE_HOME for database and ASM instance )
• No Grid control ( OEM with Diagnostics/Tuning packs )
• EMC Clariion direct attached storage
• Using ASM disk groups ( external redundancy )
NORMAL MONDAY MORNING 11/14/2011
Green = using cpu aka doing work
Blue = doing IO
Normal afternoon … pretty busy day
• In Cleveland we live and die with the orange and brown
• Our NFL granted replacement team has not won a lot of games since the “old winning team” was sold to Baltimore ( 1996 )
Hard core Browns fan did a youtube sendup of what the stadium has turned into lately …
ROUGH AFTERNOON AT THE STADIUM ???
OEM INSTANCE LOCKS
AWR REPORT:
Guessing game time:
Win something in brown paper sack!
MASSIVE CPU STARVATION?
APPLICATION LOCKING ( CODE CHANGES ? )
IO WAITS ( DELAYS STORAGE PROBLEMS? )
• Starting just after 1:49 pm intermittent cpu spikes on OEM Top Activity display
• Maximum CPU line at 4 across graph
• At 1:57 pm spiking gets worse
• Spikes of 15 to 20 active sessions on CPU
• Even higher CPU spikes past 2:15 pm
OEM TOP ACTIVITY SCREEN RECAP
OS WATCHER
• This tool or similar should be running on all production systems!
• Details differ a little depending on the operating system in use, but it samples information using ps, top, mpstat, iostat, netstat, traceroute, and vmstat.
• User Guide and setup instructions available via Metalink Doc ID 301137.1
• My systems are configured to collect information once a minute and retain it for 10 days. Collection has no noticeable impact on the system.
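For readers without OS Watcher, the collect-and-retain idea can be sketched in a few lines of bash. This is not OSWatcher itself, just an illustration under stated assumptions: the helper names snapshot and prune, the scratch directory, and the choice of /proc/loadavg ( Linux only ) as the sampled data are all hypothetical.

```shell
#!/bin/bash
# Sketch of an OSWatcher-style collector (not OSWatcher itself).

# snapshot appends one timestamped sample of a command's output to a .dat file.
snapshot() {
    local outfile=$1; shift
    {
        echo "zzz ***$(date)"   # marker line in the style of the samples in this deck
        "$@"                    # the sampling command, e.g. vmstat 1 3
    } >> "$outfile"
}

# prune removes .dat files older than a retention period given in days.
prune() {
    local dir=$1 days=$2
    find "$dir" -name '*.dat' -type f -mtime +"$days" -delete
}

# Example: one sample of the load average into a scratch directory (Linux only).
archive=$(mktemp -d)
snapshot "$archive/loadavg.dat" cat /proc/loadavg
prune "$archive" 10
cat "$archive/loadavg.dat"
```

Run from cron once a minute with a 10 day prune, this mirrors the collection interval and retention described above.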
zzz ***Tue Mar 1 14:23:20 EST 2011
top - 14:23:24 up 16 days, 13:33, 3 users, load average: 1.28, 1.28, 1.58
Tasks: 1393 total, 2 running, 1391 sleeping, 0 stopped, 0 zombie
Cpu(s): 11.2%us, 1.5%sy, 0.0%ni, 85.6%id, 0.8%wa, 0.1%hi, 0.7%si, 0.0%st
Mem: 32959880k total, 32708568k used, 251312k free, 266488k buffers
Swap: 33456120k total, 143108k used, 33313012k free, 6943192k cached
  PID USER   PR NI  VIRT  RES  SHR S %CPU %MEM    TIME+ COMMAND
26111 oracle 15  0 10.2g  27m  22m S 24.7  0.1 20:16.99 oracleprod (LOCAL=N
20885 oracle 16  0 10.2g  29m  23m S  8.5  0.1  0:13.36 oracleprod (LOCAL=N
 9586 root   15  0 13696 2096  796 R  2.6  0.0  0:00.15 top -b -c -n 2
32674 oracle 18  0 2033m 426m  22m S  2.0  1.3 27:56.81 /u01/app/oracle/pro
 7626 oracle 15  0 10.2g  32m  23m S  1.6  0.1  0:39.45 oracleprod (LOCAL=N
 9681 oracle 16  0 13672 2096  808 S  1.6  0.0 13:36.42 top
32108 oracle 18  0 10.2g  29m  13m S  1.3  0.1 32:54.91 ora_dia0_prod
 3732 oracle 15  0 10.2g  30m  23m S  1.0  0.1  0:19.90 oracleprod (LOCAL=N
14654 oracle 15  0 10.2g  21m  17m S  1.0  0.1  0:23.52 oracleprod (LOCAL=N
 3515 oracle 15  0 10.2g  34m  25m S  0.7  0.1  0:57.47 oracleprod (LOCAL=N
zzz ***Tue Mar 1 14:23:20 EST 2011
avg-cpu: %user %nice %system %iowait %steal %idle
15.58 0.00 3.50 1.17 0.00 79.75
Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util
sda 0.00 29.00 0.00 8.67 0.00 301.33 34.77 0.04 4.15 0.58 0.50
sda1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sda2 0.00 29.00 0.00 8.67 0.00 301.33 34.77 0.04 4.15 0.58 0.50
...
dm-0 0.00 0.00 0.00 37.67 0.00 301.33 8.00 0.18 4.79 0.13 0.50
dm-1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
emcpowerab 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
emcpowerk 0.00 0.00 0.33 0.67 10.67 13.33 24.00 0.00 3.00 3.00 0.30
emcpowerl 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
emcpoweraa 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
emcpowerr 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
emcpowerm 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
emcpowern 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
emcpowerac 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
emcpowero 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
emcpowerp 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
emcpowerad 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
emcpowerj 0.00 0.00 3.33 1.00 53.33 13.33 15.38 0.02 3.77 3.62 1.57
emcpowera 0.00 0.00 2.00 0.67 32.00 10.67 16.00 0.01 5.38 5.12 1.37
emcpowerq 0.00 0.00 3.00 1.00 48.00 16.00 16.00 0.01 3.75 3.17 1.27
emcpowerb 0.00 0.00 1.33 0.33 21.33 5.33 16.00 0.01 5.00 5.00 0.83
emcpowere 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
emcpowerf 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
emcpowerg 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
emcpowerh 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
emcpoweri 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
emcpowerc 0.00 0.00 0.00 4.67 0.00 25.33 5.43 0.00 0.93 0.93 0.43
emcpowerd 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
zzz ***Tue Mar 1 14:22:20 EST 2011
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
 r  b   swpd   free   buff   cache  si  so    bi   bo    in    cs us sy id wa st
 3  0 143108 263504 266392 6942772   0   0  3142  305     3     4 14  3 78  5  0
 1  1 143108 260124 266392 6942620   0   0   264  215 11707 10619 17  7 73  2  0
 0  0 143108 260372 266392 6942820   0   0   640  454 11232 10701 16  4 74  6  0
zzz ***Tue Mar 1 14:23:20 EST 2011
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
 r  b   swpd   free   buff   cache  si  so    bi   bo    in    cs us sy id wa st
 5  0 143108 249832 266484 6943140   0   0  3142  305     3     4 14  3 78  5  0
 1  0 143108 250784 266488 6942928   0   0    48  520  7835  7407 30  8 60  1  0
 0  0 143108 251032 266488 6943072   0   0    88   32  6960  6954  8  1 90  1  0
zzz ***Tue Mar 1 14:24:20 EST 2011
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
 r  b   swpd   free   buff   cache  si  so    bi   bo    in    cs us sy id wa st
 3  0 143108 264044 266572 6944272   0   0  3142  305     3     4 14  3 78  5  0
 0  0 143108 260440 266576 6944424   0   0    56  684  8555  8132 15  9 73  3  0
 0  0 143108 260440 266576 6944316   0   0   120    4  7031  6401  7  1 90  1  0
Data accumulates in output *.dat file every minute
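Once the .dat files pile up, a small filter helps find the interesting minutes. A sketch, assuming the vmstat layout shown in the samples above ( a zzz marker line per snapshot, run queue in the first column ); the helper name busy_snapshots and the cut-off of 4, taken from the 4 core system described earlier, are illustrative choices.

```shell
#!/bin/bash
# Sketch: scan OSWatcher-style vmstat output and print the snapshots whose run
# queue (column "r") exceeded the number of CPUs. busy_snapshots is hypothetical.
busy_snapshots() {
    local file=$1 ncpu=$2
    awk -v ncpu="$ncpu" '
        /^zzz/ { stamp = $0; sub(/^zzz \*\*\*/, "", stamp); next }   # remember snapshot time
        $1 ~ /^[0-9]+$/ && $1 + 0 > ncpu { print stamp " r=" $1 }    # data row over the limit
    ' "$file"
}

# Two data rows copied from the 14:22:20 and 14:23:20 samples in this deck.
sample=$(mktemp)
cat > "$sample" <<'EOF'
zzz ***Tue Mar 1 14:22:20 EST 2011
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
 r  b   swpd   free   buff   cache  si  so    bi   bo    in    cs us sy id wa st
 3  0 143108 263504 266392 6942772   0   0  3142  305     3     4 14  3 78  5  0
zzz ***Tue Mar 1 14:23:20 EST 2011
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
 r  b   swpd   free   buff   cache  si  so    bi   bo    in    cs us sy id wa st
 5  0 143108 249832 266484 6943140   0   0  3142  305     3     4 14  3 78  5  0
EOF
busy_snapshots "$sample" 4
```

A run queue above the core count for one sample is not a problem by itself; the point is to have numbers to set against what OEM was showing at the same time.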
OS TRUTH
• Utilities and diagnostics at the operating
system level give an objective view of reality
• If ps or top ( or windows task manager ) do not
show a process using cpu then it is ( probably )
not using cpu
• At times … especially when dealing with Oracle
bugs and/or uninstrumented sections of Oracle
code … information from the Oracle wait
interface can be incomplete, misleading, false,
or just plain confusing.
• Best to have some slices of OS Truth being
generated and stashed away ahead of those
situations when OWI based problem solving
efforts just do not cut it.
Prod system dba toolkit
sqlplus / as sysdba <<EOF
oradebug setmypid
oradebug unlimit
oradebug dump hanganalyze 3
exit;
EOF
sqlplus / as sysdba <<EOF
oradebug setmypid
oradebug unlimit
oradebug dump systemstate 266
exit;
EOF
sleep 30 ... then repeat both
Probably smart to have something similar canned and ready to go if and when things
go badly … Oracle support may want different levels of detail or other stuff, but this
is a good starting point. May need to use ( sqlplus -prelim / as sysdba ) if you
cannot connect …
Oracle doc id references:
452358.1
121779.1
ORACLE WAIT INTERFACE ( OWI)
• Oracle continues to develop/enhance/maintain the wait interface instrumentation. Each new release is supported by new events and additional instrumentation as well as fixes to existing events.
• Cary Millsap and “Optimizing Oracle Performance” has given us a proven methodology for attacking and diagnosing problems using OWI.
• When you are licensed for OEM tools such as the Diagnostics and Tuning packs, the GUI tools often give you visibility into problems and how to fix them.
• Traces such as the 10046 and the Oracle Wait Interface capabilities are often comprehensive and usually give you a good view of what is going on in your system.
• Sometimes you run into problems where the Oracle Wait Interface gives you an incorrect picture of reality. Your system may get stuck in some obscure part of the Oracle code … perhaps the last calls to the OWI and the last information that OEM has do not correspond to where you are now.
PSTACK SHOWS PROCESS STACK
• At a given point in time pstack shows exactly where a process is … what module and offset is currently executing ( or waiting ) and the whole chain of programs involved.
• It is often possible to dig into the names of the code/modules shown in pstack output and figure out what a process/program is doing.
• Point in time … if you do several pstacks against the same process and it stays in the same routine it is probably stuck/waiting.
• May not be a good idea to run pstack on Oracle background processes … use with caution here.
• May need to run as root to get output ( os dependent ? )
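The "several pstacks against the same process" check can be scripted. A minimal sketch, assuming pstack is installed and the target is a user ( not background ) process; the helper name looks_stuck, the STACK_CMD override, and the stub used in the demo are hypothetical and exist only so the logic can be tried without a real hung process.

```shell
#!/bin/bash
# Sketch: sample a process stack several times and report whether it changed.
# STACK_CMD defaults to pstack; override it to try the logic without a real target.
STACK_CMD=${STACK_CMD:-pstack}

# looks_stuck PID [SAMPLES] [PAUSE_SECONDS]
# returns 0 if every sample is identical, 1 if the stack moved, 2 on read failure.
looks_stuck() {
    local pid=$1 samples=${2:-3} pause=${3:-1}
    local first current i
    first=$($STACK_CMD "$pid") || return 2
    for (( i = 1; i < samples; i++ )); do
        sleep "$pause"
        current=$($STACK_CMD "$pid") || return 2
        [ "$current" = "$first" ] || return 1   # stack changed: process is moving
    done
    return 0                                    # same stack every time: probably stuck
}

# Demo with a stub in place of pstack (the stub output is made up).
fake_stack() { echo "#0 some_function ()"; }
STACK_CMD=fake_stack
if looks_stuck 12345 3 0; then
    echo "PID 12345: identical stack in 3 samples, probably stuck"
fi
```

An identical stack only says "probably stuck", as noted above; a process legitimately idle in the same routine looks the same, so the result still has to be read against what the session was supposed to be doing.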
The Oracle Wait Interface is …
• Useless ( sometimes )
• See James Morle and Tanel Poder joint blog (http://jamesmorle.wordpress.com/2009/11/09/the-oracle-wait-interface-is-useless-sometimes-pt/ ) ( see part 1 / part 2 / part 3 ).
• At times low level OS based tools such as pstack and pmap and/or system call tracing tools ( Linux: strace, SystemTap; AIX: truss; HP-UX: tusc; Solaris: truss, DTrace; Windows: ProcMon, ProcExp, StraceNT ) as well as the slices of OS truth ( from OS Watcher or elsewhere ) are needed to help solve problems.
The top function ntevpque() is the real
function/operation that this Oracle process was
doing (regardless of what the V$ views or standard
monitoring tools say). Functions starting with NT
mean Network Transport, which means that the
process was currently (stuck in) doing network
related (or interprocess communication) tasks.
Also, the clsc* functions like clsc_select_ext ()
indicated that the functions in the top of stack were
related to CLuster ServiCes (CLSC). This was an
indication that the stuck Oracle processes get stuck
when they try to communicate to the cluster
services or ASM processes.
SO WHAT ACTUALLY HAPPENED?
• The OEM and Oracle Wait Interface were pointing to
application locking issues and/or cpu shortages.
• Our business was pushing this system into new record levels
of sales and connections and transactions … past levels that
had ever been encountered before.
• Opened an SR with Oracle the first time we ran into problems
but we were not getting any good answers from Oracle
support. We used an outside resource Tanel Poder working
remotely to help us diagnose the problem.
• In reality we had an ASM configuration issue related to an
Oracle bug. At times dedicated server processes running
application SQL need to communicate with the ASM instance (
read a block from ASM storage into the buffer cache for
example ).
• The Oracle code needed to talk through a piece of
software, the cssd daemon ( ocssd.bin ), to reach the ASM instance. This
process gets kicked off at boot time and had a file resource
limit set too low ( Oracle bug ) … it was sitting at 1024.
ASM and Database Instance hang when exceeding around
1800 sessions (Doc ID 858279.1)
#!/bin/ksh
# Find the PID of the running ocssd.bin process
PS_NUM=`ps -ef | grep ocssd.bin | grep -v grep | awk '{ print $2 }' `
echo " The process running ocssd.bin is $PS_NUM"
# /proc is a pseudo file system / interface to kernel data structures
cat /proc/$PS_NUM/limits
[root@HOSTNAME ~]# ./temp_mon.sh
The process running ocssd.bin is 8900
Limit Soft Limit Hard Limit Units
Max cpu time unlimited unlimited seconds
Max file size unlimited unlimited bytes
Max data size unlimited unlimited bytes
Max stack size 10485760 unlimited bytes
Max core file size unlimited unlimited bytes
Max resident set unlimited unlimited bytes
Max processes 270336 270336 processes
Max open files 65536 65536 files <-- was 1024 before fix
Max locked memory unlimited unlimited bytes
Max address space unlimited unlimited bytes
Max file locks unlimited unlimited locks
Max pending signals 270336 270336 signals
Max msgqueue size 819200 819200 bytes
Max nice priority 0 0
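Knowing the limit is half the picture; the other half is how close the process is to it. A sketch, assuming Linux /proc and enough privilege to read another process's fd directory; the helper names soft_nofile and fd_headroom are hypothetical, and the demo runs against the current shell rather than ocssd.bin.

```shell
#!/bin/bash
# Sketch: compare a process's open file descriptors with its soft limit.
# soft_nofile and fd_headroom are hypothetical helper names.

# Print the soft "Max open files" value from a /proc/<pid>/limits style file.
soft_nofile() {
    awk '/^Max open files/ { print $4 }' "$1"
}

# Print "used/limit" for a PID, reading /proc (Linux only; may need root).
fd_headroom() {
    local pid=$1 used limit
    used=$(ls "/proc/$pid/fd" | wc -l)          # one entry per open descriptor
    limit=$(soft_nofile "/proc/$pid/limits")
    echo "${used}/${limit}"
}

# Example against the current shell rather than ocssd.bin.
fd_headroom $$
```

Pointed at the ocssd.bin PID found by the script above, a used count creeping toward the soft limit would have been an early warning of the hang described here.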
DETAILS ( CONTINUED )
• With max open files set to 1024, the ocssd.bin process has a low limit on file descriptors. IPC is used between database instance processes and ASM instance processes.
• Programs connected via dedicated server that have done a physical read are still holding onto an ASM resource ( file descriptor ).
• An application program might hold a lock and then do something ( in this case an insert ) requiring a block from ASM storage.
• It gets stuck because the ocssd.bin process cannot get another file descriptor. It holds the lock and is waiting to talk to ASM.
• At 11.1 the Oracle code was not instrumented to record these interactions with ASM in the Oracle Wait Interface. 11.2 now has a number of new wait events related to database to ASM communication.
• Other processes soon get caught waiting to talk to ASM or waiting for locks held by the earlier blocked process.
• OEM shows a very misleading picture … searches in Metalink on clssinit point to bugs …
diff /etc/rc.d/init.d/init.cssd /etc/rc.d/init.d/init.cssd.bak
1747,1748d1746
< $ULIMIT_CORE
< ulimit -n 65536
TO DEBUG: WE HAD TO CATCH A
PROCESS STUCK AND USE PSTACK ON IT
• Pstack gave us the accurate picture
of where the process was in its
execution path.
• Since it was stuck and going
nowhere multiple pstack commands
gave us the same result.
• Searches in metalink on the names
led us to the published bug. Many
bugs and documents in oracle
metalink now include relevant
module names to make matching
weird problems somewhat easier.
SUMMARY
• While OEM and the Oracle Wait Interface provide a ton of
information, at times they may give an inaccurate picture of
reality.
• Operating system TRUTH gives you a way of comparing and
contrasting information and facts when problems are
encountered.
• If the OS Truth looks different from what OEM and OWI are
showing you … then OEM and/or OWI are probably wrong.
Uninstrumented code in the Oracle area or bugs in oracle
code may be leading you down the wrong path.
• Code like OSWatcher is low impact but allows you to collect
and retain OS Truth.
• Utilities like pstack and pmap ( in linux and some other
operating systems ) along with low level tracing utilities can
provide diagnostic information beyond what OEM and OWI
provide.
#!/bin/bash

. /home/oracle/ora_11_2_env

# Logon string to oracle (can be just "/" if local authentication is configured)
ORA_LOGON=username/password@connstring

# For how many seconds a session must have been stuck waiting in order for the hang detection to kick in
THRESHOLD=60

TMPFILE=mon_hang_stack.tmp
LOGFILE=mon_hang_stack.log

rm -f $TMPFILE

sqlplus -s $ORA_LOGON @mon_stack $THRESHOLD > $TMPFILE

# WARNING! Note that I deliberately search for "ultimate_blocker_USER" only (and not BACKGROUND)
# as it's not a good idea to run linux'es pstack command on background processes regularly
# (this is because linux pstack attaches to target process using gdb debugger and suspends it briefly,
# potentially causing other issues too if you get unlucky. So you definitely don't want to cause trouble
# for LGWR process for example, thus I've disabled stack sampling for background processes here)
ULTIMATE_BLOCKERS=`grep ULTIMATE_BLOCKER_USER $TMPFILE | awk '{ print $2 }'`

echo >> $LOGFILE
cat $TMPFILE >> $LOGFILE
echo >> $LOGFILE
echo DATE=`date +"%Y-%m-%d %H:%M:%S"` ULTIMATE_BLOCKERS=$ULTIMATE_BLOCKERS >> $LOGFILE

for i in $ULTIMATE_BLOCKERS ; do
    echo >> $LOGFILE
    echo DATE=`date +"%Y-%m-%d %H:%M:%S"` running pstack on PID=$i >> $LOGFILE
    echo >> $LOGFILE
    pstack $i >> $LOGFILE
    sleep 1
    echo " after 1 sec sleep pstack repeated" >> $LOGFILE
    pstack $i >> $LOGFILE
done
-- mon_stack.sql ( run by the shell script as @mon_stack, threshold in seconds passed as &1 )
SET LINES 2000 PAGES 5000 TRIMSPOOL ON TRIMOUT ON FEEDBACK OFF VERIFY OFF
SET SERVEROUT ON SIZE 1000000

DEFINE threshold=&1

-- First sleep and monitor V$SESSION to find long waits in the database
-- This PL/SQL block will just keep running until a long enough wait is seen
DECLARE
    l_threshold NUMBER := &threshold;
    l_max_wait  NUMBER;
BEGIN
    WHILE TRUE LOOP
        SELECT MAX(seconds_in_wait) INTO l_max_wait
        FROM v$session
        WHERE state = 'WAITING' AND wait_class != 'Idle';
        IF l_max_wait > l_threshold THEN
            EXIT;
        END IF;
        DBMS_LOCK.SLEEP(30);
    END LOOP;
END;
/

PROMPT Long wait detected, listing long waiters from V$SESSION....

SET HEADING OFF
SELECT 'CURRENT_TIME= '||TO_CHAR(SYSDATE, 'YYYY-MM-DD HH24:MI:SS') FROM dual;
SET HEADING ON

SELECT
    bp.spid blocker_spid
  , p.spid waiter_spid
  , s.sid
  , s.program
  , s.sql_id
  , s.event
  , s.p1
  , s.p2
  , s.p3
  , s.seconds_in_wait
  , s.blocking_session_status
  , s.blocking_session blocker_sid
  , bs.program blocker_program
  , bs.sql_id blocker_sql_id
  , bs.state blocker_state
  , bs.event blocker_event
  , bs.p1 blocker_p1
  , bs.p2 blocker_p2
  , bs.p3 blocker_p3
  , bs.seconds_in_wait blocker_sec_in_wait
FROM
    v$session s
  , v$process p
  , v$session bs
  , v$process bp
WHERE
    s.paddr = p.addr
AND s.blocking_session = bs.sid
AND bs.paddr = bp.addr
AND s.state = 'WAITING'
AND s.wait_class != 'Idle'
AND s.seconds_in_wait > &threshold
/

SELECT
    'ULTIMATE_BLOCKER_'||TRIM(s.type)||'= '||TRIM(osid) blocking_spid
  , w.in_wait_secs
  , w.pid
  , w.sid
  , w.in_wait
  , w.wait_event
  , w.p1
  , w.p2
  , w.p3
FROM
    v$wait_chains w
  , v$session s
WHERE
    w.sid = s.sid
AND w.sess_serial# = s.serial#
AND w.blocker_sid IS NULL
AND w.num_waiters > 0
/

EXIT