juliandyke.com
© 2008 Julian Dyke
RAC Troubleshooting
Web Version - May 2008
Julian Dyke
Independent Consultant
Agenda
- Installation and Configuration
- Oracle Clusterware
- ASM and RDBMS
Installation and Configuration
Cluster Verification Utility - Overview
- Introduced in Oracle 10.2
- Checks cluster configuration:
  - stages - verifies that all steps for the specified stage have been completed
  - components - verifies that the specified component has been correctly installed
- Supplied with Oracle Clusterware; can also be downloaded from OTN (Linux and Windows)
- Also works with 10.1 (specify the -10gR1 option)
- For earlier versions see Metalink Note 135714.1 - Script to Collect RAC Diagnostic Information (racdiag.sql)
Cluster Verification Utility - CVUQDISK Package
On the Red Hat 4 and Enterprise Linux platforms, the following additional RPM is required for CLUVFY:
cvuqdisk-1.0.1-1.rpm
This package is supplied in the clusterware/cluvfy/rpm directory on the Clusterware CD-ROM. It can also be downloaded from OTN.
On each node, as the root user, install the RPM using:
rpm -ivh cvuqdisk-1.0.1-1.rpm
Cluster Verification Utility - Stages
CLUVFY stages include:
-post hwos post-check for hardware and operating system
-pre cfs pre-check for CFS setup
-post cfs post-check for CFS setup
-pre crsinst pre-check for Oracle Clusterware installation
-post crsinst post-check for Oracle Clusterware installation
-pre dbinst pre-check for database installation
-pre dbcfg pre-check for database configuration
Cluster Verification Utility - Components
CLUVFY components include:
nodereach Checks reachability between nodes
nodecon Checks node connectivity
cfs Checks CFS integrity
ssa Checks shared storage accessibility
space Checks space availability
sys Checks minimum system requirements
clu Checks cluster integrity
clumgr Checks cluster manager integrity
ocr Checks OCR integrity
crs Checks Oracle Clusterware (CRS) integrity
nodeapp Checks node applications exist
admprv Checks administrative privileges
peer Compares properties with peers
Cluster Verification Utility - Example
For example, to check the configuration before installing Oracle Clusterware on london1 and london2 use:
sh runcluvfy.sh stage -pre crsinst -n london1,london2
Checks:
- node reachability
- user equivalence
- administrative privileges
- node connectivity
- shared storage accessibility
If any checks fail, append -verbose to display more information
Cluster Verification Utility - Trace & Diagnostics
To enable trace in CLUVFY use:
export SRVM_TRACE=true
Trace files are written to the $CV_HOME/cv/log directory. By default this directory is removed immediately after CLUVFY has executed. To retain it, on Linux/Unix comment out the following line in runcluvfy.sh:
# $RM -rf $CV_HOME
The pathname of the CV_HOME directory is based on the operating system process ID, e.g.:
/tmp/18124
It can be useful to echo the value of CV_HOME in runcluvfy.sh:
echo CV_HOME=$CV_HOME
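The edit to runcluvfy.sh described above can be scripted. This is a minimal sketch, assuming the cleanup line reads exactly $RM -rf $CV_HOME (possibly indented) and that it is run against a writable copy of runcluvfy.sh, since the copy on the CD-ROM is read-only. The function name keep_cv_home is hypothetical.

```shell
# keep_cv_home (hypothetical helper): comment out the line in runcluvfy.sh
# that removes $CV_HOME, so CLUVFY trace files in $CV_HOME/cv/log survive.
# Assumption: the line reads "$RM -rf $CV_HOME", optionally indented.
keep_cv_home() {
  local script="$1"
  # -i.bak edits in place and keeps the original as <script>.bak
  # (accepted by both GNU and BSD sed); already-commented lines do not match
  sed -i.bak 's/^\([[:space:]]*\)\$RM -rf \$CV_HOME/\1# $RM -rf $CV_HOME/' "$script"
}
```

Typical use is export SRVM_TRACE=true, then keep_cv_home ./runcluvfy.sh, then sh runcluvfy.sh stage -pre crsinst -n london1,london2.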
Oracle Universal Installer (OUI) - Trace & Diagnostics
On Unix/Linux, to launch the OUI with tracing enabled use:
./runInstaller -J-DTRACING.ENABLED=true -J-DTRACING.LEVEL=2
Log files will be written to $ORACLE_BASE/oraInventory/logs
To trace root.sh, execute it using:
sh -x root.sh
Note that it may be necessary to clean up the CRS installation before executing root.sh again
DBCA - Trace & Diagnostics
To enable trace for the DBCA in Oracle 9.0.1 and above, edit $ORACLE_HOME/bin/dbca and change:
# Run DBCA
$JRE_DIR/bin/jre -DORACLE_HOME=$OH -DJDBC_PROTOCOL=thin -mx64m -classpath $CLASSPATH oracle.sysman.assistants.dbca.Dbca $ARGUMENTS
to:
# Run DBCA
$JRE_DIR/bin/jre -DORACLE_HOME=$OH -DJDBC_PROTOCOL=thin -mx64m -DTRACING.ENABLED=true -DTRACING.LEVEL=2 -classpath $CLASSPATH oracle.sysman.assistants.dbca.Dbca $ARGUMENTS
Redirect standard output to a file, e.g.:
$ dbca > dbca.out &
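The manual edit of $ORACLE_HOME/bin/dbca can also be made with sed. This is a sketch, assuming the JRE command line contains -classpath followed later by oracle.sysman.assistants.dbca.Dbca, as in the line shown above; the layout of the dbca script may differ between releases, so check the result before running DBCA. The function name enable_dbca_trace is hypothetical.

```shell
# enable_dbca_trace (hypothetical helper): add the TRACING options to the
# JRE command line in the dbca script.
# Assumption: that line contains "-classpath ... oracle.sysman.assistants.dbca.Dbca".
enable_dbca_trace() {
  local dbca="$1"
  # Lines already containing TRACING.ENABLED are skipped, so a second run
  # is a no-op; otherwise the two -D options are inserted before -classpath.
  # The original file is kept as <dbca>.bak
  sed -i.bak '/TRACING\.ENABLED/!s/\(-classpath .*oracle\.sysman\.assistants\.dbca\.Dbca\)/-DTRACING.ENABLED=true -DTRACING.LEVEL=2 \1/' "$dbca"
}
```

After running enable_dbca_trace $ORACLE_HOME/bin/dbca, start DBCA with standard output redirected as described above.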
Oracle Clusterware
Oracle Clusterware - Overview
Provides:
- Node membership services (CSS)
- Resource management services (CRS)
- Event management services (EVM)
In Oracle 10.1 and above, resources include:
- Node applications
- ASM instances
- Database instances
- Services
Node applications include:
- Virtual IP (VIP)
- Listeners
- Oracle Notification Service (ONS)
- Global Services Daemon (GSD)
Oracle Clusterware - Virtual IP (VIP)
- Node application introduced in Oracle 10.1
- Allows a virtual IP address to be defined for each node
- All applications connect using virtual IP addresses
- If a node fails, its virtual IP address is automatically relocated to another node
  - Only applies to newly connecting sessions
Oracle Clusterware - VIP (Virtual IP) Node Application
[Figure: VIP node application before and after a failure of Node 1. Before: Node 1 hosts Listener1, Instance1 and VIP1; Node 2 hosts Listener2, Instance2 and VIP2. After: VIP1 has been relocated to Node 2.]
Oracle Clusterware - VIP (Virtual IP) Node Application
On Linux, during normal operation each node will have one VIP address. For example:
[root@server3]# ifconfig
eth0      Link encap:Ethernet  HWaddr 00:11:D8:58:05:99
          inet addr:192.168.2.103  Bcast:192.168.2.255  Mask:255.255.255.0
          inet6 addr: fe80::211:d8ff:fe58:599/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:6814 errors:0 dropped:0 overruns:0 frame:0
          TX packets:10326 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:684579 (668.5 KiB)  TX bytes:1449071 (1.3 MiB)
          Interrupt:217 Base address:0x8800

eth0:1    Link encap:Ethernet  HWaddr 00:11:D8:58:05:99
          inet addr:192.168.2.203  Bcast:192.168.2.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          Interrupt:217 Base address:0x8800
The VIP resource for address 192.168.2.203 (eth0:1) is initially running on server3
Oracle Clusterware - VIP (Virtual IP) Node Application
If Oracle Clusterware on server3 is shut down, the VIP resource is transferred to another node (in this case server11):
[root@server11]# ifconfig
eth0      Link encap:Ethernet  HWaddr 00:1D:7D:A3:0A:55
          inet addr:192.168.2.111  Bcast:192.168.2.255  Mask:255.255.255.0
          inet6 addr: fe80::21d:7dff:fea3:a55/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:2792 errors:0 dropped:0 overruns:0 frame:0
          TX packets:4097 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:329891 (322.1 KiB)  TX bytes:593615 (579.7 KiB)
          Interrupt:177 Base address:0x2000

eth0:1    Link encap:Ethernet  HWaddr 00:1D:7D:A3:0A:55
          inet addr:192.168.2.211  Bcast:192.168.2.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          Interrupt:177 Base address:0x2000

eth0:2    Link encap:Ethernet  HWaddr 00:1D:7D:A3:0A:55
          inet addr:192.168.2.203  Bcast:192.168.2.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          Interrupt:177 Base address:0x2000
server11 now hosts both its own VIP (192.168.2.211 on eth0:1) and the VIP relocated from server3 (192.168.2.203 on eth0:2)
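To check whether a particular VIP is currently plumbed on a node, the ifconfig output can be searched for the address. This is a minimal sketch, assuming the classic net-tools ifconfig layout shown above (interface name in the first column, inet addr: on an indented continuation line); other ifconfig versions print inet without addr:, in which case the pattern needs adjusting. The function name vip_iface is hypothetical.

```shell
# vip_iface (hypothetical helper): read ifconfig output on stdin and print
# the interface (e.g. eth0:1) that carries the given IPv4 address.
# Exit status is 0 if the address was found, 1 otherwise.
vip_iface() {
  awk -v ip="$1" '
    # Interface names start in column 1; continuation lines are indented
    /^[^ \t]/ { iface = $1 }
    # The trailing space stops 192.168.2.20 from matching 192.168.2.203
    index($0, "inet addr:" ip " ") { print iface; found = 1 }
    END { exit (found ? 0 : 1) }
  '
}
```

On server11 after the failover above, ifconfig | vip_iface 192.168.2.203 would print eth0:2.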
Oracle Clusterware - VIP Failover
VIP addresses can occasionally be failed over incorrectly. For example, here the VIP for server3 is still running on server11:
HA Resource         Target       State
-----------         ------       -----
ora.server11.vip    application  ONLINE on server11
ora.server12.vip    application  ONLINE on server12
ora.server3.vip     application  ONLINE on server11
ora.server4.vip     application  ONLINE on server4
To relocate the VIP back to server3, use crs_relocate:
[root@server3]# ./crs_relocate ora.server3.vip -c server3
Attempting to stop `ora.server3.vip` on member `server11`
Stop of `ora.server3.vip` on member `server11` succeeded.
Attempting to start `ora.server3.vip` on member `server3`
Start of `ora.server3.vip` on member `server3` succeeded.
The VIP is now running on server3 again:
HA Resource         Target       State
-----------         ------       -----
ora.server11.vip    application  ONLINE on server11
ora.server12.vip    application  ONLINE on server12
ora.server3.vip     application  ONLINE on server3
ora.server4.vip     application  ONLINE on server4
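Misplaced VIPs like the one above can be spotted mechanically. The following sketch reads an "HA Resource / Target / State" report, such as the one produced by the AWK or Perl scripts later in this section, and prints a suggested crs_relocate command for each VIP whose home node differs from the node it is running on. It assumes VIP resources are named ora.<node>.vip and that the State column ends with "on <node>"; the function name misplaced_vips is hypothetical, and the commands are only printed so they can be reviewed before being run.

```shell
# misplaced_vips (hypothetical helper): read an "HA Resource / Target / State"
# report on stdin and print a crs_relocate command for every VIP resource that
# is running on a node other than its home node.
# Assumption: VIP resources are named ora.<node>.vip and the State column
# ends with "on <node>". Commands are printed, never executed.
misplaced_vips() {
  awk '$1 ~ /^ora\..*\.vip$/ && $(NF-1) == "on" {
    home = $1
    sub(/^ora\./, "", home)      # strip the leading "ora."
    sub(/\.vip$/, "", home)      # strip the trailing ".vip"
    if (home != $NF)
      printf "crs_relocate %s -c %s\n", $1, home
  }'
}
```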
Oracle Clusterware - Logging
In Oracle 10.2, Oracle Clusterware log files are created in the $CRS_HOME/log directory:
- The $CRS_HOME/log directory can be located on shared storage
- The $CRS_HOME/log directory contains a subdirectory for each node, e.g. $CRS_HOME/log/server6
The $CRS_HOME/log/<node> directory contains:
- the Oracle Clusterware alert log, e.g. alertserver6.log
- client - log files for OCR applications including CLSCFG, CSS, OCRCHECK, OCRCONFIG, OCRDUMP and OIFCFG
- crsd - log files for the CRS daemon including crsd.log
- cssd - log files for the CSS daemon including ocssd.log
- evmd - log files for the EVM daemon including evmd.log
- racg - log files for node applications including VIP and ONS
Oracle Clusterware Log Files
Log file locations in $ORA_CRS_HOME:
$ORA_CRS_HOME
  log
    <nodename>
      alert<nodename>.log
      client
      crsd
      cssd
      evmd
      racg
        racgeut
        racgimon
        racgmain
Oracle Clusterware Log Files
Log file locations in $ORACLE_HOME (RDBMS and ASM):
$ORACLE_HOME
  log
    <nodename>
      client
      racg
        racgeut
        racgimon
        racgmain
        racgmdb
Oracle Clusterware - Troubleshooting
If the OCR or voting disk is not available, error files may be created in /tmp, e.g. /tmp/crsctl.4038
For example, if the OCR cannot be found:
OCR initialization failed accessing OCR device: PROC-26: Error while accessing the physical storage Operating System error [No such file or directory] [2]
- The OCR is inaccessible, so no CRS daemons will start
- No errors are written to the log files
If the voting disk has incorrect ownership:
clsscfg_vhinit: unable(1) to open disk (/dev/raw/raw2)
Internal Error Information:
  Category: 1234
  Operation: scls_block_open
  Location: statfs
  Other: statfs failed /dev/raw/raw2
  Dep: 2
Failure 1 checking the Cluster Synchronization Services voting disk '/dev/raw/raw2'.
Not able to read adequate number of voting disks
Oracle Clusterware - racgwrap
Script called on each node by SRVCTL to control resources
Copy of the script in each Oracle home:
- $ORA_CRS_HOME/bin/racgwrap
- $ORA_ASM_HOME/bin/racgwrap
- $ORACLE_HOME/bin/racgwrap
Sets environment variables and invokes the racgmain executable
Generated from racgwrap.sbs; differs in each home:
- Sets the $ORACLE_HOME and $ORACLE_BASE environment variables for racgmain
- Also sets $LD_LIBRARY_PATH
Enable trace by setting _USR_ORA_DEBUG to 1
Oracle Clusterware - racgwrap
On Unix systems the Oracle SGA is located in one or more operating system shared memory segments
- Each segment is identified by a shared memory key
  - The shared memory key is generated by the application
- Each shared memory key maps to a shared memory ID
  - The shared memory ID is generated by the operating system
Shared memory segments can be displayed using ipcs -m:
[root@server3]# ipcs -m
------ Shared Memory Segments --------
key        shmid      owner      perms      bytes      nattch     status
0x8a48ff44 131072     oracle     640        94371840   20
0x17d04568 163841     oracle     660        2099249152 246
Oracle generates the shared memory key from the values of $ORACLE_HOME and $ORACLE_SID
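Oracle Clusterware - racgwrap
If the instance is currently running, e.g.:
[oracle@server3]$ ps -ef | grep pmon_PROD1
oracle    8653     1  0 16:13 ?        00:00:00 ora_pmon_PROD1
but SQL*Plus cannot connect to the instance:
[oracle@server3]$ export ORACLE_SID=PROD1
[oracle@server3]$ sqlplus / as sysdba
...
Connected to idle instance
compare the $ORACLE_HOME environment variable to the ORACLE_HOME variable in $ORACLE_HOME/bin/racgwrap:
[oracle@server3]$ echo $ORACLE_HOME
/u01/app/oracle/product/10.2.0/db_1
[oracle@server3]$ grep "^ORACLE_HOME" $ORACLE_HOME/bin/racgwrap
ORACLE_HOME=/u01/app/oracle/product/10.2.0/db_1/
Here the value in racgwrap has a trailing slash. Because the shared memory key is generated from $ORACLE_HOME and $ORACLE_SID, the instance started through racgwrap and the SQL*Plus session generate different keys, so SQL*Plus cannot find the SGA and reports an idle instance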
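The comparison above can be wrapped in a small check. This sketch assumes racgwrap sets ORACLE_HOME on a line of the form ORACLE_HOME=<path> with no quotes or trailing commands, as in the grep output above; it compares the strings exactly, so a trailing slash counts as a mismatch. The function name check_racgwrap_home is hypothetical.

```shell
# check_racgwrap_home (hypothetical helper): compare an ORACLE_HOME value with
# the ORACLE_HOME assignment in a racgwrap script.
# $1 = ORACLE_HOME from the environment, $2 = path to racgwrap.
# Exact string comparison: "/db_1" and "/db_1/" are reported as different.
# Returns 1 on mismatch.
check_racgwrap_home() {
  local env_home="$1" wrap="$2" wrap_home
  # Value of the first line beginning "ORACLE_HOME="
  wrap_home=$(sed -n 's/^ORACLE_HOME=//p' "$wrap" | head -n 1)
  if [ "$env_home" = "$wrap_home" ]; then
    echo "OK: $env_home"
  else
    echo "MISMATCH: environment=$env_home racgwrap=$wrap_home"
    return 1
  fi
}
```

For example: check_racgwrap_home "$ORACLE_HOME" "$ORACLE_HOME/bin/racgwrap".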
Oracle Clusterware - Process Monitor (OPROCD)
Process Monitor Daemon
- Provides cluster I/O fencing
- Implemented on Unix systems
  - Not required with third-party clusterware
- Implemented on Linux in 10.2.0.4 and above
  - In 10.2.0.3 and below the hangcheck timer module is used
- Provides hangcheck timer functionality to maintain cluster integrity
- Behaviour similar to hangcheck timer:
  - Runs as root
  - Locked in memory
  - Failure causes reboot of the system
- See /etc/init.d/init.cssd for operating system reboot commands
Oracle Clusterware - Process Monitor (OPROCD)
OPROCD takes two parameters:
-t - Timeout value
  - Length of time between executions (milliseconds)
  - Normally defaults to 1000
-m - Margin
  - Acceptable margin before rebooting (milliseconds)
  - Normally defaults to 500
Parameters are specified in /etc/init.d/init.cssd:
OPROCD_DEFAULT_TIMEOUT=1000
OPROCD_DEFAULT_MARGIN=500
Contact Oracle Support before changing these values
Oracle Clusterware - Process Monitor (OPROCD)
/etc/init.d/init.cssd can increase OPROCD_DEFAULT_MARGIN based on two CSS variables:
- reboottime (mandatory)
- diagwait (optional)
Values for these can be obtained using:
[root@server3]# crsctl get css reboottime
[root@server3]# crsctl get css diagwait
Both values are reported in seconds. The algorithm is:
If diagwait > reboottime then OPROCD_DEFAULT_MARGIN := (diagwait - reboottime) * 1000
Therefore increasing diagwait will reduce the frequency of reboots, e.g.:
[root@server3]# crsctl set css diagwait 13
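The margin calculation can be reproduced with shell arithmetic. This is a sketch, assuming both values have already been read (in seconds) with crsctl get css reboottime and crsctl get css diagwait, and that an unset diagwait is passed as 0; the 500 ms fallback is the default margin quoted earlier. The function name oprocd_margin is hypothetical. With an assumed reboottime of 3 and diagwait of 13, the margin would become 10000 ms.

```shell
# oprocd_margin (hypothetical helper): compute the OPROCD margin in
# milliseconds from diagwait and reboottime (both in seconds) using
# the rule above. $3 optionally overrides the default margin (500 ms).
oprocd_margin() {
  local diagwait="$1" reboottime="$2" default_margin="${3:-500}"
  if [ "$diagwait" -gt "$reboottime" ]; then
    echo $(( (diagwait - reboottime) * 1000 ))
  else
    # Margin is left at its default
    echo "$default_margin"
  fi
}
```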
Oracle Clusterware - Heartbeats
CSS maintains two heartbeats:
- Network heartbeat across the interconnect
- Disk heartbeat to the voting device
The disk heartbeat has an internal I/O timeout (in seconds), which varies between releases
In Oracle 10.2.0.2 and above, the disk heartbeat timeout can be specified by the CSS disktimeout parameter:
- Maximum time allowed for a voting file I/O to complete
- If exceeded, the file is marked offline
- Defaults to 200 seconds
crsctl get css disktimeout
crsctl set css disktimeout <value>
Oracle Clusterware - Heartbeats
The network heartbeat timeout can be specified by the CSS misscount parameter. Default values (Oracle Clusterware 10.1 and 10.2) are:
Linux 60 seconds
Unix 30 seconds
Windows 30 seconds
Default value for vendor clusterware is 600 seconds
crsctl get css misscount
crsctl set css misscount <value>
Oracle Clusterware - Heartbeats
The relationship between the internal I/O timeout (IOT), MISSCOUNT and DISKTIMEOUT varies between releases:
Version    Description
10.1.0.3   IOT = MISSCOUNT - 15 seconds
10.1.0.4   IOT = MISSCOUNT - 15 seconds
10.1.0.5   IOT = MISSCOUNT - 3 seconds
10.1.0.6   IOT = DISKTIMEOUT during normal operations
           IOT = MISSCOUNT during initial cluster formation or reconfiguration
10.2.0.1   IOT = MISSCOUNT - 3 seconds
10.2.0.2   IOT = DISKTIMEOUT during normal operations
           IOT = MISSCOUNT during initial cluster formation or reconfiguration
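The version table above can be encoded as a lookup, which helps when comparing clusters running different patch sets. This sketch covers only the versions listed in the table and rejects anything else rather than guessing; the function name and the phase argument ("normal", or anything else for initial cluster formation or reconfiguration) are hypothetical conventions.

```shell
# voting_io_timeout (hypothetical helper): internal voting disk I/O timeout
# (IOT) in seconds, following the version table above.
# $1 = version, $2 = MISSCOUNT, $3 = DISKTIMEOUT,
# $4 = "normal" (default) or anything else for initial cluster formation or
#      reconfiguration. Versions not in the table are rejected.
voting_io_timeout() {
  local version="$1" misscount="$2" disktimeout="$3" phase="${4:-normal}"
  case "$version" in
    10.1.0.3|10.1.0.4) echo $(( misscount - 15 )) ;;
    10.1.0.5|10.2.0.1) echo $(( misscount - 3 )) ;;
    10.1.0.6|10.2.0.2)
      if [ "$phase" = normal ]; then
        echo "$disktimeout"
      else
        echo "$misscount"
      fi
      ;;
    *) echo "version $version not in table" >&2; return 1 ;;
  esac
}
```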
Oracle Clusterware - Heartbeats
If DISKTIMEOUT is supported, CSS will not evict a node from the cluster when I/O to the voting disk takes more than MISSCOUNT seconds, except during initial cluster formation or slightly before reconfiguration
Nodes will not be evicted as long as voting disk operations are completed within DISKTIMEOUT seconds
Network Heartbeat                    Disk Heartbeat                         Reboot
Completes within MISSCOUNT seconds   Completes within DISKTIMEOUT seconds   No
Completes within MISSCOUNT seconds   Takes more than DISKTIMEOUT seconds    Yes
Takes more than MISSCOUNT seconds    Completes within MISSCOUNT seconds     Yes
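The decision table above can be expressed as a small predicate. This is a sketch, assuming whole-second durations and treating "completes within" as less than or equal to the limit; the table has no row in which both heartbeats are late, and the sketch assumes that case also causes a reboot. It models normal operation only, not initial cluster formation or reconfiguration. The function name css_reboot is hypothetical.

```shell
# css_reboot (hypothetical helper): apply the heartbeat table above.
# $1 = network heartbeat time, $2 = voting disk I/O time,
# $3 = MISSCOUNT, $4 = DISKTIMEOUT (all whole seconds).
# Prints Yes if the node would be rebooted, No otherwise.
css_reboot() {
  local net="$1" disk="$2" misscount="$3" disktimeout="$4"
  if [ "$net" -gt "$misscount" ] || [ "$disk" -gt "$disktimeout" ]; then
    echo Yes
  else
    echo No
  fi
}
```

For example, css_reboot 10 100 60 200 prints No: the disk I/O took longer than MISSCOUNT but still completed within DISKTIMEOUT.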
Oracle Clusterware - CRSCTL
CRSCTL can also be used to enable and disable Oracle Clusterware.
To enable Clusterware use:
# crsctl enable crs
To disable Clusterware use:
# crsctl disable crs
These commands update the following file: /etc/oracle/scls_scr/<node>/root/crsstart
Oracle Clusterware - CRSCTL
In Oracle 10.2, CRSCTL can be used to check the current state of the Oracle Clusterware daemons.
To check the current state of all Oracle Clusterware daemons:
# crsctl check crs
CSS appears healthy
CRS appears healthy
EVM appears healthy
To check the current state of individual Oracle Clusterware daemons:
# crsctl check cssd
CSS appears healthy
# crsctl check crsd
CRS appears healthy
# crsctl check evmd
EVM appears healthy
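In monitoring scripts it can be useful to turn the crsctl check crs output into an exit status. This sketch assumes the three "<daemon> appears healthy" lines shown above are the only success messages and treats anything else as unhealthy; the function name crs_all_healthy is hypothetical.

```shell
# crs_all_healthy (hypothetical helper): read "crsctl check crs" output on
# stdin; exit 0 only if CSS, CRS and EVM all report "appears healthy".
# Prints one line for each daemon that did not report healthy.
crs_all_healthy() {
  awk '
    /appears healthy/ { healthy[$1] = 1 }
    END {
      ok = 1
      n = split("CSS CRS EVM", daemon, " ")
      for (i = 1; i <= n; i++)
        if (!(daemon[i] in healthy)) { print daemon[i] " not healthy"; ok = 0 }
      exit (ok ? 0 : 1)
    }'
}
```

For example: crsctl check crs | crs_all_healthy || echo "Clusterware problem on $(hostname)".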
Oracle Clusterware - CRSCTL
CRSCTL can be used to manage the CSS voting disks. To check the current location of the voting disks use:
# crsctl query css votedisk
0.     0    /dev/raw/raw3
1.     0    /dev/raw/raw4
2.     0    /dev/raw/raw5
To add a new voting disk use:
# crsctl add css votedisk <path_name>
To delete an existing voting disk use:
# crsctl delete css votedisk <path_name>
Oracle Clusterware - Debugging
In Oracle 10.2 and above, Oracle Clusterware debugging can be enabled and disabled for:
- CRS
- CSS
- EVM
- Resources
- Subcomponents
Debugging can be controlled:
- statically using environment variables
- dynamically using CRSCTL
Debug settings can be persisted in the OCR for use in subsequent restarts
Oracle Clusterware - Debugging
To list the modules available for debugging use:
# crsctl lsmodules crs
# crsctl lsmodules css
# crsctl lsmodules evm
In Oracle 11.1 modules include:
CLSVER CRS
CLUCLS CRS,EVM
COMMCRS CRS,CSS,EVM
COMMNS CRS,CSS,EVM
CRSAPP CRS
CRSCOMM CRS
CRSD CRS
CRSEVT CRS
CRSMAIN CRS
CRSOCR CRS,EVM
CRSPLACE CRS
CRSRES CRS
CRSRTI CRS
CRSTIMER CRS
CRSUI CRS
CSSCLNT CRS,EVM
CSSD CSS
EVMAGENT EVM
EVMAPP EVM
EVMCOMM EVM
EVMD EVM
EVMDMAIN EVM
EVMEVT EVM
Oracle Clusterware - Debugging
To debug individual modules use:
# crsctl debug log crs <module>:<level>[,<module>:<level>]
For example:
# crsctl debug log crs "CRSCOMM:2,COMMCRS:2,COMMNS:2"
Set CRSD Debug Module: CRSCOMM Level: 2
Set CRSD Debug Module: COMMCRS Level: 2
Set CRSD Debug Module: COMMNS Level: 2
Values only apply to the current node. They are stored within the OCR in SYSTEM.crs.debug.<node>.<module>, for example:
# ocrdump -stdout -keyname SYSTEM.crs.debug.vm1.CRSCOMM
The log will be written to $ORA_CRS_HOME/log/<node>/crsd/crsd.log
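When many modules need the same level, the module:level list can be built rather than typed. This is a sketch; the function name crs_debug_spec is hypothetical, and it only formats the argument for crsctl debug log crs; it does not call crsctl itself.

```shell
# crs_debug_spec (hypothetical helper): build a "<module>:<level>,..." list
# for "crsctl debug log crs". $1 = level, remaining arguments = module names.
crs_debug_spec() {
  local level="$1" spec="" m
  shift
  for m in "$@"; do
    spec="${spec:+$spec,}$m:$level"   # add a comma only after the first entry
  done
  printf '%s\n' "$spec"
}
```

For example, crsctl debug log crs "$(crs_debug_spec 2 CRSCOMM COMMCRS COMMNS)" passes the same argument as the example above.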
Oracle Clusterware - Debugging
To debug an individual resource use:
# crsctl debug log res <resource>:<level>
For example:
# crsctl debug log res ora.vm1.vip:5
Set Resource Debug Module: ora.vm1.vip Level: 5
To disable debugging again, set level 0, e.g.:
# crsctl debug log res ora.vm1.vip:0
Set Resource Debug Module: ora.vm1.vip Level: 0
The OCR debug value is stored in USR_ORA_DEBUG. To check the current debug value set in the OCR for ora.vm1.vip use:
# ocrdump -stdout -keyname \CRS.CUR.ora\!vm1\!vip.USR_ORA_DEBUG
Log will be written to $ORA_CRS_HOME/log/<node>/racg/<resource>.log
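In the ocrdump command above, the dots inside the resource name appear as exclamation marks in the OCR key, and each ! is escaped so that an interactive bash shell does not treat it as history expansion. A small sketch can generate the key instead. The function name ocr_debug_key is hypothetical, and the CRS.CUR.<resource>.USR_ORA_DEBUG layout is taken from the single example above.

```shell
# ocr_debug_key (hypothetical helper): build the OCR key holding the debug
# level of a resource, e.g. ora.vm1.vip -> CRS.CUR.ora!vm1!vip.USR_ORA_DEBUG
ocr_debug_key() {
  # Dots inside the resource name become "!" in the OCR key
  printf 'CRS.CUR.%s.USR_ORA_DEBUG\n' "$(printf '%s' "$1" | tr '.' '!')"
}
```

Used as ocrdump -stdout -keyname "$(ocr_debug_key ora.vm1.vip)", no escaping is needed, because the ! characters come from command output rather than from typed text.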
Oracle Clusterware - Debugging
Debugging for CRSD and EVMD can also be configured using environment variables. To enable tracing for all modules use ORA_CRSDEBUG_ALL. For example:
# export ORA_CRSDEBUG_ALL=5
To enable tracing for individual modules use ORA_CRSDEBUG_<module>
For example:
# export ORA_CRSDEBUG_CRSOCR=5
Note that these environment variables have not been implemented in OCSSD or OPROCD
Oracle Clusterware - Debugging
In Oracle 10.1 and above, debugging can also be configured in $ORA_CRS_HOME/srvm/admin/ocrlog.ini. By default this file contains:
# "mesg_logging_level" is the only supported parameter currently.
# level 0 means minimum logging. Only error conditions are logged
mesg_logging_level = 0

# The last appearance of a parameter will override the previous value.
# For example, log level will become 3 when the following value is uncommented.
# Change to log level 3 for detailed logging from Oracle Cluster Registry
# mesg_logging_level = 3

# Component log and trace level specification template
#comploglvl="comp1:3;comp2:4"
#comptrclvl="comp1:2;comp2:1"
Component level logging can be configured in this file e.g.:
comploglvl="OCRAPI:5;OCRCLI:5;OCRSRV:5;OCRMAS:5;OCRCAC:5"
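When the same ocrlog.ini change has to be made on every node, a script avoids typos. This is a minimal sketch, assuming the comploglvl syntax shown above; it comments out any active comploglvl line (keeping a .bak copy of the file) and appends the new value, so exactly one active setting remains. The function name set_comploglvl is hypothetical.

```shell
# set_comploglvl (hypothetical helper): set component logging levels in an
# ocrlog.ini file. $1 = path to ocrlog.ini, $2 = level list such as
# "OCRAPI:5;OCRCLI:5".
set_comploglvl() {
  local ini="$1" value="$2"
  # Comment out any active comploglvl line; the original is kept as <ini>.bak
  sed -i.bak 's/^comploglvl=/#comploglvl=/' "$ini"
  # Append the new setting so exactly one active comploglvl line remains
  printf 'comploglvl="%s"\n' "$value" >> "$ini"
}
```

For example: set_comploglvl $ORA_CRS_HOME/srvm/admin/ocrlog.ini "OCRAPI:5;OCRCLI:5;OCRSRV:5;OCRMAS:5;OCRCAC:5". The double quotes stop the shell from treating the semicolons as command separators.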
Oracle Clusterware - Debugging
Component level logging can also be configured in the OCR. For example:
crsctl debug log crs "OCRAPI:5,OCRCLI:5,OCRSRV:5,OCRMAS:5,OCRCAC:5"
Components include:
- OCRAPI - OCR Abstraction Component
- OCRCAC - OCR Cache Component
- OCRCLI - OCR Client Component
- OCRMAS - OCR Master Thread Component
- OCRMSG - OCR Message Component
- OCRSRV - OCR Server Component
- OCRUTL - OCR Util Component
Oracle Clusterware - Debugging
CRSCTL can also generate state dumps:
crsctl debug statedump crs
crsctl debug statedump css
crsctl debug statedump evm
The CSS dump is written to $ORA_CRS_HOME/log/<node>/cssd/ocssd.log
Dump contents can be made more readable, e.g.:
cut -c58- < ocssd.log > ocssd.dmp
Oracle Clusterware - OLSNODES
The olsnodes utility lists all nodes currently running in the cluster. With no arguments olsnodes lists the nodes, e.g.:
$ olsnodes
london1
london2
In Oracle 10.2 and above, with the -p argument olsnodes lists node names and private interconnect names:
$ olsnodes -p
london1 london1-priv
london2 london2-priv
In Oracle 10.2 and above, with the -i argument olsnodes lists node names and VIP addresses:
$ olsnodes -i
london1 london1-vip
london2 london2-vip
Oracle Clusterware - OCRCONFIG
In Oracle 10.1 and above the OCRCONFIG utility performs various administrative operations on the OCR including:
- displaying backup history
- configuring backup location
- restoring OCR from backup
- exporting OCR
- importing OCR
- upgrading OCR
- downgrading OCR
In Oracle 10.2 and above OCRCONFIG can also:
- manage OCR mirrors
- overwrite OCR files
- repair OCR files
Oracle Clusterware - OCRCONFIG
Options include:
Option Description Version
-help Display help message 10.1+
-showbackup Display automatic OCR physical backup history 10.1+
-backuploc Change OCR physical backup location 10.1+
-restore Restore OCR from automatic physical backup 10.1+
-export Export contents of OCR to operating system file 10.1+
-import Import contents of OCR from operating system file 10.1+
-upgrade Upgrade OCR from a previous version 10.1+
-downgrade Downgrade OCR to a previous version 10.1+
-replace Add/replace/remove OCR file or mirror 10.2+
-overwrite Overwrite OCR configuration on disk 10.2+
-repair Repair local OCR configuration 10.2+
Oracle Clusterware - OCRCONFIG
In Oracle 10.1 and above:
- The OCR is automatically backed up every four hours
- The previous three backup copies are retained
- A backup copy is retained from the end of the previous day
- A backup copy is retained from the end of the previous week
Check the node, times and location of previous backups using the -showbackup option of OCRCONFIG, e.g.:
# ocrconfig -showbackup
london1 2005/08/04 11:15:29 /u01/app/oracle/product/10.2.0/crs/cdata/crs
london1 2005/08/03 22:24:32 /u01/app/oracle/product/10.2.0/crs/cdata/crs
london1 2005/08/03 18:24:32 /u01/app/oracle/product/10.2.0/crs/cdata/crs
london1 2005/08/02 18:24:32 /u01/app/oracle/product/10.2.0/crs/cdata/crs
london1 2005/07/31 18:24:32 /u01/app/oracle/product/10.2.0/crs/cdata/crs
ENSURE THAT YOU COPY THE PHYSICAL BACKUPS TO TAPE AND/OR REDUNDANT STORAGE
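The copy to redundant storage can be driven from the -showbackup output. This is a sketch, assuming it runs on the node shown in the first column, that each line ends with the backup location (a directory in the 10.2 output above, possibly an individual file in other releases), that backup files use the .ocr extension (as in backup00.ocr), and that paths contain no spaces. The function name and destination directory are hypothetical; the copy to tape is left to your backup tool.

```shell
# copy_ocr_backups (hypothetical helper): copy the OCR physical backups listed
# by "ocrconfig -showbackup" (read on stdin) into directory $1.
# Assumptions: the last field of each line is a backup directory or file,
# backup files end in .ocr, and paths contain no spaces.
copy_ocr_backups() {
  local dest="$1" path f
  mkdir -p "$dest" || return 1
  awk 'NF >= 4 { print $NF }' | sort -u | while read -r path; do
    if [ -d "$path" ]; then
      # Output that lists the backup directory: copy every .ocr file in it
      for f in "$path"/*.ocr; do
        if [ -f "$f" ]; then cp -p "$f" "$dest/"; fi
      done
    elif [ -f "$path" ]; then
      # Output that lists individual backup files
      cp -p "$path" "$dest/"
    fi
  done
}
```

For example: ocrconfig -showbackup | copy_ocr_backups /backup/ocr/$(date +%Y%m%d).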
Oracle Clusterware - OCRCONFIG
In Oracle 11.1 and above the OCR can be backed up manually using:
# ocrconfig -manualbackup
Backups will be written to the location specified by:
# ocrconfig -backuploc <directory_name>
Manual backups can be listed using:
# ocrconfig -showbackup manual
Automatic backups can be listed using:
# ocrconfig -showbackup auto
Oracle Clusterware - OCRCONFIG
To restore the OCR from a physical backup copy:
Check you have a suitable backup using:
# ocrconfig -showbackup
Stop Oracle Clusterware on each node using:
# crsctl stop crs
Restore the backup file using:
# ocrconfig -restore <filename>
For example:
# ocrconfig -restore $ORA_CRS_HOME/cdata/crs/backup00.ocr
Start Oracle Clusterware on each node using:
# crsctl start crs
Oracle Clusterware - OCRCHECK
In Oracle 10.1 and above, you can verify the configuration of the OCR using the OCRCHECK utility:
# ocrcheck
Status of Oracle Cluster Registry is as follows :
         Version                  :          2
         Total space (kbytes)     :     262144
         Used space (kbytes)      :       7752
         Available space (kbytes) :     254392
         ID                       : 1093363319
         Device/File Name         : /dev/raw/raw1
                                    Device/File integrity check succeeded
                                    /dev/raw/raw2
                                    Device/File integrity check succeeded

         Cluster registry integrity check succeeded
In Oracle 10.1 this utility does not print the ID and Device/File Name information
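For routine checks, the ocrcheck output can be reduced to a used-space percentage and an integrity status. This sketch assumes the label text shown above ("Total space", "Used space" and "Cluster registry integrity check succeeded"); the percentage is truncated to a whole number, and the function name ocr_check_summary is hypothetical. For the output above it would report used=2% and integrity=OK.

```shell
# ocr_check_summary (hypothetical helper): summarise "ocrcheck" output read
# on stdin. Prints used=<n>% (truncated) and integrity=OK|FAILED; the exit
# status is non-zero if the overall integrity check did not succeed.
ocr_check_summary() {
  awk -F: '
    /Total space/ { total = $2 + 0 }
    /Used space/  { used = $2 + 0 }
    /Cluster registry integrity check succeeded/ { ok = 1 }
    END {
      if (total > 0) printf "used=%d%%\n", used * 100 / total
      print (ok ? "integrity=OK" : "integrity=FAILED")
      exit (ok ? 0 : 1)
    }'
}
```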
Oracle Clusterware - OCRDUMP
In Oracle 10.1 and above, you can dump the contents of the OCR using the OCRDUMP utility. For example:
# ocrdump
This command writes its output to a file called OCRDUMPFILE in the current working directory
You can specify an output file name using:
# ocrdump <dump_file_name>
For example:
# ocrdump ocr_cluster1
Oracle Clusterware - OCRDUMP
In Oracle 10.2 and above, you can write OCRDUMP output to stdout. For example:
# ocrdump -stdout
In Oracle 10.2 and above, you can optionally restrict output by specifying a key with -keyname. For example:
# ocrdump -stdout -keyname SYSTEM
# ocrdump -stdout -keyname SYSTEM.css
# ocrdump -stdout -keyname SYSTEM.css.misscount
In Oracle 10.2 and above, you can optionally format output in XML. For example:
# ocrdump -stdout -keyname SYSTEM.css.misscount -xml
Oracle Clusterware - CRS_STAT
The CRS_STAT utility reports the current status of resources managed by Oracle Clusterware. Resources include:
- databases
- instances
- services
- ASM instances
- node applications:
  - gsd
  - ons
  - vip
  - listeners
Oracle Clusterware - CRS_STAT
With no arguments CRS_STAT lists all resources currently configured, e.g.:
$ crs_stat
NAME=ora.RAC.RAC1.inst
TYPE=application
TARGET=ONLINE
STATE=ONLINE on london1

NAME=ora.RAC.RAC2.inst
TYPE=application
TARGET=ONLINE
STATE=ONLINE on london2

NAME=ora.RAC.SERVICE1.RAC1.srv
TYPE=application
TARGET=OFFLINE
STATE=OFFLINE
etc...
If a node has failed, the STATE field will show which node the applications have failed over to
Oracle Clusterware - CRS_STAT
With the -t option, crs_stat lists resources together with their state and the current node:
Name            Type         Target    State     Host
------------------------------------------------------------
ora....T1.inst  application  ONLINE    ONLINE    server3
ora....T2.inst  application  ONLINE    ONLINE    server4
ora....T3.inst  application  ONLINE    ONLINE    server11
ora....T4.inst  application  ONLINE    ONLINE    server12
ora.TEST.db     application  ONLINE    ONLINE    server3
ora....SM3.asm  application  ONLINE    ONLINE    server11
ora....11.lsnr  application  ONLINE    ONLINE    server11
ora....r11.gsd  application  ONLINE    ONLINE    server11
ora....r11.ons  application  ONLINE    ONLINE    server11
ora....r11.vip  application  ONLINE    ONLINE    server11
ora....SM4.asm  application  ONLINE    ONLINE    server12
ora....12.lsnr  application  ONLINE    ONLINE    server12
ora....r12.gsd  application  ONLINE    ONLINE    server12
ora....r12.ons  application  ONLINE    ONLINE    server12
ora....r12.vip  application  ONLINE    ONLINE    server12
Oracle Clusterware - CRS_STAT
With the -ls option, crs_stat lists resources together with their owner, group and permissions:
Name            Owner    Primary PrivGrp   Permission
-----------------------------------------------------------------
ora....T1.inst  oracle   oinstall          rwxrwxr--
ora....T2.inst  oracle   oinstall          rwxrwxr--
ora....T3.inst  oracle   oinstall          rwxrwxr--
ora....T4.inst  oracle   oinstall          rwxrwxr--
ora.TEST.db     oracle   oinstall          rwxrwxr--
ora....SM3.asm  oracle   oinstall          rwxrwxr--
ora....11.lsnr  oracle   oinstall          rwxrwxr--
ora....r11.gsd  oracle   oinstall          rwxr-xr--
ora....r11.ons  oracle   oinstall          rwxr-xr--
ora....r11.vip  root     oinstall          rwxr-xr--
ora....SM4.asm  oracle   oinstall          rwxrwxr--
ora....12.lsnr  oracle   oinstall          rwxrwxr--
ora....r12.gsd  oracle   oinstall          rwxr-xr--
ora....r12.ons  oracle   oinstall          rwxr-xr--
ora....r12.vip  root     oinstall          rwxr-xr--
Oracle Clusterware - CRS_STAT
CRS_STAT abbreviates resource names. Oracle provides an AWK script that prints complete resource names (Metalink Note 259301.1 - CRS and 10g RAC):
#!/bin/bash

RSC_KEY=$1
QSTAT=-u
AWK=/usr/bin/awk

# Print the table header
$AWK \
  'BEGIN {printf "%-45s %-10s %-18s\n", "HA Resource", "Target", "State";
          printf "%-45s %-10s %-18s\n", "-----------", "------", "-----";}'

# Print NAME, TARGET and STATE for each resource whose name matches $RSC_KEY
$ORA_CRS_HOME/bin/crs_stat $QSTAT | $AWK \
 'BEGIN { FS="="; state = 0; }
  $1~/NAME/ && $2~/'$RSC_KEY'/ {appname = $2; state=1};
  state == 0 {next;}
  $1~/TARGET/ && state == 1 {apptarget = $2; state=2;}
  $1~/STATE/ && state == 2 {appstate = $2; state=3;}
  state == 3 {printf "%-45s %-10s %-18s\n", appname, apptarget, appstate; state = 0;}'
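Oracle Clusterware - CRS_STAT
#!/usr/bin/perl
# Optional first argument: regular expression used to filter resource names
$s = ".";
if ($#ARGV >= 0)
{
  $s = $ARGV[0];
  chomp $s;
}
printf ("%-45s %-12s %-18s\n","HA Resource","Target","State");
printf ("%-45s %-12s %-18s\n","-----------","------","-----");
open (CRS_STAT,"crs_stat -u|") or die "Cannot run crs_stat: $!";
while ($line = <CRS_STAT>)
{
  if ($line =~ m/=/)
  {
    chomp $line;
    ($n,$v) = split (/=/,$line,2);
    if ($n eq "NAME") { $name = $v; }
    # Capture TARGET (not TYPE) so the value matches the "Target" header
    elsif ($n eq "TARGET") { $target = $v; }
    elsif ($n eq "STATE")
    {
      # STATE is printed after NAME, TYPE and TARGET for each resource
      $state = $v;
      if ($name =~ m/$s/)
      {
        printf ("%-45s %-12s %-18s\n",$name,$target,$state);
      }
    }
  }
}
close (CRS_STAT);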
Oracle Clusterware - Permissions
The CRS_GETPERM and CRS_SETPERM utilities can be used to check and modify Oracle Clusterware permissions. For example, to change the owner of an instance to oracle and the group to oinstall:
Check the current permissions:
[root@server11]# crs_getperm ora.TEST.TEST3.inst
Name: ora.TEST.TEST3.inst
owner:root:rwx,pgrp:root:r-x,other::r--,
Set the new permissions:
[root@server11]# crs_setperm ora.TEST.TEST3.inst -o oracle
[root@server11]# crs_setperm ora.TEST.TEST3.inst -g oinstall
Check the new permissions:
[root@server11]# crs_getperm ora.TEST.TEST3.inst
Name: ora.TEST.TEST3.inst
owner:oracle:rwx,pgrp:oinstall:r-x,other::r--,
Oracle Clusterware - OCR Corruptions
The Oracle Cluster Registry is vulnerable to corruption
- Versions experiencing OCR corruptions have included:
  - 10.1.0.3
  - 10.2.0.2
  - 10.2.0.3
  - 11.1.0.6
- Also experienced by:
  - many Oracle employees
  - about 20% of UKOUG RAC & HA SIG delegates
- Typical symptom is "placement error"
  - May be related to configuration of services
- Corruption may occur at an earlier date
  - May occur when a service is configured on a non-master node
Oracle Clusterware - OCR Corruptions
Recovery of a corrupt OCR:
- If a mirror is configured: restore from the mirror using ocrconfig -overwrite (see the Administration and Deployment Guide for details)
- If a backup is available: restore from the backup using ocrconfig -restore
- If no backup is available: rebuild the OCR using the procedure described in Metalink Note 399482.1 - How to recreate OCR/Voting disk accidentally deleted
Oracle Clusterware - OCR Corruptions
Rebuild procedure (adapted from Note 399482.1):
On each node, shut down Oracle Clusterware:
[root@server3]# crsctl stop crs
Check that all Clusterware processes have stopped
On each node, execute rootdelete.sh:
[root@server3]# $ORA_CRS_HOME/install/rootdelete.sh
On the first node, execute rootdeinstall.sh:
[root@server3]# $ORA_CRS_HOME/install/rootdeinstall.sh
Note that for a corrupt OCR it may be necessary to zero the OCR. For example:
[root@server3]# dd if=/dev/zero of=/dev/ocr bs=1M
Oracle Clusterware - OCR Corruptions
Rebuild procedure (adapted from Note 399482.1) continued:
On the first node, execute root.sh:
[root@server3]# $ORA_CRS_HOME/root.sh
On the remaining nodes, execute root.sh:
[root@server4]# $ORA_CRS_HOME/root.sh
Use srvctl to add:
- ASM instances
- Database
- Instances
- Services
Use netca to add the listener
Execute cluvfy to verify the CRS configuration:
[oracle@server4]$ cluvfy stage -post crsinst -n server3,server4
ASM and RDBMS
Trace and Diagnostics: Modules and Actions
In Oracle 8.0 and above it is possible to specify a module and action for any session
Modules and actions allow inefficient SQL statements to be identified and isolated more efficiently
Modules and actions are reported in:
STATSPACK / AWR / ASH reports
V$SESSION, V$SQL and V$ACTIVE_SESSION_HISTORY
The current module and action for a session are reported in V$SESSION.MODULE and V$SESSION.ACTION
Trace and Diagnostics: DBMS_APPLICATION_INFO
To specify a module and action use:
BEGIN
  DBMS_APPLICATION_INFO.SET_MODULE (MODULE_NAME => 'MODULE1', ACTION_NAME => 'ACTION1');
END;
/
To specify a new action within the current module use:
BEGIN
  DBMS_APPLICATION_INFO.SET_ACTION (ACTION_NAME => 'ACTION2');
END;
/
Modules and actions can also be specified using OCI calls
Trace and Diagnostics: DBMS_MONITOR
Introduced in Oracle 10.1
Contains the following subroutines:
SESSION_TRACE_ENABLE, SESSION_TRACE_DISABLE
DATABASE_TRACE_ENABLE, DATABASE_TRACE_DISABLE (Oracle 10.2 and above)
CLIENT_ID_TRACE_ENABLE, CLIENT_ID_TRACE_DISABLE
CLIENT_ID_STAT_ENABLE, CLIENT_ID_STAT_DISABLE
SERV_MOD_ACT_TRACE_ENABLE, SERV_MOD_ACT_TRACE_DISABLE
SERV_MOD_ACT_STAT_ENABLE, SERV_MOD_ACT_STAT_DISABLE
Trace and Diagnostics: DBMS_MONITOR
Trace is enabled using the following subroutines:
SESSION_TRACE_ENABLE, DATABASE_TRACE_ENABLE, CLIENT_ID_TRACE_ENABLE, SERV_MOD_ACT_TRACE_ENABLE
By default (WAITS => TRUE, BINDS => FALSE) the trace is equivalent to event 10046 level 8, which includes wait events
In Oracle 11.1 these subroutines have an additional PLAN_STAT parameter, which specifies when row source statistics are dumped. Possible values are:
NEVER
FIRST_EXECUTION (default)
ALL_EXECUTIONS
Trace and Diagnostics: DBMS_MONITOR
Introduced in Oracle 10.1
To enable trace in the current session use:
EXECUTE DBMS_MONITOR.SESSION_TRACE_ENABLE;
To disable trace in the current session use:
EXECUTE DBMS_MONITOR.SESSION_TRACE_DISABLE;
To enable trace in session 42 use:
EXECUTE DBMS_MONITOR.SESSION_TRACE_ENABLE(SESSION_ID => 42);
To disable trace in session 42 use:
EXECUTE DBMS_MONITOR.SESSION_TRACE_DISABLE(SESSION_ID => 42);
Trace and Diagnostics: DBMS_MONITOR
Introduced in Oracle 10.2
To enable trace for the entire database use:
EXECUTE DBMS_MONITOR.DATABASE_TRACE_ENABLE;
To disable trace for the entire database use:
EXECUTE DBMS_MONITOR.DATABASE_TRACE_DISABLE;
To enable trace for instance RAC1 use:
EXECUTE DBMS_MONITOR.DATABASE_TRACE_ENABLE(INSTANCE_NAME => 'RAC1');
To disable trace for instance RAC1 use:
EXECUTE DBMS_MONITOR.DATABASE_TRACE_DISABLE(INSTANCE_NAME => 'RAC1');
Trace and Diagnostics: DBMS_MONITOR
Trace can be enabled using client identifiers:
Useful when many sessions connect using the same Oracle user
Useful with connection caches
To set a client identifier use DBMS_SESSION.SET_IDENTIFIER. For example:
BEGIN
  DBMS_SESSION.SET_IDENTIFIER ('CLIENT42');
END;
/
The client identifier for a specific session is reported in V$SESSION.CLIENT_IDENTIFIER
Trace and Diagnostics: DBMS_MONITOR
To enable trace for CLIENT42 use:
BEGIN
  DBMS_MONITOR.CLIENT_ID_TRACE_ENABLE(CLIENT_ID => 'CLIENT42');
END;
/
To enable statistics collection for CLIENT42 use:
BEGIN
  DBMS_MONITOR.CLIENT_ID_STAT_ENABLE(CLIENT_ID => 'CLIENT42');
END;
/
Client statistics are reported in V$CLIENT_STATS
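Because many sessions can share a client identifier, the resulting trace is usually spread across several trace files. The sketch below lists the files that mention a given identifier. It assumes each trace file records the identifier on a line of the form *** CLIENT ID:(CLIENT42) followed by a timestamp; the function name trace_files_for_client is hypothetical.

```shell
#!/bin/sh
# Sketch: list trace files in a directory that contain trace for a given
# client identifier. One identifier can be set by many sessions (for
# example from a connection cache), so its trace spans several files.
# Assumption: trace files record the identifier as
#   *** CLIENT ID:(CLIENT42) <timestamp>
# Usage: trace_files_for_client /path/to/trace CLIENT42

trace_files_for_client() {
    dir=$1
    client_id=$2
    # -l prints matching file names only; -F matches the text literally,
    # and the closing parenthesis stops CLIENT4 from matching CLIENT42
    grep -lF "CLIENT ID:($client_id)" "$dir"/*.trc 2>/dev/null
}
```

The files found can then be combined with the trcsess utility, for example trcsess output=client42.trc clientid=CLIENT42 *.trc, and the result formatted with tkprof.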
Trace and Diagnostics: DBMS_MONITOR
Trace can be enabled for a specific:
service
service and module
service, module and action
To enable trace for SERVICE1 use:
BEGIN
  DBMS_MONITOR.SERV_MOD_ACT_TRACE_ENABLE(SERVICE_NAME => 'SERVICE1');
END;
/
To disable trace for SERVICE1 use:
BEGIN
  DBMS_MONITOR.SERV_MOD_ACT_TRACE_DISABLE(SERVICE_NAME => 'SERVICE1');
END;
/
Trace and Diagnostics: DBMS_MONITOR
To enable trace for MODULE1 within SERVICE1 use:
BEGIN
  DBMS_MONITOR.SERV_MOD_ACT_TRACE_ENABLE(
    SERVICE_NAME => 'SERVICE1', MODULE_NAME => 'MODULE1');
END;
/
To enable trace for ACTION1 within MODULE1 use:
BEGIN
  DBMS_MONITOR.SERV_MOD_ACT_TRACE_ENABLE(
    SERVICE_NAME => 'SERVICE1', MODULE_NAME => 'MODULE1', ACTION_NAME => 'ACTION1');
END;
/
Trace and Diagnostics: DBMS_MONITOR
To enable statistics collection for MODULE1 within SERVICE1 use:
BEGIN
  DBMS_MONITOR.SERV_MOD_ACT_STAT_ENABLE(
    SERVICE_NAME => 'SERVICE1', MODULE_NAME => 'MODULE1');
END;
/
To enable statistics collection for ACTION1 within MODULE1 use:
BEGIN
  DBMS_MONITOR.SERV_MOD_ACT_STAT_ENABLE(
    SERVICE_NAME => 'SERVICE1', MODULE_NAME => 'MODULE1', ACTION_NAME => 'ACTION1');
END;
/
Statistics are externalized in V$SERV_MOD_ACT_STATS
Trace and Diagnostics: DBMS_MONITOR
In Oracle 10.1 and above, the current trace configuration is reported in DBA_ENABLED_TRACES
The TRACE_TYPE column can be:
CLIENT_ID, SERVICE, SERVICE_MODULE, SERVICE_MODULE_ACTION, DATABASE
Currently enabled statistics aggregations are reported in DBA_ENABLED_AGGREGATIONS
Trace and Diagnostics: Automatic Diagnostic Repository
In Oracle 11.1 and above the diagnostics area has been redesigned
The diagnostics area is located in $ORACLE_BASE/diag and includes the following top-level directories:
asm, clients, crs, diagtool, lsnrctl, netcman, ofm, rdbms, tnslsnr
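Each product below these top-level directories keeps its own ADR home. The following sketch lists them, assuming every ADR home sits three levels below diag, as diag/<product type>/<product id>/<instance id> (for example diag/rdbms/test/TEST1); the function name adr_homes is hypothetical. In 11.1 the adrci command SHOW HOMES should report the same list.

```shell
#!/bin/sh
# Sketch: list ADR homes below an ADR base, relative to that base.
# Assumption: each home is diag/<product type>/<product id>/<instance id>.
# Usage: adr_homes "$ORACLE_BASE"

adr_homes() {
    base=$1
    for home in "$base"/diag/*/*/*; do
        # Skip an unmatched glob pattern and anything that is not a directory
        if [ -d "$home" ]; then
            echo "${home#"$base"/}"
        fi
    done
}
```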
Trace and Diagnostics: Automatic Diagnostic Repository
The trace directory includes:
server (foreground) process trace files
background process trace files
the alert log (text version)
All trace files and the alert log are written to $ORACLE_BASE/diag/rdbms/<database>/<instance>/trace
For example, for instance TEST1 of database TEST: $ORACLE_BASE/diag/rdbms/test/TEST1/trace
BACKGROUND_DUMP_DEST and USER_DUMP_DEST are deprecated in Oracle 11.1; by default both specify this same trace directory
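The trace directory for a given instance can be derived from the database and instance names. This is a minimal sketch assuming, as in the example path above, that the database component is the database name in lower case while the instance name keeps its case; adr_trace_dir is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: build the Oracle 11.1 trace directory for a database instance.
# Assumption: the database component is lower case, the instance name
# keeps its case, e.g. $ORACLE_BASE/diag/rdbms/test/TEST1/trace.

adr_trace_dir() {
    base=$1       # ADR base, e.g. $ORACLE_BASE
    db=$2         # database name, e.g. TEST
    inst=$3       # instance name, e.g. TEST1
    db_lower=$(printf '%s' "$db" | tr '[:upper:]' '[:lower:]')
    echo "$base/diag/rdbms/$db_lower/$inst/trace"
}
```

For example, tail -f "$(adr_trace_dir "$ORACLE_BASE" TEST TEST1)/alert_TEST1.log" follows the text alert log of instance TEST1.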
Trace and Diagnostics: Automatic Diagnostic Repository
The V$DIAG_INFO dynamic performance view, introduced in Oracle 11.1, returns values for the following diagnostics:
Name                    Example Value
ADR Base                /u01/app/oracle
ADR Home                /u01/app/oracle/diag/rdbms/test/TEST
Active Incident Count   2
Active Problem Count    1
Default Trace File      /u01/app/oracle/diag/rdbms/test/TEST/trace/TEST_ora_14003.trc
Diag Alert              /u01/app/oracle/diag/rdbms/test/TEST/alert
Diag Cdump              /u01/app/oracle/diag/rdbms/test/TEST/cdump
Diag Enabled            TRUE
Diag Incident           /u01/app/oracle/diag/rdbms/test/TEST/incident
Diag Trace              /u01/app/oracle/diag/rdbms/test/TEST/trace
Health Monitor          /u01/app/oracle/diag/rdbms/test/TEST/hm
Trace and Diagnostics: SRVCTL
In Oracle 10.1 and above, to enable trace in SRVCTL use:
export SRVM_TRACE=true
By default, trace is written to standard output
In Oracle 10.1 and above, the same environment variable can be used to trace:
NETCA, VIPCA, SRVCONFIG, GSDCTL, CLUVFY, CLUUTIL
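Since the trace goes to standard output, it is often easier to set SRVM_TRACE for a single command and capture its output in a file. This sketch does that with a hypothetical helper, with_srvm_trace; the log file path and the srvctl command in the usage line are placeholders.

```shell
#!/bin/sh
# Sketch: run one command (e.g. srvctl or cluvfy) with SRVM_TRACE enabled
# and capture standard output and standard error in a log file. The
# variable is set only for that command, not for the calling shell.
# Usage: with_srvm_trace /tmp/srvctl_trace.log srvctl status database -d TEST

with_srvm_trace() {
    logfile=$1
    shift
    SRVM_TRACE=true "$@" > "$logfile" 2>&1
}
```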
References: Metalink Notes
265769.1 - Troubleshooting CRS Reboots
240001.1 - Troubleshooting CRS root.sh problems
341214.1 - How to cleanup after a failed (or successful) Oracle Clusterware installation
294430.1 - MISSCOUNT Definition and Default Values
357808.1 - CRS Diagnostics
272331.1 - CRS 10g Diagnostic Guide
330358.1 - CRS 10g R2 Diagnostic Collection Guide
331168.1 - Clusterware consolidated logging in 10gR2
357808.1 - Diagnosability for CRS/EVM/RACG
289690.1 - Data Gathering for Troubleshooting RAC and CRS Issues
284752.1 - Increasing CSS Misscount, Reboottime and Disktimeout
462616.1 - Reconfiguring the CSS disktimeout of 10gR2 Clusterware for proper LUN failover
317628.1 - How to replace a corrupt OCR mirror file
279793.1 - How to restore a lost voting disk in 10g
Thank you for your interest