juliandyke.com

© 2008 Julian Dyke

RAC Troubleshooting

Web Version - May 2008

Julian Dyke

Independent Consultant

Agenda

Installation and Configuration
Oracle Clusterware
ASM and RDBMS

Installation and Configuration

Cluster Verification Utility - Overview

Introduced in Oracle 10.2

Checks cluster configuration
  stages - verifies all steps for specified stage have been completed
  components - verifies specified component has been correctly installed

Supplied with Oracle Clusterware
Can be downloaded from OTN (Linux and Windows)

Also works with 10.1 (specify -10gR1 option)

For earlier versions see Metalink Note 135714.1 Script to Collect RAC Diagnostic Information (racdiag.sql)

Cluster Verification Utility - CVUQDISK Package

On the Red Hat 4 and Enterprise Linux platforms, the following additional RPM is required for CLUVFY:

cvuqdisk-1.0.1-1.rpm

This package is supplied in the clusterware/cluvfy/rpm directory on the clusterware CD-ROM

It can also be downloaded from OTN

On each node as the root user install the RPM using:

rpm -ivh cvuqdisk-1.0.1-1.rpm

Cluster Verification Utility - Stages

CLUVFY stages include:

-post hwos post-check for hardware and operating system

-pre cfs pre-check for CFS setup

-post cfs post-check for CFS setup

-pre crsinst pre-check for Oracle Clusterware installation

-post crsinst post-check for Oracle Clusterware installation

-pre dbinst pre-check for database installation

-pre dbcfg pre-check for database configuration

Cluster Verification Utility - Components

CLUVFY components include:

nodereach Checks reachability between nodes

nodecon Checks node connectivity

cfs Checks CFS integrity

ssa Checks shared storage accessibility

space Checks space availability

sys Checks minimum system requirements

clu Checks cluster integrity

clumgr Checks cluster manager integrity

ocr Checks OCR integrity

crs Checks Oracle Clusterware (CRS) integrity

nodeapp Checks node applications exist

admprv Checks administrative privileges

peer Compares properties with peers

Cluster Verification Utility - Example

For example, to check configuration before installing Oracle Clusterware on london1 and london2 use:

sh runcluvfy.sh stage -pre crsinst -n london1,london2

Checks:
  node reachability
  user equivalence
  administrative privileges
  node connectivity
  shared storage accessibility

If any checks fail append -verbose to display more information
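When the same stage check is repeated for different node lists, the command line shown above can be assembled programmatically. The following is a minimal Python sketch and is not part of CLUVFY: the function name `build_stage_check` is hypothetical, and the set of valid stages is simply copied from the stage list in this presentation.

```python
# Stage names copied from the CLUVFY stage list in this presentation.
CLUVFY_STAGES = {
    ("post", "hwos"), ("pre", "cfs"), ("post", "cfs"),
    ("pre", "crsinst"), ("post", "crsinst"),
    ("pre", "dbinst"), ("pre", "dbcfg"),
}


def build_stage_check(when, stage, nodes, verbose=False):
    """Return the runcluvfy.sh argument list for one stage check (hypothetical helper)."""
    if (when, stage) not in CLUVFY_STAGES:
        raise ValueError(f"unknown stage: -{when} {stage}")
    if not nodes:
        raise ValueError("at least one node name is required")
    cmd = ["sh", "runcluvfy.sh", "stage", f"-{when}", stage, "-n", ",".join(nodes)]
    if verbose:
        # -verbose displays more information when a check fails
        cmd.append("-verbose")
    return cmd
```

Joining the returned list with spaces gives the command line to run on one of the cluster nodes.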

Cluster Verification Utility - Trace & Diagnostics

To enable trace in CLUVFY use:

export SRVM_TRACE=true

Trace files are written to the $CV_HOME/cv/log directory
By default this directory is removed immediately after CLUVFY has executed

On Linux/Unix comment out the following line in runcluvfy.sh:

# $RM -rf $CV_HOME

Pathname of CV_HOME directory is based on operating system process e.g.:

/tmp/18124

It can be useful to echo value of CV_HOME in runcluvfy.sh:

echo CV_HOME=$CV_HOME

Oracle Universal Installer (OUI) - Trace & Diagnostics

On Unix/Linux to launch the OUI with tracing enabled use:

./runInstaller -J-DTRACING.ENABLED=true -J-DTRACING.LEVEL=2

Log files will be written to $ORACLE_BASE/oraInventory/logs

To trace root.sh execute it using:

sh -x root.sh

Note that it may be necessary to clean up the CRS installation before executing root.sh again

DBCA - Trace & Diagnostics

To enable trace for the DBCA in Oracle 9.0.1 and above

Edit $ORACLE_HOME/bin/dbca and change

# Run DBCA
$JRE_DIR/bin/jre -DORACLE_HOME=$OH -DJDBC_PROTOCOL=thin -mx64m -classpath $CLASSPATH oracle.sysman.assistants.dbca.Dbca $ARGUMENTS

to

# Run DBCA
$JRE_DIR/bin/jre -DORACLE_HOME=$OH -DJDBC_PROTOCOL=thin -mx64m -DTRACING.ENABLED=true -DTRACING.LEVEL=2 -classpath $CLASSPATH oracle.sysman.assistants.dbca.Dbca $ARGUMENTS

Redirect standard output to a file e.g.

$ dbca > dbca.out &

Oracle Clusterware

Oracle Clusterware - Overview

Provides
  Node membership services (CSS)
  Resource management services (CRS)
  Event management services (EVM)

In Oracle 10.1 and above resources include
  Node applications
  ASM Instances
  Database Instances
  Services

Node applications include:
  Virtual IP (VIP)
  Listeners
  Oracle Notification Service (ONS)
  Global Services Daemon (GSD)

Oracle Clusterware - Virtual IP (VIP)

Node application introduced in Oracle 10.1

Allows Virtual IP address to be defined for each node

All applications connect using Virtual IP addresses

If node fails Virtual IP address is automatically relocated to another node

Only applies to newly connecting sessions

Oracle Clusterware - VIP (Virtual IP) Node Application

[Figure: "Before" and "After" panels showing Node 1 (Listener1, Instance1, VIP1) and Node 2 (Listener2, Instance2, VIP2)]

Oracle Clusterware - VIP (Virtual IP) Node Application

On Linux during normal operation, each node will have one VIP address. For example:

[root@server3]# ifconfig

eth0    Link encap:Ethernet HWaddr 00:11:D8:58:05:99
        inet addr:192.168.2.103 Bcast:192.168.2.255 Mask:255.255.255.0
        inet6 addr: fe80::211:d8ff:fe58:599/64 Scope:Link
        UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
        RX packets:6814 errors:0 dropped:0 overruns:0 frame:0
        TX packets:10326 errors:0 dropped:0 overruns:0 carrier:0
        collisions:0 txqueuelen:1000
        RX bytes:684579 (668.5 KiB) TX bytes:1449071 (1.3 MiB)
        Interrupt:217 Base address:0x8800

eth0:1  Link encap:Ethernet HWaddr 00:11:D8:58:05:99
        inet addr:192.168.2.203 Bcast:192.168.2.255 Mask:255.255.255.0
        UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
        Interrupt:217 Base address:0x8800

The resource for VIP address 192.168.2.203 is initially running on server3

Oracle Clusterware - VIP (Virtual IP) Node Application

If Oracle Clusterware on server3 is shut down, the VIP resource is transferred to another node (in this case server11)

[root@server11]# ifconfig

eth0    Link encap:Ethernet HWaddr 00:1D:7D:A3:0A:55
        inet addr:192.168.2.111 Bcast:192.168.2.255 Mask:255.255.255.0
        inet6 addr: fe80::21d:7dff:fea3:a55/64 Scope:Link
        UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
        RX packets:2792 errors:0 dropped:0 overruns:0 frame:0
        TX packets:4097 errors:0 dropped:0 overruns:0 carrier:0
        collisions:0 txqueuelen:1000
        RX bytes:329891 (322.1 KiB) TX bytes:593615 (579.7 KiB)
        Interrupt:177 Base address:0x2000

eth0:1  Link encap:Ethernet HWaddr 00:1D:7D:A3:0A:55
        inet addr:192.168.2.211 Bcast:192.168.2.255 Mask:255.255.255.0
        UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
        Interrupt:177 Base address:0x2000

eth0:2  Link encap:Ethernet HWaddr 00:1D:7D:A3:0A:55
        inet addr:192.168.2.203 Bcast:192.168.2.255 Mask:255.255.255.0
        UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
        Interrupt:177 Base address:0x2000

Oracle Clusterware - VIP Failover

VIP addresses can occasionally be failed over incorrectly. For example, here ora.server3.vip is running on server11:

HA Resource       Target       State
-----------       ------       -----
ora.server11.vip  application  ONLINE on server11
ora.server12.vip  application  ONLINE on server12
ora.server3.vip   application  ONLINE on server11
ora.server4.vip   application  ONLINE on server4

The VIP can be moved back to its own node using crs_relocate:

[root@server3]# ./crs_relocate ora.server3.vip -c server3
Attempting to stop `ora.server3.vip` on member `server11`
Stop of `ora.server3.vip` on member `server11` succeeded.
Attempting to start `ora.server3.vip` on member `server3`
Start of `ora.server3.vip` on member `server3` succeeded.

After relocation:

HA Resource       Target       State
-----------       ------       -----
ora.server11.vip  application  ONLINE on server11
ora.server12.vip  application  ONLINE on server12
ora.server3.vip   application  ONLINE on server3
ora.server4.vip   application  ONLINE on server4
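Spotting a VIP that is running on the wrong node can be automated by comparing the node name embedded in each resource name with the node reported in the State column. The sketch below is a hypothetical helper, not an Oracle utility. It assumes listing lines of the form shown on this slide and that VIP resources are named ora.<node>.vip, as they are in these examples.

```python
import re

# Matches listing rows such as: "ora.server3.vip application ONLINE on server11"
VIP_ROW = re.compile(r"^ora\.(?P<home>\S+)\.vip\s+\S+\s+ONLINE on (?P<host>\S+)$")


def misplaced_vips(listing):
    """Return (resource, current_host, home_node) for each VIP not on its home node."""
    moves = []
    for line in listing.splitlines():
        match = VIP_ROW.match(line.strip())
        if match and match.group("home") != match.group("host"):
            resource = f"ora.{match.group('home')}.vip"
            moves.append((resource, match.group("host"), match.group("home")))
    return moves


def relocate_command(resource, home_node):
    """Build the crs_relocate command line used on this slide."""
    return f"./crs_relocate {resource} -c {home_node}"
```

The helper only builds the command string. Whether a relocation is appropriate, for example whether the home node is actually available again, still has to be decided by the administrator.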

Oracle Clusterware - Logging

In Oracle 10.2, Oracle Clusterware log files are created in the $CRS_HOME/log directory
  can be located on shared storage

$CRS_HOME/log directory contains subdirectory for each node e.g. $CRS_HOME/log/server6

$CRS_HOME/log/<node> directory contains:
  Oracle Clusterware alert log e.g. alertserver6.log
  client - logfiles for OCR applications including CLSCFG, CSS, OCRCHECK, OCRCONFIG, OCRDUMP and OIFCFG
  crsd - logfiles for CRS daemon including crsd.log
  cssd - logfiles for CSS daemon including ocssd.log
  evmd - logfiles for EVM daemon including evmd.log
  racg - logfiles for node applications including VIP and ONS

Oracle Clusterware - Log Files

Log file locations in $ORA_CRS_HOME

$ORA_CRS_HOME
  log
    <nodename>
      alert<nodename>.log
      client
      crsd
      cssd
      evmd
      racg
        racgeut
        racgimon
        racgmain

Oracle Clusterware - Log Files

Log file locations in $ORACLE_HOME (RDBMS and ASM)

$ORACLE_HOME
  log
    <nodename>
      client
      racg
        racgeut
        racgimon
        racgmain
        racgmdb

Oracle Clusterware - Troubleshooting

If OCR or voting disk are not available, error files may be created in /tmp e.g. /tmp/crsctl.4038

For example, if OCR cannot be found:

OCR initialization failed accessing OCR device: PROC-26: Error while accessing the physical storage Operating System error [No such file or directory] [2]

OCR is inaccessible - no CRS daemons will start
No errors written to log files

If Voting Disk has incorrect ownership

clsscfg_vhinit: unable(1) to open disk (/dev/raw/raw2)
Internal Error Information:
  Category: 1234
  Operation: scls_block_open
  Location: statfs
  Other: statfs failed /dev/raw/raw2
  Dep: 2
Failure 1 checking the Cluster Synchronization Services voting disk '/dev/raw/raw2'.
Not able to read adequate number of voting disks

Oracle Clusterware - racgwrap

Script called on each node by SRVCTL to control resources

Copy of script in each Oracle home
  $ORA_CRS_HOME/bin/racgwrap
  $ORA_ASM_HOME/bin/racgwrap
  $ORACLE_HOME/bin/racgwrap

Sets environment variables
Invokes racgmain executable

Generated from racgwrap.sbs
Differs in each home

Sets $ORACLE_HOME and $ORACLE_BASE environment variables for racgmain
Also sets $LD_LIBRARY_PATH

Enable trace by setting _USR_ORA_DEBUG to 1

Oracle Clusterware - racgwrap

In Unix systems the Oracle SGA is located in one or more operating system shared memory segments

Each segment is identified by a shared memory key
Shared memory key is generated by the application

Each shared memory key maps to a shared memory ID
Shared memory ID is generated by operating system

Shared memory segments can be displayed using ipcs -m

[root@server3] # ipcs -m
------ Shared Memory Segments --------
key         shmid    owner   perms  bytes       nattch  status
0x8a48ff44  131072   oracle  640    94371840    20
0x17d04568  163841   oracle  660    2099249152  246

Oracle generates the shared memory key from the values of
  $ORACLE_HOME
  $ORACLE_SID

Oracle Clusterware - racgwrap

If instance is currently running e.g.

[oracle@server3]$ ps -ef | grep pmon_PROD1
oracle 8653 1 0 16:13 ? 00:00:00 ora_pmon_PROD1

But SQL*Plus cannot connect to the instance

[oracle@server3]$ export ORACLE_SID=PROD1
[oracle@server3]$ sqlplus / as sysdba
...
Connected to idle instance

Compare $ORACLE_HOME environment variable to ORACLE_HOME variable in $ORACLE_HOME/bin/racgwrap

[oracle@server3]$ echo $ORACLE_HOME
/u01/app/oracle/product/10.2.0/db_1

[oracle@server3]$ grep "^ORACLE_HOME" $ORACLE_HOME/bin/racgwrap
ORACLE_HOME=/u01/app/oracle/product/10.2.0/db_1/
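Because the shared memory key is derived from $ORACLE_HOME and $ORACLE_SID, a purely textual difference between the two values, such as the trailing slash in the output above, is a plausible cause of this symptom. The comparison can be scripted. The following Python sketch is illustrative only: `oracle_home_mismatch` is a hypothetical name, and the racgwrap text is passed in as a string rather than read from disk.

```python
def oracle_home_mismatch(env_home, racgwrap_text):
    """Return the ORACLE_HOME value set in racgwrap if it differs textually
    from env_home, or None if the two strings are identical.

    The comparison is deliberately an exact string comparison, so a trailing
    slash counts as a difference.
    """
    for line in racgwrap_text.splitlines():
        if line.startswith("ORACLE_HOME="):
            value = line.split("=", 1)[1].strip()
            return value if value != env_home else None
    raise ValueError("no ORACLE_HOME assignment found in racgwrap text")
```

In practice the second argument would be the contents of $ORACLE_HOME/bin/racgwrap and the first the value of the ORACLE_HOME environment variable in the session that cannot connect.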

Oracle Clusterware - Process Monitor (OPROCD)

Process Monitor Daemon

Provides Cluster I/O Fencing

Implemented on Unix systems
Not required with third-party clusterware

Implemented in Linux in 10.2.0.4 and above
In 10.2.0.3 and below hangcheck timer module is used

Provides hangcheck timer functionality to maintain cluster integrity

Behaviour similar to hangcheck timer
Runs as root
Locked in memory
Failure causes reboot of system
See /etc/init.d/init.cssd for operating system reboot commands

Oracle Clusterware - Process Monitor (OPROCD)

OPROCD takes two parameters

-t - Timeout value
  Length of time between executions (milliseconds)
  Normally defaults to 1000

-m - Margin
  Acceptable margin before rebooting (milliseconds)
  Normally defaults to 500

Parameters are specified in /etc/init.d/init.cssd

OPROCD_DEFAULT_TIMEOUT=1000
OPROCD_DEFAULT_MARGIN=500

Contact Oracle Support before changing these values

Oracle Clusterware - Process Monitor (OPROCD)

/etc/init.d/init.cssd can increase OPROCD_DEFAULT_MARGIN based on two CSS variables
  reboottime (mandatory)
  diagwait (optional)

Values for these can be obtained using

[root@server3]# crsctl get css reboottime
[root@server3]# crsctl get css diagwait

Both values are reported in seconds

The algorithm is

If diagwait > reboottime then
  OPROCD_DEFAULT_MARGIN := (diagwait - reboottime) * 1000

Therefore increasing diagwait will reduce frequency of reboots e.g.

[root@server3]# crsctl set css diagwait 13
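The margin calculation on this slide can be expressed as a small function. This is a sketch of the stated algorithm only, not the actual init.cssd script. The function name is hypothetical, and the reboottime of 3 seconds in the usage note is an assumed illustrative value.

```python
def oprocd_margin_ms(reboottime_s, diagwait_s=None, default_margin_ms=500):
    """Return the OPROCD margin in milliseconds.

    reboottime_s and diagwait_s are in seconds, as reported by crsctl.
    diagwait is optional; when it is unset or not greater than reboottime,
    the default margin (OPROCD_DEFAULT_MARGIN=500) is kept.
    """
    if diagwait_s is not None and diagwait_s > reboottime_s:
        return (diagwait_s - reboottime_s) * 1000
    return default_margin_ms
```

With an assumed reboottime of 3 seconds, setting diagwait to 13 gives a margin of (13 - 3) * 1000 = 10000 milliseconds, compared with the default of 500.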

Oracle Clusterware - Heartbeats

CSS maintains two heartbeats
  Network heartbeat across interconnect
  Disk heartbeat to voting device

Disk heartbeat has an internal I/O timeout (in seconds)
  Varies between releases

In Oracle 10.2.0.2 and above disk heartbeat timeout can be specified by CSS disktimeout parameter
  Maximum time allowed for a voting file I/O to complete
  If exceeded file is marked offline
  Defaults to 200 seconds

crsctl get css disktimeout
crsctl set css disktimeout <value>

Oracle Clusterware - Heartbeats

Network heartbeat timeout can be specified by CSS misscount parameter

Default values (Oracle Clusterware 10.1 and 10.2) are:

Linux    60 seconds
Unix     30 seconds
Windows  30 seconds

Default value for vendor clusterware is 600 seconds

crsctl get css misscount
crsctl set css misscount <value>

Oracle Clusterware - Heartbeats

Relationship between internal I/O timeout (IOT), MISSCOUNT and DISKTIMEOUT varies between releases

Version   Description
10.1.0.3  IOT = MISSCOUNT - 15 seconds
10.1.0.4  IOT = MISSCOUNT - 15 seconds
10.1.0.5  IOT = MISSCOUNT - 3 seconds
10.1.0.6  IOT = DISKTIMEOUT during normal operations
          IOT = MISSCOUNT during initial cluster formation or reconfiguration
10.2.0.1  IOT = MISSCOUNT - 3 seconds
10.2.0.2  IOT = DISKTIMEOUT during normal operations
          IOT = MISSCOUNT during initial cluster formation or reconfiguration
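The version table above can be encoded as a lookup. The Python sketch below simply transcribes the table; the names `IOT_RULES` and `internal_io_timeout` are hypothetical, and versions not listed on the slide are deliberately not handled.

```python
# Transcription of the table: version -> (basis, seconds subtracted from MISSCOUNT)
IOT_RULES = {
    "10.1.0.3": ("misscount", 15),
    "10.1.0.4": ("misscount", 15),
    "10.1.0.5": ("misscount", 3),
    "10.1.0.6": ("disktimeout", 0),
    "10.2.0.1": ("misscount", 3),
    "10.2.0.2": ("disktimeout", 0),
}


def internal_io_timeout(version, misscount, disktimeout, reconfiguring=False):
    """Return the internal I/O timeout (seconds) for a version listed in the table.

    reconfiguring=True stands for initial cluster formation or reconfiguration.
    Raises KeyError for versions that do not appear in the table.
    """
    basis, offset = IOT_RULES[version]
    if basis == "disktimeout":
        return misscount if reconfiguring else disktimeout
    return misscount - offset
```

For example, with the Linux default MISSCOUNT of 60 and the default DISKTIMEOUT of 200, version 10.1.0.3 gives 60 - 15 = 45 seconds, while 10.2.0.2 gives 200 seconds during normal operations.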

Oracle Clusterware - Heartbeats

If disktimeout is supported, CSS will not evict a node from the cluster when I/O to voting disk takes more than MISSCOUNT seconds unless
  during initial cluster formation
  slightly before reconfiguration

Nodes will not be evicted as long as voting disk operations are completed within DISKTIMEOUT seconds

Network Heartbeat                   Disk Heartbeat                        Reboot
Completes within MISSCOUNT seconds  Completes within DISKTIMEOUT seconds  No
Completes within MISSCOUNT seconds  Takes more than DISKTIMEOUT seconds   Yes
Takes more than MISSCOUNT seconds   Completes within MISSCOUNT seconds    Yes
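The three rows of the table reduce to a simple rule for normal operations when disktimeout is supported: a node reboots if the network heartbeat exceeds MISSCOUNT or the disk heartbeat exceeds DISKTIMEOUT. A minimal sketch of that rule follows; the function name is hypothetical and the special cases of cluster formation and reconfiguration are not modelled.

```python
def node_reboots(network_hb_s, disk_hb_s, misscount, disktimeout):
    """Apply the heartbeat table for normal operations (all values in seconds).

    Returns True when the table says the node reboots.
    """
    if network_hb_s > misscount:
        # Network heartbeat takes more than MISSCOUNT seconds
        return True
    # Network heartbeat is fine; only a disk heartbeat beyond DISKTIMEOUT reboots
    return disk_hb_s > disktimeout
```

Note that a disk heartbeat slower than MISSCOUNT but within DISKTIMEOUT does not cause a reboot under this rule, which matches the statement that nodes are not evicted as long as voting disk operations complete within DISKTIMEOUT seconds.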

Oracle Clusterware - CRSCTL

CRSCTL can also be used to enable and disable Oracle Clusterware

To enable Clusterware use:

# crsctl enable crs

To disable Clusterware use:

# crsctl disable crs

These commands update the following file:

/etc/oracle/scls_scr/<node>/root/crsstart

Oracle Clusterware - CRSCTL

In Oracle 10.2, CRSCTL can be used to check the current state of Oracle Clusterware daemons

To check the current state of all Oracle Clusterware daemons

# crsctl check crs
CSS appears healthy
CRS appears healthy
EVM appears healthy

To check the current state of individual Oracle Clusterware daemons

# crsctl check cssd
CSS appears healthy

# crsctl check crsd
CRS appears healthy

# crsctl check evmd
EVM appears healthy
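A monitoring script can scan the output of crsctl check crs for the three "appears healthy" lines shown above. The sketch below is a hypothetical helper that only relies on the healthy wording from this slide. It makes no assumption about the wording of failure messages and simply reports any daemon for which no healthy line was seen.

```python
EXPECTED_DAEMONS = ("CSS", "CRS", "EVM")


def unhealthy_daemons(check_output):
    """Return the daemons with no '<NAME> appears healthy' line in the output."""
    healthy = {
        line.split()[0]
        for line in check_output.splitlines()
        if line.strip().endswith("appears healthy")
    }
    return [name for name in EXPECTED_DAEMONS if name not in healthy]
```

An empty list means all three daemons reported healthy; anything else is a prompt to look at the corresponding log directory for that daemon.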

Oracle Clusterware - CRSCTL

CRSCTL can be used to manage the CSS voting disk

To check the current location of the voting disk use:

# crsctl query css votedisk
0. 0 /dev/raw/raw3
1. 0 /dev/raw/raw4
2. 0 /dev/raw/raw5

To add a new voting disk use:

# crsctl add css votedisk <path_name>

To delete an existing voting disk use:

# crsctl delete css votedisk <path_name>

Oracle Clusterware - Debugging

In Oracle 10.2 and above Oracle Clusterware debugging can be enabled and disabled for
  CRS
  CSS
  EVM
  Resources
  Subcomponents

Debugging can be controlled
  statically using environment variables
  dynamically using CRSCTL

Debug settings can be persisted in OCR for use in subsequent restarts

Oracle Clusterware - Debugging

To list modules available for debugging use:

# crsctl lsmodules crs
# crsctl lsmodules css
# crsctl lsmodules evm

In Oracle 11.1 modules include:

CLSVER CRS

CLUCLS CRS,EVM

COMMCRS CRS,CSS,EVM

COMMNS CRS,CSS,EVM

CRSAPP CRS

CRSCOMM CRS

CRSD CRS

CRSEVT CRS

CRSMAIN CRS

CRSOCR CRS,EVM

CRSPLACE CRS

CRSRES CRS

CRSRTI CRS

CRSTIMER CRS

CRSUI CRS

CSSCLNT CRS,EVM

CSSD CSS

EVMAGENT EVM

EVMAPP EVM

EVMCOMM EVM

EVMD EVM

EVMDMAIN EVM

EVMEVT EVM

Oracle Clusterware - Debugging

To debug individual modules use:

# crsctl debug log crs <module>:<level>[,<module>:<level>]

For example:

# crsctl debug log crs "CRSCOMM:2,COMMCRS:2,COMMNS:2"
Set CRSD Debug Module: CRSCOMM Level: 2
Set CRSD Debug Module: COMMCRS Level: 2
Set CRSD Debug Module: COMMNS Level: 2

Values only apply for current node
Stored within OCR in SYSTEM.crs.debug.<node>.<module>

For example:

# ocrdump -stdout -keyname SYSTEM.crs.debug.vm1.CRSCOMM

Log will be written to: $ORA_CRS_HOME/log/<node>/crsd/crsd.log
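When several modules need the same treatment, the <module>:<level> list can be generated rather than typed. The following is a small sketch: `debug_log_command` is a hypothetical helper, and only the crs form of the command is shown in this presentation, so the css and evm forms accepted below are assumed to follow the same pattern.

```python
def debug_log_command(component, levels):
    """Build a 'crsctl debug log' command line.

    component: 'crs' as shown in this presentation; 'css' and 'evm' are
               accepted on the assumption that they follow the same syntax.
    levels:    mapping of module name to debug level, e.g. {"CRSCOMM": 2}.
    """
    if component not in ("crs", "css", "evm"):
        raise ValueError(f"unexpected component: {component}")
    if not levels:
        raise ValueError("at least one module:level pair is required")
    spec = ",".join(f"{module}:{level}" for module, level in levels.items())
    # Quote the list so the shell passes it to crsctl as a single argument
    return f'crsctl debug log {component} "{spec}"'
```

Passing a level of 0 for each module in the same way gives the command line that switches the extra logging off again.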

Oracle Clusterware - Debugging

To debug an individual resource use:

# crsctl debug log res <resource>:<level>

For example:

# crsctl debug log res ora.vm1.vip:5
Set Resource Debug Module: ora.vm1.vip Level: 5

To disable debugging again set level 0 e.g.:

# crsctl debug log res ora.vm1.vip:0
Set Resource Debug Module: ora.vm1.vip Level: 0

OCR debug value is stored in USR_ORA_DEBUG

To check current debug value set in OCR for ora.vm1.vip use:

# ocrdump -stdout -keyname \CRS.CUR.ora\!vm1\!vip.USR_ORA_DEBUG

Log will be written to $ORA_CRS_HOME/log/<node>/racg/<resource>.log

Oracle Clusterware - Debugging

Debugging for CRSD and EVMD can also be configured using environment variables

To enable tracing for all modules use ORA_CRSDEBUG_ALL

For example:

# export ORA_CRSDEBUG_ALL=5

To enable tracing for individual modules use ORA_CRSDEBUG_<module>

For example:

# export ORA_CRSDEBUG_CRSOCR=5

Note that these environment variables have not been implemented in OCSSD or OPROCD

Oracle Clusterware - Debugging

In Oracle 10.1 and above debugging can also be configured in $ORA_CRS_HOME/srvm/admin/ocrlog.ini

By default this file contains:

# "mesg_logging_level" is the only supported parameter currently.
# level 0 means minimum logging. Only error conditions are logged
mesg_logging_level = 0

# The last appearance of a parameter will override the previous value.
# For example, log level will become 3 when the following value is uncommented.
# Change to log level 3 for detailed logging from Oracle Cluster Registry
# mesg_logging_level = 3

# Component log and trace level specification template
#comploglvl="comp1:3;comp2:4"
#comptrclvl="comp1:2;comp2:1"

Component level logging can be configured in this file e.g.:

comploglvl="OCRAPI:5;OCRCLI:5;OCRSRV:5;OCRMAS:5;OCRCAC:5"

Oracle Clusterware - Debugging

Component level logging can also be configured in the OCR

For example:

crsctl debug log crs OCRAPI:5;OCRCLI:5;OCRSRV:5;OCRMAS:5;OCRCAC:5

Components include:
  OCRAPI - OCR Abstraction Component
  OCRCAC - OCR Cache Component
  OCRCLI - OCR Client Component
  OCRMAS - OCR Master Thread Component
  OCRMSG - OCR Message Component
  OCRSRV - OCR Server Component
  OCRUTL - OCR Util Component

Oracle Clusterware - Debugging

CRSCTL can also generate state dumps

crsctl debug statedump crs
crsctl debug statedump css
crsctl debug statedump evm

CSS dump is written to $ORA_CRS_HOME/log/<node>/cssd/ocssd.log

Dump contents can be made more readable e.g.:

cut -c58- < ocssd.log > ocssd.dmp
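The cut command above keeps everything from character 58 onwards on each line, discarding the fixed-width prefix. Where cut is not available, for example when the log has been copied to a workstation for analysis, the same trimming can be sketched in Python. The function name is hypothetical; the column number is the one used on this slide.

```python
def strip_prefix_columns(lines, first_kept_column=58):
    """Equivalent of 'cut -c58-': keep each line from the given 1-based column on.

    Lines shorter than the prefix become empty strings, as with cut.
    """
    start = first_kept_column - 1  # convert 1-based column to 0-based index
    return [line.rstrip("\n")[start:] for line in lines]
```

Applied to the lines of ocssd.log, the result corresponds to the ocssd.dmp file produced by the cut command.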

Oracle Clusterware - OLSNODES

The olsnodes utility lists all nodes currently running on the cluster

With no arguments olsnodes lists the nodes e.g.

$ olsnodes
london1
london2

In Oracle 10.2 and above, with -p argument olsnodes lists node names and private interconnect

$ olsnodes -p
london1 london1-priv
london2 london2-priv

In Oracle 10.2 and above, with -i argument olsnodes lists node names and VIP address

$ olsnodes -i
london1 london1-vip
london2 london2-vip

Oracle Clusterware - OCRCONFIG

In Oracle 10.1 and above the OCRCONFIG utility performs various administrative operations on the OCR including:
  displaying backup history
  configuring backup location
  restoring OCR from backup
  exporting OCR
  importing OCR
  upgrading OCR
  downgrading OCR

In Oracle 10.2 and above OCRCONFIG can also
  manage OCR mirrors
  overwrite OCR files
  repair OCR files

Oracle Clusterware - OCRCONFIG

Options include

Option Description Version

-help Display help message 10.1+

-showbackup Display automatic OCR physical backup history 10.1+

-backuploc Change OCR physical backup location 10.1+

-restore Restore OCR from automatic physical backup 10.1+

-export Export contents of OCR to operating system file 10.1+

-import Import contents of OCR from operating system file 10.1+

-upgrade Upgrade OCR from a previous version 10.1+

-downgrade Downgrade OCR to a previous version 10.1+

-replace Add/replace/remove OCR file or mirror 10.2+

-overwrite Overwrite OCR configuration on disk 10.2+

-repair Repair local OCR configuration 10.2+

Oracle Clusterware - OCRCONFIG

In Oracle 10.1 and above
  OCR is automatically backed up every four hours
  Previous three backup copies are retained
  Backup copy retained from end of previous day
  Backup copy retained from end of previous week

Check node, times and location of previous backups using the showbackup option of OCRCONFIG e.g.

# ocrconfig -showbackup
london1 2005/08/04 11:15:29 /u01/app/oracle/product/10.2.0/crs/cdata/crs
london1 2005/08/03 22:24:32 /u01/app/oracle/product/10.2.0/crs/cdata/crs
london1 2005/08/03 18:24:32 /u01/app/oracle/product/10.2.0/crs/cdata/crs
london1 2005/08/02 18:24:32 /u01/app/oracle/product/10.2.0/crs/cdata/crs
london1 2005/07/31 18:24:32 /u01/app/oracle/product/10.2.0/crs/cdata/crs

ENSURE THAT YOU COPY THE PHYSICAL BACKUPS TO TAPE AND/OR REDUNDANT STORAGE
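A script that copies the physical backups to other storage first needs to know which node and directory hold them. The sketch below parses showbackup output in the four-column form shown on this slide (node, date, time, directory). It is an illustrative helper with a hypothetical name, and other output layouts are not handled.

```python
from datetime import datetime


def parse_showbackup(output):
    """Parse 'ocrconfig -showbackup' lines into (node, timestamp, directory) tuples.

    Assumes the four-column layout shown on this slide. Lines with a different
    number of fields are skipped; a four-field line with an unparseable date
    raises ValueError.
    """
    backups = []
    for line in output.splitlines():
        parts = line.split()
        if len(parts) != 4:
            continue
        node, day, clock, directory = parts
        taken = datetime.strptime(f"{day} {clock}", "%Y/%m/%d %H:%M:%S")
        backups.append((node, taken, directory))
    return backups
```

Taking the maximum timestamp from the result identifies the most recent backup and the node that wrote it.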

Oracle Clusterware - OCRCONFIG

In Oracle 11.1 and above OCR can be backed up manually using:

# ocrconfig -manualbackup

Backups will be written to the location specified by:

# ocrconfig -backuploc <directory_name>

Manual backups can be listed using:

# ocrconfig -showbackup manual

Automatic backups can be listed using:

# ocrconfig -showbackup auto

Oracle Clusterware - OCRCONFIG

To restore the OCR from a physical backup copy

Check you have a suitable backup using:

# ocrconfig -showbackup

Stop Oracle Clusterware on each node using:

# crsctl stop crs

Restore the backup file using

# ocrconfig -restore <filename>

For example:

# ocrconfig -restore $ORA_CRS_HOME/cdata/crs/backup00.ocr

Start Oracle Clusterware on each node using:

# crsctl start crs

Oracle Clusterware - OCRCHECK

In Oracle 10.1 and above, you can verify the configuration of the OCR using the OCRCHECK utility

# ocrcheck
Status of Oracle Cluster Registry is as follows :
  Version                  : 2
  Total space (kbytes)     : 262144
  Used space (kbytes)      : 7752
  Available space (kbytes) : 254392
  ID                       : 1093363319
  Device/File Name         : /dev/raw/raw1
                             Device/File integrity check succeeded
                             /dev/raw/raw2
                             Device/File integrity check succeeded
  Cluster registry integrity check succeeded

In Oracle 10.1 this utility does not print the ID and Device/File Name information
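The space figures reported by ocrcheck can be extracted to track how full the OCR is. The following is a rough sketch that assumes the "Total/Used/Available space (kbytes)" labels shown above; the function names are hypothetical.

```python
import re

SPACE_LINE = re.compile(r"\s*(Total|Used|Available) space \(kbytes\)\s*:\s*(\d+)")


def ocr_space(ocrcheck_output):
    """Return {'total': ..., 'used': ..., 'available': ...} in kbytes."""
    fields = {}
    for line in ocrcheck_output.splitlines():
        match = SPACE_LINE.match(line)
        if match:
            fields[match.group(1).lower()] = int(match.group(2))
    return fields


def used_percent(fields):
    """Percentage of OCR space in use."""
    return 100.0 * fields["used"] / fields["total"]
```

For the figures on this slide, 7752 of 262144 kbytes are in use, which is just under 3 percent.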

Oracle Clusterware - OCRDUMP

In Oracle 10.1 and above, you can dump the contents of the OCR using the OCRDUMP utility

For example:

# ocrdump

This command writes its output to a file called OCRDUMPFILE in the current working directory

You can specify an output file name using:

# ocrdump <dump_file_name>

For example:

# ocrdump ocr_cluster1

Oracle Clusterware - OCRDUMP

In Oracle 10.2 and above, you can write OCRDUMP output to stdout. For example:

# ocrdump -stdout

In Oracle 10.2 and above, you can optionally restrict output by specifying a key

For example:

# ocrdump -stdout SYSTEM
# ocrdump -stdout SYSTEM.css
# ocrdump -stdout SYSTEM.css.misscount

In Oracle 10.2 and above, you can optionally format output in XML. For example:

# ocrdump -stdout SYSTEM.css.misscount -xml

Oracle Clusterware - CRS_STAT

The CRS_STAT utility reports the current status of resources managed by Oracle Clusterware.

Resources include:
  databases
  instances
  services
  ASM instances
  node applications
    gsd
    ons
    vip
  listeners

Oracle Clusterware - CRS_STAT

With no arguments, CRS_STAT lists all resources currently configured, e.g.:

$ crs_stat
NAME=ora.RAC.RAC1.inst
TYPE=application
TARGET=ONLINE
STATE=ONLINE on london1

NAME=ora.RAC.RAC2.inst
TYPE=application
TARGET=ONLINE
STATE=ONLINE on london2

NAME=ora.RAC.SERVICE1.RAC1.srv
TYPE=application
TARGET=OFFLINE
STATE=OFFLINE

etc...

If a node has failed, the STATE field will show which node the applications have failed over to
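
The NAME/TYPE/TARGET/STATE records above are easy to process in a script, for example to see which node each resource is currently running on. The following is a minimal Python sketch that works on the sample output shown above; `parse_crs_stat` and `split_state` are hypothetical helper names, and the sketch assumes every record starts with a NAME line.

```python
CRS_STAT_SAMPLE = """\
NAME=ora.RAC.RAC1.inst
TYPE=application
TARGET=ONLINE
STATE=ONLINE on london1

NAME=ora.RAC.RAC2.inst
TYPE=application
TARGET=ONLINE
STATE=ONLINE on london2

NAME=ora.RAC.SERVICE1.RAC1.srv
TYPE=application
TARGET=OFFLINE
STATE=OFFLINE
"""


def parse_crs_stat(text):
    """Turn NAME=/TYPE=/TARGET=/STATE= records into a list of dicts."""
    resources = []
    current = {}
    for line in text.splitlines():
        key, sep, value = line.strip().partition("=")
        if not sep:
            continue                     # blank line or non-attribute text
        if key == "NAME" and current:
            resources.append(current)    # a new NAME starts the next record
            current = {}
        current[key] = value
    if current:
        resources.append(current)
    return resources


def split_state(state):
    """Split 'ONLINE on london1' into ('ONLINE', 'london1'); host may be None."""
    status, sep, host = state.partition(" on ")
    return status, (host if sep else None)


if __name__ == "__main__":
    for res in parse_crs_stat(CRS_STAT_SAMPLE):
        status, host = split_state(res["STATE"])
        print(res["NAME"], res["TARGET"], status, host)
```

Comparing the host reported in STATE with the node a resource normally runs on is one way to spot a failover.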

Oracle Clusterware - CRS_STAT

With the -t option, crs_stat lists resources together with their state and the current node:

Name           Type           Target    State     Host
------------------------------------------------------------
ora....T1.inst application    ONLINE    ONLINE    server3
ora....T2.inst application    ONLINE    ONLINE    server4
ora....T3.inst application    ONLINE    ONLINE    server11
ora....T4.inst application    ONLINE    ONLINE    server12
ora.TEST.db    application    ONLINE    ONLINE    server3
ora....SM3.asm application    ONLINE    ONLINE    server11
ora....11.lsnr application    ONLINE    ONLINE    server11
ora....r11.gsd application    ONLINE    ONLINE    server11
ora....r11.ons application    ONLINE    ONLINE    server11
ora....r11.vip application    ONLINE    ONLINE    server11
ora....SM4.asm application    ONLINE    ONLINE    server12
ora....12.lsnr application    ONLINE    ONLINE    server12
ora....r12.gsd application    ONLINE    ONLINE    server12
ora....r12.ons application    ONLINE    ONLINE    server12
ora....r12.vip application    ONLINE    ONLINE    server12

Oracle Clusterware - CRS_STAT

With the -ls option, crs_stat lists resources together with their owner, group and permissions:

Name           Owner          Primary PrivGrp          Permission
-----------------------------------------------------------------
ora....T1.inst oracle         oinstall                 rwxrwxr--
ora....T2.inst oracle         oinstall                 rwxrwxr--
ora....T3.inst oracle         oinstall                 rwxrwxr--
ora....T4.inst oracle         oinstall                 rwxrwxr--
ora.TEST.db    oracle         oinstall                 rwxrwxr--
ora....SM3.asm oracle         oinstall                 rwxrwxr--
ora....11.lsnr oracle         oinstall                 rwxrwxr--
ora....r11.gsd oracle         oinstall                 rwxr-xr--
ora....r11.ons oracle         oinstall                 rwxr-xr--
ora....r11.vip root           oinstall                 rwxr-xr--
ora....SM4.asm oracle         oinstall                 rwxrwxr--
ora....12.lsnr oracle         oinstall                 rwxrwxr--
ora....r12.gsd oracle         oinstall                 rwxr-xr--
ora....r12.ons oracle         oinstall                 rwxr-xr--
ora....r12.vip root           oinstall                 rwxr-xr--

Oracle Clusterware - CRS_STAT

CRS_STAT abbreviates resource names.

Oracle provides an AWK script that includes complete resource names:
  Metalink Note 259301.1 - CRS and 10g RAC

#!/bin/bash

# Optional first argument: pattern used to filter resource names
RSC_KEY=$1
QSTAT=-u
AWK=/usr/bin/awk

# Print the table header
$AWK \
  'BEGIN {printf "%-45s %-10s %-18s\n", "HA Resource", "Target", "State";
          printf "%-45s %-10s %-18s\n", "-----------", "------", "-----";}'

# Print one line per matching resource: full name, target and state
$ORA_CRS_HOME/bin/crs_stat $QSTAT | $AWK \
 'BEGIN { FS="="; state = 0; }
  $1~/NAME/ && $2~/'$RSC_KEY'/ {appname = $2; state=1};
  state == 0 {next;}
  $1~/TARGET/ && state == 1 {apptarget = $2; state=2;}
  $1~/STATE/ && state == 2 {appstate = $2; state=3;}
  state == 3 {printf "%-45s %-10s %-18s\n", appname, apptarget, appstate; state=0;}'

Oracle Clusterware - CRS_STAT

#!/usr/bin/perl
# List full resource names with target and state.
# Optional first argument: pattern used to filter resource names.
$s = ".";
if ($#ARGV >= 0) {
  $s = $ARGV[0];
  chomp $s;
}
printf ("%-45s %-12s %-18s\n","HA Resource","Target","State");
printf ("%-45s %-12s %-18s\n","-----------","------","-----");
open (CRS_STAT,"crs_stat -u|");
while ($line = <CRS_STAT>)
{
  if ($line =~ m/=/)
  {
    chomp $line;
    ($n,$v) = split (/=/,$line);
    if ($n eq "NAME") { $name = $v; }
    # Capture TARGET (not TYPE) so the value matches the "Target" column
    elsif ($n eq "TARGET") { $target = $v; }
    elsif ($n eq "STATE")
    {
      $state = $v;
      if ($name =~ m/$s/)
      {
        printf ("%-45s %-12s %-18s\n",$name,$target,$state);
      }
    }
  }
}
close (CRS_STAT);

Oracle Clusterware - Permissions

The CRS_GETPERM and CRS_SETPERM utilities can be used to check and modify Oracle Clusterware permissions.

For example, to change the owner of an instance to oracle and the group to oinstall:

Check the current permissions:

[root@server11]# crs_getperm ora.TEST.TEST3.inst
Name: ora.TEST.TEST3.inst
owner:root:rwx,pgrp:root:r-x,other::r--,

Set the new permissions:

[root@server11]# crs_setperm ora.TEST.TEST3.inst -o oracle
[root@server11]# crs_setperm ora.TEST.TEST3.inst -g oinstall

Check the new permissions:

[root@server11]# crs_getperm ora.TEST.TEST3.inst
Name: ora.TEST.TEST3.inst
owner:oracle:rwx,pgrp:oinstall:r-x,other::r--,
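
To confirm that a permission change took effect, the two CRS_GETPERM outputs can be compared field by field. The following is a minimal Python sketch using the permission lines shown above; `parse_crs_perm` is a hypothetical helper, and the `role:name:mode` layout is inferred from the slide's sample output.

```python
def parse_crs_perm(line):
    """Parse 'owner:oracle:rwx,pgrp:oinstall:r-x,other::r--,' into a dict."""
    perms = {}
    for entry in line.strip().rstrip(",").split(","):
        role, name, mode = entry.split(":")   # e.g. 'other', '', 'r--'
        perms[role] = {"name": name, "mode": mode}
    return perms


before = parse_crs_perm("owner:root:rwx,pgrp:root:r-x,other::r--,")
after = parse_crs_perm("owner:oracle:rwx,pgrp:oinstall:r-x,other::r--,")

# Roles whose name or mode differs between the two crs_getperm calls.
changed = {role for role in before if before[role] != after[role]}
print(sorted(changed))
```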

Oracle Clusterware - OCR Corruptions

Oracle Cluster Registry
  Vulnerable to corruption
  Versions experiencing OCR corruptions have included:
    10.1.0.3
    10.2.0.2
    10.2.0.3
    11.1.0.6

Also experienced by
  many Oracle employees
  about 20% of UKOUG RAC & HA SIG delegates

Typical symptom is "placement error"
  May be related to configuration of services
  Corruption may occur at an earlier date
  May occur when service is configured on non-master node

Oracle Clusterware - OCR Corruptions

Recovery of corrupt OCR:

If mirror is configured:
  Restore from mirror using ocrconfig -overwrite
  See Administration and Deployment Guide for details

If backup is available:
  Restore from backup using ocrconfig -restore

If no backup is available:
  Rebuild OCR using procedure described in Metalink Note 399482.1 - How to recreate OCR/Voting disk accidentally deleted

Oracle Clusterware - OCR Corruptions

Rebuild procedure (adapted from Note 399482.1):

On each node shut down Oracle Clusterware:

[root@server3]# crsctl stop crs

Check that all Clusterware processes have stopped.

On each node execute rootdelete.sh:

[root@server3]# $ORA_CRS_HOME/install/rootdelete.sh

On first node execute rootdeinstall.sh:

[root@server3]# $ORA_CRS_HOME/install/rootdeinstall.sh

Note that for a corrupt OCR it may be necessary to zero the OCR. For example:

[root@server3]# dd if=/dev/zero of=/dev/ocr bs=1M

Oracle Clusterware - OCR Corruptions

Rebuild procedure (adapted from Note 399482.1) continued:

On first node execute root.sh:

[root@server3]# $ORA_CRS_HOME/root.sh

On remaining nodes execute root.sh:

[root@server4]# $ORA_CRS_HOME/root.sh

Use srvctl to add:
  ASM instances
  Database
  Instance
  Services

Use netca to add listener.

Execute cluvfy to verify CRS configuration:

[oracle@server4]$ cluvfy stage -post crsinst -n node1,node2

ASM and RDBMS

Trace and Diagnostics - Modules and Actions

In Oracle 8.0 and above it is possible to specify a module and action for any session.

Modules and actions allow inefficient SQL statements to be identified and isolated more easily.

Modules and actions are reported in:
  STATSPACK / AWR / ASH reports
  V$SESSION
  V$SQL
  V$ACTIVE_SESSION_HISTORY

Current module and action for a session is reported in:
  V$SESSION.MODULE
  V$SESSION.ACTION

Trace and Diagnostics - DBMS_MONITOR

To specify a module and action use:

DBMS_APPLICATION_INFO.SET_MODULE (
  MODULE_NAME => 'MODULE1',
  ACTION_NAME => 'ACTION1'
);

To specify a new action within the current module use:

DBMS_APPLICATION_INFO.SET_ACTION (
  ACTION_NAME => 'ACTION2'
);

Modules and actions can also be specified using OCI calls.

Trace and Diagnostics - DBMS_MONITOR

Introduced in Oracle 10.1

Contains the following subroutines:
  SESSION_TRACE_ENABLE
  SESSION_TRACE_DISABLE
  DATABASE_TRACE_ENABLE
  DATABASE_TRACE_DISABLE
  CLIENT_ID_TRACE_ENABLE
  CLIENT_ID_TRACE_DISABLE
  CLIENT_ID_STAT_ENABLE
  CLIENT_ID_STAT_DISABLE
  SERV_MOD_ACT_TRACE_ENABLE
  SERV_MOD_ACT_TRACE_DISABLE
  SERV_MOD_ACT_STAT_ENABLE
  SERV_MOD_ACT_STAT_DISABLE

Trace and Diagnostics - DBMS_MONITOR

Trace is enabled using the following subroutines:
  SESSION_TRACE_ENABLE
  DATABASE_TRACE_ENABLE
  CLIENT_ID_TRACE_ENABLE
  SERV_MOD_ACT_TRACE_ENABLE

By default event 10046 level 8 trace will be enabled
  Includes wait events

In Oracle 11.1 these subroutines have an additional PLAN_STATS parameter which specifies when row source statistics are dumped. Possible values are:
  NEVER
  FIRST_EXECUTION (default)
  ALL_EXECUTIONS

Trace and Diagnostics - DBMS_MONITOR

Introduced in Oracle 10.1

To enable trace in the current session use:

EXECUTE DBMS_MONITOR.SESSION_TRACE_ENABLE;

To enable trace in session 42 use:

EXECUTE DBMS_MONITOR.SESSION_TRACE_ENABLE(SESSION_ID => 42);

To disable trace in the current session use:

EXECUTE DBMS_MONITOR.SESSION_TRACE_DISABLE;

To disable trace in session 42 use:

EXECUTE DBMS_MONITOR.SESSION_TRACE_DISABLE(SESSION_ID => 42);

Trace and Diagnostics - DBMS_MONITOR

Introduced in Oracle 10.2

To enable trace for the entire database use:

EXECUTE DBMS_MONITOR.DATABASE_TRACE_ENABLE;

To enable trace for instance RAC1 use:

EXECUTE DBMS_MONITOR.DATABASE_TRACE_ENABLE(INSTANCE_NAME => 'RAC1');

To disable trace for the entire database use:

EXECUTE DBMS_MONITOR.DATABASE_TRACE_DISABLE;

To disable trace for instance RAC1 use:

EXECUTE DBMS_MONITOR.DATABASE_TRACE_DISABLE(INSTANCE_NAME => 'RAC1');

Trace and Diagnostics - DBMS_MONITOR

Trace can be enabled using client identifiers
  Useful when many sessions connect using the same Oracle user
  Useful with connection caches

To set a client identifier use DBMS_SESSION.SET_IDENTIFIER. For example:

BEGIN
  DBMS_SESSION.SET_IDENTIFIER ('CLIENT42');
END;

The client identifier for a specific session is reported in V$SESSION.CLIENT_IDENTIFIER

Trace and Diagnostics - DBMS_MONITOR

To enable trace for CLIENT42 use:

BEGIN
  DBMS_MONITOR.CLIENT_ID_TRACE_ENABLE(CLIENT_ID => 'CLIENT42');
END;

To enable statistics collection for CLIENT42 use:

BEGIN
  DBMS_MONITOR.CLIENT_ID_STAT_ENABLE(CLIENT_ID => 'CLIENT42');
END;

Client statistics are reported in V$CLIENT_STATS

Trace and Diagnostics - DBMS_MONITOR

Trace can be enabled for a specific
  service
  service and module
  service, module and action

To enable trace for SERVICE1 use:

BEGIN
  DBMS_MONITOR.SERV_MOD_ACT_TRACE_ENABLE(SERVICE_NAME => 'SERVICE1');
END;

To disable trace for SERVICE1 use:

BEGIN
  DBMS_MONITOR.SERV_MOD_ACT_TRACE_DISABLE(SERVICE_NAME => 'SERVICE1');
END;

Trace and Diagnostics - DBMS_MONITOR

To enable trace for MODULE1 use:

BEGIN
  DBMS_MONITOR.SERV_MOD_ACT_TRACE_ENABLE(
    SERVICE_NAME => 'SERVICE1',
    MODULE_NAME => 'MODULE1'
  );
END;

To enable trace for ACTION1 use:

BEGIN
  DBMS_MONITOR.SERV_MOD_ACT_TRACE_ENABLE(
    SERVICE_NAME => 'SERVICE1',
    MODULE_NAME => 'MODULE1',
    ACTION_NAME => 'ACTION1'
  );
END;

Trace and Diagnostics - DBMS_MONITOR

To enable statistics collection for MODULE1 use:

BEGIN
  DBMS_MONITOR.SERV_MOD_ACT_STAT_ENABLE(
    SERVICE_NAME => 'SERVICE1',
    MODULE_NAME => 'MODULE1'
  );
END;

To enable statistics collection for ACTION1 use:

BEGIN
  DBMS_MONITOR.SERV_MOD_ACT_STAT_ENABLE(
    SERVICE_NAME => 'SERVICE1',
    MODULE_NAME => 'MODULE1',
    ACTION_NAME => 'ACTION1'
  );
END;

Statistics are externalized in V$SERV_MOD_ACT_STATS
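
The SERV_MOD_ACT_* calls shown above all take the same service, module and action arguments, so the anonymous blocks can be generated rather than typed. The following is a minimal Python sketch that only renders the PL/SQL text and does not connect to a database; `serv_mod_act_call` is a hypothetical helper, and the rule that an action needs a module simply mirrors the three levels listed on the slides.

```python
PROCEDURES = {
    "SERV_MOD_ACT_TRACE_ENABLE",
    "SERV_MOD_ACT_TRACE_DISABLE",
    "SERV_MOD_ACT_STAT_ENABLE",
    "SERV_MOD_ACT_STAT_DISABLE",
}


def serv_mod_act_call(procedure, service, module=None, action=None):
    """Render a DBMS_MONITOR.SERV_MOD_ACT_* call as an anonymous PL/SQL block."""
    if procedure not in PROCEDURES:
        raise ValueError(f"unexpected procedure: {procedure}")
    if action is not None and module is None:
        # Mirrors the slide: service, service + module, service + module + action.
        raise ValueError("an action is only given together with a module")

    def quote(value):
        return "'" + value.replace("'", "''") + "'"   # double embedded quotes

    args = [f"SERVICE_NAME => {quote(service)}"]
    if module is not None:
        args.append(f"MODULE_NAME => {quote(module)}")
    if action is not None:
        args.append(f"ACTION_NAME => {quote(action)}")
    body = ",\n    ".join(args)
    return f"BEGIN\n  DBMS_MONITOR.{procedure}(\n    {body}\n  );\nEND;"


if __name__ == "__main__":
    print(serv_mod_act_call("SERV_MOD_ACT_STAT_ENABLE", "SERVICE1", "MODULE1", "ACTION1"))
```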

Trace and Diagnostics - DBMS_MONITOR

In Oracle 10.1 and above, current trace configuration is reported in DBA_ENABLED_TRACES.

TRACE_TYPE column can be:
  CLIENT_ID
  SERVICE
  SERVICE_MODULE
  SERVICE_MODULE_ACTION
  DATABASE

Currently enabled statistics aggregations are reported in DBA_ENABLED_AGGREGATIONS.

Trace and Diagnostics - Automatic Diagnostic Repository

In Oracle 11.1 and above the diagnostics area has been redesigned.

Diagnostics area is located in $ORACLE_BASE/diag and includes the following top-level directories:
  asm
  clients
  crs
  diagtool
  lsnrctl
  netcman
  ofm
  rdbms
  tnslsnr

Trace and Diagnostics - Automatic Diagnostic Repository

Trace directory includes
  server (foreground) process trace files
  background process trace files
  alert log (text)

All trace files and alert log are written to
  $ORACLE_BASE/diag/rdbms/<database>/<instance>/trace

For example, for database TEST:
  $ORACLE_BASE/diag/rdbms/test/TEST1/trace

BACKGROUND_DUMP_DEST and USER_DUMP_DEST both specify same trace directory by default
  Deprecated in Oracle 11.1
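
The directory layout above can be turned into a small helper for scripts that need to locate trace files. The following is a minimal Python sketch; `adr_trace_dir` is a hypothetical name, and lower-casing the database name is an assumption taken from the slide's example (TEST becomes test) rather than a documented rule.

```python
import posixpath


def adr_trace_dir(oracle_base, database, instance):
    """Build $ORACLE_BASE/diag/rdbms/<database>/<instance>/trace.

    Assumption: the database directory is the lower-cased database name,
    as in the slide's example; the instance name is kept as given.
    """
    return posixpath.join(oracle_base, "diag", "rdbms",
                          database.lower(), instance, "trace")


if __name__ == "__main__":
    print(adr_trace_dir("/u01/app/oracle", "TEST", "TEST1"))
```

On a running 11.1 instance the authoritative value is the Diag Trace row of V$DIAG_INFO, so a helper like this is best treated as a fallback.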

Trace and Diagnostics - Automatic Diagnostic Repository

V$DIAG_INFO dynamic performance view
  Introduced in Oracle 11.1
  Returns values for the following diagnostics

Name                   Example Value
---------------------  -------------
ADR Base               /u01/app/oracle
ADR Home               /u01/app/oracle/diag/rdbms/test/TEST
Active Incident Count  2
Active Problem Count   1
Default Trace File     /u01/app/oracle/diag/rdbms/test/TEST/trace/TEST_ora_14003.trc
Diag Alert             /u01/app/oracle/diag/rdbms/test/TEST/alert
Diag Cdump             /u01/app/oracle/diag/rdbms/test/TEST/cdump
Diag Enabled           TRUE
Diag Incident          /u01/app/oracle/diag/rdbms/test/TEST/incident
Diag Trace             /u01/app/oracle/diag/rdbms/test/TEST/trace
Health Monitor         /u01/app/oracle/diag/rdbms/test/TEST/hm

Trace and Diagnostics - SRVCTL

In Oracle 10.1 and above, to enable trace in SRVCTL use:

export SRVM_TRACE=true

By default trace is written to standard output.

In Oracle 10.1 and above, the same environment variable can be used to trace:
  NETCA
  VIPCA
  SRVCONFIG
  GSDCTL
  CLUVFY
  CLUUTIL
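
Because the trace goes to standard output, a wrapper script can set SRVM_TRACE for a single command and capture what it prints. The following is a minimal Python sketch; `run_with_srvm_trace` is a hypothetical helper, and the child process used in the demo is a stand-in so the sketch runs without Oracle installed.

```python
import os
import subprocess
import sys


def run_with_srvm_trace(command):
    """Run a command with SRVM_TRACE=true added to a copy of the environment."""
    env = dict(os.environ, SRVM_TRACE="true")
    result = subprocess.run(command, env=env, capture_output=True, text=True)
    return result.returncode, result.stdout


if __name__ == "__main__":
    # Stand-in child that just echoes the variable; on a cluster node the
    # command could instead be, for example, ["srvctl", "status", "database", "-d", "TEST"].
    child = [sys.executable, "-c", "import os; print(os.environ['SRVM_TRACE'])"]
    print(run_with_srvm_trace(child))
```

Setting the variable only in the child's environment avoids leaving trace enabled for later commands in the same shell.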

References

Metalink Notes

265769.1 - Troubleshooting CRS Reboots
240001.1 - Troubleshooting CRS root.sh problems
341214.1 - How to cleanup after a failed (or successful) Oracle Clusterware installation
294430.1 - MISSCOUNT Definition and Default Values
357808.1 - CRS Diagnostics
272331.1 - CRS 10g Diagnostic Guide
330358.1 - CRS 10g R2 Diagnostic Collection Guide
331168.1 - Clusterware consolidated logging in 10gR2
357808.1 - Diagnosability for CRS/EVM/RACG
289690.1 - Data Gathering for Troubleshooting RAC and CRS Issues
284752.1 - Increasing CSS Misscount, Reboottime and Disktimeout
462616.1 - Reconfiguring the CSS disktimeout of 10gR2 Clusterware for proper LUN failover
317628.1 - How to replace a corrupt OCR mirror file
279793.1 - How to restore a lost voting disk in 10g

Thank you for your interest