Oracle DataGuard Support Issues

76
Oracle DataGuard Support Issues Brian Hitchcock OCP 10g DBA Sun Microsystems [email protected] [email protected] www.brianhitchcock.net Brian Hitchcock October 23, 2007 Page 1

description

Oracle DataGuard Support Issues. Brian Hitchcock OCP 10g DBA Sun Microsystems [email protected] [email protected]. www.brianhitchcock.net. Brian Hitchcock October 23, 2007. Page 1. Oracle DataGuard. Focus on Logical Standby Support Issues Maintains a standby database - PowerPoint PPT Presentation

Transcript of Oracle DataGuard Support Issues

Page 1: Oracle DataGuard Support Issues

Oracle DataGuardSupport Issues

Brian HitchcockOCP 10g DBA

Sun [email protected]

[email protected]

www.brianhitchcock.net

Brian Hitchcock October 23, 2007 Page 1

Page 2: Oracle DataGuard Support Issues

www.brianhitchcock.net

Brian Hitchcock October 23, 2007 Page 2

Oracle DataGuard

Focus on Logical Standby Support Issues Maintains a standby database

– Archived redo logs on primary– Sent to standby and applied

Simple idea Many configuration options Can become very complicated

Page 3: Oracle DataGuard Support Issues

www.brianhitchcock.net

Brian Hitchcock October 23, 2007 Page 3

DataGuard

Must be SYS to make changes– Sqlplus / as sysdba

Changes to DataGuard standby database– Some can’t be made while apply process running

Change Guard status

Support Issues– Create physical standby

– Convert to logical standby– After logical standby is running– Refresh process

Page 4: Oracle DataGuard Support Issues

www.brianhitchcock.net

Brian Hitchcock October 23, 2007 Page 4

DataGuard Errors

DataGuard reports lot of errors– Standby database alert log– Many are for normal operation

Why reported as errors?

– Monitoring of db alert log Will report these ‘errors’ Hard to filter out normal ‘errors’

Page 5: Oracle DataGuard Support Issues

www.brianhitchcock.net

Brian Hitchcock October 23, 2007 Page 5

Create Physical Standby

On Primary database– Enable Forced Logging– Create password file– Setup init.ora/spfile parameters

Can’t connect to standby SYS password

– Verify archiving enabled– Backup db (hot or cold)– Create standby control file

Page 6: Oracle DataGuard Support Issues

www.brianhitchcock.net

Brian Hitchcock October 23, 2007 Page 6

Create Physical Standby

On Standby database– Copy db backup files from primary– Copy standby control file from primary– Setup init.ora/spfile parameters– Start physical standby db

Trace file

– Verify physical standby working May not ‘see’ redo logs, register them Redo logs not deleted, use RMAN

Page 7: Oracle DataGuard Support Issues

www.brianhitchcock.net

Brian Hitchcock October 23, 2007 Page 7

Convert to Logical Standby

On Primary database– Build LogMiner dictionary

On Standby database– Stop redo apply

Errors, no impact– Convert database to logical standby

Two trace files– Restart db– Open resetlogs– Verify logical standby working

Page 8: Oracle DataGuard Support Issues

www.brianhitchcock.net

Brian Hitchcock October 23, 2007 Page 8

Logical Standby is Running

Business requirements– Standby frozen most of the day– Standby catches up once per day

Alert log messages while catching up Disk space for archived redo logs

Other issues– Apply process is slow

How to detect, resolve– Primary versus Standby backups

Impact, resolution

Page 9: Oracle DataGuard Support Issues

www.brianhitchcock.net

Brian Hitchcock October 23, 2007 Page 9

Logical Standby is Running

Other Issues– Constraint violations

Errors, resolution

– No data found Errors, resolution

– ORA-16211 Errors, Oracle Support

– Primary db XDB schema issues Fixed on primary, errors on standby

Page 10: Oracle DataGuard Support Issues

www.brianhitchcock.net

Brian Hitchcock October 23, 2007 Page 10

Logical Standby is Running

Other issues– ORA-07445

Refresh cures all– Refresh process

After refresh– ORA-16211: unsupported record found in the archived redo log

– Compile invalid objects– Import into standby database

Page 11: Oracle DataGuard Support Issues

www.brianhitchcock.net

Brian Hitchcock October 23, 2007 Page 11

Primary Can’t Connect

Standby not available– Reported on primary production database– “ORACLE not available”

Looks like production primary is down

– Your monitoring may need to be adjusted

Thu Oct 18 16:59:20 2007 Error 1034 received logging on to the standby Thu Oct 18 16:59:20 2007 Errors in file /shared/orahome01/admin/BRHPROD/bdump/brhprod_arc1_2635.trc: ORA-01034: ORACLE not available PING[ARC1]: Heartbeat failed to connect to standby ‘BRHPRSB'. Error is 1034.

Page 12: Oracle DataGuard Support Issues

www.brianhitchcock.net

Brian Hitchcock October 23, 2007 Page 12

SYS Password Issue

Mon Oct 8 15:31:36 2007 Error 1017 received logging on to the standby ------------------------------------------------------------ Check that the primary and standby are using a password file and remote_login_passwordfile is set to SHARED or EXCLUSIVE, and that the SYS password is same in the password files. returning error ORA-16191 ------------------------------------------------------------ Mon Oct 8 15:31:36 2007 Errors in file /orahome01/admin/BRHBETA/bdump/brhbeta_arc0_2309.trc: ORA-16191: Primary log shipping client not logged on standby PING[ARC0]: Heartbeat failed to connect to standby ‘BRHBRSB'. Error is 16191.

Primary tries to connect to standby

Page 13: Oracle DataGuard Support Issues

www.brianhitchcock.net

Brian Hitchcock October 23, 2007 Page 13

SYS Password Issue

Verify SYS password is the same– On primary and standby– Sqlplus sys/<password>

Verify password file has same password– On primary and standby– Cat $ORACLE_HOME/dbs/orapw<SID>

Refresh password file– Alter user SYS identified by <password>– Update password file

Page 14: Oracle DataGuard Support Issues

www.brianhitchcock.net

Brian Hitchcock October 23, 2007 Page 14

DataGuard Trace File

Physical Standby– Start log apply process– Trace file created– Stops when log apply process stops

See file contents later…

Page 15: Oracle DataGuard Support Issues

www.brianhitchcock.net

Brian Hitchcock October 23, 2007 Page 15

Can’t ‘See’ Redo Logs

Physical Standby Creating or Refreshing standby

– Primary configured, sending redo logs– Standby not yet created/running

Standby may not register redo logs– Our scripts maintain primary archived redo logs

Compress to save disk space, delete after 2 days

Manually register– Alter database register logfile ‘<logfile>’;– DataGuard applies redo log

Page 16: Oracle DataGuard Support Issues

www.brianhitchcock.net

Brian Hitchcock October 23, 2007 Page 16

Can’t ‘See’ Redo LogsBRHBETA> select * from v$archive_gap; THREAD# LOW_SEQUENCE# HIGH_SEQUENCE# ---------- ------------- -------------- 1 1959 1976 BRHBETA> select sequence#, applied from v$archived_log order by sequence#; SEQUENCE# APP ---------- --- 1956 YES 1957 YES 1958 YES 1977 NO 1978 NO 1979 NO 1980 NO 1981 NO 1982 NO 1983 NO 1984 NO 11 rows selected.

1959 thru 1976 on standby

Page 17: Oracle DataGuard Support Issues

www.brianhitchcock.net

Brian Hitchcock October 23, 2007 Page 17

Redo Logs Not Deleted

Physical Standby– After applied to standby– Unlike logical standby

SQL apply process does delete them

Use RMAN Possible disk space issues on standby

– How long will you need to store redo logs?– Not an issue if converting to logical soon

Page 18: Oracle DataGuard Support Issues

www.brianhitchcock.net

Brian Hitchcock October 23, 2007 Page 18

Stop Physical Standby

Log Apply Process start– Starts trace file– When physical standby first created– Ends when log apply stops

Normal processing– Trace file looks like a problem

Page 19: Oracle DataGuard Support Issues

www.brianhitchcock.net

Brian Hitchcock October 23, 2007 Page 19

Turn Off Apply Process

Physical standby– Turn off apply process

Regular maintenance Converting to Logical Standby

– Generates ‘error’ Why is this an error? Typical of DataGuard Everything seems to be an ‘error’

Even when it is perfectly routine Makes support more difficult

When is an error something to worry about?

Page 20: Oracle DataGuard Support Issues

www.brianhitchcock.net

Brian Hitchcock October 23, 2007 Page 20

Standby Alert Log

Tue Oct 9 16:34:34 2007 Physical Standby Database mounted. Completed: ALTER DATABASE MOUNT Tue Oct 9 16:34:36 2007 alter database recover managed standby database disconnect from session Tue Oct 9 16:34:36 2007 Attempt to start background Managed Standby Recovery process (BRHBETA) MRP0 started with pid=11, OS id=13474 Tue Oct 9 16:34:36 2007 MRP0: Background Managed Standby Recovery process started (BRHBETA) Managed Standby Recovery not using Real Time Apply parallel recovery started with 7 processes

Log apply process started when physical standby created

Page 21: Oracle DataGuard Support Issues

www.brianhitchcock.net

Brian Hitchcock October 23, 2007 Page 21

Standby Alert LogWed Oct 10 10:15:15 2007 alter database recover managed standby database cancel Wed Oct 10 10:15:19 2007 MRP0: Background Media Recovery cancelled with status 16037 Wed Oct 10 10:15:19 2007 Errors in file /orahome01/admin/BRHBETA/bdump/brhbeta_mrp0_13474.trc: ORA-16037: user requested cancel of managed recovery operation Recovery interrupted! Wed Oct 10 10:15:20 2007 Errors in file /orahome01/admin/BRHBETA/bdump/brhbeta_mrp0_13474.trc: ORA-16037: user requested cancel of managed recovery operation Wed Oct 10 10:15:20 2007 MRP0: Background Media Recovery process shutdown (BRHBETA) Wed Oct 10 10:15:21 2007 Managed Standby Recovery Canceled (BRHBETA) Wed Oct 10 10:15:21 2007 Completed: alter database recover managed standby database cancel

Log apply process stopped – preparing to convert to logical standby

Page 22: Oracle DataGuard Support Issues

www.brianhitchcock.net

Brian Hitchcock October 23, 2007 Page 22

Trace File$ more /orahome01/admin/BRHBETA/bdump/brhbeta_mrp0_13474.trc /orahome01/admin/BRHBETA/bdump/brhbeta_mrp0_13474.trc Oracle Database 10g Enterprise Edition Release 10.2.0.2.0 - 64bit Production With the Partitioning, OLAP and Data Mining options ORACLE_HOME = /orahome01/product/10.2.0 System name: SunOS Node name: brh-beta1-zone04 Release: 5.10 Version: Generic_118833-36 Machine: sun4u Instance name: BRHBETA Redo thread mounted by this instance: 1 Oracle process number: 11 Unix process pid: 13474, image: oracle@beta1-zone04 (MRP0)

*** SERVICE NAME:() 2007-10-09 16:34:36.298 *** SESSION ID:(394.1) 2007-10-09 16:34:36.298 ARCH: Connecting to console port... *** 2007-10-09 16:34:36.299 60639 kcrr.c MRP0: Background Managed Standby Recovery process started

Start applying redo logs to physical standby

Page 23: Oracle DataGuard Support Issues

www.brianhitchcock.net

Brian Hitchcock October 23, 2007 Page 23

Trace File*** 2007-10-09 16:34:41.302 1018 krsm.c Managed Recovery: Initialization posted. *** 2007-10-09 16:34:41.303 60639 kcrr.c Managed Standby Recovery not using Real Time Apply Recovery target incarnation = 2, activation ID = 0 Influx buffer limit = 27762 (50% x 55524) Successfully allocated 7 recovery slaves Using 158 overflow buffers per recovery slave Start recovery at thread 1 ckpt scn 8257757517457 logseq 1956 block 5 *** 2007-10-09 16:34:42.124 Media Recovery add redo thread 1 *** 2007-10-09 16:34:42.124 1018 krsm.c Managed Recovery: Active posted. ORA-00367: checksum error in log file header ORA-00305: log 1 of thread 1 inconsistent; belongs to another database ORA-00312: online log 1 thread 1: '/shared/oralogs01/BRHBETA/redo01a.log' *** 2007-10-09 16:34:42.147 60639 kcrr.c Clearing online redo logfile 1 /shared/oralogs01/BRHBETA/redo01a.log *** 2007-10-09 16:36:15.066 *** 2007-10-09 16:36:15.066 60639 kcrr.c Clearing online redo logfile 1 complete ORA-00367: checksum error in log file header ORA-00305: log 2 of thread 1 inconsistent; belongs to another database ORA-00312: online log 2 thread 1: '/shared/oralogs01/BRHBETA/redo02a.log'

Recreating redo logs

Page 24: Oracle DataGuard Support Issues

www.brianhitchcock.net

Brian Hitchcock October 23, 2007 Page 24

Trace File*** 2007-10-09 16:36:15.100 60639 kcrr.c Clearing online redo logfile 2 /shared/oralogs01/BRHBETA/redo02a.log *** 2007-10-09 16:37:51.473 *** 2007-10-09 16:37:51.473 60639 kcrr.c Clearing online redo logfile 2 complete ORA-00367: checksum error in log file header ORA-00305: log 3 of thread 1 inconsistent; belongs to another database ORA-00312: online log 3 thread 1: '/shared/oradata02/BRHBETA/redo03b.log' *** 2007-10-09 16:37:51.479 60639 kcrr.c Clearing online redo logfile 3 /shared/oradata02/BRHBETA/redo03b.log *** 2007-10-09 16:39:26.048 *** 2007-10-09 16:39:26.048 60639 kcrr.c Clearing online redo logfile 3 complete ORA-00367: checksum error in log file header ORA-00305: log 4 of thread 1 inconsistent; belongs to another database ORA-00312: online log 4 thread 1: '/shared/oradata02/BRHBETA/redo04b.log' *** 2007-10-09 16:39:26.488 60639 kcrr.c Clearing online redo logfile 4 /shared/oradata02/BRHBETA/redo04b.log *** 2007-10-09 16:41:00.447 *** 2007-10-09 16:41:00.447 60639 kcrr.c Clearing online redo logfile 4 complete *** 2007-10-09 16:41:00.469 60639 kcrr.c Media Recovery Waiting for thread 1 sequence 1956

Page 25: Oracle DataGuard Support Issues

www.brianhitchcock.net

Brian Hitchcock October 23, 2007 Page 25

Trace File*** 2007-10-09 16:41:00.469 60639 kcrr.c Fetching gap sequence in thread 1, gap sequence 1956-1976 *** 2007-10-09 16:41:30.782 ----------------------------------------------------------- Check that the CONTROL_FILE_RECORD_KEEP_TIME initialization parameter is defined to a value that is sufficiently large enough to maintain adequate log switch information to resolve archivelog gaps. ----------------------------------------------------------- *** 2007-10-09 16:54:31.045 *** 2007-10-09 16:54:31.045 60639 kcrr.c Fetching gap sequence in thread 1, gap sequence 1956-1956 *** 2007-10-09 16:55:01.154 ----------------------------------------------------------- Check that the CONTROL_FILE_RECORD_KEEP_TIME initialization parameter is defined to a value that is sufficiently large enough to maintain adequate log switch information to resolve archivelog gaps. ----------------------------------------------------------- *** 2007-10-09 16:56:31.179 Media Recovery Log /oraarch01/BRHBETA/LOG_1956_1_629245032.arc *** 2007-10-09 16:56:33.431 Media Recovery Log /oraarch01/BRHBETA/LOG_1957_1_629245032.arc *** 2007-10-09 16:56:44.495

Applying redo logs to physical standby

Page 26: Oracle DataGuard Support Issues

www.brianhitchcock.net

Brian Hitchcock October 23, 2007 Page 26

Trace File*** 2007-10-09 16:56:44.495 60639 kcrr.c Media Recovery Waiting for thread 1 sequence 1958 *** 2007-10-09 16:56:44.495 60639 kcrr.c Fetching gap sequence in thread 1, gap sequence 1958-1976 *** 2007-10-09 16:57:14.647 ----------------------------------------------------------- Check that the CONTROL_FILE_RECORD_KEEP_TIME initialization parameter is defined to a value that is sufficiently large enough to maintain adequate log switch information to resolve archivelog gaps. ----------------------------------------------------------- *** 2007-10-09 17:05:14.785 Media Recovery Log /oraarch01/BRHBETA/LOG_1958_1_629245032.arc *** 2007-10-09 17:05:18.043 60639 kcrr.c Media Recovery Waiting for thread 1 sequence 1959 *** 2007-10-09 17:05:18.043 60639 kcrr.c Fetching gap sequence in thread 1, gap sequence 1959-1976 *** 2007-10-09 17:05:48.284 ----------------------------------------------------------- Check that the CONTROL_FILE_RECORD_KEEP_TIME initialization parameter is defined to a value that is sufficiently large enough to maintain adequate log switch information to resolve archivelog gaps. -----------------------------------------------------------

Page 27: Oracle DataGuard Support Issues

www.brianhitchcock.net

Brian Hitchcock October 23, 2007 Page 27

Trace File*** 2007-10-09 17:07:18.309 Media Recovery Log /oraarch01/BRHBETA/LOG_1959_1_629245032.arc *** 2007-10-09 17:07:21.114 Media Recovery Log /oraarch01/BRHBETA/LOG_1960_1_629245032.arc *** 2007-10-09 17:07:22.945 Media Recovery Log /oraarch01/BRHBETA/LOG_1961_1_629245032.arc *** 2007-10-09 17:07:27.300 Media Recovery Log /oraarch01/BRHBETA/LOG_1962_1_629245032.arc *** 2007-10-09 17:07:29.637 Media Recovery Log /oraarch01/BRHBETA/LOG_1963_1_629245032.arc *** 2007-10-09 17:07:29.709 60639 kcrr.c Media Recovery Waiting for thread 1 sequence 1964 *** 2007-10-09 17:07:29.709 60639 kcrr.c Fetching gap sequence in thread 1, gap sequence 1964-1976 *** 2007-10-09 17:07:59.858 ----------------------------------------------------------- Check that the CONTROL_FILE_RECORD_KEEP_TIME initialization parameter is defined to a value that is sufficiently large enough to maintain adequate log switch information to resolve archivelog gaps. ----------------------------------------------------------- *** 2007-10-09 17:08:29.866 Media Recovery Log /oraarch01/BRHBETA/LOG_1964_1_629245032.arc

Page 28: Oracle DataGuard Support Issues

www.brianhitchcock.net

Brian Hitchcock October 23, 2007 Page 28

Trace File*** 2007-10-09 17:08:31.924 Media Recovery Log /oraarch01/BRHBETA/LOG_1965_1_629245032.arc *** 2007-10-09 17:09:12.510 Media Recovery Log /oraarch01/BRHBETA/LOG_1966_1_629245032.arc *** 2007-10-09 17:09:21.050 Media Recovery Log /oraarch01/BRHBETA/LOG_1967_1_629245032.arc *** 2007-10-09 17:09:40.234 Media Recovery Log /oraarch01/BRHBETA/LOG_1968_1_629245032.arc *** 2007-10-09 17:09:45.055 Media Recovery Log /oraarch01/BRHBETA/LOG_1969_1_629245032.arc *** 2007-10-09 17:09:50.572 Media Recovery Log /oraarch01/BRHBETA/LOG_1970_1_629245032.arc *** 2007-10-09 17:09:58.968 Media Recovery Log /oraarch01/BRHBETA/LOG_1971_1_629245032.arc *** 2007-10-09 17:10:03.922 Media Recovery Log /oraarch01/BRHBETA/LOG_1972_1_629245032.arc *** 2007-10-09 17:10:13.196 Media Recovery Log /oraarch01/BRHBETA/LOG_1973_1_629245032.arc *** 2007-10-09 17:10:21.927 Media Recovery Log /oraarch01/BRHBETA/LOG_1974_1_629245032.arc *** 2007-10-09 17:10:34.064 Media Recovery Log /oraarch01/BRHBETA/LOG_1975_1_629245032.arc *** 2007-10-09 17:10:42.420 60639 kcrr.c Media Recovery Waiting for thread 1 sequence 1976

Page 29: Oracle DataGuard Support Issues

www.brianhitchcock.net

Brian Hitchcock October 23, 2007 Page 29

Trace File*** 2007-10-09 17:10:42.421 60639 kcrr.c Fetching gap sequence in thread 1, gap sequence 1976-1976 *** 2007-10-09 17:11:12.538 ----------------------------------------------------------- Check that the CONTROL_FILE_RECORD_KEEP_TIME initialization parameter is defined to a value that is sufficiently large enough to maintain adequate log switch information to resolve archivelog gaps. ----------------------------------------------------------- *** 2007-10-09 17:12:42.563 Media Recovery Log /oraarch01/BRHBETA/LOG_1976_1_629245032.arc *** 2007-10-09 17:12:45.563 Media Recovery Log /oraarch01/BRHBETA/LOG_1977_1_629245032.arc *** 2007-10-09 17:12:48.534 Media Recovery Log /oraarch01/BRHBETA/LOG_1978_1_629245032.arc *** 2007-10-09 17:13:00.505 Media Recovery Log /oraarch01/BRHBETA/LOG_1979_1_629245032.arc *** 2007-10-09 17:13:02.054 Media Recovery Log /oraarch01/BRHBETA/LOG_1980_1_629245032.arc *** 2007-10-09 17:13:03.231 Media Recovery Log /oraarch01/BRHBETA/LOG_1981_1_629245032.arc *** 2007-10-09 17:13:03.902 Media Recovery Log /oraarch01/BRHBETA/LOG_1982_1_629245032.arc

Page 30: Oracle DataGuard Support Issues

www.brianhitchcock.net

Brian Hitchcock October 23, 2007 Page 30

Trace File*** 2007-10-09 17:13:04.492 Media Recovery Log /oraarch01/BRHBETA/LOG_1983_1_629245032.arc *** 2007-10-09 17:13:08.171 Media Recovery Log /oraarch01/BRHBETA/LOG_1984_1_629245032.arc *** 2007-10-09 17:13:26.860 *** 2007-10-09 17:13:26.860 60639 kcrr.c Media Recovery Waiting for thread 1 sequence 1985 *** 2007-10-09 17:16:07.172 Media Recovery Log /oraarch01/BRHBETA/LOG_1985_1_629245032.arc *** 2007-10-09 17:16:08.067 Media Recovery Log /oraarch01/BRHBETA/LOG_1986_1_629245032.arc *** 2007-10-09 17:16:08.131 Media Recovery Log /oraarch01/BRHBETA/LOG_1987_1_629245032.arc *** 2007-10-09 17:16:08.195 60639 kcrr.c Media Recovery Waiting for thread 1 sequence 1988 *** 2007-10-09 17:16:13.202 Media Recovery Log /oraarch01/BRHBETA/LOG_1988_1_629245032.arc *** 2007-10-09 17:16:13.268 60639 kcrr.c Media Recovery Waiting for thread 1 sequence 1989 *** 2007-10-09 21:14:01.119 Media Recovery Log /oraarch01/BRHBETA/LOG_1989_1_629245032.arc *** 2007-10-09 21:14:16.922 *** 2007-10-09 21:14:16.922 60639 kcrr.c Media Recovery Waiting for thread 1 sequence 1990

Page 31: Oracle DataGuard Support Issues

www.brianhitchcock.net

Brian Hitchcock October 23, 2007 Page 31

Trace File*** 2007-10-10 09:32:33.399 *** 2007-10-10 09:32:33.399 60639 kcrr.c Fetching gap sequence in thread 1, gap sequence 1990-1990 *** 2007-10-10 09:33:05.187 Media Recovery Log /oraarch01/BRHBETA/LOG_1990_1_629245032.arc *** 2007-10-10 09:33:22.505 Media Recovery Log /oraarch01/BRHBETA/LOG_1991_1_629245032.arc *** 2007-10-10 09:33:22.570 Media Recovery Log /oraarch01/BRHBETA/LOG_1992_1_629245032.arc *** 2007-10-10 09:33:22.631 Media Recovery Log /oraarch01/BRHBETA/LOG_1993_1_629245032.arc *** 2007-10-10 09:33:22.693 Media Recovery Log /oraarch01/BRHBETA/LOG_1994_1_629245032.arc *** 2007-10-10 09:33:22.761 Media Recovery Log /oraarch01/BRHBETA/LOG_1995_1_629245032.arc *** 2007-10-10 09:33:22.807 Media Recovery Log /oraarch01/BRHBETA/LOG_1996_1_629245032.arc *** 2007-10-10 09:33:22.864 Media Recovery Log /oraarch01/BRHBETA/LOG_1997_1_629245032.arc *** 2007-10-10 09:33:22.918 Media Recovery Log /oraarch01/BRHBETA/LOG_1998_1_629245032.arc *** 2007-10-10 09:33:23.199 Media Recovery Log /oraarch01/BRHBETA/LOG_1999_1_629245032.arc

Page 32: Oracle DataGuard Support Issues

www.brianhitchcock.net

Brian Hitchcock October 23, 2007 Page 32

Trace File*** 2007-10-10 09:33:23.255 60639 kcrr.c Media Recovery Waiting for thread 1 sequence 2000 *** 2007-10-10 10:11:07.685 Media Recovery Log /oraarch01/BRHBETA/LOG_2000_1_629245032.arc *** 2007-10-10 10:11:08.422 60639 kcrr.c Media Recovery Waiting for thread 1 sequence 2001 *** 2007-10-10 10:14:48.843 Media Recovery Log /oraarch01/BRHBETA/LOG_2001_1_629245032.arc *** 2007-10-10 10:14:49.013 60639 kcrr.c Media Recovery Waiting for thread 1 sequence 2002 *** 2007-10-10 10:15:19.072 *** 2007-10-10 10:15:19.072 60639 kcrr.c MRP0: Background Media Recovery cancelled with status 16037 ORA-16037: user requested cancel of managed recovery operation ----- Redo read statistics for thread 1 ----- Read rate (ASYNC): 619732Kb in 63640.12s => 0.01 Mb/sec Total physical reads: 619732Kb Longest record: 28Kb, moves: 0/2001133 (0%) Change moves: 779641/4101685 (19%), moved: 141Mb Longest LWN: 1023Kb, moves: 117/175493 (0%), moved: 23Mb Last redo scn: 0x0782.a8f27f37 (8257761607479) ---------------------------------------------- *** 2007-10-10 10:15:19.088 Media Recovery drop redo thread 1

Stop Log Apply ProcessReady to convert to Logical Standby

Page 33: Oracle DataGuard Support Issues

www.brianhitchcock.net

Brian Hitchcock October 23, 2007 Page 33

Trace File

*** 2007-10-10 10:15:20.864 1018 krsm.c Managed Recovery: Not Active posted. ORA-16037: user requested cancel of managed recovery operation ARCH: Connecting to console port... *** 2007-10-10 10:15:20.871 60639 kcrr.c MRP0: Background Media Recovery process shutdown *** 2007-10-10 10:15:20.871 1018 krsm.c oraarch01/BRHBETA $

Page 34: Oracle DataGuard Support Issues

www.brianhitchcock.net

Brian Hitchcock October 23, 2007 Page 34

Convert to Logical Standby SQL Apply Process

– When applying redo logs– Generates 2 trace files– What are they?

Trace files– One shows start of kcrrwkx– Second shows end of kcrrwkx– What are these for?– Neither show up in alert log– Both continue as long as SQL apply process runs

Page 35: Oracle DataGuard Support Issues

www.brianhitchcock.net

Brian Hitchcock October 23, 2007 Page 35

First Trace File/orahome01/admin/BRHBETA/bdump $ more brhbeta_arc0_13168.trc /orahome01/admin/BRHBETA/bdump/brhbeta_arc0_13168.trc Oracle Database 10g Enterprise Edition Release 10.2.0.2.0 - 64bit Production With the Partitioning, OLAP and Data Mining options ORACLE_HOME = /orahome01/product/10.2.0 System name: SunOS Node name: brh-beta1-zone04 Release: 5.10 Version: Generic_118833-36 Machine: sun4u Instance name: BRHBETA Redo thread mounted by this instance: 1 Oracle process number: 24 Unix process pid: 13168, image: oracle@beta1-zone04 (ARC0)

*** SERVICE NAME:() 2007-10-10 10:40:26.358 *** SESSION ID:(188.2) 2007-10-10 10:40:26.358 kcrrwkx: nothing to do (start) *** 2007-10-10 10:45:26.240 kcrrwkx: nothing to do (start) *** 2007-10-10 10:46:35.388 kcrrwkx: nothing to do (end) ... ...

Page 36: Oracle DataGuard Support Issues

www.brianhitchcock.net

Brian Hitchcock October 23, 2007 Page 36

Second Trace File/orahome01/admin/BRHBETA/bdump $ more brhbeta_arc1_13170.trc /orahome01/admin/BRHBETA/bdump/brhbeta_arc1_13170.trc Oracle Database 10g Enterprise Edition Release 10.2.0.2.0 - 64bit Production With the Partitioning, OLAP and Data Mining options ORACLE_HOME = /orahome01/product/10.2.0 System name: SunOS Node name: brh-beta1-zone04 Release: 5.10 Version: Generic_118833-36 Machine: sun4u Instance name: BRHBETA Redo thread mounted by this instance: 1 Oracle process number: 9 Unix process pid: 13170, image: oracle@beta1-zone04 (ARC1)

*** SERVICE NAME:() 2007-10-10 10:40:26.358 *** SESSION ID:(396.1) 2007-10-10 10:40:26.358 kcrrwkx: nothing to do (start) *** 2007-10-10 10:41:26.315 kcrrwkx: nothing to do (end) *** 2007-10-10 10:42:26.322 kcrrwkx: nothing to do (end) ……

Page 37: Oracle DataGuard Support Issues

www.brianhitchcock.net

Brian Hitchcock October 23, 2007 Page 37

DataGuard Likes to Chat Physical Standby

– While applying archived redo logs– Trace file documents everything standby does

Logical Standby– Once converted to logical standby– Two trace files generated– Contain messages for start/stop of each log

apply Why are these generated?

– Why not have DataGuard alert logs?– Trace files tell me that something is wrong

Page 38: Oracle DataGuard Support Issues

www.brianhitchcock.net

Brian Hitchcock October 23, 2007 Page 38

Normal Operation

Logical Standby catching up to Primary– Apply process turned off during the day

Catches up at night

– Apply process failed Catch up after fix (skip table in the example)

Typical alert log messages– Redo log from primary registered with DG– Redo logs applied to standby– Redo logs deleted from standby

Page 39: Oracle DataGuard Support Issues

www.brianhitchcock.net

Brian Hitchcock October 23, 2007 Page 39

Standby Catching UpTue Oct 16 15:13:22 2007 Completed: ALTER DATABASE STOP LOGICAL STANDBY APPLY Tue Oct 16 15:14:16 2007 Incremental checkpoint up to RBA [0x7.a0aa2.0], current log tail at RBA [0x7.b8e2c.0] Tue Oct 16 15:14:45 2007 ALTER DATABASE START LOGICAL STANDBY APPLY Tue Oct 16 15:14:45 2007 ALTER DATABASE START LOGICAL STANDBY APPLY (BRHBETA) Tue Oct 16 15:14:45 2007 No optional part Attempt to start background Logical Standby process LSP0 started with pid=21, OS id=5041 LOGSTDBY status: ORA-16111: log mining and apply setting up Tue Oct 16 15:14:46 2007 LOGMINER: Parameters summary for session# = 1 LOGMINER: Number of processes = 3, Transaction Chunk Size = 201 LOGMINER: Memory Size = 30M, Checkpoint interval = 150M Tue Oct 16 15:14:46 2007 Completed: ALTER DATABASE START LOGICAL STANDBY APPLY LOGMINER: session# = 1, builder process P001 started with pid=7 OS id=10018 LOGMINER: session# = 1, reader process P000 started with pid=34 OS id=10014 LOGMINER: session# = 1, preparer process P002 started with pid=36 OS id=10020 LSP2 started with pid=23, OS id=5043

Stop SQL Apply process

Start SQL Apply process after skipping table

Page 40: Oracle DataGuard Support Issues

www.brianhitchcock.net

Brian Hitchcock October 23, 2007 Page 40

Standby Catching UpTue Oct 16 15:14:48 2007 LOGMINER: Begin mining logfile: /oraarch01/BRHBETA/LOG_2048_1_629245032.arc LOGSTDBY Analyzer process P003 started with pid=13 OS id=10051 Tue Oct 16 15:14:48 2007 LOGMINER: Turning ON Log Auto Delete LOGSTDBY Apply process P004 started with pid=40 OS id=10054 LOGSTDBY Apply process P006 started with pid=42 OS id=10062 LOGSTDBY Apply process P007 started with pid=17 OS id=10064 LOGSTDBY Apply process P005 started with pid=15 OS id=10060 Tue Oct 16 15:22:02 2007 Beginning log switch checkpoint up to RBA [0x8.2.10], SCN: 8295181217591 Thread 1 advanced to log sequence 8 Current log# 4 seq# 8 mem# 0: /shared/oradata02/BRHBETA/redo04b.log Current log# 4 seq# 8 mem# 1: /shared/oralogs01/BRHBETA/redo04a.log Tue Oct 16 15:25:28 2007 Completed checkpoint up to RBA [0x8.2.10], SCN: 8295181217591 Tue Oct 16 15:34:32 2007 Incremental checkpoint up to RBA [0x8.4cbae.0], current log tail at RBA [0x8.65553.0] Tue Oct 16 15:42:40 2007 LOGMINER: End mining logfile: /oraarch01/BRHBETA/LOG_2048_1_629245032.arc Tue Oct 16 15:42:40 2007 LOGMINER: Begin mining logfile: /oraarch01/BRHBETA/LOG_2049_1_629245032.arc ... ... ...

Processing redo logs

Page 41: Oracle DataGuard Support Issues

www.brianhitchcock.net

Brian Hitchcock October 23, 2007 Page 41

Standby Catching UpTue Oct 16 17:20:48 2007 LOGMINER: End mining logfile: /oraarch01/BRHBETA/LOG_2049_1_629245032.arc Tue Oct 16 17:20:48 2007 LOGMINER: Begin mining logfile: /oraarch01/BRHBETA/LOG_2050_1_629245032.arc Tue Oct 16 17:20:54 2007 LOGMINER: End mining logfile: /oraarch01/BRHBETA/LOG_2050_1_629245032.arc ... ... ... Tue Oct 16 18:39:13 2007 LOGMINER: Log Auto Delete - deleting: /oraarch01/BRHBETA/LOG_2048_1_629245032.arc Deleted file /oraarch01/BRHBETA/LOG_2048_1_629245032.arc ... ... ... Tue Oct 16 18:43:40 2007 LOGMINER: Begin mining logfile: /oraarch01/BRHBETA/LOG_2082_1_629245032.arc Tue Oct 16 18:43:59 2007 LOGMINER: End mining logfile: /oraarch01/BRHBETA/LOG_2082_1_629245032.arc Tue Oct 16 18:43:59 2007 LOGMINER: Begin mining logfile: /oraarch01/BRHBETA/LOG_2083_1_629245032.arc

Processing redo logs

Deleting redo logs

Processing redo logs

Page 42: Oracle DataGuard Support Issues

www.brianhitchcock.net

Brian Hitchcock October 23, 2007 Page 42

Standby Catching Up

Tue Oct 16 18:44:01 2007 LOGMINER: Log Auto Delete - deleting: /oraarch01/BRHBETA/LOG_2056_1_629245032.arc Deleted file /oraarch01/BRHBETA/LOG_2056_1_629245032.arc Tue Oct 16 18:44:01 2007 LOGMINER: Log Auto Delete - deleting: /oraarch01/BRHBETA/LOG_2057_1_629245032.arc Deleted file /oraarch01/BRHBETA/LOG_2057_1_629245032.arc Tue Oct 16 18:44:01 2007 LOGMINER: Log Auto Delete - deleting: /oraarch01/BRHBETA/LOG_2058_1_629245032.arc Deleted file /oraarch01/BRHBETA/LOG_2058_1_629245032.arc ... ……

Deleting redo logs

Page 43: Oracle DataGuard Support Issues

www.brianhitchcock.net

Brian Hitchcock October 23, 2007 Page 43

Standby Catching Up

Tue Oct 16 18:44:15 2007 LOGMINER: Begin mining logfile: /oraarch01/BRHBETA/LOG_2087_1_629245032.arc Tue Oct 16 18:48:37 2007 Completed checkpoint up to RBA [0xa.2.10], SCN: 8295181577382 Tue Oct 16 18:55:18 2007 Incremental checkpoint up to RBA [0xa.12dad.0], current log tail at RBA [0xa.1314b.0] Tue Oct 16 19:01:31 2007 RFS[1]: No standby redo logfiles created RFS[1]: Archived Log: '/oraarch01/BRHBETA/LOG_2153_1_629245032.arc' Tue Oct 16 19:01:32 2007 RFS LogMiner: Registered logfile [/oraarch01/BRHBETA/LOG_2153_1_629245032.arc] to LogMiner session id [1] Tue Oct 16 19:15:22 2007 Incremental checkpoint up to RBA [0xa.142b2.0], current log tail at RBA [0xa.143fe.0] Tue Oct 16 19:29:01 2007 LSP0: warning -- apply server 2, sid 384 waiting on user sid 196 for event (since 0 seconds): Tue Oct 16 19:29:01 2007 LOGMINER: End mining logfile: /oraarch01/BRHBETA/LOG_2087_1_629245032.arc ... ……

Primary is at 2153

Standby is at 2087

Page 44: Oracle DataGuard Support Issues

www.brianhitchcock.net

Brian Hitchcock October 23, 2007 Page 44

Standby Catching UpTue Oct 16 19:30:58 2007 LOGSTDBY stmt: CREATE PFILE = '/tmp/datatools/BRHBETA.PFILE.19144.1192413665' FROM SPFILE =… LOGSTDBY status: ORA-16226: DDL skipped due to lack of support LOGSTDBY id: XID 0x0003.02d.00013e70, hSCN 0x0782.a9c2fdb8, lSCN 0x0782.a9c2fdb8, Thread 1, RBA …LOGSTDBY stmt: create pfile='/orahome01/oradba/tmp/ora_adm_sqlbt_bkp.tmp1.17449.BRHBETA' from spfile …LOGSTDBY status: ORA-16226: DDL skipped due to lack of support LOGSTDBY id: XID 0x000b.001.000126cf, hSCN 0x0782.a9c2fe15, lSCN 0x0782.a9c2fe15, Thread 1, RBA… LOGSTDBY stmt: CREATE PFILE = '/tmp/datatools/BRHBETA.PFILE.19695.1192413687' FROM SPFILE = …LOGSTDBY status: ORA-16226: DDL skipped due to lack of support LOGSTDBY id: XID 0x0003.00c.00013e62, hSCN 0x0782.a9c2fe4a, lSCN 0x0782.a9c2fe4a, Thread 1, RBA… LOGSTDBY stmt: ALTER DATABASE BACKUP CONTROLFILE TO '/tmp/datatools/dtodump_…LOGSTDBY status: ORA-16226: DDL skipped due to lack of support LOGSTDBY id: XID 0x0009.007.00011453, hSCN 0x0782.a9c2feb4, lSCN 0x0782.a9c2feb4, Thread 1, RBA… Tue Oct 16 19:30:58 2007 ALTER TABLESPACE "SYSTEM" BEGIN BACKUP Completed: ALTER TABLESPACE "SYSTEM" BEGIN BACKUP Tue Oct 16 19:30:58 2007 ALTER TABLESPACE "SYSTEM" END BACKUP Completed: ALTER TABLESPACE "SYSTEM" END BACKUP ... ……

Unsupported DDL – Standby doesn’t execute

Page 45: Oracle DataGuard Support Issues

www.brianhitchcock.net

Brian Hitchcock October 23, 2007 Page 45

Standby Catching UpTue Oct 16 21:29:19 2007 LOGMINER: Begin mining logfile: /oraarch01/BRHBETA/LOG_2157_1_629245032.arc Tue Oct 16 21:30:03 2007 LOGMINER: End mining logfile: /oraarch01/BRHBETA/LOG_2157_1_629245032.arc Tue Oct 16 21:35:52 2007 Incremental checkpoint up to RBA [0xa.f41b7.0], current log tail at RBA [0xa.f41cc.0] Tue Oct 16 21:55:56 2007 Incremental checkpoint up to RBA [0xa.f43b5.0], current log tail at RBA [0xa.f43b5.0] Tue Oct 16 22:11:16 2007 RFS[1]: No standby redo logfiles created RFS[1]: Archived Log: '/oraarch01/BRHBETA/LOG_2158_1_629245032.arc' Tue Oct 16 22:11:16 2007 RFS LogMiner: Registered logfile [/oraarch01/BRHBETA/LOG_2158_1_629245032.arc] to LogMiner session id [1] Tue Oct 16 22:11:16 2007 LOGMINER: Begin mining logfile: /oraarch01/BRHBETA/LOG_2158_1_629245032.arc Tue Oct 16 22:11:20 2007 LOGMINER: End mining logfile: /oraarch01/BRHBETA/LOG_2158_1_629245032.arc

Standby catches up at 2158

Page 46: Oracle DataGuard Support Issues

www.brianhitchcock.net

Brian Hitchcock October 23, 2007 Page 46

Archived Redo Logs

Logical Standby– After applied to standby– SQL apply process does delete them

Unlike physical standby

Possible disk space issues on standby– How long will you need to store redo logs?– If standby frozen all day

Weekends? Holidays?

– If standby fails How many days to fix failures?

Page 47: Oracle DataGuard Support Issues

www.brianhitchcock.net

Brian Hitchcock October 23, 2007 Page 47

Archived Redo Logs

How long are redo logs available on primary?– If not on disk when needed for standby– Recover from backup– Dataguard may not see these redo logs

Register redo logs

Logical standby– Also generates its own archived redo logs– Needed to recover standby db

Unique standby db objects?

Page 48: Oracle DataGuard Support Issues

www.brianhitchcock.net

Brian Hitchcock October 23, 2007 Page 48

SQL Apply Process Slow

Detect long-running transaction Compute estimate of time to complete Identify and skip problem table

Page 49: Oracle DataGuard Support Issues

www.brianhitchcock.net

Brian Hitchcock October 23, 2007 Page 49

Long Running Transaction

Standby Alert Log– SQL apply process applying redo log 2049– Doesn’t move on within a few minutes– Current time is Tue Oct 16 08:09:55 2007 – Shows start time for this redo log

– Has been processing for over 24 hours

Mon Oct 15 05:52:29 2007 LOGMINER: Begin mining logfile: /oraarch01/BRHBETA/LOG_2049_1_629245032.arc

Page 50: Oracle DataGuard Support Issues

www.brianhitchcock.net

Brian Hitchcock October 23, 2007 Page 50

Long Running Transaction

What is apply process doing? Check redo logs waiting to be applied Where is processing in current redo log?

– How long to complete current redo log?

Page 51: Oracle DataGuard Support Issues

www.brianhitchcock.net

Brian Hitchcock October 23, 2007 Page 51

Long Running Transactionalter session set nls_date_format = 'DD-Mon-YYYY hh24:mi:ss'; column first_change# format 99999999999999999999 column next_change# format 99999999999999999999 column resetlogs_change# format 99999999999999999999 select * from dba_logstdby_log;

BRHBETA> SELECT TYPE, STATUS, HIGH_SCN FROM V$LOGSTDBY;

TYPE ------------------------------ STATUS --------------------------------------------------------------------------------------------------------------HIGH_SCN --------------------- COORDINATOR ORA-16116: no work available 8257767540953 READER ORA-16127: stalled waiting for additional transactions to be applied 8257767541085 BUILDER ORA-16127: stalled waiting for additional transactions to be applied 8257767540965

Page 52: Oracle DataGuard Support Issues

www.brianhitchcock.net

Brian Hitchcock October 23, 2007 Page 52

Long Running TransactionPREPARER ORA-16127: stalled waiting for additional transactions to be applied 8257767540965 ANALYZER ORA-16117: processing 8257767540953 APPLIER ORA-16116: no work available 8257767539467 APPLIER ORA-16116: no work available 8257767512259

APPLIER ORA-16113: applying change to table or sequence "BRH"."XXSUN_BRH_COMPS_INT" 8257767539247

APPLIER ORA-16116: no work available 8257767512262 9 rows selected. BRHBETA>

Page 53: Oracle DataGuard Support Issues

www.brianhitchcock.net

Brian Hitchcock October 23, 2007 Page 53

Long Running TransactionBRHBETA> select * from dba_logstdby_log; THREAD# RESETLOGS_CHANGE# RESETLOGS_ID SEQUENCE# FIRST_CHANGE# NEXT_CHANGE# ---------- --------------------- ------------ ---------- --------------------- --------------------- -------------------------- ---------- FILE_NAME TIMESTAMP DIC DIC APPLIED 1 8257200902826 629245032 2048 8257767447753 8257767534297 12-Oct-2007 22:41:05 12-Oct-2007 23:18:23 /oraarch01/BRHBETA/LOG_2048_1_629245032.arc 12-Oct-2007 22:19:08 NO NO CURRENT

1 8257200902826 629245032 2049 8257767534297 8257767754044 12-Oct-2007 23:18:23 13-Oct-2007 00:11:05 /oraarch01/BRHBETA/LOG_2049_1_629245032.arc 12-Oct-2007 23:12:18 NO NO CURRENT

1 8257200902826 629245032 2050 8257767754044 8257767922751 13-Oct-2007 00:11:05 13-Oct-2007 01:11:05 /oraarch01/BRHBETA/LOG_2050_1_629245032.arc 13-Oct-2007 00:11:06 NO NO NO ... ... ... 1 8257200902826 629245032 2140 8257781397314 8257781562968 16-Oct-2007 07:41:15 16-Oct-2007 08:41:15 /oraarch01/BRHBETA/LOG_2140_1_629245032.arc 16-Oct-2007 07:41:16 NO NO NO

93 rows selected.

Page 54: Oracle DataGuard Support Issues

www.brianhitchcock.net

Brian Hitchcock October 23, 2007 Page 54

Long Running Transaction redo log 2049 goes from

– SCN 8257767534297 to SCN 8257767754044

Check again -- Tue Oct 16 15:04:29 MST 2007

Compute Estimate– Tue Oct 16 08:09:55 -- Tue Oct 16 11:17:39– APPLIER has moved from 39247 to 39991– 3 hours --> roughly 750 SCNs, 250 per hour– it still needs to go from 539991 to 754044

over 200,000 SCNs -- at 250 per hour, – this would take 800 hours --> 33 days

APPLIER ORA-16113: applying change to table or sequence "BRH"."XXSUN_BRH_COMPS_INT" 8257767540857

Page 55: Oracle DataGuard Support Issues

www.brianhitchcock.net

Brian Hitchcock October 23, 2007 Page 55

Long Running Transaction

Check again -- Tue Oct 16 15:04:29 MST 2007

APPLIER has moved– 39991 to 40857– in the last 4 hours, 866 SCNs, – roughly in line with 250/hr we computed earlier

APPLIER ORA-16113: applying change to table or sequence "BRH"."XXSUN_BRH_COMPS_INT" 8257767540857

Page 56: Oracle DataGuard Support Issues

www.brianhitchcock.net

Brian Hitchcock October 23, 2007 Page 56

Long Running Transaction

This is truly awful science!– Assumes all SCNs take same amount of time

If processing takes more than a few minutes– Compute estimate– Confirm that it will take a long time– Compare with business requirements for standby

Must be in synch once per day

– Decide to skip table If table required, must wait or full refresh

Page 57: Oracle DataGuard Support Issues

www.brianhitchcock.net

Brian Hitchcock October 23, 2007 Page 57

Long Running Transaction

Skip table

– SQL Apply Process restarts with redo 2048– Standby catches up quickly

ALTER DATABASE STOP LOGICAL STANDBY APPLY;

EXECUTE DBMS_LOGSTDBY.SKIP - (stmt => 'DML' , - schema_name => 'BRH' , - object_name => 'XXSUN_BRH_COMPS_INT', - proc_name => null);

ALTER DATABASE START LOGICAL STANDBY APPLY;

Page 58: Oracle DataGuard Support Issues

www.brianhitchcock.net

Brian Hitchcock October 23, 2007 Page 58

Primary/Standby Interactions Logical standby backup starts

– Tablespaces put into backup mode Apply process applies redo logs from primary

– Contain transactions for primary backup– Tries to put tablespaces into backup mode

Apply process fails– Wait for standby backup to finish– Restart apply process

Disable standby backups when catching up– Apply process runs longer than normal

Page 59: Oracle DataGuard Support Issues

www.brianhitchcock.net

Brian Hitchcock October 23, 2007 Page 59

Unique Constraint Violation

Oracle calls this– Oscillating updates

Oracle docs explain this (I can’t…)

– Or primary update really did fail And was rolled back on primary db Fails and rolls back in standby db

SQL apply process restarts– Automatically– No need to do anything

Page 60: Oracle DataGuard Support Issues

www.brianhitchcock.net

Brian Hitchcock October 23, 2007 Page 60

Unique Constraint ViolationTue Oct 16 21:23:42 2007 LOGMINER: Begin mining logfile: /oraarch01/BRHBETA/LOG_2147_1_629245032.arc ... ... ... Tue Oct 16 21:24:31 2007 LOGSTDBY stmt: insert into "APPLSYS"."WF_LOCAL_ROLES" values “COL1" = 'Value1', “COL2" = 'Value2', “COL3" IS NULL, …LOGSTDBY status: ORA-00001: unique constraint (APPLSYS.WF_LOCAL_ROLES_U1) violated LOGSTDBY id: XID 0x0009.016.00011548, hSCN 0x0782.aa32b533, lSCN 0x0782.aa32b533, Thread 1, RBA… Tue Oct 16 21:25:20 2007 LOGMINER: End mining logfile: /oraarch01/BRHBETA/LOG_2147_1_629245032.arc

SQL Apply Process continues processing

Page 61: Oracle DataGuard Support Issues

www.brianhitchcock.net

Brian Hitchcock October 23, 2007 Page 61

No Data Found

What does it mean?– When DataGuard updates standby

Brings update from primary Brings pre-update data from primary

– On standby, DataGuard compares Pre-update data from primary Current data on standby

– If they don’t agree DataGuard won’t apply the update on standby SQL apply process fails

Page 62: Oracle DataGuard Support Issues

www.brianhitchcock.net

Brian Hitchcock October 23, 2007 Page 62

No Data Found

Wed Sep 19 12:09:23 2007 LOGSTDBY stmt: update "PO"."PO_LINE_LOCATIONS_ALL" ... ...SQL, values ... LOGSTDBY status: ORA-01403: no data found LOGSTDBY id: XID 0x0008.01e.0000c437, hSCN 0x0789.eacde6c1, lSCN 0x0789.eacde6c1… LOGSTDBY Apply process P007 pid=29 OS id=3447 stopped Wed Sep 19 12:09:23 2007 Errors in file /shared/orahome01/admin/BRHPRSB/bdump/brhprsb_lsp0_12386.trc: ORA-12801: error signaled in parallel query server P004 ORA-01403: no data found LOGSTDBY Analyzer process P003 pid=24 OS id=3439 stopped LOGSTDBY Apply process P006 pid=27 OS id=3445 stopped LOGSTDBY Apply process P005 pid=26 OS id=3443 stopped

Page 63: Oracle DataGuard Support Issues

www.brianhitchcock.net

Brian Hitchcock October 23, 2007 Page 63

No Data Found What happened?

– For some reason– Table data not the same primary vs standby

How could this happen?– Logical standby is read-write

SYS can change anything at any time– SYS left guard status at NONE

Other db users can make changes in standby

How to fix?– Skip table– Refresh logical standby

Page 64: Oracle DataGuard Support Issues

www.brianhitchcock.net

Brian Hitchcock October 23, 2007 Page 64

No Data Found

Logical Standby– No way to find out what happened– No utility to verify primary, standby in synch– Differences can exist for a long time

Won’t cause error until table updated on primary

Logical Standby for reporting?– Can you depend on this for your reports?– How do you know what is in the standby?– What has been skipped?

Page 65: Oracle DataGuard Support Issues

www.brianhitchcock.net

Brian Hitchcock October 23, 2007 Page 65

Primary Schema Issues

Primary db– XDB schema reinstalled

Create java class (loads java class from filesystem)

Standby db– Transactions came through to standby– Standby doesn’t have java class files– Apply process fails– Identify and skip transaction(s)

Page 66: Oracle DataGuard Support Issues

www.brianhitchcock.net

Brian Hitchcock October 23, 2007 Page 66

ORA-07445 Errors

SR opened– Results

Known bug fixed in 11g Apply patch on standby

– Impact None, no affect on standby

– Apply patch? No – refresh would wipe out patch Don’t want to patch primary db

Primary doesn’t have this error

Page 67: Oracle DataGuard Support Issues

www.brianhitchcock.net

Brian Hitchcock October 23, 2007 Page 67

ORA-07445 ErrorsTue Oct 16 21:27:50 2007 Errors in file /orahome01/admin/BRHBETA/bdump/brhbeta_p004_6577.trc: ORA-07445: exception encountered: core dump [krvsmso()+1212] [SIGSEGV] [Address not mapped to object]… Tue Oct 16 21:29:06 2007 Errors in file /orahome01/admin/BRHBETA/bdump/brhbeta_lsp0_5041.trc: ORA-12805: parallel query server died unexpectedly Tue Oct 16 21:29:06 2007 TLCR process death detected. Shutting down TLCR logminer process death detected, exiting logical standby LOGSTDBY Analyzer process P003 pid=13 OS id=10051 stopped LOGSTDBY Apply process P005 pid=15 OS id=10060 stopped LOGSTDBY Apply process P006 pid=42 OS id=10062 stopped LOGSTDBY Apply process P007 pid=17 OS id=10064 stopped Tue Oct 16 21:29:06 2007 LOGSTDBY status: ORA-16222: automatic Logical Standby retry of last action LOGSTDBY status: ORA-16111: log mining and apply setting up

SQL Apply Process stops

SQL Apply Process automatically restarts

Logical Standby is not for the faint of heart!

Page 68: Oracle DataGuard Support Issues

www.brianhitchcock.net

Brian Hitchcock October 23, 2007 Page 68

ORA-07445 Errors

Tue Oct 16 21:29:07 2007 LOGMINER: Parameters summary for session# = 1 LOGMINER: Number of processes = 3, Transaction Chunk Size = 201 LOGMINER: Memory Size = 30M, Checkpoint interval = 150M LOGMINER: session# = 1, builder process P001 started with pid=7 OS id=10018 LOGMINER: session# = 1, reader process P000 started with pid=34 OS id=10014 LOGMINER: session# = 1, preparer process P002 started with pid=36 OS id=10020 Tue Oct 16 21:29:10 2007 LOGMINER: Begin mining logfile: /oraarch01/BRHBETA/LOG_2147_1_629245032.arc Tue Oct 16 21:29:10 2007 LOGMINER: Turning ON Log Auto Delete LOGSTDBY Analyzer process P003 started with pid=13 OS id=10051 LOGSTDBY Apply process P006 started with pid=42 OS id=10062 LOGSTDBY Apply process P004 started with pid=30 OS id=10219 LOGSTDBY Apply process P005 started with pid=15 OS id=10060 LOGSTDBY Apply process P007 started with pid=17 OS id=10064

SQL Apply Process continues processing

Page 69: Oracle DataGuard Support Issues

www.brianhitchcock.net

Brian Hitchcock October 23, 2007 Page 69

Refresh Process Export unique standby db objects

– Scripts to recreate Backup primary db

– Create standby control file Recover primary db backup on standby

– Use standby control file Create physical standby Convert to logical standby Import unique standby db objects

– Recreate with scripts

Page 70: Oracle DataGuard Support Issues

www.brianhitchcock.net

Brian Hitchcock October 23, 2007 Page 70

Unsupported Record

ORA-16211– SQL apply process fails

Must skip table or refresh standby Oracle SR tells me to

– Add all column supplemental log group to table– Rebuild standby

Or reinstantiate the table– Needed for each table

Not an easy process

Page 71: Oracle DataGuard Support Issues

www.brianhitchcock.net

Brian Hitchcock October 23, 2007 Page 71

Unsupported RecordThu Oct 11 10:11:58 2007 LOGMINER: Log Auto Delete - deleting: /oraarch01/BRHBETA/LOG_2005_1_629245032.arc Deleted file /oraarch01/BRHBETA/LOG_2005_1_629245032.arc Thu Oct 11 10:15:55 2007 ** LOGMINER WARNING - Invalidated 4 LCRs ** Thu Oct 11 10:20:29 2007 LOGSTDBY stmt: "BRH"."XXSUN_INV_ITEMS_INT": unsupported LOGSTDBY status: ORA-16211: unsupported record found in the archived redo log ORA-06512: at "SYS.DBMS_INTERNAL_LOGSTDBY", line 4717 ORA-06512: at line 1 LOGSTDBY id: XID 0x0009.02e.0001127d, hSCN 0x0782.a9016545, lSCN 0x0782.a9016545, Thread 1… LOGSTDBY Apply process P007 pid=23 OS id=16578 stopped Thu Oct 11 10:20:29 2007 Errors in file /orahome01/admin/BRHBETA/bdump/brhbeta_lsp0_13625.trc: ORA-12801: error signaled in parallel query server P007 ORA-16211: unsupported record found in the archived redo logLOGSTDBY Analyzer process P003 pid=19 OS id=16570 stopped LOGSTDBY Apply process P005 pid=21 OS id=16574 stopped LOGSTDBY Apply process P006 pid=36 OS id=16576 stopped LOGSTDBY Apply process P004 pid=34 OS id=16572 stopped

Page 72: Oracle DataGuard Support Issues

www.brianhitchcock.net

Brian Hitchcock October 23, 2007 Page 72

Unsupported Record

What causes this?– Metalink 304061.1– Possible causes

Direct path insert on partitioned table Table has 500 columns

Is this a standby?– At any time this error may happen– How to predict/prevent?

Page 73: Oracle DataGuard Support Issues

www.brianhitchcock.net

Brian Hitchcock October 23, 2007 Page 73

Compile Invalid Objects In Logical Standby

– Execute utlrp.sql– 2 hours go by– Not much changed

Disable Guard for session– Alter session disable guard

alter database guard standby;– Recompile runs in 2 minutes– Alter session enable guard

When normal things don’t work– Perhaps guard enabled is the problem

Guard level is the problem (all vs standby)

Page 74: Oracle DataGuard Support Issues

www.brianhitchcock.net

Brian Hitchcock October 23, 2007 Page 74

Import Into Logical Standby

For recompile we used– Alter session disable guard;

Refresh Logical Standby– Unique db objects exported before refresh– Must be imported after refresh

Import doesn’t use SQL*Plus session– Alter database guard standby;

Page 75: Oracle DataGuard Support Issues

www.brianhitchcock.net

Brian Hitchcock October 23, 2007 Page 75

Conclusion

Logical standby– Lots of errors

Many require refreshing standby Lots of DBA support needed

– For all of this support What do you have? Do you know what is in the standby

Reporting?

Page 76: Oracle DataGuard Support Issues

www.brianhitchcock.net

Brian Hitchcock October 23, 2007 Page 76

Conclusion

Physical standby– Is solid, dependable– No issues

Logical standby– Is it really a standby?– Is it ready for failover?– Is it providing complete data for reports?– Lots of issues– Is it worth the effort/risk?