Data Guard Deep Dive UKOUG 2012

1 – Configuration Considerations 2 – Performance Tuning 3 – Role Transition Best Practices 4 – Corruption Detection 5 – Integration Issues

Emre Baransel – DBA, Employee ACE- Oracle

Choosing the Protection Mode

MODE: Maximum Protection

• Redo transport: SYNC & LGWR

• Action with no standby database connection: The primary database has to write redo to at least one standby database. Otherwise it will shut down.

• Risk of data loss: Zero data loss is guaranteed.

MODE: Maximum Availability

• Redo transport: SYNC & LGWR

• Action with no standby database connection: Normally works with SYNC. If the primary database cannot write redo to any of its standby databases, it continues as in ASYNC mode.

• Risk of data loss: Zero data loss in normal operation, but not guaranteed.

MODE: Maximum Performance

• Redo transport: ASYNC & (LGWR or ARCH)

• Action with no standby database connection: Never expects acknowledgment from the standby database.

• Risk of data loss: Potential for minimal data loss in normal operation.

Choosing the Protection Mode

• If there are network bandwidth and latency issues:

• use Maximum Performance with LGWR.

• ARCH is not recommended in 11g: it has no performance benefit and offers less data protection.

• When no data loss is acceptable and a service outage is preferred over any data loss:

• make your network bandwidth high enough

• and use Maximum Protection.

• If a small risk of data loss can be tolerated and you have high bandwidth:

• use Maximum Availability.

Required bandwidth (Mbps) = ((Max redo rate in bytes per sec. / 0.7) * 8) / 1,000,000

If the maximum redo generation rate is 500 MB per minute, which is 8,738,133 bytes per second, then the required bandwidth is about 100 Mbps.

• This figure is for Data Guard traffic only.

• Latency is also important.
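The bandwidth formula above can be checked with a few lines of Python. This is only a sketch of the arithmetic on the slide: the function and parameter names are illustrative (not from any Oracle tool), and 1 MB is assumed to mean 1024 * 1024 bytes, which is what reproduces the 8,738,133 bytes per second figure.

```python
def required_bandwidth_mbps(max_redo_bytes_per_sec, usable_fraction=0.7):
    """Apply the slide's formula: ((redo rate / 0.7) * 8) / 1,000,000.

    usable_fraction is an illustrative name for the 0.7 divisor in the formula.
    """
    return (max_redo_bytes_per_sec / usable_fraction) * 8 / 1_000_000


# 500 MB of redo per minute, assuming 1 MB = 1024 * 1024 bytes
redo_bytes_per_sec = 500 * 1024 * 1024 / 60

print(round(redo_bytes_per_sec))                               # 8738133
print(round(required_bandwidth_mbps(redo_bytes_per_sec), 1))   # 99.9
```

The result of roughly 99.9 Mbps matches the 100 Mbps quoted above.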

SYNC Enhancement in 11g

Before 11g, the primary database first finished its write to the online redo log and only then sent the redo to the standby database. The primary had to wait for two consecutive I/O operations before the commit could complete.

[Diagram: Before 11g. Primary: Redo Log Buffer, Online Redo Log. Standby: Standby Redo Log. Each write returns "ok" before "Commit OK".]

SYNC Enhancement in 11g

In 11g these two I/O operations run in parallel. The primary database does not wait for the online redo log write to finish; it sends the redo data to the standby at the same time.

[Diagram: In 11g. Primary: Redo Log Buffer, Online Redo Log. Standby: Standby Redo Log. Each write returns "ok" before "Commit OK".]
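The effect of running the two writes in parallel can be illustrated with a deliberately simplified model. The sketch below is not how Oracle measures commit time: it assumes the commit wait is just the sum of the two writes in the serial case and the slower of the two in the parallel case, and the millisecond values are hypothetical.

```python
def commit_wait_serial(local_write_ms, standby_write_ms):
    """Before 11g (simplified): local write completes, then the standby write."""
    return local_write_ms + standby_write_ms


def commit_wait_parallel(local_write_ms, standby_write_ms):
    """In 11g (simplified): both writes start together; wait for the slower one."""
    return max(local_write_ms, standby_write_ms)


# Hypothetical timings: 2 ms local redo write, 5 ms round trip plus standby write
local_ms, standby_ms = 2.0, 5.0

print(commit_wait_serial(local_ms, standby_ms))    # 7.0
print(commit_wait_parallel(local_ms, standby_ms))  # 5.0
```

Under this model the saving per commit equals the shorter of the two writes, so the gain is largest when local and remote write times are similar.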

No More Delay to Decrease RTO

Prefer Real-Time Apply with Flashback Database enabled rather than a DELAY setting. A delayed configuration increases RTO.

Instead of a delayed destination such as:

LOG_ARCHIVE_DEST_2='SERVICE=STANDBY LGWR ASYNC VALID_FOR=(ONLINE_LOGFILES,PRIMARY_ROLE) DB_UNIQUE_NAME=ORCLSTD DELAY=120'

USE REAL-TIME APPLY

ALTER DATABASE RECOVER MANAGED STANDBY DATABASE USING CURRENT LOGFILE DISCONNECT FROM SESSION;

TURN ON FLASHBACK

DB_RECOVERY_FILE_DEST='+FRA'; DB_RECOVERY_FILE_DEST_SIZE=500G; DB_FLASHBACK_RETENTION_TARGET=120; ALTER DATABASE FLASHBACK ON;

Using Flashback Database...

You can reinstate the original primary database as a new standby database following a failover

A failed switchover process can be reversed easily

Unwanted changes on the primary database can be reversed and queried on the standby database, even if Flashback Database is not in use on the primary.

Using Real Time Apply...

Prefer Real Time Apply to avoid ORA-01555 Snapshot Too Old errors on Active Data Guard standby databases.

Query fresh data from standby

RTO is decreased

11g Performance Improvements

11g recovery performance improvements include:

• More parallelism by default

• More efficient asynchronous redo read, parse, and apply

• Fewer synchronization points in the parallel apply algorithm

• The media recovery checkpoint at a redo log boundary no longer blocks the apply of the next log

Active Data Guard 11g Best Practices, Oracle Maximum Availability Architecture White Paper

Determining Redo Apply Rate

1. Method:

SQL> select * from v$recovery_progress

23-SEP-11 Media Recovery Active Apply Rate KB/sec 15564 0

23-SEP-11 Media Recovery Average Apply Rate KB/sec 20890 0

2. Method:

SQL> select APPLY_RATE from V$STANDBY_APPLY_SNAPSHOT;

APPLY_RATE

----------

16305

Determining Redo Apply Rate

3. Method:

Sample the MRP0 position twice, a few seconds apart:

SQL> SELECT PROCESS, SEQUENCE#, THREAD#, BLOCK#, BLOCKS, TO_CHAR(SYSDATE, 'DD-MON-YYYY HH:MI:SS') TIME FROM V$MANAGED_STANDBY WHERE PROCESS='MRP0';

0. Second

PROCESS   SEQUENCE#  THREAD#  BLOCK#   BLOCKS   TIME
--------- ---------- -------- -------- -------- --------------------
MRP0      276877     1        147338   4097947  19-APR-2011 12:25:34

5. Second

PROCESS   SEQUENCE#  THREAD#  BLOCK#   BLOCKS   TIME
--------- ---------- -------- -------- -------- --------------------
MRP0      276877     1        645542   4097947  19-APR-2011 12:25:39

Redo block size (bytes):

SQL> SELECT lebsz LOG_BLOCK_SIZE FROM x$kccle;

Media Recovery Rate (MB/sec) = ((BLOCK#_END - BLOCK#_BEG) * LOG_BLOCK_SIZE) / ((TIME_END - TIME_BEG) * 1024 * 1024)
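As a worked example of the media recovery rate formula, the two MRP0 samples shown above can be plugged into a short Python function. The slide does not show the value returned for LOG_BLOCK_SIZE, so the 512 bytes used here is an assumption; substitute the value your own query returns. The function name is illustrative.

```python
def media_recovery_rate_mb_per_sec(block_beg, block_end, log_block_size, seconds):
    """((BLOCK#_END - BLOCK#_BEG) * LOG_BLOCK_SIZE) / (elapsed seconds * 1024 * 1024)."""
    return ((block_end - block_beg) * log_block_size) / (seconds * 1024 * 1024)


# BLOCK# values from the two samples taken 5 seconds apart (12:25:34 and 12:25:39).
# ASSUMPTION: a redo block size of 512 bytes; it is not given on the slide.
rate = media_recovery_rate_mb_per_sec(147338, 645542, 512, 5)

print(f"{rate:.2f} MB/sec")  # 48.65 MB/sec
```

Both samples must come from the same log sequence (276877 here); if the sequence changes between samples, BLOCK# restarts and the difference is meaningless.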

Redo Apply Tuning

• By default recovery parallelism = CPU Count-1. Do not use any other values.

• Keep PARALLEL_EXECUTION_MESSAGE_SIZE >= 8192

• Keep DB_CACHE_SIZE >= Primary value

• Keep DB_BLOCK_CHECKING = FALSE (if you have to)

• System resources need to be assessed

Query what the MRP process is waiting on:

SQL> select a.sid, b.username, b.osuser, a.event, a.wait_time, a.p1, a.p1text, a.seconds_in_wait from gv$session_wait a, gv$session b where a.sid=b.sid and b.sid=(select SID from v$session where PADDR=(select PADDR from v$bgprocess where NAME='MRP0'));

Redo Transport Tuning

1 - Tune the LOG_ARCHIVE_MAX_PROCESSES parameter on the primary.

• Specifies the parallelism of redo transport

• Default value is 2 in 10g, 4 in 11g

• Increase it if there is a high redo generation rate and/or multiple standbys

• May need to be increased up to 30 in some cases

• Significantly increases the redo transport rate

2 - Consider using Redo Transport Compression:

• In 11.2.0.2 redo transport compression can be always on

• Use it if network bandwidth is insufficient

• and CPU power is available

Also consider:

3 - Configuring TCP Send / Receive Buffer Sizes (RECV_BUF_SIZE / SEND_BUF_SIZE)

4 - Increasing SDU Size

5 - Setting TCP.NODELAY to YES

Redo Transport Services Best Practices, Oracle® Database High Availability Best Practices 11g Release 1

Switchover Best Practices

Set JOB_QUEUE_PROCESSES & AQ_TM_PROCESSES params to 0.

Use Real-Time Apply

Reduce LOG_ARCHIVE_MAX_PROCESSES to the minimum.

Properly set archiving destinations on the standby database.

Set LOG_ARCHIVE_TRACE=8191;

Enable Flashback Database or use Guaranteed Restore Points

Failover Best Practices

Enable Flashback Database

Use Real-Time Apply

Consider configuring multiple standby databases.

Consider using Fast-Start Failover

Set FastStartFailoverThreshold

Set FastStartFailoverAutoReinstate

Corruption Detection Parameters

DB_BLOCK_CHECKSUM (detects physical corruption)

Values: OFF (FALSE), TYPICAL (TRUE), FULL

DB_BLOCK_CHECKING (detects logical corruption)

Values: OFF (FALSE), LOW, MEDIUM, FULL (TRUE)

Best Practices for Corruption Detection, Prevention, and Automatic Repair - in a Data Guard Configuration [ID 1302539.1]

Automatic Block Corruption Repair

• 11gR2 feature

• Enabled with a physical standby and Active Data Guard

• Corruptions are repaired automatically using the remote database.

You can also repair the corruption with the RMAN “RECOVER BLOCK” command. This operation will try to use the standby database first. If you don’t want to use the standby database for corruption repair, you must use the EXCLUDE STANDBY option of the “RECOVER BLOCK” command.

“Lost – Write” detection

• 11gR1 feature

• A lost write is a serious corruption that originates in the I/O subsystem.

• Physical Standby, Active Data Guard and Real-Time Apply are needed

• Set DB_LOST_WRITE_PROTECT = TYPICAL on both the primary and the standby.

• When a lost write is detected, standby recovery stops

• The way to get rid of this corruption is to fail over to the standby database.

RMAN Integration

Integration Requirements and Best Practices

• Only a Physical Standby can be used for interchangeable backups.

• An RMAN Catalog must be used. (In a separate location if possible)

• DB_UNIQUE_NAME must be different.

• General RMAN Best Practices must be preserved.

Beginning with 11g, standby databases can also be used for the “Block Change Tracking” feature of RMAN, which records the changed blocks for incremental backups. This requires Active Data Guard. There are important bugs in this feature. Check bugs 9869287, 9068088, 10094823.

Integration with Oracle Applications

• Directs write operations to the primary

• Directs all read operations to the Active Data Guard standby

• Applications developed with Oracle TopLink can be configured as “Active Data Guard aware”

• An ongoing study:

• Writes will work on the primary and reads on the standby

• Automatic redirection to the primary in case of lag

Configuring Oracle TopLink Applications with Oracle Active Data Guard, Oracle Maximum Availability Architecture White Paper

Configuring Oracle BI EE Server with Oracle Active Data Guard, Oracle Maximum Availability Architecture White Paper

Using Active Data Guard Reporting with Oracle E-Business Suite Release 12.1 and Oracle Database 11g [ID 1070491.1]

• Redirect reports to Active Data Guard

• “fnd_adg_utility.enable_adg_support”