Oracle High Availability - A Case Study

50
Copyright © 2009 Rolta International, Inc., All Rights Reserved Oracle High Availability - A Case Study Rama Balaji Senior Oracle Consultant

description

Rama Balaji Senior Oracle Consultant. Oracle High Availability - A Case Study. Overview. This presentation is a case study of a customer whose storage array failed during heavy transaction processing period. Methods followed to quickly restore the database. - PowerPoint PPT Presentation

Transcript of Oracle High Availability - A Case Study

Page 1: Oracle High Availability - A Case Study

Copyright © 2009 Rolta International, Inc., All Rights Reserved

Oracle High Availability - A Case Study

Rama BalajiSenior Oracle

Consultant

Page 2: Oracle High Availability - A Case Study

• This presentation is a case study of a customer whose storage array failed during heavy transaction processing period.

• Methods followed to quickly restore the database.

• How the best practice configuration was arrived while maintaining at least two copies of production databases on-line.

• How Oracle’s HA components were effectively explored to achieve the final configuration with only few minutes of production downtime.

Overview

Page 3: Oracle High Availability - A Case Study

Agenda• Background• Approach• Initial Configuration• Disaster to Full recovery• High Availability Features • Conclusion and best practices• Question and Answer

Page 4: Oracle High Availability - A Case Study

• RMAN compressed backups• Physical Standby using Data Guard• ASM to non-ASM and vice versa• Cascaded Destinations Standby• 2 node RAC • Failover and Switchover • Flash Recovery Area and Flash Back Logs

Oracle Features Used

Page 5: Oracle High Availability - A Case Study

Background• E-commerce client• Nature of business• Initial cost effective business approach

• Business growth• IT infrastructure

Page 6: Oracle High Availability - A Case Study

Approach

1. Storage array failure 2. Restored database

from tape using RMAN 3. Set up the Physical

Standby 4. Second Physical

Standby using cascaded destination feature

5. Failover from production to standby

Event Database Availability

No Database

One

Two

Three

Two

Page 7: Oracle High Availability - A Case Study

Approach

6. 2-Node RAC with ASM on the new storage array as standby

7.Switched over from single node Db instance to 2 Node RAC as primary

8.Disconnected the second standby

9. 2 node RAC standby to 2 node RAC primary

10. Flashback Logs cleanup from FRA

Event Database Availability

Three

Three

Two

Two

No Downtime

Page 8: Oracle High Availability - A Case Study

Storage Array

2 node RAC + ASM

Initial configuration• Server - Linux Red Hat

x86_64• Database – Oracle 10.2.0.3• ASM – two diskgroups DATA_DG – Database FLASH_DG –

flash_recovery_area• Daily full database backup to

disk- RMAN, and to tape using third party software

• Hourly archivelog backups to disk – RMAN, and to tape using third party software

• No RMAN catalog

Page 9: Oracle High Availability - A Case Study

RMAN• RMAN> show controlfile autobackup; RMAN configuration parameters are: CONFIGURE CONTROLFILE AUTOBACKUP ON;

• RMAN> show controlfile autobackup format; RMAN configuration parameters are: CONFIGURE CONTROLFILE AUTOBACKUP

FORMAT FOR DEVICE TYPE DISK TO '/opt/oracle/admin/PROD/backup/cntl/%F';

Page 10: Oracle High Availability - A Case Study

• RUN { CONFIGURE RETENTION POLICY TO

REDUNDANCY 1; ALLOCATE CHANNEL ch1 DEVICE TYPE DISK

MAXPIECESIZE 2G FORMAT '/opt/oracle/admin/PROD/backup/db/RCOMPRESSED_%U';

BACKUP as compressed backupset DATABASE plus archivelog channel ch1;

}

RMAN Backup script

Page 11: Oracle High Availability - A Case Study

• RUN { ALLOCATE CHANNEL ch1 DEVICE TYPE DISK

FORMAT '/opt/oracle/admin/PROD/backup/archive/ARCH_%U';

BACKUP as compressed backupset archivelog all;

}

RMAN Archivelog backup script

Page 12: Oracle High Availability - A Case Study

Storage Array Failure

Storage Array

2 node RAC + ASM

• Storage admin determined that the SAN failure has caused loss of entire data including the database and backups.

• Validated that the RMAN backup on tape from previous night as well as archivelog files from previous hour were intact.

Page 13: Oracle High Availability - A Case Study

Restore using RMAN

Storage Array

2 node RAC + ASM Single DB Non- ASM

• Restored the tape backup

and all archivelog file backups from tape to internal disks on one of the RAC node

Page 14: Oracle High Availability - A Case Study

Restore from ASM to non-ASM• Restored the backup from tape to the original

backup location.

• Restored spfile from autobackup. RMAN> restore spfile from autobackup;

• Created pfile from spfile;

Page 15: Oracle High Availability - A Case Study

• Edited the following parameters in pfile control_files - Changed +DATA_DG

and +FLASH_DG to file system db_file_name_convert - Changed

+DATA_DG to file system log_file_name_convert - Changed

+DATA_DG and +FLASH_DG to file system

• startup nomount using pfile;

• Using RMAN nocatalog, restore controlfile from autobackup.

RMAN> restore controlfile from ‘/opt/oracle/admin/PROD/backup/cntl/c-662196802-20081230-0e’

Restore from ASM to non-ASM (Continued…)

Page 16: Oracle High Availability - A Case Study

• alter database mount;

• restore database;

• recover database;

• Created temporary tablespaces

• alter database open;

Restore from ASM to non-ASM (Continued…)

Page 17: Oracle High Availability - A Case Study

Physical Standby

Standby Database

Primary Database

PROD PRODS

Page 18: Oracle High Availability - A Case Study

Physical Standby Steps• On the primary database (PROD) generated

pfile. Create pfile=’/tmp/initPROD.ora’ from

spfile; and copy that file to the standby server.

• On the primary (PROD) backed up the current controlfile using RMAN.

RMAN> backup current controlfile for standby;

• RMAN> copy current controlfile for standby to '/tmp/sby_control01.ctl';

Page 19: Oracle High Availability - A Case Study

• Run a full RMAN backup on the primary (PROD). RMAN> RUN { ALLOCATE CHANNEL ch1 DEVICE TYPE

DISK MAXPIECESIZE 2G FORMAT '/home/oracle/backup_standby/RCOMPRESSED_%U';

BACKUP as compressed backupset DATABASE plus archivelog channel ch1;

}• Copied all backup files as well as controlfile

backup to the standby (PRODS). Backup files needs to be in the same location as the primary backup location or create a symbolic link.

Physical Standby Steps (Continued…)

Page 20: Oracle High Availability - A Case Study

• Edited the parameter file on the standby (PRODS).

Changed control_files parameter to point to the controlfile on the standby server.

Changed db_file_name_convert to use the new location on the standby server.

Changed log_file_name_convert to use the new location on the standby server.

• Start up the standby (PRODS)instance in

nomount

Physical Standby Steps (Continued…)

Page 21: Oracle High Availability - A Case Study

• rman nocatalog RMAN> connect target

username/password@PROD RMAN> connect auxiliary / RMAN> RUN { ALLOCATE auxiliary CHANNEL ch1 DEVICE

TYPE DISK FORMAT '/home/oracle/backup_standby/RCOMPRESSED_%U';

duplicate target database for standby; } • Changed the following parameter on the

primary (PROD). SQL> ALTER SYSTEM SET

log_archive_dest_2= 'SERVICE=PRODS_XPT REOPEN=300'

Physical Standby Steps (Continued…)

Page 22: Oracle High Availability - A Case Study

• Changed the standby (PRODS) to Managed Recovery

alter database recover managed standby database disconnect from session;

• Issued the following command to make sure the archive log files are applied on the standby (PRODS)

Select sequence#, applied from v$archived_log order by sequence#;

Physical Standby Steps (Continued…)

Page 23: Oracle High Availability - A Case Study

Data Guard Cascaded Destinations

LOG_ARCHIVE_DEST_2='SERVICE=PRODS_XPT REOPEN=300';

Primary Database

Standby Database 1Data Guard

Data Guard Cascaded Destination

Standby Database 2

LOG_ARCHIVE_DEST_2='SERVICE=PRODC_XPT REOPEN=300';

PROD PRODS PRODC

Page 24: Oracle High Availability - A Case Study

Data Guard Failover

Primary Database

Data Guard Failover

Primary Database Data

Guard

Standby Database

PROD PRODS PRODC

Page 25: Oracle High Availability - A Case Study

Data Guard Failover Steps• On the primary (PROD)I did the

following. After users were logged out, created

a table called TEST. Create table test as select * from

dba_users; This was a last operation on the

primary database.

• On the primary (PROD) issued the following command couple of times.

Alter system switch logfile;

Page 26: Oracle High Availability - A Case Study

• On the primary (PROD), noted down the sequence#

SQL> archive log list Database log mode Archive Mode Automatic archival Enabled Archive destination /d01/oracle/flashback Oldest online log sequence 6503 Next log sequence to archive 6504 Current log sequence 6504• On the standby (PRODS) issued the following

command to make sure standby is in maximum performance mode.

SQL> alter database set standby database to maximize performance;

Data Guard Failover Steps (Continued…)

Page 27: Oracle High Availability - A Case Study

• On the standby(PRODS), issued the following command

SQL> select thread#, low_sequence#, high_sequence# from v$archive_gap;

no rows selected

• On the standby (PRODS), check the last sequence#,

SQL> select max(sequence#) from v$archived_log;

MAX(SEQUENCE#) ------------- 6504

Data Guard Failover Steps (Continued…)

Page 28: Oracle High Availability - A Case Study

• On the standby (PRODS), SQL> alter database recover managed

standby database finish force; Database altered On the standby (PRODS), SQL> alter database commit to switchover

to primary; Database altered • On the standby instance (PRODS), SQL> shutdown immediate ORA-01507: database not mounted ORACLE instance shut down.

Data Guard Failover Steps (Continued…)

Page 29: Oracle High Availability - A Case Study

• SQL> startup ORACLE instance started. Total System Global Area 2147483648

bytes Fixed Size 2074152 bytes Variable Size 1056967128 bytes Database Buffers 1073741824 bytes Redo Buffers 14700544 bytes Database mounted. Database opened. At this point standby is opened as primary.

Data Guard Failover Steps (Continued…)

Page 30: Oracle High Availability - A Case Study

• On the current primary (PRODS), issued the following query to make sure the table was brought over.

Select * from test; then drop the table.

• On the second standby (PRODC) instance verified the cascaded standby destination by querying v$archived_log.

• I had to bounce the cascaded standby instance (PRODC), and put it back in a recovery mode.

Data Guard Failover Steps (Continued…)

Page 31: Oracle High Availability - A Case Study

2-Node RAC as Standby

PROD1

Storage Array

2 node RAC + ASMStandby Database 2

Primary Database Data Guard

Standby Database 1

PRODS PRODC

PROD2

Page 32: Oracle High Availability - A Case Study

Data Guard Switchover Steps• Make sure only one instance on the RAC

system is mounted(PROD1).

• Make sure the FAL_SERVER and FAL_CLIENT parameters are set correctly on the primary as well as standby.

On the primary FAL_SERVER=standby instance FAL_CLIENT=primary instance

On the Physical Standby FAL_SERVER=primary instance FAL_CLIENT=standby instance

Page 33: Oracle High Availability - A Case Study

• On the primary (PRODS) issued the following SQL> select switchover_status from v$database; SWITCHOVER_STATUS -------------------------------- TO STANDBY

• On the standby (PROD1) instance issued the following command

SQL> select switchover_status from v$database; SWITCHOVER_STATUS -------------------------------- TO PRIMARY

Data Guard Switchover Steps (Continued…)

Page 34: Oracle High Availability - A Case Study

• On the primary (PRODS) issued the following command

SQL> ALTER DATABASE COMMIT TO SWITCHOVER TO PHYSICAL STANDBY;

• SQL> ALTER SYSTEM SET log_archive_dest_state_2='DEFER';

• Shutdown the primary instance (PRODS) SQL> shutdown immediate; SQL> Startup nomount; SQL> alter database mount standby

database;

Data Guard Switchover Steps (Continued…)

Page 35: Oracle High Availability - A Case Study

• On the standby (PROD1) issued the following command

SQL> ALTER DATABASE COMMIT TO SWITCHOVER TO PRIMARY;

• On the standby (PROD1) SQL>shutdown immediate; SQL> startup open;

• SQL>alter system switch logfile;

• Make sure other instance(PROD2) comes up as well.

Data Guard Switchover Steps (Continued…)

Page 36: Oracle High Availability - A Case Study

Data Guard Switchover

Storage Array

2 node RAC + ASMPrimary Database 1

Standby Database 1 Data

Guard

PROD PRODS

Standby Database 2

PRODC

Page 37: Oracle High Availability - A Case Study

Storage Array

2 node RAC + ASMPrimary Database

Storage Array

2 node RAC + ASMStandby Database

Final Configuration

Page 38: Oracle High Availability - A Case Study

Real Application Clusters• RAC misconception.• Always register the database and instances to the cluster.

• You have to use “netca” to register the listener to crs in 10g.

• Srvctl is “case sensitive”. • Always use single parameter file for multiple RAC nodes.

• Client side load balancing.• Server side load balancing.

Page 39: Oracle High Availability - A Case Study

ASM Configuration

LUN 1 LUN 2

LUN 3 LUN 4

LUN 5 LUN 6

LUN 7 LUN 8

Storage Group 1 Storage Group 2

DATA FLASH

ASM Disk Groups

Data FilesControl FilesOnline log files

Archive log filesRMAN backupsMirrored cntl and logFlashback Logs

ASM disks

Page 40: Oracle High Availability - A Case Study

Flash Recovery Area• Show parameter db_recovery NAME TYPE VALUE------------------------------------ -----------

--------------------------- db_recovery_file_dest string +FLASH_DG db_recovery_file_dest_size big integer 190000M

• SQL> select space_used/(1024*1024*1024),space_limit/(1024*1024*1024) from v$recovery_file_dest;

SPACE_USED (in GB) SPACE_LIMIT (in GB) --------------------------- ---------------------------- 55.6249833 185.546875

Page 41: Oracle High Availability - A Case Study

• SQL> select * from v$flash_recovery_area_usage;

CONTROLFILE 0 0 0 ONLINELOG 0 0 0 ARCHIVELOG 18.17 6.03 4691 BACKUPPIECE 0 0 0 IMAGECOPY 0 0 0 FLASHBACKLOG 11.81 11.73 1077

• ORA-19815: WARNING: db_recovery_file_dest_size of 53687091200 bytes is 87.63% used, and has 6642196992 remaining bytes available

Flash Recovery Area

Page 42: Oracle High Availability - A Case Study

• SQL> alter system set db_recovery_file_dest_size=55G scope=memory;

• SQL> select * from v$flash_recovery_area_usage;

CONTROLFILE 0 0 0ONLINELOG 0 0 0ARCHIVELOG 61.3 20.34 4692BACKUPPIECE 0 0 0IMAGECOPY 0 0 0FLASHBACKLOG 38.7 38.44 1046

Flash Recovery Area

Page 43: Oracle High Availability - A Case Study

• Issue the following RMAN commands RMAN> crosscheck archivelog all; RMAN> delete expired archivelog all;

• After “delete expired” command CONTROLFILE 0 0 0 ONLINELOG 0 0 0 ARCHIVELOG 3.83 0 1274 BACKUPPIECE 0 0 0 IMAGECOPY 0 0 0 FLASHBACKLOG 51.23 50.93 1007

Cleanup of Archivelog files from Flash Recovery Area

Page 44: Oracle High Availability - A Case Study

Flashback Logs• SQL> show parameter db_flashback NAME TYPE

VALUE ------------------------------------

------------------------------------------ ------------------ db_flashback_retention_target integer

1440

• SQL> show parameter log_archive_min_succeed_dest;

NAME TYPE VALUE ------------------------------------ -----------

------------------------------ log_archive_min_succeed_dest integer

1

Page 45: Oracle High Availability - A Case Study

How to cleanup Flashback Logs from Flash Recovery Area• SQL> alter system set

log_archive_dest_1='LOCATION=/home/oracle/temp_archivelog' scope=memory;

• SQL> alter system set log_archive_dest_state_10=defer scope=memory;

• SQL> alter system set db_recovery_file_dest_size=55G scope=memory;

• Check the alert log for “deleted Oracle managed files” messages.

Page 46: Oracle High Availability - A Case Study

How to cleanup Flashback Logs from Flash Recovery Area• SQL> select * from v$flash_recovery_area_usage;

CONTROLFILE 0 0 0ONLINELOG 0 0 0ARCHIVELOG 92.19 0 1274BACKUPPIECE 0 0 0IMAGECOPY 0 0 0FLASHBACKLOG 7.3 0 6

• SQL> alter system set db_recovery_file_dest_size=190G scope=memory;

Page 47: Oracle High Availability - A Case Study

How to cleanup Flashback Logs from Flash Recovery Area

• SQL> alter system set log_archive_dest_1='' scope=memory;

• SQL> alter system set log_archive_dest_state_10=enable scope=memory

Page 48: Oracle High Availability - A Case Study

Conclusion and Best Practices• All configuration changes were performed

without any significant downtime outages to the production or standby databases.

• Various Oracle technologies, including RMAN, RAC, ASM, and Data Guard, were successfully utilized together to achieve the final high-availability solution.

• An awareness and understanding of the technologies available from Oracle, together with an innovative approach to implementation, were critical in building the environment while complying with the customer’s business needs.

Page 49: Oracle High Availability - A Case Study

Question Answers

Contact InformationRama Balaji

[email protected](303) 985-2213www.tusc.com

Page 50: Oracle High Availability - A Case Study

World-wide Team of IT Professionals

Contact InformationRama Balaji

[email protected](303) 985-2213www.tusc.com

• Core Services– Oracle E-Business Suite – Enterprise Performance

Management (EPM)– Oracle DBA, Database,

and Infrastructure – Business Intelligence and

Data Warehousing– Managed Services

– remote support & hosting

– Oracle Fusion Middleware – Oracle Hyperion– Software– Training