ASM23


http://dba-expert.blogspot.in/search?updated-min=2009-01-01T00:00:00%2B04:00&updated-max=2010-01-01T00:00:00%2B04:00&max-results=15

ASM - Automatic Storage Management

File Systems:

Disadvantages of Raw Devices:
1. A raw device supports storage of only one file. Hence archived redo log files & flashback logs, which are generated in large numbers, are not suitable members for raw devices.
2. General O/S commands like cp, ls, mv, du etc. will not work on raw devices.
3. Only dd (disk dump) can be used to format, back up and restore raw devices.
4. Raw devices do not support collection of I/O statistics.
5. They cannot be resized online.
6. In a Linux environment, out of 15 partitions we can use only 14 for creation of raw devices; in Solaris we can use only 6 out of 7 partitions per disk.
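For instance, since dd is the only tool that works against a raw device, a backup and restore of a raw-device datafile look roughly like the sketch below (the device and file names are illustrative, not from the original):

$ dd if=/dev/raw/raw1 of=/backup/raw1.img bs=8192   # copy the raw datafile image to a backup file
$ dd if=/backup/raw1.img of=/dev/raw/raw1 bs=8192   # restore the image back onto the raw device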

To overcome all these disadvantages we use an LVM (Logical Volume Manager):

1. It is a logical storage area created from multiple disk partitions, on which we can create any type of file system.
2. It supports storage of multiple files in a single volume.
3. Online resizing is possible.
4. It supports collection of I/O statistics.
5. It improves I/O performance & availability with the help of software-level RAID techniques.

Types of LVMs & Vendors:

LVM                            Vendor
1. VERITAS Volume Manager      Symantec
2. Tivoli Volume Manager       IBM
3. Sun Volume Manager (SVM)    Oracle (Sun)
4. ASM (from Oracle 10g)       Oracle

ASM:
- It is a type of LVM supported from Oracle 10g onwards and has a special type of instance (INSTANCE_TYPE=ASM) with a small SGA footprint of about 100-128 MB.
- It supports creation of logical volumes known as diskgroups and internally uses both striping and mirroring.
- It does not have any control file to mount, so its last startup stage is nomount; instead it mounts the diskgroups.
- A diskgroup is a logical storage area created from multiple disk partitions.
- ASM supports storage of multiple database-related files like control files, redo logs, datafiles, archive logs, flashback logs, RMAN backup pieces, spfile etc., but it does not support storage of static files like pfile, listener.ora, tnsnames.ora, sqlnet.ora etc.
- From 11.2 onwards, by using ADVM (ASM Dynamic Volume Manager) & ACFS (ASM Cluster File System) we can store static files also.

Note: Sometimes an ASM instance may contain a large pool also. One ASM instance supports creation of multiple diskgroups and provides services to multiple clients.
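As a rough sketch of creating such a diskgroup from the ASM instance (the diskgroup name and disk paths are illustrative), with ASM-level mirroring:

SQL> create diskgroup dg1 normal redundancy disk '/dev/raw/raw1', '/dev/raw/raw2';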

ASM Clients: These are general DB instances which depend on the ASM instance in order to access the diskgroups.

ASM Instance Background processes:

RBAL (Rebalance Master): It is responsible for managing and coordinating the diskgroup activities, and for generating the plans for even distribution of ASM extents for better load balancing whenever a disk is added or removed.

ARBn (ASM Rebalancer): It is a slave process of the RBAL background process and is responsible for the actual load balancing of ASM disks.

ASMB ASM Background:It is responsible for successful establishment of communication channel between ASM instance & ASM clients.

GMON (Global Monitor): It is responsible for coordinating the diskgroup activities whenever a diskgroup goes offline or is dropped.

KATE (Konductor of ASM Temporary Errands): It is responsible for bringing disk groups back online.

ASM client Background Processes:

RBAL (Rebalance Master): In a client instance, it is responsible for the successful opening and closing of the diskgroups whenever read or write operations occur.

PZ9X: It is responsible for gathering dynamic view information globally across all the instances of the database.

ASM-related dynamic views: In a RAC environment the dynamic views start with gv$; in non-RAC they start with v$.
1. gv$asm_disk
2. gv$asm_diskgroup
3. gv$asm_io_stat
4. gv$asm_client
5. gv$asm_template
(total 19 views)
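For example, to see diskgroup capacity and state from the ASM instance (standard columns of the view):

SQL> select name, state, type, total_mb, free_mb from v$asm_diskgroup;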

ASM in RAC Environment:

Working on Datapump Export

Create a directory at O/S level:

$ mkdir dump_dir (for the dump directory)
$ mkdir log_dir (for the log directory)
$ chmod 775 dump_dir log_dir
Connect with sqlplus and execute:
SQL> create directory datapump_dir as '/u01/dump_dir';
SQL> create directory datapump_log as '/u01/log_dir';
SQL> grant read, write on directory datapump_dir to public;  -- to take expdp for any schema
SQL> grant read, write on directory datapump_log to public;
$ more expdp.sh
#!/bin/ksh
export ORACLE_HOME="/u01/app/oracle/product/10.2.0"
export ORACLE_SID="abc"
echo export started at `date` >> /u05/abc/export/dailyexpdp_abc.log
$ORACLE_HOME/bin/expdp system/password dumpfile=datapump_dir:abc-expdp-`date '+%Y%m%d'`.dmp logfile=datapump_log:abc-expdp-`date '+%Y%m%d'`.log schemas=aet3
echo export stopped at `date` >> /u05/abc/export/dailyexpdp_abc.log
echo tape archiving started at `date` >> /u05/abc/export/dailyexpdp_abc.log
tar -Ecvf /dev/rmt/0 /u05/abc/export
echo tape archiving stopped at `date` >> /u05/abc/export/dailyexpdp_abc.log

$ crontab -l
50 23 * * 0,1,2,3,4,5,6 /u06/abc/scripts/expdp.sh
It will generate a dump file datewise.

Enable Crontab
Giving crontab permission to a new OS user:

Login as root
Go to the /etc/cron.d directory
Add the user entry (ex. ota) in the cron.allow file
Note: if the cron.allow file doesn't exist, then create it

Login as the O/S user
Set the editor in .profile or .bash_profile
Example: EDITOR=vi; export EDITOR
Now you can schedule cron jobs.

To set up a cron job:
crontab -l (list current jobs in cron)
crontab -e (edit current jobs in cron)
_1_ _2_ _3_ _4_ _5_ executable_or_job
Where:
1 Minutes (0-59)
2 Hours (0-23)
3 Day of month (1-31)
4 Month (1-12)
5 Day of week (0-6): 0 -> Sunday, 1 -> Monday
e.g. 0 3 * * 6 means run the job at 3AM every Saturday.
This is useful for regularly scheduling tablespace threshold checks, ftp, RMAN backups, removal of old log files, or other scripts.
Sample scheduled backup:
$ crontab -l

OTA Database:

50 23 * * 0,2,3,6 /u01/ota/dailyexp_ota.sh
50 23 * * 1,4 /u01/ota/offbkup_ota.sh
15 14 * * 0,1,2,3,4,6 /u01/ota/morning_arch.sh

Upgrade Oracle from 10.2.0.1 to 10.2.0.4 on Linux x86 AS4
(The screenshots attached below are from a production upgrade on Solaris 64-bit.)
Download patch 6810189 [p6810189_10204_Linux-x86]:
$ unzip p6810189_10204_Linux-x86.zip
Shut down all the databases / listener / services / Enterprise Manager.
Back up your database.
Start patching:
$ cd patchset_directory/Disk1
$ ./runInstaller
OUI starts and the patch gets installed; when prompted, run the $ORACLE_HOME/root.sh script as the root user.
Upgrading a Release 10.2 Database using Oracle Database Upgrade Assistant:
After you install the patch set, you must perform the following steps on every database. If you do not run the Oracle Database Upgrade Assistant, the following errors are displayed:
ORA-01092: ORACLE instance terminated.
ORA-39700: database must be opened with UPGRADE option.

Start the listener as follows:
$ lsnrctl start
Run the Oracle Database Upgrade Assistant:
$ dbua
Complete the following steps displayed in the Oracle DBUA:
- On the Welcome screen, click Next.
- On the Databases screen, select the name of the Oracle Database that you want to upgrade, then click Next.
- On the Recompile Invalid Objects screen, select the "Recompile the invalid objects at the end of upgrade" option, then click Next.
- If you have not taken a backup of the database earlier, on the Backup screen select the "I would like this tool to backup the database" option.
- On the Summary screen, check the summary, and then click Finish.

Upgrade the Database Manually

After you install the patch set, you must perform the following steps on every database associated with the upgraded Oracle home:

Start the listener
Connect as the SYS user

SQL> startup upgrade

SQL> spool patch.log

SQL> @$ORACLE_HOME/rdbms/admin/catupgrd.sql

SQL> spool off

Review the patch.log file for errors and inspect the list of components that is displayed at the end of catupgrd.sql script.

This list provides the version and status of each SERVER component in the database. If necessary, rerun the catupgrd.sql script after correcting any problems.
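One way to review that component list afterwards is to query the standard DBA_REGISTRY dictionary view:

SQL> select comp_name, version, status from dba_registry;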

4. Restart the database:
SQL> shutdown
SQL> startup

5. Compile Invalid Objects

Run the utlrp.sql script to recompile all invalid PL/SQL packages now, instead of when the packages are accessed for the first time. This step is optional but recommended.
SQL> @$ORACLE_HOME/rdbms/admin/utlrp.sql
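To confirm the recompilation worked, a simple check against the standard dictionary is:

SQL> select count(*) from dba_objects where status = 'INVALID';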

SQL> select * from v$version;

BANNER
----------------------------------------------------------------
Oracle Database 10g Release 10.2.0.4.0 - Production
PL/SQL Release 10.2.0.4.0 - Production
CORE 10.2.0.4.0 Production
TNS for 32-bit Windows: Version 10.2.0.4.0 - Production
NLSRTL Version 10.2.0.4.0 - Production

----Screenshots----

LOG APPLY SERVICES (LAS)

LOG APPLY SERVICE:
i. Applying redo immediately
ii. Time delay for redo apply
Applying redo data to a Physical Standby Database:
1. Start redo apply
2. Stop redo apply
3. Monitor redo apply
Applying redo data to a Logical Standby Database:
iii. Start SQL apply
iv. Stop SQL apply
v. Monitor SQL apply

LOG APPLY SERVICES (LAS): the process is automatic.
1. Redo Apply (Physical Standby Database only):
- Uses media recovery to keep the Primary Database & Standby Database synchronized.
- The standby is kept in the mounted state & can be opened for reporting.
2. SQL Apply (Logical Standby Database only):
- Reconstructs SQL statements from the redo data received from the Primary Database & applies them to the Logical Standby Database.
- Can be opened in R/W mode.
The Redo Transport Service process on the Standby Database receives the redo data and writes it to standby redo log files or archived redo log files.
RFS - Remote file server process
MRP - Managed recovery process (performs recovery, i.e. starts applying redo data)
LSP - Logical standby process

FIG 6-1: Oracle Dataguard (B14239-04)

1. Applying Redo Data Immediately (Real-Time Apply):
In this process the redo data is applied immediately as it is received, without waiting for the current standby redo log file to be archived.
Enabling real-time apply for a Physical Standby Database:
SQL> ALTER DATABASE RECOVER MANAGED STANDBY DATABASE USING CURRENT LOGFILE;
Enabling real-time apply for a Logical Standby Database:
SQL> ALTER DATABASE START LOGICAL STANDBY APPLY IMMEDIATE;
2. Specifying a time delay for applying redo logs:
Parameter used: LOG_ARCHIVE_DEST_n
Attribute: DELAY (specified in minutes; default value: 30 mins)
- The delay is used to protect against corrupted data getting applied to the Standby Database.
- The delay time starts after the redo is received and completely archived.
- If real-time apply is enabled & a delay is specified, the delay is ignored.
- Cancel the delay using NODELAY. Ex:
Physical Standby Database:
SQL> ALTER DATABASE RECOVER MANAGED STANDBY DATABASE NODELAY;
Logical Standby Database:
SQL> ALTER DATABASE START LOGICAL STANDBY APPLY NODELAY;
An alternate option to delaying: using Flashback Database on the standby database.
Applying redo data to a Physical Standby Database:
- By default redo is always applied from the archived redo logs of the standby db. In case of real-time apply, redo is applied directly from the standby redo log files before they are archived.
- Redo data cannot be applied while the Physical Standby Database is open in read-only mode; therefore start the Physical Standby Database and keep it in the mounted state to apply the redo.
Applying redo as a foreground process (control is not returned):
SQL> alter database recover managed standby database;
Applying redo as a background process (control is returned):
SQL> alter database recover managed standby database disconnect;
Using real-time apply:
SQL> alter database recover managed standby database using current logfile;
Cancelling redo apply:
SQL> alter database recover managed standby database cancel;
Monitoring: use OEM for monitoring log apply services.
Applying redo data to a Logical Standby Database:
SQL Apply converts the data from archived redo log files or standby redo log files on the Logical Standby Database into SQL statements, and these SQL statements are then applied to the Logical Standby Database. The Logical Standby Database always remains open, as SQL statements have to be executed. It is used for reporting, summation and querying purposes.
Starting SQL Apply:
SQL> ALTER DATABASE START LOGICAL STANDBY APPLY;
Real-time:
SQL> ALTER DATABASE START LOGICAL STANDBY APPLY IMMEDIATE;
Stopping:
SQL> ALTER DATABASE STOP LOGICAL STANDBY APPLY;
Note: This command is delayed, as SQL Apply will wait to apply all the committed transactions. To stop immediately use:
SQL> ALTER DATABASE ABORT LOGICAL STANDBY APPLY;
Monitoring: use OEM for monitoring log apply services.
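Besides OEM, a quick command-line way to watch the apply processes on a physical standby is the v$managed_standby view (standard columns shown):

SQL> select process, status, thread#, sequence#, block# from v$managed_standby;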

REDO TRANSPORT SERVICES (RTS)

Redo Transport Service:
- Automates the transfer of redo to one or more destinations.
- Resolves gaps in redo transport in case of network failures.

FIG 5-1: Oracle Dataguard (B14239-04)

Destination types for the Redo Transport Service:
- Oracle Data Guard Standby Database
- Archived redo log repository
- Oracle Streams real-time downstream capture database
- Oracle Change Data Capture staging database
LOG_ARCHIVE_DEST_n parameter:
- Maximum number of destinations: 10
- Maximum number of Standby Databases configured: 10
Attributes:
- LOCATION = specifies local destinations
- SERVICE = specifies remote destinations
LOG_ARCHIVE_DEST_n is used along with the LOG_ARCHIVE_DEST_STATE_n parameter.
Attributes of the LOG_ARCHIVE_DEST_STATE_n parameter:
ENABLE: Redo transport services can transmit redo data to this destination. This is the default.

DEFER:This is a valid but unused destination (Redo transport services will not transmit redo data to this destination.)

ALTERNATE:This destination is not enabled, but it will become enabled if communication to its associated destination fails.

RESET:Functions the same as DEFER

Example 5-1: Specifying a Local Archiving Destination
LOG_ARCHIVE_DEST_1='LOCATION=/arch1/chicago/'
LOG_ARCHIVE_DEST_STATE_1=ENABLE
Example 5-2: Specifying a Remote Archiving Destination
LOG_ARCHIVE_DEST_1='LOCATION=/arch1/chicago/'
LOG_ARCHIVE_DEST_STATE_1=ENABLE
LOG_ARCHIVE_DEST_2='SERVICE=boston'
LOG_ARCHIVE_DEST_STATE_2=ENABLE
We can change the destination attributes:
SQL> ALTER SYSTEM SET LOG_ARCHIVE_DEST_2='SERVICE=boston VALID_FOR=(ONLINE_LOGFILES,PRIMARY_ROLE)';
SQL> ALTER SYSTEM SET LOG_ARCHIVE_DEST_STATE_2=DEFER;
(This command defers the Redo Transport Service.)
The modifications take effect after the next log switch on the primary database.
The parameter for configuring the Flash Recovery Area is DB_RECOVERY_FILE_DEST. If no destination for local archiving is specified, LOG_ARCHIVE_DEST_10 is implicitly mapped to the DB_RECOVERY_FILE_DEST location by Oracle Data Guard. A Primary Database cannot write its redo data to the Flash Recovery Area of a Logical Standby Database.
Note: The Flash Recovery Area is the directory that stores the files related to recovery.
To configure the Flash Recovery Area on a destination other than LOG_ARCHIVE_DEST_10, use:
LOG_ARCHIVE_DEST_9='LOCATION=USE_DB_RECOVERY_FILE_DEST ARCH MANDATORY REOPEN=5'
Specifying the Flash Recovery Area on a Physical Standby Database:
STANDBY_ARCHIVE_DEST='LOCATION=USE_DB_RECOVERY_FILE_DEST'
When sharing a Flash Recovery Area between a Physical Standby Database and the Primary Database, DB_UNIQUE_NAME should be specified for each database and should be unique.
Example 5-3: Primary Database Initialization Parameters for a Shared Recovery Area
DB_NAME=PAYROLL
LOG_ARCHIVE_DEST_1='LOCATION=USE_DB_RECOVERY_FILE_DEST'
DB_RECOVERY_FILE_DEST=/arch/oradata
DB_RECOVERY_FILE_DEST_SIZE=20G
Example 5-4: Standby Database Initialization Parameters for a Shared Recovery Area
DB_NAME=PAYROLL
DB_UNIQUE_NAME=boston
LOG_ARCHIVE_DEST_1='LOCATION=USE_DB_RECOVERY_FILE_DEST'
STANDBY_ARCHIVE_DEST='LOCATION=USE_DB_RECOVERY_FILE_DEST'
DB_RECOVERY_FILE_DEST=/arch/oradata
DB_RECOVERY_FILE_DEST_SIZE=5G
Sending Redo:
Redo can be transmitted by the archiver process (ARCn) or the log writer process (LGWR), but both cannot be used for the same destination; i.e. ARCn can send redo to one destination and LGWR to another.
Using ARCn to send redo:
- Default method; 4 processes are used by default.
- Supports only the Maximum Performance level of data protection.
- Specify the LOCATION attribute for local archiving and the SERVICE attribute for remote archiving.
Ex:
LOG_ARCHIVE_DEST_1='LOCATION=/arch1/chicago/'
LOG_ARCHIVE_DEST_2='SERVICE=boston'
Another parameter: LOG_ARCHIVE_MAX_PROCESSES (a dynamic parameter; the maximum is 30 processes).
Archival processing:

FIG 5-3: Oracle Dataguard (B14239-04)

Note: Use v$archived_log to verify that the redo data is received on the Standby Database. A minimum of 2 ARCn processes are required; the default is 4 & the maximum is 30.
RFS: On the remote destination, the remote file server (RFS) process will, in turn, write the redo data to an archived redo log file from a standby redo log file. Log apply services use Redo Apply (the MRP process) or SQL Apply (the LSP process) to apply the redo to the standby database.
MRP: The managed recovery process applies archived redo log files to the physical standby database, and automatically determines the optimal number of parallel recovery processes at the time it starts. The number of parallel recovery slaves spawned is based on the number of CPUs available on the standby server.
LSP: The logical standby process uses parallel execution (Pnnn) processes to apply archived redo log files to the logical standby database, using SQL interfaces.
Using LGWR to send redo:
- LGWR SYNC
- LGWR ASYNC
LGWR SYNC archival processing:
Parameter: LOG_ARCHIVE_DEST_n; attributes: LGWR, SYNC, SERVICE
Example 5-5: Initialization Parameters for LGWR Synchronous Archival
LOG_ARCHIVE_DEST_1='LOCATION=/arch1/chicago'
LOG_ARCHIVE_DEST_2='SERVICE=boston LGWR SYNC NET_TIMEOUT=30'
LOG_ARCHIVE_DEST_STATE_1=ENABLE
LOG_ARCHIVE_DEST_STATE_2=ENABLE
SYNC: network I/O is synchronous (default); waits until each write operation is completed.
Note: If the LGWR process cannot archive for some reason, redo transport will automatically shift to the ARCn process.
NET_TIMEOUT: waits the specified number of seconds over the network & gives an error if the write operation does not complete.
LGWR ASYNC archival processing:
Ex: same as above, using ASYNC instead of SYNC and without the NET_TIMEOUT attribute (NET_TIMEOUT is not necessary in 10.2).
Diagram showing SYNC & ASYNC LGWR archival processing:

FIG 5-4:Oracle Dataguard (B14239-04)

FIG 5-5: Oracle Dataguard (B14239-04)

Note: LOG_ARCHIVE_DEST & LOG_ARCHIVE_DUPLEX_DEST should not be used for configuring the Flash Recovery Area.
Providing security while transmitting redo:
$ orapwd file=orapw<SID> password=xyz entries=10
Note: Make the SYS user password identical for all DBs in the Oracle Data Guard configuration. Also set REMOTE_LOGIN_PASSWORDFILE=EXCLUSIVE/SHARED.
VALID_FOR attribute of the LOG_ARCHIVE_DEST_n parameter:
VALID_FOR=(redo_log_type, database_role)
redo_log_type: ONLINE_LOGFILE, STANDBY_LOGFILE, or ALL_LOGFILES
database_role: PRIMARY_ROLE, STANDBY_ROLE, or ALL_ROLES
The VALID_FOR attribute is required for role transitions:
- It configures destination attributes for both the Primary Database and the Standby Database in one SPFILE.
- If VALID_FOR is not used, we need to use two spfiles each time we do a role transition.
- This attribute makes switchover and failover easy.
Ex:
LOG_ARCHIVE_DEST_1='LOCATION=/ARCH1/CHICAGO/ VALID_FOR=(ALL_LOGFILES,ALL_ROLES)'
DB_UNIQUE_NAME: specifies a unique database name in the Oracle Data Guard configuration. Used along with LOG_ARCHIVE_CONFIG.
Ex:
DB_NAME=chicago
DB_UNIQUE_NAME=chicago
LOG_ARCHIVE_CONFIG='DG_CONFIG=(chicago,boston)'
LOG_ARCHIVE_DEST_1='LOCATION=/arch1/chicago/ VALID_FOR=(ALL_LOGFILES,ALL_ROLES)'
LOG_ARCHIVE_DEST_2='SERVICE=boston LGWR ASYNC VALID_FOR=(ONLINE_LOGFILES,PRIMARY_ROLE) DB_UNIQUE_NAME=boston'
The LOG_ARCHIVE_CONFIG parameter also has SEND, NOSEND, RECEIVE, and NORECEIVE attributes:
- SEND enables a database to send redo data to remote destinations.
- RECEIVE enables the standby database to receive redo from another database.
To disable these settings, use the NOSEND and NORECEIVE keywords.
Ex: LOG_ARCHIVE_CONFIG='NORECEIVE, DG_CONFIG=(chicago,boston)'
Use of these attributes can affect role transitions; therefore try to remove them before doing any role transition.
Handling errors while transmitting redo. Options when archiving fails:
- Retry the archival operation (and control the number of retries).
- Use an alternate destination.
Ex: LOG_ARCHIVE_DEST_1='LOCATION=/arc_dest REOPEN=60 MAX_FAILURE=3'
Attributes used:
REOPEN: default value is 300 seconds; 0 turns this option off.
MAX_FAILURE: maximum number of failures.
ALTERNATE: alternate destination.
Note: ALTERNATE takes precedence over the MANDATORY attribute; i.e. even if an archiving destination is mandatory and it fails, archiving automatically moves to the alternate destination.
DATA PROTECTION MODES:
- MAXIMUM PROTECTION
- MAXIMUM AVAILABILITY
- MAXIMUM PERFORMANCE (default)

Maximum Protection:
- No data loss if the Primary Database fails.
- Redo data needed for recovery has to be written both to the online redo log files and to the standby redo log files before commit.
- At least one Standby Database should be available.
- If any fault happens, the Primary Database will shut down.
- Configure the LGWR, SYNC & AFFIRM attributes of the LOG_ARCHIVE_DEST_n parameter for the Standby Database.
Maximum Availability:
- Provides the highest level of data protection possible without compromising the availability of the Primary Database.
- The Primary Database does not shut down; it continues to work in maximum performance mode until the fault is corrected and all gaps in the redo log files are resolved, then it goes back to maximum availability mode.
- At least one Standby Database should be available.
- Configure the LGWR, SYNC & AFFIRM attributes of the LOG_ARCHIVE_DEST_n parameter for the Standby Database.
Maximum Performance (default):
- Provides a high level of data protection without affecting the performance of the Primary Database.
- As soon as the redo data is written to the online redo log file, the transaction is committed.
- Redo is also written to at least one Standby Database, asynchronously.
- Use network links with sufficient bandwidth to get maximum availability with minimal impact on the performance of the Primary Database.
- Set the LGWR and ASYNC attributes of the LOG_ARCHIVE_DEST_n parameter for at least one Standby Database.

Setting the Data Protection Mode of a Data Guard Configuration. At least one standby destination should meet the following minimum requirements:

                            MAXIMUM PROTECTION   MAXIMUM AVAILABILITY   MAXIMUM PERFORMANCE
Redo archival process       LGWR                 LGWR                   LGWR or ARCH
Network transmission mode   SYNC                 SYNC                   SYNC or ASYNC with LGWR; SYNC if using ARCH
Disk write option           AFFIRM               AFFIRM                 AFFIRM or NOAFFIRM
Standby redo log required?  YES                  YES                    No, but recommended
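Once these requirements are met, the mode itself is set on the primary with an ALTER DATABASE statement while the database is mounted; as a sketch:

SQL> shutdown immediate
SQL> startup mount
SQL> alter database set standby database to maximize availability;
SQL> alter database open;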

Note: Oracle recommends that an Oracle Data Guard configuration running in maximum protection mode contain at least two Standby Databases meeting the above requirements, so that the Primary Database can continue processing without shutting down if one of the Standby Databases cannot receive redo data from the Primary Database.

Managing log files:

1. Specify an alternate directory for archived redo logs:
- Redo received from the Primary Database is placed at the location given by the LOCATION attribute of the LOG_ARCHIVE_DEST_n parameter.
- An alternate directory can be specified using the STANDBY_ARCHIVE_DEST parameter.
- If both parameters are specified, STANDBY_ARCHIVE_DEST overrides LOG_ARCHIVE_DEST_n.
- Query v$archive_dest to check the value of the STANDBY_ARCHIVE_DEST parameter:
SQL> SELECT DEST_NAME, DESTINATION FROM V$ARCHIVE_DEST WHERE DEST_NAME='STANDBY_ARCHIVE_DEST';
- Filenames are generated in the format specified by LOG_ARCHIVE_FORMAT, e.g. log%t_%s_%r.arc
Note: The Redo Transport Service stores the fully qualified filenames in the Standby Database control file, and Redo Apply uses this information to perform recovery.
- To check the archived redo log files present on the standby system (view v$archived_log):
SQL> SELECT NAME FROM V$ARCHIVED_LOG;

2. Reusing online redo log files:
Set the OPTIONAL or MANDATORY attribute of the LOG_ARCHIVE_DEST_n parameter.
Ex: LOG_ARCHIVE_DEST_3='LOCATION=/arch_dest MANDATORY'
Note: By default remote destinations are OPTIONAL and one local destination is MANDATORY. If MANDATORY is specified, the online log files are not overwritten until the archive log has been applied; if OPTIONAL is specified, the files are overwritten even if the redo has not been applied.

3. Managing standby redo log files:
Check the RFS process trace file or the database alert log to determine whether there are adequate standby redo log files; i.e. if these files indicate that the RFS process frequently has to wait for a group because archiving does not complete, add more log file groups to the standby redo log.
Note: Whenever an online redo log file group is added to the Primary Database, we must add a corresponding standby redo log file group to the Standby Database. If the number of standby redo log file groups is inadequate, the Primary Database will shut down if it is in maximum protection mode, or switch to maximum performance mode if it is in maximum availability mode.
Ex: Adding a member to a standby redo log group:
SQL> alter database add standby logfile member '/disk1/oracle/dbs/log2b.rdo' to group 2;

4. Planning for growth & reuse of the control file:
The maximum control file size is 20,000 database blocks; if the block size is 8K (8192 bytes), the maximum control file size will be 156 MB. As long as archived redo logs are generated or RMAN backups are taken, records are added to the control file; when the control file reaches its maximum size these records are reused. The parameter used to specify how long control file records are kept is CONTROL_FILE_RECORD_KEEP_TIME; its value ranges from 0-365 days (the default is 7 days).
Note: Keep the CONTROL_FILE_RECORD_KEEP_TIME value at least as long as the period covered by the last 2 full backups. If redo is planned to be applied with a delay, set this value to more days.

5. Sharing a log file destination among multiple Standby Databases:
Ex:
LOG_ARCHIVE_DEST_1='LOCATION=disk1 MANDATORY'
LOG_ARCHIVE_DEST_2='SERVICE=standby1 OPTIONAL'
LOG_ARCHIVE_DEST_3='SERVICE=standby2 OPTIONAL DEPENDENCY=LOG_ARCHIVE_DEST_2'
In this case the DEPENDENCY attribute is set on the second standby db, which takes the redo data from LOG_ARCHIVE_DEST_2. This kind of setup can be used if:
- the Primary Database & Standby Database reside on the same system;
- a Physical Standby Database & Logical Standby Database reside on the same system;
- a clustered file system is used;
- a network file system is used.

MANAGING ARCHIVE GAPS:
- Oracle Data Guard resolves gaps automatically.
- Gaps can happen due to a network failure or an archiving problem on the Primary Database.
- The Primary Database polls the Standby Database every minute to detect gaps [polling mechanism].
- In case the Primary Database is not available, we have to resolve the gaps manually by applying redo from one of the Standby Databases.
- No extra configuration is required to resolve gaps automatically.

1. Using FAL [fetch archive log mechanism] to resolve gaps. Set the parameters:
FAL_SERVER=net_service_name, e.g. FAL_SERVER=standby2_db,standby3_db
FAL_CLIENT=standby1_db

2. Manually resolving archive gaps:
We have to resolve gaps manually if the Primary Database is not available or if we are using a Logical Standby Database; this also applies in some other cases.

Resolving gaps on a Physical Standby Database:
1. Query the gap on the Physical Standby Database:
SQL> SELECT * FROM V$ARCHIVE_GAP;
THREAD# LOW_SEQUENCE# HIGH_SEQUENCE#
------- ------------- --------------
      1             7             10
2. Find the missing logs on the Primary Database and copy them to the Physical Standby Database:
SQL> SELECT NAME FROM V$ARCHIVED_LOG WHERE THREAD#=1 AND DEST_ID=1 AND SEQUENCE# BETWEEN 7 AND 10;
NAME
-------------------------------------------------------------------------
/primary/thread1_dest/arcr_1_7.arc
/primary/thread1_dest/arcr_1_8.arc
/primary/thread1_dest/arcr_1_9.arc
3. Once these log files are copied to the Physical Standby Database, register them with it:
SQL> ALTER DATABASE REGISTER LOGFILE '/physical_standby1/thread1_dest/arcr_1_7.arc';
SQL> ALTER DATABASE REGISTER LOGFILE '/physical_standby1/thread1_dest/arcr_1_8.arc';
4. Restart Redo Apply.

Resolving gaps on a Logical Standby Database:
Same procedure as for a Physical Standby Database, but the view used is DBA_LOGSTDBY_LOG instead of V$ARCHIVE_GAP. Steps:
a. SQL> COLUMN FILE_NAME FORMAT a55
SQL> SELECT THREAD#, SEQUENCE#, FILE_NAME FROM DBA_LOGSTDBY_LOG L WHERE NEXT_CHANGE# NOT IN (SELECT FIRST_CHANGE# FROM DBA_LOGSTDBY_LOG WHERE L.THREAD# = THREAD#) ORDER BY THREAD#, SEQUENCE#;
THREAD# SEQUENCE# FILE_NAME
      1         6 /disk1/oracle/dbs/log-1292880008_6.arc
      1        10 /disk1/oracle/dbs/log-1292880008_10.arc
Note: If there is a gap, only one file is shown for each thread; otherwise two files are shown per thread. In the above example the missing files are 7, 8 and 9.
b. Copy these files to the Logical Standby Database location.
c. Register these files with the Logical Standby Database:
SQL> ALTER DATABASE REGISTER LOGICAL LOGFILE 'file_name';
d. Restart SQL Apply.

Verification:
1. Check the status of the online redo log files on the Primary Database:
SQL> SELECT THREAD#, SEQUENCE#, ARCHIVED, STATUS FROM V$LOG;
2. Determine the most recent archived log file on the Primary Database:
SQL> SELECT MAX(SEQUENCE#), THREAD# FROM V$ARCHIVED_LOG GROUP BY THREAD#;
3. Use the following query on the Primary Database to check the most recently transmitted archived log file for each destination:
SQL> SELECT DESTINATION, STATUS, ARCHIVED_THREAD#, ARCHIVED_SEQ# FROM V$ARCHIVE_DEST_STATUS WHERE STATUS <> 'DEFERRED' AND STATUS <> 'INACTIVE';
4. Use the following query on the Primary Database to find the archived redo log files not received at each destination:
SQL> SELECT LOCAL.THREAD#, LOCAL.SEQUENCE# FROM (SELECT THREAD#, SEQUENCE# FROM V$ARCHIVED_LOG WHERE DEST_ID=1) LOCAL WHERE LOCAL.SEQUENCE# NOT IN (SELECT SEQUENCE# FROM V$ARCHIVED_LOG WHERE DEST_ID=2 AND THREAD# = LOCAL.THREAD#);
5. Set the LOG_ARCHIVE_TRACE parameter on the Primary Database & Standby Database to trace the transmission of redo data.

Monitoring the performance of Redo Transport Services:
View: v$system_event
Parameter: LOG_ARCHIVE_DEST_n; attributes: ARCH, LGWR (SYNC/ASYNC)
Wait events to monitor:
1. ARCn wait events
2. LGWR SYNC wait events
3. LGWR ASYNC wait events
Note: Use OEM to monitor Oracle Data Guard in a GUI.

CONFIGURING DATA GUARD

Configuring Oracle 10g Data Guard on Linux AS4

Creating a physical standby database:

A. Preparing the primary database for standby database creation:
1. Enable forced logging
2. Create a password file
3. Configure a standby redo log
4. Use identical log file sizes on the primary and standby
5. Determine the appropriate number of log file groups
6. Verify parameters related to log files
7. Create standby redo logs
8. Verify standby redo log file group creation
9. Set initialization parameters for the primary database
10. Enable archiving
B. Creation of the physical standby database:
1. Create a backup copy of the Primary Database
2. Create a control file for the Standby Database
3. Prepare an initialization parameter file for the Standby Database
4. Copy files from the Primary Database to the Standby Database system
5. Set up the environment to support the Standby Database
6. Start the physical standby db
7. Verify the Physical Standby Database
C. Post-creation steps:
1. Upgrade the data protection mode
2. Enable Flashback Database

A. Preparing the Primary Database for PHYSICAL STANDBY DATABASE creation:
1. Place the Primary Database in force logging mode:
SQL> alter database force logging;
2. Create a password file for the SYS user. Every database in an Oracle Data Guard configuration must have a password file for the SYS user:
$ orapwd file=orapwprod password=oracle entries=100
3. Configure standby redo logs for maximum availability & data protection:
- The LGWR ASYNC transport mode is preferred.
- If possible, multiplex the standby redo log files.
- Use identical sizes for the primary and standby redo log files.
- Determine the appropriate number of standby redo log groups with the formula (max number of log files per group + 1) x number of log groups. Ex: 2 groups with 2 members each gives (2+1) x 2 = 6 standby redo log groups.
- Check the MAXLOGFILES & MAXLOGMEMBERS clauses; if there is a limit, you have to recreate the DB or the control file.
Adding a standby redo log file group to a specific thread:
SQL> alter database add standby logfile thread 5 size 20m;
Adding a standby redo log file group as a specific group:
SQL> alter database add standby logfile group 10 size 20m;
Note: If we skip group numbers, using 10, 20 & so on, we will end up using additional space in the Standby Database control file.
Here we have configured the standby redo log files the same as on the primary db, in order to make switchover easy if required (i.e. Primary Database = Standby Database).
Verify standby redo log file group creation:
SQL> select group#, thread#, sequence#, archived, status from v$standby_log;
4. Setting Primary Database initialization parameters:
prod.__db_cache_size=184549376
prod.__java_pool_size=4194304
prod.__large_pool_size=4194304
prod.__shared_pool_size=88080384
prod.__streams_pool_size=0
*.audit_file_dest=/oracle10g/product/10.2.0/db_1/admin/prod/adump
*.background_dump_dest=/oracle10g/product/10.2.0/db_1/admin/prod/bdump
*.compatible=10.2.0.1.0
*.control_files=/oracle10g/product/10.2.0/oradata/prod/control01.ctl
*.core_dump_dest=/oracle10g/product/10.2.0/db_1/admin/prod/cdump
*.db_block_size=8192
*.db_domain=
*.db_file_multiblock_read_count=16
*.db_flashback_retention_target=3600
*.db_name=prod
*.db_recovery_file_dest=/oracle10g/arch
*.db_recovery_file_dest_size=21474836480
*.db_unique_name=standby
*.dispatchers=(PROTOCOL=TCP) (SERVICE=prodXDB)
*.fal_client=STANDBY
*.fal_server=PROD
*.fast_start_mttr_target=17
*.job_queue_processes=10
*.log_archive_config=DG_CONFIG=(prod,standby)
*.log_archive_dest_1=LOCATION=/oracle10g/arch1_prod VALID_FOR=(ALL_LOGFILES,ALL_ROLES) DB_UNIQUE_NAME=standby
*.log_archive_dest_2=SERVICE=prod VALID_FOR=(ONLINE_LOGFILES,PRIMARY_ROLE) DB_UNIQUE_NAME=prod
*.log_archive_dest_state_1=ENABLE
*.log_archive_dest_state_2=ENABLE
*.log_archive_format=prod_%s_%t_%r.arc
*.log_archive_max_processes=4
*.log_buffer=262144
*.open_cursors=300
*.pga_aggregate_target=94371840
*.processes=150
*.remote_login_passwordfile=EXCLUSIVE
*.sga_target=283115520
*.standby_archive_dest=/oracle10g/arch1_prod/
*.standby_file_management=AUTO
*.undo_management=AUTO
*.undo_retention=3600
*.undo_tablespace=UNDOTBS
*.user_dump_dest=/oracle10g/product/10.2.0/db_1/admin/prod/udump

Creation of the physical standby database:
1. Make a backup copy of the Primary Database datafiles
2. Create a control file for the standby db
3. Prepare a pfile for the Standby Database
4. Copy files from the primary system to the standby system
5. Set up the environment on the standby system to support the Standby Database
6. Start the Physical Standby Database
7. Verify the Physical Standby Database

1. Making a backup copy of the Primary Database datafiles (on the Primary Database system):
Any backup copy can be used; use the RMAN utility.
2. Creating a control file for the Standby Database (on the Primary Database system):
SQL> shutdown immediate
(then take an offline/cold backup of the datafiles)
SQL> startup mount;
SQL> alter database create standby controlfile as '/u01/oracle/standby.ctl';
SQL> alter database open;
3. Prepare a pfile for the Standby Database (on the Primary Database system):
Copy the pfile from the primary to the standby & change a few parameters as follows.
Note:
- The COMPATIBLE parameter should be at least 9.2.0.1.0; to take advantage of Oracle 10g features set it to 10.2.0.0 or higher.
- It should be the same on both the Primary Database & Standby Database; if the values differ, redo will not be transmitted.
- Check that the bdump, cdump and udump destinations point to existing locations on both the primary and the standby.
initstandby.ora:
prod.__db_cache_size=436207616
standby.__db_cache_size=427819008
prod.__java_pool_size=4194304
standby.__java_pool_size=4194304
prod.__large_pool_size=4194304
standby.__large_pool_size=4194304
prod.__shared_pool_size=146800640
standby.__shared_pool_size=155189248
prod.__streams_pool_size=0
standby.__streams_pool_size=0
*.audit_file_dest='/u00/app/oracle/admin/standby/adump'
*.background_dump_dest='/u00/app/oracle/admin/standby/bdump'
*.compatible='10.2.0.1.0'
*.control_files='/u01/oracle/control01.ctl','/u01/oracle/control02.ctl','/u01/oracle/control03.ctl'
*.core_dump_dest='/u00/app/oracle/admin/standby/cdump'
*.db_block_size=8192
*.db_domain='global.com'
*.db_file_multiblock_read_count=16
*.db_flashback_retention_target=3600
*.db_name='prod'
*.db_recovery_file_dest_size=21474836480
*.db_recovery_file_dest='/u01/oracle/flash'
*.db_unique_name='standby'
*.dispatchers='(PROTOCOL=TCP) (SERVICE=prodXDB)'
*.FAL_CLIENT='standby'
*.FAL_SERVER='prod'
*.global_names=TRUE
*.job_queue_processes=10
*.log_archive_config='DG_CONFIG=(prod,standby)'
*.log_archive_dest_1='LOCATION=/u01/oracle/arch VALID_FOR=(ALL_LOGFILES,ALL_ROLES) DB_UNIQUE_NAME=standby'
*.log_archive_dest_2='SERVICE=prod VALID_FOR=(ONLINE_LOGFILES,PRIMARY_ROLE) DB_UNIQUE_NAME=prod'
*.log_archive_dest_state_1='ENABLE'
*.log_archive_dest_state_2='ENABLE'
*.log_archive_format='standby_%s_%t_%r.arc'
*.log_archive_max_processes=4
*.open_cursors=300
*.pga_aggregate_target=197132288
*.processes=150
*.remote_login_passwordfile='EXCLUSIVE'
*.sga_target=592445440
*.standby_archive_dest='/u01/oracle/arch'
*.standby_file_management='AUTO'
*.undo_management='AUTO'
*.undo_tablespace='UNDOTBS1'
*.user_dump_dest='/u00/app/oracle/admin/standby/udump'

4. Copy files from the primary db:
Copy the datafiles, the standby control file and the pfile to the standby system using O/S commands (ftp).
5. Setting up the environment to support the Standby Database (on the Standby Database system):
On Windows, create the standby service and password file using the oradim utility; on UNIX, just create the password file:
$ orapwd file=orapwprod password=oracle entries=100
Note: The password file should carry the same SYS password on both the primary and the standby.
Configure listener.ora. Configure net service names on both systems for both databases.
Note: The connect descriptor should specify the use of a dedicated server.
listener.ora on the primary db:
# listener.ora Network Configuration File: /u00/app/oracle/product/10.2.0/db_1/network/admin/listener.ora
# Generated by Oracle configuration tools.
SID_LIST_LISTENER =
  (SID_LIST =
    (SID_DESC =
      (SID_NAME = PLSExtProc)
      (ORACLE_HOME = /u00/app/oracle/product/10.2.0/db_1)
      (PROGRAM = extproc)
    )
    (SID_DESC =
      (GLOBAL_DBNAME = prod.global.com)
      (ORACLE_HOME = /u00/app/oracle/product/10.2.0/db_1)
      (SID_NAME = prod)
    )
  )
LISTENER =
  (DESCRIPTION_LIST =
    (DESCRIPTION =
      (ADDRESS = (PROTOCOL = TCP)(HOST = ans.global.com)(PORT = 1521))
    )
    (DESCRIPTION =
      (ADDRESS = (PROTOCOL = IPC)(KEY = EXTPROC0))
    )
  )

listener.ora on the standby db:
# listener.ora Network Configuration File: /u00/app/oracle/product/10.2.0/db_1/network/admin/listener.ora
# Generated by Oracle configuration tools.
SID_LIST_LISTENER =
  (SID_LIST =
    (SID_DESC =
      (SID_NAME = PLSExtProc)
      (ORACLE_HOME = /u00/app/oracle/product/10.2.0/db_1)
      (PROGRAM = extproc)
    )
    (SID_DESC =
      (GLOBAL_DBNAME = standby.global.com)
      (ORACLE_HOME = /u00/app/oracle/product/10.2.0/db_1)
      (SID_NAME = standby)
    )
  )
LISTENER =
  (DESCRIPTION_LIST =
    (DESCRIPTION =
      (ADDRESS = (PROTOCOL = TCP)(HOST = 100.100.112.72)(PORT = 1521))
    )
    (DESCRIPTION =
      (ADDRESS = (PROTOCOL = IPC)(KEY = EXTPROC0))
    )
  )

tnsnames.ora on the primary db:
# tnsnames.ora Network Configuration File: /u00/app/oracle/product/10.2.0/db_1/network/admin/tnsnames.ora
# Generated by Oracle configuration tools.
PROD =
  (DESCRIPTION =
    (ADDRESS_LIST =
      (ADDRESS = (PROTOCOL = TCP)(HOST = ans.global.com)(PORT = 1521))
    )
    (CONNECT_DATA =
      (SERVICE_NAME = prod.global.com)
      (INSTANCE_NAME = prod)
    )
    (HS = OK)
  )
STANDBY =
  (DESCRIPTION =
    (ADDRESS_LIST =
      (ADDRESS = (PROTOCOL = TCP)(HOST = 100.100.112.72)(PORT = 1521))
    )
    (CONNECT_DATA =
      (SERVICE_NAME = standby.global.com)
      (INSTANCE_NAME = standby)
    )
    (HS = OK)
  )
EXTPROC_CONNECTION_DATA =
  (DESCRIPTION =
    (ADDRESS_LIST =
      (ADDRESS = (PROTOCOL = IPC)(KEY = EXTPROC0))
    )
    (CONNECT_DATA =
      (SID = PLSExtProc)
      (PRESENTATION = RO)
    )
  )

tnsnames.ora on the standby db:
# tnsnames.ora Network Configuration File: /u00/app/oracle/product/10.2.0/db_1/network/admin/tnsnames.ora
# Generated by Oracle configuration tools.
PROD =
  (DESCRIPTION =
    (ADDRESS_LIST =
      (ADDRESS = (PROTOCOL = TCP)(HOST = 100.100.101.44)(PORT = 1521))
    )
    (CONNECT_DATA =
      (SERVICE_NAME = prod.global.com)
    )
  )
STANDBY =
  (DESCRIPTION =
    (ADDRESS_LIST =
      (ADDRESS = (PROTOCOL = TCP)(HOST = 100.100.112.72)(PORT = 1521))
    )
    (CONNECT_DATA =
      (SERVICE_NAME = standby.global.com)
    )
  )
EXTPROC_CONNECTION_DATA =
  (DESCRIPTION =
    (ADDRESS_LIST =
      (ADDRESS = (PROTOCOL = IPC)(KEY = EXTPROC0))
    )
    (CONNECT_DATA =
      (SID = PLSExtProc)
      (PRESENTATION = RO)
    )
  )

6. Start the Physical Standby Database:
Create an spfile on the standby db from the text parameter file that was edited for the Standby Database:
SQL> create spfile from pfile='initstandby.ora';
Start the Standby Database:
SQL> startup mount;
Start applying redo:
SQL> alter database recover managed standby database disconnect from session;
Test archival operation to the Standby Database (on the Primary Database):
SQL> alter system switch logfile;
7. Verify the Physical Standby Database:
SQL> select sequence#, first_time, next_time from v$archived_log order by sequence#;  (on the Primary Database and the Standby Database)
Verify the applied logs with the following command (on the Standby Database):
SQL> select sequence#, applied from v$archived_log order by sequence#;

RMAN Script: POINT IN TIME RECOVERY

[point_intime_recovery.scp]
# This scenario assumes that all initialization files and the current controlfile are in place and you want to recover to the point in time '2001-04-09 14:30:00'.

# Ensure you set your NLS_LANG environment variable
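For example, in a Bourne/Korn shell this could be (the character set shown is simply the one used elsewhere in these scripts):

$ NLS_LANG=american_america.we8dec; export NLS_LANG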

STARTUP MOUNT FORCE;
RUN {
SET UNTIL TIME "TO_DATE('2001-04-09:14:30:00','yyyy-mm-dd:hh24:mi:ss')";
RESTORE DATABASE;
RECOVER DATABASE;
ALTER DATABASE OPEN RESETLOGS;
}

# You must take a new whole database backup after resetlogs, since backups of the previous incarnation are not easily usable.

RMAN SCRIPT: DISASTER RECOVERY

[disaster_recovery.scp]
# The commands below assume that all initialization parameter files are in place and the complete directory structure for the datafiles is recreated.
# Ensure you set your NLS_LANG environment variable, e.g. in unix (csh):
# > setenv NLS_LANG american_america.we8dec

# Start RMAN without the target option, and use the following commands to restore and recover the database

# SET DBID; use the database id from the RMAN output
# not required if using a recovery catalog

connect target sys/password@omr
startup nomount;
run {
# you need to allocate channels if not using a recovery catalog.

allocate channel c1 type disk;

# optionally you can use SET NEWNAME and SWITCH commands to restore datafiles to a new location

restore controlfile from autobackup;
alter database mount;
restore database;
recover database;
alter database open resetlogs;
}

# you must take a new whole database backup after resetlogs, since backups of the previous incarnation are not easily usable.

RMAN SCRIPT : CONTROLFILE RECOVERY

[controlfile_recovery.scp]
# Oracle strongly recommends that you specify multiple controlfiles, on separate physical disks and controllers, in the CONTROL_FILES initialization parameter.
# - If one copy is lost due to media failure, copy one of the others over the lost controlfile and restart the instance.
# - If you lose all copies of the controlfile, you must re-create it using the CREATE CONTROLFILE sql command.
# You should use RMAN to recover a backup controlfile only if you have lost all copies of the current controlfile, because after restoring a backup controlfile you will have to open RESETLOGS and take a new whole database backup.
# This section assumes that all copies of the current controlfile have been lost, and that all initialization parameter files, datafiles and online logs are intact.
# Ensure you set your NLS_LANG environment variable, e.g. in unix (csh):
# > setenv NLS_LANG american_america.we8dec
# Start RMAN without the TARGET option, and use the following commands to restore and recover the database.
# SET DBID; use the database id from the RMAN output; not required if using a recovery catalog

connect target sys/password@omr
startup nomount;
run {

# you need to allocate channels if not using a recovery catalog.
# allocate channel foo type sbt parms'';

allocate channel c1 type disk;
restore controlfile from autobackup; # or
alter database mount;
recover database;
alter database open resetlogs;
}

# you must take a new whole database backup after resetlogs, since backups of the previous incarnation are not easily usable.

RMAN Script: DATAFILE RECOVERY

[datafile_recovery.scp]
# This section assumes that datafile 5 has been damaged and needs to be restored and recovered, and that the current controlfile and all other datafiles are intact. The database is mounted during the restore and recovery.

# the steps are:
# - take the datafile that needs recovery offline
# - restore the datafile from backups
# - apply incrementals and archivelogs as necessary to recover
# - bring the recovered datafile online

run {
sql 'alter database datafile 5 offline';

# if you want to restore to a different location, uncomment the following command
# set newname for datafile 5 to '/newdirectory/new_filename.f';

restore datafile 5;

# if you restored to a different location, uncomment the command below to
# switch the controlfile to point to the file in the new location
# SWITCH DATAFILE ALL;

recover datafile 5;
sql 'alter database datafile 5 online';
}

RMAN Script: OMR Database Full Backup (Database Mounted)

[dbbkup_full.scp]

run {
allocate channel c1 type disk;
backup
tag weekly_omr_full
format '/u07/omr/backup/full_%d_%s_%p_%t'
(database);
release channel c1;

configure controlfile autobackup format for device type disk to '/u07/omr/backup/auto_cntrl_%F';
configure controlfile autobackup on;

allocate channel c2 type disk;
backup
format '/u07/omr/backup/archive_%d_%s_%p_%t'
(archivelog all);
release channel c2;
}
startup;

RMAN Script: Cumulative level 2 backup

[call_dbbkup_cm2.scp]
#!/bin/ksh
export ORACLE_HOME="/u01/app/oracle/product/10.2.0"
export ORACLE_SID="omr"
PATH=$PATH:$ORACLE_HOME/bin
echo rman backup cm level2 for CATDB started `date` >> /u07/catdb/rmanbkup.log
rman target sys/password@omr catalog rman/rman@catdb cmdfile='/u04/catdb/scripts/dbbkup_cm2.scp'
echo rman backup cm level2 for CATDB ended `date` >> /u07/catdb/rmanbkup.log
exit
[dbbkup_cm2.scp]
run {
allocate channel c1 type disk;
backup incremental level 2 cumulative
tag omr_cm2
format '/u10/catdb/backup/cm2_%d_%s_%p_%t'
(database);
release channel c1;

# backing up the controlfile to the specified destination, keeping an autobackup copy

configure controlfile autobackup format for device type disk to '/u10/catdb/backup/auto_cntrl_%F';
configure controlfile autobackup on;

# backing up archivelog files

allocate channel c2 type disk;
backup
format '/u10/catdb/backup/cm2_%d_%s_%p_%t'
(archivelog all);
release channel c2;
}

RMAN Script: Cumulative level 1 backup

[call_dbbkup_cm1.scp]
#!/bin/ksh
export ORACLE_HOME="/u01/app/oracle/product/10.2.0"
export ORACLE_SID="omr"
PATH=$PATH:$ORACLE_HOME/bin
echo rman backup cm level1 for CATDB started `date` >> /u07/catdb/rmanbkup.log
rman target sys/password@omr catalog rman/rman@catdb cmdfile='/u04/catdb/scripts/dbbkup_cm1.scp'
echo rman backup cm level1 for CATDB ended `date` >> /u07/catdb/rmanbkup.log
exit
[dbbkup_cm1.scp]
run {
allocate channel c1 type disk;
backup incremental level 1 cumulative
tag omr_cm1
format '/u10/catdb/backup/cm1_%d_%s_%p_%t'
(database);
release channel c1;

# backing up the controlfile to the specified destination, keeping an autobackup copy

configure controlfile autobackup format for device type disk to '/u10/catdb/backup/auto_cntrl_%F';
configure controlfile autobackup on;

# backing up archivelog files

allocate channel c2 type disk;
backup
format '/u10/catdb/backup/cm1_%d_%s_%p_%t'
(archivelog all);
release channel c2;
}

RMAN Script: Cumulative level 0 backup

[call_dbbkup_cm0.scp]
#!/bin/ksh
export ORACLE_HOME="/u01/app/oracle/product/10.2.0"
export ORACLE_SID="omr"
PATH=$PATH:$ORACLE_HOME/bin
echo rman backup cm level0 for CATDB started `date` >> /u07/catdb/rmanbkup.log
rman target sys/password@omr catalog rman/rman@catdb cmdfile='/u04/catdb/scripts/dbbkup_cm0.scp'
echo rman backup cm level0 for CATDB ended `date` >> /u07/catdb/rmanbkup.log
exit
[dbbkup_cm0.scp]
run {
allocate channel c1 type disk;
backup incremental level 0 cumulative
tag omr_cm0
format '/u10/catdb/backup/cm0_%d_%s_%p_%t'
(database);
release channel c1;

# backing up the controlfile to the specified destination, keeping an autobackup copy

configure controlfile autobackup format for device type disk to '/u10/catdb/backup/auto_cntrl_%F';
configure controlfile autobackup on;

# backing up archivelog files

allocate channel c2 type disk;
backup
format '/u10/catdb/backup/cm0_%d_%s_%p_%t'
(archivelog all);
release channel c2;
}

RMAN Script: Deleting archivelogs when a catalog exists.

[call_omr_archflush.scp]
#!/bin/ksh
export ORACLE_HOME="/u01/app/oracle/product/10.2.0"
export ORACLE_SID="omr"
PATH=$PATH:$ORACLE_HOME/bin
rman target sys/password@omr catalog rman/rman1956@rman cmdfile='/u04/rman/rman/scripts/omr_archflush.scp'
exit

[omr_archflush.scp]

# RMAN SCRIPT: DELETING ARCHIVE LOGS
run {
allocate channel c1 type disk;
delete archivelog until time 'SYSDATE-8';
# or: delete archivelog until sequence=;
# or: backup archivelog all delete input;
release channel c1;
}
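Before deleting, it is common to reconcile the repository records against what is actually on disk with a crosscheck, for example:

RMAN> crosscheck archivelog all;
RMAN> delete expired archivelog all;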

RMAN Script: Deleting old archivelogs when no catalog exists.

[call_catdb_archflush.scp]
#!/bin/ksh
export ORACLE_HOME="/u01/app/oracle/product/10.2.0"
export ORACLE_SID="catdb"
PATH=$PATH:$ORACLE_HOME/bin
rman target sys/password@catdb cmdfile='/u04/catdb/scripts/catdb_archflush.scp'
exit
[catdb_archflush.scp]
run {
allocate channel c1 type disk;
delete archivelog until time 'SYSDATE-8';
# or: delete archivelog until sequence=;
release channel c1;
}

RMAN Script: Backing up all the archivelog files

[call_arch_bkup.scp] (this script calls the script arch_bkup.scp)

#!/bin/ksh
export ORACLE_HOME="/u01/app/oracle/product/10.2.0"
export ORACLE_SID="omr"
PATH=$PATH:$ORACLE_HOME/bin
echo rman ARCHIVE backup for CATDB started `date` >> /u07/catdb/rmanbkup.log
rman target sys/password@omr catalog rman/rman@catdb cmdfile='/u04/catdb/scripts/arch_bkup.scp'
echo rman ARCHIVE backup for CATDB ended `date` >> /u07/catdb/rmanbkup.log
exit
[arch_bkup.scp]
run {
allocate channel c1 type disk;
backup
format '/u10/catdb/backup/arch_%d_%s_%p_%t'
(archivelog all);
release channel c1;
# deleting archive logs older than 5 days
allocate channel c2 type disk;
delete archivelog until time 'SYSDATE-5';
release channel c2;
}

Switching Undo Tablespace

CREATE UNDO TABLESPACE "UNDOTBS" DATAFILE '/u05/omr/oradata/undotbs.dbf' SIZE 4000M;

SQL> select name from v$tablespace where name like 'UNDO%';
NAME
--------
UNDOTBS1
UNDOTBS

SQL> ALTER SYSTEM SET undo_tablespace='UNDOTBS' SCOPE=BOTH;
SQL> show parameter undo

NAME             TYPE     VALUE
undo_management  string   AUTO
undo_retention   integer  21600
undo_tablespace  string   UNDOTBS

SQL> select status from v$rollstat;
STATUS
ONLINE
ONLINE
PENDING OFFLINE
PENDING OFFLINE
PENDING OFFLINE
PENDING OFFLINE
ONLINE
ONLINE
ONLINE
ONLINE
ONLINE
ONLINE

If the status is PENDING OFFLINE, you cannot drop the undo tablespace UNDOTBS1:

SQL> drop tablespace undotbs1 including contents and datafiles;

If you drop it while segments are still pending offline, you will get ORA-30013: undo tablespace 'UNDOTBS1' is currently in use.

Note: You can find the following messages in alert.log after issuing the ALTER SYSTEM SET command:
Sat Jul 18 15:38:38 2009
Successfully onlined Undo Tablespace 7.
Undo Tablespace 1 moved to Pending Switch-Out state.
*** active transactions found in undo tablespace 1 during switch-out.
Sat Jul 18 15:46:28 2009
Undo Tablespace 1 successfully switched out.
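To see which undo segments are still holding the old tablespace in PENDING OFFLINE, a join of the standard v$rollname and v$rollstat views works, for example:

SQL> select a.name, b.status from v$rollname a, v$rollstat b where a.usn = b.usn and b.status = 'PENDING OFFLINE';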

Multiplexing the Control File

Multiplexing the Control File When Using an SPFILE

SQL> shutdown immediate
$ cp /u00/app/oracle/oradata/ota/control01.ctl /u04/ota/control04.ctl

SQL> startup nomount
SQL> ALTER SYSTEM SET control_files=
'/u00/app/oracle/oradata/ota/control01.ctl',
'/u00/app/oracle/oradata/ota/control02.ctl',
'/u00/app/oracle/oradata/ota/control03.ctl',
'/u04/ota/control04.ctl'
SCOPE=SPFILE;

SQL> shutdown immediate
SQL> startup
SQL> select name from v$controlfile;
SQL> create pfile from spfile;

Multiplexing the Control File When Using PFILE

$ sqlplus /nolog
SQL> connect / as sysdba
SQL> shutdown immediate

$ cp /u00/app/oracle/oradata/ota/control01.ctl /u0X/ota/control04.ctl

Add this entry in PFILE

control_files = ('/u00/app/oracle/oradata/ota/control01.ctl',
'/u00/app/oracle/oradata/ota/control02.ctl',
'/u00/app/oracle/oradata/ota/control03.ctl',
'/u0X/ota/control04.ctl')

SQL> startup

Frequently used OS commands for DBA

As a DBA you need to use OS commands frequently; here I would like to share some of the important day-to-day commands:

1. To delete files older than N days (useful for deleting log, trace, tmp files):
find . -name '*.*' -mtime +N -exec rm {} \;
Example: find . -mtime +5 -exec rm {} \;
(This command will delete files older than 5 days in that directory.)

2. To list files modified in the last N days:
find . -mtime -N -exec ls -lt {} \;
Example (files modified in the last 3 days): find . -mtime -3 -exec ls -lt {} \;
3. To sort files based on file size:
ls -l | sort -nk 5 | more
(useful to find large files in a log directory to delete in case the disk is full)
4. To find files changed in the last N days:
find . -mtime -N -print
Example: find . -mtime -2 -print

5. To extract a cpio file:
cpio -idmv < file_name
(Don't forget to use the < sign.)

12. To set up a cron job (cron is used to schedule jobs in Unix at the O.S. level):
crontab -l (list current jobs in cron)
crontab -e (edit current jobs in cron)
_1_ _2_ _3_ _4_ _5_ executable_or_job
Where:
1 Minutes (0-59)
2 Hours (0-23)
3 Day of month (1-31)
4 Month (1-12)
5 Day of week (0-6): 0 -> Sunday, 1 -> Monday
e.g. 0 3 * * 6 means run the job at 3AM every Saturday.
This is useful for regularly scheduling tablespace threshold checks, ftp, RMAN backups, removal of old log files, or other scripts.
Sample scheduled backup:
$ crontab -l
Rman Database:

00 20 * * 1,4 /u07/rman/scripts/call_dbbkup_cm0.scp
00 15 * * 6 /u07/rman/scripts/offbkup_rman.sh
00 20 * * 0,2,3,6 /u07/rman/scripts/call_arch_bkup.scp

OTA Database:

50 23 * * 0,2,3,6 /u01/ota/dailyexp_ota.sh
50 23 * * 1,4 /u01/ota/offbkup_ota.sh
15 14 * * 0,1,2,3,4,6 /u01/ota/morning_arch.sh

How to kill all similar processes with a single command (in this case opmn):
ps -ef | grep opmn | grep -v grep | awk '{print $2}' | xargs -i kill -9 {}

Locating files under a particular directory:
find . -print | grep -i test.sql

Using AWK in UNIX. To extract a specific column from the output of a UNIX command, for example to determine the UNIX process ids of all Oracle processes on the server (second column):
ps -ef | grep -i oracle | awk '{ print $2 }'

Changing the standard prompt for Oracle users. Edit the .profile for the oracle user:
PS1="`hostname`*$ORACLE_SID:$PWD>"

Display the top 10 CPU consumers using the ps command:
/usr/ucb/ps auxgw | head -11

Show the number of active Oracle dedicated connection users for a particular ORACLE_SID:
ps -ef | grep $ORACLE_SID | grep -v grep | grep -v ora_ | wc -l

Display the number of CPUs in Solaris:
psrinfo -v | grep "Status of processor" | wc -l

Display the number of CPUs in AIX:
lsdev -C | grep Process | wc -l

Display RAM memory size on Solaris:
prtconf | grep -i mem

Display RAM memory size on AIX. First determine the name of the memory device:
lsdev -C | grep mem
then, assuming the name of the memory device is mem0:
lsattr -El mem0

Swap space allocation and usage:
Solaris: swap -s or swap -l
AIX: lsps -a

Total number of semaphores held by all instances on the server:
ipcs -as | awk '{sum += $9} END {print sum}'

View allocated RAM memory segments:
ipcs -pmb

Manually deallocate shared memory segments:
ipcrm -m '' (supply the shared memory id between the quotes)

Show mount points for a disk in AIX:
lspv -l hdisk13

Display the amount of occupied space (in KB) for a file or collection of files in a directory or sub-directory:
du -ks * | sort -n | tail

Display total file space in a directory:
du -ks .

Clean up any unwanted trace files more than seven days old:
find . -name '*.trc' -mtime +7 -exec rm {} \;

Locate Oracle files that contain certain strings:
find . -print | xargs grep rollback

Locate recently created UNIX files (in the past one day):
find . -mtime -1 -print

Finding large files on the server (more than 100 MB in size; find counts in 512-byte blocks, and 204800 blocks = 100 MB):
find . -size +204800 -print

Crontab:
To submit a task every Tuesday (day 2) at 2:45PM:
45 14 * * 2 /opt/oracle/scripts/tr_listener.sh > /dev/null 2>&1

To submit a task to run every 15 minutes on weekdays (days 1-5):
0,15,30,45 * * * 1-5 /opt/oracle/scripts/tr_listener.sh > /dev/null 2>&1

To submit a task to run every hour at 15 minutes past the hour on weekends (days 6 and 0):
15 * * * 0,6 /opt/oracle/scripts/tr_listener.sh > /dev/null 2>&1

CLUSTER ADMINISTRATION

OCR Updation:
Three utilities can perform OCR updates:
1. SRVCTL (recommended) - remote administration utility
2. DBCA (till 10.2)
3. OEM
SRVCTL - Service Control:
- It is the most widely used utility in a RAC environment.
- It is used to perform administration & control of the OCR file.

Registry sequence of services into OCR:
1. Node applications (automatically done in 11.2)
2. ASM instances (automatically done in 11.2)
3. Databases
4. Database instances
5. Database services

Note: To unregister, follow the reverse order.
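As an illustration of this registration order, the following is a minimal sketch using srvctl with hypothetical names (database ORCL, instances ORCL1/ORCL2, nodes node1/node2, service OLTP); the flags follow the standard 10g/11g srvctl syntax:

# Register the database, then its instances, then a service
srvctl add database -d ORCL -o /u01/app/oracle/product/11.2.0/dbhome_1
srvctl add instance -d ORCL -i ORCL1 -n node1
srvctl add instance -d ORCL -i ORCL2 -n node2
srvctl add service -d ORCL -s OLTP -r ORCL1,ORCL2
# To unregister, go in the reverse order
srvctl remove service -d ORCL -s OLTP
srvctl remove instance -d ORCL -i ORCL2
srvctl remove instance -d ORCL -i ORCL1
srvctl remove database -d ORCL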

OLR - Oracle Local Registry

- Both the OLR & the GPNP profile are needed by the lower/HAS stack, whereas the OCR & VD are needed by the upper/CRS stack.
- If the OLR or GPNP profile gets corrupted, the corresponding node goes down, whereas if the OCR or VD gets corrupted the complete cluster goes down.
- Every daemon of the node communicates with the peer (same) daemon on the other nodes.
- Oracle automatically performs an OLR backup at the time of execution of the root.sh script during Grid Infrastructure installation & stores it in the location $GRID_HOME/cdata/<hostname>/backup_<date>_<time>.olr.
- The default location of the OLR file is $GRID_HOME/cdata/<hostname>.olr.

OLR Backup (using the root user):

$G_H# ./ocrconfig -local -manualbackup
$G_H# ./ocrconfig -local -backuploc <new_location>
$G_H# ./ocrcheck -local

Restoring OLR:
- Bring the init level to either init 1 or init 2
- Stop the cluster on the specific node
- Restore the OLR from the backup location: # ./ocrconfig -local -restore <backup_file>
- Start the cluster
- Change the init level back to either 3 or 5 (init 3 for CLI and init 5 for GUI mode)
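Put together, a restore session might look like the sketch below (run as root; the node name node1 and the backup file name are hypothetical):

# init 1                                     (drop to single-user level)
# $GRID_HOME/bin/crsctl stop crs             (stop the cluster stack on this node)
# $GRID_HOME/bin/ocrconfig -local -restore $GRID_HOME/cdata/node1/backup_20100101_120000.olr
# $GRID_HOME/bin/crsctl start crs            (start the cluster stack)
# init 3                                     (back to multi-user CLI level)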

OCR - Oracle Cluster Registry (or Repository)

It is a critical & shared clusterware file that contains the complete cluster information, such as the cluster node names, their corresponding IPs, CSS parameters, OCR autobackup information & registered resources like nodeapps, ASM instances with their corresponding node names, databases, database instances & database services.

The CRSD daemon is responsible for updating the OCR file whenever utilities like srvctl, dbca, oem, netca etc. make configuration changes.

The CRSD daemon automatically brings online all the cluster resources that are registered in the OCR file.

To know the OCR location:
# ./ocrcheck (displays the OCR location)
# cat /etc/oracle/ocr.loc (in Linux & HP-UX)
# cat /var/opt/oracle/ocr.loc (in Solaris & IBM-AIX)

OCR backup methods - 3 ways to perform a backup:
1. Automatic
2. Physical
3. Logical

1. Automatic:
Oracle automatically performs an OCR backup at a regular interval of 4 hours from the CRS start time and stores it on the master node.

Identifying the master node:
# vi $G_H/log/<hostname>/crsd/crsd.log
and search for either of the following messages:

I AM THE NEW OCR MASTER
OR
THE NEW OCR MASTER NODE IS <node_name>

Backup location: $G_H/cdata/<cluster_name>/
Backup00.ocr (latest)
Backup01.ocr
Backup02.ocr
Day.ocr
Week.ocr

Oracle retains the latest three 4-hourly backups, plus one latest daily backup and one latest weekly backup, purging all the remaining backups.
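These retained backups can be listed from the command line; as root from $GRID_HOME/bin, a quick check looks like:

# ./ocrconfig -showbackup    (lists the automatic and manual OCR backups with node name and timestamp)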

Note: It is not possible to change the automatic backup interval.

Manual Backup:

# ./ocrconfig -manualbackup (creates a backup in the default location $G_H/cdata/<cluster_name>/backup_<date>_<time>.ocr)
# ./ocrconfig -backuploc <new_location> (a shared storage location is recommended)

Restoring OCR:
- Stop the complete cluster on all the nodes: # ./crsctl stop crs
- Identify the latest backup (backup00.ocr)
- Restore the backup: # ./ocrconfig -restore <backup_file>
- Start the cluster on all the nodes
- Check the integrity of the restored OCR: # ./cluvfy comp ocr -n all -verbose

2. Physical backup:
Oracle supports image or sector level backup of the OCR using the dd utility (if the OCR is on raw devices) & cp (if the OCR is on a general file system).

# cp <ocr_file> <backup_location>
# dd if=<ocr_raw_device> of=<backup_file> (if: input file, of: output file)

Restoring:
# cp <backup_file> <ocr_location>
# dd if=<backup_file> of=<ocr_raw_device>

3. Logical backup:
# ./ocrconfig -export <file_name>
# ./ocrconfig -import <file_name>
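For example, a logical backup taken before a configuration change can be imported back if the change has to be rolled back (the file name below is hypothetical):

# ./ocrconfig -export /shared/backup/ocr_before_addnode.exp    (take the logical backup before the change)
# ./ocrconfig -import /shared/backup/ocr_before_addnode.exp    (import it back, with CRS stopped on all nodes, if a rollback is needed)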

Note: Oracle recommends taking a backup of the OCR file whenever the cluster configuration is modified (e.g. adding or deleting a node).

OCR Multiplexing:
To avoid losing the OCR and having the complete cluster go down due to a single point of failure (SPF), Oracle supports OCR multiplexing: from 10.2 onwards in a maximum of 2 locations (1 primary, the other a mirror copy), and from 11.2 onwards in a maximum of 5 locations (1 primary and the remaining as mirror copies).

Note: From 11.2 onwards, Oracle supports storage of the OCR in ASM diskgroups, which provides mirroring depending on the redundancy level.

GPNP - Grid Plug n Play Profile:

- It contains basic cluster information like the location of the voting disk, the ASM spfile location, and all the IP addresses with their subnet masks.
- This is a node-specific file.
- It is an XML-formatted file.
Backup loc: $G_H/gpnp/<hostname>/profiles/peer/profile.xml
Actual loc: $G_H/gpnp/profiles/peer/profile.xml

Voting Disk (VD):

- It is another critical & shared file which contains the node membership information of all the nodes within the cluster.
- The CSSD daemon is responsible for sending heartbeat messages to the other nodes every 1 sec and writing the responses into the VD.

VD Backup:
- Oracle supports only the physical method to take a backup of the VD.
- From 11.2 onwards, Oracle does not recommend taking a backup of the VD because it automatically maintains the VD backup inside the OCR file.

Restoring VD:
1. Stop the CRS on all the nodes
2. Restore the VD: # ./crsctl restore vdisk
3. Start the CRS on all the nodes
4. Check the integrity of the restored VD: # ./cluvfy comp vdisk -n all -verbose

VD Multiplexing:
To avoid losing the VD and having the complete cluster go down due to SPF of the VD, Oracle supports multiplexing of the VD: from 10.2 onwards in a maximum of 31 locations, and from 11.2 in a maximum of 15 locations.

Node Eviction:

It is the process of automatically rebooting a cluster node due to private network or VD access failure, to avoid data corruption. If node1 & node2 can communicate with each other but not with node3 through the private network, a split-brain syndrome can occur: 2 sub-clusters form and try to master the same resource, thereby causing data corruption. To avoid this split-brain syndrome, the master node evicts the affected node based on the node membership (heartbeat) information in the VD.

CSS Parameters:
1. Misscount (default 30 sec): the maximum private network latency to wait before the master node triggers the node eviction process.
2. Disktimeout (default 200 sec): the maximum VD access latency; if elapsed, the master node triggers the node eviction process.
3. Reboottime (default 3 sec): the affected node waits until the reboot time has elapsed before the actual node reboot (this is to let 3rd-party applications go down properly).
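These values can be inspected (and, with care, changed) with crsctl; a quick check as root from $GRID_HOME/bin might look like the sketch below. Changing misscount is shown for illustration only and should normally be done only under Oracle Support guidance:

# ./crsctl get css misscount      (shows the current misscount, e.g. 30)
# ./crsctl get css disktimeout    (shows the current disktimeout, e.g. 200)
# ./crsctl get css reboottime     (shows the current reboottime, e.g. 3)
# ./crsctl set css misscount 45   (example only)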

CLUSTER COMPONENTS

OHASD - Oracle High Availability Services Daemon:
- It is the first daemon started by the parent init process, and in turn it is responsible for starting the other agents & daemons by reading the OLR (Oracle Local Registry) file.
- It needs access to the OLR file, which contains the startup sequence of the other child daemons.

CRSD - Cluster Ready Services Daemon:
- It is responsible for maintaining the cluster configuration and HA (high availability) operations by reading the OCR (Oracle Cluster Registry) file.
- The OCR file contains the complete cluster information required by the CRSD daemon.

CSSD - Cluster Synchronization Services Daemon:
- It is responsible for updating the node membership of all the nodes within the cluster into the VD (Voting Disk).
- In a non-RAC environment, the CSSD daemon is responsible for maintaining the communication between the ASM instance and its ASM client database instances.

VD - Voting Disk:
- It contains the updated node membership information of all the cluster nodes.
- Both the OCR & VD require about 280 MB of space each.

EVMD - Event Manager Daemon:
It is responsible for publishing & subscribing the events generated by the CRSD daemon to the other nodes.

OCTSSD - Oracle Cluster Time Synchronization Services Daemon:
- It is responsible for maintaining time consistency across the cluster nodes.
- It has two modes:
  o Observer: if NTP (Network Time Protocol) is enabled
  o Active: if NTP is disabled

GPNPD - Grid Plug n Play Daemon:
It is responsible for maintaining and distributing the GPNP profile among the nodes of the cluster.

GSD - Global Services Daemon:
From 10g it is deprecated; it was responsible for performing administrative tasks whenever GUI applications like NETCA or DBCA were invoked.

ONS - Oracle Notification Server:
It is responsible for publishing notification events through FAN (Fast Application Notification).

VIP - Virtual IP:
- It is registered as a resource into the OCR and its status is maintained in the OCR.
- From 11g Release 2 onwards, every node requires one private IP on one subnet, plus one public IP & one VIP on another subnet, and the cluster requires 3 unused SCAN VIPs on the same subnet as the public IPs.
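As an illustration of this addressing scheme, a minimal /etc/hosts sketch for a 2-node cluster with hypothetical names and addresses might look like the following (in practice the SCAN name is usually resolved through DNS to its 3 VIPs rather than through /etc/hosts):

# Public IPs (one subnet)
192.168.10.11   node1        node1.example.com
192.168.10.12   node2        node2.example.com
# Virtual IPs (same subnet as the public IPs)
192.168.10.21   node1-vip    node1-vip.example.com
192.168.10.22   node2-vip    node2-vip.example.com
# Private interconnect IPs (separate subnet)
10.0.0.11       node1-priv
10.0.0.12       node2-priv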

Types of Storage:

NAS - Network Attached Storage: it supports file-level I/O.

SAN - Storage Area Network: it supports block-level I/O.


iSCSI - Internet Small Computer System Interface

I/O performance: NAS < iSCSI < SAN


Types of Clusters:

1. Operating system level
2. Hardware level
3. Network level
4. Application level
   a. Failover cluster
   b. Parallel cluster
   c. Hybrid cluster

Failover Cluster:

Ex: VCS (VERITAS Cluster Server)

Disadvantages of a failover cluster:
1. More downtime for the users
2. No load balancing
3. Wastage of resources
4. At most it supports a 2-node cluster setup

Parallel cluster or scalable high performance cluster:

In 9i, it supports 67 nodes per cluster
In 10g, it supports 100 nodes per cluster
In 11g, it supports 100+ nodes per cluster

Advantages:
1. Zero or negligible downtime for the users
2. Better load balancing
3. No wastage of resources
4. It supports multiple nodes per cluster
5. Best example: Oracle RAC

Hybrid Cluster:

It is a combination of both parallel & failover clusters.
Ex: Data Guard in a RAC environment

Types of Cluster Software:

Type                                       Level                      Vendor

1. HP Service Guard                        Kernel level               HP
2. Sun Cluster                             Kernel level               Oracle (Sun)

3. VERITAS Cluster (failover + parallel)   User / Application level   Symantec
4. Oracle RAC (truly parallel)             User / Application level   Oracle

Real Application Cluster - RAC

Global cache is maintained by:
1. In 8i, OPS (Oracle Parallel Server)
2. From 9i, it is maintained by the GRD (Global Resource Directory)
Global Resource Directory (GRD) = Global Cache Services (GCS) + Global Enqueue Services (GES)
GRD = GCS + GES

RAC is not software; it is a concept in which multiple instances (each on a separate node) can access a common (single) database.

Advantages of RAC:
1. It offers SPAM:
   a. S - Scalability
   b. P - Performance
   c. A - Availability
   d. M - Manageability
2. It supports HA (high availability) operations for services like VIPs, SCAN IPs etc.
3. Automatic error detection
4. Automatic restart of failed services
5. It supports TAF - Transparent Application Failover

Cluster Components:

1. Clusterware software
2. Private interconnect
3. Shared storage

Clusterware software:

It coordinates and manages all the cluster nodes of a cluster by treating all the nodes as a single large logical server

Version - Name of Cluster S/W

9i: Oracle Cluster Manager (supports only Linux/Windows)

10.1: Oracle CRS (Cluster Ready Services)
a. Supports all the O/Ss
b. Mandatory for building a 10g RAC setup

10.2: Oracle CRS is renamed to ORACLE CLUSTERWARE
a. Supports HA operations
b. High performance

11.2:
a. Grid Infrastructure
b. Combination of Clusterware binaries and ASM binaries

Oracle Homes:
10.1: ASM_HOME (ASM), CRS_HOME (Clusterware), ORACLE_HOME (RDBMS)

10.2: CRS_HOME (Clusterware), ORACLE_HOME (RDBMS + ASM)

11.2: GRID_HOME (Clusterware + ASM), ORACLE_HOME (RDBMS)

High Availability

Availability:

1. Low Availability: some downtime + data loss (incomplete recovery)
2. Medium Availability: some downtime + no data loss (complete recovery)
3. High Availability: no downtime + no data loss (complete recovery)

Low Availability:
ICR (incomplete recovery) - e.g. loss of redo log files or the current control file.

Medium Availability:
It is a complete recovery; example: Data Guard, with its three protection modes:
Max. Performance: this is the default mode.
Max. Availability: guarantees data protection without compromising database availability (it falls back to Max. Performance behaviour if the standby becomes unreachable).
Max. Protection: guarantees zero data loss (data availability) by compromising database availability (the primary shuts down if it cannot write its redo to a standby).
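For reference, the protection mode is switched on the primary with a single statement; a quick sketch from SQL*Plus (mode names follow the standard Data Guard syntax):

SQL> alter database set standby database to maximize availability;
SQL> select protection_mode, protection_level from v$database;   (verify the new mode)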

High Availability:
It means no downtime & no data loss. Example: RAC (it gives the solution for an instance crash).
Data Guard: data availability
RAC: instance availability
MAA - Maximum Availability Architecture = Data Guard + RAC

Oracle ASM disk failure - Part 1
Introduction
Oracle Automatic Storage Management (ASM) was introduced in Oracle 10g. ASM provides advanced storage management features such as disk I/O re-balancing, volume management and easy database file name management. It can also provide MIRRORING of data for high availability and redundancy in the event of a disk failure (mirroring is optional). ASM guarantees that data extents (table and index row data etc.) on one disk are mirrored on another disk (normal redundancy) or on two other disks (high redundancy).

A few times I have faced ASM disk failures when redundancy (mirroring) was enabled, and none of them resulted in an issue for an end user. ASM automatically detects the disk failure and serves Oracle SQL requests by retrieving information from the mirrored (other) disk. Such a failure is handled gracefully and entirely managed by Oracle. I am very impressed by the fault tolerance capability in ASM.

But soon the Oracle DBA must work with the system administrator to replace the failed disk. If the mirrored disk also fails before the replacement, then SQL statements issued by end users will fail because both the primary and mirrored disks have failed.

This post assumes that you are using ASM redundancy (Normal or High) and that you are not using the ASMLib program. The commands and syntax could be different if you are using ASMLib.

How to identify a failed disk

An ASM disk failure, as noted below, is transparent to end users, and one can be caught unaware if one is not proactive in database monitoring. The DBA can write a program that constantly checks the database alert log file, or a SQL script that checks for any read/write errors.

If either of the below queries return rows, then it is confirmed there are one or more ASM disks that have failed.

select path, name, mount_status, header_status
from v$asm_disk
where write_errs > 0;

select path, name, mount_status, header_status
from v$asm_disk
where read_errs > 0;

But despite the read/write errors, the header_status column value may still be shown as "MEMBER".
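Following the monitoring suggestion above, here is a minimal sketch of such a check as a bash script with an embedded SQL*Plus call. The instance name, Oracle home and mail recipient are hypothetical; the script assumes it runs on the server hosting the ASM instance:

#!/bin/bash
# check_asm_disks.sh - hypothetical sketch: alert when any ASM disk reports I/O errors
export ORACLE_SID=+ASM                                       # hypothetical ASM instance name
export ORACLE_HOME=/u01/app/oracle/product/11.1.0/asm        # hypothetical Oracle home
BAD=$($ORACLE_HOME/bin/sqlplus -s "/ as sysdba" <<EOF
set heading off feedback off pages 0
select path from v\$asm_disk where read_errs > 0 or write_errs > 0;
EOF
)
# Mail the DBA if any disk path was returned
if [ -n "$BAD" ]; then
    echo "ASM disks with I/O errors: $BAD" | mailx -s "ASM disk failure" dba@example.com
fi

Scheduled from cron every few minutes, this catches a failure that would otherwise stay invisible to end users.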

Drop the failed disk

1) alter diskgroup #name# drop disk #disk name#;
Caution: Do NOT physically remove the failed disk YET from the disk enclosure of the server. The above command is executed immediately, but ASM also starts a lengthy re-balance operation. The disk should be physically removed only after the header_status for the failed disk becomes FORMER. This status is set after the re-balance operation is completed. One can monitor the progress of the re-balance operation by checking v$asm_operation.

select state, power, group_number, est_minutes
from v$asm_operation;

After a few minutes/hours the above operation will be completed (no rows returned). Then verify that the header_status is now FORMER, and then request the system administrator to physically remove the disk from the disk enclosure. The LED light for the failed disk should turn off, which indicates the physical location of the failed disk in the enclosure.

Add the replacement disk

1) Get the replacement device name, partition it and change ownership to the database owner. For example, let the disk path after partitioning be /dev/sdk1.
2) select distinct header_status from v$asm_disk where path = '/dev/sdk1';
(must show as CANDIDATE)

3) alter diskgroup #name# add disk '/dev/sdk1';
4) ASM starts the re-balancing operation due to the above disk add command. One can monitor the progress of the re-balance operation by checking v$asm_operation.

select state, power, group_number, est_minutes
from v$asm_operation;

After a few minutes/hours the above operation gets completed (no rows returned).

5) The disk add operation is now considered complete.

How to decrease the ASM re-balance operation time
While the ASM re-balancing operation is in progress, the DBA can make it complete more quickly by changing the 'ASM power', for example by running the below command.

alter diskgroup #name# rebalance power 8;

The default power is 1 (i.e. ASM starts one re-balance background process, called an ARB process, to handle the re-balancing work). The above command dynamically starts 8 ARB processes (ARB0 to ARB7), which can dramatically decrease the time to re-balance. The maximum power limit in 11g R1 is 11 (up to 11 ARB processes can be started).
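One quick way to confirm the extra ARB processes have actually started is to look for them at the O/S level (a sketch; the exact process-name prefix varies by Oracle version and instance name):

ps -ef | grep -i arb | grep -v grep    (should show ARB0 .. ARB7 while the re-balance runs)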

Conclusion

None of the above maintenance operations (disk drop, disk add) causes downtime to the end user, and therefore they can be completed during normal business hours. The re-balance operation can cause a slight degradation of performance; hence, increase the power limit to let it complete quickly.

Oracle ASM disk failure - Part 2

Introduction

In Part 1, I wrote about a scenario where ASM detects READ_ERRS/WRITE_ERRS and updates these columns in v$asm_disk for the failed ASM disk; the DBA then has to explicitly drop the disk in ASM. This article is about a different scenario in which the ASM instance itself performs the 'drop disk' operation.

This post assumes that you are using ASM redundancy (Normal or High) and that you are not using the ASMLib program. The commands and syntax could be different if you are using ASMLib.

Scenario

In this scenario, ASM drops the disk automatically. Furthermore, READ_ERRS/WRITE_ERRS in v$asm_disk could be showing a value of NULL (instead of an actual count of READ or WRITE errors noticed).

How to identify the failed disk

Unlike scenario 1 discussed in Part 1 of this ASM series, the ASM instance can initiate the 'drop disk' by itself in some situations. Let the failed disk be '/dev/sds1'.

select path
from v$asm_disk
where read_errs is NULL;

/dev/sds1

select path
from v$asm_disk
where write_errs is NULL;

/dev/sds1

Additionally, the HEADER_STATUS in v$asm_disk returns a value of UNKNOWN.

select mount_status, header_status, mode_status, state
from v$asm_disk
where path = '/dev/sds1';

CLOSED UNKNOWN ONLINE NORMAL

Compare this scenario with the scenario mentioned in Part 1, where the HEADER_STATUS is still shown as MEMBER and READ_ERRS/WRITE_ERRS have a value > 0. The following are the errors mentioned in the +ASM alert log file when the failure was first noticed.

WARNING: initiating offline of disk
NOTE: cache closing disk
WARNING: PST-initiated drop disk

ORA-27061: waiting for async I/Os failed
WARNING: IO Failed. subsys:System dg:0, diskname:/dev/sds1

No "drop disk" command required by DBA

The disk has already been dropped by the ASM instance. There is no need for an "alter diskgroup ... drop disk" command again. Instead, the DBA has to work with the system administrator to physically locate the failed disk in the disk enclosure and remove it.

Add the replacement disk

1) Get the replacement/new device name, partition it and change ownership to the database owner. For example, let the disk path after partitioning be /dev/sdk1.
2) select distinct header_status from v$asm_disk where path = '/dev/sdk1';
(must show as CANDIDATE)

3) alter diskgroup #name# add disk '/dev/sdk1';
4) ASM starts the re-balancing operation due to the above disk add command. One can monitor the progress of the re-balance operation by checking v$asm_operation.

select state, power, group_number, est_minutes
from v$asm_operation;

After a few minutes/hours the above operation gets completed (no rows returned).

5) The disk add operation is now considered complete.

How to decrease the ASM re-balance operation time

While the ASM re-balancing operation is in progress, the DBA can make it complete more quickly by changing the 'ASM power', for example by running the below command.

alter diskgroup #name# rebalance power 8;

The default power is 1 (i.e. ASM starts one re-balance background process, called an ARB process, to handle the re-balancing work). The above command dynamically starts 8 ARB processes (ARB0 to ARB7), which can dramatically decrease the time to re-balance. The maximum power limit in 11g R1 is 11 (up to 11 ARB processes can be started).

Conclusion

I am not exactly sure why ASM shows the status of a failed disk in different ways, but these are the two scenarios that I am aware of so far. None of the above maintenance operations (failed disk removal from the disk enclosure, new disk add) causes downtime to the end user, and therefore they can be completed during normal business hours. The re-balance operation can cause a slight degradation of performance; hence, increase the power limit to let it complete quickly.

Oracle DBA Activities

1) What is a typical day at your job?

I start my day checking system alerts such as database performance problems, backup failures etc. We are using Oracle's Enterprise Manager, a web-based software that sends email alerts to us automatically whenever it detects a problem based on certain criteria. I spend most of the day working on current projects such as database upgrades, migrations, new installations etc. I also help application developers and end users whenever they have a database-related question or problem.

2) What does it take to be a successful Oracle DBA?

Most of today's e-business and IT applications are entirely web-based, and hence the underlying databases have to be highly available 24*7. Responsibility, a proactive attitude and emergency preparedness are some of the key characteristics that make a successful Oracle DBA. IT application developers and the end-user communities rely heavily on the database administrator for their day-to-day database issues, questions and projects. An Oracle DBA should be polite and must treat everyone in the organization with courtesy and respect.

3) Has your job description evolved over time?

Yes indeed! The definition of an Oracle DBA has a much broader scope today. I started with just "database work" in my first job. Today my responsibilities include Oracle systems design and architecture, including Oracle E-Business Suite administration, Oracle Application Server setup and administration, setting up of Continuity of Business systems (disaster recovery preparedness), and setup and administration of Oracle Fusion Middleware components such as Oracle Portal Server, Identity Management etc. I am also expected to work on hardware specifications and requirements for upgrading existing Oracle installations or setting up new ones. Whereas the traditional "Oracle DBA" designation has remained the same, it has a much wider scope and responsibility today.

4) How do you keep up with new features, changes & advancements in database technology?

Every major Oracle database release comes with a lot of exciting new features which can be leveraged for simplicity, automation or better database management.

a) I am an avid reader of the bi-monthly Oracle Magazine. The subscription is free and it is available online as well. The magazine covers the latest in Oracle and contains a lot of expert articles with a practical outlook for tackling business problems.

b) I have also subscribed to RSS feeds on http://otn.oracle.com/ so that I get updated whenever there is a new knowledge-based article. This is a popular site for the Oracle community, and most of the technology articles are posted by Oracle ACEs and Oracle ACE Directors, who are proven individuals recognized by Oracle Corporation.

c) I also recommend aspiring DBAs to register in the official Oracle Forum; thanks to the many experts who generously contribute to this discussion board, virtually any of your database-related questions can get answered there.

5) What is the best feature you like about Oracle DB? What needs improvement compared to other databases in the market?

My favorite Oracle database feature is Real Application Clusters (RAC). Using RAC technology, Oracle databases can be set up for high availability and virtually unlimited scalability. I did not get a chance to fully evaluate other databases in the market vis-a-vis the Oracle database. Oracle is the recognized industry leader as per various results published by market research companies such as IDC and Gartner.

6) Have any of the following major macro trends affected you personally? What's your opinion?

a. Outsourcing & Offshoring

No. Oracle DBA is one of the few jobs that has had a lesser impact from outsourcing. A DBA is critical to the success of an IT department, requiring a lot of technical understanding, emotional maturity, and the ability to handle pressure and crisis, and it is a role that comes with a lot of responsibility. In fact, all the Dice Reports this year show the Oracle database as one of the top technology skills in the market in the USA.

b. Virtualization

Remote service and tele-commuting are only for low-profile work such as after-hours support etc. Most managers prefer Oracle DBAs to work onsite and with direct supervision.

c. Moving from client-server to web-based

The Oracle DBA is usually less impacted by client-server to web-based migrations. Oracle databases can work with both client-server systems and web-based systems.

7) Your advice to people who are evaluating Oracle DB administration as a career.

The IT industry is facing a shortage of quality Oracle DBAs. Oracle database administration is a good career option with long-term benefits. I have been working as an Oracle database administrator for more than 6 years, and the experience is very rewarding. It has also given me the confidence to architect and build large-scale IT systems. I was able to positively impact the experience of the end-user community and positively contribute to various IT departments.