
Remove a Node from an Existing Oracle RAC 10g R1 Cluster on Linux - (FireWire)

by Jeff Hunter, Sr. Database Administrator

Contents

1. Overview
2. Remove the Instance
3. Remove the Node from the Cluster

Overview

With any RAC configuration, it is common for the DBA to encounter a scenario where he or she needs to remove a node from the RAC environment. It may be that a server is being underutilized in the cluster and could be better used in another business unit. Another scenario is a node failure. In this case, a node can be removed from the cluster while the remaining nodes continue to service ongoing requests.

This document is an extension to two articles: "Building an Inexpensive Oracle10g RAC Configuration on Linux - (WBEL 3.0)" and "Adding a Node to an Oracle10g RAC Cluster - (WBEL 3.0)". Contained in this document are the steps to remove a single node (the third node I added in the second article) from an already running and configured Oracle10g RAC environment.

This article assumes the following:

1. Three-node Oracle10g Environment: As I noted previously, this article assumes that the reader has already built and configured a three-node Oracle10g RAC environment. This system would consist of a three-node cluster (each node with a single processor), all three running Linux (White Box Enterprise Linux 3.0 Respin 1 or Red Hat Enterprise Linux 3) with shared disk storage based on IEEE1394 (FireWire) drive technology.

2. Node to be Removed is Available: The node to be removed in this example is available and running within the cluster. Of the three nodes in the current RAC configuration, I will be removing linux3.

3. FireWire Hub: The enclosure for the Maxtor One Touch 250GB USB 2.0 / Firewire External Hard Drive has only two IEEE1394 (FireWire) ports on the back. To configure a three-node cluster, I needed to purchase a FireWire hub. The one I used for this article is a BELKIN F5U526-WHT White External 6-Port Firewire Hub with AC Adapter.


This document provides the steps for removing a node's metadata from the cluster registry. The node being removed can easily be added back to the cluster at a later time.

If a node needs to be removed from an Oracle10g RAC database, even if the node will no longer be available to the environment, there is a certain amount of cleanup that needs to be done. The remaining nodes need to be informed of the change of status of the departing node.

The three most important steps, each of which will be discussed in this article, are:

1. Remove the instance using DBCA (preferred) or command-line (using srvctl).

2. Remove the node from the cluster.

3. Reconfigure the OS and remaining hardware.

For the purpose of this example, I have a three-node Oracle10g cluster:

Oracle10g RAC Configuration

Node Name   IP Address      Instance Name   Using ASM   ASM Instance Name   Status
linux1      192.168.1.100   orcl1           Yes         +ASM1               Available
linux2      192.168.1.101   orcl2           Yes         +ASM2               Available
linux3      192.168.1.107   orcl3           Yes         +ASM3               To be removed

I will be removing node linux3, along with all metadata associated with it. Most of the operations to remove the node from the cluster will need to be performed from a pre-existing node that is available and will remain in the cluster. For this article, I will be performing all of these actions from linux1 to remove linux3.

Remove the Instance

When removing a node from an Oracle10g RAC cluster, the DBA will first need to remove the instance that is (or was) accessing the clustered database. This includes the ASM instance if the database is making use of Automatic Storage Management. Most of the actions to remove the instance need to be performed on a pre-existing node in the cluster that is available and will remain available after the removal.
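Before making any changes, it can help to confirm what the cluster currently knows about the departing node. A minimal sketch, assuming the clustered database is named orcl as in this article (all three are standard srvctl status queries):

$ srvctl status database -d orcl
$ srvctl status asm -n linux3
$ srvctl status nodeapps -n linux3

The first command lists the state of every instance of orcl, while the last two confirm whether the ASM instance and node applications on linux3 are still running before you begin removing them.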


For this section, I will be removing the instance(s) on linux3 and performing all of these operations from linux1:

This section provides two ways to perform the action of removing the instance(s): using DBCA or the command line (srvctl). When possible, always attempt to use the DBCA method.

Using DBCA

The following steps can be used to remove an Oracle10g instance from a clustered database using DBCA - even if the instance on the node is not available.

1. First, verify that you have a good backup of the Oracle Cluster Registry (OCR) using ocrconfig:

$ ocrconfig -showbackup

int-linux1     2005/05/25 10:01:46     /u01/app/oracle/product/10.1.0/crs/cdata/crs
int-linux1     2005/05/25 06:01:45     /u01/app/oracle/product/10.1.0/crs/cdata/crs
int-linux1     2005/05/25 02:01:45     /u01/app/oracle/product/10.1.0/crs/cdata/crs
int-linux1     2005/05/24 00:02:48     /u01/app/oracle/product/10.1.0/crs/cdata/crs
int-linux1     2005/05/23 20:02:47     /u01/app/oracle/product/10.1.0/crs/cdata/crs
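In addition to the automatic backups listed above, you may also want to take a logical export of the OCR just before removing the node. A minimal sketch, run as root; the export file name is only an example:

# ocrconfig -export /u01/app/oracle/ocr_before_remove_linux3.dmp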

2. Next, run the DBCA from one of the nodes you are going to keep. The database should remain up, and the departing instance can be left up and running (if it is available).

$ dbca &

Within the DBCA, perform the following steps:

1. Choose "Oracle Real Application Clusters database" and click [Next].

2. Choose "Instance Management" and click [Next].

3. Choose "Delete an instance" and click [Next].

4. On the next screen, select the cluster database from which you want to remove the instance. You will need to supply a system privilege (SYSDBA) username and password, then click [Next].

5. On the next screen, a list of cluster database instances will appear. Highlight the instance you would like to delete (orcl3 on linux3 in my example) and click [Next].

6. If you have services configured, they will need to be reassigned. Modify the services so that each service can run on one of the remaining instances, and set "Not Used" for the instance that is to be deleted in each service. Click [Finish].

7. Acknowledge the dialog box by clicking [Ok] when asked to confirm you want to delete the selected instance.

8. Acknowledge the second dialog by clicking [Ok] when asked to confirm the DBCA will remove the Oracle instance and all associated OFA directory structure. All information about this instance will be deleted.

  If the database is in archive log mode, the DBA may receive the following errors:

ORA-00350 or ORA-00312

This may occur because the DBCA cannot drop the current log, as it still needs to be archived. This issue is fixed in the 10.1.0.3 patchset. If the DBA encounters this error, click the [Ignore] button and, when the DBCA completes, manually archive the logs for the deleted instance and drop the log group:

SQL> alter system archive log all;
SQL> alter database drop logfile group 3;
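Before dropping a log group by hand, it may be worth confirming which groups still belong to the departing redo thread (thread 3 for the orcl3 instance in this example). A minimal sketch:

SQL> select group#, thread#, bytes, status from v$log where thread# = 3;
SQL> select group#, member from v$logfile where group# in (select group# from v$log where thread# = 3);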

9. After the DBCA has removed the instance, click [No] when prompted to perform another operation. The DBCA will exit.

3. Verify that the redo thread for the dropped instance has been removed by querying v$log:

SQL> select group#, thread#, status from v$log;

    GROUP#    THREAD# STATUS
---------- ---------- ----------------
         1          1 CURRENT
         2          1 INACTIVE
         3          2 CURRENT
         4          2 INACTIVE

If for any reason the redo thread is not disabled then disable the thread:

SQL> alter database disable public thread 3;

4. Verify that the instance was removed from the Oracle Cluster Registry (OCR) using the srvctl config database -d <db_name> command. The following example assumes the name of the clustered database is orcl:

$ srvctl config database -d orcl
linux1 orcl1 /u01/app/oracle/product/10.1.0/db_1
linux2 orcl2 /u01/app/oracle/product/10.1.0/db_1

You should also run the crs_stat command:

$ $ORA_CRS_HOME/bin/crs_stat | grep ins
NAME=ora.orcl.orcl1.inst
NAME=ora.orcl.orcl2.inst

5. If the node had an ASM instance and the node will no longer be a part of the cluster, the DBA should remove the ASM instance using the following, assuming the node being removed is linux3:

$ srvctl stop asm -n linux3
$ srvctl remove asm -n linux3

Verify that the ASM instance was removed using the following:

$ srvctl config asm -n linux3

If the removal of the ASM instance was successful, you should simply get your prompt back with no output. If, however, you receive a record back (e.g. +ASM3 /u01/app/oracle/product/10.1.0/db_1), then the removal of the ASM instance failed.

Using SRVCTL

The following steps can be used to remove an Oracle10g instance from a clustered database using the command-line utility srvctl - even if the instance on the node is not available.

1. First, verify that you have a good backup of the Oracle Cluster Registry (OCR) using ocrconfig:

$ ocrconfig -showbackup

int-linux1     2005/05/25 10:01:46     /u01/app/oracle/product/10.1.0/crs/cdata/crs
int-linux1     2005/05/25 06:01:45     /u01/app/oracle/product/10.1.0/crs/cdata/crs
int-linux1     2005/05/25 02:01:45     /u01/app/oracle/product/10.1.0/crs/cdata/crs
int-linux1     2005/05/24 00:02:48     /u01/app/oracle/product/10.1.0/crs/cdata/crs
int-linux1     2005/05/23 20:02:47     /u01/app/oracle/product/10.1.0/crs/cdata/crs

2. Use the srvctl command-line utility from a pre-existing / available node in the cluster to remove the instance (from the node to be removed) from the cluster. This should be run as the oracle UNIX user account as follows:

$ srvctl remove instance -d orcl -i orcl3
Remove instance orcl3 for the database orcl? (y/[n]) y

Page 6: Remove a Node from an Existing Oracle RAC 10g R1 Cluster on Linux

14. Verify that the redo thread for the dropped instance has been removed by querying v$log. If for any reason the redo thread is not disabled then disable the thread:

SQL> alter database disable public thread 3;

4. Verify that the instance was removed from the Oracle Cluster Registry (OCR) using the srvctl config database -d <db_name> command:

$ srvctl config database -d orcl
linux1 orcl1 /u01/app/oracle/product/10.1.0/db_1
linux2 orcl2 /u01/app/oracle/product/10.1.0/db_1

You should also run the crs_stat command:

$ $ORA_CRS_HOME/bin/crs_stat | grep ins
NAME=ora.orcl.orcl1.inst
NAME=ora.orcl.orcl2.inst

5. If the node had an ASM instance and the node will no longer be a part of the cluster, the DBA should remove the ASM instance using the following, assuming the clustered database is named orcl and the node being removed is linux3:

$ srvctl stop asm -n linux3
$ srvctl remove asm -n linux3

Verify that the ASM instance was removed using the following:

$ srvctl config asm -n linux3

Remove the Node from the Cluster

Now that the instance has been removed (and the ASM instance, if applicable), we need to remove the node from the cluster. This is a manual method performed using scripts that need to be run on the deleted node (if available) to remove the CRS install, as well as scripts that should be run from one of the existing nodes (i.e. linux1).

Before proceeding to the steps for removing the node, we need to determine the node name and the CRS-assigned node number for each node stored in the Oracle Cluster Registry. This can be run from any of the existing nodes (linux1 for this example).

$ $ORA_CRS_HOME/bin/olsnodes -n
linux1 1
linux2 2
linux3 3

Now that we have the node name and node number, we can start the steps to remove the node from the cluster. Here are the steps that should be executed from a pre-existing (available) node in the cluster (i.e. linux1):

1. Run the NETCA utility to remove the network configuration:

$ DISPLAY=<machine_or_ip_address>:0.0; export DISPLAY

$ netca &

Perform the following steps within the NETCA:

1. Choose "Cluster Configuration" and click [Next]. 2. Only select the node you are removing and click [Next].

3. Choose "Listener Configuration" and click [Next].

4. Choose "Delete" and delete any listeners configured on the node you are removing. Acknowledge the dialog box to delete the listener configuration.

NOTE: For some reason, I needed to login to linux3 and manually kill the process ID for the listener process.
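A minimal sketch of that manual cleanup on linux3, run as the oracle user (tnslsnr is the listener process name; the actual PID will of course differ):

$ ps -ef | grep tnslsnr | grep -v grep
$ kill -9 <pid_of_tnslsnr_process>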

2. Run the crs_stat command to verify that all database resources are running on nodes that are going to be kept:

$ $ORA_CRS_HOME/bin/crs_stat

For example, verify that the node to be removed is not running any database resources. Look for the record of type:

NAME=ora.<db_name>.db
TYPE=application
TARGET=ONLINE
STATE=ONLINE on <node>

Assuming the name of the clustered database is orcl, this is the record that was returned from the crs_stat command on my system:

NAME=ora.orcl.db
TYPE=application
TARGET=ONLINE
STATE=ONLINE on linux1

I am safe here since the resource is running on linux1 and not linux3 - the node I want to remove.

If, however, the database resource was running on linux3, we would need to relocate it to a node that we are going to keep (i.e. linux1) using the following:

$ $ORA_CRS_HOME/bin/crs_relocate ora.<db_name>.db

3. From a pre-existing node (i.e. linux1), remove the nodeapps from the node you are removing as the root UNIX user account:

$ su
Password: xxxxx

# srvctl stop nodeapps -n linux3
CRS-0210: Could not find resource ora.linux3.LISTENER_LINUX3.lsnr.

# srvctl remove nodeapps -n linux3
Please confirm that you intend to remove the node-level applications on node linux3 (y/[n]) y
#

4. The next step is to update the node list using the updateNodeList option to the OUI as the oracle user. This procedure removes the node to be deleted from the list of node locations maintained by the OUI by listing only the remaining nodes. The only file that I know of that gets modified is $ORACLE_BASE/oraInventory/ContentsXML/inventory.xml. Here is the command I used for removing linux3 from the list. Notice that the DISPLAY variable needs to be set even though the GUI does not run.

$ DISPLAY=<machine_or_ip_address>:0.0; export DISPLAY

$ $ORACLE_HOME/oui/bin/runInstaller -ignoreSysPrereqs -updateNodeList \
    ORACLE_HOME=/u01/app/oracle/product/10.1.0/db_1 \
    CLUSTER_NODES=linux1,linux2

Note that the command above will produce the following error which can safely be ignored:

PRKC-1002 : All the submitted commands did not execute successfully

5. If the node to be removed is still available and running the CRS stack, the DBA will need to stop the CRS stack and remove the ocr.loc file. These tasks should be performed as the root user account and on the node that is to be removed from the cluster. The nosharedvar option assumes the ocr.loc file is not on a shared file system (which is the case in my example). If the file does exist on a shared file system, then specify sharedvar. From the node to be removed (i.e. linux3) and as the root user, run the following:

$ su
Password: xxxx

# cd $ORA_CRS_HOME/install
# ./rootdelete.sh remote nosharedvar
Running Oracle10 root.sh script...
The following environment variables are set as:
    ORACLE_OWNER= oracle
    ORACLE_HOME=  /u01/app/oracle/product/10.1.0/crs
Finished running generic part of root.sh script.
Now product-specific root actions will be performed.
Shutting down Oracle Cluster Ready Services (CRS):
/etc/init.d/init.crsd: line 188: 29017 Aborted    $ORA_CRS_HOME/bin/crsd -2
Shutting down CRS daemon.
Shutting down EVM daemon.
Shutting down CSS daemon.
Shutdown request successfully issued.
Checking to see if Oracle CRS stack is down...
Oracle CRS stack is not running.
Oracle CRS stack is down now.
Removing script for Oracle Cluster Ready services
Removing OCR location file '/etc/oracle/ocr.loc'
Cleaning up SCR settings in '/etc/oracle/scls_scr/linux3'

6. Next, using the node name and CRS-assigned node number for the node to be deleted, run the rootdeletenode.sh command as follows. Keep in mind that this command should be run from a pre-existing / available node (i.e. linux1) in the cluster as the root UNIX user account:

$ su
Password: xxxx

# cd $ORA_CRS_HOME/install
# ./rootdeletenode.sh linux3,3
Running Oracle10 root.sh script...
The following environment variables are set as:
    ORACLE_OWNER= oracle
    ORACLE_HOME=  /u01/app/oracle/product/10.1.0/crs
Finished running generic part of root.sh script.
Now product-specific root actions will be performed.
clscfg: EXISTING configuration version 2 detected.
clscfg: version 2 is 10G Release 1.
Successfully deleted 13 values from OCR.
Key SYSTEM.css.interfaces.nodelinux3 marked for deletion is not there. Ignoring.
Successfully deleted 5 keys from OCR.
Node deletion operation successful.
'linux3,3' deleted successfully

To verify that the node was successfully removed, use the following as either the oracle or root user:

$ $ORA_CRS_HOME/bin/olsnodes -n
linux1 1
linux2 2

7. Now, switch back to the oracle UNIX user account on the same pre-existing node (linux1) and run the runInstaller command to update the OUI node list, this time for the CRS installation ($ORA_CRS_HOME). This procedure removes the node to be deleted from the list of node locations maintained by the OUI by listing only the remaining nodes. The only file that I know of that gets modified is $ORACLE_BASE/oraInventory/ContentsXML/inventory.xml. Here is the command I used for removing linux3 from the list. Notice that the DISPLAY variable needs to be set even though the GUI does not run.

$ DISPLAY=<machine_or_ip_address>:0.0; export DISPLAY

$ $ORACLE_HOME/oui/bin/runInstaller -ignoreSysPrereqs -updateNodeList \
    ORACLE_HOME=/u01/app/oracle/product/10.1.0/crs \
    CLUSTER_NODES=linux1,linux2

Note that each of the commands above will produce the following error which can safely be ignored:

PRKC-1002 : All the submitted commands did not execute successfully


The OUI now contains the valid nodes that are part of the cluster!

8. Now that the node has been removed from the cluster, the DBA should manually remove all Oracle10g RAC installation files from the deleted node. Obviously, this applies only if the removed node is still accessible and only if the files are not on a shared file system that is still being accessed by other nodes in the cluster!

From the deleted node (linux3) I performed the following tasks as the root UNIX user account:

1. Remove ORACLE_HOME and ORA_CRS_HOME:

# rm -rf /u01/app/oracle/product/10.1.0/db_1

# rm -rf /u01/app/oracle/product/10.1.0/crs

2. Remove all init scripts and soft links (for Linux). For a list of init scripts and soft links for other UNIX platforms, see Metalink Note: 269320.1

# rm -f /etc/init.d/init.cssd
# rm -f /etc/init.d/init.crs
# rm -f /etc/init.d/init.crsd
# rm -f /etc/init.d/init.evmd
# rm -f /etc/rc2.d/K96init.crs
# rm -f /etc/rc2.d/S96init.crs
# rm -f /etc/rc3.d/K96init.crs
# rm -f /etc/rc3.d/S96init.crs
# rm -f /etc/rc5.d/K96init.crs
# rm -f /etc/rc5.d/S96init.crs

# rm -Rf /etc/oracle/scls_scr

3. Remove all remaining files:

# rm -rf /etc/oracle
# rm -f /etc/oratab
# rm -f /etc/oraInst.loc
# rm -rf /etc/ORCLcluster
# rm -rf /u01/app/oracle/oraInventory
# rm -rf /u01/app/oracle/product
# rm -rf /u01/app/oracle/admin
# rm -f /usr/local/bin/coraenv
# rm -f /usr/local/bin/dbhome

# rm -f /usr/local/bin/oraenv

4. Remove all CRS/EVM entries from the file /etc/inittab:

h1:35:respawn:/etc/init.d/init.evmd run >/dev/null 2>&1 </dev/null
h2:35:respawn:/etc/init.d/init.cssd fatal >/dev/null 2>&1 </dev/null
h3:35:respawn:/etc/init.d/init.crsd run >/dev/null 2>&1 </dev/null
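If you prefer not to edit /etc/inittab by hand, the same cleanup can be scripted. A minimal sketch, assuming the entries use the init.cssd / init.crsd / init.evmd names shown above and a GNU sed new enough to support -i:

# cp /etc/inittab /etc/inittab.orig
# sed -i '/init\.cssd\|init\.crsd\|init\.evmd/d' /etc/inittab
# init q

The final init q simply tells init to re-read /etc/inittab so the removed entries are no longer respawned.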

It is not very easy to read this output if you have a large number of nodes with lots of resources configured on them. You can use the "-t" option with crs_stat to see the output in a tabular form like:

$ crs_stat -t

However, this output is designed for a fixed terminal width of 60 characters. Hence the resource names are truncated. This makes it even more difficult to see what resource is in which state.

Thankfully, there are some scripts out there that parse the default output of crs_stat and provide a tabular output in a wider form so you can see what you are looking for.

Being a one-liner junkie, I prefer my own version:

crs_stat | awk -F= '/NAME=/{n=$2}/TYPE=/{t=$2}/TARGET=/{g=$2}/STATE=/{s=$2; printf("%-45s%-15s%-10s%-30s\n", n,t,g,s)}'

I also have an alias my_crs_stat for this command so I don't have to type it all the time.

alias my_crs_stat='crs_stat | awk -F= '\''/NAME=/{n=$2}/TYPE=/{t=$2}/TARGET=/{g=$2}/STATE=/{s=$2; printf("%-45s%-15s%-10s%-30s\n", n,t,g,s)}'\'''

This will do the trick and provide a fancier output.

$ my_crs_stat

1. CRS and CRSCTL commands:
* crs_stat
* crs_register
* crs_unregister
* crs_start
* crs_stop
* crs_getperm
* crs_profile
* crs_relocate
* crs_setperm
* crsctl check crsd
* crsctl check cssd
* crsctl check evmd
* crsctl debug log
* crsctl set css votedisk
* crsctl start resources
* crsctl stop resources

2. 10g RAC administration
3. See OCFS Oracle Cluster Filesystem, ASM, TNSnames configuration, Oracle Database 11g New Features, Raw devices
4. Resource Manager, Dbca
5. See http://www.oracle.com/technology/support/metalink/index.html to view the certification matrix

This is just a draft of basic RAC 10g administration

6. RAC benefits and characteristics
- does not protect from human errors
- increased availability from node/instance failure
- speeds up parallel DSS queries
- does not speed up parallel OLTP processes
- no availability increase on data failures
- no availability increase on network failures
- no availability increase on release upgrades
- no scalability increase for application workloads in all cases

RAC tuning - after migration to RAC, test:
- interconnect latency
- instance recovery time
- applications strongly relying on table truncates, full table scans, sequence and non-sequence key generation, global context variables

7. RAC-specific background processes for the database instance

Cluster Synchronization Service (CSS) - ocssd daemon, manages the cluster configuration
Cluster Ready Services (CRS) - crsd daemon, manages resources (listeners, VIPs, Global Service Daemon GSD, Oracle Notification Service ONS); the crsd daemon backs up the OCR every four hours; the configuration is stored in the OCR
Event Manager (EVM) - evmd daemon, publishes events

LMSn - coordinates block updates
LMON - global enqueue for shared locks
LMDn - manages requests for global enqueues
LCK0 - handles resources not requiring Cache Fusion
DIAG - collects diagnostic info

GSD 9i is not compatible with 10g

8. FAN - Fast Application Notification
- Must connect using a service

Logged to:
$ORA_CRS_HOME/racg/dump
$ORA_CRS_HOME/log/<nodename>/racg

FAN event format:
<event_type> VERSION=<n.n> service=<service_name.db_domain_name> [database=<db_unique_name> [instance=<instance_name>]] [host=<hostname>] status=<event_status> reason=<event_reason> [card=<n>] timestamp=<event_date> <event_time>

event_type       Description
SERVICE          Primary application service event
SRV_PRECONNECT   Preconnect application service event (TAF)
SERVICEMEMBER    Application service on a specific instance event
DATABASE         Database event
INSTANCE         Instance event
ASM              ASM instance event
NODE             Cluster node event

#FAN events can control the workload per instance for each service

9. ONS - Oracle Notification Service
- Transmits FAN events
- For every FAN event status change, all executables in $ORA_CRS_HOME/racg/usrco are launched (callout scripts)

The ONS process is $ORA_CRS_HOME/opmn/bin/ons
Arguments:
  -d: run in daemon mode
  -a <command>: <command> can be ping, shutdown, reload, or debug

[$ORA_CRS_HOME/opmn/conf/ons.config]
localport=6100
remoteport=6200
loglevel=3
useocr=on

onsctl start/stop/ping/reconfig/debug/detailed
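To see FAN events as they are published, a callout script can be dropped into $ORA_CRS_HOME/racg/usrco. A minimal sketch, assuming a hypothetical file name fan_callout.sh and that the event text is delivered as command-line arguments; mark it executable (chmod 755) after creating it:

#!/bin/sh
# Append every FAN event received on this node to a local log file
FAN_LOGFILE=/tmp/`hostname`_fan_events.log
echo `date` $* >> ${FAN_LOGFILE}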

10. FCF - Fast Connection Failover

- A JDBC application configured to use FCF automatically subscribes to FAN events
- A JDBC application must use service names to connect
- A JDBC application must use the implicit connection cache
- $ORACLE_HOME/opmn/lib/ons.jar must be in the classpath
- Set -Doracle.ons.oraclehome=<location of oracle home> or System.setProperty("oracle.ons.oraclehome", "/u01/app/oracle/product/10.2.0/db_1");

OracleDataSource ods = new OracleDataSource();
ods.setUser("USER1");
ods.setPassword("USER1");
ods.setConnectionCachingEnabled(true);
ods.setFastConnectionFailoverEnabled(true);
ods.setConnectionCacheName("MyCache");
ods.setConnectionCacheProperties(cp);
ods.setURL("jdbc:oracle:thin:@(DESCRIPTION=(LOAD_BALANCE=on)" +
    "(ADDRESS=(PROTOCOL=TCP)(HOST=london1-vip)(PORT=1521))" +
    "(ADDRESS=(PROTOCOL=TCP)(HOST=london2-vip)(PORT=1521))" +
    "(CONNECT_DATA=(SERVICE_NAME=SERVICE1)))");

11. Check that the main Clusterware services are up

#check Event Manager is up
ps -ef | grep evmd
#check Cluster Synchronization Services is up
ps -ef | grep ocssd
#check Cluster Ready Services is up
ps -ef | grep crsd
#check Oracle Notification Service is up
ps -ef | grep ons

[/etc/inittab]
...
h1:35:respawn:/etc/init.d/init.evmd run >/dev/null 2>&1 </dev/null
h2:35:respawn:/etc/init.d/init.cssd fatal >/dev/null 2>&1 </dev/null
h3:35:respawn:/etc/init.d/init.crsd run >/dev/null 2>&1 </dev/null

12. crs_stat

#Tested, as root
#Lists the status of an application profile and resources
#crs_stat [resource_name [...]] [-v] [-l] [-q] [-c cluster_node]

$ORA_CRS_HOME/bin/crs_stat -t
Name            Type          Target     State      Host
------------------------------------------------------------
ora.e2.gsd      application   ONLINE     ONLINE     e2
ora.e2.ons      application   ONLINE     ONLINE     e2
ora.e2.vip      application   ONLINE     ONLINE     e2

VIP Normal
Name            Type          Target     State      Host
------------------------------------------------------------
ora.e2.vip      application   ONLINE     ONLINE     e2
ora.e2.vip      application   ONLINE     ONLINE     e3

VIP Node 2 is down
Name            Type          Target     State      Host
------------------------------------------------------------
ora.e2.vip      application   ONLINE     ONLINE     e2
ora.e2.vip      application   ONLINE     ONLINE     e2

crs_stat -p
...
AUTO_START = 2    #with 2, CRS will not start the resource after a system boot

crs_stat
NAME=ora.RAC.RAC1.inst
TYPE=application
TARGET=ONLINE
STATE=ONLINE on london1

NAME=ora.RAC.SERVICE1.RAC1.srv
TYPE=application
TARGET=OFFLINE
STATE=OFFLINE

#use -v for verbose resource use
#use -p for a lot of details
#use -ls to view resources and relative owners

13. Voting disk

On shared storage, used by CSS, contains the nodes that are currently available within the cluster.
If the voting disks are lost and no backup is available, Oracle Clusterware must be reinstalled.
3-way multiplexing is ideal.

#backup a voting disk online
dd if=<fname> of=<out_fname>

crsctl
#Tested, as oracle
$ORA_CRS_HOME/bin/crsctl check crs
Cluster Synchronization Services appears healthy
Cluster Ready Services appears healthy
Event Manager appears healthy

#add a new voting disk online (10.2), -force if Oracle Clusterware is not started
crsctl add css votedisk 'new votedisk path' -force

crsctl start/stop/enable/disable crs

#set/unset parameters on the OCR
crsctl set/unset <parameter> <value>

You can list the currently configured voting disks:
crsctl query css votedisk
0. 0 /u02/oradata/RAC/CSSFile1
1. 1 /u03/oradata/RAC/CSSFile2
2. 2 /u04/oradata/RAC/CSSFile3

Dynamically add and remove voting disks in an existing Oracle Clusterware installation:
crsctl add/delete css votedisk <path> -force
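For example, a minimal sketch of an online backup of the first voting disk listed above, writing to a hypothetical local backup directory:

dd if=/u02/oradata/RAC/CSSFile1 of=/u01/backup/CSSFile1.bak bs=1M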

CRS log and debug

#as root, enable extra debugging for the running CRS daemons as well as those started in the future; useful to inspect system reboots
crsctl debug log crs

#Collect logs and traces to upload to Oracle Support
diagcollection.pl

14. OCR - Oracle Cluster Registry

[/etc/oracle/ocr.loc] (10g) or [/etc/oracle/srvConfig.loc] (9i, still exists in 10g for compatibility)
ocrconfig_loc=/dev/raw/raw1
ocrmirrorconfig_loc=/dev/raw/raw2
local_only=FALSE

OCRCONFIG - command-line tool for managing the Oracle Cluster Registry

#recover the OCR logically, must be done on all nodes
ocrconfig -import exp.dmp
#export the OCR content logically
ocrconfig -export
#recover the OCR from an OCR backup
ocrconfig -restore bck.ocr
#show backup status
#the crsd daemon backs up the OCR every four hours, the most recent backup file is backup00.ocr
ocrconfig -showbackup
london1 2005/08/04 11:15:29 /u01/app/oracle/product/10.2.0/crs/cdata/crs
london1 2005/08/03 22:24:32 /u01/app/oracle/product/10.2.0/crs/cdata/crs
#change the OCR autobackup location
ocrconfig -backuploc
#must be run on each affected node
ocrconfig -repair ocr <filename>
ocrconfig -repair ocrmirror <filename>
#force Oracle Clusterware to restart on a node, may lose recent OCR updates
ocrconfig -overwrite

CVU - Cluster Verification Utility, to get the status of CRS resources
dd - use it to safely back up voting disks when nodes are added/removed

#verify a restore
cluvfy comp ocr -n all

ocrcheck
#OCR integrity check, validates the accessibility of the device and its block integrity
#logs to the current dir or to $OCR_HOME/log/<node>/client

ocrdump
#dump the OCR content to a text file; if it succeeds, the integrity of the backups is verified
#OCRDUMP - identify the interconnect being used
$ORA_CRS_HOME/bin/ocrdump.bin -stdout -keyname SYSTEM.css.misscount -xml

15. Pre-install prerequisites

(./run)cluvfy : run from the install media or from CRS_HOME, verifies prerequisites on all nodes

Post installation
- Back up root.sh
- Set up other user accounts
- Verify Enterprise Manager / Cluster Registry by running srvctl config database -d db_name

16. SRVCTL

Stores info in the OCR; manages: Database, Instance, Service, Node applications, ASM, Listener

srvctl config database -d <db_name> : verify Enterprise Manager / Cluster Registry
#set the SRVM_TRACE=TRUE environment variable to create a Java-based tool trace/debug file for srvctl
#-v to check services
srvctl status database -d RAC -v SERVICE1
srvctl start database -d <name> [-o mount]
srvctl stop database -d <name> [-o stop_options]
#moves the parameter file
srvctl modify database -d name -p /u03/oradata/RAC/spfileRAC.ora
srvctl remove database -d TEST
#Verify the OCR configuration
srvctl config database -d TEST

srvctl start instance -d RACDB -i "RAC3,RAC4"
srvctl stop instance -d <orcl> -i "orcl3,orcl4" -o immediate
srvctl add instance -d RACDB -i RAC3 -n london3
#move the instance to node london4
srvctl modify instance -d RAC -i RAC3 -n london4
#set a dependency of instance RAC3 on +ASM3
srvctl modify instance -d RAC -i RAC3 -s +ASM3
#remove an ASM dependency
srvctl modify instance -d RAC -i RAC3 -r

#stop all applications on a node
srvctl stop nodeapps -n london1
#-a displays the VIP configuration
srvctl config nodeapps -n london1 -a
srvctl add nodeapps -n london3 -o $ORACLE_HOME -A london3-vip/255.255.0.0/eth0

17. Services

Changes are recorded in the OCR only! Must use DBMS_SERVICE to update the dictionary.

srvctl start service -d RAC -s "SERVICE1,SERVICE2"
srvctl status service -d RAC -s "SERVICE1,SERVICE2"
srvctl stop service -d RAC -s "SERVICE1,SERVICE2" -f
srvctl disable service -d RAC -s "SERVICE2" -i RAC4
srvctl remove service -d RAC -s "SERVICE2"
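Because srvctl records service changes only in the OCR, the dictionary side can be kept in step with DBMS_SERVICE. A minimal sketch (the service and network names are illustrative):

SQL> exec DBMS_SERVICE.CREATE_SERVICE(service_name => 'SERVICE2', network_name => 'SERVICE2');
SQL> exec DBMS_SERVICE.START_SERVICE(service_name => 'SERVICE2');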

#relocate from RAC2 to RAC4
srvctl relocate service -d RAC -s "SERVICE2" -i RAC2 -t RAC4

#preferred RAC1,RAC2 and available RAC3,RAC4
#-P PRECONNECT automatically creates an ERP and ERP_PRECONNECT service to use as BACKUP in tnsnames
#See TNSnames configuration
#the service is NOT started, it must be started manually (dbca does it automatically)
srvctl add service -d ERP -s SERVICE2 -i "RAC1,RAC2" -a "RAC3,RAC4" -P PRECONNECT

#show configuration, -a shows TAF conf
srvctl config service -d RAC -a

#modify an existing service
srvctl modify service -d RACDB -s "SERVICE1" -i "RAC1,RAC2" -a "RAC3,RAC4"
srvctl stop service -d RACDB -s "SERVICE1"
srvctl start service -d RACDB -s "SERVICE1"

Views
GV$SERVICES
GV$ACTIVE_SERVICES
GV$SERVICEMETRIC
GV$SERVICEMETRIC_HISTORY
GV$SERVICE_WAIT_CLASS
GV$SERVICE_EVENT
GV$SERVICE_STATS
GV$SERV_MOD_ACT_STATS

18. SQL for RAC

select * from V$ACTIVE_INSTANCES;

Cache Fusion - GRD Global Resource Directory
GES (Global Enqueue Service)
GCS (Global Cache Service)

Data Guard & RAC
- Configuration files at the primary location can be stored in any shared ASM diskgroup, on shared raw devices, or on any shared cluster file system. They simply have to be shared.

19. VIP - virtual IP
- Both application and RAC VIPs fail over if the related application fails, and accept new connections
- RAC VIP sharing among database instances is recommended, but not among different applications, because...
- ...the VIP fails over if the application fails over
- A failed-over application VIP accepts new connections
- Each VIP requires an unused and resolvable IP address
- The VIP address should be registered in DNS
- The VIP address should be on the same subnet as the public network
- VIPs are used to prevent connection request timeouts during client connection attempts

Changing a VIP
1- Stop the VIP-dependent cluster components on one node
2- Make the changes in DNS
3- Change the VIP using SRVCTL
4- Restart the VIP-dependent components
5- Repeat the above on the remaining nodes
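A minimal sketch of step 3 using srvctl (run as root; the address, netmask and interface are placeholders for your own values):

# srvctl stop nodeapps -n london1
# srvctl modify nodeapps -n london1 -A 147.43.1.210/255.255.255.0/eth0
# srvctl start nodeapps -n london1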

20. oifcfg

Allocates and deallocates network interfaces; gets values from the OCR.

To display a list of networks:
oifcfg getif
eth1 192.168.1.0 global cluster_interconnect
eth0 192.168.0.0 global public

To display a list of current subnets:
oifcfg iflist
eth0 147.43.1.0
eth1 192.168.1.0

To include a description of the subnet, specify the -p option:
oifcfg iflist -p
eth0 147.43.1.0 UNKNOWN
eth1 192.168.1.0 PRIVATE

In 10.2, public interfaces are reported as UNKNOWN. To include the subnet mask, append the -n option to the -p option:
oifcfg iflist -p -n
eth0 147.43.1.0 UNKNOWN 255.255.255.0
eth1 192.168.1.0 PRIVATE 255.255.255.0

21. Db parameters with the SAME VALUE across all instances

active_instance_count
archive_lag_target
compatible
cluster_database (RAC param)
cluster_database_instance (RAC param)
#Defines the network interfaces used for the interconnect
#it is not a failover but a redistribution; if an address does not work then all stop
#Overrides the OCR
cluster_interconnects (RAC param) = 192.168.0.10; 192.168.0.11; ...
control_files
db_block_size
db_domain
db_files
db_name
db_recovery_file_dest
db_recovery_file_dest_size
db_unique_name
dml_locks (when 0)
instance_type (rdbms or asm)
max_commit_propagation_delay (RAC param)
parallel_max_servers
remote_login_password_file
trace_enabled

#AUTO and MANUAL cannot be mixed in a RAC
undo_management

Db parameters with an INSTANCE-specific VALUE
instance_name
instance_number
thread
undo_tablespace #system param

Listener parameters
#allows pmon to register with the local listener when not using port 1521
local_listener='(ADDRESS_LIST = (ADDRESS = (PROTOCOL = TCP)(HOST = 192.168.0.13)(PORT = 1521)))'
#makes the listener aware of the load of the listeners on the other nodes
remote_listener='(ADDRESS_LIST = (ADDRESS = (PROTOCOL = TCP)(HOST = 192.168.2.9)(PORT = 1521)) (ADDRESS = (PROTOCOL = TCP)(HOST = 192.168.2.10)(PORT = 1521)))'

Important RAC parameters
gc_files_to_locks    #values other than the default disable Cache Fusion
recovery_parallelism #number of redo application server processes in instance or media recovery

RAC and Standby parameters
dg_broker_config_file1 #shared between primary and standby instances
dg_broker_config_file2 #different from dg_broker_config_file1, shared between primary and standby instances

22. Shared contents: datafiles, controlfiles, spfiles, redo logs

23. Shared or local?                          RAW_Dev File_Syst ASM NFS OCFS
- Datafiles              : shared mandatory
- Control files          : shared mandatory
- Redo log               : shared mandatory
- SPfile                 : shared mandatory
- OCR and vote           : shared mandatory   Y Y N
- Archived log           : shared not mandatory   N Y N Y
- Undo                   : local
- Flash Recovery         : shared             Y Y Y
- Data Guard broker conf.: shared (prim. & stdby)   Y Y

24. Adding logfile thread groups for a new instance

#To support a new instance on your RAC
1) alter database add logfile thread 3 group 7;
   alter database add logfile thread 3 group 8;
#makes the thread available for use by any instance
2) alter database enable thread 3;

#if you want to change a used thread
2) alter system set thread=3 scope=spfile sid='RAC01';
3) srvctl stop instance -d RACDB -i RAC01

25. Views and queries

select * from GV$CACHE_TRANSFER;

26. An instance failed to start, what do we do?
1) Check the instance alert.log
2) Check the Oracle Clusterware software alert.log
3) Check the resource state using CRS_STAT
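A minimal sketch of those three checks, assuming an OFA layout for the database alert log and the 10.2-style Clusterware log location (adjust the paths and names to your own environment):

$ tail -100 $ORACLE_BASE/admin/RACDB/bdump/alert_RAC01.log
$ tail -100 $ORA_CRS_HOME/log/`hostname`/alert`hostname`.log
$ $ORA_CRS_HOME/bin/crs_stat -t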

27. Install
28. See official Note 239998.1 for removing a CRS installation
29. See http://startoracle.com/2007/09/30/so-you-want-to-play-with-oracle-11gs-rac-heres-how/ to install 11g RAC on VMware
30. See http://www.oracle.com/technology/pub/articles/hunter_rac10gr2_iscsi.html to install on Linux with iSCSI disks
31. See http://www.oracle-base.com/articles/10g/OracleDB10gR2RACInstallationOnCentos4UsingVMware.php to install on VMware
32. See OCFS Oracle Cluster Filesystem

Prerequisites check

#check node connectivity and Clusterware integrity
./runcluvfy.sh stage -pre dbinst -n all
./runcluvfy.sh stage -post hwos -n "linuxes,linuxes1" -verbose
WARNING: Package cvuqdisk not installed.

rpm -Uvh clusterware/rpm/cvuqdisk-1.0.1-1.rpm

WARNING: Unable to determine the sharedness of /dev/sdf on nodes: linuxes1,linuxes1,linuxes1,linuxes1,linuxes1,linuxes1,linuxes,linuxes,linuxes,linuxes,linuxes,linuxes

Safely ignore this error

./runcluvfy.sh comp peer -n "linuxes,linuxes1" -verbose

./runcluvfy.sh comp nodecon -n "linuxes,linuxes1" -verbose

./runcluvfy.sh comp sys -n "linuxes,linuxes1" -p crs -verbose

./runcluvfy.sh comp admprv -n "linuxes,linuxes1" -verbose -o user_equiv
./runcluvfy.sh stage -pre crsinst -n "linuxes,linuxes1" -r 10gR2

41. Restart installation - Remove from each node

su -c "$ORA_CRS_HOME/install/rootdelete.sh; $ORA_CRS_HOME/install/rootdeinstall.sh"
#oracle user
export DISPLAY=192.168.0.1:0.0
/app/crs/oui/bin/runInstaller -removeHome -noClusterEnabled ORACLE_HOME=/app/crs LOCAL_NODE=linuxes
rm -rf $ORA_CRS_HOME/*
#root
su -c "chown oracle:dba /dev/raw/*; chmod 660 /dev/raw/*; rm -rf /var/tmp/.oracle; rm -rf /tmp/.oracle"

44. #Format rawdevices using
dd if=/dev/zero of=/dev/raw/raw6 bs=1M count=250

47. #If a related error message appears during installation, manually launch on the related node
/app/crs/oui/bin/runInstaller -attachHome -noClusterEnabled ORACLE_HOME=/app/crs ORACLE_HOME_NAME=OraCrsHome CLUSTER_NODES=linuxes,linuxes1 CRS=true "INVENTORY_LOCATION=/app/oracle/oraInventory" LOCAL_NODE=linuxes

50. runcluvfy.sh stage -pre crsinst -n linuxes -verbose

52. /etc/hosts example

# Do not remove the following line, or various programs
# that require network functionality will fail.
127.0.0.1      localhost
147.43.1.101   london1
147.43.1.102   london2
#VIP is usable only after the VIPCA utility has been run,
#should be created on the public interface. Remember that VIPCA is a GUI tool
147.43.1.201   london1-vip
147.43.1.202   london2-vip
192.168.1.1    london1-priv
192.168.1.2    london2-priv


55. Kernel Parameters (/etc/sysctl.conf) - Recommended Values

kernel.sem (semmsl)             250
kernel.sem (semmns)             32000
kernel.sem (semopm)             100
kernel.sem (semmni)             128
kernel.shmall                   2097152
kernel.shmmax                   Half the size of physical memory
kernel.shmmni                   4096
fs.file-max                     65536
net.core.rmem_default           262144
net.core.rmem_max               262144
net.core.wmem_default           262144
net.core.wmem_max               262144
net.ipv4.ip_local_port_range    1024 to 65000
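A minimal sketch of how those recommendations might look in /etc/sysctl.conf; the kernel.shmmax figure shown assumes roughly 1GB of physical memory, so substitute half of your own RAM, and apply the file with sysctl -p:

kernel.sem = 250 32000 100 128
kernel.shmall = 2097152
kernel.shmmax = 536870912
kernel.shmmni = 4096
fs.file-max = 65536
net.core.rmem_default = 262144
net.core.rmem_max = 262144
net.core.wmem_default = 262144
net.core.wmem_max = 262144
net.ipv4.ip_local_port_range = 1024 65000

# sysctl -p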


58. RAC restrictions
- dbms_alert: both publisher and subscriber must be on the same instance, AQ is the workaround
- dbms_pipe: only works on the same instance, AQ is the workaround
- UTL_FILE, directories, external tables and BFILEs need to be on shared storage


61. Implementing the HA (High Availability) Framework

Use srvctl to start/stop applications.

#Manually create an action script that CRS will use to start/stop/check the status of the application.

1. Create an application VIP.
#This command generates an application profile called hafdemovip.cap in the $ORA_CRS_HOME/crs/public directory.
$ORA_CRS_HOME/bin/crs_profile -create hafdemovip -t application -a $ORA_CRS_HOME/bin/usrvip -o oi=eth0,ov=147.43.1.200,on=255.255.0.0

#As the oracle user, register the VIP with Oracle Clusterware:
$ORA_CRS_HOME/bin/crs_register hafdemovip

#As the root user, set the owner of the application VIP to root:
$ORA_CRS_HOME/bin/crs_setperm hafdemovip -o root

#As the root user, grant the oracle user permission to run the script:
$ORA_CRS_HOME/bin/crs_setperm hafdemovip -u user:oracle:r-x

#As the oracle user, start the application VIP:
$ORA_CRS_HOME/bin/crs_start hafdemovip

2. Create an application profile.
$ORA_CRS_HOME/bin/crs_profile -create hafdemo -t application -d "HAF Demo" -r hafdemovip -a /tmp/HAFDemoAction -o ci=5,ra=60

3. Register the application profile with Oracle Clusterware.
$ORA_CRS_HOME/bin/crs_register hafdemo

$ORA_CRS_HOME/bin/crs_start hafdemo


64. CRS commands
crs_profile
crs_register
crs_unregister
crs_getperm
crs_setperm
crs_start
crs_stop
crs_stat
crs_relocate


67. Server-side callouts
Oracle instance up (/down?)
Service member down (/up?)
Shadow application service up (/down?)


70. Adding a new node
- Configure hardware and OS
- With NETCA, reconfigure the listeners and add the new one
- Run $ORA_CRS_HOME/oui/bin/addnode.sh from one of the existing nodes to define the new node to all existing nodes
- Run $ASM_HOME/oui/bin/addnode.sh from one of the existing nodes (if using ASM)
- Run $ORACLE_HOME/oui/bin/addnode.sh from one of the existing nodes
- Run racgons -add_config to add the ONS metadata to the OCR, from one of the existing nodes

Removing a node from a cluster
- Remove the node from the clusterware
- Check that the ONS configuration has been updated on the other nodes
- Check that the database and instances are terminated on the node to remove
- Check that the node has been removed from the database and ASM repository
- Check that the software has been removed from the database and ASM homes on the node to remove


73. RAC contentions
- enq: HW - contention and gc current grant wait events: use a larger uniform extent size for the objects
- enq: TX - index contention: re-create the index as a global hash partitioned index; increase the sequence cache size if retaining the sequence; re-create the table using a natural key instead of a surrogate key
