
WHITE PAPER

High Availability and Disaster Recovery Best Practices for Microsoft Exchange 2000


Table of Contents

Introduction
Business Continuity Relies on a Disaster Recovery Plan for Exchange
Business Challenges
Causes of System Downtime
Providing High Availability for Exchange
Need for an Effective Disaster Recovery Solution
How Much Downtime Can You Endure?
Recovery Point Objective (RPO)
Recovery Time Objective (RTO)
VERITAS Components of an Exchange 2000 High Availability and Disaster Recovery Solution
High Availability
    VERITAS Volume Manager™
    VERITAS Edition™ for Microsoft Exchange 2000
    VERITAS Cluster Server™
    VERITAS Cluster Server™ Enterprise Agent for Exchange 2000 Server
    VERITAS Storage Replicator™
Disaster Recovery
    VERITAS Volume Replicator™
    VERITAS Cluster Server™ VERITAS Volume Replicator™ bundled agent
    VERITAS Cluster Server™ VERITAS Volume Manager™ bundled agent
    VERITAS Global Cluster Manager™ with the Disaster Recovery option
    Global Cluster Manager™ VERITAS Volume Replicator™ Agent
    Global Application Object
VERITAS Solutions for Exchange
VERITAS Edition for Microsoft Exchange 2000
    Volume Management
    Protecting Against Human Error
    Exchange Design Considerations for Performance and Availability
VERITAS Cluster Server
Choices of Cluster Architecture
    Local Clustering
    Campus Clustering
    Wide Area Disaster Recovery
Maximize Use of Existing Hardware
Server Consolidation
Adaptive Workload Management
Extensive Application Support
Powerful Management Featureset
Add & Remove Cluster Nodes ‘On The Fly’
Volume Replication
Volume Replication Enhances Exchange Availability
    Features of VERITAS Volume Replicator that Facilitate Disaster Recovery
VERITAS Storage Replicator
VERITAS Global Cluster Manager
Defining the Disaster Recovery Solution for Local and Global Availability
    Geographic Wide Area Disaster Recovery
Supportability
    Technical Support Alliance Network (TSANET) Support Agreement
Recommendations
Conclusion/Summary

Copyright © 2003 VERITAS Software Corporation. All rights reserved. VERITAS, VERITAS Software, the VERITAS logo, and all other VERITAS product names and slogans are trademarks or registered trademarks of VERITAS Software Corporation in the US and/or other countries. Other product names and/or slogans mentioned herein may be trademarks or registered trademarks of their respective companies. Specifications and product offerings subject to change without notice. June 2002.


INTRODUCTION

BUSINESS CONTINUITY RELIES ON A DISASTER RECOVERY PLAN FOR EXCHANGE

The steady march of new, innovative storage software capabilities continues to revolutionize information in many ways, from availability to recoverability to manageability. As server and storage capacities continue to explode, so do the demands for advanced, tightly integrated server and storage management solutions. Along with a growing amount of data and information, customers are demanding total server and storage management solutions that offer a high level of functionality and performance. These requirements must also be met without impacting the availability or performance of production applications.

Nowhere are these demanding requirements more visible than in Microsoft Exchange environments. The critical issues affecting Exchange Server deployments today are availability and recoverability. With ever greater reliance on electronic communications, anything that could cause an Exchange Server outage quickly becomes a potential business disaster. The harsh realities of today's global business require that Exchange Server environments be designed and implemented using advanced and reliable server and storage management features to ensure the highest levels of availability and recoverability.

The critical business information residing within production Exchange Server databases requires periodic backup to protect against data loss or corruption. More frequent backups permit faster recovery. The downside is that more frequent backups may tax your hardware, with a significant impact on the performance and availability of Exchange Server databases. Because of the tremendous availability and accessibility requirements imposed by today's electronic commerce, enterprise applications such as Exchange Server cannot tolerate long periods of downtime caused by traditional backup methods. Too few backups result in a long and often cumbersome recovery process, while frequent backups may degrade overall system availability and performance. This raises a difficult question affecting many enterprises today: How can customers protect business-critical Exchange Server databases without imposing any performance or availability limitations on their production computing systems?

Fortunately, improvements in server and storage management software allow customers to design, implement, and manage even the most complex and data-intensive environments, often without acquiring new storage hardware. There are many compelling reasons why customers need to implement improved server and storage management solutions in support of Microsoft Exchange Server. The two most important are business continuity and the ability to keep critical business computing assets available and recoverable.

This white paper describes best practices for using VERITAS Software Corporation products with Microsoft Exchange Server, and how the products build on each other to address the availability and protection of mission-critical Microsoft Exchange Server databases based on your downtime requirements.

BUSINESS CHALLENGES

With today's need to communicate with employees, customers, and partners 24 hours a day, 7 days a week, Microsoft Exchange servers are crucial to business success. As more demands are placed on those servers for expanding communication, keeping them up, running, and highly available has become even more crucial. The obvious challenge is to configure and manage Exchange servers to minimize downtime and recover quickly from any problems that might occur, including hardware failure, server consolidation, site failure, data corruption, viruses, and e-mail storms, to name a few. The more difficult proposition is to do all this while containing the total cost of ownership for the Exchange solution.



This document addresses these challenges and provides an overview of the features that comprise high availability and disaster recovery for Microsoft Exchange 2000. We also highlight the business value that has made VERITAS the leading solution for Exchange. Areas of focus include managing downtime through disaster recovery planning and implementation, maximizing hardware utilization and lowering costs through server consolidation, and simplifying management and recovery of Exchange both locally and across geographically dispersed sites.

CAUSES OF SYSTEM DOWNTIME

When we think of downtime, images of failed components immediately come to mind. Hardware failures, however, account for only a small fraction of degraded system availability. Network failure is even less of a culprit. The vast majority of downtime is actually a combination of software problems, human error, and planned maintenance. Minimizing both planned and unplanned downtime is a primary concern for the Exchange administrator.

All systems require some level of planned downtime, which accounts for 30 percent of all Exchange system downtime. Common reasons include hardware or software upgrades or replacements, service packs, configuration changes, and maintenance of a relied-upon component (such as power or network services). Any solution that can reduce planned downtime of Exchange services is extremely valuable.

[Figure: Causes of System Downtime. Software 40%, Planned Downtime 30%, People 15%, Hardware 10%, Environment 5%, Client <1%, LAN/WAN Equipment <1%.]

The primary objective when planning downtime is to minimize the time that any service is offline. This can be achieved through redundancy of services, compartmentalized dependencies, and good written and automated procedures for making the necessary changes.

Source: IEEE Computer

The primary causes of unplanned downtime are software, hardware, and people. Difficulties with software account for the bulk of system downtime. This is not surprising given the number of interactions between software components, not to mention the variables that today’s software must deal with. Some downtime is caused by incidents of poor data management, such as storage volumes running out of space.

Hardware failures account for only about 10 percent of all downtime. Although this number is not difficult to reduce, doing so can be quite costly. The conventional method of reducing downtime is to provide hardware redundancy to ensure that a single component failure will not cause a system outage.

The most preventable cause of unplanned downtime is human influence. User error accounts for 15 percent of all downtime. The causes of this type of downtime are numerous. Unintentionally exposing the system to a virus, making configuration errors, and following poor procedures are the primary causes of people-related downtime.

To achieve the “five 9’s”, or 99.999% uptime, that many businesses aspire to, no more than about five minutes of unplanned downtime can occur in a single year. It is important to note, however, that planned downtime for things such as hardware or software upgrades and hardware maintenance is not counted against the 99.999%. Since planned downtime equates to 30% of all downtime, you can understand how achieving 99.999% availability is difficult.

PROVIDING HIGH AVAILABILITY FOR EXCHANGE

High availability can be achieved in part through a combination of proper planning, redundant hardware, and the appropriate software. This is not meant to imply that an outage will never occur, but rather that a software or hardware component failure will not dramatically disrupt availability of the application data. A proper balance must be achieved in hardware and software design to minimize the effect of component failure, such that no single point of failure (SPOF) exists. The extent to which this is implemented will determine the speed with which the application becomes available after a failure.

Backup is the necessary foundation for any high availability and data recovery plan, but restoring data from tape can be time consuming and requires applications to be reinstalled, restored, or manually restarted. For many customers who have more demanding time-to-recovery requirements for their applications, backup alone is unable to meet their Service Level Agreements (SLAs). VERITAS Cluster Server™ software eliminates long application downtime by providing automated application recovery within seconds of an application or server failure.

Implementing a high availability solution like VERITAS Cluster Server doesn’t replace the need for a backup strategy. Rather, clustering complements a backup strategy by providing faster application recovery in the event of a hardware or software failure. An issue such as Exchange database corruption would still require a valid copy of the database to be restored from a previous point in time, which requires traditional backup methodologies to be in place. Our discussion in this document assumes that a proper backup strategy is already in place.

NEED FOR AN EFFECTIVE DISASTER RECOVERY SOLUTION

Exchange 2000 Server is a messaging and collaboration platform that provides e-mail, scheduling, online forms, and tools for custom messaging and collaborative applications. It is closely integrated with Microsoft Windows 2000 core services such as Active Directory (AD) and Internet Information Services (IIS). Considering the volume of data that flows through many Exchange servers today, and our reliance on that data, it is imperative that we be able to recover the latest data in the event of a disaster. Ideally, recovery should be up to the last successful transaction that occurred before the disaster.

Hence, in order to resume operations following a system failure or disaster, it is necessary to have a solution that will migrate or restart the failed application seamlessly on another server, either locally or at a remote site. Either Exchange continues to access the data locally, as on a Storage Area Network (SAN), or Exchange runs at a remote site, accessing a replicated copy of the data. Either way, the application and data are back online with minimal, if any, administrative intervention, and the recovery is transparent to the end user.

A disaster recovery solution is vital for businesses that rely on the availability of their data. A well-designed disaster recovery solution prepares a business for unexpected disasters and provides the following benefits in a disaster recovery situation:

- Minimizes economic loss due to the unavailability or loss of data.
- Ensures safe and efficient recovery of data and services.
- Minimizes decision making during the disaster recovery.
- Reduces the reliance on key individuals.
- Minimizes the data loss during recovery and ensures availability of most recent data.

Thus, planning a disaster recovery solution provides businesses with affordable ways to meet their SLAs, comply with government regulations, minimize their business risk, and maximize the use of their investment in hardware and software.
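To put SLA availability targets such as the 99.999% figure discussed earlier into concrete terms, the short sketch below converts an availability percentage into a yearly downtime budget. It is illustrative arithmetic only: the function name is hypothetical, a 365-day year is assumed, and planned downtime is assumed to be excluded from the budget, as described above.

```python
# Illustrative arithmetic only; the function name is hypothetical.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def allowed_downtime_minutes(availability_percent: float) -> float:
    """Return the yearly downtime budget, in minutes, for an availability target."""
    if not 0.0 <= availability_percent <= 100.0:
        raise ValueError("availability must be between 0 and 100 percent")
    unavailable_fraction = (100.0 - availability_percent) / 100.0
    return MINUTES_PER_YEAR * unavailable_fraction


if __name__ == "__main__":
    for target in (99.0, 99.9, 99.99, 99.999):
        minutes = allowed_downtime_minutes(target)
        print(f"{target}% uptime allows about {minutes:.1f} minutes of downtime per year")
```

For 99.999% the budget works out to roughly 5.3 minutes per year, which is why a single slow manual recovery can consume the whole allowance.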



HOW MUCH DOWNTIME CAN YOU ENDURE?

RECOVERY POINT OBJECTIVE (RPO)

How much data loss can you afford? What amount of data loss is acceptable? The recovery point objective is the point in time to which data must be restored.

When you have a site outage, there are two key factors to consider: Recovery Point (data loss) and Recovery Time (downtime). Organizations should have a Recovery Point Objective (per application) that must be satisfied as well as a Recovery Time Objective that must be met. Industry analysts call these RPO and RTO, respectively. Most people tend to focus on the RTO, or how much downtime is acceptable. However, it is just as important to look at how much data loss an organization can tolerate. Make it a point to look at both. Data is important, and data loss (even if just a few minutes, hours, or days) can have far-reaching negative business impacts. Many customers in the government and financial arenas are legally bound to ensure that zero data is lost. They therefore opt for synchronous replication for many applications.

Even today, most companies rely primarily on tape backup and restore as the center of their DR plan. This usually means at least a day of lost data and a few days of downtime after a disaster. This is fine if it meets the business needs, but most organizations will have at least some applications that require a more aggressive RPO and RTO.

RECOVERY TIME OBJECTIVE (RTO)

Recovery time objective is the time in which data accessibility must be restored. For example, if it takes 45 minutes to bring applications on line, and the maximum acceptable outage duration is 1 hour, then the Mandatory Decision Point is within 15 minutes of the actual disaster.

The decision is made based on the maximum acceptable outage duration:

- The clock starts once the disruption occurs.
- How long are end users impacted?

Business requirement: applications must be on line in 1 hour.

Problem: the clock starts upon disaster declaration.

1. 1 hour is the maximum acceptable outage.
2. Time of fault detection eats into IT recovery time.
3. Even in a local failure, there’s some time needed to actually detect the fault.
4. This drives the “Mandatory Decision Point” within 15 minutes.
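The Mandatory Decision Point arithmetic above can be sketched in a few lines. This is a minimal illustration, not part of any VERITAS product: the function name and the optional `detection_time` parameter are assumptions introduced here to show how fault detection eats into the window.

```python
from datetime import timedelta


def decision_window(max_outage: timedelta,
                    recovery_time: timedelta,
                    detection_time: timedelta = timedelta(0)) -> timedelta:
    """Time left to decide on a failover before the RTO can no longer be met.

    max_outage     -- maximum acceptable outage duration (the RTO)
    recovery_time  -- time needed to bring applications on line
    detection_time -- time already spent detecting the fault (hypothetical input)
    """
    window = max_outage - recovery_time - detection_time
    if window < timedelta(0):
        raise ValueError("recovery and detection time already exceed the maximum outage")
    return window


if __name__ == "__main__":
    # Figures from the example above: 1 hour maximum outage, 45 minutes to recover.
    window = decision_window(timedelta(hours=1), timedelta(minutes=45))
    print(f"Decision must be made within {window} of the disruption")
```

With the figures from the example, the window is 15 minutes. Any time spent detecting the fault shortens it further, which is the point of items 2 and 3 in the list.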



VERITAS COMPONENTS OF AN EXCHANGE 2000 HIGH AVAILABILITY AND DISASTER RECOVERY SOLUTION

Planning for high availability starts with redundant hardware. Hardware designated to run in a highly available configuration should, at a minimum, include redundant power supplies, network interface cards, Host Bus Adapters (HBAs), processors, network switches, SAN infrastructure, completely separate network and data paths, internal controllers, internal disks, and disk enclosures, with those disks configured for fault tolerance using mirroring or RAID. A VERITAS software solution provides redundancy by allowing the virtualization of disk storage, applications, network names, and IP addresses. By virtualizing these application and storage resources, we can prevent even the loss of a complete site from creating a prolonged outage.

HIGH AVAILABILITY

VERITAS Volume Manager™

VERITAS Volume Manager software is an enterprise-level, host-based virtualization and online storage management tool that provides flexible storage configuration. VERITAS Volume Manager removes the physical limitations of disk storage and enables complete online disk storage management without interrupting data availability.

VERITAS Edition™ for Microsoft Exchange 2000

VERITAS Edition for Microsoft Exchange 2000 is an integrated suite of industry-leading VERITAS storage management technologies engineered specifically for Microsoft Exchange 2000 in the enterprise to deliver proactive management and quick recovery.

VERITAS Cluster Server™

For applications that cannot tolerate more than minutes of downtime, VERITAS Cluster Server software can create a customizable high availability solution that scales up to 32 nodes on the Standard, Advanced, and Data Center versions of Windows 2000. VERITAS Cluster Server has a highly customizable agent to fully support Exchange. With VCS and VERITAS Volume Manager, mount points are supported, and each Exchange instance can run up to the full number of supported storage groups and databases when running Exchange 2000 Enterprise Edition. VCS can also be customized to make any application highly available and comes with both a cross-platform management console and a Web Console. Either can be used to monitor all clusters in your organization across all platforms.

VERITAS Cluster Server™ Enterprise Agent for Exchange 2000 Server

VERITAS Cluster Server Enterprise Agent for Exchange 2000 provides high availability for the Exchange 2000 Server or Enterprise Edition application running on a cluster and provides an installation wizard to simplify the installation of Exchange into the clustered environment. The Exchange agent is a cluster component specifically designed to interact with and monitor the health of the Exchange Virtual Server (EVS).

VERITAS Storage Replicator™

VERITAS Storage Replicator gives organizations the ability to protect their remote office Exchange data through real-time data replication. Continuous copying of data to another server guards against business interruption by keeping data and applications online and available. VERITAS Storage Replicator software enables administrators to centralize backup without disrupting normal server operations and saves organizations money by eliminating the need to invest in remote office hardware and administration.

DISASTER RECOVERY

Building on the high availability product components mentioned above, a secondary (disaster recovery) site can be deployed with the following VERITAS products:



VERITAS Volume Replicator™

The VERITAS Volume Replicator (VVR) product supports data replication over a local area network (LAN) or wide area network (WAN) so that replicated data can be stored at a sufficient distance to survive a site disaster. VVR does not require a dedicated network and is not dependent on any vendor-specific hardware platform.

VERITAS Cluster Server™ VERITAS Volume Replicator™ bundled agent

The VERITAS Cluster Server VERITAS Volume Replicator agent provides high availability for replication by monitoring VVR replicated volume groups (RVGs).

VERITAS Cluster Server™ VERITAS Volume Manager™ bundled agent

The VERITAS Cluster Server VERITAS Volume Manager agent monitors and fails over disk volumes between systems in a cluster.

VERITAS Global Cluster Manager™ with the Disaster Recovery option

VERITAS Global Cluster Manager software is a wide area network (WAN) solution that provides failover capabilities between clusters in the event of a site disaster. A company’s mission-critical applications can remain running by having a replicated site in any worldwide location with clustering capabilities. This enables operations from one cluster in a site to fail over to another cluster in a different site with replicated data, regardless of distance. Additionally, Global Cluster Manager provides heterogeneous cluster management, so an administrator can control application and data availability in an entire enterprise, across multiple platforms, from a single console (command line, Web, or Java based). Global Cluster Manager also provides “cross cluster event correlation”, which allows an event in one cluster to initiate an action in another. Finally, Global Cluster Manager provides complete wide area failover capability through integration with replication (VVR, SRDF, and HDS).

Global Cluster Manager™ VERITAS Volume Replicator™ Agent

The VERITAS Volume Replicator agent for Global Cluster Manager is included with the VERITAS Volume Replicator product. This agent provides monitoring and migration capabilities for replication.

Global Application Object

The Global Cluster Manager Disaster Recovery feature helps set up a relationship between application and replication groups on the primary and secondary clusters.

The type of solution chosen is typically based on the amount of downtime that an organization can tolerate. By choosing a solution based on the amount of data loss or downtime that can be tolerated, the solution can be tailored to the organization and scaled to meet more stringent requirements as the need for more timely recovery arises.

VERITAS SOLUTIONS FOR EXCHANGE

Designing for high availability and disaster recovery in a Windows data center involves more than just a simple cluster implementation. While other vendors may provide a single piece of what is required for application availability, additional layers are still necessary to keep a data center fully protected from disaster. VERITAS provides an end-to-end solution. This solution suite begins with a solid backup plan using VERITAS NetBackup™ software or VERITAS Backup Exec™ software. Backup consolidation from remote sites can be achieved through VERITAS Storage Replicator (VSR), which provides file-level replication. This is followed by placing data sets on RAID volumes for fault tolerance using VERITAS Volume Manager or VERITAS Edition for Exchange 2000. Clustering for application availability with VERITAS Cluster Server, replication for wide area disaster recovery using VERITAS Volume Replicator, and multi-cluster management for site migration using VERITAS Global Cluster Manager are the final components of a complete solution tightly wrapped around Microsoft Exchange.



VERITAS EDITION FOR MICROSOFT EXCHANGE 2000

For many Exchange servers, the backup window is small to nonexistent. VERITAS Edition™ for Microsoft Exchange 2000 has an automated VERITAS FlashSnap™ Backup and Recovery utility that solves the problem of limited backup windows by creating point-in-time snapshots of the Exchange data. These snapshots can then be backed up either locally or on a remote host, with limited or no impact on the production system. VERITAS Edition for Exchange 2000 also includes all VERITAS Volume Manager functionality, giving a complete solution for managing disk storage for your entire Exchange environment.

Volume Management

As stated earlier, backup should be at the heart of any properly designed plan. Beyond that, as we move along the RTO and RPO timeline, the next product solution for improving the recovery time objective is volume management. Volume Manager is our Foundation product and allows host-based storage virtualization. In addition, many performance advances are made possible by dynamic disks, including but not limited to mount point support for Exchange when using Exchange Edition or VERITAS Cluster Server, and volume snapshots that can be used to create up to 32 copies of your data. VERITAS Volume Manager and VERITAS Edition for Microsoft Exchange 2000 provide Cluster Disk Groups, software fault tolerance, and the ability to completely control storage access from or to any host on the SAN regardless of the operating system.

Operating systems expect to have disk storage for their exclusive use. In other words, when a computer is attached to a SAN, it assumes that it owns all the storage it sees. This obviously gives rise to problems when more than one host is attached to a SAN. Using software or hardware methods, multiple computers are able to coexist on a SAN. Once this coexistence has been established, it is up to the administrator to make the most efficient use of the resources at hand. For instance, an expensive storage array is easier to justify when it is shared on a SAN than when it is attached to a single host. The array can be divided into LUNs, and then, using software management, the administrator can apportion the storage into the best configuration based on application constraints and service level agreements.

Using volume management to enhance availability of disk storage is sometimes referred to as “Host based Virtualization.” By adding software volume management on top of a hardware disk array management solution, you can achieve the maximum disk storage optimization, performance, availability, and flexibility. VERITAS Volume Manager facilitates this by allowing LUNs to be presented at the smallest unit size the array vendor supports and then pooling the LUNs together into “Disk Groups” which, in turn, can be subdivided into volumes. This allows storage to be reallocated online, in production: disk storage can be added on the fly where it is needed, rather than creating LUNs that mis-allocate space that cannot later be reclaimed and used where and when it is needed. Before you create a disk group, it is critical to determine:

• The type of LUNs required (for example, RAID-5 for databases and RAID 1+0 for logs).
• The number of LUNs required for the disk group.
• The implications of backup and restore operations on the disk group setup.
• The size of the databases and logs, which depends on the traffic load.
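The pooling idea described above (small LUNs grouped into a Disk Group, which is then subdivided into volumes that can grow online) can be sketched as a small planning model. This is only an illustration in Python: the `DiskGroup` class, its method names, and the sizes are hypothetical and are not part of any VERITAS interface.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class DiskGroup:
    """Hypothetical model: a pool of LUNs that is subdivided into volumes."""

    name: str
    lun_sizes_gb: list[int] = field(default_factory=list)
    volumes: dict[str, int] = field(default_factory=dict)  # volume name -> size in GB

    @property
    def free_gb(self) -> int:
        """Pooled capacity not yet assigned to any volume."""
        return sum(self.lun_sizes_gb) - sum(self.volumes.values())

    def add_lun(self, size_gb: int) -> None:
        """Grow the pool; existing volumes are left untouched."""
        self.lun_sizes_gb.append(size_gb)

    def create_volume(self, name: str, size_gb: int) -> None:
        if name in self.volumes:
            raise ValueError(f"volume {name!r} already exists")
        if size_gb > self.free_gb:
            raise ValueError(f"only {self.free_gb} GB free in {self.name}")
        self.volumes[name] = size_gb

    def grow_volume(self, name: str, extra_gb: int) -> None:
        if extra_gb > self.free_gb:
            raise ValueError(f"only {self.free_gb} GB free in {self.name}")
        self.volumes[name] += extra_gb


# Two small LUNs are pooled, then carved into a database and a log volume.
dg = DiskGroup("SG1_DG", lun_sizes_gb=[50, 50])
dg.create_volume("db1", 60)
dg.create_volume("log1", 30)

# Later the database needs more room: add a LUN and grow the volume in place.
dg.add_lun(50)
dg.grow_volume("db1", 40)
print(dg.volumes, dg.free_gb)
```

Working through a model like this before creating the real disk group makes the questions in the list concrete: how many LUNs are needed, and how much headroom remains for database and log growth.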

Protecting Against Human Error

In a large Data Center environment, many systems will share access to the same SAN and potentially have the ability to see disks that may be under the control of other systems that are running the same or even different operating systems. For this reason, VERITAS Volume Manager™ software provides Cluster Disk Groups. Cluster Disk Groups have unique benefits in that they use SCSI reservation to control the disk group and completely block access to all systems on the SAN except the one in control of the disk group. Since they are designed to be controlled by a cluster, only the cluster can automatically import the disk group and assign drive letters to the volumes under its control. While the Windows operating system automatically mounts disks after a reboot or bus rescan by default, Cluster Disk Groups effectively eliminate the possibility of corruption caused when SAN attached nodes are added or rebooted on the SAN.

All nodes attached to the SAN should have VERITAS Volume Manager 3.x or greater installed, whether “cluster” or “stand-alone”. This ensures that even if zoning or LUN security mistakes are made, VERITAS Volume Manager will recognize these disks and leave them alone, preventing configuration mistakes from becoming a disaster. VERITAS Volume Manager recognizes not only other Windows disks but any disk configured in a Disk Group on any version of VERITAS Volume Manager on any platform, helping to protect all data on the SAN.

VERITAS Edition for Microsoft Exchange not only allows for snapshots, but actually enhances Exchange 2000 availability by automating the entire process for rapid backup and recovery of Exchange 2000 data. Additionally, Edition for Microsoft Exchange provides full support for volume monitoring, auto growth, and spike detection features with VCS clusters, as well as support for mount points to reduce dependency on legacy drive letters.

Exchange design considerations for performance and availability

Microsoft Exchange 2000 Enterprise Edition allows for the creation of up to 4 Storage Groups per server and up to 5 Databases per Storage Group. While Microsoft Cluster Server (MSCS) only supports 4 Storage Groups in a cluster, VERITAS Cluster Server (VCS) imposes no such limitation. As such, VERITAS makes the following recommendations for increasing Exchange availability and facilitating growth while reducing recovery times:

• One Disk Group per Storage Group
• One Volume per database
• One Volume per streaming file
• One Volume per log
• One Volume for Registry replication, which should reside in the Disk Group containing the First Storage Group for consistency. (There will be one per Exchange Virtual Server.)
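As a rough illustration of these recommendations, the following Python sketch derives a Disk Group and volume layout from a list of Storage Groups and their databases. The function name, the generated volume names, and the reading of “one Volume per log” as one log volume per Storage Group are assumptions made for the example; the limits of 4 Storage Groups and 5 databases come from the text above.

```python
MAX_STORAGE_GROUPS = 4   # per server, as stated in the text
MAX_DATABASES = 5        # per Storage Group, as stated in the text


def plan_layout(storage_groups):
    """Return {disk group name: [volume names]} for one Exchange Virtual Server.

    storage_groups maps a Storage Group name to its list of database names.
    All generated names are illustrative, not a VERITAS naming convention.
    """
    if len(storage_groups) > MAX_STORAGE_GROUPS:
        raise ValueError("too many Storage Groups for one server")
    layout = {}
    for index, (sg_name, databases) in enumerate(storage_groups.items()):
        if len(databases) > MAX_DATABASES:
            raise ValueError(f"too many databases in {sg_name}")
        volumes = []
        for db in databases:
            volumes.append(f"{db}_db")     # one volume per database
            volumes.append(f"{db}_stm")    # one volume per streaming file
        volumes.append(f"{sg_name}_log")   # one log volume (assumed per Storage Group)
        if index == 0:
            volumes.append("regrep")       # Registry replication, first Storage Group only
        layout[f"{sg_name}_DG"] = volumes  # one Disk Group per Storage Group
    return layout


layout = plan_layout({"SG1": ["MBX1", "MBX2"], "SG2": ["PUB1"]})
for disk_group, volumes in layout.items():
    print(disk_group, volumes)
```

Keeping each database, streaming file, and log on its own volume is what allows a single database to be backed up or restored without touching its neighbors.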

NOTE: Disk Groups and Volumes should be configured so that databases, streaming files, and logs are not on the same disk spindles whenever possible, both for best performance and to allow for a faster and more granular backup and recovery.

Disk storage and tape storage should never be configured in the same Zone. A minimum of one zone for disk storage and one zone for tape storage should be configured, with the only common member of the zones being the HBA from each host that needs both disk connectivity and tape device connectivity. Failure to follow this guideline can result in degradation of SAN performance, loss of availability, and application downtime due to misdirected encapsulated SCSI commands targeting unintended recipients.

VERITAS CLUSTER SERVER

VERITAS Cluster Server™ software (VCS) provides value to businesses that require applications or services to be available constantly, with little or no downtime per year. VCS can monitor applications, services, and their supporting infrastructure for server and application failure, and take responsive actions if a failure occurs. VCS will move the dependent resources to a healthy server. Many advanced cluster capabilities found in the VERITAS Cluster Server software are not available in traditional 2-node high availability solutions. Features such as role-based security, intelligent workload management, web-based administration, and N+1 clustering all contribute towards lowering the total cost overhead and providing easy management of a high availability solution.



From a cluster architecture perspective, VERITAS Cluster Server software is based on a proven technology design that has achieved market dominance in the high availability space, with thousands of production installations worldwide. VERITAS Cluster Server software can scale from 1 to 32 nodes to provide availability and flexibility in the most demanding configurations.

Applications are viewed by VCS as a collection of resources, each resource representing a component of an application, such as a network card, an IP address, a disk, a virtual host name, or an application’s services. A collection of resources specific to an application is contained in a Service Group. For example, a cluster may have three Service Groups: one for Exchange 2000, one for a Printshare, and one for a Fileshare. These groups can be moved independently of one another between systems in the cluster. When a failure occurs, VCS will restart the failed component or migrate the entire application Service Group to another server using a process called failover.

CHOICES OF CLUSTER ARCHITECTURE

VERITAS recognizes that not all data center environments are alike. Building an infrastructure for high availability using independent, two-node clusters may not meet the management or availability requirements of every business. Using VERITAS Cluster Server as a stand-alone solution or in combination with other VERITAS products, availability can be achieved in almost any environment. Three primary architectures are used as the foundation for application availability in a Windows environment:

• Local area clustering using shared SCSI or SAN attached storage
• Stretch clustering for campus or metropolitan area environments using mirrored or replicated storage
• Wide area clustering for disaster recovery spanning larger geographic distances



Local Clustering: This standard cluster configuration uses shared storage between nodes using either SCSI or Fibre Channel interconnects. All nodes are connected via redundant Ethernet links for private cluster communication. A single standby server can be used to provide redundancy for multiple active servers.

Stretch Clustering: Organizations using a recovery site on the same or nearby campus may choose to create a stretch cluster. This can be done by stretching a mirror over a SAN (Campus Cluster).

Wide Area Disaster Recovery: Once clusters are deployed at multiple sites, it is possible to provide application failover from one cluster to another. Replication is used to copy data from one site to another, and site migration can be automated through VERITAS Global Cluster Manager.

Local Clustering

Local high availability should be the first method deployed, whether clustering for server failure or for complete disaster recovery. This allows an application to resume from failure using locally attached storage rather than relying on a remote copy. The standby host uses the same physical data set that was used by the primary host. As a best practice, failover should always be done locally for application or server faults; failover to a remote site should only be used to protect against a site-wide disaster. Another very important aspect of local clustering is that application recovery happens automatically, and failing back to the primary requires no resynchronization of data. Stretch or geographic clusters sometimes require manual intervention to initiate failover. This is done to avoid inadvertent application migration to a remote site due to cluster heartbeat failure.



Campus Clustering

As customers see an increasing need to protect against natural disasters and site failure, campus cluster or “stretch cluster” configurations become a popular alternative to a full-scale disaster recovery solution. Implementing a Campus Cluster provides a lightweight form of disaster recovery for environments where a traditional wide area disaster recovery solution using replication is not suitable and SAN connectivity is available to connect all systems and storage together. This solution eliminates both the hardware array and the physical building as single points of failure in a cluster, and effectively provides application and data fault tolerance in nearly all failure scenarios, with the exception of campus-wide disasters.

Specific advantages of a Campus Cluster configuration:

• Dynamic volume support within a cluster
• Protection from common storage management errors in a clustered environment
• Optional automatic failover in site-failure scenarios

For more detailed information, please refer to the VERITAS Campus Cluster Solution for Windows 2000 Whitepaper found on the VERITAS web site, or go directly to http://eval.veritas.com/downloads/pro/vcs_vm_campuscluster_wp.pdf.

Wide Area Disaster Recovery

Once a local cluster is deployed, it may be necessary to provide a remote failover target to protect applications in the event the entire site or cluster is destroyed, or if there are no available resources left within a cluster to host an application. By creating a cluster at a remote site, VERITAS Global Cluster Manager (GCM) can be used to facilitate application failover between clusters. Using TCP/IP to facilitate cluster-to-cluster communications, clusters can be distributed anywhere in the world. Applications can be migrated completely with a single click, including DNS server updates for client redirection to the recovery site. This is especially important in a disaster situation, where stepping through a twenty-page recovery document is not a suitable task during a crisis. A single notification to alert the administrator and confirm site migration is all that should be necessary before bringing applications back online using a remote data set. Global clustering reduces the overall cost of disaster recovery by allowing rapid application migration with no loss of data. Less administration is required to manage a global availability environment, because control of geographically distributed clusters is centralized in a single web console.

MAXIMIZE USE OF EXISTING HARDWARE

Getting the most efficient use out of existing hardware can be challenging. Using advanced storage management software from VERITAS means a single solution can be used across multiple vendors’ hardware, whether that’s storage arrays, servers, network infrastructure, or operating systems.
Purchasing a new server to host an application in the data center is common, but purchasing two servers for every application to account for high availability can become expensive. In some cases, it may not even be necessary to have a standby server with as much processing power as a primary server. If a failure occurs, it may be acceptable for an application to run in degraded mode on a less powerful server until the primary is repaired. These servers may not even be the same make or model. VERITAS Cluster Server™ software offers customers the flexibility to include systems of different hardware specifications, with only a few basic requirements for compatibility. This allows existing, older hardware to be used as a backup for new, more powerful hardware. Details about minimum requirements can be found in the VCS Installation and Configuration Guide, located on the VERITAS technical support web site.



SERVER CONSOLIDATION

One of the many advantages of the VERITAS Software solution stack is the ability not only to automate disaster recovery but also to allow an organization to consolidate server and storage hardware. Rather than having one server for every application as in a stand-alone configuration, the VERITAS solution for Exchange allows availability to be created through fault tolerance and redundancy with fewer servers. The VERITAS solution provides many-to-one or any-to-any failover, thereby significantly reducing overall hardware requirements.

Server consolidation moves the IT infrastructure from large numbers of small open systems, many running at low capacity levels, to a much smaller number of large scale enterprise servers running at near maximum capacity (80% or better). One possible solution is N+1 clustering, where one enterprise class server can provide redundancy for multiple active servers. This reduces the cost of redundancy for a given set of applications. N+1 also simplifies failover location choices, as all applications running on a failed server simply move to the spare server.

However, N+1 clustering starts to fall short in true Server Consolidation environments. Customers require the ability to withstand multiple cascading failures, or to take systems offline for maintenance and still have adequate redundancy in the server cluster. Typical application clustering packages have fallen short in this area, as flexibility is limited when it comes to choosing the proper hosts for potentially tens or hundreds of application groups. The resolution is Any-to-Any clustering. Any-to-Any refers to multiple Service Groups running on multiple servers, with each Service Group capable of being failed over to different servers in the cluster. For example, imagine a 4-node cluster, with each node supporting 3 critical application instances.
On failure of any node, each of the three instances is started on a different node, ensuring no node gets overloaded. This is a logical evolution of N+1, where there is not a need for a “standby system” but rather “standby capacity” in the cluster. Truly utilizing the capabilities of Any-to-Any clustering requires an advanced ability to proactively determine the absolute best node on which to run an application at the time of failure, and the scalability to handle this decision for unlimited groups simultaneously. VERITAS Cluster Server provides a unique feature called Adaptive Workload Management to address these issues. This is addressed in greater detail in the chapter Recommendations.

ADAPTIVE WORKLOAD MANAGEMENT

VERITAS Cluster Server™ software includes a powerful application load-balancing mechanism at the service group level, known as “Adaptive Workload Management.” This feature allows the cluster to determine the optimal system on which to host an application during startup or when recovering from a failure. With Adaptive Workload Management, system capacity and application load can be defined by the administrator either statically or dynamically based on changing server resource load.

Application Load Balancing

VERITAS Cluster Server software has three primary settings for Failover Policy: Priority, Round Robin, and Load.

Priority is the most basic: the system with the lowest priority in the cluster is chosen. This is ideal for a simple two-node cluster, or a small cluster with a very small number of Service Groups. Priority is the default behavior in VCS.

Round Robin chooses the system running the fewest Service Groups as a failover target. This is ideal for larger clusters running a large number of Service Groups with essentially the same server load characteristics, such as similar databases or applications.

Load is the most flexible and powerful policy. It provides the framework for true server consolidation in the data center.
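The three Failover Policy settings can be compared with a short Python sketch. It is not VCS code: the `System` fields and `choose_target` function are hypothetical, “lowest priority” is read here as the lowest priority number, and the Load policy is simplified to picking the system with the most available capacity.

```python
from dataclasses import dataclass


@dataclass
class System:
    name: str
    priority: int            # assumption: lower number means more preferred
    group_count: int         # Service Groups currently online
    available_capacity: int  # assumption: capacity minus current load, arbitrary units


def choose_target(systems, policy):
    """Pick a failover target from the candidate systems under the given policy."""
    if not systems:
        raise ValueError("no candidate systems")
    if policy == "Priority":
        return min(systems, key=lambda s: s.priority)
    if policy == "RoundRobin":
        return min(systems, key=lambda s: s.group_count)
    if policy == "Load":
        return max(systems, key=lambda s: s.available_capacity)
    raise ValueError(f"unknown policy: {policy}")


# The 4-node example from the text: one node has failed, and its 3 groups
# must be placed on the 3 survivors, which already run 3 groups each.
survivors = [
    System("B", priority=1, group_count=3, available_capacity=40),
    System("C", priority=2, group_count=3, available_capacity=70),
    System("D", priority=3, group_count=3, available_capacity=55),
]

targets = []
for _ in range(3):
    target = choose_target(survivors, "RoundRobin")
    target.group_count += 1  # the chosen system now hosts one more group
    targets.append(target.name)

print(targets)
print(choose_target(survivors, "Priority").name)
print(choose_target(survivors, "Load").name)
```

With Round Robin, the three displaced groups land on three different survivors, which matches the Any-to-Any behavior described above; Priority would have sent all of them to the same system.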
Load policy is made up of two components, Capacity & Load and Limits & Prerequisites. System Limits and Group Prerequisites add additional capability to the load policy. The user can set a list of finite resources available on a server (Limits), such as available memory, processor usage, etc. Each Service Group is then assigned a set of Prerequisites. For example, a SQL database may need 256MB of memory and two 1.5 GHz processors. VCS load policy will first determine the subset of all systems that meet these criteria and then choose the lowest loaded system from this set. In this way, an unloaded system that does not meet all the Prerequisites of a group will not be chosen. As soon as the decision is made to online a group on a particular system, the Prerequisites of the group are subtracted from the Limits of the system, so a database migration would decrease the available memory of a server by 256MB.

System Limits and Group Prerequisites work independently of Failover Policy. Prerequisites are used to determine the subset of eligible systems on which a group can be started during failover or startup. Once a list of systems meeting the proper Prerequisites is created, VCS will then follow the configured Failover Policy. System Limits and Group Prerequisites are used to control Exchange failover such that only a single instance of Exchange will ever attempt to run on a given node at one time. This is particularly advantageous when configuring clusters that contain more than one Exchange Virtual Server (EVS).

EXTENSIVE APPLICATION SUPPORT

VERITAS has a long history of working closely with application vendors to jointly support applications running in VCS environments, and Microsoft is no exception to this. Solutions for Microsoft Exchange, SQL, IIS, and SPS (SharePoint Portal Server) are all developed with the engineering support of each product team, with the understanding that VCS is promoting sales of these products into the enterprise Windows market. As a result, VERITAS is able to provide robust, easy-to-use solutions for all of the enterprise Windows applications available today, along with a joint support agreement in place to manage customer escalations. Significant engineering effort goes into creating VCS agents to support the latest applications. Each agent requires an installer to push the application configuration to all nodes in the cluster, as well as a configuration wizard to assist in customizing the application for the environment.
Today, VERITAS Cluster Server software supports many Windows applications, including Microsoft Exchange 5.5 and 2000, Microsoft SQL 7 and 2000, IIS 5.0, Oracle 8i and 9i, Lotus Domino, File / Print, SAP, and nearly any other application through VERITAS Enterprise Consulting Services. VERITAS Cluster Server also provides strong support for managing hundreds to thousands of file shares and print shares. Many tunable features, such as mount point support, auto-sharing of subdirectories, hidden shares, and Volume Manager with Volume Replicator integration, make network file and print serving a powerful and scalable component of our Windows solution stack.

POWERFUL MANAGEMENT FEATURESET

VERITAS Cluster Server software consolidates islands of two-node clusters into fewer, more manageable cluster configurations. Using a web-based interface, customers can monitor and manage clusters remotely, from any standard browser. A Java console provides management of multiple clusters on any platform from a single console. The VERITAS Enterprise Administrator feature can be used from any computer to easily and efficiently manage all storage for Exchange as well as the rest of your volumes of data across the entire enterprise, regardless of platform.

Management of clusters also goes beyond the local site. Once multiple clusters in multiple sites have been deployed, VERITAS Global Cluster Manager™ (GCM) allows applications to fail over between clusters and consolidates cluster administration with replication using an extension of the existing VCS management console. From a nation-wide view, an administrator can view all clusters at all sites from a single console, then drill down into a cluster at one site, which will transparently move from the GCM console into the VCS interface.



Simplified Management

IT departments frequently engage in server and storage consolidation projects as a way to get approval for newer equipment, justifying the cost by showing that they will not have to hire additional staff to manage the growth of their systems and that hardware maintenance costs will fall. An important requirement of this consolidation effort is migrating the users’ data from locally attached server storage to shared storage in a SAN environment. Migrating Microsoft Exchange data to the SAN is a complex process, and requires a thorough understanding of SAN hardware, backup requirements, and the performance characteristics of Microsoft Exchange, as well as a good plan for growth. Hardware solutions typically address only issues such as fault tolerance; with a VERITAS Solution for Microsoft Exchange, however, you can manage your complex Exchange environment through a suite of complementary tools that can be run on the server, from the workstation in your office, or from your home computer.

ADD & REMOVE CLUSTER NODES ‘ON THE FLY’

Adding and removing nodes to and from VERITAS Clusters is a simple operation that helps with future business growth, online cluster maintenance, and capacity-on-demand challenges. Server maintenance is an ongoing practice of upgrading or servicing network adapters, memory, processors, or other devices. In a standard two-node cluster configuration, work must be performed on the standby server while the primary server is hosting an application. Most environments, however, implement clustering because of the serious economic and business impact of application downtime. Performing maintenance on the standby server effectively leaves the application vulnerable to hardware or operating system failure. Using even a simple three-node configuration with VERITAS Cluster Server, one server can be taken offline for maintenance while the remaining two servers still provide high availability for the application.
Upgrading the Configuration

Upgrading a cluster configuration can seem to be a daunting task. In a standard two node Windows cluster, the standby server is upgraded first, followed by a migration of the application from the existing node to the new node, and then the remaining node is upgraded. This introduces not only downtime during failover, but also a risk that the application may not properly come online using the new configuration. VERITAS Cluster Server software allows the underlying cluster configuration to be upgraded without affecting applications. Using an automated installation wizard, the VCS services are stopped on all nodes in the cluster and updated to the new configuration, all while the applications monitored by VCS remain online. Once the upgrade is complete, the VCS services are restarted and monitoring of the application continues as normal.

VOLUME REPLICATION

Even with the most well executed backup strategy, restoring data from tape usually results in several hours of lost data. For many environments, this kind of data loss is unacceptable and real-time replication is a requirement. Real-time replication not only minimizes or eliminates data loss, but also enables rapid recovery when compared to conventional bulk data transfer from sequential media. VERITAS Volume Replicator™ software (VVR) works as a fully integrated module within VERITAS Volume Manager™ software, the industry-leading, highly popular online data storage management solution used in more than 1000,000 enterprises worldwide.

Replication, in the context of disaster recovery, is an automated and rules-based method for the geographical distribution of identical data. This reduces the opportunity for human error and minimizes the need for administrator intervention. Replication should make efficient use of resources and, after an initial synchronization, keep WAN network traffic down by replicating only the data blocks that actually change.
*Initial synchronization time can be all but eliminated by using VERITAS NetBackup™ software to perform a “volume image” backup.



VOLUME REPLICATION ENHANCES EXCHANGE AVAILABILITY

When high availability is combined with replication technology, it is possible to create an exact duplicate of mission critical data at a remote disaster recovery site. In organizations with multiple sites, data can be cross-replicated so that each site becomes a secondary for the other. Then, if a disaster strikes, important applications can be resurrected at the secondary site. In stressful times, when mission critical data must be brought back online as soon as possible, even a good plan can become an obstacle to bringing applications online quickly and error free. For this reason, VERITAS Global Cluster Manager software was conceived to completely automate the site failover. Now even the night operator or junior administrator can easily bring the important application back online. Additionally, many organizations must respond quickly to ever increasing data security issues; Global Cluster Manager and VCS allow rolling upgrades to take place and greatly minimize even planned downtime. One of the many complications of moving applications from one site to another is reconnecting the clients to the application at its new location. GCM automates the virtual name change and the DNS updates to eliminate the need to make complex changes to name resolution.

Features of VERITAS Volume Replicator that Facilitate Disaster Recovery

The following VVR features help with disaster recovery in an Exchange Server environment:

- Write Order Fidelity: VVR guarantees that changes made to data on the Primary host are made in the same sequence on the Secondary host. This ensures that data remains in a consistent state in the event of a disaster.

- Asynchronous Replication: Changes made by the application are applied immediately on the Primary and are then reflected on the Secondary as soon as possible. Until the data is sent to the Secondary, it is stored in the Storage Replicator Log (SRL).

- Synchronous Replication: VVR guarantees that changes committed on the Primary host are also committed on the Secondary host. This ensures the data on the Secondary host matches the data on the Primary host and minimizes data loss in the event of a disaster. However, it is recommended that you use the soft-synchronous mode of replication, which runs synchronously but can switch to asynchronous if replication falls behind due to network outages, and switches back once the SRL is drained.

- Volume Snapshot: VERITAS Volume Manager, the volume management technology used by VVR, provides the ability to take a point-in-time snapshot of a volume. This allows you to verify the consistency of the data on the Secondary host without impacting the replication between the Primary and Secondary host. A volume snapshot is also used to execute offline backups without impacting Exchange Server’s performance. This feature requires the Volume Manager FlashSnap feature license or VERITAS Exchange Edition 1.1.

- Heterogeneous Storage Support: VVR provides a replication technology that works with heterogeneous storage hardware. VVR allows replication to occur between similar or dissimilar storage arrays from one vendor, or between storage arrays from different vendors. This allows for maximum use of existing hardware and provides flexibility when adding new hardware.
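To make the first three features in this list more concrete, here is a toy Python model of a Primary, a Secondary, and a log of pending writes. It is a sketch only and does not reflect how VVR is implemented: the `ReplicationLink` class, its attributes, and the mode names are hypothetical, and the in-memory queue merely stands in for the SRL.

```python
from collections import deque


class ReplicationLink:
    """Toy model of a Primary, a Secondary, and a log of pending writes."""

    def __init__(self, mode="async"):
        if mode not in ("async", "soft-sync"):
            raise ValueError(f"unknown mode: {mode}")
        self.mode = mode
        self.primary = {}
        self.secondary = {}
        self.log = deque()       # pending writes in arrival order (stands in for the SRL)
        self.applied_order = []  # order in which writes reached the Secondary
        self.network_up = True

    def write(self, key, value):
        """Apply a write on the Primary and queue it for the Secondary."""
        self.primary[key] = value
        self.log.append((key, value))
        if self.mode == "soft-sync":
            self.drain()  # behaves synchronously while the link is healthy

    def drain(self):
        """Apply pending writes to the Secondary in their original order."""
        applied = 0
        while self.network_up and self.log:
            key, value = self.log.popleft()
            self.secondary[key] = value
            self.applied_order.append((key, value))
            applied += 1
        return applied


link = ReplicationLink("soft-sync")
link.write("a", 1)        # link is up: the Secondary is updated right away

link.network_up = False   # outage: writes accumulate in the log instead
link.write("a", 2)
link.write("b", 3)
pending_during_outage = len(link.log)

link.network_up = True    # link restored: the log drains in write order
drained = link.drain()
print(pending_during_outage, drained, link.secondary)
```

Because the queue is drained strictly first in, first out, the Secondary passes through the same sequence of states as the Primary, which is the property that write order fidelity is meant to guarantee.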


VERITAS STORAGE REPLICATOR

VERITAS Storage Replicator™ software (VSR) is ideal for replicating critical files to one or more offsite servers. It is designed to continuously monitor the state of data on a VERITAS Storage Replicator system and record changes made to data on the system’s disk. Those changes are selectively mirrored to another system (the Target server) via a network connection. There is no requirement for physical proximity between the Source and the Target servers.

By using VSR, the Microsoft Exchange database, log files, and associated files can be replicated to a secondary server as part of an upgrade strategy. Hardware can be replaced, or the Primary Exchange Server can be migrated to a new location. Many companies today have gone to a centralized model and use a corporate Data Center to house their most active servers. The basic steps involved in migrating the primary Exchange server (database and associated files) are as follows:

• Installation of VSR on the primary Exchange server and new (target) server.
• Installation of Exchange on the primary server.
• Installation of Exchange on the new server.
• Creating a replication job using the VSR console.
• Running the replication job.
• Validating the new Exchange server.
• Shutting down the primary Exchange server.
• Adjusting the network configuration for the new Exchange server.
• Bringing the new Exchange server online.
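The order of these steps matters: the primary server should only be shut down after the new server has been validated. A minimal Python sketch of that ordering follows, with placeholder actions standing in for the real work; the `run_migration` function and the step names are hypothetical and do not call any VSR or Exchange interface.

```python
def run_migration(steps):
    """Run (name, action) pairs in order and stop at the first action that returns False.

    Returns the names of the steps that completed successfully.
    """
    completed = []
    for name, action in steps:
        if not action():
            break
        completed.append(name)
    return completed


def ok():
    return True   # placeholder for a step that succeeds


def fails():
    return False  # placeholder for a step that reports a problem


steps = [
    ("install VSR on both servers", ok),
    ("install Exchange on the primary server", ok),
    ("install Exchange on the new server", ok),
    ("create replication job", ok),
    ("run replication job", ok),
    ("validate new Exchange server", fails),   # simulated validation failure
    ("shut down primary Exchange server", ok),
    ("adjust network configuration", ok),
    ("bring new Exchange server online", ok),
]

completed = run_migration(steps)
print(completed)
```

In this run the simulated validation failure stops the sequence, so the primary Exchange server is never shut down and continues to serve users while the problem is investigated.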

VERITAS GLOBAL CLUSTER MANAGER

VERITAS Global Cluster Manager™ software allows IT staff to manage geographically distributed data and application availability from a web-based console. Administrators can view and manage their distributed clusters built on VERITAS Cluster Server software from a single location. This management framework reduces administrative overhead for any organization with two or more server clusters.

The VERITAS Global Cluster Manager™ Disaster Recovery Option is an add-on product that integrates clustering and replication technologies to minimize planned and unplanned downtime. The Disaster Recovery Option directs data replication to remote sites, allowing for site and/or application migration. Replication is a continuous, ongoing process. As the application writes data, this data must be replicated to the remote sites. In the event of disaster or planned downtime, it is important for the administrator to know the exact status of the replication process. By combining cluster failover with replication, VERITAS Global Cluster Manager software offers the ultimate in disaster recovery management for businesses that cannot afford data loss and prolonged downtime. Administrators can manage disaster recovery sites from a single web console regardless of platform or location.

DEFINING THE DISASTER RECOVERY SOLUTION FOR LOCAL AND GLOBAL AVAILABILITY

VERITAS Global Cluster Manager (GCM) extends the local high availability clustering provided by VERITAS Cluster Server (VCS) software. Building upon the local application management and high availability provided by clustering, GCM moves beyond simple clustering to global clustering and beyond local high availability to wide-area failover for disaster recovery. There are several situations where managing multiple clusters is preferable or required. For example, systems in the same cluster must be linked by the clustering communication system.
When these links cannot be provided between systems due to distance or expense, systems cannot be part of the same cluster. This is where GCM steps in to provide solutions for: managing multiple clusters, setting policy, and replication management for disaster recovery. Without such a tool as GCM, failover from one cluster to another requires monitoring each cluster through its own interface and manually sending commands to each cluster to perform a failover. Disaster Recovery really means wide-area high availability. Providing high availability for the wide area means providing the capability to fail over applications smoothly between clusters.



Knowledge of the replication status enables the administrator to protect against data loss and minimize disruption to application users. GCM provides a mechanism for reporting the replication status and statistics needed by administrators: the replication framework. The DR solution involves VCS for local high availability capability, and GCM and VVR for wide-area high availability capability and DR. Thus GCM is a complete wide area network (WAN) DR solution. GCM provides failover capabilities between clusters in the event of a site disaster. The mission-critical applications can continue running by having a replicated site in any worldwide location with clustering capabilities. This enables operations from one cluster in a site to fail over to another cluster in a different site with the replicated data. GCM supports any custom data replication software, but VERITAS recommends using VVR to seamlessly replicate any database. For more details on using GCM as a DR solution to recover Exchange 2000 Databases refer to the VERITAS Global Cluster Manager Disaster Recovery Guide for Exchange 2000.

Geographic Wide Area Disaster Recovery

While clustering provides availability of applications at a primary data center and replication technology allows those same applications to be replicated to a remote disaster recovery site, without automation of the disaster recovery solution, seamless migration of applications cannot occur. Worse yet, without automation, only highly trained administrators following detailed steps can hope to achieve the daunting task of migrating mission critical applications to a remote site. In times of extreme chaos and panic, only a completely automated disaster recovery solution makes sense. VERITAS Global Cluster Manager (GCM) software allows for ease of migration of Global Applications with a single mouse click.
GCM allows for complete control of your applications through a centralized web console that can be accessed from anywhere, without installing a separate management client. Not only is the application brought online in the remote location, but all required modifications to DNS, WINS, and so on are made to ensure that clients continue to access crucial Exchange data without experiencing any significant disruption of service. As an example, the following diagram displays a solution that provides a high degree of availability and reliability by combining wide area clustering, replication, and a well-defined backup and recovery strategy.

SUPPORTABILITY

Technical Support Alliance Network (TSANET) Support Agreement

VERITAS recognizes that technical support is a key factor in the decision-making process when evaluating third-party solutions. For this reason, establishing a joint support agreement between Microsoft and VERITAS was a top priority for the High Availability team. Through the Technical Support Alliance Network (TSANet) agreement, a contractual obligation exists for each company to respond to customer escalations within hours, not days. Customers are not redirected when issues arise involving a combination of VERITAS products and Microsoft applications. A dedicated channel exists between the back-line support groups of each company to facilitate communication, which allows the vendors to work on the problem together while providing a single interface to the customer. Microsoft has documented a full statement of support:

“Microsoft recognizes VERITAS Software as member in good standing of TSANet, and as such will support VERITAS staff resolving customer’s technical issues involving VERITAS software running on a Microsoft platform, or interacting with a Microsoft application.”

(*Full document can be provided on request)



RECOMMENDATIONS

Consider the Full Scale Disaster Recovery Plan diagram below. The Primary site is a 10-node cluster, with 7 active nodes each supporting an Exchange Virtual Server instance, while 3 nodes are Standby nodes available to any of the 7 instances of Exchange. If any 3 nodes fail, each of the Standby servers can accept one instance of Exchange. This is a logical evolution of N+1, where there is no need for a “standby system” but rather for “standby capacity” in the cluster. Once corrective action has made the previously failed systems available, they act as Standby servers to those that are running. Through policy-based management of the cluster, other applications can co-exist on these Standby or Active servers. Additionally, these applications can be relocated to other servers to give maximum performance to the most important Exchange Virtual Servers or other applications.

Disaster Recovery (DR) is achieved by creating a series of procedures you can use to reliably and efficiently restore application data and services in the event of a catastrophic failure. Such procedures range from a basic series of steps for restoring from a tape backup to a more robust solution consisting of primary and secondary sites and, within those sites, primary and secondary clusters. The primary cluster provides data and services during normal operation, and the secondary cluster provides data and services if the primary cluster or site fails. As online requirements increase, so does the need to automate the procedure so that a complex set of steps is not required in a time of crisis. Most organizations would much prefer the simplicity of pressing a button to allow seamless failover or migration of data and applications. A solution much like the one described in the previous diagram illustrates the best model for protecting against disaster in your Exchange 2000 environment.
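The standby-capacity behavior of the 10-node primary cluster can be illustrated with a small simulation. This is only a sketch in Python, not VERITAS Cluster Server code: the `Cluster` class, the node names, and the `EVS` instance labels are hypothetical, and the policy shown (move to the first free Standby node, never fail back) is an assumption chosen to mirror the description above.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Toy model of a cluster with shared standby capacity (not VCS code)."""
    hosting: dict = field(default_factory=dict)  # node -> Exchange instance
    standby: list = field(default_factory=list)  # healthy nodes hosting nothing
    failed: set = field(default_factory=set)     # nodes awaiting repair
    offline: list = field(default_factory=list)  # instances with nowhere to run

    def fail(self, node):
        """Mark a node failed and return the node that took over, if any."""
        self.failed.add(node)
        if node in self.standby:
            # An idle standby node failed: capacity shrinks, nothing moves.
            self.standby.remove(node)
            return None
        instance = self.hosting.pop(node)
        if self.standby:
            target = self.standby.pop(0)
            self.hosting[target] = instance
            return target
        # Standby capacity is exhausted, so the instance stays down.
        self.offline.append(instance)
        return None

    def repair(self, node):
        """A repaired node rejoins as standby capacity; nothing fails back."""
        self.failed.remove(node)
        self.standby.append(node)


def build_primary_site():
    """Seven active nodes (one instance each) plus three standby nodes."""
    cluster = Cluster()
    for i in range(1, 8):
        cluster.hosting[f"node{i}"] = f"EVS{i}"
    cluster.standby = ["node8", "node9", "node10"]
    return cluster
```

In this model, three failures consume the three Standby nodes while all seven instances keep running, and a repaired node simply becomes Standby capacity for whichever node fails next.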



The Primary Components for Achieving High Availability:

• Clustering: Use VERITAS Cluster Server software in both the primary and secondary sites to ensure application availability. Typically this means providing one or more failover targets at the primary site and at least one failover target at the secondary site.

• Software management: Administer the two clustered data centers with a software management component (GCM). This can be done from a web browser anywhere in the world.

• Data management: Use Volume Management software on each server, through VERITAS Edition for Exchange 2000, to enable dynamic disks and online volume growth for Exchange 2000 databases and transaction logs.



• Replication: Use replication between the two data centers to ensure that Exchange 2000 in the secondary site can continue to serve clients immediately in case of disaster in the Primary site. This avoids the time-consuming task of restoring lost data from backup. VVR provides replication across a standard network connection to similar or dissimilar storage devices.

Many factors can affect the successful deployment and use of a complex solution. A short time spent developing a plan at the start of the deployment will save countless hours of troubleshooting at the end of the project. To minimize deployment time and increase performance, it is important to install hardware and software in accordance with all best practices for your entire solution stack. Contact the appropriate resources before deployment if any steps are unclear, so that time is not lost correcting avoidable mistakes. VERITAS Software can provide Enterprise Consulting Services to assist in everything from design to deployment to meet your specific requirements. Designing resilient systems will keep everything online and reduce unplanned downtime.

VERITAS Software solutions for Windows 2000 and Exchange 2000 must be installed on hardware that is listed on the Microsoft Hardware Compatibility List (HCL). Additionally, VERITAS maintains an HCL for many products, which should be consulted before selecting and designing a solution. Beyond choosing hardware from the HCL, it is very important to be certain that server, network, and storage hardware are at the appropriate firmware and driver levels.

All enterprise solutions rely heavily on the network infrastructure. As such, it is critical that Domain Name System (DNS) name resolution is properly configured before attempting installation and configuration of enterprise software and applications. Network cards and connected ports should be manually configured to the same speed and duplex mode for all systems on the same network segment. Exchange also relies heavily on Internet Information Server, Active Directory, and access to the Global Catalog. Proper configuration of these important resources will reduce the risk of a single point of failure.

The ability to use disk storage is essential to any High Availability deployment. Ensure that SAN and storage systems are properly configured and that all disk resources are visible only to the systems intended to use them. This should be accomplished through proper LUN masking, zoning, and similar measures. Additionally, to avoid contention issues on the SAN, disk storage and tape storage should not be configured in the same zone. HBAs, modular data routers, and storage hubs or switches should be configured according to the manufacturer's specifications for the type of deployment in use. Disastrous results can occur from improperly configuring the SAN.

To avoid preventable downtime and limit exposure to disaster:

- Devise a solid backup and recovery plan and perform periodic “fire drills” to ensure the integrity of your plan.

- Review backup logs to determine success or failure and explore the reason for any failures.

- Examine the Windows 2000 Event Viewer logs to proactively look for problems that could be developing.

- Keep good records of changes of hardware or software made to production systems. If changes have been made to your systems, the method of restoration may be determined by information contained in your change log.
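Two of the configuration rules given earlier in this section (one speed and duplex setting per network segment, and no zone that mixes disk and tape storage) lend themselves to an automated pre-deployment check. The following Python sketch shows one way to express them. The function names and the tuple and dictionary layouts are assumptions made for illustration; in practice the inventory data would have to be exported from your own network and SAN management tools.

```python
from collections import defaultdict


def duplex_mismatches(nics):
    """Return segments whose NICs do not all share one (speed, duplex) setting.

    `nics` is a list of (host, segment, speed_mbps, duplex) tuples
    (a hypothetical inventory format).
    """
    settings = defaultdict(set)
    for _host, segment, speed, duplex in nics:
        settings[segment].add((speed, duplex))
    return sorted(seg for seg, seen in settings.items() if len(seen) > 1)


def mixed_zones(zones):
    """Return zones that contain both disk and tape devices.

    `zones` maps a zone name to a list of (device, kind) pairs, where kind
    is "disk", "tape", or "hba" (again a hypothetical format).
    """
    bad = []
    for name, members in zones.items():
        kinds = {kind for _device, kind in members}
        if {"disk", "tape"} <= kinds:
            bad.append(name)
    return sorted(bad)
```

An empty result from both functions means the inventory satisfies these two rules. It says nothing about LUN masking, firmware levels, or DNS, which still need to be verified separately.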

CONCLUSION/SUMMARY

Application downtime costs customers money in lost productivity, operational costs, and recovery time. Customers who deploy High Availability solutions do so because their applications are considered critical to the overall success of their business. When investing in such a critical component, the choice between relying on a ‘free’ cluster solution and what VERITAS offers is clear. As a leader in storage virtualization, data protection, replication, and high availability, VERITAS software, with its portfolio of solutions that manage and protect critical data in the most demanding IT environments, is a natural choice to meet your high availability and disaster recovery requirements.



The integrated suite of VERITAS storage management technologies improves manageability, increases availability, and ensures quick recovery of Exchange servers. With built-in expertise, central administration, and proactive notification and configuration, VERITAS Software simplifies the management, availability, and recovery of your Exchange environment. It delivers benefits to companies of all sizes, making it easier to meet, and in some cases even exceed, your service level agreements. VERITAS offers hardware-independent software solutions, which include backup and recovery, replication and remote mirroring, and clustering, allowing you to leverage existing hardware within your data center. No other vendor provides as much flexibility of OS and application support, scalability for local and wide area recovery, ease of management, or reliability for your most critical data. These features, combined with the flexibility of architecting large clusters, provide a lower total cost of ownership and future value for the business. Trust high availability and disaster recovery of your Exchange data to the market leader: VERITAS Software Corporation.


