A Practical Guide to Business...

12
WHITE PAPER A Practical Guide to Business Continuity

Transcript of A Practical Guide to Business...

Page 1: A Practical Guide to Business Continuityimg2.insight.com/graphics/uk/content/microsite/overland/... · 2009-07-22 · Too often, Business Continuity (BC) and Disaster Recovery (DR)

WHITE PAPER

A Practical Guide to BusinessContinuity

Page 2: A Practical Guide to Business Continuityimg2.insight.com/graphics/uk/content/microsite/overland/... · 2009-07-22 · Too often, Business Continuity (BC) and Disaster Recovery (DR)

Executive SummaryOne of the most important, but seemingly most overlooked, aspects of a completedata protection solution is a Business Continuity plan. The list of why’s are endless, andcommon: the cost of implementing a plan seems expensive, the planning itself takesmore time and follow through than budgets and IT departments allow for and finally,the question always remains, when will you actually use it?

Too often, Business Continuity (BC) and Disaster Recovery (DR) planning fall by the way-side for these exact reasons. However, business continuity is the most important plan-ning you can do . After all, what happens when you can’t get to your critical data?The trick is to use a little common sense and whittle the job down to practical terms thatcan be implemented, budgeted and executed in a timely manner. The goal is to neverhave to use BC and DR plans—other than verify that they work—but to have justenough insurance and resources to keep things going without hurting the operations ofthe business. The following whitepaper is a practical guide to building an affordable,but comprehensive, business continuity plan that is part of a an end-to-end data pro-tection solution.

Defining Business Continuity and Disaster RecoveryA solid business continuity plan is often required to make a disaster recovery plan viable.For this paper we will define them as:

Business ContinuityBusiness Continuity is the ability to keep vital business operations running in the event offailure in the existing infrastructure. Typically, when a part of the existing infrastructurefails, IT is expected to provide a response within a given time period, typically referred toas an SLA (Service Level Agreement). These failures can include power failures,application errors, network failures, data integrity issues, human error or any other issuewhere the majority of the infrastructure is still in place, but operations are halted andneed to resume.

Disaster RecoveryDisaster Recovery is when the infrastructure is significantly impacted and no longeravailable, impacting Business Continuity. This is most often a major natural disaster orother “act of God.” The number one requirement in DR is that the data is protected off-site. Data is the only thing truly unique in the data center. It is the only thing that can't bereadily replaced.

Don't Plan for the Perfect StormWe have all heard the “perfect storm” scenarios where a random series of events hasled to a seemingly inconceivable disaster. Often, papers on BC and DR site these majordisasters as the reason for planning and funding these programs, when the reality is thatthey don't happen often and when they do it is almost impossible to plan for everyimaginable disaster. The truth is, most BC/DR incidents are not full-blown natural disasters.According to Strategic Research Corp., the primary causes of BC/DR incidents are:

• 44% Hardware—A system component fails (can include servers, switches, disk drives,RAID controllers and/or another core piece of infrastructure) causing either a loss ofservices or failure to meet a Service Level Agreement (SLA).

Page 3: A Practical Guide to Business Continuityimg2.insight.com/graphics/uk/content/microsite/overland/... · 2009-07-22 · Too often, Business Continuity (BC) and Disaster Recovery (DR)

• 32% Human Error—The primary cause of human error is either a mistake in aconfiguration setting, or issuing the wrong command on a production system. This canusually be prevented with testing, but often these changes are being made on-the-flyand time does not allow for adequate testing. This happens frequently after hardwarereplacement.

• 14% Software and Firmware Errors—These failures are often related to operatingsystems hanging, driver incompatibilities and the introduction of new applications toservers that contend for resources.

• 7% Virus/Security Breach—The reality of IT is that malicious attacks happen, often assystem users download files from unsecured sites. These activities cause real problemsthat require a solid recovery plan. This is where the ability to maintain an RPO(Recovery Point Objective) isvital to bring systems back tothe time before the attackwithout any data corruption.

• 3% Natural Disaster—Naturaldisasters are often cited as aleading reason for BC and DRplanning but they represent arelatively small percentage ofactual BC and DR events.

Beyond the research related tothe leading causes of BC and DRevents, an IT manager needs to focus on having a plan and an agreed-upon responseto his company’s top concerns. We recommend building "what-if" scenarios and testingthem to see what you have, what you need and where the gaps exist. For example:

• What if we have lost our RAID array?• What if we have lost our primary network switch?• What if we have lost our tape library or Virtual Tape Library?• What if we have corrupted our finance, web, or email application server?• What if we have lost all power to the data center?• What if we only have our backup tapes and all the hardware is gone... how do werecover?• Regulatory SLAs for data protection are...?• What if we need to restore all data in (XX) minutes or hours?

Like most design efforts, greater value is achieved from time invested up front. The nextsection outlines a model to build a BC/DR plan.

Key Points of a BC and DR PlanTo state the obvious, the most important thing is to have a plan and to have set theproper expectations about the effectiveness of the plan based on business andtechnical resources. Next, is to conduct an initial test of the plan, review it on a regularbasis and conduct spot tests on a random basis. The base process for BC and DR plans issimple:

Page 4: A Practical Guide to Business Continuityimg2.insight.com/graphics/uk/content/microsite/overland/... · 2009-07-22 · Too often, Business Continuity (BC) and Disaster Recovery (DR)

3

Step 1: Define and AssessWhat are the priorities of your organization? Familiarize yourself with corporate goals andbe sure to get buy-in for your plan from the CEO and other key executives. Understandwhat issues you face from business, contractual and regulatory perspectives, and setthese as a minimum. Define how long your business can be without IT services. Onceyou have defined what needs to be done (key applications and services), and howlong you have to do it, (your Service Level Agreement), then the process becomesmore manageable, budget-friendly and executable.

Typically, the top items for small business are:• Keep Revenue Flowing• Keep Communications Up (Email, IM, Phones)• Keep Customers Engaged and Happy (Web Site, Service and Shipping)• Track the Transactions (Billing and Accounting)

Step 2: Research and RecruitNow that you know what needs to be protected and the SLAs required, talk to yourinternal sponsors, suppliers and partners to find out what tools and options are available,then build a set of key data (equipment, services, budget) to help get ready fordesigning a solution. This is a topic that has been well documented by other companiesand information on best practices is easily accessible online.

Step 3: DesignHit the whiteboard and map things out! Document the design, then outline the testingand ongoing monitoring of the plan and tools. This is where the greatest cost—and yourgreatest success—will come from. At this point, you should bring in not only technicalresources, but also business partners to test the impact and implications of the proposedplan.

Step 4: ImplementNow that you have a plan, process and budget, get the equipment and startimplementing the BC/DR changes. This will almost always be a phased approachdesigned to reduce impact on production systems. Implementation should be based onbusiness risk reduction, budget realities and technical capabilities. Make sure thisphased approach is supported by your business champions as well as the IT team.

Page 5: A Practical Guide to Business Continuityimg2.insight.com/graphics/uk/content/microsite/overland/... · 2009-07-22 · Too often, Business Continuity (BC) and Disaster Recovery (DR)

Step 5: ValidateOnce you have the first four steps implemented, we suggest two ways of validating theBC/DR plan. The first is from Jon Toigo, CEO of the Data Management Institute. Hesuggests walking around the data center and placing colored Post-it® notes onhardware throughout the room.For example, a yellow Post-it would signify a hardware failure, while a blue Post-it wouldsignify the application running on that device failed. Then gather your team andworkshop each step of the solution, ensuring that the team clearly understands whatneeds to be done in the Post-it scenario. The second way of validating your BC plan is toactually take your systems down, and then clearly document, step-by-step, the processto rebuild the application, the network or storage to an operational state. This way, youclearly understand how it affects users, the business and your customers.

Step 6: MaintainMake sure your have the tools and software in place to monitor the equipment andnetworks you need available for your BC and DR plans. Ensure that you have budgetedfor on-site spares or the proper service contracts.

Building the Plan and a BC/DR Score CardIt is easy to over-build a BC/DR plan, so we have provided a few simple tools to help youwork through the key "what-ifs" and to help with budget justification. The first tool is theBC/DR score card that will provide you with a checklist to help analyze your needs.Following is a sample BC/DR score card that has been completed. There is a blank scorecard for you to use at the end of this white paper.

4

Page 6: A Practical Guide to Business Continuityimg2.insight.com/graphics/uk/content/microsite/overland/... · 2009-07-22 · Too often, Business Continuity (BC) and Disaster Recovery (DR)

The second tool is a BC/DR justification tool. Onesignificant challenge for most BC and DR programsis getting budget. This interactive tool found atwww.overlandstorage.com/downtime will helpyou size the impact for not planning or investing toprotect the company. In a recent report fromNetwork World, "A conservative estimate fromGartner pegs the hourly cost of downtime forcomputer networks at $42,000, so a company thatsuffers from an average downtime of 175 hours peryear can lose more than $7 million per year."Obviously, this estimate is based on a largerorganization, but the cost of downtime is not just asingle number, it varies by applications andcompany size. Please visitwww.overlandstorage.com/downtime to plan acustomized cost for downtime for your company.

Building the Plan and a BC/DR Score CardIn a recent paper from ESG (Enterprise Strategy Group), they polled IT managers usingdisk-based backup to meet specific SLAs for key applications. 69 percent of respondentshad SLAs of less than two hours—mere minutes to recover data for key transactionalsystems such as CRM, ERP, eCommerce and other mission-critical applications. When itcomes to meeting SLAs, the storage part of BC and DR planning needs to provide threetypes of recovery:

• RTO (Recovery Time Objective) defined as how quickly IT managers can find, accessand retrieve required information.

• RPO (Recovery Point Objective) defined as the point-in-time (PIT) the IT managerneeds to be able to recover... or the requirement that he roll back to a specificmoment in time.

• RPO is a measure of the tolerance for data loss, in other words, how manytransactions or hours of work the company can tolerate losing.

• RGO (Recovery Granularity Objective) this is how granular the storage architecturerecovery needs to be: file level, block level or at a transaction level.

A combination of hardware and software is required to meet each of these recoveryobjectives.

Each combination enables IT managers to meet the three dimensions of recovery, andprovide the right level of support for each SLA. Disk-based backup is at the core of allthese BC SLAs, in order to provide high-speed access to data and provideinstantaneous recovery of data.

Page 7: A Practical Guide to Business Continuityimg2.insight.com/graphics/uk/content/microsite/overland/... · 2009-07-22 · Too often, Business Continuity (BC) and Disaster Recovery (DR)

Storage Networking and Consolidation: Partners in BC & DRMost IT managers have read about the benefits of storage networks (iSCSI or FibreChannel) to consolidate and share storage. Storage networking is also vital to buildinga cost-effective BC and DR plan. Storage consolidation centralizes resources, and stor-age networking ties them together so they can be migrated, replicated and protectedwithout impacting production systems and to support SLAs. Storage networking linksstorage islands inside data centers, remote sites and hot sites. Storage networking andconsolidation for BC and DR provide several other key advantages:

• Single Storage Protection Policy—Consolidated storage is the key to building a solidand sustainable BC and DR plan. It enables IT managers to build a single set of dataprotection policies and have them executed in a unified fashion. When the storage iscentralized it is easier to support SLAs with a single plan, it is easier to manage andeasier to validate BC and DR plans.

• Rapid Restore for SLAs—Storage networking, combined with disk-based backup, isvital to meet SLAs (that are often under one agreement) for business-criticalapplications such as transaction systems, web sites and email.

• Off-site DR—Again, the combination of storage networking and disk-based backupare vital to provide off-site DR and BC capabilities. The limited bandwidth oftelecommunication lines require high performance and low latency storage formaximizing the use of expensive Telco lines and to ensure synchronouscommunications.

• Regulatory Requirements—Almost every business has regulatory requirements relatedto data protection, security and application availability. For any public company andmany companies that supply government agencies, BC and DR plans arerequirements. The partnership of storage networking and storage consolidationprovides the tools required to share and replicate data locally and over distances.

Disk-Based Backup SolutionsThere are several options for disk-based backup to meet SLA requirements for RTO, RPOand RGO. As you would expect, the cost of these solutions is directly proportional toRTO, RPO and RGO access time objectives. The leading disk-based data protectionsolutions are:

Page 8: A Practical Guide to Business Continuityimg2.insight.com/graphics/uk/content/microsite/overland/... · 2009-07-22 · Too often, Business Continuity (BC) and Disaster Recovery (DR)

In the following chart, you can see how data protection solutions support RTO and RPOrequirements.

Building a Consolidated BC Plan with Overland StorageOverland Storage has worked with IT managers to build a complete storagearchitecture that can support RTO, RPO and RGO needs of SMBs. The Overland Storageproduct family can provide an integrated and modular approach to storage for BCand DR and provides a complete set of solutions to plan for your BC and DR needs.

REO SERIES® Business Continuity AppliancesLeveraging a solutions-centric approach to addressing the multiple market requirementssurrounding Business Continuity and Disaster Recovery, REO BCA includes server andapplication awareness, in addition to capacity-optimized block and file datamovement across IP-based networks.

• Disaster Recoveryo Local and Remote Simultaneouslyo Can create BCA mirrors over distanceso Supports virtualized environments: VMware, Hyper-V, XenServer

• ‘Near Zero’ RPO and RTOo Real-time Replication, leveraging CDP, provides granular recovery to address even the most aggressive Service Level Agreements (SLAs)

• Application and Data Aware Failover AND Failbacko Based on predetermined or ad-hoc business eventso Integrated with various applications including Windows/Linux File Services, Exchange and SQL

• Environmental profiling + Network and Storage Modeling/Reporting toolso Reports include data change rate, network bandwidth and I/O throughput

• Non-Disruptive Migrationo Server, Storage and Network Hardware Independento Physical to Physical (P2P), Physical to Virtual (P2V), Virtual to Physical (V2P), Virtual to Virtual (V2V)

Page 9: A Practical Guide to Business Continuityimg2.insight.com/graphics/uk/content/microsite/overland/... · 2009-07-22 · Too often, Business Continuity (BC) and Disaster Recovery (DR)

ULTAMUS® RAIDHigh availability RAID systems are the cornerstone of BC and DR plans. They are the firstline of defense in the BC plan. Overland's ULTAMUS RAID is ideal for businesses that needenterprise BC and DR features and capabilities at a cost-effective price.

• Storage Consolidation and Networking—ULTAMUS RAID helps SMBs consolidate andcentralize storage to optimize their BC and DR planning. With support for industry-standard 4 Gb Fibre Channel, ULTAMUS is ready to be the cornerstone of SMB BC andDR plans.

• High Performance and High Capacity—ULTAMUS RAID supports high performanceSAS for production applications and high capacity SATA drives for disk-based backup,storing snapshots and CDP data. This ability to tier storage in a single infrastructurelowers costs, improves ROI and simplifies BC and DR plans.

• Dynamic Scalable Storage—ULTAMUS RAID has the ability for expansion on demandwith chassis-to-chassis scalability up to 72 TB, in just 8U of rack space, with zeroperformance degradation and no downtime.

• Advanced Data Protection with RAID 6—Keeping your business running is the goal ofBC planning. RAID 6 is a new tool to support BC needs, since it enables RAID sets tosurvive the failure of two disk drives without losing data.

• High Availability with Redundancy—Fully redundant and hot swap criticalcomponents eliminate any single point of failure. ULTAMUS features dualactive/active RAID controllers and fully redundant critical components to give ITstorage managers the reassurance that their data will be available online, all of thetime.

• Easy to Own and Manage—Beyond standard RAID and volume administration,ULTAMUS RAID Manager includes multiple enclosure management from a singlewindow, intuitive LUN mapping and an extensive set of utilities for performanceoptimization, monitoring and statistics.

Snap Server® NAS AppliancesA broad range of NAS appliances that share a common GuardianOS operating systemand applications, Snap Server allow the deployment and interoperability of a completeNAS architecture from simple to complex IT organizations for a broad range of solutions.With advanced data protection features, including snapshot and capacity-optimizedsnapshot, Snap Servers are an excellent fit within any BC and DR plan.

•Ease of Implementationo With the web based management user interface (Web UI), GuardianOS powered Snap Servers are easy to implement and manage. From initial setup and data migration to ongoing capacity expansion with Instant Capacity Expansion (I.C.E) capabilities, the Web UI is found to be very intuitive and easy touse.

• Heterogeneous File Sharing Protocolso GuardianOS supports all native file sharing protocols including CIFS/SMB for Windows; NFS v2, v3 and v4 for UNIX environments, AFP for Mac OS environmentsand FTP and HTTP that support all standard computing environments.

• Block Level Protocols

Page 10: A Practical Guide to Business Continuityimg2.insight.com/graphics/uk/content/microsite/overland/... · 2009-07-22 · Too often, Business Continuity (BC) and Disaster Recovery (DR)

o iSCSI is provided by GuardianOS to support block-level interface programs such as database applications and Microsoft Exchange.

• RAID Data Protectiono GuardianOS supports many different types of RAID volumes to satisfy varying needs of performance, capacity and drive redundancy.

• Advanced Snapshot Capabilityo Snapshots can be incorporated as a central component of your BC and DR strategy to ensure that all data in every operation is internally consistent and thatno data is overlooked or skipped.

• Enterprise-level Replicationo Snap Enterprise Data Replicator™ (Snap EDR) is a high-performance network-optimized software solution for distributing, protecting, and reporting on business-critical information throughout distributed enterprises. It allows administrators to easily control the movement of files between local and remote GuardianOS-powered Snap Servers and Windows, Linux, and Solaris application servers from anywhere on the LAN or WAN. Snap EDR provides flexible scheduling as often as "up-to-the minute", variable bandwidth throttling and compression. These features enable Snap EDR to offer maximum data protection without impacting users and applications.

REO SERIES® Virtual Tape LibrariesOverland Storage's REO Virtual Tape Libraries (VTL) provides instant access to backupdata to accelerate support for regulatory requirements and SLAs. VTLs provide the abilityto cache data for traditional tape automation, as well as reduce the cost of archivemedia and storage by only making tapes for what is required by policy. VTLs are alsoideal for backup from remote sites or to DR hot-sites. VTLs enable a single backup anddata protection policy with high performance disk-based storage. Overland Storageprovides a full range of VTL solutions from the SMB to the enterprise. REO Backup andRecovery Appliances provide:

• Storage Consolidation and Networking—REO can consolidate all of your backupoperations into a single networked solution to consolidate data protection, andoptimizing your BC and DR planning.

• Flexible Functionality—REO appliances can be easily configured as a virtual tapelibrary, standalone virtual tape drives, and/or disk volumes (LUNs).

• Easy Integration—REO is compatible with all popular open systems or Windows-basedbackup software, physical tape drives or tape libraries. It easily connects to Ethernet(iSCSI) or Fibre Channel networks for seamless integration into existing backupenvironments.

• Disk Backup Performance—when configured as virtual tape, data is writtensequentially and in large blocks for fast, reliable disk backup and recovery operations.

• Dynamic Virtual Tape—patent pending technology, which automatically sizes virtualtape cartridges to match the exact capacity requirements of the backup operation.Ensures that no capacity is wasted and eliminates the manual effort and guesswork ofsizing, configuring, and provisioning virtual tape cartridges.

• Virtual Tape Library (VTL)—when configured as a VTL, provides user-definable virtualtape library drives, cartridges and partitions.

Page 11: A Practical Guide to Business Continuityimg2.insight.com/graphics/uk/content/microsite/overland/... · 2009-07-22 · Too often, Business Continuity (BC) and Disaster Recovery (DR)

NEO SERIES® and ARCvault® Tape AutomationTape is not dead…it still provides the lowest cost per GB and is the most portable form ofdata archive. Tape continues to be the leading media for off-site storage. With thecontinuous growth of storage and new regulations, Overland Storage's NEO andARCvault families of tape automation solutions provide:

• Storage Consolidation and Networking—NEO and ARCvault tape automationsolutions can consolidate all of your tape backup operations into a single networkedsolution to consolidate data protection to optimize your BC and DR planning.

• Ideal Long-term Storage for Archive—NEO and ARCvault libraries provide ideal long-term, low-cost data storage and archiving solution for organizations with largeamounts of mission-critical data. NEO libraries offer a robust, flexible, and reliablesolution that can meet the needs of the most demanding data center environments.

• Scalable Low-cost Storage—The NEO SERIES is designed to accommodate rapidgrowth and data center consolidation, with the ability to scale quickly to as many as1,000 cartridges on 2 to 24 tape drives. It provides almost limitless flexibility withadvanced nonstop operation, remote failover, and expansion-on-demandcapabilities. And it supports a wide range of connectivity options, including SCSI,Gigabit Ethernet, and Fibre Channel.

• Flexible Virtualization—The NEO line of tape automation solutions include a uniquevirtual interface architecture that makes it easy to use multiple interfacessimultaneously and implement future technologies as they become available.

SummaryBusiness continuity is a big issue for IT managers. It is at the heart of the services that ITmanagers provide for their businesses. Like most things, a little planning and preparationcan go a long way in making a BC or DR program successful. The best way to proceedis to break the problem down into smaller pieces so they can be more easily managed.Overland Storage provides the tools and solutions for SMB IT managers to implement acomplete and cost effective BC and DR program. We hope this paper and the toolsprovided will help you build a strong yet simple BC plan for your organization.

Learn more about the different Business Continuity/Disaster Recovery solutions at www.overlandstorage.com/Quest

Page 12: A Practical Guide to Business Continuityimg2.insight.com/graphics/uk/content/microsite/overland/... · 2009-07-22 · Too often, Business Continuity (BC) and Disaster Recovery (DR)

© 2009 Overland Storage, Inc. All trademarks and registered trademarks are the property of their respective owners. PG2BC-WP0609-04

Overland Storage provides affordable end-to-enddata protection solutions that are engineered tostore smarter, protect faster and extend anywhere— across networked storage, media types, andmulti-site environments.

Overland Storage products include award- win-ning NEO SERIES® and ARCvault™ tape libraries, REOSERIES® disk-based backup and recovery applianceswith VTL capabilities, Snap Server® NAS appliances,and ULTAMUS™ RAID high-performance, high-densitystorage. For more information, visitwww.overlandstorage.com.

About Overland Storage

WORLDWIDE HEADQUARTERS4820 Overland AvenueSan Diego, CA 92123 USATEL 1-800-729-8725

1-858-571-5555FAX 1-858-571-3664

UNITED KINGDOM (EMEA OFFICE)Overland House, Ashville Way Wokingham, BerkshireRG41 2PL EnglandTEL +44 (0) 118-9898000FAX +44 (0) 118-9891897

www.overlandstorage.com