Replication for Business Continuity, Disaster Recovery and High Availability

© 2013 IBM Corporation

Tony Pearson, IBM Master Inventor and Senior Managing Consultant

March 2013

Everyone Knows: Downtime is Bad!

• Lost brand equity

• Loss of goodwill and trust

• Lost loyalty

• Lost revenue and market share

• Lost productivity

• Causes:


2013: continued acceleration of changes in today’s business world….

[Figure: collaboration plotted against trust across five stages of business relationships: 1. Isolated Operations, 2. Select ‘Trusted Partners’, 3. Extended Value Chain, 4. Industry-Centric Value Web, 5. Cross-Industry Value Coalition. Participants shown: Core Business, Subsidiary/JV, Customer, Partner/Channel, Supplier/Outsourcer.]

Metcalfe’s Law: the value of a network increases in proportion to the square of the number of people on it


The “Business Process” is the Unit of Recovery

[Figure: three layers labeled Infrastructure, Application and Business. Components shown include db2, SQL, MQSeries, WebSphere, http://xyz.xml, Application 1, Application 2, Application 3, an analytics report, management reports and a decision point. The business layer shows Business processes A through G.]

1. An error occurs on a storage device that correspondingly corrupts a database

2. The error impacts the ability of two or more applications to share critical data

3. The loss of both applications affects two distinctly different business processes

IT Business Continuity must recover at the business process level
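The failure chain on this slide (storage error, corrupted database, affected applications, affected business processes) can be sketched as a walk over a dependency map. This is only an illustration: the component names echo labels from the slide, but the specific wiring between them is hypothetical and not taken from the presentation.

```python
from collections import deque

# Hypothetical dependency edges: each key lists the things that depend on it.
# Names echo the slide's labels; the wiring itself is an assumption.
DEPENDENTS = {
    "storage device": ["db2"],
    "db2": ["Application 1", "Application 2"],
    "Application 1": ["Business process A"],
    "Application 2": ["Business process C"],
    "Application 3": ["Business process F"],
}


def impacted(failed_component, dependents=DEPENDENTS):
    """Return every component reachable from the failed one, breadth-first."""
    seen = []
    queue = deque([failed_component])
    while queue:
        current = queue.popleft()
        for child in dependents.get(current, []):
            if child not in seen:
                seen.append(child)
                queue.append(child)
    return seen


def impacted_business_processes(failed_component, dependents=DEPENDENTS):
    """Reduce the impact set to business processes, the unit of recovery."""
    return [c for c in impacted(failed_component, dependents)
            if c.startswith("Business process")]


affected = impacted_business_processes("storage device")
```

With this assumed wiring, a single storage fault reaches two different business processes, which is why recovery planning is scoped per business process rather than per device.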


IT Data Protection

Overlap of valid data protection techniques:

1. High Availability: fault-tolerant, failure-resistant streamlined infrastructure with affordable cost foundation

2. Continuous Operations: non-disruptive backups and system maintenance coupled with continuous availability of applications

3. Disaster Recovery: protection against unplanned outages such as disasters through reliable, predictable recovery

• Protection of critical business data

• Operations continue after a disaster

• Costs are predictable and manageable

• Recovery is predictable and reliable


Timeline of a Disaster Recovery

[Figure: timeline running from production, through the outage, to completion at the recovery site. Layers recovered: Physical Facilities, Telecom Network, Management Control, Operating System, Data, Applications. Staff involved: Network Staff, Operations Staff, Applications Staff. Phases: assess; execute hardware, O/S, data integrity recovery; software transaction integrity recovery; "Now we're done!". Two intervals are marked: Recovery Time Objective (RTO) of hardware data integrity, and Recovery Time Objective (RTO) of transaction integrity. The RPO is marked before the outage as the data delta (Δ Data).]

Recovery Point Objective (RPO): How much data must be recreated?
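As a rough sketch of the timeline, the recovery point is the gap between the last usable copy and the outage, and the recovery time is the sum of the recovery phases. The timestamps and durations below are hypothetical, and treating the assessment phase as part of both RTO intervals is one reading of the figure, not something the slide states outright.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class RecoveryPhase:
    name: str
    duration: timedelta


def recovery_point_gap(last_good_copy, outage):
    """Data written in this window must be recreated after recovery."""
    return outage - last_good_copy


def recovery_time(phases):
    """Elapsed time from the outage until the listed phases are complete."""
    return sum((p.duration for p in phases), timedelta())


# Hypothetical durations, for illustration only.
phases = [
    RecoveryPhase("Assess", timedelta(hours=1)),
    RecoveryPhase("Hardware, O/S, data integrity recovery", timedelta(hours=4)),
    RecoveryPhase("Software transaction integrity recovery", timedelta(hours=2)),
]

rto_hardware = recovery_time(phases[:2])   # RTO of hardware data integrity
rto_transaction = recovery_time(phases)    # RTO of transaction integrity
rpo_gap = recovery_point_gap(datetime(2013, 3, 1, 2, 0),
                             datetime(2013, 3, 1, 14, 30))
```

The RTO of transaction integrity is never shorter than the RTO of hardware data integrity, because application recovery only starts once the hardware, operating system and data are back.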


Technology drives the Recovery Point Objective (RPO)

[Figure: Recovery Point and Recovery Time scales, each marked in weeks, days, hours, minutes and seconds on either side of the outage (Wks, Days, Hrs, Mins, Secs | Secs, Mins, Hrs, Days, Wks).]

For example:

Tape Backup

Periodic Replication

Asynchronous replication

Synchronous replication / HA


Automation drives Recovery Time Objective (RTO)

• Recovery Time includes:
  – Fault detection
  – Recovering data
  – Bringing applications back online
  – Network access

[Figure: Recovery Point and Recovery Time scales, each marked in weeks, days, hours, minutes and seconds on either side of the outage (Wks, Days, Hrs, Mins, Secs | Secs, Mins, Hrs, Days, Wks).]

For example:

Manual Tape Restore

Storage automation

End to end automated clustering


Business Continuity Tiers

Balancing recovery time objective with cost / value

[Figure: cost / value plotted against Recovery Time Objective (guidelines only): 15 Min., 1-4 Hr., 4-8 Hr., 8-12 Hr., 12-16 Hr., 24 Hr., Days. Two regions are labeled: recovery from a disk image, and recovery from tape copy.]

BC Tier 7 – Server or storage replication with end-to-end automated server recovery

BC Tier 6 – Real-time continuous data replication, server or storage

BC Tier 5 – Application/database integration to Backup/Restore

BC Tier 4 – Point in Time replication to Backup/Restore

BC Tier 3 – VTL, Data De-Dup, Remote vault

BC Tier 2 – Tape libraries + Automation

BC Tier 1 – Restore from Tape


Ideal World for High Availability and Business Continuity (HA/BC)

[Figure: resilience program lifecycle in four phases: Business Prioritization, Strategy Design, Integration into IT, Manage. Activities shown: risk assessment, business impact analysis, program assessment, program design, implement, program validation, resilience program management. Outputs shown: risks, vulnerabilities and threats; impacts of outage; RTO/RPO; current capability; estimated recovery time. Program assessment items: maturity model, measure ROI, roadmap for program. Implementation areas: crisis team, business resumption, disaster recovery, high availability; database and software design, high availability servers, storage and data replication, high availability design. Resilience program management: awareness, regular validation, change management, quarterly management briefings.]

1. People
2. Processes
3. Plans
4. Strategies
5. Networks
6. Platforms
7. Facilities

Business processes drive strategies and they are integral to the Continuity of Business Operations. A company cannot be resilient without having strategies for alternate workspace, staff members, call centers and communications channels.

Source: IBM STG, IBM Global Services


The role of the basic “Data Strategy” for HA/BC purposes

• Define major data types “good enough”
  – i.e. by major application, by business line….
  – An ongoing journey

• For each data type:
  – Usage
  – Performance and measurement
  – Security
  – Availability
  – Criticality
  – Organizational role
  – Who manages
  – What standards for this data
    • What type storage deployed on
    • What database
    • What virtualization

• Be pragmatic
  – Create a basic, “good enough” data strategy for HA/BC purposes

• Acquire tools that help you know your data

[Figure: Data Strategy Defined. Layers shown: Business Strategies, IT Strategy, Data Strategy, Enterprise IT Architecture, IT Infrastructure. Elements shown: People, Process, Structure, Data, Technology.]

You have to know your data, and have a basic strategy for it


A basic data strategy tells you how to categorize your data - looks something like this (step by step):

• Mission-critical data
  – Mission-critical data that is the highest priority data
  – Priority = uptime, with high value justification

• Subset of data that is either mission-critical or supports mission-critical
  – Data that supports business lines
  – Balanced priorities = uptime and cost/value

• Knowledge of user and application data
  – All data, whether active or not….
  – Which eventually needs to be archived, retained
  – Priority = cost

[Figure: data categories layered from "Mission Critical" to "Lower cost", on top of Virtualized Storage.]

Not easy to know and categorize your data - but it is the only foundation possible


Then, your basic data strategy allows you to scope your HA/BC – something like this:

• Continuous Availability (CA)
  – Finally, create the mission-critical subset with highest level of recovery
  – RTO = near continuous, RPO = small as possible (Tier 7)
  – Priority = uptime, with high value justification

• Rapid Data Recovery (RDR)
  – Then create separate storage pools as required
  – RTO = minutes, to (approx. range): 2 to 6 hours
  – BC Tiers 4, 5 and 6
  – Balanced priorities = uptime and cost/value

• Backup/Restore (B/R)
  – Virtualize, optimize cost, lay recovery capability foundation
  – Provide universal 24 hour - 12 hour (approx) recovery capability
  – Address requirements for archival, compliance, green energy
  – Priority = cost

[Figure: data categories layered from "Mission Critical" to "Lower cost", on top of Virtualized Storage.]

Know and categorize your data - this is where virtualization is the enabler
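The three recovery segments can be sketched as a lookup keyed on the required RTO. The cutoffs below are assumptions chosen for illustration, since the slide gives only approximate ranges: 15 minutes stands in for "near continuous", and 6 hours is the upper end of the Rapid Data Recovery range. Assigning BC Tiers 1 to 3 to Backup/Restore is inferred from the tiers left over after Tier 7 and Tiers 4 to 6 are placed.

```python
from datetime import timedelta

# Assumed cutoffs; the slide states ranges only approximately.
CA_MAX_RTO = timedelta(minutes=15)   # stand-in for "near continuous"
RDR_MAX_RTO = timedelta(hours=6)     # upper end of "2 to 6 hours"

SEGMENTS = {
    "CA": {"name": "Continuous Availability",
           "bc_tiers": (7,),
           "priority": "uptime, with high value justification"},
    "RDR": {"name": "Rapid Data Recovery",
            "bc_tiers": (4, 5, 6),
            "priority": "uptime and cost/value"},
    "B/R": {"name": "Backup/Restore",
            "bc_tiers": (1, 2, 3),   # inferred, not stated on the slide
            "priority": "cost"},
}


def segment_for_rto(rto):
    """Pick the least demanding segment whose range still meets the RTO."""
    if rto <= CA_MAX_RTO:
        return "CA"
    if rto <= RDR_MAX_RTO:
        return "RDR"
    return "B/R"


choice = segment_for_rto(timedelta(hours=3))
tiers = SEGMENTS[choice]["bc_tiers"]
```

Keying the decision on RTO alone is a simplification. In practice the data categorization from the data strategy (criticality, cost, value) would feed the same decision.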


Rule of Thumb for continuous replication bandwidth

• Rule of Thumb:
  – Every 1 TB of mirrored disk storage generates about this much MB/sec of writes:

• OLTP
  – 1-2 MB/sec of write bandwidth

• Sequential/batch
  – 6-7 MB/sec of write bandwidth

• Expect minimum 2.5x this to handle peaks

• Expect normal data compression to be about 2:1

• Example - you have 10 TB of disk to mirror:
  – OLTP: 10-20 MB/sec
  – Batch/sequential: 60-70 MB/sec

ROT: one OC3 line = 15 MB/sec raw effective transfer rate
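The rule of thumb can be worked through as a small calculation. This is a sketch under one assumption not spelled out on the slide: the peak factor and the compression ratio are applied together, so the link is sized for average writes times 2.5, divided by 2. The function and constant names are illustrative only.

```python
import math

PEAK_FACTOR = 2.5         # "Expect minimum 2.5x this to handle peaks"
COMPRESSION_RATIO = 2.0   # "normal data compression to be about 2:1"
OC3_MB_PER_SEC = 15.0     # the slide's rule of thumb for one OC3 line

# Write rates per mirrored TB in MB/sec, as (low, high) from the slide.
WRITE_MB_PER_SEC_PER_TB = {"oltp": (1.0, 2.0), "batch": (6.0, 7.0)}


def link_mb_per_sec(mirrored_tb, write_rate_per_tb):
    """Average writes scaled up for peaks, then reduced by compression."""
    average = mirrored_tb * write_rate_per_tb
    return average * PEAK_FACTOR / COMPRESSION_RATIO


def oc3_lines(mb_per_sec):
    """Whole OC3 lines needed to carry the given rate."""
    return math.ceil(mb_per_sec / OC3_MB_PER_SEC)


def estimate(mirrored_tb, workload):
    """Return (link MB/sec, OC3 lines) for the low and high ends of the range."""
    results = []
    for rate in WRITE_MB_PER_SEC_PER_TB[workload]:
        needed = link_mb_per_sec(mirrored_tb, rate)
        results.append((needed, oc3_lines(needed)))
    return results


oltp = estimate(10, "oltp")
batch = estimate(10, "batch")
```

For the 10 TB example, the OLTP average of 10-20 MB/sec becomes 12.5-25 MB/sec after the peak and compression adjustments, or one to two OC3 lines. The batch case comes to 75-87.5 MB/sec, or five to six lines.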


Short distance synchronous mirroring: 2 site

Short distance may not meet DR requirements

Ability to utilize server capacity in both sites for single instance of application data

Potential/ability for non-disruptive failover

Hardware solution gives data consistency between multiple servers/applications and single management point

Additional copy of data might be provided for testing or testing may be done by regular switch of sites


Long Distance Mirroring: 2 site

Longer distance to meet regulatory requirements and protect against regional events

Ability to utilize server capacity in both sites for applications with separate/independent data

Disruptive failover and less potential to use DR solution for continuous availability

Asynchronous replication more likely due to performance requirements

Additional copy of data more likely to be provided for testing


Sync versus Async

[Figure: server I/O to a primary (P) and secondary (S) volume for each mode, with the steps numbered 1 to 4.]

Metro Mirror: Synchronous, <300 km

• Write to primary volume

• The primary site initiates an I/O to the secondary site to transfer the data

• Secondary indicates to the primary that the write is complete

• Primary acknowledges to the host application that the write is complete

• Round-trip latency added to each write I/O

Global Mirror: Asynchronous (any distance)

• Write to primary volume

• The primary site acknowledges to the host application that the write is complete

Some later time:

• The primary site initiates an I/O to the secondary site to transfer the data

• Secondary indicates to the primary that the write is complete

• Primary and secondary bitmap updated that data is in sync
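The difference between the two write sequences can be sketched with a toy model. It is not how Metro Mirror or Global Mirror is implemented: the class, its methods and the in-memory lists are hypothetical. The latency helper assumes roughly 5 microseconds per kilometer one way for light in optical fiber, an approximation that ignores equipment and protocol overhead.

```python
from collections import deque


class MirroredVolume:
    """Toy model of a primary volume replicated to a secondary."""

    def __init__(self, synchronous):
        self.synchronous = synchronous
        self.primary = []
        self.secondary = []
        self.pending = deque()  # acknowledged writes not yet on the secondary

    def write(self, block):
        """Return once the host would see the write acknowledged."""
        self.primary.append(block)
        if self.synchronous:
            # Transfer to the secondary and wait for it before acknowledging.
            self.secondary.append(block)
        else:
            # Acknowledge right away; the transfer happens at some later time.
            self.pending.append(block)
        return "ack"

    def drain(self):
        """Async only: ship queued writes to the secondary, in order."""
        while self.pending:
            self.secondary.append(self.pending.popleft())


def sync_write_latency_ms(local_write_ms, distance_km, one_way_us_per_km=5.0):
    """Local write time plus the round trip added to each synchronous write."""
    round_trip_ms = 2 * distance_km * one_way_us_per_km / 1000.0
    return local_write_ms + round_trip_ms


sync_vol = MirroredVolume(synchronous=True)
sync_vol.write("a")

async_vol = MirroredVolume(synchronous=False)
async_vol.write("a")
async_vol.write("b")
exposed = len(async_vol.pending)  # writes that would be lost if the primary failed now
async_vol.drain()
```

In the synchronous case the secondary already holds every acknowledged write, at the cost of the round trip on each write. In the asynchronous case the host sees no added latency, but whatever sits in the pending queue at the moment of failure is the data that must be recreated.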


3-Site Configurations

[Figure: three 3-site layouts. Labels shown: Campus (Local-1, Local-2) with Remote-3; Local-1, Bunker-2, Remote-3; Local-1, Remote-2, Remote-3.]


Tivoli Storage Manager: an integrated, end-to-end data protection and unified recovery management solution

[Figure: remote offices, mobile offices, the data center and DR operations connected over a WAN. Components shown: FastBack for Workstations, FastBack, TSM Clients, TDPs, TSM VE, FlashCopy Manager, TSM Server, ProtecTIER, Information Archive, Cloud Gateway, Cloud Storage, Tiers of Storage, Archive / Off Site. Protected systems shown: applications, file servers, VMware servers, critical application servers.]

Centralized Administration

• Install / Upgrade
• Monitoring
• Reporting
• Configuration
• Set Policies
• Execute Backup / Restore

“TSM is the grand-daddy of unified recovery management” -- Lauren Whitehouse, Enterprise Strategy Group


Summary

• Understand today’s best practices
  – for IT High Availability and Business Continuity

• Strategies for:
  – Requirements, design, implementation
  – In-house vs. out-sourcing

• Step by step methodology
  – Essential role of virtualization
  – IBM technologies for replication and replication management


Resources and Information

• IBM Redbook: Business Continuity Planning Guide

• http://www.redbooks.ibm.com/abstracts/sg246547.html

• In particular, chapters 3, 6, 7


About the Speaker

Mr. Tony Pearson
Master Inventor, Senior Managing Consultant
IBM System Storage™

Tony Pearson is a Master Inventor and Senior Managing Consultant for the IBM System Storage™ product line. Tony joined IBM Corporation in 1986 in Tucson, Arizona, USA, and has lived there ever since. In his current role, Tony presents briefings on storage topics covering the entire System Storage product line, Tivoli storage software products, and topics related to Cloud Computing. He interacts with clients, speaks at conferences and events, and leads client workshops to help clients with strategic planning for IBM’s integrated set of storage management software, hardware, and virtualization products.

Tony writes the “Inside System Storage” blog, which is read by hundreds of clients, IBM sales reps and IBM Business Partners every week. This blog was rated one of the top 10 blogs for the IT storage industry by “Networking World” magazine, and #1 most read IBM blog on IBM’s developerWorks. The blog has been published in a series of books, Inside System Storage: Volume I through V.

Over the past years, Tony has worked in development, marketing and customer care positions for various storage hardware and software products. Tony has a Bachelor of Science degree in Software Engineering, and a Master of Science degree in Electrical Engineering, both from the University of Arizona. Tony holds 19 IBM patents for inventions on storage hardware and software products.

9000 S. Rita Road
Bldg 9032 Room 1238
Tucson, AZ 85744

+1 520-799-4309 (Office)

[email protected]


Additional Resources

Email: [email protected]

Twitter: http://twitter.com/az990tony

Blog: http://ibm.co/brAeZ0

Books: http://www.lulu.com/spotlight/990_tony

IBM Expert Network: http://www.slideshare.net/az990tony


