
STORAGE VIRTUALIZATION: AN INSIDER'S GUIDE

Jon William Toigo, CEO, Toigo Partners International
Chairman, Data Management Institute

Copyright © 2013 by the Data Management Institute LLC. All Rights Reserved. Trademarks and tradenames for products discussed in this document are the property of their respective owners. Opinions expressed here are those of the author.


Part 4: The Data Protection Imperative

A confluence of three trends is making disaster preparedness and data protection more important than ever before: the increased use of server and desktop virtualization, growing legal and regulatory mandates around data governance, privacy, and preservation, and increased dependency on automation, in a challenging business environment, as a means to make fewer staff more productive. Business continuity is now a mission-critical undertaking.

The good news is that storage virtualization can deliver the right tools to ensure the availability of data assets – the foundation for any successful business continuity or disaster recovery capability.


DATA PROTECTION MANAGEMENT: THE ESSENTIAL TASK OF BUSINESS CONTINUITY

Data protection and business continuity are subjects that nobody likes to talk about, but that everyone in contemporary business and information technology must consider. Today, a confluence of three trends – a kind of "perfect storm" – is making disaster protection planning and disaster preparedness more important than ever before:

First is the increased use of server and desktop virtualization technologies in business computing – technologies that, for all their purported benefits, also have the downside of being a risk multiplier. With hypervisor-based server hosting, the failure of one hosted application can cause many other application "guests" to fail on the same physical server. While other efficiencies may accrue to server hypervisor computing, the risks that these strategies introduce must be clearly understood in order to avoid catastrophic outcomes during operation.

A second trend underscoring the need for data protection and business continuity planning is the growing regime of regulatory and legal mandates around data preservation and privacy that affect a growing number of industry segments. Some of these rules apply to nearly every company, and most carry penalties if businesses cannot show that reasonable efforts have been taken to safeguard data.

Third, and perhaps most compelling, is the simple fact that companies are more dependent than ever before on the continuous operation of IT automation. In today's economic reality, the need to make fewer staff more productive has created a much greater dependency on the smooth operation of information systems, networks, and storage infrastructure. Even a short-term outage can have significant consequences for the business.

Bottom line: for many companies, business continuity and data protection have moved from nice-to-have to must-have status. Past debates over the efficacy of investments in preparedness are increasingly moot.


Put simply, there is no "safe place" to construct a data center: historical data on weather and seismic events, natural and man-made disaster potentials, and other catastrophic scenarios demonstrate that all geographies are subject to what most people think of when they hear the word "disaster."

Moreover, from a statistical standpoint, big disasters – those with a broad geographical footprint – represent only a small fraction of the overall causes of IT outages. Only about 5 percent of disasters are the cataclysmic events that grab a spot on the 24-hour cable news channels. Most downtime is the result of equipment and software maintenance – some call it planned downtime, though efforts are afoot to eliminate planned downtime altogether through clustering and high-availability engineering. The next big slices of the outage pie chart involve problems that fall more squarely in the disaster category: those resulting from software failures, human errors ("carbon robots"), and IT hardware failures.

According to one industry study of 3,000 firms in North America and Europe, IT outages in 2010 resulted in 127 million hours of downtime – equivalent to about 65,000 employees drawing salaries without performing any work for an entire year (assuming roughly 2,000 working hours per employee per year). The impact of downtime in tangible terms, such as lost revenues, and intangible terms, such as lost customer confidence, can only be estimated. One study placed idle labor costs across all industry verticals at nearly $1 million per hour.

Despite this data, the truism still applies in disaster preparedness that fewer than 50 percent of businesses have any sort of prevention and recovery capability. Of those that do, fewer than 50 percent actually test their plans – the equivalent of having no plan whatsoever.

The reasons are simple. First, planning requires money, time, and resources whose allocation may be difficult to justify given that the resulting capability may never need to be used. Second, plans are typically difficult to manage, since effective planning usually involves multiple data protection techniques and recovery processes that lack cost-effective testing and validation methods. Third, many vendors have imbued customers with a false sense of security regarding the invulnerability of their product or architecture, conflating the notion of high-availability architecture with business continuity strategy (the former is actually a subset of the latter).

Constructing the plan itself follows a well-defined roadmap. Following an interruption event, three things need to happen:

1. The data associated with critical applications must be recovered to a usable form.

2. Applications need to be re-instantiated and connected to their data.

3. Users need to be reconnected to their re-hosted applications.

These three central tasks need to occur quickly, as the duration of an interruption event is usually what differentiates an inconvenience from a disaster. Taken together, the three tasks may be measured using the metric "time to data." Time to data, sometimes referred to as a recovery time objective, is both the expression of the goal of a plan and a measure of the efficacy of the strategy applied to realize that goal.
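To make the "time to data" metric concrete, here is a minimal sketch in Python. The phase names and durations are hypothetical placeholders, not figures from any particular plan; substitute values measured in your own recovery tests.

# A minimal sketch of the "time to data" metric: sum the three recovery
# tasks and compare the total against the recovery time objective (RTO).
from datetime import timedelta

# Hypothetical durations for one critical application.
phases = {
    "recover_data_to_usable_form": timedelta(hours=4),
    "reinstantiate_and_connect_application": timedelta(hours=1),
    "reconnect_users": timedelta(minutes=30),
}

time_to_data = sum(phases.values(), timedelta())
recovery_time_objective = timedelta(hours=6)  # the goal set by the business

print(f"Time to data: {time_to_data}")
if time_to_data <= recovery_time_objective:
    print("Strategy meets the recovery time objective.")
else:
    print(f"Strategy misses the objective by {time_to_data - recovery_time_objective}.")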

Data Recovery is Key

The process for building a comprehensive continuity capability requires book-length description. (One is being developed online as a free "blook" at book.drplanning.org.) The much condensed version has three basic components.

To do a good job of developing a continuity capability, you need to know your data – or, more specifically, what data belongs to what applications and what business processes those applications serve. Data and apps "inherit" their criticality – their priority of restore – from the business processes that they serve, so those relationships must be understood.
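As a rough illustration of how criticality can be traced from business processes down to data, consider the sketch below. The process names, tiers, and mappings are invented for illustration only; a real assessment would come from a business impact analysis.

# Hypothetical mapping showing how data sets "inherit" restore priority
# from the business processes that depend on them.
business_processes = {
    # process name: (criticality tier, applications that support it)
    "order_fulfillment": ("tier-1", ["erp", "warehouse_mgmt"]),
    "payroll":           ("tier-2", ["hr_suite"]),
    "marketing_reports": ("tier-3", ["bi_dashboard"]),
}

application_data = {
    # application: data sets it depends on
    "erp":            ["erp_db", "erp_logs"],
    "warehouse_mgmt": ["wms_db"],
    "hr_suite":       ["hr_db"],
    "bi_dashboard":   ["reporting_warehouse"],
}

def restore_priority(dataset: str) -> str:
    """Return the most critical tier of any process that needs this dataset."""
    tiers = []
    for tier, apps in business_processes.values():
        for app in apps:
            if dataset in application_data.get(app, []):
                tiers.append(tier)
    return min(tiers) if tiers else "unclassified"  # "tier-1" sorts ahead of "tier-3"

print(restore_priority("erp_db"))  # -> tier-1
print(restore_priority("hr_db"))   # -> tier-2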


The next step is to apply the right stratagems for data recovery, application re-hosting, and reconnecting users to each application and its data, based on that earlier criticality assessment. Third, plans must be tested – both routinely and on an ad hoc basis. Testing is the long-tail cost of continuity plans, so the decisions we make about recovery objectives and the methods we use to build recovery strategies need to take into account how those strategies will be tested – and how testing costs can be reduced for a continuity program that virtually nobody wants to spend money on.

As a practical matter, data recovery is almost always the slowest part of recovery efforts following an outage – but this is contingent on a number of factors. First, how is data being replicated? Is it backed up to tape, or mirrored by software or disk array hardware to an alternative hardware kit? Is the data accessible and in good condition for restore at the designated recovery site?

Chances are good that a company uses a mixture of data protection techniques today. That's a good thing, since data is not all the same and budgetary sensibility dictates that the most expensive recovery strategies be applied only to the most critical data. Still, planners need to ensure that the approaches being taken are coordinated and monitored on an ongoing basis.

From this perspective, a data protection management service that provides a coherent way to configure, monitor, and manage the various data replication functions would be a boon. With such a service in place, it would be much simpler to ascertain whether the right data is being replicated, whether cost- and time-appropriate techniques are being applied based on data criticality, and whether data is being replicated successfully on an ongoing basis.

As a rule, a "built-in" service is superior to one that is "bolted on" when it comes to data protection. It follows, therefore, that a data protection management service should be designed into the storage infrastructure itself, and in such a way as to enable its use across heterogeneous hardware repositories.


Moreover, the ideal data protection management service should be able to manage different types of data protection services, such as those used today to provide "defense in depth" to data assets.

Defense in depth is a concept derived from a realistic appraisal of the risks confronting data assets. Different methods of protection may be required to safeguard assets against different risks.

As illustrated above, data needs to be protected, first and foremost, against the most frequent kinds of disaster potentials – those involving user and application errors that cause data deletion or corruption. Many specialized technologies, represented by the red rectangle in the illustration, have been developed to help meet this requirement. The most important, arguably, is some sort of ongoing local replication – sometimes called continuous data protection, or CDP. Ideally, CDP provides a way to roll data back to the moment before a disruption event or error occurs, like rewinding a tape.

The second layer protects against localized interruption events such as hardware failures or facility-level events – broken pipes in data center walls or ceilings, HVAC outages in equipment rooms, and so on. Typically, protection against localized faults involves synchronous replication – that is, replication in "real time" – between two different physical repositories, usually across a low-latency network. That could be a company LAN connecting two or more arrays on the same raised floor, in different buildings on a corporate campus, or at different sites interconnected by a metropolitan area network. Replicating locally provides an alternative source for the data so that work can proceed with minimal interruption.

The third layer of data protection protects against a regional disaster, whether the failure of a power grid or the impact of a severe weather event with a broad geographical footprint. Recovering data in these circumstances typically requires asynchronous replication – that is, replication across a wide area network to an alternative location well out of harm's way. The challenge of asynchronous replication is one of data deltas – differences between the state of data in the production system and the state of replicated data in the recovery environment – resulting from distance-induced latency and other factors. As a rule of thumb, for every 100 kilometers data travels across a WAN, the remote target is about 12 SCSI write operations behind the source of the data. The effect of latency is cumulative and tends to worsen the further the data travels. This, in turn, can have a significant impact on the usability of the recovery data set and the overall recovery effort, so planners need a way to test asynchronous replication on an ongoing basis.
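The rule of thumb above lends itself to a quick back-of-the-envelope estimate, sketched in Python below. The distance and write rate are hypothetical, and the conversion of the write backlog into seconds of potential data loss is an added assumption for illustration, not a figure from the text.

# Back-of-the-envelope estimate of asynchronous replication lag, applying
# the rule of thumb cited above: ~12 SCSI write operations behind per 100 km.
WRITES_BEHIND_PER_100_KM = 12  # from the rule of thumb in the text

def estimated_writes_behind(distance_km: float) -> float:
    return distance_km / 100.0 * WRITES_BEHIND_PER_100_KM

def estimated_data_loss_seconds(distance_km: float, app_write_iops: float) -> float:
    # Assumption: if the application issues app_write_iops writes per second,
    # the backlog translates roughly into this many seconds of data that could
    # be lost if the production site fails.
    return estimated_writes_behind(distance_km) / app_write_iops

distance_km = 800      # hypothetical distance to the recovery site
app_write_iops = 500   # hypothetical steady-state write rate

print(f"Writes behind: ~{estimated_writes_behind(distance_km):.0f}")
print(f"Potential data loss: ~{estimated_data_loss_seconds(distance_km, app_write_iops):.2f} s")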

Bottom line: there are a lot of challenges to setting up effective defense in depth – especially when the strategy involves the manual integration of many hardware and software processes, often "bolted on" to infrastructure after the fact. Common challenges include a lack of visibility into replication processes. (With most hardware-based mirroring schemes, the only way to check whether a mirror is working is to break the mirror and check both the source and target for consistency. Checking a mirror is a hassle that nobody likes to do; as a result, a lot of disk mirrors operate without validation.)
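For illustration, a split (quiesced) mirror could be checked for consistency by comparing checksums of the two copies. The sketch below assumes both copies are readable as ordinary files or block devices at hypothetical paths; it is not tied to any particular array vendor's tooling.

# Minimal consistency check for a split mirror: hash both copies and compare.
# Paths are hypothetical; a quiesced LUN might appear as a block device.
import hashlib

def checksum(path: str, chunk_size: int = 4 * 1024 * 1024) -> str:
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            digest.update(chunk)
    return digest.hexdigest()

source = "/dev/mapper/source_lun"   # hypothetical source copy
target = "/dev/mapper/mirror_lun"   # hypothetical mirror copy

if checksum(source) == checksum(target):
    print("Mirror copies are identical.")
else:
    print("Mirror copies differ -- investigate before relying on this mirror.")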

Another set of challenges relates to cost and logistical issues – especially with hardware-based mirroring. Keeping the mirroring hardware itself synchronized – how the two arrays are divided into LUNs, what RAID levels are being applied, whether the two platforms have the same firmware upgrades – requires ongoing effort, time, and resources.

Related to the above, the maintenance of local mirrors and remote replication strategies typically requires tight coordination between server, storage, and application administrators and continuity planners – coordination that often doesn't exist. If a set of LUNs is moved around for production reasons, but the change is not communicated to the business continuity planners and accommodated in the replication strategy, replication issues will develop (mirroring empty space, for example). The wrong time to find out about the mistake is when a disaster occurs!

Finally, there is the challenge of managing and testing the data protection strategy holistically – monitoring and managing the CDP and replication processes themselves and coordinating all of the software processes, hardware processes, tape processes, disk mirrors, and so forth that may be involved on an ongoing basis. Without a coherent way to wrangle together all of the protection processes, they can quickly become unwieldy and difficult to manage... not to mention very expensive.

HOW STORAGE VIRTUALIZATION CAN HELP

Storage virtualization provides a solution to many of these challenges by building in a set of services for data protection that are extensible to all hardware platforms and that can be configured and managed from a single management console. To be more exact, storage virtualization establishes a software-based abstraction layer – a virtual controller, if you will – above the storage hardware. In so doing, it creates an extensible platform on which shared, centrally managed storage services can be staged – including data protection management services.

With storage virtualized, it is easy to pool storage resources into target volumes that can be designated as repositories for different kinds of data. By segregating data and writing it onto volumes in specific pools, services can be applied selectively to the data at either the volume or the pool level.
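As a sketch of the idea of applying services at the pool level – not any vendor's actual interface, just an illustration with invented pool, volume, and service names – a protection policy might look like this:

# Illustrative only: a simple policy table mapping storage pools to the
# protection services applied to every volume placed in that pool.
protection_policies = {
    "tier1_pool": ["continuous_data_protection", "synchronous_mirror", "async_replication"],
    "tier2_pool": ["synchronous_mirror", "async_replication"],
    "tier3_pool": ["nightly_backup"],
}

volume_placement = {
    "erp_vol01": "tier1_pool",
    "hr_vol01":  "tier2_pool",
    "bi_vol01":  "tier3_pool",
}

def services_for_volume(volume: str) -> list[str]:
    """Look up the protection services a volume receives via its pool."""
    pool = volume_placement.get(volume)
    return protection_policies.get(pool, [])

print(services_for_volume("erp_vol01"))
# ['continuous_data_protection', 'synchronous_mirror', 'async_replication']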


Providing continuous data protection services to a specific volume is as easy as ticking a checkbox in DataCore Software™ SANsymphony™-V storage hypervisor software. Equally simple is the procedure for setting up a mirroring relationship between different volumes in different pools, with synchronous replication for volumes within a metropolitan region and asynchronous replication for volumes separated by longer distances.

Virtualized storage enables a wide range of data protection options, including the extension of the entire storage infrastructure over distance… or into clouds. Perhaps the most important benefit of this technology is the fact that, with products like DataCore SANsymphony-V, the capabilities for defense in depth are delivered right out of the box. There is no need to cobble together a number of third-party software, application software, or hardware-driven processes: data protection services are delivered, configured, managed, and tested holistically with one product. These services are built into the infrastructure at the layer of the storage hypervisor, rather than being bolted on and separately managed.

Testing is also dramatically simplified, since replication processes can be paused at any time and mirror sets can be validated without disrupting production systems. In fact, the capability offered by DataCore Software to leverage remote copies as primary data repositories means that the percentage of downtime currently attributed to "planned maintenance" can be all but eliminated by switching to redundant storage infrastructure when you are performing maintenance on your primary arrays.

CONCLUSION OF PART 4

There are many reasons to virtualize storage infrastructure, but one advantage that cannot be overlooked is the utility of the strategy from the standpoint of data protection and business continuity. Virtualized storage infrastructure provides coherent and integrated processes for data protection that can simplify configuration and maintenance, reduce testing costs, and improve the likelihood of full recovery from data-level, localized, and even regional interruption events.

Data recovery is not the only component of successful business continuity, but it is an extremely important one. Think about it: most resources in a business lend themselves to recovery strategies based on either redundancy or replacement. Data, like personnel, is irreplaceable. To protect your data, you need to replicate it and place the replica out of harm's way.