STORAGE VIRTUALIZATION: AN INSIDER’S GUIDE
Jon William Toigo CEO Toigo Partners International
Chairman Data Management Institute
Copyright © 2013 by the Data Management Institute LLC. All Rights Reserved. Trademarks and tradenames for products
discussed in this document are the property of their respective owners. Opinions expressed here are those of the author.
STORAGE VIRTUALIZATION:
AN INSIDER’S GUIDE
Part 4: The Data Protection Imperative
A confluence of three trends is making disaster preparedness and data
protection more important than ever before. These trends include the
increased use of server and desktop virtualization, growing legal and
regulatory mandates around data governance, privacy and preservation,
and increased dependency on automation in a challenging business
environment as a means to make fewer staff more productive.
Business continuity is now a mission-critical undertaking.
The good news is that storage virtualization can deliver the right tools to
ensure the availability of data assets – the foundation for any successful
business continuity or disaster recovery capability.
DATA PROTECTION MANAGEMENT: THE ESSENTIAL TASK OF BUSINESS CONTINUITY
Data protection and business continuity are subjects that nobody likes to talk about, but that
everyone in contemporary business and information technology must consider. Today, a
confluence of three trends – a kind of “perfect storm”— is making disaster protection planning
and disaster preparedness more important than ever before:
First is the increased use of server and desktop virtualization technologies in business
computing -- technologies that, for all their purported benefits, also have the downside
of being a risk multiplier. With hypervisor-based server hosting, the failure of one
hosted application can cause many other application “guests” to fail on the same
physical server. While other efficiencies may accrue to server hypervisor computing,
the risks that the strategies introduce must be clearly understood in order to avoid
catastrophic outcomes during operation.
A second trend underscoring the need for data protection and business continuity
planning is the growing regime of regulatory and legal mandates around data
preservation and privacy that affect a growing number of industry segments. Some of
these rules apply to nearly every company and most carry penalties if businesses cannot
show that reasonable efforts have been taken to safeguard data.
Third, and perhaps most compelling, is the simple fact that companies are more
dependent than ever before on the continuous operation of IT automation. In the current
economic reality, the need to make fewer staff more productive has created a much
greater dependency on the smooth operation of information systems, networks and
storage infrastructure. Even a short term outage can have significant consequences for
the business.
Bottom line: for many companies, business continuity and data protection have moved from
nice-to-have to must-have status. Past debates over the efficacy of investments in
preparedness are increasingly moot.
Put simply, there is no “safe place” to construct a data center: historical data on weather and
seismic events, natural and man-made disaster potentials, and other catastrophic scenarios
demonstrate that all geographies are subject to what most people think of when they hear the
word “disaster.”
Moreover, from a statistical standpoint, big disasters – those with a broad geographical
footprint – represent only a small fraction of the overall causality of IT outages. Only about 5
percent of disasters are those cataclysmic events that grab a spot on the 24 hour cable news
channels. Most downtime is the result of equipment and software maintenance – some call it
planned downtime, though efforts are afoot to eliminate planned downtime altogether through
clustering and high availability engineering. The next big slices of the outage pie chart involve
problems that fall more squarely in the disaster category: those resulting from software failures,
human errors (“carbon robots”), and IT hardware failures.
According to one industry study of 3000 firms in North America and Europe, IT outages in 2010
resulted in 127 million hours of downtime – equivalent to about 65,000 employees drawing
salaries without performing work for an entire year! The impact of downtime in tangible terms,
such as lost revenues, and intangible terms, such as lost customer confidence, can only be
estimated. One study placed idle labor costs across all industry verticals at nearly $1 million per
hour.
Despite this data, a truism of disaster preparedness still applies: fewer than 50% of
businesses have any sort of prevention and recovery capability. Of those that do, fewer than
50% actually test their plans – the equivalent of having no plan whatsoever.
The reasons are simple. First, planning requires money, time and resources whose allocation
may be difficult to justify given that the resulting capability may never need to be used.
Second, plans are typically difficult to manage, since effective planning typically involves
multiple data protection techniques and recovery processes that lack cost-effective testing and
validation methods. Third, many vendors have imbued customers with a false sense of
security regarding the invulnerability of their product or architecture, conflating the notion of
high availability architecture with business continuity strategy (the former is actually a subset of
the latter).
Constructing the plan itself follows a well-defined roadmap. Following an interruption event,
three things need to happen:
1. The data associated with critical applications must be recovered to a usable form.
2. Applications need to be re-instantiated and connected to their data.
3. Users need to be reconnected to their re-hosted applications.
These three central tasks need to occur quickly as the duration of an interruption event is
usually what differentiates an inconvenience from a disaster. Taken altogether, the three tasks
may be measured using the metric “time to data.” Time to data, sometimes referred to as a
recovery time objective, is both the expression of the goal of a plan and a measure of the
efficacy of a strategy applied to realize that goal.
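The "time to data" metric described above can be sketched in a few lines of code. This is an illustration only: the task names and durations below are hypothetical examples, not figures from the text.

```python
# Illustrative only: task names and durations are hypothetical examples.
# "Time to data" is modeled as the sum of the three central recovery tasks,
# then compared against a recovery time objective (RTO).

RECOVERY_TASKS_HOURS = {
    "recover data to a usable form": 4.0,
    "re-instantiate applications and connect data": 1.5,
    "reconnect users to re-hosted applications": 0.5,
}

def time_to_data(tasks: dict) -> float:
    """Total elapsed recovery time across the three central tasks."""
    return sum(tasks.values())

def meets_rto(tasks: dict, rto_hours: float) -> bool:
    """True when the strategy recovers within the stated objective."""
    return time_to_data(tasks) <= rto_hours

total = time_to_data(RECOVERY_TASKS_HOURS)
print(f"time to data: {total} hours; meets 8-hour RTO: {meets_rto(RECOVERY_TASKS_HOURS, 8.0)}")
```

The point of expressing the metric this way is that "time to data" is a single number: whichever of the three tasks is slowest dominates it, which is why data recovery (usually the slowest) deserves the most planning attention.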
Data Recovery is Key
The process for building a comprehensive continuity capability requires book-length
description. (One is being developed online as a free "blook" at book.drplanning.org.) The
much condensed version has three basic components.

To do a good job of developing a continuity capability, you need to know your data – or more
specifically, what data belongs to what applications and what business processes those
applications serve. Data and apps "inherit" their criticality – their priority of restore – from
the business processes that they serve. So, those relationships must be understood.
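The inheritance of criticality described above can be made concrete as a simple mapping. All process, application, and data-set names below are invented for illustration.

```python
# Illustrative only: all names and priorities are hypothetical.
# Data and applications "inherit" restore priority from the business
# processes they serve; the mappings below make those relationships explicit.

BUSINESS_PROCESSES = {          # process -> restore priority (1 = most critical)
    "order fulfillment": 1,
    "payroll": 2,
    "internal reporting": 3,
}

APP_TO_PROCESS = {              # application -> business process it serves
    "erp": "order fulfillment",
    "hr-system": "payroll",
    "bi-dashboards": "internal reporting",
}

DATA_TO_APP = {                 # data set -> owning application
    "orders-db": "erp",
    "employee-db": "hr-system",
    "warehouse-extracts": "bi-dashboards",
}

def restore_priority(data_set: str) -> int:
    """Priority inherited through data -> app -> business process."""
    app = DATA_TO_APP[data_set]
    process = APP_TO_PROCESS[app]
    return BUSINESS_PROCESSES[process]

# Most critical data sets restore first.
restore_order = sorted(DATA_TO_APP, key=restore_priority)
print(restore_order)
```

Maintaining even so simple a mapping forces the conversation between planners and business units that the text says is usually missing.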
The next step is to apply the right stratagems for data recovery, application re-hosting and
reconnecting users to each application and its data based on that earlier criticality assessment.
Third, plans must be tested – both routinely and on an ad hoc basis. Testing is the long-tail cost
of continuity planning: the decisions we make about recovery objectives, and the methods we
use to build recovery strategies, must take into account how those strategies will be tested, so
that we can contain the cost of a continuity program that virtually nobody wants to spend
money on.
As a practical matter, data recovery is almost always the slowest part of recovery efforts
following an outage – but this is contingent on a lot of things. First, how is data being
replicated? Is it backed up to tape, mirrored by software or disk array hardware to an
alternative hardware kit? Is the data accessible and in good condition for restore at the
designated recovery site?
Chances are good that a company uses a mixture of data protection techniques today. That's a
good thing, since data is not all the same and budgetary sensibility dictates that the most
expensive recovery strategies be applied only to the most critical data. Still, planners need to
ensure that the approaches being taken are coordinated and monitored on an ongoing basis.

From this perspective, a data protection management service that provides a coherent way to
configure, monitor, and manage the various data replication functions would be a boon. With
such a service in place, it would be much simpler to ascertain whether the right data is being
replicated, whether cost- and time-appropriate techniques are being applied based on data
criticality, and whether data is being replicated successfully on an ongoing basis.
As a rule, a “built-in” service is superior to one that is “bolted on” when it comes to data
protection. It follows, therefore, that a data protection management service should be
designed into the storage infrastructure itself and in such a way as to enable its use across
heterogeneous hardware repositories.
Moreover, the ideal data protection management service should be able to manage different
types of data protection services such as those that are used today to provide “defense in
depth” to data assets.
Defense in depth is a concept derived from a realistic appraisal of the risks confronting data
assets. Different methods of protection may be required to safeguard assets against different
risks.
As illustrated above, data needs to be protected, first and foremost, against the most frequent
kinds of disaster potentials – those involving user and application errors that cause data
deletion or corruption. Many specialized technologies, represented by the red rectangle in the
illustration, have been developed to help meet this requirement. The most important,
arguably, is some sort of on-going local replication – sometimes called continuous data
protection or CDP. Ideally, CDP provides a way to roll data back to the moment before a
disruption event or error occurs, like rewinding a tape.
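The "rewind" behavior of CDP can be sketched as a journal of timestamped writes that is replayed only up to the moment before the error. This is a minimal sketch of the concept, not any product's implementation; all names are hypothetical.

```python
# Minimal sketch of continuous data protection (CDP) as a write journal.
# Rolling back replays journaled writes strictly earlier than a chosen
# point in time, "rewinding" past a corruption event. Illustrative only.

from dataclasses import dataclass

@dataclass
class JournalEntry:
    timestamp: float      # seconds since some epoch
    block: int            # logical block address
    data: bytes           # block contents written

def roll_back(journal, before):
    """Rebuild block state from writes strictly earlier than `before`."""
    state = {}
    for entry in journal:
        if entry.timestamp < before:
            state[entry.block] = entry.data   # later writes overwrite earlier ones
    return state

journal = [
    JournalEntry(100.0, 7, b"good"),
    JournalEntry(200.0, 7, b"also-good"),
    JournalEntry(300.0, 7, b"corrupt"),   # e.g. an application error at t=300
]
print(roll_back(journal, before=300.0)[7])
```

The essential property is that the journal preserves every intermediate state, so the rollback point can be chosen after the fact, once the moment of corruption is known.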
The second layer of protection provides protection against localized interruption events such as
hardware failures or facility-level events such as broken pipes in data center walls or ceilings,
HVAC outages in equipment rooms, etc. Typically, protection against localized faults involves
synchronous replication – that is replication in “real time” – between two different physical
repositories, usually across a low latency network. That could be a company LAN connecting
two or more arrays on the same raised floor, or in different buildings on a corporate campus or
at different sites interconnected by a metropolitan area network. Replicating locally provides
an alternative source for the data so that work can proceed with minimal interruption.
The third layer of data protection protects against a regional disaster, whether the failure of a
power grid or the impact of a severe weather event with a broad geographical footprint.
Recovering data in these circumstances typically requires asynchronous replication – that is,
replication across a wide area network to an alternative location well out of harm’s way. The
challenge of asynchronous replication is one of data deltas – difference between the state of
data in the production system and the state of replicated data in the recovery environment –
resulting from distance-induced latency and other factors. As a rule of thumb, for every 100
kilometers data travels in a WAN, the remote target is about 12 SCSI write operations behind
the source of the data. The effect of latency is cumulative and tends to worsen the further the
data travels. This, in turn, can have a significant impact on the usability of the recovery data set
and the overall recovery effort, so planners need a way to test asynchronous replication on an
on-going basis.
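The distance rule of thumb quoted above lends itself to a back-of-envelope calculation. The 12-writes-per-100-km figure comes from the text; the average write size used to express the delta in bytes is a hypothetical assumption added here for illustration.

```python
# Back-of-envelope sketch of the data-delta rule of thumb: roughly 12
# in-flight SCSI write operations per 100 km of WAN distance. The
# average write size is an assumed value for illustration only.

WRITES_BEHIND_PER_100_KM = 12

def estimated_writes_behind(distance_km: float) -> float:
    """Approximate number of writes the remote target trails the source."""
    return distance_km / 100.0 * WRITES_BEHIND_PER_100_KM

def estimated_delta_bytes(distance_km: float, avg_write_bytes: int = 8192) -> float:
    """Rough size of the data delta, assuming an average write size."""
    return estimated_writes_behind(distance_km) * avg_write_bytes

for km in (100, 500, 1000):
    print(f"{km} km: ~{estimated_writes_behind(km):.0f} writes behind")
```

Even as a rough estimate, this kind of arithmetic helps planners decide whether a candidate recovery site is far enough away to escape a regional event but close enough that the data delta stays tolerable.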
Bottom line: there are a lot of challenges to setting up effective defense in depth – especially
when the strategy involves the manual integration of many hardware and software processes,
often “bolted on” to infrastructure after the fact. Common challenges include a lack of visibility
into replication processes. (With most hardware based mirroring schemes, the only way to
check to see whether a mirror is working is to break the mirror and check both the source and
target for consistency. Checking a mirror is a hassle that nobody likes to do. As a result, a lot of
disk mirrors operate without validation.)
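The consistency check itself – comparing a quiesced source and target block by block – can be illustrated with checksums. This is a simplified, hypothetical sketch of what such a validation involves, not a description of any array's firmware.

```python
# Simplified illustration of mirror validation: once a mirror is paused
# (or broken), source and target can be compared block by block via
# checksums. Volumes here are synthetic in-memory block lists.

import hashlib

def block_digests(volume):
    """One SHA-256 digest per block; mismatches flag divergent blocks."""
    return [hashlib.sha256(block).hexdigest() for block in volume]

def divergent_blocks(source, target):
    """Indices of blocks where the mirror no longer matches the source."""
    pairs = zip(block_digests(source), block_digests(target))
    return [i for i, (s, t) in enumerate(pairs) if s != t]

source = [b"aaaa", b"bbbb", b"cccc"]
target = [b"aaaa", b"XXXX", b"cccc"]   # block 1 has diverged
print(divergent_blocks(source, target))
```

The hassle the text describes is exactly this: the comparison is only trustworthy while writes are suspended, which is why unvalidated mirrors are so common.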
Another set of challenges relates to cost and logistical issues – especially in hardware-based
mirroring. Keeping the mirroring hardware itself synchronized (how the two arrays are divided
into LUNs, what RAID levels are being applied, whether the two platforms have the same
firmware upgrades) requires ongoing effort, time and resources.
Related to the above, the maintenance of local mirrors and remote replication strategies
typically requires tight coordination between server, storage and application administrators
and continuity planners that often doesn’t exist. If a set of LUNs is moved around for
production reasons, but this is not communicated to the business continuity planners and
accommodated in the replication strategy, replication issues will develop (mirroring empty
space, for example). The wrong time to find out about the mistake is when a disaster occurs!
Finally, there is the challenge of managing the testing of the data protection strategy holistically
– monitoring and managing the CDP and replication processes themselves and the coordination
of all of the software processes, hardware processes, tape processes, disk mirrors, and so forth
that may be involved on an on-going basis. Without a coherent way to wrangle together all of
the protection processes, they can quickly become unwieldy and difficult to manage...not to
mention very expensive.
HOW STORAGE VIRTUALIZATION CAN HELP
Storage virtualization provides a solution to many of these challenges by building in a set of
services for data protection that are extensible to all hardware platforms and that can be
configured and managed from a single management console. To be more exact, storage
virtualization establishes a software-based abstraction layer – a virtual controller, if you will –
above storage hardware. In so doing, it creates an extensible platform on which shared,
centrally-managed storage services can be staged – including data protection management
services.
With storage virtualized, it is easy to pool storage resources into target volumes that can be
designated as repositories for different kinds of data. By segregating data and writing it onto
volumes in specific pools, services can be applied selectively to the data at either the volume or
the pool level.
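Applying services at the volume or pool level, as just described, amounts to a simple policy model: a volume's effective protection is its pool's services plus any volume-specific additions. The sketch below is hypothetical; the pool, volume, and service names are invented and do not represent any vendor's actual API.

```python
# Hypothetical sketch of pool- and volume-level protection policies.
# All names are invented for illustration; this is not a vendor API.

POOL_SERVICES = {
    "tier-1-pool": {"cdp", "sync-mirror", "async-replication"},
    "tier-2-pool": {"sync-mirror"},
}

VOLUME_SERVICES = {
    "vol-archive-01": {"async-replication"},   # volume-level addition
}

VOLUME_POOL = {
    "vol-erp-01": "tier-1-pool",
    "vol-files-01": "tier-2-pool",
    "vol-archive-01": "tier-2-pool",
}

def effective_services(volume: str) -> set:
    """Union of the pool's services and any volume-specific services."""
    pool = VOLUME_POOL[volume]
    return POOL_SERVICES[pool] | VOLUME_SERVICES.get(volume, set())

print(sorted(effective_services("vol-archive-01")))
```

The attraction of this model is that most data gets protected correctly by default, simply by landing in the right pool, while exceptions are handled per volume.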
Providing continuous data protection services to a specific volume is as easy as ticking a
checkbox in DataCore Software™ SANsymphony™-V storage hypervisor software. Equally
simple is the procedure for setting up a mirroring relationship between different volumes in
different pools, with synchronous replication for volumes within a metropolitan region and
asynchronous replication for volumes separated by longer distances.
Virtualized storage enables a wide range of data protection options, including the extension of
the entire storage infrastructure over distance…or into clouds. Perhaps the most important
benefit of this technology is the fact that, with products like DataCore SANsymphony-V, the
capabilities for defense in depth are delivered right out of the box. There is no need to cobble
together a number of third party software, application software, or hardware-driven processes:
data protection services are delivered, configured, managed and tested holistically with one
product. These services are built into infrastructure at the layer of the storage hypervisor,
rather than being bolted-on and separately managed.
Testing is also dramatically simplified, since replication processes can be paused at any time
and mirror sets can be validated without disrupting production systems. In fact, the capability
offered by DataCore Software to leverage remote copies as primary data repositories means
that the percentage of downtime currently accrued to “planned maintenance” can be all but
eliminated by switching to redundant storage infrastructure when you are performing
maintenance on your primary arrays.
CONCLUSION OF PART 4
There are many reasons to virtualize storage infrastructure, but one advantage that cannot be
overlooked is the utility of the strategy from the standpoint of data protection and business
continuity. Virtualized storage infrastructure provides coherent and integrated processes
for data protection that can simplify configuration and maintenance, reduce testing costs, and
improve the likelihood of full recovery from data-level, localized, and even regional interruption
events.
Data recovery is not the only component of successful business continuity, but it is an
extremely important one. Think about it: most resources in a business lend themselves to
recovery strategies based on either redundancy or replacement. Data, like personnel, is
irreplaceable. To protect your data, you need to replicate it and place the replica out of harm’s
way.