STORAGE
Managing the information that drives the enterprise
Vol. 10 No. 2 | April 2011
INSIDE: THIN PROVISIONING • STORAGE FOR EXCHANGE 2010
Virtual DR Disaster recovery can be tough, but virtualized servers
and storage virtualization can make DR easier and a lot more
effective.
ALSO INSIDE Massive amounts of data will bury us!
Cast of cloud storage players taking shape
Optimize bandwidth for backup
STORAGE inside | April 2011
Gotta yottabyte? | EDITORIAL | p. 5
Four different news reports all point to the same fact: Data is growing uncontrollably. It's time for storage shops to start cleaning house. by RICH CASTAGNA

Some clarity for enterprise cloud storage | STORWARS | p. 9
Cloud storage is a next-generation IT infrastructure that's altering the data storage landscape. And its cast of key players is beginning to take shape. by TONY ASARO

Virtual disaster recovery | p. 12
Whether used singly or combined, server virtualization and storage virtualization are making an impact on the ability of IT to deliver disaster recovery, and to do so cost effectively. by LAUREN WHITEHOUSE

Thin provisioning in depth | p. 21
Thin provisioning can help you use your disk capacity much more efficiently, but you need to get under the hood to understand how the technology might work in your environment. by STEPHEN FOSKETT

Exchange 2010 and storage systems | p. 32
With Exchange Server 2010, Microsoft made some significant changes to the email app's database structure, and those changes may also affect the storage it resides on. by BRIEN M. POSEY

Backing up to the cloud requires new approach to bandwidth | HOT SPOTS | p. 43
A lot of attention has been focused on security issues related to cloud backup, but bandwidth and transfer issues may be bigger problems. by LAUREN WHITEHOUSE

Don't let the cloud obscure good judgment | READ/WRITE | p. 47
Cloud storage is likely to become a significant part of your data storage infrastructure. But test the waters before locking into a vendor. by ARUN TANEJA

Fibre Channel still top dog among disks | SNAPSHOT | p. 51
What kind of data drives are you using? Are they 6 Gig SAS? Solid state? Or good old Fibre Channel (FC)? More than half of the companies in our survey favor FC for their top tier. by RICH CASTAGNA

From our sponsors | p. 53
Useful links from our advertisers.
THIS JUST IN: Earth knocked off its axis due to weight of 295
exabytes of data! OK, maybe we’re just wobbling on our axis a
little bit, but that’s a heckuva lot of data, and you’re going to
need an awful lot of disks, chips, tape, paper and anything else
that might hold a petabyte here and there to accommodate it
all.
That number—295 exabytes—was reported in an article in Science
Express, a journal published by the American Association for the
Advancement of Science. The authors used some pretty complex
computations to come up with that number, which they actually
define as the amount of data we were able to store in 2007. Science
Express looks like a pretty serious pub—among the other articles in
the same issue were “Tomography of Reaction-Diffusion
Microemulsions Reveals Three-Dimensional Turing Patterns” and
“Dynamic Control of Chiral Space in a Catalytic Asymmetric Reaction
Using a Molecular Motor.” These folks aren’t fooling around . . .
and the fact they didn’t round the figure off to 300 exabytes adds
a little edge of precision that impresses stat junkies like
me.
Not only do we have to find a place to put all that data, but we’re
probably going to have to back it up and then stash away a copy or
two for disaster recovery. So that 295 exabytes could turn into a
few zettabytes of data. Can yottabytes of data be far behind?
MORE THIS JUST IN: According to IDC, in the fourth quarter of 2010, "total disk storage systems capacity shipped reached 5,127 petabytes, growing 55.7% year over year." According to my seventh-grade math, that's approximately 290 exabytes short of what we need, but it's still a lot of disk.
Sooner or later, we’re going to have to learn how to throw some of
this stuff away. Once the attic and basement are crammed full, and
little 0s and 1s are spilling out of the cupboards, we won’t have
any room for new data. What happens then? Your shop might not be in
exabyte territory yet, but a surprising number of companies have
crossed the petabyte threshold, and coping with capacity is an
ongoing struggle even for shops with far more modest amounts of
data to store.
EDITORIAL | RICH CASTAGNA

Gotta yottabyte?
If you thought yottabytes were some kind of snack food, you're in for a rude awakening . . . but right now we've got exabytes to deal with.
6 Storage April 2011
The problem is that data housecleaning tools either don’t exist or
aren’t up to the task at hand. Knowing what can be deep-sixed and
what needs to be preserved means you need to know what you have in
the first place. Few available products can give you much insight
into the state of your data stores. A few years ago, it looked like
data classification was poised to catch on—if not as a product
category then as an underpinning technology for a raft of storage
management chores, like identifying data that belongs in the trash
bin. Classification pretty much fizzled out, but maybe it can make a comeback now that we have renewed interest in automated storage tiering.
So what do you do? You can ask your users to clean up their acts by
voluntarily deleting all that useless old stuff. Somebody would
listen, right? I'll be the first to admit that probably half of what I produce can eventually end up in the data dumpster without any profound effect on humanity, my company or anyone for that matter. You could try data storage quotas that limit what each user can save; quotas can work, but you'll become the second least popular person in your company, right after the guy who's been stealing everyone's lunch from the cafeteria fridge.
EVEN MORE THIS JUST IN: According to CBC News, the Government of Canada is getting very serious about reducing the amount of data it stores: "The federal government has ordered a monster machine to chew up its discarded hard drives, USB thumb drives, CDs, and even ancient Beta videotapes." Why didn't I think of that? It's a
perfect solution: a monster machine that eats data. Let’s just hope
it has a healthy enough appetite to eat 295 exabytes or has some
hungry monster friends. But one CBC News reader had another idea:
“It would be easier and cheaper just to buy them sledgehammers.”
That’d work, too.
Or maybe just put everything on solid-state storage. An article in the Journal of Digital Forensics, Security and Law ominously titled "Solid State Drives: The Beginning of the End for Current Practice in Digital Forensic Recovery?", suggests that you don't really have to worry if you're drowning in data, just put it on solid-state devices. At the end of the article's abstract, the two Australian authors wrote: "Our experimental findings demonstrate
that solid-state drives (SSDs) have the capacity to destroy
evidence catastrophically under their own volition, in the absence of specific
instructions to do so from a computer.” Good news if you need to
put your data on a diet, I guess, but not so good for the
solid-state storage industry.
Talk about connecting the dots; stories about oceans of data,
skyrocketing disk sales, data munching machines and
not-so-solid-state storage, and all in the same week. Maybe it’s an
omen. Maybe it’s time to look for a real solution to soaring data
stores (and the associated soaring cost of keeping it all) instead
of just throwing more disk, tape or chips at the problem. I know it's counterintuitive for storage vendors to promote technologies that help their customers buy less stuff from them, but maybe there's a wily little startup out there with a great idea and a useful tool that can stem the data tide.
Rich Castagna ([email protected]) is editorial director of the Storage Media Group.
FOR THE LAST year or so, cloud storage has been on a roller coaster
ride in terms of hype, buzz and, yes, plenty of skepticism. But
it’s not just a passing fad. Private clouds aren’t just another way
of talking about old IT stuff in a new way. And public clouds are
real alternatives and becoming pervasive.
Cloud storage is a next-generation IT infrastructure that’s
altering the data storage landscape. The changes will happen in
obvious and not-so-obvious ways. The obvious ways include using
cloud storage for backup, disaster recovery and archiving. The less
obvious ways include new business models and applications being
developed specifically for the cloud.
EMC Atmos offers private and public cloud storage. And even though
EMC has been at it for a while, I personally haven’t run into any
companies using that technology. I know they’re out there, but the
only Atmos user I’ve spoken to is using it as a Centera
replacement. NetApp is offering FAS in the cloud with major service
providers and giving big incentives to NetApp salespeople to drive
this business. True to form, NetApp is leveraging FAS for
everything, including cloud storage. EMC’s and NetApp’s strategies
are polar opposites—EMC has a number of different storage systems
with little synergy among them even as they sometimes overlap, and
NetApp has a single storage system it uses for everything.
Interestingly, while they have diametrically opposed strategies,
they’re both very successful overall and you can’t argue with
success.
Hitachi has positioned its entire storage portfolio for cloud
storage, and Dell is leveraging its DX solution. Hewlett-Packard
hasn’t announced anything significant, and IBM doesn’t have
anything substantial to speak of yet.
Nirvanix is an interesting emerging enterprise cloud storage vendor
with an end-to-end solution that makes it extremely easy for
companies to use its cloud storage. Nirvanix provides an
easy-to-manage complete solution with front-end usability,
security, reliability, performance, multi-tenancy and global access
combined with back-end controls, reporting, management and
analytics. It isn’t just a storage system that scales and supports
HTTP, which so many other vendors tout as a cloud platform. Rather,
Nirvanix provides a holistic solution designed specifically for
businesses to use without being cloud experts.
StorWars | Tony Asaro
Some clarity for enterprise cloud storage
Cloud storage is moving beyond the hype and the cast of key players is beginning to take shape.

This is exactly the requirement I wrote about here nearly a year ago for
what was needed for cloud storage to succeed. Having a storage system built just for the cloud is important, but it's barely half the story. It's the application and management of that infrastructure that's essential to a true cloud storage solution.
Another trend in enterprise cloud storage is the use of open-source
file systems to build your own. Techie IT professionals are
considering Gluster, Hadoop and other extensible, scalable file
systems. But they face a two-fold challenge: They need to
bulletproof their homemade storage systems, and they need to build
the front-end and back-end applications to manage and offer these
services. It’s daunting but achievable; Amazon and Google are
examples of companies doing this even though it was outside their
core competencies.
Google is also an important player in cloud. More and more
companies are moving to Google Apps and, as a result, are using
Google infrastructure, which should also be considered enterprise
cloud storage as real businesses are replacing their on-premises
applications with Google’s offerings.
I haven't mentioned Amazon Simple Storage Service (Amazon S3) as an enterprise cloud storage service because it truly isn't that. Amazon S3 isn't a total solution, and Amazon seems to have no interest in making it one. Rather, it's a utility that you can build your own apps to use. However, I believe next-generation Web-based businesses will continue to use Amazon S3 and, in that sense, it will be a storage platform for the enterprise.
I can see enterprise cloud storage playing out with EMC driving Atmos within its own customer base, which sometimes works (e.g., Data Domain) and sometimes doesn't (e.g., Centera). But EMC will get customers and generate business, things it tends to do very
head-to-head with EMC Atmos. Those two vendors will dominate this
market, with other vendors having pockets of success here and there
but never assuming a dominant position. Of course, that depends on
who buys Nirvanix and how well they execute. Google will continue
to drive more business applications into its cloud, which will
impact storage vendors and application vendors such as Microsoft
and Oracle.
There is no endgame with the market going in one direction or the
other. Instead, most users will have a mixture of on-premises IT,
private cloud, public cloud and applications in the cloud. There
will be a handful of major leaders and the others will be forever
chasing their tails.
Tony Asaro is senior analyst and founder of Voices of IT
(www.VoicesofIT.com).
Virtual DISASTER RECOVERY
Storage and server virtualization make many of the most onerous
disaster recovery (DR) tasks relatively easy to execute, while
helping to cut overall DR costs.
BY LAUREN WHITEHOUSE
IF YOUR COMPANY still lacks a viable disaster recovery (DR) strategy, it might be time to start thinking virtualization. The initial drivers behind server virtualization adoption have been improving resource utilization and lowering costs through consolidation, but next-wave adopters have realized that virtualization can also improve availability.
Virtualization turns physical devices into sets of resource pools
that are independent of the physical asset they run on. With server
virtualization, decoupling operating systems, applications and data
from specific physical assets eliminates the economic and
operational issues of infrastructure
silos—one of the key ingredients to affordable disaster recovery.
Storage virtualization takes those very same benefits and extends
them
from servers to the underlying storage domain, bringing IT
organizations one step closer to the ideal of a virtualized IT
infrastructure. By harnessing the power of virtualization, at both
the server and storage level, IT organizations can become more
agile in disaster recovery.
REDUCE THE RISK Improving disaster recovery and business continuity
are perennial top-10 IT priorities because companies want to reduce
the risk of losing access to systems and data. While most shops
have daily data protection plans in place, fewer of them focus
their efforts on true disasters, which would include any event that
interrupts service at the primary production location. An event can
be many different things, including power failures, fires, floods,
other weather-related outages, natural disasters, pandemics or
terrorism. Regardless of the cause, unplanned downtime in the data
center can wreak havoc on IT’s ability to maintain business
operations.
The goal of a DR process is to recreate all necessary systems at a second location as quickly and reliably as possible. Unfortunately, for many firms, DR strategies are often cobbled together because there's nothing or no one mandating them, they're too costly or complex, or there's a false belief that existing backup processes are adequate for disaster recovery.
Backup technologies and processes will take you just so far when it
comes to a disaster. Tier 1 data (the most critical stuff) makes up
approximately 50% of an organization’s total primary data. When the
Enterprise Strategy Group (ESG) surveyed IT professionals
responsible for data protection, 53% said their organization could
tolerate one hour or less of downtime before their business
suffered revenue loss or some other type of adverse business
impact; nearly three-quarters (74%) fell into the
less-than-three-hour range. (The results of this survey were
published in the ESG research report, 2010 Data Protection Trends,
April 2010.) Under the best conditions, the time it takes to
acquire
replacement hardware, re-install operating systems and
applications, and recover data—even from a disk-based copy—will
likely exceed a recovery time objective (RTO) of one to three
hours.
Recovery from a mirror copy of a system is faster than recovering with traditional backup methods, but it's also more expensive and complex. Maintaining identical systems in two locations and synchronizing configuration settings and data copies can be a challenge. This often forces companies to prioritize or "triage" their data, providing greater protection to some tiers than others. ESG research found that tier 2 data comprises 28% of all primary data, and nearly half (47%) of IT organizations we surveyed noted three hours or less of downtime tolerance for tier 2 data. Therefore, if costs force a company to apply a different strategy or a no-protection strategy for "critical" (tier 1) vs. "important" (tier 2), some risks may be introduced.
BENEFITS OF SERVER VIRTUALIZATION FOR DR Virtualization has become
a major catalyst for change in x86 environments because it provides
new opportunities for more cost-effective DR. When looking at the
reasons behind server virtualization initiatives coming in the next
12 to 18 months, ESG research found that making use of virtual
machine replication to facilitate disaster recovery ranked second
behind consolidating more physical servers onto virtualization
platforms. (See the ESG research report, 2011 IT Spending
Intentions, published in January 2011, for details of the survey
results.)
Because server virtualization abstracts from the physical hardware
layer, it eliminates the need for identical hardware configurations
at production and recovery data centers, which provides several
benefits. And since virtualization is often a catalyst to refresh
the underlying infrastructure, there’s usually retired hardware on
hand. For some organizations that might not have been able to
secure the CapEx to outfit a DR configuration, there may be an
opportunity to take advantage of the “hand-me-down” hardware. Also,
by consolidating multiple applications on a single physical server
at the recovery data center, the amount of physical recovery
infrastructure required is reduced. This, in turn, minimizes
expensive raised floor space costs, as well as additional
power and cooling requirements. Leveraging the encapsulation and
portability features of virtual servers
aids in DR enablement. Encapsulating the virtual machine into a
single file enables mobility and allows multiple copies of the
virtual machine to be created and more easily transferred within
and between sites for business resilience and DR purposes—a
dramatic improvement over backing up data to portable media such as
tape and rotating media at a cold standby site. In addition,
protecting virtual machine images and capturing the system state of
the virtual machine are new concepts that weren’t available in the
physical world. In a recovery situation, there’s no need to
reassemble the operating system, reset configuration settings and
restore data. Activating a virtual machine image is a lot faster
than starting from a bare-metal recovery.
Flexibility is another difference. Virtualization eliminates the aforementioned need for a one-to-one physical mirror of a system for disaster recovery. IT has the choice of establishing physical-to-virtual (P2V) and virtual-to-virtual (V2V) failover configurations, locally and/or remotely, to enable rapid recovery without incurring the additional expense of purchasing and maintaining identical hardware. Virtualization also offers flexibility in configuring active-active scenarios (for example, a remote or branch office acts as the recovery site for the production site and vice versa) or active-passive (e.g., a corporate-owned or third-party hosting site acts as the recovery site, remaining dormant until needed).
Finally, virtualization delivers flexibility in the form of DR testing. To fully test a disaster recovery plan requires disabling the primary data center and attempting to fail over to the secondary. A virtualized infrastructure makes it significantly easier to conduct frequent nondisruptive tests to ensure the DR process is correct and the organization's staff is practiced in executing it consistently and correctly, including during peak hours of operation.
With server virtualization, a greater degree of DR agility can be achieved. IT's ability to respond to service interruptions can be greatly improved, especially with new automation techniques, such as those available for VMware virtualization technology (see "Automating DR in VMware environments," p. 17) and Microsoft System Center Virtual Machine Manager, which offers tools to determine which applications and services to restore in which order. Recovery
AUTOMATING DR IN VMWARE ENVIRONMENTS
VMware Inc. introduced a VMware vCenter management service in 2008
to automate, document and facilitate disaster recovery (DR)
processes. VMware vCenter Site Recovery Manager (SRM) turns manual
recovery runbooks into automated recovery plans, providing
centralized management of recovery processes via VMware vCenter.
SRM accelerates recovery, improves reliability and streamlines
management over manual DR processes.
VMware SRM automates the setup, testing and actual failover courses of action. With SRM, organizations can automate and manage failover between active-passive sites, a production data center (protection site) and a disaster recovery location (recovery site), or active-active sites, two sites that have active workloads and serve as recovery sites for each other.
SRM integrates with third-party storage- and network-based
replication solutions via a storage replicator adapter (SRA)
installed at both the primary and recovery sites. The SRA
facilitates discovery of arrays and replicated LUNs, and initiates
test and failover, making it much easier to ensure that the storage
replication and virtual machine configurations are established
properly. Datastores are replicated between sites via preconfigured
array- or network-based replication.
SRM doesn't actually perform data protection or data recovery, at least not yet. VMware pre-announced its forthcoming IP-based replication feature in SRM. It will be able to protect dissimilar arrays in local and remote locations, provide virtual machine-level granularity and support local (DAS or internal) storage. This opens up lots of possibilities for companies that don't have a SAN or don't want to be limited to a peered storage replication solution. Even those who have taken advantage of SRM with SAN-based replication between like storage arrays at production and recovery sites can extend recovery strategies to other tiers of workloads with an asynchronous solution.
can be quicker and the skills required by operations staff to
recover virtualized applications are less stringent.
USING STORAGE VIRTUALIZATION IN A DR PLAN As organizations become more comfortable with one form of virtualization, they don't have to make great intellectual or operational leaps to grasp the concept of virtualizing other data center domains. Often, IT organizations undertaking complete data center refresh initiatives position virtualization as a key part of the makeover and look to extract all possible efficiencies in one fell swoop by deploying virtualization in multiple technology areas. So it's not uncommon to see server virtualization combined with storage virtualization.
Like server virtualization, storage virtualization untethers data from dedicated devices. Storage virtualization takes multiple storage systems and treats those devices as a single, centrally managed pool of storage, enabling management from one console. It also enables data movement among different storage systems transparently, providing capacity and load balancing. In addition to lowering costs, improving resource utilization, increasing availability, simplifying upgrades and enabling scalability, the expected benefit of storage virtualization is easier and more cost-effective DR.
In a DR scenario, storage virtualization improves resource
utilization, allowing organizations to do more with less capacity
on hand. IT is likely to purchase and deploy far less physical
storage with thin, just-in-time provisioning of multiple tiers of
storage. By improving capacity utilization, organizations can
reduce the amount of additional capacity purchases and more easily
scale environments.
Virtualization allows storage configurations to vary between the
primary and the DR site. Flexibility in configuring dissimilar
systems at the production and recovery sites can introduce cost
savings (by allowing existing storage systems to be reclaimed and
reused), without introducing complexity. It also allows IT to
mirror primary storage to more affordable solutions at a remote
site, if desired.
Native data replication that integrates with the virtualized storage environment can provide improved functionality for virtual disaster
recovery. Remote mirroring between heterogeneous storage systems
(that is, more expensive at the primary site and less costly at the
recovery site) contributes to lower costs.
FINAL WORD ON VIRTUALIZATION Whether used singly or combined, server virtualization and storage virtualization are making an impact on IT's ability to deliver DR, and to deliver it cost effectively. If your company has been on the sidelines, crossing its collective fingers and hoping a disaster never strikes, it might be time to investigate virtualization. And if you have virtualization in place, you should have the basic elements for an effective and cost-efficient DR environment. It's time to take the next steps.

Lauren Whitehouse is a senior analyst focusing on backup and recovery software and replication solutions at Enterprise Strategy Group, Milford, Mass.
NOBODY WANTS TO pay for something they're not using, but enterprise data storage managers do it all the time. The inflexible nature of disk storage purchasing and provisioning leads to shockingly low levels of capacity utilization. Improving the efficiency of storage has been a persistent theme of the industry and a goal for most storage professionals for a decade, but only thin provisioning technology has delivered tangible, real-world benefits.
Thin provisioning in depth
Thin provisioning can help you use your disk capacity much more efficiently, but you need to get under the hood a little to understand how thin provisioning will work in your environment.
BY STEPHEN FOSKETT
The concept of thin provisioning may be simple to comprehend, but
it’s a complex technology to implement effectively. If an array
only allocates storage capacity that contains data, it can store
far more data than one that allocates all remaining (and
unnecessary) “white space.” But storage arrays are quite a few
steps removed from the applications that store and use data, and no
standard communication mechanism gives them insight into which data
is or isn’t being used.
Storage vendors have taken a wide variety of approaches to address
this issue, but the most effective mechanisms are difficult to
implement in existing storage arrays. That’s why next-generation
storage systems, often from smaller companies, have included
effective thin provisioning technology for some time, while
industry stalwarts may only now be adding this capability.
WHAT YOU SHOULD ASK ABOUT THIN PROVISIONING
WHEN EVALUATING a storage array that includes thin provisioning, consider the following questions, which reflect the broad spectrum of approaches to this challenge. Note that not all capabilities are required in all situations.
• Is thin provisioning included in the purchase price or is it an extra-cost option?
• Does the array support zero page reclaim? How often does the reclamation process run?
• What is the page size or thin provisioning increment?
• Does thin provisioning work in concert with snapshots, mirroring and replication? Is thick-to-thin replication supported?
• What does the array do when it entirely fills up? What's the process of alerting, freeing capacity and halting writes?
• Does the array support WRITE_SAME? What about SCSI UNMAP or ATA TRIM?
• Is there a VMware vStorage APIs for Array Integration (VAAI) "block zeroing" plug-in? Is it the basic T10 plug-in or a specialized one for this array family?
THIN ALLOCATION-ON-WRITE
Traditional storage provisioning maintains
a one-to-one map between internal disk drives and the capacity used
by servers. In the world of block storage, a server would “see” a
fixed-size drive, volume or LUN and every bit of that capacity
would exist on hard disk drives residing in the storage array. The
100 GB C: drive in a Windows server, for example, would access 100
GB of reserved RAID-protected capacity on a few disk drives in a
storage array.
The simplest implementation of thin provisioning is a
straightforward evolution of this approach. Storage capacity is aggregated into "pools" of same-sized pages, which are then
allocated to servers on demand rather than on initial creation. In
our example, the 100 GB C: drive might contain only 10 GB of files,
and this space alone would be mapped to 10 GB of capacity in the
array. As new files are written, the array would pull additional
capacity from the free pool and assign it to that server.
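The on-demand mapping described above can be sketched as a toy model in Python. This is purely illustrative: the page size, class name and data structures are invented for the example, not any vendor's design.

```python
class ThinPool:
    """Toy model of allocate-on-write thin provisioning: physical pages
    are mapped to a virtual LUN only when a write actually lands there."""

    def __init__(self, physical_pages, page_size=4096):
        self.page_size = page_size
        self.free_pages = list(range(physical_pages))
        self.mapping = {}  # (lun, virtual_page) -> physical page number

    def write(self, lun, offset, length):
        """Allocate backing pages for the written range on first touch."""
        first = offset // self.page_size
        last = (offset + length - 1) // self.page_size
        for vpage in range(first, last + 1):
            key = (lun, vpage)
            if key not in self.mapping:
                if not self.free_pages:
                    # the over-commit failure mode discussed below
                    raise RuntimeError("pool exhausted: over-committed")
                self.mapping[key] = self.free_pages.pop()

    def allocated_bytes(self):
        return len(self.mapping) * self.page_size


pool = ThinPool(physical_pages=1024)           # 4 MB of real capacity
pool.write(lun=0, offset=0, length=10 * 4096)  # server writes 40 KB
print(pool.allocated_bytes())                  # only 10 pages are backed
```

Note that rewriting an already-mapped range allocates nothing new; only first writes consume pages from the free pool.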
This type of “allocate-on-write” thin provisioning is fairly
widespread today. Most midrange and enterprise storage arrays, and
some smaller devices, include this capability either natively or as
an added-cost option. But there are issues with this
approach.
One obvious pitfall is that such systems are only thin for a time.
Most file systems use “clear” space for new files to avoid
fragmentation; deleted content is simply marked unused at the
file system layer rather than zeroed out or otherwise freed up at
the storage array. These systems will eventually gobble up their
entire allocation of storage even without much additional data
being written. This not only reduces the efficiency of the system
but risks “over-commit” issues, where the array can no longer meet
its allocation commitments and write operations come to a
halt.
That doesn’t suggest, however, that thin provisioning is useless
without thin reclamation (see “The enemies of thin,” p. 25), but
the long-term benefit of the technology may be reduced. Plus, since
most storage managers assume that thin storage will stay thin,
effectively reclaiming unused space is rapidly becoming a
requirement.
THE THIN RECLAMATION CHALLENGE
The tough part of thin provisioning
technology is reclaiming unused capacity rather than correctly
allocating it. Returning no-longer-used capacity to the
free pool is the key differentiator among thin provisioning
implementations, and the industry is still very much in a state of
flux in this regard.
The root cause of the thin reclamation challenge is a lack of
communication between applications and data storage systems. As
noted earlier, file systems aren’t generally thin-aware, and no
mechanism exists to report when capacity is no longer needed. The
key to effective thin provisioning is discovering opportunities to
reclaim unused capacity; there are essentially
THE ENEMIES OF THIN
"I MAY NEED 500 GB or more for this application," the DBA thinks, so just to play it safe she asks the storage administrator for 1 TB. The storage admin has the same idea, so he allocates 2 TB to keep the DBA out of his office. This familiar story is often blamed for the sad state of storage capacity utilization, but is that justified?
In most enterprise storage environments, poor capacity utilization can come from many sources:
• Annual and per-project purchasing cycles that encourage occasional over-buying of storage capacity that may never be used
• Ineffective resource monitoring and capacity planning processes that obscure capacity requirements
• Incomplete storage networking that strands capacity out of reach of the systems needing it
• Disjointed allocation procedures resulting in assigned-but-never-used storage capacity
• Inflexible operating systems and file systems that make it difficult to grow and shrink as storage demands change
Thin provisioning can be effective in many of these situations, but it's no magic bullet. Organizations with poor purchasing and capacity planning processes may not benefit much, and all the capacity in the world is useless if it can't be accessed over a segmented SAN. But even the most basic thin provisioning system can go a long way to repurpose never-used storage capacity.
two ways to accomplish this:
• The storage array can snoop the data it receives and stores, and attempt to deduce when an opportunity exists to reclaim capacity
• The server can be modified to send signals to the array, notifying it when capacity is no longer used
The first option is difficult to achieve but can be very effective, since it doesn't depend on operating system vendors, who don't seem eager to add thin-enhancing features to their file systems. Products like Data
Robotics Inc.’s Drobo storage systems snoop on certain known
partition and file system types to determine which disk blocks are
unused and then reclaim them for future use. But that approach is
extremely difficult in practice given the huge number of operating
systems, applications and volume managers in use.
Therefore, the key topic in enterprise thin provisioning involves
the latter approach: improving the communication mechanism between
the server and storage systems.
ZERO PAGE RECLAIM
Perhaps the best-known thin-enabling technology
is zero page reclaim. It works something like this: The storage
array divides storage capacity into “pages” and allocates them to
store data as needed. If a page contains only zeroes, it can be
“reclaimed” into the free-capacity pool. Any future read requests
will simply result in zeroes, while any writes will trigger another
page being allocated. Of course, no technology is as simple as
that.
Writing all those zeroes can itself be problematic. It
takes just as much CPU and I/O effort to write a 0 as a 1, and
inefficiency in these areas is just as much a concern for servers
and storage systems as storage capacity. The T10 Technical
Committee on SCSI Storage Interfaces has specified a SCSI command
(WRITE_SAME) to enable “deduplication” of those I/Os, and this has
been extended with a so-called “discard bit” to notify arrays that
they need not store the resulting zeroes.
Most storage arrays aren’t yet capable of detecting whole pages of
zeroes on write. Instead, they write them to disk, and a "scrubbing" process later detects these zeroed pages and discards them; until that happens, the pages still appear used. This process
can be run on an automated schedule or
THIN PROVISIONING AND TCO
COMPARING THE total cost of ownership (TCO) for enterprise storage solutions is controversial, with self-serving and incomplete models the norm for storage vendors. Before spending money on cost-saving, efficiency-improving technologies like thin provisioning, it's wise to create a model internally to serve as a reality check for vendor assumptions and promises.
A complete TCO includes more than just the cost of hardware and software—operations and maintenance, data center costs and the expenses associated with purchasing, migration and decommissioning storage arrays must be considered. And it's a good idea to consider the multiplier effect of inefficient allocation of resources: Leaving 1 GB unused for every one written doubles the effective cost of storage. With end-to-end storage utilization averaging below 25%, this multiplier can add up quickly.
Such cost models often reveal the startling fact that storage capacity on hard disk drives (or new solid-state disks or SSDs) is a small component of TCO—often less than 15% of total cost. But that doesn't mean driving better capacity utilization is a wasted effort. Eliminating the multiplier effect from inefficient utilization can have a far greater impact on TCO than merely packing more bits onto a disk drive.
Consider the operational impact of thin provisioning, as well as its mechanical impact on storage density. Thin systems may require less administration because capacity can be allocated without traditional constraints, but that could lead to a nightmare scenario of overallocated arrays running out of capacity and bringing apps to a halt. The best thin storage systems are also highly virtualized, flexible and instrumented, allowing improved operational efficiency and high utilization.
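The multiplier effect described in the sidebar is simple arithmetic; here's a quick sketch (the dollar figures are placeholders, not pricing data from the article):

```python
def effective_cost_per_gb(raw_cost_per_gb, utilization):
    """Effective cost of each gigabyte actually storing data when only
    `utilization` (a fraction between 0 and 1) of the purchased
    capacity ever holds data."""
    return raw_cost_per_gb / utilization

# "1 GB unused for every one written" means 50% utilization: cost doubles.
print(effective_cost_per_gb(1.00, 0.50))  # 2.0
# At the sub-25% end-to-end utilization cited above, each stored
# gigabyte effectively costs more than 4x its sticker price.
print(effective_cost_per_gb(1.00, 0.25))  # 4.0
```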
manually initiated by an administrator. And some arrays only detect
zeroed pages during a mirror or migration, further reducing
capacity efficiency.
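A background scrub of the kind described can be modeled in a few lines. This is purely illustrative: real arrays track candidate pages in metadata rather than rescanning stored data, and the page size here is arbitrary.

```python
PAGE = 4096  # assumed page size for the sketch

def scrub(pages):
    """Zero page reclaim sketch: pages containing only zero bytes are
    unmapped and returned to the free pool; returns pages reclaimed."""
    reclaimed = 0
    for page_no, data in list(pages.items()):
        if data == b"\x00" * PAGE:
            del pages[page_no]  # page goes back to the free pool
            reclaimed += 1
    return reclaimed

pages = {0: b"\x01" * PAGE, 1: b"\x00" * PAGE, 2: b"\x00" * PAGE}
print(scrub(pages))  # 2 zeroed pages reclaimed
print(len(pages))    # 1 page still holds real data
```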
BUILDING BRIDGES
Even if an array has a feature-complete zero page
reclaim capability, it will only be functional if zeroes are
actually written. The server must be instructed to write zeroes
where capacity is no longer needed, and that’s not the typical
default behavior. Most operating systems need a command, like Windows' "sdelete -c," or a tool such as NetApp's SnapDrive, to make this happen, and these are run only occasionally.
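What such a free-space zeroing tool does can be sketched roughly as follows. This is an assumed, simplified model of the technique, not sdelete's actual implementation; the byte cap keeps the demo small, where a real tool writes until the file system reports it's full.

```python
import os
import tempfile

def zero_free_space(directory, chunk=1 << 20, max_bytes=4 << 20):
    """Fill free space with zeroes so a thin array's zero page reclaim
    can later free it, then delete the fill file.  Returns bytes
    written (capped at max_bytes for this demo)."""
    written = 0
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as f:
            while written < max_bytes:
                try:
                    f.write(b"\x00" * chunk)
                except OSError:      # disk full: all free space is now zeroed
                    break
                written += chunk
    finally:
        os.remove(path)              # the freed blocks remain zero-filled
    return written

print(zero_free_space(tempfile.gettempdir()))  # bytes zeroed (4 MB cap here)
```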
Some applications, including VMware ESX volumes, do indeed zero out new space, and ESX's "eagerzeroedthick" virtual disk format will even zero out capacity in advance. Although certain compatibility issues remain, notably with VMotion, ESX is becoming increasingly thin-aware. The
vStorage APIs for Array Integration (VAAI), added in ESX 4.1,
includes native “block zeroing” support for certain storage
systems. ESX uses a plug-in, either a special-purpose one or the
generic T10 WRITE_SAME support, to signal an array that VMFS
capacity is no longer needed.
Symantec Corp. is also leading the charge to support thin
provisioning. The Veritas Thin Reclamation API, found in the
Veritas Storage Foundation product, includes broad support for most
major storage arrays. It uses a variety of communication mechanisms
to release unneeded capacity, and is fully integrated with the VxFS
file system and volume manager. Storage Foundation also includes
the SmartMove migration facility, which assists thin arrays by only
transferring blocks containing data.
Thin awareness in other systems is coming more slowly. Another
standard command, ATA TRIM, is intended to support solid-state
storage, but it could also send thin reclamation signals, along
with its SCSI cousin, UNMAP. Microsoft and Linux now support TRIM,
and could therefore add thin provisioning support in the future as
well. They could also modify the way in which storage is allocated
and released in their file systems.
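A host-to-array reclamation signal of the TRIM/UNMAP variety reduces, in essence, to "these pages no longer hold live data." A minimal sketch follows; it is deliberately simplified, since the real SCSI and ATA commands carry block ranges with alignment and granularity rules the model ignores.

```python
class ThinLun:
    """Minimal model of host-to-array reclamation signaling in the
    spirit of SCSI UNMAP / ATA TRIM (names and structure invented)."""

    def __init__(self):
        self.mapped = set()  # virtual page numbers with backing store

    def write(self, page):
        self.mapped.add(page)

    def unmap(self, first_page, count):
        """Host signals that these pages no longer hold live data."""
        for p in range(first_page, first_page + count):
            self.mapped.discard(p)  # backing page returns to the free pool

lun = ThinLun()
for p in range(100):
    lun.write(p)
lun.unmap(20, 50)        # file system deleted 50 pages' worth of data
print(len(lun.mapped))   # 50 pages remain mapped
```

The point of the signal is that the array no longer has to deduce anything: the host states outright which capacity is reclaimable.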
GETTING THINNER
Thin provisioning is not without its challenges,
but the benefits are many. It’s one of the few technologies that
can improve real-world storage utilization
even when the core issue isn’t technology related. Indeed, the
ability of thin provisioning to mask poor storage forecasting and
allocation processes contributed to the negative image many,
including me, had of it. But as the technology improves and thin
reclamation becomes more automated, this technology will become a
standard component in the enterprise storage arsenal.
Stephen Foskett is an independent consultant and author
specializing in enterprise storage and cloud computing. He is
responsible for Gestalt IT, a community of independent IT thought
leaders, and organizes their Tech Field Day events. He can be found
online at GestaltIT.com, FoskettS.net and on Twitter at
@SFoskett.
Exchange 2010 and storage systems
The latest version of Exchange Server has some significant changes
that will impact the storage supporting the mail system.
BY BRIEN M. POSEY
WITH EXCHANGE SERVER 2010, Microsoft Corp. made
some major changes to the database structure that underlies the
email application. These architectural changes have a significant
impact on planning for Exchange Server’s data storage
requirements.
The biggest change Microsoft made was eliminating single-instance
storage (SIS). Previously, if a message was sent to multiple
recipients, only one copy of the message was stored within the
mailbox database. User mailboxes received pointers to the message
rather than a copy of the entire message.
The elimination of single-instance storage means that when a
message is sent to multiple recipients, each recipient receives a
full copy of the message. In terms
of capacity planning, the overall impact of this change will vary
depending on how many messages include attachments.
Text and HTML-based messages are typically small and will have a
minimal impact on capacity planning, and Microsoft further reduces
the impact by automatically compressing such messages. However, if
you have users who routinely send large attachments to multiple
recipients, those messages could have a major impact on database
growth. Microsoft’s primary goal in designing the new database
architecture was to decrease database I/O requirements. As such, Microsoft chose not to compress message attachments because of the additional I/O that would have been required to compress and decompress them.
It may seem odd that, at a time when storage managers are looking to reduce duplication in primary storage, Microsoft removed a data reduction feature from Exchange. But Microsoft scrapped
single-instance storage because Exchange mailbox databases perform
much more efficiently without it. Microsoft claims database I/O
requirements have been reduced by approximately 70% in Exchange
2010.
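The capacity effect of dropping single-instance storage is easy to model. This is a rough illustration only; it ignores Exchange's automatic compression of text and HTML bodies mentioned above.

```python
def stored_bytes(message_bytes, recipients, single_instance):
    """Capacity consumed when one message goes to `recipients` mailboxes,
    with and without single-instance storage (SIS)."""
    return message_bytes if single_instance else message_bytes * recipients

msg = 10 * 1024 * 1024          # a 10 MB attachment
print(stored_bytes(msg, 50, single_instance=True))   # pre-2010: one copy
print(stored_bytes(msg, 50, single_instance=False))  # Exchange 2010: 50 copies
```

This is why the article singles out large attachments sent to many recipients as the biggest driver of database growth under the new architecture.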
One of the most common methods of keeping Exchange 2010 mailbox
databases from growing too large is to use mailbox quotas. Quotas
prevent individual mailboxes from exceeding a predetermined size,
and the quotas in Exchange 2010 work as they did in previous
versions of Exchange with one notable exception. Exchange 2010
introduces the concept of archive mailboxes (discussed later). If a
user has been given an archive mailbox, the mailbox quota won’t
count the archive mailbox’s contents when determining how much
storage the user is consuming. Exchange does, however, let you
manage archive storage through a separate quota.
The use of mailbox quotas is a tried-and-true method for limiting
data storage consumption. But Microsoft has been encouraging
organizations to make use of low-cost storage rather than mailbox
quotas. The argument is that organizations can accommodate the
increased database size without spending a lot on expensive storage
solutions.
The low-cost storage recommendation is based on more than just
storage cost. Many organizations have been forced to set stringent
mailbox quotas that have forced users to delete important messages.
Ostensibly, cheaper storage will allow for larger mailbox quotas or
for the elimination of quotas altogether.
Previously, using lower-end storage subsystems in production Exchange Server environments was unheard of, but Exchange 2010's reduced I/O requirements make storage options such as SATA drives
practical. And Exchange Server 2010 is flexible in terms of the
types of storage it can use; it will work with direct-attached
storage (DAS) or storage-area network (SAN) storage (or with an
iSCSI connection to a storage pool). However, Microsoft does
prevent you from storing Exchange Server data on any storage device
that must be
CAN EXCHANGE SERVER'S ARCHIVING AND E-DISCOVERY REPLACE THIRD-PARTY PRODUCTS?
Prior to the release of Exchange Server 2010, an entire industry
emerged around creating archival and e-discovery products for
Exchange Server. Now that Exchange 2010 offers native support for
user archives and has built-in e-discovery capabilities, it seems
only natural to consider whether these new features can replace
third- party products.
Exchange 2010's e-discovery and archiving features may be sufficient for some smaller organizations, but they're not
enterprise-ready. The archiving and e-discovery features both have
limitations you won’t encounter with most third-party tools.
For example, Exchange 2010’s archive mailboxes aren’t a true
archiving solution. Archive mailboxes let users offload important
messages to a secondary mailbox that's not subject to strict retention policies or storage quotas. But if you want to do true
archiving at the organizational level you still must use Exchange’s
journaling feature. The journal works, but third-party archivers
provide much better control over message archival, retention and
disposal.
The situation’s the same for Exchange 2010’s multi-mailbox
e-discovery search feature. Multi-mailbox search has some major
limitations. For example, it can only be used with Exchange 2010
mailboxes, so you’ll still need a third-party product to search
legacy Exchange mailboxes or PSTs.
Multi-mailbox search also lacks some of the rich reporting options
and export capabilities commonly found in specialized e-discovery
products.
accessed through a mapped drive letter. So you won’t be able to
store a mailbox database on a network-attached storage (NAS)
system unless it supports iSCSI connectivity.
ADDITIONAL CONSIDERATIONS
Even though low-cost storage might
provide adequate performance, it’s still important to choose a
storage subsystem that also meets your organization’s reliability
requirements. For instance, if you opt for SATA storage, it’s best
to create a fault-tolerant SATA array. Microsoft recommends using
RAID 1+0 arrays. Some organizations use RAID 5 because it’s less
costly and still provides fault tolerance, but RAID 1+0 arrays
generally offer better performance.
It’s worth noting that database size can have a direct impact on
performance. As a general rule, mailbox databases on standalone
mailbox servers should be limited to 200 GB or less. If a mailbox
database grows larger than 200 GB, you may benefit from dividing
the database into multiple, smaller databases. For mailbox
databases that are part of a Database Availability Group, the
recommended maximum database size is 2 TB.
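The sizing guidance above reduces to a simple ceiling division. The numbers come from the article's recommendations; the helper name is invented for the example.

```python
def recommended_split(db_size_gb, in_dag=False):
    """How many mailbox databases to divide the data into, given the
    guidance above: 200 GB or less per database on a standalone
    mailbox server, 2 TB when the database belongs to a Database
    Availability Group."""
    limit_gb = 2048 if in_dag else 200
    return max(1, -(-db_size_gb // limit_gb))  # ceiling division

print(recommended_split(450))                # 3 databases on a standalone server
print(recommended_split(450, in_dag=True))   # 1 database is fine in a DAG
```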
DETERMINING STORAGE REQUIREMENTS
Determining the storage
requirements for an Exchange 2010 deployment can be a big job, but
Microsoft offers a free tool that can help. The Exchange 2010
Mailbox Server Role Requirements Calculator is an Excel spreadsheet
that calculates your Exchange storage requirements based on your
organization’s
Exchange usage. To use the Exchange
2010 Mailbox Server Role Requirements Calculator, fill in a series
of cells by answering questions related to the intended Exchange Server configuration and usage (see "Exchange 2010 Mailbox Server Role Requirements Calculator" screenshot at left). For instance, the spreadsheet asks questions about the average size of an email message
and the number of messages users send and receive each day.
Formulas built into the spreadsheet will use the information you
provide to determine the required storage architecture.
Keep in mind, however, that while the Exchange 2010 Mailbox Server
Role Requirements Calculator may be the best tool available for
estimating Exchange mailbox server storage requirements, the
recommendations it offers are only as accurate as the data you
provide. To compensate, Microsoft recommends you provision enough
disk space to accommodate at least 120% of the calculated maximum
database size.
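As a back-of-the-envelope cross-check on the calculator's output, the 120% guidance can be applied to a raw mailbox-count estimate. This is a gross simplification for illustration: the real spreadsheet also models message rates, retention, log space and more.

```python
def provisioned_gb(mailboxes, quota_gb, overhead_factor=1.2):
    """Disk to provision: maximum database size times the 120%
    safety factor Microsoft recommends (per the guidance above)."""
    return mailboxes * quota_gb * overhead_factor

# 500 mailboxes with a 2 GB quota -> provision at least 1,200 GB.
print(provisioned_gb(500, 2))
```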
EXCHANGE ARCHIVE MAILBOXES
There are other factors to consider that may impact your Exchange Server storage planning, such as whether you plan to implement user archive mailboxes, a new and optional feature. User archive mailboxes are secondary mailboxes that can be
used for long-term retention of messages. What makes archive
mailboxes different from other Exchange archiving methods is that
unlike a more traditional archive (such as a journal mailbox), the
user retains ownership of the items in the archive mailbox. As
such, each user’s archives are readily accessible.
Archive mailboxes are designed to take the place of PST files. But
unlike PST files, archive mailboxes are stored within a mailbox
database on the Exchange Server where they can be managed and
regulated by the Exchange administrator.
In the original RTM release of Exchange 2010, user archive mailboxes were in the same mailbox database as users' primary mailboxes. In SP1, Microsoft provided the option of relocating user archive mailboxes to a separate mailbox database that allows
the archives to be offloaded so they don’t impact the primary
mailbox storage.
Microsoft generally recommends placing the archive mailboxes on a low-end mailbox server that uses inexpensive direct-attached storage (such as a SATA array). Remember, if a mailbox database contains only archive mailboxes then it won't be subject to the same I/O load as a mailbox database that's
used to store the user's primary mailboxes. Another advantage to using low-cost storage for user archive mailboxes is that doing so makes it practical to set a high mailbox capacity quota on the archive mailboxes. (See "Can Exchange Server's archiving and e-discovery replace third-party products?" p. 34.)
JOURNAL JUGGLING
Another consideration to take into account is the
journal mailbox. If you use journaling to archive messages at the
hub transport level then all the archived messages are placed into
the journal mailbox.
I’ve never come across any Microsoft best practices for the
placement of journal mailboxes, but I like to put the journal
mailbox in its own mailbox database. This is because the journaling
process tends to be very I/O intensive and placing the journal
mailbox in a dedicated mailbox database ensures its I/O doesn’t
degrade the performance of the other mailbox databases. If all messages are journaled, locating the journal mailbox within the same store as the user mailboxes will double the I/O requirements because Exchange 2010 doesn't use single-instance storage. In other
words, journaling causes an extra copy of each message to be
created within the mailbox store.
If you were to create the journal mailbox in the same database as
the user mailboxes, it would have a major impact on the replication
process (assuming that database availability groups are being
used—see “Protecting Exchange Data,” p. 40).
Another advantage to locating the journal mailbox in a separate
mailbox database is that it makes it easy to manage storage quotas
and message retention based on mailbox function. You can create one
set of policies for user mailboxes and another set of requirements
for the journal mailbox.
DISCOVERY MAILBOX
The last type of mailbox you should consider when
planning for Exchange 2010 storage is the discovery mailbox. The
discovery mailbox is only used when a multi-mailbox search
(e-discovery) is performed. The search results are stored in the
discovery mailbox.
PROTECTING EXCHANGE DATA
Exchange Server has always been somewhat difficult to protect. If
you do a traditional nightly backup of your Exchange servers, a
failure could potentially result in the loss of a full day’s worth
of messages. For most companies, such a loss is unacceptable.
Exchange administrators have taken a number of different steps to
prevent substantial data loss. In Exchange 2007, for example, it
was a common practice to use continuous replication to replicate
mailbox data to another mailbox server. A continuous replication
solution provides fault tolerance and acts as a mechanism for pro-
tecting data between backups. (Of course, using a continuous data
protection solution such as System Center Data Protection Manager
is also a good option.)
Some observers feel Microsoft is working toward making Exchange
Server backups completely unnecessary. The idea is that Database
Availability Groups will eventually make Exchange resilient enough
that you won’t need backups.
Database Availability Groups are an Exchange 2010 feature that lets
you create up to 16 replicas of a mailbox database. These replicas reside on other mailbox servers, and it's even possible to
create database replicas in alternate data centers. Despite the
degree to which Database Availability Groups can protect mailbox
data, you shouldn’t abandon your backups just yet.
Having multiple replicas of each database makes it easier to protect Exchange Server, but if a mailbox database becomes corrupted
or gets infected with a virus, the corruption or viral code is
copied to the replica databases.
But Microsoft does offer a delayed playback feature in which lagged
copy servers are used to prevent transactions from being instantly
committed to replica databases. If a problem occurs, you’ll have
enough time to prevent the bad data from being committed to a
replica database. Once you've stopped the bad data from spreading, you can revert all your mailbox databases to match the state
of the uncorrupted replica.
While this approach sounds great in theory, Microsoft still has a
lot of work to do to make it practical. Right now the procedure
requires you to take an educated guess as to which transaction log
contains the first bit of corruption and then work through a
complicated manual procedure to prune the log files. So while
Exchange 2010’s storage architecture makes it easier to protect
your data by way of Database Availability Groups, you shouldn’t
rely on them as the only mechanism for protecting Exchange
data.
By default, the discovery mailbox is assigned a 50 GB quota. This
sounds large, but it may be too small for performing e-discovery in
a large organization.
When it comes to choosing a storage location for a discovery
mailbox, capacity is generally more important than performance.
While the e-discovery process is I/O intensive, the I/O load is
split between the database containing the user mailboxes and the
database holding the discovery mailbox.
If e-discovery isn’t a priority, then you may consider not even
bothering to create a discovery mailbox until you need it. If
that’s not an option, your best bet is to place the mailbox in a
dedicated mailbox database that lives on a low-cost storage system
with plenty of free disk space.
MORE PLANNING REQUIRED
Clearly, there are a number of
considerations that must be taken into account when planning an
Exchange Server storage architecture. Even though Exchange 2010
isn’t as I/O intensive as its predecessors, I/O performance should
still be a major consideration in the design process. Other
important considerations include capacity and fault tolerance.
Brien M. Posey is a seven-time Microsoft MVP for his work with
Exchange Server, Windows Server, Internet Information Server (IIS)
and File Systems/Storage. He has served as CIO for a nationwide
chain of hospitals and was once a network administrator for the
Department of Defense at Fort Knox.
AS DATA GROWTH and the costs associated with it keep rising,
leveraging storage infrastructure hosted by a service provider and
made available to subscribers over a network is gaining in
popularity. That means cloud storage resources are frequently being
combined with existing, on-premises backup technologies to provide
off-site copies for long-term retention and, in some cases, for
just- in-case-of-a-disaster copies. In addition, a few vendors are
attacking the issue of bandwidth and optimizing cloud backup
storage to ensure the implementation is up to the task and,
importantly, makes fiscal sense.
INTEREST IN CLOUD STORAGE ESG polled 611 IT professionals
responsible for evaluating, purchasing and/or operating corporate
IT and data centers in North America and Western Europe and found
61% were using or interested in using infrastructure as a service
(IaaS). With IaaS, the service provider owns the equipment and is
responsible for housing, running and maintaining it, with the
subscriber typically paying on a per-use basis. Subscribers have
access to a virtual pool of shared re- sources that promise
elasticity, so storage for off-site backup copies is avail- able on
demand. The costs associated with owning and managing resources at
a second site are reduced—and the need to maintain a secondary site
for off-site disk or tape storage is eliminated.
But backing up to and recovering from cloud storage may introduce
chal- lenges for large backup sets. The daily volume of backup
data, and the time needed to complete transfers, may require more
bandwidth and time than is available. IT organizations often
struggle with the tradeoff between the high costs of purchasing
more bandwidth and extending backup windows or recovery time.
hot spots | lauren whitehouse
Backing up to the cloud requires new approach to bandwidth
Can you use cloud storage for backup? Sure, but beware the
bandwidth and transfer issues
that can arise, and take note of the progress several key vendors
have made in this space.
OPTIMIZING BANDWIDTH FOR BACKUP
Alternatives exist to address these issues. Products from vendors such as NetEnrich, Pancetera Software and Riverbed Technology make a hybrid backup configuration (the combination of on-premises backup technologies and cloud-based storage services) more feasible via technologies that optimize bandwidth.
NetEnrich saw an opportunity to provide virtual storage in the cloud for EMC Data Domain customers. Subscribers use existing on-premises backup software and EMC Data Domain appliances to protect data locally. For an off-site copy in the cloud, backup data stored on the local EMC Data Domain appliance is replicated to NetEnrich's Data Recovery Vault, which is based on EMC Data Domain deduplication appliances at the NetEnrich data center. Data Domain dedupe and compression radically reduce the bandwidth required for remote replication of backup copies.
Pancetera Unite is a virtual appliance that dramatically reduces I/O and bandwidth usage for backup and replication of VMware virtual server environments. Pancetera's SmartRead and SmartMotion technologies optimize the capture and movement of virtual machines into the cloud. The product integrates with in-place backup environments to enable optimization without disrupting the status quo. i365 teamed with Pancetera, embedding the Pancetera Unite virtual appliance in i365's EVault data protection products. The combination reduces the overhead associated with VMware backups initiated by EVault, and optimizes data movement across the LAN or WAN. Pancetera can be combined with WAN acceleration technology to further accelerate transmissions.
Riverbed Whitewater cloud storage accelerator leverages the WAN optimization technology in existing Riverbed offerings to provide a complete data protection service to the cloud. Integrating seamlessly with existing backup technologies and cloud storage providers' APIs, the appliance-based product provides rapid replication of data to the cloud for off-site retention. A Riverbed Whitewater appliance is installed in the network to serve as the local target for backup jobs and is attached to the Internet for replication of data to Amazon S3 cloud storage. The deduplication and compression technologies that are the cornerstones of Riverbed products deliver WAN optimization and accelerate data transfer.
As users look to cloud storage services as a low-cost alternative to maintaining their own infrastructure, there are clear benefits:
• Provides a more cost-effective strategy than maintaining a corporate-owned and -operated secondary site.
• Eliminates capital and operating costs, including the acquisition and maintenance of hardware, data center floor space, and data center environmental factors such as power and cooling.
• Offers more predictable budgeting.
• Facilitates disaster recovery via a remote-based copy.
With bandwidth contributing significantly to the hybrid backup configuration bottom line, it makes sense to explore bandwidth-optimizing technologies such as deduplication, compression and WAN acceleration. The latest crop of products introduces hyper-efficiency in LAN/WAN transfer of data center-driven backup copies to cloud-based storage. This is a key area for data storage managers to focus on when considering their hybrid cloud scenarios.
Lauren Whitehouse is a senior analyst focusing on backup and
recovery software and replication solutions at Enterprise Strategy
Group, Milford, Mass.
read/write | arun taneja

Don't let the cloud obscure good judgment

While new and largely untested, cloud storage is likely to become a significant part of your data storage infrastructure.

Everything is "cloudy" these days. Hardly a day goes by without yet another player jumping on the cloud bandwagon. Some are legitimately tied to the cloud concept, but others are "cloud washing" or force-fitting their products to the cloud concept because they think if they don't they'll fall out of favor with IT users.

However, the questions I'm asked most by IT users are usually on the order of the following:

Our central IT supports several divisions, each of which also has its own IT. One division decided to make a deal with Amazon Web Services and transferred some data to S3 storage. Managers in another division have done deals with Nirvanix or Rackspace or AT&T Synaptic, and sent company data to them. What should we do? We don't want to suppress innovation, but we feel like we're losing control.

and . . .

Our storage vendor is asking us to create a private cloud using mostly the same products as before but now with additional federation products. Is the technology ready for building a private cloud?

Here's how I see it. The cloud is happening, whether you like it or not. It's a lot like what we saw with storage virtualization in 2000. I felt then that the concept had so much merit it was bound to happen, but it took much longer than seemed logical. That's simply the reality of IT. Even when a paradigm-shifting technology comes along, it takes time for it to get into daily use. The cloud is similar. Implemented correctly, it's supposed to improve storage utilization while allowing you to scale up or down at will. You can pay as you grow and enjoy an easy-to-use storage system. So, the question isn't why, but when and how.
FOLLOW THE CLOUD
My first piece of advice is don't fight the cloud. You'll need to develop in-house expertise to understand what cloud technology is, what's real and what's not, who's in the game and so on. Next, you'll want to experiment with public cloud offerings using data you can afford to mess around with. You can test the waters to see how scaling works, how services provide security, if data transfer speeds are adequate and so on. You'll also want to test out recovering files, full volumes and more. These tests should help you to develop guidelines you can provide to business divisions defining what data may or may not be sent outside the company, and how it needs to be managed. This will bring consistency to the enterprise while ensuring that innovation in cloud technology is being exploited.
Perhaps the easiest way to get into the game is to use a gateway product as an on-ramp to the cloud. You want to avoid writing your own cloud interface code even if you're familiar with the Web services APIs used by most cloud services. The gateway vendors have already done the heavy lifting and provide a standard way of interacting with existing applications (via NFS, CIFS, iSCSI, Fibre Channel), while accommodating the idiosyncrasies of each public cloud on the back end. Vendors in this category include Cirtas, Iron Mountain, LiveOffice, Mimecast (for email archiving), Nasuni, Nirvanix, StorSimple, TwinStrata and Zetta among others.
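The translation a gateway performs is easiest to see in miniature. The Python sketch below is purely illustrative (all class and method names are ours, and the in-memory back end stands in for a real provider's API): file-style reads and writes on the front end become object PUTs and GETs on the back end, which is the work the gateway vendors have already done for each cloud's particular interface.

```python
class InMemoryObjectStore:
    """Stand-in for a cloud provider's object-storage API.
    A real back end would issue authenticated HTTP requests instead."""
    def __init__(self):
        self.objects = {}

    def put(self, key: str, data: bytes):
        self.objects[key] = data

    def get(self, key: str) -> bytes:
        return self.objects[key]

class CloudGateway:
    """Toy gateway front end: accepts file-style calls and translates
    them into object operations against the configured back end."""
    def __init__(self, backend):
        self.backend = backend

    def _key(self, path: str) -> str:
        # Flatten a file path into an object key
        return path.lstrip("/")

    def write_file(self, path: str, data: bytes):
        self.backend.put(self._key(path), data)

    def read_file(self, path: str) -> bytes:
        return self.backend.get(self._key(path))
```

A production gateway layers caching, deduplication, encryption and per-provider quirks on top of this basic mapping, which is exactly the heavy lifting you avoid by not writing your own interface code.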
BUILDING YOUR OWN CLOUD
For a private cloud, find out what your primary storage vendor is planning. Vendors are at different stages of product development and availability. EMC seems to be ahead right now, having announced and shipped VPLEX, an important federation technology that's crucial in building large clouds. But all major storage vendors have serious plans to deliver private storage cloud products and services. Not surprisingly, each wants you to build your private cloud almost exclusively with components from them, but from my perspective, no one in the market has all the pieces yet. You may consider other alternatives. Nirvanix, for instance, has created something it calls hNode, or a hybrid node. Essentially, it lets you create a private cloud using the same software Nirvanix uses for its own Storage Delivery Network (SDN); this would allow your private cloud to interface with a public cloud based on the Nirvanix architecture.
LONG-TERM CONSIDERATIONS
Whatever route you decide to take, keep in mind that it's one of the most strategic decisions you'll make. Once you sign on with a vendor you're likely to be locked in for a long time.

Vendors are all in learning mode today, just as we are. So take the time to study and experiment before jumping headlong onto the bandwagon.
Arun Taneja is founder and president at Taneja Group, an analyst
and consulting group focused on storage and storage-centric server
technologies. He can be reached at
[email protected].
snapshot

Fibre Channel still top dog among disks

With 6 Gb SAS disks and emerging solid-state products stirring up the data storage pot, and concerns about ever-escalating capacities, more attention is turning toward drive technologies. In our survey of more than 200 storage users, 69% had Fibre Channel (FC) arrays installed, helping FC remain the most widely installed storage array type. But at 62%, NAS is closing in on FC, and DAS, often overlooked or discounted, was the third most popular alternative. iSCSI SANs and multiprotocol arrays are each used by approximately one-third of respondents. FC disks account for 54% of all installed disks, followed by SATA and SAS. But that mix will likely change as respondents shop for the average 55 TB of disk capacity they expect to add this year. Storage managers' shopping carts will be filled with nearly equal parts of SATA (52%), FC (50%) and SAS (47%) disks. In addition, storage buyers expect SATA/SAS and PCIe solid-state storage to make up 13% and 11%, respectively, of their storage purchases. —Rich Castagna
“I’m waiting for the concept of ‘hybrid’ drives with a small amount
of their own SSD—popular in the desktop world now—to make it into
enterprise storage.”
—Survey respondent
[Snapshot survey charts: which type of drive is used for the company's highest storage tier; storage systems currently installed (Fibre Channel SAN 69%, iSCSI SAN 33%); average TB of disk capacity to be added in 2011; and the mix of currently installed disk types vs. 2011 planned additions.]
MAY

Automated Storage Tiering
Storage tiering is gaining renewed interest, largely due to the emergence of solid-state storage as a viable alternative for high-performance storage. But automated tiering doesn't just benefit the high end; it's an effective way to make efficient use of all installed storage resources. We'll describe which vendors are offering automated tiering capabilities and how they work.

Storage Purchasing Intentions
Over the last eight years, Storage magazine and SearchStorage.com have fielded a twice-yearly survey to determine the purchasing plans of storage professionals. This article reports and analyzes the results of the latest edition of the survey and provides insight into emerging trends.

Blueprint for Cloud-Based Disaster Recovery
Cloud storage services may seem perfect for disaster recovery (DR) planning, especially for smaller firms that may not have the resources for collocation facilities. But there's much more to plan for than just tucking a firm's data into a safe place in the cloud; getting it back when it's needed may be the key to whether cloud DR is an appropriate choice.

And don't miss our monthly columns and commentary, or the results of our Snapshot reader survey.
Editorial Director Rich Castagna
Creative Director Maureen Joyce
Steve Duplessie, Jacob Gsoedl, W. Curtis Preston
Executive Editor Ellen O’Brien
Senior News Director Dave Raffo
Senior News Writer Sonia Lelii
Features Writer Carol Sliwa
Editorial Assistant Allison Ehrhart
Managing Editor Heather Darcy
Features Writer Todd Erickson
TechTarget Conferences Director of Editorial Events Lindsay
Jeanloz
Editorial Events Associate Jacquelyn Hinds
Storage magazine Subscriptions: www.SearchStorage.com