E-Guide
Best bets for backup: How to optimize your storage and choose a dedupe method
Believe it or not, there are inexpensive ways to optimize your enterprise backup system that
have a dramatic and positive impact on your infrastructure as a whole. Check out this expert
E-Guide to learn backup best practices, when it makes sense to make a major upgrade, and
how to figure out what may be slowing down your backup system and how to fix it. Also
learn tips for choosing the best dedupe technology for your company.
Table of Contents
Backup best practices: Easy fixes for your enterprise backup system
Deduplication best practices and choosing the best dedupe technology
Resources from EMC Corporation
Backup best practices: Easy fixes for your enterprise backup system
By W. Curtis Preston
There are easy, cheap things you can do to optimize your enterprise backup system and
have a dramatic impact. This column discusses how to figure out what may be slowing
down your backup system and how to fix it, and offers some hints as to when to throw in
the towel. You will learn backup best practices, and when the best thing to do is to make a
major upgrade.
The first and most important thing that you must do to be able to improve your enterprise
backup system is to be armed with information. You need to know solid answers to the
following questions:
How much data are you backing up in a full? (You need this number for each client
you are backing up, as well as the aggregate number.)
How much data are you backing up in an incremental? (You also need these for each
client.)
The answers to these questions are best found by querying your backup system. It may take
some time if you've never used that part of your data backup system, so you may need to
get some support (from your backup software vendor, or perhaps a user-based support
community). But if this critical information can't be obtained from your backup software, it's
time to move on to a different backup software package. Symantec NetBackup's bpimagelist
and EMC NetWorker's mminfo commands are examples of how to obtain this information.
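If your backup product can export or list its catalog in a machine-readable form, a few lines of scripting will turn that output into the per-client and aggregate numbers above. The following is a minimal sketch in Python; it assumes a hypothetical CSV export with client, schedule_type and kilobytes columns, so substitute whatever fields your software actually produces.

    import csv
    from collections import defaultdict

    full_kb = defaultdict(int)        # per-client full backup totals
    incr_kb = defaultdict(int)        # per-client incremental totals

    # catalog.csv is a hypothetical export: client,schedule_type,kilobytes
    with open("catalog.csv", newline="") as f:
        for row in csv.DictReader(f):
            kb = int(row["kilobytes"])
            if row["schedule_type"].lower().startswith("full"):
                full_kb[row["client"]] += kb
            else:
                incr_kb[row["client"]] += kb

    for client in sorted(set(full_kb) | set(incr_kb)):
        print(f"{client}: full={full_kb[client] / 1e6:.1f} GB, "
              f"incr={incr_kb[client] / 1e6:.1f} GB")
    print(f"Aggregate full: {sum(full_kb.values()) / 1e6:.1f} GB")
    print(f"Aggregate incremental: {sum(incr_kb.values()) / 1e6:.1f} GB")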
The next question you need an answer to is:
What is the network interface for your backup system and how is it configured (e.g.,
TOE, jumbo frames)?
This question is best answered by either the system or network administrator. The answer
you want to hear is "10 GbE offload NIC with jumbo frames," but it's probably not the
answer you're going to hear. 10 GbE is obviously the most recent topology and offers
incredible benefits to your backup system. The most obvious benefit is that you'll have a
network that is faster than your tape drive. Without a network that is faster than your
target, it is impossible to properly design your data backup system. Therefore, you will need
to change your target or network. If you are unable to upgrade your network, the next most
obvious step would be to move to some type of disk-based backup system, as disk is much
more forgiving of slow networks than tape is. But if it's possible to upgrade your backup
server's NIC, that's a whole lot easier and cheaper than completely rearchitecting your
backup system.
An offload NIC offloads some or all of the TCP processing from your host CPU. If it offloads
all of the processing, it is called a TOE card, for TCP Offload Engine. If it offloads some
functions but not others, it is just called an offload card.
Manufacturers of both types of cards would be glad to explain why their approach is better
than the other, but suffice it to say that either is better than having neither. Finally, there
are jumbo frames. The standard Maximum Transfer Unit (MTU) of Ethernet is 1,500 bytes,
a value decided decades ago. The argument for jumbo frames is that today's
network speeds are so fast that making a frame every 1,500 bytes requires too many CPU
interrupts; a "jumbo" frame size of 9,000 bytes is more appropriate. If your NIC and
network support jumbo frames, your CPU can be interrupted one-sixth as often,
increasing the effective speed of your interface.
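The arithmetic behind that claim is straightforward. A rough illustration in Python, using round numbers rather than any measured workload:

    data_bytes = 1_000_000_000          # 1 GB of backup data crossing the wire

    frames_1500 = data_bytes / 1500     # standard Ethernet MTU
    frames_9000 = data_bytes / 9000     # jumbo frames

    print(f"Frames at MTU 1500: {frames_1500:,.0f}")
    print(f"Frames at MTU 9000: {frames_9000:,.0f}")
    print(f"Reduction factor:   {frames_1500 / frames_9000:.0f}x")   # ~6x fewer interrupts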
If you're stuck on 100 Mbps Ethernet, then you really have to upgrade both your network
infrastructure and NIC to have any kind of decent performance. There's no reason to buy a
GbE NIC, though, even if your network doesn't support 10 GbE yet. Buy a 10 GbE NIC and
have it step down to 1 GbE. Then you'll be the first one to take advantage of 10 GbE once
they upgrade the network. If you're using 1 GbE, and your network can support 10 GbE,
upgrading your backup server's NIC to 10 GbE should be the first thing on your priority list.
You should also ask yourself:
What are the I/O throughput capabilities of your backup server?
This question is a little difficult to answer using specifications; it is best to answer via testing
where you take other things out of the picture. The details on how to do this are way
beyond the scope of this column, but a basic suggestion would be to use tools like iostat in
Linux and Unix, and iometer in Windows. How fast can you move I/O through this server?
Are you limited by the backplane or your I/O bus? If you are, there's not much you can do
other than to upgrade the server on which you're running your backups.
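Iostat and Iometer are the right tools for a proper measurement, but if you just want a quick sanity check of sequential write speed, a short script will do. The sketch below is a rough Python approximation, not a substitute for those tools; the file name and sizes are arbitrary.

    import os, time

    chunk = os.urandom(8 * 1024 * 1024)        # 8 MB of incompressible data
    written_mb = 0
    start = time.time()

    with open("throughput_test.tmp", "wb") as f:
        for _ in range(128):                    # ~1 GB total
            f.write(chunk)
            written_mb += 8
        f.flush()
        os.fsync(f.fileno())                    # make sure it really hit the disk

    elapsed = time.time() - start
    print(f"Wrote {written_mb} MB in {elapsed:.1f} s = {written_mb / elapsed:.0f} MBps")
    os.remove("throughput_test.tmp")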
Next, find out:
What is the native transfer rate of your tape drive?
What compression ratio are you getting on your data?
These questions are about finding out how fast your tape drive should be running. So look
at the tapes marked full by your backup software and see how much data you typically fit
on a tape before it says it's full. If you consistently fit 600 GB on a 400 GB tape, then you're
getting 1.5:1 compression. Multiply that number by the vendor's rated throughput for your
drive, and you've got your target throughput number for your tape drive.
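Using the article's numbers, with a hypothetical drive rated at 100 MBps native (substitute your drive's actual spec), the calculation looks like this:

    tape_native_capacity_gb = 400      # what the cartridge is rated for
    data_per_full_tape_gb = 600        # what you actually fit before it reports full
    drive_native_rate_mbps = 100       # hypothetical native drive rating; check your spec sheet

    compression_ratio = data_per_full_tape_gb / tape_native_capacity_gb   # 1.5:1
    target_rate_mbps = drive_native_rate_mbps * compression_ratio         # 150 MBps

    print(f"Compression ratio: {compression_ratio:.1f}:1")
    print(f"Target throughput for this drive: {target_rate_mbps:.0f} MBps")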
Then you need to do whatever you can to supply a rate of data to that tape drive that is
close to its target throughput rate. If your target rate is 150 MBps and you only have a 1
GbE NIC, you can see why we spent so much time talking about the network. Techniques
such as multiplexing (good for backups, can hurt restores) and disk staging (good for
backups, doesn't help restores) are the things you should explore to get your backups to go
fast enough to keep that tape drive happy.
Incremental backups almost always run too slowly to keep a tape drive happy, so they will
most certainly need to be staged to disk. Full backups can either be staged to disk or
multiplexed (if your backup software supports it). Once you've made one tape drive happy,
see if you can do the same for two or three or more. Don't back up to more tape drives than
you can stream. Even if you've upgraded to 10 GbE and you're pumping 800 MBps of
backups into your backup server, that's only enough to keep four LTO-5 tape drives
streaming at full speed. If you can't keep one tape drive happy, though, adding more tape
drives to the mix will only make things worse.
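To see how many drives a given ingest rate can actually keep streaming, divide the aggregate backup throughput by the per-drive target rate and round down. With the 800 MBps figure above and an assumed rate of roughly 200 MBps per LTO-5 drive once compression is factored in, that works out to the four drives mentioned:

    ingest_rate_mbps = 800        # what the backup server can take in over 10 GbE
    per_drive_rate_mbps = 200     # assumed compressed LTO-5 streaming rate (illustrative)

    drives_streamable = ingest_rate_mbps // per_drive_rate_mbps
    print(f"Tape drives you can keep streaming: {drives_streamable}")   # 4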
What is the fastest speed at which you will need to restore data?
Finally, you need to investigate whether your backup system is capable of performing the
fastest restores that it is required to make. Look at the largest and fastest restore that you'll
ever need to do and test to see if your system is capable of performing that restore. For
example, make sure that any multiplexing that you're using hasn't had a negative impact on
restore speed.
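One way to frame that test is to turn the restore requirement into a throughput number first and compare it against what your system can actually deliver. A back-of-the-envelope calculation with hypothetical figures:

    restore_size_gb = 2000        # largest restore you might ever need (hypothetical)
    restore_window_hours = 4      # how long the business can wait (hypothetical)

    required_mbps = restore_size_gb * 1000 / (restore_window_hours * 3600)
    print(f"Required restore throughput: {required_mbps:.0f} MBps")   # ~139 MBps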
Although getting the answers to these questions may take some work, you'll be well on your
way to learning some valuable backup best practices, and solving some common enterprise
backup problems.
About this Author: W. Curtis Preston (a.k.a. "Mr. Backup"), executive editor and
independent backup expert, has been singularly focused on data backup and recovery for
more than 15 years. From starting as a backup admin at a $35 billion credit card
company to being one of the most sought-after consultants, writers and speakers in this
space, it's hard to find someone more focused on recovering lost data. He is the webmaster
of BackupCentral.com, the author of hundreds of articles, and the books "Backup and Recovery" and "Using SANs and NAS."
Deduplication best practices and choosing the best dedupe technology
By Todd Erickson, Features Writer
Data deduplication is a technique to reduce storage needs by eliminating redundant data in
your backup environment. Only one copy of the data is retained on storage media, and
redundant data is replaced with a pointer to the unique data copy. Dedupe technology
typically divides data sets into smaller chunks and uses algorithms to assign each data
chunk a hash identifier, which it compares to previously stored identifiers to determine if the
data chunk has already been stored. Some vendors use delta differencing technology, which
compares current backups to previous data at the byte level to remove redundant data.
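A minimal sketch of the hash-based approach just described, in Python; fixed-size chunks and SHA-256 fingerprints stand in here for whatever chunking and hashing a real product uses:

    import hashlib

    CHUNK_SIZE = 4096   # fixed-size chunks; real products often use variable-size chunking
    store = {}          # hash -> unique chunk (the deduplicated pool)

    def dedupe(data: bytes) -> list[str]:
        """Return a list of chunk hashes (pointers) referencing the stored data."""
        pointers = []
        for i in range(0, len(data), CHUNK_SIZE):
            chunk = data[i:i + CHUNK_SIZE]
            h = hashlib.sha256(chunk).hexdigest()
            if h not in store:          # only new, unique chunks consume space
                store[h] = chunk
            pointers.append(h)          # duplicates become pointers to the existing copy
        return pointers

    def restore(pointers: list[str]) -> bytes:
        return b"".join(store[h] for h in pointers)

    backup1 = b"A" * 8192 + b"B" * 4096
    backup2 = b"A" * 8192 + b"C" * 4096          # mostly redundant with backup1
    p1, p2 = dedupe(backup1), dedupe(backup2)
    print(len(store), "unique chunks stored for", len(p1) + len(p2), "chunk references")
    assert restore(p1) == backup1 and restore(p2) == backup2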
Dedupe technology offers storage and backup administrators a number of benefits,
including lower storage space requirements, more efficient disk space use, and less data
sent across a WAN for remote backups, replication, and disaster recovery. Jeff Byrne, senior
analyst for the Taneja Group, said deduplication technology can have a rapid return on
investment (ROI). "In environments where you can achieve 70% to 90% reduction in
needed capacity for your backups, you can pay back your investment in these dedupe
solutions fairly quickly."
While the overall data deduplication concept is relatively easy to understand, there are a
number of different techniques used to accomplish the task of eliminating redundant backup
data, and it's possible that certain techniques may be better suited for your environment.
So when you are ready to invest in dedupe technology, consider the following technology
differences and data deduplication best practices to ensure that you implement the best
solution for your needs.
In this guide on deduplication best practices, learn what you need to know to choose the
right dedupe technology for your data backup and recovery needs. Learn about source vs.
target deduplication, inline vs. post-processing deduplication, and the pros and cons of
global deduplication.
Source Deduplication vs. Target Deduplication
Deduping can be performed by software running on a server (the source) or in an appliance
where backup data is stored (the target). If the data is deduped at the source, redundancies
are removed before transmission to the backup target. "If you're deduping right at the
source, you get the benefit of a smaller image, a smaller set of data going across the wire
to the target," Byrne said. Source deduplication uses client software to compare new data
blocks on the primary storage device with previously backed up data blocks. Previously
stored data blocks are not transmitted. Source-based deduplication uses less bandwidth for
data transmission, but it increases server workload and could increase the amount of time it
takes to complete backups.
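Conceptually, the client side of source dedupe behaves something like the sketch below, which reuses the chunk-and-hash idea from earlier; the target's index is modeled as a simple in-memory set, and the chunk size is arbitrary:

    import hashlib, os

    CHUNK_SIZE = 4096          # arbitrary; real products vary
    target_index = set()       # chunk hashes the backup target already holds

    def source_dedupe_send(data: bytes) -> int:
        """Return the bytes actually transmitted; known chunks never leave the client."""
        sent = 0
        for i in range(0, len(data), CHUNK_SIZE):
            chunk = data[i:i + CHUNK_SIZE]
            h = hashlib.sha256(chunk).hexdigest()   # the extra CPU work happens at the source
            if h not in target_index:               # target has never seen this chunk
                target_index.add(h)
                sent += len(chunk)                  # ship the chunk itself
            # else: ship only the hash as a reference, a few dozen bytes
        return sent

    monday = os.urandom(100_000)                # first full backup: all chunks are new
    tuesday = monday + os.urandom(4_096)        # next day: mostly unchanged data
    print(source_dedupe_send(monday), "bytes sent Monday")
    print(source_dedupe_send(tuesday), "bytes sent Tuesday")   # only the changed tail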
Lauren Whitehouse, a senior analyst with the Enterprise Strategy Group, said source
deduplication is well suited for backing up smaller and remote sites because increased CPU
usage doesn't have as big an impact on the backup process. Whitehouse said
virtualized environments are also "excellent use cases" for source deduplication because of
the immense amounts of redundant data in virtual machine disk (VMDK) files. However, if
you have multiple virtual machines (VMs) sharing one physical host, running multiple hash
calculations at the same time may overburden the host's I/O resources.
Most well-known data backup applications now include source dedupe, including Symantec
Corp.'s Backup Exec and NetBackup, EMC Corp.'s Avamar, CA Inc.'s ArcServe Backup, and
IBM Corp.'s Tivoli Storage Manager (TSM).
Target deduplication removes redundant data in the backup appliance -- typically a NAS
device or virtual tape library (VTL). Target dedupe reduces the storage capacity required for
backup data, but does not reduce the amount of data sent across a LAN or WAN during
backup. "A target deduplication solution is a purpose built appliance, so the hardware and
software stack are tuned to deliver optimal performance," Whitehouse said. "So when you
have large backup sets or a small backup window, you don't want to degrade the
performance of your backup operation. For certain workloads, a target-based solution might
be better suited."
Target deduplication may also fit your environment better if you use multiple backup
applications and some do not have built-in dedupe capabilities. Target-based deduplication
systems include Quantum Corp.'s DXi series, IBM's ProtecTier, NEC Corp.'s Hydrastor series,
FalconStor Software Inc.'s File-interface Deduplication System (FDS), and EMC's Data
Domain series.
Inline Deduplication vs. Post-Process Deduplication
Another option to consider is when the data is deduplicated. Inline deduplication removes
redundancies in real time as the data is written to the storage target. Software-only
products tend to use inline processing because the backup data doesn't land on a disk
before it's deduped. Like source deduplication, inline increases CPU overhead in the
production environment but limits the total amount of data ultimately transferred to backup
storage. Asigra Inc.'s Cloud Backup and CommVault Systems Inc.'s Simpana are software
products that use inline deduplication.
Post-process deduplication writes the backup data into a disk cache before it starts the
dedupe process. It doesn't necessarily write the full backup to disk before starting the
process; once the data starts to hit the disk the dedupe process begins. The deduping
process is separate from the backup process so you can dedupe the data outside the backup
window without degrading your backup performance. Post-process deduplication also allows
you quicker access to your last backup. "So on a recovery that might make a difference,"
Whitehouse said.
However, the full backup data set is transmitted across the wire to the deduplication disk
staging area or to the storage target before the redundancies are eliminated, so you have to
have the bandwidth for the data transfer and the capacity to accommodate the full backup
data set and deduplication process. Hewlett-Packard Co.'s StorageWorks StoreOnce
technology uses post-process deduplication, while Quantum Corp.'s DXi series backup
systems use both inline and post-process technologies.
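The essential difference between the two approaches is where the dedupe step sits relative to the write, which a toy sketch can make concrete; nothing here reflects any particular vendor's implementation:

    import hashlib

    def fingerprint(chunk: bytes) -> str:
        return hashlib.sha256(chunk).hexdigest()

    dedup_pool = {}      # hash -> chunk, the deduplicated store
    staging = []         # raw disk cache used only by post-process dedupe

    def inline_backup(chunks):
        # Inline: each chunk is fingerprinted and deduped before it lands on disk.
        for chunk in chunks:
            dedup_pool.setdefault(fingerprint(chunk), chunk)   # CPU cost now, less written

    def post_process_backup(chunks):
        # Post-process: land everything in the staging area at full backup speed...
        staging.extend(chunks)           # needs capacity for the whole backup set

    def post_process_dedupe():
        # ...then dedupe later, outside the backup window.
        while staging:
            chunk = staging.pop()
            dedup_pool.setdefault(fingerprint(chunk), chunk)

    inline_backup([b"alpha", b"beta", b"alpha"])
    post_process_backup([b"alpha", b"gamma"])
    post_process_dedupe()
    print(len(dedup_pool), "unique chunks in the pool")   # 3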
Content-aware or application-aware deduplication products that use delta-differencing
technology can compare the current backup data set with previous data sets. "They
understand the content of that backup stream, and they know the format that the data is in
when the backup application sends it to that target device," Whitehouse said. "They can
compare the workload of the current backup to the previous backup to understand what the
differences are at a block or at a byte level." Whitehouse said delta-differencing-based
products are efficient but they may have to reverse engineer the backup stream to know
what it looks like and how to do the delta differencing. Sepaton Inc.'s DeltaStor system and
ExaGrid Systems Inc.'s DeltaZone architecture are examples of products that use delta
differencing technology.
Global Deduplication
Global deduplication removes backup data redundancies across multiple devices (if you are
using target-based appliances) or multiple clients (with source-based products). It allows you
to add nodes that talk to each other across multiple locations to scale performance and
capacity. Without global deduplication capabilities, each device dedupes just the data it
receives. Some global systems can be configured in two-node clusters, such as FalconStor
Software's FDS High Availability Cluster. Other systems use grid architectures to scale to
dozens of nodes, such as ExaGrid Systems' DeltaZone and NEC's Hydrastor.
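The benefit of global dedupe is easiest to see as a question of index scope: each node consulting only its own index versus every node consulting one shared index. A toy illustration, again with hashes standing in for real chunk fingerprints:

    import hashlib

    def h(chunk: bytes) -> str:
        return hashlib.sha256(chunk).hexdigest()

    chunk = b"the same 4 KB block lands on two different appliances"

    # Without global dedupe: each node keeps its own index, so the chunk is stored twice.
    node_a_index, node_b_index = {}, {}
    node_a_index[h(chunk)] = chunk
    node_b_index[h(chunk)] = chunk
    print("Copies stored per-node:", len(node_a_index) + len(node_b_index))   # 2

    # With global dedupe: every node consults one shared index, so it is stored once.
    global_index = {}
    for _node in ("node A", "node B"):
        global_index.setdefault(h(chunk), chunk)
    print("Copies stored globally:", len(global_index))                        # 1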
The more backup data you have, the more global deduplication can increase your dedupe
ratios and reduce your storage capacity needs. Global deduplication also introduces load
balancing and high availability to your backup strategy, and allows you to efficiently manage
your entire backup data storage environment. Users with large amounts of backup data or
multiple locations will gain the most benefit from the technology. Most of the backup
software providers offer products with global dedupe, including Symantec NetBackup and
EMC Avamar, and data deduplication appliances such as IBM's ProtecTier and Sepaton's
DeltaStor also offer global deduplication.
As with all data backup and storage products, the technologies used are only one factor you
should consider when evaluating potential deduplication systems. In fact, according to
Whitehouse, the type of dedupe technologies vendors use is not the first attribute many
administrators look at when investigating deduplication solutions. Price, performance, and
ease of use and integration top deduplication shoppers' lists, Whitehouse said. Both
Whitehouse and Byrne recommend first finding out if your current backup product has
deduplication capabilities. If not, analyze your needs long term and study the vendors'
architectures to determine if they match your workload and scaling requirements.
Resources from EMC Corporation
EMC Defenders of the Virtual World
EMC Backup to the Future
EMC Backup and Recovery Solutions
About EMC Corporation
EMC Corporation is the world leader in products, services and solutions for information
storage and management that help organizations extract the maximum value from their
information, at the lowest total cost, across every point in the information lifecycle. We are
the information storage standard for every major computing platform and, through our
solutions, serve as caretaker for more than two-thirds of the world's most essential
information. We help enterprises of all sizes manage their growing volumes of information--
from creation to disposal--according to its changing value to the business through
information lifecycle management (ILM) strategies. EMC information infrastructure solutions
are at the heart of this mission, helping organizations manage, use, protect, and share their
information assets more efficiently and cost-effectively. Our world-class solutions integrate
networked storage technologies, storage systems, software, and services.