4imprint.com
A Guide to Managing
Company Data
© 2012 4imprint, Inc. All rights reserved
Contingency planning: A guide to managing your company’s data
Managing company data may not seem like a critical part of your day-to-day
operations, but the day you lose it will certainly alter your perspective—and
perhaps the very trajectory and success of your business.
It doesn’t matter if your company is goods or service-oriented—information is the
cornerstone of all that you do, no matter what you do. From your daily and yearly
sales figures to vendor invoices and other expenses, important tax information
and details on your employees, it’s probably all in one place: your computer or
your company’s server. Have you ever wondered what would happen if that all
disappeared one day?
Consider the case of the disgruntled employee at an
architectural firm who suspected she was about to
be fired. To exact revenge, she settled on sabotage as
her weapon of choice and deleted a network of files
valued at over $2.5 million. The data was eventually
retrieved, albeit utilizing a very costly recovery
service.1
Then there’s the instance of a pet store chain that housed all of its operational
data in a stand-alone database hosted on its website. An outside Web developer,
attempting to clean up unnecessary coding on their site, accidentally deleted all
business records with one simple keystroke. Without backup, the pet store chain’s
entire inventory, point of sale transaction data and human resource information
were lost. The company never recovered and eventually filed for bankruptcy in
the same year.2
There are more anecdotal horror stories where those came from, but enough of
the bad news. The point is to learn from these unfortunate occurrences, prepare
in advance and prevent any kind of data loss from affecting your business.
This Blue Paper™ identifies the importance of safe data storage and makes the
case for a strong data backup strategy. We will start with a summary of the
evolution of data. Next, we will expound on the different kinds of data that exist
as well as a few basic ideas to consider before launching a storage strategy of
your own. Then, we’ll discuss the likelihood of failure and the proper questions
1 “Angry Employee Deletes All of Company’s Data | Fox News.” Fox News. FOX News Network, 24 Jan. 2008. Web. 14 Aug. 2012. <http://www.foxnews.com/story/0,2933,325285,00.html>.
2 Papadimoulis, Alex. “Death by Delete.” Redmond Developer News. 1105 Media Inc., 1 Jan. 2009. Web. 14 Aug. 2012. <http://reddevnews.com/articles/2009/01/01/death-by-delete.aspx>.
to ask when planning data recovery. Finally, we’ll close with a handful of device
definitions on the most common kinds of data backup software and systems.
Let’s begin!
The evolution of data
Data is increasing exponentially. In fact, it is estimated that 90 percent of existing
digital data has been created within the last two years.3 While Facebook® and
YouTube® are contributing factors, much of this growth can also be attributed
to big data. Big data refers to colossal databases used to draw conclusions that
are not otherwise obvious. Not only does that mean typical computer activity
like how many times “deli in Manhattan” is searched on Bing® or the number of
Twitter® posts on car insurance, but this also includes daily activities like passing
through a toll booth, duration of a cell phone call and
tracking purchases on a store credit card.
These are awe-inspiring feats when you consider the
humble beginnings of modern computing. While the
invention of the computer is a little difficult to pinpoint,
modern processing is really a child of the 1970s, born
out of decks of programming punch cards.4 Technology
advanced rapidly thereafter and the cost of data storage
dropped dramatically, which accounts for the profound
growth since. Here’s a chronological overview of the major data milestones
throughout the years:
• 1980s: In January 1980, the cost of storing 1GB of data was $193,000.5
However, with the introduction of the floppy disk (1.4MB capacity) in
1981, followed by the CD (700MB capacity) just one year later, data
storage costs fell considerably.6
• 1990s: In September 1990, the cost of storing 1GB of data dropped to
$9,0007—a 95.34 percent decrease within a decade. In the 90s, advances
in technology brought about extended capacity CDs and DVDs (able
to store up to 4.7GB) and then flash drives, which were capable of
housing between 4MB and 256GB.8
3 “What Is Big Data? Bringing Big Data to the Enterprise.” IBM, n.d. Web. 16 Aug. 2012. <http://www-01.ibm.com/software/data/bigdata/>.
4 Kopplin, John. “An Illustrated History of Computers - Part 2.” Computer Science Lab. N.p., n.d. Web. 16 Aug. 2012. <http://www.computersciencelab.com/ComputerHistory/HistoryPt2.htm>.
5 Komorowski, Matt. “A History of Storage Costs.” Mkomo.com. N.p., n.d. Web. 16 Aug. 2012. <http://www.mkomo.com/cost-per-gigabyte>.
6 A Brief History of Digital Data. Prod. Viet Huynh. YouTube. Sweat & Pixels Design Studio, 22 Sept. 2011. Web. 16 Aug. 2012. <http://www.youtube.com/watch?v=ah14LEFKe8Q>.
7 Ibid.
8 Ibid.
• 2000s: In February 2000, the cost of storing 1GB of data dropped again
to $19.709—another 99.78 percent decrease from the previous decade.
In addition to USB flash drives storing between 8MB and 256GB, optical formats now
included Blu-ray® discs with 25GB storage capacity.10 In July 2009, it cost
only $0.07 to house 1GB of data.11 That’s when big data truly began to
flourish due to the relatively low cost to house expansive databases.
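The decade-over-decade percentages above can be verified with a few lines of arithmetic. This sketch simply recomputes the declines from the cited per-gigabyte prices:

```python
# Approximate cost to store 1GB of data at the milestones cited above.
cost_per_gb = {
    "Jan 1980": 193_000.00,
    "Sep 1990": 9_000.00,
    "Feb 2000": 19.70,
    "Jul 2009": 0.07,
}

def percent_decrease(old_price, new_price):
    """Percentage drop from an earlier price to a later one."""
    return (old_price - new_price) / old_price * 100

print(f"{percent_decrease(cost_per_gb['Jan 1980'], cost_per_gb['Sep 1990']):.2f}%")  # ~95.34
print(f"{percent_decrease(cost_per_gb['Sep 1990'], cost_per_gb['Feb 2000']):.2f}%")  # ~99.78
```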
Now that we’ve had a brief history lesson on the evolutionary cost of data, let’s
talk about your data and what you need to know to store it safely.
Data management plans
There are six factors, common to almost every industry, that should be adequately
planned for: data growth and its corresponding cost, server space, data
security, peak-time demand and upgrades. Naturally, each organization will have specific
data sets that apply only to their company or industry, but let’s take a general
look at what goes into a data management plan (DMP).
Types of data and the basics of data storage strategy
Understanding the types of files your server holds is the basis for formulating an
effective data management plan. This analysis will also help you plan for growth,
as well as store your data more efficiently.
The first tactic to employ in developing a data storage strategy is data
classification. There is software you can use—like the F5® Data Manager—to paint
a concise picture of the contents on your server. Data analysis and mapping give
you a more in-depth look at:
• What file formats are being created
• Who is creating them
• How old they are
• How much storage capacity each file consumes
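Commercial tools like the one mentioned above do this at scale, but the basic idea can be sketched with nothing more than the Python standard library. This hypothetical script walks a directory tree and tallies, per file extension, how many files exist, how much space they consume, and how old the oldest one is:

```python
import os
import time
from collections import defaultdict

def classify(root):
    """Summarize a directory tree by file extension: count of files,
    total bytes consumed, and the oldest modification timestamp."""
    summary = defaultdict(lambda: {"count": 0, "bytes": 0, "oldest": time.time()})
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            ext = os.path.splitext(name)[1].lower() or "(none)"
            st = os.stat(path)
            entry = summary[ext]
            entry["count"] += 1
            entry["bytes"] += st.st_size
            entry["oldest"] = min(entry["oldest"], st.st_mtime)
    return dict(summary)
```

Running it over a server's document share gives a rough version of the "concise picture" described above; who created each file is not captured here, since file ownership is recorded differently on each platform.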
Below are four components worth considering before determining the data
storage strategy that’s right for you. Each is explained and then followed by a
series of questions worth asking and understanding before you make any hard
and fast decisions. For further clarification, refer to an IT professional.
9 Ibid.
10 Ibid.
11 Ibid.
1. Metadata standards and data provenance
Metadata provides structured information explaining such details as the
purpose, origin, geographic location, access conditions, and terms of use
of a data collection. To put this into context, files without metadata are
like a library without a card catalogue. Here are a few questions worth
considering when setting your metadata plan in motion:12
• Which metadata standards will you use?
• Why have you chosen them?
• How will you record these details?
• What information is needed to make the
data you collect meaningful to others?
• Likewise, what information do you need to
make that data reusable?
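As a concrete illustration of the "library card catalogue" analogy, here is what a minimal metadata record for a single data collection might look like. The field names below are hypothetical and are not drawn from any particular metadata standard:

```python
import json

# A hypothetical metadata record describing one data collection.
# Field names are illustrative only; a real plan would adopt a
# published metadata standard and use its vocabulary.
record = {
    "title": "Quarterly sales figures, 2011-2012",
    "creator": "Finance department",
    "created": "2012-07-01",
    "format": "text/csv",
    "geographic_location": "United States",
    "access_conditions": "Internal use only",
    "terms_of_use": "Do not redistribute outside the company",
    "provenance": "Exported from the point-of-sale system, then de-duplicated",
}

print(json.dumps(record, indent=2))
```

Stored alongside the data itself, even a simple record like this answers most of the questions above: what the data is, where it came from, and who may reuse it.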
2. Provisions for privacy, confidentiality and licensing
You should first explain how and when the data will become available. If
there is an embargo period for sharing the data, make sure you provide
details explaining the delay. If the data is sensitive in nature—if, for
example, it contains health-related privacy issues or competitive analysis
insight—and public access is inappropriate, address the means by which
you plan to control access. For instance:
• Who will hold the intellectual property rights to the data?
• How long will the original data creator/principal investigator
retain the right to use the data before making it available for
wider distribution?
• Are there any embargo periods for political or commercial
patent reasons? If so, what are the details?
• Describe any permission restrictions that will need to be placed
on the data.
• Are there ethical or privacy issues? If so, how will these be
resolved?
12 Higgins, Sarah. “What Are Metadata Standards.” Digital Curation Centre, n.d. Web. 22 Aug. 2012. <http://www.dcc.ac.uk/resources/briefing-papers/standards-watch-papers/what-are-metadata-standards>.
• If you have approval from the U.S. Department of Health and
Human Services (HHS) Institutional Review Board (IRB), or are in
the process of applying for it, how will you comply with those
obligations?
3. Policies for data access during and after your project
Think about how you prepare and manage your data for sharing and
explain how you will actively share your data with non-group members
after the project is complete. You should explain how and where the
data will be accessible as well as identify who will be allowed to use it,
how they will be allowed to utilize it and whether or not they will be
allowed to disseminate it. Think about some of these questions:
• Will your data be accessible?
• How will you make it available? Include resources like
necessary equipment and systems needed to do that.
• What is its intended use?
• Who are its intended users?
• If permission restrictions exist, what is the process for gaining
access to the data?
• Explain how you will store data during the project’s lifetime.
• How will you archive that data?
• If applicable, how will you transfer or transmit that data?
4. Plans for archiving and preservation
To archive data is to move less important information from an active
storage device to a less-used storage device for basic retention purposes.
This eases the capacity and enhances the performance of the first, more
active device. In terms of data archival, there are many subject-specific
data repositories, all of which could serve as an archiving option for your
data. But first, ask:
• How long should data be kept beyond the life of the project?
• What data will be preserved in the long-term?
• Which database have you identified as a place to deposit the
data?
• What is the long-term strategy for maintaining and curating
your data?
• What procedures does your intended long-term data storage
facility have in place for preservation and backup?
• Are there any conversions necessary to prepare data for
preservation or data sharing?
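The "move less important information to a less-used device" idea above can be sketched in a few lines. This hypothetical helper relocates files untouched for a given number of days from an active directory to an archive directory, preserving the folder layout:

```python
import os
import shutil
import time

def archive_old_files(active_dir, archive_dir, max_age_days):
    """Move files not modified in `max_age_days` from active storage to
    archive storage, preserving the relative directory layout.
    Returns the relative paths of the files that were moved."""
    cutoff = time.time() - max_age_days * 86400
    moved = []
    for dirpath, _dirs, files in os.walk(active_dir):
        for name in files:
            src = os.path.join(dirpath, name)
            if os.path.getmtime(src) < cutoff:
                rel = os.path.relpath(src, active_dir)
                dst = os.path.join(archive_dir, rel)
                os.makedirs(os.path.dirname(dst), exist_ok=True)
                shutil.move(src, dst)
                moved.append(rel)
    return moved
```

A real archiving policy would also verify the copy before deleting the original and record what was moved; this sketch only shows the mechanism that eases capacity on the active device.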
What you save and how you save it are directly linked. So be sure to have a solid
understanding of the kinds of files, documents and information formats
saved on your computer or server. That way, you’ll know just what it will take to
properly save and store your data.
Backup failure: It happens
Someone once remarked that “there are only two types of hard drives—the ones
that have failed and the ones that will fail.”13 This aptly describes backup
devices: Even though hard drives are not living organisms, they have a definite
life span, and each one will eventually die. According to a study conducted by
Pepperdine University®, here are the most prevalent reasons for failure:14
1. Hardware failure - 40%
2. Human error - 29%
3. Software corruption - 13%
4. Theft - 9%
5. Computer viruses - 6%
6. Hardware destruction - 3%
Whether it’s hardware failure or human error, failure happens. Unfortunately,
lost data cannot be saved by implementing a backup system after it’s gone. Plan
appropriately because data backup failure is not uncommon.
13 “Backing up Data - Why You Need to Do It.” PC911, 28 Feb. 2011. Web. 17 Sept. 2012. <http://pcnineoneone.com/howto/backup1/>.
14 Smith, David M. “The Cost of Lost Data.” Graziadio Business Review. Pepperdine University, 2003. Web. 17 Sept. 2012. <http://gbr.pepperdine.edu/2010/08/the-cost-of-lost-data/>.
So what happens to your business when your data backup fails? Well, in the same
study by Pepperdine University, “a company that experiences a computer outage
lasting for more than 10 days will never fully recover financially.”15 Worse still is
that half of companies that endure such a dilemma will likely be out of business
within five years.
Hard to believe? Well, computer-stored data, though intangible, is worth a
great deal. The value of lost data is determined by its primary utility and frequency
of use, both of which are specific to the business that lost it. Take a moment to
think about the “price” of your data. To do that, you might first think of what
capabilities you would lose if you lost your data.
A lot of them. Maybe even all of them.
Could you function without them? Probably not.
Next step: Recovery
Data can be lost in a natural disaster like a flood or fire,
or it can be physically stolen if someone takes the computer or primary storage
device. Data can also be lost in a power failure or power surge. To be smart,
implement a few preventative measures in case a backup failure occurs. But first
there are four main items to remember in your quest to recover lost data:
1. Restore time objective (RTO) refers to the amount of time your
organization needs to recover from a data loss. Many organizations
have multiple RTOs. For example, one RTO may specify how long
before the major functions of the enterprise are back online, while
a second, longer RTO determines how long until everything is fully
recovered.16
2. Restore point objective (RPO) is the maximum amount of data loss,
measured in time, that you can tolerate. Or rather, how far back in
time may your most recent recoverable copy be? Like the RTO, an RPO is
often assigned to critical functions such as transaction processing. A
longer RPO means tolerating more lost work and recovering to a point
further back in time. It can be anywhere from a few seconds in the case
of a sophisticated (and expensive) remote mirroring system, to several
hours, or even several days for less critical data.
15 Ibid.
16 Cook, Rick. “Set Disaster-recovery Objectives.” SearchStorage, n.d. Web. 22 Aug. 2012. <http://searchstorage.techtarget.com/tip/Set-disaster-recovery-objectives>.
3. Network recovery objective (NRO) is the time needed to recover
network operations, specifically, how long before you appear recovered
to your customers? It includes such jobs as establishing alternate
communications links, reconfiguring Internet servers, setting alternate
TCP/IP addresses and everything else to make the recovery transparent
to customers, remote users and others.
4. Restore granularity objective (RGO) refers to the level of objects that
can be easily recovered (e.g. a file, email, directory, hard drive, full
system image, etc.).
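To make the RTO/RPO distinction concrete, consider how a backup schedule maps onto an RPO. In this simplified sketch (which assumes strictly periodic, always-successful backups), the worst-case data loss equals one full backup interval:

```python
def worst_case_data_loss_hours(backup_interval_hours):
    """With periodic backups, the restore point can be as old as one full
    interval: everything written since the last backup is lost."""
    return backup_interval_hours

def meets_rpo(backup_interval_hours, rpo_hours):
    """A backup schedule satisfies an RPO only if the interval between
    backups never exceeds the acceptable data-loss window."""
    return worst_case_data_loss_hours(backup_interval_hours) <= rpo_hours

# Hourly backups meet a 4-hour RPO; nightly backups do not.
print(meets_rpo(1, 4))   # True
print(meets_rpo(24, 4))  # False
```

The RTO is the separate question of how long the restore itself takes once a backup is in hand; both numbers together drive the choice of backup technology.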
However you lose it, data in the majority of cases—83 percent—can be recovered.
You’ve been warned, though: Recovery can be an expensive operation.17
Device definitions
Most sources available for data storage fail to recognize that in many
organizations, not everyone responsible for IT is necessarily an IT professional.
This is especially true for small businesses where most employees wear multiple
hats. So when it comes to data storage, there are a handful of terms and device
definitions to be familiar with in case data is lost and needs to be restored. Here
are some basic storage hardware configurations to know:
Remote mirroring systems18
One of the most basic tools for data storage and backup is known
as a remote mirroring system (see also: cloud storage). As its name implies,
it generates a mirror image of the data on one or more disks located locally
or remotely. It functions in real time so as to provide the most current critical
business data accessible via duplicate disks. Information stored on them can be
used for substitution in case of an emergency or be used to facilitate
data migration.
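Real remote mirroring systems replicate continuously, usually at the block level. Purely to illustrate the idea of a duplicate copy you can fall back on, here is a naive, hypothetical one-way file mirror in Python:

```python
import filecmp
import os
import shutil

def mirror(primary, replica):
    """Naive one-way mirror: copy new or changed files from primary to
    replica, then delete replica files no longer present on the primary.
    (A real mirroring system works continuously and at the block level;
    stale empty directories are also left in place here for simplicity.)"""
    os.makedirs(replica, exist_ok=True)
    primary_files = set()
    for dirpath, _dirs, files in os.walk(primary):
        for name in files:
            rel = os.path.relpath(os.path.join(dirpath, name), primary)
            primary_files.add(rel)
            src, dst = os.path.join(primary, rel), os.path.join(replica, rel)
            os.makedirs(os.path.dirname(dst), exist_ok=True)
            if not os.path.exists(dst) or not filecmp.cmp(src, dst, shallow=False):
                shutil.copy2(src, dst)
    for dirpath, _dirs, files in os.walk(replica):
        for name in files:
            rel = os.path.relpath(os.path.join(dirpath, name), replica)
            if rel not in primary_files:
                os.remove(os.path.join(replica, rel))
```

Run periodically against a second disk or machine, this keeps a substitutable copy of the primary data, which is exactly the emergency role described above.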
Disk array
A disk array is a kind of storage system that links multiple hard drives
into one big drive. Disk arrays organize data into something called
logical units (LUs).19 To the client, these look like blocks. Small arrays
with only a few disks can store eight LUs, while larger arrays with hundreds of disks
can store thousands of LUs.20
17 Ibid.
18 Larsen, Brian. “Disk Mirroring - Local or Remote.” InfoManagement Direct, 1 Dec. 2003. Web. 18 Sept. 2012. <http://www.information-management.com/infodirect/20031212/7861-1.html>.
19 “What Is Disk Array?” Webopedia, n.d. Web. 17 Sept. 2012. <http://www.webopedia.com/TERM/D/disk_array.html>.
20 Ibid.
The most common kind of disk array is a Redundant Array of Independent Disks
(RAID). The advantage of RAID backup lies in its name: Redundancy refers to its
ability to write and store data in multiple locations in case a file is damaged
or stored in a bad cluster. If that happens, the data is instantaneously rewritten on
another disk in the array, and spreading data across disks also increases overall
storage performance.21 This kind of configuration is particularly useful for
organizations with servers laden with multimedia-heavy data.22
In case you’re unfamiliar with this term, perhaps you know it as a “drive array”
or “storage array,” which generally mean magnetic or solid state disks. These are
two or more disk drives built into a stand-alone unit, typically using some RAID
configuration (see RAID). However, optical drives (CD, DVD, etc.) also come in
multi-drive units (see optical disc library). See SAN, NAS and server farm.23
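The redundancy idea behind mirrored RAID can be illustrated with a toy in-memory model. This is a conceptual sketch only, not how a real RAID controller works:

```python
class MirroredStore:
    """Toy illustration of redundancy in the RAID 1 (mirroring) style:
    every write goes to all disks, so a read can be served from any
    surviving copy even after one disk dies."""

    def __init__(self, n_disks=2):
        self.disks = [dict() for _ in range(n_disks)]

    def write(self, key, data):
        for disk in self.disks:        # duplicate the data everywhere
            disk[key] = data

    def read(self, key):
        for disk in self.disks:        # any surviving copy will do
            if key in disk:
                return disk[key]
        raise KeyError(key)

    def fail_disk(self, i):
        self.disks[i].clear()          # simulate a dead drive
```

Other RAID levels trade full duplication for parity information, but the principle the paper describes is the same: a damaged copy can be replaced from elsewhere in the array.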
Direct attached storage (DAS)
Direct attached storage involves a direct connection to the server,
either through the use of an internal server disk controller or an
external storage subsystem.24 DAS systems are recognized for their ease
of management, generally low operating costs and overall simplicity.
However, one drawback of using DAS is that it creates information
isolation, meaning that the information is inaccessible from other
servers. Small businesses may see this as only slightly problematic,
but for larger businesses, not being able to access data from other
servers may become a serious problem.
Network attached storage (NAS)
As it implies, NAS is storage attached to the common network via Ethernet. It
is essentially a file server that often integrates an optimized operating system
dedicated to file sharing. This means that all processing is done locally at the
client’s request. Besides its reputation for easy installation, another major benefit
to NAS is solving the compatibility issue with Microsoft®’s Windows platform and
UNIX, allowing file access without additional software.
To give this acronym more context, Western Digital®’s WD Sentinel DX4000 is
a prime example of a NAS device designed for small businesses. As with most
devices, installation is as simple as plug and play, which initializes the automatic
21 “RAID - Redundant Array of Independent Disks.” Webopedia, n.d. Web. 17 Sept. 2012. <http://www.webopedia.com/TERM/R/RAID.html>.
22 Kayne, R., and Niki Foster. “What Are Disk Arrays?” WiseGeek. Conjecture, 11 July 2012. Web. 17 Sept. 2012. <http://www.wisegeek.com/what-are-disk-arrays.htm>.
23 “Disk Array.” PC Magazine Encyclopedia. PC Magazine, n.d. Web. 22 Aug. 2012. <http://www.pcmag.com/encyclopedia_term/0,1237,t=hard+disk+array&i=41489,00.asp>.
24 Parwar, Ashwin. “Understanding Storage Basics - DAS-NAS-SAN.” WizIQ, n.d. Web. 22 Aug. 2012. <http://www.wiziq.com/tutorial/74910-Understanding-Storage-Basics-DAS-NAS-SAN>.
system configuration. On the users’ end, setting user preferences is the final task.
The major drawback for employing a NAS, however, is its performance. It provides
file-level input/output (I/O) via traditional file shares, while DAS and SAN provide
block-level I/O. If your eyes are already glazing over, you’re not alone.
When thinking of file vs. block access, let’s look at it from another perspective:
File sharing is like reading a classic novel. You have an in-depth view of the
characters, the landscape and the plot. You can revisit each section and draw
deeper conclusions. Conversely, block sharing is similar to the CliffsNotes®
version—you still get useable information, albeit not as complete. Block data is
suitable for images or other large files that are not altered often while file access
is most appropriate for documents requiring change more regularly.
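The file-versus-block distinction can also be seen in code: file-level access asks for a whole named file, while block-level access addresses fixed-size chunks by number with no knowledge of the file's structure. These helpers are hypothetical, and the 4KB block size is chosen arbitrarily for illustration:

```python
BLOCK_SIZE = 4096  # arbitrary illustrative block size

def read_block(path, block_number, block_size=BLOCK_SIZE):
    """Block-level access: seek to a fixed-size block by number and read
    it, knowing nothing about the file's internal structure."""
    with open(path, "rb") as f:
        f.seek(block_number * block_size)
        return f.read(block_size)

def read_file(path):
    """File-level access: request the whole named file and let the file
    server translate that into whatever blocks it occupies."""
    with open(path, "rb") as f:
        return f.read()
```

A NAS serves requests shaped like `read_file`; DAS and SAN serve requests shaped like `read_block`, which is why they suit large, rarely altered objects.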
Storage area network (SAN)
Storage area networks are designed to be accessible by multiple
servers, just as local area networks (LANs) connect a server to
multiple computers.25 Unlike DAS or NAS, each of which consists of a
single piece of hardware, SANs are built from multiple hardware
components. These components—hubs, switches, bridges, Small
Computer System Interface (SCSI) devices—are typically connected by Fibre Channel.
If an Ethernet cable is like a straw pulling information off the network, a Fibre
Channel is like an oil pipeline for information.
These hardware components play a role in three areas: redundancy, speed and
volume. Switches and hubs generally do the same thing. Like the post office,
both process incoming information—or mail. Switches take that information
and quickly deliver it to a specific location—or mailbox. Hubs, however, aren’t
as discerning. Imagine a small apartment building where the mail is left in the
lobby in bulk. Each tenant must sort the mail and determine what is addressed
to them, creating a time-consuming redundancy in the analysis. Both have their
advantages, but hubs operate best with small enterprises, whereas switches are
for more data-intense operations. Referring back to what type of data is being
produced by the organization will help determine which components will be most
beneficial.26
SANs have many advantages, from availability, reliability and scalability to
performance, manageability and return on information management.27
25 “SAN (Storage Area Network) Definition.” TechTerms.com, n.d. Web. 22 Aug. 2012. <http://www.techterms.com/definition/san>.
26 “SAN Tutorial.” Manhattan Skyline GmbH, n.d. Web. 11 Sept. 2012. <http://www.mskl.de/CONTENT/PDF/SAN_Tutorial.pdf>.
27 “Storage Area Networks.” AllSAN.com - All about Storage Area Network. AllSAN.com, n.d. Web. 22 Aug. 2012. <http://allsan.com/sanoverview.php3>.
As we already stated, NAS operates with file level access, whereas DAS and SAN
are block level, but there are several different types of high-speed interfaces
used to determine SAN function. In fact, many SANs today use a combination
of different interfaces. Currently, Fibre Channel serves as the de facto standard
in most SANs. Fibre Channel is an industry-standard interconnect and high-
performance serial I/O protocol that is media independent and supports
simultaneous transfer of many different protocols. Additionally, SCSI interfaces
are frequently used as sub-interfaces between internal components of SAN
members, such as between raw storage disks and a RAID (redundant array of
independent disks) controller.
To illustrate a few ways to utilize a SAN and the benefits to be had,
let’s take, for example, an insurance agency with two locations, each with two
SANs: Location A has SAN 1 programmed to back up its internal operations each
hour. On SAN 2, backup runs for Location B. Location B mirrors this setup. If the
first SAN in Location B fails, a simple DNS reroute will restore operations within
moments rather than risking several days of downtime while IT tries to remedy
the situation.
In a simplified example, a big box retail chain stores their inventory on Server A
and their transactions on Server B. With the SAN, the sales agent can call upon
both servers to analyze the supply on Server A and demand on Server B, all in real
time and directly from his personal computer.
While all of the aforementioned systems provide backup, various
backup software packages work better with a SAN. Imagine this scenario:
In a drive-thru, you order a cheeseburger and pull around to
the window, where they hand you your order. If your order
is correct and timely, do you think twice about the process that
occurred inside? Probably not. The same theory applies to basic data
storage systems.
Conclusion
Data storage and backup are complex issues, but they are also critically important.
As you explore storage options for your company’s valuable data, keep these
helpful guidelines in mind:
• Reevaluate your backup software annually. Ask yourself if it is still able
to meet your needs. Organizations that do not monitor data storage
are more likely to let crisis drive them toward an inefficient change.
• Stay on top of your backup infrastructure. Use three simple rules:
Match the class of software to the environment; keep your backup
software up to date; and continue to enhance the architecture as your
performance and capacity needs increase.
• Look closely at different vendors. When evaluating vendor offerings,
look to how they are employing agentless backup, storage level
snapshots, and APIs in the virtual infrastructure (such as VMware) for
fast, low overhead and virtual infrastructure backup.
• Leverage capacity-based licensing. To this end, look to cost
justification, better data management and storage tiers. Some argue
that up to 70 percent of data subject to backup is unchanged and
should not be in primary storage, but rather in an archive. Capacity-
based licensing exposes the cost of backup by data volume, reducing
the volume and thus the cost of backup. Capacity licensing should also
incorporate some overhead for expected data growth.
Even if backup doesn’t seem like a pressing priority right now, you’ll
want to prepare sooner rather than later because backup isn’t
important until it fails. As they say, “There’s no time like the present.”
4imprint serves more than 100,000 businesses with innovative promotional items throughout the United States,
Canada, United Kingdom and Ireland. Its product offerings include giveaways, business gifts, personalized gifts,
embroidered apparel, promotional pens, travel mugs, tote bags, water bottles, Post-it Notes, custom calendars,
and many other promotional items. For additional information, log on to www.4imprint.com.