4imprint.com
A Guide to Managing
Company Data
© 2012 4imprint, Inc. All rights reserved
Contingency planning: A guide to managing your company’s data
Managing company data may not seem like a critical part of your day-to-day
operations, but the day you lose it will certainly alter your perspective—and
perhaps the very trajectory and success of your business.
It doesn’t matter if your company is goods or service-oriented—information is the
cornerstone of all that you do, no matter what you do. From your daily and yearly
sales figures to vendor invoices and other expenses, important tax information
and details on your employees, it’s probably all in one place: your computer or
your company’s server. Have you ever wondered what would happen if that all
disappeared one day?
Consider the case of the disgruntled employee at an
architectural firm who suspected she was about to
be fired. To exact revenge, she settled on sabotage as
her weapon of choice and deleted a network of files
valued at over $2.5 million. The data was eventually
retrieved, albeit utilizing a very costly recovery
service.1
Then there’s the instance of a pet store chain that housed all of its operational
data in a stand-alone database hosted on its website. An outside Web developer,
attempting to clean up unnecessary coding on their site, accidentally deleted all
business records with one simple keystroke. Without backup, the pet store chain’s
entire inventory, point of sale transaction data and human resource information
were lost. The company never recovered and eventually filed for bankruptcy in
the same year.2
There are more anecdotal horror stories where those came from, but enough of
the bad news. The point is to learn from these unfortunate occurrences, prepare
in advance and prevent any kind of data loss from affecting your business.
This Blue Paper™ identifies the importance of safe data storage and makes the
case for a strong data backup strategy. We will start with a summary of the
evolution of data. Next, we will expound on the different kinds of data that exist
as well as a few basic ideas to consider before launching a storage strategy of
your own. Then, we’ll discuss the likelihood of failure and the proper questions
1 “Angry Employee Deletes All of Company’s Data | Fox News.” Fox News. FOX News Network, 24 Jan. 2008. Web. 14 Aug. 2012. <http://www.foxnews.com/story/0,2933,325285,00.html>.
2 Papadimoulis, Alex. “Death by Delete.” Redmond Developer News. 1105 Media Inc., 1 Jan. 2009. Web. 14 Aug. 2012. <http://reddevnews.com/articles/2009/01/01/death-by-delete.aspx>.
to ask when planning data recovery. Finally, we’ll close with a handful of device
definitions on the most common kinds of data backup software and systems.
Let’s begin!
The evolution of data
Data is increasing exponentially. In fact, it is estimated that 90 percent of existing
digital data has been created within the last two years.3 While Facebook® and
YouTube® are contributing factors, much of this growth can also be attributed
to big data. Big data refers to colossal databases used to draw conclusions that
are not otherwise obvious. Not only does that mean typical computer activity
like how many times “deli in Manhattan” is searched on Bing® or the number of
Twitter® posts on car insurance, but this also includes daily activities like passing
through a toll booth, duration of a cell phone call and
tracking purchases on a store credit card.
These are awe-inspiring feats when you consider the
humble beginnings of modern computing. While the
invention of the computer is a little difficult to pinpoint,
modern processing is really a child of the 1970s, born
out of decks of programming punch cards.4 Technology
advanced rapidly thereafter and the cost of data storage
dropped dramatically, which accounts for the profound
growth since. Here’s a chronological overview of the major data milestones
throughout the years:
• 1980s: In January 1980, the cost of storing 1GB of data was $193,000.5
However, with the introduction of the floppy disk (1.4MB capacity) in
1981, followed by the CD (700MB capacity) just one year later, data
storage costs fell considerably.6
• 1990s: In September 1990, the cost of storing 1GB of data dropped to
$9,0007—a 95.34 percent decrease within a decade. In the 90s, advances
in technology brought about extended capacity CDs and DVDs (able
to store up to 4.7GB) and then flash drives, which were capable of
housing between 4MB and 256GB.8
3 “What Is Big Data? Bringing Big Data to the Enterprise.” IBM, n.d. Web. 16 Aug. 2012. <http://www-01.ibm.com/software/data/bigdata/>.
4 Kopplin, John. “An Illustrated History of Computers - Part 2.” Computer Science Lab. N.p., n.d. Web. 16 Aug. 2012. <http://www.computersciencelab.com/ComputerHistory/HistoryPt2.htm>.
5 Komorowski, Matt. “A History of Storage Costs.” Mkomo.com. N.p., n.d. Web. 16 Aug. 2012. <http://www.mkomo.com/cost-per-gigabyte>.
6 A Brief History of Digital Data. Prod. Viet Huynh. YouTube. Sweat & Pixels Design Studio, 22 Sept. 2011. Web. 16 Aug. 2012. <http://www.youtube.com/watch?v=ah14LEFKe8Q>.
7 Ibid.
8 Ibid.
• 2000s: In February 2000, the cost of storing 1GB of data dropped again
to $19.709—another 99.78 percent decrease from the previous decade.
In addition to USB flash drives storing between 8MB and 256GB, optical formats now
included Blu-ray® discs with 25GB storage capacity.10 In July 2009, it cost
only $0.07 to house 1GB of data.11 That’s when big data truly began to
flourish due to the relatively low cost to house expansive databases.
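The decade-over-decade percentages above can be verified with a few lines of arithmetic. This sketch simply recomputes the declines from the cited per-gigabyte prices:

```python
# Approximate cost to store 1GB of data at the milestones cited above.
cost_per_gb = {
    "Jan 1980": 193_000.00,
    "Sep 1990": 9_000.00,
    "Feb 2000": 19.70,
    "Jul 2009": 0.07,
}

def percent_decrease(old_price, new_price):
    """Percentage drop from an earlier price to a later one."""
    return (old_price - new_price) / old_price * 100

print(f"{percent_decrease(cost_per_gb['Jan 1980'], cost_per_gb['Sep 1990']):.2f}%")  # ~95.34
print(f"{percent_decrease(cost_per_gb['Sep 1990'], cost_per_gb['Feb 2000']):.2f}%")  # ~99.78
```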
Now that we’ve had a brief history lesson on the evolutionary cost of data, let’s
talk about your data and what you need to know to store it safely.
Data management plans
There are six factors, common to almost every industry, that should be adequately
planned for: data growth and its corresponding cost, server space, data
security, peak-time demand and upgrades. Naturally, each organization will have specific
data sets that apply only to their company or industry, but let’s take a general
look at what goes into a data management plan (DMP).
Types of data and the basics of data storage strategy
Understanding the types of files your server holds is the basis for formulating an
effective data management plan. This analysis will also help you plan for growth,
as well as store your data more efficiently.
The first tactic to employ in developing a data storage strategy is data
classification. There is software you can use—like the F5® Data Manager—to paint
a concise picture of the contents on your server. Data analysis and mapping give
you a more in-depth look at:
• What file formats are being created
• Who is creating them
• How old they are
• How much storage capacity each file consumes
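Commercial tools like the one mentioned above do this at scale, but the basic idea can be sketched with nothing more than the Python standard library. This hypothetical script walks a directory tree and tallies, per file extension, how many files exist, how much space they consume, and how old the oldest one is:

```python
import os
import time
from collections import defaultdict

def classify(root):
    """Summarize a directory tree by file extension: count of files,
    total bytes consumed, and the oldest modification timestamp."""
    summary = defaultdict(lambda: {"count": 0, "bytes": 0, "oldest": time.time()})
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            ext = os.path.splitext(name)[1].lower() or "(none)"
            st = os.stat(path)
            entry = summary[ext]
            entry["count"] += 1
            entry["bytes"] += st.st_size
            entry["oldest"] = min(entry["oldest"], st.st_mtime)
    return dict(summary)
```

Running it over a server's document share gives a rough version of the "concise picture" described above; who created each file is not captured here, since file ownership is recorded differently on each platform.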
Below are four components worth considering before determining the data
storage strategy that’s right for you. Each is explained and then followed by a
series of questions worth asking and understanding before you make any hard
and fast decisions. For further clarification, refer to an IT professional.
9 Ibid.
10 Ibid.
11 Ibid.
1. Metadata standards and data provenance
Metadata provides structured information explaining such details as the
purpose, origin, geographic location, access conditions, and terms of use
of a data collection. To put this into context, files without metadata are
like a library without a card catalogue. Here are a few questions worth
considering when setting your metadata plan in motion:12
• Which metadata standards will you use?
• Why have you chosen them?
• How will you record these details?
• What information is needed to make the
data you collect meaningful to others?
• Likewise, what information do you need to
make that data reusable?
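As a concrete illustration of the "library card catalogue" analogy, here is what a minimal metadata record for a single data collection might look like. The field names below are hypothetical and are not drawn from any particular metadata standard:

```python
import json

# A hypothetical metadata record describing one data collection.
# Field names are illustrative only; a real plan would adopt a
# published metadata standard and use its vocabulary.
record = {
    "title": "Quarterly sales figures, 2011-2012",
    "creator": "Finance department",
    "created": "2012-07-01",
    "format": "text/csv",
    "geographic_location": "United States",
    "access_conditions": "Internal use only",
    "terms_of_use": "Do not redistribute outside the company",
    "provenance": "Exported from the point-of-sale system, then de-duplicated",
}

print(json.dumps(record, indent=2))
```

Stored alongside the data itself, even a simple record like this answers most of the questions above: what the data is, where it came from, and who may reuse it.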
2. Provisions for privacy, confidentiality and licensing
You should first explain how and when the data will become available. If
there is an embargo period for sharing the data, make sure you provide
details explaining the delay. If the data is sensitive in nature—if, for
example, it contains health-related privacy issues or competitive analysis
insight—and public access is inappropriate, address the means by which
you plan to control access. For instance:
• Who will hold the intellectual property rights to the data?
• How long will the original data creator/principal investigator
retain the right to use the data before making it available for
wider distribution?
• Are there any embargo periods for political or commercial
patent reasons? If so, what are the details?
• Describe any permission restrictions that will need to be placed
on the data.
• Are there ethical or privacy issues? If so, how will these be
resolved?
12 Higgins, Sarah. “What Are Metadata Standards.” Digital Curation Centre, n.d. Web. 22 Aug. 2012. <http://www.dcc.ac.uk/resources/briefing-papers/standards-watch-papers/what-are-metadata-standards>.
• If you have approval from the U.S. Department of Health and
Human Services (HHS) Institutional Review Board (IRB), or are in
the process of applying for it, how will you comply with those
obligations?
3. Policies for data access during and after your project
Think about how you prepare and manage your data for sharing and
explain how you will actively share your data with non-group members
after the project is complete. You should explain how and where the
data will be accessible as well as identify who will be allowed to use it,
how they will be allowed to utilize it and whether or not they will be
allowed to disseminate it. Think about some of these questions:
• Will your data be accessible?
• How will you make it available? Include resources like
necessary equipment and systems needed to do that.
• What is its intended use?
• Who are its intended users?
• If permission restrictions exist, what is the process for gaining
access to the data?
• Explain how you will store data during the project’s lifetime.
• How will you archive that data?
• If applicable, how will you transfer or transmit that data?
4. Plans for archiving and preservation
To archive data is to move less important information from an active
storage device to a less-used storage device for basic retention purposes.
This eases the capacity and enhances the performance of the first, more
active device. In terms of data archival, there are many subject-specific
data repositories, all of which could serve as an archiving option for your
data. But first, ask:
• How long should data be kept beyond the life of the project?
• What data will be preserved in the long-term?
• Which database have you identified as a place to deposit the
data?
• What is the long-term strategy for maintaining and curating
your data?
• What procedures does your intended long-term data storage
facility have in place for preservation and backup?
• Are there any conversions necessary to prepare data for
preservation or data sharing?
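The "move less important information to a less-used device" idea above can be sketched in a few lines. This hypothetical helper relocates files untouched for a given number of days from an active directory to an archive directory, preserving the folder layout:

```python
import os
import shutil
import time

def archive_old_files(active_dir, archive_dir, max_age_days):
    """Move files not modified in `max_age_days` from active storage to
    archive storage, preserving the relative directory layout.
    Returns the relative paths of the files that were moved."""
    cutoff = time.time() - max_age_days * 86400
    moved = []
    for dirpath, _dirs, files in os.walk(active_dir):
        for name in files:
            src = os.path.join(dirpath, name)
            if os.path.getmtime(src) < cutoff:
                rel = os.path.relpath(src, active_dir)
                dst = os.path.join(archive_dir, rel)
                os.makedirs(os.path.dirname(dst), exist_ok=True)
                shutil.move(src, dst)
                moved.append(rel)
    return moved
```

A real archiving policy would also verify the copy before deleting the original and record what was moved; this sketch only shows the mechanism that eases capacity on the active device.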
What you save and how you save it are directly linked. So be sure to have a solid
understanding of the kinds of files, documents and information formats
saved on your computer or server. That way, you’ll know just what it will take to
properly save and store your data.
Backup failure: It happens
Someone once remarked that “there are only two types of hard drives—the ones
that have failed and the ones that will fail.”13 This aptly describes backup
devices: Even though hard drives are not living organisms, they have a definite
life span, and each one will eventually die. According to a study conducted by
Pepperdine University®, here are the most prevalent reasons for failure:14
1. Hardware failure - 40%
2. Human error - 29%
3. Software corruption - 13%
4. Theft - 9%
5. Computer viruses - 6%
6. Hardware destruction - 3%
Whether it’s hardware failure or human error, failure happens. Unfortunately,
lost data cannot be saved by implementing a backup system after it’s gone. Plan
appropriately because data backup failure is not uncommon.
13 “Backing up Data - Why You Need to Do It.” PC911, 28 Feb. 2011. Web. 17 Sept. 2012. <http://pcnineoneone.com/howto/backup1/>.
14 Smith, David M. “The Cost of Lost Data.” Graziadio Business Review. Pepperdine University, 2003. Web. 17 Sept. 2012. <http://gbr.pepperdine.edu/2010/08/the-cost-of-lost-data/>.
So what happens to your business when your data backup fails? Well, in the same
study by Pepperdine University, “a company that experiences a computer outage
lasting for more than 10 days will never fully recover financially.”15 Worse still is
that half of companies that endure such a dilemma will likely be out of business
within five years.
Hard to believe? Well, computer-stored data, though intangible, is worth a
great deal. The value of lost data is determined by its primary utility and frequency
of use, both of which are specific to the business that lost it. Take a moment to
think about the “price” of your data. To do that, you might first think of what
capabilities you would lose if you lost your data.
A lot of them. Maybe even all of them.
Could you function without them? Probably not.
Next step: Recovery
Data can be lost in a natural disaster like a flood or fire,
or it can be physically stolen if someone takes the computer or primary storage
device. Data can also be lost in a power failure or power surge. To be smart,
implement a few preventative measures in case a backup failure occurs. But first
there are four main items to remember in your quest to recover lost data:
1. Restore time objective (RTO) refers to the amount of time your
organization needs to recover from a data loss. Many organizations
have multiple RTOs. For example, one RTO may specify how long
before the major functions of the enterprise are back online, while
a second, longer RTO determines how long until everything is fully
recovered.16
2. Restore point objective (RPO) is the maximum amount of data loss,
measured in time, that you can tolerate. Or rather, how far back in
time may your most recent recoverable copy be? Like the RTO, an RPO is
often assigned to critical functions such as transaction processing. A
longer RPO means tolerating more lost work and recovering to a point
further back in time. It can be anywhere from a few seconds in the case
of a sophisticated (and expensive) remote mirroring system, to several
hours, or even several days for less critical data.
15 Ibid.
16 Cook, Rick. “Set Disaster-recovery Objectives.” SearchStorage, n.d. Web. 22 Aug. 2012. <http://searchstorage.techtarget.com/tip/Set-disaster-recovery-objectives>.
3. Network recovery objective (NRO) is the time needed to recover
network operations, specifically, how long before you appear recovered
to your customers? It includes such jobs as establishing alternate
communications links, reconfiguring Internet servers, setting alternate
TCP/IP addresses and everything else to make the recovery transparent
to customers, remote users and others.
4. Restore granularity objective (RGO) refers to the level of objects that
can be easily recovered (e.g. a file, email, directory, hard drive, full
system image, etc.).
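To make the RTO/RPO distinction concrete, consider how a backup schedule maps onto an RPO. In this simplified sketch (which assumes strictly periodic, always-successful backups), the worst-case data loss equals one full backup interval:

```python
def worst_case_data_loss_hours(backup_interval_hours):
    """With periodic backups, the restore point can be as old as one full
    interval: everything written since the last backup is lost."""
    return backup_interval_hours

def meets_rpo(backup_interval_hours, rpo_hours):
    """A backup schedule satisfies an RPO only if the interval between
    backups never exceeds the acceptable data-loss window."""
    return worst_case_data_loss_hours(backup_interval_hours) <= rpo_hours

# Hourly backups meet a 4-hour RPO; nightly backups do not.
print(meets_rpo(1, 4))   # True
print(meets_rpo(24, 4))  # False
```

The RTO is the separate question of how long the restore itself takes once a backup is in hand; both numbers together drive the choice of backup technology.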
However you lose it, data in the majority of cases—83 percent—can be recovered.
You’ve been warned, though: Recovery can be an expensive operation.17
Device definitions
Most sources available for data storage fail to recognize that in many
organizations, not everyone responsible for IT is necessarily an IT professional.
This is especially true for small businesses where most employees wear multiple
hats. So when it comes to data storage, there are a handful of terms and device
definitions to be familiar with in case data is lost and needs to be restored. Here
are some basic storage hardware configurations to know:
Remote mirroring systems18
One of the most basic tools for data storage and backup is known
as a remote mirroring system (see also: cloud storage). As its name implies,
it generates a mirror image of the data on one or more disks located locally
or remotely. It functions in real time so as to provide the most current critical
business data accessible via duplicate disks. Information stored on them can be
used for substitution in case of an emergency or be used to facilitate
data migration.
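Real remote mirroring systems replicate continuously, usually at the block level. Purely to illustrate the idea of a duplicate copy you can fall back on, here is a naive, hypothetical one-way file mirror in Python:

```python
import filecmp
import os
import shutil

def mirror(primary, replica):
    """Naive one-way mirror: copy new or changed files from primary to
    replica, then delete replica files no longer present on the primary.
    (A real mirroring system works continuously and at the block level;
    stale empty directories are also left in place here for simplicity.)"""
    os.makedirs(replica, exist_ok=True)
    primary_files = set()
    for dirpath, _dirs, files in os.walk(primary):
        for name in files:
            rel = os.path.relpath(os.path.join(dirpath, name), primary)
            primary_files.add(rel)
            src, dst = os.path.join(primary, rel), os.path.join(replica, rel)
            os.makedirs(os.path.dirname(dst), exist_ok=True)
            if not os.path.exists(dst) or not filecmp.cmp(src, dst, shallow=False):
                shutil.copy2(src, dst)
    for dirpath, _dirs, files in os.walk(replica):
        for name in files:
            rel = os.path.relpath(os.path.join(dirpath, name), replica)
            if rel not in primary_files:
                os.remove(os.path.join(replica, rel))
```

Run periodically against a second disk or machine, this keeps a substitutable copy of the primary data, which is exactly the emergency role described above.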
Disk array
A disk array is a kind of storage system that links multiple hard drives
into one big drive. Disk arrays organize data into something called
logical units (LUs).19 To the client, these look like blocks. Small arrays
with only a few disks can store eight LUs, while larger arrays with hundreds of disks
can store thousands of LUs.20
17 Ibid.
18 Larsen, Brian. “Disk Mirroring - Local or Remote.” InfoManagement Direct, 1 Dec. 2003. Web. 18 Sept. 2012. <http://www.information-management.com/infodirect/20031212/7861-1.html>.
19 “What Is Disk Array?” Webopedia, n.d. Web. 17 Sept. 2012. <http://www.webopedia.com/TERM/D/disk_array.html>.
20 Ibid.
The most common kind of disk array is a Redundant Array of Independent Disks
(RAID). The advantage of RAID backup lies in its name: Redundancy refers to its
ability to write and store data in multiple locations in case a file is damaged
or stored in a bad cluster. If that happens, the data is instantaneously rewritten on
another disk in the array, and spreading data across disks also increases overall
storage performance.21 This kind of configuration is particularly useful for
organizations with servers laden with multimedia-heavy data.22
In case you’re unfamiliar with this term, perhaps you know it as a “drive array”
or “storage array,” which generally mean magnetic or solid state disks. These are
two or more disk drives built into a stand-alone unit, typically using some RAID
configuration (see RAID). However, optical drives (CD, DVD, etc.) also come in
multi-drive units (see optical disc library). See SAN, NAS and server farm.23
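The redundancy idea behind mirrored RAID can be illustrated with a toy in-memory model. This is a conceptual sketch only, not how a real RAID controller works:

```python
class MirroredStore:
    """Toy illustration of redundancy in the RAID 1 (mirroring) style:
    every write goes to all disks, so a read can be served from any
    surviving copy even after one disk dies."""

    def __init__(self, n_disks=2):
        self.disks = [dict() for _ in range(n_disks)]

    def write(self, key, data):
        for disk in self.disks:        # duplicate the data everywhere
            disk[key] = data

    def read(self, key):
        for disk in self.disks:        # any surviving copy will do
            if key in disk:
                return disk[key]
        raise KeyError(key)

    def fail_disk(self, i):
        self.disks[i].clear()          # simulate a dead drive
```

Other RAID levels trade full duplication for parity information, but the principle the paper describes is the same: a damaged copy can be replaced from elsewhere in the array.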
Direct attached storage (DAS)
Direct attached storage involves a direct connection to the server,
either through the use of an internal server disk controller or an
external storage subsystem.24 DAS systems are recognized for their ease
of management, generally low operating costs and overall simplicity.
However, one drawback of using DAS is that it creates information
isolation, meaning that the information is inaccessible from other
servers. Small businesses may see this as only slightly problematic,
but for larger businesses, not being able to access data from other
servers may become a serious problem.
Network attached storage (NAS)
As it implies, NAS is storage attached to the common network via Ethernet. It
is essentially a file server that often integrates an optimized operating system
dedicated to file sharing. This means that all processing is done locally at the
client’s request. Besides its reputation for easy installation, another major benefit
to NAS is solving the compatibility issue with Microsoft®’s Windows platform and
UNIX, allowing file access without additional software.
To give this acronym more context, Western Digital®’s WD Sentinel DX4000 is
a prime example of a NAS device designed for small businesses. As with most
devices, installation is as simple as plug and play, which initializes the automatic
21 “RAID - Redundant Array of Independent Disks.” Webopedia, n.d. Web. 17 Sept. 2012. <http://www.webopedia.com/TERM/R/RAID.html>.
22 Kayne, R., and Niki Foster. “What Are Disk Arrays?” WiseGeek. Conjecture, 11 July 2012. Web. 17 Sept. 2012. <http://www.wisegeek.com/what-are-disk-arrays.htm>.
23 “Disk Array.” PC Magazine Encyclopedia. PC Magazine, n.d. Web. 22 Aug. 2012. <http://www.pcmag.com/encyclopedia_term/0,1237,t=hard+disk+array&i=41489,00.asp>.
24 Parwar, Ashwin. “Understanding Storage Basics - DAS-NAS-SAN.” WizIQ, n.d. Web. 22 Aug. 2012. <http://www.wiziq.com/tutorial/74910-Understanding-Storage-Basics-DAS-NAS-SAN>.
system configuration. On the users’ end, setting user preferences is the final task.
The major drawback for employing a NAS, however, is its performance. It provides
file-level input/output (I/O) via traditional file shares, while DAS and SAN provide
block-level I/O. If your eyes are already glazing over, you’re not alone.
When thinking of file vs. block access, let’s look at it from another perspective:
File sharing is like reading a classic novel. You have an in-depth view of the
characters, the landscape and the plot. You can revisit each section and draw
deeper conclusions. Conversely, block sharing is similar to the CliffsNotes®
version—you still get useable information, albeit not as complete. Block data is
suitable for images or other large files that are not altered often while file access
is most appropriate for documents requiring change more regularly.
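The file-versus-block distinction can also be seen in code: file-level access asks for a whole named file, while block-level access addresses fixed-size chunks by number with no knowledge of the file's structure. These helpers are hypothetical, and the 4KB block size is chosen arbitrarily for illustration:

```python
BLOCK_SIZE = 4096  # arbitrary illustrative block size

def read_block(path, block_number, block_size=BLOCK_SIZE):
    """Block-level access: seek to a fixed-size block by number and read
    it, knowing nothing about the file's internal structure."""
    with open(path, "rb") as f:
        f.seek(block_number * block_size)
        return f.read(block_size)

def read_file(path):
    """File-level access: request the whole named file and let the file
    server translate that into whatever blocks it occupies."""
    with open(path, "rb") as f:
        return f.read()
```

A NAS serves requests shaped like `read_file`; DAS and SAN serve requests shaped like `read_block`, which is why they suit large, rarely altered objects.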
Storage area network (SAN)
Storage area networks are designed to be accessible by multiple
servers, just as local area networks (LANs) connect a server to
multiple computers.25 Unlike DAS or NAS, each of which consists of a
single piece of hardware, SANs are built from multiple hardware
components. These components—hubs, switches, bridges, Small
Computer System Interface (SCSI) devices—are typically connected by Fibre Channel.
If an Ethernet cable is like a straw pulling information off the network, a Fibre
Channel is like an oil pipeline for information.
These hardware components play a role in three areas: redundancy, speed and
volume. Switches and hubs generally do the same thing. Like the post office,
both process incoming information—or mail. Switches take that information
and quickly deliver it to a specific location—or mailbox. Hubs, however, aren’t
as discerning. Imagine a small apartment building where the mail is left in the
lobby in bulk. Each tenant must sort the mail and determine what is addressed
to them, creating a time-consuming redundancy in the analysis. Both have their
advantages, but hubs operate best with small enterprises, whereas switches are
for more data-intense operations. Referring back to what type of data is being
produced by the organization will help determine which components will be most
beneficial.26
SANs have many advantages, from availability, reliability and scalability to
performance, manageability and return on information management.27
25 “SAN (Storage Area Network) Definition.” TechTerms.com, n.d. Web. 22 Aug. 2012. <http://www.techterms.com/definition/san>.
26 “SAN Tutorial.” Manhattan Skyline GmbH, n.d. Web. 11 Sept. 2012. <http://www.mskl.de/CONTENT/PDF/SAN_Tutorial.pdf>.
27 “Storage Area Networks.” AllSAN.com - All about Storage Area Network. AllSAN.com, n.d. Web. 22 Aug. 2012. <http://allsan.com/sanoverview.php3>.
As we already stated, NAS operates with file level access, whereas DAS and SAN
are block level, but there are several different types of high-speed interfaces
used to determine SAN function. In fact, many SANs today use a combination
of different interfaces. Currently, Fibre Channel serves as the de facto standard
in most SANs. Fibre Channel is an industry-standard interconnect and high-
performance serial I/O protocol that is media independent and supports
simultaneous transfer of many different protocols. Additionally, SCSI interfaces
are frequently used as sub-interfaces between internal components of SAN
members, such as between raw storage disks and a RAID (redundant array of
independent disks) controller.
To illustrate a few ways to utilize a SAN and the benefits to be had,
let’s take, for example, an insurance agency with two locations, each with two
SANs: Location A has SAN 1 programmed to back up its internal operations each
hour. On SAN 2, backup runs for Location B. Location B mirrors this setup. If the
first SAN in Location B fails, a simple DNS reroute will restore operations within
moments rather than risking several days of downtime while IT tries to remedy
the situation.
In a simplified example, a big box retail chain stores their inventory on Server A
and their transactions on Server B. With the SAN, the sales agent can call upon
both servers to analyze the supply on Server A and demand on Server B, all in real
time and directly from his personal computer.
While all of the aforementioned systems provide backup, various
backup software packages work better with a SAN. Imagine this scenario:
In a drive-thru, you order a cheeseburger and pull around to
the window, where they hand you your order. If your order
is correct and timely, do you think twice about the process that
occurred inside? Probably not. The same theory applies to basic data
storage systems.
Conclusion
Data storage and backup are complex issues, but they are also critically important.
As you explore storage options for your company’s valuable data, keep these
helpful guidelines in mind:
• Reevaluate your backup software annually. Ask yourself if it is still able
to meet your needs. Organizations that do not monitor data storage
are more likely to let crisis drive them toward an inefficient change.
• Stay on top of your backup infrastructure. Use three simple rules:
Match the class of software to the environment; keep your backup
software up to date; and continue to enhance the architecture as your
performance and capacity needs increase.
• Look closely at different vendors. When evaluating vendor offerings,
look to how they are employing agentless backup, storage level
snapshots, and APIs in the virtual infrastructure (such as VMware) for
fast, low overhead and virtual infrastructure backup.
• Leverage capacity-based licensing. To this end, look to cost
justification, better data management and storage tiers. Some argue
that up to 70 percent of data subject to backup is unchanged and
should not be in primary storage, but rather in an archive. Capacity-
based licensing exposes the cost of backup by data volume, reducing
the volume and thus the cost of backup. Capacity licensing should also
incorporate some overhead for expected data growth.
Even if backup doesn’t seem like a pressing priority right now, you’ll
want to prepare sooner rather than later because backup isn’t
important until it fails. As they say, “There’s no time like the present.”
4imprint serves more than 100,000 businesses with innovative promotional items throughout the United States,
Canada, United Kingdom and Ireland. Its product offerings include giveaways, business gifts, personalized gifts,
embroidered apparel, promotional pens, travel mugs, tote bags, water bottles, Post-it Notes, custom calendars,
and many other promotional items. For additional information, log on to www.4imprint.com.