The IBM view on storage archive solutions

© 2012 IBM Corporation The IBM view on storage archive solutions: requirements to solve and trends for the future 31st ADLUG ANNUAL MEETING - Firenze, September 19-21st IBM Systems and Technology Group Marco Ceresoli Data Protection and Retention Sales Leader IBM Europe

Transcript of The IBM view on storage archive solutions

Page 1: The IBM view on storage archive solutions

© 2012 IBM Corporation

The IBM view on storage archive solutions: requirements to solve and trends for the future

31st ADLUG ANNUAL MEETING - Firenze, September 19-21st

IBM Systems and Technology Group

Marco Ceresoli, Data Protection and Retention Sales Leader, IBM Europe

Page 2: The IBM view on storage archive solutions

Agenda

• The growth and the variety of digital information
• The shift of market dynamics and trends for Archiving
• Technologies for data archiving: comparison
• New trends: Linear Tape File System value proposition
• Role and history of IBM in Tape technology
• Case studies and conclusions

Page 3: The IBM view on storage archive solutions

Storage is growing… and not only in terms of capacity

Growth of the Digital Universe:

• 2005: 150 exabytes (150 million TB)
• 2011: 1,800 exabytes (1.8 billion TB)

Growth shows in three dimensions:

• Volume
• Variety
• Velocity

Source: 2011 IDC Digital Universe Study
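The two data points on this slide imply a compound annual growth rate. The sketch below works it out; the function name `cagr` is illustrative, and the only inputs are the 2005 and 2011 figures quoted above.

```python
def cagr(start: float, end: float, years: int) -> float:
    """Compound annual growth rate between two values."""
    return (end / start) ** (1 / years) - 1


# Figures from the slide: 150 EB in 2005, 1,800 EB in 2011.
rate = cagr(150, 1800, 2011 - 2005)
print(f"Implied growth of the digital universe: {rate:.1%} per year")
```

A twelvefold increase over six years corresponds to roughly 51% growth per year.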

Page 4: The IBM view on storage archive solutions

Every day 15 PetaBytes of new information in digital format are created

80% of this new data is unstructured, generated mainly by email, documents, images, video and audio.

EFFECTS…

A company with 1,000 employees spends on average $5.3M every year searching for information that is difficult to find.

42% of managers say that they use INCORRECT information at least once a week.

During 2007 in the USA there were 37,000 security breaches (cyber attacks), an increase of 158% over 2006.

More than 20,000 laws worldwide require not only raw storage capacity but also classification and information lifecycle management.

Source: Information Week, “State Of Enterprise Storage Changing Priorities, Changing Practices”, 2009.

Page 5: The IBM view on storage archive solutions

Smarter Systems Are Creating an Information Explosion

[Chart: information growth by year, 2005 to 2011, in exabytes (vertical axis from 0 to 1,800)]

Sources of new data shown on the chart: RFID, digital TV, MP3 players, digital cameras, camera phones, VoIP, medical imaging, laptops, smart meters, multi-player games, satellite images, GPS, ATMs, scanners, sensors, digital radio, DLP theaters, telematics, peer-to-peer, email, instant messaging, videoconferencing, CAD/CAM, toys, industrial machines, security systems, appliances

Storage requirements growing 20-40% per year

Source: Semantics, “Linked Data” guidelines, 2006.
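To put the 20-40% yearly growth figure in planning terms, the sketch below converts a growth rate into a capacity doubling time. The function name is illustrative; the two rates are the ends of the range quoted on the slide.

```python
import math


def doubling_time(annual_growth: float) -> float:
    """Years needed for capacity to double at a constant annual growth rate."""
    return math.log(2) / math.log(1 + annual_growth)


for growth in (0.20, 0.40):
    years = doubling_time(growth)
    print(f"{growth:.0%} per year -> capacity doubles in {years:.1f} years")
```

At the low end of the range storage doubles in a little under four years, at the high end in about two.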

Page 6: The IBM view on storage archive solutions

Changing Market Dynamics & Trends

Value has Shifted toward Archiving Software
– Shift from Hardware to Archiving Software for addressing compliance, data retention management and lifecycle governance requirements
– Email archiving and eDiscovery adding additional content types

Information Lifecycle Governance is needed
– Clients understand they can no longer address data growth issues by adding more storage

Backup as Archive
– Significant proportion (over 50%) of customers continue to use backups as archive copies for long-term retention

Industry Specific Archives
– Healthcare & Life Sciences requirements for archival of Medical Images and Electronic Medical Records
– Government, Oil & Gas, and other industries demanding solutions specific to their needs
– Cross-Industry requirements also rising (e.g., Compliance, retaining Surveillance data for long periods of time)

Cloud Based Archiving
– Hosted offerings replaced by clouds (e.g., for eDiscovery)
– Shift in deployment models from ‘siloed’ on-premise installations to consolidated solutions, archive as a service, and cloud archiving

Page 7: The IBM view on storage archive solutions

Significant growth expected in Digital Archiving

Archival (Tier 3) data is:
– Fastest growing at 65% CAGR
– Stored on Disk, Tape, and Optical Media
– (Not captured in Tape IDC or GMV forecasts)

Graph illustrates Active and Deep Archiving combined

Page 8: The IBM view on storage archive solutions

Why store data for the long term, and how?

Why do I need to store data for a long time?
– Cultural and scientific value
– Value for the company
– More than 22,000 norms/laws worldwide govern data preservation

How to store this data?
– Multi-level storage infrastructure with different costs
– Data reduction (compression and data deduplication)
– Automatic data management based on archiving rules
– Virtualization and independence from the storage infrastructure
– Cloud-oriented “anywhere” and “self-service” accessibility
– Focus on storing documents and data interconnections (metadata) together
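The idea of "automatic data management based on archiving rules" across a multi-level infrastructure can be pictured as a small rule table. This is a minimal sketch under stated assumptions: the tier names, the idle-time thresholds and the `choose_tier` helper are hypothetical and do not describe any IBM product.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class ArchiveRule:
    max_idle_days: int  # data idle for at most this long stays on this tier
    tier: str


# Hypothetical rules, ordered from the most to the least expensive tier.
RULES = [
    ArchiveRule(max_idle_days=30, tier="disk"),
    ArchiveRule(max_idle_days=365, tier="online tape"),
]
DEFAULT_TIER = "offline tape"  # anything idle longer than every rule allows


def choose_tier(last_access: date, today: date) -> str:
    """Return the storage tier for data last accessed on `last_access`."""
    idle_days = (today - last_access).days
    for rule in RULES:
        if idle_days <= rule.max_idle_days:
            return rule.tier
    return DEFAULT_TIER


today = date(2012, 9, 19)
for last_access in (date(2012, 9, 1), date(2012, 1, 1), date(2010, 1, 1)):
    print(last_access, "->", choose_tier(last_access, today))
```

Because the rules are data rather than code, they can be changed without touching the placement logic, which is the property that makes the process automatable.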

Page 9: The IBM view on storage archive solutions

What to archive, and for how long?

[Charts: “Which data needs to be stored?” and “How long to store?”]

Source: SNIA – 100 Year Archive Requirements Survey
Source: ESG - Requested Record Types During Electronic Discovery Processes

Page 10: The IBM view on storage archive solutions

You Might Think “Archiving” Means any of These…

Archive -- a long-term collection of data that typically is fixed-content data; i.e., no I/O writes are allowed to change the data.

Deep archiving – The original definition of archiving, whereby production data is written to another set of storage media (typically tape) and moved offsite while the original version is deleted (typically from disk).

Active archiving – Data for which frequency of access is active rather than inactive, while frequency of updating is nonexistent so the data is fixed (i.e., is unchanging) and not subject to I/O writes that could change the data.

Long-term archiving – Active archived data for which the frequency of access has fallen so low that a tier of more cost-effective storage may be an appropriate place to house the data.

Backup – a dated (i.e., specified-time) duplication of a designated set of data from a data source on one set of media (typically disk) to a backup set of media (either disk or tape)

Vaulting – Typically, the movement of data on tapes from a target site to a protected remote site.

Source of these definitions: Data Protection, David Hill, 2009, CRC Press

Page 11: The IBM view on storage archive solutions

Major Archive Segments

Unstructured Data (files)
– What? MS Office, SharePoint, contracts, images, etc.
– Why? Reduce storage growth, offer a service or product, improve performance, lower cost, Compliance
– Available products? IBM Content Collector, FileNet, Content Manager, etc.

eMail archiving, eDiscovery
– What? Email, but potentially any other data type too
– Why? Litigation support, Compliance
– Available products? IBM Content Collector with IBM disk storage

Structured Data (database archiving)
– What? Relational tables, rows, periodic reports, retire applications
– Why? Reduce storage growth, improve performance, lower cost, Compliance (reports)
– Available products? IBM Optim with IBM disk storage

Unstructured Data (kept from birth)
– What? Medical Images, “Content” (M&E), DVS, Seismic shots, Scientific
– Why? Reduce storage growth, offer a service or product, improve performance, lower cost
– Available products? VAD Medical Archive solution or LTFS/tape with an ISV app

Page 12: The IBM view on storage archive solutions

Technologies for data archiving and preservation

• Fault tolerance: redundancy, ECC, RAID(*), ...
• Data protection: “space-efficient” internal replication
• Disaster recovery: automated remote data replication
• Data immutability: NENR(*) and WORM(*)
• Archiving and preservation rules: API(*) and standard interfaces
• Cost reduction: storage tiering, WORN(*)
• Data growth reduction: data deduplication and data compression
• Data security: data encryption and data shredding
• Access control: tamper protection, audit logs, ...

(*) ECC = Error Correction Code, RAID = Redundant Array of Independent Disks, NENR = Non Erasable Non Rewritable, WORM = Write Once Read Many, API = Application Program Interface, WORN = Write Once Read Never

More than 50 years of continuous innovation
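Data immutability (NENR/WORM) means that an object, once written, can be read but never changed or erased. The in-memory sketch below illustrates only those semantics; the `WormStore` class is hypothetical, and real products enforce the same contract in firmware, media or storage software rather than in application code.

```python
class WormStore:
    """In-memory illustration of write-once-read-many semantics."""

    def __init__(self) -> None:
        self._objects = {}

    def write(self, key: str, data: bytes) -> None:
        # A second write to the same key is refused: the object is immutable.
        if key in self._objects:
            raise PermissionError(f"{key!r} is already written and cannot be changed")
        self._objects[key] = bytes(data)

    def read(self, key: str) -> bytes:
        return self._objects[key]

    # Deliberately no delete() method: the store is non-erasable.


store = WormStore()
store.write("contract-001", b"signed contract")
print(store.read("contract-001"))
try:
    store.write("contract-001", b"tampered contract")
except PermissionError as error:
    print("Rejected:", error)
```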

Page 13: The IBM view on storage archive solutions

Storage management at 360°: archiving, backup, migration, DR

[Diagram labels]
• Storage classes: Enterprise class, Mid-range, Low-cost
• Immutability labels: NENR, WORM, NENR
• Off-line media: Automated off-line, Manual off-line
• Processes: Archiving and ILM management, backup copies, Migration to new technologies, Disaster protection
• NENR/WORM storage
• Compression? De-duplication? Encryption?

The processes can be automated and repeated

Page 14: The IBM view on storage archive solutions

The IBM Smart Archive strategy

[Diagram: the Smart Archive strategy, from information sources to delivery models]

Information sources:
• Collaborative (Quickr, SharePoint)
• ERP / CRM … (SAP, PeopleSoft …)
• Data
• Content (Documents, Images …)
• Reports
• Paper
• Email (Notes, Exchange)

Delivery models:
• On Premise (Custom Config)
• Appliance (Pre-Config)
• As A Service (SaaS, Cloud Storage)

Cloud Ready Archive Storage with Optional ECM

Value Added Services
• Optimization Services
• System Services
• Managed Services
• Reference Architecture
• Information Governance

Strategy layers:
• Optimized and Unified Assessment, Collection and Classification
• Flexible and Secure Infrastructure with Unified Retention and Protection
• Integrated Compliance, Records Management, Analytics and eDiscovery

Page 15: The IBM view on storage archive solutions

Long term data archiving: Total Cost

From: “In Search of the Long-Term Archiving Solution - Tape Delivers Significant TCO Advantage over Disk”, The Clipper Group, Dec.23, 2010.

Page 16: The IBM view on storage archive solutions

Long term data archiving: TCO and technology evolution

From: “In Search of the Long-Term Archiving Solution - Tape Delivers Significant TCO Advantage over Disk”, The Clipper Group, Dec.23, 2010.

Page 17: The IBM view on storage archive solutions

Tape Advantages for Archiving/Long-Term Preservation

[Comparison table: Tape vs. Disk]

Source: “Tape: The Digital Curator of the Information Age”, by Fred Moore, President, Horison, Inc.

Page 18: The IBM view on storage archive solutions

Technology Roadmap Comparisons for TAPE, HDD, and NAND Flash
Outline: Implications for Data Storage Applications

The annual rate of areal density increases for TAPE will likely exceed the annual rate of areal density increases for NAND and HDD
– TAPE bit cell is large and paths for scaling to higher bit densities exist
– NAND bit cells and HDD Patterned Media bit cells are approaching nanoscale issues in minimum feature lithography requirements
– NAND bit endurance or bit retention and HDD bit stability are approaching

Possible Annual Areal Density Growth Scenarios
– 20% for HDD
– 20% to 30% for NAND Flash
– 40% to 80% for TAPE

Implications for Storage: TAPE, NAND, and HDD will continue to offer complementary storage solutions

Implications for TAPE: TAPE volumetric density will increase, enhancing its cost advantages

Page 19: The IBM view on storage archive solutions

Annual Areal Density Growth Rate Scenarios

– HDD: 20% to 25% (Transition to New Technology, Sensor Output, Lithography)
– NAND Flash: 25% to 30% (Lithography and Endurance)
– TAPE: 40% to 80% (No Lithography Issues, Mechanical Realities)

[Chart: areal density (Gbit/in², logarithmic axis from 0.1 to 10,000) versus year, 2002 to 2018, for HDD, NAND and TAPE products, with trend lines of 20%/yr, 40%/yr and 80%/yr]
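Annual rates compound, so the gap between the scenarios widens quickly. The sketch below turns the growth ranges from this slide into density multiples; the five-year horizon is an illustrative assumption, not a figure from the presentation.

```python
def growth_factor(annual_rate: float, years: int) -> float:
    """Multiple by which areal density grows at a constant annual rate."""
    return (1 + annual_rate) ** years


# Annual areal density growth scenarios quoted on the slide (low, high).
SCENARIOS = {
    "HDD": (0.20, 0.25),
    "NAND Flash": (0.25, 0.30),
    "TAPE": (0.40, 0.80),
}
YEARS = 5  # illustrative horizon (assumption)

for technology, (low, high) in SCENARIOS.items():
    low_x = growth_factor(low, YEARS)
    high_x = growth_factor(high, YEARS)
    print(f"{technology}: {low_x:.1f}x to {high_x:.1f}x over {YEARS} years")
```

Under these scenarios tape density grows between about 5x and 19x in five years, against roughly 2.5x to 3x for HDD.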

Page 20: The IBM view on storage archive solutions

Cost evolution of magnetic storage

[Chart annotation: SSD ~6-10X]

Source: IBM elaboration and Information Storage Industry Consortium (INSIC) – 2008

Page 21: The IBM view on storage archive solutions

Magnetic Tape

• The cheapest storage medium in the hierarchy
• Most used for long-term archiving purposes
• LTO (Linear Tape Open) standard: fifth generation available today with 1.5 TB cartridges (3 TB compressed)
• January 2010: the IBM Zurich Research Laboratory performed a technology demonstration of a 35 TB cartridge (1). Today they are working on a technology demo of a 100 TB cartridge.

http://lto.org/technology/roadmap.html

(1) http://www.ibm.com/press/us/en/pressrelease/29245.wss
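Cartridge capacity translates directly into media counts. The sketch below uses the capacities quoted on this slide; the 100 TB archive size is an illustrative assumption, and the compressed figure only holds if the data actually compresses 2:1.

```python
import math

# Capacities from the slide, in TB.
LTO5_NATIVE_TB = 1.5
LTO5_COMPRESSED_TB = 3.0
DEMO_CARTRIDGE_TB = 35.0  # January 2010 technology demonstration


def cartridges_needed(archive_tb: float, cartridge_tb: float) -> int:
    """Whole cartridges needed to hold an archive of the given size."""
    return math.ceil(archive_tb / cartridge_tb)


archive_tb = 100  # illustrative archive size (assumption)
print("LTO-5 native:     ", cartridges_needed(archive_tb, LTO5_NATIVE_TB))
print("LTO-5 compressed: ", cartridges_needed(archive_tb, LTO5_COMPRESSED_TB))
print("35 TB demo media: ", cartridges_needed(archive_tb, DEMO_CARTRIDGE_TB))
```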

Page 22: The IBM view on storage archive solutions

Rich Media Driving New Storage Requirements

Characteristics of the data stored are changing
– Mix of traditional business data (i.e., transactional, docs, email, databases, and backup of those assets) vs. “rich media” (i.e., video, images, digitized content, etc.) is rapidly changing


Smarter Systems Are Creating an Information Explosion
Especially in Media and Entertainment (M&E)

[Chart repeated from the earlier “Smarter Systems Are Creating an Information Explosion” slide: information growth by year, 2005 to 2011, in exabytes, with the same list of data sources from RFID to appliances]

Storage requirements growing 20-40% per year

Source: Semantics, “Linked Data” guidelines, 2006.

Video, images, etc. a major factor driving growth

Access & asset management profiles of rich media are significantly different from traditional business data
– Much of traditional business data stored is a cost center
   • Regulatory, compliance, disaster recovery for business critical data and processes
– Rich media is primarily stored for monetization purposes
   • Production archives and asset protection
   • Repurposing content and distribution
   • Long term archives to monetize assets
– BW changes everything
   • Access to/from content, business motivation to make content available
   • E.g., key to M&E industry move to digital workflows

Page 23: The IBM view on storage archive solutions

Elements to address new role of TAPE

Self-Describing cartridge
– Remove requirement to commit long term to tape software application
– Content protection in event of database corruption or loss

Improve content interchange/distribution
– Eliminate need for common tape software across enterprise and/or interchange locations
– Reduce cost of data interchange

Partial Recall
– Eliminate the time penalty of moving an entire large video file when only a small part of the content is needed (e.g., a goal in a game)

Improved Tier management of content
– Ease complexity in movement from Tier 1 (disk) to Tier 2 (online tape) and Tier 3 (archive)
– Improve data import/export to system management

$/GB, Power
– Reduce cost of digital storage: power and $/min

Open Standards
– Large diverse infrastructure requires open standard
– Standard/support of MXF video

Long Term Content Archive Life
– Archive life desire for 50-100 years

Page 24: The IBM view on storage archive solutions

LTFS Value Proposition

Digital archives need and want the Value Proposition of Tape:
– $/GB – lowest cost storage
– Watt/GB – green storage
– Portability – ability to manage archive outside system
– Scalability – easy to add additional storage (i.e., buy cartridge)
– Investment protection – LTO has an 8 generation roadmap (up to a 32TB cartridge (compr.))

But - Inhibitors to use tape:
– Proprietary tape applications require long term commitment and support of tape application to maintain archive
– Non-self describing data formats requiring centralized archive database to recover content on individual tapes
– Import/export & distribution of tapes in archive is difficult due to proprietary tape applications

Solution: LTFS addresses the inhibitors and unlocks value proposition of tape for digital archives
– Open, non-proprietary tape format
– Self-describing data structure on cartridge
– File system support on Linux, Mac, Windows provides:
   • Distribution and cross platform interchange
   • Transition to integrated file based tape/disk storage systems

Page 25: The IBM view on storage archive solutions

Introduction to LTFS (Linear Tape File System)

IBM Linear Tape File System is:

1. Open Format for data which is written to tape
   – Describes the format of data and metadata stored on tape
   – Metadata is based on XML schema
   – Developed and disclosed by IBM
   – Applicable to LTO-5 and Jag-4
   – Requires tape partitioning

2. File System support (code) to R/W tapes in LTFS format; externalizes the LTO-5 tape as a file system
   – Enables standard applications to write/read LTFS tapes
   – Supports update, edit, delete of files on LTFS tape
   – Supports partial recall
   – Available on Linux, Mac OS X and Windows

Engineering EMMY Award – Oct 2011
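Because LTFS presents the tape as a file system, standard applications can use ordinary file operations and need no tape-specific API. The sketch below shows what that looks like from a script. The helper names are illustrative, and a temporary directory stands in for the mount point (a path such as /mnt/ltfs would be a site-specific assumption).

```python
import shutil
import tempfile
from pathlib import Path
from typing import List


def archive_file(source: Path, ltfs_mount: Path) -> Path:
    """Copy one file onto a mounted LTFS volume with ordinary file I/O."""
    target = ltfs_mount / source.name
    shutil.copy(source, target)
    return target


def list_volume(ltfs_mount: Path) -> List[str]:
    """List the file names visible on the mounted volume."""
    return sorted(p.name for p in ltfs_mount.iterdir() if p.is_file())


# A temporary directory stands in for the real mount point in this demo.
with tempfile.TemporaryDirectory() as work:
    mount = Path(work) / "ltfs"
    mount.mkdir()
    clip = Path(work) / "goal.mxf"
    clip.write_bytes(b"video essence")
    archive_file(clip, mount)
    print(list_volume(mount))
```

Nothing in the script refers to tape at all, which is the point: the same code works against disk or a mounted LTFS cartridge.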

Page 26: The IBM view on storage archive solutions

Logical View of LTFS Volume

[Diagram: an LTFS volume from BOT (beginning of tape) to EOT (end of tape), split into an Index Partition holding the LTFS Index (XML) and a Data Partition holding the files, separated by guard wraps]

• LTFS utilizes media partitioning (new to LTO Gen 5 and Jag 4)
• The tape is logically divided “lengthwise” (think C: & D: drives on a single hard disk unit)
• LTFS places the index on one partition and data on the other
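Keeping an XML index on its own partition is what makes the cartridge self-describing: reading the index alone tells an application where each file sits in the data partition. The sketch below illustrates the idea with a deliberately simplified, hypothetical index layout; the element and attribute names are assumptions and are not the real LTFS index schema.

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical index for illustration only (NOT the LTFS schema).
INDEX_XML = """
<index>
  <file name="match.mxf" startblock="10" length="4096"/>
  <file name="notes.txt" startblock="18" length="120"/>
</index>
"""


def load_index(xml_text: str) -> dict:
    """Map each file name to its (start block, length) in the data partition."""
    root = ET.fromstring(xml_text)
    return {
        entry.get("name"): (int(entry.get("startblock")), int(entry.get("length")))
        for entry in root.iter("file")
    }


index = load_index(INDEX_XML)
start_block, length = index["match.mxf"]
print(f"match.mxf starts at block {start_block} and is {length} bytes long")
```

With positions known up front, a reader can seek straight to the file it needs instead of scanning the tape.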

Page 27: The IBM view on storage archive solutions

IBM : 60 Years of Tape Innovation

In tape drive technology
• 1952: IBM 726, 1st magnetic tape drive
• 1959: IBM 729, 1st read/write drive
• 1964: IBM 2104, 1st read/back drive
• 1984: IBM 3480, 1st cartridge drive
• 1995: IBM 3590
• 1999: IBM 3590E
• 2000: LTO Gen1
• 2002: LTO Gen2
• 2003: 3592 Gen1
• 2004: LTO Gen3
• 2005: TS1120 (3592 G2)
• 2007: LTO Gen4
• 2008: TS1130 (3592 G3)
• 2010: LTO Gen5
• 2011: TS1140 (3592 G4)

In tape automation and virtualization
• 1962: IBM Tractor System
• 1974: 3850 MSS
• 1992: IBM 3495
• 1994: IBM 3494
• 1997: VTS G1
• 1999: VTS G2
• 2000: TS3500
• 2001: VTS G3
• 2005: TS7510 VTL
• 2005: TS3200, TS3300
• 2006: TS7740 (VTS Gen 4)
• 2007: TS7520
• 2007: TS3400
• 2007: TS7530
• 2008: TS2900, TS3500 High Density
• 2008: TS7720
• 2008: TS7650G
• 2009: TS7650 Appliance
• 2010: TS7610, TS7680
• 2011: TS3500 Connector & Shuttle
• 2011: TS7740, TS7720

Page 28: The IBM view on storage archive solutions

LTO Roadmap

http://ultrium.com/technology/roadmap.html

Page 29: The IBM view on storage archive solutions

And data deduplication is the key to using more disk more cost effectively!
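Deduplication saves disk space by storing each repeated piece of data only once. The sketch below shows a generic fixed-size-chunk, hash-based approach for illustration; it is an assumption chosen for simplicity and is not a description of how IBM ProtecTIER identifies duplicate data. The chunk size is unrealistically small so the effect is visible.

```python
import hashlib
from typing import Dict, List, Tuple

CHUNK_SIZE = 4  # tiny on purpose; real systems use far larger chunks


def deduplicate(data: bytes, chunk_size: int = CHUNK_SIZE) -> Tuple[Dict[str, bytes], List[str]]:
    """Split data into chunks and store each distinct chunk only once."""
    store: Dict[str, bytes] = {}  # digest -> chunk content
    recipe: List[str] = []        # ordered digests needed to rebuild the data
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        digest = hashlib.sha256(chunk).hexdigest()
        store.setdefault(digest, chunk)
        recipe.append(digest)
    return store, recipe


def restore(store: Dict[str, bytes], recipe: List[str]) -> bytes:
    """Rebuild the original data from the chunk store and the recipe."""
    return b"".join(store[digest] for digest in recipe)


data = b"ABCDABCDABCDWXYZ"
store, recipe = deduplicate(data)
stored = sum(len(chunk) for chunk in store.values())
print(f"{len(data)} bytes in, {stored} bytes stored")
assert restore(store, recipe) == data
```

Backup streams, which repeat largely unchanged data night after night, are where this kind of reduction pays off most.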

Page 30: The IBM view on storage archive solutions

Scalable Capacity and Performance

IBM ProtecTIER® Deduplication Family

• TS7620 ProtecTIER Appliance Express: good performance, entry level, easy to install; up to 145 MB/sec; 5.5 TB and 11 TB useable capacity
• TS7650 ProtecTIER Appliances: better performance, larger capacity, scalable; up to 500 MB/sec; 7 TB to 36 TB useable capacity
• TS7650G & TS7680 ProtecTIER Gateways: highest performance, largest capacity, high availability; up to 2800 MB/sec; up to 1 PB useable capacity

Page 31: The IBM view on storage archive solutions

IBM DIAS - Digital Information Archiving System

Koninklijke Bibliotheek, National Library of the Netherlands

• In 2000, IBM and KB designed and implemented a digital data preservation system called DIAS (Digital Information Archiving System).

• DIAS is a solution for archiving and preserving multimedia and electronic documents in digital format.

• DIAS is compliant with the OAIS(1) standards for “logical” and “physical preservation”.

• IBM built the DIAS solution from standard, general-purpose software components: WebSphere, DB2, Tivoli Storage Manager and Content Manager.

[Diagram: OAIS functional entities (Ingest, Data Management, Archival Storage, Access, Preservation, Administration) with Delivery & Capture, Packaging & Delivery, and Monitoring & Logging components; information flows as SIP, AIP and DIP packages, plus data and query paths]

(1) OAIS: http://public.ccsds.org/publications/archive/650x0b1.pdf
Koninklijke Bibliotheek: http://www.kb.nl/dnp/e-depot/e-depot-en.html

Page 32: The IBM view on storage archive solutions

Ecosystem: Thought Equity Motion
Sports Video Archiving in the Cloud

Challenges
• Low cost delivery platform for enterprise scale Video Supply Chain as a Service
• Information growth of ~100 TB per month
• Easy self-serve access required by clients

Solution
• IBM LTFS at several global locations, including some client facilities
• IBM System Storage® TS3200 Tape Library, LTO®-5 tape drives

Benefits
• Opened up new business opportunities
• Enabled more predictable and transparent pricing for clients
• Portable, interoperable, scalable, cost-effective data protection and long-term storage

‘LTO 5 and LTFS significantly reduce the ancillary costs around storage. This is a real game-changer from IBM’
Mark Lemmons, CTO, Thought Equity Motion

TSP03327-USEN-00

TEM with LTFS on YouTube: http://www.youtube.com/watch?v=M7w0jrkQnj4

Page 33: The IBM view on storage archive solutions

Thank you for your attention!