Protecting VMware Data Off-site

“Tape vs. Cloud Options”

Bill Evans, Arkeia Software

“Case Study from University of Chicago”

Tom Indelli, Senior Systems Administrator

Data Loss and Data Protection

• Causes of Data Loss
• Strategies for Data Protection
– Replication
– Backup

© 2011 Arkeia Software. All rights reserved. Page 2

Causes of Data Loss

Source: Kroll Ontrack Inc., 2011


Data Protection Strategy #1: Replication

► Replication
– Additional copy of current data (files, images, objects)

► Replication Options
– Disk (or RAID)
– Synchronous
• Expensive: data replicated before transaction completes
– Asynchronous
• Less expensive: data replication lags behind

► Replication Benefits
– Offsite data storage
– Immediate failover

Replication faithfully copies all errors; over 50% of data loss is unprotected

[Chart legend: Protected, Unprotected]

Data Protection Strategy #2: Backup

► Backup
– Multiple point-in-time “Restore Points”

► Backup Options
– Tape or Disk or Cloud

– Hourly, Daily, Weekly, Monthly, Quarterly, Yearly

► Backup Benefits
– Recovery to time in the past

– Offsite data storage


Backup Requirements

► Secure
– Off-site
– Off-line

► Frequent Restore Points
– Recovery Point Objective (RPO): minimize data loss
– [Timeline: restore points at -1, -2, -7, -14, -30, -60, -180, and -365 days]

► Rapid Restore Time
– Recovery Time Objective (RTO): minimize down-time
– [Timeline: restore times of +4, +3, +2, +1, +0.5, and +0.1 hours]
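As a rough illustration of how the two objectives constrain a backup plan, the sketch below checks a schedule against an RPO and an RTO. The `BackupPlan` class and `meets_objectives` function are hypothetical names introduced here, and the model assumes the worst-case data loss equals one full backup interval.

```python
from dataclasses import dataclass


@dataclass
class BackupPlan:
    interval_hours: float  # time between consecutive restore points
    restore_hours: float   # estimated time to complete a restore


def meets_objectives(plan: BackupPlan, rpo_hours: float, rto_hours: float) -> bool:
    # Worst case, a failure strikes just before the next backup,
    # so up to one full interval of changes is lost.
    worst_case_loss = plan.interval_hours
    return worst_case_loss <= rpo_hours and plan.restore_hours <= rto_hours


nightly = BackupPlan(interval_hours=24, restore_hours=2)
print(meets_objectives(nightly, rpo_hours=24, rto_hours=4))  # True
print(meets_objectives(nightly, rpo_hours=1, rto_hours=4))   # False
```

A nightly backup satisfies a 24-hour RPO but not a 1-hour RPO; tightening the RPO means adding restore points, not restoring faster.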


Off-site Storage

► How to choose?

– Costs

• Fixed

• Variable

– Backup window

– Time-to-restore (RTO)

– Reliability

– Convenience

[Diagram: backup agents and backup servers in three arrangements, two of which reach a backup server across a WAN]

Off-site Storage

[Diagram: three arrangements of backup agents and backup servers; two carry the label “Copy is moved offsite”]

Arrangement (in slide order) | Fixed Costs | Variable Costs | Backup Window | Time to Restore
First                        | High        | Low            | Short         | Short
Second                       | Low         | High           | Long          | Long
Third                        | Low         | High           | Short         | Short

Off-site Storage Strategies

► Why is Off-site Storage Important?

– Loss, theft, site destruction

► Strategies

– Tapes on trucks

– Replication to the cloud

► Costs

[Chart: cost as a function of data volume protected, comparing Cloud and Tape]

University of Chicago: VMware Backup Strategy

Tom Indelli, Senior Systems Administrator

University of Chicago

Organization

► University of Chicago
– Physical Sciences Division

► Activities
– Theoretical Chemistry (e.g. Molecular Dynamics)
– Theoretical Physics
– Science Education

Deployment #1: Data & Servers

► Data
– Theoretical Chemistry & Molecular Dynamics
– Simulations of atoms using “trajectory files”
• 20,000 atoms to 100,000 atoms
• Jobs run up to 48 hours
• Simulate less than 50 nanoseconds of interactions
– Most operation is “batch”, performed on 100-node compute clusters

► Protected Servers
– 2 Red Hat and 1 MacOS file servers
– File servers hold inputs to and results of simulations
– 44TB source data

Analyses are computationally intensive; physical platforms deliver the best performance

Deployment #1: Data Protection

► Backup Server Solution
– Arkeia Network Backup v9 on Red Hat 6.0
– 100TB disk (backup target DAS)

► Backup Strategy
– Backup to Disk
• Weekly full, nightly incremental
• Agents backed up concurrently

► Offsite Strategy
– None

[Diagram: Red Hat EL 6.0 and MacOS X machines connected over a 2Gbps LAN to the Arkeia Backup Server v9 on RHEL. Compression occurs in the Arkeia agent, before backups are moved on the LAN.]

Deployment #2: Data & Servers

► Data
– Web servers
– Management software & data
– Support software & data
– Uninterrupted operation is critical

► Protected Servers
– 2 ESXi 4.1 hosts with vCenter 4.1 (facilitates upgrades)
– 15 - 20 virtual machines
– 3TB source data

Deployment #2: Data Protection

► Backup Server Solution
– Arkeia Backup Server v9 on RHEL

► Backup Strategy
– Backup to Disk (20TB EqualLogic SAN)
• Weekly full, nightly incremental
• Three groups of backups performed in sequence
– Replicate to Tape Library (Dell PowerVault TL-2000 with LTO4 drive)

► Offsite Strategy
– Tapes moved to another office

[Diagram: Hypervisor #A and Hypervisor #B, hosting VM A.1, VM A.2, VM A.3, VM B.1, VM B.2, and an ANB VM, connected over a 2Gbps LAN to the backup server. Compression occurs in the Arkeia agent, before backups are moved over the LAN.]

Deployment #2: Backups

Backup Group    | Data on Disk | Number of Virtual Machine Images | Full / Incremental Backups (with CBT) | Compression Multiplier (With Dedupe) | Retention Period | Approx Backup Storage Required
PSD             | 650 GB       | 1                                | Weekly / Daily                        | 4.3 (7.9)                            | 60 days          | 1.6 TB
Heliopolis      | 1,600 GB     | 1                                | Weekly / Daily                        | 1.7                                  | 60 days          | 9.8 TB
General VM Pool | 700 GB       | 12                               | Weekly / Daily                        | 2.2 (13.5)                           | 60 days          | 3.3 TB

= 19 LTO4 Cartridges
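One way to read the cartridge count: sum the approximate backup storage for the three groups and divide by the 800 GB native capacity of an LTO-4 cartridge, rounding up. The sketch below reproduces that arithmetic. It assumes 1 TB = 1,000 GB and no additional compression on tape; `cartridges_needed` is a name introduced here for illustration.

```python
LTO4_NATIVE_GB = 800  # native (uncompressed) capacity of one LTO-4 cartridge

# Approximate backup storage required per group, from the slide (GB).
storage_gb = {"PSD": 1600, "Heliopolis": 9800, "General VM Pool": 3300}


def cartridges_needed(total_gb: int, cartridge_gb: int = LTO4_NATIVE_GB) -> int:
    """Round up: a partly filled cartridge still counts as one cartridge."""
    return -(-total_gb // cartridge_gb)


total_gb = sum(storage_gb.values())
print(total_gb, cartridges_needed(total_gb))  # 14700 19
```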

Deployment #2: vStorage Usage

► Backups via vCenter

► Backups use Changed Block Tracking (CBT)
– Full backups (“Thin full” with CBT)
– Incremental backups

► Restores
– Perform occasional full-image restores
– Have tested single-file restores

Costs of Tape vs. Cloud for 18TB

► Tape
– 22 LTO4 tapes (18TB) @ $30/cartridge = $660
– 1 TL-2000 = $10,000 (amortized over 3 years)
– One-year cost = $4,000 + tape shuffling

► Public Cloud
– 18TB @ $0.125/GB/month (Amazon) = $2,300/month
– One-year cost = $28,000
– Does not include costs of bandwidth
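The slide's rounded figures can be checked with a few lines of arithmetic. This sketch assumes that all cartridges are bought in the first year, that the library is amortized evenly, and that 1 TB = 1,024 GB (which reproduces the roughly $2,300/month figure). The function names are introduced here for illustration.

```python
def tape_cost_per_year(cartridges: int, price_per_cartridge: float,
                       library_price: float, amortization_years: int) -> float:
    # Media bought up front plus one year's share of the tape library.
    return cartridges * price_per_cartridge + library_price / amortization_years


def cloud_cost_per_year(terabytes: float, price_per_gb_month: float,
                        gb_per_tb: int = 1024) -> float:
    # Storage charges only; bandwidth is excluded, as on the slide.
    return terabytes * gb_per_tb * price_per_gb_month * 12


tape = tape_cost_per_year(22, 30, 10_000, 3)
cloud = cloud_cost_per_year(18, 0.125)
print(round(tape), round(cloud))  # 3993 27648
```

Both results round to the slide's $4,000 and $28,000, a ratio of about seven to one before tape handling and bandwidth are counted.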

Summary

► UChicago has both virtual and physical environments
– Physical systems are a better fit for some workloads
– Want one backup solution to protect both environments

► Off-site storage is required
– Off-line is a bonus

► vSphere Changed Block Tracking
– Accelerates incremental backups
– Reduces storage

Tom Indelli

Senior Systems Administrator

[email protected]

Thank You


Hybrid Cloud Backup

• Why Hybrid?
• Data Volume Limits
• Cloud Infrastructure Requirements

“Hybrid” Cloud Backup

► Perform backup on LAN
– Fast backups, fast restores

► Replicate backups to cloud for safe-keeping
– Secure data

[Diagram: backup agents and backup server, with the data flow labeled Step 1 and Step 2]

“Hybrid” Cloud Backup

► Full Backup
– If time < one week: Over the WAN

– If time > one week: Via portable media

► Daily Incremental Backup
– If time < 24 hours: Over the WAN

– If time > 24 hours: Impossible
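The decision rules above depend on how long a transfer takes over the WAN. A minimal sketch of that estimate follows. The 80% link efficiency, the decimal units, and the function names are assumptions made here, not figures from the talk.

```python
def transfer_hours(data_gb: float, link_mbps: float, efficiency: float = 0.8) -> float:
    """Hours to move data_gb (decimal GB) over a link of link_mbps megabits/s."""
    bits = data_gb * 8e9
    usable_bits_per_second = link_mbps * 1e6 * efficiency
    return bits / usable_bits_per_second / 3600


def full_backup_route(data_gb: float, link_mbps: float) -> str:
    # Under one week over the WAN is acceptable; otherwise ship media.
    one_week_hours = 7 * 24
    if transfer_hours(data_gb, link_mbps) < one_week_hours:
        return "WAN"
    return "portable media"


def incremental_feasible(data_gb: float, link_mbps: float) -> bool:
    # A daily incremental must finish before the next one starts.
    return transfer_hours(data_gb, link_mbps) < 24


print(full_backup_route(3000, 100))   # 3 TB over 100 Mbps
print(full_backup_route(3000, 10))    # 3 TB over 10 Mbps
print(incremental_feasible(30, 10))   # 30 GB nightly over 10 Mbps
```

With these assumptions, a 3 TB full backup takes about 83 hours at 100 Mbps but about 833 hours at 10 Mbps, which pushes the first full onto portable media.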


Cloud Strategies: Replication Window

► Incremental Backup Size Limits
– Cloud backup: incremental size is 0.01% to 20% of full backup

[Diagram: backup agents and backup server, comparing a full backup with an incremental (1%) backup]
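Because each daily incremental must cross the WAN within 24 hours, the link and the change rate together cap the volume of data that can be protected. The sketch below estimates that cap. It assumes an 80% link efficiency and ignores deduplication and compression; `max_protected_gb` is a name introduced here.

```python
def max_protected_gb(link_mbps: float, change_rate: float,
                     window_hours: float = 24.0, efficiency: float = 0.8) -> float:
    """Largest full-backup size (decimal GB) whose incremental fits the window.

    change_rate is the incremental size as a fraction of the full backup,
    for example 0.01 for 1%.
    """
    if not 0 < change_rate <= 1:
        raise ValueError("change_rate must be in (0, 1]")
    bits_per_window = link_mbps * 1e6 * efficiency * window_hours * 3600
    gb_per_window = bits_per_window / 8e9
    return gb_per_window / change_rate


for rate in (0.0001, 0.01, 0.20):
    print(f"{rate:.2%} change rate: {max_protected_gb(100, rate):,.0f} GB")
```

On a 100 Mbps link the window carries about 864 GB per day, so a 1% change rate supports roughly 86 TB of protected data while a 20% change rate supports only about 4.3 TB.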

Role of Deduplication in Backup

► Shrinks Data
– Reduces Storage
– Shortens Backup Window

► Data Scenarios
– Primary Data
– Secondary Data

[Diagram labels: Across computers (e.g. word.exe); Across/Within Files (e.g. PPT files); Over Time (e.g. outlook.pst)]
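To make the idea concrete, here is a minimal sketch of fixed-block deduplication using SHA-256 digests. It is an illustration of the general technique only, not Arkeia's algorithm; the block size and function names are assumptions.

```python
import hashlib
from typing import Dict, List, Tuple


def dedupe(streams: List[bytes], block_size: int = 4096) -> Tuple[Dict[str, bytes], List[List[str]]]:
    store: Dict[str, bytes] = {}    # digest -> block, each unique block kept once
    recipes: List[List[str]] = []   # per stream: ordered digests to rebuild it
    for data in streams:
        recipe = []
        for offset in range(0, len(data), block_size):
            block = data[offset:offset + block_size]
            digest = hashlib.sha256(block).hexdigest()
            store.setdefault(digest, block)
            recipe.append(digest)
        recipes.append(recipe)
    return store, recipes


def restore(store: Dict[str, bytes], recipe: List[str]) -> bytes:
    return b"".join(store[digest] for digest in recipe)


# Two "computers" share most of a file; only the last block differs.
file_a = b"A" * 8192 + b"B" * 4096
file_b = b"A" * 8192 + b"C" * 4096
store, recipes = dedupe([file_a, file_b])
print(len(store))                              # 3 unique blocks instead of 6
print(restore(store, recipes[1]) == file_b)    # True
```

Fixed blocks miss duplicates that shift by a few bytes, which is the motivation for the variable-block approaches.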

Hybrid Cloud Recovery

• Storage-only vs. Storage-and-Server
• File Recovery vs. Disaster Recovery

Cloud Recovery Strategies

► Data are Secure
– Deduplicated
– Compressed
– Encrypted

► How to Recover/Extract?

Cloud Recovery Strategies

► How to Recover/Extract?
– Restore (via big pipe) to servers in cloud
– Restore (via portable media) to new location

Hybrid Cloud Backup Summary

► Alternative to Tape

► …But Maximum Data-Protection Limit
– Imposed by incremental backup size

► Primary Cost of Hybrid Cloud
– Bandwidth
– (Then target disk)

► Pay Attention to Recovery Strategy
– Instantiate in Cloud
– Recovery on Portable Media

Arkeia Software

► Company

– Founded 1996; HQ in San Diego

► Products

– Arkeia Network Backup Suite

• Backup/Recovery

• Disaster Recovery

– Virtual and Physical Environments

• vSphere (with CBT), Hyper-V, XenServer

• Linux, Windows…AIX, BSD, HP-UX, MacOS, Netware, Solaris (200+ platforms)

– Software, Appliances, Virtual Appliances

– Disk, Tape, Cloud

► Customers

– 7,000 mid-market customers in 70 countries

– Enterprises, Governments, Service Providers


Bill Evans, Arkeia Software

[email protected]

Please Contact Me


Manon Buettner, Principal

Nuvalo

[email protected]

+1 408-605-6455

Jo Peterson, Regional Manager

Teleproviders

[email protected]

+1 949-268-2633

Resources for last-mile internet for data centers and enterprises

Detail 1 of 3: “Incrementals Forever”

► How Does it Work?
– Initially, one full backup
– Subsequently, “incrementals forever”

[Diagram: timeline t over days 0 to 9 and beyond, compared against a traditional backup policy]

► How to recover disk space at target?
– “Synthetic backups”
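A synthetic backup folds older incrementals into the full backup at the target, so the old restore points can be expired without re-sending a full over the WAN. The sketch below shows the idea on a toy model in which a backup is a mapping from path to content and `None` marks a deleted file. The model and the function name are assumptions for illustration.

```python
from typing import Dict, List, Optional

Backup = Dict[str, Optional[str]]  # path -> content; None marks a deletion


def synthesize_full(full: Dict[str, str], incrementals: List[Backup]) -> Dict[str, str]:
    """Merge a full backup with later incrementals, applied oldest first."""
    merged = dict(full)
    for incremental in incrementals:
        for path, content in incremental.items():
            if content is None:
                merged.pop(path, None)   # file was deleted since the last backup
            else:
                merged[path] = content   # file was added or changed
    return merged


day0 = {"a.txt": "v1", "b.txt": "v1"}
day1 = {"a.txt": "v2"}
day2 = {"b.txt": None, "c.txt": "v1"}
print(synthesize_full(day0, [day1, day2]))  # {'a.txt': 'v2', 'c.txt': 'v1'}
```

Once the synthetic full exists, the original full and the merged incrementals can be dropped to reclaim disk space.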

Detail 2 of 3: Multiple Sources

► Deduplication consolidation
– Static storage cannot resolve duplicates

[Diagram: two groups of backup agents, each with its own backup server; two paths are marked with an X]

► Deduplication vs. Encryption
– Dedupe → Compress → Encrypt

Detail 3 of 3: WAN Bandwidth

► Data Compression
– File-grain compression
• Examples: LZ-77, JPEG, MPEG
– Inter-file deduplication
• Examples: SIS, fixed-block, variable-block, progressive-dedupe

► TCP Optimization?

► Warnings
1. Latency optimization ≠ bandwidth optimization
2. No compression of compressed or random data
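The second warning is easy to demonstrate with Python's standard `zlib` module. The sketch compresses redundant text, random bytes, and already-compressed data, then compares the ratios. The word list and sizes are arbitrary choices made here.

```python
import os
import random
import zlib


def ratio(data: bytes) -> float:
    """Original size divided by compressed size; about 1.0 means no gain."""
    return len(data) / len(zlib.compress(data, 9))


rng = random.Random(0)  # seeded so the sample text is reproducible
words = ["backup", "restore", "tape", "cloud", "agent", "server", "disk",
         "image", "block", "full", "incremental", "offsite", "window", "dedupe"]
text = " ".join(rng.choice(words) for _ in range(20_000)).encode()

random_bytes = os.urandom(len(text))       # stands in for encrypted data
already_compressed = zlib.compress(text, 9)

print(f"text:               {ratio(text):.2f}")
print(f"random bytes:       {ratio(random_bytes):.2f}")
print(f"already compressed: {ratio(already_compressed):.2f}")
```

Only the first ratio is well above 1. This is also why the pipeline order is dedupe, then compress, then encrypt: encrypted output looks random and no longer compresses.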

Data Loss Universe

Media failure

Software Error

Disasters

Loss of Hardware

User Error