Site-Wide Backup Briefing Ray Pasetes Core Support Services April 16, 2004.


Transcript of Site-Wide Backup Briefing Ray Pasetes Core Support Services April 16, 2004.

Page 1: Site-Wide Backup Briefing Ray Pasetes Core Support Services April 16, 2004.

Site-Wide Backup Briefing

Ray Pasetes
Core Support Services

April 16, 2004

Page 2: Site-Wide Backup Briefing Ray Pasetes Core Support Services April 16, 2004.

Agenda

• Progress Update
• Proposed Model
• Proposed Rollout
• Initial Costs
• Estimated 6-year TCO
• Needed Decisions
• Critical Timeline

Page 3: Site-Wide Backup Briefing Ray Pasetes Core Support Services April 16, 2004.

Progress Update

Page 4: Site-Wide Backup Briefing Ray Pasetes Core Support Services April 16, 2004.

Progress Update

• 12/03: Reduce scope to division-wide
  – Site-wide requirements too diverse
  – Start small, let others see it work, then expand
• 1/04: Interview groups
  – 6 Groups
    • ClueD0, D0-offline, CDF, CMS, FTP, ISA
    • SCS (represents 11 clusters/groups)
  – Along with CSI, 22 total clusters/groups represented

Page 5: Site-Wide Backup Briefing Ray Pasetes Core Support Services April 16, 2004.

Progress Update

• 2/1: Interviews finished
• 2/06: CSS department review of data
  – Most accepted 8to17 x5 service
  – Most accepted same day or NBD restore
  – Investigate using existing resources
  – Investigate costs for deployment
    - Initial deployment will back up 7+TB of data
    - Start pilot small

Page 6: Site-Wide Backup Briefing Ray Pasetes Core Support Services April 16, 2004.

Proposed Model

Page 7: Site-Wide Backup Briefing Ray Pasetes Core Support Services April 16, 2004.

Goals

• Provide a reliable data backup service
• Reduce redundant effort, allowing division to be more productive
• Long-term goal: reduce overall division spending on data backups via consolidation
• Long-term goal: service accessible across entire site, desktops included

Page 8: Site-Wide Backup Briefing Ray Pasetes Core Support Services April 16, 2004.

Service Model

• Use farms model –> backup blocks
  – Backup block == 1 server + 4 or more tape drives
  – Smaller customers share the same backup block
  – Larger customers would have their own backup block(s)

Page 9: Site-Wide Backup Briefing Ray Pasetes Core Support Services April 16, 2004.

Costing Model

• Charge based on total GB backed up
  – Cost should completely cover tape costs
  – Cost should cover hardware and software maintenance costs
  – Cost should cover hardware costs
• “Profit” will be used to expand and enhance the system

Page 10: Site-Wide Backup Briefing Ray Pasetes Core Support Services April 16, 2004.

Costing Model - Example

• Year 1 customer: $1.15/GB on tape/yr or $34.50/GB of data to backup/year
  – Covers hardware, tape and maintenance costs
• Year 2+ customer: $0.33/GB on tape/yr or $9.82/GB of data to backup/year
  – Covers maintenance, tapes, additional slots, etc.
  – No charge for existing hardware
• No connection or per client fees
• No restore fees
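The two rates quoted for each customer tier can be related with a short calculation. The sketch below is illustrative only: the function names are hypothetical, the 1 TB = 1000 GB conversion is an assumption, and the "tape multiplier" (GB on tape per GB of source data, roughly 30) is inferred from the ratio of the two quoted rates rather than stated on the slide.

```python
# Rates quoted on the slide, in dollars per GB per year.
RATES = {
    "year1": {"per_tape_gb": 1.15, "per_source_gb": 34.50},
    "year2_plus": {"per_tape_gb": 0.33, "per_source_gb": 9.82},
}

GB_PER_TB = 1000  # assumption: decimal terabytes


def tape_multiplier(tier):
    """GB held on tape per GB of source data, implied by the two rates."""
    rates = RATES[tier]
    return rates["per_source_gb"] / rates["per_tape_gb"]


def annual_charge(source_tb, tier):
    """Yearly charge in dollars for a given amount of source data."""
    return round(source_tb * GB_PER_TB * RATES[tier]["per_source_gb"], 2)


if __name__ == "__main__":
    for tier in RATES:
        print(tier, round(tape_multiplier(tier), 1), annual_charge(7.1, tier))
```

Under these assumptions, the 7.1TB pilot volume charged at the year 1 rate comes to roughly $245K, the same order as the year 1 library totals presented later in the briefing.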

Page 11: Site-Wide Backup Briefing Ray Pasetes Core Support Services April 16, 2004.

Proposed Rollout

Page 12: Site-Wide Backup Briefing Ray Pasetes Core Support Services April 16, 2004.

Proposed Rollout

• Start pilot small ~ 7.1TB
• The following groups will be asked to partake in pilot project
  - Astro, CEPA, CMS, D0, ESH, FESS, ISA, KTEV, LSS, MINOS, NUMI, PPD, SDSS, SIDET, Theory, VMS
• Desktops NOT part of initial rollout
• Lessons learned in year 1 determine year 2 growth

Page 13: Site-Wide Backup Briefing Ray Pasetes Core Support Services April 16, 2004.

Proposed Rollout – Unknowns

• Limit of each backup block
  – Highly dependent on tape rotation and daily delta
  – A single server at CMU can handle 15TB of data
• Compressibility of data
  – Affects cost of service
  – Affects performance

Page 14: Site-Wide Backup Briefing Ray Pasetes Core Support Services April 16, 2004.

Proposed Rollout – Timeline

• 9/1 – Equipment delivered
• 9/8 – Equipment installed
• 9/8-9/15 – Functionality test
• 9/16-10/01 – Systems testing
• 10/04 – Start rollout
• 12/04 – Complete initial rollout
• ~2 FTE effort required for initial rollout

Page 15: Site-Wide Backup Briefing Ray Pasetes Core Support Services April 16, 2004.

Initial Costs

Page 16: Site-Wide Backup Briefing Ray Pasetes Core Support Services April 16, 2004.

Costs - Library

Three Solutions

• STK and 9940B tape drives
• ADIC and LTO-2 tape drives
• SpectraLogic and SAIT-1 tape drives

Page 17: Site-Wide Backup Briefing Ray Pasetes Core Support Services April 16, 2004.

Costs – Library

Items           STK                    Spectra                D0 ADIC             CDF ADIC
Library         $0                     $69,418.12             $19,302.00          $19,302.00 +
Drives Needed   8x 9940B               8x SAIT-1              8x LTO-2            8x LTO-2
Drive Cost      $208,000               $122,894.87            $76,480             $76,480
Tapes Needed    1095                   450                    1095                1095
Tape Cost       $78,292 ($357.50/TB)   $80,970 ($359.87/TB)   $72,270 ($330/TB)   $72,270 ($330/TB)
TiBS S/W Port   $30,000+               $0                     $30,000+            $30,000+
Maintenance     $6,960                 $14,436                $39,276             $48,015
Total year 1    $323,252+              $287,719.29            $237,328+           $246,067+
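The year 1 totals can be cross-checked by summing the listed line items for each option (library, drives, tapes, TiBS software port, maintenance). The following is a minimal sketch with hypothetical names; it treats the "+" entries as their stated base amounts and uses `Decimal` to avoid floating-point rounding on the cent values.

```python
from decimal import Decimal

# Line items per option: library, drives, tapes, TiBS port, maintenance.
LINE_ITEMS = {
    "STK": ["0", "208000", "78292", "30000", "6960"],
    "Spectra": ["69418.12", "122894.87", "80970", "0", "14436"],
    "D0 ADIC": ["19302.00", "76480", "72270", "30000", "39276"],
    "CDF ADIC": ["19302.00", "76480", "72270", "30000", "48015"],
}

# Totals as printed on the slide (without the trailing "+").
STATED_TOTALS = {
    "STK": Decimal("323252"),
    "Spectra": Decimal("287719.29"),
    "D0 ADIC": Decimal("237328"),
    "CDF ADIC": Decimal("246067"),
}


def year1_total(option):
    """Sum the listed year 1 line items for one library option."""
    return sum((Decimal(x) for x in LINE_ITEMS[option]), Decimal("0"))


if __name__ == "__main__":
    for option, stated in STATED_TOTALS.items():
        computed = year1_total(option)
        print(option, computed, "difference:", stated - computed)
```

With these inputs the STK and both ADIC columns reproduce the stated totals, while the Spectra line items sum to $287,718.99, thirty cents below the printed $287,719.29. The cause of that small difference is not given in the briefing.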

Page 18: Site-Wide Backup Briefing Ray Pasetes Core Support Services April 16, 2004.

Costs – Server

• 2x Sun V440 -- $28K
• 1x SATA RAID Disk Cache -- $14K

Page 19: Site-Wide Backup Briefing Ray Pasetes Core Support Services April 16, 2004.

Costs – Software - TiBS

• No additional client costs
• Servers: $8250 each
  – 2 have already been purchased
  – Additional cost for OFM s/w for Windows
• Process packs
  – Processes == number of parallel backups
  – 45 processes have been purchased
• Maintenance – 15%
  – Currently $11,590.35 annually

Prices change if not managed by CSS

Page 20: Site-Wide Backup Briefing Ray Pasetes Core Support Services April 16, 2004.

Possible Funding

• CSS Backup OP: $130K
• CSS Backup EQ: $120K
• CMS Backup EQ: $100K
• Other groups?

Page 21: Site-Wide Backup Briefing Ray Pasetes Core Support Services April 16, 2004.

Estimated 6-year TCO

Two Growth Models

1. Fermi Standard
  • 10% daily delta
  • Data growth doubles yearly
  • Approximately 40% above industry standard
2. Fermi Active
  • 45% daily delta (CMS scenario)
  • Data growth doubles yearly
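Both growth models combine a yearly doubling of stored data with a fixed daily delta. A rough sketch of that projection follows; the function names are hypothetical, the 7.1TB starting volume is the pilot size given in the rollout slides, and treating the daily delta as a simple percentage of the total volume is an assumption.

```python
DAILY_DELTA = {"Fermi Standard": 0.10, "Fermi Active": 0.45}


def project_volume(start_tb, years, growth_factor=2.0):
    """Data volume in TB for each year, starting at year 1."""
    return [round(start_tb * growth_factor ** n, 1) for n in range(years)]


def daily_change_tb(volume_tb, model):
    """TB changing per day, assuming the delta applies to the full volume."""
    return volume_tb * DAILY_DELTA[model]


if __name__ == "__main__":
    volumes = project_volume(7.1, 6)
    print(volumes)
    for model in DAILY_DELTA:
        print(model, [round(daily_change_tb(v, model), 2) for v in volumes])
```

Strict doubling from 7.1TB gives 7.1, 14.2, 28.4, 56.8, 113.6 and 227.2TB. The per-year slides list 56.4TB, 112.8TB and 225.6TB for years 4 to 6, slightly below strict doubling.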

Page 22: Site-Wide Backup Briefing Ray Pasetes Core Support Services April 16, 2004.

Estimated Year 2 (14.2TB)

• Fermi Standard
  – Double slots
  – Add caching disk
  – Tape cost down 25%
• Configuration:
  – 2 backup blocks
  – Slots 2x year 1
  – Increased caching disk

• Fermi Active
  – ~Triple slots
  – Double backup blocks
  – Add caching disk
  – Tape cost down 25%
• Configuration:
  – 4 backup blocks
  – Slots ~3x year 1
  – Increased caching disk

Page 23: Site-Wide Backup Briefing Ray Pasetes Core Support Services April 16, 2004.

Estimated Year 3 (28.4TB)

• Fermi Standard
  – Double slots
  – Add caching disk
• Configuration:
  – 2 backup blocks
  – Slots 4x year 1
  – Increased caching disk

• Fermi Active
  – Double backup blocks
  – Double slots
  – Add caching disk
  – Drive cost down 25%
• Configuration:
  – 8 backup blocks
  – Slots ~6.5x year 1
  – Increased caching disk

Page 24: Site-Wide Backup Briefing Ray Pasetes Core Support Services April 16, 2004.

Estimated Year 4

• Tape technology changes
  – Media capacity quadruples
  – Drive performance quadruples
• 10GigE standard
  – Servers now equipped with 10GigE
  – Increased bus speeds
  – Generally available to the lab

Page 25: Site-Wide Backup Briefing Ray Pasetes Core Support Services April 16, 2004.

Estimated Year 4

• Roll in new servers
  – Backup block == 1 server + 8 tape drives
• Migrate customers onto new servers
• Allow old tapes to migrate off via tape retention policies
• Slowly tear down old backup blocks
• Approximately 1 year migration

Page 26: Site-Wide Backup Briefing Ray Pasetes Core Support Services April 16, 2004.

Estimated Year 4 (56.4TB)

• Fermi Standard
  – 2 new backup blocks
  – Increase slots by half
• Configuration:
  – 2 old backup blocks
  – 2 new backup blocks
  – 2/3 library old slots
  – 1/3 library new slots
  – Slots 6x year 1

• Fermi Active
  – 2 new backup blocks
  – Increase slots by half
• Configuration:
  – 8 old backup blocks
  – 2 new backup blocks
  – 2/3 library old slots
  – 1/3 library new slots
  – Slots ~10x year 1

Page 27: Site-Wide Backup Briefing Ray Pasetes Core Support Services April 16, 2004.

Estimated Year 5 (112.8TB)

• Fermi Standard
  – Add caching disk
  – Convert slots
  – Tape cost down 25%
• Configuration:
  – 2 backup blocks
  – All slots new
    • 2/3 library used
    • 1/3 unused
  – Slots 6x year 1

• Fermi Active
  – Double backup blocks
  – Convert slots
  – Add caching disk
  – Tape cost down 25%
• Configuration:
  – 4 backup blocks
  – All slots new
    • 2/3 library used
    • 1/3 unused
  – Slots ~10x year 1

Page 28: Site-Wide Backup Briefing Ray Pasetes Core Support Services April 16, 2004.

Estimated Year 6 (225.6TB)

• Fermi Standard
  – Add caching disk
  – Increase slots by 1/3
• Configuration:
  – 2 backup blocks
  – Slots 8x year 1

• Fermi Active
  – Add caching disk
  – 3 additional backup blocks
  – Increase slots 1/3
  – Drive cost down 25%
• Configuration:
  – 7 backup blocks
  – Slots ~13x year 1

Page 29: Site-Wide Backup Briefing Ray Pasetes Core Support Services April 16, 2004.

Estimated 6-year TCO

Year    Spectra Standard   Spectra Active   ADIC Standard   ADIC Active
1       $329,719           $329,719         $279,328        $279,328
2       $135,064           $454,402         $142,848        $474,458
3       $132,569           $721,767         $234,420        $830,748
4       $582,057           $756,795         $723,142        E
5       $194,192           $623,280         $268,688        E
6       $367,681           $1.13M           $505,083        E
Total   ~$1.74M            ~$4.02M          ~$2.15M         E
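The six-year totals follow from adding the yearly figures in each column. A minimal sketch of that check is below; the names are hypothetical, the Spectra Active year 6 entry is taken as 1,130,000 because the slide only gives it rounded to $1.13M, and the ADIC Active column is omitted because years 4 to 6 are marked "E" rather than given as amounts.

```python
# Yearly cost estimates in dollars, years 1 through 6.
YEARLY_COSTS = {
    "Spectra Standard": [329_719, 135_064, 132_569, 582_057, 194_192, 367_681],
    # Year 6 is rounded on the slide ($1.13M), so this total is approximate.
    "Spectra Active": [329_719, 454_402, 721_767, 756_795, 623_280, 1_130_000],
    "ADIC Standard": [279_328, 142_848, 234_420, 723_142, 268_688, 505_083],
}


def six_year_tco(scenario):
    """Total cost of ownership over the six listed years."""
    return sum(YEARLY_COSTS[scenario])


def in_millions(dollars):
    """Format a dollar amount the way the slide does, e.g. '~$1.74M'."""
    return f"~${dollars / 1_000_000:.2f}M"


if __name__ == "__main__":
    for scenario in YEARLY_COSTS:
        print(scenario, six_year_tco(scenario), in_millions(six_year_tco(scenario)))
```

The sums agree with the rounded totals in the table: about $1.74M, $4.02M and $2.15M respectively.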

Page 30: Site-Wide Backup Briefing Ray Pasetes Core Support Services April 16, 2004.

Needed Decisions

Page 31: Site-Wide Backup Briefing Ray Pasetes Core Support Services April 16, 2004.

Decisions – Library

• Share D0 ADIC robot?
  – Initially, makes most sense financially
  – Backup s/w will need to be ported
    • Custom engineering costs
      – Current s/w assumes all tapes are owned by backup service
    • We will be the only deployment
    • Fermi will need to provide development resources
      – Hardware
      – People/coordination
  – Active scenario outgrows robot at year 4. This assumes ZERO growth by D0 and SDSS (current users of robot).
  – Standard scenario will use 2/3 of D0 robot.

Page 32: Site-Wide Backup Briefing Ray Pasetes Core Support Services April 16, 2004.

Decisions - Library

• Purchase new robot? (SpectraLogic)
  – Higher initial cost
  – Lower TCO over 6 years.
    • Years 2-6 lower operational costs than ADIC
  – Backup software already works with it. Service can be brought up more quickly.
  – SpectraLogic library running TiBS software to be deployed at MIT (not for public disclosure).

Page 33: Site-Wide Backup Briefing Ray Pasetes Core Support Services April 16, 2004.

Decisions - Library

• Purchase new robot (part 2)?
  – Small footprint. Fully populated library is 7 racks wide, 2 racks deep.
  – Library supports SAIT, LTO, LTO-2, SDLT, DLT
  – In ’05, will support disk
  – Relatively new product. Higher possibility hardware costs will decrease.

Page 34: Site-Wide Backup Briefing Ray Pasetes Core Support Services April 16, 2004.

Footprint Comparison

Page 35: Site-Wide Backup Briefing Ray Pasetes Core Support Services April 16, 2004.

Critical Timeline

Page 36: Site-Wide Backup Briefing Ray Pasetes Core Support Services April 16, 2004.

Critical Timeline

• 4/21: Division request for departments to determine how budgets will be spent
  – CSI/SCS robots running out of room
    • Windows support 2.3TB pending decision
  – D0 Legato renewal
• 6/1/04: Promised date to CMS for backup service to be online
• 6/30/04: Latest time for TiBS to begin porting effort to make FY ’04 timeline.

Page 37: Site-Wide Backup Briefing Ray Pasetes Core Support Services April 16, 2004.

Fini

Open Discussion

Page 38: Site-Wide Backup Briefing Ray Pasetes Core Support Services April 16, 2004.

CSI Backup System - Example

• Initially deployed in 2001 for AFS backups
• Expanded to include UNIX and some Windows
• 2003 – SCS added to backup system
• Data has grown >600% since 2001.
• Backup system has not grown

Page 39: Site-Wide Backup Briefing Ray Pasetes Core Support Services April 16, 2004.

CSI Backup System - Example