Site-Wide Backup Briefing Ray Pasetes Core Support Services April 16, 2004.
Site-Wide Backup Briefing
Ray Pasetes, Core Support Services
April 16, 2004
Agenda
• Progress Update
• Proposed Model
• Proposed Rollout
• Initial Costs
• Estimated 6-year TCO
• Needed Decisions
• Critical Timeline
Progress Update
• 12/03: Reduce scope to division-wide
  – Site-wide requirements too diverse
  – Start small, let others see it work, then expand
• 1/04: Interview groups
  – 6 groups
    • ClueD0, D0-offline, CDF, CMS, FTP, ISA
    • SCS (represents 11 clusters/groups)
  – Along with CSI, 22 total clusters/groups represented
Progress Update
• 2/1: Interviews finished
• 2/06: CSS department review of data
  – Most accepted 8to17 x5 service
  – Most accepted same day or NBD restore
  – Investigate using existing resources
  – Investigate costs for deployment
  - Initial deployment will back up 7+ TB of data
  - Start pilot small
Proposed Model
Goals
• Provide a reliable data backup service
• Reduce redundant effort, allowing division to be more productive
• Long-term goal: reduce overall division spending on data backups via consolidation
• Long-term goal: service accessible across entire site, desktops included
Service Model
• Use farms model -> backup blocks
  – Backup block == 1 server + 4 or more tape drives
  – Smaller customers share the same backup block
  – Larger customers would have their own backup block(s)
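The sharing rule above can be illustrated with a small sketch. The slides do not give a per-block capacity (it is listed as an unknown), so `block_capacity_tb`, the customer names, and the sizes below are all hypothetical; the first-fit packing is just one plausible way to group smaller customers and is not taken from the briefing.

```python
from math import ceil


def plan_blocks(customers_tb, block_capacity_tb):
    """Split customers into dedicated and shared backup blocks.

    customers_tb: mapping of customer name -> TB to back up (hypothetical data).
    block_capacity_tb: assumed capacity of one backup block (not from the slides).
    Returns (dedicated, shared): dedicated maps a large customer to the number
    of blocks it needs; shared is a list of blocks, each a dict of the small
    customers placed on it.
    """
    dedicated = {}
    shared = []
    # Place the largest customers first so small ones fill the gaps.
    for name, tb in sorted(customers_tb.items(), key=lambda kv: -kv[1]):
        if tb >= block_capacity_tb:
            # Larger customers get their own backup block(s).
            dedicated[name] = ceil(tb / block_capacity_tb)
            continue
        # Smaller customers share: first block with enough room wins.
        for block in shared:
            if sum(block.values()) + tb <= block_capacity_tb:
                block[name] = tb
                break
        else:
            shared.append({name: tb})
    return dedicated, shared


# Hypothetical customers and capacity, for illustration only.
dedicated, shared = plan_blocks(
    {"big": 20.0, "a": 4.0, "b": 3.0, "c": 9.0}, block_capacity_tb=10.0
)
print(dedicated)
print(shared)
```

In this made-up example the 20 TB customer needs two dedicated blocks, while the three smaller customers fit on two shared blocks.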
Costing Model
• Charge based on total GB backed up
  – Cost should completely cover tape costs
  – Cost should cover hardware and software maintenance costs
  – Cost should cover hardware costs
• "Profit" will be used to expand and enhance the system
Costing Model - Example
• Year 1 customer: $1.15/GB on tape/yr or $34.50/GB of data to backup/year
  – Covers hardware, tape and maintenance costs
• Year 2+ customer: $0.33/GB on tape/yr or $9.82/GB of data to backup/year
  – Covers maintenance, tapes, additional slots, etc.
  – No charge for existing hardware
• No connection or per client fees
• No restore fees
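As a rough worked example of the rates above, the sketch below computes an annual charge from the per-GB-of-data rates and derives the ratio between the two quoted rates. The tier names and function names are illustrative assumptions; reading the ratio as "GB held on tape per GB of data" is an inference from the two numbers, not something the slide states.

```python
from decimal import Decimal

# Rates quoted on the slide, in dollars per GB per year.
RATE_PER_GB_OF_DATA = {"year1": Decimal("34.50"), "year2plus": Decimal("9.82")}
RATE_PER_GB_ON_TAPE = {"year1": Decimal("1.15"), "year2plus": Decimal("0.33")}


def annual_charge(gb_of_data, tier):
    """Annual charge in dollars for a customer with gb_of_data GB to back up."""
    charge = Decimal(str(gb_of_data)) * RATE_PER_GB_OF_DATA[tier]
    return charge.quantize(Decimal("0.01"))


def implied_tape_ratio(tier):
    """Ratio of the two quoted rates (assumed: GB on tape per GB of data)."""
    return RATE_PER_GB_OF_DATA[tier] / RATE_PER_GB_ON_TAPE[tier]


print(annual_charge(100, "year1"))      # 100 GB customer, first year
print(annual_charge(100, "year2plus"))  # same customer in later years
print(implied_tape_ratio("year1"))
```

No connection, per-client, or restore fees are added, matching the last two bullets of the slide.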
Proposed Rollout
• Start pilot small ~ 7.1TB
• The following groups will be asked to partake in pilot project
  - Astro, CEPA, CMS, D0, ESH, FESS, ISA, KTEV, LSS, MINOS, NUMI, PPD, SDSS, SIDET, Theory, VMS
• Desktops NOT part of initial rollout
• Lessons learned in year 1 determine year 2 growth
Proposed Rollout – Unknowns
• Limit of each backup block
  – Highly dependent on tape rotation and daily delta
  – A single server at CMU can handle 15TB of data
• Compressibility of data
  – Affects cost of service
  – Affects performance
Proposed Rollout – Timeline
• 9/1 – Equipment delivered
• 9/8 – Equipment installed
• 9/8-9/15 – Functionality test
• 9/16-10/01 – Systems testing
• 10/04 – Start rollout
• 12/04 – Complete initial rollout
• ~2 FTE effort required for initial rollout
Initial Costs
Costs - Library
Three Solutions
• STK and 9940B tape drives
• ADIC and LTO-2 tape drives
• SpectraLogic and SAIT-1 tape drives
Costs – Library
Items          STK                   Spectra               D0 ADIC            CDF ADIC
Library        $0                    $69,418.12            $19,302.00         $19,302.00+
Drives Needed  8x 9940B              8x SAIT-1             8x LTO-2           8x LTO-2
Drive Cost     $208,000              $122,894.87           $76,480            $76,480
Tapes Needed   1095                  450                   1095               1095
Tape Cost      $78,292 ($357.50/TB)  $80,970 ($359.87/TB)  $72,270 ($330/TB)  $72,270 ($330/TB)
TiBS S/W Port  $30,000+              $0                    $30,000+           $30,000+
Maintenance    $6,960                $14,436               $39,276            $48,015
Total year 1   $323,252+             $287,719.29           $237,328+          $246,067+
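The year 1 totals can be cross-checked by summing the cost rows of the library table. The sketch below is an arithmetic check only; the dictionary layout and names are illustrative, and entries marked "+" on the slide are treated as their stated minimum.

```python
from decimal import Decimal

# Cost rows from the library table: library, drives, tapes, TiBS port, maintenance.
# Values marked "+" on the slide are entered at their stated minimum.
COSTS = {
    "STK":      ["0", "208000", "78292", "30000", "6960"],
    "Spectra":  ["69418.12", "122894.87", "80970", "0", "14436"],
    "D0 ADIC":  ["19302.00", "76480", "72270", "30000", "39276"],
    "CDF ADIC": ["19302.00", "76480", "72270", "30000", "48015"],
}

# "Total year 1" row as printed on the slide.
STATED_TOTAL = {
    "STK": "323252",
    "Spectra": "287719.29",
    "D0 ADIC": "237328",
    "CDF ADIC": "246067",
}


def year1_total(option):
    """Sum the itemized year 1 costs for one library option."""
    return sum(Decimal(value) for value in COSTS[option])


for option in COSTS:
    computed = year1_total(option)
    difference = Decimal(STATED_TOTAL[option]) - computed
    print(f"{option}: computed {computed}, slide {STATED_TOTAL[option]}, diff {difference}")
```

With these inputs the STK and both ADIC columns reproduce the slide totals exactly, while the Spectra items sum to $287,718.99, which is $0.30 below the printed total.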
Costs – Server
• 2x Sun V440 -- $28K
• 1x SATA RAID Disk Cache -- $14K
Costs – Software - TiBS
• No additional client costs
• Servers: $8250 each
  – 2 have already been purchased
  – Additional cost for OFM s/w for Windows
• Process packs
  – Processes == number of parallel backups
  – 45 processes have been purchased
• Maintenance – 15%
  – Currently $11,590.35 annually
Prices change if not managed by CSS
Possible Funding
• CSS Backup OP: $130K
• CSS Backup EQ: $120K
• CMS Backup EQ: $100K
• Other groups?
Estimated 6-year TCO
Two Growth Models
1. Fermi Standard
  • 10% daily delta
  • Data growth doubles yearly
  • Approximately 40% above industry standard
2. Fermi Active
  • 45% daily delta (CMS scenario)
  • Data growth doubles yearly
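A minimal sketch of the two growth models, assuming the ~7.1TB pilot size as the year 1 volume and reading "daily delta" as the fraction of stored data that changes each day (both are interpretations, not definitions given in the slides). The function and constant names are illustrative.

```python
PILOT_TB = 7.1  # year 1 pilot size from the rollout slide

# Daily delta per growth model, as a fraction of total data (assumed meaning).
DAILY_DELTA = {"Fermi Standard": 0.10, "Fermi Active": 0.45}


def projected_tb(year, start_tb=PILOT_TB):
    """Data volume in TB for a given year if data doubles yearly from year 1."""
    return start_tb * 2 ** (year - 1)


def daily_change_tb(year, model):
    """TB changing per day under the given model's daily delta."""
    return projected_tb(year) * DAILY_DELTA[model]


for year in range(1, 7):
    total = projected_tb(year)
    standard = daily_change_tb(year, "Fermi Standard")
    active = daily_change_tb(year, "Fermi Active")
    print(f"year {year}: {total:6.1f} TB, "
          f"standard {standard:5.2f} TB/day, active {active:6.2f} TB/day")
```

Strict doubling from 7.1TB gives 56.8, 113.6, and 227.2TB for years 4 to 6; the yearly slides quote 56.4, 112.8, and 225.6TB, so they appear to start from a slightly different base in those years.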
Estimated Year 2 (14.2TB)
• Fermi Standard
  – Double slots
  – Add caching disk
  – Tape cost down 25%
• Configuration:
  – 2 backup blocks
  – Slots 2x year 1
  – Increased caching disk
• Fermi Active
  – ~Triple slots
  – Double backup blocks
  – Add caching disk
  – Tape cost down 25%
• Configuration:
  – 4 backup blocks
  – Slots ~3x year 1
  – Increased caching disk
Estimated Year 3 (28.4TB)
• Fermi Standard
  – Double slots
  – Add caching disk
• Configuration:
  – 2 backup blocks
  – Slots 4x year 1
  – Increased caching disk
• Fermi Active
  – Double backup blocks
  – Double slots
  – Add caching disk
  – Drive cost down 25%
• Configuration:
  – 8 backup blocks
  – Slots ~6.5x year 1
  – Increased caching disk
Estimated Year 4
• Tape technology changes
  – Media capacity quadruples
  – Drive performance quadruples
• 10GigE standard
  – Servers now equipped with 10GigE
  – Increased bus speeds
  – Generally available to the lab
Estimated Year 4
• Roll in new servers
  – Backup block == 1 server + 8 tape drives
• Migrate customers onto new servers
• Allow old tapes to migrate off via tape retention policies
• Slowly tear down old backup blocks
• Approximately 1 year migration
Estimated Year 4 (56.4TB)
• Fermi Standard
  – 2 new backup blocks
  – Increase slots by half
• Configuration:
  – 2 old backup blocks
  – 2 new backup blocks
  – 2/3 library old slots
  – 1/3 library new slots
  – Slots 6x year 1
• Fermi Active
  – 2 new backup blocks
  – Increase slots by half
• Configuration:
  – 8 old backup blocks
  – 2 new backup blocks
  – 2/3 library old slots
  – 1/3 library new slots
  – Slots ~10x year 1
Estimated Year 5 (112.8TB)
• Fermi Standard
  – Add caching disk
  – Convert slots
  – Tape cost down 25%
• Configuration:
  – 2 backup blocks
  – All slots new
    • 2/3 library used
    • 1/3 unused
  – Slots 6x year 1
• Fermi Active
  – Double backup blocks
  – Convert slots
  – Add caching disk
  – Tape cost down 25%
• Configuration:
  – 4 backup blocks
  – All slots new
    • 2/3 library used
    • 1/3 unused
  – Slots ~10x year 1
Estimated Year 6 (225.6TB)
• Fermi Standard
  – Add caching disk
  – Increase slots by 1/3
• Configuration:
  – 2 backup blocks
  – Slots 8x year 1
• Fermi Active
  – Add caching disk
  – 3 additional backup blocks
  – Increase slots 1/3
  – Drive cost down 25%
• Configuration:
  – 7 backup blocks
  – Slots ~13x year 1
Estimated 6-year TCO
Year   Spectra Standard  Spectra Active  ADIC Standard  ADIC Active
1      $329,719          $329,719        $279,328       $279,328
2      $135,064          $454,402        $142,848       $474,458
3      $132,569          $721,767        $234,420       $830,748
4      $582,057          $756,795        $723,142       E
5      $194,192          $623,280        $268,688       E
6      $367,681          $1.13M          $505,083       E
Total  ~$1.74M           ~$4.02M         ~$2.15M        E
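The totals row can be reproduced by summing the yearly figures. The sketch below is only an arithmetic check; the Spectra Active year 6 value is entered as 1,130,000 because the slide gives it only as $1.13M, and the ADIC Active column is left out because years 4 to 6 are marked "E" rather than given as amounts.

```python
# Yearly costs in dollars, years 1-6, from the TCO table.
TCO = {
    "Spectra Standard": [329_719, 135_064, 132_569, 582_057, 194_192, 367_681],
    # Year 6 is quoted only as $1.13M on the slide, so it is approximate here.
    "Spectra Active":   [329_719, 454_402, 721_767, 756_795, 623_280, 1_130_000],
    "ADIC Standard":    [279_328, 142_848, 234_420, 723_142, 268_688, 505_083],
}


def six_year_total(scenario):
    """Sum of the six yearly costs for one scenario."""
    return sum(TCO[scenario])


for scenario in TCO:
    total = six_year_total(scenario)
    print(f"{scenario}: ${total:,} (~${total / 1e6:.2f}M)")
```

The sums round to the ~$1.74M, ~$4.02M, and ~$2.15M shown in the totals row.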
Needed Decisions
Decisions – Library
• Share D0 ADIC robot?
  – Initially, makes most sense financially
  – Backup s/w will need to be ported
    • Custom engineering costs
      – Current s/w assumes all tapes are owned by backup service
    • We will be the only deployment
    • Fermi will need to provide development resources
      – Hardware
      – People/coordination
  – Active scenario outgrows robot at year 4. This assumes ZERO growth by D0 and SDSS (current users of robot).
  – Standard scenario will use 2/3 of D0 robot.
Decisions - Library
• Purchase new robot? (SpectraLogic)
  – Higher initial cost
  – Lower TCO over 6 years.
    • Years 2-6 lower operational costs than ADIC
  – Backup software already works with it. Service can be brought up more quickly.
  – SpectraLogic library running TiBS software to be deployed at MIT (not for public disclosure).
Decisions - Library
• Purchase new robot (part 2)?
  – Small footprint. Fully populated library is 7 racks wide, 2 racks deep.
  – Library supports SAIT, LTO, LTO-2, SDLT, DLT
  – In ’05, will support disk
  – Relatively new product. Higher possibility hardware costs will decrease.
Footprint Comparison
Critical Timeline
• 4/21: Division request for departments to determine how budgets will be spent
  – CSI/SCS robots running out of room
    • Windows support 2.3TB pending decision
  – D0 Legato renewal
• 6/1/04: Promised date to CMS for backup service to be online
• 6/30/04: Latest time for TiBS to begin porting effort to make FY ’04 timeline.
Fini
Open Discussion
CSI Backup System - Example
• Initially deployed in 2001 for AFS backups
• Expanded to include UNIX and some Windows
• 2003 – SCS added to backup system
• Data has grown >600% since 2001.
• Backup system has not grown
CSI Backup System - Example