The NERSC Global File System NERSC June 12th, 2006


Transcript of The NERSC Global File System NERSC June 12th, 2006

Page 1: The NERSC Global File System

NERSC

June 12th, 2006

Page 2: Overview

• NGF: What/Why/How
• NGF Today
  – Architecture
  – Who’s Using it
  – Problems/Solutions
• NGF Tomorrow
  – Performance Improvements
  – Reliability Enhancements
  – New Filesystems (/home)

Page 3: What is NGF?

Page 4: NERSC Global File System - what

• What do we mean by a global file system?

– Available via standard APIs for file system access on all NERSC systems (a minimal access sketch follows below):

• POSIX

• MPI-IO

– We plan to extend that access to remote sites via future enhancements.

– High Performance

• NGF is seen as a replacement for our current file systems, and is expected to meet the same high performance standards
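To make the two APIs above concrete, here is a minimal sketch of reading a file under /project, first through POSIX calls and then through MPI-IO via the mpi4py binding. The project path and file name are hypothetical, and the slides do not prescribe any particular binding.

```python
import os
from mpi4py import MPI  # MPI-IO access through the mpi4py binding (illustrative choice)

path = "/project/myproj/data.bin"  # hypothetical /project path

# POSIX access: plain open/read works identically on every system that mounts NGF.
fd = os.open(path, os.O_RDONLY)
header = os.read(fd, 4096)
os.close(fd)

# MPI-IO access: all ranks open the same file and each reads its own 1 MiB slice.
comm = MPI.COMM_WORLD
fh = MPI.File.Open(comm, path, MPI.MODE_RDONLY)
buf = bytearray(1 << 20)
fh.Read_at(comm.rank * len(buf), buf)  # offset each rank by its rank index
fh.Close()
```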

Page 5: NERSC Global File System - why

• Increase user productivity

– Reduce users’ data management burden

– Enable/Simplify workflows involving multiple NERSC computational systems

– Accelerate the adoption of new NERSC systems

• Users have access to all of their data, source code, scripts, etc. the first time they log into the new machine

• Enable more flexible/responsive management of storage

– Increase Capacity/Bandwidth on demand

Page 6: NERSC Global File System - how

• Parallel
• Network/SAN heterogeneous access model
• Multi-platform (AIX/Linux for now)

Page 7: NGF Today

Page 8: NGF current architecture

• NGF is a GPFS file system using GPFS multi-cluster capabilities

• Mounted on all NERSC systems as /project

• External to all NERSC computational clusters

• Small Linux server cluster managed separately from the computational systems.

• 70 TB user visible storage. 50+ Million inodes.

• 3 GB/s aggregate bandwidth

Page 9: NGF Current Configuration

Page 10: /project

• Limited initial deployment - no homes, no /scratch

• Projects can include many users, potentially working on multiple systems (MPP, vis, …), and seemed to be prime candidates to benefit from the NGF shared data access model

• Backed up to HPSS bi-weekly

– Will eventually receive nightly incremental backups.

• Default project quota:

– 1 TB

– 250,000 inodes
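As a rough way to see how a project sits against those defaults, the sketch below tallies bytes and inodes under a project directory with a du-style walk. The path is hypothetical, and GPFS reports quota usage through its own tools, so this is only an approximation.

```python
import os

QUOTA_BYTES = 1 * 1024**4   # default /project space quota: 1 TB (treated here as TiB)
QUOTA_INODES = 250_000      # default /project inode quota

def project_usage(root):
    """Rough du-style tally of bytes and inodes (files plus directories) under root."""
    bytes_used, inodes_used = 0, 0
    for dirpath, dirnames, filenames in os.walk(root):
        inodes_used += 1 + len(filenames)  # this directory plus the files it holds
        for name in filenames:
            try:
                bytes_used += os.lstat(os.path.join(dirpath, name)).st_size
            except OSError:
                pass  # file disappeared mid-walk; skip it
    return bytes_used, inodes_used

used_bytes, used_inodes = project_usage("/project/myproj")  # hypothetical project directory
print(f"{used_bytes / QUOTA_BYTES:.0%} of space quota, "
      f"{used_inodes / QUOTA_INODES:.0%} of inode quota")
```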

Page 11: /project – 2

• Current usage
  – 19.5 TB used (28% of capacity)
  – 2.2 M inodes used (5% of capacity)

• NGF /project is currently mounted on all major NERSC systems (1240+ clients):

– Jacquard, LNXI Opteron System running SLES 9

– Da Vinci, SGI Altix running SLES 9 Service Pack 3 with direct storage access

– PDSF, IA32 Linux cluster running Scientific Linux

– Bassi, IBM Power5 running AIX 5.3

– Seaborg, IBM SP running AIX 5.2

Page 12: /project – Problems & Solutions

• /project has not been without its problems
  – Software bugs

• 2/14/06 outage due to Seaborg gateway crash – problem reported to IBM, new PTF with fix installed.

• GPFS on AIX5.3 ftruncate() error on compiles – problem reported to IBM. efix now installed on Bassi.

– Firmware bugs
  • FibreChannel Switch bug – firmware upgraded.
  • DDN firmware bug (triggered on rebuild) – firmware upgraded.

– Hardware Failures
  • Dual disk failure in RAID array – more exhaustive monitoring of disk health, including soft errors, now in place

Page 13: NGF – Solutions

• General actions taken to improve reliability.

– Pro-active monitoring – see the problems before they’re problems

– Procedural development – decrease time to problem resolution/perform maintenance without outages

– Operations staff activities – decrease time to problem resolution

– PMRs filed and fixes applied – prevent problem recurrence

– Replacing old servers – remove hardware with demonstrated low MTBF

• NGF availability since 12/1/05: ~99% (total downtime: 2439 minutes)
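A quick back-of-the-envelope check of that figure, assuming the measurement window runs from 12/1/05 to the date of this talk:

```python
from datetime import datetime

window = datetime(2006, 6, 12) - datetime(2005, 12, 1)   # 193 days
total_minutes = window.days * 24 * 60                     # 277,920 minutes in the window
downtime_minutes = 2439                                   # total downtime from the slide

availability = 1 - downtime_minutes / total_minutes
print(f"{availability:.1%}")                              # ~99.1%, i.e. the "~99%" above
```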

Page 14: Current Project Information

• Projects using /project file system (46 projects to date):
  – narccap: North American Regional Climate Change Assessment Program – Phil Duffy, LLNL
    • Currently using 4.1 TB
    • Global model with fine resolution in 3D and time; will be used to drive regional models
    • Currently using only Seaborg
  – mp107: CMB Data Analysis – Julian Borrill, LBNL
    • Currently using 2.9 TB
    • Concerns about quota management and performance
      – 16 different file groups

Page 15: Current Project Information

• Projects using /project file system (cont.):
  – incite6: Molecular Dynameomics – Valerie Daggett, UW
    • Currently using 2.1 TB
  – snaz: Supernova Science Center – Stan Woosley, UCSC
    • Currently using 1.6 TB

Page 16: Other Large Projects

Project    PI                  Usage
snap       Saul Perlmutter     922 GB
aerosol    Catherine Chuang    912 GB
acceldac   Robert Ryne         895 GB
vorpal     David Bruhwiler     876 GB
m526       Peter Cummings      759 GB
gc8        Martin Karplus      629 GB
incite7    Cameron Geddes      469 GB

Page 17: NGF Performance

• Many users have reported good performance for their applications (little difference from /scratch)

• Some applications show variability in read performance (MADCAP/MADbench) – we are actively investigating this.

Page 18: MADbench Results

Operation               Min    Max    Mean   StdDev
Bassi Home Read         12.3   35.3   22.0   3.5
Bassi Home Write        28.2   46.5   32.9   1.8
Bassi Scratch Read      2.6    27.1   3.3    1.6
Bassi Scratch Write     1.2    8.5    2.0    0.5
Bassi Project Read      10.9   245.2  56.7   58.0
Bassi Project Write     8.5    21.7   9.8    0.9
Seaborg Home Read       33.8   103.9  41.3   6.9
Seaborg Home Write      17.8   22.9   19.3   0.9
Seaborg Scratch Read    24.8   56.5   37.8   2.4
Seaborg Scratch Write   4.9    14.0   10.4   1.8
Seaborg Project Read    34.9   261.2  56.2   34.7
Seaborg Project Write   13.9   135.5  17.1   7.9
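For context, the Min/Max/Mean/StdDev columns above are the usual reduction of repeated per-run measurements; a minimal sketch of that reduction over made-up timings (not actual MADbench output):

```python
import statistics

runs = [11.2, 12.5, 240.0, 13.1, 12.8]   # hypothetical per-run timings for one file system

summary = {
    "min": min(runs),
    "max": max(runs),
    "mean": statistics.mean(runs),
    "stdev": statistics.pstdev(runs),    # spread across runs; large when one run is an outlier
}
print(summary)
```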

Page 19: Bassi Read Performance

Page 20: Bassi Write Performance

Page 21: Current Architecture Limitations

• NGF performance is limited by the architecture of current NERSC systems

– Most NGF I/O uses the GPFS TCP/IP storage access protocol
  • Only Da Vinci can access NGF storage directly via FC.

– Most NERSC systems have limited IP bandwidth outside of the cluster interconnect.

• 1 GigE per I/O node on Jacquard. Each compute node uses only 1 I/O node for NGF traffic. 20 I/O nodes feed into one 10 Gb Ethernet link (see the sketch after this list).

• Seaborg has 2 gateways with 4x GigE bonds. Again, each compute node uses only 1 gateway.

• Bassi nodes each have 1 Gb interfaces, all feeding into a single 10 Gb Ethernet link.
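To make the Jacquard numbers above concrete, the sketch below works out the nominal per-compute-node and aggregate caps they imply; the link rates are theoretical line rates, and real GPFS-over-TCP throughput will be lower.

```python
GIGE_MB_S = 1000 / 8        # ~125 MB/s nominal for one gigabit Ethernet link
TEN_GIGE_MB_S = 10000 / 8   # ~1250 MB/s nominal for the shared 10 Gb uplink

IO_NODES = 20               # Jacquard I/O nodes, one GigE each

per_compute_node_cap = GIGE_MB_S                       # each compute node routes NGF
                                                       # traffic through a single I/O node
aggregate_io_cap = IO_NODES * GIGE_MB_S                # 20 x 1 GigE = ~2500 MB/s
aggregate_cap = min(aggregate_io_cap, TEN_GIGE_MB_S)   # but the 10 Gb uplink caps it at ~1250 MB/s

print(f"per node: ~{per_compute_node_cap:.0f} MB/s, aggregate: ~{aggregate_cap:.0f} MB/s")
```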

Page 22: NGF tomorrow (and beyond …)

Page 23: Performance Improvements

• NGF client system performance upgrades
  – Increase client bandwidth to NGF via hardware and routing improvements.
• NGF storage fabric upgrades
  – Increase bandwidth and ports of the NGF storage fabric to support future systems.
• Replace old NGF servers
  – New servers will be more reliable.
  – 10 Gb Ethernet capable.

• New systems will be designed to support high performance to NGF.

Page 24: NGF /home

• We will deploy a shared /home file system in 2007

– Initially the home file system for only one system; it may be mounted on others.

– All new systems thereafter will have their home directories on NGF /home.

– Will be a new file system with tuning parameters configured for small file accesses.

Page 25: /home layout – decision slide

Two options

1. A user’s login directory is the same for all systems

– /home/matt/

2. A user’s login directory is a different subdirectory of the user’s directory for each system

– /home/matt/seaborg

– /home/matt/jacquard

– /home/matt/common

– /home/matt/seaborg/common -> ../common
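A minimal sketch of what option 2 could look like on disk, using the example user and system names from this slide (the actual layout policy is still under evaluation):

```python
import os

def make_per_system_home(base, user, systems):
    """Create a per-system home layout with one shared 'common' directory and a
    'common' symlink inside each system directory, as in the example paths above."""
    userdir = os.path.join(base, user)
    os.makedirs(os.path.join(userdir, "common"), exist_ok=True)
    for system in systems:
        sysdir = os.path.join(userdir, system)
        os.makedirs(sysdir, exist_ok=True)
        link = os.path.join(sysdir, "common")
        if not os.path.islink(link):
            os.symlink("../common", link)  # e.g. /home/matt/seaborg/common -> ../common

make_per_system_home("/home", "matt", ["seaborg", "jacquard"])
```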

Page 26: One directory for all

• Users see exactly the same thing in their home dir every time they log in, no matter what machine they’re on.

• Problems

– Programs sometimes change the format of their configuration files (dotfiles) from one release to another without changing the file’s name.

– Setting $HOME affects all applications, not just the one that needs different config files

– Programs have been known to use getpwnam() to determine the user’s home directory and look there for config files rather than in $HOME (illustrated below)

– Setting $HOME essentially emulates the effect of having separate home dirs for each system
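The last two bullets can be seen directly in code: overriding $HOME changes what environment-honoring programs see, but a program that asks the password database via getpwnam() still gets the recorded home directory. A minimal illustration (the user name is just the example from the previous slide):

```python
import os
import pwd

os.environ["HOME"] = "/home/matt/seaborg"   # what setting $HOME changes

env_home = os.environ["HOME"]               # programs honoring $HOME see this
pw_home = pwd.getpwnam("matt").pw_dir       # programs calling getpwnam() see the
                                            # directory recorded in the password database

print(env_home, pw_home)                    # the two can disagree
```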

Page 27: One directory per system

• By default users start off in a different directory on each system

• Dotfiles are different on each system unless the user uses symbolic links to make them the same

• All of a user’s files are accessible from all systems, but a user may need to “cd ../seaborg” to get at files created on Seaborg when logged into a different system

Page 28: NGF /home conclusion

• We currently believe that the multiple-directories option will result in fewer problems for users, but we are actively evaluating both options.

• We would welcome user input on the matter.

Page 29: NGF /scratch

• We plan to deploy a shared /scratch for NERSC-5 sometime in 2008.