
Future Requirements for NSF ATM Computing Jim Kinter, co-chair Presentation to CCSM Advisory Board 9 January 2008


Slide 1 - Title
Future Requirements for NSF ATM Computing. Jim Kinter, co-chair. Presentation to the CCSM Advisory Board, 9 January 2008.

Slide 2 - Charge to Committee
Committee: Formed as a sub-committee of AC-GEO to advise ATM on the future of computing; members selected by ATM from the atmospheric sciences research community and the supercomputing community.
Background: Recognizing that both advances in technology and the recent creation of the NSF Office of Cyberinfrastructure have opened up a wide range of opportunities for providing the computational services needed by atmospheric research, ATM wishes to survey the possible approaches. ATM seeks input from the atmospheric research community on how best to meet future needs, including how to minimize any potential disruption to individual research programs.
Charge: The panel is charged with advising ATM on the merits of different possible strategies for ensuring access to adequate high-end computing services for the atmospheric sciences community over the next decade. In particular, the panel is asked to:
1. Review relevant materials describing the anticipated computational requirements of the atmospheric science research community;
2. Develop a list of different possible strategies for meeting the atmospheric sciences computing needs over the period 2011-2016;
3. Provide an analysis of the merits of the various strategies developed under (2), including a discussion of the costs and benefits;
4. Provide a recommendation to ATM about the best strategy to pursue.
Report: Preliminary advice in early February; report to AC-GEO in April 2008.

Slide 3 - 40 Years of Supercomputing (section title)

Slides 4-10 - 40 Years of Supercomputing (one THEN/NOW comparison, built up incrementally across these slides; the consolidated content is as follows)
THEN:
- Top speed: 10^4 FLOPS
- "In 18 months, my computer will have 2X transistors (Moore's Law) and be 2X faster"
- The Cold War drives the computing industry
- Needed a big building with lots of cheap power and cooling
- Apps: NWP, missile trajectories
- ATM computing done at NCAR (an FFRDC)
NOW:
- Top speed: 10^13 FLOPS (a gain of about 10^9 X)
- "In 18 months, my chips will have 2X transistors (Moore's Law)"
- Entertainment (video gaming) and finance sectors drive the computing industry
- Need a big building with lots of expensive power and cooling
- Apps: NWP, climate modeling, bomb design, circuit design, aerospace design, molecular dynamics, solar interior, human blood flow, earthquake modeling, N-body problem, QCD; AR4: CCSM development at NCAR-CSL, production elsewhere (DOE, Earth Simulator)
IN BETWEEN:
- The client-server model and commodity clusters significantly reduced power/cooling requirements
BUT:
- Near future: power cost = system cost
(Slide 8 is a figure-only slide.)

Slide 11 - CSL Resources for CCSM
Process:
- CCSM Production gets special status
- All other requests are reviewed for computational appropriateness, readiness, criticality, and relevance both to climate simulation and to their own scientific goals
- An overall correction is applied if merit-based allocations are above/below available resources
CCSM Production: 6.5M CPU-hrs over Dec 2007 - May 2009 (~750 CPU-yrs). Issues:
- Flat sub-allocation of resources to CCSM working groups - why no priorities?
- Insufficient interaction among WGs to coordinate numerical experiments
CCSM Development: 3.1M CPU-hrs over Dec 2007 - May 2009 (~350 CPU-yrs). Issues:
- Too little effort to move toward petascale computing
- Worries about sub-critical human resources for algorithms, HEC, etc.
- Same concerns as expressed for the Production request

Slide 12 - AR5 Production Elsewhere
- DOE: NERSC and ORNL (~40K CPU-yrs/yr)
- NASA: Columbia (~10K CPU-yrs/yr)
- NOAA: GFDL???
- International: Earth Simulator? (ES-2???)
- Industry: IBM? Cray?? SGI???

Slide 13 - AR5 Production Elsewhere (repeats the Slide 12 list, without NOAA, and adds NSF)
- NSF: TeraGrid - 12K CPU-yrs/yr in 2007; 80K in 2008; 150K in 2009; 250K in 2010; 500K in 2011

Slide 14 - TeraGrid FY2007 Usage (pie chart)
- By division/category: BIO 24%, CHE 15%, AST 14%, PHY 13%, GEO 8%, Industry 8%, DMR 7%, ENG 7%, CIS 4%, Other 1%, DMS 0%
- MPS (PHY, AST, DMR, DMS, CHE): 49% (incl. ATM)

Slide 15 - TeraGrid HPC Usage by Site, FY2006.5 (4/01/06 through 3/31/07)
- FY2006.5 total: ~110M CPU-hrs, or ~12.5K CPU-yrs
- Systems: Dell Intel64 Linux (9600), Dell PowerEdge Linux (5840), IBM e1350 (3072), Cray XT3 (4136), IBM Blue Gene (6144 + 2048), IBM Power4+ (2176)
- By site: NCSA 37%, SDSC 23%, PSC 20%, TACC 15%, Indiana 2%, Purdue 2%, ANL 1%, ORNL 0%
- NCSA: 24 ATM-related projects, 13 universities, 42 CPU-years
- NCAR: 400 ATM-related projects, 100 universities, 1.4K CPU-years
- Planned additions: +69K CPU-yrs in 2008 (Sun Opteron 4-core), +80K CPU-yrs in 2009 (Cray Opteron 8-core), +400K CPU-yrs in 2011 (IBM Power7 8-core)
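Several of the figures quoted in the deck are simple unit conversions or sums. A quick sanity check is sketched below; it assumes 8,760 CPU-hours per CPU-year (24 x 365), since the deck does not state which conversion it uses, and the "one doubling per 18 months" reading of Moore's Law from the slides.

```python
import math

HOURS_PER_YEAR = 24 * 365  # 8760; assumed conversion, not stated in the deck

# Slides 4-10: top speed grew from 1e4 to 1e13 FLOPS, a factor of 1e9.
# At one doubling every 18 months, that growth takes:
doublings = math.log2(1e13 / 1e4)       # ~29.9 doublings
years = doublings * 1.5                 # ~45 years, roughly the "40 Years" of the title

# Slide 11: CSL allocations, quoted in both CPU-hours and CPU-years.
production = 6.5e6 / HOURS_PER_YEAR     # ~742, matching the "~750 CPU-yrs" on the slide
development = 3.1e6 / HOURS_PER_YEAR    # ~354, matching "~350 CPU-yrs"

# Slide 14: the MPS share is the sum of its divisions' shares.
mps = 13 + 14 + 7 + 0 + 15              # PHY + AST + DMR + DMS + CHE = 49%

# Slide 15: ~110M CPU-hrs total for FY2006.5.
total = 110e6 / HOURS_PER_YEAR          # ~12,557, matching the "~12.5K CPU-yrs" quoted

print(f"{doublings:.1f} doublings over ~{years:.0f} years")
print(f"Production ~{production:.0f} CPU-yrs; Development ~{development:.0f} CPU-yrs")
print(f"MPS share {mps}%; FY2006.5 total ~{total:.0f} CPU-yrs")
```

Each quoted round number is consistent with the underlying CPU-hour figures under this assumed conversion.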