Taking Supercomputer Power to the End of Moore’s Law and...
Review & Approval System - Search Detail https://cfwebprod.sandia.gov/cfdocs/RAA/templates/index.cfm
Submittal Details
Document Info
Title : Copy of Taking Supercomputer Power to the End of Moore’s Law and Beyond
Document Number : 5228960
SAND Number : 2005-0454 P
Review Type : Electronic
Status : Approved
Sandia Contact : DEBENEDICTIS,ERIK P.
Submittal Type : Viewgraph/Presentation
Requestor : DEBENEDICTIS,ERIK P.
Submit Date : 01/18/2005
Peer Reviewed? : N
Author(s) : DEBENEDICTIS,ERIK P.
Event (Conference/Journal/Book) Info
Name : Meetings with various visitors (NSA, Lockheed-Martin) at Sandia
City : Albuquerque
State : NM
Country : USA
Start Date : 12/16/2004
End Date : 12/31/2005
Partnership Info
Partnership Involved : No
Partner Approval :
Agreement Number :
Patent Info
Scientific or Technical in Content : Yes
Technical Advance : No
TA Form Filed : No
SD Number :
Classification and Sensitivity Info
Title : Unclassified-Unlimited
Abstract :
Document : Unclassified-Unlimited
Additional Limited Release Info : None.
DUSA : None.
Routing Details
Role | Routed To | Approved By | Approval Date | Conditions
Derivative Classifier Approver | SUMMERS,RANDALL M. | SUMMERS,RANDALL M. | 01/18/2005 |
Preliminary Manager Approver | PUNDIT,NEIL D. | PUNDIT,NEIL D. | 01/18/2005 |
Classification Approver | WILLIAMS,RONALD L. | WILLIAMS,RONALD L. | 01/24/2005 |
Manager Approver | PUNDIT,NEIL D. | Auto-Approved | 01/24/2005 |
Administrator Approver | LUCERO,ARLENE M. | KRAMER,SAMUEL | 04/18/2007 |
Erik P. DeBenedictis
Sandia National Laboratories
December 16, 2004
Taking Supercomputer Power to the End of Moore’s Law and Beyond
Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company, for the United States Department of Energy's National Nuclear Security Administration under contract DE-AC04-94AL85000.
SAND2005-0454P
Applications & Hardware
Figure: System Performance (100 Teraflops to 1 Zettaflops) versus Year (2000 to 2030).
Technology:
• Red Storm Cluster/MPP
• Architecture: IBM Cyclops, FPGA, PIM
• Nanotech + Reversible Logic μP (green), best-case logic (red)
• Quantum Computing: Not Obviously Measured in FLOPS
Applications:
• Plasma Fusion Simulation [Jardin 03]
• Compute as fast as the engineer can think [NASA 99]
• 100×, 1000× [SCaLeS 03]
• Geodata Earth Station Range [NASA 02]
• Full Global Climate [Malone 03]
• No schedule provided by source
[Jardin 03] S.C. Jardin, “Plasma Science Contribution to the SCaLeS Report,” Princeton Plasma Physics Laboratory, PPPL-3879 UC-70, available on Internet.
[Malone 03] Robert C. Malone, John B. Drake, Philip W. Jones, Douglas A. Rotman, “High-End Computing in Climate Modeling,” contribution to SCaLeS report.
[NASA 99] R. T. Biedron, P. Mehrotra, M. L. Nelson, F. S. Preston, J. J. Rehder, J. L. Rogers, D. H. Rudy, J. Sobieski, and O. O. Storaasli, “Compute as Fast as the Engineers Can Think!” NASA/TM-1999-209715, available on Internet.
[NASA 02] NASA Goddard Space Flight Center, “Advanced Weather Prediction Technologies: NASA’s Contribution to the Operational Agencies,” available on Internet.
[SCaLeS 03] Workshop on the Science Case for Large-scale Simulation, June 24-25, proceedings on Internet at http://www.pnl.gov/scales/.
[DeBenedictis 04] Erik P. DeBenedictis, “Matching Supercomputing to Progress in Science,” July 2004. Presentation at Lawrence Berkeley National Laboratory, also published as Sandia National Laboratories SAND report SAND2004-3333P. Sandia technical reports are available by going to http://www.sandia.gov and accessing the technical library.
Outline
• Architecture Advances to Complete the Run of Moore’s Law
• Nanotech and Reversible Logic to Solve the Most Ambitious Problems
• Quantum Computing Alternative
FPGA, PIM, and ASIC
• Sandia-based FPGA accelerator work
  – Arithmetic IP with interface to supercomputers and algorithms
  – Keith Underwood, Scott Hemmert
• In-house work and collaborations on PIM
  – Collaboration with Notre Dame and Caltech/JPL; instruction sets for PIMs
  – BG/Cyclops with NASA/NSA (see subsequent slides)
  – AMD collaboration re. PIM/DIMM
• 9200 manages ASIC development
  – Cray SeaStar for supercomputer communications
• Sandia has microelectronic fab capability (1700/CINT)
FPGA = Field Programmable Gate Array; IP = Intellectual Property (logic designs); PIM = Processor in Memory; DIMM = memory chip; 9200 = Bill Camp’s Organization; ASIC = Application Specific Integrated Circuit; 1700 = Sandia’s rad-hard fab line; CINT = Center for Integrated NanoTechnologies
1 Petaflops Cyclops Sept. 2005 $15-18M
Cyclops Collaboration
• Cyclops POC is Baron Mills
• Project Objectives
  – Put systems software onto Cyclops to make it a “production” supercomputer
  – Demonstrate 4 Science Applications @ 100× FLOPS/$ over conventional supercomputer
• Funding status
  – Proposal to NASA, currently under consideration
  – Seeking other funding
• Possible Value of a Collaboration
  – Special-purpose machine becomes general purpose, increasing utility
Outline
• Architecture Advances to Complete the Run of Moore’s Law
• Nanotech and Reversible Logic to Solve the Most Ambitious Problems
• Quantum Computing Alternative
Global Warming Requires Zettaflops
Issue | Scaling | Performance
Clusters Now In Use (100 nodes, 5% efficient) | | 10 Gigaflops
Spatial Resolution | 10^4× (10^3× to 10^5×), Resolution | 100 Teraflops
Model Completeness | 100×, More Complex Physics | 10 Petaflops
New parameterizations | 100×, More Complex Physics | 1 Exaflops
Run length | 100×, Longer Running Time | 100 Exaflops
Ensembles, scenarios | 10×, Embarrassingly Parallel | 1 Zettaflops
Ref. “High-End Computing in Climate Modeling,” Robert C. Malone, LANL, John B. Drake, ORNL, Philip W. Jones, LANL, and Douglas A. Rotman, LLNL (2004)
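The scaling factors on this slide multiply out from 10 Gigaflops to 1 Zettaflops. A minimal Python sketch of that arithmetic follows. It assumes the central 10^4× value for spatial resolution, and the order in which the three 100× steps are applied is an assumption that does not change the final figure. The names `BASELINE_FLOPS`, `SCALING_STEPS`, and `cumulative_requirements` are illustrative only.

```python
# Cumulative compute requirement for the climate problem, starting from
# the clusters now in use. Factors come from the slide; the order of the
# three 100x steps is an assumption made for this sketch.
BASELINE_FLOPS = 10 * 10**9  # 10 Gigaflops (100 nodes, 5% efficient)

SCALING_STEPS = [
    ("Spatial resolution", 10**4),   # slide gives a range of 10^3x to 10^5x
    ("Model completeness", 100),
    ("New parameterizations", 100),
    ("Run length", 100),
    ("Ensembles, scenarios", 10),
]


def cumulative_requirements(baseline, steps):
    """Return a list of (issue, factor, cumulative FLOPS) tuples."""
    results = []
    flops = baseline
    for issue, factor in steps:
        flops *= factor
        results.append((issue, factor, flops))
    return results


requirements = cumulative_requirements(BASELINE_FLOPS, SCALING_STEPS)
for issue, factor, flops in requirements:
    print(f"{issue:<24} {factor:>6}x  ->  {flops:.0e} FLOPS")
```

The last entry comes to 10^21 FLOPS, which is the 1 Zettaflops in the slide title.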
Nanotech + Reversible Logic
• Leadership
  – Extreme Computing/Zettaflops workshop
    • www.zettaflops.org
  – Conference on Extreme Computing in planning stage
    • Similar in theme to Petaflops workshops of the 1990s
• Technology
  – Sandia work on
    • architecture
    • performance modeling of science apps.
  – Nanotech
    • Notre Dame Quantum Dots
    • Sandia/LANL CINT
  – Reversible Logic
    • Mike Frank, Florida State University
CINT = Center for Integrated NanoTechnologies
An Exemplary Device: Quantum Dots
• Pairs of molecules create a memory cell or a logic gate
Ref. “Clocked Molecular Quantum-Dot Cellular Automata,” Craig S. Lent and Beth Isaksen, IEEE Transactions on Electron Devices, Vol. 50, No. 9, September 2003
Atmosphere Simulation at a Zettaflops
Supercomputer is 211K chips, each with 70.7K nodes of 5.77K cells of 240 bytes; solves 86T = 44.1K × 44.1K × 44.1K cell problem.
System dissipates 332 KW from the faces of a cube 1.53 m on a side, for a power density of 47.3 KW/m^2.
Power: 332 KW active components; 1.33 MW refrigeration; 3.32 MW wall power; 6.65 MW from power company.
System has been inflated by 2.57 over minimum size to provide enough surface area to avoid overheating.
Chips are at 99.22% full, comprised of 7.07G logic, 101M memory decoder, and 6.44T memory transistors.
Gate cell edge is 34.4 nm (logic), 34.4 nm (decoder); memory cell edge is 4.5 nm (memory).
Compute power is 768 EFLOPS, completing an iteration in 224 µs and a run in 9.88 s.
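Two of the figures above can be cross-checked with simple arithmetic: the number of cells the hardware holds against the size of the cubic mesh, and the number of iterations implied by the run and iteration times. The Python sketch below uses only numbers quoted on the slide; the variable names are illustrative, and the slide's values are rounded, so agreement is only expected to within about one percent.

```python
# Figures quoted on the slide ("K" = 1e3, "T" = 1e12).
CHIPS = 211e3
NODES_PER_CHIP = 70.7e3
CELLS_PER_NODE = 5.77e3
MESH_EDGE = 44.1e3          # cells along each edge of the cubic mesh
ITERATION_TIME_S = 224e-6   # time to complete one iteration
RUN_TIME_S = 9.88           # time to complete a run

# Cells held by the machine versus cells in the 44.1K-cubed mesh.
cells_from_hardware = CHIPS * NODES_PER_CHIP * CELLS_PER_NODE
cells_from_mesh = MESH_EDGE ** 3

# Iterations implied by the quoted run and iteration times.
iterations_per_run = RUN_TIME_S / ITERATION_TIME_S

print(f"cells held by the machine: {cells_from_hardware:.3g}")
print(f"cells in the mesh:         {cells_from_mesh:.3g}")
print(f"iterations per run:        {iterations_per_run:.3g}")
```

Both cell counts come to roughly 86T, and the implied iteration count is close to 44.1K, the same as the number of cells along one mesh edge. The slide does not state why these coincide.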
Outline
• Architecture Advances to Complete the Run of Moore’s Law
• Nanotech and Reversible Logic to Solve the Most Ambitious Problems
• Quantum Computing Alternative
Quantum Computing Alternative
• Quantum Computing algorithms for “physical simulation” are headed toward addressing our mission space
• Sandia has top notch physicists and mathematicians
• Action (across Sandia)
  – LDRD on architecture below
  – Ion trap research (1700)
  – DES algorithm (5600)
Figure labels: Quantum Core; Future Red Storm; Visualization I/O
Differentiator
• Sandia has mfg. capability (1700/CINT)
• Track record assembling software and tools for production supercomputers (9200)
9200 = Bill Camp’s Organization; 1700 = Sandia’s rad-hard fab line; 5600 = Sandia Information Operations Organization; CINT = Center for Integrated NanoTechnologies; LDRD = Lab Directed Research and Development
Summary
• Architectures
  – 100× over Moore’s Law
  – Physics limit = 10 Exaflops
  – Action: Blue Gene/Cyclops proposal to NASA, etc.
    • For science applications with legacy code, etc.
    • BG/C=Baron Mills
  – Action: FPGA + PIM
• Nanotech + Reversible Logic
  – Most ambitious problems in science peak at 1 Zettaflops (today)
    • Global Warming, Whole Cell Simulation, …
    • Collaboration opportunity
  – Action: R&D, algorithms
  – Action: Workshops
    • Can you participate?
• Quantum Computing
  – Plan for alternative “Quantum Red Storm” & use for ambitious science and engineering problem
  – Action: R&D, planning
FPGA = Field Programmable Gate Array; PIM = Processor in Memory
Backup
Expert Opinion
Scientific Supercomputer Limits
Assumption: Supercomputer is size & cost of Red Storm: US$100M budget; consumes 2 MW wall power; 750 KW to active components
Best-Case Logic | Microprocessor Architecture | Physical Factor | Source of Authority
2×10^24 logic ops/s | | Reliability limit 750KW/(80 kB T) | Esteemed physicists (T=60°C junction temperature)
100 Exaflops | 800 Petaflops | Derate 20,000 convert logic ops to floating point | Floating point engineering (64 bit precision)
25 Exaflops | 200 Petaflops | Derate for manufacturing margin (4×) | Estimate
| | Uncertainty (6×) | Gap in chart
4 Exaflops | 32 Petaflops | Improved devices (4×) | Estimate
1 Exaflops | 8 Petaflops | Projected ITRS improvement to 22 nm (100×) | ITRS committee of experts
| 80 Teraflops | Lower supply voltage (2×) | ITRS committee of experts
| 40 Teraflops | Red Storm contract |
Best-Case Logic to Microprocessor Architecture ratio: 125:1
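The limits on this slide can be reproduced from the stated assumptions. The Python sketch below works from the top down for best-case logic (750 KW of active power divided by 80 kB·T per logic operation, then derated) and from the bottom up for the microprocessor architecture (starting from the 40 Teraflops Red Storm contract). Reading the bottom-up factors as successive multipliers is an interpretation of the chart, and the variable names are illustrative.

```python
BOLTZMANN_J_PER_K = 1.380649e-23   # Boltzmann constant (exact SI value)
JUNCTION_TEMP_K = 60.0 + 273.15    # T = 60 C junction temperature
ACTIVE_POWER_W = 750e3             # 750 KW to active components
ENERGY_PER_OP_KT = 80              # reliability limit: 80 kB*T per logic op

# Top-down: best-case logic.
energy_per_op_j = ENERGY_PER_OP_KT * BOLTZMANN_J_PER_K * JUNCTION_TEMP_K
logic_ops_per_s = ACTIVE_POWER_W / energy_per_op_j
best_case_flops = logic_ops_per_s / 20_000     # logic ops -> 64-bit floating point
best_case_with_margin = best_case_flops / 4    # manufacturing margin (4x)

# Bottom-up: microprocessor architecture, starting from Red Storm.
red_storm_flops = 40e12                        # 40 Teraflops contract
after_voltage = red_storm_flops * 2            # lower supply voltage (2x)
after_itrs = after_voltage * 100               # ITRS improvement to 22 nm (100x)
micro_flops = after_itrs * 4                   # improved devices (4x)

print(f"reliability limit:        {logic_ops_per_s:.2e} logic ops/s")
print(f"best-case floating point: {best_case_flops:.2e} FLOPS")
print(f"after 4x margin:          {best_case_with_margin:.2e} FLOPS")
print(f"microprocessor path:      {micro_flops:.2e} FLOPS")
```

The computed reliability limit is about 2×10^24 logic ops/s, matching the slide; the slide's 100 Exaflops and 25 Exaflops entries are the rounded forms of the next two values, and the bottom-up path ends at 32 Petaflops.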
Cyclops Project
• Concept
  – Scientific applications can run with Cyclops’ mix of features and will benefit from its performance
    • We proposed 4 (next slides)
  – New ideas on threaded programming are OK, but can be compatible with current methods
    • So we do both
• Technical Effort Required
  – Start by creating a software environment that is nearly Red Storm compliant
    • Run existing apps, tools, file system
    • MPI+OpenMP
  – Add multi-threaded programming model from within full featured RS-like environment
Application 1: DSMC
• Direct Simulation Monte Carlo (DSMC)
  – Ran fine on nCUBE with 500K memory/node; ought to run on Cyclops with internal memory only
  – Needs FLOPS for simulating spacecraft at lower altitude
    • Earth
    • Mars
Application 2: Solar System Orbital Planning
• Mathematically, the solar system is full of tunnels that spacecraft can follow between planets at very low fuel consumption
• Basic calculation is just an orbital integration
Figure labels: Halo Orbit Around Earth L2, Portal to the IPS; Earth; Moon; Lunar L1 Halo Orbit; Lunar L2 Halo Orbit; A Piece of Earth’s IPS; Earth’s IPS Approaching the Halo Orbit Portal; Tunnels of the Lunar IPS
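As an illustration of what an orbital integration involves, the Python sketch below advances a single test particle around a point mass. The slide does not say which integrator or force model the proposed application uses; the two-body setup, the normalized units (GM = 1), and the velocity Verlet scheme are all assumptions chosen to keep the example short, and the function names are hypothetical. The tunnel calculation described above would need the gravity of more than one body.

```python
import math

GM = 1.0  # gravitational parameter in normalized units (assumption)


def acceleration(x, y):
    """Point-mass gravity from a body fixed at the origin."""
    r3 = (x * x + y * y) ** 1.5
    return -GM * x / r3, -GM * y / r3


def integrate_orbit(x, y, vx, vy, dt, steps):
    """Advance the state with velocity Verlet and return the final state."""
    ax, ay = acceleration(x, y)
    for _ in range(steps):
        # Half kick, full drift, then a second half kick
        # using the acceleration at the new position.
        vx += 0.5 * dt * ax
        vy += 0.5 * dt * ay
        x += dt * vx
        y += dt * vy
        ax, ay = acceleration(x, y)
        vx += 0.5 * dt * ax
        vy += 0.5 * dt * ay
    return x, y, vx, vy


def specific_energy(x, y, vx, vy):
    """Kinetic plus potential energy per unit mass."""
    return 0.5 * (vx * vx + vy * vy) - GM / math.hypot(x, y)


# Circular orbit of radius 1: speed 1, period 2*pi.
start = (1.0, 0.0, 0.0, 1.0)
steps = 10_000
end = integrate_orbit(*start, dt=2 * math.pi / steps, steps=steps)
print("position after one period:", end[0], end[1])
print("energy drift:", specific_energy(*end) - specific_energy(*start))
```

After one period the particle should return close to its starting point with little energy drift, which is a common sanity check for an integrator of this kind.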
Application 3: Structural Simulation
• Salinas structural simulation
  – Finite element code
  – Multi physics
Figure 8: Payload simulated for vibrational modes by Salinas at 500K degrees of freedom
Application 4: Climate Modeling
• Model Earth’s atmosphere
  – Runs a multi-physics cloud-resolving model
  – One processor in each chip runs global atmospheric dynamics
  – Other 64 processors run a cloud-resolving sub-model
Figure 6: Atmospheric Dynamics and Cloud Resolving Simulation on Cyclops
Figure labels: Cyclops Chip; 1. Cell of dynamical core; 2. Local threads of (10x8) cloud resolving model; Neighbor Chip (×3)
Quantum Computing for Physical Science
Problem: Perform a numerical integration over an 8 dimensional space, with 256 mesh points in each dimension (total 2^64 points).
Classical 8D integration: 2^64 function evaluations
    double sum = 0.0;
    for (int x1 = 0; x1 < 256; x1++)
    for (int x2 = 0; x2 < 256; x2++)
    for (int x3 = 0; x3 < 256; x3++)
    for (int x4 = 0; x4 < 256; x4++)
    for (int x5 = 0; x5 < 256; x5++)
    for (int x6 = 0; x6 < 256; x6++)
    for (int x7 = 0; x7 < 256; x7++)
    for (int x8 = 0; x8 < 256; x8++)
        sum += E(x1, x2, x3, x4, x5, x6, x7, x8);
Quantum 8D integration: 64 function evaluations
    double sum = 0.0;
    quantum int x1 = x2 = x3 = x4 = x5 = x6 = x7 = x8 = 1/√8(|00000000> + |11111111>);  // “wildcard” quantum value
    sum = SUM E(x1, x2, x3, x4, x5, x6, x7, x8);  // evaluate E for all 2^64 values of argument; SUM is sort of a “global sum”
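The classical side of this comparison can be tried at a reduced size. The Python sketch below performs the same nested-loop summation and counts function evaluations, using 4 mesh points per dimension instead of 256 so that it finishes quickly. The integrand `example_integrand` is a hypothetical stand-in for E, which the slide does not define, and `classical_sum` is an illustrative name. The quantum pseudocode is not modeled here.

```python
import itertools


def classical_sum(E, points_per_dim, dims=8):
    """Sum E over every mesh point, as the nested loops on the slide do.

    Returns the sum and the number of function evaluations performed.
    """
    total = 0.0
    evaluations = 0
    for point in itertools.product(range(points_per_dim), repeat=dims):
        total += E(*point)
        evaluations += 1
    return total, evaluations


def example_integrand(*coords):
    """Hypothetical stand-in for E: the sum of the mesh coordinates."""
    return float(sum(coords))


# Full problem size from the slide: 256 points in each of 8 dimensions.
full_evaluations = 256 ** 8
print("full problem equals 2^64 evaluations:", full_evaluations == 2 ** 64)

# Reduced mesh (4 points per dimension) so the loop finishes quickly.
total, evaluations = classical_sum(example_integrand, points_per_dim=4)
print("reduced problem:", evaluations, "evaluations, sum =", total)
```

The evaluation count grows as the number of points per dimension raised to the number of dimensions, which is why the full 256-point mesh needs 2^64 evaluations classically.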