Taking Supercomputer Power to the End of Moore’s Law and...

Post on 26-Feb-2020

0 views 0 download

Transcript of Taking Supercomputer Power to the End of Moore’s Law and...

Review & Approval System - Search Detail https://cfwebprod.sandia.gov/cfdocs/RAA/templates/index.cfm

1 of 2 1/2/2008 8:50 PM

New SearchRefine SearchSearch Results

Clone RequestEdit RequestCancel Request

Search Detail

Submittal DetailsDocument Info Title : Copy of Taking Supercomputer Power to the End of Moore’s Law and Beyond Document Number : 5228960 SAND Number : 2005-0454 P Review Type : Electronic Status : Approved Sandia Contact : DEBENEDICTIS,ERIK P. Submittal Type : Viewgraph/Presentation Requestor : DEBENEDICTIS,ERIK P. Submit Date : 01/18/2005 Peer Reviewed? : NAuthor(s) DEBENEDICTIS,ERIK P. Event (Conference/Journal/Book) Info Name : Meetings with various visitors (NSA, Lockheed-Martin) at Sandia City : Albuquerque State : NM Country : USA Start Date : 12/16/2004 End Date : 12/31/2005 Partnership Info Partnership Involved : No Partner Approval : Agreement Number : Patent Info Scientific or Technical in Content : Yes Technical Advance : No TA Form Filed : No SD Number : Classification and Sensitivity Info

Title : Unclassified-Unlimited Abstract : Document : Unclassified-Unlimited

Additional Limited Release Info : None.

DUSA : None.

Routing DetailsRole Routed To Approved By Approval Date

Derivative Classifier Approver SUMMERS,RANDALL M. SUMMERS,RANDALL M. 01/18/2005 Conditions:

Review & Approval System - Search Detail https://cfwebprod.sandia.gov/cfdocs/RAA/templates/index.cfm

2 of 2 1/2/2008 8:50 PM

Preliminary Manager Approver PUNDIT,NEIL D. PUNDIT,NEIL D. 01/18/2005 Conditions:

Classification Approver WILLIAMS,RONALD L. WILLIAMS,RONALD L. 01/24/2005 Conditions:

Manager Approver PUNDIT,NEIL D. Auto-Approved 01/24/2005 Conditions:

Administrator Approver LUCERO,ARLENE M. KRAMER,SAMUEL 04/18/2007

Created by WebCo Problems? Contact CCHD: by email or at 845-CCHD (2243).

For Review and Approval process questions please contact the Application Process Owner

Erik P. DeBenedictisErik P. DeBenedictisSandia National Laboratories

December 16, 2004

Taking Supercomputer Power to the End of Moore’s Law and Beyond

Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company, for the United States Department of Energy's National Nuclear Security Administration under contract DE-AC04-94AL85000.

SAND2005-0454P

Applications & Hardware

1 Zettaflops

100 Exaflops

10 Exaflops

1 Exaflops

100 Petaflops

10 Petaflops

1 Petaflops

100 Teraflops

System Performance

2000 2010 2020 2030 Year

Red Storm Cluster/MPP

Technology

Plasma Fusion

Simulation [Jardin 03]

2000 20202010

No schedule provided by source

Applications

[Jardin 03] S.C. Jardin, “Plasma Science Contribution to the SCaLeS Report,” Princeton Plasma Physics Laboratory, PPPL-3879 UC-70, available on Internet. [Malone 03] Robert C. Malone, John B. Drake, Philip W. Jones, Douglas A. Rotman, “High-End Computing in Climate Modeling,” contribution to SCaLeS report. [NASA 99] R. T. Biedron, P. Mehrotra, M. L. Nelson, F. S. Preston, J. J. Rehder, J. L. Rogers, D. H. Rudy, J. Sobieski, and O. O. Storaasli, “Compute as Fast as the Engineers Can Think!” NASA/TM-1999-209715, available on Internet. [NASA 02] NASA Goddard Space Flight Center, “Advanced Weather Prediction Technologies: NASA’s Contribution to the Operational Agencies,” available on Internet. [SCaLeS 03] Workshop on the Science Case for Large-scale Simulation, June 24-25, proceedings on Internet a http://www.pnl.gov/scales/. [DeBenedictis 04], Erik P. DeBenedictis, “Matching Supercomputing to Progress in Science,” July 2004. Presentation at Lawrence Berkeley National Laboratory, also published as Sandia National Laboratories SAND report SAND2004-3333P. Sandia technical reports are available by going to http://www.sandia.gov and accessing the technical library.

Compute as fast as the engineer

can think [NASA 99]

100× ↑1000×

[SCaLeS 03]

Geo

data

Ear

th

Sta

tion

Ran

ge

[NA

SA

02]

Full Global Climate [Malone 03]

Nanotech + Reversible Logic μP

(green) best-case logic (red)

Quantum Computing Not Obviously Measured

in FLOPS

Architecture: IBM Cyclops, FPGA, PIM

Outline

• Architecture Advances to Complete the Run of Moore’s Law

• Nanotech and Reversible Logic to Solve the Most Ambitious Problems

• Quantum Computing Alternative

FPGA, PIM, and ASIC

• Sandia-based FPGA accelerator work– Arithmetic IP with interface to supercomputers and

algorithms – Keith Underwood, Scott Hemmert

• In-house work and collaborations on PIM– Collaboration with Notre Dame and Caltech/JPL;

instruction sets for PIMs– BG/Cyclops with NASA/NSA (see subsequent slides)– AMD collaboration re. PIM/DIMM

• 9200 manages ASIC development– Cray SeaStar for supercomputer communications

• Sandia has microelectronic fab capability (1700/CINT)

FPGA = Field Programmable Gate Array; IP = Intellectual Property (logic designs); PIM = Processor in Memory; DIMM = memory chip; 9200 = Bill Camp’s Organization; ASIC = Application Specific Integrated Circuit; 1700 = Sandia’s rad-hard fab line; CINT = Center for Integrated NanoTechnologies

1 Petaflops Cyclops Sept. 2005 $15-18M

Cyclops Collaboration

• Cyclops POC is Baron Mills

• Project Objectives– Puts systems software

onto Cyclops to make it a “production” supercomputer

– Demonstrate 4 Science Applications @ 100×

FLOPS/$ over conventional supercomputer

• Funding status– Proposal to NASA,

currently under consideration

– Seeking other funding

• Possible Value of a Collaboration– Special-purpose

machine becomes general purpose, increasing utility

Outline

• Architecture Advances to Complete the Run of Moore’s Law

• Nanotech and Reversible Logic to Solve the Most Ambitious Problems

• Quantum Computing Alternative

Global Warming Requires Zettaflops

1 Zettaflops

1 Exaflops

10 Petaflops

100 Teraflops

10 Gigaflops

Ensembles, scenarios 10×

Embarrassingly Parallel

New parameterizations 100×

More Complex Physics

Model Completeness 100×

More Complex Physics

Spatial Resolution 104×

(103×-105×)Resolution

Issue Scaling

Clusters Now In Use(100 nodes, 5% efficient)

100 Exaflops Run length 100×

Longer Running Time

Ref. “High-End Computing in Climate Modeling,” Robert C. Malone, LANL, John B. Drake, ORNL, Philip W. Jones, LANL, and Douglas A. Rotman, LLNL (2004)

Nanotech + Reversible Logic

• Leadership– Extreme Computing/

Zettaflops workshop• www.zettaflops.org

– Conference on Extreme Computing in planning stage

• Similar in theme to Petaflops workshops of the 1990s

• Technology– Sandia work on

• architecture• performance modeling

of science apps.– Nanotech

• Notre Dame Quantum Dots

• Sandia/LANL CINT– Reversible Logic

• Mike Frank, Florida State University

CINT = Center for Integrated NanoTechnologies

An Exemplary Device: Quantum Dots

• Pairs of molecules create a memory cell or a logic gate

Ref. “Clocked Molecular Quantum-Dot Cellular Automata,” Craig S. Lent and Beth Isaksen IEEE TRANSACTIONS ON ELECTRON DEVICES, VOL. 50, NO. 9, SEPTEMBER 2003

Atmosphere Simulation at a ZettaflopsSupercomputer is 211K chips, each with 70.7K nodes of 5.77K cells of 240 bytes; solves 86T=44.1Kx44.1Kx 44.1K cell problem. System dissipates 332KW from the faces of a cube 1.53m on a side,for a power density of 47.3KW/m2. Power: 332KW active components; 1.33MW refrigeration; 3.32MW wall power; 6.65MW from power company.System has been inflated by 2.57 over minimum size to provide enough surface area to avoid overheating.Chips are at 99.22% full, comprised of 7.07G logic, 101M memory decoder, and 6.44T memory transistors.Gate cell edge is 34.4nm (logic) 34.4nm (decoder); memory cell edge is 4.5nm (memory).Compute power is 768 EFLOPS, completing an iteration in 224µs and a run in 9.88s.

Outline

• Architecture Advances to Complete the Run of Moore’s Law

• Nanotech and Reversible Logic to Solve the Most Ambitious Problems

• Quantum Computing Alternative

Quantum Computing Alternative

• Quantum Computing algorithms for “physical simulation” headed toward addressing our mission space

• Sandia has top notch physicists and mathematicians

• Action (across Sandia)– LDRD on architcture

below– Ion trap research (1700)– DES algorithm (5600)

Quantum Core

Future Red Storm

Visualization I/O

9200 = Bill Camp’s Organization; 1700 = Sandia’s rad-hard fab line; 5600 = Sandia Information Operations Organization CINT = Center for Integrated Nanotechnology; LDRD = Lab Directed Research and Development

Diff

eren

tiato

r • Sandia has mfg. capability (1700/CINT)

• Track record assembling software and tools for production super- computers (9200)

Summary

• Architectures– 100×

over Moore’s Law– Physics limit = 10 Exaflops– Action: Blue Gene/Cyclops

proposal to NASA, etc.• For science applications

with legacy code, etc.• BG/C=Baron Mills

– Action: FPGA + PIM• Nanotech + Reversible Logic

– Most ambitious problems in science peak at 1 Zettaflops (today)

• Global Warming, Whole Cell Simulation, …

• Collaboration opportunity– Action: R&D, algorithms– Action: Workshops

• Can you participate?

• Quantum Computing– Plan for alternative

“Quantum Red Storm” & use for ambitious science and engineering problem

– Action: R&D, planning

FPGA = Field Programmable Gate Array; PIM = Processor in Memory

Backup

8 Petaflops

80 Teraflops

Projected ITRS improvement to 22 nm

(100×)

Lower supply voltage (2×)

ITRS committee of experts

ITRS committee of experts

Expert Opinion

Scientific Supercomputer Limits

Reliability limit 750KW/(80kB T)2×1024 logic ops/s

Esteemed physicists (T=60°C junction temperature)

Best-Case Logic

Microprocessor Architecture

Physical Factor

Source of Authority

Assumption: Supercomputer is size & cost of Red Storm: US$100M budget; consumes 2 MW wall power; 750 KW to active components

100 Exaflops

Derate 20,000 convert logic ops to floating point

Floating point engineering(64 bit precision)

40 Teraflops Red Storm contract

1 Exaflops

800 Petaflops

125:1

Uncertainty (6×) Gap in chartEstimate

Improved devices (4×) Estimate4 Exaflops 32 Petaflops

Derate for manufacturing margin (4×)

Estimate

25 Exaflops 200 Petaflops

Cyclops Project

• Concept– Scientific applications

can run with Cyclops’ mix of features and will benefit from its performance

• We proposed 4 (next slides)

– New ideas on threaded programming are OK, but can be compatible with current methods

• So we do both

• Technical Effort Required– Start by creating a

software environment that is nearly Red Storm compliant

• Run existing apps, tools, file system

• MPI+OpenMP– Add multi-threaded

programming model from within full featured RS-like environment

Application 1: DSMC

• Direct Simulation Monte Carlo (DSMC)– Ran fine on nCUBE with

500K memory/node; ought to run on Cyclops with internal memory only

– Needs FLOPS for simulating spacecraft at lower altitude

• Earth• Mars

Application 2: Solar System Orbital Planning

• Mathematically, the solar system is full of tunnels that spacecraft can follow between planets at very low fuel consumption

• Basic calculation is just a orbital integration

Halo Orbit Around Earth L2 , Portal to the IPS

Earth

MoonLunar L1Halo Orbit

Lunar L2Halo Orbit

A Piece of Earth’s IPS

Earth’s IPS Approaching the Halo Orbit Portal

Tunnels of the Lunar IPSHalo Orbit Around Earth L2 , Portal to the IPS

Earth

MoonLunar L1Halo Orbit

Lunar L2Halo Orbit

A Piece of Earth’s IPS

Earth’s IPS Approaching the Halo Orbit Portal

Tunnels of the Lunar IPS

Application 3: Structural Simulation

• Salinas structural simulation– Finite element code– Multi physics

Figure 8: Payload simulated for vibrational modes by Salinas at 500K degrees of freedom freedom

Application 4: Climate Modeling

• Model Earth’s atmosphere– Runs a multi-physics

cloud-resolving model– One processor in each

chip runs global atmospheric dynamics

– Other 64 processors run a cloud-resolving sub-model

Cyclops Chip

1. Cell of dynamical core 2. Local threads of (10x8) cloud resolving model

Nei

ghbo

r Chi

p

Neighbor C

hip

Neighbor Chip

Figure 6: Atmospheric Dynamics and Cloud Resolving Simulation on Cyclops

Quantum Computing for Physical Science

Classical 8D integration264 function evaluations

double sum = 0.0;for (int x1 = 0; x1 < 256; x1++)for (int x2 = 0; x2 < 256; x2++)for (int x3 = 0; x3 < 256; x3++)for (int x4 = 0; x4 < 256; x4++)for (int x5 = 0; x5 < 256; x5++)for (int x6 = 0; x6 < 256; x6++)for (int x7 = 0; x7 < 256; x7++)for (int x8 = 0; x8 < 256; x8++)sum += E(x1, x2, x3, x4, x5, x6, x7 x8);

Quantum 8D integration64 function evaluations

double sum = 0.0;quantum int x1 = x2 = x3 = x4 =

x5 = x6 = x7 = x8 = 1/√8(|00000000> + |11111111>);

sum = SUM E(x1, x2, x3, x4, x5, x6, x7 x8);

Problem: Perform a numerical integration over an 8 dimensional space, with 256 mesh points in each dimension (total 264 points).

“wildcard” quantum

value

Evaluate E for all 264 values of

argument

Sort of a “global sum”