One step ahead
The Challenges of Architectures that Grow to
Petascale and can be Sustained Economically
Steve Reinhardt
Principal Engineer, SGI
spr at sgi.com
SGI’s systems are evolving to enable ultrascale versions of today’s
applications and enable a new type of computational science, while remaining
economically sustainable.
Agenda
• Besides Architecture…
• Enabling Ultra-scale Applications
• Enabling New Computational Science
• Sustaining Economically
Besides Hardware Architecture ...
• Efficient execution environment
• RAS
• OS architecture
  – Linux scaled aggressively, with multiple instances in ultrascale configurations
• Robust scheduling
• RAS
• Packaging density / heat dissipation
• RAS
Agenda
• Besides Architecture…
• Enabling Ultra-scale Applications
• Enabling New Computational Science
• Sustaining Economically
Local Performance: Needed Flexibility of Memory Access
Note: Original (Jan2003) models used for both X1 and Altix
[Chart: Absolute Performance. Bandwidth (GB/s), log scale 0.1 to 100, by access pattern (cache, stride-1, gather/scatter); series: X1, Altix]
[Chart: Price Performance. Cost of bandwidth (MB/s per $), log scale 0.01 to 10.00, by access pattern (cache, stride-1, gather/scatter); series: X1, Altix]
Driven by focus of engineering team
Driven by cost of large engineering team
Driven by parts replication cost
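The three access patterns on the chart axis can be made concrete with a minimal sketch. It only illustrates what each pattern touches in memory; it is not a benchmark, and the function names, block size, and sample data are hypothetical rather than taken from the talk.

```python
# Illustrative only: what "cache", "stride-1", and "gather/scatter" access mean.
# All names and sizes here are hypothetical.

def cache_resident_sum(data, block=4, repeats=3):
    """Cache-like reuse: sweep the same small block repeatedly."""
    small = data[:block]
    total = 0.0
    for _ in range(repeats):
        for x in small:
            total += x
    return total

def stride1_sum(data):
    """Stride-1: touch consecutive elements once, in order."""
    total = 0.0
    for i in range(len(data)):
        total += data[i]
    return total

def gather(data, index):
    """Gather: read elements at arbitrary positions given by an index vector."""
    return [data[j] for j in index]

def scatter(values, index, size):
    """Scatter: write values to arbitrary positions given by an index vector."""
    out = [0.0] * size
    for v, j in zip(values, index):
        out[j] = v
    return out

data = [float(i) for i in range(8)]
index = [5, 0, 7, 2]
picked = gather(data, index)
restored = scatter(picked, index, len(data))
```

In the gather and scatter cases the addresses come from the index vector, so the hardware cannot rely on reuse or on consecutive addresses.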
Ideal Machine (Technical/Economic Balance)
• High, cost-effective cache bandwidth of mass market parts
• Highest cost-effective memory bandwidth
• Design focus on gather/scatter
[Chart: Absolute Performance. Bandwidth (GB/s), log scale 0.1 to 100, by access pattern (cache, stride-1, gather/scatter); series: X1, Altix, ideal]
[Chart: Price Performance. Cost of bandwidth (MB/s per $), log scale 0.01 to 10.00, by access pattern (cache, stride-1, gather/scatter); series: X1, Altix, ideal]
Note: For O(100KP) petascale machines, value of O(5X) processor performance advantage is less than today
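The price-performance axis used on these charts (MB/s per $) is simple arithmetic, sketched below. The decimal GB-to-MB conversion is an assumption, and the two example machines are hypothetical placeholders, not figures for X1 or Altix.

```python
def mb_per_s_per_dollar(bandwidth_gb_s, cost_dollars):
    """Price performance: bandwidth in MB/s divided by cost in dollars.

    Assumes decimal units (1 GB/s = 1000 MB/s).
    """
    return bandwidth_gb_s * 1000.0 / cost_dollars

# Hypothetical placeholder figures, NOT measurements from the talk.
fast_expensive = mb_per_s_per_dollar(bandwidth_gb_s=20.0, cost_dollars=40000.0)
slow_cheap = mb_per_s_per_dollar(bandwidth_gb_s=4.0, cost_dollars=2000.0)

# A part can win on absolute bandwidth and still lose on bandwidth per dollar.
cheaper_part_wins = slow_cheap > fast_expensive
```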
Local Performance: Multi-Paradigm
[Diagram: application space plotted by data locality (low to high) and compute intensity (low to high), with regions labeled Vector-like, PIM-like, Scalar, and Application-specific]
Ultraviolet : Concept Architecture
[Diagram blocks: MPU (several), GPU, APU, I/O, and the UV Petascale GAM]
UV Petascale GAM
• Globally Addressable
• Low Latency
• High Bandwidth
• O(100K) Ports
Global Performance
• Communications
  – grids becoming more dynamic -> low latency essential
  – processor counts growing -> low latency essential
  – low latency -> global address space
  – in clock periods, remote memory getting further away
  – bandwidth-conserving operations needed
  – high absolute link performance
• Synchronization
  – current mechanisms insufficient for ultrascale
  – optimizations will help, but maybe not enough
  – new mechanisms needed
• Dynamic load balancing
  – mechanisms need to mature, and interfaces become standard
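The point that remote memory is getting further away when measured in clock periods follows from one multiplication, sketched here. The latency and clock rates are hypothetical values chosen only to show the trend; they are not SGI figures.

```python
def latency_in_clocks(latency_ns, clock_ghz):
    """Convert a latency in nanoseconds to processor clock periods.

    One GHz is one clock period per nanosecond, so clocks = ns * GHz.
    """
    return latency_ns * clock_ghz

# Hypothetical: hold remote latency fixed at 1000 ns while the clock speeds up.
REMOTE_LATENCY_NS = 1000.0
clocks_by_rate = {
    clock_ghz: latency_in_clocks(REMOTE_LATENCY_NS, clock_ghz)
    for clock_ghz in (0.5, 1.0, 2.0)
}
```

With wall-clock latency unchanged, each doubling of the clock rate doubles the number of clock periods a processor waits on a remote reference.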
Challenges
• Clear virtual machine and performance models for these new mechanisms
• Compilers/tools that exploit these mechanisms mostly automatically and accept user hints
• Appropriate performance balance for typical uses
• Need to gain successful experience at very large scale (10-30KP) before going to ultrascale (100KP)
Agenda
• Besides Architecture…
• Enabling Ultra-scale Applications
• Enabling New Computational Science
• Sustaining Economically
Scientific Process
Observe existing data for patterns
Hypothesize models that match the data
Test those models to understand accuracy (i.e., add new data)
“First Principles” computing; most of current HPC
“Dynamic Network Inference” computing**
• Query: When we know what we want and how to ask for it
• Inference: When we know only somewhat what we want
• Exploration: When we know little, but anticipate more
“planned serendipity”
**Believed first coined by Scott Studham et al., PNNL
Example: Post-Genomic Biology
• <10% of the human genome is known to code for proteins
• Selective pressure generally removes unused genetic material
• What is the other 90% of the genome doing?
  – Have the raw data (genome)
  – Need to add other types of data (e.g., protein association info)
  – Multi-petabytes of data all told
  – Probably not a purely computational problem
Differences from First Principles
• Data access patterns ~impossible to predict a priori -> low latency / global address space
• New tools for data exploration needed
  – need to automatically search for new, perhaps-vaguely-defined, patterns (that foster new theory)
  – highly interactive/coupled with the scientist’s thought process
  – but beware difficulty of launching new languages
• Contents of memory much more valuable
  – RAS
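Why exploratory workloads have access patterns that cannot be predicted in advance can be seen in a small graph walk. This is a sketch under stated assumptions: the association graph and its node names are invented for illustration, and a breadth-first search stands in for whatever exploration tool is actually used.

```python
from collections import deque

def explore(graph, start):
    """Breadth-first walk; returns nodes in the order they are first visited."""
    seen = {start}
    order = []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        order.append(node)
        # The next entries to read are named by the data just read,
        # so the access sequence is unknown until run time.
        for neighbor in graph[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return order

# Hypothetical association data; names are placeholders only.
graph = {
    "geneA": ["protein1", "protein2"],
    "protein1": ["geneB"],
    "protein2": [],
    "geneB": ["protein2"],
}
order = explore(graph, "geneA")
```

On a machine without a global address space, each `graph[node]` lookup could land on a different node's memory, which is where low remote latency matters.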
“and now for something completely different”: Star-P
• Developed by Alan Edelman and colleagues at MIT, etc.
• Simple extensions to the MATLAB® language
  – data parallel, MIMD, and mixed
• Builds on the existing base of MATLAB programs
  – broadening the market for HPC systems
• New back-end server implemented for parallel execution
• Preserves key MATLAB strengths:
  – very high level language
  – interactivity / exploration
  – easy visualization
“Put the fun back in supercomputing”
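The data-parallel style mentioned above can be sketched in general terms: split an array into blocks, apply the same operation to each block on a separate worker, and combine the partial results. The sketch below is Python using the standard library, not MATLAB and not Star-P's actual interface; the function names are hypothetical, and threads are used only to show the structure, not to demonstrate speedup.

```python
from concurrent.futures import ThreadPoolExecutor

def split_blocks(values, parts):
    """Split a list into `parts` contiguous blocks of near-equal size."""
    size, extra = divmod(len(values), parts)
    blocks, start = [], 0
    for p in range(parts):
        stop = start + size + (1 if p < extra else 0)
        blocks.append(values[start:stop])
        start = stop
    return blocks

def block_sum_of_squares(block):
    """The same operation, applied independently to each block."""
    return sum(x * x for x in block)

def parallel_sum_of_squares(values, workers=4):
    """Data-parallel reduction: map over blocks, then combine the partials."""
    blocks = split_blocks(values, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(block_sum_of_squares, blocks))
    return sum(partials)

result = parallel_sum_of_squares(list(range(10)))
```

The caller writes one high-level call and the distribution across workers stays hidden, which is the property the slide attributes to a very high level language.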
Agenda
• Besides Architecture…
• Enabling Ultra-scale Applications
• Enabling New Computational Science
• Sustaining Economically
Key Points
• SGI retains system focus
• …but uses commodity components wherever practical
  – Exploit best mass-market processors (Itanium™)
    • augment to make suitable for wider range of HPC apps
  – Use Linux fully
    • reap the cost benefits of reduced support of proprietary Unix™ variant
  – IFB cables, EFI firmware
• Innovations for ultrascale must be relevant for wider markets
  – e.g., multi-paradigm computing must accelerate ISV apps
• Use new technologies to broaden the market
  – e.g., Star-P
SGI’s systems are evolving to enable ultrascale versions of today’s
applications and enable a new type of computational science, while remaining
economically sustainable.
One step ahead
“There are no technology-independent lessons in computer science.”
Butler Lampson, Xerox PARC