SGI Contributions to Supercomputing by 2010
Steve Reinhardt, Director of [email protected]
Supercomputing Aspects of SGI
Data Access
• Deliver data wherever the users are
• CXFS/WAN demo at SC’02
• Each server reads directly, at channel speeds
• Biggest installed configuration 0.5 PB
Visualization
• “VAN”
• Deliver images wherever the users are
• Enable collaboration
HPC
Scalable servers and superclusters
• SGI® Origin® family
• SGI® Altix™ 3000 family
SGI® NUMAflex™
NOTE: No “enterprise” references
SGI in HPC
• Memory is the unifying theme
  – globally addressable up to O(PB)
  – incorporating varied processing types
  – latency (-> 500 ns for 10KP)
  – bandwidth (local stride-1 B:F -> 2.0+; local gather/scatter B:F 0.5-1.0; remote bisection BW B:F -> 0.3)
• Sustained performance
  – differentiated scaling (latency & bandwidth)
  – better memory interface
  – new synchronization substrate
• Raise the level of programming abstraction
  – UPC/CAF (near-term)
  – parallel Matlab (radical)
SGI in HPC
• SGI Origin® family
  – MIPS processors, IRIX OS
  – exploit low power consumption, ISA control
• SGI Altix™ family
  – IPF processors, Linux OS
  – exploit SGI interconnect, with industry-standard ISA and rapid OS maturation
Balancing High Innovation and Profitability
[Figure: Profitability (low to high) plotted against Differentiation (low to high)]
“Death Valley”: enough differentiation to have higher cost but not enough to have high value
System / Component Differentiation
[Figure series, one chart each for: Ideal Differentiation, SGI Origin series, Quadrics cluster, IBM SP3 system, SGI Altix system. Every chart, including the introductory one, carries the labels System Cost, System Value, OS, Interconnect, Memory, and Processor.]
STREAM Triad Results
[Figure: STREAM Triad bandwidth in GB/sec (axis 0 to 125) at 16P, 32P, and 64P]
HP Superdome™: 7 (16P), 14 (32P), 27 (64P)
HP AlphaServer™ GS: 12 (16P), 23 (32P)
IBM® eServer™ p690: 25 (16P), 32 (32P)
SGI® Altix™ 3000 Family: 31 (16P), 63 (32P), 125 (64P)
• World-record result for a µP-based system; fourth overall
• 0.8 B:F (6.4 GB/s shared by 2 x 4 GF processors)
• Single kernel; NUMA placement support in Linux
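The B:F (bytes per flop) figure quoted for the Altix follows directly from the two numbers given with it: 6.4 GB/s of memory bandwidth shared by two processors of 4 GFLOP/s peak each. A minimal sketch of that arithmetic is below; the function name `bytes_per_flop` and its parameter names are illustrative assumptions, not part of any SGI tool.

```python
def bytes_per_flop(shared_bandwidth_gb_s, cpus_sharing, peak_gflops_per_cpu):
    """Bytes of memory bandwidth available per peak floating-point operation."""
    # Total peak rate of all processors that share the same memory channel.
    total_peak_gflops = cpus_sharing * peak_gflops_per_cpu
    return shared_bandwidth_gb_s / total_peak_gflops


# Figures quoted on the slide: 6.4 GB/s shared by 2 processors of 4 GFLOP/s each.
altix_bf = bytes_per_flop(6.4, 2, 4.0)
print(f"Altix 3000 node: {altix_bf:.1f} B:F")
```

The same function can be pointed at the long-term targets listed earlier in the deck (for example a local stride-1 B:F of 2.0 or more) to see how much bandwidth per processor those targets would imply for a given peak rate.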
Interconnect Scaling
[Figure: MPI bandwidth versus distance (MB/s, axis 0 to 1600) for 2P, 4P, 8P, 16P, 32P, 64P, 128P … 10KP; marked “Coming soon”]
Altix 3000 Throughput Performance
[Figure: Elapsed Time (sec), axis 20000 to 25000. Bar labels: Standalone 24381; Job 1/8p 24841; Job 2/8p 24486; Job 3/8p 24521; Job 4/8p 24601]
Throughput of 4 jobs, each 8P, crash application
System: Altix 3000, 32P, 64GB, XVM, TP900
Individual jobs in the throughput mix are between 0.4% and 1.8% slower than the standalone case
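The slowdown percentages quoted above are relative differences in elapsed time. A minimal sketch of that calculation follows, assuming slowdown is defined as (job time minus standalone time) divided by standalone time. The function name `slowdown_percent` and the sample times are hypothetical, chosen only to land at the two ends of the quoted range; they are not measurements from the slide.

```python
def slowdown_percent(standalone_sec, job_sec):
    """Percent by which a job in the throughput mix runs longer than standalone."""
    return (job_sec - standalone_sec) / standalone_sec * 100.0


# Hypothetical elapsed times in seconds, for illustration only.
standalone = 1000.0
mix = {"job A": 1004.0, "job B": 1018.0}

for name, elapsed in mix.items():
    print(f"{name}: {slowdown_percent(standalone, elapsed):.1f}% slower")
```

A result near zero for every job in the mix indicates that the jobs are not contending heavily for shared resources such as memory bandwidth or the interconnect.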
Summary: SGI for HPC
• Long-term directions
  – Memory: globally addressable, high BW, low latency
  – Strong delivered performance
    • differentiated scaling (latency & bandwidth)
    • better memory interface
    • new synchronization substrate
  – Raise the level of programming abstraction
    • UPC/CAF (near-term); parallel Matlab (radical)
• Near-term deliverables
  – Altix 3000 system
    • distinguished performance
    • rapidly maturing Open Source software base