SGI Contributions to Supercomputing by 2010

Steve Reinhardt, Director of Engineering, [email protected]

Supercomputing Aspects of SGI

SGI® NUMAflex™

Data Access
• Deliver data wherever the users are
• CXFS/WAN demo at SC’02
• Each server reads directly, at channel speeds
• Biggest installed configuration .5PB

Visualization
• “VAN”
• Deliver images wherever the users are
• Enable collaboration

HPC
• Scalable servers and superclusters
• SGI® Origin® family
• SGI® Altix™ 3000 family

NOTE: No “enterprise” references

SGI in HPC

• Memory is unifying theme
  – globally addressable up to O(PB)
  – incorporating varied processing types
  – latency (-> 500ns for 10KP)
  – bandwidth (local stride-1 B:F -> 2.0+; local gather/scatter B:F .5-1.0; remote bisection BW B:F -> .3)
• Sustained performance
  – differentiated scaling (latency & bandwidth)
  – better memory interface
  – new synchronization substrate
• Raise the level of programming abstraction
  – UPC/CAF (near-term)
  – parallel Matlab (radical)
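The B:F (bytes-per-flop) targets on this slide turn into sustained bandwidth figures once a peak floating-point rate is fixed. A minimal sketch, assuming a hypothetical 4 GFLOP/s processor (that rate appears on the STREAM slide for Altix; the slides do not state a rate for the 2010 design). The names `PEAK_GFLOPS`, `BF_TARGETS`, and `required_bandwidth_gb_s` are illustrative only:

```python
# Convert bytes-per-flop (B:F) targets into sustained bandwidth per processor.
# ASSUMPTION: 4 GFLOP/s peak per processor; the slide gives B:F targets only.
PEAK_GFLOPS = 4.0

# B:F targets as listed on the slide, as (low, high) pairs.
# "2.0+" is treated as a lower bound of 2.0.
BF_TARGETS = {
    "local stride-1": (2.0, 2.0),
    "local gather/scatter": (0.5, 1.0),
    "remote bisection": (0.3, 0.3),
}


def required_bandwidth_gb_s(bytes_per_flop, peak_gflops=PEAK_GFLOPS):
    """GB/s needed to feed one processor at the given B:F ratio."""
    return bytes_per_flop * peak_gflops


for access, (low, high) in BF_TARGETS.items():
    low_bw = required_bandwidth_gb_s(low)
    high_bw = required_bandwidth_gb_s(high)
    print(f"{access}: {low_bw:.1f} to {high_bw:.1f} GB/s per processor")
```

Changing `PEAK_GFLOPS` rescales every target linearly, which is why the slide states the goals as ratios rather than absolute bandwidths.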

SGI in HPC

• SGI Origin® family
  – MIPS processors, Irix OS
  – exploit low power consumption, ISA control
• SGI Altix™ family
  – IPF processors, Linux OS
  – exploit SGI interconnect, with industry-standard ISA and rapid OS maturation

Balancing High Innovation and Profitability

[Figure: profitability (low to high) plotted against differentiation (low to high).]

“Death Valley”: enough differentiation to have higher cost but not enough to have high value

System / Component Differentiation

[Figure series: each chart relates System Cost to System Value across four components: OS, Interconnect, Memory, Processor. Charts, in order: Ideal Differentiation, SGI Origin series, Quadrics cluster, IBM SP3 system, SGI Altix system.]

STREAM Triad Results

[Figure: bar chart of STREAM Triad bandwidth in GB/sec (axis 0 to 125) at 16P, 32P, and 64P for HP Superdome™, HP AlphaServer™ GS, IBM® eServer™ p690, and SGI® Altix™ 3000 Family. Bar labels, in extracted order: 7, 12, 25, 31, 14, 23, 32, 63, 27, 125.]

• World-record result for a µP-based system; fourth overall
• .8 B:F (6.4GB/s shared by 2x4GF processors)
• Single kernel; NUMA placement support in Linux
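The .8 B:F figure follows directly from the numbers on the slide: 6.4 GB/s of memory bandwidth shared by two 4 GFLOP/s processors. The sketch below reproduces that arithmetic and shows the Triad kernel itself, a(i) = b(i) + q*c(i), which STREAM counts as 24 bytes and 2 flops per iteration for 8-byte elements. The function names are illustrative, and the pure-Python loop is for exposition, not for measurement:

```python
def bytes_per_flop(bandwidth_gb_s, processors, gflops_each):
    """Memory bandwidth divided by aggregate peak floating-point rate."""
    return bandwidth_gb_s / (processors * gflops_each)


def triad(b, c, q):
    """STREAM Triad: a[i] = b[i] + q * c[i]."""
    return [bi + q * ci for bi, ci in zip(b, c)]


# Figures from the slide: 6.4 GB/s shared by 2 x 4 GFLOP/s processors.
altix_bf = bytes_per_flop(6.4, 2, 4.0)

# Triad reads two 8-byte words and writes one per iteration, for 2 flops.
TRIAD_BYTES_PER_ITER = 3 * 8
TRIAD_FLOPS_PER_ITER = 2
triad_bf = TRIAD_BYTES_PER_ITER / TRIAD_FLOPS_PER_ITER

print(f"Altix B:F = {altix_bf:.1f}")
print(f"Triad needs {triad_bf:.0f} bytes per flop")
print(triad([1.0, 2.0], [10.0, 20.0], 3.0))
```

Because Triad demands many more bytes per flop than the node supplies, the benchmark measures memory bandwidth rather than arithmetic throughput.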

Interconnect Scaling

[Figure: MPI bandwidth versus distance (MB/s), y-axis 0 to 1600, for 2P, 4P, 8P, 16P, 32P, 64P, 128P, … 10KP. Annotation: “Coming soon”.]

Altix 3000 Throughput Performance

Throughput of 4 jobs, each 8P, crash application

System: Altix 3000, 32P, 64GB, XVM, TP900

[Figure: bar chart of elapsed time (sec), y-axis 20000 to 25000.]

Run          Elapsed time (sec)
Standalone   24381
Job 1/8p     24841
Job 2/8p     24486
Job 3/8p     24521
Job 4/8p     24601

Individual jobs in the throughput mix are between 0.4% and 1.8% slower than the standalone case
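The slowdown range quoted above can be checked against the chart. A minimal sketch, assuming the bar labels read 24381 s for the standalone run and 24841, 24486, 24521, and 24601 s for the four jobs (the digits are split in the extracted chart, so these readings are an assumption). The names below are illustrative:

```python
# ASSUMPTION: elapsed times (seconds) as read from the chart's bar labels.
STANDALONE_SEC = 24381
JOB_SEC = {"Job 1": 24841, "Job 2": 24486, "Job 3": 24521, "Job 4": 24601}


def slowdown_percent(job_sec, baseline_sec=STANDALONE_SEC):
    """Extra elapsed time relative to the standalone run, in percent."""
    return 100.0 * (job_sec - baseline_sec) / baseline_sec


slowdowns = {name: slowdown_percent(sec) for name, sec in JOB_SEC.items()}
for name, pct in slowdowns.items():
    print(f"{name}: {pct:.2f}% slower than standalone")
```

With these readings the slowdowns fall between about 0.4% and 1.9%, in line with the range stated on the slide up to rounding.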

Summary: SGI for HPC

• Long-term directions
  – Memory: globally addressable, high BW, low latency
  – Strong delivered performance
    • differentiated scaling (latency & bandwidth)
    • better memory interface
    • new synchronization substrate
  – Raise the level of programming abstraction
    • UPC/CAF (near-term); parallel Matlab (radical)
• Near-term deliverables
  – Altix 3000 system
    • distinguished performance
    • rapidly maturing Open Source software base