“SKIF-GRID” SUPERCOMPUTING PROJECT OF THE UNION STATE OF RUSSIA AND BELARUS
SHORT OVERVIEW OF CURRENT STATUS
A. A. Moskovsky, Program Systems Institute, Russian Academy of Sciences
IKI - MSR Research Workshop, Moscow, 10-12 June, 2009
Slide 2
Pereslavl-Zalessky
• Russian Golden Ring city: 857 years old
• Hometown of Great Dukes of Russia
• The first building site of Peter the Great's navy
• Ancient capital of the Russian Orthodox Church
• Located 120 km from Moscow
Slide 3
“SKIF-GRID” PROJECT TIMELINE
1. 2000-2004 – SKIF project; SKIF K-1000 is #98 in Top500
2. June 2004 – first proposal filed for the “SKIF-GRID” project
3. March 2007 – approved by the Government
4. March 2008 – SKIF-MSU supercomputer deployed (#36 in June 2008 Top500)
5. May 2008 – “SKIF-Testbed” federation created
6. March 2009 – alliance agreement signed for SKIF series 4 development
Slide 4
PROJECT ORGANIZATION: 2007-2008
Project directions:
1. Grid technology
2. Supercomputers
   • SW
   • HW
3. Security
4. Pilot projects – applications of HPC and grid technology
Slide 5
«SKIF MSU»
Slide 6
SKIF MSU
• Theoretical peak performance: 60 TFlops
• Linpack performance: 47 TFlops
• Advanced clustering solutions: diskless computational nodes
• Original blade design

Parameter            Value
CPU architecture     x86-64
CPU model            Intel Xeon E5472, 3.0 GHz (4 cores)
Nodes (dual CPU)     625
CPU cores total      5 000
Interconnect         Infiniband DDR, fat tree
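The 60 TFlops peak figure follows from the table. A minimal sanity-check sketch, assuming 4 double-precision flops per cycle per core (typical for SSE-era Xeons such as the E5472; the slide does not state this number):

```python
# Sanity check of the SKIF MSU peak-performance figure from the spec table.
# Assumption (not stated on the slide): 4 double-precision flops per cycle
# per core, typical for SSE-capable Xeons like the E5472.

nodes = 625
cpus_per_node = 2
cores_per_cpu = 4
clock_hz = 3.0e9
flops_per_cycle = 4  # assumed

cores = nodes * cpus_per_node * cores_per_cpu     # 5 000 cores
peak_flops = cores * clock_hz * flops_per_cycle   # 6.0e13

print(f"cores: {cores}")                      # 5000
print(f"peak:  {peak_flops / 1e12} TFlops")   # 60.0
```

The quoted 47 TFlops Linpack result then corresponds to roughly 78% efficiency against peak.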
Slide 7
«SKIF-Testbed» a/k/a “SKIF-Polygon”
Federation of HPC centers, ~100 TFlops
4 computers in the current Top500:
• MSU (#35 in Top500)
• South Urals State University
• Tomsk State University
• Ufa State Technical University
Slide 8
Middleware platform – UNICORE 6.1
• X.509 for security
• Certificate Authority at Pereslavl-Zalessky (PyCA)
• Site platform: UNICORE 6.1, Java 1.5, Linux, Torque
• Experimental sites: UNICORE is complemented with additional services/modules
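In the X.509 trust model every user and site presents a certificate issued by the project CA. A minimal sketch of inspecting such a certificate with Python's `cryptography` package; the file name is hypothetical and the package choice is illustrative, not part of the SKIF middleware:

```python
# Minimal sketch: inspect an X.509 certificate, e.g. one issued by the
# project's CA in Pereslavl-Zalessky. "user-cert.pem" is a hypothetical
# file name; any PEM-encoded certificate works.
from cryptography import x509
from cryptography.hazmat.backends import default_backend

with open("user-cert.pem", "rb") as f:
    cert = x509.load_pem_x509_certificate(f.read(), default_backend())

print("subject:", cert.subject.rfc4514_string())
print("issuer: ", cert.issuer.rfc4514_string())
print("valid:  ", cert.not_valid_before, "to", cert.not_valid_after)
```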
Slide 9
Applications (2007-2008)
HPC applications:
• Drug design (MSU Belozersky Institute, SRCC, Chelyabinsk SU)
• Inverse problems in soil remote sensing (SRCC)
• Computational chemistry (MSU Chemistry Department)
• Geophysical data services
• Mammography database prototype (N.N. Semenov Chemical Physics Institute, RAS)
• Text mining (PSI RAS)
• Engineering (South Ural University, …)
• Space Research Institute …
Slide 10
SKIF-Aurora
2009-2010: second phase of SKIF-GRID project
Slide 11
SKIF Series 4: original R&D goals
• Highest density of performance (the largest possible number of CPUs per 1U)
• Smaller latency
• Fewer cables and connectors — better reliability
• Higher heat emission per 1U: we need a new cooling technology. How to?
• Improved interconnect: we need better scalability, bandwidth and latency than the best available solutions provide (e.g. Infiniband QDR)
• New approach to monitoring and management of the supercomputer
• Combining standard CPUs and accelerators in the computational nodes of the supercomputer
Slide 12
Spring’2008: SKIF Series 4 — How To?
Slide 13
Summer’2008: SKIF Series 4 — Know How!
• Italian-Russian cooperation: «SKIF Series 4» == «SKIF-AURORA Project»
• Designed by an alliance of Eurotech, PSI RAS and RSC SKIF, with support by Intel
• To be presented at ISC 09
Slide 14
SKIF-Aurora distinctive features
• No moving parts
• Liquid cooling – power efficiency
• x86-64 processors (Intel Nehalem)
• 3-D torus interconnect
• Redundant management/monitoring subsystem
• FPGA on board (optional)
• SSD disks (optional)
• QDR Infiniband
Slide 15
SKIF-Aurora
• 32 nodes per chassis: 64 CPUs in 6U
• Up to 8 chassis per rack: up to 512 CPUs and 2048 cores per rack
• To build 500 TFlops: 21 racks in 2009, scalable due to the 3-D torus
• 10 kW per chassis
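These figures are mutually consistent. A back-of-the-envelope sketch, assuming quad-core Nehalem CPUs at roughly 2.93 GHz and the same 4 flops per cycle per core as above (the slide states neither the clock speed nor the flops-per-cycle figure):

```python
# Consistency check of the SKIF-Aurora density figures.
# Assumptions (not on the slide): ~2.93 GHz quad-core Nehalem,
# 4 double-precision flops per cycle per core.

chassis_per_rack = 8
cpus_per_chassis = 64        # 32 dual-CPU nodes per chassis
cores_per_cpu = 4

cpus_per_rack = chassis_per_rack * cpus_per_chassis   # 512, as on the slide
cores_per_rack = cpus_per_rack * cores_per_cpu        # 2048, as on the slide

clock_hz = 2.93e9       # assumed Nehalem-EP clock
flops_per_cycle = 4     # assumed

racks = 21
peak = racks * cores_per_rack * clock_hz * flops_per_cycle
print(f"{peak / 1e12:.0f} TFlops")   # ~504 TFlops, matching "500 TFlops"
```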
Slide 16
SKIF-AURORA: designed by the alliance of Eurotech, PSI RAS and RSC SKIF
• PCBs, mechanics, power supply, cooling; levels 1 and 2 of the management system
• Level 3 of the management system; interconnect (3D-torus: firmware, routing, drivers, MPI-2…); FPGA as accelerator
Slide 17
SKIF-AURORA Management Subsystem
Slide 18
3-D torus interconnect implementation
[Diagram: each computational node pairs a CPU (the standard part) with an FPGA (the non-standard part); the FPGAs implement the 3D-torus system interconnect, with Infiniband as a subsidiary interconnect.]
Only the QCD-specific part is implemented by the Italian team; the Russian teams are to upgrade the network to a general-purpose interconnect (MPI 2.0), due to appear in fall 2009.
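To give a feel for how applications would address such a network, here is a minimal sketch of mapping MPI ranks onto a 3D torus with mpi4py. The library choice is illustrative; the slide only promises MPI 2.0 support, not any specific binding:

```python
# Minimal sketch: map MPI ranks onto a 3D torus, the topology the
# SKIF-Aurora interconnect exposes. Run with e.g.:
#   mpiexec -n 8 python torus_demo.py
from mpi4py import MPI

comm = MPI.COMM_WORLD
# Let MPI factor the ranks into a 3D grid; making every dimension
# periodic is what turns the grid into a torus.
dims = MPI.Compute_dims(comm.Get_size(), [0, 0, 0])
cart = comm.Create_cart(dims, periods=[True, True, True], reorder=True)

coords = cart.Get_coords(cart.Get_rank())
# Neighbors along dimension 0 wrap around thanks to periodicity.
left, right = cart.Shift(direction=0, disp=1)
print(f"rank {cart.Get_rank()} at {coords}: x-neighbors {left}, {right}")
```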
Slide 19
R&D Directions Using FPGA
• Collective MPI operations using FPGA (see the sketch below)
• FPGA to facilitate support of PGAS languages (UPC, Titanium, etc.)
• FPGA+CPU hybrid computing
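A collective such as an all-reduce is the kind of operation the first direction targets: today the reduction runs through the CPUs, and the idea is to let the FPGAs in the torus combine values in-network. A minimal mpi4py sketch of the host-side operation (the FPGA offload itself would be transparent to code written this way):

```python
# Minimal sketch of a collective MPI operation -- the kind of primitive
# the FPGA direction aims to accelerate inside the interconnect fabric.
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
local = np.full(4, comm.Get_rank(), dtype="d")  # each rank's contribution
total = np.empty(4, dtype="d")

# Sum contributions from all ranks; with FPGA-assisted collectives the
# reduction tree would run in-network rather than on the CPUs.
comm.Allreduce(local, total, op=MPI.SUM)

if comm.Get_rank() == 0:
    print(total)  # sum of ranks 0..n-1 in every slot
```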
Slide 20
Conclusions
The SKIF-Aurora project:
• is based on collaboration between international teams;
• harnesses shared expertise and results;
• aims to develop a family of petascale-level supercomputers with innovative techniques: higher density of CPUs (flops per volume), an efficient water cooling system, a scalable, powerful 3D-torus interconnect, etc.
Slide 21
Datacenter visualization
Slide 22
Datacenter visualization