http://www.jamstec.go.jp
The Architecture and the Application Performance of the Earth Simulator
Ken’ichi Itakura (JAMSTEC)
15 Dec. 2011, ICTS-TIFR Discussion Meeting 2011
Location of Earth Simulator Facilities
[Map: the Earth Simulator site and JAMSTEC headquarters (HQ) in Yokohama, near Tokyo]
Earth Simulator Building
Cross-sectional View of the Earth Simulator Building
[Diagram labels: double floor for cables; air-return duct; lightning conductor; power supply system; air-conditioning system; seismic isolation system; Earth Simulator system]
Earth Simulator
March 2002 ~ March 2009 (full system through Sep. 2008, half system thereafter)
Peak performance: 40 TFLOPS
Main memory: 10 TB

Earth Simulator 2 (ES2)
March 2009 ~
Peak performance: 131 TFLOPS
Main memory: 20 TB
Development of the Earth Simulator (ES)
Development of the ES started in 1997 with the aim of building a comprehensive understanding of global environmental changes such as global warming.
Its construction was completed at the end of February 2002, and operation started on March 1, 2002 at the Earth Simulator Center.
TOP500 list: the Earth Simulator took the top position at ISC02 (June 2002) and kept it for two and a half years.
The new Earth Simulator system (ES2) was installed in late 2008 and started operation in March 2009.
Earth Simulator (ES2)
ES2 System Outline
[Diagram: operation, maintenance, and user networks connecting labs and user terminals, a login server, operation servers, and a storage server to the processor nodes (PNs) via FC switches]
• SX-9/E, 160 processor nodes (PNs), including 2 interactive nodes
• Total peak performance: 131 TFLOPS
• Total main memory: 20 TB
• Data storage system: 500 TB
• Inter-node network (IXS): 64 GB/s (bidirectional) per node
• SAN (Storage Area Network): 4 Gbps Fibre Channel; usable capacity 1.5 PB (RAID6 HDD)
• User network: 10 GbE (partially link-aggregated to 40 GbE)
• NQSII batch job system on the PNs: agent request support, usage statistics and resource information management, automatic power-saving management
• Maximum power consumption: 3,000 kVA
New System Layout
[Floor plan, 50 m x 65 m: the new Earth Simulator (ES2) alongside the original Earth Simulator (stopped)]
The original Earth Simulator opened in March 2002.
The new system started operation in March 2009.
[Diagram: 160 calculation nodes = 2 interactive nodes + 2 S-batch nodes + 156 L-batch nodes; batch nodes attach to a WORK area; users log in via a login server, which holds the HOME/DATA areas that the nodes can refer to]
Nodes are clustered to control the system (transparent to users). A cluster consists of 32 nodes, and 156 nodes are for batch jobs (batch clusters).
Four special nodes are provided for TSS and small batch jobs.
Configuration of the TSS cluster: TSS nodes [2 nodes → 1 node (changed in 2010)]; nodes for single-node batch jobs [2 nodes → 3 nodes].
Configuration of the batch cluster: nodes for multi-node batch jobs, with system disks for user-file staging.
User files for batch jobs are stored on a mass-storage system, with automated file recall (Stage-In) and migration (Stage-Out). All the clusters are connected to the mass-storage system by IOCS (Linux workstations).
Hardware Spec.

                      ES              ES2 (SX-9/E)                 Ratio
CPU    clock cycle    1 GHz           3.2 GHz                      3.2x
       performance    8 GF            102.4 GF                     12.8x
Node   #CPUs          8               8                            1x
       performance    64 GF           819.2 GF                     12.8x
       memory         16 GB           128 GB                       8x
       network        12.3 GB/s x2    8 GB/s x8 x2                 5.2x
System #nodes         640             160                          1/4x
       performance    40 TF           131 TF                       3.2x
       memory         10 TB           20 TB                        2x
       network        full crossbar   2-level fat-tree             -
                                      (full bisection bandwidth)
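The node-network ratio can be checked directly from the table entries (a worked calculation using nothing beyond the figures above): per direction, ES2 provides 8 GB/s x 8 = 64 GB/s per node (the IXS figure quoted earlier), versus 12.3 GB/s for ES, and both double when counted bidirectionally:

$$\frac{8 \times 8 \times 2}{12.3 \times 2} = \frac{128}{24.6} \approx 5.2$$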
ES2 Software
• OS: SUPER-UX
• Environment: Fortran90, C/C++, MPI-1/2, HPF, MathKeisan (BLAS, LAPACK, etc.), ASL, cross-compiler on the login server (Linux)
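As a concrete illustration of this environment, below is a minimal MPI-1 program of the kind the toolchain targets. The code itself uses only standard MPI calls; the compiler invocation in the comment is an assumed example, since the exact cross-compiler driver name on the login server is not given in the slides.

```c
/* Minimal MPI sketch for the ES2 software stack described above.
 * Build on the Linux login server with the NEC cross-compiler
 * (driver name assumed here, e.g. something like
 *   sxmpicc hello.c -o hello
 * ), then run on the PNs through the batch system. */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, size;

    MPI_Init(&argc, &argv);               /* initialize the MPI runtime */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank); /* id of this process */
    MPI_Comm_size(MPI_COMM_WORLD, &size); /* total number of processes */

    printf("Hello from rank %d of %d\n", rank, size);

    MPI_Finalize();                       /* shut down cleanly */
    return 0;
}
```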
ES2 Operation
How many users?
About 800 people.
How many jobs?
About 10,000 jobs per month.
Power (including cooling)?
About 3,000 kVA, about 70% of the original ES.
Average load for job running?
70–80%; most of the rest is used for pre/post processing.
Projects
FY2011 ES2 Projects
○ Proposed Research Projects: 29 (Earth Science: 18, Innovation: 11)
○ Contract Research Projects: KAKUSHIN 5, the Strategic Industrial Use (Industrial) 13, CREST 1
○ JAMSTEC Research Projects: 14 (JAMSTEC, collaboration research, industrial fee-based usage; new projects are accepted at any time)
Users: 565; Organizations: 125 (University 57, Government 15, Company 34, International 19)
[Pie chart: breakdown by Proposed / Contract / JAMSTEC projects]
Resource Allocation
Computing Resource Distribution (Based on Job Size), FY2010:
#nodes 1–4: 28.7%
#nodes 5–8: 17.6%
#nodes 9–16: 12.4%
#nodes 17–32: 21.8%
#nodes 33–64: 16.7%
#nodes over 65: 2.9%
ES2 Application Field (FY2010)
Atmospheric and Oceanographic Science: 28%
Solid Earth Science: 16%
Global Warming (IPCC): 41%
Epoch-Making Simulation: 11%
Industrial Use: 4%
ES2 Node Utilization (FY2010)
[Chart: monthly node utilization; operation stopped on 14 Mar.]
ES2 Node Utilization (FY2011)
[Chart: monthly node utilization]
※ Operation with a reduced number of nodes is carried out for power saving.
Monthly power consumption (kWh):

              April   May     June    July    August  Sep.    Oct.
ES2, 2009     3,065   3,105   2,944   3,084   2,973   3,042   3,091
ES, 2008      3,987   4,013   3,978   4,015   4,138   3,986   1,752 (half system)
ES2/ES ratio  76.9%   77.4%   74.0%   76.8%   71.9%   76.3%   (176.5%)

・ES2 power consumption is about 75% of that of the original ES.
・The ratio of peak performance to power consumption is 4.34 times better than that of ES.
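The 4.34x figure can be roughly checked from the numbers above (a worked calculation; the presenter's exact basis is not stated, so the average monthly consumption over the months shown is used here):

$$\frac{131\ \mathrm{TF}/3{,}043}{40\ \mathrm{TF}/4{,}020} \approx \frac{0.0430}{0.0100} \approx 4.3$$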
Application Performance
ES2 Applications (1–4)
[Figure-only slides; applications include AFES, OFES, and CFES]
Performance Evaluation Results: Real Applications on ES

Code Name   Elapsed time    #CPUs    Elapsed time on ES2 [sec]   #CPUs
            on ES [sec]     on ES    (speedup ratio)             on ES2
PHASE       135.3           4096     62.2 (2.18)                 1024
NICAM-K*    214.7           2560     109.3 (1.97)                640
MSSG        173.9           4096     86.5 (2.01)                 1024
SpecFEM3D   96.3            4056     45.5 (2.12)                 1014
Seism3D     48.8            4096     15.6 (3.13)                 1024

Harmonic mean of the speedup ratios: 2.22. ES2 is 2.22 times faster.
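The 2.22 figure is the harmonic mean of the five per-code speedups, which can be verified directly:

$$\bar{s}_H = \frac{5}{\frac{1}{2.18}+\frac{1}{1.97}+\frac{1}{2.01}+\frac{1}{2.12}+\frac{1}{3.13}} \approx \frac{5}{2.255} \approx 2.22$$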
WRF
• WRF (Weather Research and Forecasting Model) is a mesoscale meteorological simulation code developed collaboratively by US institutions, including NCAR (National Center for Atmospheric Research) and NCEP (National Centers for Environmental Prediction). JAMSTEC optimized WRFV2 on the Earth Simulator (ES2), renewed in 2009, and measured its computational performance.
• As a result, we successfully demonstrated that WRFV2 can run on ES2 with outstanding sustained performance.
[Chart: WRF performance on ES]
TOP500
[Chart: Earth Simulator rank on each TOP500 list, June 2002 to November 2011]
Jun 2002 – Jun 2004: 1; Nov 2004: 3; Jun 2005: 4; Nov 2005: 7; Jun 2006: 10; Nov 2006: 14; Jun 2007: 20; Nov 2007: 30; Jun 2008: 49; Nov 2008: 73; Jun 2009 (ES2): 22; Nov 2009: 31; Jun 2010: 37; Nov 2010: 54; Jun 2011: 68; Nov 2011: 94
Rank 94 in Nov. 2011
HPC Challenge Awards
• The competition focuses on four of the most challenging benchmarks in the suite:
– Global HPL: the LINPACK TPP benchmark, which measures the floating-point rate of execution for solving a linear system of equations.
– Global RandomAccess: measures the rate of integer random updates of memory (GUPS).
– EP STREAM (Triad) per system: a simple synthetic benchmark program that measures sustainable memory bandwidth (in GB/s) and the corresponding computation rate for a simple vector kernel.
– Global FFT: measures the floating-point rate of execution of double-precision complex one-dimensional Discrete Fourier Transform (DFT).
(DGEMM, which measures the floating-point rate of double-precision real matrix-matrix multiplication, is also part of the suite but is not one of the four award benchmarks.)
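For concreteness, the STREAM Triad kernel at the heart of the EP-STREAM benchmark is essentially the following loop (a minimal sketch of the well-known Triad operation, not the official benchmark harness; the array size is illustrative):

```c
/* Sketch of the STREAM Triad kernel: a(i) = b(i) + q * c(i).
 * The real benchmark adds timing, repetitions, and validation;
 * N must exceed the cache size for a meaningful bandwidth number. */
#include <stdio.h>
#include <stdlib.h>

#define N 10000000L

int main(void)
{
    double *a = malloc(N * sizeof(double));
    double *b = malloc(N * sizeof(double));
    double *c = malloc(N * sizeof(double));
    const double q = 3.0;

    if (!a || !b || !c) return 1;

    for (long i = 0; i < N; i++) { b[i] = 1.0; c[i] = 2.0; }

    /* Triad: two loads and one store per iteration, so the loop is
     * bound by memory bandwidth, not floating-point throughput. */
    for (long i = 0; i < N; i++)
        a[i] = b[i] + q * c[i];

    /* Bandwidth = 3 * N * sizeof(double) bytes moved / elapsed time. */
    printf("a[0] = %f\n", a[0]);
    free(a); free(b); free(c);
    return 0;
}
```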
The 2009 HPC Challenge Class 1 Awards:

G-HPL           Achieved       System     Affiliation
1st place       1533 Tflop/s   Cray XT5   ORNL
1st runner-up   736 Tflop/s    Cray XT5   UTK
2nd runner-up   368 Tflop/s    IBM BG/P   LLNL

G-RandomAccess  Achieved       System     Affiliation
1st place       117 GUPS       IBM BG/P   LLNL
1st runner-up   103 GUPS       IBM BG/P   ANL
2nd runner-up   38 GUPS        Cray XT5   ORNL

G-FFT           Achieved       System     Affiliation
1st place       11 Tflop/s     Cray XT5   ORNL
1st runner-up   8 Tflop/s      Cray XT5   UTK
2nd runner-up   7 Tflop/s      NEC SX-9   JAMSTEC

EP-STREAM-Triad (system)  Achieved   System     Affiliation
1st place       398 TB/s       Cray XT5   ORNL
1st runner-up   267 TB/s       IBM BG/P   LLNL
2nd runner-up   173 TB/s       NEC SX-9   JAMSTEC
The XT5 at ORNL is 2.3 PF at peak while ES2 is 131 TF, a gap of about 17 times. However, the XT5's G-FFT performance is only about 1.5 times that of ES2.
New Earth Simulator (ES2, SX-9/E): HPC Challenge Awards 2010
G-FFT: No. 1 in the world
Efficiency of G-FFT:
  ES2: 11.88 TF / 131.07 TF = 9.1%
  XT5: 10.70 TF / 2044.7 TF = 0.52%
ES2 is 17.5 times more efficient.
EP-STREAM-Triad: No. 3
http://www.hpcchallenge.org/
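The 17.5x figure follows directly from the two efficiencies:

$$\frac{11.88/131.07}{10.70/2044.7} = \frac{9.1\%}{0.52\%} \approx 17.5$$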
HPC Challenge Awards 2010
[Bar chart: peak performance 2,045 TF (Jaguar, ORNL) vs. 131 TF (ES2), a 15.6-times gap, yet G-FFT performance is 10.7 TF (Jaguar) vs. 11.9 TF (ES2). We are BIG!!]
The 2011 HPC Challenge Class 1 Awards:

                          TOP500 (Linpack)     #cores    G-FFT             Effective performance ratio   #cores
JAMSTEC ES2 (SX-9)        Rank 94: 122 TF      1,280     Rank 2: 11.9 TF   9.1%                          1,280
RIKEN AICS (K Computer)   Rank 1: 10,510 TF    705,024   Rank 1: 34.7 TF   1.47%                         147,456

Comparison: on Linpack, K is 86 times faster than ES2, but on G-FFT, K is only 2.9 times faster; ES2's effective G-FFT efficiency is 6.2 times higher than K's.
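These ratios follow directly from the table (a worked check):

$$\frac{10{,}510}{122} \approx 86, \qquad \frac{34.7}{11.9} \approx 2.9, \qquad \frac{9.1\%}{1.47\%} \approx 6.2$$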
Thank you for your kind attention!
JAMSTEC: Japan Agency for Marine-Earth Science and Technology