HPC Infrastructure and GPU Computing Activities in KISTI

International Advanced Research Workshop on High Performance Computing, Grids and Clouds 2010, June 21~June 25, 2010, Cetraro, Italy. Hongsuk Yi (hsyi@kisti.re.kr), KISTI Supercomputing Center

Page 1: HPC Infrastructure and GPU Computing Activities in KISTI

International Advanced Research Workshop on High Performance Computing, Grids and Clouds 2010
June 21~June 25, 2010, Cetraro, Italy

Hongsuk Yi
hsyi@kisti.re.kr
KISTI Supercomputing Center

Page 2: Outline

• HPC infrastructure and activities in KISTI
• Heterogeneous computing with GPU
• Scalability of heterogeneous computing with MPI+CUDA

KISTI Supercomputing Center HPC2010_Italy_2

Page 3: Where is KISTI?

• KISTI is responsible for the national cyber-infrastructure of Korea
• Its mission is to enable discovery through national projects

I will not talk about these today:
• Grid Project (2002~2009) ~ K*Grid
• e-Science Project (2005~)
• Scientific Cloud Computing (2009~)
• Research Network Project (2005~)

HPC Infrastructure & Multi-GPU programming

• Computing/Network Resource: keep securing and providing world-class supercomputing systems
• Cyber-infrastructure Environment: help Korean research communities be equipped with proper knowledge of cyber-infrastructure
• Value-adding Center: make the best use of what the center has to create new value
• Testbed Center: validate newly emerging concepts, ideas, tools, and systems (K*Grid, Scientific Cloud)

Page 4: History of Supercomputers in Korea

Time for a new multi-petaflops system in Korea

[Figure: timeline of supercomputer peak performance in Korea, horizontal axis 1970 to 2030, vertical axis Mega, Giga, Tera, Peta, Exa. Data points: 2 GF ('88), 16 GF ('93), T3E '97 (130 GF), Ham. '01 (435 GF), SX5 '02 (80 GF), Nobel '02 (4 TF), SX6 '03 (160 GF), IBM '09 (30 TF), SUN '10 (307 TF)]

Page 5: HPC ACT of Korea

• Work on the HPC Act started in 2004
• The Act is currently awaiting the approval of the National Assembly

Purpose
To provide for a well-coordinated national program that ensures a continued role for Korea in HPC and its applications by
• improving the coordination of supercomputing resources for HPC
• maximizing the effectiveness of Korea's research networks (KREONET)

We can make a greater contribution by expanding support for national agenda research programs in the field of computational science, for the development of the cyber-infrastructure environment, and for applications of extreme-scale computation

Page 6: NCRC

National Core Research Center for Computational Science and Technology

• Budget ~ 10 M$
• Sep. 2010~ (not yet completely determined)

Application Domains
• Energy transformation by quantum simulation
• Migration of air pollution, including yellow sand
• New materials for energy

Page 7: Supercomputing Resource (Tachyon)

• Tachyon-II is ranked 15th in the Top500 (June 2010)
• Sun Blade x6048, Intel Nehalem, procs ~ 26,232 (memory ~ 157 TB)
• Peak ~ 307 Tflops (sustained 274 Tflops)

• Provides about 30% of the whole computing capacity for public research in Korea
• Users from 200 institutes in Korea
• Utilization: 70~80%
• Little room for large-scale grand challenge problems

[Figure: chart by application field (Chemistry, Earth/Weather, Mechanics, Physics, Electronics, Others), years 2003 to 2006, vertical axis 0 to 50]

Page 8: User Support and Applications

Support for code optimization and parallelization
• We optimize and parallelize users' code
• Performance improvements from 5 times to more than 1,000-fold

Year              2004  2005  2006  2007  2008  2009
Optimization         6    12    13    15    20    20
Parallelization      6     8    11    15    20    25

Users' applications through Grand Challenge Problems
• Magnetic control of edge spins, by K. S. Kim, Nature Nanotech. 3, 408 (2008)
• Hydrogen storage materials, by J. Ihm, Phys. Rev. Lett. 97, 056104 (2006)

Page 9: Supercomputing Resource (GAIA)

• GAIA-II is an SMP cluster, ranked 393rd in the Top500 (November 2009)
• IBM POWER6 5 GHz, Power 595
• Number of procs ~ 1,536 (64 cores/node)
• Rpeak ~ 30.7 Tflops (sustained 23.3 Tflops)
• Memory: 8.7 TB

Page 10: Hybrid Programming: Multi-Zone NPB

• GAIA-2 (IBM p6 5 GHz): big memory ~ 16 GB/core
• BT-MZ with class F: memory required ~ 5 TB
• Bin-packing algorithm: zone sizes vary by a factor of ~20
• MPI+OpenMP programming: performance ~ 4.5 TF (15% of peak)
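The slide only names a bin-packing algorithm for distributing unevenly sized zones. As a minimal sketch of the idea, the following uses a greedy largest-first heuristic: zones are sorted by size and each is handed to the currently least-loaded process. The function name `pack_zones` and the zone sizes are hypothetical, and this heuristic is an assumption, not necessarily the scheme used in the benchmark run.

```python
import heapq

def pack_zones(zone_sizes, n_procs):
    """Greedy largest-first packing: give each zone to the least-loaded process."""
    heap = [(0, p) for p in range(n_procs)]      # (current load, process id)
    heapq.heapify(heap)
    assignment = {p: [] for p in range(n_procs)}
    # Visit zones from largest to smallest
    for zone, size in sorted(enumerate(zone_sizes), key=lambda zs: -zs[1]):
        load, p = heapq.heappop(heap)            # least-loaded process so far
        assignment[p].append(zone)
        heapq.heappush(heap, (load + size, p))
    loads = [sum(zone_sizes[z] for z in assignment[p]) for p in range(n_procs)]
    return assignment, loads

# Hypothetical zone sizes spanning a 20:1 range
zones = [20, 14, 10, 8, 6, 5, 4, 3, 2, 1, 1, 1]
assignment, loads = pack_zones(zones, 3)
print(loads)
```

With MPI+OpenMP, any imbalance that remains after packing can be reduced by giving heavier processes more OpenMP threads.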

Page 11: GPU computing for visualization

All of KISTI's visualization systems have a direct connection to GLORIAD, whose bandwidth is 10 Gbps

Visualization Computer
Total number of nodes    ~150
CPU   # of CPU cores     800+
      Total memory       3.5+ TB
GPU   Model              NVIDIA Quadro FX 5600
      # of GPUs          96+
Network
      Interconnection    20 Gbps
      External network   160+ Gbps

Page 12: GPU Computing Activities

KSCSE (Korea Society for Computational Sciences and Engineering)

• Established as a new computing society in 2009
• Supports the GPU computing forum and workshop: 200 participants, two days, May 2010, Seoul
• Open for international collaboration on extreme-scale computing

Page 13: Heterogeneous Computing Testbed

• A heterogeneous computing system is a system that uses a variety of different types of computational units
• A computational unit could be a GPU, a co-processor, or an FPGA

KISTI Heterogeneous GPU Testbed
• NVIDIA 2 x S1070 (8 GPUs), D870, GTX280
• PCI-e: 16x

Page 14: Performance Benefit of GPU Computing

The ratio of operations to elements transferred: O(N)

• Matrix-matrix multiplication: operations N^3, transfer 3 x N^2, scaling O(N)
• Matrix-matrix addition: operations N^2, transfer 3 x N^2, scaling O(1)

NVIDIA GTX280 (peak 933 Gflops)

[Figure: matrix-matrix multiplication C = A x B, annotated with 40% efficiency and 6x]
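The operation and transfer counts on this slide can be turned into a small calculation. The sketch below is only an illustration: the function name `ops_per_element` is hypothetical, and it uses the slide's own counts (N^3 or N^2 operations, 3 x N^2 elements moved for A, B and C).

```python
def ops_per_element(n, kind):
    """Operations per matrix element transferred, using the slide's counts."""
    transfer = 3 * n**2                 # A and B to the GPU, C back to the host
    ops = n**3 if kind == "multiply" else n**2
    return ops / transfer

for n in (256, 1024, 4096):
    print(n, ops_per_element(n, "multiply"), ops_per_element(n, "add"))
```

For multiplication the ratio grows as N/3, so larger matrices amortize the PCIe transfer better. For addition it stays at 1/3 regardless of N, so the transfer cost is never amortized.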

Page 15: Image Compression Using SVD

• SVD is an important factorization of a matrix, with many applications in signal processing and statistics
• culaSgeSVD() by using CULA
• RGB full color
• 2048x2048, total 4,194,304 pixels

Original    12,288 KB
Rank 1          12 KB
Rank 10        120 KB
Rank 50        600 KB
Rank 80        960 KB
Rank 100     1,200 KB
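The following sketch illustrates rank-k compression with a truncated SVD. It uses NumPy's CPU routine `numpy.linalg.svd` as a stand-in for the CULA GPU routine named on the slide, and a small random matrix in place of an image channel. The storage model in `storage_kb` (k left and k right singular vectors per colour channel, one byte per value) is an assumption that happens to reproduce the sizes listed above, not a statement of how the original demo stored its data.

```python
import numpy as np

def rank_k(a, k):
    """Best rank-k approximation of a 2-D array via truncated SVD."""
    u, s, vt = np.linalg.svd(a, full_matrices=False)
    return (u[:, :k] * s[:k]) @ vt[:k, :]

def storage_kb(k, n=2048, channels=3, bytes_per_value=1):
    """KB needed for k left and k right singular vectors per channel (assumed layout)."""
    return channels * k * (n + n) * bytes_per_value / 1024

rng = np.random.default_rng(0)
channel = rng.random((64, 64))          # small stand-in for one colour channel
approx = rank_k(channel, 10)
err = np.linalg.norm(channel - approx)  # Frobenius norm of what was discarded
print(np.linalg.matrix_rank(approx), err)
print([storage_kb(k) for k in (1, 10, 50, 80, 100)])
```

The approximation error equals the root of the sum of the squared discarded singular values, so increasing the rank trades storage for image quality.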

Page 16: 3D FDTD on GPU

• Time-domain simulation: FDTD, ...
• Frequency-domain simulation: FEM, BEM, ...

Finite-Difference Time-Domain
• Divide both space and time into discrete grids

$$\frac{\partial \mathbf{H}}{\partial t} = -\frac{1}{\mu} \nabla \times \mathbf{E}, \qquad \frac{\partial \mathbf{E}}{\partial t} = \frac{1}{\varepsilon} \left( \nabla \times \mathbf{H} - \mathbf{J} \right)$$

[Figure: grid cell with corners (i, j, k), (i+1, j, k), (i+1, j+1, k), (i, j, k+1), (i+1, j+1, k+1) and field components Ex, Ey, Ez, Hx, Hy, Hz]

3D FDTD Benchmark Results
• Memory ~ 988 MB, grid ~ 300x300x240
• By Q. Park and K. Kim, Korea Univ.

[Figure: bar chart comparing Intel Opt, 9800GF, GTX280, GTX480, C1060, vertical axis 0 to 300]
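To make the idea of discretizing both space and time concrete, here is a minimal one-dimensional sketch of the leapfrog update in normalized units. It is a deliberate simplification of the 3D scheme benchmarked on the slide: the function name `fdtd_1d`, the grid size, the Courant number and the Gaussian source are all hypothetical choices, and the code runs on the CPU with NumPy rather than on a GPU.

```python
import numpy as np

def fdtd_1d(n_cells=200, n_steps=300, courant=0.5, src=100):
    """1-D leapfrog scheme in normalized units with perfectly conducting ends."""
    ez = np.zeros(n_cells)          # E sampled on integer grid points
    hy = np.zeros(n_cells - 1)      # H sampled halfway between E points
    for n in range(n_steps):
        hy += courant * (ez[1:] - ez[:-1])           # H update from the spatial difference of E
        ez[1:-1] += courant * (hy[1:] - hy[:-1])     # E update from the spatial difference of H
        ez[src] += np.exp(-((n - 30) / 10.0) ** 2)   # soft Gaussian source (plays the role of J)
    return ez, hy

ez, hy = fdtd_1d()
print(np.abs(ez).max())
```

Each grid point is updated only from its immediate neighbours, which is why the method maps well onto one GPU thread per cell. The scheme stays stable in 1-D as long as the Courant number does not exceed 1.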

Page 17: RNG and Monte Carlo Algorithm

Ising model: model and probability weight

$$E = -J \sum_{i,j=1}^{L} s_i s_j - H \sum_{i=1}^{L} s_i, \qquad p = \frac{\exp[-E / k_B T]}{Z(T)}$$

• 1000 MC steps and 512 threads per block (tx=512, ty=1, by=1), independent of the block number bx
• Size of the warp ~ 32 and the number of thread blocks ~ 30

Page 18: Performance of Ising Model on a GPU

• Use the checkerboard decomposition to avoid read/write conflicts
• the spin field and the seed values of the RNGs
• Divide the spins on the lattice into blocks on the GPU
• Example for a (20x16) lattice
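The checkerboard decomposition can be sketched on the CPU with NumPy. In the sketch below, sites are coloured like a chessboard and all sites of one colour are updated together, so no spin is read while one of its neighbours is being written. The Metropolis acceptance rule, the nearest-neighbour coupling J, the field H and the function name `checkerboard_sweep` are assumptions for illustration. The talk's actual CUDA kernel is not shown on the slide.

```python
import numpy as np

def checkerboard_sweep(spins, beta, J=1.0, H=0.0, rng=None):
    """One Metropolis sweep: update all 'black' sites, then all 'white' sites."""
    rng = np.random.default_rng() if rng is None else rng
    ii, jj = np.indices(spins.shape)
    for color in (0, 1):
        mask = (ii + jj) % 2 == color
        # Sum of the four nearest neighbours with periodic boundaries
        nn = (np.roll(spins, 1, 0) + np.roll(spins, -1, 0)
              + np.roll(spins, 1, 1) + np.roll(spins, -1, 1))
        d_e = 2.0 * spins * (J * nn + H)          # energy change if a spin flips
        accept = rng.random(spins.shape) < np.exp(-beta * d_e)
        spins = np.where(mask & accept, -spins, spins)
    return spins

rng = np.random.default_rng(1)
spins = np.ones((20, 16), dtype=int)              # lattice size from the slide
for _ in range(50):
    spins = checkerboard_sweep(spins, beta=1.0, rng=rng)
print(spins.mean())
```

With periodic boundaries the colouring is only consistent when both lattice dimensions are even, as they are for the 20x16 example. On a GPU each thread would handle one site of the active colour and carry its own RNG seed.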

Page 19: MPI Virtual Topology + CUDA Model

• VT is a weak-scaling problem: the problem size grows in direct proportion to the number of cores
• Using PBC (periodic boundary conditions) with MPI_Sendrecv()
• Intel Nehalem 8 cores + NVIDIA Tesla C1060 x 8 GPUs

[Figure: PCI-E cables]
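The exchange pattern behind periodic boundaries can be illustrated without MPI. In the sketch below, the lattice is cut into row strips, one per hypothetical rank, and each strip fills its two ghost rows from its wrap-around neighbours. Plain array copies stand in for the MPI_Sendrecv() calls, and the helper names `split_with_ghosts` and `exchange_halos` are made up for this example.

```python
import numpy as np

def split_with_ghosts(grid, n_ranks):
    """Cut the rows of grid into n_ranks strips, each padded with two ghost rows."""
    rows = grid.shape[0] // n_ranks
    strips = []
    for rank in range(n_ranks):
        strip = np.zeros((rows + 2, grid.shape[1]), dtype=grid.dtype)
        strip[1:-1] = grid[rank * rows:(rank + 1) * rows]
        strips.append(strip)
    return strips

def exchange_halos(strips):
    """Fill ghost rows from the periodic neighbours (stand-in for MPI_Sendrecv)."""
    n_ranks = len(strips)
    for rank, strip in enumerate(strips):
        up = strips[(rank - 1) % n_ranks]      # wraps around: periodic boundary
        down = strips[(rank + 1) % n_ranks]
        strip[0] = up[-2]                      # neighbour's last interior row
        strip[-1] = down[1]                    # neighbour's first interior row

grid = np.arange(8 * 4).reshape(8, 4)
strips = split_with_ghosts(grid, 4)
exchange_halos(strips)
print(strips[0])
```

In a weak-scaling run each rank keeps the same strip size as ranks are added, so the halo volume per rank stays constant while the total lattice grows.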

Page 20: Heterogeneous Communication Pattern

GPU0 --(PCIe)--> CPU1 --(InfiniBand)--> CPU2 --(PCIe)--> GPU2

There is an extra cudaMemcpy() involved in message passing

Page 21: GPU scaling Issues

• Achieving good scaling is more difficult with GPUs
• The kernels are much faster, so the MPI communication becomes a larger fraction of the overall execution time
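A simple timing model shows why faster kernels make communication more visible. The sketch assumes that compute and communication do not overlap, and every number in it is hypothetical rather than measured in the talk. The function name `comm_fraction` and its parameters are likewise illustrative.

```python
def comm_fraction(t_kernel, t_mpi, kernel_speedup=1.0, t_pcie=0.0):
    """Fraction of one step spent on communication (MPI plus host-device copies)."""
    t_comm = t_mpi + t_pcie
    return t_comm / (t_kernel / kernel_speedup + t_comm)

# Hypothetical step: 9 time units of compute, 1 unit of MPI traffic
cpu_only = comm_fraction(t_kernel=9.0, t_mpi=1.0)
# Same step with a 9x faster kernel and an extra 0.5 units of PCIe copies
with_gpu = comm_fraction(t_kernel=9.0, t_mpi=1.0, kernel_speedup=9.0, t_pcie=0.5)
print(cpu_only, with_gpu)
```

In this made-up case the communication share rises from 10% to 60%, even though the absolute MPI time is unchanged.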

Page 22: Summary

The HPC Act of Korea is in progress
• The Act is awaiting the approval of the Korean National Assembly
• Time for a heterogeneous petaflops system in Korea
• Many things to consider: power, space, users' ability to port code

In the MPI+CUDA model, achieving good scaling is more difficult than with pure MPI, since
• the kernels are faster on the GPU
• there is additional communication overhead between the CPU and the GPUs

Page 23: Q&A

Thank You!