Supercomputing with NVIDIA GPUs
HPCN Workshop, May 2011
Axel Koehler, NVIDIA

© NVIDIA Corporation 2011


NVIDIA Introduction and HPC Evolution of GPUs

Public, based in Santa Clara, CA | ~$4B revenue | ~6000 employees

Founded in 1993 with primary business in semiconductor industry

Products for graphics in workstations, notebooks, mobile devices, etc.

Began R&D of GPUs for HPC in 2004, released first Tesla and CUDA in 2007

Development of GPUs as a co-processing accelerator for x86 CPUs

HPC Evolution of GPUs: 3 Generations of Tesla in 3 Years

2004: Began strategic investments in GPU as HPC co-processor
2006: G80, first GPU with built-in compute features, 128 cores; CUDA SDK Beta
2007: Tesla 8-series based on G80, 128 cores; CUDA 1.0, 1.1
2008: Tesla 10-series based on GT200, 240 cores; CUDA 2.0, 2.3
2009: Tesla 20-series, code-named “Fermi”, up to 512 cores; CUDA SDK 3.0


Tesla GPUs Power 3 of Top 5 Supercomputers

#1: Tianhe-1A, 7168 Tesla GPUs, 2.5 PFLOPS
#3: Nebulae, 4650 Tesla GPUs, 1.2 PFLOPS
#4: Tsubame 2.0, 4224 Tesla GPUs, 1.194 PFLOPS

“We not only created the world's fastest computer, but also implemented a heterogeneous computing architecture incorporating CPU and GPU; this is a new innovation.”
Premier Wen Jiabao, public comments acknowledging Tianhe-1A


3 of Top 5 Supercomputers

[Chart: performance in gigaflops (axis 0 to 3000) and power in megawatts (axis 0 to 8) for Tianhe-1A, Jaguar, Nebulae, Tsubame, Hopper II, and Tera 100]


GPU Computing Today: By the Numbers

CUDA-capable GPUs: 200+ million
CUDA Toolkit downloads: 600,000+
Active GPU computing developers: 100,000+
Members in Parallel Nsight Developer Program: 8,000
Universities teaching CUDA worldwide: 362
CUDA Centers of Excellence worldwide: 11


Wide Adoption of Tesla GPUs

Oil and gas: Reverse Time Migration, Kirchhoff Time Migration, Reservoir Sim
Edu/Research: Astrophysics, Molecular Dynamics, Weather / Climate Modeling
Government: Signal Processing, Satellite Imaging, Video Analytics, Synthetic Aperture Radar
Life Sciences: Bio-chemistry, Bio-informatics, Material Science, Sequence Analysis, Genomics
Finance: Risk Analytics, Monte Carlo, Options Pricing, Insurance modeling
Manufacturing: Structural Mechanics, Computational Fluid Dynamics, Machine Vision, Electromagnetics


MATLAB makes GPUs more accessible

Scientist / Practitioner: domain expertise
Developer / Computer Scientist: computational expertise

MATLAB Benefits
• Faster time to discovery
• Empowers scientist / practitioner
• No need for programming expertise
• No custom tools
• Automated application deployment

Language Integration: CUDA C / C++
High-Level Technical Computing Languages: 1 million+ MATLAB licensees


GPU Progress – CAE ISV Software

GPU status categories: Available Today | Product in 2011 | Product Evaluation | Research Evaluation

Structural Mechanics: ANSYS Mechanical, AFEA, Abaqus/Standard, LS-DYNA implicit, Marc, MD Nastran, RADIOSS implicit, PAM-CRASH implicit, NX Nastran, RecurDyn, LS-DYNA, Abaqus/Explicit, RADIOSS, PAM-CRASH
Fluid Dynamics: AcuSolve, Moldflow, Culises (OpenFOAM), Particleworks, CFD-ACE+, Abaqus/CFD, FloEFD, STAR-CCM+, ANSYS CFD (FLUENT+CFX), CFD++, LS-DYNA CFD
Electromagnetics: Nexxim, EMPro, CST MS, XFdtd, SEMCAD X, Xpatch, HFSS, Maxwell


The ‘Fermi’ Architecture: The Soul of a Supercomputer in the Body of a GPU

• 3 billion transistors
• Over 2× the cores (512 total)
• 8× the peak DP performance
• ECC
• L1 and L2 caches
• ~2× memory bandwidth (GDDR5)
• Up to 1 Terabyte of GPU memory
• Concurrent kernels
• Hardware support for C++

[Die diagram: six DRAM interfaces, host interface, GigaThread, L2]


Tesla Data Center & Workstation GPU Solutions

Tesla M-series GPUs (M2090 | M2070 | M2050): Integrated CPU-GPU Servers & Blades
Tesla C-series GPUs (C2070 | C2050): Workstations, 2 to 4 Tesla GPUs

                             M2090       M2070       M2050       C2070     C2050
Cores                        512         448         448         448       448
Memory                       6 GB        6 GB        3 GB        6 GB      3 GB
Memory bandwidth (ECC off)   177.6 GB/s  148.8 GB/s  148.8 GB/s  144 GB/s  144 GB/s
Peak perf, SP (Gflops)       1331        1030        1030        1030      1030
Peak perf, DP (Gflops)       665         515         515         515       515
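Two ratios can be read off the figures above: peak double precision relative to peak single precision, and bytes of memory bandwidth available per peak double-precision flop. The Python sketch below only recomputes them from the slide's numbers; the dictionary layout and the function names are illustrative assumptions, not part of the original deck.

```python
# Figures copied from the slide:
# (peak SP Gflops, peak DP Gflops, memory bandwidth in GB/s with ECC off)
BOARDS = {
    "M2090": (1331, 665, 177.6),
    "M2070": (1030, 515, 148.8),
    "M2050": (1030, 515, 148.8),
    "C2070": (1030, 515, 144.0),
    "C2050": (1030, 515, 144.0),
}


def dp_to_sp_ratio(board):
    """Peak double precision as a fraction of peak single precision."""
    sp, dp, _ = BOARDS[board]
    return dp / sp


def bytes_per_dp_flop(board):
    """Memory bandwidth (GB/s) divided by peak DP rate (Gflops)."""
    _, dp, bandwidth = BOARDS[board]
    return bandwidth / dp


for name in BOARDS:
    print(f"{name}: DP/SP = {dp_to_sp_ratio(name):.2f}, "
          f"bytes per DP flop = {bytes_per_dp_flop(name):.3f}")
```

On these figures every board reaches about half of its single-precision peak in double precision, with roughly 0.27 to 0.29 bytes of bandwidth per peak DP flop.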


CUDA GPU Roadmap

[Chart: DP GFLOPS per watt (axis 2 to 16) versus year (2007, 2009, 2011, 2013) for the Tesla, Fermi, Kepler, and Maxwell architectures]


NVIDIA Developer Eco-System

GPU Compilers: C, C++, Fortran, OpenCL, DirectCompute, Java, Python
Parallelizing Compilers: PGI Accelerator, CAPS HMPP, mCUDA, OpenMP
Libraries: BLAS, FFT, LAPACK, NPP, Video, Imaging, GPULib
GPGPU Consultants & Training: ANEO, GPU Tech
Debuggers & Profilers: cuda-gdb, NV Visual Profiler, Parallel Nsight (Visual Studio), Allinea, TotalView, VampirTrace
Numerical Packages: MATLAB, Mathematica, NI LabView, pyCUDA
Cluster Tools: Bright Cluster Manager, Platform LSF / Symphony, Altair PBS Pro, Torque, GridEngine
OEM solutions + Cloud Platform Provider: Amazon EC2, Peer 1


CUDA 4.0: Highlights

Easier Parallel Application Porting
• Share GPUs across multiple threads
• Single thread access to all GPUs
• No-copy pinning of system memory
• New CUDA C/C++ features
• Thrust templated primitives library
• NPP image/video processing library
• Layered Textures

New & Improved Developer Tools
• Auto Performance Analysis
• C++ Debugging
• GPU Binary Disassembler
• cuda-gdb for MacOS

Faster Multi-GPU Programming
• Unified Virtual Addressing
• NVIDIA GPUDirect™ v2.0
• Peer-to-Peer Access
• Peer-to-Peer Transfers
• GPU-accelerated MPI


C++ Templatized Algorithms & Data Structures (Thrust)

Powerful open source C++ parallel algorithms & data structures

Similar to C++ Standard Template Library (STL)

Automatically chooses the fastest code path at compile time

Divides work between GPUs and multi-core CPUs

Parallel sorting @ 5x to 100x faster

Data Structures

• thrust::device_vector

• thrust::host_vector

• thrust::device_ptr

• Etc.

Algorithms

• thrust::sort

• thrust::reduce

• thrust::exclusive_scan

• Etc.
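As a rough illustration of what the three algorithms listed above compute, here is a CPU-only Python sketch. It is not Thrust and runs nothing on a GPU; the function names merely mirror the Thrust names, the defaults (initial value 0, addition) are chosen to match Thrust's defaults, and the sample data is made up.

```python
from functools import reduce as fold
from itertools import accumulate
import operator


def sort(values):
    """Return the values in ascending order (thrust::sort sorts in place instead)."""
    return sorted(values)


def reduce(values, init=0, op=operator.add):
    """Combine all values into a single result, starting from init."""
    return fold(op, values, init)


def exclusive_scan(values, init=0, op=operator.add):
    """Output i combines init with inputs 0..i-1, so input i itself is excluded."""
    running = list(accumulate(values, op, initial=init))
    return running[:-1]  # drop the final total so the output length matches the input


data = [3, 1, 4, 1, 5]
print(sort(data))            # [1, 1, 3, 4, 5]
print(reduce(data))          # 14
print(exclusive_scan(data))  # [0, 3, 4, 8, 9]
```

The exclusive scan is the one that most often surprises newcomers: each output holds the sum of everything before its position, which is why it is commonly used to turn per-element counts into output offsets.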


Unified Virtual Addressing: Easier to Program with Single Address Space

[Diagram, No UVA (multiple memory spaces): system memory, GPU0 memory, and GPU1 memory each span their own 0x0000 to 0xFFFF range; CPU, GPU0, and GPU1 are connected over PCI-e]

[Diagram, UVA (single address space): system memory, GPU0 memory, and GPU1 memory share one 0x0000 to 0xFFFF range; CPU, GPU0, and GPU1 are connected over PCI-e]


Unified Virtual Addressing

One address space for all CPU and GPU memory
• Determine physical memory location from pointer value
• Enables libraries to simplify their interfaces (e.g. cudaMemcpy)

Supported on Tesla 20-series and other Fermi GPUs
• 64-bit applications on Linux and Windows TCC

Before UVA: separate options for each permutation
• cudaMemcpyHostToHost
• cudaMemcpyHostToDevice
• cudaMemcpyDeviceToHost
• cudaMemcpyDeviceToDevice

With UVA: one function handles all cases
• cudaMemcpyDefault (data location becomes an implementation detail)
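The idea that a pointer value alone identifies where memory lives can be modelled in a few lines. The Python sketch below is a toy model, not the CUDA runtime: the address ranges, the REGIONS table, and the locate and copy_kind helpers are all hypothetical. It shows how a single address space lets a library infer the copy direction that the four explicit cudaMemcpy kinds previously had to spell out.

```python
from bisect import bisect_right

# Hypothetical layout: one virtual address space split into disjoint ranges.
# Each entry is (start address, end address exclusive, owner).
REGIONS = [
    (0x0000, 0x4000, "host"),
    (0x4000, 0x8000, "gpu0"),
    (0x8000, 0xC000, "gpu1"),
]
STARTS = [start for start, _, _ in REGIONS]


def locate(ptr):
    """Return which memory owns ptr, using nothing but the pointer value."""
    index = bisect_right(STARTS, ptr) - 1
    if index < 0:
        raise ValueError(f"address {ptr:#x} is not mapped")
    start, end, owner = REGIONS[index]
    if not start <= ptr < end:
        raise ValueError(f"address {ptr:#x} is not mapped")
    return owner


def copy_kind(dst, src):
    """Infer the direction that a 'default' copy kind leaves implicit."""
    def side(ptr):
        return "Host" if locate(ptr) == "host" else "Device"
    return f"{side(src)}To{side(dst)}"


print(copy_kind(dst=0x4100, src=0x0010))  # HostToDevice
print(copy_kind(dst=0x0010, src=0x8100))  # DeviceToHost
print(copy_kind(dst=0x8100, src=0x4100))  # DeviceToDevice
```

Without a unified address space the same numeric address could exist in several memories at once, so no such lookup would be possible and the caller would have to state the direction explicitly.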


NVIDIA GPUDirect™: Towards Eliminating the CPU Bottleneck

Version 1.0, for applications that communicate over a network
• Direct access to GPU memory for 3rd party devices
• Eliminates unnecessary sys memcopies & CPU overhead
• Supported by Mellanox and Qlogic
• Up to 30% improvement in communication performance

Version 2.0, for applications that communicate within a node
• Peer-to-Peer memory access, transfers & synchronization
• Less code, higher programmer productivity

Details @ http://www.nvidia.com/object/software-for-tesla-products.html


GPUDirect v2.0: Peer-to-Peer Communication

[Diagram, Direct Access: GPU0 performs load / store on GPU1 memory over PCI-e]

[Diagram, Direct Transfers: cudaMemcpy() between GPU0 memory and GPU1 memory over PCI-e]


GPUDirect v2.0: Peer-to-Peer Communication

Direct communication between GPUs
• Faster: no system memory copy overhead
• More convenient multi-GPU programming

Direct Transfers
• Copy from GPU0 memory to GPU1 memory
• Works transparently with UVA

Direct Access
• GPU0 reads or writes GPU1 memory (load/store)

Supported only on Tesla 20-series (Fermi)
• 64-bit applications on Linux and Windows TCC
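The "no system memory copy overhead" point can be made concrete with a toy cost model. The Python sketch below compares a copy staged through system memory (the data crosses PCI-e twice) with a direct peer-to-peer transfer (once). The bandwidth figure is a made-up placeholder rather than a measurement, and the model ignores latency and any overlap between the two staged transfers.

```python
def staged_copy_seconds(num_bytes, pcie_bytes_per_s):
    """GPU0 memory -> system memory -> GPU1 memory: two PCI-e crossings."""
    return 2 * num_bytes / pcie_bytes_per_s


def direct_copy_seconds(num_bytes, pcie_bytes_per_s):
    """GPU0 memory -> GPU1 memory: one PCI-e crossing."""
    return num_bytes / pcie_bytes_per_s


ASSUMED_PCIE_BW = 6e9      # hypothetical 6 GB/s, for illustration only
payload = 256 * 1024**2    # 256 MiB

staged = staged_copy_seconds(payload, ASSUMED_PCIE_BW)
direct = direct_copy_seconds(payload, ASSUMED_PCIE_BW)
print(f"staged: {staged * 1e3:.1f} ms, direct: {direct * 1e3:.1f} ms")
```

Under these assumptions the direct path takes half the time regardless of the bandwidth chosen; real gains depend on the system topology and on how much of the staged copy could be pipelined.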


Echelon: NVIDIA’s Extreme-Scale Computing Project


Power is THE Problem

1. Data Movement Dominates Power
2. Optimize the Storage Hierarchy
3. Tailor Memory to the Application


Applications with Hierarchical Reuse Want a Deep Storage Hierarchy

[Diagram: 16 processors (P), each with its own L1; 4 L2s; one L3; one L4]


Applications with Plateaus Want a Shallow Storage Hierarchy

[Diagram: 16 processors (P), each with its own L1; a NoC; 4 L2s]


Configurable Memory Can Do Both at the Same Time

• Flat hierarchy for large working sets
• Deep hierarchy for reuse
• “Shared” memory for explicit management
• Cache memory for unpredictable sharing

[Diagram: 16 processors (P), each with its own L1; a NoC; 4 SRAM blocks]


Echelon Architecture

Lane: DFMAs, 20 GFLOPS
SM: 8 lanes (P) with a switch and L1$, 160 GFLOPS
Chip: 128 SMs, 20.48 TFLOPS, + 8 Latency Processors (LP); 1024 SRAM banks, 256KB each; SMs, LPs, SRAM, MC, and NI connected by a NoC
Node MCM: 20 TFLOPS + 256 GB; GPU chip (20TF DP, 256MB), DRAM stacks (1.4TB/s DRAM BW), NV memory, 150GB/s network BW
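The per-level figures on this slide multiply out consistently, which the short Python check below makes explicit. It uses only the numbers stated on the slide; the constant and function names are illustrative.

```python
LANE_GFLOPS = 20      # per lane
LANES_PER_SM = 8
SMS_PER_CHIP = 128
SRAM_BANKS = 1024
SRAM_BANK_KB = 256


def sm_gflops():
    """8 lanes at 20 GFLOPS each."""
    return LANES_PER_SM * LANE_GFLOPS


def chip_tflops():
    """128 SMs, converted from GFLOPS to TFLOPS."""
    return SMS_PER_CHIP * sm_gflops() / 1000


def chip_sram_mb():
    """1024 banks of 256 KB, converted to MB (1 MB = 1024 KB)."""
    return SRAM_BANKS * SRAM_BANK_KB / 1024


print(sm_gflops())     # 160
print(chip_tflops())   # 20.48
print(chip_sram_mb())  # 256.0
```

The results match the slide's 160 GFLOPS per SM, 20.48 TFLOPS per chip, and 256MB of on-chip SRAM.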


Echelon System Sketch

Software stack: Self-Aware OS; Self-Aware Runtime; Locality-Aware Compiler & Autotuner

Echelon System: 400 cabinets, 1 EF, 15 MW
  Cabinet 0 (C0): 16 modules, 2.6PF, 205TB/s, 32TB (cabinets C0 to CN)
    Module 0 (M0): 8 nodes, 160TF, 12.8TB/s, 2TB (modules M0 to M15)
      Node 0 (N0): 20TF, 1.6TB/s, 256GB (nodes N0 to N7)
        Processor Chip (PC): SM0 to SM127, each with C0 to C7 and L0; NoC; L2_0 to L2_1023; MC; NIC
        DRAM Cubes; NV RAM
    High-Radix Router Module (RM): LC0 to LC7
  Dragonfly Interconnect (optical fiber)
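The system-level numbers can be rolled up from the node figures. The Python sketch below does that arithmetic with the counts given on the slide; the scale and gflops_per_watt helpers and the dictionary keys are illustrative names, and bandwidth is held in GB/s (1.6TB/s = 1600 GB/s) to keep the sums in integers.

```python
NODE = {"tflops": 20, "bw_gb_s": 1600, "mem_gb": 256}  # Node: 20TF, 1.6TB/s, 256GB
NODES_PER_MODULE = 8
MODULES_PER_CABINET = 16
CABINETS = 400
SYSTEM_POWER_MW = 15


def scale(level, factor):
    """Multiply every figure of one level by the number of units at the next level."""
    return {key: value * factor for key, value in level.items()}


def gflops_per_watt(tflops, megawatts):
    """Convert TFLOPS to GFLOPS and MW to W, then divide."""
    return (tflops * 1e3) / (megawatts * 1e6)


module = scale(NODE, NODES_PER_MODULE)
cabinet = scale(module, MODULES_PER_CABINET)
system = scale(cabinet, CABINETS)

print(module)   # {'tflops': 160, 'bw_gb_s': 12800, 'mem_gb': 2048}
print(cabinet)  # {'tflops': 2560, 'bw_gb_s': 204800, 'mem_gb': 32768}
print(system["tflops"] / 1e6, "EF")
print(round(gflops_per_watt(system["tflops"], SYSTEM_POWER_MW), 1), "GFLOPS/W")
```

The slide's 2.6PF and 205TB/s per cabinet are roundings of 2.56 PF and 204.8 TB/s, and its 2TB and 32TB correspond to 2048 GB and 32768 GB. The full system comes to about 1.02 EF, or roughly 68 GFLOPS per watt at 15 MW.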


GPU Computing is the Future

1. GPU Computing is #1 Today: On Top 500 AND Dominant on Green 500
2. GPU Computing Enables ExaScale: At Reasonable Power
3. The GPU is the Computer: A general purpose computing engine, not just an accelerator
4. The Real Challenge is Software
