GPU-based High-Performance Simulations for...

Jan Jacob, Institute of Applied Physics and Advanced Microstructure Research Center

GPU-based High-Performance Simulations for Spintronics

Jan Jacob (1), Darren Schmidt (2), Qing Ruan (2), Lothar Wenzel (2), Vivek Amin (3), and Jairo Sinova (3)

(1) University of Hamburg, Institute of Applied Physics, Hamburg, Germany
(2) National Instruments, Austin, TX, USA
(3) Texas A&M University, Department of Physics and Astronomy, College Station, TX, USA

NVIDIA GPU Technology Conference, San Jose, CA, USA, May 14-17, 2012


Source: on-demand.gputechconf.com/gtc/2012/presentations/S0379-GPU-Base…



Overview

Introduction to the Underlying Physics

The Basic Algorithm

Optimizations

Benchmarks

Multicore-CPUs

NVIDIA Tesla GPUs

March 2012 GTC 2012, San Jose, May 14-17 2012


The Physics


Transport of Charge / Spin / Heat, etc. through a Scattering Region can be described by Schrödinger's Equation: HΨ = EΨ

More complex structures: solve numerically in a tight-binding model (nearest-neighbor interactions only)

The Hamiltonian H becomes a matrix

Commonly used approach: Green's Function Method


More details

Conductivity can be obtained by:

The scattering matrix can be determined by:

Discretized version:

Differential Operator -> Matrix Operator

Definition of derivatives:
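The slide's own formulas did not survive in this transcript. As a hedged illustration only, the standard finite-difference definitions on a grid with spacing a (the symbols a, t, m, and ψ_n are assumptions, not taken from the slide) are:

```latex
\left.\frac{d\psi}{dx}\right|_{x_n + a/2} \approx \frac{\psi_{n+1} - \psi_n}{a},
\qquad
\left.\frac{d^2\psi}{dx^2}\right|_{x_n} \approx \frac{\psi_{n+1} - 2\psi_n + \psi_{n-1}}{a^2}

-\frac{\hbar^2}{2m}\frac{d^2\psi}{dx^2}
\;\longrightarrow\;
t\,\bigl(2\psi_n - \psi_{n+1} - \psi_{n-1}\bigr),
\qquad t = \frac{\hbar^2}{2 m a^2}
```

With this choice the kinetic-energy operator becomes a tridiagonal matrix with 2t on the diagonal and -t on the first off-diagonals, which is the nearest-neighbor tight-binding form mentioned earlier.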


More details 2

Green's function of the system (including self-energy term to describe the leads):

Transmission is then obtained by:
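The equations on this slide are missing from the transcript. The textbook forms commonly used in the Green's function method are given below as a sketch; the notation (Σ_L, Σ_R for the lead self-energies, η for a small positive broadening) is an assumption and may differ from the slide:

```latex
G(E) = \bigl[(E + i\eta)\,I - H - \Sigma_L - \Sigma_R\bigr]^{-1}

\Gamma_{L,R} = i\,\bigl(\Sigma_{L,R} - \Sigma_{L,R}^{\dagger}\bigr)

T(E) = \mathrm{Tr}\bigl[\Gamma_L\, G\, \Gamma_R\, G^{\dagger}\bigr]
```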


The Algorithm

1. The Hamiltonian Matrix defines the System

2. The Transverse Modes define the Occupied States

3. The Self-Energies describe the Contact Leads

4. The Green's Function of the System describes its Scattering Properties

5. The Γ Matrices connect the Leads to the System

6. The Transmission and Reflection are obtained by Multiplying the G and Γ Matrices.

1. Define Hamiltonian Matrix (User Input)

2. Obtain Transverse Modes (Calculate Eigensystem)

3. Obtain Self-Energies (Scalar Operations)

4. Obtain Green's Function (Matrix Inversion)

5. Obtain Γ (Scalar Operations)

6. Obtain Transmission (Matrix Multiplication)
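The six steps above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' LabVIEW implementation: it assumes a clean rectangular strip of W x L sites with hopping t and identical semi-infinite leads, and the helper names (chain, lead_phase, transmission) are hypothetical.

```python
import numpy as np

def chain(n, onsite, hop):
    """Nearest-neighbor tight-binding Hamiltonian of an n-site chain."""
    return onsite * np.eye(n) + hop * (np.eye(n, k=1) + np.eye(n, k=-1))

def lead_phase(c):
    """Bloch factor exp(i k a) of the outgoing or decaying lead mode, c = cos(k a)."""
    if abs(c) <= 1.0:                               # propagating mode
        return c + 1j * np.sqrt(1.0 - c * c)
    return c - np.sign(c) * np.sqrt(c * c - 1.0)    # evanescent mode, |factor| < 1

def transmission(E, W, L, t=1.0):
    # Step 1: Hamiltonian of the strip (site index = x * W + y)
    Hy = chain(W, 2 * t, -t)
    H = np.kron(np.eye(L), Hy) + np.kron(chain(L, 2 * t, -t), np.eye(W))
    # Step 2: transverse modes from the eigensystem of one slice
    eps, chi = np.linalg.eigh(Hy)
    # Step 3: self-energy, one scalar per mode, rotated back to the site basis
    phase = np.array([lead_phase(1.0 - (E - e) / (2 * t)) for e in eps],
                     dtype=complex)
    sigma = (chi * (-t * phase)) @ chi.T
    N = W * L
    sig_l = np.zeros((N, N), dtype=complex)
    sig_r = np.zeros((N, N), dtype=complex)
    sig_l[:W, :W] = sigma        # left lead couples to the first slice
    sig_r[-W:, -W:] = sigma      # right lead couples to the last slice
    # Step 4: Green's function by (dense) matrix inversion
    G = np.linalg.inv(E * np.eye(N) - H - sig_l - sig_r)
    # Step 5: Gamma matrices
    gam_l = 1j * (sig_l - sig_l.conj().T)
    gam_r = 1j * (sig_r - sig_r.conj().T)
    # Step 6: transmission by matrix multiplication
    return np.trace(gam_l @ G @ gam_r @ G.conj().T).real
```

For a clean strip the result should equal the number of propagating transverse modes at energy E, which makes a convenient sanity check, e.g. `transmission(1.0, 3, 5)`.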


Optimizations

Step 1 requires creating large matrices -> memory issues!

Can be reduced by exploiting the extreme sparsity of the matrices

Creating only small blocks of the large matrices right when they are needed is even more efficient

Step 4 requires inverting these large matrices -> computational issues

Main issue of the algorithm; addressed in detail on the next slides
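To make the memory argument concrete, here is a small sketch that generates the Hamiltonian one slice at a time and compares storage costs. It assumes the same clean W x L strip with hopping t and complex double precision (16 bytes per entry); the function names are hypothetical.

```python
import numpy as np

def slice_blocks(W, L, t=1.0):
    """Yield (diagonal block, coupling to the next slice) one slice at a time."""
    Hd = 4 * t * np.eye(W) - t * (np.eye(W, k=1) + np.eye(W, k=-1))
    Hc = -t * np.eye(W)
    for x in range(L):
        # The last slice has no right neighbor inside the system
        yield Hd, (Hc if x < L - 1 else None)

def dense_bytes(W, L, itemsize=16):
    """Storage for the full (W*L) x (W*L) matrix."""
    return (W * L) ** 2 * itemsize

def block_bytes(W, itemsize=16):
    """Storage for the two W x W blocks that are live at any one time."""
    return 2 * W * W * itemsize
```

For W = 100 and L = 1000 the dense matrix would need 160 GB, while the two blocks needed per slice occupy 320 kB.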



Our View Of The Computational Map

[Figure: regions labeled FPGA, CPU, GPU, RT-GPU, and "CPU or GPU" on a map with axes Problem Size and Cycle Time (Maximum Allowed), ticks at 1 ms, 10 ms, 100 ms, and 1 s; annotation: "Power vs. $$$"]


Algorithm 0.


Algorithm 1.


1. Use optimized, multicore-ready inversion and multiplication algorithms

Intel MKL wrapped into LabVIEW via High-Performance Analysis Library (beta)


Algorithm 2.


2. Use sparsity of the matrices

PARDISO direct sparse linear solver


Optimizations for the matrix inversion

3. Use block-tridiagonal solver

Roll-your-own
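The slides do not show the solver itself. As a hedged sketch of what a hand-written block-tridiagonal solver can look like, the following implements the block Thomas algorithm (block forward elimination followed by back substitution) in NumPy. The argument layout (lists D, Lo, Up of diagonal, lower, and upper blocks) is an assumption made for this example.

```python
import numpy as np

def block_tridiag_solve(D, Lo, Up, B):
    """Solve M X = B for a block-tridiagonal matrix M.

    D[i]  = M[i, i]      (diagonal blocks, i = 0..n-1)
    Lo[i] = M[i+1, i]    (lower blocks,    i = 0..n-2)
    Up[i] = M[i, i+1]    (upper blocks,    i = 0..n-2)
    B[i]  = block row i of the right-hand side
    Returns the list of block rows of X.
    """
    n = len(D)
    S = [None] * n   # Schur complements left by the elimination
    R = [None] * n   # correspondingly updated right-hand sides
    S[0], R[0] = D[0], B[0]
    for i in range(1, n):
        # F = Lo[i-1] @ inv(S[i-1]), obtained from a linear solve
        F = np.linalg.solve(S[i - 1].T, Lo[i - 1].T).T
        S[i] = D[i] - F @ Up[i - 1]
        R[i] = B[i] - F @ R[i - 1]
    X = [None] * n
    X[n - 1] = np.linalg.solve(S[n - 1], R[n - 1])
    for i in range(n - 2, -1, -1):
        X[i] = np.linalg.solve(S[i], R[i] - Up[i] @ X[i + 1])
    return X
```

For n slices of width w the cost scales as n * w^3 instead of (n * w)^3 for a dense factorization, which is the reason to exploit the block structure.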


Algorithm 3.




Optimizations for the matrix inversion

4. Make use of the fact that not the full matrix is needed for the result

Improved block-tridiagonal solver, that only calculates the necessary blocks
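In the Green's function method the Γ matrices are nonzero only on the slices attached to the leads, so the transmission needs only the block of the inverse that connects the first slice to the last one. The sketch below computes just that corner block by block forward elimination; it is an illustration under that assumption, not the authors' code, and the block layout (D, Lo, Up) is hypothetical.

```python
import numpy as np

def corner_block(D, Lo, Up):
    """Return block (n-1, 0) of inv(M) for block-tridiagonal M.

    D[i] = M[i, i], Lo[i] = M[i+1, i], Up[i] = M[i, i+1].
    Only two blocks are kept in memory at any time.
    """
    S = D[0]                                    # running Schur complement
    P = np.eye(S.shape[0], dtype=S.dtype)       # eliminated right-hand side
    for i in range(1, len(D)):
        X = np.linalg.solve(S, Up[i - 1])       # inv(S) @ Up[i-1]
        Y = np.linalg.solve(S, P)               # inv(S) @ P
        S = D[i] - Lo[i - 1] @ X
        P = -Lo[i - 1] @ Y
    return np.linalg.solve(S, P)
```

Compared with a full block-tridiagonal inversion, this skips the back-substitution pass entirely and never stores intermediate blocks.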


Algorithm 4.


Optimizations for the matrix inversion

1. Use optimized, multicore-ready inversion and multiplication algorithms

Intel MKL wrapped into LabVIEW via High-Performance Analysis Library (beta)

2. Use sparsity of the matrices

PARDISO direct sparse linear solver

3. Use block-tridiagonal solver

Roll-your-own

4. Make use of the fact that not the full matrix is needed for the result

Improved block-tridiagonal solver, that only calculates the necessary blocks

5. Implement a highly parallel version of the improved block-tridiagonal solver

Pipelining


Algorithm 5.


Transfer of the Final Algorithm to GPUs

LabVIEW GPU Analysis Toolkit (alpha; public beta release soon) provides CUDA Functionality in LabVIEW

(Wrapper)

The Algorithm is so Memory-Efficient that the whole Problem can be uploaded to the GPU

(little time is lost to data transfer between CPU and GPU)

Further improvement of performance expected


Benchmark - Environment

IBM idataplex M360 computing server

2x Intel 6-core Xeon X5650 @2.67 GHz

48 GB RAM

2 NVIDIA Tesla M2070 GPUs with 3 GB RAM

Windows 2008 Server Enterprise

LabVIEW 2011 64-Bit

High-Performance Analysis Library Toolkit (64-bit beta)

GPU Analysis Toolkit (alpha)


Benchmarks – CPU


Benchmarks – CPU & GPU


Summary - NEGF

The well-known and commonly used Non-Equilibrium Green's Function approach for Simulations of Transport in Nanostructures can be significantly optimized

Its implementation on Multicore-CPUs as well as GPUs has been demonstrated with significant speed-up compared to the basic algorithm

The presented Basic Algorithm for 2D Transport of Charges can analogously be expanded to 3D and additional degrees of freedom


Current Project – multi-GPU stabilized transfer-matrix algorithm

Similar and very flexible algorithm to compute transport properties of nanostructures

Main part of the algorithm:


Calculation of C2 (per iteration)


Calculation of D2 (per iteration)


Calculation of C1 (per iteration)


Calculation of D1 (per iteration)


GPU 1


GPU 2


Thank you for your attention!

Financial support by the German Science Foundation DFG via

Research Training Group 1286 “Functional Metal-Semiconductor Hybrid-Systems”

and DFG-Project Me916/11-1 “Spin-filter cascades in InAs heterostructures”,

by the Free and Hanseatic City of Hamburg via the Excellence Cluster “Nanospintronics”,

by the Office of Naval Research via ONR-N00014110780,

and by the National Science Foundation via NSF-MRSEC DMR-0820414, NSF-DMR-1105512, NHARP
