Final Presentation Project Dirac Equation in (1+1)-D on a staggered grid with perfectly matched boundary High Performance Computing 1 Seminar Winter semester 2016/17


Final Presentation

Project: Dirac Equation in (1+1)-D on a staggered grid with perfectly matched boundary

High Performance Computing 1 Seminar

Winter semester 2016/17


Short Recapitulation

● Solution of the Dirac equation on a staggered grid for the spinor components u and v, which avoids the fermion-doubling problem; see e.g. [Gattringer/Lang: Quantum Chromodynamics on the Lattice, Springer 2010, p. 110f]
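For orientation, here is one common way to write the free Dirac equation in (1+1)-D for the two spinor components u and v. It assumes ħ = c = 1 and the representation α = σ_x, β = σ_z. The convention used in the project may differ.

```latex
i\,\partial_t \begin{pmatrix} u \\ v \end{pmatrix}
  = \left(-i\,\sigma_x\,\partial_x + m\,\sigma_z\right)\begin{pmatrix} u \\ v \end{pmatrix}
\quad\Longleftrightarrow\quad
\begin{aligned}
\partial_t u &= -\partial_x v - i m\,u,\\
\partial_t v &= -\partial_x u + i m\,v.
\end{aligned}
```

In this representation the spatial derivative couples u only to v and vice versa. This makes a staggered placement natural: u sits on the integer grid points and v on the half-integer points, so each discrete derivative spans a single grid spacing.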


Perfectly Matched Layer Method

● Wave solutions are unphysically reflected at the boundaries of the computational domain

● The perfectly matched layer (PML) method absorbs the wave function as it approaches the boundary

● Method: analytic continuation of the spatial coordinate into the complex plane inside the absorbing layer

● Result: exponential decay of the wave function inside the layer
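The usual complex-coordinate-stretching formulation makes the method and decay bullets concrete. It assumes a time dependence e^{-iωt} and a damping profile σ(x) ≥ 0 that vanishes outside the absorbing layer. Under these assumptions the coordinate, the derivative and a plane wave transform as:

```latex
\begin{aligned}
x \;\to\; \tilde{x}(x) &= x + \frac{i}{\omega}\int_0^{x}\sigma(x')\,dx',
\qquad
\partial_x \;\to\; \frac{1}{1 + i\sigma(x)/\omega}\,\partial_x,\\[6pt]
e^{i(k\tilde{x}-\omega t)} &= e^{i(kx-\omega t)}\,
\exp\!\left(-\frac{k}{\omega}\int_0^{x}\sigma(x')\,dx'\right).
\end{aligned}
```

For a Dirac wave with ω = E = ±√(k² + m²), the group velocity dE/dk = k/E has the same sign as k/ω. Every wave travelling into either layer is therefore attenuated, not only the positive-energy ones.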


Dirac Equation with PML
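The following sketch shows how the PML can enter the Dirac equation. It rests on three assumptions: the stretching ∂_x → ∂_x/(1 + iσ/ω), the representation α = σ_x, β = σ_z, and that ψ_u (ψ_v) is the auxiliary field correcting the spatial derivative in the u (v) equation. The stretched derivative of a field f is split into ∂_x f + ψ_f. Replacing -iω by ∂_t then gives a local evolution equation for each auxiliary field, which makes four equations in total:

```latex
\begin{aligned}
&\frac{\partial_x f}{1+i\sigma/\omega} = \partial_x f + \psi_f
\;\Longrightarrow\;
(\sigma - i\omega)\,\psi_f = -\sigma\,\partial_x f
\;\Longrightarrow\;
\partial_t \psi_f = -\sigma(x)\left(\psi_f + \partial_x f\right),\\[8pt]
&\partial_t u = -\left(\partial_x v + \psi_u\right) - i m\,u,
\qquad
\partial_t \psi_u = -\sigma(x)\left(\psi_u + \partial_x v\right),\\
&\partial_t v = -\left(\partial_x u + \psi_v\right) + i m\,v,
\qquad
\partial_t \psi_v = -\sigma(x)\left(\psi_v + \partial_x u\right).
\end{aligned}
```

Outside the layer σ = 0, so auxiliary fields that start at zero stay zero and the free equations are recovered.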


Discretized Dirac Equation on the staggered grid
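One possible staggered leapfrog discretization follows; it is a sketch, not necessarily the scheme used in the project. u and ψ_u live on x_j = jΔx at times t_n = nΔt, while v and ψ_v live on x_{j+1/2} at times t_{n+1/2}. It assumes this PML system:

- ∂_t u = -(∂_x v + ψ_u) - imu
- ∂_t v = -(∂_x u + ψ_v) + imv
- ∂_t ψ_u = -σ(ψ_u + ∂_x v)
- ∂_t ψ_v = -σ(ψ_v + ∂_x u)

The local mass and damping terms are treated with the trapezoidal rule. Because they do not couple neighbouring points, every update remains explicit.

```latex
\begin{aligned}
D v_j^{\,n+\frac12} &= \frac{v_{j+\frac12}^{\,n+\frac12} - v_{j-\frac12}^{\,n+\frac12}}{\Delta x},
\qquad
D u_{j+\frac12}^{\,n+1} = \frac{u_{j+1}^{\,n+1} - u_{j}^{\,n+1}}{\Delta x},\\[6pt]
\psi_{u,j}^{\,n+1} &= \frac{\left(1-\tfrac12\sigma_j\Delta t\right)\psi_{u,j}^{\,n}
  - \sigma_j\Delta t\, D v_j^{\,n+\frac12}}{1+\tfrac12\sigma_j\Delta t},\\[6pt]
u_j^{\,n+1} &= \frac{\left(1-\tfrac{i}{2}m\Delta t\right)u_j^{\,n}
  - \Delta t\left[D v_j^{\,n+\frac12} + \tfrac12\left(\psi_{u,j}^{\,n} + \psi_{u,j}^{\,n+1}\right)\right]}
  {1+\tfrac{i}{2}m\Delta t},\\[6pt]
\psi_{v,j+\frac12}^{\,n+\frac32} &= \frac{\left(1-\tfrac12\sigma_{j+\frac12}\Delta t\right)\psi_{v,j+\frac12}^{\,n+\frac12}
  - \sigma_{j+\frac12}\Delta t\, D u_{j+\frac12}^{\,n+1}}{1+\tfrac12\sigma_{j+\frac12}\Delta t},\\[6pt]
v_{j+\frac12}^{\,n+\frac32} &= \frac{\left(1+\tfrac{i}{2}m\Delta t\right)v_{j+\frac12}^{\,n+\frac12}
  - \Delta t\left[D u_{j+\frac12}^{\,n+1} + \tfrac12\left(\psi_{v,j+\frac12}^{\,n+\frac12} + \psi_{v,j+\frac12}^{\,n+\frac32}\right)\right]}
  {1-\tfrac{i}{2}m\Delta t}.
\end{aligned}
```

With σ = 0 this is a Yee-type leapfrog scheme. A von Neumann analysis shows it is stable for Δt ≤ Δx (c = 1), also with the trapezoidal mass term.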


Numerical Solution for an initial Gaussian wave packet
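Below is a minimal NumPy sketch of an initial condition of this kind. The grid layout (u on x_j, v on x_{j+1/2}), the packet parameters x0, width and k0, and the quadratic PML profile with thickness and sigma_max are all assumptions for illustration. The helper names are hypothetical.

```python
import numpy as np


def staggered_grid(length=100.0, n_x=512):
    """u on the integer points x_j, v on the half-integer points x_{j+1/2} (assumed layout)."""
    x_u = np.linspace(-length / 2, length / 2, n_x)   # n_x points
    dx = x_u[1] - x_u[0]
    x_v = x_u[:-1] + 0.5 * dx                          # n_x - 1 points
    return x_u, x_v, dx


def gaussian_packet(x, x0=-10.0, width=3.0, k0=2.0):
    """Gaussian envelope centred at x0 with carrier wave exp(i k0 x) (hypothetical parameters)."""
    return np.exp(-((x - x0) ** 2) / (4.0 * width ** 2) + 1j * k0 * x)


def pml_profile(x, length, thickness=10.0, sigma_max=2.0):
    """Quadratic damping profile: zero in the interior, sigma_max at the outer edges."""
    depth = np.clip(np.abs(x) - (length / 2 - thickness), 0.0, None) / thickness
    return sigma_max * depth ** 2


def initial_state(length=100.0, n_x=512):
    x_u, x_v, dx = staggered_grid(length, n_x)
    # u = v is purely right-moving in the massless continuum limit (alpha = sigma_x assumed)
    u, v = gaussian_packet(x_u), gaussian_packet(x_v)
    c = np.sqrt(dx * (np.sum(np.abs(u) ** 2) + np.sum(np.abs(v) ** 2)))
    u, v = u / c, v / c                                  # discrete norm equal to 1
    psi_u, psi_v = np.zeros_like(u), np.zeros_like(v)    # auxiliary fields start at zero
    return (x_u, x_v, dx, u, v, psi_u, psi_v,
            pml_profile(x_u, length), pml_profile(x_v, length))
```

With N_x = 512 this gives 512 values of u and 511 values of v. The packet starts well inside the interior, so it has no overlap with the absorbing layers.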


Algorithm

● 4 equations for the 2 spinor components u, v and the 2 auxiliary fields psi_u, psi_v

● Corresponds to 4 for-loops, each performing the temporal integration (quadrature) of one equation over the spatial grid

● The temporal quadratures for u and psi_u, as well as for v and psi_v, can be combined into one loop each: loop merging
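The two merged loops can be sketched in Python/NumPy under these assumptions:

- a staggered leapfrog scheme with u, psi_u on the integer points x_j and v, psi_v on the half-integer points
- the representation α = σ_x, β = σ_z
- auxiliary fields obeying ∂_t ψ = -σ(x)(ψ + ∂_x f)

The function names, grid size, packet and PML parameters are hypothetical. The first loop updates u and psi_u. The second loop updates v and psi_v and has to run afterwards, because it reads the new u. Within each loop the iterations are independent.

```python
import numpy as np


def step_merged(u, v, psi_u, psi_v, sig_u, sig_v, m, dt, dx):
    """Advance (u, psi_u) from t_n to t_{n+1} and (v, psi_v) from t_{n+1/2} to t_{n+3/2}
    in place. u, psi_u, sig_u: N values on x_j; v, psi_v, sig_v: N-1 values on x_{j+1/2}."""
    a = 0.5j * m * dt                  # trapezoidal treatment of the mass term
    n = u.size
    # first loop: u and psi_u together; u[0] and u[n-1] stay fixed (hard wall)
    for j in range(1, n - 1):
        dvdx = (v[j] - v[j - 1]) / dx
        s = 0.5 * sig_u[j] * dt
        psi_new = ((1 - s) * psi_u[j] - 2 * s * dvdx) / (1 + s)
        u[j] = ((1 - a) * u[j] - dt * (dvdx + 0.5 * (psi_u[j] + psi_new))) / (1 + a)
        psi_u[j] = psi_new
    # second loop: v and psi_v together; reads the u values updated above
    for k in range(n - 1):
        dudx = (u[k + 1] - u[k]) / dx
        s = 0.5 * sig_v[k] * dt
        psi_new = ((1 - s) * psi_v[k] - 2 * s * dudx) / (1 + s)
        v[k] = ((1 + a) * v[k] - dt * (dudx + 0.5 * (psi_v[k] + psi_new))) / (1 - a)
        psi_v[k] = psi_new


def discrete_norm(u, v, dx):
    return dx * (np.sum(np.abs(u) ** 2) + np.sum(np.abs(v) ** 2))


def simulate(n_x=301, length=60.0, pml=10.0, sigma_max=3.0, m=0.5, steps=600):
    x_u = np.linspace(-length / 2, length / 2, n_x)
    dx = x_u[1] - x_u[0]
    x_v = x_u[:-1] + 0.5 * dx
    dt = 0.5 * dx                      # Courant number 0.5 (c = 1)

    def sigma(x):                      # quadratic damping profile, zero in the interior
        depth = np.clip(np.abs(x) - (length / 2 - pml), 0.0, None) / pml
        return sigma_max * depth ** 2

    def packet(x):                     # Gaussian envelope with carrier wave number 2
        return np.exp(-x ** 2 / 16.0 + 2j * x)

    u, v = packet(x_u), packet(x_v)
    c = np.sqrt(discrete_norm(u, v, dx))
    u, v = u / c, v / c
    psi_u, psi_v = np.zeros_like(u), np.zeros_like(v)
    sig_u, sig_v = sigma(x_u), sigma(x_v)
    history = [discrete_norm(u, v, dx)]
    for _ in range(steps):
        step_merged(u, v, psi_u, psi_v, sig_u, sig_v, m, dt, dx)
        history.append(discrete_norm(u, v, dx))
    return np.array(history)
```

simulate() returns the discrete norm after every time step. Once the packet has entered the layers, the norm decays towards zero instead of the packet being reflected back into the interior.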


Resolving Dependencies

(Figure panels: first for-loop, second for-loop)


Code optimization

● Version 1.0: original pure Python code

● Version 1.1: loop merging

● Version 2.0: transfer of the quadrature to a Fortran subroutine (Python-Fortran wrapper code)

● Version 2.1: implementation of OpenMP

(Figure: git version history)


Loop merging
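Here is a small self-contained check of what loop merging changes, using an assumed staggered PML scheme. The unmerged step runs one loop per equation and needs temporary arrays for the derivatives and the old auxiliary fields. The merged step keeps these values in scalars and sweeps the grid twice instead of four times. The function names and test parameters are hypothetical.

```python
import numpy as np


def step_four_loops(u, v, pu, pv, su, sv, m, dt, dx):
    """Unmerged: one loop per equation, with temporary arrays and four sweeps."""
    a = 0.5j * m * dt
    n = u.size
    dvdx, pu_old = np.zeros(n, dtype=complex), pu.copy()
    for j in range(1, n - 1):                       # loop 1: psi_u
        dvdx[j] = (v[j] - v[j - 1]) / dx
        s = 0.5 * su[j] * dt
        pu[j] = ((1 - s) * pu[j] - 2 * s * dvdx[j]) / (1 + s)
    for j in range(1, n - 1):                       # loop 2: u (old and new psi_u)
        u[j] = ((1 - a) * u[j] - dt * (dvdx[j] + 0.5 * (pu_old[j] + pu[j]))) / (1 + a)
    dudx, pv_old = np.zeros(n - 1, dtype=complex), pv.copy()
    for k in range(n - 1):                          # loop 3: psi_v (uses the new u)
        dudx[k] = (u[k + 1] - u[k]) / dx
        s = 0.5 * sv[k] * dt
        pv[k] = ((1 - s) * pv[k] - 2 * s * dudx[k]) / (1 + s)
    for k in range(n - 1):                          # loop 4: v
        v[k] = ((1 + a) * v[k] - dt * (dudx[k] + 0.5 * (pv_old[k] + pv[k]))) / (1 - a)


def step_two_loops(u, v, pu, pv, su, sv, m, dt, dx):
    """Merged: u with psi_u in one loop, v with psi_v in the other; scalars replace arrays."""
    a = 0.5j * m * dt
    n = u.size
    for j in range(1, n - 1):
        dvdx = (v[j] - v[j - 1]) / dx
        s = 0.5 * su[j] * dt
        p_new = ((1 - s) * pu[j] - 2 * s * dvdx) / (1 + s)
        u[j] = ((1 - a) * u[j] - dt * (dvdx + 0.5 * (pu[j] + p_new))) / (1 + a)
        pu[j] = p_new
    for k in range(n - 1):
        dudx = (u[k + 1] - u[k]) / dx
        s = 0.5 * sv[k] * dt
        p_new = ((1 - s) * pv[k] - 2 * s * dudx) / (1 + s)
        v[k] = ((1 + a) * v[k] - dt * (dudx + 0.5 * (pv[k] + p_new))) / (1 - a)
        pv[k] = p_new


def compare(n=64, steps=20, seed=0):
    """Largest difference between both versions after `steps` steps on random data."""
    rng = np.random.default_rng(seed)

    def cplx(size):
        return rng.standard_normal(size) + 1j * rng.standard_normal(size)

    fields = [cplx(n), cplx(n - 1), cplx(n), cplx(n - 1)]    # u, v, psi_u, psi_v
    su, sv = rng.uniform(0, 2, n), rng.uniform(0, 2, n - 1)   # damping profiles
    a, b = [f.copy() for f in fields], [f.copy() for f in fields]
    for _ in range(steps):
        step_four_loops(*a, su, sv, 0.5, 0.05, 0.1)
        step_two_loops(*b, su, sv, 0.5, 0.05, 0.1)
    return max(np.max(np.abs(x - y)) for x, y in zip(a, b))
```

Both versions perform the same arithmetic per grid point. compare() therefore reports no difference beyond floating-point rounding, while the merged version avoids the extra arrays and memory sweeps.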


Interfacing Python with Fortran subroutine

● Program control (parameters such as grid size, initial conditions and boundary conditions) stays in Python

● The quadrature is moved to a Fortran subroutine that is called from Python (f2py compiles it into a shared object file, which can be imported like a Python module)

● The result is returned to the Python main script for postprocessing (plotting, movies, ...)

● Moving the quadrature to Fortran gives an extreme speedup

Compilation of Fortran source code with f2py:

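#!/usr/bin/env python
# Build the Fortran quadrature as a Python extension module with f2py, then run the driver
import os

module = "solve_DiracEqn_on_the_lattice"
# remove a previous build; -f avoids an error if none exists, and the glob also matches
# file names to which newer f2py versions append a platform suffix
cmd0 = "rm -f " + module + "*.so"
# compile with gfortran (gnu95) and OpenMP; -lgomp links the GNU OpenMP runtime
cmd1 = ("f2py -c --fcompiler=gnu95 --f90flags='-fopenmp' -lgomp -m "
        + module + " " + module + ".f95")
# time the Python main script that imports the compiled module
cmd2 = "time ./DiracEqn1p1D_fortran_interface.py"

os.system(cmd0)
os.system(cmd1)
os.system(cmd2)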


Calling the Fortran subroutine from Python


Computation time of pure Python code vs Python-Fortran interface code


Code Parallelization with OpenMP

● Implementation of OpenMP in Fortran

● Only the quadrature (the 2 temporal do-loops) is parallelized

● The spatial grid size N_x = 512 was chosen as a multiple of 1, 2, 4, 8 and 16 to allow equal chunk sizes for the spatial sections assigned to the threads (figure on the right)

● Computation time measured only for this Fortran subroutine, not including the serial part

● The code was run 10 times on the 12-core machine 143.50.47.128 for each combination of grid size and number of cores; the minimum time was used to determine the speedup

(Figure: spatial grid divided into sections for thread no. 1, 2, 3, 4, ...)
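The measurement procedure above (10 runs per setting, minimum taken) can be sketched in Python. min_runtime and speedup_table are hypothetical helper names, and func would wrap the call to the Fortran subroutine. The OpenMP runtime reads OMP_NUM_THREADS when it starts, so the simplest approach is to measure each thread count in a separate process with that environment variable set.

```python
import time


def min_runtime(func, repeats=10):
    """Call func() `repeats` times and return the smallest wall-clock time in seconds."""
    best = float("inf")
    for _ in range(repeats):
        t0 = time.perf_counter()
        func()
        best = min(best, time.perf_counter() - t0)
    return best


def speedup_table(min_times):
    """min_times maps the number of threads p to the minimum run time T(p).
    Returns {p: (S(p), E(p))} with speedup S = T(1)/T(p) and efficiency E = S/p."""
    t1 = min_times[1]
    return {p: (t1 / t, t1 / (t * p)) for p, t in sorted(min_times.items())}
```

Taking the minimum rather than the mean filters out runs slowed down by other processes on the shared machine. This matches the choice described on the slide.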


Scheduled static do loops

● Spinors and auxiliary fields are shared between threads; only the loop index is private

● Threads process chunks of about (N_x - 2)/num_cores grid points

● Forcing an ordered sequence is not strictly needed

● omp do ordered (commented out in the final version) gives no speedup, whereas omp do schedule(static) does
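OpenMP leaves the exact iteration split of schedule(static) without a chunk size to the implementation. It only guarantees chunks of approximately equal size, with at most one chunk per thread. The hypothetical helper below shows one common split, in which the first (n mod p) threads get one extra iteration. The Fortran loop bounds 2..N_x-1 are an assumption.

```python
def static_chunks(first, last, num_threads):
    """Split the inclusive iteration range first..last into num_threads contiguous
    chunks whose sizes differ by at most one; returns inclusive (start, end) pairs."""
    n = last - first + 1
    q, r = divmod(n, num_threads)
    chunks = []
    start = first
    for t in range(num_threads):
        size = q + (1 if t < r else 0)   # the first r threads take one extra iteration
        chunks.append((start, start + size - 1))
        start += size
    return chunks


print(static_chunks(2, 511, 4))   # interior points for N_x = 512 and 4 threads
```

Suppose the loop runs over the N_x - 2 = 510 interior points. Then 4, 8 or 16 threads get chunks that differ by one iteration, for example 128, 128, 127, 127 for 4 threads. A loop over all 512 points would split exactly.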


Computation time vs. number of threads


Comparison of original pure Python code with OpenMP-Fortran version