4/18/00 Spring 2000 FFTw workshop 1

AHPCC/NCSA WORKSHOP

Fast Fourier Transform Using FFTw

Guobin Ma (1) (gbma@ahpcc.unm.edu),

Sirpa Saarinen (2) (sirpa@ncsa.uiuc.edu),

and Paul M. Alsing (1) (alsing@ahpcc.unm.edu)

(1) AHPCC, (2) NCSA

http://www.ahpcc.unm.edu/Workshop/FFTW

Contents

FFT basics (Paul)
    What is FFT and why FFT
FFTw
    Outline of FFTw (Guobin)
        Characteristics
        C routines
    Performance and C example codes (Sirpa)
    Fortran wrappers and example codes (Guobin)
    Exercises (skipped)

FFT Basic

What is FFT and why FFT

by Paul Alsing

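Fourier Transform: frequency analysis of time series data.

DFT: Discrete Fourier Transform (N time/frequency points)

FFT: Fast Fourier Transform, an efficient implementation, ~O(N log2 N)

$$
H(f) = \int_{-\infty}^{\infty} h(t)\, e^{2\pi i f t}\, dt, \qquad
h(t) = \int_{-\infty}^{\infty} H(f)\, e^{-2\pi i f t}\, df
$$

$$
h_k \equiv h(t_k), \quad t_k = k\,\Delta t, \quad k = 0, 1, \ldots, N-1; \qquad
f_n = \frac{n}{N\,\Delta t}, \quad n = -\frac{N}{2}, \ldots, \frac{N}{2} - 1
$$

$$
H_n = \sum_{k=0}^{N-1} h_k\, e^{2\pi i k n / N}, \qquad
h_k = \frac{1}{N} \sum_{n=0}^{N-1} H_n\, e^{-2\pi i k n / N}
$$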

Aliasing issues:

Let fc = 1/(2Δt) be the Nyquist frequency, where Δt is the sampling interval. A sine wave of frequency fc is sampled at 2 points per cycle, the peak and the trough.

Frequency components with |f| > fc will be falsely folded back (aliased) into the range -fc < f < fc.

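Fourier Transform: radix 2, Danielson-Lanczos

$$
\begin{aligned}
H_n &= \sum_{k=0}^{N-1} e^{2\pi i k n / N}\, h_k \\
    &= \sum_{k=0}^{N/2-1} e^{2\pi i (2k) n / N}\, h_{2k}
     + \sum_{k=0}^{N/2-1} e^{2\pi i (2k+1) n / N}\, h_{2k+1} \\
    &= \sum_{k=0}^{N/2-1} e^{2\pi i k n / (N/2)}\, h_{2k}
     + W^n \sum_{k=0}^{N/2-1} e^{2\pi i k n / (N/2)}\, h_{2k+1} \\
    &= H_n^{e} + W^n H_n^{o}, \qquad W \equiv e^{2\pi i / N}
\end{aligned}
$$

where H_n^e is the nth component of the FT of length N/2 formed from the even components of the original h_k's; H_n^o is the nth component of the FT of length N/2 formed from the odd components of the original h_k's.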

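Fourier Transform: radix 2, Danielson-Lanczos (cont.)

$$
\begin{aligned}
H_n &= H_n^{e} + W^{n} H_n^{o} \\
    &= H_n^{ee} + W^{2n} H_n^{eo} + W^{n}\left(H_n^{oe} + W^{2n} H_n^{oo}\right) \\
    &= H_n^{eee} + W^{4n} H_n^{eeo} + W^{2n}\left(H_n^{eoe} + W^{4n} H_n^{eoo}\right) \\
    &\quad + W^{n}\left[H_n^{oee} + W^{4n} H_n^{oeo} + W^{2n}\left(H_n^{ooe} + W^{4n} H_n^{ooo}\right)\right]
\end{aligned}
$$

For N = 8: log2 8 = 3 steps.

H_n is of length N
H_n^e, H_n^o are of length N/2
H_n^ee, H_n^eo, H_n^oe, H_n^oo are of length N/4
H_n^eee, H_n^eeo, H_n^eoe, H_n^eoo, H_n^oee, H_n^oeo, H_n^ooe, H_n^ooo are of length N/8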

Fourier Transform: radix 2, butterfly Cooley-Tukey algorithm

We finally get down to 1-point transforms such as

$$
H_n^{eoeeoeo \cdots oee} = h_m \quad \text{(some input value) for some } 0 \le m \le N-1
$$

The question is: which value of m corresponds to which pattern of e’s and o’s?

The answer is: let {e=0, o=1}. Reverse the pattern of e’s and o’s and you will have the value of m in binary.

Bit reversal: The Cooley-Tukey algorithm first rearranges the data in bit-reversed order, then builds up the transform in log2 N stages of N operations each, about N log2 N operations in total (decimation in time).

[Figure: bit-reversal reordering of an 8-point data set; rows labeled by the patterns eee, eeo, eoe, eoo, oee, oeo, ooe, ...]
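The reordering step can be sketched as follows. With e = 0 and o = 1, the input index m that ends up at a given position is that position's index with its bits reversed. The function names `bit_reverse` and `bit_reverse_permute` are illustrative, and real data is used only to keep the example short.

```c
#include <assert.h>
#include <stddef.h>

/* Reverse the lowest `bits` bits of m, e.g. with 3 bits: 001 -> 100. */
size_t bit_reverse(size_t m, unsigned bits)
{
    size_t r = 0;
    for (unsigned b = 0; b < bits; b++) {
        r = (r << 1) | (m & 1);   /* append the lowest bit of m to r */
        m >>= 1;
    }
    return r;
}

/* Permute data[0..N-1] in place into bit-reversed order; N must equal 2^bits. */
void bit_reverse_permute(double *data, size_t N, unsigned bits)
{
    assert(N == ((size_t)1 << bits));
    for (size_t i = 0; i < N; i++) {
        size_t j = bit_reverse(i, bits);
        if (i < j) {              /* swap each pair only once */
            double tmp = data[i];
            data[i] = data[j];
            data[j] = tmp;
        }
    }
}
```

For N = 8 the indices 0..7 are rearranged to 0, 4, 2, 6, 1, 5, 3, 7.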

Ordering of time series (coordinate space) and frequencies in Fourier (momentum) space.

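Example Application: Quantum Mechanics

Propagation of the (dimensionless) Schrödinger wave function

$$
i\,\frac{\partial \psi(x,t)}{\partial t} = H\,\psi(x,t)
= \left[-\frac{1}{2}\frac{\partial^2}{\partial x^2} + V(x,t)\right]\psi(x,t),
\qquad H = T + V
$$

$$
\psi(x, t+\Delta t) = e^{-iH\Delta t}\,\psi(x,t)
\approx e^{-iT\Delta t/2}\, e^{-iV\Delta t}\, e^{-iT\Delta t/2}\,\psi(x,t)
$$

In coordinate space, for j = 1, ..., N,

$$
e^{-iV\Delta t}\,\psi(x_j,t) = e^{-iV(x_j,t)\,\Delta t}\,\psi(x_j,t)
$$

In Fourier (momentum) space, for j = 1, ..., N,

$$
e^{-iT\Delta t/2}\,\hat\psi(k_j,t) = e^{-i\,\frac{1}{2}k_j^2\,\Delta t/2}\,\hat\psi(k_j,t)
$$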

Serial FFTs

x data is contiguous in memory (Fortran).

Transpose data to keep y transforms contiguous in memory.

[Figure: (x, y) array before and after the transpose]
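A minimal sketch of the transpose step, written here for a C row-major array (the slide describes the Fortran column-major case, where the roles of x and y are swapped). The function name `transpose` is illustrative.

```c
#include <assert.h>
#include <stddef.h>

/* Out-of-place transpose of a row-major nrows x ncols array:
 * dst is ncols x nrows, with dst[c][r] = src[r][c]. */
void transpose(const double *src, double *dst, size_t nrows, size_t ncols)
{
    assert(src != dst);           /* this simple version needs separate buffers */
    for (size_t r = 0; r < nrows; r++)
        for (size_t c = 0; c < ncols; c++)
            dst[c * nrows + r] = src[r * ncols + c];
}
```

After the transpose, what used to be a strided column is a contiguous row, so a 1D transform routine can walk it with unit stride.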

Parallel FFTs

In parallel, all x transforms are local operations on each processor (no communication).

In performing the transpose, processors must perform an All-to-All communication.

[Figure: array distributed across processors P0, P1, P2, P3, before and after the transpose]

Outline of FFTw

By Guobin Ma

Characteristics of FFTw

C routines generated by Caml-Light ML
1D/nD, real/complex data
Arbitrary input size, not necessarily 2^n
Serial/parallel, shared/distributed memory
Faster than all others, high performance
Portable, automatically adapts to the machine

Two Phases of FFTw

Hardware-dependent algorithm
Planner
    ‘Learns’ the fastest way on your machine
    Produces a data structure, the ‘plan’
    Reusable
Executor
    Computes the transform
Applies to all FFTw operation modes: 1D/nD, complex/real, serial/parallel

C Routines of FFTw

Routines
    1D/nD complex
    1D/nD real
    Corresponding parallel (MPI) ones
Arguments
Special notes
Data formats

1D Complex Transform

Typical call

    #include <fftw.h>
    …
    {
        fftw_complex in[N], out[N];
        fftw_plan p;
        …
        /* plan a forward transform of size N */
        p = fftw_create_plan(N, FFTW_FORWARD, FFTW_ESTIMATE);
        …
        /* execute the plan: out receives the transform of in */
        fftw_one(p, in, out);
        …
        fftw_destroy_plan(p);
    }

1D Complex Transform (cont.)

Routines

    fftw_plan fftw_create_plan(int n, fftw_direction dir, int flags);

    void fftw_one(fftw_plan plan, fftw_complex *in, fftw_complex *out);

    fftw_plan fftw_create_plan_specific(int n, fftw_direction dir, int flags,
                                        fftw_complex *in, int istride,
                                        fftw_complex *out, int ostride);

    void fftw(fftw_plan plan, int howmany,
              fftw_complex *in, int istride, int idist,
              fftw_complex *out, int ostride, int odist);

    void fftw_destroy_plan(fftw_plan plan);

1D Complex Transform (cont.)

Arguments
    plan: data structure containing all the information needed to compute the transform
    n: data size
    dir: FFTW_FORWARD (-1), FFTW_BACKWARD (+1)
    flags: FFTW_MEASURE, FFTW_ESTIMATE, FFTW_OUT_OF_PLACE, FFTW_IN_PLACE, FFTW_USE_WISDOM, combined with |
    howmany: number of transforms / input arrays
    in, istride, idist: input arrays; element i of array j is in[i*istride+j*idist]
    out, ostride, odist: output arrays, ...
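The stride/dist addressing rule can be sketched with a small helper. The name `batch_index` is hypothetical; the formula is the in[i*istride+j*idist] rule from the argument list above, with i the element index within one transform and j the transform index.

```c
#include <assert.h>
#include <stddef.h>

/* Position of element i of transform j in a strided batch layout. */
size_t batch_index(size_t i, size_t j, size_t stride, size_t dist)
{
    assert(stride > 0);
    return i * stride + j * dist;
}
```

Two common layouts for howmany = 3 transforms of size n = 4: arrays stored one after another use stride 1 and dist 4, so element 2 of transform 1 sits at offset 6; interleaved arrays use stride 3 and dist 1, which puts the same element at offset 7.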

1D Complex Transform (cont.)

Notes
    Out of place (default): in[N], out[N]
    In place: saves memory, costs more time
        ignore ostride and odist; ignore out
    In-order output: zero frequency at out[0]
    Unnormalized: a forward transform followed by a backward transform multiplies the data by a factor of N

nD Complex Transform

Routines, similar to 1D case, except …

    fftwnd_plan fftwnd_create_plan(int rank, const int *n, fftw_direction dir, int flags);

    void fftwnd_one(fftwnd_plan plan, , );

    fftwnd_plan fftwnd_create_plan_specific(int rank, const int *n, fftw_direction dir, , , , , );

    void fftwnd(fftwnd_plan plan, , , , , , , );

    void fftwnd_destroy_plan(fftwnd_plan plan);

nD Complex Transform (cont.)

Arguments
    rank: dimensionality of the arrays to be transformed
    n: pointer to an array of length rank holding the size of each dimension, e.g. n = {8, 4, 5}
    row-major for C, column-major for Fortran
Special routines for 2D and 3D cases
    nd -> 2d, 3d
    n_dim -> nx, ny or nx, ny, nz

1D Real Transform

Routines, similar to 1D complex case, except …

    rfftw_plan rfftw_create_plan( , , );

    void rfftw_one(rfftw_plan plan, fftw_real *in, fftw_real *out);

    rfftw_plan rfftw_create_plan_specific(int n, fftw_direction dir, int flags,
                                          fftw_real *in, int istride,
                                          fftw_real *out, int ostride);

    void rfftw(rfftw_plan plan, int howmany,
               fftw_real *in, int istride, int idist,
               fftw_real *out, int ostride, int odist);

    void rfftw_destroy_plan(rfftw_plan plan);

1D Real Transform (cont.)

Arguments
    dir: FFTW_REAL_TO_COMPLEX = FFTW_FORWARD = -1
         FFTW_COMPLEX_TO_REAL = FFTW_BACKWARD = 1
    others have the same meaning as before

nD Real Transform

Routines, similar to 1D real case, but …

    rfftwnd_plan rfftwnd_create_plan(int rank, const int *n, fftw_direction dir, int flags);

    void rfftwnd_one_real_to_complex(rfftwnd_plan plan, fftw_real *in, fftw_complex *out);

    void rfftwnd_one_complex_to_real(rfftwnd_plan plan, fftw_complex *in, fftw_real *out);

    void rfftwnd_real_to_complex(rfftwnd_plan plan, int howmany,
                                 fftw_real *in, int istride, int idist,
                                 fftw_complex *out, int ostride, int odist);

    void rfftwnd_complex_to_real(rfftwnd_plan plan, int howmany,
                                 fftw_complex *in, int istride, int idist,
                                 fftw_real *out, int ostride, int odist);

    void rfftwnd_destroy_plan(rfftwnd_plan plan);

Special 2D and 3D routines

nD Array Format

nD arrays are stored as a single contiguous block
C order, row-major order
    First index varies most slowly, last most quickly
Fortran order, column-major order
    First index varies most quickly, last most slowly
Static array: no problem
Dynamic array: may have problems in the nD case
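The two storage orders differ only in how an index triple maps to an offset in the contiguous block. A minimal sketch for a 3D array with zero-based indices (the function names are illustrative):

```c
#include <assert.h>
#include <stddef.h>

/* Offset of element (i, j, k) of an n1 x n2 x n3 array in C (row-major)
 * order: the last index varies fastest. */
size_t offset_row_major(size_t i, size_t j, size_t k,
                        size_t n1, size_t n2, size_t n3)
{
    assert(i < n1 && j < n2 && k < n3);
    return (i * n2 + j) * n3 + k;
}

/* Same element in Fortran (column-major) order: the first index varies fastest. */
size_t offset_col_major(size_t i, size_t j, size_t k,
                        size_t n1, size_t n2, size_t n3)
{
    assert(i < n1 && j < n2 && k < n3);
    return i + n1 * (j + n2 * k);
}
```

One way around the dynamic-array problem in C is to allocate a single block of n1*n2*n3 elements and address it through a helper like `offset_row_major`, rather than building an array of pointers to separately allocated rows, which is not contiguous.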

Parallel FFTw

Multi-thread
    Skipped
MPI
    nD complex
        Routines
        Notes
        Data layout
    1D complex
    nD real

nD Complex MPI FFTw

Routines, similar to uniprocessor case, except mpi…

    fftwnd_mpi_plan fftwnd_mpi_create_plan(MPI_Comm comm, int rank, const int *n,
                                           fftw_direction dir, int flags);

    void fftwnd_mpi_local_sizes(fftwnd_mpi_plan p,
                                int *local_first, int *local_first_start,
                                int *local_second_after_transpose,
                                int *local_second_start_after_transpose,
                                int *total_local_size);

    local_data = (fftw_complex*) malloc(sizeof(fftw_complex) * total_local_size);

    work = (fftw_complex*) malloc(sizeof(fftw_complex) * total_local_size);

nD Complex MPI FFTw (cont.)

Routines (cont.)

    void fftwnd_mpi(fftwnd_mpi_plan p, int n_fields,
                    fftw_complex *local_data, fftw_complex *work,
                    fftw_mpi_output_order output_order);

    void fftwnd_mpi_destroy_plan(fftwnd_mpi_plan p);

nD Complex MPI FFTw (cont.)

Notes
    First argument: comm, the MPI communicator
    Data layout
    All fftw_mpi transforms are in-place
    work:
        Optional, same size as local_data; the extra storage can greatly improve efficiency
    output_order: normal/transposed
        transposed: improves performance, but the data must be reshaped manually, which can sometimes cause problems

nD Complex MPI FFTw (cont.)

Data layout
    Distributed data
        Divided by rows (1st dimension) in C
        Divided by columns (last dimension) in Fortran
    Given the plan, all other data-layout parameters are determined by fftwnd_mpi_local_sizes
        total_local_size = (n1/np)*n2*…*nk*n_fields
        transposed order: n2 becomes the 1st dimension of the output
            inverse transform: n[n2,n1,n3,...,nk]

1D Complex MPI FFTw

Routines, similar to nD case, except no nd…

    fftw_mpi_plan fftw_mpi_create_plan(MPI_Comm comm, int n, fftw_direction dir, int flags);

    void fftw_mpi_local_sizes(fftw_mpi_plan p, int *local_n, int *local_n_start,
                              int *local_n_after_transpose,
                              int *local_start_after_transpose,
                              int *total_local_size);

    void fftw_mpi(fftw_mpi_plan p, int n_fields,
                  fftw_complex *local_data, fftw_complex *work,
                  fftw_mpi_output_order output_order);

    void fftw_mpi_destroy_plan(fftw_mpi_plan p);

Generally worse speedup than the nD case; suited to large sizes

nD Real MPI FFTw

Similar to the uniprocessor and complex MPI cases
Speedup of about 2 and half the storage, at the expense of a more complicated data format
Can have transposed-order output data
No 1D real MPI FFTw

Break

FFTw Performance

By Sirpa Saarinen

http://www.ncsa.uiuc.edu/MEDIA/agppt/myFFTW2.ppt

FFTw Fortran Wrappers and Example Codes

By Guobin Ma

FFTw Fortran-Callable Wrappers

Routine names: append _f77 to the prefix of the C routine name
    fftw/fftwnd/rfftw/rfftwnd -> fftw_f77/fftwnd_f77/rfftw_f77/rfftwnd_f77
    fftw_mpi/fftwnd_mpi -> fftw_f77_mpi/fftwnd_f77_mpi
    e.g. fftwnd_create_plan(3, n_dim, FFTW_FORWARD, FFTW_ESTIMATE | FFTW_IN_PLACE)
      -> fftwnd_f77_create_plan(plan, 3, n_dim, FFTW_FORWARD, FFTW_ESTIMATE + FFTW_IN_PLACE)

FFTw Fortran-Callable Wrappers

Notes
    Any function that returns a value is converted into a subroutine with an additional (first) parameter.
    There is no NULL in Fortran; you must allocate and pass an array for out.
    nD arrays: column-major, Fortran order
    plan variables: must be declared as integer
Constants
    FFTW_FORWARD, FFTW_BACKWARD, FFTW_IN_PLACE, …
    combined with ‘+’ instead of ‘|’
    defined in the files fortran/fftw_f77.i, fftw_f90.i, fftw_f90_mpi.i

Fortran Examples

Source codes at AHPCC (tested on Turing, BB, SGI):
    ~gbma/workshop/fftw/codes or
    http://www.arc.unm.edu/~gbma/Workshop/FFTW/codes
Complex data
    1D serial, fftw_1d.f90
    1D parallel, fftw_1d_p.f90
    nD serial, fftw_3d.f90
    nD parallel
        Normal order, fftw_3d_p_n.f90
        Transposed order, fftw_3d_p_t.f90

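Fortran Examples (cont.)

1D case

Input

$$
f(x) = \int_{-N/2}^{N/2} k\, e^{ikx}\, dk
$$

Forward output

$$
F(k) = k = \left(0, 1, 2, \ldots, \tfrac{N}{2}-1, \tfrac{N}{2}, -\tfrac{N}{2}+1, \ldots, -2, -1\right)
$$

Inverse output: f(x)

nD case

Input

$$
f(x,y,z) = \int k_x k_y k_z\, e^{i(k_x x + k_y y + k_z z)}\, dk_x\, dk_y\, dk_z
$$

Forward output

$$
F(k_x,k_y,k_z) = k_x k_y k_z, \qquad k_x, k_y, k_z = (0, 1, 2, \ldots, -1)
$$

Inverse output: f(x,y,z)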

1D Serial Fortran Example

FFTw codes

...

call fftw_f77_create_plan(plan_forward,N, &

FFTW_FORWARD, FFTW_ESTIMATE)

call fftw_f77_create_plan(plan_reverse,N, &

FFTW_BACKWARD,FFTW_ESTIMATE)

...

call fftw_f77_one(plan_forward,in,out)

...

call fftw_f77_one(plan_reverse,out,in)

...

call fftw_f77_destroy_plan(plan_forward)

call fftw_f77_destroy_plan(plan_reverse)

1D Parallel Fortran Example

FFTw codes

...

call fftw_f77_mpi_create_plan(p_fwd,MPI_COMM_WORLD,N, &

FFTW_FORWARD,FFTW_ESTIMATE)

...

call fftw_f77_mpi_local_sizes(p_fwd, local_n, local_start, &

local_n_after_trans, local_start_after_trans, total_local_size)

...

allocate( psi_local(0:total_local_size-1) )

...

allocate( work(0:total_local_size-1) )

1D Parallel Fortran Example (cont.)

FFTw codes (cont.)

...

call fftw_f77_mpi(p_fwd,1,psi_local,work,USE_WORK)

...

call fftw_f77_mpi_destroy_plan(p_fwd)

...

call fftw_f77_mpi_create_plan(p_rvs,MPI_COMM_WORLD,N, &

FFTW_BACKWARD,FFTW_ESTIMATE)

...

call fftw_f77_mpi(p_rvs,1,psi_local,work,USE_WORK)

...

call fftw_f77_mpi_destroy_plan(p_rvs)

nD Serial Fortran Example

FFTw codes

call fftwnd_f77_create_plan(p_fwd,nd,n_dim, &

FFTW_FORWARD,FFTW_ESTIMATE + FFTW_IN_PLACE)

call fftwnd_f77_one(p_fwd,psi,0)

call fftwnd_f77_destroy_plan(p_fwd)

call fftwnd_f77_create_plan(p_rvs,nd,n_dim, &

FFTW_BACKWARD,FFTW_ESTIMATE + FFTW_IN_PLACE)

call fftwnd_f77_one(p_rvs,psi,0)

call fftwnd_f77_destroy_plan(p_rvs)

nD Parallel Fortran Example

FFTw codes, normal order, nD local array

n_dim(1)=nx; n_dim(2)=ny; n_dim(3)=nz

call fftwnd_f77_mpi_create_plan(p_fwd,MPI_COMM_WORLD,&

nd,n_dim,FFTW_FORWARD,FFTW_ESTIMATE)

call fftwnd_f77_mpi_local_sizes(p_fwd, local_nlast, &

local_last_start, local_nlast2_after_trans, &

local_last2_start_after_trans, total_local_size)

allocate( psi_local(0:nx-1,0:ny-1,0:local_nlast-1) )

allocate( work(0:nx-1,0:ny-1,0:local_nlast-1) )

nD Parallel Fortran Example (cont.)

FFTw codes, normal order, nD local array (cont.)

call fftwnd_f77_mpi(p_fwd,1,psi_local,work,USE_WORK,order)

call fftwnd_f77_mpi_destroy_plan(p_fwd)

call fftwnd_f77_mpi_create_plan(p_rvs,MPI_COMM_WORLD, &

nd,n_dim,FFTW_BACKWARD,FFTW_ESTIMATE)

call fftwnd_f77_mpi(p_rvs,1,psi_local,work,USE_WORK,order)

call fftwnd_f77_mpi_destroy_plan(p_rvs)

nD Parallel Fortran Example (cont.)

FFTw codes, transposed order, 1D local array

n_dim(1)=nx; n_dim(2)=ny; n_dim(3)=nz

call fftwnd_f77_mpi_create_plan(p_fwd,MPI_COMM_WORLD,&

nd,n_dim,FFTW_FORWARD,FFTW_ESTIMATE)

call fftwnd_f77_mpi_local_sizes(p_fwd, local_nlast, &

local_last_start, local_nlast2_after_trans, &

local_last2_start_after_trans, total_local_size)

allocate( psi_local(0:total_local_size-1) )

allocate( work(0:total_local_size-1) )

nD Parallel Fortran Example (cont.)

FFTw codes, transposed order, 1D local array (cont.)

call fftwnd_f77_mpi(p_fwd,1,psi_local,work,USE_WORK,order)

call fftwnd_f77_mpi_destroy_plan(p_fwd)

n_dim(1)=nx; n_dim(2)=nz; n_dim(3)=ny

call fftwnd_f77_mpi_create_plan(p_rvs,MPI_COMM_WORLD, &

nd,n_dim,FFTW_BACKWARD,FFTW_ESTIMATE)

call fftwnd_f77_mpi(p_rvs,1,psi_local,work,USE_WORK,order)

call fftwnd_f77_mpi_destroy_plan(p_rvs)

nD Parallel Fortran Example (cont.)

Notes
    Normal order
        Easy to code, ‘low’ performance
    Transposed order
        ‘High’ performance, complicated to code, user must reorder the data
    Use-work
        High efficiency, large memory space

Run the Examples at AHPCC

Copy files to your directory
    cp ~gbma/workshop/fftw/codes/*.* .
Compile
    make filename.tur
    make filename.bb
    make filename.sgi
    with link specification -lfftw -lfftw_mpi (only for MPI)
Run
    BB: qsub -I -l nodes=2
        mpirun -np 2 -machinefile $PBS_NODEFILE filename.bb
    Turing: filename.tur
    SGI: mpirun -np 2 filename.sgi

References

Numerical Recipes (FORTRAN), by William T. Vetterling et al., New York: Cambridge University Press, 1992

Numerical Integration, by P. J. Davis & P. Rabinowitz, Waltham, Mass.: Blaisdell Pub. Co., 1967

FFTW User’s Manual, by M. Frigo & S. G. Johnson, www.fftw.org

Acknowledgement

Brian Baltz
    installation of FFTw at AHPCC
    running MPI at AHPCC
John Greenfield
    setting up the grid access
Andrew Pineda
    computer work environment at AHPCC
Brian Smith & Susan Atlas
    many stimulating discussions
Many others ...