A Coarray Fortran Implementation to Support Data-Intensive Application Development Deepak Eachempati...


A Coarray Fortran Implementation to Support Data-Intensive Application Development

Deepak Eachempati (1), Alan Richardson (2), Terrence Liao (3), Henri Calandra (3), Barbara Chapman (1)

Data-Intensive Scalable Computing Systems 2012 (DISCS’12) Workshop, November 16, 2012

(1) Department of Computer Science, University of Houston
(2) Department of Earth, Atmospheric, and Planetary Sciences, MIT
(3) Total E&P


Oil and Gas Industry: Compute Needs

Industry is looking for faster and more cost-effective ways to process massive amounts of data:
• more powerful hardware
• more productive programming models
• innovative software techniques


Outline

• Fortran 2008 parallel processing additions (CAF)

• CAF Implementation in OpenUH Fortran compiler

• Application port to CAF and Results

• Further extensions for Parallel I/O

• Closing Remarks


Coarray Model in Fortran 2008

• Derives from Co-Array Fortran (CAF)
• SPMD execution model, PGAS memory model
– execution entities called images
– coarrays: globally-accessible, symmetric data objects
• additional intrinsic subroutines/functions for querying process and data information
• additional statements in language for synchronization
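The points above can be illustrated with a minimal, self-checking sketch in standard Fortran 2008. It is not taken from the slides: the ring-neighbor pattern and the variable names are assumptions chosen for illustration. It uses the intrinsics `this_image()` and `num_images()`, a scalar coarray, a one-sided put, and `sync all`.

```fortran
program coarray_basics
  implicit none
  integer :: counter[*]          ! coarray: one copy of counter on every image
  integer :: me, n, right, expected

  me = this_image()              ! index of the executing image, 1..num_images()
  n  = num_images()              ! total number of images in the SPMD run
  right = merge(1, me + 1, me == n)   ! hypothetical ring neighbor (wraps around)

  counter = 0
  sync all                       ! every image has initialized its own copy

  counter[right] = me            ! one-sided put into the neighbor's copy
  sync all                       ! all puts are complete before anyone reads

  ! Each image was written to by its left neighbor in the ring.
  expected = merge(n, me - 1, me == 1)
  if (counter /= expected) error stop "unexpected value in counter"

  if (me == 1) print *, "ring put completed on", n, "images"
end program coarray_basics
```

With a coarray-capable compiler the same source runs unchanged on one image or many; with gfortran, for example, `-fcoarray=single` gives a single-image build that is enough to check the logic.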


Working with Distributed Data using Coarrays

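[Figure: images arranged in a logical grid; the first codimension runs 1, 2, 3, 4, …, M and the second runs 1, 2, 3, 4, …, *]

real :: B[M, *]

On the image with cosubscripts [3,4]:
• B references local B
• B[3,4] references local B
• B[3,3] references B in left neighbor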


Working with Distributed Data using Coarrays

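[Figure: images arranged in a logical grid; the first codimension runs 1, 2, 3, 4, …, M and the second runs 1, 2, 3, 4, …, *]

real :: B(10,10)[M, *]

• B(2:4,2:4) references local subarray of B
• B(2:4,2:4)[3,4] references local subarray of B
• B(2:4,2:4)[3,3] references subarray of B in left neighbor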
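As a hedged sketch of how an image can find its own cosubscripts and address its left neighbor, the program below uses the standard intrinsics `this_image(coarray)` and `image_index`. The value `m = 2` for the first codimension and the fill values are assumptions made for illustration, not values from the slides.

```fortran
program cosubscripts_demo
  implicit none
  integer, parameter :: m = 2          ! assumed extent of the first codimension
  real    :: b(10,10)[m,*]             ! 10x10 array on every image, 2D image grid
  integer :: co(2), left

  co = this_image(b)                   ! cosubscripts of the executing image
  b  = real(this_image())              ! fill the local array with the image index
  sync all                             ! all images have filled b before remote reads

  if (co(2) > 1) then
     ! Left neighbor: same first cosubscript, second cosubscript minus one.
     left = image_index(b, [co(1), co(2) - 1])
     ! One-sided get of a 3x3 section from the left neighbor.
     if (any(b(2:4,2:4)[co(1), co(2) - 1] /= real(left))) &
        error stop "remote section does not match"
  end if

  print *, "image", this_image(), "has cosubscripts", co
end program cosubscripts_demo
```

Only images with a second cosubscript greater than 1 perform the remote read, so the program never references an image outside the grid.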


2D Halo Exchange Example with CAF

real :: a(0:R+1, 0:C+1)[pR,*]
…
a(R+1,1:C)[top(1),top(2)] = a(1,1:C)

a(0,1:C)[bottom(1),bottom(2)] = a(R,1:C)

a(1:R,0)[right(1),right(2)] = a(1:R,C)

a(1:R,C+1)[left(1),left(2)] = a(1:R,1)

sync all
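The fragment above omits the neighbor setup. A complete but simplified variant can be sketched as follows, assuming a one-dimensional decomposition by rows with periodic neighbors; the sizes `r` and `c`, the names `up` and `down`, and the periodic wrap are illustrative assumptions, not the decomposition used in the talk.

```fortran
program halo_1d
  implicit none
  integer, parameter :: r = 4, c = 6   ! assumed local interior size
  real    :: a(0:r+1, 1:c)[*]          ! rows 0 and r+1 are halo rows
  integer :: me, n, up, down

  me = this_image()
  n  = num_images()
  up   = merge(n, me - 1, me == 1)     ! periodic neighbors (assumption)
  down = merge(1, me + 1, me == n)

  a = real(me)
  sync all                             ! interiors are set before any halo is written

  a(r+1, :)[up]  = a(1, :)             ! my first row becomes the lower halo of "up"
  a(0, :)[down]  = a(r, :)             ! my last row becomes the upper halo of "down"
  sync all                             ! all halo puts are complete

  ! My upper halo was written by "up", my lower halo by "down".
  if (any(a(0, :) /= real(up)) .or. any(a(r+1, :) /= real(down))) &
     error stop "halo mismatch"

  if (me == 1) print *, "halo exchange verified on", n, "images"
end program halo_1d
```

Each image only writes remote halo rows and only reads its own interior rows between the two `sync all` statements, so the puts do not race with local accesses.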


2D Halo Exchange with MPI

real :: a(0:R+1, 0:C+1)
…
call mpi_isend( a(1,1:C), C, mpi_real, &
                top(myp), TAG, ...)
call mpi_irecv( a(R+1,1:C), C, mpi_real, &
                bottom(myp), TAG, ...)
call mpi_isend( a(R,1:C), C, mpi_real, &
                bottom(myp), TAG, ...)
call mpi_irecv( a(0,1:C), C, mpi_real, &
                top(myp), TAG, ...)
call mpi_isend( a(1:R,C), R, mpi_real, &
                right(myp), TAG, ...)
call mpi_irecv( a(1:R,0), R, mpi_real, &
                left(myp), TAG, ...)
call mpi_isend( a(1:R,1), R, mpi_real, &
                left(myp), TAG, ...)
call mpi_irecv( a(1:R,C+1), R, mpi_real, &
                right(myp), TAG, ...)
call mpi_waitall( 8, ...)


Implementation of CAF

• OpenUH compiler
– an industry-quality, optimizing compiler based on Open64
– features: dependence and data-flow analysis, interprocedural analysis, OpenMP
– backend supports multiple targets (x86_64, IA64, IA32, MIPS, PTX)

[Figure: OpenUH compiler components: CAF source code, Fortran front-end with coarray support, coarray translation phase, loop optimizer, global optimizer, code generation, executable, and the OpenUH CAF runtime library]


Runtime Support for CAF

Runtime Interface (libcaf)

1-sided Communication

PGAS Memory Allocation

Synchronization

Collectives Support (e.g. reductions)

Atomics

Portable Communication Substrate: GASNet or ARMCI


Comparison with other Implementations

Compiler | Commercial/Free | Fortran 2008 Coarray Support?
OpenUH | Free | Yes
G95 | Free, no longer supported | Partially (missing locks support)
Gfortran | Free | In progress
Rice CAF 2.0 | Free | Partially, but adds different features
Cray Fortran | Commercial | Yes
Intel Fortran | Commercial | Yes


Seismic Subsurface Imaging: Reverse Time Migration

• A source wave is emitted per shot
• Reflected waves captured by array of sensors
• RTM (in time domain) uses finite difference method to numerically solve wave equation and reconstruct subsurface image (in parallel, with domain decomposition)


RTM Implementations

• Isotropic
– simplest model: assumes reflected waves propagate at same speed in every direction from a point
– only swaps faces (8 swaps in halo exchange)
• Tilted Transverse Isotropy (TTI)
– assumes waves may propagate at different speeds
– swaps faces and edges (18 swaps in halo exchange)


Typical Data Usage

• Generally several thousand shots
– data parallel problem, where each shot can be processed independently in parallel
– each shot handles several GB of data
– so, total data to analyze is in terabytes range
• Handling I/O
– C I/O reads in velocity and coefficient models
– Shot headers read by master and distributed
– Each processor writes to a distinct file, and file is merged in post-processing step
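The "distinct file per processor" pattern can be sketched in standard Fortran as below. This is not the RTM code's actual I/O: the file naming scheme `shot_part_<image>.bin`, the buffer size, and the use of stream access are all assumptions made for illustration.

```fortran
program per_image_output
  implicit none
  character(len=32) :: fname
  real    :: local(8)                  ! stand-in for this image's part of the result
  integer :: u

  local = real(this_image())

  ! Build a file name that is unique to this image (hypothetical naming scheme).
  write(fname, '(a,i0,a)') 'shot_part_', this_image(), '.bin'

  open(newunit=u, file=trim(fname), access='stream', form='unformatted', &
       status='replace', action='write')
  write(u) local                       ! each image writes only its own file
  close(u)

  sync all                             ! all partial files exist after this point
  if (this_image() == 1) print *, num_images(), "partial files written"
end program per_image_output
```

Because no two images ever touch the same file, this stays within what the Fortran standard defines, at the cost of a separate merge step afterwards.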


Results for CAF RTM port

Total Domain Size: 1024 x 768 x 512 (3.0 GB, per shot)

Forward Shot
• Isotropic case: up to 32% faster compared to corresponding MPI implementation
• TTI case: competitive performance with MPI


Results for CAF RTM port

Total Domain Size: 1024 x 768 x 512 (3.0 GB, per shot)

Backward Shot
• Isotropic case: performance hit at 256 procs
• TTI case: lagging a bit behind MPI


Extending Fortran for Parallel I/O

• We are currently designing a prototype implementation for a parallel I/O language extension
• Fortran I/O has not yet been extended to facilitate cooperative I/O to shared files
– original Co-Array Fortran specified a simple extension to Fortran I/O
– parallel I/O may be added in a future version of the standard


Fortran I/O

• Fortran provides interfaces for formatted and unformatted I/O

open( 10, file='fn', action='write', &
      access='direct', recl=k )
…
write (10, rec=3) A

[Figure: file 'fn' connected to unit 10, shown as records 1 to 4; the write statement stores A in record 3]
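A runnable version of the direct-access write can be sketched as follows. The array size, the use of `inquire(iolength=...)` to obtain the record length `k`, and the read-back check are additions for illustration; the slide leaves these details out.

```fortran
program direct_access_demo
  implicit none
  integer, parameter :: n = 4          ! assumed number of values per record
  real    :: a(n), check(n)
  integer :: k, i

  a = [(real(i), i = 1, n)]

  ! Ask the processor how long a record must be to hold a.
  inquire(iolength=k) a

  open(10, file='fn', action='readwrite', access='direct', &
       form='unformatted', recl=k, status='replace')

  write(10, rec=3) a                   ! records can be written in any order
  read(10, rec=3) check                ! read the same record back
  close(10, status='delete')           ! remove the scratch file again

  if (any(check /= a)) error stop "record 3 did not round-trip"
  print *, "record 3 round-tripped correctly"
end program direct_access_demo
```

The units of `recl` are processor-dependent, which is why the sketch queries the length rather than hard-coding a byte count.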


Current limitations of I/O

• Issues:
1. no defined, legal way for multiple images to access the same file
2. a file is a 1-dimensional sequence of records
3. records are read/written one at a time
4. no mechanism for collective accesses to a shared file amongst multiple images


Proposed Extension for Parallel I/O

• Allow a file to be “share-opened”, e.g. OPEN( 10, file='fn', TEAM='yes', …)
– all images form a team with shared access to the same file
– implicit synchronization
• recommended only for direct access mode
• FLUSH statement used to ensure changes by one image are visible to other images in team
• CLOSE statement has implicit image synchronization


Further extensions we’re exploring

• Multi-dimensional view of records
• Read/write multiple records at a time
• Collective read/write operations on shared files

open( 10, file='fn', action='write', &
      access='direct', ndim=2, &
      dims=(/M/), team='yes', recl=k )
…

[Figure: file 'fn' connected to unit 10, viewed as a two-dimensional grid of records (1,1), (1,2), (1,3), … through (M,1), (M,2), (M,3), …]


Further extensions we’re exploring

write (10, rec_lb=(/ 2,2 /), rec_ub=(/ 4,3 /) ) &
      A(1:4, 1:2)

[Figure: file 'fn' connected to unit 10, viewed as a two-dimensional grid of records (1,1), (1,2), (1,3), … through (M,1), (M,2), (M,3), …; the write statement stores A(1:4,1:2) in the block of records selected by rec_lb and rec_ub]


Further extensions we’re exploring

type(T) :: A(2,2)[3,*]
…
my_rec_lbs = get_rec_lbs( this_image() )
my_rec_ubs = get_rec_ubs( this_image() )
write_team( 10, rec_lb=my_rec_lbs, &
            rec_ub=my_rec_ubs ) &
            A(:,:)

[Figure: file 'fn' connected to unit 10, viewed as a 6 x 4 grid of records (1,1) through (6,4); write_team stores the blocks A(1:2,1:2)[1,1], A(1:2,1:2)[2,1], A(1:2,1:2)[3,1], A(1:2,1:2)[1,2], A(1:2,1:2)[2,2] and A(1:2,1:2)[3,2] from the six images]


Leverage Global Arrays as memory buffers for I/O

• Implementation in progress which utilizes global arrays (GA) as I/O buffers in memory

[Figure labels: compute nodes, I/O nodes, I/O requests, asynchronous disk updates]


In Summary

• Fortran coarray model may be used for processing large data sets

• Developed implementation that’s freely available and used it to develop RTM application

• Fortran’s I/O model doesn’t support parallel I/O for large-scale, multi-dimensional array data sets, and we are working on addressing this


Thanks