DISCS'12 Workshop 1
A Coarray Fortran Implementation to Support Data-Intensive Application Development
Deepak Eachempati (1), Alan Richardson (2), Terrence Liao (3), Henri Calandra (3), Barbara Chapman (1)
Data-Intensive Scalable Computing Systems 2012 (DISCS’12) Workshop, November 16, 2012
(1) Department of Computer Science, University of Houston
(2) Department of Earth, Atmospheric, and Planetary Sciences, MIT
(3) Total E&P
Oil and Gas Industry: Compute Needs
Industry is looking for faster and more cost-effective ways to process massive amounts of data
• more powerful hardware
• more productive programming models
• innovative software techniques
Outline
• Fortran 2008 parallel processing additions (CAF)
• CAF Implementation in OpenUH Fortran compiler
• Application port to CAF and Results
• Further extensions for Parallel I/O
• Closing Remarks
Coarray Model in Fortran 2008
• Derives from Co-Array Fortran (CAF)
• SPMD execution model, PGAS memory model
– execution entities called images
– coarrays: globally accessible, symmetric data objects
• additional intrinsic subroutines/functions for querying process and data information
• additional statements in the language for synchronization
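The points above can be illustrated with a minimal sketch. It is not taken from the talk: the program and variable names are made up for illustration, and it uses only standard Fortran 2008 features (a scalar coarray, the intrinsics this_image() and num_images(), and the sync all statement).

```fortran
program coarray_model
  implicit none
  integer :: counter[*]      ! coarray: one copy of counter exists on every image
  integer :: me, n, i, total

  me = this_image()          ! index of the executing image, 1..n
  n  = num_images()          ! total number of images
  counter = me               ! each image writes its own local copy

  sync all                   ! all local writes complete before any remote read

  if (me == 1) then
    total = 0
    do i = 1, n
      total = total + counter[i]   ! one-sided read of counter on image i
    end do
    print *, 'images:', n, ' sum of image indices:', total
  end if
end program coarray_model
```

Every image runs the same program (SPMD); only the square-bracket reference counter[i] touches another image's memory. With gfortran, for example, the sketch should build as a single-image program using -fcoarray=single.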
Working with Distributed Data using Coarrays
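[Figure: images arranged in a logical grid, with rows labeled 1, 2, 3, 4, …, M and columns labeled 1, 2, 3, 4, …, *]
real :: B[M, *]
B references local B
B[3,4] references local B (when executed on image [3,4])
B[3,3] references B in left neighbor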
Working with Distributed Data using Coarrays
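[Figure: images arranged in a logical grid, with rows labeled 1, 2, 3, 4, …, M and columns labeled 1, 2, 3, 4, …, *]
real :: B(10,10)[M, *]
B(2:4,2:4) references local subarray of B
B(2:4,2:4)[3,4] references local subarray of B (when executed on image [3,4])
B(2:4,2:4)[3,3] references subarray of B in left neighbor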
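A small sketch of the same addressing scheme, written so that it works on any image rather than only on image [3,4]. The value M = 2, the temporary tmp, and the program name are assumptions made for illustration; this_image(B) is the standard intrinsic form that returns the cosubscripts of the executing image.

```fortran
program cosubscripts
  implicit none
  integer, parameter :: M = 2          ! assumed number of image rows
  real    :: B(10,10)[M,*]
  real    :: tmp(3,3)
  integer :: me(2)

  me = this_image(B)                   ! cosubscripts [row, column] of this image
  B  = real(this_image())              ! fill the local array with the image index

  sync all                             ! neighbors must finish writing B first

  if (me(2) > 1) then
    ! one-sided read of a subarray of B from the left neighbor
    tmp = B(2:4,2:4)[me(1), me(2)-1]
    print *, 'image', this_image(), 'read', tmp(1,1), 'from its left neighbor'
  end if
end program cosubscripts
```

Images in the first column have no left neighbor and skip the read; a left neighbor always has a smaller image index than the reader, so it is guaranteed to exist.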
2D Halo Exchange Example with CAF
real :: a(0:R+1, 0:C+1)[pR,*]
…
a(R+1,1:C)[top(1),top(2)] = a(1,1:C)
a(0,1:C)[bottom(1),bottom(2)] = a(R,1:C)
a(1:R,0)[right(1),right(2)] = a(1:R,C)
a(1:R,C+1)[left(1),left(2)] = a(1:R,1)
sync all
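The slide does not show how top, bottom, left, and right are computed. One possible completion is sketched below, assuming periodic (wrap-around) neighbors, a local interior of 4 x 4, and pR = 1 image rows; all of these choices are illustrative assumptions, not the authors' setup.

```fortran
program halo_exchange
  implicit none
  integer, parameter :: R = 4, C = 4   ! assumed local interior size
  integer, parameter :: pR = 1         ! assumed number of image rows
  real    :: a(0:R+1, 0:C+1)[pR,*]
  integer :: me(2), pC
  integer :: top(2), bottom(2), left(2), right(2)

  pC = num_images() / pR               ! image columns; assumes pR divides num_images()
  me = this_image(a)                   ! cosubscripts of this image

  ! periodic neighbors in the pR x pC image grid (an assumption)
  top    = [ modulo(me(1)-2, pR) + 1, me(2) ]
  bottom = [ modulo(me(1),   pR) + 1, me(2) ]
  left   = [ me(1), modulo(me(2)-2, pC) + 1 ]
  right  = [ me(1), modulo(me(2),   pC) + 1 ]

  a = real(this_image())
  sync all                             ! every image has initialized its array

  ! one-sided puts into the halo cells of the four neighbors
  a(R+1,1:C)[top(1),top(2)]     = a(1,1:C)
  a(0,1:C)[bottom(1),bottom(2)] = a(R,1:C)
  a(1:R,0)[right(1),right(2)]   = a(1:R,C)
  a(1:R,C+1)[left(1),left(2)]   = a(1:R,1)

  sync all                             ! all halos are filled after this point
  if (this_image() == 1) print *, 'halo a(0,1) on image 1 holds', a(0,1)
end program halo_exchange
```

The first sync all keeps a put from landing before the target image has initialized its array; the second guarantees that every halo is complete before it is read.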
2D Halo Exchange with MPI
real :: a(0:R+1, 0:C+1)
…
call mpi_isend( a(1,1:C), C, mpi_real, &
                top(myp), TAG, ...)
call mpi_irecv( a(R+1,1:C), C, mpi_real, &
                bottom(myp), TAG, ...)
call mpi_isend( a(R,1:C), C, mpi_real, &
                bottom(myp), TAG, ...)
call mpi_irecv( a(0,1:C), C, mpi_real, &
                top(myp), TAG, ...)
call mpi_isend( a(1:R,C), R, mpi_real, &
                right(myp), TAG, ...)
call mpi_irecv( a(1:R,0), R, mpi_real, &
                left(myp), TAG, ...)
call mpi_isend( a(1:R,1), R, mpi_real, &
                left(myp), TAG, ...)
call mpi_irecv( a(1:R,C+1), R, mpi_real, &
                right(myp), TAG, ...)
call mpi_waitall( 8, ...)
Implementation of CAF
• OpenUH compiler
– an industry-quality, optimizing compiler based on Open64
– features: dependence and data-flow analysis, interprocedural analysis, OpenMP
– backend supports multiple targets (x86_64, IA64, IA32, MIPS, PTX)
[Figure: OpenUH compiler structure. Components shown: CAF source code; Fortran front-end with coarray support; coarray translation phase; loop optimizer; global optimizer; code generation; OpenUH CAF runtime library; executable]
Runtime Support for CAF
Runtime Interface (libcaf)
1-sided Communication
PGAS Memory Allocation
Synchronization
Collectives Support (e.g. reductions)
Atomics
Portable Communication Substrate: GASNet or ARMCI
Comparison with other Implementations
Compiler | Commercial/Free | Fortran 2008 Coarray Support?
OpenUH | Free | Yes
G95 | Free, no longer supported | Partially; missing locks support
Gfortran | Free | In progress
Rice CAF 2.0 | Free | Partially, but adds different features
Cray Fortran | Commercial | Yes
Intel Fortran | Commercial | Yes
Seismic Subsurface Imaging: Reverse Time Migration
• A source wave is emitted per shot
• Reflected waves captured by array of sensors
• RTM (in time domain) uses finite difference method to numerically solve wave equation and reconstruct subsurface image (in parallel, with domain decomposition)
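To make the finite-difference step concrete, here is a deliberately simplified, serial, one-dimensional sketch. It is not the RTM kernel from the talk: the grid size, wave speed, time step, and point disturbance are all assumed values, and a production code works in 3D with higher-order stencils and domain decomposition.

```fortran
program wave_1d
  implicit none
  integer, parameter :: n = 101, nsteps = 50        ! assumed grid and step count
  real,    parameter :: c = 1.0, dx = 1.0, dt = 0.5 ! assumed; c*dt/dx <= 1 for stability
  real    :: u_prev(n), u(n), u_next(n)
  integer :: i, t

  u = 0.0
  u(n/2 + 1) = 1.0          ! point disturbance standing in for a source
  u_prev = u                ! zero initial velocity

  do t = 1, nsteps
    do i = 2, n - 1
      ! second-order central differences in time and space
      u_next(i) = 2.0*u(i) - u_prev(i) &
                + (c*dt/dx)**2 * (u(i+1) - 2.0*u(i) + u(i-1))
    end do
    u_next(1) = 0.0         ! fixed boundaries
    u_next(n) = 0.0
    u_prev = u
    u = u_next
  end do

  print *, 'max amplitude after', nsteps, 'steps:', maxval(abs(u))
end program wave_1d
```

Each update of u_next(i) needs the neighboring values u(i-1) and u(i+1); once the grid is split across images, those neighbors may live on another image, which is why a halo exchange is required at every time step.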
RTM Implementations
• Isotropic
– simplest model
– assumes reflected waves propagate at same speed in every direction from a point
– only swaps faces (8 swaps in halo exchange)
• Tilted Transverse Isotropy (TTI)
– assumes waves may propagate at different speeds
– swaps faces and edges (18 swaps in halo exchange)
Typical Data Usage
• Generally several thousand shots
– data parallel problem, where each shot can be processed independently in parallel
– each shot handles several GB of data
– so, total data to analyze is in the terabytes range
• Handling I/O
– C I/O reads in velocity and coefficient models
– Shot headers read by master and distributed
– Each processor writes to a distinct file, and the files are merged in a post-processing step
Results for CAF RTM port
Total Domain Size: 1024 x 768 x 512 (3.0 GB, per shot)
Forward Shot
Isotropic case: up to 32% faster compared to corresponding MPI implementation
TTI case: competitive performance with MPI
Results for CAF RTM port
Total Domain Size: 1024 x 768 x 512 (3.0 GB, per shot)
Backward Shot
Isotropic case: performance hit at 256 procs
TTI case: lagging a bit behind MPI
Extending Fortran for Parallel I/O
• We are currently designing a prototype implementation for a parallel I/O language extension
• Fortran I/O has not yet been extended to facilitate cooperative I/O to shared files
– original Co-Array Fortran specified a simple extension to Fortran I/O
– parallel I/O may be added in a future version of the standard
Fortran I/O
• Fortran provides interfaces for formatted and unformatted I/O
open( 10, file='fn', action='write', &
      access='direct', recl=k )
…
write (10, rec=3) A
[Figure: file 'fn' connected to unit 10, shown as a sequence of records (record 1, record 2, record 3, record 4, …); the write stores A in record 3]
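A self-contained version of the direct-access write, using only standard Fortran. The array size, the use of newunit instead of unit 10, and the read-back check are additions for illustration; the record length k is obtained with inquire(iolength=...) so that one record holds exactly one copy of A.

```fortran
program direct_access
  implicit none
  integer, parameter :: n = 4          ! assumed array size
  real    :: A(n), check(n)
  integer :: k, u

  A = [1.0, 2.0, 3.0, 4.0]
  inquire (iolength=k) A               ! record length needed to hold A

  open (newunit=u, file='fn', action='readwrite', access='direct', &
        recl=k, status='replace')
  write (u, rec=3) A                   ! store A in record 3
  read  (u, rec=3) check               ! read the same record back
  close (u, status='delete')

  print *, 'record 3 round-trips:', all(check == A)
end program direct_access
```

Direct access lets a program address any record by number without reading the preceding ones, but each read or write statement still transfers exactly one record.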
Current limitations of I/O
• Issues:
1. no defined, legal way for multiple images to access the same file
2. a file is a 1-dimensional sequence of records
3. records are read/written one at a time
4. no mechanism for collective access to a shared file amongst multiple images
Proposed Extension for Parallel I/O
• Allow a file to be “share-opened”, e.g. OPEN( 10, file='fn', TEAM='yes', …)
– all images form a team with shared access to the same file
– implicit synchronization
• recommended only for direct access mode
• FLUSH statement used to ensure changes by one image are visible to other images in team
• CLOSE statement has implicit image synchronization
Further extensions we’re exploring
• Multi-dimensional view of records
• Read/write multiple records at a time
• Collective read/write operations on shared files
open( 10, file='fn', action='write', &
      access='direct', ndim=2, &
      dims=(/M/), team='yes', recl=k )
…
[Figure: file 'fn' connected to unit 10, viewed as a two-dimensional grid of records indexed (1,1), (1,2), (1,3), … through (M,1), (M,2), (M,3), …]
Further extensions we’re exploring
write (10, rec_lb=(/ 2,2 /), rec_ub=(/ 4,3 /) ) &
      A(1:4, 1:2)
[Figure: file 'fn' connected to unit 10 as a two-dimensional grid of records, (1,1) through (M,3), …; a single write stores A(1:4,1:2) into a block of records]
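The rec_lb/rec_ub specifiers are a proposed extension, so they cannot be compiled today. The sketch below emulates a multi-record block write in standard Fortran under several assumptions that the slides do not state: one array element per record, a column-major mapping from the record index (i,j) to a linear record number, a first record dimension of extent M = 6, and a 3 x 2 source array chosen to match the 3 x 2 block of records.

```fortran
program multi_record_write
  implicit none
  integer, parameter :: M = 6                    ! assumed extent of first record dimension
  integer, parameter :: lb(2) = [2, 2], ub(2) = [4, 3]
  real    :: A(3,2), x
  integer :: i, j, k, u, rec

  A = reshape([(real(i), i = 1, 6)], [3, 2])
  inquire (iolength=k) x                         ! one array element per record (assumption)

  open (newunit=u, file='fn', action='readwrite', access='direct', &
        recl=k, status='replace')
  do j = lb(2), ub(2)
    do i = lb(1), ub(1)
      rec = (j - 1)*M + i                        ! assumed column-major (i,j) -> record number
      write (u, rec=rec) A(i - lb(1) + 1, j - lb(2) + 1)
    end do
  end do

  read (u, rec=(3 - 1)*M + 4) x                  ! record (4,3) should hold A(3,2)
  close (u, status='delete')
  print *, 'record (4,3) holds A(3,2):', x == A(3,2)
end program multi_record_write
```

The explicit double loop and index arithmetic are what a multi-dimensional record view with lower and upper record bounds would let the programmer express in a single write statement.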
Further extensions we’re exploring
type(T) :: A(2,2)[3,*]
…
my_rec_lbs = get_rec_lbs( this_image() )
my_rec_ubs = get_rec_ubs( this_image() )
write_team( 10, rec_lb=my_rec_lbs, &
            rec_ub=my_rec_ubs) &
      A(:,:)
[Figure: file 'fn' connected to unit 10 as a 6 x 4 grid of records, (1,1) through (6,4); write_team stores the blocks A(1:2,1:2)[1,1], A(1:2,1:2)[2,1], A(1:2,1:2)[3,1], A(1:2,1:2)[1,2], A(1:2,1:2)[2,2], and A(1:2,1:2)[3,2], one per image]
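The slides do not define get_rec_lbs and get_rec_ubs. One plausible block mapping is sketched below with hypothetical helpers block_lbs and block_ubs, which take the image's cosubscripts (rather than the image index used on the slide) and assume that every image owns an nr x nc block of records laid out in the same order as the image grid.

```fortran
module rec_bounds
  implicit none
contains
  ! Hypothetical helper: lower record bounds of the block owned by the
  ! image with cosubscripts cosub, assuming nr x nc records per image.
  pure function block_lbs(cosub, nr, nc) result(lbs)
    integer, intent(in) :: cosub(2), nr, nc
    integer :: lbs(2)
    lbs = [ (cosub(1) - 1)*nr + 1, (cosub(2) - 1)*nc + 1 ]
  end function block_lbs

  ! Hypothetical helper: upper record bounds of the same block.
  pure function block_ubs(cosub, nr, nc) result(ubs)
    integer, intent(in) :: cosub(2), nr, nc
    integer :: ubs(2)
    ubs = [ cosub(1)*nr, cosub(2)*nc ]
  end function block_ubs
end module rec_bounds

program team_write_bounds
  use rec_bounds
  implicit none
  real    :: A(2,2)[3,*]      ! real stands in for type(T) on the slide
  integer :: me(2)

  me = this_image(A)          ! cosubscripts of the executing image
  print '(a,2i3,a,2i3,a,2i3)', 'image', me, ' owns records', &
        block_lbs(me, 2, 2), ' to', block_ubs(me, 2, 2)
end program team_write_bounds
```

Under this mapping, image [2,1] would own records (3,1) through (4,2) and image [1,2] would own records (1,3) through (2,4), so the blocks tile the record grid without overlap.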
Leverage Global Arrays as memory buffers for I/O
• Implementation in progress which utilizes global arrays (GA) as I/O buffers in memory
[Figure labels: compute nodes; I/O requests; I/O nodes; asynchronous disk updates]
In Summary
• Fortran coarray model may be used for processing large data sets
• Developed implementation that’s freely available and used it to develop RTM application
• Fortran’s I/O model doesn’t support parallel I/O for large-scale, multi-dimensional array data sets, and we are working on addressing this
Thanks