Fire Benchmark Parallelisation Programming of Supercomputers WS 11/12 Sam Maurus.

Fire Benchmark Parallelisation
Programming of Supercomputers WS 11/12

Sam Maurus

What is Fire Benchmark?
• CFD solver for arbitrary geometries
• This project concerned itself with the gccg solver

How Fast is Fire Benchmark Sequentially?

What effect does the input file-format have?

Data structures in gccg
• Points and elements
• points array [figure: point P2 with its x, y, z coordinates]
• elems array [figure: element e1]
• lcc array [figure: element e1]

Data distribution approach

[Figure: Process 0 (root), Process 1, Process 2, Process 3]

Root Process Tasks:
• Read input file
• Partition elements using chosen approach
• Create and send relevant mapping arrays to each process
• Broadcast common data package to each processor

Common data package = lcc, ne, epart, countPart, bs_local, be_local…
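
The partitioning and mapping steps in the root process tasks can be sketched as follows. This is only an illustration in Python, not the project's code: the slides do not say which partitioning approach was chosen, so a simple contiguous block split stands in for it. The names epart and countPart come from the slide, but their meaning here (element-to-process map and per-process element count) is an assumption, and `block_partition` and `local_to_global` are hypothetical helpers.

```python
def block_partition(ne, nprocs):
    """Assign elements 0..ne-1 to processes in contiguous blocks.

    Returns (epart, count_part):
      epart[e]      -> process assumed to own element e
      count_part[p] -> number of elements assigned to process p
    (Assumed meanings; the slides only list the names.)
    """
    base, extra = divmod(ne, nprocs)
    # The first `extra` processes take one additional element each.
    count_part = [base + (1 if p < extra else 0) for p in range(nprocs)]
    epart = []
    for p, count in enumerate(count_part):
        epart.extend([p] * count)
    return epart, count_part


def local_to_global(epart, rank):
    """Hypothetical mapping array: local index -> global element index."""
    return [e for e, owner in enumerate(epart) if owner == rank]


epart, count_part = block_partition(10, 4)
print(epart)                      # [0, 0, 0, 1, 1, 1, 2, 2, 3, 3]
print(count_part)                 # [3, 3, 2, 2]
print(local_to_global(epart, 2))  # [6, 7]
```

In a real run the root would compute these arrays once and then send each process its own mapping array, which is what the task list describes.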

Communication model

has_ghost_neighbour array
[Figure: cells of P3 next to P5, labelled has_ghost_neighbour = 0 and has_ghost_neighbour = 1]

[Figure: Process x; Process 0, Process 1, … Process k (k = count)]

Computational loop, phase one:
• Start Isend to required processes (where cellCountsToSend[i] > 0)
• Start Irecv from required processes (where cellCountsToRecv[i] > 0)
• Process local elements that have no ghost neighbours
• Wait on all requests
• Update remaining local elements
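
The split between "elements that have no ghost neighbours" and the "remaining local elements" could be derived roughly as below. This is a sketch under stated assumptions, written in Python for brevity rather than in the solver's own language: it assumes lcc[e] lists the global neighbour indices of element e, that indices at or beyond the element count denote external boundary cells, and that epart maps each element to its owning process. `classify_elements` is a hypothetical helper, not a function from gccg.

```python
def classify_elements(lcc, epart, rank):
    """Split the elements owned by `rank` into interior and boundary lists.

    Assumptions (not confirmed by the slides):
      lcc[e]   -> global indices of the neighbours of element e
      epart[e] -> owning process of element e
      neighbour indices >= len(epart) are external cells and are ignored
    """
    ne = len(epart)
    nprocs = max(epart) + 1
    has_ghost_neighbour = {}
    recv_cells = [set() for _ in range(nprocs)]  # ghost cells needed per owner
    for e in range(ne):
        if epart[e] != rank:
            continue
        has_ghost_neighbour[e] = 0
        for n in lcc[e]:
            if n < ne and epart[n] != rank:
                has_ghost_neighbour[e] = 1
                recv_cells[epart[n]].add(n)
    cell_counts_to_recv = [len(cells) for cells in recv_cells]
    interior = [e for e, g in has_ghost_neighbour.items() if g == 0]
    boundary = [e for e, g in has_ghost_neighbour.items() if g == 1]
    return interior, boundary, cell_counts_to_recv


# A 1D chain of six elements split over two processes; index 6 marks "outside".
lcc = [[6, 1], [0, 2], [1, 3], [2, 4], [3, 5], [4, 6]]
epart = [0, 0, 0, 1, 1, 1]
print(classify_elements(lcc, epart, 0))  # ([0, 1], [2], [0, 1])
print(classify_elements(lcc, epart, 1))  # ([4, 5], [3], [1, 0])
```

With such a split, the interior list is what a process can work through while its Isend and Irecv requests are still in flight, and the boundary list is updated only after the wait completes.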

Problems overcome
MPI_WAIT FUNCTION
• Problem: MPI_Wait was being executed both for the send and receive requests for every element processed
• Solution: has_ghost_neighbour array introduced, allowing for intermediate computation. MPI_Wait then only called once for each request.

[Figure: BEFORE / AFTER]

Problems overcome
REDUNDANT REPROCESSING OF INPUT FILE
• Problem: Input file was being read once at initialisation and again for writing the result (redundant)
• Solution: ‘Write solution’ code was refactored to re-use the relevant file information obtained from the first read

[Figure: BEFORE / AFTER]

Speedup – cojack

Speedup – pent

Speedup – drall

Speedup – tjunc

Speedup – full execution

Thanks for listening
• Discussion time!