Adams_SIAMCSE15

Mark F. Adams, Jed Brown, Matt Knepley. Segmental Refinement: A Multigrid Technique for Data Locality

Page 1: Adams_SIAMCSE15

Mark F. Adams

Jed Brown

Matt Knepley

Segmental Refinement:

A Multigrid Technique for Data Locality

Page 2: Adams_SIAMCSE15

Communication costs at all levels of memory hierarchy have been increasing relative to processor speeds for 30+ years

Energy constraints exacerbate the cost of memory movement

• Tipping point where radical ideas should be investigated

Brandt (1977) proposed Segmental Refinement (SR)

Serial ultra-low memory multigrid method

Brandt & Diskin (1994) apply the idea to parallel processing

We continue this work

• First published multilevel numerical results

SR uses buffering and does not “communicate” on the finest levels

• Does not replicate multigrid semantics exactly

Not brute force “communication avoiding”

• Game: keep textbook MG efficiency, asymptotically exact

Rethinking multigrid solvers for new architectures

Page 3: Adams_SIAMCSE15

Multigrid Communication Patterns: vertical (tree) & horiz.

FMG starts with accurate solve on coarsest grid

Page 4: Adams_SIAMCSE15

Nearest-neighbor intra-grid comm.

Inter-grid tree++ comm.

Refine grid, split processes, building tree

Multigrid Communication Patterns: vertical (tree) & horiz.

Page 5: Adams_SIAMCSE15

FMG goes back to coarse grid after each new level

Multigrid Communication Patterns: vertical (tree) & horiz.

Page 6: Adams_SIAMCSE15

Nearest-neighbor intra-grid comm.

Inter-grid tree++ comm.

Back down – Two level FMG

Multigrid Communication Patterns: vertical (tree) & horiz.

Page 7: Adams_SIAMCSE15

Nearest-neighbor intra-grid comm.

Inter-grid tree++ comm.

refine & populate procs

Multigrid Communication Patterns: vertical (tree) & horiz.

Page 8: Adams_SIAMCSE15

Nearest-neighbor intra-grid comm.

Inter-grid tree++ comm.

Populate & refine

Fully populated – continue refinement

Multigrid Communication Patterns: vertical (tree) & horiz.

Page 9: Adams_SIAMCSE15

Nearest-neighbor intra-grid comm.

Inter-grid tree+ comm.

Populate & refine

Multigrid Communication Patterns: vertical (tree) & horiz.

Page 10: Adams_SIAMCSE15

SR removes horizontal communication at some scale

Nearest-neighbor intra-grid comm.

Inter-grid tree+ comm.

Populate & refine

Segmental refinement removes communication on the finest levels

Page 11: Adams_SIAMCSE15

SR uses a conventional FMG solver as the “coarse” grid solver

Buffer cells are added to finer grids & are not updated

Error decays

Experimentally find an adequate buffer schedule for an acceptable level of accuracy

Segmental refinement technique – Brandt & Diskin

O(1)

O(log N)

Page 12: Adams_SIAMCSE15

Use a linear buffer schedule: number of buffer cells J on level i: J = A + B(K – i), i > 0

• K SR grids, i = 0 transition level

• A & B independent integer parameters ≥ 0

• Only few (A) buffer cells on fine grid, more on coarse grids
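The linear buffer schedule above can be sketched in a few lines of Python. This is only an illustration of the formula J = A + B(K – i); the function names `buffer_cells` and `buffer_schedule` and the sample values of A, B, and K are assumptions made for the example, not taken from the srgmg code.

```python
def buffer_cells(level, A, B, K):
    """Buffer cells J on SR level `level`, for 1 <= level <= K.

    Implements the linear schedule J = A + B * (K - level) from the slide.
    Level K is the finest SR grid; level 0 (the transition level) is excluded.
    """
    if A < 0 or B < 0:
        raise ValueError("A and B must be non-negative integers")
    if not 1 <= level <= K:
        raise ValueError("level must satisfy 1 <= level <= K")
    return A + B * (K - level)


def buffer_schedule(A, B, K):
    """List of buffer cell counts for SR levels 1..K (coarsest SR grid first)."""
    return [buffer_cells(i, A, B, K) for i in range(1, K + 1)]


# Hypothetical parameters: A = 4 cells on the finest grid, B = 2 more per coarser level.
print(buffer_schedule(A=4, B=2, K=4))  # [10, 8, 6, 4]
```

With B = 0 every SR level gets the same A buffer cells; B > 0 puts more buffer cells on the coarser SR grids, as the last bullet describes.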

Model problem

• Cell-centered finite difference 27-point stencil Laplacian

• Cartesian isotropic grids

• Piecewise constant Restriction, linear Prolongation

• 2nd order Chebyshev smoother in V(2,2) cycles

• Full multigrid with linear FMG prolongation

• u = (x^4 - L_x x^2)(y^4 - L_y y^2)(z^4 - L_z z^2); L = (2, 1, 1)
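As a rough illustration of the grid transfer operators named in the model problem, here is a 1D cell-centered analogue of piecewise constant restriction and linear prolongation. The model problem is 3D; reducing it to 1D, using averaging for the restriction, and clamping at the domain boundary are all assumptions made for this sketch, and the function names are hypothetical.

```python
def restrict_piecewise_constant(fine):
    """Average each pair of fine cells into one coarse cell (1D, cell-centered)."""
    if len(fine) % 2 != 0:
        raise ValueError("fine grid must have an even number of cells")
    return [0.5 * (fine[2 * j] + fine[2 * j + 1]) for j in range(len(fine) // 2)]


def prolong_linear(coarse):
    """Linear interpolation from coarse cell centers to fine cell centers (1D).

    Each fine cell center lies a quarter of a coarse cell width from its parent's
    center, so the weights are 3/4 (parent) and 1/4 (nearest coarse neighbor).
    At the domain boundary the missing neighbor is replaced by the parent itself
    (an assumption of this sketch, not a statement about the authors' code).
    """
    n = len(coarse)
    fine = []
    for j in range(n):
        left = coarse[max(j - 1, 0)]
        right = coarse[min(j + 1, n - 1)]
        fine.append(0.75 * coarse[j] + 0.25 * left)
        fine.append(0.75 * coarse[j] + 0.25 * right)
    return fine


print(restrict_piecewise_constant([1.0, 3.0, 5.0, 7.0]))  # [2.0, 6.0]
print(prolong_linear([0.5, 1.5, 2.5, 3.5])[2:4])           # [1.25, 1.75]
```

In the interior, prolongation reproduces linear functions exactly: the coarse values 0.5, 1.5, 2.5 sampled at cell centers give fine values 1.25 and 1.75 at the fine cell centers.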

Segmental refinement parameters of interest:

• A: Number of buffer cells in finest grid

• B: Increase buffer cells per level

• N0: Size of transition level sub-domain

Accuracy w.r.t. segmental refinement parameters

Page 13: Adams_SIAMCSE15

Probe 5D parameter space (A, B, N0, K, esr)

Fine grid buffer size A = 4

Increment B    N0=16 (K=6)    N0=8 (K=5)    N0=4 (K=4)
0              5.7            2.6           1.2
1              2.0            1.4           1.0
2              1.3            1.1           NA
3              1.1            1.0           NA

Ratio (esr/econv) of SR error to conventional MG solver error

Define acceptable error as ~10% (esr/econv ~1.1)
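The acceptance criterion can be applied mechanically to the reported ratios. The sketch below copies the A = 4 values from this slide into a dictionary keyed by N0 and B, then finds the smallest increment B that meets esr/econv <= 1.1 for each N0. The names `ratios`, `K_for_N0`, and `smallest_acceptable_B` are hypothetical, and the NA entries are simply left out.

```python
# esr/econv for A = 4, keyed by N0 and then by increment B (values from the slide).
ratios = {
    16: {0: 5.7, 1: 2.0, 2: 1.3, 3: 1.1},
    8: {0: 2.6, 1: 1.4, 2: 1.1, 3: 1.0},
    4: {0: 1.2, 1: 1.0},  # B = 2 and B = 3 are reported as NA
}

# Number of SR grids K paired with each N0 on the slide.
K_for_N0 = {16: 6, 8: 5, 4: 4}


def smallest_acceptable_B(by_B, tol=1.1):
    """Smallest B whose error ratio is within `tol`, or None if none qualifies."""
    ok = [B for B, ratio in sorted(by_B.items()) if ratio <= tol]
    return ok[0] if ok else None


for N0 in sorted(ratios):
    B = smallest_acceptable_B(ratios[N0])
    print(f"N0={N0:2d} (K={K_for_N0[N0]}): smallest acceptable B = {B}")
```

For this data the smallest acceptable B is 1, 2, and 3 for K = 4, 5, and 6 respectively.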

Observe isosurfaces …

Implies N0 & B increase with K

Dependence on N0 not recognized in Brandt & Diskin analysis

• This needs corroboration & possibly amelioration

Implies new data model for asymptotic analysis …

More data in paper

Page 14: Adams_SIAMCSE15

Data movement complexity analysis – new data model

Machine model (that is proper “basis” for new DM)

• Q words (small patches) memory and processes on fine grid

• √Q memory partitions

• √Q words per memory partition

New DM: transition level fits on one memory partition
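The arithmetic of this machine model is simple enough to write down. The sketch below assumes Q is a perfect square so that √Q is an integer, and treats "fits on one memory partition" as a plain size comparison; both simplifications, and the names `machine_model` and `transition_level_fits`, are assumptions of this example.

```python
import math


def machine_model(Q):
    """Split Q words of memory into sqrt(Q) partitions of sqrt(Q) words each."""
    root = math.isqrt(Q)
    if root * root != Q:
        raise ValueError("this sketch assumes Q is a perfect square")
    return {"partitions": root, "words_per_partition": root}


def transition_level_fits(transition_words, Q):
    """True if a transition level of the given size fits on one memory partition."""
    return transition_words <= machine_model(Q)["words_per_partition"]


model = machine_model(4096)
print(model)                            # {'partitions': 64, 'words_per_partition': 64}
print(transition_level_fits(64, 4096))  # True
print(transition_level_fits(65, 4096))  # False
```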

Log(NK)/2 + 1

Log(NK)/2

Page 15: Adams_SIAMCSE15

Data movement complexity analysis

Complexity model (again proper “basis” for new DM):

• Near – intra-partition – communication

• Far – inter-partition – communication

• Horizontal communication (residual, etc.): cH

• Vertical communication (Restrict & Prolong): cV

Comm. type          Near (L2/8)      Far (L2/8)
Coarse grids        3(6cH + 2cV)     0
Conv. fine grids    6cH              6cH + 2cV
SR fine grids       6cH              0 + 2cV

SR removes fine grid communication

Unlike conventional communication avoiding, SR reduces bisection bandwidth. In 3D: N^2 -> N log2(N)
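To make the near/far comparison concrete, the entries of the communication table can be evaluated for given unit costs. The sketch below encodes the three rows as (near, far) pairs in terms of cH and cV. The unit costs used in the example are hypothetical, and the common (L2/8) factor in the column headers is omitted because it scales every entry equally.

```python
def comm_costs(cH, cV):
    """(near, far) communication cost per grid type, as given in the slide's table."""
    return {
        "coarse grids": (3 * (6 * cH + 2 * cV), 0),
        "conv. fine grids": (6 * cH, 6 * cH + 2 * cV),
        "SR fine grids": (6 * cH, 0 + 2 * cV),
    }


def far_savings(cH, cV):
    """Far communication removed on fine grids by SR relative to conventional MG."""
    costs = comm_costs(cH, cV)
    return costs["conv. fine grids"][1] - costs["SR fine grids"][1]


# Hypothetical unit costs: one unit each for horizontal and vertical communication.
for name, (near, far) in comm_costs(cH=1, cV=1).items():
    print(f"{name:18s} near={near:3d} far={far:3d}")
print("far savings:", far_savings(cH=1, cV=1))  # 6
```

The saving is exactly the 6cH horizontal term: in this model SR fine grids keep only the vertical (restrict and prolong) far communication.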

Page 16: Adams_SIAMCSE15

Conclusion: SR severs horizontal comm. at some scale

log NK1-K2

K2= O(1)

Designed two SR data models

• Each removes horizontal communication at a level in the memory hierarchy

Multiple SR models combined address all levels of interest …

Future work:

• Corroborate: try H.O. prolongation, vertex-centered FE

• Extend to more apps

• New SR data models

• ...

Page 17: Adams_SIAMCSE15

Weak scaling: 2 - 8K sockets on Edison (Cray XC30)

Some indication of SR catching up (4 SR levels)

O(1) SR levels, not log(N)

Page 18: Adams_SIAMCSE15

Thank you

(SISC) paper, code, data, parse & run scripts:

https://bitbucket.org/madams/srgmg