Invasive Compute Balancing for Applications with Hybrid Parallelization … · 2013-11-21 ·...

Technische Universität München

SBAC-PAD’2013

Invasive Compute Balancing for Applications with Hybrid Parallelization

M. Schreiber, C. Riesinger, T. Neckel, H.-J. Bungartz

Technische Universität München

October 25, 2013

Christoph Riesinger: Invasive Compute Balancing for Applications with Hybrid Parallelization

SBAC-PAD’2013, October 25, 2013 1

Topics

Motivation

Methodology & Background
• Hybrid Parallelization
• Compute Migration
• Invasive Computing

Application: Tsunami Simulation

Results
• Artificial Workload
• Tsunami Simulation

Conclusion

Motivation

HPC simulations with Dynamic Adaptive Mesh Refinement (DAMR)

• Use high-resolution grids in feature-rich areas
• Save computations in feature-poor areas
• Efficient parallelization of DAMR is challenging

Compute imbalances

• Changing number of cells per compute unit over simulation time
• Several approaches available to tackle imbalances

Goal

Maximize efficiency of DAMR simulations

Hybrid Parallelization

Exploit the best of both worlds: Distributed and shared memory parallelization

Distributed memory parallelization

• Communication with messages over buffers
• Mandatory to program big clusters
• Possible overhead due to data migration

Shared memory parallelization

• Same memory accessible for all threads
• Trend towards thousands of cores (Xeon Phi, GPUs, etc.)

+                        | -
common address space     | access conflicts for shared resources
cache coherency          | false sharing for caches
avoids data migration    | management tables

Hybrid Parallelization in our context

• We use hybrid parallelization on cache-coherent memory systems
• We start a constant number of MPI ranks
• We start a constant number of threads (e.g. one per core)

[Figure: physical layer with cores on a cache-coherent shared-memory bus system; logical layer with worker threads grouped into MPI rank 0 and MPI rank 1 (logical separation of applications)]

Compute Migration

To overcome the issue of load imbalances due to dynamic adaptive grids, we use compute migration instead of data migration.

• Instead of copying data between MPI ranks, we assign threads to MPI ranks ⇒ avoids copy operations
• The number of threads per MPI rank is not fixed but variable over runtime ⇒ satisfies dynamic demands
• Number of threads per MPI rank is relatively small ⇒ lower runtime overhead
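
The idea of moving compute instead of data can be sketched in a few lines: a fixed pool of cores is redistributed among the MPI ranks according to their current workload, while the grid data stays where it is. This is only a minimal sketch under assumptions. The function name `distribute_cores`, the proportional rule and the one-core minimum per rank are illustrative choices and are not taken from the talk.

```python
def distribute_cores(workloads, total_cores):
    """Assign cores to ranks roughly in proportion to their workload.

    Hypothetical helper: every rank keeps at least one core; the spare
    cores are handed out by largest-remainder rounding.
    """
    n_ranks = len(workloads)
    if total_cores < n_ranks:
        raise ValueError("need at least one core per rank")
    total_work = sum(workloads)
    spare = total_cores - n_ranks
    if total_work == 0:
        # No work anywhere: spread the spare cores evenly.
        shares = [spare / n_ranks] * n_ranks
    else:
        shares = [spare * w / total_work for w in workloads]
    cores = [1 + int(s) for s in shares]
    leftover = total_cores - sum(cores)
    # Hand the cores lost to rounding to the largest fractional parts.
    order = sorted(range(n_ranks),
                   key=lambda r: shares[r] - int(shares[r]),
                   reverse=True)
    for r in order[:leftover]:
        cores[r] += 1
    return cores


# Example: four ranks, the last one holds most of the refined cells.
print(distribute_cores([1, 1, 1, 5], 12))
```

When the workload shifts (for example because the refined region moves into another rank's part of the domain), calling the function again with the new cell counts yields a new thread count per rank without copying any cell data.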

Invasive Computing

• Compute migration realized with Invasive Computing paradigms
• Processes can specify varying resource requirements during runtime
• Requirements are specified by the application developer

Interfaces

• invade: Resources are exclusively requested depending on particular application-specific requirements
• infect: After invading specific resources, the program uses them for certain computations
• retreat: Release computation resources
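
The three interfaces can be illustrated with a toy, single-process model. The classes `ResourcePool` and `Claim` and all method signatures below are hypothetical and only mirror the roles named on the slide; they are not the API of an actual invasive computing runtime.

```python
class ResourcePool:
    """Toy stand-in for a resource manager that owns a fixed set of cores."""

    def __init__(self, total_cores):
        self.free_cores = total_cores


class Claim:
    """Hypothetical handle for the cores one application currently holds."""

    def __init__(self, pool):
        self.pool = pool
        self.cores = 0

    def invade(self, min_cores, max_cores):
        """Exclusively request between min_cores and max_cores cores."""
        if self.pool.free_cores < min_cores:
            return False  # request cannot be satisfied right now
        granted = min(max_cores, self.pool.free_cores)
        self.pool.free_cores -= granted
        self.cores += granted
        return True

    def infect(self, kernel, work_items):
        """Apply kernel to the work items, split across the claimed cores."""
        if self.cores == 0:
            raise RuntimeError("invade before infect")
        # Round-robin chunking; a real system would run one worker per core.
        chunks = [work_items[i::self.cores] for i in range(self.cores)]
        return [[kernel(item) for item in chunk] for chunk in chunks]

    def retreat(self):
        """Release all claimed cores back to the pool."""
        self.pool.free_cores += self.cores
        self.cores = 0


pool = ResourcePool(total_cores=4)
claim = Claim(pool)
if claim.invade(min_cores=1, max_cores=3):
    results = claim.infect(lambda x: x * x, [1, 2, 3, 4, 5, 6])
    claim.retreat()
```

The sketch executes the chunks sequentially; it only shows the order of calls and the exclusive ownership of cores between `invade` and `retreat`.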

Invasive Computing: Resource Manager

Properties

• Resources are dynamically assigned during runtime to meet dynamically changing demands
• Global decisions are based on performance graphs
• Performance graphs have to be provided by the application
• Realized as a separate thread

Advantages

• Finds the global optimum of computing resource utilization
• Allows running different applications with varying load
• Avoids cache thrashing due to core multiplexing
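
How a resource manager might turn performance graphs into a global decision can be sketched as follows. The talk does not spell out the optimization procedure, so everything here is an assumption: a performance graph is represented as a list whose entry `n` is the application's throughput on `n` cores, and cores are handed out greedily by largest marginal gain. That greedy rule reaches the global optimum only when each graph shows diminishing returns (is concave).

```python
def assign_cores(performance_graphs, total_cores):
    """Greedily hand out cores by largest marginal throughput gain.

    performance_graphs[a][n] is the (assumed) throughput of application a
    when it runs on n cores; index 0 must be present.
    """
    assigned = [0] * len(performance_graphs)
    for _ in range(total_cores):
        best_app, best_gain = None, 0.0
        for app, graph in enumerate(performance_graphs):
            n = assigned[app]
            if n + 1 >= len(graph):
                continue  # graph has no data beyond n cores
            gain = graph[n + 1] - graph[n]
            if gain > best_gain:
                best_app, best_gain = app, gain
        if best_app is None:
            break  # no application benefits from another core
        assigned[best_app] += 1
    return assigned


# Two applications with different scalability, four cores to distribute.
graph_a = [0, 10, 18, 24, 28]
graph_b = [0, 7, 12, 15, 16]
print(assign_cores([graph_a, graph_b], 4))
```

Cores that would not improve any application's throughput are left unassigned, which matches the idea of avoiding pointless core multiplexing.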

Application: Tsunami Simulation with SWE 1/3

Governing equations: Shallow Water Equations (SWE)

Homogeneous form given by conservation law of hyperbolic equations:

\[
\frac{\partial U(x, y, t)}{\partial t}
+ \frac{\partial G(U(x, y, t))}{\partial x}
+ \frac{\partial H(U(x, y, t))}{\partial y} = 0
\]

with

\[
U = (h, hu, hv)^T, \quad
G(U) = \begin{pmatrix} hu \\ hu^2 + \frac{1}{2} g h^2 \\ huv \end{pmatrix}, \quad
H(U) = \begin{pmatrix} hv \\ huv \\ hv^2 + \frac{1}{2} g h^2 \end{pmatrix}
\]

h: Height of water relative to ground sea level
u: Velocity in x-direction
v: Velocity in y-direction
U: Conserved quantities
G, H: Flux functions describe the change of conserved quantities over time
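
The two flux functions translate directly into code. The following is a minimal sketch that evaluates G(U) and H(U) for a single state U = (h, hu, hv). The function names and the default value of g are assumptions, and the state is assumed to be wet (h > 0).

```python
G_ACCEL = 9.81  # gravitational acceleration in m/s^2 (assumed value)


def flux_g(state, g=G_ACCEL):
    """Flux in x-direction: (hu, hu^2 + g h^2 / 2, huv)."""
    h, hu, hv = state
    u = hu / h  # requires h > 0
    return (hu, hu * u + 0.5 * g * h * h, hu * hv / h)


def flux_h(state, g=G_ACCEL):
    """Flux in y-direction: (hv, huv, hv^2 + g h^2 / 2)."""
    h, hu, hv = state
    v = hv / h  # requires h > 0
    return (hv, hu * hv / h, hv * v + 0.5 * g * h * h)


# Water column of height 2 moving with velocity 1 in x-direction.
state = (2.0, 2.0, 0.0)
print(flux_g(state), flux_h(state))
```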

Application: Tsunami Simulation 2/3

Step 1: Weak form

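By multiplying the equation with a test function ϕ and applying the divergence theorem we get the weak form:

\[
\underbrace{\int_T U_t \, \varphi_i}_{\text{mass term}}
- \underbrace{\int_T G(U) \cdot \frac{\partial \varphi_i}{\partial x} + H(U) \cdot \frac{\partial \varphi_i}{\partial y}}_{\text{stiffness term}}
+ \underbrace{\oint_T F(U) \, \varphi_i \cdot \vec{n}}_{\text{flux term}}
= 0
\]

ϕ_i: Test function
T: Triangle grid cell
\vec{n}(x, y): Outward pointing normal of the grid cell

Step 2: Approximation

\[
U(x, y, t) \approx \tilde{U}(x, y, t) = \sum_i U_i(t) \, \varphi_i(x, y)
\]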

Application: Tsunami Simulation 3/3

Step 3: Rearrangement

• G and H evaluated node-wise with Lagrange reconstruction of a polynomial
• Explicit Euler time stepping
• Rearrange to do computations based on matrix/matrix and vector/matrix operations

\[
U_i^{t+\Delta t} = U_i^{t} + \Delta t \, M^{-1} \left( S_x U(t) + S_y U(t) + F\left(U^-(t), U^+(t)\right) \right)
\]

F: Flux via the boundaries (e.g. Lax-Friedrichs flux)
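
The explicit Euler update above maps onto a handful of matrix/vector products. The sketch below assumes small dense matrices stored as nested lists and a boundary flux that has already been evaluated into a vector. The names `matvec` and `euler_step` are illustrative, and a production code would use optimized linear algebra kernels instead.

```python
def matvec(matrix, vector):
    """Dense matrix-vector product on nested lists."""
    return [sum(a * x for a, x in zip(row, vector)) for row in matrix]


def euler_step(u, dt, m_inv, s_x, s_y, flux):
    """One explicit Euler step: u + dt * M^{-1} (S_x u + S_y u + flux).

    u     : degrees of freedom of one cell at time t
    m_inv : inverse mass matrix
    s_x   : stiffness matrix for the x-direction
    s_y   : stiffness matrix for the y-direction
    flux  : precomputed boundary flux vector (assumed given)
    """
    rhs = [sx + sy + f
           for sx, sy, f in zip(matvec(s_x, u), matvec(s_y, u), flux)]
    update = matvec(m_inv, rhs)
    return [ui + dt * ri for ui, ri in zip(u, update)]


identity = [[1.0, 0.0], [0.0, 1.0]]
zero = [[0.0, 0.0], [0.0, 0.0]]
s_x = [[0.0, 1.0], [0.0, 0.0]]
print(euler_step([1.0, 2.0], 0.5, identity, s_x, zero, [1.0, -1.0]))
```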

Results: Artificial Workload

System

• 4× Intel Xeon E7-4850 @ 2.00 GHz
• 4× 10 physical cores plus hyper-threading
• 256 GB memory accessible by all cores
• Threading implemented using TBB

Results & interpretation

[Figure: invasive run time normalized by non-invasive run time (y-axis, 0 to 10) over problem size (x-axis), with the break-even point marked. Caption: Invasive vs. non-invasive scenario with different workload sizes]

⇒ With big problem sizes, Invasive Computing using compute balancing outperforms an implementation with equally distributed work to all ranks

⇒ With a problem size of 131072 (triangles), the simulation run-time was improved by 53%

Results: Tsunami Simulation 1/2

Setup

• Initial refinement depth of 14, thus creating (2 × 2)^14 grid cells
• The square domain is split along the diagonals and one quarter is assigned to one MPI rank during the whole simulation time

⇒ Due to the propagating wave and the resulting grid refinement, load imbalances occur

Results

[Figure: stacked core-to-rank scheduling (y-axis, 0 to 40) over real-time (x-axis) for Rank 0, Rank 1, Rank 2 and Rank 3]

Results: Tsunami Simulation 2/2

[Figure: simulation time in seconds (y-axis, 0 to 300) for the cores / MPI ranks configurations 40 / 2, 40 / 4, 20 / 2 and 20 / 4, comparing non-invasive and invasive runs]

Interpretation

• Computational efficiency mostly improved by invasive compute migration
• The higher the number of ranks, the higher the potential improvement

Conclusion

• Compute migration as alternative solution for load imbalances (which can, e.g., result from dynamic adaptive grids)
• Extension of the invasive paradigm to support compute balancing
• Clear interfaces (invade, infect, retreat) for application developers to dynamically manage resources
• Explicit scaling data for the resource manager to find the global optimum
• Also applicable with independent applications running on the same system
• Robust optimizations in performance for simulations executed with hybrid parallelization on shared memory systems!

Final slide

This work was supported by the German Research Foundation (DFG) as part of the Transregional Collaborative Research Centre "Invasive Computing" (SFB/TR 89).
