Effects of contention on message latencies in large supercomputers Abhinav S Bhatele and Laxmikant V...

22
Effects of contention on message latencies in large supercomputers Abhinav S Bhatele and Laxmikant V Kale Parallel Programming Laboratory, UIUC IS TOPOLOGY IMPORTANT AGAIN?

description

The Mapping Problem Given a set of communicating parallel “entities”, map them onto physical processors Entities – COMM_WORLD ranks in case of an MPI program – Objects in case of a Charm++ program Aim – Balance load – Minimize communication traffic April 16th, 20093Abhinav Bhatele

Transcript of Effects of contention on message latencies in large supercomputers Abhinav S Bhatele and Laxmikant V...

Page 1: Effects of contention on message latencies in large supercomputers Abhinav S Bhatele and Laxmikant V Kale Parallel Programming Laboratory, UIUC IS TOPOLOGY.

Effects of contention on message latencies in large supercomputers

Abhinav S Bhatele and Laxmikant V KaleParallel Programming Laboratory, UIUC

IS TOPOLOGY IMPORTANT AGAIN?

Page 2: Effects of contention on message latencies in large supercomputers Abhinav S Bhatele and Laxmikant V Kale Parallel Programming Laboratory, UIUC IS TOPOLOGY.

Abhinav Bhatele 2

Outline

April 16th, 2009

Why should we consider topology aware mappingfor optimizing performance?

Demonstrate the effects of contention onmessage latencies through simple MPI benchmarks

Obtaining topology information: TopoManager APICase Study: OpenAtom

Page 3: Effects of contention on message latencies in large supercomputers Abhinav S Bhatele and Laxmikant V Kale Parallel Programming Laboratory, UIUC IS TOPOLOGY.

Abhinav Bhatele 3

The Mapping Problem

• Given a set of communicating parallel “entities”, map them onto physical processors

• Entities– COMM_WORLD ranks in case of an MPI program– Objects in case of a Charm++ program

• Aim– Balance load– Minimize communication traffic

April 16th, 2009

Page 4: Effects of contention on message latencies in large supercomputers Abhinav S Bhatele and Laxmikant V Kale Parallel Programming Laboratory, UIUC IS TOPOLOGY.

Abhinav Bhatele 4

Target Machines• 3D torus/mesh interconnects• Blue Gene/P at ANL:

– 40,960 nodes, torus - 32 x 32 x 40• XT4 (Jaguar) at ORNL:

– 8,064 nodes, torus - 21 x 16 x 24

April 16th, 2009

• Other interconnects– Fat-tree– Kautz graph: SiCortex

Page 5: Effects of contention on message latencies in large supercomputers Abhinav S Bhatele and Laxmikant V Kale Parallel Programming Laboratory, UIUC IS TOPOLOGY.

Abhinav Bhatele 5

Motivation• Consider a 3D mesh/torus interconnect• Message latencies can be modeled by (Lf/B) x D + L/BLf = length of flit, B = bandwidth,D = hops, L = message size

When (Lf * D) << L, first term is negligible

April 16th, 2009

But in presence of contention …

Page 6: Effects of contention on message latencies in large supercomputers Abhinav S Bhatele and Laxmikant V Kale Parallel Programming Laboratory, UIUC IS TOPOLOGY.

Abhinav Bhatele 6

MPI Benchmarks†

• Quantification of message latencies and dependence on hops– No sharing of links (no contention)– Sharing of links (with contention)

April 16th, 2009

† http://charm.cs.uiuc.edu/~bhatele/phd/contention.htm

Page 7: Effects of contention on message latencies in large supercomputers Abhinav S Bhatele and Laxmikant V Kale Parallel Programming Laboratory, UIUC IS TOPOLOGY.

Abhinav Bhatele 7

WOCON: No contention

• A master rank sends messages to all other ranks, one at a time (with replies)

April 16th, 2009

Page 8: Effects of contention on message latencies in large supercomputers Abhinav S Bhatele and Laxmikant V Kale Parallel Programming Laboratory, UIUC IS TOPOLOGY.

Abhinav Bhatele 8

WOCON: Results

April 16th, 2009

ANL Blue Gene/P PSC XT3

(Lf/B) x D + L/B

Page 9: Effects of contention on message latencies in large supercomputers Abhinav S Bhatele and Laxmikant V Kale Parallel Programming Laboratory, UIUC IS TOPOLOGY.

Abhinav Bhatele 9

WICON: With Contention

• Divide all ranks into pairs and everyone sends to their respective partner simultaneously

April 16th, 2009

Near Neighbor: NNRandom: RND

Page 10: Effects of contention on message latencies in large supercomputers Abhinav S Bhatele and Laxmikant V Kale Parallel Programming Laboratory, UIUC IS TOPOLOGY.

Abhinav Bhatele 10

WICON: Results

April 16th, 2009

ANL Blue Gene/P PSC XT3

Page 11: Effects of contention on message latencies in large supercomputers Abhinav S Bhatele and Laxmikant V Kale Parallel Programming Laboratory, UIUC IS TOPOLOGY.

Abhinav Bhatele 11

Message Latencies and Hops

• Pair each rank with a partner which is ‘n’ hops away

April 16th, 2009

Page 12: Effects of contention on message latencies in large supercomputers Abhinav S Bhatele and Laxmikant V Kale Parallel Programming Laboratory, UIUC IS TOPOLOGY.

Abhinav Bhatele 12April 16th, 2009

Page 13: Effects of contention on message latencies in large supercomputers Abhinav S Bhatele and Laxmikant V Kale Parallel Programming Laboratory, UIUC IS TOPOLOGY.

Abhinav Bhatele 13

Results

April 16th, 2009

8 times

Page 14: Effects of contention on message latencies in large supercomputers Abhinav S Bhatele and Laxmikant V Kale Parallel Programming Laboratory, UIUC IS TOPOLOGY.

Abhinav Bhatele 14April 16th, 2009

Page 15: Effects of contention on message latencies in large supercomputers Abhinav S Bhatele and Laxmikant V Kale Parallel Programming Laboratory, UIUC IS TOPOLOGY.

Abhinav Bhatele 15

Topology Manager API†

• The application needs information such as– Dimensions of the partition– Rank to physical co-ordinates and vice-versa

• TopoManager: a uniform API– On BG/L and BG/P: provides a wrapper for system calls– On XT3 and XT4, there are no such system calls

• Help from PSC and ORNL staff to discovery topology at runtime– Provides a clean and uniform interface to the application

April 16th, 2009

† http://charm.cs.uiuc.edu/~bhatele/phd/topomgr.htm

Page 16: Effects of contention on message latencies in large supercomputers Abhinav S Bhatele and Laxmikant V Kale Parallel Programming Laboratory, UIUC IS TOPOLOGY.

Abhinav Bhatele 16

OpenAtom

• Ab-Initio Molecular Dynamics code• Communication is static and structured• Challenge: Multiple groups of objects with

conflicting communication patterns

April 16th, 2009

Page 17: Effects of contention on message latencies in large supercomputers Abhinav S Bhatele and Laxmikant V Kale Parallel Programming Laboratory, UIUC IS TOPOLOGY.

Abhinav Bhatele 17

Parallelization using Charm++

April 16th, 2009

[10] Eric Bohm, Glenn J. Martyna, Abhinav Bhatele, Sameer Kumar, Laxmikant V. Kale, John A. Gunnels, and Mark E. Tuckerman. Fine Grained Parallelization of the Car-Parrinello ab initio MD Method on Blue Gene/L. IBM J. of R. and D.: Applications of Massively Parallel Systems, 52(1/2):159-174, 2008.

Page 18: Effects of contention on message latencies in large supercomputers Abhinav S Bhatele and Laxmikant V Kale Parallel Programming Laboratory, UIUC IS TOPOLOGY.

Abhinav Bhatele 18

Topology Mapping of Chare Arrays

April 16th, 2009

State-wisecommunication

Plane-wisecommunication

Joint work with Eric J. Bohm

Page 19: Effects of contention on message latencies in large supercomputers Abhinav S Bhatele and Laxmikant V Kale Parallel Programming Laboratory, UIUC IS TOPOLOGY.

Abhinav Bhatele 19

Results on Blue Gene/P (ANL)

1024 2048 4096 81920

2

4

6

8

10

12

w256 Default BG/Pw256 Topology BG/P

No. of cores

Tim

e pe

r ste

p (s

ecs)

April 16th, 2009

Page 20: Effects of contention on message latencies in large supercomputers Abhinav S Bhatele and Laxmikant V Kale Parallel Programming Laboratory, UIUC IS TOPOLOGY.

Abhinav Bhatele 20

Results on XT3 (BigBen@PSC)

512 1024 20480

1

2

3

4

5

6

7

8

w256 Default XT3w256 Topology XT3

No. of cores

Tim

e pe

r ste

p (s

ecs)

April 16th, 2009

Page 21: Effects of contention on message latencies in large supercomputers Abhinav S Bhatele and Laxmikant V Kale Parallel Programming Laboratory, UIUC IS TOPOLOGY.

Abhinav Bhatele 21

Summary

April 16th, 2009

1. Topology is important again2. Even on fast interconnects such as Cray

3. In presence of contention, bandwidth occupancy effects message latencies significantly4. Increases with the number of hops each message travels

5. Topology Manager API: A uniform API for IBM and Cray machines6. Case Studies: OpenAtom, NAMD, Stencil7. Eventually, an automatic mapping framework

Page 22: Effects of contention on message latencies in large supercomputers Abhinav S Bhatele and Laxmikant V Kale Parallel Programming Laboratory, UIUC IS TOPOLOGY.

Abhinav Bhatele 22

Acknowledgements:1. Argonne National Laboratory: Pete Beckman, Tisha Stacey2. Pittsburgh Supercomputing Center: Chad Vizino, Shawn Brown3. Oak Ridge National Laboratory: Patrick Worley, Donald Frederick4. IBM: Robert Walkup, Sameer Kumar5. Cray: Larry Kaplan6. SiCortex: Matt Reilly

References:1. Abhinav Bhatele, Laxmikant V. Kale, Dynamic Topology Aware Load Balancing Algorithms for MD Applications, To appear in Proceedings of International Conference on Supercomputing, 20092. Abhinav Bhatele, Laxmikant V. Kale, An Evaluative Study on the Effect of Contention on Message Latencies in Large Supercomputers, To appear in Proceedings of Workshop on Large-Scale Parallel Processing (IPDPS), 20093. Abhinav Bhatele, Laxmikant V. Kale, Benefits of Topology-aware Mapping for Mesh Topologies, LSPP special issue of Parallel Processing Letters, 2008

April 16th, 2009