Benefits of sampling in tracefiles

Harald Servat

Program Development for Extreme-Scale Computing

May 3rd, 2010

Outline

• Instrumentation and sampling
• Folding
• Summarized traces
• Some results
• Current work

Instrumentation

• Performance tools based on instrumentation
• Granularity of the results depends on the application structure
• Data gathered includes:
    • Performance counters, callstack, message size…

Sampling

• Sampling reaches any application point at a regular interval
• Easily tunable frequency
• Gather performance counters and callstack

Main objective

• Combine both mechanisms
    • Deeper performance details
    • Using PAPI_overflow(..)
• ... what about the frequency trade-off?
    • Not so high that sampling disrupts the performance data
    • Not so low that too little useful information is gathered
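
A rough way to reason about this trade-off is to estimate, for each candidate frequency, the fraction of run time spent taking samples and the number of samples that land in one execution of a region. The sketch below is illustrative only: the per-sample cost, the region duration and the function names are assumptions, not figures from the talk.

```python
def sampling_overhead(freq_hz, cost_per_sample_s):
    """Fraction of wall time spent taking samples (assumes a constant cost)."""
    return freq_hz * cost_per_sample_s


def samples_per_region(freq_hz, region_duration_s):
    """Expected number of samples landing in one execution of a region."""
    return freq_hz * region_duration_s


# Hypothetical numbers: 10 microseconds per sample, a 50 ms region.
COST_S = 10e-6
REGION_S = 0.050

for freq in (5, 20, 200, 20000):
    print(freq, sampling_overhead(freq, COST_S), samples_per_region(freq, REGION_S))
```

Under these assumed numbers, low frequencies leave most executions of the region without a single sample, while very high frequencies spend a visible share of the run inside the sampling handler.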

Work done: Folding

Harald Servat, Germán Llort, Judit Giménez, Jesús Labarta: Detailed performance analysis using coarse grain sampling. PROPER, 2009.

• Objective: get detailed metrics with few samples
    • Benefits from both high and low frequencies!
• Take advantage of stationary behavior of scientific applications
• Build synthetic region from scattered samples
• Reintroduce into the tracefile at chosen ratio

Folding: Moving samples

Main idea: Move samples to the target iteration preserving their original relative time.

Steps
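
The main idea above can be sketched as follows, assuming each iteration is known by its start and end time and each sample carries a timestamp and a value. The function name, the data layout and the use of a normalized position inside the iteration are assumptions made for illustration; the slides do not describe the actual implementation.

```python
from bisect import bisect_right


def fold_samples(iteration_bounds, samples, target_index=0):
    """Map every sample into the target iteration, keeping its relative time.

    iteration_bounds: sorted list of (start, end) times, one per iteration.
    samples: list of (timestamp, value) pairs taken anywhere in the run.
    Returns a time-sorted list of (folded_timestamp, value).
    """
    starts = [start for start, _ in iteration_bounds]
    t_start, t_end = iteration_bounds[target_index]
    folded = []
    for timestamp, value in samples:
        i = bisect_right(starts, timestamp) - 1
        if i < 0:
            continue  # sample precedes the first iteration
        start, end = iteration_bounds[i]
        if timestamp >= end:
            continue  # sample falls between iterations
        relative = (timestamp - start) / (end - start)  # position in [0, 1)
        folded.append((t_start + relative * (t_end - t_start), value))
    folded.sort()
    return folded


# Three iterations of slightly different length, one sample in each.
bounds = [(0.0, 10.0), (10.0, 20.0), (20.0, 32.0)]
samples = [(2.0, 100), (15.0, 260), (29.0, 390)]
print(fold_samples(bounds, samples))
```

Each iteration contributes only one sample here, yet the target iteration ends up with three points spread along its duration, which is the effect folding relies on.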

Folding: Interpolation

Evolution of completed instructions for routine copy_faces of NAS MPI BT class B

No instrumentation points within the routine, but we got details

Red crosses represent the folded samples and show the completed instructions from the start of the routine

Green line is the curve fitting of the folded samples and is used to reintroduce the values into the tracefile

Blue line is the derivative of the curve fitting
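
To make the relation between the three curves concrete, the sketch below fits folded samples and then differentiates the fit. The slides do not say which curve-fitting method is used, so plain averaging over equal-width bins stands in for it here; the synthetic data, the function names and the bin count are likewise assumptions.

```python
def fit_by_bins(folded, n_bins):
    """Crude stand-in for curve fitting: average the samples in equal-width bins.

    folded: list of (relative_time in [0, 1], instructions_since_region_start).
    Returns (bin_centre, mean_value) points for the non-empty bins.
    """
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for t, value in folded:
        b = min(int(t * n_bins), n_bins - 1)
        sums[b] += value
        counts[b] += 1
    return [((b + 0.5) / n_bins, sums[b] / counts[b])
            for b in range(n_bins) if counts[b]]


def derivative(points):
    """Slope between consecutive fitted points: the rate of the metric."""
    return [((t0 + t1) / 2, (v1 - v0) / (t1 - t0))
            for (t0, v0), (t1, v1) in zip(points, points[1:])]


# Synthetic folded samples: the region runs fast in its first half, slow after.
folded = [(i / 40, 2000 * min(i / 40, 0.5) + 500 * max(i / 40 - 0.5, 0.0))
          for i in range(40)]
fitted = fit_by_bins(folded, n_bins=8)
for t, rate in derivative(fitted):
    print(f"{t:.3f} {rate:.0f}")
```

The fitted points play the role of the green line and their slope the role of the blue line: the derivative of accumulated instructions is the instruction rate inside the routine, even though the routine contains no instrumentation points.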

Folding areas

Folding is applied to delimited regions

• Previously instrumented
    • User function
    • Iteration
• Automatically obtained from the gathered results
    • Clusters of computation bursts
      Juan González, Judit Giménez, Jesús Labarta: Automatic detection of parallel applications computation phases. IPDPS 2009.
    • Delimited time regions
      Marc Casas, Rosa M. Badia, Jesús Labarta: Automatic Structure Extraction from MPI Applications Tracefiles. Euro-Par 2007.

Impact of the sampling frequency

• The more samples are folded, the more detailed the results
    • Longer executions
    • Increase frequency
    • Reach stability?

Example:

• NAS BT class B, routine copy_faces
• Showing from 10 to 200 iterations
• 20 samples per second @ SGI Altix

Impact of the sampling frequency

• Choosing a sampling frequency is important
• Sampling frequency can couple with application frequency
• Choose frequencies based on prime factors
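
The coupling effect can be illustrated with integer arithmetic. The sketch assumes a perfectly periodic application and reads "based on prime factors" as choosing a sampling period that shares no common factor with the iteration period; both the reading and the numbers are assumptions made for illustration.

```python
from math import gcd


def distinct_phases(iteration_period_us, sampling_period_us, n_samples):
    """Count the distinct positions inside an iteration hit by periodic samples."""
    return len({(k * sampling_period_us) % iteration_period_us
                for k in range(n_samples)})


ITERATION_US = 100_000  # hypothetical 100 ms iteration

for period in (50_000, 25_000, 49_999):
    print(period, gcd(period, ITERATION_US),
          distinct_phases(ITERATION_US, period, n_samples=1000))
```

When the sampling period divides the iteration period, every sample folds onto the same few positions and adds no new detail. A period with no common factor keeps landing on fresh positions, so the folded region fills in as more iterations are sampled.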

Outline

• Instrumentation and sampling
• Folding
• Summarized traces
• Some results
• Current work

Dealing with large scale traces

Jesús Labarta, Judit Giménez, Eloy Martínez, Pedro González, Harald Servat, Germán Llort, Xavier Aguilar: Scalability of tracing and visualization tools, PARCO 2005.

Application’s behavior can be divided into:

• Communication phases
• Intensive computation phases

Instrumentation library that identifies relevant computation phases

Dealing with large scale traces

Information emitted at phase change

• Punctual (callstack)
• Aggregated
    • Hardware counters
    • Software counters
        • Number of point-to-point and collective operations
        • Number of bytes transferred
        • Time in MPI
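
The aggregation of software counters at phase changes can be sketched as below. This is not the instrumentation library's code: the `MpiCall` record, the `summarize` function and the burst-length threshold are hypothetical, and hardware counters and callstacks are left out to keep the example short.

```python
from dataclasses import dataclass


@dataclass
class MpiCall:
    start: float      # seconds
    end: float        # seconds
    nbytes: int       # bytes transferred by this call
    collective: bool  # True for collectives, False for point-to-point


def summarize(calls, min_burst_s):
    """Emit one record per long computation burst, with counters aggregated
    over all the MPI activity seen since the previous record."""
    records = []
    counters = {"p2p": 0, "collective": 0, "bytes": 0, "mpi_time": 0.0}
    previous_end = None
    for call in calls:
        if previous_end is not None and call.start - previous_end >= min_burst_s:
            records.append({"burst": (previous_end, call.start), **counters})
            counters = {"p2p": 0, "collective": 0, "bytes": 0, "mpi_time": 0.0}
        counters["collective" if call.collective else "p2p"] += 1
        counters["bytes"] += call.nbytes
        counters["mpi_time"] += call.end - call.start
        previous_end = call.end
    return records, counters  # trailing counters belong to the last phase


calls = [MpiCall(0.0, 0.1, 800, False), MpiCall(0.11, 0.2, 800, False),
         MpiCall(1.2, 1.5, 64, True), MpiCall(1.51, 1.6, 800, False)]
records, tail = summarize(calls, min_burst_s=0.5)
print(records, tail)
```

Four MPI calls collapse into a single record for the one long computation burst plus a trailing set of counters, which is where the trace size reduction comes from.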

Example

PEPC 16384 tasks on Jaguar

Duration of the computation bursts

# of MPI collective operations

Benefits of summarized tracefiles

• Important trace size reduction
    • Gadget2 (128 tasks): 10 Gbytes down to 428 Mbytes
    • PEPC (16k tasks): 19 Gbytes down to 400 Mbytes
    • PFLOTRAN (16k tasks): +250 Gbytes down to 6 Gbytes
• Whole execution analysis
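
The reduction factors implied by these sizes can be worked out directly. The sketch assumes 1 Gbyte = 1024 Mbytes, which the slide does not state, and treats "+250 Gbytes" as a lower bound, so the PFLOTRAN factor is a minimum.

```python
# Sizes in Mbytes, assuming 1 Gbyte = 1024 Mbytes (the slide does not say).
TRACES = {
    "Gadget2 (128 tasks)": (10 * 1024, 428),
    "PEPC (16k tasks)": (19 * 1024, 400),
    "PFLOTRAN (16k tasks)": (250 * 1024, 6 * 1024),  # "+250" read as a lower bound
}


def reduction_factor(full_mb, summarized_mb):
    """How many times smaller the summarized trace is."""
    return full_mb / summarized_mb


for name, (full_mb, summarized_mb) in TRACES.items():
    print(f"{name}: {reduction_factor(full_mb, summarized_mb):.1f}x smaller")
```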

Working with large traces?

• We're dealing with large scale executions
• Maintain scalability of tracing + sampling
    • By adding more data?
    • Use folding to reduce data
• Example (Gadget2 using 128 tasks)
    • 100 iterations, 5 samples/s during 90 minutes ~ 236 MB
    • Folding on 1 iteration @ 200 samples/s ~ 64 MB

Outline

• Instrumentation and sampling
• Folding
• Summarized traces
• Combining mechanisms
• Some results
• Current work

Gadget2 analysis, 128 tasks

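[Figure not reproduced. Percentages shown on the slide: 32%, 16%, 13%, 8%. Code regions labeled in the figure, as source file +line ranges:]

• force_tree.c +75 - gravity_tree.c +167
• gravity_tree.c +528 - density.c +167
• force_tree.c +1701 - hydra.c +246
• predict.c +92 - pm_periodic.c +385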

PEPC analysis, 32 tasks

[Figure not reproduced. Percentages shown on the slide: 45%, 37%, 5%, 3%. Code regions labeled in the figure, as source file +line ranges:]

• tree_aswalk.f90 +162 - tree_aswalk.f90 +380
• tree_domains.f90 +548 - tree_branches.f90 +155
• tree_branches.f90 +548 - tree_properties.f90 +328
• tree_aswalk.f90 +380 - tree_aswalk.f90 +162

Current directions

We work on:

• Is there an optimal sampling frequency?
• Quantify correctness and validate the results
• Callstack analysis

Thank you!