Stochastic Program Execution Tracing


Transcript of Stochastic Program Execution Tracing

Page 1: Stochastic Program Execution Tracing

Stochastic Program Execution Tracing

Jeff Odom, UMD

Page 2: Stochastic Program Execution Tracing


SIGMA Goals

IBM/UMD tools to understand caches
– Focus on detailed statistics
– Complement existing hardware counters

Ability to handle real applications
– MPI and OpenMP programs
– Fortran and C

Provide hints about restructuring
– Padding (both inter- and intra-data structures)
– Blocking

UMD effort funded by PERC2

Page 3: Stochastic Program Execution Tracing


Original SIGMA Approach

Static instrumentation
– Capture full information about memory use
– Produce compact trace
  • Extracts loops and memory strides

Post-execution tools
– Detailed simulator
  • Full discrete event simulator
– Memory profiler
  • Portion of accesses attributed to each data structure

Page 4: Stochastic Program Execution Tracing


Representing Program Execution

Capture full execution behavior
– Record all basic blocks and memory addresses
– Produces large traces (due to looping)

Trace compression
– Maintain pattern buffer
– Scan for repeating patterns
  • Extract memory strides
– Repeat algorithm for nested loops

[Diagram: a raw trace of BLK and ADR records (bases such as 100, 200, 300 with stride 4) collapsed into an RPT record carrying Count, Length, Base, and Stride fields]
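To make the compression step concrete, below is a minimal C++ sketch (the RptRecord type and compress() helper are invented for illustration, not SIGMA's actual trace format) that collapses constant-stride address runs into repeat records:

// Sketch of stride-based trace compression: consecutive memory addresses with
// a constant stride collapse into one repeat record (base, stride, count)
// instead of one ADR record per access. Record layout is hypothetical.
#include <cstdint>
#include <iostream>
#include <vector>

struct RptRecord {
    uint64_t base;    // address of the first access in the run
    int64_t  stride;  // constant difference between successive addresses
    size_t   count;   // number of accesses covered by this record
};

std::vector<RptRecord> compress(const std::vector<uint64_t>& addrs) {
    std::vector<RptRecord> out;
    size_t i = 0;
    while (i < addrs.size()) {
        RptRecord r{addrs[i], 0, 1};
        if (i + 1 < addrs.size()) {
            r.stride = static_cast<int64_t>(addrs[i + 1]) - static_cast<int64_t>(addrs[i]);
            while (i + r.count < addrs.size() &&
                   static_cast<int64_t>(addrs[i + r.count]) -
                   static_cast<int64_t>(addrs[i + r.count - 1]) == r.stride)
                ++r.count;
        }
        out.push_back(r);
        i += r.count;
    }
    return out;
}

int main() {
    // Two constant-stride runs: a stride-4 run of three accesses, then a stride-8 run of two
    std::vector<uint64_t> trace{100, 104, 108, 300, 308};
    for (const auto& r : compress(trace))
        std::cout << "base=" << r.base << " stride=" << r.stride
                  << " count=" << r.count << "\n";
}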

Page 5: Stochastic Program Execution Tracing


Trace Compression Isn’t Enough

Tracing just a few seconds of execution…
– Slows execution considerably
– Generates gigabytes

Benchmark   Orig Time (s)   Slowdown   Trace Size (KB)
seis        8               4463x      1,934,667
BT          8               6000x      74,221
swim        396             777x       29

Page 6: Stochastic Program Execution Tracing


Sampling

We want…
– Shorter execution times
– Smaller traces

We need…
– Representative traces
– Where to sample?

Timestep boundary
– Outermost loop
– Requires manual identification (for now)

Page 7: Stochastic Program Execution Tracing


Dyninst + SIGMA = dynSIGMA

Dyninst adds flexibility
– Vary sample rate without recompilation
– Adaptive/progressive rate during execution
– Target application runs at native speed when instrumentation is turned off

Leverage existing SIGMA infrastructure
– Only generate trace
– Offline simulation/profiling steps unchanged

Dual application framework
– Mutatee generates trace
– Mutator toggles instrumentation (see the sketch below)
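A minimal mutator sketch showing how instrumentation might be toggled with DyninstAPI so the mutatee runs at native speed while tracing is off; the function names ("compute_timestep", "sigma_record_event"), the toggle policy, and the omitted error handling are placeholders, not dynSIGMA's actual code:

// Illustrative mutator: insert a tracing call at the entry of the timestep
// routine for a sampled interval, then delete it again.
#include "BPatch.h"
#include "BPatch_function.h"
#include "BPatch_image.h"
#include "BPatch_point.h"
#include "BPatch_process.h"
#include "BPatch_snippet.h"
#include <vector>

int main(int, char* argv[]) {
    BPatch bpatch;
    const char* mutateeArgv[] = {argv[1], nullptr};
    BPatch_process* proc = bpatch.processCreate(argv[1], mutateeArgv);
    BPatch_image* image = proc->getImage();

    // Find the timestep routine in the mutatee and the tracing routine
    // (assumed here to live in a preloaded tracing library).
    std::vector<BPatch_function*> timestep, tracer;
    image->findFunction("compute_timestep", timestep);
    image->findFunction("sigma_record_event", tracer);

    const std::vector<BPatch_point*>* entries = timestep[0]->findPoint(BPatch_entry);
    std::vector<BPatch_snippet*> noArgs;
    BPatch_funcCallExpr traceCall(*tracer[0], noArgs);

    // Tracing on: insert the call and let the mutatee run a sampled timestep
    BPatchSnippetHandle* handle = proc->insertSnippet(traceCall, *entries);
    proc->continueExecution();

    // Tracing off: remove the snippet so the application runs at native speed
    proc->stopExecution();
    proc->deleteSnippet(handle);
    proc->continueExecution();

    while (!proc->isTerminated())
        bpatch.waitForStatusChange();
    return 0;
}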

Page 8: Stochastic Program Execution Tracing


Memtime

Simple but effective metric of application memory performance

memtime = Σ (i = 1 to n) h_i · l_i + Tmiss · Tlatency

where
  n        = number of cache levels
  h_i      = hits at cache level i
  l_i      = cache latency at level i
  Tmiss    = TLB misses
  Tlatency = penalty of a TLB miss
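A worked instance of the metric under the definitions above; the hit counts, latencies, and TLB figures are made-up numbers, and the resulting units depend on what the latencies are expressed in:

// Hypothetical memtime computation for a machine with three cache levels.
#include <cstdint>
#include <iostream>
#include <vector>

int main() {
    std::vector<uint64_t> hits    = {900000, 80000, 15000}; // h_i: hits at level i
    std::vector<double>   latency = {1.0, 10.0, 100.0};     // l_i: latency of level i
    uint64_t tlbMisses  = 2000;                             // Tmiss
    double   tlbPenalty = 30.0;                             // Tlatency

    double memtime = 0.0;
    for (size_t i = 0; i < hits.size(); ++i)  // sum h_i * l_i over the cache levels
        memtime += hits[i] * latency[i];
    memtime += tlbMisses * tlbPenalty;        // plus the TLB miss term

    std::cout << "memtime = " << memtime << "\n";
}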

Page 9: Stochastic Program Execution Tracing


Characteristic Pattern

Local and global data objects are given canonical names

The vector of the objects' memtimes is the characteristic data pattern

Characteristic patterns are compared with simple linear correlation (see the sketch below)

Can also be applied to function objects
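A minimal sketch of comparing two characteristic patterns with simple linear (Pearson) correlation; the vectors below stand in for per-object memtime values from a full trace and a sampled trace:

// Linear correlation between two characteristic patterns (vectors of
// per-data-object memtime values). The numbers are placeholders.
#include <cmath>
#include <iostream>
#include <vector>

double correlation(const std::vector<double>& x, const std::vector<double>& y) {
    const double n = static_cast<double>(x.size());
    double sx = 0, sy = 0, sxx = 0, syy = 0, sxy = 0;
    for (size_t i = 0; i < x.size(); ++i) {
        sx += x[i];         sy += y[i];
        sxx += x[i] * x[i]; syy += y[i] * y[i];
        sxy += x[i] * y[i];
    }
    double cov  = sxy - sx * sy / n;   // proportional to the covariance
    double varx = sxx - sx * sx / n;   // proportional to the variance of x
    double vary = syy - sy * sy / n;   // proportional to the variance of y
    return cov / std::sqrt(varx * vary);
}

int main() {
    std::vector<double> fullTrace    = {0.52, 0.11, 0.30, 0.07};
    std::vector<double> sampledTrace = {0.50, 0.12, 0.31, 0.07};
    std::cout << "correlation = " << correlation(fullTrace, sampledTrace) << "\n";
}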

Page 10: Stochastic Program Execution Tracing


Example Application: seis

Seismic simulation from SPEChpc2002
– Models multiple seismic processes
– Process results pipelined

Variable timesteps
– Different data pattern for each process

C & Fortran
– Fortran: data processing
– C: dynamic memory management, I/O

Page 11: Stochastic Program Execution Tracing


Space & Time Gains From Sampling

Sample Rate    Trace Size (MB)   Time (h:m:s)   Correlation
1.00%          13.51             9:04           0.996139
2.50%          33.14             40:00          0.997124
5.00%          66.33             1:12:48        0.997307
10.00%         133.17            2:16:00        0.997131
Full (SIGMA)   1,889.32          9:55:04
Original seis                    0:08

Includes 0:12 instrumentation overhead

Page 12: Stochastic Program Execution Tracing


Challenge of Irregularity

Compression requires regular accesses

Sampling may hide poor compression
– Each sample may compress poorly
– Offset by low sampling rate

Sampling may not be accurate enough
– Control flow sampled as well
– Sample boundary requires manual definition

Page 13: Stochastic Program Execution Tracing


Hybrid Traces

Accuracy may be more important than execution time, but storage capacity may be limited

Modeling data access at particular points can be more accurate than timestep sampling

Many codes are mostly regular, but irregular patterns spoil compression

Page 14: Stochastic Program Execution Tracing


Modified Linear Regression

Establish linear pattern (min 3 points) at each memory access location

Look for repetitions of pattern with higher-level strides

Once input no longer matches pattern, treat further input as irregular until new pattern discovered

Page 15: Stochastic Program Execution Tracing


Modified Linear Regression

Irregular sequence modeled using uniform distribution

Pattern matching is done locally at each instrumentation (memory access) point
– Original SIGMA pattern-matches globally

Page 16: Stochastic Program Execution Tracing


Modified Linear Regression

Example: 0, 1, 2, 5, 9, 10, 11, 12, 2, 5

Page 17: Stochastic Program Execution Tracing


Modified Linear Regression

Example: 0, 1, 2, 5, 9, 10, 11, 12, 2, 5

Page 18: Stochastic Program Execution Tracing

University of Maryland18

Modified Linear Regression

Example: 0, 1, 2, 5, 9, 10, 11, 12, 2, 5

Becomes: 0 + x + 10y + {5,9,2,5}

Page 19: Stochastic Program Execution Tracing

University of Maryland19

Modified Linear Regression

Example: 0, 1, 2, 5, 9, 10, 11, 12, 2, 5

Becomes: 0 + x + 10y + {5,9,2,5}

Becomes: 0 + x + 10y + {l:2, h:9}
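A rough sketch of the idea behind this example. It is a simplified, greedy variant: constant-stride runs of at least three points become regular records, and everything else is summarized by the low/high bounds of a uniform distribution. It omits the higher-level stride matching, so the record layout, thresholds, and input are illustrative only:

// Per-access-point pattern tracking: stride runs of >= 3 values become
// regular records; leftovers are modeled as a uniform distribution {l, h}.
#include <algorithm>
#include <iostream>
#include <vector>

struct Regular   { long base; long stride; size_t count; };
struct Irregular { long low; long high; size_t count; };

void summarize(const std::vector<long>& v) {
    std::vector<long> leftovers;
    size_t i = 0;
    while (i < v.size()) {
        size_t run = 1;
        if (i + 1 < v.size()) {
            long stride = v[i + 1] - v[i];
            while (i + run < v.size() && v[i + run] - v[i + run - 1] == stride)
                ++run;
        }
        if (run >= 3) {   // minimum of three points to establish a pattern
            Regular r{v[i], v[i + 1] - v[i], run};
            std::cout << "regular: base=" << r.base << " stride=" << r.stride
                      << " count=" << r.count << "\n";
            i += run;
        } else {          // does not fit a pattern: set aside as irregular
            leftovers.push_back(v[i]);
            ++i;
        }
    }
    if (!leftovers.empty()) {
        auto bounds = std::minmax_element(leftovers.begin(), leftovers.end());
        Irregular irr{*bounds.first, *bounds.second, leftovers.size()};
        std::cout << "irregular: {l:" << irr.low << ", h:" << irr.high
                  << "} count=" << irr.count << "\n";
    }
}

int main() {
    // An input in the spirit of the slides' example: stride-1 runs with a
    // higher-level stride of 10, interleaved with irregular values.
    summarize({0, 1, 2, 7, 10, 11, 12, 3, 20, 21, 22});
}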

Page 20: Stochastic Program Execution Tracing

University of Maryland20

Experiment Setup

NAS Parallel Benchmarks 3.2, Serial Version, Class S

IBM XL C 8.0, XL Fortran 10.1

DyninstAPI 5.0, including
– Liveness analysis
  • Up to 90% runtime reduction by excluding one SPR (MQ)
  • Additional 3% improvement with other GPR/FPR
– Transactional instrumentation

Instrumentation always on (no sampling)

Page 21: Stochastic Program Execution Tracing

University of Maryland21

Transactional Instrumentation

Reduces
– Memory allocation
– Insertion time

Atomic operation

BPatch_thread *thr;
BPatch_process *proc;

proc = thr->getProcess();

// Open an insertion set: queue up snippets instead of inserting one at a time
proc->beginInsertionSet();
thr->insertSnippet(…);
thr->insertSnippet(…);

// Insert all queued snippets in one atomic operation
proc->finalizeInsertionSet(true);
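Because the mutatee is modified only once, when the insertion set is finalized, the cost of code generation and memory allocation is paid per batch rather than per snippet, which is where the reductions in memory allocation and insertion time come from.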

Page 22: Stochastic Program Execution Tracing

University of Maryland22

Trace Size

                               BT       CG        EP        FT     LU      MG     SP
Original Size (KB)             16,732   489,817   648,323   344    1,011   495    1,405
Reduction w/ Irreg Comp (KB)   (20)     289,551   98,620    0      (53)    (90)   78

[Bar chart: percent trace-size reduction with irregular compression per benchmark, y-axis roughly -30% to 70%]

Page 23: Stochastic Program Execution Tracing

University of Maryland23

Accuracy

     Memtime (s)            1 – Correlation
     Original    New
BT   1.2139      1.2139     2.3E-8
CG   0.2442      0.2403     5.7E-8
EP   2.2881      2.2898     9.4E-7
LU   0.3205      0.3206     8.2E-8
MG   0.0558      0.0558     1.3E-5
SP   0.5162      0.5161     4.0E-8

Page 24: Stochastic Program Execution Tracing


Future Work

Larger datasets (NPB Class B, C)
– Some results already gathered for Class W

Distributions other than uniform

Irregular control flow
– Example: an upper triangular matrix does not need to iterate over all MxN values
– Uses edge instrumentation (see the sketch below)
  • BPatch_basicBlock::getIncomingEdges
  • BPatch_basicBlock::getOutgoingEdges
  • BPatch_edge::getPoint
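A rough sketch of how the edge-instrumentation calls listed above might be used; the target function name ("triangular_kernel"), the per-edge counters, and the omission of error checks are illustrative choices, not the planned implementation:

// Count how often each outgoing control-flow edge of a function's basic
// blocks is taken, using DyninstAPI edge instrumentation.
#include "BPatch.h"
#include "BPatch_basicBlock.h"
#include "BPatch_edge.h"
#include "BPatch_flowGraph.h"
#include "BPatch_function.h"
#include "BPatch_image.h"
#include "BPatch_point.h"
#include "BPatch_process.h"
#include "BPatch_snippet.h"
#include <set>
#include <vector>

void instrumentEdges(BPatch_process* proc, BPatch_image* image) {
    std::vector<BPatch_function*> funcs;
    image->findFunction("triangular_kernel", funcs);  // hypothetical target

    BPatch_flowGraph* cfg = funcs[0]->getCFG();
    std::set<BPatch_basicBlock*> blocks;
    cfg->getAllBasicBlocks(blocks);

    for (BPatch_basicBlock* bb : blocks) {
        std::vector<BPatch_edge*> edges;
        bb->getOutgoingEdges(edges);
        for (BPatch_edge* edge : edges) {
            // One counter per edge, incremented each time the edge is traversed
            BPatch_variableExpr* counter = proc->malloc(*image->findType("int"));
            BPatch_arithExpr increment(
                BPatch_assign, *counter,
                BPatch_arithExpr(BPatch_plus, *counter, BPatch_constExpr(1)));
            proc->insertSnippet(increment, *edge->getPoint());
        }
    }
}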

Page 25: Stochastic Program Execution Tracing


Questions?