S. Boeriu 1 and J.C. Bruch, Jr. 2 1 Center for Computational Science and Engineering


Performance analysis tools applied to a finite adaptive mesh free boundary seepage parallel algorithm

S. Boeriu¹ and J.C. Bruch, Jr.²

¹ Center for Computational Science and Engineering
² Department of Mechanical and Environmental Engineering and Department of Mathematics

University of California, Santa Barbara

http://www.engineering.ucsb.edu/~hpscicom

Acknowledgements

This material is based upon work supported by the National Science Foundation under Grant #0086262. This research was supported in part by NSF cooperative agreement ACI-9619020 through computing resources provided by the National Partnership for Advanced Computational Infrastructure at the San Diego Supercomputer Center.

http://www.npaci.edu/Horizon/guide_linked/bh_tools_txt.html

Outline of Presentation

1. Introduction (Physical problem)
2. Problem formulation
3. Fixed domain formulation
4. Numerical algorithm
5. Test case
6. Performance tools and considerations
   a. VAMPIR
   b. PARAVER
7. Diagnostic example
8. Conclusions

Physical problem

Figure 1. Seepage through a rectangular dam.

Simplifying assumptions

i. The soil in the flowfield is homogeneous and isotropic

ii. Capillary and evaporation effects are neglected

iii. The flow obeys Darcy’s Law

iv. Two-dimensional

v. Steady state

Mathematical formulation

Darcy’s Law: $\mathbf{q} = -K \,\mathrm{grad}\, h = -K \,\mathrm{grad}\,(p/\rho g + y)$

Potential Function: $\phi = K\,(p/\rho g + y)$

Velocity Components: $u = -\phi_x,\quad v = -\phi_y$

Continuity Equation: $u_x + v_y = 0$

Irrotationality Condition: $u_y - v_x = 0$

Cauchy-Riemann Equations: $\phi_x = \psi_y,\quad \phi_y = -\psi_x$

Laplace’s Equations: $\nabla^2 \phi = 0,\quad \nabla^2 \psi = 0$
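As a short worked step, Laplace's equations can be derived from the continuity and irrotationality conditions. This is only a sketch: it assumes a potential $\phi$ with $u = -\phi_x$, $v = -\phi_y$ and a stream function $\psi$ tied to $\phi$ by the Cauchy-Riemann equations $\phi_x = \psi_y$, $\phi_y = -\psi_x$. The sign convention is an assumption, since the slide text does not preserve it.

```latex
\begin{align*}
u_x + v_y &= -\phi_{xx} - \phi_{yy} = 0
  &&\Rightarrow\quad \nabla^2 \phi = 0, \\
u_y - v_x &= -\phi_{xy} + \phi_{yx} = 0
  &&\text{(holds identically for a smooth potential)}, \\
\psi_{xx} + \psi_{yy} &= (-\phi_y)_x + (\phi_x)_y = 0
  &&\Rightarrow\quad \nabla^2 \psi = 0.
\end{align*}
```

The opposite sign convention ($u = \phi_x$, $v = \phi_y$) leads to the same two Laplace equations.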

Problem formulation

Figure 2. Mathematical formulation of physical problem.

Extension of solution domain

The solution domain $D = \{(x, y) : 0 < x < x_1,\ 0 < y < f(x)\}$, bounded above by the unknown free surface $y = f(x)$, is extended to the known region $\Omega$, the full dam cross-section $0 < x < x_1$, $0 < y < y_1$.

Then extend $\phi$ continuously to $\tilde{\phi}$, defined on $\bar{\Omega}$, by setting

$\tilde{\phi}(x, y) = \phi(x, y)$ in $\bar{D}$, and $\tilde{\phi}(x, y) = y$ in $\bar{\Omega} \setminus \bar{D}$.

This yields

$\nabla^2 \tilde{\phi} = -\dfrac{\partial \chi_D}{\partial y}$ in $\Omega$

in the sense of distributions, where $\chi_D = 1$ in $D$ and $\chi_D = 0$ in $\Omega \setminus \bar{D}$.

Fixed domain formulation

Figure 3. Fixed domain mathematical formulation.

Numerical Algorithm

A minimization problem can be formulated in terms of the functional

$J(v) = a(v, v) - 2(f, v), \quad v \in K,$

where $a$ is a bilinear form, continuous, symmetric, and positive definite on $\mathbb{R}^N$, and $f \in \mathbb{R}^N$, i.e.,

$a(u, v) = \iint_D \nabla u \cdot \nabla v \, dx\, dy$

$(f, v) = \iint_D f\, v \, dx\, dy$

The functional $J$ has one and only one minimum on a closed convex set. The minimum is found using the following algorithm:

$w_i^{(n+1/2)} = \dfrac{1}{a_{ii}} \left( f_i - \sum_{j=1}^{i-1} a_{ij}\, w_j^{(n+1)} - \sum_{j=i+1}^{N} a_{ij}\, w_j^{(n)} \right)$

$w_i^{(n+1)} = P_i\!\left( w_i^{(n)} + \omega \left( w_i^{(n+1/2)} - w_i^{(n)} \right) \right) = \max\!\left( 0,\ w_i^{(n)} + \omega \left( w_i^{(n+1/2)} - w_i^{(n)} \right) \right)$

where $a_{ij} = a(N_i, N_j)$, $f_i = (f, N_i)$, $\{N_i\}$ is the canonical basis of $\mathbb{R}^N$, $P_i$ is the projection on the convex set, $i = 1, \ldots, N$, $\omega$ is the relaxation parameter, and $N$ is the number of nodes.
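The relaxation-with-projection iteration described above can be sketched in Python. This is a minimal serial illustration, not the authors' parallel code: the function name `projected_sor`, the dense list-of-lists matrix, the convex set taken as `w_i >= 0`, and the small 3-by-3 test system are all assumptions made for the example. The default `omega` and `tol` reuse the values 1.85 and 0.0001 quoted in the test case.

```python
def projected_sor(a, f, omega=1.85, tol=1e-4, max_iter=10_000):
    """Projected SOR sketch: minimize a(w, w) - 2(f, w) subject to w_i >= 0.

    a is a symmetric positive definite matrix (list of rows), f a list.
    Returns the iterate and the number of sweeps performed.
    """
    n = len(f)
    w = [0.0] * n
    for sweep in range(1, max_iter + 1):
        largest_change = 0.0
        for i in range(n):
            # Gauss-Seidel value w_i^(n+1/2): entries j < i are already
            # updated in place, entries j > i still hold the old sweep.
            s = f[i] - sum(a[i][j] * w[j] for j in range(n) if j != i)
            w_half = s / a[i][i]
            # Over-relax, then project onto the convex set w_i >= 0.
            w_new = max(0.0, w[i] + omega * (w_half - w[i]))
            largest_change = max(largest_change, abs(w_new - w[i]))
            w[i] = w_new
        if largest_change < tol:
            return w, sweep
    return w, max_iter


# Hypothetical 3-node system in which the constraint is active at node 1.
a_demo = [[2.0, -1.0, 0.0], [-1.0, 2.0, -1.0], [0.0, -1.0, 2.0]]
f_demo = [1.0, -4.0, 1.0]
w_demo, sweeps_demo = projected_sor(a_demo, f_demo)
```

Without the projection this system would have negative entries in its solution; with it, the middle unknown is held at zero and the other two settle near 0.5.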

Finite Element Error Analysis

Adaptive Mesh Finite Element Analysis (FEA)

General Equation for FEA: $Ku = f$

Error Analysis

Error Definition: $e_q = q - \hat{q}$

where $\hat{q}$ is the approximation of the exact solution $q$; $\bar{q}$ is the calculated $\hat{q}$ of an element (constant); $N$ is the shape function; and $S = [\partial/\partial x \;\; \partial/\partial y]^T$.

Averaging Technique: $\tilde{q} = N \bar{q}$

Error Estimate in an Element: $e_q \approx \hat{e}_q = \tilde{q} - \hat{q}$

Error Norm of the Whole Computation Domain:

$\|\hat{e}_q\|_{L_2}^2 = \int_R (\hat{e}_q)^T (\hat{e}_q)\, dR, \qquad \|\hat{e}_q\|_{L_2}^2 = \sum_{i=1}^{N_e} \|\hat{e}_q\|_i^2$

Percentage Error:

$\eta = \dfrac{\|\hat{e}_q\|_{L_2}}{\|\tilde{q}\|_{L_2}} \times 100\%$, where $\|\tilde{q}\|_{L_2}^2 = \sum_{i=1}^{N_e} \int_{R_i} \tilde{q}^T \tilde{q}\, dR$

Local Mesh Refinement

Desired Criteria: $\eta \le \eta_{\max}$, where $\eta_{\max}$ is the desired error

Desired Local Error Criteria: $\|\hat{e}_q\|_i \le e_{\max}$, with $e_{\max} = \eta_{\max} \left( \dfrac{\|\tilde{q}\|_{L_2}^2}{N_e} \right)^{1/2}$, where $e_{\max}$ is the allowable element error

Error Ratio: $\xi_i = \dfrac{\|\hat{e}_q\|_i}{e_{\max}}$; if $\xi_i > 1$, refine the element

New Element Size: $A_i^{\text{new}} = \dfrac{A_i}{\xi_i}$
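The error estimate and refinement rule above can be illustrated with a small Python sketch. It rests on several assumptions that are not in the slides: the function name `error_indicators` is hypothetical, both the element flux and the smoothed flux are treated as constant over each element (so each integral reduces to area times integrand), the target error is given in percent and converted to a fraction, and the two-element data are made up.

```python
import math


def error_indicators(q_fe, q_smooth, areas, eta_max_percent):
    """Return (eta_percent, xi, new_areas) for a list of elements.

    q_fe[i] and q_smooth[i] are (qx, qy) pairs for element i, both taken
    as constant over the element (a simplification made for this sketch).
    """
    n_e = len(areas)
    # Squared element error norms: integral of e^T e over each element.
    err_sq = [area * ((sx - fx) ** 2 + (sy - fy) ** 2)
              for (fx, fy), (sx, sy), area in zip(q_fe, q_smooth, areas)]
    # Squared norm of the smoothed flux over the whole domain.
    q_sq = sum(area * (sx ** 2 + sy ** 2)
               for (sx, sy), area in zip(q_smooth, areas))
    eta_percent = 100.0 * math.sqrt(sum(err_sq)) / math.sqrt(q_sq)
    # Allowable element error if the target error is spread evenly.
    e_max = (eta_max_percent / 100.0) * math.sqrt(q_sq / n_e)
    xi = [math.sqrt(e) / e_max for e in err_sq]
    # Shrink only the elements whose error ratio exceeds 1.
    new_areas = [area / x if x > 1.0 else area for area, x in zip(areas, xi)]
    return eta_percent, xi, new_areas


# Hypothetical two-element mesh; the numbers are for illustration only.
eta, xi, new_areas = error_indicators(
    q_fe=[(1.0, 0.0), (1.0, 0.0)],
    q_smooth=[(1.0, 0.0), (0.8, 0.6)],
    areas=[0.5, 0.5],
    eta_max_percent=20.0,
)
```

In this example only the second element has a nonzero error, so it is the only one flagged for refinement and given a smaller target area.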


Mesh Refinement

Test case

$x_1 = 40$

$y_1 = 10$

$y_2 = 3$

$\omega = 1.85$

Stopping error criterion: 0.0001


Results

Figure 4. Domain decomposition for Pass 4 of Case 1.

Figure 5. Speedup for Case 1.

Performance tools and considerations

The parallel program is monitored while it is executed. Monitoring produces performance data that is interpreted in order to reveal areas of poor performance. The program is then altered and the process is repeated until an acceptable level of performance is reached.

VAMPIR (Visualization and Analysis of MPI Resources – 2.0)

VAMPIR 2.0 is a post-mortem trace visualization tool from Pallas GmbH

http://www.pallas.com

It uses the profiling extensions to MPI and permits analysis of the message events in which data is transmitted between processors during execution of a parallel program. It has a convenient user interface and excellent zooming and filtering. Global displays show all selected processes.


• Global Timeline: detailed application execution over time axis

• Activity Chart: presents per-process profiling information

• Summaric Chart: aggregated profiling information

• Communication Statistics: message statistics for each process pair

• Global Communication Statistics: collective operations statistics

• I/O Statistics: MPI I/O operation statistics

• Calling Tree: global dynamic calling tree


PARAVER (Parallel Program Visualization and Analysis Tool)

PARAVER is a flexible parallel program visualization and analysis tool based on an easy-to-use Motif GUI (graphical user interface)

PARAVER was developed to respond to the basic need to have a qualitative perception of the application behavior by visual inspection and then to be able to focus on the detailed quantitative analysis of the problems.

Developed by: European Center for Parallelism of Barcelona (CEPBA), Universitat Politecnica de Catalunya
http://www.cepba.upc.es/

Paraver is designed to visualize and analyze:
- Communication and load balance
- Combining OpenMP and MPI
- Hardware performance counters

Usage:
- Compile programs with special libraries
- Run programs to produce trace files
- View and analyze traces

It is designed to help in program understanding and optimization.


Inefficient programming example

Load imbalance (inefficient memory use)

TLB (translation lookaside buffer) misses

Figure 6. Stage 1 – Processor 0 – Mesh Map


Figure 7. Stage 1 – Processor 3 – Mesh Map


Figure 8. Stage 1 – VAMPIR – Activity Chart

Figure 9. Stage 1 – PARAVER – Global Display

Figure 10. Stage 4 – VAMPIR – Activity Chart

Figure 11. Stage 4 – VAMPIR – Display Chart

Table 8. TLB misses.

Stage   Proc. 0   Proc. 3
1       9,464     7,870
4       12,210    208,341
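The contrast in Table 8 is easier to read as ratios. The short Python sketch below only does arithmetic on the four counts in the table; the helper names `proc_ratio` and `growth` are hypothetical.

```python
# TLB miss counts copied from Table 8: stage -> {processor: misses}.
tlb_misses = {1: {0: 9464, 3: 7870}, 4: {0: 12210, 3: 208341}}


def proc_ratio(stage, num=3, den=0):
    """Ratio of TLB misses on processor `num` to processor `den`."""
    return tlb_misses[stage][num] / tlb_misses[stage][den]


def growth(proc, first=1, last=4):
    """Factor by which a processor's TLB misses grew between two stages."""
    return tlb_misses[last][proc] / tlb_misses[first][proc]


for stage in sorted(tlb_misses):
    print(f"stage {stage}: proc3/proc0 = {proc_ratio(stage):.2f}")
print(f"growth 1 -> 4: proc0 x{growth(0):.2f}, proc3 x{growth(3):.2f}")
```

In Stage 1 the two processors have similar counts. By Stage 4, processor 3 records roughly 17 times as many TLB misses as processor 0, and about 26 times its own Stage 1 count.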

Figure 12. Stage 4 – Processor 0 – Mesh Map

Figure 13. Stage 4 – Processor 3 – Mesh Map

Table 9. Stage 4 timing of the SOR module.

Processor   Time spent in SOR
0           0.3671
1           0.4068
2           0.6940
3           0.8393
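One way to summarize the timings in Table 9 is a simple load-imbalance figure: the slowest processor's time divided by the mean. The Python sketch below uses only the four values from the table, whose time unit is not stated in the source. The `idle_fraction` reading is an assumption: it presumes every processor waits for the slowest one at the end of the SOR module.

```python
# Time spent in SOR per processor, copied from Table 9 (unit not stated).
sor_time = {0: 0.3671, 1: 0.4068, 2: 0.6940, 3: 0.8393}

mean_time = sum(sor_time.values()) / len(sor_time)
max_time = max(sor_time.values())

# Imbalance factor: 1.0 would mean perfectly even work.
imbalance = max_time / mean_time

# Share of the slowest processor's time that each processor would spend
# waiting, assuming all processors synchronize after the SOR module.
idle_fraction = {p: 1.0 - t / max_time for p, t in sor_time.items()}

print(f"mean = {mean_time:.4f}, max = {max_time:.4f}, "
      f"imbalance = {imbalance:.3f}")
```

By this measure processor 3 takes about 1.46 times the mean SOR time, and under the synchronization assumption processor 0 would be waiting for more than half of that interval.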

Figure 14. Stage 4 – VAMPIR – Activity Chart

Figure 15. Stage 4 – VAMPIR – Display Chart


Figure 16. Stage 4 – PARAVER – Global Display

Conclusions

A significant factor that affects the performance of a parallel application is the balance between communication and workload. The challenge of the message passing model is in reducing message traffic over the interconnection network. To fully understand the performance behavior of such applications, analysis and visualization tools are needed. Two such tools, VAMPIR and PARAVER, were used to analyze the performance of the seepage application. It was seen that optimization of the parallel code can be carried out in an iterative process involving these tools to investigate performance issues.

Web Sites

Project site: http://www.engineering.ucsb.edu/~hpscicom

San Diego Supercomputer Center: http://www.npaci.edu/Horizon/guide_linked/bh_tools_txt.html

VAMPIR: http://www.pallas.com

PARAVER: http://www.cepba.upc.es/