S. Boeriu 1 and J.C. Bruch, Jr. 2 1 Center for Computational Science and Engineering

Post on 30-Dec-2015


Performance analysis tools applied to a finite adaptive mesh free boundary seepage parallel algorithm. S. Boeriu 1 and J.C. Bruch, Jr. 2 1 Center for Computational Science and Engineering 2 Department of Mechanical and Environmental Engineering and Department of Mathematics - PowerPoint PPT Presentation


1

Performance analysis tools applied to a finite adaptive mesh free boundary seepage parallel algorithm

S. Boeriu¹ and J.C. Bruch, Jr.²

¹ Center for Computational Science and Engineering

² Department of Mechanical and Environmental Engineering and Department of Mathematics

University of California, Santa Barbara

http://www.engineering.ucsb.edu/~hpscicom

2

Acknowledgements

This material is based upon work supported by the National Science Foundation under Grant #0086262. This research was supported in part by NSF cooperative agreement ACI-9619020 through computing resources provided by the National Partnership for Advanced Computational Infrastructure at the San Diego Supercomputer Center.

http://www.npaci.edu/Horizon/guide_linked/bh_tools_txt.html

3

Outline of Presentation

1. Introduction (Physical problem)
2. Problem formulation
3. Fixed domain formulation
4. Numerical algorithm
5. Test case
6. Performance tools and considerations
   a. VAMPIR
   b. PARAVER
7. Diagnostic example
8. Conclusions

4

Figure 1. Seepage through a rectangular dam.

Physical problem

5

Simplifying assumptions

i. The soil in the flowfield is homogeneous and isotropic

ii. Capillary and evaporation effects are neglected

iii. The flow obeys Darcy’s Law

iv. Two-dimensional

v. Steady state

6

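Mathematical formulation

Darcy’s Law: $\mathbf{q} = -K \,\operatorname{grad} h = -K \,\operatorname{grad}\left( p/(\rho g) + y \right)$

Potential Function: $\phi = -K \left( p/(\rho g) + y \right)$

Velocity Components: $u = \phi_x, \quad v = \phi_y$

Continuity Equation: $u_x + v_y = 0$

Irrotationality Condition: $u_y - v_x = 0$

Cauchy-Riemann Equations: $\phi_x = \psi_y, \quad \phi_y = -\psi_x$

Laplace’s Equations: $\nabla^2 \phi = 0, \quad \nabla^2 \psi = 0$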

7

Figure 2. Mathematical formulation of physical problem.

Problem formulation

8

Extension of solution domain

The solution domain is extended to the known region

Then extend continuously to be defined on by setting

' ' '

1 1( , ) : 0 ,0 ( ), ,0 FF F C

D x y x x y x x x x y y

D

( , ) {x y ( , )x y in

y in D

9

This yields

in the sense of distributions where

1 0D D

in D and in

( )y x yDin D

10

Fixed domain formulation

Figure 3. Fixed domain mathematical formulation.

11

Numerical Algorithm

A minimization problem can be formulated in terms of the functional

$$J(v) = a(v, v) - 2(f, v), \quad v \in K,$$

where $a$ is a bilinear form, continuous, symmetric, positive definite on $\mathbb{R}^N$ and $f \in \mathbb{R}^N$, i.e.,

$$a(u, v) = \iint_D \nabla u \cdot \nabla v \, dx\, dy, \qquad (f, v) = \iint_D f v \, dx\, dy$$

12

The functional J has one and only one minimum on a closed convex set. The minimum is found using the following algorithm:

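$$v_i^{(n+1/2)} = \frac{1}{a_{ii}} \left( f_i - \sum_{j=1}^{i-1} a_{ij} v_j^{(n+1)} - \sum_{j=i+1}^{N} a_{ij} v_j^{(n)} \right)$$

$$v_i^{(n+1)} = P_i\left( v_i^{(n)} + \omega \left( v_i^{(n+1/2)} - v_i^{(n)} \right) \right) = \max\left( 0,\ v_i^{(n)} + \omega \left( v_i^{(n+1/2)} - v_i^{(n)} \right) \right)$$

where $a_{ij} = a(N_i, N_j)$, $f_i = (f, N_i)$, $N_i$, $i = 1, \dots, N$, is the canonical basis of $\mathbb{R}^N$, $P_i$ is the projection on the convex set, and $N$ is the number of nodes.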
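The relaxation-and-projection iteration on this slide can be sketched in a few lines of Python. This is only an illustrative serial version, not the authors' parallel code: the function name `projected_sor`, the dense list-of-lists matrix, the zero starting vector, and the stopping test on the largest nodal change are all assumptions. The defaults 1.85 and 0.0001 are the values quoted on the test case slide, read here as the relaxation factor and the stopping tolerance.

```python
def projected_sor(a, f, omega=1.85, tol=1e-4, max_iter=10_000):
    """Minimise J(v) = a(v, v) - 2 (f, v) over v >= 0 by projected SOR.

    a   : symmetric positive definite matrix as a list of rows (assumed dense)
    f   : right-hand side vector
    Returns the iterate and the number of sweeps performed.
    """
    n = len(f)
    v = [0.0] * n  # assumed starting guess
    for sweep in range(1, max_iter + 1):
        largest_change = 0.0
        for i in range(n):
            # Gauss-Seidel value: entries j < i were already updated this sweep.
            sigma = sum(a[i][j] * v[j] for j in range(n) if j != i)
            v_half = (f[i] - sigma) / a[i][i]
            # Over-relax, then project onto the constraint v_i >= 0.
            v_new = max(0.0, v[i] + omega * (v_half - v[i]))
            largest_change = max(largest_change, abs(v_new - v[i]))
            v[i] = v_new
        if largest_change < tol:
            return v, sweep
    return v, max_iter


# Small hypothetical system whose unconstrained minimiser has negative entries,
# so the projection is active at the solution.
a = [[2.0, -1.0], [-1.0, 2.0]]
f = [1.0, -3.0]
v, sweeps = projected_sor(a, f)
print(v, sweeps)
```

For this 2 x 2 example the constrained minimiser is (0.5, 0), so the second component is held at the bound by the `max(0, ...)` step while the first converges to 0.5.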

13

Finite Element Error Analysis

Adaptive Mesh Finite Element Analysis (FEA)

General Equation for FEA:

$Ku = f$

14

Error Analysis

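Error Definition: $e_q = q - \hat{q} \approx q^* - \hat{q}$

where $q^*$ is the approximation of the exact solution $q$;

$\hat{q}$ is the calculated value of $q$ in an element (constant);

$N$ is the shape function;

and $S = \left[ \partial/\partial x,\ \partial/\partial y \right]^T$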

15

Averaging Technique: $q^* = N \bar{q}^*$

Error Estimate in an Element: $\hat{e}_q = q^* - \hat{q} \approx e_q$

16

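Error Norm of the Whole Computation Domain:

$$\|\hat{e}_q\|_{L_2}^2 = \int_R (\hat{e}_q)^T (\hat{e}_q)\, dR, \qquad \|\hat{e}_q\|_{L_2}^2 = \sum_{i=1}^{N_e} \|\hat{e}_q\|_i^2$$

Percentage Error:

$$\eta = \frac{\|\hat{e}_q\|_{L_2}}{\|q\|_0} \times 100\%, \quad \text{where } \|q\|_0^2 = \sum_{i=1}^{N_e} \int_{R_i} q^T q \, dR$$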

17

Local Mesh Refinement

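Desired Criteria: $\eta \le \eta_{\max}$, where $\eta_{\max}$ is the desired error

Desired Local Error Criteria: $\|\hat{e}_q\|_i \le e_{\max}$, where $e_{\max} = \eta_{\max} \left( \|q\|_0^2 / N_e \right)^{1/2}$ is the allowable element error

Error Ratio: $\xi_i = \|\hat{e}_q\|_i / e_{\max}$; if $\xi_i > 1$, refine the element

New Element Size: $A_{i,\mathrm{new}} = A_i / \xi_i$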
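As a rough illustration of how the refinement criteria on this slide fit together, the sketch below computes the percentage error, the allowable element error, the per-element error ratios, and the new element sizes. The equations on this slide did not survive extraction intact, so the formulas used here are assumptions labeled in the comments: the allowable element error is taken as the desired relative error times the root-mean-square share of the solution norm, and an element's new size is its old area divided by its error ratio. The function name and the sample numbers are hypothetical.

```python
import math


def refinement_plan(element_errors, q_norm, eta_max, areas):
    """Return percentage error, allowable element error, ratios and new areas.

    element_errors : estimated error norm of each element
    q_norm         : norm of the solution gradient over the whole domain
    eta_max        : desired relative error as a fraction (0.10 means 10%)
    areas          : current element areas
    """
    n_e = len(element_errors)
    # Global error norm: square root of the sum of squared element norms.
    global_error = math.sqrt(sum(e * e for e in element_errors))
    percentage_error = 100.0 * global_error / q_norm
    # Assumed form of the allowable element error.
    e_max = eta_max * math.sqrt(q_norm ** 2 / n_e)
    ratios = [e / e_max for e in element_errors]
    # Assumed sizing rule: shrink only the elements whose ratio exceeds 1.
    new_areas = [area / xi if xi > 1.0 else area
                 for area, xi in zip(areas, ratios)]
    return percentage_error, e_max, ratios, new_areas


# Hypothetical four-element mesh with unit areas.
pct, e_max, ratios, new_areas = refinement_plan(
    element_errors=[0.30, 0.10, 0.05, 0.02],
    q_norm=2.0,
    eta_max=0.10,
    areas=[1.0, 1.0, 1.0, 1.0],
)
print(pct, e_max, ratios, new_areas)
```

In this example only the first element has an error ratio above 1 (about 3), so it is the only one whose area is reduced, to roughly one third of its original size.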

18

Mesh Refinement

19

Test case

$x_1 = 40$

$y_1 = 10$

$y_2 = 3$

$\omega = 1.85$

Stopping error criterion: 0.0001

20

Results

21

22

Figure 4. Domain decomposition for Pass 4 of Case 1.

23

Figure 5. Speedup for Case 1.

24

Performance tools and considerations

The parallel program is monitored while it is executed. Monitoring produces performance data that is interpreted in order to reveal areas of poor performance. The program is then altered and the process is repeated until an acceptable level of performance is reached.

25

VAMPIR (Visualization and Analysis of MPI Resources – 2.0)

VAMPIR 2.0 is a post-mortem trace visualization tool from Pallas GmbH

http://www.pallas.com

It uses the profiling extensions to MPI and permits analysis of the message events in which data is transmitted between processors during execution of a parallel program. It has a convenient user interface and excellent zooming and filtering. Global displays show all selected processes.

26

• Global Timeline: detailed application execution over time axis

• Activity Chart: presents per-process profiling information

• Summaric Chart: aggregated profiling information

• Communication Statistics: message statistics for each process pair

• Global Communication Statistics: collective operations statistics

• I/O Statistics: MPI I/O operation statistics

• Calling Tree: global dynamic calling tree
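To make the "message statistics for each process pair" idea concrete, here is a small sketch that aggregates a list of message events into per-pair counts and byte totals. It is not VAMPIR's implementation and does not read VAMPIR trace files: the event tuples `(sender, receiver, bytes)`, the function name, and the sample data are all hypothetical, chosen only to show the kind of post-mortem aggregation such a display performs.

```python
from collections import defaultdict


def pair_statistics(messages):
    """Aggregate message events into per (sender, receiver) statistics.

    messages : iterable of (sender_rank, receiver_rank, size_in_bytes)
    Returns a dict mapping (sender, receiver) to message count and total bytes.
    """
    stats = defaultdict(lambda: {"count": 0, "bytes": 0})
    for sender, receiver, size in messages:
        entry = stats[(sender, receiver)]
        entry["count"] += 1
        entry["bytes"] += size
    return dict(stats)


# Hypothetical trace excerpt for a four-process run.
events = [
    (0, 1, 800),
    (0, 1, 800),
    (1, 0, 64),
    (3, 2, 1024),
]
stats = pair_statistics(events)
for (sender, receiver), entry in sorted(stats.items()):
    print(f"{sender} -> {receiver}: {entry['count']} messages, {entry['bytes']} bytes")
```

A pair that dominates the byte total in such a table is a natural first place to look when the timeline shows processes waiting on communication.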


36

PARAVER (Parallel Program Visualization and Analysis Tool)

PARAVER is a flexible parallel program visualization and analysis tool based on an easy-to-use Motif GUI (graphical user interface).

PARAVER was developed to respond to the basic need to have a qualitative perception of the application behavior by visual inspection and then to be able to focus on the detailed quantitative analysis of the problems.

37

Paraver (Parallel Program Visualization and Analysis Tool)

Powerful, flexible parallel program visualization tool based on an easy-to-use Motif GUI (graphical user interface)

Developed by: European Center for Parallelism of Barcelona (CEPBA), Universitat Politècnica de Catalunya

http://www.cepba.upc.es/

38

Paraver is designed to visualize and analyze

- Communication and load balance
- Combining OpenMP and MPI
- Hardware performance and counters

Usage

- Compile programs with special libraries
- Run programs to produce trace files
- View and analyze traces

Designed to help in program understanding and optimization


47

Inefficient programming example

Load imbalance (inefficient memory use)

TLB (translation lookaside buffer) misses

48

Figure 6. Stage 1 – Processor 0 – Mesh Map

49

Figure 7. Stage 1 – Processor 3 – Mesh Map

50

Figure 8. Stage 1 – VAMPIR – Activity Chart

51

Figure 9. Stage 1 - PARAVER – Global Display

52

Figure 10. Stage 4 - VAMPIR – Activity Chart

53

Figure 11. Stage 4 - VAMPIR – Display Chart

54

Table 8. TLB misses.

Stage   Proc. 0   Proc. 3
1         9,464     7,870
4        12,210   208,341
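The contrast in Table 8 is easier to see as ratios. The short sketch below uses only the four counts from the table. The dictionary layout and the names `tlb_misses` and `growth_factor` are illustrative choices, not part of the original analysis.

```python
# TLB miss counts copied from Table 8, keyed by stage and then by processor.
tlb_misses = {
    1: {0: 9_464, 3: 7_870},
    4: {0: 12_210, 3: 208_341},
}


def growth_factor(processor, first_stage=1, last_stage=4):
    """Ratio of TLB misses at the last stage to those at the first stage."""
    return tlb_misses[last_stage][processor] / tlb_misses[first_stage][processor]


growth = {p: growth_factor(p) for p in (0, 3)}
# How many times more misses Processor 3 has than Processor 0 at Stage 4.
stage4_ratio = tlb_misses[4][3] / tlb_misses[4][0]

for p, g in growth.items():
    print(f"Processor {p}: Stage 4 / Stage 1 = {g:.2f}")
print(f"Stage 4, Processor 3 / Processor 0 = {stage4_ratio:.2f}")
```

Between Stage 1 and Stage 4 the misses on Processor 0 grow by a factor of about 1.3, while those on Processor 3 grow by a factor of about 26.5. At Stage 4, Processor 3 incurs about 17 times as many misses as Processor 0.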

55

Figure 12. Stage 4 – Processor 0 – Mesh Map

56

Figure 13. Stage 4 – Processor 3 – Mesh Map

57

Table 9. Stage 4 timing of the SOR module.

Processor   Time spent in SOR
0           0.3671
1           0.4068
2           0.6940
3           0.8393
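One way to quantify the load imbalance visible in Table 9 is to compare the slowest processor with the average. The sketch below does this with the four timings from the table. The metric names are illustrative, and the "idle fraction" assumes that every processor waits for the slowest one at the end of the SOR module, which the slides do not state explicitly.

```python
# SOR timings copied from Table 9, keyed by processor (units as in the table).
sor_times = {0: 0.3671, 1: 0.4068, 2: 0.6940, 3: 0.8393}


def imbalance_metrics(times):
    """Summarise how unevenly the SOR work is spread across processors."""
    values = list(times.values())
    mean_time = sum(values) / len(values)
    slowest = max(values)
    return {
        "mean": mean_time,
        "max_over_mean": slowest / mean_time,
        "max_over_min": slowest / min(values),
        # Assumed model: all processors wait for the slowest one to finish.
        "idle_fraction": 1.0 - mean_time / slowest,
    }


metrics = imbalance_metrics(sor_times)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

With these timings the slowest processor (3) takes about 1.46 times the mean and about 2.29 times as long as the fastest (0). Under the waiting assumption, roughly 31% of the available processor time in this module is spent idle.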

58

Figure 14. Stage 4 – VAMPIR – Activity Chart

59

Figure 15. Stage 4 – VAMPIR – Display Chart

60

Figure 16. Stage 4 – PARAVER – Global Display

61

Conclusions

A significant factor that affects the performance of a parallel application is the balance between communication and workload. The challenge of the message passing model is in reducing message traffic over the interconnection network. To fully understand the performance behavior of such applications, analysis and visualization tools are needed. Two such tools, VAMPIR and PARAVER, were used to analyze the performance of the seepage application. It was seen that optimization of the parallel code can be carried out in an iterative process involving these tools to investigate performance issues.

62

Web Sites

Project site: http://www.engineering.ucsb.edu/~hpscicom

San Diego Supercomputer Center: http://www.npaci.edu/Horizon/guide_linked/bh_tools_txt.html

VAMPIR: http://www.pallas.com

PARAVER: http://www.cepba.upc.es/