
Accepted Manuscript

Reordering of Hybrid Unstructured Grids for an Implicit Navier-Stokes Solver Based on OpenMP Parallelization

Meng CHENG, Gang WANG, Haris Hameed Mian

PII: S0045-7930(14)00189-3

DOI: http://dx.doi.org/10.1016/j.compfluid.2014.05.003

Reference: CAF 2551

To appear in: Computers & Fluids

Received Date: 15 November 2013

Revised Date: 1 April 2014

Accepted Date: 4 May 2014

Please cite this article as: CHENG, M., WANG, G., Mian, H.H., Reordering of Hybrid Unstructured Grids for an Implicit Navier-Stokes Solver Based on OpenMP Parallelization, Computers & Fluids (2014), doi: http://dx.doi.org/10.1016/j.compfluid.2014.05.003

This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.


Reordering of Hybrid Unstructured Grids for an Implicit Navier-Stokes Solver Based on OpenMP Parallelization

Meng CHENG1, Gang WANG1, Haris Hameed Mian1

1 National Key Laboratory of Science and Technology on Aerodynamic Design and Research, Northwestern Polytechnical University, Xi’an 710072, P.R. China

Corresponding Author: Gang WANG, Tel: 0086-029-88460753, Email: [email protected]

Abstract. Grid reordering is an efficient way to obtain better implicit convergence in viscous flow simulations based on unstructured grids. When parallel computations are performed on shared-memory machines, the convergence of the LU-SGS implicit scheme for high-Reynolds-number flows is degraded by the interfaces between the sub-domains created by OpenMP parallelization. In order to improve the compatibility between the OpenMP parallel environment and the implicit LU-SGS time-stepping scheme, a grid reordering method for unstructured hybrid grids is proposed. In this method, the structured-grid cells in the viscous layer near the body surface are reordered along the normal direction (in columns) and the unstructured part is reordered layer by layer according to the neighboring relations. To investigate the performance of the current implementation, turbulent flows around the RAE2822 airfoil, the NHLP-2D L1T2 multi-element airfoil configuration, the DLR-F6 wing-body-nacelle-pylon configuration and an aerospace plane have been simulated on unstructured hybrid grids. The numerical results show that the grid reordering method is an efficient and practical strategy for improving the convergence rate and the overall efficiency of parallel computations with an unstructured flow solver.

Keywords: unstructured hybrid grids; grid reordering; LU-SGS scheme; CFD; parallel computation

1 Introduction

With the fast development of computational fluid dynamics (CFD), research topics in this field have become more and more complex. As a result, the computational requirements have grown so rapidly that existing serial computing technology is far from being able to meet them. Parallel computing [1-3] has therefore become an inevitable choice for performing these extensive numerical computations. Open multi-processing (OpenMP) [4-6], which is based on the shared-memory platform, and the message passing interface (MPI) [7-9], based on the message-passing platform, are the two key standards usually adopted for parallelization. In calculations based on the MPI environment, the mesh is partitioned into several sub-domains and allocated to different processors, and the data at the interfaces is exchanged between neighboring sub-domains.

However, for OpenMP, which is based on the shared-memory platform, there is no need to partition the mesh geometrically or to transfer information between different sub-domains, which ensures load balancing, saves communication costs and is simpler to program. For complex geometries, generating a suitable domain decomposition is always a challenge for the MPI approach, whereas OpenMP does not have this problem. MPI is mostly used on distributed-memory systems, and the hybrid MPI/OpenMP approach [10, 11] is widely applied on hierarchical machine models, in which MPI handles communication across distributed-memory nodes and OpenMP provides fine-grained parallelization within a node. For shared-memory systems, however, OpenMP is more convenient and efficient than MPI.

At the same time, the increasing complexity of the geometries has driven the use and development of hybrid unstructured grid technology [12-14]. In unstructured hybrid grids, structured or semi-structured grid cells are utilized to resolve viscous boundary layers and unstructured-grid cells are employed elsewhere [15]. The use of hybrid grids combines the geometric flexibility offered by unstructured grids with the numerical accuracy of structured grids. This meshing technique offers the potential of attaining a balance between mesh quality, efficiency and flexibility.

Although unstructured grids are flexible in their use, a negative factor is also associated with them: the data storage of unstructured grids is essentially random, which has a negative impact on the convergence behavior of the computation.

To achieve a better cell order, grid reordering has been employed by many researchers and has proven to be effective. Löhner [16] discussed several reordering strategies leading to a minimization of cache misses and an optimal grouping of elements for different computer platforms. Martin et al. [17] used a linelet preconditioner for an implicit finite element solver to propagate information rapidly to the boundaries and obtain fast convergence rates. The lower-upper symmetric Gauss-Seidel (LU-SGS) time-marching method [18] is a very efficient method for structured grids and has also been applied to unstructured grids by several authors. Sharov et al. [19] proposed a grid reordering method which improves the balance between the lower and upper matrices. When performing large-scale parallel computations on shared-memory machines, the order of the cells (edges, for edge-based solvers) affects the task assignment and has a great influence on the parallel efficiency. Aubry et al. [20] presented reordering methods for vertex-centered discretizations which guarantee that nodes belonging to one thread are not accessed by other threads. Löhner [21] described renumbering techniques for shared-memory, cache-based parallel machines that avoid cache misses and cache-line overwrites and obtained good results. These studies show that the computational efficiency on shared-memory machines can be greatly improved by grid reordering.

The convergence of the LU-SGS implicit scheme for high-Reynolds-number flows is degraded by the interfaces between the sub-domains created by the OpenMP parallel environment. As the number of parallel threads increases, the interface region grows and its adverse effect on the convergence becomes more serious. For the cells in the boundary layer, which have high aspect ratios, the neighboring cells along the normal direction contribute most to the residuals and to the implicit system. If neighboring cells along the normal direction are assigned to different parallel threads, the convergence rate decreases significantly. To address this situation, we present a grid reordering method that avoids this side effect of parallelization.


The paper is organized as follows. Section 2 introduces the numerical methods used for the computations and presents the grid reordering method in detail. In Section 3, the technique is tested on various two- and three-dimensional aerodynamic configurations. The simulation results demonstrate the feasibility of the proposed method.

2 Numerical Methods

The proposed method has been implemented in the in-house flow solver HUNS3D [22], developed for simulating viscous flows on hybrid unstructured meshes.

2.1 Governing Equation

The integral form of the non-dimensionalized three-dimensional unsteady Reynolds-averaged Navier-Stokes (RANS) equations can be written as

\frac{\partial}{\partial t}\iiint_{\Omega} Q \,\mathrm{d}V + \iint_{\partial\Omega} F(Q)\cdot n \,\mathrm{d}S = \iint_{\partial\Omega} G(Q)\cdot n \,\mathrm{d}S \qquad (1)

where Ω is the control volume, ∂Ω is the boundary of the control volume, Q is the vector of conservative variables, F(Q) is the inviscid flux, and G(Q) on the right-hand side is the viscous flux. Using the cell-centred finite volume method, the semi-discrete form of Eq. (1) can be expressed as

\Omega_i \frac{\mathrm{d} Q_i}{\mathrm{d} t} = -R(Q_i) \qquad (2)

where Ω_i represents the volume of cell i, and the residual R(Q_i) is the summation of the inviscid and viscous flux terms over all faces of cell i.

2.2 LU-SGS Scheme

Equation (2) is a system of coupled ordinary differential equations in time. By using the backward Euler scheme for the implicit time integration, we obtain

\Omega_i \frac{\Delta Q_i^n}{\Delta t_i} = \Omega_i \frac{Q_i^{n+1} - Q_i^n}{\Delta t_i} = -R(Q_i^{n+1}) \qquad (3)

where the superscript n denotes the time level. Since Q^{n+1} is unknown at the current time level, the residual R(Q_i^{n+1}) cannot be evaluated directly. However, it can be linearized by a first-order Taylor expansion in the following way:

R(Q_i^{n+1}) \approx R(Q_i^n) + \sum_{j \in C(i)} \frac{\partial R(Q_i^n)}{\partial Q_j^n}\, \Delta Q_j^n \qquad (4)

where C(i) is the set consisting of cell i and its neighboring cells. After R(Q_i^{n+1}) in Eq. (3) is substituted by the linearization in Eq. (4), the following implicit system is obtained:

\frac{\Omega_i}{\Delta t_i}\, \Delta Q_i^n = -R(Q_i^n) - \sum_{j \in C(i)} \frac{\partial R(Q_i^n)}{\partial Q_j^n}\, \Delta Q_j^n \qquad (5)

The HUNS3D flow solver uses an improved LU-SGS scheme for solving the above equation system; details can be found in Ref. [23]. By using the LU-SGS scheme, the expression becomes

\Delta Q_i^{*} = D^{-1}\left[ -R(Q_i^{n}) - \frac{1}{2}\sum_{j:\, j<i}\left( \frac{\partial F(Q_j)}{\partial Q_j}\cdot n_{ij} - \lambda_{ij} I \right)\Delta Q_j^{*}\, S_{ij} \right], \quad i = 1, 2, \cdots, N \qquad (6)

\Delta Q_i^{n} = \Delta Q_i^{*} - D^{-1}\left[ \frac{1}{2}\sum_{j:\, j>i}\left( \frac{\partial F(Q_j)}{\partial Q_j}\cdot n_{ij} - \lambda_{ij} I \right)\Delta Q_j^{n}\, S_{ij} \right], \quad i = N, N-1, \cdots, 1 \qquad (7)

where j denotes a cell adjacent to cell i, λ_{ij} is the maximum eigenvalue of the flux Jacobian matrix on the cell face, and D is the diagonal matrix expressed as

D = \left( \frac{\Omega_i}{\Delta t} + \frac{1}{2}\sum_{\text{all faces}} \lambda_{ij}\, S_{ij} \right) I \qquad (8)

2.3 Grid Reordering Method

As seen from Eq. (6) and Eq. (7), the LU-SGS scheme requires two sweeps: a forward sweep through the cell numbers from 1 to N and a backward sweep in a reverse loop. In the forward (lower) sweep, the summation for cell i is over all surrounding cells whose number is less than i. The backward (upper) sweep sums over the surrounding cells whose number exceeds the current cell number. If some cells are surrounded only by cells whose numbers are all greater (or all smaller) than the current cell number, the local iterations degenerate from Gauss-Seidel iterations to Jacobi iterations [19]. In other words, the lower/upper balance of the method depends strongly on the grid numbering.
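To make the sweep structure concrete, the following C++ sketch shows how a forward and a backward pass traverse the cells in index order for a scalar model problem. The compressed adjacency layout, the array names and the scalar update are illustrative assumptions that stand in for the block system of Eqs. (6)-(8); this is not the HUNS3D implementation.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical compressed adjacency: the neighbors of cell i are
// nbr[off[i]] .. nbr[off[i+1]-1]; coef holds the corresponding
// off-diagonal coefficients and diag the diagonal of Eq. (8).
struct ImplicitSystem {
    std::vector<std::size_t> off, nbr;
    std::vector<double> coef, diag, rhs;   // rhs plays the role of -R(Q^n)
};

// One LU-SGS iteration (scalar analogue of Eqs. (6) and (7)).
void lusgs_sweep(const ImplicitSystem& s, std::vector<double>& dq) {
    const std::size_t n = s.diag.size();
    // Forward sweep: only neighbors with a smaller index contribute.
    for (std::size_t i = 0; i < n; ++i) {
        double sum = 0.0;
        for (std::size_t k = s.off[i]; k < s.off[i + 1]; ++k)
            if (s.nbr[k] < i) sum += s.coef[k] * dq[s.nbr[k]];
        dq[i] = (s.rhs[i] - sum) / s.diag[i];          // dq*  (lower pass)
    }
    // Backward sweep: only neighbors with a larger index contribute.
    for (std::size_t i = n; i-- > 0; ) {
        double sum = 0.0;
        for (std::size_t k = s.off[i]; k < s.off[i + 1]; ++k)
            if (s.nbr[k] > i) sum += s.coef[k] * dq[s.nbr[k]];
        dq[i] -= sum / s.diag[i];                      // dq^n (upper pass)
    }
}
```

In this sketch, a cell whose neighbors all carry larger (or all smaller) indices sees an empty sum in one of the two passes, which is exactly the degeneration to a Jacobi-type update mentioned above.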

In this work, we employ static scheduling in the parallel computation because of its lower scheduling overhead, fewer data races [24, 25] and fewer cells at the interfaces between different sub-domains compared with dynamic and guided scheduling. With static scheduling, the cells are divided evenly into M sub-domains according to the cell indices and allocated to M threads. In accordance with this assignment pattern, the grid reordering method is designed so that the cells handled by the same thread are more spatially clustered, which reduces the number of cells at the interfaces. This reduction minimizes the undesirable effect of parallelization on the LU-SGS scheme and, at the same time, limits data races to a certain degree. It is an effective way to accelerate the convergence.
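The following C++ fragment illustrates this assignment under the assumption that schedule(static) hands each thread one contiguous block of cell indices. The function and variable names are hypothetical and serve only to quantify the "cells at the interfaces" that the reordering tries to reduce.

```cpp
#include <cstddef>
#include <utility>
#include <vector>
#include <omp.h>

// Model of the block that schedule(static) gives to each thread: cell i
// belongs to thread i / ceil(n_cells / n_threads).
std::size_t owner(std::size_t i, std::size_t n_cells, std::size_t n_threads) {
    std::size_t chunk = (n_cells + n_threads - 1) / n_threads;
    return i / chunk;
}

// Count faces whose two adjacent cells belong to different threads; the
// reordering aims to keep this interface count small.
std::size_t count_interface_faces(
        const std::vector<std::pair<std::size_t, std::size_t>>& faces,
        std::size_t n_cells, std::size_t n_threads) {
    std::size_t cut = 0;
    #pragma omp parallel for schedule(static) reduction(+ : cut)
    for (long k = 0; k < static_cast<long>(faces.size()); ++k)
        if (owner(faces[k].first,  n_cells, n_threads) !=
            owner(faces[k].second, n_cells, n_threads))
            ++cut;
    return cut;
}
```

Because the sub-domains are defined purely by the cell numbering, any reordering that brings spatially neighboring cells closer together in index directly reduces this cut count.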

In the viscosity-dominated boundary-layer regions of high-Reynolds-number flows, the gradients of most flow parameters (except pressure) along the normal direction are far greater than those along the streamwise direction. Efficient computation of the boundary layer requires structured or semi-structured grid cells of high aspect ratio to resolve the flow. For high-aspect-ratio cells, the fluxes on the faces parallel to the streamwise direction play the decisive role in the residuals when the LU-SGS implicit method is used. Therefore, making cell faces parallel to the streamwise direction the interface between parallel sub-domains must be avoided in the boundary-layer region.

In addition, in an unstructured grid the neighboring cells often have widely different indices. In this situation, accessing the data of neighboring cells is slow because of poor memory locality, and the computation speed decreases. Guaranteeing that neighboring cells have close indices therefore enhances the computation rate.
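One simple way to quantify the index locality argued for above (this metric is our illustration, not taken from the paper) is the average and maximum index distance between adjacent cells over the face list:

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Average and maximum index distance between adjacent cells; smaller
// values mean neighboring cells sit closer together in memory.
std::pair<double, std::size_t> index_distance(
        const std::vector<std::pair<std::size_t, std::size_t>>& faces) {
    std::size_t max_d = 0;
    double sum_d = 0.0;
    for (const auto& f : faces) {
        std::size_t d = f.first > f.second ? f.first - f.second
                                           : f.second - f.first;
        if (d > max_d) max_d = d;
        sum_d += static_cast<double>(d);
    }
    double avg = faces.empty() ? 0.0 : sum_d / static_cast<double>(faces.size());
    return {avg, max_d};
}
```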

Based on the above reasons, the cell ordering, and hence the quality of the implicit matrix, can be improved by the following grid reordering method.

The group of all the cells on the body surface is defined as the first layer, the cells adjacent to the first layer as the second layer, and so on. The proposed grid reordering method is detailed as follows (an illustrative code sketch is given after the worked example below):

1. Pre-reorder the cells in the first layer: for two-dimensional grids, number the cells around the surface one by one in circles. For three-dimensional grids, take any cell as the first one and number the cells outward in circles according to the adjacency relations until all the cells in the first layer are numbered.

2. Reorder the structured-grid cells: number the structured-grid cells along the normal direction (in columns), following the order formed in the first step, until all the structured-grid cells are numbered. For a structured-grid cell (prism or hexahedron), its neighboring cell in the layer above is the cell in the normal direction. When the upper cell is an unstructured one (pyramid or tetrahedron), the numbering of that column is finished.

3. Reorder the unstructured-grid cells: number the unstructured-grid cells that have not yet been numbered outward, layer by layer, according to the neighboring relations until all the unstructured-grid cells are numbered.

Consider the two-dimensional local grid shown in Fig. 1(a) as an example of the above process. Numbering the cells clockwise (or anticlockwise) starting from cell A, the order of the cells in the first layer is: A, B, C, D, E, F. Then the structured-grid cells are numbered along the normal direction, column by column. For instance, the cell adjacent to A in the layer above is G, and the cell adjacent to G in the layer above is O, an unstructured-grid cell. Accordingly, the column based on A is finished and the procedure returns to cell B in the first layer, and so on. After all the structured-grid cells have been reordered, their new order is: A, G, B, H, C, I, D, J, E, K, F, L. Finally, the unstructured-grid cells are numbered layer by layer according to their neighboring relations. The new order of the cells in the third and fourth layers is: O, Q, T, V, X, Z, N, P, R, S, U, W, Y, M. (The order of the cells in the third and fourth layers depends on the order of the cells in the lower layers, which is not described here.)

The new cell numbers after reordering are shown in Fig. 1(b).
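The following C++ sketch outlines steps 2 and 3 of the procedure under simplifying assumptions: the mesh is given as a cell-adjacency list, a flag marks the structured (prism/hexahedral) cells, the first layer is already pre-ordered as in step 1, and a "normal neighbor" index is available for each structured cell. It is an illustrative reconstruction of the method described above, not the HUNS3D implementation.

```cpp
#include <cstddef>
#include <queue>
#include <vector>

struct HybridMesh {
    std::vector<std::vector<std::size_t>> adj;  // cell-to-cell adjacency
    std::vector<bool> structured;               // prism/hexahedral cell?
    std::vector<std::size_t> first_layer;       // wall cells, pre-ordered
                                                //   in circles (step 1)
    std::vector<long> normal_nbr;               // neighbor in the normal
                                                //   direction, -1 if none
};

// Returns the old-to-new permutation: new_index[old] = new cell number.
std::vector<std::size_t> reorder(const HybridMesh& m) {
    const std::size_t n = m.adj.size();
    std::vector<bool> done(n, false);
    std::vector<std::size_t> order;             // cells in the new order
    order.reserve(n);

    // Step 2: walk each column of structured cells along the normal
    // direction, starting from the pre-ordered first layer; a column
    // ends when an unstructured cell is reached.
    for (std::size_t c : m.first_layer) {
        long cur = static_cast<long>(c);
        while (cur >= 0 && m.structured[cur] && !done[cur]) {
            done[cur] = true;
            order.push_back(static_cast<std::size_t>(cur));
            cur = m.normal_nbr[cur];
        }
    }

    // Step 3: number the remaining (unstructured) cells layer by layer,
    // i.e. a breadth-first sweep seeded by the already numbered cells.
    std::queue<std::size_t> frontier;
    for (std::size_t c : order) frontier.push(c);
    while (!frontier.empty()) {
        std::size_t c = frontier.front(); frontier.pop();
        for (std::size_t nb : m.adj[c])
            if (!done[nb]) {
                done[nb] = true;
                order.push_back(nb);
                frontier.push(nb);
            }
    }

    std::vector<std::size_t> new_index(n);
    for (std::size_t k = 0; k < order.size(); ++k) new_index[order[k]] = k;
    return new_index;
}
```

Because the new index is assigned in the order the cells are visited, cells in the same column, and later cells in the same layer, receive consecutive numbers, which is what keeps the static OpenMP sub-domains spatially compact.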


3 Test Cases and Results

Four typical test cases are selected to illustrate the computational efficiency of the proposed grid reordering method. These test cases include the transonic viscous flow around the RAE2822 airfoil, the subsonic viscous flow around the NHLP-2D L1T2 high-lift multi-element airfoil configuration, the transonic viscous flow around the DLR-F6 wing-body-nacelle-pylon (WBNP) configuration and the hypersonic viscous flow around an aerospace plane. The original meshes are generated by the advancing layer method (ALM) [26] and the advancing front method (AFM) [27]. All the test cases are run on a Linux system with a 3.4 GHz Intel Core i7 processor and 16 GB of RAM. The source code of the HUNS3D solver is compiled with GFortran (version 4.4.6, Red Hat 4.4.6-3). It takes less than a second to reorder a 2D mesh of 14 thousand cells and about 20 seconds to reorder a 3D mesh of 5.53 million cells. For the test cases in this work, the grid reordering has little influence on the calculation time of a single step, because the ALM- and AFM-based mesh generation already gives inherently close numbers to neighboring cells in the original mesh.

3.1 RAE2822 Airfoil

The RAE2822 is a supercritical airfoil [28], which has become a standard test case for turbulence modeling validation. The computational mesh used in this case consists of 5632 structured-grid cells surrounding the airfoil surface and 7788 unstructured-grid cells elsewhere. The free-stream conditions are specified as follows: Ma = 0.734, Re = 6.5×10^6, α = 2.8°.

The grid is divided into eight sub-domains according to the task assignment principle of the OpenMP-based parallel computation, as marked with different colors in Fig. 2.

As illustrated in Fig. 2(a), some dark blue cells are scattered within other sub-domains. After grid reordering, the boundaries between the different sub-domains are very clear and there is no longer any crossing between sub-domains. The cells in the same sub-domain are more spatially clustered.

Fig. 3 shows the local grid in the viscous layer near the body surface. Before grid reordering, the parallel sub-domains are distributed like layers because of the streamwise ordering of the original mesh. After grid reordering, the sub-domains appear as columns because of the normal-direction reordering. The grid reordering method thus successfully prevents cell faces parallel to the streamwise direction from becoming interfaces between parallel sub-domains in the boundary-layer region.

The convergence history of the maximal residual in terms of the number of iterations, with and without grid reordering, is shown in Fig. 4(a). The computations are performed using sixteen parallel threads. As shown in the figure, the convergence of the maximal residual after grid reordering is noticeably faster: the computations require approximately 30% fewer iterations to attain a 5.5-order-of-magnitude reduction of the residual. Fig. 4(b) shows a comparison of the computed surface pressure coefficients with the experimental data. The computed pressure distributions before and after reordering are in good agreement with each other, which indicates that the grid reordering method does not influence the computational results.

The effect of the grid reordering method on the convergence behavior for different thread numbers has been investigated by varying the thread number from 1 to 32. The convergence histories before and after grid reordering are shown in Fig. 5.

Fig. 5(a) presents the convergence histories on the mesh without grid reordering. It illustrates that as the thread number increases, the convergence in terms of iterations becomes slower. However, as shown in Fig. 5(b), the convergence histories for different thread numbers are nearly identical after grid reordering. The presented grid reordering method therefore largely prevents the degradation of the convergence performance as the thread number increases.

The speedup ratios before and after grid reordering are presented in Table 1. The computation time is for 100 iterations. As the thread number increases, the computation speed improves accordingly, and the speedup ratios before and after grid reordering show little difference.

Table 1. Comparison of speedup ratios before and after grid reordering

Thread number   Computation time (s)             Speedup ratio
                Not reordered    Reordered       Not reordered    Reordered
Sequential      21.4             21.3            -                -
2               11.3             11.2            1.89             1.90
4               6.0              5.8             3.57             3.67
8               3.1              3.1             6.90             6.87

3.2 NHLP-2D L1T2 Airfoil

The second test case is the turbulent flow over the NHLP-2D L1T2 high-lift multi-element airfoil configuration [29]. The flow conditions used in the simulation are Ma = 0.197, Re = 3.52×10^6, α = 20.18°. The computational mesh consists of 25908 structured-grid cells and 37367 unstructured-grid cells.

The grid is divided into eight sub-domains, as shown in Fig. 6. As seen in the figure, the partition in the viscous layer near the body surfaces is greatly different after grid reordering. Fig. 7 shows the local details in the boundary layer: before grid reordering the parallel sub-domains form layers, whereas after grid reordering the sub-domains are distributed like columns.

Fig. 8(a) compares the residual convergence histories for the sequential computation and for twelve threads, both before and after grid reordering. As indicated previously, without grid reordering the residual convergence rate slows down compared with the sequential computation, but after grid reordering the residual convergence trend is similar to the sequential one. The computations require about 17% fewer iterations to reach an eight-order-of-magnitude reduction of the residuals after grid reordering. Since the original mesh is generated layer by layer by the ALM and AFM, and the cell indices follow the same pattern, the lower/upper balance of the LU-SGS scheme is already good; hence the grid reordering has little effect on the sequential computations. Fig. 8(b) shows a comparison of the computed surface pressure coefficients with the experimental data. Good agreement is found between the computed pressure distributions before and after reordering.


3.3 DLR-F6 WBNP Configuration

In 1990, experimental measurements were made for the DLR-F6 WBNP configuration [30] in the ONERA S2MA wind tunnel. This transport configuration was selected as a test case by the Second AIAA CFD Drag Prediction Workshop. The mesh consists of 3146176 structured-grid cells and 2385963 unstructured-grid cells.

The free-stream conditions used to perform the computations are defined as follows: Ma = 0.75, Re = 3.0×10^6, α = 1°.

Fig. 9 shows the parallel sub-domains. It can be seen that in the boundary layer the sub-domains before grid reordering form layers around the surface, whereas after grid reordering they appear as columns, which guarantees that faces parallel to the streamwise direction do not become interfaces between sub-domains in the boundary-layer region.

Fig. 10(a) shows the convergence histories of the maximal residual. The computations use eight threads. It can be seen that the convergence after grid reordering is improved; for this case the computations require approximately 19% fewer iterations to reach a six-order-of-magnitude reduction of the residuals. The pressure coefficient plot for the wing section at η = 0.239 is shown in Fig. 10(b). The computed pressure coefficients before and after reordering are consistent with each other.

A comparison of the pressure coefficient contours with and without grid reordering is shown in Fig. 11. The results indicate that the grid reordering improves the computational efficiency without affecting the numerical accuracy.

3.4 Aerospace Plane

This experimental configuration has been used by Li [31] to study the typical features of hypersonic flows. Here it is selected to demonstrate the effectiveness of grid reordering for computations performed in the hypersonic flow regime. The three-dimensional mesh consists of 1064814 structured-grid cells and 780891 unstructured-grid cells. The free-stream conditions used to simulate the test case are as follows: Ma = 8.02, Re = 1.34×10^7, α = 0°.

Fig. 12 shows the parallel sub-domains before and after grid reordering. The sub-domains in the boundary layer are distributed like layers before grid reordering and like columns after grid reordering.

The comparison of the residual convergence with and without grid reordering is shown in Fig. 13(a). The computations are performed using eight parallel threads. As illustrated in the figure, the maximal residual without grid reordering stalls after 5,000 iterations, whereas it continues to converge with grid reordering. This demonstrates that the convergence property has been significantly improved. The pressure coefficients on the symmetry plane are shown in Fig. 13(b). The computed pressure coefficients before and after reordering are both in good agreement with the experimental data. Fig. 14 shows the pressure and temperature distributions in the flow field.


Conclusions

This paper presents a grid reordering technique for hybrid unstructured grids. The proposed method improves the convergence of the computation by improving the compatibility between the OpenMP parallel environment and the implicit LU-SGS time-stepping scheme. Numerical computations have been performed for different two- and three-dimensional aerodynamic configurations, including the RAE2822 airfoil, the NHLP-2D L1T2 high-lift multi-element airfoil, the DLR-F6 WBNP configuration and an aerospace plane. The results indicate that the presented reordering strategy effectively improves the computational efficiency of the flow solver in the shared-memory parallel environment while maintaining the computational accuracy.

Acknowledgements. The research was supported by the NPU Foundation of Fundamental Research (NPU-FFR-JC201212) and the Advanced Research Foundation of the Commercial Aircraft Corporation of China (COMAC). The authors thankfully acknowledge these institutions.

References

[1] Gropp WD, Kaushik DK, Keyes DE, Smith BF. High-performance parallel implicit CFD. Parallel Computing. 2001;27:337-62.
[2] Moureau V, Domingo P, Vervisch L. Design of a massively parallel CFD code for complex geometries. Comptes Rendus Mécanique. 2011;339:141-8.
[3] Barney B. Introduction to parallel computing. Lawrence Livermore National Laboratory. 2010;6:10.
[4] Dagum L, Menon R. OpenMP: an industry standard API for shared-memory programming. IEEE Computational Science & Engineering. 1998;5:46-55.
[5] Hoeflinger J, Alavilli P, Jackson T, Kuhn B. Producing scalable performance with OpenMP: experiments with two CFD applications. Parallel Computing. 2001;27:391-413.
[6] Bessonov O. OpenMP parallelization of a CFD code for multicore computers: analysis and comparison. In: Parallel Computing Technologies. Springer; 2011. p. 13-22.
[7] Gropp W, Lusk E, Doss N, Skjellum A. A high-performance, portable implementation of the MPI message passing interface standard. Parallel Computing. 1996;22:789-828.
[8] Gabriel E, Fagg GE, Bosilca G, Angskun T, Dongarra JJ, Squyres JM, et al. Open MPI: goals, concept, and design of a next generation MPI implementation. In: Recent Advances in Parallel Virtual Machine and Message Passing Interface. Springer; 2004. p. 97-104.
[9] Appleyard J, Drikakis D. Higher-order CFD and interface tracking methods on highly-parallel MPI and GPU systems. Computers & Fluids. 2011;46:101-5.
[10] Yakubov S, Cankurt B, Abdel-Maksoud M, Rung T. Hybrid MPI/OpenMP parallelization of an Euler–Lagrange approach to cavitation modelling. Computers & Fluids. 2013;80:365-71.
[11] Gorobets A, Trias F, Oliva A. A parallel MPI+OpenMP+OpenCL algorithm for hybrid supercomputations of incompressible flows. Computers & Fluids. 2013;88:764-72.
[12] Nakahashi K, Sharov D, Kano S, Kodera M. Applications of unstructured hybrid grid method to high Reynolds number viscous flows. International Journal for Numerical Methods in Fluids. 1999;31:97-111.
[13] Wang Y, Murgie S. Hybrid mesh generation for viscous flow simulation. In: Proceedings of the 15th International Meshing Roundtable. Springer; 2006. p. 109-26.
[14] Wang G, Ye Z, Z Y. Element type unstructured grid generation and its application to viscous flow simulation. In: 24th International Congress of Aeronautical Sciences; 2004.
[15] Soetrisno M, Imlay ST, Roberts DW, Taflin DE. Development of a 3-D zonal implicit procedure for hybrid structured-unstructured grids. AIAA Paper 96-0167; 1996.
[16] Löhner R. Some useful renumbering strategies for unstructured grids. International Journal for Numerical Methods in Engineering. 1993;36:3259-70.
[17] Martin D, Löhner R. An implicit linelet-based solver for incompressible flows. AIAA Paper 92-0668; 1992.
[18] Yoon S, Jameson A. An LU-SSOR scheme for the Euler and Navier-Stokes equations. In: 25th AIAA Aerospace Sciences Meeting; 1987.
[19] Sharov D, Nakahashi K. Reordering of 3-D hybrid unstructured grids for vectorized LU-SGS Navier-Stokes computations. AIAA Paper 97-2102; 1997.
[20] Aubry R, Houzeaux G, Vazquez M, Cela J. Some useful strategies for unstructured edge-based solvers on shared memory machines. International Journal for Numerical Methods in Engineering. 2011;85:537-61.
[21] Löhner R. Renumbering strategies for unstructured-grid solvers operating on shared-memory, cache-based parallel machines. Computer Methods in Applied Mechanics and Engineering. 1998;163:95-109.
[22] Mian HH, Wang G, Raza MA. Application and validation of HUNS3D flow solver for aerodynamic drag prediction cases. In: 10th International Bhurban Conference on Applied Sciences and Technology (IBCAST). IEEE; 2013. p. 209-18.
[23] Wang G, Jiang Y, Ye Z. An improved LU-SGS implicit scheme for high Reynolds number flow computations on hybrid unstructured mesh. Chinese Journal of Aeronautics. 2012;25:33-41.
[24] Chapman B, Jost G, Van der Pas R. Using OpenMP: Portable Shared Memory Parallel Programming. MIT Press; 2008.
[25] Sato Y, Hino T, Ohashi K. Parallelization of an unstructured Navier–Stokes solver using a multi-color ordering method for OpenMP. Computers & Fluids. 2013;88:496-509.
[26] Pirzadeh S. Recent progress in unstructured grid generation. AIAA Paper 92-0445; 1992.
[27] Pirzadeh S. Structured background grids for generation of unstructured grids by advancing-front method. AIAA Journal. 1993;31:257-65.
[28] Cook P, Firmin M, McDonald M. Aerofoil RAE 2822: pressure distributions, and boundary layer and wake measurements. RAE; 1977.
[29] Moir I. Measurements on a two-dimensional aerofoil with high-lift devices. AGARD Advisory Report 303, Advisory Group for Aerospace Research & Development, Neuilly-sur-Seine; 1994. Test case A2.
[30] http://aaac.larc.nasa.gov/tsab/cfdlarc/aiaa-dpw/Workshop2/
[31] Li S. Hypersonic Flow over Typical Geometries (in Chinese). National Defense Industry Press; 2007.


Figure Captions

Fig. 1. The grid (a) before and (b) after grid reordering

Fig. 2. The parallel sub-domains on the RAE2822 grid (a) before and (b) after grid reordering

Fig. 3. The grid in the viscous layer near the body surface of the RAE2822 airfoil (a) before and (b) after grid reordering

Fig. 4. Computational results for the RAE2822 airfoil: (a) residual convergence history; (b) comparison of the computed surface pressure coefficient with experimental data

Fig. 5. Convergence histories for different numbers of parallel threads (a) before and (b) after grid reordering

Fig. 6. The parallel sub-domains on the NHLP-2D L1T2 grid (a) before and (b) after grid reordering

Fig. 7. The grid in the viscous layer near the body surface of the NHLP-2D L1T2 configuration (a) before and (b) after grid reordering

Fig. 8. Computational results for the NHLP-2D L1T2 airfoil: (a) residual convergence history; (b) comparison of the computed surface pressure coefficient with experimental data

Fig. 9. The parallel sub-domains on the DLR-F6 WBNP configuration (a) before and (b) after grid reordering

Fig. 10. Computational results for the DLR-F6 WBNP configuration: (a) residual convergence history; (b) comparison of the computed surface pressure coefficient with experimental data

Fig. 11. Pressure coefficient contour plot for the DLR-F6 WBNP configuration

Fig. 12. The parallel sub-domains on the aerospace plane (a) before and (b) after grid reordering

Fig. 13. Computational results for the aerospace plane model: (a) residual convergence history; (b) comparison of the computed surface pressure coefficient with experimental data

Fig. 14. Computational results for the aerospace plane model: (a) pressure distribution; (b) temperature distribution
