List of papers - DiVA portal544511/FULLTEXT01.pdf · First, the continuous equations (the PDEs)...


List of papers

This thesis is based on the following papers, which are referred to in the text by their Roman numerals.

I M. Neytcheva, M. Do-Quang and X. He. Element-by-element Schur complement approximations for general nonsymmetric matrices of two-by-two block form. Springer Lecture Notes in Computer Science (LNCS), 5910/2010, 2010.

II X. He, M. Neytcheva and S. Serra Capizzano. On an augmented Lagrangian-based preconditioning of Oseen type problems. BIT-Numerical Mathematics, 51:865-888, 2011.

III X. He and M. Neytcheva. Preconditioning the incompressible Navier-Stokes equations with variable viscosity. J. Comput. Math., in press, 2012.

IV X. He, Marcus Holm and M. Neytcheva. Efficient implementations of the inverse Sherman-Morrison algorithm. Submitted to the conference proceedings of the PARA’2012 conference.

V X. He and M. Neytcheva. On preconditioning incompressible non-Newtonian flow problems. The Department of Information Technology, Uppsala university, technical report, 2012-16.

VI O. Axelsson, X. He and M. Neytcheva. Numerical solution of the time-dependent Navier-Stokes equations for variable density variable viscosity. The Department of Information Technology, Uppsala university, technical report, 2012-19.

Reprints were made with permission from the publishers.


Contents

1 Introduction

2 Incompressible Navier-Stokes equations
  2.1 Introduction
  2.2 Weak formulation and linearization
  2.3 Preconditioning of the two-by-two block systems
    2.3.1 Element-by-element approximation of a Schur complement matrix
    2.3.2 Augmented Lagrangian method
    2.3.3 Fast solutions with the modified pivot block in the augmented Lagrangian method

3 Incompressible Navier-Stokes equations with variable viscosity
  3.1 Introduction
  3.2 Effect of the variable viscosity on the augmented Lagrangian method

4 Incompressible Navier-Stokes equations with variable viscosity and density
  4.1 Introduction
  4.2 Reformulation of the coupled system
  4.3 Discretization in time, operator splitting scheme and linearization
  4.4 Preconditioning techniques
  4.5 Coupling with the phase-field model to solve the multi-phase flow problems

5 Computational challenges and some open problems to be addressed in future research
  5.1 Efficient solutions of the modified pivot block in the augmented Lagrangian method
  5.2 Element-by-element Schur complement approximation method
  5.3 Adaptive mesh refinement
  5.4 Stable numerical schemes with higher order of accuracy in time

6 Summary of papers
  6.1 Paper I
  6.2 Paper II
  6.3 Paper III
  6.4 Paper IV
  6.5 Paper V
  6.6 Paper VI

7 Summary in Swedish

8 Acknowledgements

References


1. Introduction

Computational fluid dynamics (CFD) is an important branch of fluid mechanics and computational mathematics. Numerical simulations are becoming increasingly indispensable in modern research, not only because traditional laboratory experiments are costly, but also because numerical simulations enable us to model processes that cannot be tested experimentally, and extend our capability to reproduce physical phenomena in order to obtain deeper insight into the underlying processes and their interactions.

The incompressible Navier-Stokes (N-S) equations are the governing equations for incompressible flows and are derived from general physical laws, such as conservation of mass and momentum. The dynamics of the physical process is described by a mathematical model, which consists of a set of coupled nonlinear partial differential equations (PDEs). These equations, in turn, depend on various problem parameters, such as density and viscosity, that themselves may vary in time and space, exhibit discontinuities as in multiphase systems, and take critical values. Furthermore, the equations included in the coupled system are of different types (elliptic, parabolic or hyperbolic), which adds to the complexity of simulating them numerically.

Given the mathematical model, computer simulations consist of three closely related tasks. First, the continuous equations (the PDEs) have to be discretized appropriately and in a stable manner, guaranteeing sufficiently small and uniformly bounded discretization errors in time and space. Second, the nonlinearities have to be handled. Due to the very high complexity of the original model equations, it is in many cases not possible to solve them as a whole coupled system, and the so-called splitting techniques come into play, entailing the necessity to estimate the related splitting error and to balance it with the discretization errors. Third, suitable numerical solution methods have to be chosen for the arising nonlinear or linearized algebraic systems of equations. The time integration, sometimes over long time intervals, the large, even huge size of the linear systems, the dependence on problem parameters, etc., impose strong requirements on those solution methods: they must be robust with respect to the parameters involved, be of optimal order of computational complexity and, last but not least, use the available computer resources to the full extent. The latter creates a tight connection between the choice of the numerical methods and their efficient implementation on today's complex and hierarchical computer architectures. The dynamic evolution of computer architectures and the very demanding fluid flow simulations suggest that, when choosing or designing the solution methods, it might be profitable to utilize readily available computational kernels and toolboxes, which have been shown to be numerically efficient and are highly optimized for the computer platform used for the simulations.

The focus of this work is on numerical solution methods for the linearized N-S equations, targeting iterative methods and preconditioners. We adopt the following strategy. As a first step, we consider the stationary N-S equations with constant coefficients (in this case, density and viscosity) and efficient preconditioned iterative methods for those. Next, we consider the stationary N-S equations with variable viscosity and study the applicability of the solution methods in that case. Finally, we consider the N-S equations in their full complexity, namely, when both density and viscosity vary in space and time, and when the N-S equations have to be coupled to some additional PDEs in order to properly describe the physics of the processes. Across these stages we aim to show that efficient computational kernels for the simplest setting remain a method of choice for the most involved setting.

In general, after discretization and linearization, the original nonlinear problem is converted into a sequence of linear problems to be solved. These are linear systems of equations of the form

\[
\mathcal{A} x = b
\]

with a coefficient matrix $\mathcal{A}$ of large dimension. Because of the underlying mathematical model, the matrix $\mathcal{A}$ is indefinite and nonsymmetric, of two-by-two block structure. Sparsity is also an important property of the coefficient matrix. There are two classes of solution strategies for this linear system, namely, direct methods and iterative methods. Although direct methods are robust, reliable and relatively well parallelizable, for large scale problems they are not feasible due to their high requirements on memory storage and unacceptably long computing time.

In this work we focus on the Krylov subspace iterative methods [1, 65], which are computationally cheaper than the direct methods because they mainly involve matrix-vector operations with sparse matrices. In order to accelerate the convergence rate of the iterative methods, efficient preconditioning techniques become essential. A preconditioner, denoted here by $\mathcal{P}$, is in general a linear operator, defined explicitly as a matrix or implicitly as a procedure, that transforms the above linear system into an equivalent one of the form

\[
\mathcal{P}^{-1}\mathcal{A} x = \mathcal{P}^{-1} b.
\]

When using preconditioning, the main aim is to define $\mathcal{P}$ such that the transformed matrix $\mathcal{P}^{-1}\mathcal{A}$ has more favourable properties than $\mathcal{A}$ itself. In general, we would like $\mathcal{P}^{-1}\mathcal{A}$ to act similarly to the identity operator, which would result in fast convergence of the iterative method. Ideally, $\mathcal{P}$ should be as close to $\mathcal{A}$ as possible, but at the same time $\mathcal{P} = \mathcal{A}$ is not a realistic choice, since it leads back to the complexity of the original problem. Thus, when constructing a preconditioner we seek a balance: the preconditioner should conserve some important properties of the original system matrix but allow for a much more efficient solution procedure. How to estimate the efficiency of a preconditioner depends on the properties of $\mathcal{A}$ (symmetry, nonsymmetry, positive definiteness, etc.). Generally speaking, a spectrum of $\mathcal{P}^{-1}\mathcal{A}$ that is contained in one or a few clusters away from zero results in fast convergence. An efficient preconditioner should have the following properties. Constructing the preconditioner and solving systems with it should be computationally cheap and parallelizable. The preconditioned system should be much better conditioned than the original system itself, to yield fast convergence. Finally, the preconditioner should be robust with respect to all parameters involved: problem parameters (such as material coefficients), discretization parameters in space and time, and method parameters (if any).
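The effect of a preconditioner on conditioning and on the clustering of the spectrum can be seen on a very small example. The following is a minimal sketch and not a method from this thesis: it assumes a hypothetical symmetric test matrix with a strongly varying diagonal and uses the simple diagonal (Jacobi) preconditioner, chosen only because its action is trivial to write down.

```python
import numpy as np

def jacobi_preconditioned(A):
    """Return P^{-1} A for the diagonal (Jacobi) preconditioner P = diag(A)."""
    d = np.diag(A)
    return A / d[:, None]  # row i of A is divided by A[i, i]

# Hypothetical test matrix (an assumption for illustration only):
# diagonal entries ranging from 1 to 1e4 plus weak nearest-neighbour coupling.
n = 50
A = np.diag(np.logspace(0, 4, n))
idx = np.arange(n - 1)
A[idx, idx + 1] = 0.1
A[idx + 1, idx] = 0.1

PinvA = jacobi_preconditioned(A)

cond_A = np.linalg.cond(A)          # 2-norm condition number of A
cond_PinvA = np.linalg.cond(PinvA)  # 2-norm condition number of P^{-1} A
eigs = np.linalg.eigvals(PinvA)     # spectrum of the preconditioned matrix

print(f"cond(A)       = {cond_A:.2e}")
print(f"cond(P^-1 A)  = {cond_PinvA:.2f}")
print(f"max |lambda - 1| = {np.max(np.abs(eigs - 1.0)):.3f}")
```

For this matrix, Gershgorin's theorem places every eigenvalue of the preconditioned matrix within 0.2 of one, that is, in a single cluster away from zero. The saddle point matrices considered in this work need far more elaborate preconditioners, but the goal is the same.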

The general structure of the matrix $\mathcal{A}$ that arises when solving the N-S problems is of two-by-two block form,

\[
\mathcal{A} =
\begin{bmatrix}
A & B^T \\
C & -D
\end{bmatrix}.
\]

The matrix is indefinite. The main pivot block $A$ might be symmetric or nonsymmetric, singular or nonsingular, and the block $D$ may be zero. The blocks $B$ and $C$ may be of full or lower rank; quite often $C \equiv B$.

Efficient preconditioners for the matrices arising from Navier-Stokes equations with constant density and viscosity have been studied intensively during the past decades, see the survey papers [3, 17, 62], the book [31] and the references therein. A class of widely used preconditioners is derived from an exact block-factorization of the matrix $\mathcal{A}$, followed by approximations of $A$ and the Schur complement $S = -D - C A^{-1} B^T$, see the papers [5, 7, 8, 17] and the numerous references therein. A major prerequisite for the efficiency of such preconditioners is to find high quality approximations of $A$ and $S$, or of their inverses. The approximation of $A$ can be implicitly defined as an inner iterative solution method with a proper stopping tolerance. Compared to approximating the pivot block $A$, finding efficient approximations for the Schur complement is much more difficult, because $S$ is in general a dense matrix. Forming $S$ explicitly and solving systems with it are computationally heavy tasks. In Papers I, II and IV, we contribute to the search for efficient preconditioners of the Schur complement by thoroughly analyzing and testing the element-by-element Schur complement approximation method [6, 45, 51] and the so-called augmented Lagrangian method [5, 18, 20, 33].
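The statement that the Schur complement is in general dense, even when all blocks are sparse, can be checked on a toy problem. The sketch below assumes hypothetical blocks: a tridiagonal pivot block, a bidiagonal difference operator for $B$, $C = B$ and $D = 0$. These are not matrices from a Navier-Stokes discretization; they only mimic the sparsity pattern.

```python
import numpy as np

n = 20

# Hypothetical sparse pivot block: tridiagonal, 3n - 2 nonzero entries.
A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)

# Hypothetical sparse off-diagonal block with n - 1 rows:
# row i has -1 in column i and +1 in column i + 1.
B = (np.eye(n, k=1) - np.eye(n))[: n - 1, :]
C = B                               # the common case C = B
D = np.zeros((n - 1, n - 1))        # zero (2,2) block

# Schur complement S = -D - C A^{-1} B^T, formed explicitly (small n only).
S = -D - C @ np.linalg.solve(A, B.T)

nnz_A = np.count_nonzero(A)
nnz_S = np.count_nonzero(np.abs(S) > 1e-12)

print(f"nonzeros in A: {nnz_A} of {A.size}")
print(f"nonzeros in S: {nnz_S} of {S.size}")
```

Here every entry of $S$ is nonzero although $A$ and $B$ have at most three and two nonzeros per row. For large problems, $S$ therefore cannot be formed or stored explicitly and has to be approximated.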

For the Navier-Stokes equations with constant density but varying viscosity, the matrix of the discrete linear system of equations is also indefinite and nonsymmetric, of two-by-two block form; the difference appears in the pivot block of $\mathcal{A}$. Variable viscosity affects the behavior of preconditioners that have been shown to be efficient for the constant viscosity case. Those preconditioners have to be reconsidered and analyzed in order to show their robustness with respect to varying viscosity. In Paper III and Paper V we choose the augmented Lagrangian method and show that the corresponding preconditioner preserves its high quality also for spatially varying viscosity.

The full complexity N-S model, i.e., when variations in time and space of the unknowns (velocity and pressure), as well as of the problem parameters (density and viscosity), are taken into account, includes one or more additional partial differential equations. In Paper VI, we reformulate the equations using the so-called momentum instead of the classical unknown velocity. A good reason for this change of variable is that the momentum is smoother than the velocity. Another benefit is that, within the operator splitting, which is indispensable in this case, the matrices arising in the discrete linearized equations are analogous to those in the two simplified formulations. Therefore, the already known preconditioners can be straightforwardly reused here.

The rest of this thesis is organized as follows. Chapter 2 is an overview of the linearization methods, finite element discretization and the most frequently used preconditioners proposed for the incompressible Navier-Stokes equations with constant viscosity and density. In Chapter 3 the augmented Lagrangian method is reconsidered and its behavior is analyzed for the case of spatially varying viscosity. Navier-Stokes equations with full complexity, i.e., time-dependence, nonlinearity, variable viscosity and variable density, are considered in Chapter 4. A reformulation of the N-S problem and its stable numerical solution are introduced there. In Chapter 5 some computational challenges for further improvements on fast and reliable numerical solutions of the Navier-Stokes equations are discussed, and possible research directions are outlined. A summary of the papers included in this thesis is given in Chapter 6.


2. Incompressible Navier-Stokes equations

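2.1 Introduction

In this chapter we consider numerical solution methods for incompressible flow problems with constant viscosity and density, modeled by the Navier-Stokes equations. The mathematical model reads as follows:

\[
\begin{aligned}
\frac{\partial \mathbf{u}}{\partial t} - \nu \Delta \mathbf{u} + (\mathbf{u}\cdot\nabla)\mathbf{u} + \nabla p &= \mathbf{f} && \text{on } \Omega\times[0,T], \\
\nabla\cdot\mathbf{u} &= 0 && \text{on } \Omega\times[0,T], \\
\mathbf{u} &= \mathbf{g} && \text{on } \partial\Omega_D\times[0,T], \\
\nu \frac{\partial \mathbf{u}}{\partial n} - \mathbf{n}\,p &= 0 && \text{on } \partial\Omega_N\times[0,T], \\
\mathbf{u}(\mathbf{x},0) &= \mathbf{u}_0(\mathbf{x}) && \text{on } \Omega.
\end{aligned}
\tag{2.1}
\]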

Here $\mathbf{u}$ is the unknown velocity and $p$ is the unknown pressure, $\Omega \subset \mathbb{R}^d$ ($d = 2,3$) is a bounded and connected domain, and $\partial\Omega = \partial\Omega_D \cup \partial\Omega_N$ is its boundary, where $\partial\Omega_D$ and $\partial\Omega_N$ denote the parts of the boundary on which Dirichlet and Neumann boundary conditions are imposed, respectively. The terms $\mathbf{f} : \Omega \to \mathbb{R}^d$, $\mathbf{g}$ and $\mathbf{u}_0$ are, respectively, a given force field, Dirichlet boundary data and an initial condition for the velocity. The coefficient $\nu > 0$ is the kinematic viscosity, assumed here to be constant (related to the so-called Reynolds number $Re$ as $Re = UL/\nu$, where $L$ denotes the characteristic length scale for the domain $\Omega$ and $U$ is some reference value of the velocity). The operator $\Delta$ is the Laplace operator in $\mathbb{R}^d$, $\nabla$ denotes the gradient, $(\nabla\cdot)$ is the divergence operator, and $\mathbf{n}$ denotes the outward-pointing normal to the boundary.

The above Navier-Stokes equations constitute the fundamental model for incompressible flows in computational fluid dynamics. Due to the presence of the nonlinear term $(\mathbf{u}\cdot\nabla)\mathbf{u}$, some linearization method must be used. The discretization of (2.1) has to obey certain requirements. We limit ourselves to the finite element method (FEM) and proper time discretization methods, and outline these requirements in Section 2.2.

The main focus of this work is on fast and reliable numerical solution methods for the systems of algebraic equations arising after discretizing and linearizing the N-S equations. The numerical solution of those linear systems is the major computational kernel in incompressible flow simulations, as well as a major difficulty, since it is performed repeatedly and has to be both reliable and as fast as possible.


We are concerned only with large scale simulations, for which the use of direct solution methods, applied to the whole system, is not feasible. Instead, preconditioned Krylov subspace methods are to be utilized. The construction of numerically and computationally efficient preconditioners then becomes the major concern.

In this chapter, some linearization methods and preconditioning techniques are introduced, and our contributions to the search for efficient preconditioners are presented.

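2.2 Weak formulation and linearization

For the weak formulation of the Navier-Stokes equations (2.1), we define the solution space for the velocity and the test space, namely,

\[
\begin{aligned}
\mathbf{H}^1_E &= \{\, \mathbf{u} \in \mathcal{H}^1(\Omega)^d \mid \mathbf{u} = \mathbf{g} \text{ on } \partial\Omega_D \,\}, \\
\mathbf{H}^1_{E_0} &= \{\, \mathbf{v} \in \mathcal{H}^1(\Omega)^d \mid \mathbf{v} = \mathbf{0} \text{ on } \partial\Omega_D \,\}, \\
\mathcal{H}^1(\Omega)^d &= \{\, u_i : \Omega \to \mathbb{R}^d \mid u_i, \tfrac{\partial u_i}{\partial x_j} \in L_2(\Omega),\ i,j = 1,\dots,d \,\},
\end{aligned}
\]

and define $L_2(\Omega)$ as the approximation space for the pressure $p$,

\[
L_2(\Omega) = \{\, p : \Omega \to \mathbb{R} \mid \int_\Omega p^2 < \infty \,\}.
\]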

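Then the weak formulation of (2.1) reads as follows (see e.g. [31]):
Find $\mathbf{u} \in \mathbf{H}^1_E$ and $p \in L_2(\Omega)$ such that

\[
\begin{aligned}
\int_\Omega \frac{\partial \mathbf{u}}{\partial t}\cdot\mathbf{v}\,d\Omega + \nu\int_\Omega \nabla\mathbf{u} : \nabla\mathbf{v}\,d\Omega + \int_\Omega (\mathbf{u}\cdot\nabla\mathbf{u})\cdot\mathbf{v}\,d\Omega - \int_\Omega p\,(\nabla\cdot\mathbf{v})\,d\Omega &= \int_\Omega \mathbf{f}\cdot\mathbf{v}\,d\Omega, \\
\int_\Omega q\,(\nabla\cdot\mathbf{u})\,d\Omega &= 0,
\end{aligned}
\tag{2.2}
\]

for all $\mathbf{v} \in \mathbf{H}^1_{E_0}$ and $q \in L_2(\Omega)$. Here $\nabla\mathbf{u} : \nabla\mathbf{v}$ represents the componentwise scalar product, e.g., in two dimensions $\nabla u_1\cdot\nabla v_1 + \nabla u_2\cdot\nabla v_2$ (with $\mathbf{u} = (u_1,u_2)$ and $\mathbf{v} = (v_1,v_2)$). The pressure is uniquely defined only up to a constant term. To make it unique, one normally imposes the additional constraint $\int_\Omega p\,d\Omega = 0$. We also assume that the discretization is done using a stable pair of FEM spaces, satisfying the LBB condition (see e.g. [31]).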

Due to the presence of the convective term $(\mathbf{u}\cdot\nabla)\mathbf{u}$ in (2.1), or $\int_\Omega (\mathbf{u}\cdot\nabla\mathbf{u})\cdot\mathbf{v}\,d\Omega$ in (2.2), the Navier-Stokes system is nonlinear. There are two widely used linearization methods, namely Newton’s method and Picard’s method, see e.g. [31]. Next we briefly introduce linearization by Newton’s method, followed by Picard’s method.


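Let $(\mathbf{u}_0, p_0)$ be an initial guess and let $(\mathbf{u}_k, p_k)$ be the approximate solution at the $k$th nonlinear step. Substituting into the weak formulation (2.2), the nonlinear residual is obtained as

\[
\begin{aligned}
R_k &= \int_\Omega \mathbf{f}\cdot\mathbf{v}\,d\Omega - \int_\Omega \frac{\partial \mathbf{u}_k}{\partial t}\cdot\mathbf{v}\,d\Omega - \nu\int_\Omega \nabla\mathbf{u}_k : \nabla\mathbf{v}\,d\Omega - \int_\Omega (\mathbf{u}_k\cdot\nabla\mathbf{u}_k)\cdot\mathbf{v}\,d\Omega + \int_\Omega p_k\,(\nabla\cdot\mathbf{v})\,d\Omega, \\
P_k &= -\int_\Omega q\,(\nabla\cdot\mathbf{u}_k)\,d\Omega,
\end{aligned}
\]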

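for all $\mathbf{v} \in \mathbf{H}^1_{E_0}$ and $q \in L_2(\Omega)$. We update the approximation of the velocity and pressure as $\mathbf{u}_{k+1} = \mathbf{u}_k + \delta\mathbf{u}_k$, $p_{k+1} = p_k + \delta p_k$, where $\delta\mathbf{u}_k \in \mathbf{H}^1_{E_0}$ and $\delta p_k \in L_2(\Omega)$ (provided that $\mathbf{u}_k \in \mathbf{H}^1_E$ and $p_k \in L_2(\Omega)$). Then, the correction $(\delta\mathbf{u}_k, \delta p_k)$ should satisfy the following problem:
Find $\delta\mathbf{u}_k \in \mathbf{H}^1_{E_0}$ and $\delta p_k \in L_2(\Omega)$ such that

\[
\begin{aligned}
\int_\Omega \frac{\partial (\delta\mathbf{u}_k)}{\partial t}\cdot\mathbf{v}\,d\Omega + \nu\int_\Omega \nabla\delta\mathbf{u}_k : \nabla\mathbf{v}\,d\Omega + \int_\Omega (\mathbf{u}_k\cdot\nabla\delta\mathbf{u}_k)\cdot\mathbf{v}\,d\Omega \qquad & \\
{} + \int_\Omega (\delta\mathbf{u}_k\cdot\nabla\mathbf{u}_k)\cdot\mathbf{v}\,d\Omega + \int_\Omega (\delta\mathbf{u}_k\cdot\nabla\delta\mathbf{u}_k)\cdot\mathbf{v}\,d\Omega - \int_\Omega \delta p_k\,(\nabla\cdot\mathbf{v})\,d\Omega &= R_k, \\
\int_\Omega q\,(\nabla\cdot\delta\mathbf{u}_k)\,d\Omega &= P_k,
\end{aligned}
\]

for all $\mathbf{v} \in \mathbf{H}^1_{E_0}$ and $q \in L_2(\Omega)$.

By dropping the quadratic term $\int_\Omega (\delta\mathbf{u}_k\cdot\nabla\delta\mathbf{u}_k)\cdot\mathbf{v}\,d\Omega$, we obtain Newton’s linearization:
Find $\delta\mathbf{u}_k \in \mathbf{H}^1_{E_0}$ and $\delta p_k \in L_2(\Omega)$ such that

\[
\begin{aligned}
\int_\Omega \frac{\partial (\delta\mathbf{u}_k)}{\partial t}\cdot\mathbf{v}\,d\Omega + \nu\int_\Omega \nabla\delta\mathbf{u}_k : \nabla\mathbf{v}\,d\Omega + \int_\Omega (\mathbf{u}_k\cdot\nabla\delta\mathbf{u}_k)\cdot\mathbf{v}\,d\Omega \qquad & \\
{} + \int_\Omega (\delta\mathbf{u}_k\cdot\nabla\mathbf{u}_k)\cdot\mathbf{v}\,d\Omega - \int_\Omega \delta p_k\,(\nabla\cdot\mathbf{v})\,d\Omega &= R_k, \\
\int_\Omega q\,(\nabla\cdot\delta\mathbf{u}_k)\,d\Omega &= P_k,
\end{aligned}
\]

for all $\mathbf{v} \in \mathbf{H}^1_{E_0}$ and $q \in L_2(\Omega)$. After the correction $(\delta\mathbf{u}_k, \delta p_k)$ has been computed, the next approximation is updated as $\mathbf{u}_{k+1} = \mathbf{u}_k + \delta\mathbf{u}_k$ and $p_{k+1} = p_k + \delta p_k$.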

Picard’s linearization is obtained in a similar way as Newton’s linearization, except that the additional term $\int_\Omega (\delta\mathbf{u}_k\cdot\nabla\mathbf{u}_k)\cdot\mathbf{v}\,d\Omega$ is also dropped. Thus, Picard’s linearization reads as follows:
Find $\delta\mathbf{u}_k \in \mathbf{H}^1_{E_0}$ and $\delta p_k \in L_2(\Omega)$ such that

\[
\begin{aligned}
\int_\Omega \frac{\partial (\delta\mathbf{u}_k)}{\partial t}\cdot\mathbf{v}\,d\Omega + \nu\int_\Omega \nabla\delta\mathbf{u}_k : \nabla\mathbf{v}\,d\Omega + \int_\Omega (\mathbf{u}_k\cdot\nabla\delta\mathbf{u}_k)\cdot\mathbf{v}\,d\Omega - \int_\Omega \delta p_k\,(\nabla\cdot\mathbf{v})\,d\Omega &= R_k, \\
\int_\Omega q\,(\nabla\cdot\delta\mathbf{u}_k)\,d\Omega &= P_k,
\end{aligned}
\]

for all $\mathbf{v} \in \mathbf{H}^1_{E_0}$ and $q \in L_2(\Omega)$. Similarly, we update the approximations as $\mathbf{u}_{k+1} = \mathbf{u}_k + \delta\mathbf{u}_k$ and $p_{k+1} = p_k + \delta p_k$ for $k = 0, 1, \dots$ until convergence. The linear system to be solved at each Picard step is also known as the Oseen problem.

In summary, the linearization of the Navier-Stokes equations (2.1) by Newton’s method results in a sequence of problems of the form:
Find $(\mathbf{u} \in \mathbf{H}^1_{E_0},\ p \in L_2(\Omega))$ such that

\[
\begin{aligned}
\frac{\partial \mathbf{u}}{\partial t} - \nu\Delta\mathbf{u} + (\mathbf{w}\cdot\nabla)\mathbf{u} + (\mathbf{u}\cdot\nabla)\mathbf{w} + \nabla p &= \mathbf{f} && \text{on } \Omega\times(0,T], \\
\nabla\cdot\mathbf{u} &= 0 && \text{on } \Omega\times(0,T],
\end{aligned}
\]

with proper boundary conditions for $\mathbf{u}$. Here the field $\mathbf{w}$ denotes the approximation of the velocity computed at the previous Newton iteration.

Picard’s linearization results in a sequence of Oseen problems, namely:
Find $(\mathbf{u} \in \mathbf{H}^1_{E_0},\ p \in L_2(\Omega))$ such that

\[
\begin{aligned}
\frac{\partial \mathbf{u}}{\partial t} - \nu\Delta\mathbf{u} + (\mathbf{w}\cdot\nabla)\mathbf{u} + \nabla p &= \mathbf{f} && \text{on } \Omega\times(0,T], \\
\nabla\cdot\mathbf{u} &= 0 && \text{on } \Omega\times(0,T],
\end{aligned}
\]

with proper boundary conditions for $\mathbf{u}$.

As is well known, provided that the initial guess is sufficiently close to the exact solution, Newton’s method shows locally quadratic convergence. However, besides requiring more work to assemble the matrices and vectors, Newton’s method has the disadvantage that the radius of its ball of convergence is proportional to the viscosity [31]. Therefore, for small viscosity, it is essential to run a few Picard iterations to feed a sufficiently good initial guess to the Newton iterations, since Picard’s method has a larger radius of convergence than Newton’s method [42].
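The difference between the two linearizations can be illustrated on a much simpler model problem. The following sketch is an assumption-laden stand-in, not the Navier-Stokes setting of this thesis: it uses the steady one-dimensional viscous Burgers equation $-\nu u'' + u u' = f$ on $(0,1)$ with homogeneous Dirichlet conditions, central finite differences, a hypothetical right-hand side $f = 10\sin(\pi x)$ and $\nu = 1$. Both iterations are written in the correction form used above; the Newton variant only adds the derivative term that Picard drops.

```python
import numpy as np

def burgers_operators(n, nu):
    """Central finite differences on n interior points of (0, 1)."""
    h = 1.0 / (n + 1)
    L = (2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h**2  # -d^2/dx^2
    D = (np.eye(n, k=1) - np.eye(n, k=-1)) / (2.0 * h)               # d/dx
    return nu * L, D

def residual(u, nuL, D, f):
    """Nonlinear residual R(u) = f - nu*L*u - u .* (D u)."""
    return f - nuL @ u - u * (D @ u)

def nonlinear_solve(method, n=63, nu=1.0, tol=1e-10, max_iter=200):
    """Solve the discrete Burgers problem by 'picard' or 'newton' iterations."""
    nuL, D = burgers_operators(n, nu)
    x = np.linspace(0.0, 1.0, n + 2)[1:-1]
    f = 10.0 * np.sin(np.pi * x)          # hypothetical forcing term
    u = np.zeros(n)                       # initial guess u_0 = 0
    for k in range(1, max_iter + 1):
        r = residual(u, nuL, D, f)
        J = nuL + np.diag(u) @ D          # nu*L + N: convection frozen at u_k
        if method == "newton":
            J = J + np.diag(D @ u)        # + W: the term dropped by Picard
        u = u + np.linalg.solve(J, r)     # u_{k+1} = u_k + delta u_k
        if np.linalg.norm(residual(u, nuL, D, f)) <= tol * np.linalg.norm(f):
            return u, k
    raise RuntimeError(f"{method} did not converge in {max_iter} iterations")

u_picard, it_picard = nonlinear_solve("picard")
u_newton, it_newton = nonlinear_solve("newton")
print("Picard iterations:", it_picard)
print("Newton iterations:", it_newton)
```

In this mildly nonlinear setting both iterations converge from a zero initial guess, and Newton needs fewer steps, in line with its locally quadratic convergence. The sketch does not reproduce the small-viscosity regime in which Newton's method needs a Picard warm start.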

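2.3 Preconditioning of the two-by-two block systems

Let $X^h_{E_0}$ and $P^h$ be finite dimensional subspaces of $\mathbf{H}^1_{E_0}$ and $L_2(\Omega)$, and let $\{\vec{\varphi}_i\}_{1\le i\le n_u}$ be the nodal basis of $X^h_{E_0}$ and $\{\phi_i\}_{1\le i\le n_p}$ be the nodal basis of $P^h$. According to the Galerkin framework, the discrete velocity and pressure are represented as

\[
\mathbf{u}_h = \sum_{i=1}^{n_u} u_i \vec{\varphi}_i, \qquad p_h = \sum_{i=1}^{n_p} p_i \phi_i,
\]

where $n_u$ and $n_p$ are the total numbers of degrees of freedom for velocity and pressure. The linear system arising from Newton’s or Picard’s linearization is of the form

\[
\begin{bmatrix} A & B^T \\ B & O \end{bmatrix}
\begin{bmatrix} \mathbf{u}_h \\ p_h \end{bmatrix}
=
\begin{bmatrix} \mathbf{f} \\ \mathbf{g} \end{bmatrix}
\quad\text{or}\quad \mathcal{A} x = b,
\tag{2.3}
\]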

where the system matrix $\mathcal{A} = \begin{bmatrix} A & B^T \\ B & O \end{bmatrix}$ is nonsymmetric and indefinite, of saddle point form. The unknown vector $\mathbf{u}_h$ is the discrete velocity and $p_h$ is the discrete pressure. Combining them, we set $x^T = [\mathbf{u}_h^T, p_h^T]$. The matrix $B \in \mathbb{R}^{n_p\times n_u}$ corresponds to the discrete (negative) divergence operator and $B^T$ corresponds to the discrete gradient operator. In Newton’s method the pivot block $A$ has the form $A = \sigma M + \nu L + N + W$, where $M$ is the velocity mass matrix, $L$ is the Laplacian matrix, $N$ denotes the convective matrix and $W$ denotes the Newton derivative matrix. The parameter $\sigma$ is inversely proportional to the time step ($\sigma = 0$ for a stationary problem). Given the approximation $\mathbf{u}_h$, the entries of $N$ and $W$ are given by

\[
\begin{aligned}
N &= [n_{ij}], & n_{ij} &= \int_\Omega (\mathbf{u}_h\cdot\nabla\vec{\varphi}_j)\cdot\vec{\varphi}_i, \\
W &= [w_{ij}], & w_{ij} &= \int_\Omega (\vec{\varphi}_j\cdot\nabla\mathbf{u}_h)\cdot\vec{\varphi}_i.
\end{aligned}
\]

In Picard’s method, the derivative matrix $W$ is neglected.

Linear systems of the form (2.3) are often referred to as two-by-two block systems. Fast and reliable numerical solution methods for two-by-two block systems have been intensively studied in the past decades, see the milestone papers [5, 7, 8, 17] and the book [31], and the numerous references therein.
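The block structure of (2.3) and its indefiniteness can be made concrete with a few lines of NumPy. The sketch below rests on simplifying assumptions: the pivot block is a hypothetical symmetric positive definite tridiagonal matrix (a Stokes-like case with $N = W = O$ and $\sigma = 0$), and $B$ is a hypothetical full-row-rank difference operator. In this symmetric case the indefiniteness shows up directly as eigenvalues of both signs.

```python
import numpy as np

n_u, n_p = 12, 5   # hypothetical numbers of velocity / pressure unknowns

# Hypothetical symmetric positive definite pivot block (Stokes-like case).
A = 2.0 * np.eye(n_u) - np.eye(n_u, k=1) - np.eye(n_u, k=-1)

# Hypothetical constraint block B of shape (n_p, n_u) with full row rank.
B = (np.eye(n_u) - np.eye(n_u, k=1))[:n_p, :]

# Saddle point matrix [[A, B^T], [B, O]] as in (2.3).
K = np.block([[A, B.T],
              [B, np.zeros((n_p, n_p))]])

eigs = np.linalg.eigvalsh(K)          # K is symmetric in this special case
n_pos = int(np.sum(eigs > 1e-12))
n_neg = int(np.sum(eigs < -1e-12))

print("shape of K:", K.shape)
print("positive eigenvalues:", n_pos)
print("negative eigenvalues:", n_neg)
```

For a symmetric positive definite pivot block and a full-rank $B$, the matrix has exactly $n_u$ positive and $n_p$ negative eigenvalues. With the nonsymmetric convective terms included, the eigenvalues become complex in general, but the matrix remains indefinite.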

As is well known, direct solution methods are highly robust with respect to both problem and discretization parameters, and are, therefore, a preferred choice in numerical simulations performed by engineers and applied scientists. The limiting factors of sparse direct solvers are most often the demands on memory resources and the need to repeatedly factorize matrices that are recomputed during the simulation process, for instance the Jacobians in nonlinear problems. For real industrial applications, where the models are mostly three dimensional and result in very large scale linear systems of the type (2.3), rapidly convergent iterative methods, accelerated by a proper preconditioner, become the methods of choice. In this thesis, we consider preconditioned Krylov subspace methods, see [1, 4, 65].

To accelerate the convergence rate of the Krylov subspace methods, efficient preconditioning techniques become essential. The term preconditioning refers to transforming the linear system

\[
\mathcal{A} x = b
\]

into an equivalent one,

\[
\mathcal{P}^{-1}\mathcal{A} x = \mathcal{P}^{-1} b,
\]

with the aim that the coefficient matrix $\mathcal{P}^{-1}\mathcal{A}$ should have more favorable properties for iterative solution methods than $\mathcal{A}$ itself. A preconditioner, denoted here by $\mathcal{P}$, is in general a linear operator, defined explicitly as a matrix or implicitly as a procedure. The requirements for efficient preconditioners have been presented in Chapter 1.


How to construct efficient preconditioners for two-by-two block systems arising from the incompressible Navier-Stokes equations is one of the main concerns in this work. There are several strategies for constructing preconditioners. The first class of preconditioners is referred to as purely algebraic preconditioners. The term ’algebraic’ means that constructing the preconditioner requires only the information in the coefficient matrix and the right-hand side vector. This class includes the incomplete LU factorization method, sparse approximate inverses, and algebraic multilevel and multigrid methods (see, for example, the survey paper [15] and references therein). The study of these preconditioners is outside the scope of this thesis.

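In this work, we limit ourselves to preconditioners based on some approximate block-factorization of the original matrix. In general, the exact block-factorization of a matrix of two-by-two block form is

\[
\mathcal{A} =
\begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}
=
\begin{bmatrix} A_{11} & 0 \\ A_{21} & S \end{bmatrix}
\begin{bmatrix} I_1 & A_{11}^{-1}A_{12} \\ 0 & I_2 \end{bmatrix},
\tag{2.4}
\]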

where I1 and I2 are identity matrices of proper dimensions. The pivot blockA11 is assumed to be nonsingular and S = A22−A21A−1

11 A12 is the exact Schurcomplement matrix. The preconditioners for such matrices of two-by-twoblock form are either of full block-factorized form or of block lower- or upper-triangular form

\[
M_F = \begin{bmatrix} \widetilde{A}_{11} & 0 \\ A_{21} & \widetilde{S} \end{bmatrix}
      \begin{bmatrix} I_1 & \widetilde{A}_{11}^{-1}A_{12} \\ 0 & I_2 \end{bmatrix}, \tag{2.5}
\]

\[
M_L = \begin{bmatrix} \widetilde{A}_{11} & 0 \\ A_{21} & \widetilde{S} \end{bmatrix}, \qquad
M_U = \begin{bmatrix} \widetilde{A}_{11} & A_{12} \\ 0 & \widetilde{S} \end{bmatrix}. \tag{2.6}
\]

Here the matrix $\widetilde{A}_{11}^{-1}$ denotes some approximation of $A_{11}^{-1}$, given either in explicit form or implicitly defined via an inner iterative solution method with a proper stopping tolerance. The matrix $\widetilde{S}$ is some approximation of the exact Schur complement $S$. The literature on this class of preconditioners is huge, see [5, 7, 8, 11, 13, 32, 41, 50, 57, 60, 62, 64], the surveys [3, 17], the book [31] and the references therein.

In [7] it is pointed out that for indefinite systems, the block-triangular preconditioners $M_L$ and $M_U$ are more efficient than the full block-factorized preconditioner $M_F$. Furthermore, when solving systems with the preconditioner $M_F$, we need the action of $\widetilde{A}_{11}^{-1}$ twice. This is clearly a computationally heavier task than with $M_L$ and $M_U$, where the action of $\widetilde{A}_{11}^{-1}$ is needed only once. So, for the Navier-Stokes equations linearized using Newton's method or Picard's method, the block-triangular preconditioners $M_L$ and $M_U$ are the ones to choose; the two are equally efficient in practice.


For $\widetilde{A}_{11} = A_{11}$ and $\widetilde{S} = S$ in the preconditioner $M_L$ (2.6), the preconditioned matrix $M_L^{-1}\mathcal{A}$ ($\mathcal{A}$ is defined as in (2.4)) is of the form

\[
M_L^{-1}\mathcal{A}
 = \begin{bmatrix} A_{11}^{-1} & 0 \\ -S^{-1}A_{21}A_{11}^{-1} & S^{-1} \end{bmatrix}
   \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}
 = \begin{bmatrix} I_1 & A_{11}^{-1}A_{12} \\ 0 & I_2 \end{bmatrix},
\]

where the matrices $I_1$ and $I_2$ are the identity matrices of proper dimensions. It is known (cf., e.g. [7]) that (i) in this case the minimal polynomial of $M_L^{-1}\mathcal{A}$, i.e., the polynomial $P(\cdot)$ of the smallest degree for which $P(M_L^{-1}\mathcal{A}) = 0$, takes the form $P(t) = (1-t)^2$, and there will be at most two iterations when solving systems with the matrix $\mathcal{A}$ preconditioned by $M_L$; (ii) in the general case, where $\widetilde{A}_{11}^{-1} \approx A_{11}^{-1}$ and $\widetilde{S} \approx S$, the eigenvalues of $M_L^{-1}\mathcal{A}$ are located in disks, and the radii of the disks are controlled by making a sufficient number of inner iterations when iteratively solving systems with the pivot block matrix $A_{11}$ and by choosing a sufficiently accurate approximation $\widetilde{S}$ of $S$. Thus, the quality of the preconditioner $M_L$ applied to the matrix $\mathcal{A}$ depends on how accurately systems with the pivot block are solved and how well the Schur complement matrix is approximated. The most challenging task, however, turns out to be the construction of numerically and computationally efficient approximations of the Schur complement matrix, which is in general dense and not practical to form explicitly.
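The two-iteration statement in case (i) can be checked on a small example. The sketch below is illustrative only: the 3-by-3 saddle-point matrix is made up for this purpose (it is not taken from the thesis), and the helper names `to_frac`, `matmul` and `solve` are hypothetical. Exact rational arithmetic is used to confirm that $E = I - M_L^{-1}\mathcal{A}$ is nonzero while $E^2 = 0$, which is what a minimal polynomial $(1-t)^2$ means.

```python
from fractions import Fraction

def to_frac(M):
    """Convert a nested list of integers to exact fractions."""
    return [[Fraction(x) for x in row] for row in M]

def matmul(X, Y):
    """Dense matrix product of two nested lists."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def solve(M, R):
    """Solve M Z = R exactly by Gauss-Jordan elimination with pivot search."""
    n = len(M)
    aug = [list(M[i]) + list(R[i]) for i in range(n)]
    for c in range(n):
        p = next(r for r in range(c, n) if aug[r][c] != 0)
        aug[c], aug[p] = aug[p], aug[c]
        piv = aug[c][c]
        aug[c] = [x / piv for x in aug[c]]
        for r in range(n):
            if r != c:
                f = aug[r][c]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]

# Hypothetical example: A11 is 2x2, A12 is 2x1, A21 is 1x2, A22 = 0.
A11 = to_frac([[4, 1], [2, 3]])
A12 = to_frac([[1], [2]])
A21 = to_frac([[1, 0]])
A22 = to_frac([[0]])

# Exact Schur complement S = A22 - A21 * A11^{-1} * A12 (a 1x1 block here).
S = A22[0][0] - matmul(A21, solve(A11, A12))[0][0]

# Full matrix A and the block lower-triangular preconditioner M_L with exact blocks.
A = [A11[0] + A12[0], A11[1] + A12[1], A21[0] + A22[0]]
M_L = [A11[0] + [Fraction(0)], A11[1] + [Fraction(0)], A21[0] + [S]]

T = solve(M_L, A)  # T = M_L^{-1} A
E = [[Fraction(int(i == j)) - T[i][j] for j in range(3)] for i in range(3)]
E2 = matmul(E, E)  # vanishes when the minimal polynomial is (1 - t)^2
```

With inexact blocks, as in case (ii), `E2` no longer vanishes, and the size of its entries gives a rough feel for how far the preconditioned matrix is from the ideal case.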

The research on Schur complement approximations for the Stokes and Oseen problems with constant viscosity has been quite active during the past decades [3, 17, 57, 62, 31]. Some of the well-known (problem-dependent) Schur complement approximations are the following.

(1) The pressure mass matrix $M_p$
The matrix $M_p$ can be used for the Stokes problem and for relatively 'mild' values of $\nu$ in the Oseen problem (see, e.g., [52]), but it is not efficient for more general settings. We note also that $M_p$ is always symmetric and positive definite while $S$ is in general nonsymmetric.

(2) The pressure convection-diffusion (PCD) preconditioner $S_{PCD}$
This preconditioner was first suggested in [43] and is an approximation of the Schur complement matrix of the form

\[ S_{PCD}^{-1} = \widetilde{M}_p^{-1} A_p L_p^{-1}, \]

where $A_p$ and $L_p$ are the pressure convection-diffusion and Laplace matrices, respectively, and $\widetilde{M}_p^{-1}$ denotes some approximate solution with $M_p$ (the pressure mass matrix). As can be seen, some non-physical boundary conditions are needed for $A_p$ and $L_p$.

(3) The BFBt preconditioner
This preconditioner is also an approximation of the Schur complement matrix and is defined as

\[ S_{BFBt}^{-1} = (B\widetilde{M}_u^{-1}B^T)^{-1}\, B\widetilde{M}_u^{-1} A \widetilde{M}_u^{-1}B^T\, (B\widetilde{M}_u^{-1}B^T)^{-1}, \]

where $\widetilde{M}_u$ is a diagonal approximation of the velocity mass matrix $M_u$. This preconditioner is suggested in [29]. As is seen from its definition, no artificial boundary conditions have to be set, and the preconditioner is fairly easy to construct.

(4) The Hermitian/Skew-Hermitian (HSS) method [12, 16] and the augmented Lagrangian (AL) method [18, 19, 20].

These approximations may be costly to apply (e.g., the BFBt preconditioner), may need the construction of an additional convection-diffusion operator on the finite element space for the pressure together with an artificial boundary condition for the pressure (e.g., the PCD preconditioner), or may need special care in choosing method parameters to achieve the "optimal" convergence rate, where the so-obtained "optimal" parameters are problem-dependent (e.g., the HSS method and the AL method). These approximations are fairly robust with respect to the discretization and problem parameters, i.e., the mesh size $h$ and the viscosity $\nu$.

Our contribution to the search for efficient preconditioners for the constant-coefficient N-S equations is contained in Papers I and II. In Paper I we contribute to the techniques for approximating the Schur complement matrix using an element-by-element framework. In Paper II we present a more general analysis of the so-called augmented Lagrangian method.

2.3.1 Element-by-element approximation of a Schur complement matrix

When discretizing the linearized Navier-Stokes equations or the Stokes equations with the finite element method, the resulting linear system has the two-by-two block structure shown in (2.3) or the more general form (2.4). We note that the local stiffness matrix, i.e., the coefficient matrix obtained by discretizing the linearized Navier-Stokes equations or the Stokes equations on one finite element, also admits a two-by-two block form, namely,

\[
A_k = \begin{bmatrix} A^k_{11} & A^k_{12} \\ A^k_{21} & A^k_{22} \end{bmatrix},
\]

and the whole coefficient matrix is obtained as $\mathcal{A} = \sum_{k=1}^{n_E} R_k^T A_k R_k$. The matrices $R_k$ are Boolean matrices that prescribe the local-to-global correspondence of the degrees of freedom and $n_E$ denotes the total number of finite elements.

The element-by-element approximation of the Schur complement [6, 45, 51] is constructed based on the local features of the finite element discretization, and is of the form

\[
S_{EBE} = \sum_{k=1}^{n_E} R_k^T S_k R_k, \quad\text{or}\quad
S_{EBE}^{-1} = \sum_{k=1}^{n_E} R_k^T S_k^{-1} R_k, \tag{2.7}
\]

where $S_k$ is the local Schur complement on each element, i.e., $S_k = A^k_{22} - A^k_{21}(A^k_{11})^{-1}A^k_{12}$ ($A^k_{11}$ and $S_k$ are assumed nonsingular for all elements). From formula (2.7) we see that the construction of such an approximation is numerically cheap and fully parallelizable. For a uniform mesh, we only need to compute the local Schur complement or its inverse on one element and assemble it over all elements.
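The assembly in (2.7) can be illustrated on a toy configuration. The sketch below is a hypothetical example, not taken from Paper I: two identical "elements", each with two velocity-like and one pressure-like degree of freedom, sharing one velocity degree of freedom. The local blocks `L11`, `L12`, `L21`, `L22`, the index maps `vel_maps` and `prs_maps` (which play the role of the Boolean matrices $R_k$), and all helper names are assumptions made for illustration. The local matrix is chosen symmetric positive definite, so the comparison $S_{EBE} \le S$ quoted later in this section can be observed directly.

```python
from fractions import Fraction

def matmul(X, Y):
    """Dense matrix product of two nested lists."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def solve(M, R):
    """Solve M Z = R exactly by Gauss-Jordan elimination with pivot search."""
    n = len(M)
    aug = [[Fraction(x) for x in M[i]] + [Fraction(x) for x in R[i]]
           for i in range(n)]
    for c in range(n):
        p = next(r for r in range(c, n) if aug[r][c] != 0)
        aug[c], aug[p] = aug[p], aug[c]
        piv = aug[c][c]
        aug[c] = [x / piv for x in aug[c]]
        for r in range(n):
            if r != c:
                f = aug[r][c]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]

def schur(A11, A12, A21, A22):
    """Return the Schur complement A22 - A21 * A11^{-1} * A12."""
    P = matmul(A21, solve(A11, A12))
    m = len(A22)
    return [[A22[i][j] - P[i][j] for j in range(m)] for i in range(m)]

def zeros(r, c):
    return [[Fraction(0)] * c for _ in range(r)]

# Hypothetical local blocks, identical on both elements (uniform mesh).
L11 = [[2, -1], [-1, 2]]
L12 = [[1], [1]]
L21 = [[1, 1]]
L22 = [[3]]
vel_maps = [[0, 1], [1, 2]]   # local-to-global velocity indices per element
prs_maps = [[0], [1]]         # local-to-global pressure indices per element
n_u, n_p = 3, 2

A11, A12 = zeros(n_u, n_u), zeros(n_u, n_p)
A21, A22 = zeros(n_p, n_u), zeros(n_p, n_p)
S_ebe = zeros(n_p, n_p)
S_loc = schur(L11, L12, L21, L22)  # computed once, reused on every element

for vm, pm in zip(vel_maps, prs_maps):
    for a, ga in enumerate(vm):
        for b, gb in enumerate(vm):
            A11[ga][gb] += L11[a][b]
        for q, gq in enumerate(pm):
            A12[ga][gq] += L12[a][q]
            A21[gq][ga] += L21[q][a]
    for p, gp in enumerate(pm):
        for q, gq in enumerate(pm):
            A22[gp][gq] += L22[p][q]
            S_ebe[gp][gq] += S_loc[p][q]   # element-by-element assembly (2.7)

S = schur(A11, A12, A21, A22)              # exact global Schur complement
D = [[S[i][j] - S_ebe[i][j] for j in range(n_p)] for i in range(n_p)]
```

In this toy case `S_ebe` is diagonal while `S` is full, and the difference `D` is positive semidefinite, so the assembled approximation stays below the exact Schur complement at a fraction of the cost of forming it.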

In Paper I this method is used to approximate the Schur complement of the system matrices arising from the Stokes problem and the Oseen problem with constant viscosity. For the Stokes problem, the quality of this approximation is independent of the mesh refinement. However, for the Oseen problem, like most of the widely used preconditioners, this approximation is not fully robust with respect to the mesh size and the value of the viscosity. Due to its low computational cost and sparse structure, it is still very attractive for the incompressible Navier-Stokes equations with moderate viscosity, i.e., $\nu \in [0.01, 0.1]$, corresponding to $Re \in [10, 100]$. Although in Paper I we only test this preconditioner for the Oseen problem, it is also applicable to the linear algebraic systems arising from Newton's linearization.

After the FEM discretization, the arising coefficient matrices, both the global system matrix and the local stiffness matrices, are already in a two-by-two block form due to the underlying Oseen or Stokes problem. In other cases, this two-by-two block structure can be obtained by a proper block partitioning, such as one based on consecutive regular mesh refinements. Hence, this preconditioner is suitable for a broader class of applications. In several works (see e.g. [6, 45]) it has been shown that $S_{EBE}$ is a high-quality approximation of the exact Schur complement $S$, i.e., $(1-\zeta^2)S \le S_{EBE} \le S$, where $\zeta$ is a positive constant, strictly less than one, independent of the mesh size and easily computable. However, so far this has been shown rigorously only in the case of symmetric and positive definite matrices. In Paper I, we also suggest a framework to study this preconditioner for general nonsymmetric matrices. As mentioned in Paper I, more effort is needed to analyse and optimize this preconditioning method for nonsymmetric matrices, and this is also one possible direction for my future research.

2.3.2 Augmented Lagrangian method

Aiming to find a preconditioner that is fully independent of the mesh size and the viscosity, we consider the so-called augmented Lagrangian (AL) approach (see e.g. [5, 33]). To this end we first transform the linear system (2.3) into an equivalent one with the same solution, which is of the form

\[
\begin{bmatrix} A + \gamma B^T W^{-1} B & B^T \\ B & 0 \end{bmatrix}
\begin{bmatrix} \mathbf{u}_h \\ \mathbf{p}_h \end{bmatrix}
 = \begin{bmatrix} \hat{\mathbf{f}} \\ \mathbf{g} \end{bmatrix}
\quad\text{or}\quad \mathcal{A}_{AL}\mathbf{x} = \mathbf{b}, \tag{2.8}
\]

where $\gamma > 0$ and $W$ are suitable scalar and matrix parameters. The modified right-hand side vector is $\hat{\mathbf{f}} = \mathbf{f} + \gamma B^T W^{-1}\mathbf{g}$. It is clear that the transformation (2.8) can be done for any value of $\gamma$ and any nonsingular matrix $W$. In [18] and [20] AL-type preconditioners are proposed for the transformed system (2.8), which are of block lower- or upper-triangular form

\[
M_L^{AL} = \begin{bmatrix} A + \gamma B^T W^{-1} B & 0 \\ B & -\frac{1}{\gamma}W \end{bmatrix}
\quad\text{or}\quad
M_U^{AL} = \begin{bmatrix} A + \gamma B^T W^{-1} B & B^T \\ 0 & -\frac{1}{\gamma}W \end{bmatrix}. \tag{2.9}
\]

We now briefly explain the purpose of the transformation (2.8). Comparing the matrix $\mathcal{A}_{AL}$ in (2.8) with its AL-type preconditioner in (2.9), and the general two-by-two block matrix $\mathcal{A}$ in (2.4) with its block-triangular preconditioner in (2.6), we can see that the Schur complement $S_{\mathcal{A}_{AL}} = -B(A + \gamma B^T W^{-1} B)^{-1}B^T$ of the transformed matrix $\mathcal{A}_{AL}$ is approximated by $-\frac{1}{\gamma}W$.

Here the matrix $W$ can be chosen to be the pressure mass matrix as shown in [18] or even the identity matrix as shown in [5].
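That the transformation leaves the solution unchanged can be verified on a tiny system. The following sketch is illustrative only: the blocks `A`, `B`, the right-hand sides `f`, `g`, the scalar weight `W` and the value of `gamma` are made-up numbers, and `solve` and `saddle` are hypothetical helper names. It solves the original saddle-point system and the augmented one with the modified right-hand side $\mathbf{f} + \gamma B^T W^{-1}\mathbf{g}$ in exact arithmetic and compares the results.

```python
from fractions import Fraction

def solve(M, b):
    """Solve M x = b exactly by Gauss-Jordan elimination with pivot search."""
    n = len(M)
    aug = [[Fraction(x) for x in M[i]] + [Fraction(b[i])] for i in range(n)]
    for c in range(n):
        p = next(r for r in range(c, n) if aug[r][c] != 0)
        aug[c], aug[p] = aug[p], aug[c]
        piv = aug[c][c]
        aug[c] = [x / piv for x in aug[c]]
        for r in range(n):
            if r != c:
                f_rc = aug[r][c]
                aug[r] = [x - f_rc * y for x, y in zip(aug[r], aug[c])]
    return [row[n] for row in aug]

# Hypothetical data: two velocity unknowns, one pressure unknown.
A = [[4, 1], [2, 3]]      # nonsymmetric pivot block
B = [[1, 1]]              # discrete divergence (1 x 2)
f = [1, 2]
g = [3]
gamma = Fraction(3)
W = Fraction(2)           # 1 x 1 weight matrix

def saddle(pivot_block, rhs_f):
    """Solve [[pivot_block, B^T], [B, 0]] [u; p] = [rhs_f; g]."""
    K = [pivot_block[0] + [B[0][0]],
         pivot_block[1] + [B[0][1]],
         B[0] + [0]]
    return solve(K, rhs_f + g)

x_orig = saddle(A, f)

# Augmented pivot block A + gamma * B^T W^{-1} B and modified right-hand side.
A_al = [[A[i][j] + gamma * B[0][i] * B[0][j] / W for j in range(2)]
        for i in range(2)]
f_hat = [f[i] + gamma * B[0][i] * g[0] / W for i in range(2)]
x_al = saddle(A_al, f_hat)
```

The two solution vectors coincide because the added term $\gamma B^T W^{-1}(B\mathbf{u}_h - \mathbf{g})$ vanishes whenever the constraint $B\mathbf{u}_h = \mathbf{g}$ holds.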

It has been analyzed in [18, 39] that, for the AL-type preconditioner $M_L^{AL}$ to work efficiently for any choice of $W$, the value of $\gamma$ should be large, since the spectrum of $(M_L^{AL})^{-1}\mathcal{A}_{AL}$ clusters at 1 as $\gamma \to \infty$. Thus, for large values of $\gamma$, and provided that we accurately solve the systems with the modified pivot block $A_{AL} = A + \gamma B^T W^{-1} B$, the AL preconditioner leads to very fast convergence, within a few iterations.

However, with increasing $\gamma$ the modified pivot block of $\mathcal{A}_{AL}$ becomes increasingly ill-conditioned and finding fast solutions for systems with $A_{AL}$ becomes increasingly difficult, which conflicts with the requirement that $\gamma$ be large. How to balance the effect of the parameter $\gamma$ is studied in a recent PhD thesis [69], where the matrix $W$ is fixed as the pressure mass matrix and the focus is on how to choose the "optimal" value of $\gamma$. Indeed, some "optimal" values of $\gamma$ are derived based on special techniques, and the optimal $\gamma$ is found to be small, $\gamma \in [0.001, 0.01]$. Such small values conflict with the requirement that $\gamma$ be large for the AL preconditioner (2.9) to work efficiently for the transformed linear system (2.8). Further, the so-obtained "optimal" values are still problem dependent and are difficult to apply across a broad range of applications.

Focusing on the choice of $\gamma$ only and ignoring the matrix parameter $W$ cannot give complete insight into the behavior of the AL preconditioner. In Paper II our main contribution is that we analyse the effects of $\gamma$ and $W$ in a more general framework. The analysis reveals that when we aim at balancing the solution of the modified pivot block $A_{AL}$ (the inner solver) and the solution of the whole system, preconditioned by $M_L^{AL}$ or $M_U^{AL}$, the optimal value of $\gamma$ turns out to be one. The latter, however, entails that $W$ should be a good approximation of the original Schur complement, i.e., $W \sim -BA^{-1}B^T$. To the best of the author's knowledge, Paper II is the first paper in this area to point out theoretically that the need for an efficient Schur complement approximation of the original system matrix $\mathcal{A}$ is not circumvented even when using the AL preconditioner.


The structure of the matrices arising in the AL method outlined above can be related to those arising after applying the so-called 'grad-div' stabilization to the momentum equation. The AL method begins with the discretization, followed by stabilization (transformation). Provided that the incompressibility constraint is satisfied, i.e. $\nabla\cdot\mathbf{u} = 0$, it is possible to add the term $-\gamma\nabla(\nabla\cdot\mathbf{u})$, i.e., the gradient of the divergence constraint scaled by a stabilization constant $\gamma$, to the momentum equation in the Navier-Stokes equations (2.1). Thus, we obtain the so-called 'grad-div' stabilization formulation [22, 38, 54]

\[
\begin{aligned}
\frac{\partial\mathbf{u}}{\partial t} - \nu\Delta\mathbf{u} - \gamma\nabla(\nabla\cdot\mathbf{u}) + (\mathbf{u}\cdot\nabla)\mathbf{u} + \nabla p &= \mathbf{f}, \\
\nabla\cdot\mathbf{u} &= 0.
\end{aligned}
\]

Then, after linearization and discretization with FEM, the resulting nonsymmetric and indefinite linear system takes the form

\[
\begin{bmatrix} A + \gamma A_{GD} & B^T \\ B & 0 \end{bmatrix}
\begin{bmatrix} \mathbf{u}_h \\ \mathbf{p}_h \end{bmatrix}
 = \begin{bmatrix} \mathbf{f} \\ \mathbf{g} \end{bmatrix}
\quad\text{or}\quad \mathcal{A}_{GD}\mathbf{x} = \mathbf{b}, \tag{2.10}
\]

where the matrix $A_{GD}$ denotes the discrete operator of the term $\int_\Omega (\nabla\cdot\mathbf{u})(\nabla\cdot\mathbf{v})\,d\Omega$, which is similar to the matrix $B^T B$ ($\mathbf{v}$ is a test function). As can be seen, the resulting matrix $\mathcal{A}_{GD}$ is similar to the matrix $\mathcal{A}_{AL}$ in (2.8) with $W$ equal to the identity matrix. It is clear that the matrix $A_{GD}$ is sparser than $B^T W^{-1} B$, and incomplete LU factorization can be used to construct a preconditioner for $A + \gamma A_{GD}$. However, in this case there is only the parameter $\gamma$ to tune the 'grad-div' stabilization, while in the AL method we have both $\gamma$ and $W$ at our disposal.

2.3.3 Fast solutions with the modified pivot block in the augmented Lagrangian method

In some sense, the 'grad-div' stabilization method can be seen as an attempt to find efficient solutions for the modified pivot block $A_{AL} = A + \gamma B^T W^{-1} B$, which is dense and not practical to form explicitly. In [69], a specific multigrid method is used for this block, which works efficiently for small values of $\gamma$. Clearly, as $\gamma \to 0$ the modified block $A_{AL}$ converges to the original pivot block $A$, which is sparse. At the same time, the efficiency of the AL preconditioner decreases when $\gamma$ becomes small.

In order to find efficient solutions independent of the value of $\gamma$, in Paper II we propose a numerical algorithm to compute the exact or an approximate inverse of $A_{AL}$ based on the inverse Sherman-Morrison (ISM) formula [24, 25, 26], where the approximate inverse can be used as a multiplicative preconditioner when iteratively solving systems with $A_{AL}$.


The matrix $A_{AL}$ can be rewritten in a more general form, namely, $H = A_0 + XY^T$. Here the matrix $A_0$ is assumed to be nonsingular and its inverse to be easily computed (e.g., $A_0$ could be diagonal, even the identity matrix). There are various application areas where matrices of the form of $H$ arise. For example, large matrices of that type appear in statistical problems and their exact inverses are to be computed. Thus, an efficient implementation of the ISM algorithm can serve as a useful computational kernel for a broader class of applications. In earlier works, the ISM algorithm is implemented with only level 1 BLAS routines [24, 25], which have limited efficiency on modern parallel architectures. Here, we give the ISM formula and briefly introduce the BLAS-1 ISM algorithm (see below).

Let $H$ be of the form $H = A_0 + XY^T$ with $X, Y \in \mathbb{R}^{n\times m}$, and let $I_m \in \mathbb{R}^{m\times m}$ be the identity matrix of size $m$. The Sherman-Morrison-Woodbury formula provides an explicit form of $(A_0 + XY^T)^{-1}$, given by the expression

\[
(A_0 + XY^T)^{-1} = A_0^{-1} - A_0^{-1}X\,(I_m + Y^T A_0^{-1} X)^{-1}\,Y^T A_0^{-1}, \tag{2.11}
\]

provided that the matrix $I_m + Y^T A_0^{-1} X$ is nonsingular. Applying formula (2.11) successively to the columns of $X$ and $Y$, an algorithm is derived in [24, 25] to compute $A^{-1} = (A_0 + XY^T)^{-1}$ in the following form

\[
A^{-1} = A_0^{-1} - A_0^{-1}\,U R^{-1} V^T A_0^{-1}, \tag{2.12}
\]

where $R \in \mathbb{R}^{m\times m}$ is a diagonal matrix and $U, V \in \mathbb{R}^{n\times m}$. The computational procedure is presented in Algorithm 1 below. We use Matlab-type notation, and IA, IA0 denote $A^{-1}$ and $A_0^{-1}$, respectively.

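Algorithm 1 (Blas-1 ISM)

    [n, m] = size(X);
    U = zeros(n, m); V = zeros(n, m); R = zeros(m, m);
    for k = 1:m
        U(:,k) = X(:,k);
        V(:,k) = Y(:,k);
        for l = 1:k-1
            U(:,k) = U(:,k) - (V(:,l)'*IA0*X(:,k))*R(l,l)*U(:,l);
            V(:,k) = V(:,k) - (Y(:,k)'*IA0*U(:,l))*R(l,l)*V(:,l);
        end
        % R(k,k) stores the reciprocal pivot, i.e. a diagonal entry of R^{-1} in (2.12)
        R(k,k) = 1/(1 + V(:,k)'*IA0*X(:,k));
    end
    IA = IA0 - IA0*U*R*V'*IA0;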
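Algorithm 1 can be cross-checked against a direct inverse on a small case. The sketch below is a plain-Python transcription under simplifying assumptions that are not part of the original algorithm statement: $A_0$ is taken to be diagonal (so that IA0 is trivial to apply), exact rational arithmetic is used, and the function name `ism_inverse` and the test data are hypothetical. As in the Matlab-type listing, the stored diagonal holds the reciprocal pivots.

```python
from fractions import Fraction

def ism_inverse(a0_diag, X, Y):
    """Inverse of A0 + sum_k x_k y_k^T for diagonal A0, following Algorithm 1.

    a0_diag: diagonal entries of A0; X, Y: lists of m columns of length n.
    """
    n, m = len(a0_diag), len(X)
    ia0 = [1 / Fraction(d) for d in a0_diag]  # diagonal of IA0 = A0^{-1}

    def form(a, b):
        # bilinear form a' * IA0 * b
        return sum(ia0[i] * a[i] * b[i] for i in range(n))

    U, V, R = [], [], []  # R[k] plays the role of R(k,k), a reciprocal pivot
    for k in range(m):
        u = [Fraction(t) for t in X[k]]
        v = [Fraction(t) for t in Y[k]]
        for l in range(k):
            cu = form(V[l], X[k]) * R[l]
            cv = form(Y[k], U[l]) * R[l]
            u = [a - cu * b for a, b in zip(u, U[l])]
            v = [a - cv * b for a, b in zip(v, V[l])]
        U.append(u)
        V.append(v)
        R.append(1 / (1 + form(v, X[k])))
    # IA = IA0 - IA0*U*R*V'*IA0
    return [[(ia0[i] if i == j else 0)
             - sum(ia0[i] * U[k][i] * R[k] * V[k][j] * ia0[j] for k in range(m))
             for j in range(n)] for i in range(n)]

# Hypothetical test data: n = 3, m = 2.
a0 = [2, 3, 4]
X = [[1, 0, 2], [0, 1, 1]]
Y = [[1, 1, 0], [2, 0, 1]]
n = len(a0)
H = [[(a0[i] if i == j else 0) + sum(x[i] * y[j] for x, y in zip(X, Y))
      for j in range(n)] for i in range(n)]
IA = ism_inverse(a0, X, Y)
product = [[sum(H[i][k] * IA[k][j] for k in range(n)) for j in range(n)]
           for i in range(n)]  # should be the identity
```

Every operation inside the loops is a vector update or an inner product, which is the level 1 BLAS character the text refers to.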

As we can see, Algorithm 1 consists of vector and matrix-vector operations only, which are relatively inefficient on modern computers. In Paper IV we propose two parallel strategies based on a block version of the ISM algorithm, which involve only matrix operations, e.g., matrix-matrix products. In this way, the highly optimized and parallelized BLAS3 routines can be used, which are shown to be more efficient than the BLAS1 routines. One of the two block ISM algorithms, referred to as the block ISM (BISM) algorithm, is designed to achieve good performance in terms of computational complexity. The other, referred to as the reduced memory block ISM (RMISM) algorithm, reduces the memory demands to make it more suitable for large-scale problems. Both block ISM algorithms are implemented on CPU- and GPU-based computers, and their efficiencies are compared and demonstrated in Paper IV. Here we only present the parallel speedup as an example. For arbitrary dense matrices $X, Y \in \mathbb{R}^{n\times m}$ with $n = 2\times 10^4$, $m = 2\times 10^3$, and diagonal $A_0 \in \mathbb{R}^{n\times n}$, the parallel speedup of the two block ISM algorithms using OpenMP is plotted in Figure 2.1. As can be seen, both algorithms exhibit linear speedup.

Figure 2.1. Scaling behavior on a multicore system

A common way to compute an approximate inverse via ISM is to drop some relatively small entries during the ISM factorization [24, 26]. In Paper IV, we only consider the efficient implementation of the block ISM algorithm for dense matrices. The analysis of the effect of sparse matrices on the performance of the block ISM algorithm, data structures and parallelization techniques, as well as obtaining an approximate inverse of a sparse matrix, are planned as future research directions.


3. Incompressible Navier-Stokes equations with variable viscosity

3.1 Introduction

In this chapter we consider fast solution methods for the incompressible Navier-Stokes equations with variable viscosity. There are mainly two classes of applications with non-constant viscosity. The first class incorporates non-Newtonian and quasi-Newtonian flows. For example, in non-Newtonian flows viscosity may be a function of pressure and the rate-of-strain tensor (e.g., [21, 59]). In some quasi-Newtonian flows the variable viscosity may also depend on pressure and shear (e.g., [40, 48, 58]). The other class of applications with variable viscosity arises in multi-phase flow models where each phase is assumed to be immiscible and incompressible.

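The incompressible Navier-Stokes equations with variable viscosity read as follows:

\[
\begin{aligned}
\frac{\partial\mathbf{u}}{\partial t} + \mathbf{u}\cdot\nabla\mathbf{u} - \nabla\cdot(2\nu(\cdot)D\mathbf{u}) + \nabla p &= \mathbf{f} && \text{on } \Omega\times(0,T], \\
\nabla\cdot\mathbf{u} &= 0 && \text{on } \Omega\times(0,T],
\end{aligned} \tag{3.1}
\]

with some given boundary and initial conditions for $\mathbf{u}$. Here $\Omega \subset \mathbb{R}^d$ ($d = 2, 3$) is a bounded, connected domain with boundary $\partial\Omega$ and $\mathbf{f} : \Omega \to \mathbb{R}^d$ is a given force field. The operator $D\mathbf{u} = (\nabla\mathbf{u} + \nabla^T\mathbf{u})/2$ denotes the rate-of-strain tensor. As mentioned, in non-Newtonian flows, the kinematic viscosity $\nu$ may depend on the second invariant of the rate-of-deformation tensor $D_{II}(\mathbf{u}) = \frac{1}{2}\mathrm{tr}(D^2\mathbf{u})$ and the pressure $p$, i.e., $\nu(\cdot) = \nu(D_{II}, p)$ (e.g. [34]). In multi-phase flows, the kinematic viscosity $\nu$ is a function of time and space, i.e., $\nu(\cdot) = \nu(\mathbf{x}, t)$, to be determined.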

Variable viscosity is an important factor affecting the behavior of the known preconditioners. In this chapter we choose the augmented Lagrangian method to study the impact of the variation of viscosity. Here, we fix the density as constant. An illustrative example of such a system is a mixture of water and oil, which have almost the same density but very different viscosities. Other examples of problems of practical importance are considered in [68], namely, extrusion with variable viscosity and a geodynamic problem with a sharp viscosity contrast.


3.2 Effect of the variable viscosity on the augmented Lagrangian method

As already stated, we assume that the kinematic viscosity coefficient is a smooth function, such that

\[ 0 < \nu_{\min} \le \nu(\mathbf{x}, t) \le \nu_{\max}, \]

where $\nu_{\min}$ and $\nu_{\max}$ denote its minimal and maximal values. When solving (3.1), we limit ourselves to the stationary state, and we apply Picard's linearization method. This technique requires solving a sequence of Oseen problems, which read as follows:
At each Picard iteration, find $\mathbf{u} : \Omega \to \mathbb{R}^d$ and $p : \Omega \to \mathbb{R}$ satisfying

\[
\begin{aligned}
-\nabla\cdot(2\nu(\mathbf{x}, t)D\mathbf{u}) + (\mathbf{w}\cdot\nabla)\mathbf{u} + \nabla p &= \mathbf{f} && \text{in } \Omega, \\
\nabla\cdot\mathbf{u} &= 0 && \text{in } \Omega,
\end{aligned} \tag{3.2}
\]

where $\mathbf{w} = \mathbf{u}^{(k-1)}$ is the velocity computed at the previous Picard iteration, which is updated at every nonlinear step.

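The weak formulation of (3.2) reads as follows:
Find $\mathbf{u} \in \mathbf{H}^1_E$ and $p \in L^2(\Omega)$ such that

\[
\begin{aligned}
\int_\Omega 2\nu(\mathbf{x}, t)\,D\mathbf{u} : D\mathbf{v}\,d\Omega + \int_\Omega (\mathbf{w}\cdot\nabla)\mathbf{u}\cdot\mathbf{v}\,d\Omega - \int_\Omega p\,(\nabla\cdot\mathbf{v})\,d\Omega &= \int_\Omega \mathbf{f}\cdot\mathbf{v}\,d\Omega, \\
\int_\Omega q\,\nabla\cdot\mathbf{u}\,d\Omega &= 0,
\end{aligned} \tag{3.3}
\]

for all $\mathbf{v} \in \mathbf{H}^1_{E_0}$ and all $q \in L^2(\Omega)$. The linear systems arising from the weak formulation (3.3) are again of the form

\[
\begin{bmatrix} F & B^T \\ B & 0 \end{bmatrix}
\begin{bmatrix} \mathbf{u}_h \\ \mathbf{p}_h \end{bmatrix}
 = \begin{bmatrix} \mathbf{f} \\ \mathbf{g} \end{bmatrix}
\quad\text{or}\quad \mathcal{A}\mathbf{x} = \mathbf{b}, \tag{3.4}
\]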

where the system matrix $\mathcal{A} = \begin{bmatrix} F & B^T \\ B & 0 \end{bmatrix}$ is nonsymmetric and indefinite, of two-by-two block form. The unknown vector $\mathbf{u}_h \in \mathbb{R}^{n_u}$ is the discrete velocity and $\mathbf{p}_h \in \mathbb{R}^{n_p}$ is the discrete pressure. We also assume that the discretization is done using a stable pair of FEM spaces, satisfying the LBB condition [31]. Clearly, when considering variable viscosity, the difference compared to the case of constant viscosity can be observed in the pivot block $F \in \mathbb{R}^{n_u\times n_u}$, which, in the case of variable viscosity, has the form $F = A_\nu + N$. The matrix $A_\nu$ is the discrete operator corresponding to the term $\int_\Omega 2\nu(\mathbf{x}, t)\,D\mathbf{u} : D\mathbf{v}\,d\Omega$, i.e.,

\[
A_\nu \in \mathbb{R}^{n_u\times n_u}, \qquad [A_\nu]_{i,j} = \int_\Omega 2\nu(\mathbf{x}, t)\,D\vec{\varphi}_i : D\vec{\varphi}_j,
\]

where $\{\vec{\varphi}_i\}_{1\le i\le n_u}$ are the nodal basis functions.


Thus, the matrix $A_\nu$ is symmetric and positive definite. Note, however, that it is not block-diagonal. The other matrices, i.e., $N$ and $B$, are the same as defined in Chapter 2.

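Here, we also consider preconditioned iterative methods to solve the linear system (3.4), and the chosen preconditioning approach is the augmented Lagrangian method. As in Chapter 2, we first transform the linear system (3.4) algebraically into an equivalent one

\[
\begin{bmatrix} F + \gamma B^T W^{-1} B & B^T \\ B & 0 \end{bmatrix}
\begin{bmatrix} \mathbf{u}_h \\ \mathbf{p}_h \end{bmatrix}
 = \begin{bmatrix} \hat{\mathbf{f}} \\ \mathbf{g} \end{bmatrix}
\quad\text{or}\quad \mathcal{A}_\gamma\mathbf{x} = \mathbf{b}, \tag{3.5}
\]

where $\hat{\mathbf{f}} = \mathbf{f} + \gamma B^T W^{-1}\mathbf{g}$, and $\gamma > 0$ and $W$ are suitable scalar and matrix parameters. Clearly, the transformation (3.5) does not change the solution for any value of $\gamma$ and any nonsingular matrix $W$. The equivalent system (3.5) is what we intend to solve, and the AL-type preconditioner for the transformed system matrix $\mathcal{A}_\gamma$ is of the block-triangular form

\[
M_L = \begin{bmatrix} F + \gamma B^T W^{-1} B & 0 \\ B & -\frac{1}{\gamma}W \end{bmatrix}. \tag{3.6}
\]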

As analyzed in [5, 18, 39], the eigenvalues $\lambda$ of the preconditioned matrix $M_L^{-1}\mathcal{A}_\gamma$ consist of two parts, i.e., the multiple eigenvalue 1 and the values $\frac{1}{1 + 1/(\gamma\mu)}$, where $\mu$ are the eigenvalues of $Q = W^{-1}BF^{-1}B^T$. If the $\mu$ are bounded in a rectangular box and the bounds are independent of the mesh size, then the eigenvalues $\lambda$ are also bounded and their bounds are again robust with respect to the mesh size. The main theoretical contribution regarding the AL preconditioner in Paper III is that, with the choice $W = M$ (the pressure mass matrix), we derive the following bounds for the eigenvalues $\mu$, namely,

\[
\frac{c_0^2\,\nu_{\min}^2}{\nu_{\max}(\nu_{\min}^2 + c_1^2)} \le \mathrm{Re}(\mu) \le \frac{1}{\nu_{\min}}
\quad\text{and}\quad
|\mathrm{Im}(\mu)| \le \frac{1}{2\nu_{\min}},
\]

where the constants $c_0$, $c_1$, $\nu_{\min}$ and $\nu_{\max}$ are independent of the mesh size. This form generalizes the result from Theorem 1 in [30], where the Oseen problem with constant viscosity is considered.
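The form of the nontrivial eigenvalues quoted above can be traced through a short calculation. The following is a sketch under two assumptions made here for the derivation: $F$ is nonsingular and $S_0 = BF^{-1}B^T$ is nonsingular (for instance when $B$ has full row rank). The shorthand symbols $F_\gamma$ and $S_0$ are introduced only for this derivation; the Sherman-Morrison-Woodbury formula is used in the second line.

```latex
% Shorthand (assumed for this sketch): F_\gamma = F + \gamma B^T W^{-1} B,
% S_0 = B F^{-1} B^T, Q = W^{-1} S_0.
\begin{align*}
M_L^{-1}\mathcal{A}_\gamma
 &= \begin{bmatrix} I & F_\gamma^{-1}B^T \\ 0 & \gamma W^{-1} B F_\gamma^{-1} B^T \end{bmatrix}, \\
B F_\gamma^{-1} B^T
 &= S_0 - S_0\left(\tfrac{1}{\gamma}W + S_0\right)^{-1} S_0
  = \left(S_0^{-1} + \gamma W^{-1}\right)^{-1}, \\
\gamma W^{-1} B F_\gamma^{-1} B^T
 &= \gamma W^{-1}\left(S_0^{-1} + \gamma W^{-1}\right)^{-1}
  = \left(I + \tfrac{1}{\gamma}\,Q^{-1}\right)^{-1}.
\end{align*}
% Hence each eigenvalue \mu of Q gives \lambda = 1 / (1 + 1/(\gamma\mu)),
% and the identity block contributes the multiple eigenvalue 1.
```

In particular, for fixed $\mu$ with positive real part, $\lambda \to 1$ as $\gamma \to \infty$, which matches the clustering behavior described for large $\gamma$.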

Still, the bottleneck in the AL preconditioning method is how to efficiently solve systems with the modified pivot block $F + \gamma B^T W^{-1} B$. Besides the ISM method introduced in Chapter 2, in Paper III we test an aggregation-based algebraic multigrid method (AGMG) [53, 55, 56] for the modified pivot block $F + \gamma B^T W^{-1} B$, and numerical experiments show that the AGMG method behaves reasonably well and can be used in practice. Finding fast solution methods for the pivot block remains a very challenging problem and is also one direction for my future research.


For non-Newtonian flows, the velocity $\mathbf{u}$ and the pressure $p$ satisfy the following generalized incompressible Navier-Stokes equations:

\[
\begin{aligned}
\frac{\partial\mathbf{u}}{\partial t} + \mathbf{u}\cdot\nabla\mathbf{u} - \nabla\cdot(2\nu(D_{II}(\mathbf{u}), p)\,D\mathbf{u}) + \nabla p &= \mathbf{f} && \text{in } \Omega\times(0,T], \\
\nabla\cdot\mathbf{u} &= 0 && \text{in } \Omega\times(0,T],
\end{aligned}
\]

where the kinematic viscosity $\nu$ may depend on the second invariant of the rate-of-deformation tensor $D_{II}(\mathbf{u}) = \frac{1}{2}\mathrm{tr}(D^2\mathbf{u})$ and the pressure $p$ [34]. Because the viscosity function $\nu(D_{II}(\mathbf{u}), p)$ may also depend on the velocity $\mathbf{u}$, two of the terms exhibit nonlinear behavior: $\nabla\cdot(2\nu(D_{II}(\mathbf{u}), p)\,D\mathbf{u})$ and $\mathbf{u}\cdot\nabla\mathbf{u}$. Whether Newton's or Picard's linearization is used, the linear system arising from FEM discretization is still of two-by-two block form as in (3.4), and the augmented Lagrangian preconditioning method is straightforwardly applicable. The contribution of Paper V is that, to the best of the author's knowledge, it is the first to propose the AL-type preconditioning method in the area of fast solution methods for non-Newtonian flow problems. The numerical experiments included in Paper V confirm the usefulness of the approach.


4. Incompressible Navier-Stokes equations with variable viscosity and density

4.1 Introduction

Variable-density, variable-viscosity problems arise in many complex flow processes and have been studied intensively by numerical simulations. Examples include convection flows, where density and viscosity can be functions of the temperature; variable-density groundwater flow, in particular when a fluid of high density overlays a fluid of lower density and the flow is driven by gravity and the density variation; rising bubbles, where a bubble of light fluid is surrounded by a heavier fluid and rises and deforms due to gravity and surface tension forces; and splashing, when a solid is dropped into a liquid, typically while penetrating the surface of the liquid.

The major difficulty in the numerical approximation of such flows is the high complexity of the nonlinear time-dependent variable-density and variable-viscosity Navier-Stokes equations, combined with the mass conservation equations for density and viscosity. Normally, operator splitting schemes are used to solve the coupled system. Fast and reliable solution methods for the arising discrete linear systems of equations are crucial for numerical simulations. In this chapter all the above aspects are discussed.

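4.2 Reformulation of the coupled system

Now we consider the incompressible Navier-Stokes equations in their full complexity, including time dependence and spatial and temporal variation of density and viscosity. The formulation reads as follows:

\[
\begin{aligned}
\rho\left(\frac{\partial\mathbf{u}}{\partial t} + \mathbf{u}\cdot\nabla\mathbf{u}\right) - \nabla\cdot(2\mu D\mathbf{u}) + \nabla p &= \rho\mathbf{f} && \text{in } \Omega\times(0,T], \\
\nabla\cdot\mathbf{u} &= 0 && \text{in } \Omega\times(0,T], \\
\frac{\partial\rho}{\partial t} + \mathbf{u}\cdot\nabla\rho &= 0 && \text{in } \Omega\times(0,T],
\end{aligned} \tag{4.1}
\]

with some given boundary and initial conditions for $\mathbf{u}$ and $\rho$. The operator $D\mathbf{u} = (\nabla\mathbf{u} + \nabla^T\mathbf{u})/2$ denotes the rate-of-strain tensor.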

Using the mass conservation equation, i.e., the third equation in (4.1), we have

\[
\partial_t(\rho\mathbf{u}) + (\mathbf{u}\cdot\nabla)(\rho\mathbf{u})
 = \rho\left(\frac{\partial\mathbf{u}}{\partial t} + \mathbf{u}\cdot\nabla\mathbf{u}\right) + \mathbf{u}\left(\frac{\partial\rho}{\partial t} + \mathbf{u}\cdot\nabla\rho\right)
 = \rho\left(\frac{\partial\mathbf{u}}{\partial t} + \mathbf{u}\cdot\nabla\mathbf{u}\right),
\]

so the momentum equation can be reformulated as

\[
\partial_t(\rho\mathbf{u}) + (\mathbf{u}\cdot\nabla)(\rho\mathbf{u}) - \nabla\cdot(2\mu D\mathbf{u}) + \nabla p = \rho\mathbf{f}.
\]

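We assume that the viscosity depends on the density through some Lipschitz-continuous function $\mu(\rho)$. Then an equation similar to that for the density also holds for the viscosity, namely,

\[
\frac{\partial\mu}{\partial t} + \mathbf{u}\cdot\nabla\mu = \frac{\partial\mu}{\partial\rho}\left(\frac{\partial\rho}{\partial t} + \mathbf{u}\cdot\nabla\rho\right) = 0.
\]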

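By introducing the momentum variable $\mathbf{v} = \rho\mathbf{u}$, the incompressible Navier-Stokes equations with variable viscosity and density can be rewritten in the following formulation:

\[
\begin{aligned}
\frac{\partial\mathbf{v}}{\partial t} + (\mathbf{u}\cdot\nabla)\mathbf{v} - \nabla\cdot(2\mu D(\mathbf{v}/\rho)) + \nabla p &= \rho\mathbf{f} && \text{in } \Omega\times(0,T], \\
\nabla\cdot\mathbf{v} &= \mathbf{u}\cdot\nabla\rho && \text{in } \Omega\times(0,T], \\
\frac{\partial\rho}{\partial t} + \mathbf{u}\cdot\nabla\rho &= 0 && \text{in } \Omega\times(0,T], \\
\frac{\partial\mu}{\partial t} + \mathbf{u}\cdot\nabla\mu &= 0 && \text{in } \Omega\times(0,T].
\end{aligned} \tag{4.2}
\]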

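The second equation in (4.2) can be seen as a consequence of the incompressibility constraint. It is obtained from the relation

\[
\nabla\cdot\mathbf{v} \equiv \nabla\cdot(\rho\mathbf{u}) = \rho\,\nabla\cdot\mathbf{u} + \mathbf{u}\cdot\nabla\rho.
\]

Since $\nabla\cdot\mathbf{u} = 0$, it follows that $\nabla\cdot\mathbf{v} = \mathbf{u}\cdot\nabla\rho$.

The boundary and initial conditions are assumed to be

\[
\begin{aligned}
&\rho|_{t=0} = \rho_0, \quad \mu|_{t=0} = \mu_0 = \mu(\rho_0), \quad \mathbf{u}|_{t=0} = \mathbf{u}_0, \\
&\mathbf{v}|_{t=0} = \mathbf{v}_0 = \rho_0\mathbf{u}_0, \quad \mathbf{u}|_\Gamma = \mathbf{b}, \quad \rho|_{\Gamma_{in}} = a,
\end{aligned}
\]

where $\Gamma = \partial\Omega$, $a > 0$ and $\mu(\rho)$ is a given function. Since the advection equations for density and viscosity are first-order hyperbolic equations, the boundary conditions are given at the inflow boundary $\Gamma_{in} = \{\mathbf{x} \in \Gamma : \mathbf{u}\cdot\mathbf{n} < 0\}$. Therefore,

\[
\begin{bmatrix} \rho \\ \mu \end{bmatrix}_{\Gamma_{in}} = \begin{bmatrix} a \\ \mu(a) \end{bmatrix}.
\]

For the Navier-Stokes equations (4.2), there is no need to impose any boundary and initial conditions for the pressure variable.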

In our work we advocate two ideas. The first is to use the momentum $\mathbf{v} = \rho\mathbf{u}$ instead of the velocity $\mathbf{u}$ as a variable in the model. The second is that one should solve the N-S equations as a coupled system for $\mathbf{v}$ and $p$ instead of splitting them, in this way avoiding the need to impose non-physical boundary conditions for the pressure. The rationale for the latter idea is that, as we have already seen, we can solve the arising saddle point systems via fast and robust preconditioned iterative solution methods.

There are several reasons why we should use the momentum variable $\mathbf{v} = \rho\mathbf{u}$ instead of the velocity $\mathbf{u}$. First, one can expect that $\mathbf{v}$ has a smoother behavior, i.e., less strong variations, than $\mathbf{u}$ and can be approximated more accurately in numerical simulations. Second, when using the variable $\mathbf{v}$, the momentum equation coupled with the divergence constraint for the momentum, i.e., the first two equations in (4.2), have a form analogous to the Navier-Stokes equations with constant viscosity and density (Chapter 2) and with constant density and variable viscosity (Chapter 3). After the operator splitting and linearization by the "frozen coefficient" approach, in a similar way as is done in Picard's method, the resulting linear problem is still of Oseen type (see details in the next section). Therefore, all the preconditioning methods proposed for the discrete linear systems arising in the Oseen problem are straightforwardly applicable.

4.3 Discretization in time, operator splitting scheme and linearization

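As already mentioned, in order to handle the high computational complexity of the mathematical model, operator splitting methods are normally utilized, see [14, 27, 35, 36, 37, 61]. To gain insight into these operator splitting methods, we take the scheme given in [37] as an example. With the initialization $(\rho^0, u^0, p^0)$, the approximate sequences $\{\rho^n, u^n, p^n\}_{n=0,1,\dots,N}$ on all time levels are computed by solving:

\[
\begin{aligned}
&\frac{\rho^{n+1}-\rho^n}{\tau} + \nabla\cdot(\rho^{n+1}u^n) - \frac{\rho^{n+1}}{2}\nabla\cdot u^n = 0,\\
&\rho^n\frac{u^{n+1}-u^n}{\tau} + \rho^{n+1}(u^n\cdot\nabla)u^{n+1} - \mu\Delta u^{n+1} + \frac{\rho^{n+1}}{4}(\nabla\cdot u^n)u^{n+1} + \nabla p^n = f^{n+1}, \quad u^{n+1}|_{\partial\Omega_D} = g,\\
&\Delta p^{n+1} = \frac{\chi}{\tau}\nabla\cdot u^{n+1}, \quad \partial_n p^{n+1}|_{\partial\Omega} = 0.
\end{aligned}
\tag{4.3}
\]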

As can be seen, the diffusion-convection term is advanced at each time step (i.e., the second equation in (4.3)) without enforcing the incompressibility constraint. The resulting intermediate velocity field is then projected onto the space of discretely divergence-free vector fields (i.e., the last equation in (4.3)). However, one needs to impose a non-physical boundary condition for the pressure, i.e., $\partial_n p^{n+1}|_{\partial\Omega} = 0$. This scheme is proposed for the original Navier-Stokes equations (4.1) with constant viscosity and variable density, and the two terms $\frac{\rho^{n+1}}{2}\nabla\cdot u^n$ and $\frac{\rho^{n+1}}{4}(\nabla\cdot u^n)u^{n+1}$ are added for stability reasons. Also, it is indicated in [35] that the resulting pressure is still a reasonable approximation to the true pressure, at least in the interior of the domain Ω, and the errors are mainly located around the boundary ∂Ω.

The computational procedure used in Paper VI is motivated by two facts. First, since in general the initial pressure is not known, we must keep the diffusion and the incompressibility constraint in coupled form, which also enables the computation of the pressure without artificial pressure boundary conditions. Therefore, we instead split off the advection part, which can be handled separately. Second, for reasons of stability and to avoid the use of very small time steps, we must use a stable implicit time integration method of second-order accuracy.

At each time level, we first compute the density, and then compute the velocity and the pressure simultaneously by solving the momentum equation coupled with the divergence constraint for the momentum. Implicitly, the incompressibility constraint ∇·u = 0 is also satisfied. Furthermore, we linearize the coupled equations using a "frozen coefficient" approach, in a similar way as is done in the Oseen problem.

To this end, we find the approximate sequences $\{\rho^n, \mu^n, v^n, u^n, p^n\}_{n=0,1,\dots,N}$ with the initial conditions $(\rho^0 = \rho_0,\ \mu^0 = \mu_0,\ v^0 = v_0,\ u^0 = v_0/\rho_0)$ for all time steps n from 0 to N−1. We also assume that µ is a known function of ρ.

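Algorithm 2 (Backward Euler scheme)

A1-1: Compute $\rho^{n+1}$ from

\[
\frac{\rho^{n+1}-\rho^n}{\tau} + u^n\cdot\nabla\rho^{n+1} = 0. \tag{4.4}
\]

A1-2: Compute $(v^{n+1}, p^{n+1})$ from

\[
\begin{aligned}
&\frac{v^{n+1}-v^n}{\tau} + (u^n\cdot\nabla)v^{n+1} - \nabla\cdot\Big(\mu^{n+1}D\Big(\frac{v^{n+1}}{\rho^{n+1}}\Big)\Big) + \nabla p^{n+1} = \rho^{n+1}f^{n+1},\\
&\nabla\cdot v^{n+1} - \tau^2\Delta p^{n+1} = u^n\cdot\nabla\rho^{n+1}.
\end{aligned}
\tag{4.5}
\]

A1-3: Finally, obtain $u^{n+1}$ as $u^{n+1} = v^{n+1}/\rho^{n+1}$.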

The above equations are obtained by using the first-order semi-implicit discretization. To fully avoid unphysical oscillations (see, e.g., [37]), we additionally regularize the problem by adding the term $-\tau^2\Delta p^{n+1}$, where −∆ is the negative Laplacian operator and τ is the time step.
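To illustrate the density update of step A1-1 in isolation, the following minimal sketch performs one backward Euler step of an advection equation of the form (4.4) in one space dimension. It is not the discretization used in Paper VI: the first-order upwind difference in space, the constant positive velocity and the function name `advect_density_step` are assumptions made only for this illustration. The argument `rho_inflow` plays the role of the value a prescribed on the inflow boundary.

```python
import numpy as np


def advect_density_step(rho, u, tau, h, rho_inflow):
    """One backward Euler step of rho_t + u * rho_x = 0 for constant u > 0.

    Upwind differences in space (an assumption of this sketch) give
    (1 + c) * rho_new[i] = rho[i] + c * rho_new[i - 1],  c = u * tau / h,
    where the upstream neighbour of cell 0 is the inflow boundary value.
    """
    if u <= 0:
        raise ValueError("this sketch assumes u > 0 (inflow at the left end)")
    c = u * tau / h
    rho_new = np.empty_like(rho, dtype=float)
    upstream = rho_inflow
    for i in range(rho.size):
        # Forward substitution through the lower bidiagonal system.
        rho_new[i] = (rho[i] + c * upstream) / (1.0 + c)
        upstream = rho_new[i]
    return rho_new


# Heavy fluid (density 3) to the left of light fluid (density 1).
h = 1.0 / 64
x = (np.arange(64) + 1) * h
rho0 = np.where(x < 0.5, 3.0, 1.0)
rho1 = advect_density_step(rho0, u=1.0, tau=4 * h, h=h, rho_inflow=3.0)
```

Each new value is a convex combination of the old value and the upstream value, so the update stays within the initial density bounds even for τ = 4h. This is the practical reason an implicit treatment tolerates time steps larger than the mesh size.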

To obtain an algorithm of second-order accuracy in time, we can replace the first-order backward Euler time discretization with the three-level backward differentiation formula (BDF2). This scheme proceeds as follows. First, one initializes $(\rho^0, \mu^0, v^0, u^0)$ and computes $(\rho^1, \mu^1, v^1, u^1, p^1)$ by using one step of the first-order Algorithm 2. Then, for n ≥ 1, one proceeds as follows.


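Algorithm 3 (BDF2)

A2-1: Set the linearly extrapolated velocity at time level n+1 as

\[
u^\star = 2u^n - u^{n-1}.
\]

A2-2: Compute $\rho^{n+1}$ from

\[
\frac{3\rho^{n+1}-4\rho^n+\rho^{n-1}}{2\tau} + u^\star\cdot\nabla\rho^{n+1} = 0. \tag{4.6}
\]

A2-3: Compute $(v^{n+1}, p^{n+1})$ from

\[
\begin{aligned}
&\frac{3v^{n+1}-4v^n+v^{n-1}}{2\tau} + (u^\star\cdot\nabla)v^{n+1} - \nabla\cdot\Big(\mu^{n+1}D\Big(\frac{v^{n+1}}{\rho^{n+1}}\Big)\Big) + \nabla p^{n+1} = \rho^{n+1}f^{n+1},\\
&\nabla\cdot v^{n+1} - \tau^2\Delta p^{n+1} = u^\star\cdot\nabla\rho^{n+1}.
\end{aligned}
\tag{4.7}
\]

A2-4: Recover the velocity $u^{n+1}$ as $u^{n+1} = v^{n+1}/\rho^{n+1}$.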
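The time-stepping pattern of Algorithm 3, one backward Euler start-up step followed by three-level BDF2 steps, can be tried out on a scalar model problem. The sketch below applies it to y' = −λy, which is an assumption made here purely for illustration and involves neither the extrapolated velocity nor the spatial operators of (4.6) and (4.7). The names `bdf2_decay` and `bdf2_error` are hypothetical.

```python
import numpy as np


def bdf2_decay(lam, y0, tau, n_steps):
    """Integrate y' = -lam * y with BDF2, started by one backward Euler step."""
    y_prev = y0
    y_curr = y0 / (1.0 + tau * lam)          # backward Euler start-up step
    for _ in range(n_steps - 1):
        # (3*y_new - 4*y_curr + y_prev) / (2*tau) = -lam * y_new
        y_new = (4.0 * y_curr - y_prev) / (3.0 + 2.0 * tau * lam)
        y_prev, y_curr = y_curr, y_new
    return y_curr


def bdf2_error(tau, lam=1.0, t_end=1.0):
    """Absolute error at t_end against the exact solution exp(-lam * t_end)."""
    n_steps = int(round(t_end / tau))
    return abs(bdf2_decay(lam, 1.0, tau, n_steps) - np.exp(-lam * t_end))


# Halving the time step should reduce the error by a factor close to four.
ratio = bdf2_error(0.01) / bdf2_error(0.005)
```

A ratio near four indicates that the single first-order start-up step does not spoil the overall second-order accuracy on this model problem.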

Remark 4.3.1 The second-order backward difference time stepping method is simple to implement. The linearized coupled system of equations (4.7) can be seen as the Oseen problem introduced already in Chapter 2. Therefore, after discretization with the finite element method, the resulting nonsymmetric and indefinite linear system is of two-by-two block form. The preconditioning techniques proposed for two-by-two block systems are then applicable, typically those preconditioners used for the Oseen problem, see the examples presented in Chapter 2. In the following section of this chapter we discuss another efficient preconditioning technique for the discrete equations (4.7).

However, the BDF2 method is not fully stable in the sense of A- and B-stability for systems of nonlinear ordinary differential equations (cf., e.g., [70]). Such stability holds for linear problems with all eigenvalues of the operator located in the stable half of the complex plane. The stability analysis for nonlinear problems is more complicated, however, and one cannot rely only on the eigenvalues of the linearized (Jacobian) operator. It can be shown that methods such as BDF2 or the traditional form of the trapezoidal method are not fully stable. This prevents the use of these methods for long time intervals.

In [2] it has been shown that the so-called 'one-leg' (or 'one-sided') form of the θ-method with θ ≤ 1/2 − |O(τ)| is stable for monotone operators uniformly in time and is, hence, applicable for infinitely long time integration intervals. It has second-order accuracy for θ = 1/2 − |O(τ)|, where τ is the time step. For $\theta = 1/2 - |O(\tau^{\varsigma})|$, ς < 1, the method is not fully of second order but has increased stability properties. For reasons of simplicity, when we do not need to integrate over very long time intervals, as well as for reasons of comparison with other related work, such as [37], the BDF2 method is also used in the numerical experiments.

We first briefly recall the one-leg θ-method (OLTM). Consider the evolution equation

\[
\frac{d\xi}{dt} + F(t,\xi) = 0, \quad t > 0, \quad \xi(0) = \xi_0.
\]

The classical θ-method in implicit one-leg form is

\[
\xi(t+\tau) - \xi(t) + \tau F(\bar t, \bar\xi) = 0, \quad t = 0, \tau, 2\tau, \dots, \quad \xi(0) = \xi_0,
\]

where τ is the time step and

\[
\bar t = \theta t + (1-\theta)(t+\tau), \quad \bar\xi = \theta\xi(t) + (1-\theta)\xi(t+\tau), \quad 0 \le \theta \le 1.
\]

We refer to [2, 67] for more details on the one-leg form of the θ-scheme and its properties.
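The one-leg form recalled above evaluates F only once per step, at the averaged time and the averaged state. A minimal sketch for the scalar linear case F(t, ξ) = a(t)ξ follows; this test problem, the coefficient a(t) = 1 + t and the names `one_leg_theta` and `theta_error` are assumptions chosen so that each step can be solved in closed form and compared with the exact solution. Note that, in the convention used here, θ = 0 gives the backward Euler method and θ = 1/2 the implicit midpoint rule.

```python
import numpy as np


def one_leg_theta(a, xi0, tau, n_steps, theta=0.5):
    """One-leg theta-method for d(xi)/dt + a(t) * xi = 0, i.e. F(t, xi) = a(t) * xi."""
    xi, t = xi0, 0.0
    for _ in range(n_steps):
        t_bar = theta * t + (1.0 - theta) * (t + tau)
        a_bar = a(t_bar)
        # xi_new - xi + tau * a_bar * (theta * xi + (1 - theta) * xi_new) = 0
        xi = xi * (1.0 - tau * a_bar * theta) / (1.0 + tau * a_bar * (1.0 - theta))
        t += tau
    return xi


def theta_error(tau, theta):
    """Error at t = 1 for a(t) = 1 + t, whose exact solution is exp(-(t + t**2 / 2))."""
    n_steps = int(round(1.0 / tau))
    approx = one_leg_theta(lambda t: 1.0 + t, 1.0, tau, n_steps, theta)
    return abs(approx - np.exp(-1.5))


ratio_midpoint = theta_error(0.01, 0.5) / theta_error(0.005, 0.5)   # theta = 1/2
ratio_euler = theta_error(0.01, 0.0) / theta_error(0.005, 0.0)      # theta = 0
```

On this model problem the error ratio under halving of τ is close to four for θ = 1/2 and close to two for θ = 0, in line with the second-order accuracy stated for θ = 1/2.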

We present next an implementation of OLTM for the density and the momentum equation. Note that we need to compute $(u(t_n+\tau/2),\ p(t_n+\tau/2))$, which is done by solving a Stokes problem of the form (4.8). To further simplify the computational procedure, we split the momentum $v^{n+1}$ into two parts, i.e., $v^{n+1} = (v_1^{n+1} + v_2^{n+1})/2$, where the component $v_1^{n+1}$ recovers the convective character as in (4.10) and the other component $v_2^{n+1}$ takes care of the diffusion and the divergence constraint for the momentum as in (4.11).

We choose θ = 1/2 to guarantee second order accuracy in time.

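Algorithm 4 (OLTM)

A3-1: Compute $v(t_n+\tau/2)$, $u(t_n+\tau/2)$, $p(t_n+\tau/2)$ by solving

\[
\begin{aligned}
&\frac{v^{n+\frac12}-v^n}{\tau/2} - \nabla\cdot\Big(\mu^n D\Big(\frac{v^{n+\frac12}}{\rho^n}\Big)\Big) + \nabla p^{n+\frac12} = \rho^n f^{n+\frac12} - (u^n\cdot\nabla)v^n,\\
&\nabla\cdot v^{n+\frac12} - \tau^2\Delta p^{n+\frac12} = u^n\cdot\nabla\rho^n,
\end{aligned}
\tag{4.8}
\]

and set $u^{n+\frac12} = v^{n+\frac12}/\rho^n$.

A3-2: Compute $\rho^{n+1}$ by solving

\[
\frac{\rho^{n+1}-\rho^n}{\tau} + u^{n+\frac12}\cdot\nabla\frac{\rho^{n+1}+\rho^n}{2} = 0. \tag{4.9}
\]

A3-3: By defining $\rho^{n+\frac12} = (\rho^{n+1}+\rho^n)/2$ and $\mu^{n+\frac12} = (\mu^{n+1}+\mu^n)/2$, one computes $(v_1^{n+1}, v_2^{n+1}, \tilde p^{n+1})$ by solving

\[
\frac{v_1^{n+1}-v^n}{\tau} + u^{n+\frac12}\cdot\nabla\frac{v_1^{n+1}+v^n}{2} = \rho^{n+\frac12}f^{n+\frac12} + \nabla\cdot\big(\mu^{n+\frac12}D(u^{n+\frac12})\big) - \nabla p^{n+\frac12},
\tag{4.10}
\]

\[
\begin{aligned}
&\frac{v_2^{n+1}-v^n}{\tau} - \nabla\cdot\Big(\mu^{n+\frac12}D\Big(\frac{v_2^{n+1}}{\rho^{n+1}} + u^n\Big)/2\Big) + \nabla\tilde p^{n+1} = \rho^{n+\frac12}f^{n+\frac12} - (u^{n+\frac12}\cdot\nabla)\frac{v_1^{n+1}+v^n}{2},\\
&\nabla\cdot v_2^{n+1} - \tau^2\Delta\tilde p^{n+1} = 2u^{n+\frac12}\cdot\nabla\rho^{n+1} - \nabla\cdot v_1^{n+1}.
\end{aligned}
\tag{4.11}
\]

A3-4: Finally, we compute $(v^{n+1}, u^{n+1}, p^{n+1})$ as

\[
v^{n+1} = \frac{v_1^{n+1}+v_2^{n+1}}{2}, \quad u^{n+1} = \frac{v^{n+1}}{\rho^{n+1}}, \quad p^{n+1} = \frac{\tilde p^{n+1}+p^{n+\frac12}}{2}.
\]

Table 4.1. Comparison of the computational complexity of BDF2 and OLTM

| BDF2 | OLTM |
|---|---|
| Solve the hyperbolic equations as in (4.6) | Solve the hyperbolic equations twice as in (4.9), (4.10) |
| Solve the Oseen type problem as in (4.7) | Solve the Stokes type problem twice as in (4.8), (4.11) |
| Reassemble matrices corresponding to $u^\star\cdot\nabla$, $\nabla\cdot(\mu^{n+1}D)$ | Reassemble matrices corresponding to $u^n\cdot\nabla$, $u^{n+\frac12}\cdot\nabla$, $\nabla\cdot(\mu^n D)$, $\nabla\cdot(\mu^{n+\frac12}D)$ |
| | Recompute more right hand vectors |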

The form of the constraint in (4.11) is motivated by

\[
\nabla\cdot v^{n+1} = \nabla\cdot\frac{v_1^{n+1}+v_2^{n+1}}{2} = \nabla\cdot(\rho^{n+1}u^{n+1}) = u^{n+1}\cdot\nabla\rho^{n+1} + \rho^{n+1}\nabla\cdot u^{n+1}.
\]

Using the assumption $\nabla\cdot u^{n+1} = 0$ one gets

\[
\nabla\cdot v_2^{n+1} = 2u^{n+1}\cdot\nabla\rho^{n+1} - \nabla\cdot v_1^{n+1}.
\]

Since $u^{n+1}$ is not known, we replace it by $u^{n+\frac12}$. Also, as can be seen, the incompressibility constraint ∇·u = 0 is satisfied.

Table 4.1 summarizes the computational complexity of Algorithm 3 and Algorithm 4 at each time level. As is well known, preconditioning and solving the Stokes problem is much easier than preconditioning and solving the Oseen problem, especially for small values of the viscosity. Thus, the efficiency of the preconditioned iterative solution methods for the Stokes equations compensates for the heavier assembly and computing work in Algorithm 4, making it a very attractive approach.


Figure 4.1. The difference between the computed and true pressure by using BDF2 (a) and OLTM (b) with τ = h = 0.0156 and T = 1.57.

Numerical experiments in Paper VI show that both the BDF2 and the one-leg θ-scheme are stable and second-order accurate in time. Compared to the computation of velocity, density and viscosity, the more difficult task is to correctly compute the pressure unknowns. In Figure 4.1, for a test problem with known analytical solution (see Paper VI), we see that both schemes capture the pressure quite well. Since in the two algorithms we do not impose any artificial boundary conditions for the pressure unknowns, the small difference between the computed and the true pressure is spread over the whole domain. This is in contrast to the results in [35], see, e.g., Figure 4.2, where the difference is mainly located around the boundary due to the non-physical boundary conditions imposed for the pressure.

Figure 4.2. Pressure error field at time T = 1 in a square, scanned from [35].

4.4 Preconditioning techniques

After discretizing in space using a suitable finite element pair, we can rewrite the linearized system (4.7) of the BDF2 scheme in block matrix form as follows:

\[
\mathcal{A}\begin{bmatrix} v_h(t+\tau) \\ p_h(t+\tau) \end{bmatrix} = rhs, \quad \text{where} \quad \mathcal{A} = \begin{bmatrix} A & B^T \\ B & -\tau^2 L_p \end{bmatrix}. \tag{4.12}
\]

The system (4.12) is solved by a generalized conjugate gradient method, such as GMRES ([65]) or GCGMR ([1]). The matrix block A has the form A = σM + E, where M is a mass matrix, E comes from the discrete diffusion and convection terms and σ is a function of the reciprocal of the time step τ. The block B arises from the discrete negative divergence operator and the term $-\tau^2 L_p$ corresponds to the discrete stability operator, i.e., $-\tau^2\Delta p$ in (4.7), where $L_p$ is the discrete Laplacian operator for the pressure unknowns.

The preconditioner used for the system (4.12) is of block triangular form

\[
P = \begin{bmatrix} \tilde A & O \\ B & \tilde S \end{bmatrix}, \tag{4.13}
\]

where $\tilde S$ is an approximation of the exact Schur complement of (4.12), i.e., $S = -\tau^2 L_p - BA^{-1}B^T$. As can be seen, the preconditioner used here follows the same construction strategy introduced in Section 2.3, see the block triangular preconditioner (2.6) utilized for the general two-by-two block matrix (2.4). In this paper, we construct an approximation of the Schur complement through its inverse

\[
\tilde S^{-1} = -\sigma\tilde L_p^{-1} - \mu\tilde M_p^{-1},
\]

where $\tilde M_p$ is the diagonal part of the pressure mass matrix. The matrix $\tilde L_p$ is obtained as $\tilde L_p = L_p + h^2 I$, where $L_p$ is the pressure Laplacian matrix (which is singular), I is the identity matrix of corresponding order and h is the mesh size. Adding a diagonal perturbation to $L_p$ is a useful technique to make it nonsingular without altering the order of the discretization error. This preconditioner is analyzed in [49, 67], see also [17] and the references therein. For simplicity, we assume that the viscosity µ is constant. It is trivial to solve linear systems with the diagonal matrix $\tilde M_p$, and for the linear systems with $\tilde L_p$ we use an aggregation-based multigrid (AGMG) [53, 55, 56] solver.
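The effect of the diagonal shift h²I, and the way the approximate Schur complement inverse acts on a vector, can be sketched on a small example. The one-dimensional finite-difference Neumann Laplacian, the lumped mass diagonal, the dense direct solve (in place of AGMG) and the parameter values below are all assumptions made for illustration; they stand in for the finite element matrices of Paper VI. The function names are hypothetical.

```python
import numpy as np


def neumann_laplacian_1d(m):
    """Finite-difference stand-in for the singular pressure Laplacian on m cells."""
    h = 1.0 / m
    L = 2.0 * np.eye(m) - np.eye(m, k=1) - np.eye(m, k=-1)
    L[0, 0] = L[-1, -1] = 1.0          # homogeneous Neumann ends: rows sum to zero
    return L / h**2, h


def apply_schur_inverse(r, L_tilde, M_diag, sigma, mu):
    """Return -sigma * L_tilde^{-1} r - mu * M_diag^{-1} r."""
    return -sigma * np.linalg.solve(L_tilde, r) - mu * r / M_diag


m = 32
L_p, h = neumann_laplacian_1d(m)
L_tilde = L_p + h**2 * np.eye(m)        # the diagonal shift removes the null space
M_diag = np.full(m, h)                  # lumped 1D mass matrix diagonal (assumed)

eig_L = np.linalg.eigvalsh(L_p)         # smallest eigenvalue is zero (constants)
eig_L_tilde = np.linalg.eigvalsh(L_tilde)
z = apply_schur_inverse(np.ones(m), L_tilde, M_diag, sigma=1.0 / h, mu=1.0)
```

The smallest eigenvalue moves from zero to h², so the shifted matrix is nonsingular while the perturbation stays of the order h².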

The matrix $\tilde A$ denotes an approximation of the pivot matrix A, whose action is obtained by an inner iterative solution method with a proper stopping tolerance. We also recommend the aggregation-based multigrid method (AGMG) as the inner solver.
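Why a block triangular preconditioner of the form (4.13) is attractive can be seen in the idealized case where both the pivot block and the Schur complement are exact. The sketch below uses small random dense blocks (hypothetical stand-ins, not the finite element matrices) and checks that the left-preconditioned matrix T then satisfies (T − I)² = 0, so a Krylov method such as GMRES would terminate in at most two iterations in exact arithmetic.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, tau = 12, 4, 0.1

# Hypothetical small blocks standing in for the matrices in (4.12).
A = 10.0 * np.eye(n) + rng.standard_normal((n, n))     # nonsymmetric pivot block
B = rng.standard_normal((m, n))                        # discrete divergence block
L_p = 2.0 * np.eye(m) - np.eye(m, k=1) - np.eye(m, k=-1)
C = -tau**2 * L_p                                      # stabilization block

system = np.block([[A, B.T], [B, C]])
S = C - B @ np.linalg.solve(A, B.T)                    # exact Schur complement
P = np.block([[A, np.zeros((n, m))], [B, S]])          # block lower-triangular P

T = np.linalg.solve(P, system)                         # left-preconditioned matrix
N = T - np.eye(n + m)
nilpotent_residual = np.linalg.norm(N @ N)             # zero up to rounding
```

In practice the two blocks are only approximated, so the iteration counts are larger than two but, as the experiments reported in Table 4.2 indicate, remain small.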

To check the quality of the preconditioner P for a test problem with known analytical solution (Paper VI), Table 4.2 presents the number of GCGMR iterations when solving the system (4.12) and the number of AGMG iterations when solving systems with $\tilde L_p$ and $\tilde A$. If we fix the ratio τ/h and look across the columns, we see that the AGMG and the GCGMR iterations are independent of the mesh refinement. If we fix the mesh size h and look down the rows, we see that the AGMG and the GCGMR iterations depend slightly on the time step τ, but the iterations are still quite few for the time step range h ≤ τ ≤ 8h. Therefore, we can conclude that the block triangular preconditioner P (4.13) with the AGMG method as the inner solver for its sub-blocks has optimal computational complexity, that is, the number of operations performed per degree of freedom per iteration is bounded independently of the number of degrees of freedom.

The results in Table 4.2 are obtained by setting all three relative stopping tolerances, i.e., the tolerance used when solving the system (4.12) by GCGMR and the tolerances used when solving systems with $\tilde L_p$ and $\tilde A$ by AGMG, equal to $10^{-6}$. The purpose of choosing such a tight tolerance for AGMG is to illustrate its efficiency. In practice, it is not necessary to choose such a small tolerance for the AGMG solver. For the action of $\tilde L_p^{-1}$, it turns out that a single AGMG V-cycle is enough to obtain a sufficiently accurate solution.

When using the one-leg θ-scheme (Algorithm 4), one needs to solve the Stokes type problem twice. How to efficiently precondition the linear system arising from the Stokes problem has been well studied. One can refer to the references given in Chapter 2 for more details.


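Table 4.2. Iterations

| τ | $\tilde L_p$ (h = 0.031) | $\tilde A$ (h = 0.031) | $\mathcal{A}$ (h = 0.031) | $\tilde L_p$ (h = 0.015) | $\tilde A$ (h = 0.015) | $\mathcal{A}$ (h = 0.015) | $\tilde L_p$ (h = 0.007) | $\tilde A$ (h = 0.007) | $\mathcal{A}$ (h = 0.007) |
|---|---|---|---|---|---|---|---|---|---|
| h | 8 | 5 | 5 | 8 | 5 | 5 | 8 | 7 | 7 |
| 2h | 8 | 7 | 7 | 8 | 7 | 7 | 8 | 8 | 9 |
| 4h | 8 | 9 | 8 | 8 | 9 | 10 | 8 | 8 | 8 |
| 8h | 8 | 7 | 11 | 8 | 10 | 10 | 8 | 10 | 10 |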

4.5 Coupling with the phase-field model to solve the multi-phase flow problems

Next, we choose a more realistic problem to test the numerical schemes proposed above, namely the development of a Rayleigh-Taylor instability in the rectangular domain [0,1] × [0,4]. The system consists of two immiscible and incompressible fluids and is an example of density-driven flow. At T = 0 the heavier phase is above the lighter one, and for T > 0 the system is driven by the downward gravity force. The interface evolves with time. Therefore, the density and viscosity are non-constant and depend on time and space.

There are two classes of techniques for the computer simulation of multi-phase problems. The first group is that of the so-called sharp interface methods, where, generally speaking, the Navier-Stokes equations are solved for each phase separately and the solutions are 'glued' together via special conditions imposed on the interfaces. These methods are capable of representing the underlying physics rather accurately, but there are several disadvantages. In these methods, the interface is explicitly traced, which imposes additional requirements on the meshes and their readjustment. Furthermore, the surface forces, which are important for capturing the correct behavior of multiphase problems, are imposed as boundary conditions at the interfaces and are not trivial to handle. Besides, in this framework the problem parameters (viscosity and density) exhibit jumps across the interface, which makes the numerical simulation more difficult.

The second group of methods is referred to as the diffuse interface methods. There it is assumed that the interface has some small but nonzero thickness. Density and viscosity remain constant within each fluid phase but vary rapidly and smoothly across the interfaces, which evolve in time; they can therefore be seen as smooth functions of space and time. Normally, besides the Navier-Stokes equations, another mathematical model, e.g., the phase-field model or the level-set model, is added in this framework to simulate and trace the interfaces. In this thesis, we choose the phase-field model. Then, the governing equations for the multi-phase problems are the Navier-Stokes equations, coupled with the Cahn-Hilliard equations (see, e.g., the two recent PhD theses [23, 46]). In the coupled system, at each time level, one needs to solve the Cahn-Hilliard equations to locate the interface and compute the surface tension forces. Then, the approximated surface tension forces are added to the force term and the Navier-Stokes equations are solved. The velocity field obtained by solving the Navier-Stokes equations is fed back to the Cahn-Hilliard equations. The above process is continued until the required time has been reached.

The aspects of fast solution methods for the Cahn-Hilliard equations are outside the scope of this thesis, and we only use the preconditioning techniques presented in [23]. When using the BDF2 method (Algorithm 3) to solve the Navier-Stokes equations, there is no need to solve the advection equations for density and viscosity. As mentioned already, in the diffuse interface methods, density and viscosity can be seen as smooth functions, and these functions are expressed in the phase-field model and computed by solving the Cahn-Hilliard equations. The preconditioning techniques proposed in Section 4.4 are straightforwardly used here. By setting the Atwood number At = (ρ2 − ρ1)/(ρ2 + ρ1) = 0.5, i.e., ρ2/ρ1 = 3 (ρ1 and ρ2 are the constant densities of the two fluid phases), and the other parameters as detailed in Paper VI, we can plot the interface as it evolves in time, see Figure 4.3. At T = 0.8 the roll-up of the falling fluid is clearly visible, and the plot appears nearly identical to that reported in [44]. For long time simulations, turbulence phenomena begin to take place (see [44], for instance).
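The stated density ratio follows directly from the definition of the Atwood number. Writing $r = \rho_2/\rho_1$ (a symbol introduced here only for this short derivation) and dividing numerator and denominator by $\rho_1$ gives

\[
At = \frac{\rho_2-\rho_1}{\rho_2+\rho_1} = \frac{r-1}{r+1}
\quad\Longrightarrow\quad
r = \frac{1+At}{1-At} = \frac{1+0.5}{1-0.5} = 3.
\]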


Figure 4.3. The evolution of the interface in the Rayleigh-Taylor instability problem at (a) T = 0, (b) T = 0.4 and (c) T = 0.8.


5. Computational challenges and some open problems to be addressed in future research

5.1 Efficient solutions of the modified pivot block in the augmented Lagrangian method

The bottleneck in the augmented Lagrangian preconditioner is how to efficiently solve systems with one of its sub-blocks, which is dense and not practical to form explicitly. In this thesis several methods are tested, e.g., incomplete LU factorization and algebraic multigrid methods. We also contribute to the search for efficient solution methods and propose an algorithm to compute an approximate inverse of this block based on the so-called inverse Sherman-Morrison formula, which can be used as a multiplicative preconditioner when applying iterative solution methods. However, this problem is still an open and challenging question and is not fully resolved. More research effort is to be invested in it. The problem is studied by other researchers, such as J. Cerdán et al. (see [26]) and R. Bru et al. (see [24]), and initial contacts for future collaboration with those authors have been established.

5.2 Element-by-element Schur complement approximation method

The element-by-element Schur complement approximation method is introduced in Chapter 2 and studied in Paper I for the Oseen problem with constant density and viscosity, which gives rise to nonsymmetric and indefinite linear systems. This method is also studied by other researchers, such as J. Kraus (see [45]) and O. Axelsson and M. Neytcheva (see [6]). As mentioned in those works and also in Paper I, more research effort needs to be invested for nonsymmetric matrices. The general framework proposed in Paper I for the nonsymmetric case will be further analyzed and numerical improvements will be addressed in my future research. Cooperation with M. Neytcheva on this topic is planned to continue in the future.

5.3 Adaptive mesh refinement

In the numerical simulation of flow problems, the accuracy of the simulation relies on the resolution of regions where the variables undergo rapid changes or attain relatively small values. Since these regions have a large impact on the accuracy of the final solution, relatively finer grids are needed there. At the same time, in the regions where the variables behave smoothly, coarser grids suffice to guarantee the resolution.

Uniformly refined meshes are not suitable in general due to the high computational complexity for large scale problems. In order to decrease the number of degrees of freedom, and at the same time make sure that a finer mesh is provided in the regions where high resolution is required, adaptive mesh refinement is a more favourable choice. How to determine the domains where the mesh needs to be refined or de-refined and how to choose the stopping tolerances for the refinement and de-refinement are the main challenges. Some adaptive strategies have been studied in two and three dimensions to improve computational efficiency, e.g., [9, 10]. On the other hand, multiphase flow processes may undergo very dynamic stages and local refinement might entail too much overhead compared to a sufficiently fine fixed mesh. The interplay between the above arguments and the impact on the performance of a parallel computer implementation, as well as the impact of anisotropy on the robustness of the considered preconditioners, remain to be studied and are included in my planned research work.

5.4 Stable numerical schemes with higher order of accuracy in time

Already known and widely used operator splitting methods for the Navier-Stokes equations with variable density and viscosity, which have been theoretically or numerically proven to be stable, are at most of second-order accuracy in time. In order to guarantee the accuracy of the final solution, the time step cannot be relatively large. Therefore, numerical schemes with lower order of accuracy in time are not recommended for long time simulations.

Aiming at using large time steps to reduce the computational complexity while keeping the accuracy of the solutions within certain bounds, we may consider another approach, namely, to represent the system of PDEs governing the incompressible and time-dependent Navier-Stokes problem as a system of differential algebraic equations (DAEs). In this way, high-order time integration methods for ordinary differential equations, e.g., the family of Runge-Kutta methods, can be used, and large time steps become possible. On the other hand, as is known, the stage values needed in the Runge-Kutta methods have to be computed in sequence, so efficient parallelization is not easily achieved and modern computers cannot be fully utilized for large scale problems. Besides, some mathematical challenges also need further investigation, e.g., it has to be guaranteed that the PDEs and the DAEs obtained by this transformation have the same solution, and the existence of a unique solution of the DAEs has to be rigorously proved. The search for stable numerical methods with higher order of accuracy for the Navier-Stokes equations is another very important direction for my future research. The above method is one possible choice and more research must be invested here.


6. Summary of papers

6.1 Paper I

In this paper we target the efficient solution of systems of equations arising in multiphase flow models, where the governing equations are the Cahn-Hilliard equations and the incompressible Navier-Stokes equations. The finite element discretization of the systems of the above PDEs results in indefinite and nonsymmetric matrices of two-by-two block form.

For the solution of the so-arising linear systems we consider iterative methods and describe a preconditioner based on the classical two-by-two block factorization of the matrix of the underlying algebraic system. The exact Schur complement of the matrix is approximated via the element-by-element Schur complement approximation method, which has been shown in several works (see, e.g., [6, 45]) to be a cheap technique resulting in a high quality approximation of the Schur complement. However, so far this has been shown rigorously only in the case of symmetric and positive definite matrices. In this paper, a framework is suggested to analyse the behavior of this preconditioner for general indefinite and nonsymmetric matrices.

Numerical experiments in this paper show that for the Stokes problem, this preconditioner is independent of the mesh refinement and the preconditioned iterations are quite few. However, for the linearized Navier-Stokes equations, e.g., the indefinite and nonsymmetric Oseen problem, this preconditioner is not robust with respect to the mesh refinement and the viscosity, as is the case for most of the widely used preconditioners. Due to its low construction cost and sparse structure, it still remains very attractive for the Navier-Stokes equations with moderate viscosity, in terms of preconditioned iterations and low computational complexity at each iteration. Based on the numerical experiments, we also recommend this preconditioner for the discretized systems of the Cahn-Hilliard equations.

6.2 Paper II

In this paper we consider efficient preconditioners for nonsymmetric and indefinite matrices of two-by-two block structure, in particular those arising in the finite element discretization of the linearized Navier-Stokes equations, e.g., the Oseen problem with constant viscosity.

Efficient and robust preconditioners for the Stokes and Oseen problems with constant viscosity have been studied intensively during the past decades (see, for example, [3, 17, 57, 62, 31] and the references therein). Most of the widely used preconditioners make full use of the block factorization of the two-by-two block matrix, and approximate the pivot block and the exact Schur complement in some way. However, as shown in this paper, most of the well-known (problem-dependent) Schur complement approximations are fairly but not fully robust with respect to the mesh refinement and the viscosity.

Aiming at finding preconditioners which are fully independent of the discretization parameter and the viscosity, we choose the so-called augmented Lagrangian (AL) method. In the AL method, the preconditioner is based on the equivalent system obtained via an algebraic transformation of the original linear system of equations. Then a block-triangular preconditioner is proposed for the transformed system. In the AL preconditioner, the exact Schur complement of the equivalent system is preconditioned by $-\frac{1}{\gamma}W$, where W and γ are matrix and scalar parameters, respectively, and the modified pivot block is implicitly approximated via an inner iterative solver with a proper stopping tolerance.
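A small numerical sketch may help to fix ideas. The form of the transformation is not spelled out in this summary, so the sketch assumes the one commonly used in the AL literature for a system with zero (2,2) block: the pivot block A is replaced by A + γBᵀW⁻¹B and the right-hand side f by f + γBᵀW⁻¹g. The random dense blocks, W = I and the value of γ are hypothetical choices made only for this illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, gamma = 10, 3, 1.0e6

A = 10.0 * np.eye(n) + rng.standard_normal((n, n))   # nonsymmetric pivot block
B = rng.standard_normal((m, n))                      # full row rank (generic)
W = np.eye(m)                                        # matrix parameter, here identity
f, g = rng.standard_normal(n), rng.standard_normal(m)

# Original saddle point system and its augmented Lagrangian transformation.
K = np.block([[A, B.T], [B, np.zeros((m, m))]])
A_gamma = A + gamma * B.T @ np.linalg.solve(W, B)    # modified pivot block
K_gamma = np.block([[A_gamma, B.T], [B, np.zeros((m, m))]])
f_gamma = f + gamma * B.T @ np.linalg.solve(W, g)

x = np.linalg.solve(K, np.concatenate([f, g]))
x_gamma = np.linalg.solve(K_gamma, np.concatenate([f_gamma, g]))

# Schur complement of the transformed system versus its approximation -W / gamma.
S_gamma = -B @ np.linalg.solve(A_gamma, B.T)
eigs = np.linalg.eigvals(np.linalg.solve(-W / gamma, S_gamma))
```

Under these assumptions both systems have the same solution, and for large γ the eigenvalues of the Schur complement preconditioned by −W/γ cluster around one. The price is that the modified pivot block `A_gamma` becomes increasingly ill-conditioned as γ grows.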

The analysis in this paper and in other literature reveals that, if γ → ∞ and provided that the modified pivot block is solved accurately enough, it takes very few iterations for the AL preconditioned system to converge to the exact solution. However, the modified pivot block becomes increasingly ill-conditioned as γ → ∞, and finding fast and reliable solution methods for that block becomes more difficult. In this paper our main contribution regarding the AL preconditioners is that we analyse the effect of γ via a more general framework. The analysis reveals that when we aim at balancing the solution of the modified pivot block (the inner solver) and the solution of the whole transformed system, preconditioned by the AL preconditioner, the optimal value of γ turns out to be one. The latter, however, entails that W should be a good approximation of the Schur complement of the original discrete system.

In practice, the major difficulty is moved to how to efficiently solve systems with the modified pivot block. Based on its special structure, in this paper we suggest an algorithm to compute its exact or approximate inverse by using the so-called inverse Sherman-Morrison formula. The sparse approximate inverse can be obtained by dropping relatively small-valued entries and can be used as a high quality multiplicative preconditioner when applying iterative solution methods.

6.3 Paper III

There are numerous applications which involve the incompressible Navier-Stokes equations with variable viscosity. Non-Newtonian flows form one class of applications in which the viscosity is not constant; for example, the viscosity may be a function of the pressure and the rate-of-strain tensor (e.g., [21, 59]). Another class of applications is multiphase flow problems. In this paper we only consider the Navier-Stokes equations with constant density but varying viscosity. In some multiphase flows, the differences between the densities of the phases can be ignored, while their viscosities may differ considerably. An illustrative example of such a system is a mixture of water and oil.

Fast and reliable solution methods for the Navier-Stokes equations with variable viscosity are the main concern in this paper. The finite element discretization of the linearized Oseen problem with variable viscosity results in an indefinite and nonsymmetric matrix of two-by-two block structure, similar to the Oseen problem with constant viscosity. The only difference is found in the pivot block, which involves the variable viscosity term. An iterative solution method is chosen for the arising linear systems. The variation of the viscosity is an important factor and affects the behavior of the already known preconditioners proposed for two-by-two block matrices. To study this, we choose the augmented Lagrangian method and analyse the impact of the variation of the viscosity on the AL preconditioner. The AL preconditioner involves a matrix parameter W and a positive scalar γ. The main theoretical contribution of this paper is that, with the choice W = M (M being the pressure mass matrix), the spectrum of the preconditioned matrix is proven to be bounded in a rectangular box, and the bounds are independent of the mesh size but depend on the extremal values of the variable viscosity. The result in this paper generalizes the conclusion in [30], where the Oseen problem with constant viscosity is considered.
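For orientation, the AL transformation and preconditioner referred to above can be written out explicitly. The block below is a sketch in the standard notation of [18], which is assumed here rather than taken from Paper III: A denotes the pivot (velocity) block, B the discrete divergence, f and g the right-hand sides, and the upper block-triangular form of the preconditioner is one common variant.

```latex
% Original and AL-transformed saddle point systems (same solution, since Bu = g)
\[
\begin{bmatrix} A & B^{T} \\ B & 0 \end{bmatrix}
\begin{bmatrix} u \\ p \end{bmatrix}
=
\begin{bmatrix} f \\ g \end{bmatrix}
\quad\Longrightarrow\quad
\begin{bmatrix} A_{\gamma} & B^{T} \\ B & 0 \end{bmatrix}
\begin{bmatrix} u \\ p \end{bmatrix}
=
\begin{bmatrix} f + \gamma B^{T} W^{-1} g \\ g \end{bmatrix},
\qquad
A_{\gamma} = A + \gamma B^{T} W^{-1} B .
\]
% Block-triangular AL preconditioner; A_gamma is the modified pivot block
\[
\mathcal{P}_{\gamma} =
\begin{bmatrix} A_{\gamma} & B^{T} \\ 0 & -\tfrac{1}{\gamma} W \end{bmatrix}.
\]
% Schur complements S = -B A^{-1} B^T and S_gamma = -B A_gamma^{-1} B^T
% (assuming A and B A^{-1} B^T are invertible)
\[
S_{\gamma}^{-1} = S^{-1} - \gamma W^{-1}.
\]
```

The last identity indicates why the block -(1/γ)W is a reasonable replacement for the Schur complement of the transformed system: for large γ the term γW^{-1} dominates, while for moderate γ the quality of the approximation depends on how well W represents the Schur complement S of the original system.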

Still, how to efficiently solve linear systems with the modified pivot block in the AL preconditioner remains an open question. In this paper we test an aggregation-based algebraic multigrid solver [53, 55, 56] and present numerical experiments to show its efficiency.

6.4 Paper IV

In this paper we study the efficiency of a number of implementations of the so-called inverse Sherman-Morrison (ISM) algorithm to compute the exact or an approximate inverse of a non-singular matrix. The motivation is that in the augmented Lagrangian method, one needs to efficiently solve systems with a sub-block of the form H = A_0 + XY^T. Here, the matrix A_0 is assumed to be invertible and its inverse to be easy to compute (e.g., A_0 could be the diagonal of A or even the identity matrix). The ISM algorithm turns out to be particularly efficient for 'thin' matrices X and Y, i.e., matrices with many rows and relatively few columns. Interestingly enough, large scale matrices of the form H with thin X, Y arise in various statistical applications, where the exact inverses are required.
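To illustrate why the structure H = A_0 + XY^T is convenient, the following sketch inverts H by applying the classical Sherman-Morrison rank-one update once per column of X and Y. It is a minimal NumPy illustration under stated assumptions, not the ISM algorithm of [24, 25, 26] or the block variants studied in Paper IV: A_0 is taken to be diagonal, and the function name `inverse_by_rank_one_updates` is hypothetical.

```python
import numpy as np


def inverse_by_rank_one_updates(a0_diag, X, Y):
    """Invert H = diag(a0_diag) + X @ Y.T by successive rank-one updates.

    Illustrative sketch only: each column pair (x_k, y_k) is absorbed with
    the Sherman-Morrison formula
        (M + x y^T)^{-1} = M^{-1} - (M^{-1} x)(y^T M^{-1}) / (1 + y^T M^{-1} x).
    """
    a0_diag = np.asarray(a0_diag, dtype=float)
    H_inv = np.diag(1.0 / a0_diag)        # inverse of the diagonal A_0
    for k in range(X.shape[1]):
        x = X[:, k]
        y = Y[:, k]
        u = H_inv @ x                     # M^{-1} x
        v = H_inv.T @ y                   # (y^T M^{-1})^T
        denom = 1.0 + y @ u
        if abs(denom) < 1e-14:
            # The intermediate matrix is (numerically) singular.
            raise np.linalg.LinAlgError("rank-one update breaks down")
        H_inv -= np.outer(u, v) / denom
    return H_inv
```

Each update costs O(n^2) operations, so thin matrices X and Y with k columns give an overall cost of O(kn^2), compared with O(n^3) for inverting a general dense matrix of order n.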

Using the same algorithm, a sparse approximate inverse can also be obtained, to be used as a multiplicative preconditioner for some iterative solution methods. The sparsification is usually obtained by dropping some relatively small-valued entries. The filtering tolerance is problem dependent.
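The dropping step can be sketched as follows. The relative criterion used here (entries smaller in magnitude than `rel_tol` times the largest entry are set to zero) is an assumption chosen for illustration; the papers do not prescribe this particular rule, and the function names and the tolerance are hypothetical.

```python
import numpy as np


def drop_small_entries(M, rel_tol):
    """Return a copy of M in which relatively small entries are set to zero.

    Assumed criterion: keep an entry only if |m_ij| >= rel_tol * max|m_ij|.
    """
    M = np.asarray(M, dtype=float)
    threshold = rel_tol * np.abs(M).max()
    return np.where(np.abs(M) >= threshold, M, 0.0)


def fill_ratio(M):
    """Fraction of nonzero entries, a simple measure of the remaining fill."""
    return np.count_nonzero(M) / M.size
```

A larger `rel_tol` gives a sparser but less accurate approximate inverse, which is the trade-off behind the problem-dependent filtering tolerance.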

The ISM algorithm and a block version of it have been derived and studied earlier in [24, 25, 26]. In this paper we focus on the parallel implementation of the block ISM algorithm on two computer architectures: multicore processors and GPUs. We propose two block ISM factorization algorithms; one aims to achieve good performance in terms of computational complexity, and the other reduces the memory demands to make it more suitable for large problems. Since the two block algorithms only perform operations on sub-matrices of the original matrix, standard level 3 BLAS routines can be utilized, which are highly optimized and efficiently parallelized on modern high performance computers. We demonstrate their efficiency on both multicore CPU and GPU-equipped computers. A comparison of the performance of the CPU and GPU versions is also performed and analyzed with the aid of a performance model.

6.5 Paper V

In this paper we use the augmented Lagrangian method and the AL type preconditioner to obtain a fast-converging iterative solution method for the incompressible Navier-Stokes equations with variable viscosity, arising in particular in non-Newtonian flows.

The construction of the AL method and the AL type preconditioner is algebraic and robust with respect to different forms of variable viscosity, e.g., the variable viscosity may be a function of the pressure and the rate-of-strain tensor (e.g., [21, 59]), and with respect to different linearization methods, namely the Newton method and the Picard method. In this paper we numerically demonstrate the efficiency of the AL type preconditioner for the linearized problem arising in non-Newtonian flow problems with the Bingham model. There, systems with the modified pivot block in the AL type preconditioner are solved by an aggregation-based algebraic multigrid method. This paper has practical value since it is the first to suggest the AL type preconditioner for the fast numerical solution of non-Newtonian flow models.

6.6 Paper VI

In this paper we consider numerical solution schemes for the incompressible Navier-Stokes equations with variable viscosity and density. A class of applications involving the above equations arises in multiphase flow problems, consisting of several different incompressible and immiscible phases.

The system is modeled by a coupled system of partial differential equations and, due to its high complexity, some operator splitting scheme is used (see some earlier works [35, 36, 37]). There, the hyperbolic equations of the mass conservation for density and viscosity are solved first, and the incompressibility constraint, i.e., the divergence-free constraint on the velocity, is decoupled from the momentum equation. Therefore, one needs to solve a convection-diffusion equation to compute an intermediate velocity, and then this velocity field is projected onto the space of discretely divergence-free vector fields by solving a Laplacian equation. This operator splitting method is often used since it requires solutions of systems with a constant Laplacian matrix, for which readily available fast solvers can be applied. However, during the projection step, some artificial boundary conditions have to be imposed for the pressure unknowns.
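The two steps described above can be illustrated by the classical projection scheme of Chorin [27]. The sketch below assumes constant density (ρ = 1), constant viscosity ν, a time step Δt and a homogeneous Neumann condition for the pressure; these simplifications are assumptions made for illustration, and the block is not the variable-density scheme of [35, 36, 37].

```latex
% Step 1: convection-diffusion equation for the intermediate velocity \tilde{u}
\[
\frac{\tilde{u} - u^{n}}{\Delta t} + (u^{n}\cdot\nabla)\,\tilde{u} - \nu\,\Delta\tilde{u} = f^{n+1}.
\]
% Step 2: pressure Poisson problem with an artificial Neumann boundary condition
\[
\Delta p^{n+1} = \frac{1}{\Delta t}\,\nabla\cdot\tilde{u},
\qquad
\left.\frac{\partial p^{n+1}}{\partial n}\right|_{\partial\Omega} = 0.
\]
% Step 3: projection onto divergence-free fields
\[
u^{n+1} = \tilde{u} - \Delta t\,\nabla p^{n+1},
\qquad
\nabla\cdot u^{n+1} = 0.
\]
```

Step 2 involves only a constant Laplacian, which is why fast solvers are readily available, and the Neumann condition on the pressure is an example of the artificial boundary condition mentioned above.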

In contrast to the above splitting method, in this paper we first reformulate the Navier-Stokes equations by introducing the momentum v = ρu as a variable instead of the velocity u, and then propose a different operator splitting method. Here, a consequence of the incompressibility constraint, i.e., the divergence constraint for the momentum, is coupled with the momentum equation, and one can compute the divergence-free velocity and the pressure at the same time. The momentum variable behaves much more smoothly than the velocity, which reduces the difficulty of the numerical approximation. Besides, by using the momentum, the Navier-Stokes equations can be straightforwardly linearized by the "frozen coefficient" approach, in a similar way as is done in Picard's method, which results in an Oseen-type problem. Our scheme is very attractive since the efficient solution of the Oseen problem has been studied intensively, and the preconditioning techniques used there can be directly utilized here.

In addition to the above operator splitting method, we propose another one based on the one-leg numerical scheme (e.g., [2]). In this method, instead of an Oseen-type problem, one needs to solve the Stokes problem twice. Handling the Stokes problem is much easier than handling the Oseen problem, especially for small values of the viscosity. Thus, efficient preconditioned iterative solution methods for the Stokes problem can offset the higher computational complexity, and we believe that this is also a very attractive method.

In this paper, we derive a splitting method and estimate the related splitting error. We also analyse possible strategies to regularize the arising saddle point systems. The second-order time discretization schemes are described and tested, and their computational complexities are compared. The accuracy of the schemes and the convergence of the involved preconditioned iterative solution methods are illustrated with numerical experiments. We also present numerical tests for the Rayleigh-Taylor instability problem using the coupled Navier-Stokes and Cahn-Hilliard equations.

7. Summary in Swedish

In this thesis we treat numerical methods for solving the incompressible Navier-Stokes equations, using finite element discretization in space, with a focus on iterative methods and preconditioners.

The Navier-Stokes equations consist of a group of coupled, nonlinear partial differential equations (PDEs), and these equations depend on a number of parameters, such as density and viscosity. The density and the viscosity can vary as functions of time and space, and can be discontinuous if, for example, a multiphase system is studied. The variation of the parameters makes the already complicated task of simulating these systems numerically even harder.

To begin with, we focus on the time-independent Navier-Stokes equations with constant density and viscosity. Because the PDEs are nonlinear, linearization methods are needed, e.g., Newton's method or Picard's method. After linearization and discretization, the original nonlinear problem is in general transformed into the task of finding solutions to a sequence of linear systems of equations. Due to properties of the underlying mathematical model, the coefficient matrix of the linear system is sparse, indefinite and nonsymmetric, with a two-by-two block structure. To solve these linear systems, iterative methods are used instead of direct methods, since they do not require as much computing power. To increase the convergence rate of the iterative methods, an efficient preconditioning technique is important. How to efficiently precondition the block matrices generated by the Navier-Stokes equations with constant density and viscosity has been studied intensively during the last decades, and this question is also the main focus of the thesis. We study the class of preconditioners derived from block factorization of two-by-two block matrices, followed by approximations of their submatrices, i.e., the pivot block containing the velocity and the exact Schur complement. In Paper I and Paper II, we contribute to the search for efficient preconditioners by analysing and testing the element-by-element approximation method for the Schur complement matrix and the so-called augmented Lagrangian method with its resulting block-triangular preconditioner. General frameworks are proposed for analysing these two methods, combined with numerical experiments that demonstrate their numerical efficiency.

The incompressible Navier-Stokes equations with variable viscosity arise in many applications, such as non-Newtonian flow problems and multiphase flow problems. Fast and reliable solution methods for the Navier-Stokes equations with variable viscosity are the main theme of Paper III and Paper V. Paper III treats the Navier-Stokes equations with varying viscosity arising from multiphase flow problems, and Paper V treats another class of applications, namely non-Newtonian flow problems. The finite element discretization of the linearized Navier-Stokes equations with variable viscosity results in indefinite and nonsymmetric matrices with two-by-two block structure. An iterative solution method is chosen for the arising linear systems. The variation of the viscosity is an important factor that affects the behavior of previously known preconditioners developed for two-by-two block matrices. We choose the augmented Lagrangian method and analyse the influence of the viscosity variation on the block-triangular preconditioners.

A sub-block in the augmented Lagrangian preconditioner is of the form H = A + X^T Y. The matrix H is in general dense even though the matrices A, X and Y are sparse. Solving systems with the matrix H sufficiently accurately is a bottleneck for the augmented Lagrangian preconditioner, and how to do so is still an open question. In addition, large-scale matrices of the form H arise in several statistical applications, where the exact inverses are required. In Paper IV we study the efficiency of a number of implementations of the so-called block inverse Sherman-Morrison (ISM) algorithm for computing the exact inverse or an approximate inverse of a non-singular matrix of the form H. The ISM algorithm turns out to be particularly efficient for "thin" matrices X and Y, i.e., matrices with many rows and relatively few columns. In this paper we focus on the parallel implementation of the block ISM algorithm on two computer architectures: multicore processors and GPUs. We propose two block ISM algorithms, where one aims at good performance in terms of computational complexity and the other reduces the memory demands to be better suited for large problems. A comparison of the performance of the CPU and GPU versions of the two block ISM algorithms is carried out and analysed with the aid of a performance model.

In Paper VI, numerical solution schemes for the incompressible Navier-Stokes equations with variable density and viscosity are treated. A class of applications leading to the above equations arises from multiphase flow problems, where a system mixes different incompressible and immiscible fluids. Across the diffuse fluid interface, the density and the viscosity vary smoothly but rapidly. In this paper we first reformulate the Navier-Stokes equations by introducing the momentum v = ρu as a variable instead of u. Depending on the complexity, we propose two different operator splitting methods and estimate the error arising from such a splitting. The two time discretization schemes, both second-order accurate, are described and tested, and their computational complexities are compared. The accuracy of the two schemes and the convergence of the involved preconditioned iterative solution methods are illustrated with numerical experiments. An application to a Rayleigh-Taylor instability problem with the coupled Navier-Stokes and Cahn-Hilliard equations is carried out.

8. Acknowledgements

First of all, I would like to thank my advisor, Assoc. Prof. Maya Neytcheva, for all her support and encouragement, and more importantly for generously sharing her invaluable expertise. It has been a great pleasure to be your PhD student and to work with you. I deeply believe that my career will benefit a lot from the four years of study under your supervision.

I would like to express my gratitude to my second advisor, Prof. Gunilla Kreiss. The regular discussions with you have improved my knowledge and work significantly.

I would like to thank Prof. Owe Axelsson and Prof. Stefano Serra Capizzano. I have been very lucky to have the opportunity to collaborate with you. Special thanks to Prof. Radim Blaheta for acting as the opponent for my licentiate thesis defence.

Many thanks to Marcus Holm for our joint work and valuable discussions.

Many thanks to all friends and colleagues at the Division of Scientific Computing. I appreciate the inspiring and friendly environment here.

Special thanks to Sofia Eriksson, for helping me with the summary in Swedish, but most importantly for being such a good friend.

I am really grateful to my parents for all the support and encouragement. Many thanks to my dear Chinese friends in Sweden, Huang lixia, Zhao shuxi, Hu shuanglin, Yang xia, Mao Fang, Li shuyi, Dai liang, Jia kelin and Zhou helin, for all the help and encouragement.

Special thanks to the China Scholarship Council and Northwestern Polytechnical University for supporting me to study in Sweden.

References

[1] O. Axelsson. Iterative Solution Methods. Cambridge University Press, Cambridge, 1994.

[2] O. Axelsson. Error estimates over infinite intervals of some discretizations of evolution equations. BIT Numerical Mathematics, 24:413-424, 1984.

[3] O. Axelsson. Milestones in the development of iterative solution methods. J. Electr. Comput. Eng., article ID: 972794, 2010.

[4] O. Axelsson and G. Lindskog. On the rate of convergence of the preconditioned conjugate gradient method. Numer. Math., 48:499-523, 1986.

[5] O. Axelsson and R. Blaheta. Preconditioning of matrices partitioned in two-by-two block form: Eigenvalue estimates and Schwarz DD for mixed FEM. Num. Lin. Alg. Appl., 17:787-810, 2010.

[6] O. Axelsson, R. Blaheta and M. Neytcheva. Preconditioning for boundary value problems using elementwise Schur complements. SIAM J. Matrix Anal. Appl., 31:767-789, 2009.

[7] O. Axelsson and M. Neytcheva. A general approach to analyse preconditioners for two-by-two block matrices. Num. Lin. Alg. Appl., article first published online: 14 Dec. 2011, DOI: 10.1002/nla.830.

[8] O. Axelsson and M. Neytcheva. Eigenvalue estimates for preconditioned saddle point matrices. Num. Lin. Alg. Appl., 13:339-360, 2006.

[9] I. Babuška and T. Strouboulis. The Finite Element Method and its Reliability. Clarendon Press, New York, 2001.

[10] W. Bangerth and R. Rannacher. Adaptive Finite Element Methods for Differential Equations. Birkhäuser Verlag, 2003.

[11] Z.-Z. Bai. Structured preconditioners for nonsingular matrices of block two-by-two structures. Math. Comp., 75:791-815, 2006.

[12] Z.-Z. Bai, G.H. Golub and M.K. Ng. Hermitian and skew-Hermitian splitting methods for non-Hermitian positive definite linear systems. SIAM J. Matrix Anal. Appl., 24:603-626, 2003.

[13] Z.-Z. Bai and M.K. Ng. On inexact preconditioners for nonsymmetric matrices. SIAM J. Sci. Comput., 26:1710-1724, 2005.

[14] J.B. Bell and D.L. Marcus. A second-order projection method for variable-density flows. J. Comput. Phys., 101:334-348, 1992.

[15] M. Benzi. Preconditioning techniques for large linear systems. J. Comput. Phys., 182:418-477, 2002.

[16] M. Benzi. A Generalization of the Hermitian and Skew-Hermitian Splitting Iteration. SIAM J. Matrix Anal. Appl., 31:360-374, 2009.

[17] M. Benzi, G. Golub and J. Liesen. Numerical solution of saddle point problems. Acta Numer., 14:1-137, 2005.

[18] M. Benzi and M.A. Olshanskii. An augmented Lagrangian-based approach to the Oseen problem. SIAM J. Sci. Comput., 28:2095-2113, 2006.

[19] M. Benzi and M.A. Olshanskii. Field-of-values convergence analysis of augmented Lagrangian preconditioners for the linearized Navier-Stokes problem. SIAM J. Numer. Anal., 49:770-778, 2011.

[20] M. Benzi, M.A. Olshanskii and Z. Wang. Modified augmented Lagrangian preconditioners for the incompressible Navier-Stokes equations. Int. J. Numer. Meth. Fluids, 66:486-508, 2011.

[21] M. Bercovier and M. Engelman. A finite element method for incompressible non-Newtonian flows. J. Comput. Phys., 36:313-326, 1980.

[22] S. Börm and S. Le Borne. H-LU factorization in preconditioners for augmented Lagrangian and grad-div stabilized saddle point systems. Int. J. Numer. Meth. Fluids, 68:83-98, 2012.

[23] P. Boynanova. On numerical solution methods for block-structure discrete systems. PhD thesis, Department of Information Technology, Uppsala University, Sweden, 2012. Available online, ISSN 1651-6214;942.

[24] R. Bru, J. Cerdán, J. Marín and J. Mas. Preconditioning sparse nonsymmetric linear systems with the Sherman-Morrison formula. SIAM J. Sci. Comput., 25:701-715, 2003.

[25] R. Bru, J. Marín, J. Mas and M. Tuma. Balanced incomplete factorization. SIAM J. Sci. Comput., 30:2302-2318, 2008.

[26] J. Cerdán, T. Faraj, N. Malla, J. Marín and J. Mas. Block approximate inverse preconditioners for sparse nonsymmetric linear systems. Electron. Trans. Numer. Anal., 37:23-40, 2010.

[27] A.J. Chorin. Numerical solution of the Navier-Stokes equations. Math. Comp., 22:745-762, 1968.

[28] U. Christensen and H. Harder. 3-D convection with variable viscosity. Geophys. J. Int., 104:213-220, 2007.

[29] H.C. Elman. Preconditioning for the steady-state Navier-Stokes equations with low viscosity. SIAM J. Sci. Comput., 20:1299-1316, 1999.

[30] H.C. Elman and D.J. Silvester. Fast nonsymmetric iterations and preconditioning for Navier-Stokes equations. SIAM J. Sci. Comput., 17:33-46, 1996.

[31] H.C. Elman, D.J. Silvester and A.J. Wathen. Finite Element and Fast Iterative Solvers: with Application in Incompressible Fluid Dynamics. Numerical Mathematics and Scientific Computing. Oxford University Press, New York, 2005.

[32] H.C. Elman, D.J. Silvester and A.J. Wathen. Performance and analysis of saddle point preconditioners for the discrete steady-state Navier-Stokes equations. Numer. Math., 90:665-688, 2002.

[33] M. Fortin and R. Glowinski. Augmented Lagrangian Methods: Application to the Numerical Solution of Boundary-Value Problems. North-Holland Publishing Co., Amsterdam, 1983.

[34] P.P. Grinevich and M.A. Olshanskii. An iterative method for the Stokes-type problem with variable viscosity. SIAM J. Sci. Comput., 31:3959-3978, 2009.

[35] J.-L. Guermond, P. Minev and J. Shen. An overview of projection methods for incompressible flows. Comput. Methods Appl. Mech. Eng., 195:6011-6045, 2006.

[36] J.-L. Guermond and L. Quartapelle. A projection FEM for variable density incompressible flows. J. Comput. Phys., 165:167-188, 2000.

[37] J.-L. Guermond and A. Salgado. A splitting method for incompressible flows with density based on a pressure Poisson equation. J. Comput. Phys., 228:2834-2846, 2009.

[38] T. Heister. A Massively Parallel Finite Element Framework with Application to Incompressible Flows. PhD thesis, Göttingen University, 2011. Available online.

[39] X. He, M. Neytcheva and S. Serra Capizzano. On an augmented Lagrangian-based preconditioning of Oseen type problems. BIT Numerical Mathematics, 51:865-888, 2011.

[40] J. Hron, J. Malék, J. Necas and K.R. Rajagopal. Numerical simulations and global existence of solutions of two-dimensional flows of fluids with pressure- and shear-dependent viscosities. Math. Comput. Simulation, 61:297-315, 2003.

[41] I.C.F. Ipsen. A note on preconditioning nonsymmetric matrices. SIAM J. Sci. Comput., 23:1050-1051, 2001.

[42] O. Karakashian. On a Galerkin-Lagrange multiplier method for the stationary Navier-Stokes equations. SIAM J. Numer. Anal., 19:909-923, 1982.

[43] D. Kay, D. Loghin and A.J. Wathen. A preconditioner for the steady-state Navier-Stokes equations. SIAM J. Sci. Comput., 24:237-256, 2002.

[44] H.G. Lee, K. Kim and J. Kim. On the long term simulation of the Rayleigh-Taylor instability. Int. J. Numer. Meth. Fluids, 00:1-25, 2010.

[45] J. Kraus. Algebraic multilevel preconditioning of finite element matrices using local Schur complements. Num. Lin. Alg. Appl., 13:49-70, 2006.

[46] M. Kronbichler. Computational techniques for coupled flow-transport problems. PhD thesis, Department of Information Technology, Uppsala University, Sweden, 2012. Available online, ISSN 1651-6214;885.

[47] C. Li and C. Vuik. Eigenvalue analysis of the SIMPLE preconditioning for incompressible flow. Num. Lin. Alg. Appl., 11:511-523, 2004.

[48] J. Malék, J. Necas and K.R. Rajagopal. Global existence of solutions for flows of fluids with pressure and shear dependent viscosities. Appl. Math. Lett., 15:961-967, 2002.

[49] K.A. Mardal and R. Winther. Uniform preconditioners for the time dependent Stokes problem. Numer. Math., 38:305-327, 2004.

[50] M.F. Murphy, G.H. Golub and A.J. Wathen. A note on preconditioning for indefinite linear systems. SIAM J. Sci. Comput., 21:1969-1972, 2000.

[51] M. Neytcheva. On element-by-element Schur complement approximations. Linear Alg. Appl., 434:2308-2324, 2011.

[52] M. Neytcheva, M. Do-Quang and H. Xin. Element-by-element Schur complement approximations for general nonsymmetric matrices of two-by-two block form. Lecture Notes in Computer Science (LNCS), 5910:108-115, 2010.

[53] A. Napov and Y. Notay. An algebraic multigrid method with guaranteed convergence rate. Report GANMN 10-03, Université Libre de Bruxelles, Brussels, Belgium, 2010 (Revised 2011).

[54] A.C. Niet and F.W. Wubs. Two preconditioners for saddle point problems in fluid flows. Int. J. Numer. Meth. Fluids, 54:355-377, 2007.

[55] Y. Notay. An aggregation-based algebraic multigrid method. Electron. Trans. Numer. Anal., 37:123-146, 2010.

[56] Y. Notay. Aggregation-based algebraic multigrid for convection-diffusion equations. Report GANMN 11-01, Université Libre de Bruxelles, Brussels, Belgium, 2011.

[57] M.A. Olshanskii and Y. Vassilevski. Pressure Schur complement preconditioners for the discrete Oseen problem. SIAM J. Sci. Comput., 29:2686-2704, 2007.

[58] A. Ouazzi, S. Turek and J. Hron. Finite element methods for the simulation of incompressible powder flow. Comm. Numer. Methods Engrg., 21:581-596, 2005.

[59] T.C. Papanastasiou. Flows of materials with yield. J. Rheol., 31:385-404, 1987.

[60] I. Perugia and V. Simoncini. Block-diagonal and indefinite symmetric preconditioners for mixed finite element formulations. Num. Lin. Alg. Appl., 7:585-616, 2000.

[61] R. Rannacher. On Chorin's projection method for the incompressible Navier-Stokes equations. The Navier-Stokes equations II: theory and numerical methods (Oberwolfach, 1991), 167-183, Lecture Notes in Math., 1530, Springer, Berlin, 1992.

[62] M.U. Rehman, C. Vuik and G. Segal. Preconditioners for the incompressible Navier-Stokes equations. The Department of Applied Mathematical Analysis, Delft University of Technology, Netherland, TR-15, 2007.

[63] M.U. Rehman, C. Vuik and G. Segal. SIMPLE-type preconditioners for the Oseen's problem. Int. J. Numer. Meth. Fluids, 61:432-452, 2009.

[64] T. Rusten and R. Winther. A preconditioned iterative method for saddle point problems. SIAM J. Matrix Anal. Appl., 13:887-904, 1992.

[65] Y. Saad. Iterative Methods for Sparse Linear Systems. Society for Industrial and Applied Mathematics, Philadelphia, PA, second edition, 2003.

[66] P.J. Tackley. Effects of strongly variable viscosity on three-dimensional compressible convection in planetary mantles. J. Geophys. Res., 101:3311-3332, 1996.

[67] S. Turek. Efficient Solvers for Incompressible Flow Problems: An Algorithmic and Computational Approach. Springer Berlin, 1999.

[68] M. ur Rehman, T. Geenen, C. Vuik, G. Segal and S.P. MacLachlan. On iterative methods for the incompressible Stokes problem. Int. J. Numer. Meth. Fluids, 65:1180-1200, 2011.

[69] Zh. Wang. Preconditioning techniques for the incompressible Navier-Stokes equations. PhD thesis, University of Emory, America, 2011. Available online.

[70] G. Wanner. A short proof on nonlinear A-stability. Scientific Notes, BIT, 16:226-227, 1976.

[71] C. Vuik, A. Saghir and GP. Boerstoel. The Krylov accelerated SIMPLE(R) method for flow problems in industrial furnaces. Int. J. Numer. Meth. Fluids, 33:1027-1040, 2000.

[72] C. Vuik and A. Saghir. The Krylov accelerated SIMPLE(R) method for incompressible flow. The Department of Applied Mathematical Analysis, Delft University of Technology, Netherland, TR-01, 2002.
