Application of hierarchical matrices for partial inverse


Alexander Litvinenko

KAUST, SRI-UQ Center, http://sri-uq.kaust.edu.sa/

www.hlib.org

November 26, 2013


Content

1. Problem Setup

2. Hierarchical Domain Decomposition (HDD)

3. HDD in the H-matrix arithmetic

4. Computational resources of HDD

5. Modifications of HDD

6. Numerical results


Happy Birthday Prof. Hackbusch !!!

www.mis.mpg.de/calendar/conferences/2013/wh65.html

Randolph E. Bank (Uni of California), Susanne C. Brenner (Louisiana State Uni), Eric Cances (Ecole des Ponts ParisTech), Albert Cohen (Uni Pierre et Marie Curie), Wolfgang Dahmen (RWTH Aachen), Bjorn Engquist (Uni of Texas at Austin), Christian Lubich (Uni Tubingen), Yvon Maday (Uni Pierre et Marie Curie), Reinhold Schneider (TU Berlin), Rob Stevenson (Uni Amsterdam), Endre Suli (Uni Oxford), Gabriel Wittum (Uni Frankfurt), Jinchao Xu (Pennsylvania State Uni)

Problem setup

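The elliptic boundary value problem: find $u \in H^1(\Omega)$ such that

\[
\sum_{1 \le i,j \le 2} \frac{\partial}{\partial x_i}\, \alpha_{i,j}(x)\, \frac{\partial}{\partial x_j} u = f \ \text{in } \Omega, \qquad u = g \ \text{on } \partial\Omega, \tag{1}
\]

where $\alpha_{i,j} \in L^\infty(\Omega)$ such that $A(x) = (\alpha_{i,j})_{i,j=1,2}$ satisfies $0 < \underline{\lambda} \le \lambda_{\min}(A(x)) \le \lambda_{\max}(A(x)) \le \overline{\lambda}$ for all $x \in \Omega$.

⇒ Oscillatory or jumping coefficients are allowed.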

The motivation and goals

E.g.

a) compute the solution on γ,

b) compute the solution in a subdomain ν,

c) compute the solution on the interface,

d) let $A_h x_h = b_h$ and $h \ll H$; one may be interested only in $x_H = R_{H \leftarrow h} A_h^{-1} b_h$ or $x_H = R_{H \leftarrow h} A_h^{-1} P b_H$ [see also Hackbusch and Drechsler, 2012].
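Item d) can be illustrated with a small dense sketch. It is not the HDD method itself: it only shows what the partial inverse $R_{H \leftarrow h} A_h^{-1}$ is. The 1D model matrix, the choice of every second node as coarse node and the injection operator `R` are assumptions made for this illustration.

```python
import numpy as np

n_h = 15                                   # number of fine-grid unknowns (assumed)
# Assumed model problem: 1D Laplacian stencil (-1, 2, -1) on the fine grid.
A_h = 2 * np.eye(n_h) - np.eye(n_h, k=1) - np.eye(n_h, k=-1)

# Assumed restriction: plain injection onto every second fine node.
coarse = np.arange(1, n_h, 2)
R = np.zeros((len(coarse), n_h))
R[np.arange(len(coarse)), coarse] = 1.0

# Partial inverse R * A_h^{-1}: only n_H rows of the inverse are formed.
partial_inverse = np.linalg.solve(A_h.T, R.T).T

b_h = np.ones(n_h)
x_H = partial_inverse @ b_h                # coarse part of the solution only
x_h = np.linalg.solve(A_h, b_h)            # full fine solution, for comparison
```

Here `x_H` coincides with the fine solution `x_h` at the coarse nodes, while the stored operator has only `len(coarse)` rows instead of `n_h`.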

The idea of HDD

Apply Galerkin FE discretisation to (1). We construct the discrete solution in the form

\[
u_h = F_h f_h + G_h g_h, \tag{2}
\]

where $F_h$, $G_h$ are two solution operators, $f_h$ is the FE right-hand side and $g_h$ contains the FE Dirichlet boundary values.

Often only a few functionals of the solution are of interest!

IDEA: Leaves to Root and Root to Leaves algorithms


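Domain decomposition tree ($T_{\mathcal{T}_h}$)

FE discretisation: triangulation $\mathcal{T}_h$, $\Omega := \Omega_h = \bigcup_{t \in \mathcal{T}_h} t$.

[Figure: a triangulated domain with numbered nodal points, split recursively into subdomains; the splits form the domain decomposition tree.]

• Ω is the root of the tree,

• $T_{\mathcal{T}_h}$ is a binary tree,

• if $\omega \in T_{\mathcal{T}_h}$ has two sons $\omega_1, \omega_2 \in T_{\mathcal{T}_h}$, then $\omega = \omega_1 \cup \omega_2$ and $\gamma_\omega = \partial\omega_1 \cap \partial\omega_2$,

• $\omega \in T_{\mathcal{T}_h}$ is a leaf if and only if $\omega \in \mathcal{T}_h$.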
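The tree properties listed above can be sketched as a small data structure. This is a minimal sketch under assumptions: elements are identified by integer ids, and a subdomain is split by halving its element list, which stands in for a geometric bisection of the triangulation. The names `Subdomain`, `build_tree` and `leaves` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Subdomain:
    """A node omega of the domain decomposition tree (hypothetical helper)."""
    elements: List[int]                      # ids of the triangles contained in omega
    sons: Tuple["Subdomain", ...] = ()       # empty for a leaf, otherwise two sons

    def is_leaf(self) -> bool:
        # A leaf is a single element of the triangulation.
        return len(self.elements) == 1


def build_tree(elements: List[int]) -> Subdomain:
    """Binary tree: omega = omega_1 ∪ omega_2, down to single elements."""
    node = Subdomain(list(elements))
    if len(elements) > 1:
        half = len(elements) // 2            # assumed splitting rule
        node.sons = (build_tree(elements[:half]), build_tree(elements[half:]))
    return node


def leaves(node: Subdomain) -> List[Subdomain]:
    """Collect all leaves below (and including) the given node."""
    if node.is_leaf():
        return [node]
    return [leaf for son in node.sons for leaf in leaves(son)]


root = build_tree(list(range(8)))            # Omega consists of 8 elements
```

A real implementation would split along geometric coordinates and also store the interface index set $I(\gamma_\omega)$ for each inner node.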

Notation

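Let $\omega \in T_{\mathcal{T}_h}$, $\omega = \omega_1 \cup \omega_2$.

$\Gamma_{\omega,1} := \partial\omega \cap \omega_1$, $\Gamma_{\omega,2} := \partial\omega \cap \omega_2$,

$\gamma_\omega := \partial\omega_1 \setminus \partial\omega = \partial\omega_2 \setminus \partial\omega$.

$I := I(\Omega)$ = set of all nodal points in Ω.

$I(\omega) := \{ i \in I : x_i \in \omega \}$.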

FE Galerkin Discretisation

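For $\omega \in T_{\mathcal{T}_h}$ define $f_\omega := (f_i)_{i \in I(\omega)}$, $g_\omega := (g_i)_{i \in I(\partial\omega)}$, $d_\omega := (f_\omega, g_\omega)$.

Let $b_j$, $j = 1, \dots, N$, be a piecewise linear basis, $V_h := \mathrm{span}\{b_1, \dots, b_N\}$, $V_h \subset V = H^1(\Omega)$.

Variational Galerkin formulation of (1): find $u_h \in V_h$ such that

\[
a_\omega(u_h, b_j) = (f_\omega, b_j)_{L^2(\omega)} \quad \forall j \in I(\omega), \qquad
u_h(x_j) = g_j \quad \forall j \in I(\partial\omega), \tag{3}
\]

where

\[
a_\omega(b_i, b_j) = \int_\Omega \alpha(x)\, (\nabla b_i, \nabla b_j)\, dx, \qquad
(f_\omega, b_j) = \int_{\mathrm{supp}\, b_j} f_\omega\, b_j\, dx.
\]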

Main point of HDD

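The main point of HDD is to build the mapping $\Phi_\omega = (\Phi^g_\omega, \Phi^f_\omega)$, where $\Phi^g_\omega : \mathbb{R}^{I(\partial\omega)} \to \mathbb{R}^{I(\gamma_\omega)}$ and $\Phi^f_\omega : \mathbb{R}^{I(\omega)} \to \mathbb{R}^{I(\gamma_\omega)}$, for each $\omega \in T_{\mathcal{T}_h}$.

1. Definition of the mapping $\Phi_\omega := (\Phi^g_\omega, \Phi^f_\omega)$:

\[
(\Phi_\omega(d_\omega))_i := u_h(x_i), \quad \forall i \in I(\gamma_\omega).
\]

Hence, $\Phi_\omega(d_\omega)$ is the trace of $u_h$ on $\gamma_\omega$. Actually, $\Phi_\omega d_\omega = \Phi^g_\omega g_\omega + \Phi^f_\omega f_\omega$.

2. Definition of the auxiliary mapping $\Psi_\omega := (\Psi^g_\omega, \Psi^f_\omega)$:

\[
\Psi_\omega(d_\omega) = \big( (\Psi_\omega(d_\omega))_i \big)_{i \in I(\partial\omega)} \quad \text{with} \quad (\Psi_\omega(d_\omega))_i := a_\omega(u_h, b_i) - (f_\omega, b_i)_{L^2(\omega)},
\]

\[
\Psi_\omega d_\omega = \Psi^f_\omega f_\omega + \Psi^g_\omega g_\omega.
\]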

Construction of the mappings Ψω and Φω

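Lemma 1: Let $\omega_1$ and $\omega_2$ be two sons of $\omega \in T_{\mathcal{T}_h}$. Let $d_{\omega_1}$ and $d_{\omega_2}$ be the data associated to $\omega_1$ and $\omega_2$ such that:

• (consistency conditions for the Dirichlet data) $g_{1,i} = g_{2,i}$, $\forall i \in I(\omega_1) \cap I(\omega_2)$,

• (consistency conditions for the right-hand side) $f_{1,i} = f_{2,i}$, $\forall i \in I(\omega_1) \cap I(\omega_2)$.

[Figure: subdomain ω split into $\omega_1$ and $\omega_2$ by the interface $\gamma_\omega$, with a nodal point $x_j$ marked.]

Let $u_{\omega_1}$ and $u_{\omega_2}$ be the local FE solutions of problem (3) for the data $d_{\omega_1}$, $d_{\omega_2}$.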

Construction of the mappings Ψω and Φω

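If $u_{\omega_1}$, $u_{\omega_2}$ satisfy

\[
{}^{\gamma}\Psi_{\omega_1}(d_{\omega_1}) + {}^{\gamma}\Psi_{\omega_2}(d_{\omega_2}) = 0,
\]

then $u_\omega$ defined by assembling

\[
u_\omega(x_i) := \begin{cases} u_{\omega_1}(x_i) & \text{for } i \in I(\omega_1), \\ u_{\omega_2}(x_i) & \text{for } i \in I(\omega_2) \end{cases}
\]

is the solution of (3) for the data $d_\omega = (f_\omega, g_\omega)$ given by

\[
f_\omega := \begin{cases} f_{1,i} & \text{for } i \in I(\omega_1), \\ f_{2,i} & \text{for } i \in I(\omega_2), \end{cases}
\qquad
g_\omega := \begin{cases} g_{1,i} & \text{for } i \in I(\partial\omega_1), \\ g_{2,i} & \text{for } i \in I(\partial\omega_2). \end{cases}
\]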

Construction of Φω

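Given: $d_1 := d_{\omega_1} = (f_1, g_{1,\Gamma}, g_{1,\gamma})$, where $g_{1,\Gamma} := (g_1)_{i \in I(\Gamma_{\omega,1})}$, $g_{1,\gamma} := (g_1)_{i \in I(\gamma)}$, and analogously $d_2 := d_{\omega_2}$. Then

\[
\Psi_{\omega_1} d_1 = \Psi^f_{\omega_1} f_1 + \Psi^\Gamma_{\omega_1} g_{1,\Gamma} + \Psi^\gamma_{\omega_1} g_{1,\gamma},
\]

\[
\Psi_{\omega_2} d_2 = \Psi^f_{\omega_2} f_2 + \Psi^\Gamma_{\omega_2} g_{2,\Gamma} + \Psi^\gamma_{\omega_2} g_{2,\gamma}.
\]

Restricting to $I(\gamma)$ (left superscript γ), summing, and using $g_{1,\gamma} = g_{2,\gamma} =: g_\gamma$:

\[
\big( {}^{\gamma}\Psi^\gamma_{\omega_1} + {}^{\gamma}\Psi^\gamma_{\omega_2} \big)\, g_\gamma
= \big( -\Psi^f_{\omega_1} f_1 - \Psi^\Gamma_{\omega_1} g_{1,\Gamma} - \Psi^f_{\omega_2} f_2 - \Psi^\Gamma_{\omega_2} g_{2,\Gamma} \big)\big|_\gamma .
\]

We set

\[
M := -\big( {}^{\gamma}\Psi^\gamma_{\omega_1} + {}^{\gamma}\Psi^\gamma_{\omega_2} \big),
\]

compute $M^{-1}$ and solve for $g_\gamma$:

\[
g_\gamma = M^{-1} \big( \Psi^f_{\omega_1} f_1 + \Psi^\Gamma_{\omega_1} g_{1,\Gamma} + \Psi^f_{\omega_2} f_2 + \Psi^\Gamma_{\omega_2} g_{2,\Gamma} \big)\big|_\gamma .
\]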

HDD consists of two algorithms

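I. Leaves to Root: construction of $\Phi_\omega$

1. Compute $\Psi_\omega$ for all leaves of $T_{\mathcal{T}_h}$ (matrices in $\mathbb{R}^{3 \times 3}$).

2. Recursion from the leaves to the root (end if ω = Ω):

2.1 Compute $\Psi_\omega$ and $\Phi_\omega$ from $\Psi_{\omega_1}$, $\Psi_{\omega_2}$.

2.2 Store $\Phi_\omega$ and delete $\Psi_{\omega_1}$, $\Psi_{\omega_2}$.

II. Root to Leaves: application of $\Phi_\omega$

1. Given $d_\omega = (f_\omega, g_\omega)$, compute the solution $u_h$ on the interface $\gamma_\omega$ by $\Phi_\omega(d_\omega)$.

2. Build the data $d_{\omega_1} = (f_{\omega_1}, g_{\omega_1})$, $d_{\omega_2} = (f_{\omega_2}, g_{\omega_2})$ from $d_\omega = (f_\omega, g_\omega)$ and $g_\gamma = \Phi_\omega(d_\omega)$.

3. Repeat for $\omega_1$ and $\omega_2$ and, recursively, for their sons.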
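The "Root to Leaves" phase can be tried out on a 1D model problem. The following is a minimal sketch under several assumptions that are not part of the slides: a finite-difference discretisation of $-u'' = f$ on a uniform grid, subdomains that are index intervals split at their midpoint (so each interface is a single node), and a shortcut in which each $\Phi_\omega$ is computed by a direct local solve instead of the leaves-to-root recursion through $\Psi_{\omega_1}$, $\Psi_{\omega_2}$. The function names are hypothetical.

```python
import numpy as np


def local_matrix(m, h):
    """Finite-difference matrix of -u'' on m interior nodes with spacing h."""
    return (2 * np.eye(m) - np.eye(m, k=1) - np.eye(m, k=-1)) / h**2


def build_phi(lo, hi, h, store):
    """Store (Phi^f, Phi^g) for the subdomain with end nodes lo and hi.

    Shortcut (assumption): direct local solve, not the Psi-based recursion.
    """
    if hi - lo < 2:                      # no interior node: nothing to compute
        return
    m = hi - lo - 1                      # number of interior nodes
    mid = (lo + hi) // 2                 # the interface gamma is one node
    e = np.zeros(m)
    e[mid - lo - 1] = 1.0
    row = np.linalg.solve(local_matrix(m, h), e)   # row of the local inverse
    phi_f = row                                     # acts on interior values of f
    phi_g = np.array([row[0], row[-1]]) / h**2      # acts on the two boundary values
    store[(lo, hi)] = (phi_f, phi_g)
    build_phi(lo, mid, h, store)
    build_phi(mid, hi, h, store)


def apply_phi(lo, hi, u, f, store):
    """Root to Leaves: fill in u on the interface, then recurse into the sons."""
    if hi - lo < 2:
        return
    phi_f, phi_g = store[(lo, hi)]
    mid = (lo + hi) // 2
    u[mid] = phi_f @ f[lo + 1:hi] + phi_g @ np.array([u[lo], u[hi]])
    apply_phi(lo, mid, u, f, store)
    apply_phi(mid, hi, u, f, store)


N = 16
h = 1.0 / N
x = np.linspace(0.0, 1.0, N + 1)
f = np.sin(3 * x) + 1.0                  # assumed right-hand side
store = {}
build_phi(0, N, h, store)

u = np.zeros(N + 1)
u[0], u[N] = 1.0, 2.0                    # assumed Dirichlet data g
apply_phi(0, N, u, f, store)

# Reference: one global solve on all interior nodes.
rhs = f[1:N].copy()
rhs[0] += u[0] / h**2
rhs[-1] += u[N] / h**2
u_ref = u.copy()
u_ref[1:N] = np.linalg.solve(local_matrix(N - 1, h), rhs)
```

In this sketch `u` agrees with `u_ref` up to round-off, because the restriction of the global discrete solution to a subdomain solves the local problem with the interface values as Dirichlet data. Stopping the recursion early yields the solution only on the coarser interfaces.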

HDD in the H-matrix arithmetic

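Exact HDD requires expensive matrix arithmetic.

Let the system of linear equations for $\omega \in T_{\mathcal{T}_h}$ be $Au = Fc$. Rewrite it in block matrix form:

\[
\begin{pmatrix} A_{BB} & A_{BI} \\ A_{IB} & A_{II} \end{pmatrix}
\begin{pmatrix} u_B \\ u_I \end{pmatrix}
=
\begin{pmatrix} F_B \\ F_I \end{pmatrix} c,
\]

where $u_B \in \mathbb{R}^{I(\partial\omega)}$, $u_I \in \mathbb{R}^{I(\gamma)}$, $A_{BB} : \mathbb{R}^{I(\partial\omega)} \to \mathbb{R}^{I(\partial\omega)}$, $A_{II} : \mathbb{R}^{I(\gamma)} \to \mathbb{R}^{I(\gamma)}$.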

Eliminate uI via the Schur complement

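\[
\begin{pmatrix} A_{BB} - A_{BI} A_{II}^{-1} A_{IB} & 0 \\ A_{IB} & A_{II} \end{pmatrix}
\begin{pmatrix} u_B \\ u_I \end{pmatrix}
=
\begin{pmatrix} F_B - A_{BI} A_{II}^{-1} F_I \\ F_I \end{pmatrix} c.
\]

Hence

\[
(A_{BB} - A_{BI} A_{II}^{-1} A_{IB})\, u_B = (F_B - A_{BI} A_{II}^{-1} F_I)\, c, \qquad
u_I = A_{II}^{-1} F_I\, c - A_{II}^{-1} A_{IB}\, u_B,
\]

and we define

\[
\Psi^g_\omega := A_{BB} - A_{BI} A_{II}^{-1} A_{IB} \ \text{(Schur complement)}, \qquad
\Psi^f_\omega := F_B - A_{BI} A_{II}^{-1} F_I,
\]

\[
\Phi^g_\omega := -A_{II}^{-1} A_{IB}, \qquad
\Phi^f_\omega := A_{II}^{-1} F_I,
\]

so that $u_I = \Phi^f_\omega c + \Phi^g_\omega u_B$.

Apply the H-matrix techniques.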
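The elimination can be checked numerically with dense matrices. This is a sketch under assumptions: a random symmetric positive definite `A`, a random `F` and `c`, and arbitrary block sizes stand in for the FE quantities, and $\Phi^g_\omega$ carries a minus sign so that $u_I = \Phi^f_\omega c + \Phi^g_\omega u_B$ holds. No H-matrix arithmetic is involved.

```python
import numpy as np

rng = np.random.default_rng(0)
n_B, n_I = 3, 5                          # assumed sizes of the index sets B and I
n = n_B + n_I

G = rng.standard_normal((n, n))
A = G @ G.T + n * np.eye(n)              # assumed SPD test matrix
F = rng.standard_normal((n, 2))
c = rng.standard_normal(2)

A_BB, A_BI = A[:n_B, :n_B], A[:n_B, n_B:]
A_IB, A_II = A[n_B:, :n_B], A[n_B:, n_B:]
F_B, F_I = F[:n_B], F[n_B:]

psi_g = A_BB - A_BI @ np.linalg.solve(A_II, A_IB)   # Schur complement
psi_f = F_B - A_BI @ np.linalg.solve(A_II, F_I)
phi_g = -np.linalg.solve(A_II, A_IB)                # sign chosen so that u_I = phi_f c + phi_g u_B
phi_f = np.linalg.solve(A_II, F_I)

u = np.linalg.solve(A, F @ c)            # reference: solve the full system
u_B, u_I = u[:n_B], u[n_B:]
```

Both identities of the slide then hold for the reference solution: `psi_g @ u_B` equals `psi_f @ c`, and `u_I` equals `phi_f @ c + phi_g @ u_B`.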

Rank-k matrices

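$R \in \mathbb{R}^{n \times m}$, $A \in \mathbb{R}^{n \times k}$, $B \in \mathbb{R}^{m \times k}$, $k \ll \min(n, m)$.

The storage of $R = AB^T$ is $k(n + m)$ instead of $n \cdot m$ for R represented in the full matrix format.

[Figure: $R = A \cdot B^T$ with R of size n × m, A of size n × k and $B^T$ of size k × m.]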
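The storage count can be reproduced with a truncated SVD. This is a sketch under assumptions: the kernel $1/|x - y|$ on two well-separated point sets serves as an example of a block that is numerically of low rank, and the sizes `n`, `m`, `k` are arbitrary. The helper `truncate_to_rank` is hypothetical.

```python
import numpy as np


def truncate_to_rank(R, k):
    """Best rank-k approximation R ≈ A @ B.T via the SVD."""
    U, s, Vt = np.linalg.svd(R, full_matrices=False)
    A = U[:, :k] * s[:k]                 # n x k, singular values absorbed into A
    B = Vt[:k].T                         # m x k
    return A, B


n, m, k = 200, 150, 5
x = np.linspace(0.0, 1.0, n)             # assumed source points
y = np.linspace(2.0, 3.0, m)             # assumed, well-separated target points
R = 1.0 / np.abs(x[:, None] - y[None, :])

A, B = truncate_to_rank(R, k)
rel_err = np.linalg.norm(R - A @ B.T) / np.linalg.norm(R)

storage_lowrank = k * (n + m)            # entries stored in the factors A and B
storage_full = n * m                     # entries stored in the full format
```

With these sizes the factors hold 1750 numbers instead of 30000 for the full block.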

H-matrices (Hackbusch ’99)

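1. Build cluster tree $T_I$ and block cluster tree $T_{I \times I}$.

[Figure: cluster tree $T_I$ (I split into $I_1$, $I_2$, then into $I_{11}$, $I_{12}$, $I_{21}$, $I_{22}$) and the corresponding block partition of I × I.]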

Admissibility condition

2. For each $(t \times s) \in T_{I \times I}$, $t, s \in T_I$, check the standard admissibility condition

\[
\min\{\mathrm{diam}(Q_t), \mathrm{diam}(Q_s)\} \le \eta \cdot \mathrm{dist}(Q_t, Q_s).
\]

if (adm = true) then $M|_{t \times s}$ is a rank-k matrix block;

if (adm = false) then divide $M|_{t \times s}$ further or define it as a dense matrix block, if small enough.

[Figure: bounding boxes $Q_t$ and $Q_s$ of the clusters t and s with their distance dist; the resulting block structure of the H-matrix.]

Summary: Grid → cluster tree ($T_I$) + admissibility condition → block cluster tree ($T_{I \times I}$) → H-matrix → H-matrix arithmetic.
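The admissibility test is easy to write down for axis-aligned bounding boxes. This is a minimal sketch: the representation of a box as a pair (lower corner, upper corner), the Euclidean diameter and distance, and the default value `eta=2.0` are assumptions for illustration, and the function names are hypothetical.

```python
import numpy as np


def diam(box):
    """Euclidean diameter of an axis-aligned box given as (lower, upper)."""
    lower, upper = (np.asarray(v, dtype=float) for v in box)
    return float(np.linalg.norm(upper - lower))


def dist(box_t, box_s):
    """Euclidean distance between two axis-aligned boxes (0 if they touch or overlap)."""
    lo_t, up_t = (np.asarray(v, dtype=float) for v in box_t)
    lo_s, up_s = (np.asarray(v, dtype=float) for v in box_s)
    gap = np.maximum(0.0, np.maximum(lo_s - up_t, lo_t - up_s))
    return float(np.linalg.norm(gap))


def is_admissible(box_t, box_s, eta=2.0):
    """Standard condition: min(diam(Q_t), diam(Q_s)) <= eta * dist(Q_t, Q_s)."""
    return min(diam(box_t), diam(box_s)) <= eta * dist(box_t, box_s)


Q_t = ([0.0, 0.0], [1.0, 1.0])
Q_far = ([3.0, 0.0], [4.0, 1.0])         # separated from Q_t by a gap of 2
Q_near = ([1.0, 0.0], [2.0, 1.0])        # shares an edge with Q_t

far_ok = is_admissible(Q_t, Q_far)       # rank-k block
near_ok = is_admissible(Q_t, Q_near)     # subdivide further or store dense
```

For the separated pair the block would be stored as a rank-k matrix; for the touching pair the distance is zero, so the block is subdivided or kept dense.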

Definition of H-matrices

Definition: $\mathcal{H}(T_{I \times J}, k) := \{ M \in \mathbb{R}^{I \times J} \mid \mathrm{rank}(M|_{t \times s}) \le k \text{ for all admissible leaves } t \times s \text{ of } T_{I \times J} \}$.

Let $n := \max(|I|, |J|)$, and let $d = 1, 2, 3$ be the spatial dimension.

Operation              Sequential complexity            Parallel complexity
                       (Hackbusch et al. ’99-’06)       (Kriemann ’05)
storage(M)             N = O(k n log n)                 N/q
Mx                     N = O(k n log n)                 N/q
M1 ⊕ M2                N = O(k^2 n log n)               N/q
M1 ⊙ M2, M^{-1}        N = O(k^2 n log^2 n)             N/q + O(n)
H-LU                   N = O(k^2 n log^2 n)             N/q + O(k^2 n log^2 n / n^{1/d})
H-matrix conversion    N = O(k^2 n log^2 n)             N/q

H-matrix conversion

[Figure: block structures of H-matrices; the left panel is labelled with sub-blocks A, B, C.]

(right) An H-matrix approximation to $\Psi^g_\omega$ in the HDD method, k ≤ 12. The weak admissibility condition is used.

H-matrix conversion

[Figure: block structure of an H-matrix.]

Matrix $\Psi^f$ with the standard admissibility condition

[Figure: block structures of H-matrices with sub-blocks labelled A to F.]

Building $(\Psi^g_\omega)_{\mathcal{H}} \in \mathbb{R}^{513 \times 513}$ from $(\Psi^g_{\omega_1})_{\mathcal{H}}$ and $(\Psi^g_{\omega_2})_{\mathcal{H}} \in \mathbb{R}^{384 \times 384}$.

HDD and Multiscales

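HDD with $f_H \in V_H \subset V_h$

Given: $h \ll H$, $f_H \in V_H \subset V_h$, and the mappings

$\Psi^f_\omega : \mathbb{R}^{I(\omega_h)} \to \mathbb{R}^{I(\partial\omega_h)}$, $\Phi^f_\omega : \mathbb{R}^{I(\omega_h)} \to \mathbb{R}^{I(\gamma_{\omega_h})}$;

we want to build

$\Psi^f_\omega : \mathbb{R}^{I(\omega_H)} \to \mathbb{R}^{I(\partial\omega_h)}$, $\Phi^f_\omega : \mathbb{R}^{I(\omega_H)} \to \mathbb{R}^{I(\gamma_{\omega_h})}$.

\[
\Phi^f_\omega = \Phi^f_\omega \cdot P_{h \leftarrow H},
\]

where the left-hand side is the new mapping acting on $\mathbb{R}^{I(\omega_H)}$ and $P_{h \leftarrow H}$ maps data on $\omega_H$ to data on $\omega_h$.

[Figure: matrix product A · P = B, with the fine and coarse dimensions h and H marked.]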
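The composition with $P_{h \leftarrow H}$ can be sketched with dense matrices. The following rests on assumptions: a 1D grid, linear interpolation as the prolongation, and a random matrix as a stand-in for $\Phi^f_\omega$ on the fine grid. The helper `prolongation_1d` is hypothetical.

```python
import numpy as np


def prolongation_1d(n_H):
    """Linear interpolation from n_H coarse nodes to n_h = 2 * n_H - 1 fine nodes."""
    n_h = 2 * n_H - 1
    P = np.zeros((n_h, n_H))
    for j in range(n_H):
        P[2 * j, j] = 1.0                # coarse node coincides with a fine node
        if j + 1 < n_H:
            P[2 * j + 1, j] = 0.5        # fine node halfway between two coarse nodes
            P[2 * j + 1, j + 1] = 0.5
    return P


n_H = 5
P = prolongation_1d(n_H)
n_h = P.shape[0]

rng = np.random.default_rng(1)
phi_f_fine = rng.standard_normal((3, n_h))   # stand-in for Phi^f on the fine grid
phi_f_coarse = phi_f_fine @ P                # new mapping acting on coarse data

f_H = rng.standard_normal(n_H)
via_fine = phi_f_fine @ (P @ f_H)            # prolongate first, then apply Phi^f
via_coarse = phi_f_coarse @ f_H              # apply the composed mapping directly
```

Both routes give the same interface values, but the composed mapping has only `n_H` columns instead of `n_h`, which is where the storage saving for coarse right-hand sides comes from.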

HDD with truncation of the small scales

[Figure: (left) domain decomposition tree $T_{\mathcal{T}_h}$ with root Ω, the scales h and H, and the labels $T_H$, $T_h$ and "mean value"; (right) $2\sqrt{n_h n_H}$ dofs.]

Application: multiscale problems (e.g. the skin problem, porous medium).

Use the microscopic model to extract all microscale details and then compute the macroscale behaviour.

Computational resources for $\omega \in T_{\mathcal{T}_h}$

Lemma 2: Let $\omega \in T_{\mathcal{T}_h}$, $n := |I(\omega)|$, and let $\sqrt{n}$ be the number of dofs on the interface. Then the storage costs and computational complexities of $\Psi^g_\omega$, $\Psi^f_\omega$, $\Phi^g_\omega$, $\Phi^f_\omega$ are as shown in the table.

            Storage                        Computational complexity
Ψ^g_ω       O(k √n log √n)*                O(k^2 √n log^2 √n)
Ψ^f_ω       O(k n log n)*                  O(k^2 n log^2 n)
Φ^g_ω       O(k √n)                        0
Φ^f_ω       O(k n log n)                   0

Lemma 3: Application of $\Phi^g_\omega$ costs $O(k^2 \sqrt{n})$. Application of $\Phi^f_\omega$ costs $O(k^2 n \log n)$.

The mean value of the solution in ω

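Lemma 4: Let $\omega, \omega_1, \omega_2 \in T_{\mathcal{T}_h}$ and $\omega = \omega_1 \cup \omega_2$. Let $\lambda_{\omega_i}(d_{\omega_i}) = (\lambda^g_{\omega_i}, g_{\omega_i}) + (\lambda^f_{\omega_i}, f_{\omega_i})$ compute the mean value in $\omega_i$, $i = 1, 2$. Then

\[
\lambda_\omega(d_\omega) = (\lambda^f_\omega, f_\omega) + (\lambda^g_\omega, g_\omega)
\]

computes the mean value in ω. Here

$\lambda^f_\omega : \mathbb{R}^{I(\omega)} \to \mathbb{R}$, $f_\omega \in \mathbb{R}^{I(\omega)}$,

$\lambda^g_\omega : \mathbb{R}^{I(\partial\omega)} \to \mathbb{R}$, $g_\omega \in \mathbb{R}^{I(\partial\omega)}$,

$\lambda^f_\omega = c_1 \lambda^f_{\omega_1} + c_2 \lambda^f_{\omega_2}$,

$\lambda^g_\omega = c_1 \lambda^g_{\omega_1} + c_2 \lambda^g_{\omega_2}$,

$g_\omega$ is built from $g_{\omega_1}$, $g_{\omega_2}$ and $g|_\gamma := \Phi_\omega(d_\omega)$.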

Numerical results


I. Skin Problem in 2D

Elliptic diffusion problem with highly jumping coefficients.

[Figure: panels a and b of the skin model with a lipid layer; the diffusion coefficients are labelled α and β.]

[Khoromskij, Wittum 02]

Dependence of the relative error on α

α                       1.0          10^-1        10^-2        10^-3        10^-4        10^-5
‖u − ũ‖_2 / ‖u‖_2       6.6 * 10^-9  2.0 * 10^-8  6.6 * 10^-8  7.4 * 10^-7  4.2 * 10^-6  7.0 * 10^-5

1292 dofs, ε = 10^-8, β = 1.0, residual ‖Au − c‖ = 10^-10.

ε is responsible for the H-matrix approximation accuracy.

Dependence of the absolute and relative errors on ε

ε         ‖u − ũ‖_2 / ‖u‖_2    ‖u − ũ‖_∞       ‖u − ũ‖_A
10^-6     4.4 * 10^-1          6.67 * 10^2     1.1 * 10^3
10^-8     7.27 * 10^-5         2.3 * 10^-1     9.0 * 10^-1
10^-10    5.1 * 10^-7          1.0 * 10^-3     3.0 * 10^-3
10^-12    3.9 * 10^-9          1.2 * 10^-5     2.9 * 10^-5
10^-14    1.2 * 10^-11         6.6 * 10^-7     1.2 * 10^-7
10^-16    1.6 * 10^-12         1.1 * 10^-8     1.7 * 10^-8

1292 dofs, α = 10^-5, residual ‖Au − c‖ = 10^-10.

II. Comparison of storage costs for H-Cholesky, HDD and H-matrix inverse (in MB)

ε        H-LL^T    HDD     (A^-1)_H
10^-3    13.3      19.7    51.0
10^-4    14.7      20.1    64.0
10^-5    16.0      20.4    75.2
10^-6    17.2      20.6    87.4

1292 dofs.

Computational times

dofs    HDD     pre, H-LL^T, PCG    (A^-1)_H    pre, H-LL^T
332     0.19    0.1                 0.24        0.11
652     0.96    0.6                 3.54        0.5
1292    10.6    5                   65.8        4.7
2572    36      53                  n.e.m.      50
5132    218     n.e.m.              n.e.m.      n.e.m.

(n.e.m. = not enough memory)

Computational times for the skin problem with α = 10^-5, ε = 10^-8, ‖Au − c‖ = 10^-8, H/h = 2.

III. Problems with oscillatory coefficients

\[
-\mathrm{div}(\alpha \nabla u) = 1 \ \text{in } \Omega \subset \mathbb{R}^2, \qquad u = 0 \ \text{on } \partial\Omega, \tag{4}
\]

where $\alpha = 1 + 0.5 \sin(50x) \sin(50y)$.

global k    ‖u_40 − u_k‖_2 / ‖u_40‖_2    ‖u_40 − u_k‖_∞
2           7                            7 * 10^-2
4           2 * 10^-2                    1.8 * 10^-3
6           5.4 * 10^-4                  4.5 * 10^-5
8           6.6 * 10^-5                  6.3 * 10^-6
10          7.6 * 10^-6                  9 * 10^-7

Dependence on the frequency w

w     ‖u_40 − u_k‖_2 / ‖u_40‖_2    ‖u_40 − u_k‖_∞
10    1.65 * 10^-4                 1.76 * 10^-5
50    1.8 * 10^-4                  1.9 * 10^-5

Table: 2572 dofs, f = 1, α(x, y) = 1 + 0.5 sin(wx) sin(wy).

IV. Truncation of the scales < H

Memory costs of all $\Phi^g_\omega$ and $\Phi^f_\omega$ (in kB). H-matrix rank k = 7.

dofs    Φ^g, H = h     Φ^g, H = 0.125
332     2.45 * 10^2    2 * 10^2
652     1.1 * 10^3     7.9 * 10^2
1292    5 * 10^3       2.6 * 10^3
2572    2.1 * 10^4     7.4 * 10^3

dofs    Φ^f, H = h     Φ^f, H = 0.125
332     4 * 10^2       2.8 * 10^2
652     2.4 * 10^3     1.8 * 10^3
1292    1.4 * 10^4     1.2 * 10^4
2572    7.86 * 10^4    6.9 * 10^4

V. Many right-hand sides

$A u^{(i)} = c^{(i)}$, 1292 dofs, right-hand sides $c^{(i)}$, $i = 1, \dots, i_{max}$, PCG method.

"Leaves to Root" ⇒ t1, "Root to Leaves" ⇒ t2.

imax    t1 + t2, sec.    tcg, sec.
10      38 + 2.8         29
100     38 + 27          117
1000    38 + 240         1048

The total computational times of HDD and PCG for imax right-hand sides.
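The totals behind this table can be worked out directly from the listed times. The sketch below only adds up the numbers given on the slide; the variable names are hypothetical.

```python
# Times in seconds, copied from the table above.
t1 = 38.0                                        # "Leaves to Root", done once
t2 = {10: 2.8, 100: 27.0, 1000: 240.0}           # "Root to Leaves" for imax right-hand sides
t_cg = {10: 29.0, 100: 117.0, 1000: 1048.0}      # PCG for imax right-hand sides

hdd_total = {imax: t1 + t2[imax] for imax in t2}
ratio = {imax: t_cg[imax] / hdd_total[imax] for imax in t2}

for imax in sorted(t2):
    print(f"imax={imax:5d}  HDD={hdd_total[imax]:7.1f} s  PCG={t_cg[imax]:7.1f} s  "
          f"PCG/HDD={ratio[imax]:.2f}")
```

The HDD totals are 40.8 s, 65 s and 278 s. For 10 right-hand sides PCG is still faster (29 s), while for 100 and 1000 right-hand sides the one-off cost t1 is amortised and HDD needs less total time.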

Conclusion

1. HDD computes $u_h := B_h f_h + C_h g_h$ or $u_h := B_H f_H + C_h g_h$.

2. $B_h$, $B_H$ and $C_h$ have H-matrix format.

3. The complexities are $O(k^2 n_h \log^3 n_h)$ and $O(k^2 \sqrt{n_h n_H} \log^3 \sqrt{n_h n_H})$.

4. The storages are $O(k n_h \log^2 n_h)$ and $O(k \sqrt{n_h n_H} \log^2 \sqrt{n_h n_H})$.

5. HDD computes functionals of the solution (mean values $\int_\omega u_h\, dx$, $\omega \subset \Omega$, the solution at a point, the solution in a small subdomain ω).

Thanks to

Prof. W. Hackbusch (for the whole idea and support)

B. N. Khoromskij

L. Grasedyck and S. Borm (for detailed explanation of H-matrix technique)

all colleagues at Max Planck Institute for Applied Mathematics and Sciences.

Thanks for your attention!
