
Numerical Linear Algebra
Chap. 4: Perturbation and Regularisation

Heinrich Voss
voss@tu-harburg.de

Hamburg University of Technology
Institute of Numerical Simulation

Linear systems

Sensitivity of linear systems

Consider the linear system of equations

Ax = b    (1)

where A ∈ ℝ^(n,n) is a nonsingular matrix, and a perturbed system

(A + ∆A)(x + ∆x) = b + ∆b.    (2)

Our aim is to examine how perturbations of A and of b affect the solution of the system.


Remarks

Small perturbations always have to be kept in mind when solving practical problems, since

- the data A and/or b may be obtained from measurements, and are therefore erroneous;

- using computers, the representation of data as floating point numbers always produces errors.

Hence, one always has to assume that one solves a perturbed linear system instead of the given one. However, usually the perturbations are quite small.


Perturbation lemma

Lemma
Let B ∈ ℝ^(n,n), and assume that for some vector norm and the associated matrix norm the inequality ‖B‖ < 1 is satisfied.

Then the matrix I − B is nonsingular, and it holds that

‖(I − B)^{-1}‖ ≤ 1 / (1 − ‖B‖).


Proof

For every x ∈ ℝ^n, x ≠ 0,

‖(I − B)x‖ ≥ ‖x‖ − ‖Bx‖ ≥ ‖x‖ − ‖B‖·‖x‖ = (1 − ‖B‖)·‖x‖ > 0.

Therefore, the linear system (I − B)x = 0 has the unique solution x = 0, and I − B is nonsingular.

The estimate of the norm of the inverse of I − B follows from

1 = ‖(I − B)^{-1}(I − B)‖ = ‖(I − B)^{-1} − (I − B)^{-1}B‖
  ≥ ‖(I − B)^{-1}‖ − ‖(I − B)^{-1}B‖
  ≥ ‖(I − B)^{-1}‖ − ‖(I − B)^{-1}‖·‖B‖
  = (1 − ‖B‖)·‖(I − B)^{-1}‖.
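The lemma is easy to check numerically. The following sketch (assuming NumPy; the random matrix B and its scaling are arbitrary choices for illustration) scales B so that ‖B‖_2 = 0.9 and compares ‖(I − B)^{-1}‖_2 with the bound 1/(1 − ‖B‖_2):

    # Numerical check of the perturbation lemma (illustrative sketch).
    import numpy as np

    rng = np.random.default_rng(0)
    n = 50
    B = rng.standard_normal((n, n))
    B *= 0.9 / np.linalg.norm(B, 2)      # rescale so that ||B||_2 = 0.9 < 1

    lhs = np.linalg.norm(np.linalg.inv(np.eye(n) - B), 2)
    rhs = 1.0 / (1.0 - np.linalg.norm(B, 2))
    print(lhs, "<=", rhs)                # the lemma guarantees lhs <= rhs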


Corollary

Let A ∈ ℝ^(n,n) be a nonsingular matrix, and ∆A ∈ ℝ^(n,n). Assume that

‖∆A‖ < 1 / ‖A^{-1}‖

for a matrix norm which is subordinate to some vector norm.

Then A + ∆A is nonsingular, and it holds that

‖(A + ∆A)^{-1}‖ ≤ ‖A^{-1}‖ / (1 − ‖A^{-1}∆A‖) ≤ ‖A^{-1}‖ / (1 − ‖A^{-1}‖·‖∆A‖).


Proof

The existence of (A + ∆A)^{-1} follows from the perturbation lemma, since

‖∆A‖ < 1/‖A^{-1}‖  ⇒  1 > ‖A^{-1}‖·‖∆A‖ ≥ ‖A^{-1}∆A‖

and A + ∆A = A(I + A^{-1}∆A). Hence,

‖(A + ∆A)^{-1}‖ = ‖(I + A^{-1}∆A)^{-1}A^{-1}‖ ≤ ‖A^{-1}‖·‖(I + A^{-1}∆A)^{-1}‖
               ≤ ‖A^{-1}‖ / (1 − ‖A^{-1}∆A‖)
               ≤ ‖A^{-1}‖ / (1 − ‖A^{-1}‖·‖∆A‖).


Remark

The Corollary demonstrates that for a nonsingular matrix A the perturbed matrix A + ∆A is also nonsingular if the perturbation ∆A is sufficiently small.

Perturbed linear system

We consider the perturbed linear system

(A + ∆A)(x + ∆x) = b + ∆b,

and we assume that the perturbation ∆A is so small that the condition of the Corollary is satisfied. Then A + ∆A is nonsingular.

Solving for ∆x one obtains the absolute error which is caused by the perturbations of A and b:

∆x = (A + ∆A)^{-1}(∆b − ∆A x) = (I + A^{-1}∆A)^{-1}A^{-1}(∆b − ∆A x).

Hence, with an arbitrary vector norm and the subordinate matrix norm we obtain

‖∆x‖ ≤ ‖(I + A^{-1}∆A)^{-1}‖·‖A^{-1}‖·(‖∆b‖ + ‖∆A‖·‖x‖).


Perturbed linear system ct.

For b ≠ 0, and as a consequence x ≠ 0, it holds for the relative error ‖∆x‖/‖x‖ that

‖∆x‖/‖x‖ ≤ ‖(I + A^{-1}∆A)^{-1}‖·‖A^{-1}‖·(‖∆b‖/‖x‖ + ‖∆A‖),    (3)

and the Corollary, together with ‖b‖ ≤ ‖A‖·‖x‖, yields

‖∆x‖/‖x‖ ≤ ‖A^{-1}‖ / (1 − ‖A^{-1}‖·‖∆A‖) · (‖A‖·‖∆b‖/‖b‖ + ‖∆A‖)
         ≤ ‖A^{-1}‖·‖A‖ / (1 − ‖A^{-1}‖·‖A‖·‖∆A‖/‖A‖) · (‖∆A‖/‖A‖ + ‖∆b‖/‖b‖).    (4)

Hence, for small perturbations (such that the denominator does not deviate very much from 1) the relative error ‖∆b‖/‖b‖ of the right hand side and the relative error ‖∆A‖/‖A‖ of the system matrix are amplified by the factor ‖A^{-1}‖·‖A‖. This amplification factor is called the condition of the matrix A.


Definition

Let A ∈ ℂ^(n,n) be a nonsingular matrix, and let ‖·‖ be a matrix norm on ℂ^(n,n) which is subordinate to some vector norm. Then

κ(A) := ‖A^{-1}‖·‖A‖

is called the condition of the matrix A (or of the linear system of equations (1)) corresponding to the norm ‖·‖.

Remark
For every nonsingular matrix A and every such norm ‖·‖ it holds that κ(A) ≥ 1, because

1 = ‖I‖ = ‖AA^{-1}‖ ≤ ‖A‖·‖A^{-1}‖ = κ(A).


Theorem

Let A, ∆A ∈ ℝ^(n,n) and b, ∆b ∈ ℝ^n, b ≠ 0, such that A is nonsingular, and assume that ‖A^{-1}‖·‖∆A‖ < 1 for some matrix norm which is subordinate to some vector norm ‖·‖.

Let x and x + ∆x be the solutions of the linear system (1) and the perturbed system (2), respectively. Then the following estimate of the relative error holds:

‖∆x‖/‖x‖ ≤ κ(A) / (1 − κ(A)·‖∆A‖/‖A‖) · (‖∆A‖/‖A‖ + ‖∆b‖/‖b‖),

where κ(A) := ‖A‖·‖A^{-1}‖ denotes the condition of A.
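A quick numerical illustration of the theorem (a sketch, assuming NumPy; sizes, seed and perturbation level are arbitrary):

    # Check the relative-error bound for a randomly perturbed system Ax = b.
    import numpy as np

    rng = np.random.default_rng(1)
    n = 20
    A = rng.standard_normal((n, n))
    b = rng.standard_normal(n)
    x = np.linalg.solve(A, b)

    dA = 1e-8 * rng.standard_normal((n, n))       # small perturbations
    db = 1e-8 * rng.standard_normal(n)
    dx = np.linalg.solve(A + dA, b + db) - x

    kappa = np.linalg.cond(A, 2)                  # = ||A||_2 * ||A^{-1}||_2
    rel_A = np.linalg.norm(dA, 2) / np.linalg.norm(A, 2)
    rel_b = np.linalg.norm(db) / np.linalg.norm(b)

    bound = kappa / (1 - kappa * rel_A) * (rel_A + rel_b)
    print(np.linalg.norm(dx) / np.linalg.norm(x), "<=", bound)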


Remark

Assume that the length of the mantissa (i.e. the number of leading digits in floating point representation) of our computer is ℓ. Then the relative input data error of A and b is about 5·10^{-ℓ}.

If κ(A) = 10^γ, then (not considering the round-off errors which occur in the numerical method for solving the linear system) we have to expect a relative error of approximately 5·10^{γ−ℓ} for a numerical solution of the linear system Ax = b.

Roughly speaking, when solving a linear system numerically we lose γ digits if the order of magnitude of the condition of the system matrix A is 10^γ.

This loss of accuracy has nothing to do with the algorithm of choice. It is immanent to the problem. □
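The rule of thumb can be observed with the notoriously ill-conditioned Hilbert matrices (a sketch, assuming NumPy; the exact solution of all ones is an arbitrary test choice):

    # Solving Hilbert systems loses roughly log10(kappa(A)) digits,
    # even though the solver itself is backward stable.
    import numpy as np

    for n in (4, 8, 12):
        A = np.array([[1.0 / (i + j + 1) for j in range(n)] for i in range(n)])
        x_exact = np.ones(n)
        x = np.linalg.solve(A, A @ x_exact)
        rel_err = np.linalg.norm(x - x_exact) / np.linalg.norm(x_exact)
        print(f"n={n:2d}  kappa={np.linalg.cond(A):.1e}  rel. error={rel_err:.1e}")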


Example

Consider the linear system of equations

( 1   1     )       ( 2     )
( 1   0.999 ) x  =  ( 1.999 ),

which obviously has the solution x = (1, 1)^T.

For x + ∆x := (5, −3.002)^T it holds that

A(x + ∆x) = (1.998, 2.001002)^T =: b + ∆b.

Hence,

‖∆b‖_∞ / ‖b‖_∞ = 1.001·10^{-3}  and  ‖∆x‖_∞ / ‖x‖_∞ = 4.002,


Example ct.

and it follows for the condition that

κ_∞(A) ≥ (4.002 / 1.001)·10^3 ≈ 3998.

Indeed,

A^{-1} = ( −999   1000 )
         ( 1000  −1000 ),

and therefore κ_∞(A) = ‖A‖_∞·‖A^{-1}‖_∞ = 2·2000 = 4000.

This example demonstrates that the estimate of the relative error of the solution of a perturbed system is sharp. □
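The example is easily reproduced numerically (a sketch, assuming NumPy):

    # The tiny relative perturbation of b is amplified by almost
    # kappa_inf(A) = 4000.
    import numpy as np

    A = np.array([[1.0, 1.0], [1.0, 0.999]])
    b = np.array([2.0, 1.999])
    x = np.linalg.solve(A, b)                 # (1, 1)

    x_pert = np.array([5.0, -3.002])
    b_pert = A @ x_pert

    rel_db = np.linalg.norm(b_pert - b, np.inf) / np.linalg.norm(b, np.inf)
    rel_dx = np.linalg.norm(x_pert - x, np.inf) / np.linalg.norm(x, np.inf)
    print(rel_db, rel_dx, rel_dx / rel_db)    # ~1.001e-3, ~4.002, ~3998
    print(np.linalg.cond(A, np.inf))          # 4000.0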


Geometric condition

The following theorem contains a geometric characterization of the condition number. It says that the relative distance of a nonsingular matrix to the closest singular matrix with respect to the Euclidean norm is the reciprocal of the condition number.

Theorem
Let A ∈ ℝ^(n,n) be nonsingular. Then it holds that

min { ‖∆A‖_2 / ‖A‖_2 : A + ∆A singular } = 1 / κ_2(A).


Proof

It suffices to prove that

min { ‖∆A‖_2 : A + ∆A singular } = 1 / ‖A^{-1}‖_2.

That the minimum is at least 1/‖A^{-1}‖_2 follows from the perturbation lemma: for ‖∆A‖_2 < 1/‖A^{-1}‖_2 it holds that

1 > ‖∆A‖_2·‖A^{-1}‖_2 ≥ ‖A^{-1}∆A‖_2.

Hence, by the lemma, I + A^{-1}∆A = A^{-1}(A + ∆A) is nonsingular, and A + ∆A is invertible.


Proof ct.

We now construct a matrix ∆A such that A + ∆A is singular and ‖∆A‖_2 = 1/‖A^{-1}‖_2, which demonstrates that the minimum is at most 1/‖A^{-1}‖_2.

From

‖A^{-1}‖_2 = max_{x ≠ 0} ‖A^{-1}x‖_2 / ‖x‖_2

it follows that there exists x satisfying ‖x‖_2 = 1 and ‖A^{-1}‖_2 = ‖A^{-1}x‖_2 > 0.

With this x we define

y := A^{-1}x / ‖A^{-1}x‖_2 = A^{-1}x / ‖A^{-1}‖_2  and  ∆A := −xy^T / ‖A^{-1}‖_2.


Proof ct.

Then it holds that ‖y‖_2 = 1 and

‖∆A‖_2 = max_{z ≠ 0} ‖xy^T z‖_2 / (‖A^{-1}‖_2·‖z‖_2) = max_{z ≠ 0} (|y^T z| / ‖z‖_2) · ‖x‖_2 / ‖A^{-1}‖_2 = 1 / ‖A^{-1}‖_2,

where the maximum is attained, e.g., for z = y.

From

(A + ∆A)y = Ay − xy^T y / ‖A^{-1}‖_2 = x / ‖A^{-1}‖_2 − x / ‖A^{-1}‖_2 = 0

we obtain the singularity of A + ∆A. □
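In terms of the SVD the construction gives the nearest singular matrix explicitly (a sketch, assuming NumPy; with A = UΣV^T one has x = u^n, y = v^n and ∆A = −σ_n u^n (v^n)^T):

    # Nearest singular matrix in the 2-norm: dA = -sigma_n * u_n v_n^T.
    import numpy as np

    rng = np.random.default_rng(2)
    A = rng.standard_normal((6, 6))
    U, s, Vt = np.linalg.svd(A)

    dA = -s[-1] * np.outer(U[:, -1], Vt[-1, :])
    print(np.linalg.norm(dA, 2) / np.linalg.norm(A, 2))   # = 1/kappa_2(A)
    print(1.0 / np.linalg.cond(A, 2))
    print(np.linalg.svd(A + dA, compute_uv=False)[-1])    # ~ 0: A + dA singular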


Least squares problems

Theorem

Let A = UΣV^T be the singular value decomposition of A ∈ ℝ^(m,n), where σ_1 ≥ σ_2 ≥ · · · ≥ σ_r > σ_{r+1} = · · · = σ_{min(m,n)} = 0. Then it holds that

(i) rank(A) = r,

(ii) null(A) := {x ∈ ℝ^n : Ax = 0} = span{v^{r+1}, . . . , v^n},

(iii) range(A) := {Ax : x ∈ ℝ^n} = span{u^1, . . . , u^r},

(iv) A = ∑_{i=1}^r σ_i u^i (v^i)^T = U_r Σ_r V_r^T with U_r = (u^1, . . . , u^r), V_r = (v^1, . . . , v^r), Σ_r = diag(σ_1, . . . , σ_r),

(v) ‖A‖_S^2 := ∑_{i=1}^m ∑_{j=1}^n a_{ij}^2 = ∑_{i=1}^r σ_i^2,

(vi) ‖A‖_2 = σ_1.
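All six statements are easy to verify numerically for a random rank-deficient matrix (a sketch, assuming NumPy):

    # Illustration of (i)-(vi) for a random matrix of rank r.
    import numpy as np

    rng = np.random.default_rng(3)
    m, n, r = 8, 5, 3
    A = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))   # rank r

    U, s, Vt = np.linalg.svd(A)
    print(np.linalg.matrix_rank(A))                     # (i): r
    print(np.allclose(A @ Vt[r:].T, 0))                 # (ii): v^{r+1..n} in null(A)
    A_r = U[:, :r] @ np.diag(s[:r]) @ Vt[:r]            # (iv): truncated sum
    print(np.allclose(A, A_r))
    print(np.isclose(np.linalg.norm(A, "fro")**2, np.sum(s[:r]**2)))   # (v)
    print(np.isclose(np.linalg.norm(A, 2), s[0]))       # (vi)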


Proof

(i): Multiplication by the nonsingular matrices U^T and V does not change the rank of A. Therefore,

rank(A) = rank(Σ) = r.

(ii): From V^T v^i = e^i it follows that

A v^i = UΣV^T v^i = UΣ e^i = 0 for i = r + 1, . . . , n.

Hence, v^{r+1}, . . . , v^n ∈ null(A). Since dim null(A) = n − r, these vectors form a basis of null(A).


Proof ct.

(iii): From A = UΣV^T we obtain

range(A) = U · range(Σ) = U · span{e^1, . . . , e^r} = span{u^1, . . . , u^r}.

(iv): Block matrix multiplication yields

A = UΣV^T = (u^1, . . . , u^m) Σ ((v^1)^T; . . . ; (v^n)^T) = ∑_{i=1}^r σ_i u^i (v^i)^T.


Proof ct.

(v): Let A = (a^1, . . . , a^n). Multiplication by the orthogonal matrix U^T does not change the Euclidean length of a vector. Hence,

‖A‖_S^2 = ∑_{i=1}^n ‖a^i‖_2^2 = ∑_{i=1}^n ‖U^T a^i‖_2^2 = ‖U^T A‖_S^2.

Similarly, multiplying the rows of U^T A by the orthogonal matrix V from the right does not change their length, from which we get

‖A‖_S^2 = ‖U^T A V‖_S^2 = ‖Σ‖_S^2 = ∑_{i=1}^r σ_i^2.

(vi): ‖A‖_2 is a singular value of A, i.e. ‖A‖_2 ≤ σ_1 (cf. the proof of the existence theorem of the SVD). Conversely,

‖A‖_2 = max{‖Ax‖_2 : ‖x‖_2 = 1} ≥ ‖A v^1‖_2 = σ_1. □


Condition of a matrix

Let A = UΣV^T be the SVD of a nonsingular matrix A. Then A^{-1} = VΣ^{-1}U^T is the SVD of A^{-1}, from which we get

‖A‖_2 = σ_1  and  ‖A^{-1}‖_2 = 1/σ_n.

Hence, the condition of A with respect to the Euclidean norm is

κ_2(A) = σ_1 / σ_n. □


Remark

Let A ∈ ℝ^(n,n) have eigenvalues µ_1, . . . , µ_n. Then it follows from A x^i = µ_i x^i that

|µ_i|^2 = (A x^i)^H A x^i / ((x^i)^H x^i) = (x^i)^H A^T A x^i / ((x^i)^H x^i).

Rayleigh's principle yields

λ_min ≤ x^H A^T A x / (x^H x) ≤ λ_max  for all x ∈ ℂ^n, x ≠ 0,

where λ_min and λ_max are the minimal and maximal eigenvalues of A^T A, respectively. Hence,

σ_1 ≥ |µ_i| ≥ σ_n for every i.

For symmetric A it holds that σ_1 = |µ_1| and σ_n = |µ_n| (with the eigenvalues ordered by decreasing modulus). For nonsymmetric matrices this is in general not the case. □


Numerical computation

The singular values of A are the square roots of the eigenvalues of A^T A. Hence, in principle the SVD of A can be determined with any eigensolver.

To this end one has to evaluate A^T A and AA^T, which is costly and which deteriorates the condition number considerably.

In practice one uses an algorithm of Golub and Reinsch (1971), which takes advantage of the QR algorithm for computing the eigenvalues of A^T A, but which avoids the explicit computation of A^T A and AA^T. □
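The loss of accuracy caused by forming A^T A explicitly can be demonstrated directly (a sketch, assuming NumPy; the chosen singular values are arbitrary): singular values below about √(machine eps)·σ_1 are destroyed in the eigenvalue route, while a dedicated SVD routine recovers them.

    # Singular values via eig(A^T A) versus a dedicated SVD routine.
    import numpy as np

    rng = np.random.default_rng(4)
    U, _ = np.linalg.qr(rng.standard_normal((6, 6)))    # random orthogonal U
    V, _ = np.linalg.qr(rng.standard_normal((6, 6)))    # random orthogonal V
    s_true = np.array([1.0, 1e-2, 1e-4, 1e-6, 1e-8, 1e-10])
    A = U @ np.diag(s_true) @ V.T

    s_svd = np.linalg.svd(A, compute_uv=False)
    s_eig = np.sqrt(np.maximum(np.linalg.eigvalsh(A.T @ A)[::-1], 0.0))
    for t, a, b in zip(s_true, s_svd, s_eig):
        print(f"true {t:.1e}   svd {a:.3e}   via A^T A {b:.3e}")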


Data compression

The singular value decomposition can be used for data compression. This is based upon the following theorem:

Theorem
Let A = UΣV^T be the singular value decomposition of A ∈ ℝ^(m,n), and let U = (u^1, . . . , u^m) and V = (v^1, . . . , v^n).

Then for k < n

A_k := ∑_{j=1}^k σ_j u^j (v^j)^T

is the best approximation of A with rank(A_k) = k with respect to the spectral norm, and it holds that

‖A − A_k‖_2 = σ_{k+1}.


Proof
It holds that

‖A − A_k‖_2 = ‖∑_{j=k+1}^n σ_j u^j (v^j)^T‖_2 = ‖U diag{0, . . . , 0, σ_{k+1}, . . . , σ_n} V^T‖_2 = σ_{k+1},

and it remains to show that there does not exist a matrix of rank k whose distance to A is less than σ_{k+1}.

Let B be any matrix with rank(B) = k. Then the dimension of the null space of B is n − k. The dimension of span{v^1, . . . , v^{k+1}} is k + 1, and therefore the intersection of these two spaces contains a nontrivial vector w with ‖w‖_2 = 1.

Hence, since Bw = 0,

‖A − B‖_2^2 ≥ ‖(A − B)w‖_2^2 = ‖Aw‖_2^2 = ‖UΣV^T w‖_2^2 = ‖Σ(V^T w)‖_2^2 ≥ σ_{k+1}^2·‖V^T w‖_2^2 = σ_{k+1}^2. □


Data compression

Let A ∈ ℝ^(m,n) be a matrix whose elements a_ij are color values of pixels of a picture.

If A = UΣV^T is the singular value decomposition of A, then

A_k = ∑_{j=1}^k σ_j u^j (v^j)^T,  k = 1, . . . , min(m, n),

is an approximation to A. The storage of A_k requires only k(m + n + 1) memory cells (k singular values plus k left and k right singular vectors), i.e. a fraction k(m + n + 1)/(mn) of the mn cells required for A.

Notice that using the SVD in this manner is a very simple way of data compression. There are algorithms in image processing which are much less costly.
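A minimal sketch of this compression scheme (assuming NumPy; `image` is a placeholder for an m-by-n array of gray values, here filled with random data):

    # Best rank-k approximation A_k = sum_{j<=k} sigma_j u^j (v^j)^T.
    import numpy as np

    def compress(image, k):
        U, s, Vt = np.linalg.svd(image, full_matrices=False)
        return (U[:, :k] * s[:k]) @ Vt[:k]

    image = np.random.default_rng(5).random((200, 300))   # placeholder data
    for k in (5, 10, 20):
        A_k = compress(image, k)
        ratio = k * (image.shape[0] + image.shape[1] + 1) / image.size
        err = np.linalg.norm(image - A_k, 2)              # = sigma_{k+1}
        print(f"k={k:2d}  storage {100*ratio:4.1f}%  ||A - A_k||_2 = {err:.3f}")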


Example

Figure: Original

Example ct.

Figure: Compression k = 5 (2.6% of the original storage)

Figure: Compression k = 10 (5.3%)

Figure: Compression k = 20 (10.5%)

Pseudoinverse

Consider the linear least squares problem:

Let A ∈ ℝ^(m,n) and b ∈ ℝ^m with m ≥ n. Find x ∈ ℝ^n such that

‖Ax − b‖_2 = min!    (1)

We examine this problem taking advantage of the singular value decomposition.

In the following we denote by σ_1 ≥ σ_2 ≥ · · · ≥ σ_r > σ_{r+1} = · · · = σ_n = 0 the singular values of A; A = UΣV^T is the singular value decomposition of A, and u^j and v^k are the left and right singular vectors, respectively, i.e. the columns of U and V.


Pseudoinverse ct.

Theorem
Let c := U^T b ∈ ℝ^m.

The set of solutions of the linear least squares problem (1) is

L = x̄ + null(A),    (2)

where x̄ is the following particular solution of (1):

x̄ := ∑_{i=1}^r (c_i / σ_i) v^i.    (3)

Proof

Multiplying a vector by an orthogonal matrix does not change its length. Hence, with z := V^T x it holds that

‖Ax − b‖_2^2 = ‖U^T(Ax − b)‖_2^2 = ‖ΣV^T x − U^T b‖_2^2
            = ‖Σz − c‖_2^2 = ‖(σ_1 z_1 − c_1, . . . , σ_r z_r − c_r, −c_{r+1}, . . . , −c_m)^T‖_2^2.

Therefore, the solutions of problem (1) read: z_i := c_i/σ_i for i = 1, . . . , r, and z_i ∈ ℝ arbitrary for i = r + 1, . . . , n, i.e.

x = ∑_{i=1}^r (c_i / σ_i) v^i + ∑_{i=r+1}^n z_i v^i,  z_i ∈ ℝ, i = r + 1, . . . , n.    (4)

Since the trailing n − r columns of V span the null space of A, the set L of solutions of problem (1) has the form (2), (3). □
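Formula (3) can be checked against a library least squares solver (a sketch, assuming NumPy; NumPy's lstsq also returns the minimum-norm solution for rank-deficient A):

    # Minimum-norm least squares solution via the SVD, formula (3).
    import numpy as np

    rng = np.random.default_rng(6)
    m, n, r = 10, 6, 3
    A = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))   # rank r
    b = rng.standard_normal(m)

    U, s, Vt = np.linalg.svd(A)
    c = U.T @ b
    x_bar = Vt[:r].T @ (c[:r] / s[:r])          # formula (3)

    x_ref = np.linalg.lstsq(A, b, rcond=None)[0]
    print(np.allclose(x_bar, x_ref))            # True: same pseudonormal solution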


Pseudonormal solution

This theorem demonstrates again that the linear least squares problem (1) has a unique solution if and only if r = rank(A) = n. We enforce uniqueness also in the case r < n by additionally requiring that the Euclidean norm of the solution is minimal.

Definition
Let L be the solution set of the linear least squares problem (1).
x̄ ∈ L is called the pseudonormal solution of (1) if

‖x̄‖_2 ≤ ‖x‖_2 for every x ∈ L.


Pseudonormal solution ct.

The representation (4) of the general solution of (1) shows that x̄ in (3) is the pseudonormal solution of (1):

‖x̄ + ∑_{i=r+1}^n z_i v^i‖_2^2 = ‖x̄‖_2^2 + ∑_{i=r+1}^n |z_i|^2·‖v^i‖_2^2 ≥ ‖x̄‖_2^2.

The pseudonormal solution is unique, and x̄ obviously is the only solution of (1) with x̄ ∈ null(A)^⊥ ∩ L. Hence, we have obtained:

Theorem
There exists a unique pseudonormal solution x̄ of problem (1), which is characterized by

x̄ ∈ null(A)^⊥ ∩ L.


Pseudoinverse

For every A ∈ ℝ^(m,n),

ℝ^m ∋ b ↦ x̄ ∈ ℝ^n : ‖Ax̄ − b‖_2 ≤ ‖Ax − b‖_2 for all x ∈ ℝ^n, ‖x̄‖_2 minimal,

defines a mapping which obviously is linear (cf. the representation of x̄ in (3)). Therefore, it can be represented by a matrix A† ∈ ℝ^(n,m).

Definition
For A ∈ ℝ^(m,n) the matrix A† ∈ ℝ^(n,m) such that x̄ := A†b is the pseudonormal solution of the linear least squares problem (1) for every b ∈ ℝ^m is called the pseudoinverse (or Moore–Penrose inverse) of A.


Pseudoinverse ct.

If rank(A) = n and m ≥ n, then the least squares problem (1) is uniquely solvable, and it follows from the normal equations that the solution is x = (A^T A)^{-1} A^T b. Hence, in this case A† = (A^T A)^{-1} A^T.

If n = m and A is nonsingular, then it holds that A† = A^{-1}. Hence, the pseudoinverse coincides with the usual inverse whenever the latter exists, and the pseudoinverse is a consistent extension of the inverse. □


Pseudoinverse ct.

Theorem
Let A ∈ ℝ^(m,n) and let

A = UΣV^T,  Σ = (σ_i δ_ij)_{i,j},

be its singular value decomposition. Then it holds that

(i) Σ† = (τ_i δ_ij)_{j,i} with τ_i = 1/σ_i if σ_i ≠ 0 and τ_i = 0 if σ_i = 0,

(ii) A† = VΣ†U^T.
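A sketch of (i) and (ii) in NumPy (the tolerance for deciding σ_i ≠ 0 is an arbitrary illustrative choice):

    # Pseudoinverse via the SVD: A^+ = V Sigma^+ U^T, where Sigma^+ has
    # transposed shape and inverts the nonzero singular values.
    import numpy as np

    rng = np.random.default_rng(7)
    A = rng.standard_normal((7, 2)) @ rng.standard_normal((2, 4))   # rank 2

    U, s, Vt = np.linalg.svd(A)              # U: 7x7, s: 4 values, Vt: 4x4
    tol = 1e-12 * s[0]
    tau = np.array([1.0 / si if si > tol else 0.0 for si in s])

    Sigma_pinv = np.zeros((A.shape[1], A.shape[0]))     # 4x7
    Sigma_pinv[:len(s), :len(s)] = np.diag(tau)

    A_pinv = Vt.T @ Sigma_pinv @ U.T
    print(np.allclose(A_pinv, np.linalg.pinv(A)))       # True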


Pseudoinverse ct.

Remark
The explicit representation of the pseudoinverse is needed only for theoretical considerations and is never computed explicitly (similarly to the inverse of a nonsingular matrix).

Corollary
For every matrix A ∈ ℝ^(m,n) it holds that

A†† = A  and  (A†)^T = (A^T)†.

A† has the well known properties of the inverse A^{-1} of a nonsingular matrix A, with the only exception that in general

(AB)† ≠ B†A†.


Example

Let

A = ( 1  −1 )  =  I · ( √2  0 ) · (1/√2) ( 1  −1 )
    ( 0   0 )         (  0  0 )          ( 1   1 ),

which is a singular value decomposition of A. Its pseudoinverse is

A† = (1/√2) (  1  1 ) · ( 1/√2  0 ) · I  =  (1/2) (  1  0 )
            ( −1  1 )   (    0  0 )               ( −1  0 ).

Then A^2 = A and (A†)^2 = (1/2)A†, i.e. (A^2)† ≠ (A†)^2. □


Perturbation of least squares problems

Consider the linear least squares problem

‖Ax − b‖_2 = min!    (1)

with A ∈ ℝ^(m,n), rank(A) = r, and a perturbed problem

‖A(x + ∆x) − (b + ∆b)‖_2 = min!,    (2)

where we incorporate only perturbations of the right hand side b, but not of the system matrix A.

Let x = A†b and x + ∆x = A†(b + ∆b) be the pseudonormal solutions of (1) and (2), respectively.

Then ∆x = A†∆b, and from ‖A†‖_2 = 1/σ_r it follows that

‖∆x‖_2 ≤ ‖A†‖_2·‖∆b‖_2 = (1/σ_r)·‖∆b‖_2.


Least squares problems

Perturbation of least squares problems

With cᵢ := uᵢᵀb we have x = A†b = ∑_{i=1}^r (cᵢ/σᵢ) vᵢ, and hence

‖x‖₂² = ∑_{i=1}^r cᵢ²/σᵢ² ≥ (1/σ₁²) ∑_{i=1}^r cᵢ² = (1/σ₁²) ‖ ∑_{i=1}^r cᵢuᵢ ‖₂².

Obviously, ∑_{i=1}^r cᵢuᵢ is the projection of b onto the range of A. Therefore it follows for the relative error that

‖∆x‖₂ / ‖x‖₂ ≤ (σ₁/σᵣ) · ‖∆b‖₂ / ‖P_{range(A)} b‖₂. (3)

This inequality specifies how a relative error in the right hand side of a linear least squares problem affects the solution of the problem.
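
The bound (3) is easy to probe numerically. The following sketch (an illustrative setup assuming NumPy, not taken from the lecture) compares both sides of (3) for a random full-rank problem:

# Illustrative check of bound (3): relative error of the pseudo normal
# solution against kappa_2(A) times the relative perturbation of b.
import numpy as np

rng = np.random.default_rng(0)
m, n = 8, 5
A = rng.standard_normal((m, n))           # full rank with probability 1
b = rng.standard_normal(m)
db = 1e-6 * rng.standard_normal(m)

A_dag = np.linalg.pinv(A)
x, dx = A_dag @ b, A_dag @ db             # pseudo normal solution and its change

U, s, Vt = np.linalg.svd(A, full_matrices=False)
Pb = U @ (U.T @ b)                        # projection of b onto range(A)

lhs = np.linalg.norm(dx) / np.linalg.norm(x)
rhs = (s[0] / s[-1]) * np.linalg.norm(db) / np.linalg.norm(Pb)
print(lhs <= rhs)                         # True: (3) holds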

Condition

Definition: For A ∈ R(m,n) let A = UΣVᵀ be the singular value decomposition, and let rank(A) = r. Then

κ₂(A) := σ₁/σᵣ

is called the condition of A.

If A ∈ R(n,n) is nonsingular, then this definition coincides with the one given before for square matrices with respect to the Euclidean norm.

Since AᵀA has the singular values σᵢ², it holds that

κ₂(AᵀA) = κ₂(A)².

Hence, the normal equations of a linear least squares problem are much worse conditioned than the system matrix of the problem.
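
The squaring of the condition number can be observed directly; a brief sketch (assuming NumPy; the test matrix is an arbitrary illustrative choice):

# kappa_2(A^T A) = kappa_2(A)^2, checked numerically; by default
# np.linalg.cond returns sigma_max/sigma_min, the 2-norm condition.
import numpy as np

A = np.vander(np.linspace(0.0, 1.0, 8), 5)   # moderately ill-conditioned 8 x 5 matrix
print(np.linalg.cond(A))                     # kappa_2(A)
print(np.linalg.cond(A.T @ A))               # approximately kappa_2(A)**2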

Perturbed least squares problems

For perturbations of the system matrix the following theorem holds.

Theorem: Assume that A ∈ R(m,n), m ≥ n, is not rank deficient, i.e. rank(A) = n. Let x be the solution of the least squares problem (1) and x̃ be the solution of the perturbed problem

‖(A + ∆A)x̃ − (b + ∆b)‖₂ = min!, (4)

where

ε := max( ‖∆A‖₂/‖A‖₂ , ‖∆b‖₂/‖b‖₂ ) < 1/κ₂(A) = σₙ(A)/σ₁(A). (5)

Then it holds that

‖x̃ − x‖₂ / ‖x‖₂ ≤ ε ( 2κ₂(A)/cos θ + tan θ · κ₂(A)² ) + O(ε²), (6)

where θ is the angle between b and its projection onto range(A).
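
To get a feeling for the bound, one can compare the observed relative error with the right hand side of (6); the following experiment is an illustrative sketch with random data (assuming NumPy), not part of the lecture:

# Illustrative experiment for the first-order bound (6).
import numpy as np

rng = np.random.default_rng(1)
m, n = 20, 5
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)
dA = 1e-8 * rng.standard_normal((m, n))
db = 1e-8 * rng.standard_normal(m)

x,  *_ = np.linalg.lstsq(A, b, rcond=None)
xt, *_ = np.linalg.lstsq(A + dA, b + db, rcond=None)

kappa = np.linalg.cond(A)                       # sigma_1 / sigma_n
eps = max(np.linalg.norm(dA, 2) / np.linalg.norm(A, 2),
          np.linalg.norm(db) / np.linalg.norm(b))

Q, _ = np.linalg.qr(A)                          # orthonormal basis of range(A)
Pb = Q @ (Q.T @ b)
theta = np.arccos(np.linalg.norm(Pb) / np.linalg.norm(b))

lhs = np.linalg.norm(xt - x) / np.linalg.norm(x)
rhs = eps * (2 * kappa / np.cos(theta) + np.tan(theta) * kappa**2)
print(lhs, rhs)           # lhs should not exceed rhs, up to the O(eps^2) term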

Regularization

Example

Consider the orthogonal projection of a given function f : [0, 1] → R onto the space Πₙ₋₁ of polynomials of degree at most n − 1 with respect to the scalar product

⟨f, g⟩ := ∫₀¹ f(x)g(x) dx.

Choosing the basis {1, x, . . . , xⁿ⁻¹} one obtains the linear system

Ay = b (1)

with

A = (aᵢⱼ), aᵢⱼ := 1/(i + j − 1), i, j = 1, . . . , n, (2)

the so-called Hilbert matrix, and b ∈ Rⁿ, bᵢ := ⟨f, xⁱ⁻¹⟩.

Example ct.

For the dimensions n = 10, n = 20 and n = 40 we choose the right hand side of (1) such that y = (1, . . . , 1)ᵀ is the unique solution, and we solve the resulting system by the known methods.

The LU factorization with column pivoting (in MATLAB A\b), the Cholesky factorization, the QR decomposition of A and the singular value decomposition of A yield the following errors with respect to the Euclidean norm:

                   n = 10     n = 20     n = 40
LU factorization   5.24 E-4   8.25 E+1   3.78 E+2
Cholesky           7.15 E-4   numerically not positive definite
QR decomposition   1.41 E-3   1.67 E+2   1.46 E+3
SVD                8.24 E-4   3.26 E+2   8.35 E+2
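
The experiment is straightforward to reproduce; a sketch assuming SciPy is shown below (the exact error values depend on the floating point environment and will differ somewhat from the table):

# Sketch reproducing the experiment: solve Ay = b, A the Hilbert matrix,
# with exact solution y = (1,...,1)^T, by LU, QR and SVD.
import numpy as np
from scipy.linalg import hilbert, solve, qr, solve_triangular

for n in (10, 20, 40):
    A = hilbert(n)
    y = np.ones(n)
    b = A @ y

    x_lu = solve(A, b)                        # LU with partial pivoting
    Q, R = qr(A)
    x_qr = solve_triangular(R, Q.T @ b)       # QR decomposition
    U, s, Vt = np.linalg.svd(A)
    x_svd = Vt.T @ ((U.T @ b) / s)            # SVD

    print(n, np.linalg.norm(x_lu - y),
             np.linalg.norm(x_qr - y),
             np.linalg.norm(x_svd - y))
# scipy.linalg.cholesky(A) fails for larger n: the Hilbert matrix is
# numerically no longer positive definite.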

Example ct.

A similar behavior is observed for the least squares problem. For n = 10, n = 20 and n = 40 and m = n + 10 we consider the least squares problem

‖Ax − b‖₂ = min!

with the Hilbert matrix A ∈ R(m,n), where b is chosen such that x = (1, . . . , 1)ᵀ solves the problem with residual Ax − b = 0.

                   n = 10     n = 20     n = 40
Normal equations   2.91 E+2   2.40 E+2   8.21 E+2
QR factorization   1.93 E-5   5.04 E+0   1.08 E+1
SVD                4.67 E-5   6.41 E+1   3.72 E+2
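
A corresponding sketch for the rectangular case (again illustrative; the m × n Hilbert matrix is built directly, since scipy.linalg.hilbert returns only square matrices):

# Rectangular Hilbert least squares problem with consistent right hand side:
# normal equations vs. QR vs. SVD (np.linalg.lstsq is SVD-based).
import numpy as np

for n in (10, 20, 40):
    m = n + 10
    i = np.arange(1, m + 1)[:, None]
    j = np.arange(1, n + 1)[None, :]
    A = 1.0 / (i + j - 1.0)                        # a_ij = 1/(i+j-1)
    x = np.ones(n)
    b = A @ x                                      # residual Ax - b = 0

    x_ne = np.linalg.solve(A.T @ A, A.T @ b)       # normal equations; errors
                                                   # blow up with kappa_2(A)^2
    Q, R = np.linalg.qr(A)                         # reduced QR factorization
    x_qr = np.linalg.solve(R, Q.T @ b)
    x_svd, *_ = np.linalg.lstsq(A, b, rcond=None)  # SVD

    print(n, np.linalg.norm(x_ne - x),
             np.linalg.norm(x_qr - x),
             np.linalg.norm(x_svd - x))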

Regularization

For badly conditioned least squares problems or linear systems the following approach can yield reliable solutions:

Determine the singular value decomposition A = UΣVᵀ of A, and define

Σ†τ = diag(ηᵢδⱼᵢ),  ηᵢ := 1/σᵢ if σᵢ ≥ τ, and ηᵢ := 0 otherwise,

where τ > 0 is a given threshold, and

A†τ := V Σ†τ Uᵀ,  xτ := A†τ b.

A†τ is called the effective pseudo inverse of A. This method of approximately solving Ax = b is called regularization by truncation.

Very small singular values are discarded: instead of Ax = b one solves the linear system Axτ = Pb, where P is the orthogonal projection onto the subspace span{uᵢ : σᵢ ≥ τ}.
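
A minimal sketch of this truncation, assuming NumPy (the function name tsvd_solve and the choice of τ are illustrative):

# Regularization by truncation: singular values below tau are discarded,
# only the retained ones are inverted.
import numpy as np

def tsvd_solve(A, b, tau):
    """Return x_tau = A_tau^dagger @ b (effective pseudo inverse applied to b)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    keep = s >= tau
    return Vt[keep].T @ ((U[:, keep].T @ b) / s[keep])

The threshold τ balances stability against accuracy: a larger τ discards more of range(A), while a smaller τ lets the ill-conditioning through.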

Tichonov regularization

The most prominent way of regularization was introduced independently by Phillips (1962) and Tichonov (1963), and is called Tichonov regularization.

Here small singular values are not discarded, but their influence on the solution is damped.

Instead of Ax = b one solves the linear system

(AᵀA + αIₙ) x = Aᵀb (4)

where α > 0 is a suitable regularization parameter.

Tichonov regularization ct.

Obviously, system (4) is equivalent to

‖Ax − b‖₂² + α‖x‖₂² = min! (5)

(which is the usual representation of the Tichonov regularization), or equivalently to

‖Āx − b̄‖₂ = min!,  Ā = ( A ; √α Iₙ ),  b̄ = ( b ; 0 ). (6)

This version, together with the QR factorization of Ā, was used by Golub (1965) to carry out Tichonov's regularization in a stable way.
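
A sketch of this augmented formulation, assuming NumPy (numpy.linalg.lstsq stands in here for the QR-based solver; the function name is illustrative):

# Tichonov regularization via the stacked least squares problem (6).
import numpy as np

def tikhonov_solve(A, b, alpha):
    """Minimize ||Ax - b||_2^2 + alpha * ||x||_2^2 via the augmented system."""
    n = A.shape[1]
    A_bar = np.vstack([A, np.sqrt(alpha) * np.eye(n)])   # stack A over sqrt(alpha)*I_n
    b_bar = np.concatenate([b, np.zeros(n)])
    x, *_ = np.linalg.lstsq(A_bar, b_bar, rcond=None)
    return x

Working with Ā instead of AᵀA + αIₙ avoids forming the normal equations and thus the squaring of the condition number noted above.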

Tichonov regularization ct.

From

ĀᵀĀ = AᵀA + αIₙ

it follows that Ā has the singular values √(σᵢ² + α), where σᵢ denote the singular values of A, and the condition of problem (1) is reduced to

κ₂(Ā) = √( (σ₁² + α) / (σₙ² + α) ).

If β := Uᵀb, then problem (6) is equivalent to

V (ΣᵀUᵀUΣ + αIₙ) Vᵀ x = V Σᵀ Uᵀ b = V Σᵀ β,

i.e.

x = V (ΣᵀΣ + αIₙ)⁻¹ Σᵀ β = ∑_{i=1}^n ( βᵢσᵢ / (σᵢ² + α) ) vᵢ.

Hence, knowing the singular value decomposition of A, the regularized problem can be solved for various regularization parameters α.
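
This observation translates directly into code; a sketch assuming NumPy (the function name is illustrative), where the SVD is computed once and reused for every value of α:

# Tichonov regularization via the SVD:
# x(alpha) = sum_i beta_i * sigma_i / (sigma_i^2 + alpha) * v_i.
import numpy as np

def tikhonov_svd(A, b, alphas):
    """Return the regularized solution for every alpha in alphas."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    beta = U.T @ b
    # the SVD above is computed only once; each alpha is then cheap
    return [Vt.T @ (s * beta / (s**2 + alpha)) for alpha in alphas]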

Example

For the linear system (1), (2) one obtains the errors in the following table.

                    n = 10     n = 20     n = 40
Tichonov Cholesky   1.41 E-3   2.03 E-3   3.51 E-3
Tichonov QR         3.50 E-6   5.99 E-6   7.54 E-6
Tichonov SVD        3.43 E-6   6.33 E-6   9.66 E-6
truncated SVD       2.77 E-6   3.92 E-6   7.35 E-6

For the least squares problem one gets

                    n = 10     n = 20     n = 40
Tichonov Cholesky   3.85 E-4   1.19 E-3   2.27 E-3
Tichonov QR         2.24 E-7   1.79 E-6   6.24 E-6
Tichonov SVD        8.51 E-7   1.61 E-6   3.45 E-6
truncated SVD       7.21 E-7   1.94 E-6   7.70 E-6
