
Numerical Linear Algebra
Chap. 4: Perturbation and Regularisation

Heinrich Voss
voss@tu-harburg.de

Hamburg University of Technology
Institute of Numerical Simulation

Linear systems

Sensitivity of linear systems

Consider the linear system of equations

Ax = b    (1)

where A ∈ ℝ^(n,n) is a nonsingular matrix, and a perturbed system

(A + ∆A)(x + ∆x) = b + ∆b.    (2)

Our aim is to examine how perturbations of A and of b affect the solution of the system.


Remarks

Small perturbations always have to be kept in mind when solving practical problems, since

- the data A and/or b may be obtained from measurements, and are therefore erroneous;

- using computers, the representation of data as floating point numbers always produces errors.

Hence, one always has to assume that one solves a perturbed linear system instead of the given one. However, usually the perturbations are quite small.


Perturbation lemma

Lemma
Let B ∈ ℝ^(n,n), and assume that for some vector norm and the associated matrix norm the inequality ‖B‖ < 1 is satisfied.

Then the matrix I − B is nonsingular, and it holds that

‖(I − B)^{-1}‖ ≤ 1 / (1 − ‖B‖).


Proof

For every x ∈ ℝ^n, x ≠ 0,

‖(I − B)x‖ ≥ ‖x‖ − ‖Bx‖ ≥ ‖x‖ − ‖B‖·‖x‖ = (1 − ‖B‖)·‖x‖ > 0.

Therefore, the linear system (I − B)x = 0 has the unique solution x = 0, and I − B is nonsingular.

The estimate of the norm of the inverse of I − B follows from

1 = ‖(I − B)^{-1}(I − B)‖ = ‖(I − B)^{-1} − (I − B)^{-1}B‖
  ≥ ‖(I − B)^{-1}‖ − ‖(I − B)^{-1}B‖
  ≥ ‖(I − B)^{-1}‖ − ‖(I − B)^{-1}‖·‖B‖
  = (1 − ‖B‖)·‖(I − B)^{-1}‖.
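The lemma is easy to check numerically. The following sketch (assuming NumPy; the random matrix B and its scaling are arbitrary choices for illustration) scales B so that ‖B‖_2 = 0.9 and compares ‖(I − B)^{-1}‖_2 with the bound 1/(1 − ‖B‖_2):

    # Numerical check of the perturbation lemma (illustrative sketch).
    import numpy as np

    rng = np.random.default_rng(0)
    n = 50
    B = rng.standard_normal((n, n))
    B *= 0.9 / np.linalg.norm(B, 2)      # rescale so that ||B||_2 = 0.9 < 1

    lhs = np.linalg.norm(np.linalg.inv(np.eye(n) - B), 2)
    rhs = 1.0 / (1.0 - np.linalg.norm(B, 2))
    print(lhs, "<=", rhs)                # the lemma guarantees lhs <= rhs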


Corollary

Let A ∈ ℝ^(n,n) be a nonsingular matrix, and ∆A ∈ ℝ^(n,n). Assume that

‖∆A‖ < 1 / ‖A^{-1}‖

for a matrix norm which is subordinate to some vector norm.

Then A + ∆A is nonsingular, and it holds that

‖(A + ∆A)^{-1}‖ ≤ ‖A^{-1}‖ / (1 − ‖A^{-1}∆A‖) ≤ ‖A^{-1}‖ / (1 − ‖A^{-1}‖·‖∆A‖).


Proof

The existence of (A + ∆A)^{-1} follows from the perturbation lemma, since

‖∆A‖ < 1/‖A^{-1}‖  ⇒  1 > ‖A^{-1}‖·‖∆A‖ ≥ ‖A^{-1}∆A‖

and A + ∆A = A(I + A^{-1}∆A). Hence,

‖(A + ∆A)^{-1}‖ = ‖(I + A^{-1}∆A)^{-1}A^{-1}‖ ≤ ‖A^{-1}‖·‖(I + A^{-1}∆A)^{-1}‖
               ≤ ‖A^{-1}‖ / (1 − ‖A^{-1}∆A‖)
               ≤ ‖A^{-1}‖ / (1 − ‖A^{-1}‖·‖∆A‖).


Remark

The Corollary demonstrates that for a nonsingular matrix A the perturbed matrix A + ∆A is also nonsingular if the perturbation ∆A is sufficiently small.

Perturbed linear system

We consider the perturbed linear system

(A + ∆A)(x + ∆x) = b + ∆b,

and we assume that the perturbation ∆A is so small that the condition of the Corollary is satisfied. Then A + ∆A is nonsingular.

Solving for ∆x one obtains the absolute error which is caused by the perturbations of A and b:

∆x = (A + ∆A)^{-1}(∆b − ∆A x) = (I + A^{-1}∆A)^{-1}A^{-1}(∆b − ∆A x).

Hence, with an arbitrary vector norm and the subordinate matrix norm we obtain

‖∆x‖ ≤ ‖(I + A^{-1}∆A)^{-1}‖·‖A^{-1}‖·(‖∆b‖ + ‖∆A‖·‖x‖).


Perturbed linear system ct.

For b ≠ 0, and as a consequence x ≠ 0, it holds for the relative error ‖∆x‖/‖x‖ that

‖∆x‖/‖x‖ ≤ ‖(I + A^{-1}∆A)^{-1}‖·‖A^{-1}‖·(‖∆b‖/‖x‖ + ‖∆A‖),    (3)

and the Corollary, together with ‖b‖ ≤ ‖A‖·‖x‖, yields

‖∆x‖/‖x‖ ≤ ‖A^{-1}‖ / (1 − ‖A^{-1}‖·‖∆A‖) · (‖A‖·‖∆b‖/‖b‖ + ‖∆A‖)
         ≤ ‖A^{-1}‖·‖A‖ / (1 − ‖A^{-1}‖·‖A‖·‖∆A‖/‖A‖) · (‖∆A‖/‖A‖ + ‖∆b‖/‖b‖).    (4)

Hence, for small perturbations (such that the denominator does not deviate very much from 1) the relative error ‖∆b‖/‖b‖ of the right hand side and the relative error ‖∆A‖/‖A‖ of the system matrix are amplified by the factor ‖A^{-1}‖·‖A‖. This amplification factor is called the condition of the matrix A.


Definition

Let A ∈ ℂ^(n,n) be a nonsingular matrix, and let ‖·‖ be a matrix norm on ℂ^(n,n) which is subordinate to some vector norm. Then

κ(A) := ‖A^{-1}‖·‖A‖

is called the condition of the matrix A (or of the linear system of equations (1)) corresponding to the norm ‖·‖.

Remark
For every nonsingular matrix A and every such norm ‖·‖ it holds that κ(A) ≥ 1, because

1 = ‖I‖ = ‖AA^{-1}‖ ≤ ‖A‖·‖A^{-1}‖ = κ(A).


Theorem

Let A, ∆A ∈ ℝ^(n,n) and b, ∆b ∈ ℝ^n, b ≠ 0, such that A is nonsingular, and assume that ‖A^{-1}‖·‖∆A‖ < 1 for some matrix norm which is subordinate to some vector norm ‖·‖.

Let x and x + ∆x be the solutions of the linear system (1) and the perturbed system (2), respectively. Then the following estimate of the relative error holds:

‖∆x‖/‖x‖ ≤ κ(A) / (1 − κ(A)·‖∆A‖/‖A‖) · (‖∆A‖/‖A‖ + ‖∆b‖/‖b‖),

where κ(A) := ‖A‖·‖A^{-1}‖ denotes the condition of A.
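A quick numerical illustration of the theorem (a sketch, assuming NumPy; sizes, seed and perturbation level are arbitrary):

    # Check the relative-error bound for a randomly perturbed system Ax = b.
    import numpy as np

    rng = np.random.default_rng(1)
    n = 20
    A = rng.standard_normal((n, n))
    b = rng.standard_normal(n)
    x = np.linalg.solve(A, b)

    dA = 1e-8 * rng.standard_normal((n, n))       # small perturbations
    db = 1e-8 * rng.standard_normal(n)
    dx = np.linalg.solve(A + dA, b + db) - x

    kappa = np.linalg.cond(A, 2)                  # = ||A||_2 * ||A^{-1}||_2
    rel_A = np.linalg.norm(dA, 2) / np.linalg.norm(A, 2)
    rel_b = np.linalg.norm(db) / np.linalg.norm(b)

    bound = kappa / (1 - kappa * rel_A) * (rel_A + rel_b)
    print(np.linalg.norm(dx) / np.linalg.norm(x), "<=", bound)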


Remark

Assume that the length of the mantissa (i.e. the number of leading digits in floating point representation) of our computer is ℓ. Then the relative input data error of A and b is about 5·10^{-ℓ}.

If κ(A) = 10^γ, then (not considering the round-off errors which occur in the numerical method for solving the linear system) we have to expect a relative error of approximately 5·10^{γ−ℓ} for a numerical solution of the linear system Ax = b.

Roughly speaking, when solving a linear system numerically we lose γ digits if the order of magnitude of the condition of the system matrix A is 10^γ.

This loss of accuracy has nothing to do with the algorithm of choice. It is immanent to the problem. □
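The rule of thumb can be observed with the notoriously ill-conditioned Hilbert matrices (a sketch, assuming NumPy; the exact solution of all ones is an arbitrary test choice):

    # Solving Hilbert systems loses roughly log10(kappa(A)) digits,
    # even though the solver itself is backward stable.
    import numpy as np

    for n in (4, 8, 12):
        A = np.array([[1.0 / (i + j + 1) for j in range(n)] for i in range(n)])
        x_exact = np.ones(n)
        x = np.linalg.solve(A, A @ x_exact)
        rel_err = np.linalg.norm(x - x_exact) / np.linalg.norm(x_exact)
        print(f"n={n:2d}  kappa={np.linalg.cond(A):.1e}  rel. error={rel_err:.1e}")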


Example

Consider the linear system of equations

( 1   1     )       ( 2     )
( 1   0.999 ) x  =  ( 1.999 ),

which obviously has the solution x = (1, 1)^T.

For x + ∆x := (5, −3.002)^T it holds that

A(x + ∆x) = (1.998, 2.001002)^T =: b + ∆b.

Hence,

‖∆b‖_∞ / ‖b‖_∞ = 1.001·10^{-3}  and  ‖∆x‖_∞ / ‖x‖_∞ = 4.002,


Example ct.

and it follows for the condition that

κ_∞(A) ≥ (4.002 / 1.001)·10^3 ≈ 3998.

Indeed,

A^{-1} = ( −999   1000 )
         ( 1000  −1000 ),

and therefore κ_∞(A) = ‖A‖_∞·‖A^{-1}‖_∞ = 2·2000 = 4000.

This example demonstrates that the estimate of the relative error of the solution of a perturbed system is sharp. □
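The example is easily reproduced numerically (a sketch, assuming NumPy):

    # The tiny relative perturbation of b is amplified by almost
    # kappa_inf(A) = 4000.
    import numpy as np

    A = np.array([[1.0, 1.0], [1.0, 0.999]])
    b = np.array([2.0, 1.999])
    x = np.linalg.solve(A, b)                 # (1, 1)

    x_pert = np.array([5.0, -3.002])
    b_pert = A @ x_pert

    rel_db = np.linalg.norm(b_pert - b, np.inf) / np.linalg.norm(b, np.inf)
    rel_dx = np.linalg.norm(x_pert - x, np.inf) / np.linalg.norm(x, np.inf)
    print(rel_db, rel_dx, rel_dx / rel_db)    # ~1.001e-3, ~4.002, ~3998
    print(np.linalg.cond(A, np.inf))          # 4000.0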


Geometric condition

The following theorem contains a geometric characterization of the condition number. It says that the relative distance of a nonsingular matrix to the closest singular matrix with respect to the Euclidean norm is the reciprocal of the condition number.

Theorem
Let A ∈ ℝ^(n,n) be nonsingular. Then it holds that

min { ‖∆A‖_2 / ‖A‖_2 : A + ∆A singular } = 1 / κ_2(A).


Proof

It suffices to prove that

min { ‖∆A‖_2 : A + ∆A singular } = 1 / ‖A^{-1}‖_2.

That the minimum is at least 1/‖A^{-1}‖_2 follows from the perturbation lemma: for ‖∆A‖_2 < 1/‖A^{-1}‖_2 it holds that

1 > ‖∆A‖_2·‖A^{-1}‖_2 ≥ ‖A^{-1}∆A‖_2.

Hence, by the lemma, I + A^{-1}∆A = A^{-1}(A + ∆A) is nonsingular, and A + ∆A is invertible.


Proof ct.

We now construct a matrix ∆A such that A + ∆A is singular and ‖∆A‖_2 = 1/‖A^{-1}‖_2, which demonstrates that the minimum is at most 1/‖A^{-1}‖_2.

From

‖A^{-1}‖_2 = max_{x ≠ 0} ‖A^{-1}x‖_2 / ‖x‖_2

it follows that there exists x satisfying ‖x‖_2 = 1 and ‖A^{-1}‖_2 = ‖A^{-1}x‖_2 > 0.

With this x we define

y := A^{-1}x / ‖A^{-1}x‖_2 = A^{-1}x / ‖A^{-1}‖_2  and  ∆A := −xy^T / ‖A^{-1}‖_2.


Proof ct.

Then it holds that ‖y‖_2 = 1 and

‖∆A‖_2 = max_{z ≠ 0} ‖xy^T z‖_2 / (‖A^{-1}‖_2·‖z‖_2) = max_{z ≠ 0} (|y^T z| / ‖z‖_2) · ‖x‖_2 / ‖A^{-1}‖_2 = 1 / ‖A^{-1}‖_2,

where the maximum is attained, e.g., for z = y.

From

(A + ∆A)y = Ay − xy^T y / ‖A^{-1}‖_2 = x / ‖A^{-1}‖_2 − x / ‖A^{-1}‖_2 = 0

we obtain the singularity of A + ∆A. □
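In terms of the SVD the construction gives the nearest singular matrix explicitly (a sketch, assuming NumPy; with A = UΣV^T one has x = u^n, y = v^n and ∆A = −σ_n u^n (v^n)^T):

    # Nearest singular matrix in the 2-norm: dA = -sigma_n * u_n v_n^T.
    import numpy as np

    rng = np.random.default_rng(2)
    A = rng.standard_normal((6, 6))
    U, s, Vt = np.linalg.svd(A)

    dA = -s[-1] * np.outer(U[:, -1], Vt[-1, :])
    print(np.linalg.norm(dA, 2) / np.linalg.norm(A, 2))   # = 1/kappa_2(A)
    print(1.0 / np.linalg.cond(A, 2))
    print(np.linalg.svd(A + dA, compute_uv=False)[-1])    # ~ 0: A + dA singular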


Least squares problems

Theorem

Let A = UΣV^T be the singular value decomposition of A ∈ ℝ^(m,n), where σ_1 ≥ σ_2 ≥ · · · ≥ σ_r > σ_{r+1} = · · · = σ_{min(m,n)} = 0. Then it holds that

(i) rank(A) = r,

(ii) null(A) := {x ∈ ℝ^n : Ax = 0} = span{v^{r+1}, . . . , v^n},

(iii) range(A) := {Ax : x ∈ ℝ^n} = span{u^1, . . . , u^r},

(iv) A = ∑_{i=1}^r σ_i u^i (v^i)^T = U_r Σ_r V_r^T with U_r = (u^1, . . . , u^r), V_r = (v^1, . . . , v^r), Σ_r = diag(σ_1, . . . , σ_r),

(v) ‖A‖_S^2 := ∑_{i=1}^m ∑_{j=1}^n a_{ij}^2 = ∑_{i=1}^r σ_i^2,

(vi) ‖A‖_2 = σ_1.
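All six statements are easy to verify numerically for a random rank-deficient matrix (a sketch, assuming NumPy):

    # Illustration of (i)-(vi) for a random matrix of rank r.
    import numpy as np

    rng = np.random.default_rng(3)
    m, n, r = 8, 5, 3
    A = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))   # rank r

    U, s, Vt = np.linalg.svd(A)
    print(np.linalg.matrix_rank(A))                     # (i): r
    print(np.allclose(A @ Vt[r:].T, 0))                 # (ii): v^{r+1..n} in null(A)
    A_r = U[:, :r] @ np.diag(s[:r]) @ Vt[:r]            # (iv): truncated sum
    print(np.allclose(A, A_r))
    print(np.isclose(np.linalg.norm(A, "fro")**2, np.sum(s[:r]**2)))   # (v)
    print(np.isclose(np.linalg.norm(A, 2), s[0]))       # (vi)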


Proof

(i): Multiplication by the nonsingular matrices U^T and V does not change the rank of A. Therefore,

rank(A) = rank(Σ) = r.

(ii): From V^T v^i = e^i it follows that

A v^i = UΣV^T v^i = UΣ e^i = 0 for i = r + 1, . . . , n.

Hence, v^{r+1}, . . . , v^n ∈ null(A). Since dim null(A) = n − r, these vectors form a basis of null(A).


Proof ct.

(iii): From A = UΣV^T we obtain

range(A) = U · range(Σ) = U · span{e^1, . . . , e^r} = span{u^1, . . . , u^r}.

(iv): Block matrix multiplication yields

A = UΣV^T = (u^1, . . . , u^m) Σ ((v^1)^T; . . . ; (v^n)^T) = ∑_{i=1}^r σ_i u^i (v^i)^T.


Proof ct.

(v): Let A = (a^1, . . . , a^n). Multiplication by the orthogonal matrix U^T does not change the Euclidean length of a vector. Hence,

‖A‖_S^2 = ∑_{i=1}^n ‖a^i‖_2^2 = ∑_{i=1}^n ‖U^T a^i‖_2^2 = ‖U^T A‖_S^2.

Similarly, multiplying the rows of U^T A by the orthogonal matrix V from the right does not change their length, from which we get

‖A‖_S^2 = ‖U^T A V‖_S^2 = ‖Σ‖_S^2 = ∑_{i=1}^r σ_i^2.

(vi): ‖A‖_2 is a singular value of A, i.e. ‖A‖_2 ≤ σ_1 (cf. the proof of the existence theorem of the SVD). Conversely,

‖A‖_2 = max{‖Ax‖_2 : ‖x‖_2 = 1} ≥ ‖A v^1‖_2 = σ_1. □


Condition of a matrix

Let A = UΣV^T be the SVD of a nonsingular matrix A. Then A^{-1} = VΣ^{-1}U^T is the SVD of A^{-1}, from which we get

‖A‖_2 = σ_1  and  ‖A^{-1}‖_2 = 1/σ_n.

Hence, the condition of A with respect to the Euclidean norm is

κ_2(A) = σ_1 / σ_n. □


Remark

Let A ∈ ℝ^(n,n) have eigenvalues µ_1, . . . , µ_n. Then it follows from A x^i = µ_i x^i that

|µ_i|^2 = (A x^i)^H A x^i / ((x^i)^H x^i) = (x^i)^H A^T A x^i / ((x^i)^H x^i).

Rayleigh's principle yields

λ_min ≤ x^H A^T A x / (x^H x) ≤ λ_max  for all x ∈ ℂ^n, x ≠ 0,

where λ_min and λ_max are the minimal and maximal eigenvalues of A^T A, respectively. Hence,

σ_1 ≥ |µ_i| ≥ σ_n for every i.

For symmetric A it holds that σ_1 = |µ_1| and σ_n = |µ_n| (with the eigenvalues ordered by decreasing modulus). For nonsymmetric matrices this is in general not the case. □


Numerical computation

The singular values of A are the square roots of the eigenvalues of A^T A. Hence, in principle the SVD of A can be determined with any eigensolver.

To this end one has to evaluate A^T A and AA^T, which is costly and which deteriorates the condition number considerably.

In practice one uses an algorithm of Golub and Reinsch (1971), which takes advantage of the QR algorithm for computing the eigenvalues of A^T A, but which avoids the explicit computation of A^T A and AA^T. □
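The loss of accuracy caused by forming A^T A explicitly can be demonstrated directly (a sketch, assuming NumPy; the chosen singular values are arbitrary): singular values below about √(machine eps)·σ_1 are destroyed in the eigenvalue route, while a dedicated SVD routine recovers them.

    # Singular values via eig(A^T A) versus a dedicated SVD routine.
    import numpy as np

    rng = np.random.default_rng(4)
    U, _ = np.linalg.qr(rng.standard_normal((6, 6)))    # random orthogonal U
    V, _ = np.linalg.qr(rng.standard_normal((6, 6)))    # random orthogonal V
    s_true = np.array([1.0, 1e-2, 1e-4, 1e-6, 1e-8, 1e-10])
    A = U @ np.diag(s_true) @ V.T

    s_svd = np.linalg.svd(A, compute_uv=False)
    s_eig = np.sqrt(np.maximum(np.linalg.eigvalsh(A.T @ A)[::-1], 0.0))
    for t, a, b in zip(s_true, s_svd, s_eig):
        print(f"true {t:.1e}   svd {a:.3e}   via A^T A {b:.3e}")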


Data compression

The singular value decomposition can be used for data compression. This is based upon the following theorem:

Theorem
Let A = UΣV^T be the singular value decomposition of A ∈ ℝ^(m,n), and let U = (u^1, . . . , u^m) and V = (v^1, . . . , v^n).

Then for k < n

A_k := ∑_{j=1}^k σ_j u^j (v^j)^T

is the best approximation of A with rank(A_k) = k with respect to the spectral norm, and it holds that

‖A − A_k‖_2 = σ_{k+1}.


Proof
It holds that

‖A − A_k‖_2 = ‖∑_{j=k+1}^n σ_j u^j (v^j)^T‖_2 = ‖U diag{0, . . . , 0, σ_{k+1}, . . . , σ_n} V^T‖_2 = σ_{k+1},

and it remains to show that there does not exist a matrix of rank k whose distance to A is less than σ_{k+1}.

Let B be any matrix with rank(B) = k. Then the dimension of the null space of B is n − k. The dimension of span{v^1, . . . , v^{k+1}} is k + 1, and therefore the intersection of these two spaces contains a nontrivial vector w with ‖w‖_2 = 1.

Hence, since Bw = 0,

‖A − B‖_2^2 ≥ ‖(A − B)w‖_2^2 = ‖Aw‖_2^2 = ‖UΣV^T w‖_2^2 = ‖Σ(V^T w)‖_2^2 ≥ σ_{k+1}^2·‖V^T w‖_2^2 = σ_{k+1}^2. □


Data compression

Let A ∈ ℝ^(m,n) be a matrix whose elements a_ij are color values of pixels of a picture.

If A = UΣV^T is the singular value decomposition of A, then

A_k = ∑_{j=1}^k σ_j u^j (v^j)^T,  k = 1, . . . , min(m, n),

is an approximation to A. The storage of A_k requires only k(m + n + 1) memory cells (k singular values plus k left and k right singular vectors), i.e. a fraction k(m + n + 1)/(mn) of the mn cells required for A.

Notice that using the SVD in this manner is a very simple way of data compression. There are algorithms in image processing which are much less costly.
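A minimal sketch of this compression scheme (assuming NumPy; `image` is a placeholder for an m-by-n array of gray values, here filled with random data):

    # Best rank-k approximation A_k = sum_{j<=k} sigma_j u^j (v^j)^T.
    import numpy as np

    def compress(image, k):
        U, s, Vt = np.linalg.svd(image, full_matrices=False)
        return (U[:, :k] * s[:k]) @ Vt[:k]

    image = np.random.default_rng(5).random((200, 300))   # placeholder data
    for k in (5, 10, 20):
        A_k = compress(image, k)
        ratio = k * (image.shape[0] + image.shape[1] + 1) / image.size
        err = np.linalg.norm(image - A_k, 2)              # = sigma_{k+1}
        print(f"k={k:2d}  storage {100*ratio:4.1f}%  ||A - A_k||_2 = {err:.3f}")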


Example

Figure: Original

Example ct.

Figure: Compression k = 5 (2.6% of the original storage)

Figure: Compression k = 10 (5.3%)

Figure: Compression k = 20 (10.5%)

Pseudoinverse

Consider the linear least squares problem:

Let A ∈ ℝ^(m,n) and b ∈ ℝ^m with m ≥ n. Find x ∈ ℝ^n such that

‖Ax − b‖_2 = min!    (1)

We examine this problem taking advantage of the singular value decomposition.

In the following we denote by σ_1 ≥ σ_2 ≥ · · · ≥ σ_r > σ_{r+1} = · · · = σ_n = 0 the singular values of A; A = UΣV^T is the singular value decomposition of A, and u^j and v^k are the left and right singular vectors, respectively, i.e. the columns of U and V.


Pseudoinverse ct.

Theorem
Let c := U^T b ∈ ℝ^m.

The set of solutions of the linear least squares problem (1) is

L = x̄ + null(A),    (2)

where x̄ is the following particular solution of (1):

x̄ := ∑_{i=1}^r (c_i / σ_i) v^i.    (3)

Proof

Multiplying a vector by an orthogonal matrix does not change its length. Hence, with z := V^T x it holds that

‖Ax − b‖_2^2 = ‖U^T(Ax − b)‖_2^2 = ‖ΣV^T x − U^T b‖_2^2
            = ‖Σz − c‖_2^2 = ‖(σ_1 z_1 − c_1, . . . , σ_r z_r − c_r, −c_{r+1}, . . . , −c_m)^T‖_2^2.

Therefore, the solutions of problem (1) read: z_i := c_i/σ_i for i = 1, . . . , r, and z_i ∈ ℝ arbitrary for i = r + 1, . . . , n, i.e.

x = ∑_{i=1}^r (c_i / σ_i) v^i + ∑_{i=r+1}^n z_i v^i,  z_i ∈ ℝ, i = r + 1, . . . , n.    (4)

Since the trailing n − r columns of V span the null space of A, the set L of solutions of problem (1) has the form (2), (3). □
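Formula (3) can be checked against a library least squares solver (a sketch, assuming NumPy; NumPy's lstsq also returns the minimum-norm solution for rank-deficient A):

    # Minimum-norm least squares solution via the SVD, formula (3).
    import numpy as np

    rng = np.random.default_rng(6)
    m, n, r = 10, 6, 3
    A = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))   # rank r
    b = rng.standard_normal(m)

    U, s, Vt = np.linalg.svd(A)
    c = U.T @ b
    x_bar = Vt[:r].T @ (c[:r] / s[:r])          # formula (3)

    x_ref = np.linalg.lstsq(A, b, rcond=None)[0]
    print(np.allclose(x_bar, x_ref))            # True: same pseudonormal solution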


Pseudonormal solution

This theorem demonstrates again that the linear least squares problem (1) has a unique solution if and only if r = rank(A) = n. We enforce uniqueness also in the case r < n by additionally requiring that the Euclidean norm of the solution is minimal.

Definition
Let L be the solution set of the linear least squares problem (1).
x̄ ∈ L is called the pseudonormal solution of (1) if

‖x̄‖_2 ≤ ‖x‖_2 for every x ∈ L.


Pseudonormal solution ct.

The representation (4) of the general solution of (1) shows that x̄ in (3) is the pseudonormal solution of (1):

‖x̄ + ∑_{i=r+1}^n z_i v^i‖_2^2 = ‖x̄‖_2^2 + ∑_{i=r+1}^n |z_i|^2·‖v^i‖_2^2 ≥ ‖x̄‖_2^2.

The pseudonormal solution is unique, and x̄ obviously is the only solution of (1) with x̄ ∈ null(A)^⊥ ∩ L. Hence, we have obtained:

Theorem
There exists a unique pseudonormal solution x̄ of problem (1), which is characterized by

x̄ ∈ null(A)^⊥ ∩ L.


Pseudoinverse

For every A ∈ ℝ^(m,n),

ℝ^m ∋ b ↦ x̄ ∈ ℝ^n : ‖Ax̄ − b‖_2 ≤ ‖Ax − b‖_2 for all x ∈ ℝ^n, ‖x̄‖_2 minimal,

defines a mapping which obviously is linear (cf. the representation of x̄ in (3)). Therefore, it can be represented by a matrix A† ∈ ℝ^(n,m).

Definition
For A ∈ ℝ^(m,n) the matrix A† ∈ ℝ^(n,m) such that x̄ := A†b is the pseudonormal solution of the linear least squares problem (1) for every b ∈ ℝ^m is called the pseudoinverse (or Moore–Penrose inverse) of A.


Pseudoinverse ct.

If rank(A) = n and m ≥ n, then the least squares problem (1) is uniquely solvable, and it follows from the normal equations that the solution is x = (A^T A)^{-1} A^T b. Hence, in this case A† = (A^T A)^{-1} A^T.

If n = m and A is nonsingular, then it holds that A† = A^{-1}. Hence, the pseudoinverse coincides with the usual inverse whenever the latter exists, and the pseudoinverse is a consistent extension of the inverse. □


Pseudoinverse ct.

Theorem
Let A ∈ ℝ^(m,n) and let

A = UΣV^T,  Σ = (σ_i δ_ij)_{i,j},

be its singular value decomposition. Then it holds that

(i) Σ† = (τ_i δ_ij)_{j,i} with τ_i = 1/σ_i if σ_i ≠ 0 and τ_i = 0 if σ_i = 0,

(ii) A† = VΣ†U^T.
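A sketch of (i) and (ii) in NumPy (the tolerance for deciding σ_i ≠ 0 is an arbitrary illustrative choice):

    # Pseudoinverse via the SVD: A^+ = V Sigma^+ U^T, where Sigma^+ has
    # transposed shape and inverts the nonzero singular values.
    import numpy as np

    rng = np.random.default_rng(7)
    A = rng.standard_normal((7, 2)) @ rng.standard_normal((2, 4))   # rank 2

    U, s, Vt = np.linalg.svd(A)              # U: 7x7, s: 4 values, Vt: 4x4
    tol = 1e-12 * s[0]
    tau = np.array([1.0 / si if si > tol else 0.0 for si in s])

    Sigma_pinv = np.zeros((A.shape[1], A.shape[0]))     # 4x7
    Sigma_pinv[:len(s), :len(s)] = np.diag(tau)

    A_pinv = Vt.T @ Sigma_pinv @ U.T
    print(np.allclose(A_pinv, np.linalg.pinv(A)))       # True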


Pseudoinverse ct.

Remark
The explicit representation of the pseudoinverse is needed only for theoretical considerations and is never computed explicitly (similarly to the inverse of a nonsingular matrix).

Corollary
For every matrix A ∈ ℝ^(m,n) it holds that

A†† = A  and  (A†)^T = (A^T)†.

A† has the well known properties of the inverse A^{-1} of a nonsingular matrix A, with the only exception that in general

(AB)† ≠ B†A†.


Example

Let

A = ( 1  −1 )  =  I · ( √2  0 ) · (1/√2) ( 1  −1 )
    ( 0   0 )         (  0  0 )          ( 1   1 ),

which is a singular value decomposition of A. Its pseudoinverse is

A† = (1/√2) (  1  1 ) · ( 1/√2  0 ) · I  =  (1/2) (  1  0 )
            ( −1  1 )   (    0  0 )               ( −1  0 ).

Then A^2 = A and (A†)^2 = (1/2)A†, i.e. (A^2)† ≠ (A†)^2. □


Perturbation of least squares problems

Consider the linear least squares problem

‖Ax − b‖_2 = min!    (1)

with A ∈ ℝ^(m,n), rank(A) = r, and a perturbed problem

‖A(x + ∆x) − (b + ∆b)‖_2 = min!,    (2)

where we incorporate only perturbations of the right hand side b, but not of the system matrix A.

Let x = A†b and x + ∆x = A†(b + ∆b) be the pseudonormal solutions of (1) and (2), respectively.

Then ∆x = A†∆b, and from ‖A†‖_2 = 1/σ_r it follows that

‖∆x‖_2 ≤ ‖A†‖_2·‖∆b‖_2 = (1/σ_r)·‖∆b‖_2.


Least squares problems

Perturbation of least squares problems

With cᵢ := uᵢᵀb we have x = A†b = ∑_{i=1}^r (cᵢ/σᵢ) vᵢ, and hence

‖x‖₂² = ∑_{i=1}^r cᵢ²/σᵢ² ≥ (1/σ₁²) ∑_{i=1}^r cᵢ² = (1/σ₁²) ‖ ∑_{i=1}^r cᵢuᵢ ‖₂².

Obviously, ∑_{i=1}^r cᵢuᵢ is the projection of b onto the range of A. Therefore it follows for the relative error that

‖∆x‖₂ / ‖x‖₂ ≤ (σ₁/σᵣ) · ‖∆b‖₂ / ‖P_{range(A)} b‖₂. (3)

This inequality specifies how a relative error in the right hand side of a linear least squares problem affects the solution of the problem.
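
The bound (3) is easy to probe numerically. The following sketch (an illustrative setup assuming NumPy, not taken from the lecture) compares both sides of (3) for a random full-rank problem:

# Illustrative check of bound (3): relative error of the pseudo normal
# solution against kappa_2(A) times the relative perturbation of b.
import numpy as np

rng = np.random.default_rng(0)
m, n = 8, 5
A = rng.standard_normal((m, n))           # full rank with probability 1
b = rng.standard_normal(m)
db = 1e-6 * rng.standard_normal(m)

A_dag = np.linalg.pinv(A)
x, dx = A_dag @ b, A_dag @ db             # pseudo normal solution and its change

U, s, Vt = np.linalg.svd(A, full_matrices=False)
Pb = U @ (U.T @ b)                        # projection of b onto range(A)

lhs = np.linalg.norm(dx) / np.linalg.norm(x)
rhs = (s[0] / s[-1]) * np.linalg.norm(db) / np.linalg.norm(Pb)
print(lhs <= rhs)                         # True: (3) holds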

Condition

Definition: For A ∈ R(m,n) let A = UΣVᵀ be the singular value decomposition, and let rank(A) = r. Then

κ₂(A) := σ₁/σᵣ

is called the condition of A.

If A ∈ R(n,n) is nonsingular, then this definition coincides with the one given before for square matrices with respect to the Euclidean norm.

Since AᵀA has the singular values σᵢ², it holds that

κ₂(AᵀA) = κ₂(A)².

Hence, the normal equations of a linear least squares problem are much worse conditioned than the system matrix of the problem.
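
The squaring of the condition number can be observed directly; a brief sketch (assuming NumPy; the test matrix is an arbitrary illustrative choice):

# kappa_2(A^T A) = kappa_2(A)^2, checked numerically; by default
# np.linalg.cond returns sigma_max/sigma_min, the 2-norm condition.
import numpy as np

A = np.vander(np.linspace(0.0, 1.0, 8), 5)   # moderately ill-conditioned 8 x 5 matrix
print(np.linalg.cond(A))                     # kappa_2(A)
print(np.linalg.cond(A.T @ A))               # approximately kappa_2(A)**2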

Perturbed least squares problems

For perturbations of the system matrix the following theorem holds.

Theorem: Assume that A ∈ R(m,n), m ≥ n, is not rank deficient, i.e. rank(A) = n. Let x be the solution of the least squares problem (1) and x̃ be the solution of the perturbed problem

‖(A + ∆A)x̃ − (b + ∆b)‖₂ = min!, (4)

where

ε := max( ‖∆A‖₂/‖A‖₂ , ‖∆b‖₂/‖b‖₂ ) < 1/κ₂(A) = σₙ(A)/σ₁(A). (5)

Then it holds that

‖x̃ − x‖₂ / ‖x‖₂ ≤ ε ( 2κ₂(A)/cos θ + tan θ · κ₂(A)² ) + O(ε²), (6)

where θ is the angle between b and its projection onto range(A).
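
To get a feeling for the bound, one can compare the observed relative error with the right hand side of (6); the following experiment is an illustrative sketch with random data (assuming NumPy), not part of the lecture:

# Illustrative experiment for the first-order bound (6).
import numpy as np

rng = np.random.default_rng(1)
m, n = 20, 5
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)
dA = 1e-8 * rng.standard_normal((m, n))
db = 1e-8 * rng.standard_normal(m)

x,  *_ = np.linalg.lstsq(A, b, rcond=None)
xt, *_ = np.linalg.lstsq(A + dA, b + db, rcond=None)

kappa = np.linalg.cond(A)                       # sigma_1 / sigma_n
eps = max(np.linalg.norm(dA, 2) / np.linalg.norm(A, 2),
          np.linalg.norm(db) / np.linalg.norm(b))

Q, _ = np.linalg.qr(A)                          # orthonormal basis of range(A)
Pb = Q @ (Q.T @ b)
theta = np.arccos(np.linalg.norm(Pb) / np.linalg.norm(b))

lhs = np.linalg.norm(xt - x) / np.linalg.norm(x)
rhs = eps * (2 * kappa / np.cos(theta) + np.tan(theta) * kappa**2)
print(lhs, rhs)           # lhs should not exceed rhs, up to the O(eps^2) term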

Regularization

Example

Consider the orthogonal projection of a given function f : [0, 1] → R onto the space Πₙ₋₁ of polynomials of degree at most n − 1 with respect to the scalar product

⟨f, g⟩ := ∫₀¹ f(x)g(x) dx.

Choosing the basis {1, x, . . . , xⁿ⁻¹} one obtains the linear system

Ay = b (1)

with

A = (aᵢⱼ), aᵢⱼ := 1/(i + j − 1), i, j = 1, . . . , n, (2)

the so-called Hilbert matrix, and b ∈ Rⁿ, bᵢ := ⟨f, xⁱ⁻¹⟩.

Example ct.

For the dimensions n = 10, n = 20 and n = 40 we choose the right hand side of (1) such that y = (1, . . . , 1)ᵀ is the unique solution, and we solve the resulting system by the known methods.

The LU factorization with column pivoting (in MATLAB A\b), the Cholesky factorization, the QR decomposition of A and the singular value decomposition of A yield the following errors with respect to the Euclidean norm:

                   n = 10     n = 20     n = 40
LU factorization   5.24 E-4   8.25 E+1   3.78 E+2
Cholesky           7.15 E-4   numerically not positive definite
QR decomposition   1.41 E-3   1.67 E+2   1.46 E+3
SVD                8.24 E-4   3.26 E+2   8.35 E+2
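
The experiment is straightforward to reproduce; a sketch assuming SciPy is shown below (the exact error values depend on the floating point environment and will differ somewhat from the table):

# Sketch reproducing the experiment: solve Ay = b, A the Hilbert matrix,
# with exact solution y = (1,...,1)^T, by LU, QR and SVD.
import numpy as np
from scipy.linalg import hilbert, solve, qr, solve_triangular

for n in (10, 20, 40):
    A = hilbert(n)
    y = np.ones(n)
    b = A @ y

    x_lu = solve(A, b)                        # LU with partial pivoting
    Q, R = qr(A)
    x_qr = solve_triangular(R, Q.T @ b)       # QR decomposition
    U, s, Vt = np.linalg.svd(A)
    x_svd = Vt.T @ ((U.T @ b) / s)            # SVD

    print(n, np.linalg.norm(x_lu - y),
             np.linalg.norm(x_qr - y),
             np.linalg.norm(x_svd - y))
# scipy.linalg.cholesky(A) fails for larger n: the Hilbert matrix is
# numerically no longer positive definite.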

Example ct.

A similar behavior is observed for the least squares problem. For n = 10, n = 20 and n = 40 and m = n + 10 we consider the least squares problem

‖Ax − b‖₂ = min!

with the Hilbert matrix A ∈ R(m,n), where b is chosen such that x = (1, . . . , 1)ᵀ solves the problem with residual Ax − b = 0.

                   n = 10     n = 20     n = 40
Normal equations   2.91 E+2   2.40 E+2   8.21 E+2
QR factorization   1.93 E-5   5.04 E+0   1.08 E+1
SVD                4.67 E-5   6.41 E+1   3.72 E+2
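
A corresponding sketch for the rectangular case (again illustrative; the m × n Hilbert matrix is built directly, since scipy.linalg.hilbert returns only square matrices):

# Rectangular Hilbert least squares problem with consistent right hand side:
# normal equations vs. QR vs. SVD (np.linalg.lstsq is SVD-based).
import numpy as np

for n in (10, 20, 40):
    m = n + 10
    i = np.arange(1, m + 1)[:, None]
    j = np.arange(1, n + 1)[None, :]
    A = 1.0 / (i + j - 1.0)                        # a_ij = 1/(i+j-1)
    x = np.ones(n)
    b = A @ x                                      # residual Ax - b = 0

    x_ne = np.linalg.solve(A.T @ A, A.T @ b)       # normal equations; errors
                                                   # blow up with kappa_2(A)^2
    Q, R = np.linalg.qr(A)                         # reduced QR factorization
    x_qr = np.linalg.solve(R, Q.T @ b)
    x_svd, *_ = np.linalg.lstsq(A, b, rcond=None)  # SVD

    print(n, np.linalg.norm(x_ne - x),
             np.linalg.norm(x_qr - x),
             np.linalg.norm(x_svd - x))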

Regularization

For badly conditioned least squares problems or linear systems the following approach can yield reliable solutions:

Determine the singular value decomposition A = UΣVᵀ of A, and define

Σ†τ = diag(ηᵢδⱼᵢ),  ηᵢ := 1/σᵢ if σᵢ ≥ τ, and ηᵢ := 0 otherwise,

where τ > 0 is a given threshold, and

A†τ := V Σ†τ Uᵀ,  xτ := A†τ b.

A†τ is called the effective pseudo inverse of A. This method of approximately solving Ax = b is called regularization by truncation.

Very small singular values are discarded: instead of Ax = b one solves the linear system Axτ = Pb, where P is the orthogonal projection onto the subspace span{uᵢ : σᵢ ≥ τ}.
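
A minimal sketch of this truncation, assuming NumPy (the function name tsvd_solve and the choice of τ are illustrative):

# Regularization by truncation: singular values below tau are discarded,
# only the retained ones are inverted.
import numpy as np

def tsvd_solve(A, b, tau):
    """Return x_tau = A_tau^dagger @ b (effective pseudo inverse applied to b)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    keep = s >= tau
    return Vt[keep].T @ ((U[:, keep].T @ b) / s[keep])

The threshold τ balances stability against accuracy: a larger τ discards more of range(A), while a smaller τ lets the ill-conditioning through.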

Tichonov regularization

The most prominent way of regularization was introduced independently by Phillips (1962) and Tichonov (1963), and is called Tichonov regularization.

Here small singular values are not discarded, but their influence on the solution is damped.

Instead of Ax = b one solves the linear system

(AᵀA + αIₙ) x = Aᵀb (4)

where α > 0 is a suitable regularization parameter.

Tichonov regularization ct.

Obviously, system (4) is equivalent to

‖Ax − b‖₂² + α‖x‖₂² = min! (5)

(which is the usual representation of the Tichonov regularization), or equivalently to

‖Āx − b̄‖₂ = min!,  Ā = ( A ; √α Iₙ ),  b̄ = ( b ; 0 ). (6)

This version, together with the QR factorization of Ā, was used by Golub (1965) to carry out Tichonov's regularization in a stable way.
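
A sketch of this augmented formulation, assuming NumPy (numpy.linalg.lstsq stands in here for the QR-based solver; the function name is illustrative):

# Tichonov regularization via the stacked least squares problem (6).
import numpy as np

def tikhonov_solve(A, b, alpha):
    """Minimize ||Ax - b||_2^2 + alpha * ||x||_2^2 via the augmented system."""
    n = A.shape[1]
    A_bar = np.vstack([A, np.sqrt(alpha) * np.eye(n)])   # stack A over sqrt(alpha)*I_n
    b_bar = np.concatenate([b, np.zeros(n)])
    x, *_ = np.linalg.lstsq(A_bar, b_bar, rcond=None)
    return x

Working with Ā instead of AᵀA + αIₙ avoids forming the normal equations and thus the squaring of the condition number noted above.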

Tichonov regularization ct.

From

ĀᵀĀ = AᵀA + αIₙ

it follows that Ā has the singular values √(σᵢ² + α), where σᵢ denote the singular values of A, and the condition of problem (1) is reduced to

κ₂(Ā) = √( (σ₁² + α) / (σₙ² + α) ).

If β := Uᵀb, then problem (6) is equivalent to

V (ΣᵀUᵀUΣ + αIₙ) Vᵀ x = V Σᵀ Uᵀ b = V Σᵀ β,

i.e.

x = V (ΣᵀΣ + αIₙ)⁻¹ Σᵀ β = ∑_{i=1}^n ( βᵢσᵢ / (σᵢ² + α) ) vᵢ.

Hence, knowing the singular value decomposition of A, the regularized problem can be solved for various regularization parameters α.
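
This observation translates directly into code; a sketch assuming NumPy (the function name is illustrative), where the SVD is computed once and reused for every value of α:

# Tichonov regularization via the SVD:
# x(alpha) = sum_i beta_i * sigma_i / (sigma_i^2 + alpha) * v_i.
import numpy as np

def tikhonov_svd(A, b, alphas):
    """Return the regularized solution for every alpha in alphas."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    beta = U.T @ b
    # the SVD above is computed only once; each alpha is then cheap
    return [Vt.T @ (s * beta / (s**2 + alpha)) for alpha in alphas]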

Example

For the linear system (1), (2) one obtains the errors in the following table.

                    n = 10     n = 20     n = 40
Tichonov Cholesky   1.41 E-3   2.03 E-3   3.51 E-3
Tichonov QR         3.50 E-6   5.99 E-6   7.54 E-6
Tichonov SVD        3.43 E-6   6.33 E-6   9.66 E-6
truncated SVD       2.77 E-6   3.92 E-6   7.35 E-6

For the least squares problem one gets

                    n = 10     n = 20     n = 40
Tichonov Cholesky   3.85 E-4   1.19 E-3   2.27 E-3
Tichonov QR         2.24 E-7   1.79 E-6   6.24 E-6
Tichonov SVD        8.51 E-7   1.61 E-6   3.45 E-6
truncated SVD       7.21 E-7   1.94 E-6   7.70 E-6
