
M2N1 Numerical Analysis

Mathematics

Imperial College London


Contents

1 Applied Linear Algebra
  1.1 Orthogonality
  1.2 Gram-Schmidt
  1.3 QR Factorization
  1.4 Cauchy-Schwarz inequality
  1.5 Gradients and Hessians
  1.6 Generalized inner product
  1.7 Cholesky Factorization
  1.8 Least Squares Problems
    1.8.1 General Least Squares Case
  1.9 A more abstract approach
  1.10 Orthogonal Polynomials

2 Polynomial interpolation
  2.1 Divided difference
  2.2 Finding the error
  2.3 Best Approximation
  2.4 Piecewise Polynomial Interpolation

3 Quadrature (Numerical Integration)


Chapter 1

Applied Linear Algebra

1.1 Orthogonality

Definition. Let a, b ∈ R^n. We define the inner product of a, b to be

⟨a, b⟩ = a^T b = Σ_{i=1}^n a_i b_i.

Also define the outer product of a and b to be the n × n matrix

ab^T = (a_1, . . . , a_n)^T (b_1, · · · , b_n), with entries (ab^T)_{ij} = a_i b_j for i, j = 1 → n.

Note. Note that:

1. The inner product is symmetric:

   ⟨a, b⟩ = Σ_{i=1}^n a_i b_i = Σ_{i=1}^n b_i a_i = ⟨b, a⟩

   for all a, b ∈ R^n.

2. The inner product is linear with respect to the second argument:

   ⟨a, µb + λc⟩ = Σ_{i=1}^n a_i(µb_i + λc_i) = µ Σ_{i=1}^n a_i b_i + λ Σ_{i=1}^n a_i c_i = µ⟨a, b⟩ + λ⟨a, c⟩

   for all a, b, c ∈ R^n, µ, λ ∈ R.

3. From 1. and 2. we get that the inner product is also linear with respect to the first argument.


4. Observe that

   ⟨a, a⟩ = Σ_{i=1}^n a_i² ≥ 0.

Definition. Let ∥a∥ = [⟨a, a⟩]^{1/2} be the length (or norm) of a.

Definition. We say that a, b ∈ R^n, a, b ≠ 0, are orthogonal if ⟨a, b⟩ = 0.

Example.

Claim. If a, b ∈ R^n are orthogonal, then ∥a + b∥² = ∥a∥² + ∥b∥².

Proof.

∥a + b∥² = ⟨a + b, a + b⟩ = ⟨a + b, a⟩ + ⟨a + b, b⟩ = ∥a∥² + ∥b∥² + 2⟨a, b⟩ = ∥a∥² + ∥b∥².

Definition. A set of non-trivial vectors {q_k}_{k=1}^n, q_k ∈ R^n, q_k ≠ 0 for k = 1 → n, is orthogonal if ⟨q_j, q_k⟩ = 0 for j, k = 1 → n, j ≠ k.

As a useful shorthand, introduce the Kronecker delta notation

δ_jk = 1 if j = k,  0 if j ≠ k.

Example. For the n × n identity matrix I we have I_jk = δ_jk.

Definition. A set of non-trivial vectors {q_k}_{k=1}^n, q_k ∈ R^n, q_k ≠ 0 for k = 1 → n, is orthonormal if

⟨q_j, q_k⟩ = δ_jk for j, k = 1 → n.

Note. A set of vectors is orthonormal if it is orthogonal and each vector has unit length.

Definition. A set of vectors {a_k}_{k=1}^n, a_k ∈ R^m for k = 1 → n, is linearly independent if

Σ_{k=1}^n c_k a_k = 0

implies c_k = 0 for k = 1 → n. The set {a_k}_{k=1}^n is linearly dependent if there exist coefficients c_k ∈ R, k = 1 → n, not all zero, such that

Σ_{k=1}^n c_k a_k = 0.


Note. Recall that for A = [a_1 · · · a_n], a_k ∈ R^m for k = 1 → n:

(1) If the only solution to Ac = 0 is c = 0 then {a_k}_{k=1}^n are linearly independent.

(2) If there exists c ≠ 0 such that Ac = 0 then {a_k}_{k=1}^n are linearly dependent.

(3) Restrict to m = n (so that A is square). If A^{-1} exists then the rows (columns) of A are linearly independent. If {a_k}_{k=1}^n are linearly independent they form a basis for R^n and each vector x ∈ R^n can be uniquely expressed as a combination of the a_i's.

Lemma 1.1. Let {a_k}_{k=1}^n, a_k ∈ R^m, k = 1 → n, be orthogonal. Then {a_k}_{k=1}^n is linearly independent.

Proof. If Σ_{k=1}^n c_k a_k = 0 then for 1 ≤ j ≤ n

⟨Σ_{k=1}^n c_k a_k, a_j⟩ = ⟨0, a_j⟩ = 0,

and by orthogonality the left hand side collapses to

c_j ⟨a_j, a_j⟩ = c_j ∥a_j∥² = 0.

Since the a_j's are non-trivial, c_j = 0. Repeat for all j = 1 → n.

Remark 1.1. Linear independence does not imply orthogonality. For example take n = m = 2, a_1 = (2, 0)^T and a_2 = (3, 1)^T, which are clearly linearly independent but not orthogonal.

1.2 Gram-Schmidt

Algorithm 1 Classical Gram-Schmidt Algorithm (CGS)
1: v_1 = a_1
2: q_1 = v_1/∥v_1∥
3: for k = 2 to n do
4:   v_k = a_k − Σ_{j=1}^{k−1} ⟨a_k, q_j⟩ q_j
5:   q_k = v_k/∥v_k∥
6: end for

Claim. Given {a_i}_{i=1}^n, a_i ∈ R^m, i = 1 → n, linearly independent (so n ≤ m), CGS finds {q_i}_{i=1}^n, q_i ∈ R^m, i = 1 → n, orthonormal, i.e. ⟨q_i, q_j⟩ = δ_ij for i, j = 1 → n, with Span{a_i}_{i=1}^n = Span{q_i}_{i=1}^n.


Proof. Since {a_i}_{i=1}^n are linearly independent, a_i ≠ 0 for i = 1 → n. For k = 1 we get

q_1 = v_1/∥v_1∥,   ∥q_1∥ = [⟨q_1, q_1⟩]^{1/2} = [ (1/∥v_1∥²)⟨v_1, v_1⟩ ]^{1/2} = 1.

For k = 2, from the code we have

v_2 = a_2 − ⟨a_2, q_1⟩ q_1.   (†)

Check that v_2 is orthogonal to q_1:

⟨v_2, q_1⟩ = ⟨a_2, q_1⟩ − ⟨a_2, q_1⟩ ⟨q_1, q_1⟩ = 0,

since ∥q_1∥ = 1. We also need to check that v_2 ≠ 0. If v_2 = 0, then by (†), a_2 equals ⟨a_2, q_1⟩q_1, which is a multiple of a_1; contradiction to the linear independence of {a_i}_{i=1}^n. Therefore v_2 ≠ 0, q_2 has unit length and is a multiple of v_2, and hence {q_i}_{i=1}^2 is orthonormal. Clearly Span{a_i}_{i=1}^2 = Span{q_i}_{i=1}^2.

Assume the statement is true for k − 1, i.e. that {q_i}_{i=1}^{k−1} is orthonormal and

q_j = linear combination of {a_i}_{i=1}^j,   a_j = linear combination of {q_i}_{i=1}^j,   for j = 1 → k − 1.   (⋆)

Set

v_k = a_k − Σ_{i=1}^{k−1} ⟨a_k, q_i⟩ q_i.

Then v_k is orthogonal to all q_j, j = 1 → k − 1:

⟨v_k, q_j⟩ = ⟨a_k, q_j⟩ − Σ_{i=1}^{k−1} ⟨a_k, q_i⟩ ⟨q_i, q_j⟩ = ⟨a_k, q_j⟩ − ⟨a_k, q_j⟩ = 0,

using ⟨q_i, q_j⟩ = δ_ij. If v_k = 0 then

a_k = Σ_{i=1}^{k−1} ⟨a_k, q_i⟩ q_i = linear combination of {q_i}_{i=1}^{k−1} = linear combination of {a_i}_{i=1}^{k−1}


by (⋆); contradiction to {a_i}_{i=1}^n being linearly independent. Hence v_k ≠ 0. From q_k = v_k/∥v_k∥ we get {q_i}_{i=1}^k orthonormal. Since q_k is a linear combination of {q_i}_{i=1}^{k−1} and a_k, it is a linear combination of {a_i}_{i=1}^k by (⋆). Similarly, by (⋆), a_k is a linear combination of {q_i}_{i=1}^k. Hence the result follows by induction.
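As an aside, the CGS loop above translates directly into a few lines of NumPy. The following is a minimal sketch (the function name, test matrix and the use of NumPy are my own choices, not part of the notes):

import numpy as np

def classical_gram_schmidt(A):
    # Orthonormalise the (assumed linearly independent) columns of A by CGS.
    m, n = A.shape
    Q = np.zeros((m, n))
    for k in range(n):
        v = A[:, k].astype(float).copy()
        for j in range(k):
            # subtract the component of a_k along each previously built q_j
            v -= (A[:, k] @ Q[:, j]) * Q[:, j]
        Q[:, k] = v / np.linalg.norm(v)
    return Q

A = np.array([[3.0, 65.0], [4.0, 0.0], [12.0, 13.0]])
Q = classical_gram_schmidt(A)
print(np.allclose(Q.T @ Q, np.eye(2)))   # True: the columns are orthonormal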

1.3 QR Factorization

Look at CGS from a different viewpoint. For {a_i}_{i=1}^n, CGS gives {q_i}_{i=1}^n orthonormal. Let

A = [a_1 · · · a_n] ∈ R^{m×n},   Q̂ = [q_1 · · · q_n] ∈ R^{m×n}.

Let R̂ ∈ R^{n×n} be an upper triangular matrix,

R̂_lk = r_lk if l ≤ k,  0 if l > k,

and define e_k^(n) ∈ R^n by (e_k^(n))_j = δ_kj for j = 1 → n. Then clearly for any B ∈ R^{m×n}, B e_k^(n) = k-th column of B. From CGS we have a_1 = ∥v_1∥ q_1; let r_11 = ∥a_1∥. Also, for k = 1 → n,

A e_k^(n) = a_k = v_k + Σ_{i=1}^{k−1} ⟨a_k, q_i⟩ q_i = ∥v_k∥ q_k + Σ_{i=1}^{k−1} ⟨a_k, q_i⟩ q_i = Σ_{i=1}^k r_ik q_i = Q̂ R̂ e_k^(n),

where r_kk = ∥v_k∥ > 0 and r_ik = ⟨a_k, q_i⟩. Hence A = Q̂R̂.

Expressing A ∈ R^{m×n} as a product of Q̂ ∈ R^{m×n} with orthonormal columns and R̂ ∈ R^{n×n} upper triangular with positive diagonal entries is called the reduced QR factorisation of A. Now take Q ∈ R^{m×m},

Q = [Q̂  q_{n+1} · · · q_m],

with q_{n+1}, . . . , q_m chosen so that the columns of Q are orthonormal, and take R ∈ R^{m×n} to be R̂ stacked on top of m − n rows of zeros.


Clearly, R is an upper triangular matrix (as R̂ is). Call A expressed as the product of Q and R the QR factorisation of A.

Observe the product of Q^T with Q:

(Q^T Q)_{jk} = q_j^T q_k = ⟨q_j, q_k⟩ = δ_jk,

so Q^T Q = I^(m) and also Q^T = Q^{-1}.

Definition. A matrix Q ∈ R^{m×m} is called orthogonal if Q^T Q = I^(m).

Proposition 1.2. Orthogonal matrices preserve length and angle, i.e. if Q ∈ R^{m×m} and Q^T Q = I^(m) then for all v, w ∈ R^m

(1) ⟨Qv, Qw⟩ = ⟨v, w⟩ (angle preserved),

(2) ∥Qv∥ = ∥v∥ (length preserved).

Proof. For v, w ∈ R^m,

⟨Qv, Qw⟩ = (Qv)^T Qw = (v^T Q^T)Qw = v^T I^(m) w = ⟨v, w⟩.

Also, using (1),

∥Qv∥ = [⟨Qv, Qv⟩]^{1/2} = [⟨v, v⟩]^{1/2} = ∥v∥.

Proposition 1.3. If Q_1, Q_2 ∈ R^{m×m} are orthogonal, then Q_1Q_2 is orthogonal.

Proof. (Q_1Q_2)^T(Q_1Q_2) = Q_2^T Q_1^T Q_1 Q_2 = Q_2^T Q_2 = I^(m).

Example. For m = 2 and

Q = [ cos θ  −sin θ ]
    [ sin θ   cos θ ],

clearly Q is orthogonal and rotates a vector in R² by an angle θ around the origin.

Definition. Define the Givens Rotation Matrix G_pq(θ) ∈ R^{m×m}, p, q ≤ m, to agree with the m × m identity matrix except in rows and columns p and q, where

(G_pq(θ))_pp = cos θ,  (G_pq(θ))_pq = −sin θ,  (G_pq(θ))_qp = sin θ,  (G_pq(θ))_qq = cos θ.

Equivalently, the j-th column of G_pq(θ) is

e_j^(m) (with 1 in the j-th row) if j ≠ p and j ≠ q,
e_p^(m) cos θ + e_q^(m) sin θ if j = p,
−e_p^(m) sin θ + e_q^(m) cos θ if j = q.

Note. The length of every column of G_pq(θ) is 1 and the columns of G_pq(θ) are orthogonal; G_pq(θ) is an orthogonal matrix.

For A, B ∈ R^{m×n} consider

G_pq(θ)A = B.

All rows of B are the same as those of A, except for rows p and q. The aim is to obtain a QR factorisation of A using a sequence of Givens rotations.


Example. For m = 3, n = 2,

A = [  3  65 ]
    [  4   0 ]
    [ 12  13 ].

Take a sequence of Givens rotations so that A is transformed into an upper triangular R. Choose G_12(θ) so that

A^(1) = G_12(θ)A = [  ·  · ]
                   [  0  · ]
                   [ 12 13 ].

Choose θ such that (since G_12(θ) preserves length)

[ cos θ  −sin θ ] [3]   [5]
[ sin θ   cos θ ] [4] = [0].

We get

G_12(θ) = [  3/5  4/5  0 ]
          [ −4/5  3/5  0 ]
          [   0    0   1 ]

and so

A^(1) = [  5  39 ]
        [  0 −52 ]
        [ 12  13 ].

Choose G_13(ϕ) as the next rotation, since it does not affect row 2, so A^(2)_{21} stays 0 (G_23(ϕ) would not work). We want the first column of A^(2) to be a multiple λ of e_1^(3). Since G_13(ϕ) preserves length, we know λ = (5² + 12²)^{1/2} = 13. So

G_13(ϕ) = [   5/13  0  12/13 ]
          [     0   1    0   ]
          [ −12/13  0   5/13 ]

and so

A^(2) = G_13(ϕ)A^(1) = [ 13  27 ]
                       [  0 −52 ]
                       [  0 −31 ].

Now choose G_23(ψ),

G_23(ψ) = [ 1       0          0      ]
          [ 0  −52/√3665  −31/√3665   ]
          [ 0   31/√3665  −52/√3665   ],

to get

R = A^(3) = G_23(ψ)A^(2) = [ 13    27   ]
                           [  0  √3665  ]
                           [  0    0    ].

So

R = G_23(ψ)G_13(ϕ)G_12(θ)A = GA


with G orthogonal, since Givens rotations are orthogonal. Then

R = GA,   G^T GA = G^T R,   A = QR,

with Q = G^T.

In general, we want to solve Ax = b for A ∈ R^{m×n}. We apply a sequence of Givens rotations G to take A to an upper triangular R, giving the equivalent system GAx = Rx = Gb = c. If m > n and c_i ≠ 0 for some i = n + 1 → m then there is no solution x to Rx = c and the system is said to be inconsistent. Otherwise there exists a unique solution x, which can be found by backward substitution.
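A small sketch of the same computation in NumPy follows; the helper names are mine and the code assumes the real m ≥ n case (it is not claimed to be how any particular library implements QR):

import numpy as np

def givens(a, b):
    # return c, s with [[c, s], [-s, c]] @ [a, b]^T = [r, 0]^T
    r = np.hypot(a, b)
    return (1.0, 0.0) if r == 0 else (a / r, b / r)

def qr_givens(A):
    m, n = A.shape
    R = A.astype(float).copy()
    G = np.eye(m)                       # accumulates the product of rotations
    for q in range(n):
        for p in range(q + 1, m):       # zero the entries below the diagonal
            c, s = givens(R[q, q], R[p, q])
            rot = np.eye(m)
            rot[[q, q, p, p], [q, p, q, p]] = [c, s, -s, c]
            R, G = rot @ R, rot @ G
    return G.T, R                       # A = QR with Q = G^T

A = np.array([[3.0, 65.0], [4.0, 0.0], [12.0, 13.0]])
Q, R = qr_givens(A)
print(np.round(R, 4))                   # [[13, 27], [0, 60.5392], [0, 0]]; 60.5392 ≈ √3665
print(np.allclose(Q @ R, A))            # True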

1.4 Cauchy-Schwarz inequality

For vectors a, b ∈ R³,

a · b = |a||b| cos θ.

Generalize this to R^n.

Theorem 1.4 (Cauchy-Schwarz inequality). For a, b ∈ R^n,

|⟨a, b⟩| ≤ ∥a∥∥b∥,

with equality iff a and b are linearly dependent.

Proof. If a = 0 then ⟨a, b⟩ = 0 for all b ∈ R^n and so the inequality is trivially true. If a ≠ 0 then let q = a/∥a∥ and c = b − ⟨b, q⟩q, so that

⟨c, q⟩ = ⟨b, q⟩ − ⟨b, q⟩⟨q, q⟩ = 0.

We have

0 ≤ ∥c∥² = ⟨c, c⟩ = ⟨c, b − ⟨b, q⟩q⟩ = ⟨c, b⟩ − ⟨b, q⟩⟨c, q⟩ = ⟨c, b⟩
         = ⟨b − ⟨b, q⟩q, b⟩ = ∥b∥² − ⟨b, q⟩⟨q, b⟩ = ∥b∥² − ⟨b, q⟩²
         = ∥b∥² − [⟨b, a⟩]²/∥a∥²,

so |⟨a, b⟩| ≤ ∥a∥∥b∥, with equality iff c = 0, i.e.

b = ⟨b, q⟩q = ⟨b, a⟩a/∥a∥²,

i.e. a, b are linearly dependent.


1.5 Gradients and Hessians

For a function of one variable f : R → R we have a Taylor series

f(a + h) = f(a) + hf′(a) + (h²/2!)f″(a) + O(h³).

Now consider functions of n variables, i.e. f : R^n → R. Write f(x) where x = (x_1, . . . , x_n)^T ∈ R^n. We define the partial derivative ∂f/∂x_i of f with respect to x_i to be the derivative of f when all x_j, j ≠ i, are treated as constants.

Example. For n = 2, x = (x_1, x_2)^T, f(x) = f(x_1, x_2) = sin x_1 sin x_2. Then the first derivatives are

∂f/∂x_1 (x) = cos x_1 sin x_2,   ∂f/∂x_2 (x) = sin x_1 cos x_2.

Generally, the second derivatives are

∂²f/∂x_i∂x_j (x) = ∂/∂x_i ( ∂f/∂x_j (x) ) = ∂/∂x_j ( ∂f/∂x_i (x) )

for i, j = 1 → n and f sufficiently smooth.

Example. f(x) = sin x_1 cos x_2. Then

∂²f/∂x_1² (x) = ∂/∂x_1 ( ∂f/∂x_1 (x) ) = −sin x_1 cos x_2,
∂²f/∂x_2² (x) = ∂/∂x_2 ( ∂f/∂x_2 (x) ) = −sin x_1 cos x_2,
∂²f/∂x_1∂x_2 (x) = ∂/∂x_2 ( ∂f/∂x_1 (x) ) = −cos x_1 sin x_2.

Chain Rule

For f : R → R, f(x), we can change the variable x so that x = x(t) or t = t(x) and define w(t) = f(x(t)). Then

dw/dt (t) = df/dx (x(t)) · dx/dt (t).

Generalize to n variables. If w(t) = f(x(t)) then

dw/dt = Σ_{i=1}^n ∂f/∂x_i (x(t)) · dx_i/dt (t).


Example. For n = 2, f(x) = sin x_1 sin x_2, x_1(t) = t², x_2(t) = cos t and hence w(t) = sin t² sin(cos t). We have

dw/dt = 2t cos t² sin(cos t) + sin t² (cos(cos t))(−sin t)
      = ∂f/∂x_1 (x(t)) dx_1/dt (t) + ∂f/∂x_2 (x(t)) dx_2/dt (t).

For general w(t) = f(a + th),

d^m w/dt^m = ( Σ_{i=1}^n h_i ∂/∂x_i )^m f(a + th).

Now we can generalize the Taylor series to get

f(a + h) = f(a) + Σ_{i=1}^n h_i ∂f/∂x_i (a) + (1/2) Σ_{i=1}^n h_i ∂/∂x_i ( Σ_{j=1}^n h_j ∂f/∂x_j (a) ) + O(∥h∥³).

Definition. For a function f : R^n → R, call the vector ∇f(x) ∈ R^n,

∇f(x) = ( ∂f/∂x_1 (x), . . . , ∂f/∂x_n (x) )^T,

the gradient of f at x.

Definition. For a function f : R^n → R, call the matrix D²f(x) ∈ R^{n×n},

(D²f(x))_{ij} = ∂²f/∂x_i∂x_j (x),

the Hessian of f at x.

We can now rewrite the Taylor series as

f(a + h) = f(a) + h^T ∇f(a) + (1/2) h^T D²f(a) h + O(∥h∥³).

Example. Let f(x) = x^T A x for all x ∈ R^n, where A ∈ R^{n×n} is a given symmetric matrix. Find ∇f(x) and D²f(x).

We get

f(x) = Σ_{i=1}^n Σ_{j=1}^n A_ij x_i x_j

and so

∂f/∂x_p (x) = Σ_{i=1}^n Σ_{j=1}^n A_ij ∂/∂x_p (x_i x_j) = Σ_{i=1}^n Σ_{j=1}^n A_ij ( x_j ∂x_i/∂x_p + x_i ∂x_j/∂x_p ).


Also

∂x_i/∂x_p = 1 if i = p, 0 if i ≠ p, i.e. ∂x_i/∂x_p = δ_ip.

Therefore

∂f/∂x_p (x) = Σ_{i=1}^n Σ_{j=1}^n A_ij (δ_ip x_j + x_i δ_jp) = Σ_{j=1}^n A_pj x_j + Σ_{i=1}^n A_ip x_i,

and therefore

[∇f(x)]_p = [Ax]_p + [A^T x]_p,   ∇f(x) = Ax + A^T x = 2Ax if A is symmetric.

For the Hessian we get

∂²f/∂x_q∂x_p (x) = ∂/∂x_q ( ∂f/∂x_p (x) ) = ∂/∂x_q ( [Ax]_p + [A^T x]_p ) = ∂/∂x_q ( Σ_{j=1}^n A_pj x_j + Σ_{i=1}^n A_ip x_i ) = A_pq + (A^T)_pq,

and so for A symmetric, D²f(x) = 2A. Note the analogy with derivatives of functions of one variable:

f(x) = ax²,      f′(x) = 2ax,     f″(x) = 2a;
f(x) = x^T A x,  ∇f(x) = 2Ax,     D²f(x) = 2A.
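A quick numerical sanity check of ∇f(x) = 2Ax for f(x) = x^T A x (a sketch of mine; the matrix, evaluation point and finite-difference step are arbitrary illustrative choices):

import numpy as np

A = np.array([[2.0, -1.0], [-1.0, 3.0]])       # any symmetric matrix
f = lambda x: x @ A @ x
x, h = np.array([0.7, -1.2]), 1e-6

# central finite differences approximate the partial derivatives ∂f/∂x_p
grad_fd = np.array([(f(x + h * e) - f(x - h * e)) / (2 * h) for e in np.eye(2)])
print(np.allclose(grad_fd, 2 * A @ x, atol=1e-5))   # True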

Definition. A function f : R^n → R has a local maximum [minimum] at a if for all u ∈ R^n, ∥u∥ = 1, there exists ε > 0 such that

f(a + hu) ≤ f(a)   [≥]

for all h ∈ [0, ε].

For n = 1, f′(a) = 0 and f″(a) > 0 [< 0] are sufficient conditions for f to have a local minimum [maximum] at x = a, as

f(a ± h) = f(a) ± hf′(a) + (1/2)h²f″(a) + O(h³) = f(a) + (1/2)h²f″(a) + O(h³) ≥ f(a)  [≤]  for small h.


Proposition 1.5. For f : R^n → R, if ∇f(a) ≠ 0 then f(x) does not have a local minimum or maximum at x = a, i.e. ∇f(a) = 0 is a necessary condition for f(x) to have a local minimum or maximum at x = a.

Proof. We show that f does not have a maximum at a (the argument for a minimum is analogous). Let h ≥ 0 and consider

f(a + hu) = f(a) + hu^T ∇f(a) + O(h²).

Let

u = ∇f(a)/∥∇f(a)∥, so that ∥u∥ = 1. Then

f(a + hu) = f(a) + h ∥∇f(a)∥²/∥∇f(a)∥ + O(h²) = f(a) + h∥∇f(a)∥ + O(h²) > f(a) for small h > 0.

Points a where ∇f(a) = 0 are called stationary points of f(x).

Proposition 1.6. If ∇f(a) = 0 and w^T D²f(a) w > 0 [< 0] for all w ∈ R^n, w ≠ 0, then f(x) has a local minimum [maximum] at x = a.

Proof. Take u with ∥u∥ = 1 (so u ≠ 0). Then

f(a + hu) = f(a) + hu^T ∇f(a) + (1/2)h² u^T D²f(a) u + O(h³) ≥ f(a)  [≤],

since u^T ∇f(a) = 0, h² ≥ 0 and u^T D²f(a) u > 0 [< 0].

Example. For n = 2, f(x) = x_1² − 2x_1 + x_2² − 2x_2 + 1 we have

∇f(x) = ( ∂f/∂x_1 (x), ∂f/∂x_2 (x) )^T = ( 2(x_1 − 1), 2(x_2 − 1) )^T.

Look for stationary points, i.e. where ∇f(a) = 0; we get a = (1, 1). Compute the Hessian:

D²f(x) = [ ∂²f/∂x_1²      ∂²f/∂x_1∂x_2 ]   [ 2 0 ]
         [ ∂²f/∂x_1∂x_2   ∂²f/∂x_2²    ] = [ 0 2 ] = 2I^(2).


Check that for all w ∈ R², w ≠ 0,

w^T D²f(a) w = 2w^T w = 2∥w∥² > 0.

So f has a local minimum at (1, 1).

Definition. Call a matrix A ∈ R^{n×n}

positive definite if x^T A x > 0,
negative definite if x^T A x < 0,
non-negative definite if x^T A x ≥ 0,
non-positive definite if x^T A x ≤ 0,

for all x ∈ R^n, x ≠ 0.

Note. Clearly, a positive (negative) definite matrix A ∈ R^{n×n} is invertible, since there is no x ∈ R^n, x ≠ 0, such that Ax = 0; if there were, then x^T A x = x^T 0 = 0, a contradiction.

Example. For n = 2, A = [1 −1; −1 1], x = (x_1, x_2)^T,

x^T A x = x_1² + x_2² − 2x_1x_2 = (x_1 − x_2)² ≥ 0,

so A is non-negative definite but not positive definite.

Using this definition, we can restate Proposition 1.6:

Proposition 1.7. If ∇f(a) = 0 and D²f(a) is positive [negative] definite then a is a local minimum [maximum] of f.

1.6 Generalized inner product

Definition. Let A ∈ R^{n×n} be a symmetric positive definite matrix. Define the inner product ⟨·, ·⟩_A by

⟨v, u⟩_A = u^T A v

for all v, u ∈ R^n.

Note. We previously worked with ⟨u, v⟩_I = u^T v.

Check that the required properties of an inner product still hold:

• symmetry:

⟨u, v⟩_A = v^T A u = (v^T A u)^T = u^T A^T v = u^T A v = ⟨v, u⟩_A,


• linearity:

⟨u, αv + βw⟩_A = α⟨u, v⟩_A + β⟨u, w⟩_A,
⟨αu + βv, w⟩_A = α⟨u, w⟩_A + β⟨v, w⟩_A,

for all u, v, w ∈ R^n and α, β ∈ R.

Definition. For a symmetric positive definite matrix A ∈ R^{n×n} define the length ∥·∥_A of a vector u ∈ R^n as

∥u∥_A = (⟨u, u⟩_A)^{1/2}.

Theorem 1.8 (Generalised Cauchy-Schwarz inequality). If A ∈ R^{n×n} is symmetric positive definite then

|⟨a, b⟩_A| ≤ ∥a∥_A ∥b∥_A

for all a, b ∈ R^n, with equality iff a, b are linearly dependent.

Proof. Replace ⟨·, ·⟩ by ⟨·, ·⟩_A and ∥·∥ by ∥·∥_A in the proof of the Cauchy-Schwarz inequality.

1.7 Cholesky Factorization

An easy method of generating symmetric and positive definite matrices:

Proposition 1.9. If P ∈ R^{n×n} is invertible, then A = P^T P is symmetric and positive definite.

Proof. The matrix A is symmetric since

A^T = (P^T P)^T = P^T P = A.

It is positive definite since

x^T A x = x^T (P^T P) x = (Px)^T (Px) = ∥Px∥² ≥ 0

for all x ∈ R^n. Also, if x^T A x = 0 then ∥Px∥ = 0, so Px = 0 and hence x = 0 since P is invertible.

We now prove the reverse direction.

Theorem 1.10 (Cholesky Factorisation). Let A ∈ R^{n×n} be any symmetric positive definite matrix. Then there exists an invertible P ∈ R^{n×n} such that A = P^T P. Furthermore, we can choose P to be upper triangular with P_ii > 0, i = 1 → n, in which case we say that A = P^T P is a Cholesky Factorisation (Decomposition) of A.


Algorithm 2 Apply CGS with ⟨·, ·⟩_A to {v_i}_{i=1}^n
1: w_1 = v_1
2: u_1 = w_1/∥w_1∥_A
3: for k = 2 to n do
4:   w_k = v_k − Σ_{j=1}^{k−1} ⟨v_k, u_j⟩_A u_j
5:   u_k = w_k/∥w_k∥_A
6: end for

Proof. Let {v_i}_{i=1}^n be any n linearly independent vectors in R^n. Using the inner product induced by A, we apply Gram-Schmidt (with this inner product) to {v_i}_{i=1}^n to get {u_i}_{i=1}^n. Let U = [u_1 · · · u_n] ∈ R^{n×n}. Then (this is Lemma 1.1 generalized)

[U^T(AU)]_{ij} = u_i^T A u_j = ⟨u_i, u_j⟩_A = δ_ij

for i, j = 1 → n. So U^T A U = I^(n).

Does U^{-1} exist? This requires {u_i}_{i=1}^n to be linearly independent. Suppose there exists c ∈ R^n such that Σ_{i=1}^n c_i u_i = 0. Then

Σ_{i=1}^n c_i A u_i = A0 = 0,   u_j^T Σ_{i=1}^n c_i A u_i = 0,   Σ_{i=1}^n c_i ⟨u_i, u_j⟩_A = 0,   c_j = 0,

for j = 1 → n, and so c = 0 and {u_i}_{i=1}^n are linearly independent.

So U^{-1} exists and

U^{-1}U = I^(n) = [I^(n)]^T = [U^{-1}U]^T = U^T (U^{-1})^T,

and therefore (U^T)^{-1} = (U^{-1})^T. We let P = U^{-1} (so P is invertible). Observe that

P^T = (U^{-1})^T = (U^T)^{-1}.

Therefore

P^T P = P^T I^(n) P = P^T U^T A U P = A.

To find P upper triangular with P_ii > 0, we need to choose {v_i}_{i=1}^n to be a particular basis for R^n: for i = 1 → n let v_i = e_i^(n), where (e_i^(n))_j = δ_ij for i, j = 1 → n.


Clearly, the matrix U from CGS is then upper triangular, since each u_i is a linear combination of e_1^(n), . . . , e_i^(n). To show that U_ii > 0, observe that U_ii = (u_i)_i = (w_i/∥w_i∥_A)_i and that

w_i = e_i^(n) − Σ_{j=1}^{i−1} ⟨e_i^(n), u_j⟩_A u_j.

Since (u_j)_k = 0 for k > j, we have that (w_i)_i = (e_i^(n))_i = 1. Hence U is upper triangular with U_ii > 0.

Now choose P to be U^{-1}. Then

UP = I^(n),   i.e.   U[p_1 · · · p_n] = [e_1^(n) · · · e_n^(n)].

For each i = 1 → n solve U p_i = e_i^(n): clearly (p_i)_j = 0 for j = i + 1 → n and (p_i)_i = 1/U_ii > 0, so P is upper triangular with P_ii > 0 for i = 1 → n.

Proposition 1.11. Let A ∈ R^{n×n} be symmetric positive definite. Then A_kk > 0 for k = 1 → n and |A_jk| < (A_jj)^{1/2}(A_kk)^{1/2} for j, k = 1 → n, j ≠ k.

Proof. Since A is symmetric positive definite, by the previous theorem there exists an invertible P such that A = P^T P. Let P = [p_1 · · · p_n]. Then

A_jk = p_j^T p_k = ⟨p_j, p_k⟩

for j, k = 1 → n. So A_kk = ∥p_k∥² > 0, as p_k ≠ 0 (P is invertible and so {p_i}_{i=1}^n are linearly independent). Also

|A_jk| = |⟨p_j, p_k⟩| < ∥p_j∥∥p_k∥ = (A_jj)^{1/2}(A_kk)^{1/2}

by Cauchy-Schwarz (strict inequality as p_j and p_k are linearly independent).

Computing Cholesky Decomposition

Given A symmetric positive definite, we can find L = P^T lower triangular with L_ii > 0 such that A = LL^T by applying CGS with ⟨·, ·⟩_A to {e_i}_{i=1}^n to get {u_i}_{i=1}^n and putting P = U^{-1} = [u_1, . . . , u_n]^{-1}.

There is an easier way. Let L = [l_1, . . . , l_n] ∈ R^{n×n} and A = LL^T. Then

A_ij = Σ_{k=1}^n L_ik (L^T)_kj = Σ_{k=1}^n (l_k)_i (l_k)_j.   (†)


Also

(l_k l_k^T)_ij = (l_k)_i (l_k^T)_j = (l_k)_i (l_k)_j.

So from (†) we get

A_ij = Σ_{k=1}^n (l_k l_k^T)_ij,   i.e.   A = Σ_{k=1}^n l_k l_k^T.

Example. For n = 3, find a Cholesky Decomposition of

A = [  2   −1    0  ]
    [ −1   5/2  −1  ]
    [  0   −1   5/2 ],

i.e. find lower triangular L, L_ii > 0, i = 1 → n, such that A = LL^T.

We need to check that A is symmetric (clear) and positive definite (it is good to verify the conditions from 1.11). Take arbitrary x ∈ R³. Firstly,

x^T A x = Σ_{i=1}^3 Σ_{j=1}^3 A_ij x_i x_j = 2x_1² + (5/2)x_2² + (5/2)x_3² − 2x_1x_2 − 2x_2x_3
        ≥ 2x_1² + (5/2)x_2² + (5/2)x_3² − (x_1² + x_2²) − (x_2² + x_3²)
        = x_1² + (1/2)x_2² + (3/2)x_3² > 0 for x ≠ 0.

So let L = [l_1, l_2, l_3] be lower triangular. Then

A = LL^T = Σ_{k=1}^3 l_k l_k^T = l_1l_1^T + l_2l_2^T + l_3l_3^T.

Since L is lower triangular,

l_1l_1^T = [ · · · ]      l_2l_2^T = [ 0  0   0  ]      l_3l_3^T = [ 0 0 0 ]
           [ · · · ],                [ 0  a²  ab ],                [ 0 0 0 ]
           [ · · · ]                 [ 0  ab  b² ]                 [ 0 0 c² ].


Therefore the first column of A is generated by l_1 alone, i.e.

l_1 = Ae_1/√A_11 = Ae_1/∥e_1∥_A.

From (l_1)_1(l_1)_1 = 2, (l_1)_1(l_1)_2 = −1, (l_1)_1(l_1)_3 = 0 we get l_1 = (1/√2)(2, −1, 0)^T, a multiple of the first column of A.

Define A^(1) so that A^(1) = l_2l_2^T + l_3l_3^T:

A^(1) = A − l_1l_1^T = A − [  2  −1   0 ]   [ 0   0   0  ]
                           [ −1  1/2  0 ] = [ 0   2  −1  ]
                           [  0   0   0 ]   [ 0  −1  5/2 ].

By the same reasoning, l_2 = (1/√2)(0, 2, −1)^T, a multiple of the second column of A^(1).

Define A^(2) so that A^(2) = l_3l_3^T:

A^(2) = A^(1) − l_2l_2^T = A^(1) − [ 0  0    0  ]   [ 0 0 0 ]
                                   [ 0  2   −1  ] = [ 0 0 0 ]
                                   [ 0 −1   1/2 ]   [ 0 0 2 ],

and so l_3 = (1/√2)(0, 0, 2)^T, a multiple of the third column of A^(2).

Putting these together gives

L = (1/√2) [  2   0   0 ]
           [ −1   2   0 ]
           [  0  −1   2 ].

Now consider the above constructive algorithm in the general case, i.e. A ∈ R^{n×n} symmetric positive definite. Since A_11 > 0, we can start the algorithm by defining

l_1 = Ae_1/√A_11.

Then A^(1) = A − l_1l_1^T is symmetric (since A and l_1l_1^T are symmetric) and has the form

A^(1) = [ 0  0 · · · 0 ]
        [ 0            ]
        [ ⋮      B     ]
        [ 0            ]


with B symmetric. To continue, we need to show that B is positive definite, and so B_kk > 0.

Theorem 1.12. The matrix B ∈ R^{(n−1)×(n−1)} defined above is positive definite.

Proof. We need to show that u^T B u > 0 for all u ∈ R^{n−1}, u ≠ 0. Take u ∈ R^{n−1}, u ≠ 0. Construct v = (0, u)^T ∈ R^n (hence v ≠ 0); e_1^T v = 0 means that e_1 and v are linearly independent. Then

A^(1) = A − (Ae_1)(Ae_1)^T/∥e_1∥²_A,   v^T A^(1) v = u^T B u.

So

u^T B u = v^T A v − (e_1^T A v)²/∥e_1∥²_A = ( ∥v∥²_A ∥e_1∥²_A − [⟨e_1, v⟩_A]² ) / ∥e_1∥²_A.

By Cauchy-Schwarz, |⟨e_1, v⟩_A| < ∥e_1∥_A ∥v∥_A. Hence u^T B u > 0.

Also B_11 > 0, and so A^(1)_22 > 0; the procedure can continue.

Application of Cholesky Decomposition

Given A ∈ R^{n×n} symmetric positive definite, we can find L lower triangular with L_ii > 0 such that A = LL^T. To solve Ax = b for a given b ∈ R^n, write

LL^T x = b

and let z = L^T x. Solve Lz = b by forward substitution:

z_1 = b_1/L_11,   z_k = ( b_k − Σ_{j=1}^{k−1} L_kj z_j ) / L_kk

for k = 2 → n. Having z, solve L^T x = z by backward substitution.
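The constructive argument above (peel off l_k l_k^T column by column, then solve by forward and backward substitution) can be sketched in a few lines of NumPy; the function name and right-hand side are my own illustrative choices:

import numpy as np

def cholesky_lower(A):
    # assumes A is symmetric positive definite
    A = A.astype(float).copy()
    n = A.shape[0]
    L = np.zeros_like(A)
    for k in range(n):
        L[:, k] = A[:, k] / np.sqrt(A[k, k])   # l_k is a multiple of the k-th column
        A -= np.outer(L[:, k], L[:, k])        # peel off l_k l_k^T
    return L

A = np.array([[2.0, -1.0, 0.0], [-1.0, 2.5, -1.0], [0.0, -1.0, 2.5]])
L = cholesky_lower(A)
print(np.allclose(L @ L.T, A))                 # True

b = np.array([1.0, 0.0, 1.0])
z = np.linalg.solve(L, b)                      # forward substitution: L z = b
x = np.linalg.solve(L.T, z)                    # backward substitution: L^T x = z
print(np.allclose(A @ x, b))                   # True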

1.8 Least Squares Problems

Example. Take a pendulum with length l, measure the period T and estimate g (the acceleration due to gravity). We have

L = √l,   C = 2π/√g,   CL = T.

Do m experiments to get

LC = T

with L, T ∈ R^m. Plot the data (T_i against L_i) and fit a straight line through the data. Choose C to minimize the sum of squares of the errors, i.e. such that

S = Σ_{i=1}^m (T_i − CL_i)² = ∥T − CL∥² = ⟨T − CL, T − CL⟩ = ∥T∥² − 2C⟨L, T⟩ + C²∥L∥²

is minimal. The derivative

dS/dC = −2⟨L, T⟩ + 2C∥L∥²

equals 0 iff C = ⟨L, T⟩/∥L∥². Check the second derivative:

d²S/dC² = 2∥L∥² > 0.

Take C⋆ = ⟨L, T⟩/∥L∥². Then

⟨T − C⋆L, L⟩ = ⟨T, L⟩ − C⋆∥L∥² = 0,

i.e. the choice of C⋆ makes T − C⋆L perpendicular to L.

1.8.1 General Least Squares Case

Given A ∈ R^{m×n} (m ≥ n) and b ∈ R^m, find x ∈ R^n such that Ax = b. For m > n there is in general no solution, as we have an overdetermined system. We are concerned with finding x⋆ ∈ R^n which minimizes ∥Ax − b∥ over x. Let

Q(x) = ∥Ax − b∥² = ⟨Ax − b, Ax − b⟩ = (Ax − b)^T(Ax − b)
     = x^T A^T A x − b^T A x − x^T A^T b + b^T b
     = x^T A^T A x − 2b^T A x + ∥b∥²
     = x^T G x − 2µ^T x + ∥b∥²,

where

G = A^T A ∈ R^{n×n},   µ = A^T b ∈ R^n.

Note that G is symmetric. Taking derivatives of Q we get

∇Q(x) = 2(Gx − µ),   D²Q(x) = 2G.


Theorem 1.13. Let A ∈ R^{m×n} (m ≥ n) have linearly independent columns and let b ∈ R^m. Then A^T A ∈ R^{n×n} is symmetric positive definite. Moreover, the x⋆ ∈ R^n solving A^T A x⋆ = A^T b is the unique minimum of Q(x) = ∥Ax − b∥² over x ∈ R^n.

Note. The equations A^T A x⋆ = A^T b are called the normal equations and x⋆ is called the least squares solution of Ax = b.

Proof. The matrix A^T A is clearly symmetric, as shown above. Let A = [a_1 · · · a_n], a_i ∈ R^m, with {a_i}_{i=1}^n linearly independent. Then for any c ∈ R^n

c^T A^T A c = (Ac)^T Ac = ∥Ac∥² ≥ 0,

with equality iff Ac = 0, i.e. iff c = 0 since {a_i}_{i=1}^n is linearly independent. Hence A^T A is positive definite.

To find the minimum of Q(x), find x⋆ such that ∇Q(x⋆) = 0 and D²Q(x⋆) is positive definite. We get

∇Q(x) = 2(Gx − µ) = 2(A^T A x − A^T b),   D²Q(x) = 2G = 2A^T A.

Therefore x⋆ has to solve A^T A x = A^T b. As A^T A is positive definite, (A^T A)^{-1} exists. Hence there exists a unique x⋆ solving A^T A x = A^T b. As D²Q(x⋆) is positive definite, x⋆ is the unique global minimum of Q(x) = ∥Ax − b∥².

Example. For m = 3, n = 2,

A = [  3  65 ]        b = [ 1 ]
    [  4   0 ],           [ 1 ]
    [ 12  13 ]            [ 1 ].

It is obvious that no x ∈ R² solves Ax = b. Find the least squares solution x⋆ ∈ R²: solve the normal equations

A^T A x⋆ = A^T b

to get x⋆ = (0.090587 . . . , 0.010515 . . .)^T.

In practice it is not a good idea to solve the normal equations, since the matrix A^T A is generally badly conditioned. A matrix B ∈ R^{n×n} is ill-conditioned if small changes to b lead to large changes in the solution of Bx = b, i.e. if in

B(x + δx) = b + δb

δx is large for small δb.

We now find x⋆ using the QR approach. Using a sequence of Givens rotations, we can find G orthogonal such that GA = R is upper triangular with R_ii > 0. Then A = G^T R and

Rx = Gb,

with

Gb = ((Gb)_1, . . . , (Gb)_n, 0, . . . , 0)^T + (0, . . . , 0, (Gb)_{n+1}, . . . , (Gb)_m)^T = α + β.

If β = 0, then there exists a unique solution to Rx = α = Gb, so there exists a unique solution x to Ax = b. If β ≠ 0 then Rx = Gb is an inconsistent system and has no solution x, and neither does Ax = b. However, we can solve the first n equations, Rx⋆ = α. We claim that this x⋆ ∈ R^n is the least squares solution of Ax = b, and that ∥β∥ = ∥Ax⋆ − b∥.
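For the 3 × 2 example above, both routes can be checked numerically. The sketch below is my own; numpy.linalg.lstsq solves the least squares problem via an orthogonal factorisation, and we compare it with the normal-equations solution:

import numpy as np

A = np.array([[3.0, 65.0], [4.0, 0.0], [12.0, 13.0]])
b = np.array([1.0, 1.0, 1.0])

x_normal = np.linalg.solve(A.T @ A, A.T @ b)     # normal equations A^T A x = A^T b
x_lsq, res, rank, sv = np.linalg.lstsq(A, b, rcond=None)

print(x_normal)                                  # approx [0.090587, 0.010515]
print(np.allclose(x_normal, x_lsq))              # True for this well-behaved example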

1.9 A more abstract approach

A more abstract definition of the inner product:

Definition. Let V be a real vector space. An inner product on V × V is a function ⟨·, ·⟩ : V × V → R such that, for all u, v, w ∈ V, λ, µ ∈ R,

(1) ⟨λu + µv, w⟩ = λ⟨u, w⟩ + µ⟨v, w⟩,

(2) ⟨u, v⟩ = ⟨v, u⟩,

(3) ⟨u, u⟩ ≥ 0 with equality iff u = 0.

An inner product induces a norm ∥u∥ = (⟨u, u⟩)^{1/2} for all u ∈ V. This implies ∥u∥ = 0 iff u = 0.

Example. Let V = C[a, b] be the continuous functions over [a, b]. Let w ∈ C[a, b] with w(x) > 0 for all x ∈ [a, b]. Define ⟨f, g⟩ = ∫_a^b w(x)f(x)g(x)dx. Clearly (1) and (2) hold. Also

⟨f, f⟩ = ∫_a^b w(x)(f(x))² dx ≥ 0,

and ⟨f, f⟩ = 0 implies f ≡ 0.

Let V be a real vector space with inner product ⟨·, ·⟩. Let U be a finite dimensional subspace of V with basis {φ_i}_{i=1}^n. Given v ∈ V, find u⋆ ∈ U such that ∥v − u⋆∥ ≤ ∥v − u∥ for all u ∈ U.

Example. Let V = C[a, b] and ⟨f, g⟩ = ∫_a^b f(x)g(x)dx (i.e. w(x) = 1). Let U be the polynomials of degree ≤ n − 1 with basis φ_i = x^{i−1}.


We have u ∈ U implies u = Σ_{i=1}^n λ_i φ_i with λ_i ∈ R. Also u⋆ ∈ U implies u⋆ = Σ_{i=1}^n λ⋆_i φ_i with λ⋆_i ∈ R. Therefore

∥v − u⋆∥² ≤ ∥v − u∥²,   i.e.   ∥v − Σ_{i=1}^n λ⋆_i φ_i∥² ≤ ∥v − Σ_{j=1}^n λ_j φ_j∥².

Let E(λ) = ∥v − Σ_{i=1}^n λ_i φ_i∥². Now we have to find λ⋆ ∈ R^n such that E(λ⋆) ≤ E(λ) for all λ ∈ R^n. We have

E(λ) = ⟨v − Σ_{j=1}^n λ_j φ_j, v − Σ_{i=1}^n λ_i φ_i⟩ = ∥v∥² − 2 Σ_{i=1}^n λ_i ⟨v, φ_i⟩ + Σ_{i=1}^n Σ_{j=1}^n λ_i λ_j ⟨φ_i, φ_j⟩.

Let µ ∈ R^n where µ_i = ⟨v, φ_i⟩. Let G ∈ R^{n×n} where G_ij = ⟨φ_i, φ_j⟩. Now we have

E(λ) = ∥v∥² − 2µ^T λ + λ^T G λ,   ∇E(λ) = −2µ + 2Gλ,   D²E(λ) = 2G.

So λ⋆ minimises E(λ) if ∇E(λ⋆) = 0, which is equivalent to Gλ⋆ = µ. The matrix G is called the Gram matrix and depends on the basis for U. It is sometimes written as G(φ_1, . . . , φ_n).

Lemma 1.14. Let {φ_i}_{i=1}^n be a basis of U. Let G ∈ R^{n×n} be such that G_ij = ⟨φ_i, φ_j⟩. Then G is positive definite.

Proof. Check that for any λ ∈ R^n

λ^T G λ = Σ_{i=1}^n Σ_{j=1}^n λ_i λ_j ⟨φ_i, φ_j⟩ = ⟨Σ_{i=1}^n λ_i φ_i, Σ_{j=1}^n λ_j φ_j⟩ = ∥Σ_{i=1}^n λ_i φ_i∥² ≥ 0.

This equals zero only if Σ_{i=1}^n λ_i φ_i = 0. As the φ_i's are linearly independent this implies λ = 0. Therefore λ^T G λ > 0 for all λ ≠ 0.


As G is positive definite, we can deduce that G^{-1} exists, and therefore there is a unique λ⋆ ∈ R^n solving Gλ⋆ = µ, i.e. ∇E(λ⋆) = 0, and therefore λ⋆ is a global minimum of E(·).

Theorem 1.15 (Orthogonality Property). Finding λ⋆ ∈ R^n which minimises E(λ) is equivalent to finding u⋆ = Σ_{i=1}^n λ⋆_i φ_i ∈ U such that ⟨v − u⋆, u⟩ = 0 for all u ∈ U.

Proof. Gλ⋆ = µ implies λ^T Gλ⋆ = λ^T µ for all λ ∈ R^n; conversely, taking λ = e_i^(n) gives (Gλ⋆)_i = µ_i, and repeating for i = 1 → n shows the two statements are equivalent. So

Gλ⋆ = µ
⟺ λ^T Gλ⋆ = λ^T µ for all λ ∈ R^n
⟺ Σ_{i=1}^n Σ_{j=1}^n λ_i G_ij λ⋆_j = Σ_{i=1}^n λ_i µ_i
⟺ ⟨Σ_{i=1}^n λ_i φ_i, Σ_{j=1}^n λ⋆_j φ_j⟩ = ⟨Σ_{i=1}^n λ_i φ_i, v⟩
⟺ ⟨u, u⋆⟩ = ⟨u, v⟩
⟺ ⟨v − u⋆, u⟩ = 0,

where u = Σ_{i=1}^n λ_i φ_i ranges over U.

Example. Let V = C[0, 1] and ⟨f, g⟩ = ∫_0^1 f(x)g(x)dx, and let U = P_{n−1}. Take φ_i = x^{i−1}. Given v ∈ V, find u⋆ = Σ_{i=1}^n λ⋆_i x^{i−1} such that

∥v − u⋆∥ ≤ ∥v − u∥,   i.e.   ∫_0^1 (v − u⋆)² dx ≤ ∫_0^1 (v − u)² dx,

for all u ∈ U. We now have to solve the normal equations Gλ⋆ = µ, where

µ_i = ⟨v, φ_i⟩ = ∫_0^1 v(x)x^{i−1} dx,
G_ij = ⟨φ_i, φ_j⟩ = ∫_0^1 x^{i−1}x^{j−1} dx = ∫_0^1 x^{i+j−2} dx = 1/(i + j − 1).


This gives the Hilbert Matrix

G = [  1    1/2    · · ·   1/n     ]
    [ 1/2   1/3    · · ·  1/(n+1)  ]
    [  ⋮     ⋮       ⋱       ⋮     ]
    [ 1/n  1/(n+1) · · ·  1/(2n−1) ],

which is very badly conditioned: its columns become nearly linearly dependent as n grows. We need to change basis; we have two options (a small numerical illustration follows the list):

1. We can use the Gram-Schmidt algorithm to change the basis to an orthonormal basis {ψ_i}_{i=1}^n where ⟨ψ_i, ψ_j⟩ = δ_ij. This implies that G = I.

2. We can also create an orthogonal basis {ψ_i}_{i=1}^n where ⟨ψ_i, ψ_j⟩ = 0 for i ≠ j. Now G is diagonal and G_ii = ∥ψ_i∥² > 0. We have λ⋆_i = µ_i/∥ψ_i∥² and therefore u⋆ = Σ_{i=1}^n (µ_i/∥ψ_i∥²) ψ_i.
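To see how quickly the Hilbert Gram matrix degenerates, here is a small numerical illustration (my own, not part of the notes):

import numpy as np

for n in (4, 8, 12):
    G = np.array([[1.0 / (i + j - 1) for j in range(1, n + 1)]
                  for i in range(1, n + 1)])
    # the condition number measures how much errors in µ are amplified in λ*
    print(n, np.linalg.cond(G))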

Example. Let V = R^m and ⟨a, b⟩ = a^T b. Let U = Span{a_i}_{i=1}^n with n ≤ m, so that {a_i}_{i=1}^n is a basis for U. Given v ∈ R^m, we want to find u⋆ = Σ_{i=1}^n λ⋆_i a_i such that ∥v − u⋆∥ ≤ ∥v − u∥ for all u ∈ U. We need to solve the normal equations Gλ⋆ = µ, where

µ_i = ⟨v, a_i⟩ = a_i^T v,   G_ij = ⟨a_i, a_j⟩ = a_i^T a_j.

Let A = [a_1 · · · a_n], so A^T A = G and µ = A^T v. The equations Gλ⋆ = µ are therefore exactly the normal equations

A^T A λ⋆ = A^T v

for the least squares problem Aλ ≈ v. As before, A^T A is typically ill-conditioned, so we should not solve these normal equations directly but use the QR approach instead.

1.10 Orthogonal Polynomials

Let V = C[a, b] and ⟨f, g⟩ = ∫_a^b w(x)f(x)g(x)dx, where w is a weight function, w ∈ C(a, b), with w ≥ 0 except possibly at a finite number of zeros. This is required for the integral to be well-defined:

|⟨f, g⟩| = |∫_a^b w(x)f(x)g(x)dx| ≤ ∫_a^b |w(x)f(x)g(x)| dx = ∫_a^b w(x)|f(x)g(x)| dx ≤ ∫_a^b w(x)dx · max_{a≤x≤b}|f(x)| · max_{a≤x≤b}|g(x)|.

Therefore ⟨·, ·⟩ is well-defined if ∫_a^b w(x)dx < ∞.

Let U = P_n be the polynomials of degree ≤ n. The natural basis {x^i}_{i=0}^n leads to an ill-conditioned Gram matrix. We will construct a new basis {φ_i}_{i=0}^n for P_n, where φ_j(x) is a monic polynomial of degree j, i.e. φ_j(x) = x^j + Σ_{i=0}^{j−1} a_ij x^i.

Theorem 1.16. Monic orthogonal polynomials φ_j ∈ P_j satisfy the three term recurrence relation, for j ≥ 1,

φ_{j+1}(x) = (x − a_j)φ_j(x) − b_j φ_{j−1}(x),

where

a_j = ⟨xφ_j, φ_j⟩/∥φ_j∥²   and   b_j = ∥φ_j∥²/∥φ_{j−1}∥².

Proof. Let φ_j ∈ P_j be monic. This implies that

φ_{j+1}(x) − xφ_j(x) ∈ P_j,   so   φ_{j+1}(x) − xφ_j(x) = Σ_{k=0}^j b_k x^k = Σ_{k=0}^j c_k φ_k(x).

Now we need to find the c_k. We have

⟨Σ_{k=0}^j c_k φ_k(x), φ_i(x)⟩ = ⟨φ_{j+1}(x) − xφ_j(x), φ_i(x)⟩.

But φ_j is orthogonal to φ_k for k = 0 → j − 1, and therefore φ_j is orthogonal to any p ∈ P_{j−1}, as {φ_k}_{k=0}^{j−1} is a basis for P_{j−1}. Then for i = 0 → j

c_i ∥φ_i∥² = ⟨φ_{j+1}, φ_i⟩ − ⟨xφ_j, φ_i⟩ = −⟨φ_j, xφ_i⟩.

We have xφ_i ∈ P_{i+1} and hence ⟨φ_j, xφ_i⟩ = 0 if i ≤ j − 2. Since c_i ∥φ_i∥² = −⟨φ_j, xφ_i⟩, we have c_i = 0 for i = 0 → j − 2. Hence

φ_{j+1}(x) − xφ_j(x) = c_{j−1}φ_{j−1}(x) + c_j φ_j(x).

This implies φ_{j+1}(x) = (x + c_j)φ_j(x) + c_{j−1}φ_{j−1}(x).


We have

c_{j−1} = −⟨φ_j, xφ_{j−1}⟩/∥φ_{j−1}∥²,   c_j = −⟨φ_j, xφ_j⟩/∥φ_j∥².

Now note that

⟨φ_j, xφ_{j−1}⟩ = ⟨φ_j, xφ_{j−1} − φ_j⟩ + ⟨φ_j, φ_j⟩ = ∥φ_j∥²,

since xφ_{j−1} − φ_j ∈ P_{j−1}. Therefore c_{j−1} = −∥φ_j∥²/∥φ_{j−1}∥². Set b_j = −c_{j−1} and a_j = −c_j.

To apply this theorem we need φ_0(x) = 1 and φ_1(x) = x − a_0, where a_0 ∈ R must be chosen such that ⟨φ_1, φ_0⟩ = 0, i.e.

⟨x − a_0, 1⟩ = 0,   a_0⟨1, 1⟩ = ⟨x, 1⟩,   a_0 = ⟨x, 1⟩/∥1∥² = ⟨xφ_0, φ_0⟩/∥φ_0∥².

We can use the theorem for j ≥ 0 by setting φ_{−1}(x) = 0. Thus

φ_{j+1}(x) = (x − a_j)φ_j(x) − b_j φ_{j−1}(x)   for j ≥ 0,

where

a_j = ⟨xφ_j, φ_j⟩/∥φ_j∥²,   b_j = ∥φ_j∥²/∥φ_{j−1}∥²,   φ_0(x) = 1,   φ_{−1}(x) = 0.

Remark 1.2. Recall that g(x) is even iff g(x) = g(−x), in which case ∫_{−a}^{a} g(x)dx = 2∫_0^a g(x)dx, and g(x) is odd iff g(x) = −g(−x), in which case ∫_{−a}^{a} g(x)dx = 0.

Example. Let ⟨f, g⟩ = ∫_{−1}^{1} f(x)g(x)dx be our inner product (i.e. w(x) = 1). We shall apply our method, starting with j = 0. We have φ_0(x) = 1 and φ_1(x) = x − a_0, where

a_0 = ⟨xφ_0, φ_0⟩/∥φ_0∥² = ∫_{−1}^{1} x dx / ∫_{−1}^{1} 1 dx = 0


(since x is an odd function), so φ_1(x) = x. Using the recurrence with j = 1 we deduce φ_2(x) = (x − a_1)φ_1(x) − b_1φ_0(x) = x² − a_1x − b_1. Then

a_1 = ⟨xφ_1, φ_1⟩/∥φ_1∥² = ∫_{−1}^{1} x³ dx / ∥φ_1∥² = 0,
b_1 = ∥φ_1∥²/∥φ_0∥² = ∫_{−1}^{1} x² dx / ∫_{−1}^{1} 1 dx = 1/3.

So φ_2(x) = x² − 1/3, and we can continue in this manner.
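The recurrence is easy to run mechanically. A sketch in NumPy (coefficient arrays in increasing degree; the exact-integration inner product and function names are my own choices) reproduces φ_2(x) = x² − 1/3:

import numpy as np
from numpy.polynomial import polynomial as P

def inner(p, q):
    # <p, q> = ∫_{-1}^{1} p(x) q(x) dx, via the antiderivative of the product
    r = P.polyint(P.polymul(p, q))
    return P.polyval(1.0, r) - P.polyval(-1.0, r)

def monic_orthogonal(n):
    phis = [np.array([1.0])]                              # phi_0 = 1
    a0 = inner([0.0, 1.0], [1.0]) / inner([1.0], [1.0])
    phis.append(np.array([-a0, 1.0]))                     # phi_1 = x - a_0
    for j in range(1, n):
        phi_j, phi_jm1 = phis[j], phis[j - 1]
        a = inner(P.polymul([0.0, 1.0], phi_j), phi_j) / inner(phi_j, phi_j)
        b = inner(phi_j, phi_j) / inner(phi_jm1, phi_jm1)
        phis.append(P.polysub(P.polymul([-a, 1.0], phi_j), b * phi_jm1))
    return phis

print(monic_orthogonal(2)[2])    # [-0.3333, 0, 1], i.e. phi_2(x) = x^2 - 1/3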

Recall now our original problem. Given f ∈ C[a, b] we wish to find p⋆_n ∈ P_n such that ∥f − p⋆_n∥ ≤ ∥f − p_n∥ for all p_n ∈ P_n.

We use the orthogonal basis {φ_j}_{j=0}^n for P_n, so p⋆_n = Σ_{j=0}^n λ⋆_j φ_j(x). We solve the normal equations Gλ⋆ = µ with G ∈ R^{(n+1)×(n+1)}, where for i, j = 0 → n

G_ij = ⟨φ_i, φ_j⟩ = 0 if i ≠ j,  ∥φ_i∥² if i = j,
µ_i = ⟨f, φ_i⟩,
λ⋆_i = µ_i/G_ii = µ_i/∥φ_i∥².

This implies that

p⋆_n(x) = Σ_{j=0}^n ( ⟨f, φ_j⟩/∥φ_j∥² ) φ_j(x)

is the best approximation to f.

Example. Show that the polynomials T_k(x) = cos(k cos^{-1}(x)) for −1 ≤ x ≤ 1 are orthogonal with respect to the inner product ⟨f, g⟩ = ∫_{−1}^{1} (1 − x²)^{-1/2} f(x)g(x)dx. Does T_k(x) belong to P_k?

T_0(x) = cos 0 = 1,   T_1(x) = cos(cos^{-1} x) = x.


Let us use the change of variable θ = cos^{-1} x, so x = cos θ. Now we can write T_k(x) = cos kθ. Using cos((k + 1)θ) + cos((k − 1)θ) = 2 cos kθ cos θ we can deduce the following:

T_{k+1}(x) + T_{k−1}(x) = 2xT_k(x),   i.e.   T_{k+1}(x) = 2xT_k(x) − T_{k−1}(x).

We have

T_2(x) = 2xT_1(x) − T_0(x) = 2x² − 1,
T_3(x) = 2xT_2(x) − T_1(x) = 4x³ − 3x.

By induction we have T_k(x) ∈ P_k; the coefficient of x^k is 2^{k−1}. Using x = cos θ,

⟨T_k(x), T_j(x)⟩ = ∫_{−1}^{1} (1 − x²)^{-1/2} T_k(x)T_j(x)dx
                 = ∫_{π}^{0} (sin θ)^{-1} cos(kθ)cos(jθ)(−sin θ)dθ
                 = ∫_{0}^{π} cos(kθ)cos(jθ)dθ
                 = (1/2) ∫_{0}^{π} [cos((j + k)θ) + cos((j − k)θ)] dθ
                 = 0 if j ≠ k,   π/2 if j = k ≠ 0,   π if j = k = 0.

Call the T_k(x) the Chebyshev polynomials.


Chapter 2

Polynomial interpolation

Given {(z_j, f_j)}_{j=0}^n, z_j, f_j ∈ C with the z_j distinct, we want to find a polynomial p_n(z) ∈ P_n such that p_n(z_j) = f_j for j = 0 → n. Call such a p_n the interpolating polynomial. To prove that this polynomial exists:

Lemma 2.1 (Lagrange Basis Function). Let

l_j(z) = Π_{k=0, k≠j}^{n} (z − z_k)/(z_j − z_k)

for j = 0 → n. Then l_j(z) ∈ P_n and l_j(z_r) = δ_jr for j, r = 0 → n.

Proof. For j = 0 → n, l_j(z) is a product of n factors of the form (z − z_k)/(z_j − z_k) and therefore l_j(z) ∈ P_n. We have

l_j(z_r) = Π_{k=0, k≠j}^{n} (z_r − z_k)/(z_j − z_k)

for r = 0 → n. If r = j, then clearly l_j(z_r) = 1. Otherwise the factor with k = r vanishes and so l_j(z_r) = 0. Hence l_j(z_r) = δ_jr.

Lemma 2.2. The interpolating polynomial p_n(z) ∈ P_n for data {(z_j, f_j)}_{j=0}^n with z_j distinct is

p_n(z) = Σ_{j=0}^n f_j l_j(z).

Note. Call p_n in this form the Lagrange form of the interpolating polynomial.

Proof. We have p_n(z) ∈ P_n since each l_j(z) ∈ P_n. Also, by the previous lemma, for r = 0 → n,

p_n(z_r) = Σ_{j=0}^n f_j l_j(z_r) = Σ_{j=0}^n f_j δ_jr = f_r.


To prove the uniqueness of the interpolating polynomial:

Theorem 2.3 (Fundamental Theorem of Algebra). Let p_n(z) = a_0 + a_1z + · · · + a_nz^n ∈ P_n, where a_i ∈ C. If p_n(z) has more than n distinct roots in C, then a_i = 0 for i = 0 → n.

Lemma 2.4. Given {(z_j, f_j)}_{j=0}^n, z_j distinct, there exists a unique interpolating polynomial p_n(z) ∈ P_n.

Proof. Assume the contrary, i.e. there exists q_n ∈ P_n, q_n ≠ p_n, such that p_n(z_j) = q_n(z_j) = f_j for j = 0 → n. Consider the polynomial (p_n − q_n) ∈ P_n. Then

(p_n − q_n)(z_j) = p_n(z_j) − q_n(z_j) = 0

for j = 0 → n. Hence (p_n − q_n) has n + 1 roots and therefore, by the Fundamental Theorem of Algebra, is identically 0, i.e. p_n = q_n. Hence p_n is unique.

Example (of interpolating polynomial). For n = 2, find p_2 ∈ P_2 such that p_2(0) = a, p_2(1) = b and p_2(4) = c. We get

l_0(z) = (z − z_1)(z − z_2)/[(z_0 − z_1)(z_0 − z_2)] = (z − 1)(z − 4)/[(0 − 1)(0 − 4)] = (1/4)(z² − 5z + 4),
l_1(z) = (z − 0)(z − 4)/[(1 − 0)(1 − 4)] = −(1/3)(z² − 4z),
l_2(z) = (z − 0)(z − 1)/[(4 − 0)(4 − 1)] = (1/12)(z² − z).

Hence

p_2(z) = a l_0(z) + b l_1(z) + c l_2(z) = (a/4 − b/3 + c/12) z² − (5a/4 − 4b/3 + c/12) z + a

in Lagrange form.
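A direct evaluation of the Lagrange form can be sketched in a few lines (the helper name and data values below are my own, not from the notes):

import numpy as np

def lagrange_eval(zs, fs, z):
    # p_n(z) = sum_j f_j * l_j(z), with l_j built directly from the product formula
    total = 0.0
    for j, (zj, fj) in enumerate(zip(zs, fs)):
        lj = np.prod([(z - zk) / (zj - zk) for k, zk in enumerate(zs) if k != j])
        total += fj * lj
    return total

zs, fs = [0.0, 1.0, 4.0], [2.0, 5.0, 3.0]         # playing the roles of a, b, c
print([lagrange_eval(zs, fs, z) for z in zs])      # recovers [2.0, 5.0, 3.0]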

We are also interested in finding the coefficients of the interpolating polynomial in the canonical form

p_n(z) = Σ_{k=0}^n a_k z^k.

Consider the equations

p_n(z_j) = Σ_{k=0}^n a_k z_j^k = f_j

for j = 0 → n. We get a system of equations

[ 1  z_0  z_0² · · · z_0^n ] [ a_0 ]   [ f_0 ]
[ 1  z_1  z_1² · · · z_1^n ] [  ⋮  ] = [  ⋮  ]
[ ⋮   ⋮    ⋮           ⋮   ] [     ]   [     ]
[ 1  z_n  z_n² · · · z_n^n ] [ a_n ]   [ f_n ].


Call the coefficient matrix V the Vandermonde Matrix; we need to solve V a = f. In general, V is ill-conditioned. Instead of the canonical basis {z^k}_{k=0}^n we could use the Lagrange basis {l_k(z)}_{k=0}^n, for which the system to solve becomes Ia = f; however, the Lagrange basis has to be constructed.

Assume we have found p_{n−1} ∈ P_{n−1} interpolating {(z_j, f_j)}_{j=0}^{n−1} and are given a new data point (z_n, f_n). One cannot reuse p_{n−1} directly to compute p_n in Lagrange form, since it is necessary to compute a whole new Lagrange basis for P_n.

We now look for an alternative construction. If p_{n−1} ∈ P_{n−1} is such that p_{n−1}(z_j) = f_j for j = 0 → n − 1, let p_n ∈ P_n be such that p_n(z_j) = f_j for j = 0 → n and

p_n(z) = p_{n−1}(z) + c Π_{k=0}^{n−1} (z − z_k).

Clearly p_n(z_j) = p_{n−1}(z_j) = f_j for j = 0 → n − 1. Choose c ∈ C such that

p_n(z_n) = p_{n−1}(z_n) + c Π_{k=0}^{n−1} (z_n − z_k) = f_n,

that is

c = ( f_n − p_{n−1}(z_n) ) / Π_{k=0}^{n−1} (z_n − z_k).

Therefore c depends on {(z_j, f_j)}_{j=0}^n. We will use the notation c = f[z_0, z_1, . . . , z_n], so that

p_n(z) = p_{n−1}(z) + f[z_0, z_1, . . . , z_n] Π_{k=0}^{n−1} (z − z_k).

That is, the coefficient of z^n in p_n(z) is f[z_0, z_1, . . . , z_n]. Note that since the p_n with p_n(z_j) = f_j, j = 0 → n, is unique,

f[z_π(0), . . . , z_π(n)] = f[z_0, . . . , z_n]

for any permutation π of {0, 1, . . . , n}.

Lemma 2.5. For {(z_j, f_j)}_{j=0}^n, z_j, f_j ∈ C, z_j distinct,

f[z_0, z_1, . . . , z_n] = Σ_{j=0}^n f_j / Π_{k=0, k≠j}^n (z_j − z_k).

Furthermore, if f_j = f(z_j), j = 0 → n, for some function f(z), then f[z_0, . . . , z_n] = 0 if f ∈ P_{n−1}.

Proof. Compare the coefficient of z^n in the Lagrange form of p_n with

p_n(z) = p_{n−1}(z) + f[z_0, . . . , z_n] Π_{k=0}^{n−1} (z − z_k).

The Lagrange form is

p_n(z) = Σ_{j=0}^n f_j Π_{k=0, k≠j}^n (z − z_k)/(z_j − z_k) = Σ_{j=0}^n f_j ( z^n + · · · ) / Π_{k=0, k≠j}^n (z_j − z_k).


Clearly the leading coefficient of z^n in the Lagrange form is

Σ_{j=0}^n f_j / Π_{k=0, k≠j}^n (z_j − z_k) = f[z_0, . . . , z_n].

If f_j = f(z_j) for some f ∈ P_{n−1} then p_n = f ∈ P_{n−1}, as the interpolating polynomial is unique. Therefore the leading coefficient f[z_0, . . . , z_n] of z^n in p_n is 0.

Note that

p_n(z) = p_{n−1}(z) + f[z_0, . . . , z_n] Π_{k=0}^{n−1} (z − z_k),
p_{n−1}(z) = p_{n−2}(z) + f[z_0, . . . , z_{n−1}] Π_{k=0}^{n−2} (z − z_k),
  ⋮
p_1(z) = p_0(z) + f[z_0, z_1](z − z_0),
p_0(z) = f_0 = f[z_0],

and so we can write

p_n(z) = f[z_0] + Σ_{j=1}^n f[z_0, . . . , z_j] Π_{k=0}^{j−1} (z − z_k).

Call this the Newton form of the interpolating polynomial.

2.1 Divided difference

Call f[z_0, . . . , z_n] the divided difference.

Theorem 2.6. For any distinct complex numbers z_0, z_1, . . . , z_{n+1}, the divided difference satisfies the recurrence

f[z_0, z_1, . . . , z_{n+1}] = ( f[z_0, . . . , z_n] − f[z_1, . . . , z_{n+1}] ) / (z_0 − z_{n+1}).

Proof. Given {(z_j, f_j)}_{j=0}^{n+1}, we construct p_n, q_n ∈ P_n such that p_n(z_j) = f_j for j = 0 → n and q_n(z_j) = f_j for j = 1 → n + 1. Observe that f[z_0, . . . , z_n] is the coefficient of z^n in p_n(z) and that f[z_1, . . . , z_{n+1}] is the coefficient of z^n in q_n(z). Then

r_{n+1}(z) = ( (z − z_{n+1})p_n(z) − (z − z_0)q_n(z) ) / (z_0 − z_{n+1}) ∈ P_{n+1}

and hence

r_{n+1}(z_0) = p_n(z_0) = f_0,
r_{n+1}(z_{n+1}) = q_n(z_{n+1}) = f_{n+1},
r_{n+1}(z_j) = ( (z_j − z_{n+1})f_j − (z_j − z_0)f_j ) / (z_0 − z_{n+1}) = f_j

for j = 1 → n. Therefore r_{n+1}(z) is the interpolating polynomial of {(z_j, f_j)}_{j=0}^{n+1}. Since f[z_0, . . . , z_{n+1}] is the coefficient of z^{n+1} in r_{n+1}(z),

f[z_0, . . . , z_{n+1}] = ( f[z_0, . . . , z_n] − f[z_1, . . . , z_{n+1}] ) / (z_0 − z_{n+1}).

2.2 Finding the error

Given {(z_j, f_j)}_{j=0}^n, z_j, f_j ∈ C, z_j distinct, there exists an interpolating polynomial p_n ∈ P_n such that p_n(z_j) = f_j for j = 0 → n. The Newton form of p_n(z) is

p_n(z) = f[z_0] + Σ_{j=1}^n f[z_0, . . . , z_j] Π_{k=0}^{j−1} (z − z_k).

Theorem 2.6 gives a recurrence relation

f[z_0, . . . , z_{j+1}] = ( f[z_0, . . . , z_j] − f[z_1, . . . , z_{j+1}] ) / (z_0 − z_{j+1}).

We can construct a divided difference table:

z_0, f[z_0],
z_1, f[z_1], f[z_0, z_1],
z_2, f[z_2], f[z_1, z_2], f[z_0, z_1, z_2],
 ⋮
z_n, f[z_n], f[z_{n−1}, z_n], · · · , f[z_0, . . . , z_n].

Note that the diagonal entries appear in the Newton form of p_n(z).

Example. For n = 2 and

{(z_j, f_j)}_{j=0}^2 = {(0, a), (1, b), (4, c)},

we have

f[z_0] = f_0 = a,
f[z_1] = f_1 = b,   f[z_0, z_1] = (f[z_0] − f[z_1])/(z_0 − z_1) = (a − b)/(−1) = b − a,
f[z_2] = f_2 = c,   f[z_1, z_2] = (f[z_1] − f[z_2])/(z_1 − z_2) = (b − c)/(−3) = (c − b)/3.

Therefore

f[z_0, z_1, z_2] = ( (b − a) − (c − b)/3 ) / (z_0 − z_2) = ( (b − a) − (c − b)/3 ) / (−4) = a/4 − b/3 + c/12,

and so the Newton form of p_2(z) is

p_2(z) = a + (b − a)(z − z_0) + (a/4 − b/3 + c/12)(z − z_0)(z − z_1).


Theorem 2.7. Let p_n(z) interpolate f(z) at n + 1 distinct points {z_j}_{j=0}^n, z_j ∈ C. Then the error e(z) = f(z) − p_n(z) is

e(z) = f[z_0, . . . , z_n, z] Π_{k=0}^n (z − z_k)

for z ≠ z_j, and e(z_j) = 0 for j = 0 → n.

Proof. The polynomial p_n(z) interpolates f(z) at {z_j}_{j=0}^n. Add a new distinct point z. The Newton form of p_{n+1}(z) is

p_{n+1}(z) = p_n(z) + f[z_0, . . . , z_n, z] Π_{k=0}^n (z − z_k),

and p_{n+1} interpolates f at the added point, so p_{n+1}(z) = f(z). Therefore

e(z) = f(z) − p_n(z) = f[z_0, . . . , z_n, z] Π_{k=0}^n (z − z_k).

Theorem 2.8. Let f ∈ C^n[x_0, x_n] (f and its first n derivatives continuous over [x_0, x_n]), with the x_i ordered, x_0 < x_1 < · · · < x_n. Then there exists ξ ∈ [x_0, x_n] such that

f[x_0, x_1, . . . , x_n] = (1/n!) f^(n)(ξ).

Proof. Let p_n(x) interpolate f at the x_i, i = 0 → n. Let e(x) = f(x) − p_n(x), so e(x_i) = 0, i = 0 → n; therefore e(x) has at least (n + 1) zeros in [x_0, x_n]. By Rolle's Theorem

e′(x) has at least n zeros in [x_0, x_n],
e″(x) has at least n − 1 zeros in [x_0, x_n],
 ⋮
e^(n)(x) has at least 1 zero ξ in [x_0, x_n].

Since e(x) = f(x) − p_n(x), we have e^(n)(x) = f^(n)(x) − p_n^(n)(x). The Newton form of p_n(x) is

p_n(x) = p_{n−1}(x) + f[x_0, . . . , x_n] Π_{i=0}^{n−1} (x − x_i) = f[x_0, . . . , x_n] x^n + · · · .

Therefore

p_n^(n)(x) = n! f[x_0, . . . , x_n]

and hence, as e^(n)(ξ) = 0,

f^(n)(ξ) = p_n^(n)(ξ) = n! f[x_0, . . . , x_n].


Theorem 2.9. Let f ∈ C^{n+1}[a, b] and let {x_i}_{i=0}^n be distinct points in [a, b]. If p_n ∈ P_n interpolates f at {x_i}_{i=0}^n, then e(x) = f(x) − p_n(x) satisfies

|e(x)| ≤ (1/(n + 1)!) |Π_{i=0}^n (x − x_i)| max_{a≤y≤b} |f^(n+1)(y)|

for all x ∈ [a, b].

Proof. The result is trivially true at x = x_i since e(x_i) = 0 for i = 0 → n. From Theorem 2.7,

e(x) = f[x_0, . . . , x_n, x] Π_{k=0}^n (x − x_k).

From Theorem 2.8, there exists ξ_x ∈ [a, b] such that

e(x) = ( f^(n+1)(ξ_x) / (n + 1)! ) Π_{k=0}^n (x − x_k).

Therefore

|e(x)| = (1/(n + 1)!) |Π_{k=0}^n (x − x_k)| |f^(n+1)(ξ_x)| ≤ (1/(n + 1)!) |Π_{k=0}^n (x − x_k)| max_{a≤y≤b} |f^(n+1)(y)|.

Definition. The infinity norm of g ∈ C[a, b] is

∥g∥_∞ = max_{a≤x≤b} |g(x)|.

Note. Beware that it is not true that ∥f − p_n∥_∞ → 0 as n → ∞ in all cases.

Example.

1. Let [a, b] = [−1/2, 1/2], f(x) = e^x, x_i ∈ [a, b], i = 0 → n. Then |x − x_i| ≤ 1 and so ∥Π_{i=0}^n (x − x_i)∥_∞ ≤ 1. Also ∥f^(n+1)∥_∞ = ∥e^x∥_∞ = e^{1/2}. Therefore

∥f − p_n∥_∞ ≤ e^{1/2}/(n + 1)! → 0

as n → ∞.

2. For any [a, b] and f(x) = cos x, ∥f^(n+1)∥_∞ ≤ 1. Also ∥Π_{i=0}^n (x − x_i)∥_∞ ≤ (b − a)^{n+1}, so

∥cos − p_n∥_∞ ≤ (b − a)^{n+1}/(n + 1)! → 0

as n → ∞. Therefore p_n(x) → cos x for all x.


3. Let [a, b] = [0, 1], f(x) = (1 + x)^{-1}. Then

f′(x) = (−1)(1 + x)^{-2},   f^(n+1)(x) = (−1)^{n+1}(n + 1)!(1 + x)^{-(n+2)},

and therefore ∥f^(n+1)∥_∞ ≤ (n + 1)!. Hence

∥f − p_n∥_∞ ≤ (1/(n + 1)!) (n + 1)! = 1,

a bound which does not tend to 0 as n → ∞, so this estimate alone does not guarantee convergence.

2.3 Best Approximation

Given [a, b] and f ∈ C[a, b], we want to choose interpolation points {x_k}_{k=0}^n in [a, b] to minimize ∥Π_{k=0}^n (x − x_k)∥_∞, i.e. to find

min_{{x_k}_{k=0}^n} ∥Π_{k=0}^n (x − x_k)∥_∞,

i.e.

min_{q_n ∈ P_n} ∥x^{n+1} − q_n(x)∥_∞.

Consider the more general problem: to find

min_{q_n ∈ P_n} ∥g − q_n(x)∥_∞,

that is, to find q⋆_n such that

∥g − q⋆_n∥_∞ ≤ ∥g − q_n∥_∞

for all q_n ∈ P_n. Call such a q⋆_n ∈ P_n the best approximation.

Theorem 2.10 (Equioscillation Property). Let g ∈ C[a, b] and n ≥ 0. Suppose there exist q⋆_n ∈ P_n and n + 2 distinct points {x⋆_j}_{j=0}^{n+1},

a ≤ x⋆_0 < x⋆_1 < · · · < x⋆_{n+1} ≤ b,

such that

g(x⋆_j) − q⋆_n(x⋆_j) = (−1)^j σ ∥g − q⋆_n∥_∞

for j = 0 → n + 1, where σ = ±1. Then q⋆_n is the best approximation to g from P_n with respect to ∥·∥_∞, that is

∥g − q⋆_n∥_∞ ≤ ∥g − q_n∥_∞

for all q_n ∈ P_n.

Note. Call {x⋆_k}_{k=0}^{n+1} the equioscillation points.


Proof. Let E = ∥g − q⋆_n∥_∞. If E = 0, then q⋆_n = g is the best approximation. If E > 0, suppose that there exists q_n ∈ P_n such that ∥g − q_n∥_∞ < E. Consider q⋆_n − q_n ∈ P_n at the n + 2 points {x⋆_j}_{j=0}^{n+1}:

q⋆_n(x⋆_j) − q_n(x⋆_j) = (q⋆_n(x⋆_j) − g(x⋆_j)) + (g(x⋆_j) − q_n(x⋆_j)) = σ(−1)^{j+1}E + γ_j

with γ_j ∈ R and |γ_j| < E. Thus

sgn((q⋆_n − q_n)(x⋆_j)) = sgn(σ(−1)^{j+1}E),

and therefore q⋆_n − q_n ∈ P_n changes sign n + 1 times and hence has n + 1 roots. Then by the Fundamental Theorem of Algebra q⋆_n ≡ q_n; a contradiction, so such a q_n does not exist and q⋆_n is the best approximation.

Theorem 2.11 (Chebyshev Equioscillation Theorem). Let g ∈ C[a, b] and n ≥ 0. Then there exists a unique q⋆_n ∈ P_n satisfying the equioscillation property, and hence

∥g − q⋆_n∥_∞ ≤ ∥g − q_n∥_∞

for all q_n ∈ P_n.

Note. The construction of q⋆_n is difficult in general, which is why we often use the best approximation in the least squares sense instead. But if g(x) = x^{n+1}, the construction of q⋆_n is easy.

Lemma 2.12. If g(x) = x^{n+1} on [−1, 1], then the best approximation to g from P_n with respect to ∥·∥_∞ is

q⋆_n(x) = x^{n+1} − 2^{−n}T_{n+1}(x),

where T_{n+1}(x) is the Chebyshev polynomial of degree n + 1, i.e. T_{n+1}(x) = cos((n + 1)cos^{-1} x).

Proof. We first need to show that q⋆_n is really in P_n: recall that

T_0(x) = 1,   T_1(x) = x,   T_{n+1}(x) = 2xT_n(x) − T_{n−1}(x),

and so T_{n+1}(x) = 2^n x^{n+1} + · · · . Therefore q⋆_n ∈ P_n.

The error is x^{n+1} − q⋆_n(x) = 2^{−n}T_{n+1}(x) for x ∈ [−1, 1]. Change the variable: x = cos θ, so θ = cos^{-1} x ∈ [0, π]. Then T_{n+1}(x) = cos((n + 1)θ). Hence

∥x^{n+1} − q⋆_n∥_∞ = 2^{−n} max_{−1≤x≤1} |cos((n + 1)cos^{-1} x)| = 2^{−n}.


Choose

θ⋆_j = jπ/(n + 1)

for j = 0 → n + 1, and so x⋆_j = cos θ⋆_j = cos(jπ/(n + 1)). Then

T_{n+1}(x⋆_j) = cos((n + 1)θ⋆_j) = cos(jπ) = (−1)^j.

Hence x^{n+1} − 2^{−n}T_{n+1}(x) satisfies the equioscillation property and is thus the best approximation to x^{n+1} from P_n.

Note. Note that the points are equally spaced in terms of θ, but clustered around the end points ±1 in terms of x.

Example. The optimal interpolation points are the zeros of the error. Therefore

Π_{j=0}^n (x − x_j) = x^{n+1} − q⋆_n = 2^{−n}T_{n+1}(x).

Choose

θ_j = (2j + 1)π / (2(n + 1))

and so

x_j = cos( (2j + 1)π / (2(n + 1)) ).

Then

T_{n+1}(x_j) = cos((n + 1)θ_j) = cos( (2j + 1)π/2 ) = 0.

Therefore

{ cos( (2j + 1)π / (2(n + 1)) ) }_{j=0}^n

are the optimal Chebyshev interpolation points for p_n ∈ P_n on [−1, 1].

Generalize this to an interval [a, b]. For x ∈ [a, b], introduce t = (2x − (a + b))/(b − a) ∈ [−1, 1], so x = (1/2)[(b − a)t + (a + b)]. Then the optimal interpolation points for [a, b] are

x_j = (1/2)[ (b − a)cos( (2j + 1)π / (2(n + 1)) ) + (a + b) ]

for j = 0 → n.

Proof. We need to find

min_{{x_j}_{j=0}^n ⊂ [a,b]} ∥Π_{j=0}^n (x − x_j)∥_∞.

That is, we need to find

min_{q_n ∈ P_n} ∥ ((b − a)/2)^{n+1} ( (2x − (a + b))/(b − a) )^{n+1} − q_n(x) ∥_∞


for [a, b]. That is the same as finding

min_{q̂_n ∈ P_n} ((b − a)/2)^{n+1} ∥ t^{n+1} − q̂_n(t) ∥_∞

for [−1, 1], with

q_n(x) = ((b − a)/2)^{n+1} q̂_n(t).

Therefore

q⋆_n(x) = ((b − a)/2)^{n+1} [ ( (2x − (a + b))/(b − a) )^{n+1} − 2^{−n} T_{n+1}( (2x − (a + b))/(b − a) ) ].

Using the equioscillation property, we get

t⋆_j = cos( jπ/(n + 1) )

and so

x⋆_j = ( (b − a)cos( jπ/(n + 1) ) + a + b ) / 2.
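The practical payoff of the Chebyshev points can be seen numerically. The sketch below is my own (Runge's function 1/(1 + 25x²) is a standard test case assumed here, and np.polyfit/np.polyval are used purely for interpolation); it compares the worst-case interpolation error on equally spaced and Chebyshev points:

import numpy as np

def chebyshev_points(a, b, n):
    j = np.arange(n + 1)
    return 0.5 * ((b - a) * np.cos((2 * j + 1) * np.pi / (2 * (n + 1))) + (a + b))

def max_error(f, xs, sample):
    coeffs = np.polyfit(xs, f(xs), len(xs) - 1)      # interpolating polynomial
    return np.max(np.abs(f(sample) - np.polyval(coeffs, sample)))

f = lambda x: 1.0 / (1.0 + 25.0 * x ** 2)
sample = np.linspace(-1, 1, 2001)
for n in (5, 10):
    equi = np.linspace(-1, 1, n + 1)
    cheb = chebyshev_points(-1.0, 1.0, n)
    print(n, max_error(f, equi, sample), max_error(f, cheb, sample))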

2.4 Piecewise Polynomial Interpolation

We can try to decrease the error of polynomial interpolation either by increasing the order of the interpolating polynomial or by decreasing the spacing between the interpolation points (i.e. increasing their number).

We can also consider piecewise linears. For given ordered, equally spaced interpolation points {x_i}_{i=0}^n with x_0 = a, x_n = b, x_j − x_{j−1} = h, we can use linear interpolation on each subinterval [x_{j−1}, x_j] for j = 1 → n. Define, for x ∈ [x_{j−1}, x_j], j = 1 → n,

P_L(x) = f(x_{j−1}) + ( (x − x_{j−1})/h ) (f(x_j) − f(x_{j−1})),

so that P_L(x_{j−1}) = f(x_{j−1}), P_L(x_j) = f(x_j). The error is

∥f − P_L∥_∞ = max_{a≤x≤b} |f(x) − P_L(x)|
            = max_{j=1→n} max_{x_{j−1}≤x≤x_j} |f(x) − P_L(x)|
            = max_{j=1→n} max_{x_{j−1}≤x≤x_j} ( |(x − x_{j−1})(x − x_j)| / 2! ) |f″(z_j)|,

where z_j ∈ (x_{j−1}, x_j). Since the maximum of |(x − x_{j−1})(x − x_j)| occurs at x = (x_{j−1} + x_j)/2,

∥f − P_L∥_∞ ≤ max_{j=1→n} (h²/8)|f″(z_j)| ≤ (h²/8)∥f″∥_∞.

Then as h → 0, P_L → f, provided f ∈ C²[a, b]. We can generalize this method to piecewise quadratics, cubics etc.
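A quick check of the h²/8 bound (a sketch of mine; f = sin on [0, π] is an arbitrary choice with ∥f″∥_∞ = 1, and np.interp provides the piecewise linear interpolant):

import numpy as np

f, a, b = np.sin, 0.0, np.pi
sample = np.linspace(a, b, 5001)
for n in (4, 8, 16, 32):
    xs = np.linspace(a, b, n + 1)            # equally spaced points, h = (b - a)/n
    PL = np.interp(sample, xs, f(xs))        # piecewise linear interpolant
    err = np.max(np.abs(f(sample) - PL))
    print(n, err, ((b - a) / n) ** 2 / 8.0)  # observed error sits below the h^2/8 bound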


Chapter 3

Quadrature (Numerical Integration)

We are given an interval [a, b] and a weight function w(x) ∈ C(a, b) such that w(x) > 0 except for a finite number of zeros and ∫_a^b w(x)dx < ∞. Now, given a function f(x), we want to approximate

I(f) = ∫_a^b w(x)f(x)dx

by approximating f(x) by an interpolating polynomial p_n(x), that is, approximate I(f) by

I_n(f) = I(p_n) = ∫_a^b w(x)p_n(x)dx.

The Lagrange form of p_n(x) is

p_n(x) = Σ_{k=0}^n f(x_k)l_k(x),   l_k(x) = Π_{j=0, j≠k}^n (x − x_j)/(x_k − x_j).

Hence

I(p_n) = ∫_a^b w(x) Σ_{k=0}^n f(x_k)l_k(x) dx = Σ_{k=0}^n f(x_k) ∫_a^b w(x)l_k(x)dx = Σ_{k=0}^n w_k f(x_k),

where w_k = ∫_a^b w(x)l_k(x)dx for k = 0 → n.

Example. Let [a, b] = [0, 1], w(x) = x^{-1/2}, n = 1, x_0 = 0, x_1 = 1. Approximate I(f) = ∫_0^1 x^{-1/2}f(x)dx by

I_1(f) = Σ_{k=0}^1 w_k f(x_k),

where

w_0 = ∫_0^1 x^{-1/2}(1 − x)dx = [ x^{1/2}/(1/2) − x^{3/2}/(3/2) ]_0^1 = 4/3,
w_1 = ∫_0^1 x^{-1/2} x dx = 2/3.

Hence

I_1(f) = (4/3)f(x_0) + (2/3)f(x_1).

If instead w(x) ≡ 1, we get

I_1(f) = (1/2)[f(x_0) + f(x_1)],

the trapezium rule.

In general, the error of the approximation is

|I(f) − I_n(f)| = |∫_a^b w(x)[f(x) − p_n(x)]dx| ≤ ∫_a^b w(x)dx · ∥f − p_n∥_∞.

The error is zero if f ∈ P_n, regardless of the interpolation (sampling) points {x_k}_{k=0}^n. Otherwise, we can choose {x_k}_{k=0}^n in a smart way so that I_n(f) = I(f) for all f ∈ P_m, where m > n is as large as possible.

Lemma 3.1. The orthogonal polynomial φ_n has n distinct roots in [a, b].

Proof. Let σ denote the number of sign changes of φ_n in [a, b]. If σ < n, let x_1, . . . , x_σ denote the ordered points in [a, b] where φ_n changes sign. Consider q_σ(x) = (x − x_1) · · · (x − x_σ). There are two possibilities: φ_n q_σ is either positive or negative throughout [a, b], except at the x_i.

• If it is positive, then

⟨φ_n, q_σ⟩ = ∫_a^b w(x)φ_n(x)q_σ(x)dx > 0.

• If it is negative, then

⟨φ_n, −q_σ⟩ = −∫_a^b w(x)φ_n(x)q_σ(x)dx > 0.

Either way this contradicts the fact that φ_n is the orthogonal polynomial of degree n, i.e. orthogonal to all polynomials in P_{n−1} (and q_σ ∈ P_σ ⊆ P_{n−1}). Therefore σ ≥ n.


Theorem 3.2. Let w ∈ C(a, b) with w > 0 except at a finite number of points and ∫_a^b w(x)dx < ∞. Let φ_{n+1} be the orthogonal polynomial of degree n + 1 associated with the inner product

⟨g_1, g_2⟩ = ∫_a^b w(x)g_1(x)g_2(x)dx.

Let {x⋆_i}_{i=0}^n, x⋆_i ∈ [a, b], be the n + 1 distinct zeros of φ_{n+1} (see the above lemma). If we approximate

I(f) = ∫_a^b w(x)f(x)dx

by I_n(f) = I(p_n), where p_n ∈ P_n is such that p_n(x⋆_i) = f(x⋆_i) for i = 0 → n, then

I_n(f) = Σ_{i=0}^n w⋆_i f(x⋆_i),   w⋆_i = ∫_a^b w(x)l_i(x)dx,   l_i(x) = Π_{j=0, j≠i}^n (x − x⋆_j)/(x⋆_i − x⋆_j),

for i = 0 → n. Moreover, I_n(f) = I(f) for all f ∈ P_{2n+1}.

Proof. Let f ∈ P_{2n+1}. Then f − p_n ∈ P_{2n+1} has roots at {x⋆_i}_{i=0}^n and therefore f − p_n = q_n φ_{n+1} for some q_n ∈ P_n. Then

I(f) − I_n(f) = I(f) − I(p_n) = ∫_a^b w(x)[f(x) − p_n(x)]dx = ∫_a^b w(x)q_n(x)φ_{n+1}(x)dx = ⟨q_n, φ_{n+1}⟩ = 0,

as φ_{n+1} is the orthogonal polynomial of degree n + 1. Hence I_n(f) = I(f) for all f ∈ P_{2n+1}.

With n + 1 sampling points it is not possible to choose the w_i, i = 0 → n, such that I_n(f) = I(f) for all f ∈ P_{2n+2}. Consider f(x) = Π_{i=0}^n (x − x_i)² ∈ P_{2n+2}. Clearly I(f) > 0, but I_n(f) = 0.

Choosing the sampling points to be the roots of φ_{n+1} is called Gaussian Quadrature.

Example. Let [a, b] = [−1, 1], w ≡ 1 and

⟨g_1, g_2⟩ = ∫_{−1}^{1} g_1(x)g_2(x)dx.

For n = 1,

I_1(f) = w⋆_0 f(x⋆_0) + w⋆_1 f(x⋆_1).

Recall that φ_2(x) = x² − 1/3, and so x⋆_0 = −1/√3, x⋆_1 = 1/√3. Now we determine w⋆_0, w⋆_1. Observe that

I_1(1) = w⋆_0 + w⋆_1 = I(1) = ∫_{−1}^{1} 1 dx = 2,
I_1(x) = (1/√3)(−w⋆_0 + w⋆_1) = I(x) = ∫_{−1}^{1} x dx = 0,

and hence w⋆_0 = w⋆_1 = 1. Therefore I_1(f) = f(−1/√3) + f(1/√3). Also I_1(x²) = 2/3 = I(x²) and I_1(x³) = 0 = I(x³).

For n = 2, φ_3(x) = x³ − (3/5)x and so x⋆_0 = −√(3/5), x⋆_1 = 0, x⋆_2 = √(3/5). Therefore

I_2(f) = w⋆_0 f(−√(3/5)) + w⋆_1 f(0) + w⋆_2 f(√(3/5)).

This rule is exact for all f ∈ P_5, in particular for 1, x and x², so

I_2(1) = w⋆_0 + w⋆_1 + w⋆_2 = 2,
I_2(x) = −√(3/5) w⋆_0 + √(3/5) w⋆_2 = 0,
I_2(x²) = (3/5)w⋆_0 + (3/5)w⋆_2 = 2/3.

Hence w⋆_0 = w⋆_2 = 5/9, w⋆_1 = 8/9 and

I_2(f) = (1/9)[ 5f(−√(3/5)) + 8f(0) + 5f(√(3/5)) ].
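Finally, a short numerical check (my own sketch) of the two rules just derived: the two-point rule integrates x^k exactly for k ≤ 3 and the three-point rule for k ≤ 5:

import numpy as np

def gauss2(f):
    return f(-1.0 / np.sqrt(3.0)) + f(1.0 / np.sqrt(3.0))

def gauss3(f):
    r = np.sqrt(3.0 / 5.0)
    return (5.0 * f(-r) + 8.0 * f(0.0) + 5.0 * f(r)) / 9.0

for k in range(6):
    exact = (1.0 - (-1.0) ** (k + 1)) / (k + 1)     # ∫_{-1}^{1} x^k dx
    print(k, gauss2(lambda x: x ** k), gauss3(lambda x: x ** k), exact)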