M2N1 Numerical Analysis
Mathematics
Imperial College London
Contents
1 Applied Linear Algebra
  1.1 Orthogonality
  1.2 Gram-Schmidt
  1.3 QR Factorization
  1.4 Cauchy-Schwarz inequality
  1.5 Gradients and Hessians
  1.6 Generalized inner product
  1.7 Cholesky Factorization
  1.8 Least Square Problems
    1.8.1 General Least Squares Case
  1.9 A more abstract approach
  1.10 Orthogonal Polynomials

2 Polynomial interpolation
  2.1 Divided difference
  2.2 Finding the error
  2.3 Best Approximation
  2.4 Piecewise Polynomial Interpolation

3 Quadrature (Numerical Integration)
Chapter 1
Applied Linear Algebra
1.1 Orthogonality
Definition. Let $a, b \in \mathbb{R}^n$. We define the inner product of $a, b$ to be
$$\langle a, b\rangle = a^T b = \sum_{i=1}^n a_i b_i.$$
Also define the outer product of $a$ and $b$ to be
$$ab^T = \begin{pmatrix} a_1 \\ \vdots \\ a_n \end{pmatrix}\begin{pmatrix} b_1 & \cdots & b_n \end{pmatrix} = \begin{pmatrix} a_1b_1 & \cdots & a_1b_n \\ \vdots & \ddots & \vdots \\ a_nb_1 & \cdots & a_nb_n \end{pmatrix}.$$
Note. Note that:

1. The inner product is symmetric:
$$\langle a, b\rangle = \sum_{i=1}^n a_ib_i = \sum_{i=1}^n b_ia_i = \langle b, a\rangle$$
for all $a, b \in \mathbb{R}^n$.

2. The inner product is linear with respect to the second argument:
$$\langle a, \mu b + \lambda c\rangle = \sum_{i=1}^n a_i(\mu b_i + \lambda c_i) = \mu\sum_{i=1}^n a_ib_i + \lambda\sum_{i=1}^n a_ic_i = \mu\langle a, b\rangle + \lambda\langle a, c\rangle$$
for all $a, b, c \in \mathbb{R}^n$, $\mu, \lambda \in \mathbb{R}$.

3. From 1 and 2 we get that the inner product is also linear with respect to the first argument.
4. Observe that
$$\langle a, a\rangle = \sum_{i=1}^n a_i^2 \ge 0.$$
Definition. Let $\|a\| = [\langle a, a\rangle]^{1/2}$ be the length (or norm) of $a$.
Definition. We say that $a, b \in \mathbb{R}^n$, $a, b \ne 0$, are orthogonal if $\langle a, b\rangle = 0$.
Example.

Claim. If $a, b \in \mathbb{R}^n$ are orthogonal, then $\|a + b\|^2 = \|a\|^2 + \|b\|^2$.

Proof.
$$\|a + b\|^2 \stackrel{\text{def}}{=} \langle a + b, a + b\rangle = \langle a + b, a\rangle + \langle a + b, b\rangle = \|a\|^2 + \|b\|^2 + 2\langle a, b\rangle = \|a\|^2 + \|b\|^2.$$
Definition. A set of non-trivial vectors $\{q_k\}_{k=1}^n$, $q_k \in \mathbb{R}^n$, $q_k \ne 0$ for $k = 1 \to n$, is orthogonal if $\langle q_j, q_k\rangle = 0$ for $j, k = 1 \to n$, $j \ne k$.

As a useful shorthand, introduce the Kronecker delta notation
$$\delta_{jk} = \begin{cases} 1 & j = k, \\ 0 & j \ne k. \end{cases}$$
Example. For the $n \times n$ identity matrix $I$ we have $I_{jk} = \delta_{jk}$.
Definition. A set of non-trivial vectors $\{q_k\}_{k=1}^n$, $q_k \in \mathbb{R}^n$, $q_k \ne 0$ for $k = 1 \to n$, is orthonormal if
$$\langle q_j, q_k\rangle = \delta_{jk}$$
for $j, k = 1 \to n$.
Note. A set of vectors is orthonormal if it is orthogonal and each vector has unit length.
Definition. A set of vectors $\{a_k\}_{k=1}^n$, $a_k \in \mathbb{R}^m$ for $k = 1 \to n$, is linearly independent if
$$\sum_{k=1}^n c_ka_k = 0$$
implies $c_k = 0$ for $k = 1 \to n$. The set $\{a_k\}_{k=1}^n$ is linearly dependent if there exist coefficients $c_k \in \mathbb{R}$, $k = 1 \to n$, not all zero, such that
$$\sum_{k=1}^n c_ka_k = 0.$$
Note. Recall that for $A = \begin{pmatrix} a_1 & \cdots & a_n \end{pmatrix}$, $a_k \in \mathbb{R}^m$ for $k = 1 \to n$:

(1) If the only solution to $Ac = 0$ is $c = 0$ then $\{a_k\}_{k=1}^n$ is linearly independent.

(2) If there exists $c \ne 0$ such that $Ac = 0$ then $\{a_k\}_{k=1}^n$ is linearly dependent.

(3) Restrict to $m = n$ (so that $A$ is square). If $A^{-1}$ exists then the rows (columns) of $A$ are linearly independent. If $\{a_k\}_{k=1}^n$ is linearly independent, it forms a basis for $\mathbb{R}^n$ and each vector $x \in \mathbb{R}^n$ can be uniquely expressed as a combination of the $a_i$'s.
Lemma 1.1. Let $\{a_k\}_{k=1}^n$, $a_k \in \mathbb{R}^m$, $k = 1 \to n$, be orthogonal. Then $\{a_k\}_{k=1}^n$ is linearly independent.

Proof. If $\sum_{k=1}^n c_ka_k = 0$ then for $1 \le j \le n$
$$\left\langle\sum_{k=1}^n c_ka_k, a_j\right\rangle = \langle 0, a_j\rangle = 0, \qquad c_j\langle a_j, a_j\rangle = c_j\|a_j\|^2 = 0.$$
Since the $a_j$'s are non-trivial, $c_j = 0$. Repeat for all $j = 1 \to n$.
Remark 1.1. Linear independence does not imply orthogonality. For example take $n = m = 2$ and $a_1 = \begin{pmatrix} 2 \\ 0 \end{pmatrix}$ and $a_2 = \begin{pmatrix} 3 \\ 1 \end{pmatrix}$, which are clearly linearly independent but not orthogonal.
1.2 Gram-Schmidt
Algorithm 1: Classical Gram-Schmidt (CGS)
1: $v_1 = a_1$
2: $q_1 = v_1/\|v_1\|$
3: for $k = 2$ to $n$ do
4:   $v_k = a_k - \sum_{j=1}^{k-1}\langle a_k, q_j\rangle q_j$
5:   $q_k = v_k/\|v_k\|$
6: end for
Claim. Given $\{a_i\}_{i=1}^n$, $a_i \in \mathbb{R}^m$, $i = 1 \to n$, linearly independent (so $n \le m$), CGS finds $\{q_i\}_{i=1}^n$, $q_i \in \mathbb{R}^m$, $i = 1 \to n$, orthonormal, i.e. $\langle q_i, q_j\rangle = \delta_{ij}$ for $i, j = 1 \to n$, with $\operatorname{Span}\{a_i\}_{i=1}^n = \operatorname{Span}\{q_i\}_{i=1}^n$.
Proof. Since $\{a_i\}_{i=1}^n$ is linearly independent, $a_i \ne 0$ for $i = 1 \to n$. For $k = 1$ we get
$$q_1 = \frac{v_1}{\|v_1\|}, \qquad \|q_1\| = \langle q_1, q_1\rangle^{1/2} = \left[\frac{1}{\|v_1\|^2}\langle v_1, v_1\rangle\right]^{1/2} = 1.$$
For $k = 2$, from the algorithm we have
$$v_2 = a_2 - \langle a_2, q_1\rangle q_1. \qquad (\dagger)$$
Check that $v_2$ is orthogonal to $q_1$:
$$\langle v_2, q_1\rangle = \langle a_2, q_1\rangle - \langle a_2, q_1\rangle\underbrace{\langle q_1, q_1\rangle}_{=1} = 0.$$
We need to check that $v_2 \ne 0$. If $v_2 = 0$, then by $(\dagger)$, $a_2$ equals $\langle a_2, q_1\rangle q_1$, which is a multiple of $a_1$; a contradiction to the linear independence of $\{a_i\}_{i=1}^n$. Therefore $v_2 \ne 0$, so $q_2$ has unit length and is a multiple of $v_2$, and hence $\{q_i\}_{i=1}^2$ is orthonormal. Clearly $\operatorname{Span}\{a_i\}_{i=1}^2 = \operatorname{Span}\{q_i\}_{i=1}^2$.

Assume the statement is true for $k - 1$, i.e. that $\{q_i\}_{i=1}^{k-1}$ is orthonormal and
$$q_j = \text{linear combination of } \{a_i\}_{i=1}^j, \qquad a_j = \text{linear combination of } \{q_i\}_{i=1}^j, \qquad \text{for } j = 1 \to k-1. \qquad (\star)$$
Set
$$v_k = a_k - \sum_{i=1}^{k-1}\langle a_k, q_i\rangle q_i.$$
Then $v_k$ is orthogonal to all $q_j$, $j = 1 \to k-1$:
$$\langle v_k, q_j\rangle = \langle a_k, q_j\rangle - \sum_{i=1}^{k-1}\langle a_k, q_i\rangle\underbrace{\langle q_i, q_j\rangle}_{\delta_{ij}} = \langle a_k, q_j\rangle - \langle a_k, q_j\rangle = 0.$$
If $v_k = 0$ then
$$a_k = \sum_{i=1}^{k-1}\langle a_k, q_i\rangle q_i = \text{linear combination of } \{q_i\}_{i=1}^{k-1} = \text{linear combination of } \{a_i\}_{i=1}^{k-1}$$
by $(\star)$; a contradiction to $\{a_i\}_{i=1}^n$ being linearly independent. Hence $v_k \ne 0$. From $q_k = v_k/\|v_k\|$ we get that $\{q_i\}_{i=1}^k$ is orthonormal. Since $q_k$ is a linear combination of $\{q_i\}_{i=1}^{k-1}$ and $a_k$, it is a linear combination of $\{a_i\}_{i=1}^k$ by $(\star)$. Similarly, by $(\star)$, $a_k$ is a linear combination of $\{q_i\}_{i=1}^k$. Hence the result follows by induction.
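As a minimal sketch (not part of the original notes), the CGS algorithm translates directly into NumPy; the function name cgs and the test matrix are our own choices.

```python
import numpy as np

def cgs(A):
    """Classical Gram-Schmidt: columns of A -> orthonormal columns of Q.

    Assumes A is m-by-n with linearly independent columns (so n <= m).
    """
    m, n = A.shape
    Q = np.zeros((m, n))
    for k in range(n):
        v = A[:, k].copy()
        for j in range(k):
            # v_k = a_k - sum_{j<k} <a_k, q_j> q_j
            v -= (A[:, k] @ Q[:, j]) * Q[:, j]
        nv = np.linalg.norm(v)
        if nv == 0.0:  # would contradict linear independence of the columns
            raise ValueError("columns are linearly dependent")
        Q[:, k] = v / nv  # q_k = v_k / ||v_k||
    return Q

A = np.array([[3.0, 65.0], [4.0, 0.0], [12.0, 13.0]])
Q = cgs(A)
print(np.allclose(Q.T @ Q, np.eye(2)))  # True: the columns are orthonormal
```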
1.3 QR Factorization
Look at CGS from a different viewpoint. For $\{a_i\}_{i=1}^n$, CGS gives $\{q_i\}_{i=1}^n$ orthonormal. Let
$$A = \begin{pmatrix} a_1 & \cdots & a_n \end{pmatrix} \in \mathbb{R}^{m\times n}, \qquad \hat{Q} = \begin{pmatrix} q_1 & \cdots & q_n \end{pmatrix} \in \mathbb{R}^{m\times n}.$$
Let $\hat{R} \in \mathbb{R}^{n\times n}$ be an upper triangular matrix,
$$\hat{R}_{lk} = \begin{cases} r_{lk} & l \le k, \\ 0 & l > k, \end{cases}$$
and define $e^{(n)}_k \in \mathbb{R}^n$ by
$$(e^{(n)}_k)_j = \delta_{kj}$$
for $j = 1 \to n$. Then clearly for any $B \in \mathbb{R}^{m\times n}$, $Be^{(n)}_k$ is the $k$-th column of $B$. From CGS we have $a_1 = \|v_1\|q_1$; let $r_{11} = \|a_1\|$. Also for $k = 1 \to n$
$$Ae^{(n)}_k = a_k = v_k + \sum_{i=1}^{k-1}\langle a_k, q_i\rangle q_i = \|v_k\|q_k + \sum_{i=1}^{k-1}\langle a_k, q_i\rangle q_i = \sum_{i=1}^k r_{ik}q_i = \hat{Q}\hat{R}e^{(n)}_k,$$
where $r_{kk} = \|v_k\| > 0$ and $r_{ik} = \langle a_k, q_i\rangle$. Hence $A = \hat{Q}\hat{R}$.
Expressing $A \in \mathbb{R}^{m\times n}$ as a product of $\hat{Q} \in \mathbb{R}^{m\times n}$ with orthonormal columns and $\hat{R} \in \mathbb{R}^{n\times n}$ upper triangular with positive diagonal entries is called the reduced QR factorisation of $A$. Now take $Q \in \mathbb{R}^{m\times m}$,
$$Q = \begin{pmatrix} \hat{Q} & q_{n+1} & \cdots & q_m \end{pmatrix},$$
with $q_{n+1}, \ldots, q_m$ chosen so that the columns of $Q$ are orthonormal, and $R \in \mathbb{R}^{m\times n}$,
$$R = \begin{pmatrix} \hat{R} \\ 0 \end{pmatrix}.$$
Clearly, $R$ is an upper triangular matrix (as $\hat{R}$ is). Expressing $A$ as the product of $Q$ and $R$ is called the QR factorisation of $A$.

Observe the product of $Q^T$ with $Q$:
$$(Q^TQ)_{jk} = q_j^Tq_k = \langle q_j, q_k\rangle = \delta_{jk},$$
so $Q^TQ = I^{(m)}$ and also $Q^T = Q^{-1}$.
Definition. A matrix $Q \in \mathbb{R}^{m\times m}$ is called orthogonal if $Q^TQ = I^{(m)}$.
Proposition 1.2. Orthogonal matrices preserve length and angle, i.e. if $Q \in \mathbb{R}^{m\times m}$ and $Q^TQ = I^{(m)}$ then for all $v, w \in \mathbb{R}^m$

(1) $\langle Qv, Qw\rangle = \langle v, w\rangle$ (angle preserved),

(2) $\|Qv\| = \|v\|$ (length preserved).

Proof. For $v, w \in \mathbb{R}^m$
$$\langle Qv, Qw\rangle = (Qv)^TQw = (v^TQ^T)Qw = v^TI^{(m)}w = \langle v, w\rangle.$$
Also
$$\|Qv\| = [\langle Qv, Qv\rangle]^{1/2} \stackrel{(1)}{=} [\langle v, v\rangle]^{1/2} = \|v\|.$$
Proposition 1.3. If $Q_1, Q_2 \in \mathbb{R}^{m\times m}$ are orthogonal, then $Q_1Q_2$ is orthogonal.

Proof. $(Q_1Q_2)^T(Q_1Q_2) = Q_2^TQ_1^TQ_1Q_2 = Q_2^TQ_2 = I^{(m)}$.
Example. For $m = 2$ let
$$Q = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}.$$
Clearly $Q$ is orthogonal and rotates a vector in $\mathbb{R}^2$ by an angle $\theta$ around the origin.
Definition. Define the Givens rotation matrix $G_{pq}(\theta) \in \mathbb{R}^{m\times m}$, $p < q \le m$, to be the identity matrix modified in rows and columns $p$ and $q$ so that the $(p,p)$ and $(q,q)$ entries are $\cos\theta$, the $(p,q)$ entry is $-\sin\theta$ and the $(q,p)$ entry is $\sin\theta$. Equivalently, the $j$-th column of $G_{pq}(\theta)$ is
$$e^{(m)}_j \quad \text{(with 1 in the $j$-th row) if } j \ne p \text{ and } j \ne q,$$
$$e^{(m)}_p\cos\theta + e^{(m)}_q\sin\theta \quad \text{if } j = p,$$
$$-e^{(m)}_p\sin\theta + e^{(m)}_q\cos\theta \quad \text{if } j = q.$$

Note. The length of every column of $G_{pq}(\theta)$ is 1 and the columns of $G_{pq}(\theta)$ are orthogonal; $G_{pq}(\theta)$ is an orthogonal matrix.
For $A, B \in \mathbb{R}^{m\times n}$ consider
$$G_{pq}(\theta)A = B.$$
All rows of $B$ are the same as those of $A$, except for rows $p$ and $q$. The aim is to obtain a QR factorisation of $A$ using a sequence of Givens rotations.
Example. For $m = 3$, $n = 2$,
$$A = \begin{pmatrix} 3 & 65 \\ 4 & 0 \\ 12 & 13 \end{pmatrix}.$$
Take a sequence of Givens rotations so that $A$ is transformed into $R$ upper triangular. Choose $G_{12}(\theta)$ so that
$$A^{(1)} = G_{12}(\theta)A = \begin{pmatrix} \cdot & \cdot \\ 0 & \cdot \\ 12 & 13 \end{pmatrix}.$$
Choose $\theta$ such that (since $G_{12}(\theta)$ preserves length)
$$\begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}\begin{pmatrix} 3 \\ 4 \end{pmatrix} = \begin{pmatrix} 5 \\ 0 \end{pmatrix}.$$
We get
$$G_{12}(\theta) = \begin{pmatrix} \frac35 & \frac45 & 0 \\ -\frac45 & \frac35 & 0 \\ 0 & 0 & 1 \end{pmatrix}$$
and so
$$A^{(1)} = \begin{pmatrix} 5 & 39 \\ 0 & -52 \\ 12 & 13 \end{pmatrix}.$$
Choose $G_{13}(\varphi)$ as the next rotation, since it does not affect row 2, so $A^{(2)}_{21}$ stays 0 ($G_{23}(\varphi)$ would not work). We want the first column of $A^{(2)}$ to be a multiple $\lambda$ of $e^{(3)}_1$. Since $G_{13}(\varphi)$ preserves length, we know $\lambda = (5^2 + 12^2)^{1/2} = 13$. So
$$G_{13}(\varphi) = \begin{pmatrix} \frac{5}{13} & 0 & \frac{12}{13} \\ 0 & 1 & 0 \\ -\frac{12}{13} & 0 & \frac{5}{13} \end{pmatrix}$$
and so
$$A^{(2)} = G_{13}(\varphi)A^{(1)} = \begin{pmatrix} 13 & 27 \\ 0 & -52 \\ 0 & -31 \end{pmatrix}.$$
Now choose $G_{23}(\psi)$,
$$G_{23}(\psi) = \begin{pmatrix} 1 & 0 & 0 \\ 0 & -\frac{52}{\sqrt{3665}} & -\frac{31}{\sqrt{3665}} \\ 0 & \frac{31}{\sqrt{3665}} & -\frac{52}{\sqrt{3665}} \end{pmatrix}$$
(the $1/\sqrt{3665}$ factor applies only to the lower $2\times 2$ block, so that each column keeps unit length), to get
$$R = A^{(3)} = G_{23}(\psi)A^{(2)} = \begin{pmatrix} 13 & 27 \\ 0 & \sqrt{3665} \\ 0 & 0 \end{pmatrix}.$$
So
$$R = \underbrace{G_{23}(\psi)G_{13}(\varphi)G_{12}(\theta)}_{G}\,A
$$
with $G$ orthogonal, since Givens rotations are orthogonal and products of orthogonal matrices are orthogonal. Then
$$R = GA \implies G^TGA = G^TR \implies A = QR$$
with $Q = G^T$.

In general, we want to solve $Ax = b$ for $A \in \mathbb{R}^{m\times n}$. We apply a sequence of Givens rotations $G$ to take $A$ to upper triangular $R$, giving the equivalent system $GAx = Rx = Gb = c$. If $m > n$ and $c_i \ne 0$ for some $i = n+1 \to m$ then there is no solution $x$ to $Rx = c$ and the system is said to be inconsistent. Otherwise there exists a unique solution $x$, which can be found by backward substitution.
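A short sketch of this procedure in NumPy follows (our own illustration, not from the notes); it zeroes the subdiagonal entries column by column with Givens rotations and accumulates their product $G$.

```python
import numpy as np

def givens_qr(A):
    """QR factorisation of real A (m-by-n, m >= n) by Givens rotations.

    Returns Q (m-by-m orthogonal) and R upper triangular with A = Q R.
    """
    m, n = A.shape
    R = A.astype(float).copy()
    Q = np.eye(m)
    for k in range(n):                  # zero entries below the diagonal
        for q in range(k + 1, m):
            a, b = R[k, k], R[q, k]
            r = np.hypot(a, b)          # length preserved: (a, b) -> (r, 0)
            if r == 0.0:
                continue
            c, s = a / r, b / r
            G2 = np.array([[c, s], [-s, c]])
            R[[k, q], :] = G2 @ R[[k, q], :]   # apply G_{kq} to rows k, q
            Q[[k, q], :] = G2 @ Q[[k, q], :]   # accumulate G = ... G2 G1
    return Q.T, R                       # A = Q R with Q = G^T

A = np.array([[3.0, 65.0], [4.0, 0.0], [12.0, 13.0]])
Q, R = givens_qr(A)
print(np.round(R, 6))   # [[13, 27], [0, sqrt(3665)], [0, 0]] as in the example
```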
1.4 Cauchy-Schwarz inequality

For vectors $a, b \in \mathbb{R}^3$
$$a \cdot b = |a||b|\cos\theta.$$
Generalize this to $\mathbb{R}^n$.

Theorem 1.4 (Cauchy-Schwarz inequality). For $a, b \in \mathbb{R}^n$
$$|\langle a, b\rangle| \le \|a\|\|b\|$$
with equality iff $a$ and $b$ are linearly dependent.
Proof. If $a = 0$ then $\langle a, b\rangle = 0$ for all $b \in \mathbb{R}^n$ and so the inequality is trivially true. If $a \ne 0$ then let $q = a/\|a\|$ and $c = b - \langle b, q\rangle q$, so that
$$\langle c, q\rangle = \langle b, q\rangle - \langle b, q\rangle\langle q, q\rangle = 0.$$
We have
$$0 \le \|c\|^2 = \langle c, c\rangle = \langle c, b - \langle b, q\rangle q\rangle = \langle c, b\rangle - \langle b, q\rangle\langle c, q\rangle = \langle c, b\rangle = \langle b - \langle b, q\rangle q, b\rangle = \|b\|^2 - \langle b, q\rangle^2 = \|b\|^2 - [\langle b, a\rangle]^2/\|a\|^2,$$
hence $|\langle a, b\rangle| \le \|a\|\|b\|$, with equality iff $c = 0$, i.e.
$$b = \langle b, q\rangle q = \langle b, a\rangle a/\|a\|^2,$$
i.e. $a, b$ are linearly dependent.
1.5 Gradients and Hessians
For a function of one variable $f: \mathbb{R} \to \mathbb{R}$ we have a Taylor series
$$f(a + h) = f(a) + hf'(a) + \frac{h^2}{2!}f''(a) + O(h^3).$$
Now consider functions of $n$ variables, i.e. $f: \mathbb{R}^n \to \mathbb{R}$. Write $f(x)$ where $x = (x_1, \ldots, x_n)^T \in \mathbb{R}^n$. We define the partial derivative $\frac{\partial f}{\partial x_i}$ of $f$ with respect to $x_i$ to be the derivative of $f$ when treating all $x_j$, $j \ne i$, as constants.
Example. For $n = 2$, $x = (x_1, x_2)^T$, $f(x) = f(x_1, x_2) = \sin x_1\sin x_2$. Then the first derivatives are
$$\frac{\partial f}{\partial x_1}(x) = \cos x_1\sin x_2, \qquad \frac{\partial f}{\partial x_2}(x) = \sin x_1\cos x_2.$$
Generally, the second derivatives are
$$\frac{\partial^2 f}{\partial x_i\partial x_j} = \frac{\partial}{\partial x_i}\left(\frac{\partial f}{\partial x_j}(x)\right) = \frac{\partial}{\partial x_j}\left(\frac{\partial f}{\partial x_i}(x)\right)$$
for $i, j = 1 \to n$ and $f$ sufficiently smooth.
Example. $f(x) = \sin x_1\cos x_2$. Then
$$\frac{\partial^2 f}{\partial x_1^2}(x) = \frac{\partial}{\partial x_1}\left(\frac{\partial f}{\partial x_1}(x)\right) = -\sin x_1\cos x_2,$$
$$\frac{\partial^2 f}{\partial x_2^2}(x) = \frac{\partial}{\partial x_2}\left(\frac{\partial f}{\partial x_2}(x)\right) = -\sin x_1\cos x_2,$$
$$\frac{\partial^2 f}{\partial x_1\partial x_2}(x) = \frac{\partial}{\partial x_2}\left(\frac{\partial f}{\partial x_1}(x)\right) = -\cos x_1\sin x_2.$$
Chain Rule

For $f: \mathbb{R} \to \mathbb{R}$, $f(x)$, we can change the variable $x$ so that $x = x(t)$ or $t = t(x)$ and define $w(t) = f(x(t))$. Then
$$\frac{dw}{dt}(t) = \frac{df}{dx}(x(t))\frac{dx}{dt}(t).$$
Generalize to $n$ variables: if $w(t) = f(x(t))$ then
$$\frac{dw}{dt} = \sum_{i=1}^n\frac{\partial f}{\partial x_i}(x(t))\frac{dx_i}{dt}(t).$$
Example. For $n = 2$, $f(x) = \sin x_1\sin x_2$, $x_1(t) = t^2$, $x_2(t) = \cos t$, and hence $w(t) = \sin t^2\sin(\cos t)$. We have
$$\frac{dw}{dt} = 2t\cos t^2\sin(\cos t) + \sin t^2(\cos(\cos t))(-\sin t) = \frac{\partial f}{\partial x_1}(x(t))\frac{dx_1}{dt}(t) + \frac{\partial f}{\partial x_2}(x(t))\frac{dx_2}{dt}(t).$$
For general $w(t) = f(a + th)$,
$$\frac{d^mw}{dt^m} = \left[\sum_{i=1}^n h_i\frac{\partial}{\partial x_i}\right]^m f(a + th).$$
Now we can generalize the Taylor series to get
$$f(a + h) = f(a) + \sum_{i=1}^n h_i\frac{\partial}{\partial x_i}f(a) + \frac12\sum_{i=1}^n h_i\frac{\partial}{\partial x_i}\left[\sum_{j=1}^n h_j\frac{\partial}{\partial x_j}\right]f(a) + O(\|h\|^3).$$
Definition. For a function $f: \mathbb{R}^n \to \mathbb{R}$, call the vector $\nabla f(x) \in \mathbb{R}^n$,
$$\nabla f(x) = \begin{pmatrix} \frac{\partial f}{\partial x_1}(x) \\ \vdots \\ \frac{\partial f}{\partial x_n}(x) \end{pmatrix},$$
the gradient of $f$ at $x$.

Definition. For a function $f: \mathbb{R}^n \to \mathbb{R}$, call the matrix $D^2f(x) \in \mathbb{R}^{n\times n}$,
$$[D^2f(x)]_{ij} = \frac{\partial^2 f}{\partial x_i\partial x_j}(x),$$
the Hessian of $f$ at $x$.
We can now rewrite the Taylor series as
$$f(a + h) = f(a) + h^T\nabla f(a) + \frac12 h^TD^2f(a)h + O(\|h\|^3).$$
Example. Let $f(x) = x^TAx$ for all $x \in \mathbb{R}^n$, where $A \in \mathbb{R}^{n\times n}$ is a given symmetric matrix. Find $\nabla f(x)$ and $D^2f(x)$.
We get
$$f(x) = \sum_{i=1}^n\sum_{j=1}^n A_{ij}x_ix_j$$
and so
$$\frac{\partial f}{\partial x_p}(x) = \sum_{i=1}^n\sum_{j=1}^n A_{ij}\frac{\partial}{\partial x_p}(x_ix_j) = \sum_{i=1}^n\sum_{j=1}^n A_{ij}\left[x_j\frac{\partial x_i}{\partial x_p} + x_i\frac{\partial x_j}{\partial x_p}\right].$$
Also
$$\frac{\partial x_i}{\partial x_p} = \begin{cases} 1 & \text{if } i = p, \\ 0 & \text{if } i \ne p \end{cases} = \delta_{ip}.$$
Therefore
$$\frac{\partial f}{\partial x_p}(x) = \sum_{i=1}^n\sum_{j=1}^n A_{ij}(\delta_{ip}x_j + x_i\delta_{jp}) = \sum_{j=1}^n A_{pj}x_j + \sum_{i=1}^n A_{ip}x_i,$$
and therefore
$$[\nabla f(x)]_p = [Ax]_p + [A^Tx]_p, \qquad \nabla f(x) = Ax + A^Tx = 2Ax \text{ if } A \text{ is symmetric.}$$
For the Hessian we get
$$\frac{\partial^2 f}{\partial x_q\partial x_p}(x) = \frac{\partial}{\partial x_q}\left(\frac{\partial f}{\partial x_p}(x)\right) = \frac{\partial}{\partial x_q}\left([Ax]_p + [A^Tx]_p\right) = \frac{\partial}{\partial x_q}\left[\sum_{j=1}^n A_{pj}x_j + \sum_{i=1}^n A_{ip}x_i\right] = A_{pq} + (A^T)_{pq},$$
and so for $A$ symmetric, $D^2f(x) = 2A$. Note the analogy with derivatives of functions of one variable:
$$f(x) = ax^2, \qquad f(x) = x^TAx,$$
$$f'(x) = 2ax, \qquad \nabla f(x) = 2Ax,$$
$$f''(x) = 2a, \qquad D^2f(x) = 2A.$$
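These formulas are easy to check numerically; the following sketch (our own addition, with an arbitrary step size h) compares a central finite-difference gradient of $f(x) = x^TAx$ against $2Ax$.

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.standard_normal((4, 4))
A = (M + M.T) / 2                       # a symmetric test matrix
x = rng.standard_normal(4)
f = lambda y: y @ A @ y                 # f(x) = x^T A x

h = 1e-6
grad_fd = np.array([(f(x + h * e) - f(x - h * e)) / (2 * h) for e in np.eye(4)])
print(np.allclose(grad_fd, 2 * A @ x, atol=1e-4))   # True: grad f = 2Ax
```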
Definition. A function $f: \mathbb{R}^n \to \mathbb{R}$ has a local maximum [minimum] at $a$ if for all $u \in \mathbb{R}^n$, $\|u\| = 1$, there exists $\varepsilon > 0$ such that
$$f(a + hu) \le [\ge]\ f(a)$$
for all $h \in [0, \varepsilon]$.

For $n = 1$, $f'(a) = 0$ and $f''(a) > [<]\ 0$ are sufficient conditions for $f$ to have a local minimum [maximum] at $x = a$, as
$$f(a \pm h) = f(a) \pm hf'(a) + \frac12 h^2f''(a) + O(h^3) = f(a) + \frac12 h^2f''(a) + O(h^3) \ge [\le]\ f(a) \text{ for small } h.$$
Proposition 1.5. For $f: \mathbb{R}^n \to \mathbb{R}$, if $\nabla f(a) \ne 0$ then $f(x)$ does not have a local minimum or maximum at $x = a$, i.e. $\nabla f(a) = 0$ is a necessary condition for $f(x)$ to have a local minimum or maximum at $x = a$.

Proof. We show that $f$ does not have a maximum at $a$ (the argument for a minimum is analogous). Let $h \ge 0$ and consider
$$f(a + hu) = f(a) + hu^T\nabla f(a) + O(h^2).$$
Let
$$u = \frac{\nabla f(a)}{\|\nabla f(a)\|}$$
so that $\|u\| = 1$. Then
$$f(a + hu) = f(a) + h\frac{\|\nabla f(a)\|^2}{\|\nabla f(a)\|} + O(h^2) = f(a) + h\underbrace{\|\nabla f(a)\|}_{>0} + O(h^2) > f(a)$$
for small $h > 0$.
Points $a$ where $\nabla f(a) = 0$ are called stationary points of $f(x)$.
Proposition 1.6. If $\nabla f(a) = 0$ and $w^TD^2f(a)w > [<]\ 0$ for all $w \in \mathbb{R}^n$, $w \ne 0$, then $f(x)$ has a local minimum [maximum] at $x = a$.

Proof. Take $u$ such that $\|u\| = 1$ (and so $u \ne 0$). Then
$$f(a + hu) = f(a) + h\underbrace{u^T\nabla f(a)}_{=0} + \frac12\underbrace{h^2}_{\ge 0}\underbrace{u^TD^2f(a)u}_{>[<]\ 0} + O(h^3) \ge [\le]\ f(a).$$
Example. For $n = 2$, $f(x) = x_1^2 - 2x_1 + x_2^2 - 2x_2 + 1$ we have
$$\nabla f(x) = \begin{pmatrix} \frac{\partial f}{\partial x_1}(x) \\ \frac{\partial f}{\partial x_2}(x) \end{pmatrix} = \begin{pmatrix} 2(x_1 - 1) \\ 2(x_2 - 1) \end{pmatrix}.$$
Look for stationary points, i.e. where $\nabla f(a) = 0$; we get $a = (1, 1)^T$. Compute the Hessian
$$D^2f(x) = \begin{pmatrix} \frac{\partial^2 f}{\partial x_1^2} & \frac{\partial^2 f}{\partial x_1\partial x_2} \\ \frac{\partial^2 f}{\partial x_1\partial x_2} & \frac{\partial^2 f}{\partial x_2^2} \end{pmatrix} = \begin{pmatrix} 2 & 0 \\ 0 & 2 \end{pmatrix} = 2I^{(2)}.$$
Check that for all $w \in \mathbb{R}^2$, $w \ne 0$,
$$w^TD^2f(a)w = 2w^Tw = 2\|w\|^2 > 0.$$
So $f$ has a local minimum at $(1, 1)$.
Definition. Call a matrix $A \in \mathbb{R}^{n\times n}$
positive definite if $x^TAx > 0$,
negative definite if $x^TAx < 0$,
non-negative definite if $x^TAx \ge 0$,
non-positive definite if $x^TAx \le 0$,
for all $x \in \mathbb{R}^n$, $x \ne 0$.
Note. Clearly, a positive (negative) definite matrix $A \in \mathbb{R}^{n\times n}$ is invertible, since there is no $x \in \mathbb{R}^n$, $x \ne 0$, such that $Ax = 0$; if there were, then $x^TAx = x^T0 = 0$, a contradiction.
Example. For $n = 2$, $A = \begin{pmatrix} 1 & -1 \\ -1 & 1 \end{pmatrix}$, $x = (x_1, x_2)^T$,
$$x^TAx = x_1^2 + x_2^2 - 2x_1x_2 = (x_1 - x_2)^2 \ge 0,$$
so $A$ is non-negative definite but not positive definite.
Using this definition, we can restate Proposition 1.6:

Proposition 1.7. If $\nabla f(a) = 0$ and $D^2f(a)$ is positive [negative] definite then $a$ is a local minimum [maximum] of $f$.
1.6 Generalized inner product
Definition. Let $A \in \mathbb{R}^{n\times n}$ be a symmetric positive definite matrix. Define the inner product $\langle\cdot,\cdot\rangle_A$ by
$$\langle v, u\rangle_A = u^TAv$$
for all $v, u \in \mathbb{R}^n$.

Note. We previously worked with $\langle u, v\rangle_I = u^Tv$.
Check that the required properties of an inner product still hold:

symmetry:
$$\langle u, v\rangle_A = v^TAu = (v^TAu)^T = u^TA^Tv = u^TAv = \langle v, u\rangle_A,$$

linearity:
$$\langle u, \alpha v + \beta w\rangle_A = \alpha\langle u, v\rangle_A + \beta\langle u, w\rangle_A, \qquad \langle\alpha u + \beta v, w\rangle_A = \alpha\langle u, w\rangle_A + \beta\langle v, w\rangle_A$$
for all $u, v, w \in \mathbb{R}^n$ and $\alpha, \beta \in \mathbb{R}$.
Definition. For a symmetric positive definite matrix $A \in \mathbb{R}^{n\times n}$ define the length $\|\cdot\|_A$ of a vector $u \in \mathbb{R}^n$ as
$$\|u\|_A = (\langle u, u\rangle_A)^{1/2}.$$
Theorem 1.8 (Generalised Cauchy-Schwarz inequality). If $A \in \mathbb{R}^{n\times n}$ is symmetric positive definite then
$$|\langle a, b\rangle_A| \le \|a\|_A\|b\|_A$$
for all $a, b \in \mathbb{R}^n$, with equality iff $a, b$ are linearly dependent.

Proof. Replace $\langle\cdot,\cdot\rangle$ by $\langle\cdot,\cdot\rangle_A$ and $\|\cdot\|$ by $\|\cdot\|_A$ in the proof of the Cauchy-Schwarz inequality.
1.7 Cholesky Factorization
An easy method of generating symmetric and positive definite matrices:

Proposition 1.9. If $P \in \mathbb{R}^{n\times n}$ is invertible, then $A = P^TP$ is symmetric and positive definite.

Proof. Matrix $A$ is symmetric since
$$A^T = (P^TP)^T = P^TP = A.$$
It is positive definite since
$$x^TAx = x^T(P^TP)x = (Px)^T(Px) = \|Px\|^2 \ge 0$$
for all $x \in \mathbb{R}^n$. Also if $x^TAx = 0$ then $\|Px\| = 0$ and so $Px = 0$, and hence $x = 0$ since $P$ is invertible.
We now prove the reverse direction.

Theorem 1.10 (Cholesky Factorisation). Let $A \in \mathbb{R}^{n\times n}$ be any symmetric positive definite matrix. Then there exists an invertible $P \in \mathbb{R}^{n\times n}$ such that $A = P^TP$. Furthermore, we can choose $P$ to be upper triangular with $P_{ii} > 0$, $i = 1 \to n$, in which case we say that $A = P^TP$ is a Cholesky Factorisation (Decomposition) of $A$.
Algorithm 2: Apply CGS with $\langle\cdot,\cdot\rangle_A$ to $\{v_i\}_{i=1}^n$
1: $w_1 = v_1$
2: $u_1 = w_1/\|w_1\|_A$
3: for $k = 2$ to $n$ do
4:   $w_k = v_k - \sum_{j=1}^{k-1}\langle v_k, u_j\rangle_A u_j$
5:   $u_k = w_k/\|w_k\|_A$
6: end for
Proof. Let $\{v_i\}_{i=1}^n$ be any $n$ linearly independent vectors in $\mathbb{R}^n$. Using the inner product induced by $A$, we apply Gram-Schmidt (with this inner product) to $\{v_i\}_{i=1}^n$ to get $\{u_i\}_{i=1}^n$. Let $U = \begin{pmatrix} u_1 & \cdots & u_n \end{pmatrix} \in \mathbb{R}^{n\times n}$. Then (this is a proof of Lemma 1.1 generalized)
$$[U^T(AU)]_{ij} = u_i^TAu_j = \langle u_i, u_j\rangle_A = \delta_{ij}$$
for $i, j = 1 \to n$. So $U^TAU = I^{(n)}$.
Does $U^{-1}$ exist? This requires $\{u_i\}_{i=1}^n$ to be linearly independent. Suppose there exists $c \in \mathbb{R}^n$ such that $\sum_{i=1}^n c_iu_i = 0$. Then
$$\sum_{i=1}^n c_iAu_i = A0 = 0, \qquad u_j^T\sum_{i=1}^n c_iAu_i = 0, \qquad \sum_{i=1}^n c_i\langle u_i, u_j\rangle_A = 0, \qquad c_j = 0$$
for $j = 1 \to n$, and so $c = 0$ and $\{u_j\}_{j=1}^n$ is linearly independent.
So $U^{-1}$ exists and
$$U^{-1}U = I^{(n)} = [I^{(n)}]^T = [U^{-1}U]^T = U^T(U^{-1})^T,$$
and therefore $(U^T)^{-1} = (U^{-1})^T$. We let $P = U^{-1}$ (so $P$ is invertible). Observe that
$$P^T = (U^{-1})^T = (U^T)^{-1}.$$
Therefore
$$P^TP = P^TI^{(n)}P = P^TU^TAUP = A.$$
To find $P$ upper triangular with $P_{ii} > 0$, we need to choose $\{v_i\}_{i=1}^n$ to be a particular basis for $\mathbb{R}^n$: for $i = 1 \to n$ let $v_i = e^{(n)}_i$ (where $(e^{(n)}_i)_j = \delta_{ij}$ for $i, j = 1 \to n$). Clearly, the matrix $U$ from CGS is upper triangular, since each $u_i$ is a linear combination of $e^{(n)}_1, \ldots, e^{(n)}_i$. To show that $U_{ii} > 0$, observe that $U_{ii} = (u_i)_i = (w_i/\|w_i\|_A)_i$ and that
$$w_i = e^{(n)}_i - \sum_{j=1}^{i-1}\langle e^{(n)}_i, u_j\rangle_A u_j.$$
Since $(u_j)_k = 0$ for $k > j$, we have that $(w_i)_i = (e^{(n)}_i)_i = 1$. Hence $U$ is upper triangular with $U_{ii} > 0$.
Now choose $P$ to be $U^{-1}$. Then
$$UP = I^{(n)}, \qquad U\begin{pmatrix} p_1 & \cdots & p_n \end{pmatrix} = \begin{pmatrix} e^{(n)}_1 & \cdots & e^{(n)}_n \end{pmatrix}.$$
For each $i = 1 \to n$ solve $Up_i = e^{(n)}_i$: clearly $(p_i)_j = 0$ for $j = i+1 \to n$ and $(p_i)_i = 1/U_{ii} > 0$, so $P$ is upper triangular with $P_{ii} > 0$ for $i = 1 \to n$.
Proposition 1.11. Let $A \in \mathbb{R}^{n\times n}$ be symmetric positive definite. Then $A_{kk} > 0$ for $k = 1 \to n$ and $|A_{jk}| < (A_{jj})^{1/2}(A_{kk})^{1/2}$ for $j, k = 1 \to n$, $j \ne k$.

Proof. Since $A$ is symmetric positive definite, by the previous theorem there exists an invertible $P$ such that $A = P^TP$. Let
$$P = \begin{pmatrix} p_1 & \cdots & p_n \end{pmatrix}.$$
Then
$$A_{jk} = p_j^Tp_k = \langle p_j, p_k\rangle$$
for $j, k = 1 \to n$. So $A_{kk} = \|p_k\|^2 > 0$ as $p_k \ne 0$ ($P$ is invertible and so $\{p_i\}_{i=1}^n$ is linearly independent).
Also
$$|A_{jk}| = |\langle p_j, p_k\rangle| < \|p_j\|\|p_k\| = (A_{jj})^{1/2}(A_{kk})^{1/2}$$
by Cauchy-Schwarz (strict inequality as $p_j$ and $p_k$ are linearly independent).
Computing Cholesky Decomposition

Given $A$ symmetric positive definite, we can find $L = P^T$ lower triangular with $L_{ii} > 0$ such that $A = LL^T$ by applying CGS with $\langle\cdot,\cdot\rangle_A$ to $\{e_i\}_{i=1}^n$ to get $\{u_i\}_{i=1}^n$ and putting $P = U^{-1} = [u_1, \ldots, u_n]^{-1}$.
There is an easier way. Let $L = [l_1, \ldots, l_n] \in \mathbb{R}^{n\times n}$ and $A = LL^T$. Then
$$A_{ij} = \sum_{k=1}^n L_{ik}(L^T)_{kj} = \sum_{k=1}^n (l_k)_i(l_k)_j. \qquad (\dagger)$$
Also
$$(l_kl_k^T)_{ij} = (l_k)_i(l_k^T)_j = (l_k)_i(l_k)_j.$$
So from $(\dagger)$ we get
$$A_{ij} = \sum_{k=1}^n (l_kl_k^T)_{ij}, \qquad A = \sum_{k=1}^n l_kl_k^T.$$
Example. For $n = 3$, find a Cholesky Decomposition of
$$A = \begin{pmatrix} 2 & -1 & 0 \\ -1 & \frac52 & -1 \\ 0 & -1 & \frac52 \end{pmatrix},$$
i.e. find lower triangular $L$, $L_{ii} > 0$, $i = 1 \to n$, such that $A = LL^T$.
We need to check that $A$ is symmetric (clear) and positive definite (it is good to verify the conditions from Proposition 1.11). Take arbitrary $x \in \mathbb{R}^3$, $x \ne 0$. Firstly,
$$x^TAx = \sum_{i=1}^3\sum_{j=1}^3 A_{ij}x_ix_j = 2x_1^2 + \frac52 x_2^2 + \frac52 x_3^2 - 2x_1x_2 - 2x_2x_3 \ge 2x_1^2 + \frac52 x_2^2 + \frac52 x_3^2 - (x_1^2 + x_2^2) - (x_2^2 + x_3^2) = x_1^2 + \frac12 x_2^2 + \frac32 x_3^2 > 0.$$
So let $L = [l_1, l_2, l_3]$ be lower triangular. Then
$$A = LL^T = \sum_{k=1}^3 l_kl_k^T = l_1l_1^T + l_2l_2^T + l_3l_3^T.$$
Since $L$ is lower triangular,
$$l_1l_1^T = \begin{pmatrix} \cdot & \cdot & \cdot \\ \cdot & \cdot & \cdot \\ \cdot & \cdot & \cdot \end{pmatrix}, \qquad l_2l_2^T = \begin{pmatrix} 0 & 0 & 0 \\ 0 & a^2 & ab \\ 0 & ab & b^2 \end{pmatrix}, \qquad l_3l_3^T = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & x \end{pmatrix}.$$
Therefore the first column of $A$ is generated by $l_1$ alone, i.e.
$$l_1 = \frac{Ae_1}{\sqrt{A_{11}}} = \frac{Ae_1}{\|e_1\|_A},$$
since $(l_1)_1(l_1)_1 = 2$, $(l_1)_1(l_1)_2 = -1$, $(l_1)_1(l_1)_3 = 0$. Thus $l_1 = \frac{1}{\sqrt2}(2, -1, 0)^T$, a multiple of the first column of $A$.
Define $A^{(1)}$ so that $A^{(1)} = l_2l_2^T + l_3l_3^T$:
$$A^{(1)} = A - l_1l_1^T = A - \begin{pmatrix} 2 & -1 & 0 \\ -1 & \frac12 & 0 \\ 0 & 0 & 0 \end{pmatrix} = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 2 & -1 \\ 0 & -1 & \frac52 \end{pmatrix}.$$
By the same reasoning $l_2 = \frac{1}{\sqrt2}(0, 2, -1)^T$, a multiple of the second column of $A^{(1)}$.
Define $A^{(2)}$ so that $A^{(2)} = l_3l_3^T$:
$$A^{(2)} = A^{(1)} - l_2l_2^T = A^{(1)} - \begin{pmatrix} 0 & 0 & 0 \\ 0 & 2 & -1 \\ 0 & -1 & \frac12 \end{pmatrix} = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 2 \end{pmatrix},$$
and so $l_3 = \frac{1}{\sqrt2}(0, 0, 2)^T$, a multiple of the third column of $A^{(2)}$.
Putting these together gives
$$L = \frac{1}{\sqrt2}\begin{pmatrix} 2 & 0 & 0 \\ -1 & 2 & 0 \\ 0 & -1 & 2 \end{pmatrix}.$$
Now consider the above constructive algorithm in the general case, i.e. $A \in \mathbb{R}^{n\times n}$ symmetric positive definite. Since $A_{11} > 0$, we can start the algorithm by defining
$$l_1 = \frac{Ae_1}{\sqrt{A_{11}}}.$$
Then $A^{(1)} = A - l_1l_1^T$ is symmetric (since $A$ and $l_1l_1^T$ are symmetric) and has the form
$$A^{(1)} = \begin{pmatrix} 0 & 0 & \cdots & 0 \\ 0 & & & \\ \vdots & & B & \\ 0 & & & \end{pmatrix}$$
with $B$ symmetric. To continue, we need to show that $B$ is positive definite, and so $B_{kk} > 0$.

Theorem 1.12. The matrix $B \in \mathbb{R}^{(n-1)\times(n-1)}$ defined above is positive definite.

Proof. We need to show that $u^TBu > 0$ for all $u \in \mathbb{R}^{n-1}$, $u \ne 0$. Take $u \in \mathbb{R}^{n-1}$, $u \ne 0$. Construct $v = \begin{pmatrix} 0 \\ u \end{pmatrix} \in \mathbb{R}^n$ (hence $v \ne 0$); $e_1^Tv = 0$ means that $e_1$ and $v$ are linearly independent. Then
$$A^{(1)} = A - \frac{(Ae_1)(Ae_1)^T}{\|e_1\|_A^2}, \qquad v^TA^{(1)}v = u^TBu.$$
So
$$u^TBu = v^TAv - \frac{(e_1^TAv)^2}{\|e_1\|_A^2} = \frac{\|v\|_A^2\|e_1\|_A^2 - [\langle e_1, v\rangle_A]^2}{\|e_1\|_A^2}.$$
By Cauchy-Schwarz, $|\langle e_1, v\rangle_A| < \|e_1\|_A\|v\|_A$. Hence $u^TBu > 0$.

Also $B_{11} > 0$, and so $A^{(1)}_{22} > 0$; the procedure can continue.
Application of Cholesky Decomposition

Given $A \in \mathbb{R}^{n\times n}$ symmetric positive definite, we can find $L$ lower triangular with $L_{ii} > 0$ such that $A = LL^T$. To solve $Ax = b$ for given $b \in \mathbb{R}^n$: we get
$$LL^Tx = b,$$
and let $z = L^Tx$. Solve $Lz = b$ by forward substitution:
$$z_1 = b_1/L_{11}, \qquad z_k = \left(b_k - \sum_{j=1}^{k-1}L_{kj}z_j\right)/L_{kk}$$
for $k = 2 \to n$. Having $z$, solve $L^Tx = z$ by backward substitution.
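The following sketch (our own, not from the notes) implements the rank-one construction of $L$ from the example above together with the two triangular solves.

```python
import numpy as np

def cholesky_solve(A, b):
    """Solve Ax = b for A symmetric positive definite via A = L L^T.

    Uses the outer-product construction from the text:
    l_k is a multiple of the k-th column of A^(k-1).
    """
    n = len(b)
    A = A.astype(float).copy()
    L = np.zeros((n, n))
    for k in range(n):
        L[:, k] = A[:, k] / np.sqrt(A[k, k])   # l_k = A^(k-1) e_k / sqrt(A_kk)
        A -= np.outer(L[:, k], L[:, k])        # A^(k) = A^(k-1) - l_k l_k^T
    z = np.zeros(n)                            # forward substitution: L z = b
    for k in range(n):
        z[k] = (b[k] - L[k, :k] @ z[:k]) / L[k, k]
    x = np.zeros(n)                            # backward substitution: L^T x = z
    for k in reversed(range(n)):
        x[k] = (z[k] - L[k + 1:, k] @ x[k + 1:]) / L[k, k]
    return x

A = np.array([[2.0, -1.0, 0.0], [-1.0, 2.5, -1.0], [0.0, -1.0, 2.5]])
b = np.array([1.0, 1.0, 1.0])
print(np.allclose(A @ cholesky_solve(A, b), b))   # True
```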
1.8 Least Square Problems
Example. Take a pendulum with length $l$, measure the period $T$ and estimate $g$ (the acceleration due to gravity). We have
$$L = \sqrt{l}, \qquad C = \frac{2\pi}{\sqrt{g}}, \qquad CL = T.$$
Do $m$ experiments to get
$$LC = T$$
with $L, T \in \mathbb{R}^m$. Plot the data ($T_i$ against $L_i$) and fit a straight line through the data. Choose $C$ to minimize the sum of squares of the errors, i.e. such that
$$S = \sum_{i=1}^m (T_i - CL_i)^2 = \|T - CL\|^2 = \langle T - CL, T - CL\rangle = \|T\|^2 - 2C\langle L, T\rangle + C^2\|L\|^2$$
is minimal. The derivative
$$\frac{dS}{dC} = -2\langle L, T\rangle + 2C\|L\|^2$$
equals 0 iff $C = \frac{\langle L, T\rangle}{\|L\|^2}$. Check the second derivative:
$$\frac{d^2S}{dC^2} = 2\|L\|^2 > 0.$$
Take $C^\star = \frac{\langle L, T\rangle}{\|L\|^2}$. Then
$$\langle T - C^\star L, L\rangle = \langle T, L\rangle - C^\star\|L\|^2 = 0,$$
i.e. the choice of $C^\star$ makes $T - C^\star L$ perpendicular to $L$.
1.8.1 General Least Squares Case

Given $A \in \mathbb{R}^{m\times n}$ ($m \ge n$) and $b \in \mathbb{R}^m$, find $x \in \mathbb{R}^n$ such that $Ax = b$. For $m > n$ there is in general no solution, as the system is overdetermined. We are concerned with finding the $x^\star \in \mathbb{R}^n$ which minimizes $\|Ax - b\|$ over $x$. Let
$$Q(x) = \|Ax - b\|^2 = \langle Ax - b, Ax - b\rangle = (Ax - b)^T(Ax - b) = (x^TA^T - b^T)(Ax - b) = x^TA^TAx - 2b^TAx + \|b\|^2 = x^TGx - 2\mu^Tx + \|b\|^2,$$
where
$$G = A^TA \in \mathbb{R}^{n\times n}, \qquad \mu = A^Tb \in \mathbb{R}^n.$$
Note that $G$ is symmetric. Taking derivatives of $Q$ gives
$$\nabla Q(x) = 2(Gx - \mu), \qquad D^2Q(x) = 2G.$$
Theorem 1.13. Let $A \in \mathbb{R}^{m\times n}$ ($m \ge n$) have linearly independent columns and let $b \in \mathbb{R}^m$. Then $A^TA \in \mathbb{R}^{n\times n}$ is symmetric positive definite. Moreover, the $x^\star \in \mathbb{R}^n$ solving $A^TAx^\star = A^Tb$ is the unique minimum of $Q(x) = \|Ax - b\|^2$ over $x \in \mathbb{R}^n$.

Note. The equations $A^TAx^\star = A^Tb$ are called the normal equations and $x^\star$ is called the least squares solution of $Ax = b$.

Proof. Matrix $A^TA$ is clearly symmetric, as shown above. Let $A = \begin{pmatrix} a_1 & \cdots & a_n \end{pmatrix}$, $a_i \in \mathbb{R}^m$, $\{a_i\}_{i=1}^n$ linearly independent. Then for any $c \in \mathbb{R}^n$
$$c^TA^TAc = (Ac)^TAc = \|Ac\|^2 \ge 0$$
with equality iff $Ac = 0$, i.e. when $c = 0$ since $\{a_i\}_{i=1}^n$ is linearly independent. Hence $A^TA$ is positive definite.
To find the minimum of $Q(x)$, find $x^\star$ such that $\nabla Q(x^\star) = 0$ and $D^2Q(x^\star)$ is positive definite. We get
$$\nabla Q(x) = 2(Gx - \mu) = 2(A^TAx - A^Tb), \qquad D^2Q(x) = 2G = 2A^TA.$$
Therefore $x^\star$ has to solve $A^TAx = A^Tb$. As $A^TA$ is positive definite, $(A^TA)^{-1}$ exists. Hence there exists a unique $x^\star$ solving $A^TAx = A^Tb$. As $D^2Q(x^\star)$ is positive definite, $x^\star$ is the unique global minimum of $Q(x) = \|Ax - b\|^2$.
Example. For $m = 3$, $n = 2$,
$$A = \begin{pmatrix} 3 & 65 \\ 4 & 0 \\ 12 & 13 \end{pmatrix}, \qquad b = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}.$$
It is obvious that no $x \in \mathbb{R}^2$ solves $Ax = b$. Find the least squares solution $x^\star \in \mathbb{R}^2$: solve the normal equations
$$A^TAx^\star = A^Tb$$
to get $x^\star = \begin{pmatrix} 0.090587\ldots \\ 0.010515\ldots \end{pmatrix}$.

In practice, it is not a good idea to solve the normal equations, since the matrix $A^TA$ is generally badly conditioned. A matrix $B \in \mathbb{R}^{n\times n}$ is ill-conditioned if small changes to $b$ lead to large changes in the solution of $Bx = b$, i.e. if in
$$B(x + \delta x) = b + \delta b,$$
$\delta x$ is large even for small $\delta b$.
We now find $x^\star$ using the QR approach. Using a sequence of Givens rotations, we can find $G$ orthogonal such that $GA = R$ upper triangular with $R_{ii} > 0$. Then $A = G^TR$ and
$$Rx = Gb$$
with
$$Gb = \begin{pmatrix} (Gb)_1 \\ \vdots \\ (Gb)_n \\ 0 \\ \vdots \\ 0 \end{pmatrix} + \begin{pmatrix} 0 \\ \vdots \\ 0 \\ (Gb)_{n+1} \\ \vdots \\ (Gb)_m \end{pmatrix} = \alpha + \beta.$$
If $\beta = 0$, then there exists a unique solution to $Rx = \alpha = Gb$, so there exists a unique solution $x$ to $Ax = b$.
If $\beta \ne 0$ then $Rx = Gb$ is an inconsistent system and has no solution $x$, and neither does $Ax = b$. However, we can solve $Rx^\star = \alpha$. We claim that this $x^\star \in \mathbb{R}^n$ is the least squares solution of $Ax = b$. Also $\|\beta\| = \|Ax^\star - b\|$.
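A quick numerical check of the example above (our own sketch): NumPy's lstsq routine solves the least squares problem via an SVD-based LAPACK driver, an orthogonal-factorization route in the same spirit as the QR approach, while the normal equations are formed here only for comparison.

```python
import numpy as np

A = np.array([[3.0, 65.0], [4.0, 0.0], [12.0, 13.0]])
b = np.ones(3)

x_normal = np.linalg.solve(A.T @ A, A.T @ b)    # normal equations (avoid in practice)
x_lsq, *_ = np.linalg.lstsq(A, b, rcond=None)   # SVD-based least squares

print(x_normal)                     # approx [0.090587, 0.010515], as in the text
print(np.allclose(x_normal, x_lsq)) # True for this well-behaved small example
```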
1.9 A more abstract approach
A more abstract definition of the inner product:

Definition. Let $V$ be a real vector space. An inner product on $V \times V$ is a function $\langle\cdot,\cdot\rangle: V \times V \to \mathbb{R}$ such that, for all $u, v, w \in V$, $\lambda, \mu \in \mathbb{R}$,

(1) $\langle\lambda u + \mu v, w\rangle = \lambda\langle u, w\rangle + \mu\langle v, w\rangle$,

(2) $\langle u, v\rangle = \langle v, u\rangle$,

(3) $\langle u, u\rangle \ge 0$ with equality iff $u = 0$.

An inner product induces a norm $\|u\| = (\langle u, u\rangle)^{1/2}$ for all $u \in V$. This implies $\|u\| = 0$ iff $u = 0$.
Example. Let $V = C[a, b]$, the continuous functions over $[a, b]$. Let $w \in C[a, b]$ with $w(x) > 0$ for all $x \in [a, b]$. Define $\langle f, g\rangle = \int_a^b w(x)f(x)g(x)\,dx$. Clearly (1) and (2) hold. Also
$$\langle f, f\rangle = \int_a^b w(x)(f(x))^2\,dx \ge 0,$$
and $\langle f, f\rangle = 0$ implies $f = 0$.
Let $V$ be a real vector space with inner product $\langle\cdot,\cdot\rangle$. Let $U$ be a finite dimensional subspace of $V$ with basis $\{\varphi_i\}_{i=1}^n$. Given $v \in V$, find $u^\star \in U$ such that $\|v - u^\star\| \le \|v - u\|$ for all $u \in U$.

Example. Let $V = C[a, b]$ and $\langle f, g\rangle = \int_a^b f(x)g(x)\,dx$ (i.e. $w(x) = 1$). Let $U$ be the polynomials of degree $\le n - 1$ with basis $\varphi_i = x^{i-1}$.
We have $u \in U$ implies $u = \sum_{i=1}^n\lambda_i\varphi_i$ with $\lambda_i \in \mathbb{R}$. Also $u^\star \in U$ implies $u^\star = \sum_{i=1}^n\lambda_i^\star\varphi_i$ with $\lambda_i^\star \in \mathbb{R}$. Therefore
$$\|v - u^\star\|^2 \le \|v - u\|^2 \iff \left\|v - \sum_{i=1}^n\lambda_i^\star\varphi_i\right\|^2 \le \left\|v - \sum_{j=1}^n\lambda_j\varphi_j\right\|^2.$$
Let $E(\lambda) = \left\|v - \sum_{i=1}^n\lambda_i\varphi_i\right\|^2$. Now we have to find $\lambda^\star \in \mathbb{R}^n$ such that $E(\lambda^\star) \le E(\lambda)$ for all $\lambda \in \mathbb{R}^n$. We have
$$E(\lambda) = \left\langle v - \sum_{j=1}^n\lambda_j\varphi_j,\; v - \sum_{i=1}^n\lambda_i\varphi_i\right\rangle = \|v\|^2 - 2\sum_{i=1}^n\lambda_i\langle v, \varphi_i\rangle + \sum_{i=1}^n\sum_{j=1}^n\lambda_i\lambda_j\langle\varphi_i, \varphi_j\rangle.$$
Let $\mu \in \mathbb{R}^n$ where $\mu_i = \langle v, \varphi_i\rangle$. Let $G \in \mathbb{R}^{n\times n}$ where $G_{ij} = \langle\varphi_i, \varphi_j\rangle$. Now we have
$$E(\lambda) = \|v\|^2 - 2\mu^T\lambda + \lambda^TG\lambda, \qquad \nabla E(\lambda) = -2\mu + 2G\lambda, \qquad D^2E(\lambda) = 2G.$$
So $\lambda^\star$ minimises $E(\lambda)$ if $\nabla E(\lambda^\star) = 0$. This is equivalent to $G\lambda^\star = \mu$. The matrix $G$ is called the Gram matrix and depends on the basis for $U$. It is sometimes written as $G(\varphi_1, \ldots, \varphi_n)$.
Lemma 1.14. Let $\{\varphi_i\}_{i=1}^n$ be a basis of $U$. Let $G \in \mathbb{R}^{n\times n}$ be such that $G_{ij} = \langle\varphi_i, \varphi_j\rangle$. Then $G$ is positive definite.

Proof. Check that for any $\lambda \in \mathbb{R}^n$
$$\lambda^TG\lambda = \sum_{i=1}^n\sum_{j=1}^n\lambda_i\lambda_j\langle\varphi_i, \varphi_j\rangle = \left\langle\sum_{i=1}^n\lambda_i\varphi_i, \sum_{j=1}^n\lambda_j\varphi_j\right\rangle = \left\|\sum_{i=1}^n\lambda_i\varphi_i\right\|^2 \ge 0.$$
This equals zero only if $\sum_{i=1}^n\lambda_i\varphi_i = 0$. As the $\varphi_i$'s are linearly independent, this implies $\lambda = 0$. Therefore $\lambda^TG\lambda > 0$ for all $\lambda \ne 0$.
As $G$ is positive definite, we can deduce that $G^{-1}$ exists, and therefore there is a unique $\lambda^\star \in \mathbb{R}^n$ solving $G\lambda^\star = \mu$, i.e. $\nabla E(\lambda^\star) = 0$, and therefore $\lambda^\star$ is a global minimum of $E(\cdot)$.
Theorem 1.15 (Orthogonality Property). Finding the $\lambda^\star \in \mathbb{R}^n$ which minimises $E(\lambda)$ is equivalent to finding $u^\star = \sum_{i=1}^n\lambda_i^\star\varphi_i \in U$ such that $\langle v - u^\star, u\rangle = 0$ for all $u \in U$.

Proof. $G\lambda^\star = \mu$ implies that $\lambda^TG\lambda^\star = \lambda^T\mu$ for all $\lambda \in \mathbb{R}^n$. Conversely, taking $\lambda = e^{(n)}_i$ in $\lambda^TG\lambda^\star = \lambda^T\mu$ gives $(G\lambda^\star)_i = \mu_i$; repeating for $i = 1 \to n$ shows that $G\lambda^\star = \mu$ is equivalent to $\lambda^TG\lambda^\star = \lambda^T\mu$ for all $\lambda \in \mathbb{R}^n$. So
$$G\lambda^\star = \mu \iff \lambda^TG\lambda^\star = \lambda^T\mu \iff \sum_{i=1}^n\sum_{j=1}^n\lambda_iG_{ij}\lambda_j^\star = \sum_{i=1}^n\lambda_i\mu_i \iff \left\langle\sum_{i=1}^n\lambda_i\varphi_i, \sum_{j=1}^n\lambda_j^\star\varphi_j\right\rangle = \left\langle\sum_{i=1}^n\lambda_i\varphi_i, v\right\rangle \iff \langle u, u^\star\rangle = \langle u, v\rangle \iff \langle v - u^\star, u\rangle = 0,$$
where $u = \sum_{i=1}^n\lambda_i\varphi_i$ ranges over $U$.
Example. Let $V = C[0, 1]$, $\langle f, g\rangle = \int_0^1 f(x)g(x)\,dx$, and let $U = P_{n-1}$. Take $\varphi_i = x^{i-1}$. Given $v \in V$, find $u^\star = \sum_{i=1}^n\lambda_i^\star x^{i-1}$ such that
$$\|v - u^\star\| \le \|v - u\| \iff \|v - u^\star\|^2 \le \|v - u\|^2 \iff \int_0^1(v - u^\star)^2\,dx \le \int_0^1(v - u)^2\,dx$$
for all $u \in U$. We now have to solve the normal equations $G\lambda^\star = \mu$, where
$$\mu_i = \langle v, \varphi_i\rangle = \int_0^1 v(x)x^{i-1}\,dx, \qquad G_{ij} = \langle\varphi_i, \varphi_j\rangle = \int_0^1 x^{i-1}x^{j-1}\,dx = \int_0^1 x^{i+j-2}\,dx = \frac{1}{i+j-1}.$$
This gives the Hilbert matrix
$$G = \begin{pmatrix} 1 & \frac12 & \cdots & \frac1n \\ \frac12 & \frac13 & \cdots & \frac{1}{n+1} \\ \vdots & \vdots & \ddots & \vdots \\ \frac1n & \frac{1}{n+1} & \cdots & \frac{1}{2n-1} \end{pmatrix},$$
which is very badly conditioned, as the columns become nearly linearly dependent as $n \to \infty$. We need to change basis; we have two options:

1. We can use the Gram-Schmidt algorithm to change the basis to an orthonormal basis $\{\psi_i\}_{i=1}^n$ where $\langle\psi_i, \psi_j\rangle = \delta_{ij}$. This implies that $G = I$.

2. We can also create an orthogonal basis $\{\psi_i\}_{i=1}^n$ where $\langle\psi_i, \psi_j\rangle = 0$ for $i \ne j$. Now $G$ is diagonal and $G_{ii} = \|\psi_i\|^2 > 0$. We have $\lambda_i^\star = \frac{\mu_i}{\|\psi_i\|^2}$ and therefore
$$u^\star = \sum_{i=1}^n\frac{\mu_i}{\|\psi_i\|^2}\psi_i.$$
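The ill-conditioning is easy to see numerically; a short sketch (our own addition) prints the condition number of the Hilbert matrix for a few sizes.

```python
import numpy as np

# Condition number of the Hilbert matrix G_ij = 1/(i+j-1) for growing n:
# it grows extremely fast, which is why the monomial basis is a bad choice.
for n in (4, 8, 12):
    i = np.arange(1, n + 1)
    G = 1.0 / (i[:, None] + i[None, :] - 1)
    print(n, np.linalg.cond(G))
```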
Example. Let $V = \mathbb{R}^m$ and let $\langle a, b\rangle = a^Tb$. Let $U = \operatorname{Span}\{a_i\}_{i=1}^n$ with $n \le m$, so $\{a_i\}_{i=1}^n$ is a basis for $U$. Given $v \in \mathbb{R}^m$, we want to find $u^\star = \sum_{i=1}^n\lambda_i^\star a_i$ such that $\|v - u^\star\| \le \|v - u\|$ for all $u \in U$. We need to solve the normal equations $G\lambda^\star = \mu$, where
$$\mu_i = \langle v, a_i\rangle = a_i^Tv, \qquad G_{ij} = \langle a_i, a_j\rangle = a_i^Ta_j.$$
Let $A = \begin{pmatrix} a_1 & \cdots & a_n \end{pmatrix}$, so $A^TA = G$ and $\mu = A^Tv$. We recover
$$A^TA\lambda^\star = A^Tv,$$
the normal equations of the least squares problem $A\lambda = v$. But $A^TA$ is generally ill-conditioned, so we shouldn't solve these normal equations and should use the QR approach instead.
1.10 Orthogonal Polynomials
$V = C[a, b]$ and $\langle f, g\rangle = \int_a^b w(x)f(x)g(x)\,dx$, where $w$ is the weight function, $w \in C(a, b)$, such that $w \ge 0$ with possibly a finite number of zeroes. This is required for the integral to be well-defined:
$$|\langle f, g\rangle| = \left|\int_a^b w(x)f(x)g(x)\,dx\right| \le \int_a^b |w(x)f(x)g(x)|\,dx = \int_a^b w(x)|f(x)g(x)|\,dx \le \left[\int_a^b w(x)\,dx\right]\max_{a\le x\le b}|f(x)|\max_{a\le x\le b}|g(x)|.$$
Therefore $\langle\cdot,\cdot\rangle$ is well-defined if $\int_a^b w(x)\,dx < \infty$.

Let $U = P_n$ be the polynomials of degree $\le n$. The natural basis $\{x^i\}_{i=0}^n$ leads to an ill-conditioned Gram matrix. We will construct a new basis $\{\varphi_i\}_{i=0}^n$ for $P_n$, where $\varphi_j(x)$ is a monic polynomial of degree $j$, i.e. $\varphi_j(x) = x^j + \sum_{i=0}^{j-1}a_{ij}x^i$.
Theorem 1.16. Monic orthogonal polynomials $\varphi_j \in P_j$ satisfy the three-term recurrence relation
$$\varphi_{j+1}(x) = (x - a_j)\varphi_j(x) - b_j\varphi_{j-1}(x) \quad \text{for } j \ge 1,$$
where
$$a_j = \frac{\langle x\varphi_j, \varphi_j\rangle}{\|\varphi_j\|^2} \qquad \text{and} \qquad b_j = \frac{\|\varphi_j\|^2}{\|\varphi_{j-1}\|^2}.$$
Proof. Let $\varphi_j \in P_j$ be monic. This implies that
$$\varphi_{j+1}(x) - x\varphi_j(x) \in P_j,$$
so
$$\varphi_{j+1}(x) - x\varphi_j(x) = \sum_{k=0}^j b_kx^k = \sum_{k=0}^j c_k\varphi_k(x).$$
Now we need to find the $c_k$. We have
$$\left\langle\sum_{k=0}^j c_k\varphi_k(x), \varphi_i(x)\right\rangle = \langle\varphi_{j+1}(x) - x\varphi_j(x), \varphi_i(x)\rangle.$$
But $\varphi_j$ is orthogonal to $\varphi_k$ for $k = 0 \to j-1$, and therefore $\varphi_j$ is orthogonal to any $p \in P_{j-1}$, as $\{\varphi_k\}_{k=0}^{j-1}$ is a basis for $P_{j-1}$. Then for $i = 0 \to j$
$$c_i\|\varphi_i\|^2 = \langle\varphi_{j+1}, \varphi_i\rangle - \langle x\varphi_j, \varphi_i\rangle = -\langle\varphi_j, x\varphi_i\rangle.$$
We have $x\varphi_i \in P_{i+1}$ and hence $\langle\varphi_j, x\varphi_i\rangle = 0$ if $i \le j - 2$. Since $c_i\|\varphi_i\|^2 = -\langle\varphi_j, x\varphi_i\rangle$, we have $c_i = 0$ for $i = 0 \to j-2$. Hence
$$\varphi_{j+1}(x) - x\varphi_j(x) = c_{j-1}\varphi_{j-1}(x) + c_j\varphi_j(x),$$
which implies $\varphi_{j+1}(x) = (x + c_j)\varphi_j(x) + c_{j-1}\varphi_{j-1}(x)$.
We have
$$c_{j-1} = -\frac{\langle\varphi_j, x\varphi_{j-1}\rangle}{\|\varphi_{j-1}\|^2}, \qquad c_j = -\frac{\langle\varphi_j, x\varphi_j\rangle}{\|\varphi_j\|^2}.$$
Now note that
$$\langle\varphi_j, x\varphi_{j-1}\rangle = \underbrace{\langle\varphi_j, x\varphi_{j-1} - \varphi_j\rangle}_{=0} + \langle\varphi_j, \varphi_j\rangle,$$
since $x\varphi_{j-1} - \varphi_j \in P_{j-1}$ (both are monic of degree $j$). Therefore $c_{j-1} = -\frac{\|\varphi_j\|^2}{\|\varphi_{j-1}\|^2}$. Set $b_j = -c_{j-1}$ and $a_j = -c_j$.

To apply this theorem we need $\varphi_0(x) = 1$ and $\varphi_1(x) = x - a_0$, where $a_0 \in \mathbb{R}$ must be chosen such that $\langle\varphi_1, \varphi_0\rangle = 0$, i.e.
$$\langle x - a_0, 1\rangle = 0 \iff a_0\langle 1, 1\rangle = \langle x, 1\rangle \iff a_0 = \frac{\langle x, 1\rangle}{\|1\|^2} = \frac{\langle x\varphi_0, \varphi_0\rangle}{\|\varphi_0\|^2}.$$
We can use the theorem for $j \ge 0$ by setting $\varphi_{-1}(x) = 0$. Thus
$$\varphi_{j+1}(x) = (x - a_j)\varphi_j(x) - b_j\varphi_{j-1}(x)$$
for $j \ge 0$, where
$$a_j = \frac{\langle x\varphi_j, \varphi_j\rangle}{\|\varphi_j\|^2}, \qquad b_j = \frac{\|\varphi_j\|^2}{\|\varphi_{j-1}\|^2}, \qquad \varphi_0(x) = 1, \qquad \varphi_{-1}(x) = 0.$$
Remark 1.2. Recall that $g(x)$ is even iff $g(-x) = g(x)$, in which case $\int_{-c}^c g(x)\,dx = 2\int_0^c g(x)\,dx$, and $g(x)$ is odd iff $g(-x) = -g(x)$, in which case $\int_{-c}^c g(x)\,dx = 0$.
function Example. Let f, g =
1−1 f (x)g(x)dx be our inner product (i.e. w(x) = 1 ). We
shall apply our method with j = 0 to this case. We have φ0(x) = 0 and φ1(x) = x−a0
which implies φ1(x) = x. Also
a1 =xφ0, φ0
φ02
=
1−1 xdx
1−1 1dx
= 0
7/27/2019 m2n1
http://slidepdf.com/reader/full/m2n1 33/50
1.10. ORTHOGONAL POLYNOMIALS 29
(since x is an odd function). Using the method with j = 1 we deduce φ2(x) =(x − a1)φ1(x) − b1φ0(x) = x2 − a1x − b1. Then
a1 =xφ1, φ1
φ12
=
−1−1 x3dx
φ12
= 0,
b1 =φ12
φ02
=
−1−1 x2dx
−1
−1 1dx
=1
3.
So φ2(x) = x2 − 13 and we can continue in this matter.
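As a sketch (our own, with the inner products evaluated via exact polynomial integration rather than by hand), the recurrence can be run mechanically:

```python
import numpy as np
from numpy.polynomial import polynomial as P

def inner(p, q):
    """<p, q> = integral of p*q over [-1, 1]; coefficient arrays, low order first."""
    integ = P.polyint(P.polymul(p, q))
    return P.polyval(1.0, integ) - P.polyval(-1.0, integ)

phi_prev, phi = np.array([0.0]), np.array([1.0])   # phi_{-1} = 0, phi_0 = 1
for j in range(3):
    xphi = P.polymul([0.0, 1.0], phi)              # x * phi_j
    a = inner(xphi, phi) / inner(phi, phi)
    b = 0.0 if j == 0 else inner(phi, phi) / inner(phi_prev, phi_prev)
    nxt = P.polysub(xphi, P.polyadd(a * phi, b * phi_prev))
    phi_prev, phi = phi, nxt
    print(j + 1, phi)   # phi_1 = x, phi_2 = x^2 - 1/3, phi_3 = x^3 - (3/5)x
```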
Recall now our original problem. Given $f \in C[a, b]$ we wish to find $p_n^\star \in P_n$ such that $\|f - p_n^\star\| \le \|f - p_n\|$ for all $p_n \in P_n$.
We wish to find an orthogonal basis $\{\varphi_j\}_{j=0}^n$ for $P_n$. Then $p_n^\star = \sum_{j=0}^n\lambda_j^\star\varphi_j(x)$. We solve the normal equations $G\lambda^\star = \mu$ with $G \in \mathbb{R}^{(n+1)\times(n+1)}$ and, for $i, j = 0 \to n$,
$$G_{ij} = \langle\varphi_i, \varphi_j\rangle = \begin{cases} 0 & \text{if } i \ne j, \\ \|\varphi_i\|^2 & \text{if } i = j, \end{cases} \qquad \mu_i = \langle f, \varphi_i\rangle, \qquad \lambda_i^\star = \frac{\mu_i}{G_{ii}} = \frac{\mu_i}{\|\varphi_i\|^2}.$$
This implies that
$$p_n^\star(x) = \sum_{j=0}^n\frac{\langle f, \varphi_j\rangle}{\|\varphi_j\|^2}\varphi_j(x)$$
is the best approximation to $f$.
Example. Show that the polynomials $T_k(x) = \cos(k\cos^{-1}x)$ for $-1 \le x \le 1$ are orthogonal with respect to the inner product $\langle f, g\rangle = \int_{-1}^1(1 - x^2)^{-1/2}f(x)g(x)\,dx$. Does $T_k(x)$ belong to $P_k$?
$$T_0(x) = \cos 0 = 1, \qquad T_1(x) = \cos(\cos^{-1}x) = x.$$
Let's use the change of variable $\theta = \cos^{-1}x$, so $x = \cos\theta$. Now we can write $T_k(x) = \cos k\theta$. Using $\cos((k+1)\theta) + \cos((k-1)\theta) = 2\cos k\theta\cos\theta$, we can deduce the following:
$$T_{k+1}(x) + T_{k-1}(x) = 2xT_k(x) \implies T_{k+1}(x) = 2xT_k(x) - T_{k-1}(x).$$
We have
$$T_2(x) = 2xT_1(x) - T_0(x) = 2x^2 - 1, \qquad T_3(x) = 2xT_2(x) - T_1(x) = 2^2x^3 - 3x.$$
By induction we have $T_k(x) \in P_k$; the coefficient of $x^k$ is $2^{k-1}$. Using $x = \cos\theta$,
$$\langle T_k, T_j\rangle = \int_{-1}^1(1 - x^2)^{-1/2}T_k(x)T_j(x)\,dx = \int_\pi^0(\sin\theta)^{-1}\cos(k\theta)\cos(j\theta)(-\sin\theta)\,d\theta = \int_0^\pi\cos(k\theta)\cos(j\theta)\,d\theta = \frac12\int_0^\pi\left[\cos((j+k)\theta) + \cos((j-k)\theta)\right]d\theta = \begin{cases} 0 & \text{if } j \ne k, \\ \frac\pi2 & \text{if } j = k \ne 0, \\ \pi & \text{if } j = k = 0. \end{cases}$$
We call the $T_k(x)$ the Chebyshev polynomials.
Chapter 2
Polynomial interpolation
Given $\{(z_j, f_j)\}_{j=0}^n$, $z_j, f_j \in \mathbb{C}$, $z_j$ distinct, $j = 0 \to n$, we want to find a polynomial $p_n(z) \in P_n$ such that $p_n(z_j) = f_j$ for $j = 0 \to n$. Call such a $p_n$ the interpolating polynomial. To prove that this polynomial exists:
Lemma 2.1 (Lagrange Basis Functions). Let
$$l_j(z) = \prod_{k=0,\,k\ne j}^n\frac{(z - z_k)}{(z_j - z_k)}$$
for $j = 0 \to n$. Then $l_j(z) \in P_n$ and $l_j(z_r) = \delta_{jr}$ for $j, r = 0 \to n$.

Proof. For $j = 0 \to n$, $l_j(z)$ is a product of $n$ factors of the form $\frac{z - z_k}{z_j - z_k}$, and therefore $l_j(z) \in P_n$. We have
$$l_j(z_r) = \prod_{k=0,\,k\ne j}^n\frac{z_r - z_k}{z_j - z_k}$$
for $r = 0 \to n$. If $r = j$, then clearly $l_j(z_r) = 1$. Otherwise the factor with $k = r$ gives $\frac{z_r - z_k}{z_j - z_k} = 0$, and so $l_j(z_r) = 0$. Hence $l_j(z_r) = \delta_{jr}$.
Lemma 2.2. The interpolating polynomial $p_n(z) \in P_n$ for data $\{(z_j, f_j)\}_{j=0}^n$ with $z_j$ distinct is
$$p_n(z) = \sum_{j=0}^n f_jl_j(z).$$

Note. Call $p_n$ in this form the Lagrange form of the interpolating polynomial.

Proof. We have $p_n(z) \in P_n$ since each $l_j(z) \in P_n$. Also, by the previous lemma, for $r = 0 \to n$,
$$p_n(z_r) = \sum_{j=0}^n f_jl_j(z_r) = \sum_{j=0}^n f_j\delta_{jr} = f_r.$$
To prove the uniqueness of the interpolating polynomial:

Theorem 2.3 (Fundamental Theorem of Algebra). Let $p_n(z) = a_0 + a_1z + \cdots + a_nz^n \in P_n$ where $a_i \in \mathbb{C}$. Then $p_n(z)$ has at most $n$ distinct roots in $\mathbb{C}$ unless $a_i = 0$ for $i = 0 \to n$.

Lemma 2.4. Given $\{(z_j, f_j)\}_{j=0}^n$, $z_j$ distinct, there exists a unique interpolating polynomial $p_n(z) \in P_n$.

Proof. Assume the contrary, i.e. there exists $q_n \in P_n$, $q_n \ne p_n$, such that $p_n(z_j) = q_n(z_j) = f_j$ for $j = 0 \to n$. Consider the polynomial $(p_n - q_n) \in P_n$. Then
$$(p_n - q_n)(z_j) = p_n(z_j) - q_n(z_j) = 0$$
for $j = 0 \to n$. Hence $(p_n - q_n)$ has $n + 1$ distinct roots and therefore, by the Fundamental Theorem of Algebra, is identically 0, i.e. $p_n = q_n$. Hence $p_n$ is unique.
Example (of interpolating polynomial). For $n = 2$: find $p_2 \in P_2$ such that $p_2(0) = a$, $p_2(1) = b$ and $p_2(4) = c$. We get
$$l_0(z) = \frac{(z - z_1)(z - z_2)}{(z_0 - z_1)(z_0 - z_2)} = \frac{(z - 1)(z - 4)}{(0 - 1)(0 - 4)} = \frac14(z^2 - 5z + 4),$$
$$l_1(z) = \frac{(z - 0)(z - 4)}{(1 - 0)(1 - 4)} = -\frac13(z^2 - 4z),$$
$$l_2(z) = \frac{(z - 0)(z - 1)}{(4 - 0)(4 - 1)} = \frac1{12}(z^2 - z).$$
Hence
$$p_2(z) = al_0(z) + bl_1(z) + cl_2(z) = \left(\frac a4 - \frac b3 + \frac c{12}\right)z^2 - \left(\frac{5a}4 - \frac{4b}3 + \frac c{12}\right)z + a$$
in Lagrange form.
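A direct transcription of the Lagrange form into code (our own sketch; not an efficient evaluation scheme):

```python
def lagrange_eval(zs, fs, z):
    """Evaluate p_n(z) = sum_j f_j l_j(z) for distinct nodes zs."""
    total = 0.0
    for j, (zj, fj) in enumerate(zip(zs, fs)):
        lj = 1.0
        for k, zk in enumerate(zs):
            if k != j:
                lj *= (z - zk) / (zj - zk)   # build l_j(z) factor by factor
        total += fj * lj
    return total

# the example above with (a, b, c) = (1, 2, 3): values reproduce exactly
print([lagrange_eval([0.0, 1.0, 4.0], [1.0, 2.0, 3.0], z) for z in (0.0, 1.0, 4.0)])
```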
We are interested in finding the coefficients of the interpolating polynomial in the canonical form
$$p_n(z) = \sum_{k=0}^n a_kz^k.$$
Consider the equations
$$p_n(z_j) = \sum_{k=0}^n a_kz_j^k = f_j$$
for $j = 0 \to n$. We get a system of equations
$$\underbrace{\begin{pmatrix} 1 & z_0 & z_0^2 & \cdots & z_0^n \\ 1 & z_1 & z_1^2 & \cdots & z_1^n \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & z_n & z_n^2 & \cdots & z_n^n \end{pmatrix}}_{V}\begin{pmatrix} a_0 \\ \vdots \\ a_n \end{pmatrix} = \begin{pmatrix} f_0 \\ \vdots \\ f_n \end{pmatrix}.$$
Call $V$ the Vandermonde matrix. So we need to solve $Va = f$. In general, $V$ is ill-conditioned. With the canonical basis $\{z^k\}_{k=0}^n$ we must solve $Va = f$; with the Lagrange basis $\{l_k(z)\}_{k=0}^n$ the system becomes $Ia = f$. However, the Lagrange basis has to be constructed.
Assume we have found $p_{n-1} \in P_{n-1}$ interpolating $\{(z_j, f_j)\}_{j=0}^{n-1}$ and are given a new data point $(z_n, f_n)$. One cannot reuse $p_{n-1}$ to compute $p_n$ in Lagrange form, since it is necessary to compute a new Lagrange basis for $P_n$.
We now look for an alternative construction. If $p_{n-1} \in P_{n-1}$ is such that $p_{n-1}(z_j) = f_j$ for $j = 0 \to n-1$, let $p_n \in P_n$ be such that $p_n(z_j) = f_j$ for $j = 0 \to n$ and
$$p_n(z) = p_{n-1}(z) + c\prod_{k=0}^{n-1}(z - z_k).$$
Clearly $p_n(z_j) = p_{n-1}(z_j) = f_j$ for $j = 0 \to n-1$. Choose $c \in \mathbb{C}$ such that
$$p_n(z_n) = p_{n-1}(z_n) + c\prod_{k=0}^{n-1}(z_n - z_k) = f_n,$$
that is,
$$c = \frac{f_n - p_{n-1}(z_n)}{\prod_{k=0}^{n-1}(z_n - z_k)}.$$
Therefore $c$ depends on $\{(z_j, f_j)\}_{j=0}^n$. We will use the notation $c = f[z_0, z_1, \ldots, z_n]$, so that
$$p_n(z) = p_{n-1}(z) + f[z_0, z_1, \ldots, z_n]\prod_{k=0}^{n-1}(z - z_k).$$
That is, the coefficient of $z^n$ in $p_n(z)$ is $f[z_0, z_1, \ldots, z_n]$.
Note that since the $p_n$ such that $p_n(z_j) = f_j$, $j = 0 \to n$, is unique,
$$f[z_{\pi(0)}, \ldots, z_{\pi(n)}] = f[z_0, \ldots, z_n]$$
for any permutation $\pi$ of $\{0, 1, \ldots, n\}$.
Lemma 2.5. For $\{(z_j, f_j)\}_{j=0}^n$, $z_j, f_j \in \mathbb{C}$, $z_j$ distinct,
$$f[z_0, z_1, \ldots, z_n] = \sum_{j=0}^n\frac{f_j}{\prod_{k=0,\,k\ne j}^n(z_j - z_k)}.$$
Furthermore, if $f_j = f(z_j)$, $j = 0 \to n$, for some function $f(z)$, then $f[z_0, \ldots, z_n] = 0$ if $f \in P_{n-1}$.

Proof. Compare the coefficient of $z^n$ in the Lagrange form of $p_n$ with
$$p_n(z) = p_{n-1}(z) + f[z_0, \ldots, z_n]\prod_{k=0}^{n-1}(z - z_k):$$
$$p_n(z) = \sum_{j=0}^n f_j\prod_{k=0,\,k\ne j}^n\frac{(z - z_k)}{(z_j - z_k)} = \sum_{j=0}^n f_j\frac{z^n + \cdots}{\prod_{k=0,\,k\ne j}^n(z_j - z_k)}.$$
Clearly the coefficient of $z^n$ in the Lagrange form is
$$\sum_{j=0}^n\frac{f_j}{\prod_{k=0,\,k\ne j}^n(z_j - z_k)} = f[z_0, \ldots, z_n].$$
If $f_j = f(z_j)$ for some $f \in P_{n-1}$ then $p_n = f \in P_{n-1}$, as the interpolating polynomial is unique. Therefore the leading coefficient of $p_n$, which is $f[z_0, \ldots, z_n]$, is 0.
Note that
$$p_n(z) = p_{n-1}(z) + f[z_0, \ldots, z_n]\prod_{k=0}^{n-1}(z - z_k),$$
$$p_{n-1}(z) = p_{n-2}(z) + f[z_0, \ldots, z_{n-1}]\prod_{k=0}^{n-2}(z - z_k),$$
$$\vdots$$
$$p_1(z) = p_0(z) + f[z_0, z_1](z - z_0),$$
$$p_0(z) = f_0 = f[z_0],$$
and so we can write
$$p_n(z) = f[z_0] + \sum_{j=1}^n f[z_0, \ldots, z_j]\prod_{k=0}^{j-1}(z - z_k).$$
Call this the Newton form of the interpolating polynomial.
2.1 Divided difference
Call $f[z_0, \ldots, z_n]$ the divided difference.

Theorem 2.6. For any distinct complex numbers $z_0, z_1, \ldots, z_{n+1}$, the divided difference satisfies the recurrence
$$f[z_0, z_1, \ldots, z_{n+1}] = \frac{f[z_0, \ldots, z_n] - f[z_1, \ldots, z_{n+1}]}{z_0 - z_{n+1}}.$$
Proof. Given $\{(z_j, f_j)\}_{j=0}^{n+1}$, we construct $p_n, q_n \in P_n$ such that $p_n(z_j) = f_j$ for $j = 0 \to n$ and $q_n(z_j) = f_j$ for $j = 1 \to n+1$. Observe that $f[z_0, \ldots, z_n]$ is the coefficient of $z^n$ in $p_n(z)$ and that $f[z_1, \ldots, z_{n+1}]$ is the coefficient of $z^n$ in $q_n(z)$. Then
$$r_{n+1}(z) = \frac{(z - z_{n+1})p_n(z) - (z - z_0)q_n(z)}{z_0 - z_{n+1}} \in P_{n+1},$$
and hence
$$r_{n+1}(z_0) = p_n(z_0) = f_0, \qquad r_{n+1}(z_{n+1}) = q_n(z_{n+1}) = f_{n+1}, \qquad r_{n+1}(z_j) = \frac{(z_j - z_{n+1})f_j - (z_j - z_0)f_j}{z_0 - z_{n+1}} = f_j$$
for $j = 1 \to n$. Therefore $r_{n+1}(z)$ is the interpolating polynomial of $\{(z_j, f_j)\}_{j=0}^{n+1}$. Since $f[z_0, \ldots, z_{n+1}]$ is the coefficient of $z^{n+1}$ in $r_{n+1}(z)$,
$$f[z_0, \ldots, z_{n+1}] = \frac{f[z_0, \ldots, z_n] - f[z_1, \ldots, z_{n+1}]}{z_0 - z_{n+1}}.$$
2.2 Finding the error
Given $\{(z_j, f_j)\}_{j=0}^n$, $z_j, f_j \in \mathbb{C}$, $z_j$ distinct, there exists an interpolating polynomial $p_n \in P_n$ such that $p_n(z_j) = f_j$ for $j = 0 \to n$. The Newton form of $p_n(z)$ is
$$p_n(z) = f[z_0] + \sum_{j=1}^n f[z_0, \ldots, z_j]\prod_{k=0}^{j-1}(z - z_k).$$
Theorem 2.6 gives a recurrence relation
$$f[z_0, \ldots, z_{j+1}] = \frac{f[z_0, \ldots, z_j] - f[z_1, \ldots, z_{j+1}]}{z_0 - z_{j+1}}.$$
We can construct a divided difference table:

z_0,  f[z_0]
z_1,  f[z_1],  f[z_0, z_1]
z_2,  f[z_2],  f[z_1, z_2],  f[z_0, z_1, z_2]
...
z_n,  f[z_n],  f[z_{n-1}, z_n],  ...,  f[z_0, ..., z_n]

Note that the diagonal entries appear in the Newton form of $p_n(z)$.
Example. For $n = 2$ and
$$\{(z_j, f_j)\}_{j=0}^2 = \{(0, a), (1, b), (4, c)\},$$
we have
$$f[z_0] = f_0 = a,$$
$$f[z_1] = f_1 = b, \qquad f[z_0, z_1] = \frac{f[z_0] - f[z_1]}{z_0 - z_1} = \frac{a - b}{-1} = b - a,$$
$$f[z_2] = f_2 = c, \qquad f[z_1, z_2] = \frac{f[z_1] - f[z_2]}{z_1 - z_2} = \frac{b - c}{-3} = \frac{c - b}{3}.$$
Therefore
$$f[z_0, z_1, z_2] = \frac{(b - a) - \frac{c - b}{3}}{-4} = \frac a4 - \frac b3 + \frac c{12},$$
and so the Newton form of $p_2(z)$ is
$$p_2(z) = a + (b - a)(z - z_0) + \left(\frac a4 - \frac b3 + \frac c{12}\right)(z - z_0)(z - z_1).$$
Theorem 2.7. Let $p_n(z)$ interpolate $f(z)$ at $n + 1$ distinct points $\{z_j\}_{j=0}^n$, $z_j \in \mathbb{C}$. Then the error $e(z) = f(z) - p_n(z)$ is
$$e(z) = f[z_0, \ldots, z_n, z]\prod_{k=0}^n(z - z_k)$$
for $z \ne z_j$, and $e(z_j) = 0$ for $j = 0 \to n$.

Proof. Polynomial $p_n(z)$ interpolates $f(z)$ at $\{z_j\}_{j=0}^n$. Add the new distinct point $z$. The Newton form of $p_{n+1}$ is
$$p_{n+1}(z) = p_n(z) + f[z_0, \ldots, z_n, z]\prod_{k=0}^n(z - z_k).$$
Since $p_{n+1}$ interpolates $f$ at $z$, we have $p_{n+1}(z) = f(z)$, and therefore
$$e(z) = f(z) - p_n(z) = f[z_0, \ldots, z_n, z]\prod_{k=0}^n(z - z_k).$$
Theorem 2.8. Let $f \in C^n[x_0, x_n]$ ($f$ and its first $n$ derivatives continuous over $[x_0, x_n]$), with the $x_i$ ordered, $x_0 < x_1 < \cdots < x_n$. Then there exists $\xi \in [x_0, x_n]$ such that
$$f[x_0, x_1, \ldots, x_n] = \frac{1}{n!}f^{(n)}(\xi).$$

Proof. Let $p_n(x)$ interpolate $f$ at $x_i$, $i = 0 \to n$. Let $e(x) = f(x) - p_n(x)$, so $e(x_i) = 0$, $i = 0 \to n$; therefore $e(x)$ has at least $n + 1$ zeroes in $[x_0, x_n]$. By Rolle's Theorem,
$$e'(x) \text{ has at least } n \text{ zeroes in } [x_0, x_n],$$
$$e''(x) \text{ has at least } n - 1 \text{ zeroes in } [x_0, x_n],$$
$$\vdots$$
$$e^{(n)}(x) \text{ has at least 1 zero } \xi \text{ in } [x_0, x_n].$$
Since $e(x) = f(x) - p_n(x)$, $e^{(n)}(x) = f^{(n)}(x) - p_n^{(n)}(x)$. The Newton form of $p_n(x)$ is
$$p_n(x) = p_{n-1}(x) + f[x_0, \ldots, x_n]\prod_{i=0}^{n-1}(x - x_i) = f[x_0, \ldots, x_n]x^n + \cdots.$$
Therefore
$$p_n^{(n)}(x) = n!f[x_0, \ldots, x_n]$$
and hence, as $e^{(n)}(\xi) = 0$,
$$f^{(n)}(\xi) = p_n^{(n)}(\xi) = n!f[x_0, \ldots, x_n].$$
Theorem 2.9. Let $f \in C^{n+1}[a, b]$ and $\{x_i\}_{i=0}^n$ distinct in $[a, b]$. If $p_n \in P_n$ interpolates $f$ at $\{x_i\}_{i=0}^n$, then $e(x) = f(x) - p_n(x)$ satisfies
$$|e(x)| \le \frac{1}{(n+1)!}\left|\prod_{i=0}^n(x - x_i)\right|\max_{a\le y\le b}\left|f^{(n+1)}(y)\right|$$
for all $x \in [a, b]$.

Proof. The result is trivially true at $x = x_i$, since $e(x_i) = 0$ for $i = 0 \to n$. From Theorem 2.7,
$$e(x) = f[x_0, \ldots, x_n, x]\prod_{k=0}^n(x - x_k).$$
From Theorem 2.8, there exists $\xi_x \in [a, b]$ such that
$$e(x) = \frac{f^{(n+1)}(\xi_x)}{(n+1)!}\prod_{k=0}^n(x - x_k).$$
Therefore
$$|e(x)| = \frac{1}{(n+1)!}\left|\prod_{k=0}^n(x - x_k)\right|\left|f^{(n+1)}(\xi_x)\right| \le \frac{1}{(n+1)!}\left|\prod_{k=0}^n(x - x_k)\right|\max_{a\le y\le b}\left|f^{(n+1)}(y)\right|.$$
Definition. The infinity norm of $g \in C[a, b]$ is
$$\|g\|_\infty = \max_{a\le x\le b}|g(x)|.$$

Note. Beware that it is not true that $\|f - p_n\|_\infty \to 0$ as $n \to \infty$ in all cases.
Example.

1. Let $[a, b] = [-\frac12, \frac12]$, $f(x) = e^x$, $x_i \in [a, b]$, $i = 0 \to n$. Then $|x - x_i| \le 1$ and so $\left\|\prod_{i=0}^n(x - x_i)\right\|_\infty \le 1$. Also $\|f^{(n+1)}\|_\infty = \|e^x\|_\infty = e^{1/2}$. Therefore
$$\|f - p_n\|_\infty \le \frac{e^{1/2}}{(n+1)!} \to 0$$
as $n \to \infty$.

2. For any $[a, b]$ and $f(x) = \cos x$, $\|f^{(n+1)}\|_\infty \le 1$. Also
$$\|\cos - p_n\|_\infty \le \frac{(b - a)^{n+1}}{(n+1)!} \to 0$$
as $n \to \infty$. Therefore $p_n(x) \to \cos x$ for all $x$.
3. Let $[a, b] = [0, 1]$, $f(x) = (1 + x)^{-1}$. Then
$$f'(x) = (-1)(1 + x)^{-2}, \qquad f^{(n+1)}(x) = (-1)^{n+1}(1 + x)^{-(n+2)}(n+1)!,$$
and therefore $\|f^{(n+1)}\|_\infty \le (n+1)!$. Hence
$$\|f - p_n\|_\infty \le \frac{1}{(n+1)!}(n+1)! = 1,$$
so the bound does not tend to 0 as $n \to \infty$.
2.3 Best Approximation
Given $[a, b]$ and $f \in C[a, b]$, we want to choose interpolation points $\{x_k\}_{k=0}^n$ in $[a, b]$ to minimize $\left\|\prod_{k=0}^n(x - x_k)\right\|_\infty$, i.e. to find
$$\min_{\{x_k\}_{k=0}^n}\left\|\prod_{k=0}^n(x - x_k)\right\|_\infty,$$
i.e.
$$\min_{q_n\in P_n}\|x^{n+1} - q_n(x)\|_\infty.$$
Consider the more general problem: to find
$$\min_{q_n\in P_n}\|g - q_n\|_\infty,$$
that is, to find $q_n^\star$ such that
$$\|g - q_n^\star\|_\infty \le \|g - q_n\|_\infty$$
for all $q_n \in P_n$. Call such a $q_n^\star \in P_n$ the best approximation.
Theorem 2.10 (Equioscillation Property). Let $g \in C[a, b]$ and $n \ge 0$. Suppose there exist $q_n^\star \in P_n$ and $n + 2$ distinct points $\{x_j^\star\}_{j=0}^{n+1}$,
$$a \le x_0^\star < x_1^\star < \cdots < x_{n+1}^\star \le b,$$
such that
$$g(x_j^\star) - q_n^\star(x_j^\star) = (-1)^j\sigma\|g - q_n^\star\|_\infty$$
for $j = 0 \to n+1$, where $\sigma = \pm1$. Then $q_n^\star$ is the best approximation to $g$ from $P_n$ with respect to $\|\cdot\|_\infty$, that is,
$$\|g - q_n^\star\|_\infty \le \|g - q_n\|_\infty$$
for all $q_n \in P_n$.

Note. Call $\{x_k^\star\}_{k=0}^{n+1}$ the equioscillation points.
Proof. Let $E = \|g - q_n^\star\|_\infty$. If $E = 0$, then $q_n^\star = g$ is the best approximation. If $E > 0$, then suppose that there exists $q_n \in P_n$ such that $\|g - q_n\|_\infty < E$. Consider $q_n^\star - q_n \in P_n$ at the $n + 2$ points $\{x_j^\star\}_{j=0}^{n+1}$:
$$q_n^\star(x_j^\star) - q_n(x_j^\star) = (q_n^\star(x_j^\star) - g(x_j^\star)) + (g(x_j^\star) - q_n(x_j^\star)) = \sigma(-1)^{j+1}E + \gamma_j$$
with $\gamma_j \in \mathbb{R}$ and $|\gamma_j| < E$. Thus
$$\operatorname{sgn}((q_n^\star - q_n)(x_j^\star)) = \operatorname{sgn}(\sigma(-1)^{j+1}E),$$
and therefore $q_n^\star - q_n \in P_n$ changes sign $n + 1$ times and hence has $n + 1$ roots. Then by the Fundamental Theorem of Algebra $q_n^\star \equiv q_n$; a contradiction, so such a $q_n$ does not exist and $q_n^\star$ is the best approximation.
Theorem 2.11 (Chebyshev Equioscillation Theorem). Let $g \in C[a, b]$ and $n \ge 0$. Then there exists a unique $q_n^\star \in P_n$ satisfying the equioscillation property, and
$$\|g - q_n^\star\|_\infty \le \|g - q_n\|_\infty$$
for all $q_n \in P_n$.

Note. The construction of $q_n^\star$ is difficult in general, which is why we often use best approximation in the least squares sense instead. But if $g(x) = x^{n+1}$, the construction of $q_n^\star$ is easy.
Lemma 2.12. If $g(x) = x^{n+1}$ on $[-1, 1]$, then the best approximation to $g$ from $P_n$ with respect to $\|\cdot\|_\infty$ is
$$q_n^\star(x) = x^{n+1} - 2^{-n}T_{n+1}(x),$$
where $T_{n+1}(x)$ is the Chebyshev polynomial of degree $n + 1$, i.e.
$$T_{n+1}(x) = \cos((n+1)\cos^{-1}x).$$

Proof. We first need to show that $q_n^\star$ is really in $P_n$: recall that
$$T_0(x) = 1, \qquad T_1(x) = x, \qquad T_{n+1}(x) = 2xT_n(x) - T_{n-1}(x),$$
and so
$$T_{n+1}(x) = 2^nx^{n+1} + \cdots.$$
Therefore $q_n^\star \in P_n$.
The error is $x^{n+1} - q_n^\star(x) = 2^{-n}T_{n+1}(x)$ for $x \in [-1, 1]$. Change the variable: $x = \cos\theta$, so $\theta = \cos^{-1}x \in [0, \pi]$. Then
$$T_{n+1}(x) = \cos((n+1)\theta).$$
Hence
$$\|x^{n+1} - q_n^\star\|_\infty = 2^{-n}\max_{-1\le x\le 1}|\cos((n+1)\cos^{-1}x)| = 2^{-n}.$$
Choose
$$\theta_j^\star = \frac{j\pi}{n+1}$$
for $j = 0 \to n+1$, and so $x_j^\star = \cos\theta_j^\star = \cos\frac{j\pi}{n+1}$. Then
$$T_{n+1}(x_j^\star) = \cos((n+1)\theta_j^\star) = \cos(j\pi) = (-1)^j.$$
Hence the error $x^{n+1} - q_n^\star(x) = 2^{-n}T_{n+1}(x)$ satisfies the equioscillation property, and $q_n^\star$ is thus the best approximation to $x^{n+1}$ from $P_n$.

Note. The points are equally spaced in terms of $\theta$, but clustered around the end points $\pm1$ in terms of $x$.
Example. The interpolation points are the zeroes of the error $e$. Therefore choose $\{x_j\}_{j=0}^n$ so that
$$\prod_{j=0}^n(x - x_j) = x^{n+1} - q_n^\star(x) = 2^{-n}T_{n+1}(x).$$
Choose
$$\theta_j = \frac{(2j+1)\pi}{2(n+1)}$$
and so
$$x_j = \cos\left(\frac{(2j+1)\pi}{2(n+1)}\right).$$
Then
$$T_{n+1}(x_j) = \cos((n+1)\theta_j) = \cos\left(\frac{(2j+1)\pi}{2}\right) = 0.$$
Therefore
$$\left\{\cos\left(\frac{(2j+1)\pi}{2(n+1)}\right)\right\}_{j=0}^n$$
are the optimal Chebyshev interpolation points for $p_n \in P_n$ on $[-1, 1]$.
Generalize this to an interval $[a, b]$. For $x \in [a, b]$, introduce $t = \frac{2x - (a+b)}{b-a} \in [-1, 1]$, so $x = \frac12[(b-a)t + (a+b)]$. Then the optimal interpolation points for $[a, b]$ are
$$x_j = \frac12\left[(b-a)\cos\left(\frac{(2j+1)\pi}{2(n+1)}\right) + (a+b)\right]$$
for $j = 0 \to n$.
Proof. We need to find
$$\min_{\{x_j\}_{j=0}^n\subset[a,b]}\left\|\prod_{j=0}^n(x - x_j)\right\|_\infty.$$
That is, to find
$$\min_{q_n\in P_n}\left\|\left(\frac{b-a}{2}\right)^{n+1}\left(\frac{2x-(a+b)}{b-a}\right)^{n+1} - q_n(x)\right\|_\infty$$
over $[a, b]$. That is the same as finding
$$\min_{\hat q_n\in P_n}\left(\frac{b-a}{2}\right)^{n+1}\left\|t^{n+1} - \hat q_n(t)\right\|_\infty$$
over $[-1, 1]$, with
$$q_n(x) = \left(\frac{b-a}{2}\right)^{n+1}\hat q_n(t).$$
Therefore
$$q_n^\star(x) = \left(\frac{b-a}{2}\right)^{n+1}\left[\left(\frac{2x-(a+b)}{b-a}\right)^{n+1} - 2^{-n}T_{n+1}\left(\frac{2x-(a+b)}{b-a}\right)\right].$$
Using the Equioscillation Property, we get
$$t_j^\star = \cos\frac{j\pi}{n+1}$$
and so
$$x_j^\star = \frac{(b-a)\cos\frac{j\pi}{n+1} + a + b}{2}.$$
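The optimal interpolation points are cheap to compute; a small helper (our own sketch) follows.

```python
import numpy as np

def chebyshev_points(a, b, n):
    """Optimal interpolation points x_j on [a, b], j = 0..n, from the text."""
    j = np.arange(n + 1)
    t = np.cos((2 * j + 1) * np.pi / (2 * (n + 1)))   # roots of T_{n+1} on [-1, 1]
    return 0.5 * ((b - a) * t + (a + b))

print(chebyshev_points(-1.0, 1.0, 1))   # [1/sqrt(2), -1/sqrt(2)]
```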
2.4 Piecewise Polynomial Interpolation
We can try to decrease the error of polynomial interpolation either by increasing the order of the interpolating polynomial or by decreasing the interval between individual interpolation points (i.e. increasing the number of them).

We can also consider piecewise linears. For given ordered equally spaced interpolation points $\{x_i\}_{i=0}^n$ with $x_0 = a$, $x_n = b$, $x_j - x_{j-1} = h$, we can use linear interpolation on each subinterval $[x_{j-1}, x_j]$ for $j = 1 \to n$. Define, for $x \in [x_{j-1}, x_j]$, $j = 1 \to n$,
$$P_L(x) = f(x_{j-1}) + \frac{(x - x_{j-1})}{h}\left(f(x_j) - f(x_{j-1})\right),$$
so that $P_L(x_{j-1}) = f(x_{j-1})$, $P_L(x_j) = f(x_j)$. The error is
$$\|f - P_L\|_\infty = \max_{a\le x\le b}|f(x) - P_L(x)| = \max_{j=1\to n}\left[\max_{x_{j-1}\le x\le x_j}|f(x) - P_L(x)|\right] = \max_{j=1\to n}\left[\max_{x_{j-1}\le x\le x_j}\frac{|(x - x_{j-1})(x - x_j)|}{2!}|f''(z_j)|\right]$$
where $z_j \in (x_{j-1}, x_j)$. Since the maximum of $|(x - x_{j-1})(x - x_j)|$ occurs at $x = (x_{j-1} + x_j)/2$,
$$\|f - P_L\|_\infty \le \max_{j=1\to n}\frac{h^2}{8}|f''(z_j)| \le \frac{h^2}{8}\|f''\|_\infty.$$
Then as $h \to 0$, $P_L \to f$, provided $f \in C^2[a, b]$. We can generalize this method to piecewise quadratics, cubics, etc.
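The $h^2/8$ bound is easy to observe numerically; the following sketch (our own, using NumPy's built-in piecewise linear interpolant and an arbitrary test function) checks it for $f = \cos$, where $\|f''\|_\infty \le 1$.

```python
import numpy as np

f = np.cos
a, b, n = 0.0, 3.0, 30
xs = np.linspace(a, b, n + 1)
h = xs[1] - xs[0]

x_fine = np.linspace(a, b, 10001)
PL = np.interp(x_fine, xs, f(xs))        # piecewise linear interpolant P_L
err = np.max(np.abs(f(x_fine) - PL))
print(err, h**2 / 8)                     # err <= h^2/8 * max|f''| = h^2/8 here
```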
Chapter 3
Quadrature (Numerical Integration)
We are given an interval $[a, b]$ and a weight function $w(x) \in C(a, b)$ such that $w(x) > 0$ except for a finite number of zeroes, and $\int_a^b w(x)\,dx < \infty$. Now, given a function $f(x)$, we want to approximate
$$I(f) = \int_a^b w(x)f(x)\,dx$$
by approximating $f(x)$ by an interpolating polynomial $p_n(x)$, that is, approximate $I(f)$ by
$$I_n(f) = I(p_n) = \int_a^b w(x)p_n(x)\,dx.$$
The Lagrange form of $p_n(x)$ is
$$p_n(x) = \sum_{k=0}^n f(x_k)l_k(x), \qquad l_k(x) = \prod_{j=0,\,j\ne k}^n\frac{(x - x_j)}{(x_k - x_j)}.$$
Hence
$$I(p_n) = \int_a^b w(x)\left[\sum_{k=0}^n f(x_k)l_k(x)\right]dx = \sum_{k=0}^n f(x_k)\int_a^b w(x)l_k(x)\,dx = \sum_{k=0}^n w_kf(x_k),$$
where $w_k = \int_a^b w(x)l_k(x)\,dx$ for $k = 0 \to n$.
Example. Let $[a, b] = [0, 1]$, $w(x) = x^{-1/2}$, $n = 1$, $x_0 = 0$, $x_1 = 1$. Approximate $I(f) = \int_0^1 x^{-1/2}f(x)\,dx$ by
$$I_1(f) = \sum_{k=0}^1 w_kf(x_k),$$
where
$$w_0 = \int_0^1 x^{-1/2}(1 - x)\,dx = \left[\frac{x^{1/2}}{1/2} - \frac{x^{3/2}}{3/2}\right]_0^1 = \frac43, \qquad w_1 = \int_0^1 x^{-1/2}x\,dx = \frac23.$$
Hence
$$I_1(f) = \frac43f(x_0) + \frac23f(x_1).$$
If instead $w(x) \equiv 1$, we get
$$I_1(f) = \frac12\left[f(x_0) + f(x_1)\right],$$
the trapezium rule.
In general, the error of the approximation is
$$|I(f) - I_n(f)| = \left|\int_a^b w(x)\left[f(x) - p_n(x)\right]dx\right| \le \left[\int_a^b w(x)\,dx\right]\|f - p_n\|_\infty.$$
The error is zero if $f \in P_n$, regardless of the interpolation (sampling) points $\{x_k\}_{k=0}^n$. Otherwise, we can choose $\{x_k\}_{k=0}^n$ in a smart way so that $I_n(f) = I(f)$ for all $f \in P_m$, where $m > n$ is as large as possible.
Lemma 3.1. The orthogonal polynomial $\varphi_n$ has $n$ distinct roots in $[a, b]$.

Proof. Let $\sigma$ denote the number of sign changes of $\varphi_n$ in $[a, b]$, and suppose $\sigma < n$. Let $x_1, \ldots, x_\sigma$ denote the ordered points in $[a, b]$ where $\varphi_n$ changes sign. Consider $q_\sigma(x) = (x - x_1)\cdots(x - x_\sigma)$. Since $\varphi_n$ and $q_\sigma$ change sign at exactly the same points, the product $\varphi_nq_\sigma$ has one sign on $[a, b]$; there are two possibilities.

If it is positive, then
$$\langle\varphi_n, q_\sigma\rangle = \int_a^b w(x)\underbrace{\varphi_n(x)q_\sigma(x)}_{>0 \text{ except at } x_i}\,dx > 0.$$

If it is negative, then
$$\langle\varphi_n, -q_\sigma\rangle = -\int_a^b w(x)\underbrace{\varphi_n(x)q_\sigma(x)}_{<0 \text{ except at } x_i}\,dx > 0.$$

This is a contradiction, as $\varphi_n$ is the orthogonal polynomial of degree $n$, i.e. it is orthogonal to all polynomials in $P_{n-1}$ (and $q_\sigma \in P_\sigma \subseteq P_{n-1}$). Therefore $\sigma \ge n$, and since $\varphi_n$ has degree $n$, it has exactly $n$ distinct roots in $[a, b]$.
Theorem 3.2. Let $w \in C(a, b)$ with $w > 0$ except at a finite number of points, and $\int_a^b w(x)\,dx < \infty$. Let $\varphi_{n+1}$ be the orthogonal polynomial of degree $n + 1$ associated with the inner product
$$\langle g_1, g_2\rangle = \int_a^b w(x)g_1(x)g_2(x)\,dx.$$
Let $\{x_i^\star\}_{i=0}^n$, $x_i^\star \in [a, b]$, be the $n + 1$ distinct zeroes of $\varphi_{n+1}$ (see the above lemma). If we approximate
$$I(f) = \int_a^b w(x)f(x)\,dx$$
by $I_n(f) = I(p_n)$, where $p_n \in P_n$ is such that $p_n(x_i^\star) = f(x_i^\star)$ for $i = 0 \to n$, then
$$I_n(f) = \sum_{i=0}^n w_i^\star f(x_i^\star), \qquad w_i^\star = \int_a^b w(x)l_i(x)\,dx, \qquad l_i(x) = \prod_{j=0,\,j\ne i}^n\frac{(x - x_j^\star)}{(x_i^\star - x_j^\star)}$$
for $i = 0 \to n$. Moreover $I_n(f) = I(f)$ for all $f \in P_{2n+1}$.
Proof. Let $f \in P_{2n+1}$. Then $f - p_n \in P_{2n+1}$ has roots at $\{x_i^\star\}_{i=0}^n$, and therefore $f - p_n = q_n\varphi_{n+1}$ for some $q_n \in P_n$. Then
$$I(f) - I_n(f) = I(f) - I(p_n) = \int_a^b w(x)\left[f(x) - p_n(x)\right]dx = \int_a^b w(x)q_n(x)\varphi_{n+1}(x)\,dx = \langle q_n, \varphi_{n+1}\rangle = 0,$$
as $\varphi_{n+1}$ is the orthogonal polynomial of degree $n + 1$. Hence $I_n(f) = I(f)$ for all $f \in P_{2n+1}$.
With $n + 1$ sampling points it is not possible to choose the $w_i$, $i = 0 \to n$, such that $I_n(f) = I(f)$ for all $f \in P_{2n+2}$. Consider $f(x) = \prod_{i=0}^n(x - x_i)^2 \in P_{2n+2}$: clearly $I(f) > 0$, but $I_n(f) = 0$.

Choosing the sampling points as the roots of $\varphi_{n+1}$ is called Gaussian Quadrature.
Example. Let $[a, b] = [-1, 1]$, $w \equiv 1$ and
$$\langle g_1, g_2\rangle = \int_{-1}^1 g_1(x)g_2(x)\,dx.$$
For $n = 1$,
$$I_1(f) = w_0^\star f(x_0^\star) + w_1^\star f(x_1^\star).$$
Recall that $\varphi_2(x) = x^2 - 1/3$, and so $x_0^\star = -1/\sqrt3$, $x_1^\star = 1/\sqrt3$. Now we determine $w_0^\star, w_1^\star$. Observe that
$$I_1(1) = w_0^\star + w_1^\star = I(1) = \int_{-1}^1 1\,dx = 2,$$
$$I_1(x) = \frac{1}{\sqrt3}(-w_0^\star + w_1^\star) = I(x) = \int_{-1}^1 x\,dx = 0,$$
and hence $w_0^\star = w_1^\star = 1$. Therefore $I_1(f) = f(-1/\sqrt3) + f(1/\sqrt3)$. Also $I_1(x^2) = 2/3 = I(x^2)$ and $I_1(x^3) = 0 = I(x^3)$.
For $n = 2$, $\varphi_3(x) = x^3 - (3/5)x$, and so $x_0^\star = -\sqrt{3/5}$, $x_1^\star = 0$, $x_2^\star = \sqrt{3/5}$. Therefore
$$I_2(f) = w_0^\star f(-\sqrt{3/5}) + w_1^\star f(0) + w_2^\star f(\sqrt{3/5}).$$
This is exact for all polynomials of degree $\le 5$ (Theorem 3.2), and in particular
$$I_2(1) = w_0^\star + w_1^\star + w_2^\star = 2,$$
$$I_2(x) = -\sqrt{3/5}\,w_0^\star + \sqrt{3/5}\,w_2^\star = 0,$$
$$I_2(x^2) = \frac35 w_0^\star + \frac35 w_2^\star = \frac23.$$
Hence $w_0^\star = w_2^\star = 5/9$, $w_1^\star = 8/9$, and
$$I_2(f) = \frac19\left[5f(-\sqrt{3/5}) + 8f(0) + 5f(\sqrt{3/5})\right].$$
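The two rules above, and their exactness degrees, can be checked directly; this sketch (our own addition) verifies that the 3-point rule integrates $x^4$ exactly but not $x^6$.

```python
import numpy as np

def gauss2(f):
    """Two-point rule I_1(f) = f(-1/sqrt(3)) + f(1/sqrt(3)) on [-1, 1], w = 1."""
    return f(-1 / np.sqrt(3)) + f(1 / np.sqrt(3))

def gauss3(f):
    """Three-point rule I_2(f) = (1/9)[5 f(-sqrt(3/5)) + 8 f(0) + 5 f(sqrt(3/5))]."""
    r = np.sqrt(3 / 5)
    return (5 * f(-r) + 8 * f(0.0) + 5 * f(r)) / 9

# exactness checks: gauss2 is exact on P_3, gauss3 on P_5
print(gauss2(lambda x: x**3), gauss2(lambda x: x**2))  # 0 and 2/3
print(gauss3(lambda x: x**4))                          # 2/5 = I(x^4)
print(gauss3(lambda x: x**6))                          # 6/25, while I(x^6) = 2/7
```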