NUMERICAL ANALYSIS

D- L- N

December 17, 2016

Source: ffs.iuh.edu.vn/files/ngoquocnhan/baigiang/NUMERICALANALYSIS.pdf

Contents

1 Solutions of Equations in One Variable
  1.1 The Bisection Method
    1.1.1 Bisection Technique
    1.1.2 Bisection Algorithm
  1.2 Fixed-Point Iteration
  1.3 Newton's Method
  1.4 Error Analysis for Iterative Methods

2 Interpolation and Polynomial Approximation
  2.1 Introduction
  2.2 Interpolation and the Lagrange Polynomial
    2.2.1 Lagrange Interpolating Polynomials
  2.3 Divided Differences
    2.3.1 Forward Differences
    2.3.2 Backward Differences
  2.4 Approximation Theory
    2.4.1 Introduction
    2.4.2 Discrete Least Squares Approximation
    2.4.3 Linear Least Squares
    2.4.4 Polynomial Least Squares

3 Initial-Value Problems for Ordinary Differential Equations
  3.1 The Elementary Theory of Initial-Value Problems
  3.2 Euler's Method
    3.2.1 Error Bounds for Euler's Method
  3.3 Higher-Order Taylor Methods
  3.4 Runge-Kutta Methods
    3.4.1 Runge-Kutta Methods of Order Two


Chapter 1

Solutions of Equations in One Variable

The growth of a population can often be modeled over short periods of time by assuming that the population grows continuously with time at a rate proportional to the number present at that time. Suppose that N(t) denotes the number in the population at time t and λ denotes the constant birth rate of the population. Then the population satisfies the differential equation

dN(t)/dt = λN(t),

whose solution is N(t) = N0 e^(λt), where N0 denotes the initial population.

This exponential model is valid only when the population is isolated, with no immigration. If immigration is permitted at a constant rate v, then the differential equation becomes

dN(t)/dt = λN(t) + v,

whose solution is

N(t) = N0 e^(λt) + (v/λ)(e^(λt) − 1).

Suppose a certain population contains N(0) = 1,000,000 individuals initially, that 435,000 individuals immigrate into the community in the first year, and that N(1) = 1,564,000 individuals are present at the end of one year. To determine the birth rate of this population, we need to find λ in the equation

1,564,000 = 1,000,000 e^λ + (435,000/λ)(e^λ − 1).

It is not possible to solve explicitly for λ in this equation, but numerical methods discussed in this chapter can be used to approximate solutions of equations of this type to an arbitrarily high accuracy. The solution to this particular problem is considered in Exercise 24 of Section 1.3.


1.1 The Bisection Method

In this chapter we consider one of the most basic problems of numerical approximation, the root-finding problem. This process involves finding a root, or solution, of an equation of the form f(x) = 0, for a given function f. A root of this equation is also called a zero of the function f.

1.1.1 Bisection Technique

The first technique, based on the Intermediate Value Theorem, is called the Bisection, or Binary-search, method.

Suppose f is a continuous function defined on the interval [a, b], with f(a) and f(b) of opposite sign. The Intermediate Value Theorem implies that a number p exists in (a, b) with f(p) = 0. Although the procedure will work when there is more than one root in the interval (a, b), we assume for simplicity that the root in this interval is unique. The method calls for a repeated halving (or bisecting) of subintervals of [a, b] and, at each step, locating the half containing p.

To begin, set a1 = a and b1 = b, and let p1 be the midpoint of [a, b]; that is,

p1 = (a1 + b1)/2.

• If f (p1) = 0, then p = p1, and we are done.

• If f(p1) ≠ 0, then f(p1) has the same sign as either f(a1) or f(b1).

  – If f(p1) and f(a1) have the same sign, then p ∈ (p1, b1). Set a2 = p1 and b2 = b1.

  – If f(p1) and f(b1) have the same sign, then p ∈ (a1, p1). Set a2 = a1 and b2 = p1.

Then reapply the process to the interval [a2, b2]. This produces the method described in the Bisection Algorithm.

1.1.2 Bisection Algorithm

To find a solution to f(x) = 0 given the continuous function f on the interval [a, b], where f(a) and f(b) have opposite signs:

• INPUT endpoints a, b; tolerance TOL; maximum number of iterations N0.

• OUTPUT approximate solution p or message of failure.

Step 1 Set i = 1; FA = f (a).


Step 2 While i ≤ N0 do Steps 3–6.

Step 3 Set p = (a + b)/2; (compute pi), FP = f(p).

Step 4 If FP = 0 or (b − a)/2 < TOL then

• OUTPUT (p); (Procedure completed successfully.)
• STOP.

Step 5 Set i = i + 1;

Step 6 If FA · FP > 0 then

• set a = p; (Compute ai, bi.)
• set FA = FP.

else set b = p. (FA is unchanged.)

Step 7 • OUTPUT (‘Method failed after N0 iterations, N0 =’, N0). (The procedure was unsuccessful.)

• STOP.
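As a concrete illustration, here is a minimal Python sketch of the algorithm above, applied to Example 1.1 below. The function name `bisect` and the returned iteration count are our own choices, not part of the algorithm's statement:

```python
def bisect(f, a, b, tol=1e-8, n_max=100):
    """Bisection method: approximate p in [a, b] with f(p) = 0.

    Requires f(a) and f(b) to have opposite signs.
    """
    fa = f(a)
    if fa * f(b) > 0:
        raise ValueError("f(a) and f(b) must have opposite signs")
    for i in range(1, n_max + 1):
        p = a + (b - a) / 2           # Step 3: midpoint
        fp = f(p)
        if fp == 0 or (b - a) / 2 < tol:   # Step 4: stopping test
            return p, i               # success after i iterations
        if fa * fp > 0:               # Step 6: root lies in (p, b)
            a, fa = p, fp
        else:                         # root lies in (a, p)
            b = p
    raise RuntimeError(f"Method failed after {n_max} iterations")

# f(x) = x^3 + 4x^2 - 10 on [1, 2], as in Example 1.1
p, n = bisect(lambda x: x**3 + 4*x**2 - 10, 1, 2, tol=1e-4)
```

Note that the midpoint is computed as a + (b − a)/2 rather than (a + b)/2, a common floating-point precaution.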

Other stopping procedures can be applied in Step 4 of the Bisection Algorithm or in any of the iterative techniques in this chapter. For example, we can select a tolerance ε > 0 and generate p1, p2, . . . , pN until one of the following conditions is met:

|pN − pN−1| < ε,    (1.1)

|pN − pN−1| / |pN| < ε, pN ≠ 0, or    (1.2)

|f(pN)| < ε.    (1.3)

Unfortunately, difficulties can arise using any of these stopping criteria. For example, there are sequences {pn} with the property that the differences pn − pn−1 converge to zero while the sequence itself diverges. It is also possible for f(pn) to be close to zero while pn differs significantly from p. Without additional knowledge about f or p, Inequality (1.2) is the best stopping criterion to apply because it comes closest to testing relative error.

When using a computer to generate approximations, it is good practice to set an upper bound on the number of iterations. This eliminates the possibility of entering an infinite loop, a situation that can arise when the sequence diverges (and also when the program is incorrectly coded). This was done in Step 2 of the Bisection Algorithm, where the bound N0 was set and the procedure terminated if i > N0.

Note that to start the Bisection Algorithm, an interval [a, b] must be found with f(a) · f(b) < 0. At each step the length of the interval known to contain a zero of f is reduced by a factor of 2; hence it is advantageous to choose the initial interval [a, b] as small as possible. For example, if f(x) = 2x³ − x² + x − 1, we have both

f(−4) · f(4) < 0 and f(0) · f(1) < 0,

so the Bisection Algorithm could be used on [−4, 4] or on [0, 1]. Starting the Bisection Algorithm on [0, 1] instead of [−4, 4] reduces by 3 the number of iterations required to achieve a specified accuracy, since the initial interval is shorter by a factor of 2³ = 8.

The following example illustrates the Bisection Algorithm. The iteration in this example is terminated when a bound for the relative error is less than 0.0001. This is ensured by having

|p − pn| / min{|an|, |bn|} < 10⁻⁴.

Example 1.1. Show that f(x) = x³ + 4x² − 10 = 0 has a root in [1, 2], and use the Bisection method to determine an approximation to the root that is accurate to at least within 10⁻⁴.

Solution. Because f(1) = −5 and f(2) = 14, the Intermediate Value Theorem ensures that this continuous function has a root in [1, 2].

For the first iteration of the Bisection method we use the fact that at the midpoint of [1, 2] we have f(1.5) = 2.375 > 0. This indicates that we should select the interval [1, 1.5] for our second iteration. Then we find that f(1.25) = −1.796875, so our new interval becomes [1.25, 1.5], whose midpoint is 1.375. Continuing in this manner gives the values in Table 1.1.

n   an           bn           pn            f(pn)
1   1.0          2.0          1.5            2.375
2   1.0          1.5          1.25          −1.79687
3   1.25         1.5          1.375          0.16211
4   1.25         1.375        1.3125        −0.84839
5   1.3125       1.375        1.34375       −0.35098
6   1.34375      1.375        1.359375      −0.09641
7   1.359375     1.375        1.3671875      0.03236
8   1.359375     1.3671875    1.36328125    −0.03215
9   1.36328125   1.3671875    1.365234375    0.000072
10  1.36328125   1.365234375  1.364257813   −0.01605
11  1.364257813  1.365234375  1.364746094   −0.00799
12  1.364746094  1.365234375  1.364990235   −0.00396
13  1.364990235  1.365234375  1.365112305   −0.00194

Table 1.1:


After 13 iterations, p13 = 1.365112305 approximates the root p with an error

|p − p13| < |b14 − a14| = |1.365234375 − 1.365112305| = 0.000122070.

Since |a14| < |p|, we have

|p − p13| / |p| < |b14 − a14| / |a14| ≤ 9.0 × 10⁻⁵,

so the approximation is correct to at least within 10⁻⁴. The correct value of p to nine decimal places is p = 1.365230013. Note that p9 is closer to p than is the final approximation p13. You might suspect this is true because |f(p9)| < |f(p13)|, but we cannot be sure of this unless the true answer is known.

The Bisection method, though conceptually clear, has significant drawbacks. It is relatively slow to converge (that is, N may become quite large before |p − pN| is sufficiently small), and a good intermediate approximation might be inadvertently discarded. However, the method has the important property that it always converges to a solution, and for that reason it is often used as a starter for the more efficient methods we will see later in this chapter.

Theorem 1.1. Suppose that f ∈ C[a, b] and f(a) · f(b) < 0. The Bisection method generates a sequence {pn} approximating a zero p of f with

|p − pn| ≤ (b − a)/2ⁿ, when n ≥ 1.

It is important to realize that Theorem 1.1 gives only a bound for the approximation error and that this bound might be quite conservative. For example, this bound applied to the problem in Example 1.1 ensures only that

|p − p9| ≤ (2 − 1)/2⁹ ≈ 2 × 10⁻³,

but the actual error is much smaller:

|p − p9| = |1.365230013 − 1.365234375| ≈ 4.4 × 10⁻⁶.

Example 1.2. Determine the number of iterations necessary to solve

f(x) = x³ + 4x² − 10 = 0

with accuracy 10⁻³ using a1 = 1 and b1 = 2.

Solution. We will use logarithms to find an integer N that satisfies

|pN − p| ≤ 2⁻ᴺ(b − a) = 2⁻ᴺ < 10⁻³.


Logarithms to any base would suffice, but we will use base-10 logarithms because the tolerance is given as a power of 10. Since 2⁻ᴺ < 10⁻³ implies that

log₁₀ 2⁻ᴺ < log₁₀ 10⁻³ = −3,

we have

N > 3 / log₁₀ 2 ≈ 9.96.

Hence, ten iterations will ensure an approximation accurate to within 10⁻³.

Table 1.1 shows that the value of p9 = 1.365234375 is accurate to within 10⁻⁴. Again, it is important to keep in mind that the error analysis gives only a bound for the number of iterations. In many cases this bound is much larger than the actual number required.
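The computation in Example 1.2 can be checked in a line of Python (a small sketch; `math.ceil` rounds 3/log₁₀ 2 ≈ 9.96 up to the next integer):

```python
import math

# smallest integer N with 2^(-N) * (b - a) = 2^(-N) < 10^(-3)
N = math.ceil(3 / math.log10(2))
```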

1.2 Fixed-Point Iteration

A fixed point for a function is a number at which the value of the function does not change when the function is applied.

Definition 1.1. The number p is a fixed point for a given function g if g(p) = p.

In this section we consider the problem of finding solutions to fixed-point problems and the connection between fixed-point problems and the root-finding problems we wish to solve. Root-finding problems and fixed-point problems are equivalent in the following sense:

• Given a root-finding problem f(p) = 0, we can define functions g with a fixed point at p in a number of ways, for example, as

g(x) = x − f(x) or as g(x) = x + 3 f(x).

• Conversely, if the function g has a fixed point at p, then the function defined by

f (x) = x− g(x)

has a zero at p.

Although the problems we wish to solve are in the root-finding form, the fixed-point form is easier to analyze, and certain fixed-point choices lead to very powerful root-finding techniques.

We first need to become comfortable with this new type of problem, and to decide when a function has a fixed point and how the fixed points can be approximated to within a specified accuracy.


Example 1.3. Determine any fixed points of the function g(x) = x² − 2.

Solution. A fixed point p for g has the property that

p = g(p) = p² − 2,

which implies that

p² − p − 2 = 0.

A fixed point for g occurs precisely when the graph of y = g(x) intersects the graph of y = x, so g has two fixed points, one at p = −1 and the other at p = 2. These are shown in Figure 1.1.

Figure 1.1:

The following theorem gives sufficient conditions for the existence and uniqueness of a fixed point.

Theorem 1.2. (i) If g ∈ C[a, b] and g(x) ∈ [a, b] for all x ∈ [a, b], then g has at least one fixed point in [a, b].

(ii) If, in addition, g′(x) exists on (a, b) and a positive constant k < 1 exists with

|g′(x)| ≤ k, for all x ∈ (a, b),

then there is exactly one fixed point in [a, b]. (See Figure 1.2.)

Proof. (i) If g(a) = a or g(b) = b, then g has a fixed point at an endpoint. If not, then g(a) > a and g(b) < b. The function h(x) = g(x) − x is continuous on [a, b], with

h(a) = g(a) − a > 0 and h(b) = g(b) − b < 0.


Figure 1.2:

The Intermediate Value Theorem implies that there exists p ∈ (a, b) for which h(p) = 0. This number p is a fixed point for g because

0 = h(p) = g(p)− p implies that g(p) = p.

(ii) Suppose, in addition, that |g′(x)| ≤ k < 1 and that p and q are both fixed points in [a, b]. If p ≠ q, then the Mean Value Theorem implies that a number ξ exists between p and q, and hence in [a, b], with

(g(p) − g(q)) / (p − q) = g′(ξ).

Thus

|p − q| = |g(p) − g(q)| = |g′(ξ)||p − q| ≤ k|p − q| < |p − q|,

which is a contradiction. This contradiction must come from the only supposition, p ≠ q. Hence, p = q and the fixed point in [a, b] is unique.

Example 1.4. Show that g(x) = (x² − 1)/3 has a unique fixed point on the interval [−1, 1].

Solution. The maximum and minimum values of g(x) for x in [−1, 1] must occur either when x is an endpoint of the interval or when the derivative is 0. Since g′(x) = 2x/3, the function g is continuous and g′(x) exists on [−1, 1]. The maximum and minimum values of g(x) occur at x = −1, x = 0, or x = 1. But g(−1) = 0, g(1) = 0, and g(0) = −1/3, so an absolute maximum for g(x) on [−1, 1] occurs at x = −1 and x = 1, and an absolute minimum at x = 0.


Moreover,

|g′(x)| = |2x/3| ≤ 2/3, for all x ∈ (−1, 1).

So g satisfies all the hypotheses of Theorem 1.2 and has a unique fixed point in [−1, 1].

For the function in Example 1.4 the unique fixed point p in the interval [−1, 1] can be determined algebraically. If

p = g(p) = (p² − 1)/3, then p² − 3p − 1 = 0,

which, by the quadratic formula, implies that

p = (3 − √13)/2.

Note that g also has a unique fixed point p = (3 + √13)/2 in the interval [3, 4]. However, g(4) = 5 and g′(4) = 8/3 > 1, so g does not satisfy the hypotheses of Theorem 1.2 on [3, 4]. This demonstrates that the hypotheses of Theorem 1.2 are sufficient to guarantee a unique fixed point but are not necessary.

Example 1.5. Show that Theorem 1.2 does not ensure a unique fixed point of g(x) = 3⁻ˣ on the interval [0, 1], even though a unique fixed point on this interval does exist.

Solution. Since g′(x) = −3⁻ˣ ln 3 < 0 on [0, 1], the function g is strictly decreasing on [0, 1]. So

g(1) = 1/3 ≤ g(x) ≤ 1 = g(0), for 0 ≤ x ≤ 1.

Thus, for x ∈ [0, 1], we have g(x) ∈ [0, 1]. The first part of Theorem 1.2 ensures that there is at least one fixed point in [0, 1].

However,

g′(0) = −ln 3 ≈ −1.098612289,

so |g′(x)| > 1 for x near 0, and Theorem 1.2 cannot be used to determine uniqueness. But g is always decreasing, and it is clear from Figure 1.3 that the fixed point must be unique.

Fixed-Point Iteration

We cannot explicitly determine the fixed point in Example 1.5 because we have no way to solve for p in the equation p = g(p) = 3⁻ᵖ. We can, however, determine approximations to this fixed point to any specified degree of accuracy. We will now consider how this can be done.


Figure 1.3:

To approximate the fixed point of a function g, we choose an initial approximation p0 and generate the sequence {pn} by letting pn = g(pn−1), for each n ≥ 1. If the sequence converges to p and g is continuous, then

p = lim_{n→∞} pn = lim_{n→∞} g(pn−1) = g(lim_{n→∞} pn−1) = g(p),

and a solution to x = g(x) is obtained. This technique is called fixed-point, or functional, iteration. The procedure is illustrated in Figure 1.4 and detailed in the Fixed-Point Iteration Algorithm below.

Figure 1.4:

Fixed-Point Iteration Algorithm

To find a solution to p = g(p) given an initial approximation p0 :

• INPUT initial approximation p0; tolerance TOL; maximum number of iterations N0.


• OUTPUT approximate solution p or message of failure.

Step 1 Set i = 1.

Step 2 While i ≤ N0 do Steps 3–6.

Step 3 Set p = g(p0); (compute pi).

Step 4 If |p− p0| < TOL then

• OUTPUT (p); (Procedure completed successfully.)
• STOP.

Step 5 Set i = i + 1;

Step 6 Set p0 = p; (Update p0).

Step 7 OUTPUT (‘Method failed after N0 iterations, N0 =’, N0); (The procedure was unsuccessful.)
STOP.
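A short Python sketch of this algorithm, applied here to g(x) = 3⁻ˣ from Example 1.5 (the name `fixed_point` and the returned iteration count are illustrative choices, not part of the algorithm's statement):

```python
def fixed_point(g, p0, tol=1e-9, n_max=100):
    """Fixed-point iteration: p_n = g(p_{n-1}) until |p - p0| < TOL."""
    for i in range(1, n_max + 1):
        p = g(p0)                 # Step 3: compute p_i
        if abs(p - p0) < tol:     # Step 4: stopping test
            return p, i
        p0 = p                    # Step 6: update p0
    raise RuntimeError(f"Method failed after {n_max} iterations")

# approximate the fixed point of g(x) = 3^(-x) on [0, 1]
p, n = fixed_point(lambda x: 3.0 ** (-x), 0.5)
```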

The following illustrates some features of functional iteration.

Illustration

The equation x³ + 4x² − 10 = 0 has a unique root in [1, 2]. There are many ways to change the equation to the fixed-point form x = g(x) using simple algebraic manipulation. For example, to obtain the function g described in part (c), we can manipulate the equation x³ + 4x² − 10 = 0 as follows:

4x² = 10 − x³, so x² = (10 − x³)/4, and x = ±(1/2)(10 − x³)^(1/2).

To obtain a positive solution, g3(x) is chosen. It is not important for you to derive the functions shown here, but you should verify that the fixed point of each is actually a solution to the original equation, x³ + 4x² − 10 = 0.

(a) x = g1(x) = x − x³ − 4x² + 10

(b) x = g2(x) = (10/x − 4x)^(1/2)

(c) x = g3(x) = (1/2)(10 − x³)^(1/2)

(d) x = g4(x) = (10/(4 + x))^(1/2)

(e) x = g5(x) = x − (x³ + 4x² − 10)/(3x² + 8x)

With p0 = 1.5, Table 1.2 lists the results of the fixed-point iteration for all five choices of g. The actual root is 1.365230013, as was noted in Example 1.1. Comparing the results to the Bisection Algorithm given in that example, it


n    (a)        (b)            (c)          (d)          (e)
0    1.5        1.5            1.5          1.5          1.5
1   −0.875      0.8165         1.286953768  1.348399725  1.373333333
2    6.732      2.9969         1.402540804  1.367376372  1.365262015
3   −469.7     (−8.65)^(1/2)   1.345458374  1.364957015  1.365230014
4    1.03×10⁸                  1.375170253  1.365264748  1.365230013
5                              1.360094193  1.365225594
6                              1.367846968  1.365230576
7                              1.363887004  1.365229942
8                              1.365916734  1.365230022
9                              1.364878217  1.365230012
10                             1.365410062  1.365230014
15                             1.365223680  1.365230013
20                             1.365230236
25                             1.365230006
30                             1.365230013

Table 1.2:

can be seen that excellent results have been obtained for choices (c), (d), and (e) (the Bisection method requires 27 iterations for this accuracy). It is interesting to note that choice (a) was divergent and that (b) became undefined because it involved the square root of a negative number.
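The convergent columns of Table 1.2 are easy to reproduce in Python. This is a sketch: the helper `iterate` is our own, the iteration counts are chosen to match the table, and choices (a) and (b) are omitted precisely because they diverge or take the square root of a negative number:

```python
def iterate(g, p0, n):
    """Apply p = g(p) n times starting from p0."""
    for _ in range(n):
        p0 = g(p0)
    return p0

g3 = lambda x: 0.5 * (10 - x**3) ** 0.5                    # choice (c)
g4 = lambda x: (10 / (4 + x)) ** 0.5                       # choice (d)
g5 = lambda x: x - (x**3 + 4*x**2 - 10) / (3*x**2 + 8*x)   # choice (e)

p_c = iterate(g3, 1.5, 30)
p_d = iterate(g4, 1.5, 15)
p_e = iterate(g5, 1.5, 5)
```

All three results agree with the root 1.365230013 to the places shown in the table.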

Although the various functions we have given are fixed-point problems for the same root-finding problem, they differ vastly as techniques for approximating the solution to the root-finding problem. Their purpose is to illustrate the question that needs to be answered:

• Question: How can we find a fixed-point problem that produces a sequence that reliably and rapidly converges to a solution to a given root-finding problem?

The following theorem and its corollary give us some clues concerning the paths we should pursue and, perhaps more importantly, some we should reject.

Theorem 1.3. Let g ∈ C[a, b] be such that g(x) ∈ [a, b], for all x in [a, b]. Suppose, in addition, that g′ exists on (a, b) and that a constant 0 < k < 1 exists with

|g′(x)| ≤ k, for all x ∈ (a, b).

Then for any number p0 in [a, b], the sequence defined by

pn = g(pn−1), n ≥ 1,

converges to the unique fixed point p in [a, b].

Proof. Theorem 1.2 implies that a unique point p exists in [a, b] with g(p) = p. Since g maps [a, b] into itself, the sequence {pn} is defined for all n ≥ 0, and


pn ∈ [a, b] for all n. Using the fact that |g′(x)| ≤ k and the Mean Value Theorem, we have, for each n,

|pn − p| = |g(pn−1)− g(p)| = |g′(ξn)||pn−1 − p| ≤ k|pn−1 − p|,

where ξn ∈ (a, b). Applying this inequality inductively gives

|pn − p| ≤ k|pn−1 − p| ≤ k²|pn−2 − p| ≤ · · · ≤ kⁿ|p0 − p|.    (1.4)

Since 0 < k < 1, we have lim_{n→∞} kⁿ = 0 and

lim_{n→∞} |pn − p| ≤ lim_{n→∞} kⁿ|p0 − p| = 0.

Hence {pn} converges to p.

Corollary 1.1. If g satisfies the hypotheses of Theorem 1.3, then bounds for the error involved in using pn to approximate p are given by

|pn − p| ≤ kⁿ max{p0 − a, b − p0}    (1.5)

and

|pn − p| ≤ (kⁿ/(1 − k)) |p1 − p0|, for all n ≥ 1.    (1.6)

Proof. Because p ∈ [a, b], the first bound follows from Inequality (1.4):

|pn − p| ≤ kⁿ|p0 − p| ≤ kⁿ max{p0 − a, b − p0}.

For n ≥ 1, the procedure used in the proof of Theorem 1.3 implies that

|pn+1 − pn| = |g(pn) − g(pn−1)| ≤ k|pn − pn−1| ≤ · · · ≤ kⁿ|p1 − p0|.

Thus for m > n ≥ 1,

|pm − pn| = |pm − pm−1 + pm−1 − · · · + pn+1 − pn|
          ≤ |pm − pm−1| + |pm−1 − pm−2| + · · · + |pn+1 − pn|
          ≤ kᵐ⁻¹|p1 − p0| + kᵐ⁻²|p1 − p0| + · · · + kⁿ|p1 − p0|
          = kⁿ|p1 − p0| (1 + k + k² + · · · + kᵐ⁻ⁿ⁻¹).

By Theorem 1.3, lim_{m→∞} pm = p, so

|p − pn| = lim_{m→∞} |pm − pn| ≤ kⁿ|p1 − p0| Σ_{j=0}^{m−n−1} kʲ ≤ kⁿ|p1 − p0| Σ_{j=0}^{∞} kʲ.

But Σ_{j=0}^{∞} kʲ is a geometric series with ratio k and 0 < k < 1. This series converges to 1/(1 − k), which gives the second bound:

|p − pn| ≤ (kⁿ/(1 − k)) |p1 − p0|.


Both inequalities in the corollary relate the rate at which {pn} converges to the bound k on the first derivative. The rate of convergence depends on the factor kⁿ: the smaller the value of k, the faster the convergence, which may be very slow if k is close to 1.
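As a quick numerical sanity check (our own illustration, not part of the text), bound (1.6) can be verified for g(x) = (x² − 1)/3 from Example 1.4, where |g′(x)| ≤ k = 2/3 on [−1, 1] and we take p0 = 0:

```python
g = lambda x: (x * x - 1) / 3      # Example 1.4, |g'(x)| <= 2/3 on [-1, 1]
k = 2 / 3
p_true = (3 - 13 ** 0.5) / 2       # the fixed point (3 - sqrt(13))/2
p0 = 0.0
p1 = g(p0)

bound_holds = True
p = p0
for n in range(1, 20):
    p = g(p)                                  # p is now p_n
    bound = k ** n / (1 - k) * abs(p1 - p0)   # right side of (1.6)
    bound_holds = bound_holds and abs(p - p_true) <= bound
```

The actual error shrinks far faster than the bound, because |g′| near the fixed point is about 0.2, well below k = 2/3.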

1.3 Newton’s Method

Newton’s (or the Newton–Raphson) method is one of the most powerful and well-known numerical methods for solving a root-finding problem. There are many ways of introducing Newton’s method.

If we only want an algorithm, we can consider the technique graphically, as is often done in calculus. Another possibility is to derive Newton’s method as a technique to obtain faster convergence than offered by other types of functional iteration, as is done in Section 1.4. A third means of introducing Newton’s method, which is discussed next, is based on Taylor polynomials. We will see there that this particular derivation produces not only the method, but also a bound for the error of the approximation.

Suppose that f ∈ C²[a, b]. Let p0 ∈ [a, b] be an approximation to p such that f′(p0) ≠ 0 and |p − p0| is “small.” Consider the first Taylor polynomial for f(x) expanded about p0 and evaluated at x = p:

f(p) = f(p0) + (p − p0) f′(p0) + ((p − p0)²/2) f″(ξ(p)),

where ξ(p) lies between p and p0. Since f(p) = 0, this equation gives

0 = f(p0) + (p − p0) f′(p0) + ((p − p0)²/2) f″(ξ(p)).

Newton’s method is derived by assuming that since |p − p0| is small, the term involving (p − p0)² is much smaller, so

0 ≈ f(p0) + (p − p0) f′(p0).

Solving for p gives

p ≈ p0 − f(p0)/f′(p0) ≡ p1.

This sets the stage for Newton’s method, which starts with an initial approximation p0 and generates the sequence {pn} by

pn = pn−1 − f(pn−1)/f′(pn−1), for n ≥ 1.    (1.7)

Figure 1.5 illustrates how the approximations are obtained using successive tangents. Starting with the initial approximation p0, the approximation p1 is the x-intercept of the tangent line to the graph of f at (p0, f(p0)). The approximation p2 is the x-intercept of the tangent line to the graph of f at (p1, f(p1)), and so on. The algorithm below follows this procedure.

Figure 1.5:

Newton’s algorithm

To find a solution to f (x) = 0 given an initial approximation p0 :

• INPUT initial approximation p0; tolerance TOL; maximum number of iterations N0.

• OUTPUT approximate solution p or message of failure.

Step 1 Set i = 1.

Step 2 While i ≤ N0 do Steps 3–6.

Step 3 Set p = p0 − f(p0)/f′(p0); (compute pi).

Step 4 If |p− p0| < TOL then

• OUTPUT (p); (Procedure completed successfully.)
• STOP.

Step 5 Set i = i + 1;

Step 6 Set p0 = p; (Update p0).


Step 7 OUTPUT (‘Method failed after N0 iterations, N0 =’, N0); (The procedure was unsuccessful.)
STOP.
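A compact Python sketch of the algorithm (the name `newton` and the returned iteration count are our own; the derivative is supplied as a second function):

```python
def newton(f, df, p0, tol=1e-10, n_max=50):
    """Newton's method: p = p0 - f(p0)/f'(p0) until |p - p0| < TOL."""
    for i in range(1, n_max + 1):
        p = p0 - f(p0) / df(p0)   # Step 3: one Newton step
        if abs(p - p0) < tol:     # Step 4: stopping test, Inequality (1.8)
            return p, i
        p0 = p                    # Step 6: update p0
    raise RuntimeError(f"Method failed after {n_max} iterations")

# f(x) = x^3 + 4x^2 - 10 from Example 1.1, with p0 = 1.5
p, n = newton(lambda x: x**3 + 4*x**2 - 10,
              lambda x: 3*x**2 + 8*x, 1.5)
```

From p0 = 1.5 this reproduces the rapid convergence seen in column (e) of Table 1.2.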

The stopping-technique inequalities given with the Bisection method are applicable to Newton’s method. That is, select a tolerance ε > 0, and construct p1, . . . , pN until

|pN − pN−1| < ε,    (1.8)

|pN − pN−1| / |pN| < ε, or    (1.9)

|f(pN)| < ε.    (1.10)

A form of Inequality (1.8) is used in Step 4 of Newton’s algorithm. Note that none of the inequalities (1.8), (1.9), or (1.10) gives precise information about the actual error |pN − p|.

Newton’s method is a functional iteration technique with pn = g(pn−1), for which

g(pn−1) = pn−1 − f(pn−1)/f′(pn−1), for n ≥ 1.    (1.11)

In fact, this is the functional iteration technique that was used to give the rapid convergence we saw in column (e) of Table 1.2 in Section 1.2.

It is clear from Equation (1.7) that Newton’s method cannot be continued if f′(pn−1) = 0 for some n. In fact, we will see that the method is most effective when f′ is bounded away from zero near p.

Example 1.6. Consider the function f(x) = cos x − x = 0. Approximate a root of f using (a) a fixed-point method, and (b) Newton’s method.

Solution. (a) A solution to this root-finding problem is also a solution to the fixed-point problem x = cos x, and the graph in Figure 1.6 implies that a single fixed point p lies in [0, π/2].

Table 1.3 shows the results of fixed-point iteration with p0 = π/4. The best we could conclude from these results is that p ≈ 0.74.

(b) To apply Newton’s method to this problem we need f′(x) = −sin x − 1. Starting again with p0 = π/4, we generate the sequence defined, for n ≥ 1, by

pn = pn−1 − f(pn−1)/f′(pn−1) = pn−1 − (cos pn−1 − pn−1)/(−sin pn−1 − 1).

This gives the approximations in Table 1.3. An excellent approximation is obtained with n = 3. Because of the agreement of p3 and p4 we could reasonably expect this result to be accurate to the places listed.


Figure 1.6:

n   pn (Fixed-point method)   pn (Newton’s method)
0   0.7853981635               0.7853981635
1   0.7071067810               0.7395361337
2   0.7602445972               0.7390851781
3   0.7246674808               0.7390851332
4   0.7487198858               0.7390851332
5   0.7325608446
6   0.7434642113
7   0.7361282565

Table 1.3:
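The Newton column of Table 1.3 can be reproduced in a few lines of Python (a sketch; the variable names are our own):

```python
import math

f  = lambda x: math.cos(x) - x     # Example 1.6
df = lambda x: -math.sin(x) - 1    # f'(x)

p = math.pi / 4                    # p0 = 0.7853981635
for _ in range(4):                 # four Newton steps, as in Table 1.3
    p = p - f(p) / df(p)
```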

Convergence using Newton’s Method

Example 1.6 shows that Newton’s method can provide extremely accurate approximations with very few iterations. For that example, only one iteration of Newton’s method was needed to give better accuracy than 7 iterations of the fixed-point method. It is now time to examine Newton’s method more carefully to discover why it is so effective.

The Taylor series derivation of Newton’s method at the beginning of the section points out the importance of an accurate initial approximation. The crucial assumption is that the term involving (p − p0)² is, by comparison with |p − p0|, so small that it can be deleted. This will clearly be false unless p0 is a good approximation to p. If p0 is not sufficiently close to the actual root, there is little reason to suspect that Newton’s method will converge to the root. However, in some instances, even poor initial approximations will produce convergence.

The following convergence theorem for Newton’s method illustrates the theoretical importance of the choice of p0.

Theorem 1.4. Let f ∈ C^2[a, b]. If p ∈ (a, b) is such that f(p) = 0 and f′(p) ≠ 0, then there exists a δ > 0 such that Newton's method generates a sequence {pn}, n ≥ 0, converging to p for any initial approximation p0 ∈ [p − δ, p + δ].

Proof. The proof is based on analyzing Newton's method as the functional iteration scheme pn = g(pn−1), for n ≥ 1, with

g(x) = x − f(x)/f′(x).

Let k be in (0, 1). We first find an interval [p − δ, p + δ] that g maps into itself and for which |g′(x)| ≤ k, for all x ∈ (p − δ, p + δ).

Since f′ is continuous and f′(p) ≠ 0, there exists a δ1 > 0 such that f′(x) ≠ 0 for x ∈ [p − δ1, p + δ1] ⊆ [a, b]. Thus g is defined and continuous on [p − δ1, p + δ1]. Also

g′(x) = 1 − (f′(x)f′(x) − f(x)f″(x))/[f′(x)]^2 = f(x)f″(x)/[f′(x)]^2,

for x ∈ [p − δ1, p + δ1], and, since f ∈ C^2[a, b], we have g ∈ C^1[p − δ1, p + δ1].

By assumption, f(p) = 0, so

g′(p) = f(p)f″(p)/[f′(p)]^2 = 0.

Since g′ is continuous and 0 < k < 1, there exists a δ, with 0 < δ < δ1, such that

|g′(x)| ≤ k, for all x ∈ [p − δ, p + δ].

It remains to show that g maps [p − δ, p + δ] into [p − δ, p + δ]. If x ∈ [p − δ, p + δ], the Mean Value Theorem implies that for some number ξ between x and p, |g(x) − g(p)| = |g′(ξ)||x − p|. So

|g(x) − p| = |g(x) − g(p)| = |g′(ξ)||x − p| ≤ k|x − p| < |x − p|.

Since x ∈ [p − δ, p + δ], it follows that |x − p| ≤ δ and that |g(x) − p| < δ. Hence, g maps [p − δ, p + δ] into [p − δ, p + δ].

All the hypotheses of the Fixed-Point Theorem 1.3 are now satisfied, so the sequence {pn}, n ≥ 0, defined by

pn = g(pn−1) = pn−1 − f(pn−1)/f′(pn−1), for n ≥ 1,

converges to p for any p0 ∈ [p − δ, p + δ].


Theorem 1.4 states that, under reasonable assumptions, Newton's method converges provided a sufficiently accurate initial approximation is chosen. It also implies that the constant k that bounds the derivative of g, and consequently indicates the speed of convergence of the method, decreases to 0 as the procedure continues. This result is important for the theory of Newton's method, but it is seldom applied in practice because it does not tell us how to determine δ.

In a practical application, an initial approximation is selected and successive approximations are generated by Newton's method. These will generally either converge quickly to the root, or it will quickly become clear that convergence is unlikely.
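This practical procedure can be sketched in a few lines of code (a minimal illustration rather than the text's algorithm; the tolerance and iteration cap are arbitrary choices of ours):

```python
import math

def newton(f, fprime, p0, tol=1e-10, max_iter=50):
    """Newton's method: p_n = p_{n-1} - f(p_{n-1}) / f'(p_{n-1})."""
    p = p0
    for _ in range(max_iter):
        p_next = p - f(p) / fprime(p)
        if abs(p_next - p) < tol:      # stop once successive iterates agree
            return p_next
        p = p_next
    raise RuntimeError("no convergence within max_iter iterations")

# Solve cos x = x, i.e. f(x) = cos x - x, starting from p0 = pi/4.
root = newton(lambda x: math.cos(x) - x,
              lambda x: -math.sin(x) - 1.0,
              math.pi / 4)
print(root)  # agrees with p4 = 0.7390851332 from Table 1.3
```

The stopping test on |p_next − p| mirrors the observation above: once two successive iterates agree to the tolerance, further iterations change nothing at the printed precision.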

1.4 Error Analysis for Iterative Methods

In this section we investigate the order of convergence of functional iteration schemes and, as a means of obtaining rapid convergence, rediscover Newton's method. We also consider ways of accelerating the convergence of Newton's method in special circumstances. First, however, we need a new procedure for measuring how rapidly a sequence converges.

Order of Convergence

Definition 1.2. Suppose {pn}, n ≥ 0, is a sequence that converges to p, with pn ≠ p for all n. If positive constants λ and α exist with

lim_{n→∞} |pn+1 − p| / |pn − p|^α = λ,

then {pn} converges to p of order α, with asymptotic error constant λ.

An iterative technique of the form pn = g(pn−1) is said to be of order α if the sequence {pn} converges to the solution p = g(p) of order α.

In general, a sequence with a high order of convergence converges more rapidly than a sequence with a lower order. The asymptotic error constant affects the speed of convergence, but not to the extent of the order. Two cases of order are given special attention:

(i) If α = 1 (and λ < 1), the sequence is linearly convergent.

(ii) If α = 2, the sequence is quadratically convergent.

Quadratically convergent sequences are expected to converge much more quickly than those that converge only linearly, but the next result implies that an arbitrary technique that generates a convergent sequence generally does so only linearly.
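The order α can also be estimated numerically: from |pn+1 − p| ≈ λ|pn − p|^α, three consecutive errors give α ≈ log(e_{n+1}/e_n) / log(e_n/e_{n−1}). A small sketch (the iteration and starting point are those of the fixed-point column in Table 1.3; the helper name is ours):

```python
import math

def estimated_order(e0, e1, e2):
    """Estimate alpha from three consecutive errors of a convergent sequence."""
    return math.log(e2 / e1) / math.log(e1 / e0)

p = 0.7390851332151607          # fixed point of cos x
x, iterates = math.pi / 4, []
for _ in range(12):             # fixed-point iteration p_n = cos(p_{n-1})
    x = math.cos(x)
    iterates.append(x)

e0, e1, e2 = (abs(v - p) for v in iterates[-3:])
alpha = estimated_order(e0, e1, e2)
print(alpha)  # close to 1: this iteration converges only linearly
```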


Theorem 1.5. Let g ∈ C[a, b] be such that g(x) ∈ [a, b], for all x ∈ [a, b]. Suppose, in addition, that g′ is continuous on (a, b) and a positive constant k < 1 exists with

|g′(x)| ≤ k, for all x ∈ (a, b).

If g′(p) ≠ 0, then for any number p0 ≠ p in [a, b], the sequence

pn = g(pn−1), for n ≥ 1,

converges only linearly to the unique fixed point p in [a, b].

Proof. We know from the Fixed-Point Theorem 1.3 in Section 1.2 that the sequence converges to p. Since g′ exists on (a, b), we can apply the Mean Value Theorem to g to show that for any n,

pn+1 − p = g(pn) − g(p) = g′(ξn)(pn − p),

where ξn is between pn and p. Since {pn} converges to p, we also have {ξn} converging to p. Since g′ is continuous on (a, b), we have

lim_{n→∞} g′(ξn) = g′(p).

Thus

lim_{n→∞} |pn+1 − p|/|pn − p| = lim_{n→∞} |g′(ξn)| = |g′(p)|.

Hence, if g′(p) ≠ 0, fixed-point iteration exhibits linear convergence with asymptotic error constant |g′(p)|.

Theorem 1.5 implies that higher-order convergence for fixed-point methods of the form g(p) = p can occur only when g′(p) = 0. The next result describes additional conditions that ensure the quadratic convergence we seek.

Theorem 1.6. Let p be a solution of the equation x = g(x). Suppose that g′(p) = 0 and g″ is continuous with |g″(x)| < M on an open interval I containing p. Then there exists a δ > 0 such that, for p0 ∈ [p − δ, p + δ], the sequence defined by pn = g(pn−1), for n ≥ 1, converges at least quadratically to p. Moreover, for sufficiently large values of n,

|pn+1 − p| < (M/2)|pn − p|^2.

Theorems 1.5 and 1.6 tell us that our search for quadratically convergent fixed-point methods should point in the direction of functions whose derivatives are zero at the fixed point. That is:


• For a fixed-point method to converge quadratically we need to have both g(p) = p and g′(p) = 0.

The easiest way to construct a fixed-point problem associated with a root-finding problem f(x) = 0 is to add or subtract a multiple of f(x) from x. Consider the sequence

pn = g(pn−1), for n ≥ 1,

for g in the form

g(x) = x − φ(x)f(x),

where φ is a differentiable function that will be chosen later.

For the iterative procedure derived from g to be quadratically convergent, we need to have g′(p) = 0 when f(p) = 0. Because

g′(x) = 1 − φ′(x)f(x) − φ(x)f′(x),

and f(p) = 0, we have

g′(p) = 1 − φ′(p)f(p) − φ(p)f′(p) = 1 − φ(p)f′(p),

and g′(p) = 0 if and only if φ(p) = 1/f′(p).

If we let φ(x) = 1/f′(x), then we will ensure that φ(p) = 1/f′(p) and produce the quadratically convergent procedure

pn = g(pn−1) = pn−1 − f(pn−1)/f′(pn−1), for n ≥ 1.

This, of course, is simply Newton's method. Hence

• If f(p) = 0 and f′(p) ≠ 0, then for starting values sufficiently close to p, Newton's method will converge at least quadratically.

Multiple Roots

In the preceding discussion, the restriction was made that f′(p) ≠ 0, where p is the solution to f(x) = 0. In particular, Newton's method and the Secant method will generally give problems if f′(p) = 0 when f(p) = 0. To examine these difficulties in more detail, we make the following definition.
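The slowdown at a multiple root is easy to observe numerically; here is a small sketch of ours using f(x) = (x − 1)^2, which has a double root at p = 1 and hence f′(p) = 0:

```python
# At this double root, each Newton step exactly halves the error:
# p_new - 1 = (p - 1) - (p - 1)^2 / (2(p - 1)) = (p - 1)/2,
# so convergence is linear with ratio 1/2, not quadratic.
f = lambda x: (x - 1.0) ** 2
fp = lambda x: 2.0 * (x - 1.0)

p, errors = 2.0, []
for _ in range(10):
    p = p - f(p) / fp(p)        # one Newton step
    errors.append(abs(p - 1.0))

ratios = [errors[i + 1] / errors[i] for i in range(9)]
print(ratios)  # every ratio is 0.5: the error is only halved per step
```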

Definition 1.3. A solution p of f(x) = 0 is a zero of multiplicity m of f if for x ≠ p we can write f(x) = (x − p)^m q(x), where lim_{x→p} q(x) ≠ 0.

In essence, q(x) represents the portion of f(x) that does not contribute to the zero of f. The following result gives a means to easily identify simple zeros of a function, those that have multiplicity one.


Chapter 2

Interpolation and Polynomial Approximation

2.1 Introduction

A census of the population of the United States is taken every 10 years. The following table lists the population, in thousands of people, from 1950 to 2000, and the data are also represented in the figure.

Year                        1950      1960      1970      1980      1990      2000
Population (in thousands)   151,326   179,323   203,302   226,542   249,633   281,422

In reviewing these data, we might ask whether they could be used to provide a reasonable estimate of the population, say, in 1975 or even in the year 2020. Predictions of this type can be obtained by using a function that fits the given data. This process is called interpolation and is the subject of this chapter. This population problem is considered throughout the chapter.


2.2 Interpolation and the Lagrange Polynomial

One of the most useful and well-known classes of functions mapping the set of real numbers into itself is the algebraic polynomials, the set of functions of the form

Pn(x) = an x^n + an−1 x^(n−1) + ... + a1 x + a0,

where n is a nonnegative integer and a0, ..., an are real constants. One reason for their importance is that they uniformly approximate continuous functions. By this we mean that given any function, defined and continuous on a closed and bounded interval, there exists a polynomial that is as "close" to the given function as desired. This result is expressed precisely in the Weierstrass Approximation Theorem. (See Figure 2.1.)

Theorem 2.1 (Weierstrass Approximation Theorem). Suppose that f is defined and continuous on [a, b]. For each ε > 0, there exists a polynomial P(x) with the property that |f(x) − P(x)| < ε, for all x in [a, b].

The proof of this theorem can be found in most elementary texts on real analysis (see, for example, [Bart], pp. 165-172). Another important reason for considering the class of polynomials in the approximation of functions is that the derivative and indefinite integral of a polynomial are easy to determine and are also polynomials. For these reasons, polynomials are often used for approximating continuous functions.

The Taylor polynomials were introduced in Section 1.1, where they were described as one of the fundamental building blocks of numerical analysis. Given this prominence, you might expect that polynomial interpolation would make heavy use of these functions. However, this is not the case. The Taylor polynomials agree as closely as possible with a given function at a specific point, but they concentrate their accuracy near that point.

A good interpolation polynomial needs to provide a relatively accurate approximation over an entire interval, and Taylor polynomials do not generally do this. For example, suppose we calculate the first six Taylor polynomials about x0 = 0 for f(x) = e^x.

Since the derivatives of f(x) are all e^x, which evaluated at x0 = 0 gives 1, the Taylor polynomials are

P0(x) = 1,
P1(x) = 1 + x,
P2(x) = 1 + x + x^2/2,
P3(x) = 1 + x + x^2/2 + x^3/6,
P4(x) = 1 + x + x^2/2 + x^3/6 + x^4/24,
P5(x) = 1 + x + x^2/2 + x^3/6 + x^4/24 + x^5/120.

Figure 2.1:

The graphs of the polynomials are shown in Figure 2.1. (Notice that even for the higher-degree polynomials, the error becomes progressively worse as we move away from zero.)

Very little of Weierstrass’s work was published during his lifetime, but hislectures, particularly on the theory of functions, had significant influence on anentire generation of students.

Although better approximations are obtained for f(x) = e^x if higher-degree Taylor polynomials are used, this is not true for all functions. Consider, as an extreme example, using Taylor polynomials of various degrees for f(x) = 1/x expanded about x0 = 1 to approximate f(3) = 1/3. Since

f(x) = x^(−1),  f′(x) = −x^(−2),  f″(x) = (−1)^2 · 2x^(−3),

and, in general,

f^(k)(x) = (−1)^k k! x^(−k−1),

the Taylor polynomials are

Pn(x) = ∑_{k=0}^{n} [f^(k)(1)/k!] (x − 1)^k = ∑_{k=0}^{n} (−1)^k (x − 1)^k.

To approximate f(3) = 1/3 by Pn(3) for increasing values of n, we obtain the values in Table 2.2 - rather a dramatic failure!

n       0    1    2    3    4     5     6     7
Pn(3)   1   −1    3   −5   11   −21    43   −85

When we approximate f(3) = 1/3 by Pn(3) for larger values of n, the approximations become increasingly inaccurate.

For the Taylor polynomials all the information used in the approximation is concentrated at the single number x0, so these polynomials will generally give inaccurate approximations as we move away from x0. This limits Taylor polynomial approximation to the situation in which approximations are needed only at numbers close to x0. For ordinary computational purposes it is more efficient to use methods that include information at various points. We consider this in the remainder of the chapter. The primary use of Taylor polynomials in numerical analysis is not for approximation purposes, but for the derivation of numerical techniques and error estimation.

2.2.1 Lagrange Interpolating Polynomials

The problem of determining a polynomial of degree one that passes through the distinct points (x0, y0) and (x1, y1) is the same as approximating a function f for which f(x0) = y0 and f(x1) = y1 by means of a first-degree polynomial interpolating, or agreeing with, the values of f at the given points. Using this polynomial for approximation within the interval given by the endpoints is called polynomial interpolation.

Define the functions

L0(x) = (x − x1)/(x0 − x1)   and   L1(x) = (x − x0)/(x1 − x0).

The linear Lagrange interpolating polynomial through (x0, y0) and (x1, y1) is

P(x) = L0(x)f(x0) + L1(x)f(x1) = [(x − x1)/(x0 − x1)] f(x0) + [(x − x0)/(x1 − x0)] f(x1).

Note that

L0(x0) = 1,  L0(x1) = 0,  L1(x0) = 0,  and  L1(x1) = 1,

which implies that

P(x0) = 1 · f(x0) + 0 · f(x1) = f(x0) = y0

and

P(x1) = 0 · f(x0) + 1 · f(x1) = f(x1) = y1.

So P is the unique polynomial of degree at most one that passes through (x0, y0) and (x1, y1).


Figure 2.2:

Example 2.1. Determine the linear Lagrange interpolating polynomial that passes through the points (2, 4) and (5, 1).

Solution. In this case we have

L0(x) = (x − 5)/(2 − 5) = −(1/3)(x − 5)   and   L1(x) = (x − 2)/(5 − 2) = (1/3)(x − 2),

so

P(x) = −(1/3)(x − 5) · 4 + (1/3)(x − 2) · 1 = −(4/3)x + 20/3 + (1/3)x − 2/3 = −x + 6.

The graph of y = P(x) is shown in Figure 2.2.

To generalize the concept of linear interpolation, consider the construction of a polynomial of degree at most n that passes through the n + 1 points

(x0, f(x0)), (x1, f(x1)), ..., (xn, f(xn)).

(See Figure 2.3.)

In this case we first construct, for each k = 0, 1, ..., n, a function Ln,k(x) with the property that Ln,k(xi) = 0 when i ≠ k and Ln,k(xk) = 1. To satisfy Ln,k(xi) = 0 for each i ≠ k requires that the numerator of Ln,k(x) contain the term

(x − x0)(x − x1)...(x − xk−1)(x − xk+1)...(x − xn).

To satisfy Ln,k(xk) = 1, the denominator of Ln,k(x) must be this same term but evaluated at x = xk. Thus

Ln,k(x) = [(x − x0)...(x − xk−1)(x − xk+1)...(x − xn)] / [(xk − x0)...(xk − xk−1)(xk − xk+1)...(xk − xn)].


Figure 2.3:

A sketch of the graph of a typical Ln,k (when n is even) is shown in Figure 3.5.

The interpolating polynomial is easily described once the form of Ln,k is known. This polynomial, called the nth Lagrange interpolating polynomial, is defined in the following theorem.

Theorem 2.2. If x0, x1, ..., xn are n + 1 distinct numbers and f is a function whose values are given at these numbers, then a unique polynomial P(x) of degree at most n exists with f(xk) = P(xk), for each k = 0, 1, ..., n. This polynomial is given by

P(x) = f(x0)Ln,0(x) + ... + f(xn)Ln,n(x) = ∑_{k=0}^{n} f(xk)Ln,k(x),    (2.1)

where, for each k = 0, 1, ..., n,

Ln,k(x) = [(x − x0)(x − x1)...(x − xk−1)(x − xk+1)...(x − xn)] / [(xk − x0)(xk − x1)...(xk − xk−1)(xk − xk+1)...(xk − xn)]    (2.2)

        = ∏_{i=0, i≠k}^{n} (x − xi)/(xk − xi).

We will write Ln,k(x) simply as Lk(x) when there is no confusion as to its degree.
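Eq. (2.1) translates directly into code; the following sketch (function and variable names are ours) evaluates P(x) from the nodes without expanding the polynomial:

```python
def lagrange_eval(xs, ys, x):
    """Evaluate P(x) = sum_k ys[k] * L_{n,k}(x), Eq. (2.1)."""
    total = 0.0
    for k in range(len(xs)):
        lk = 1.0                      # build L_{n,k}(x) as in Eq. (2.2)
        for i in range(len(xs)):
            if i != k:
                lk *= (x - xs[i]) / (xs[k] - xs[i])
        total += ys[k] * lk
    return total

# Linear case from Example 2.1: nodes (2, 4) and (5, 1), where P(x) = -x + 6.
value = lagrange_eval([2.0, 5.0], [4.0, 1.0], 3.0)
print(value)  # P(3) = 3, up to rounding
```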

Example 2.2. (a) Use the numbers (called nodes) x0 = 2, x1 = 2.75, and x2 = 4 to find the second Lagrange interpolating polynomial for f(x) = 1/x.

(b) Use this polynomial to approximate f(3) = 1/3.

Solution. (a) We first determine the coefficient polynomials L0(x), L1(x), and L2(x). In nested form they are

L0(x) = [(x − 2.75)(x − 4)] / [(2 − 2.75)(2 − 4)] = (2/3)(x − 2.75)(x − 4),


Figure 2.4:

L1(x) = [(x − 2)(x − 4)] / [(2.75 − 2)(2.75 − 4)] = −(16/15)(x − 2)(x − 4),

and

L2(x) = [(x − 2)(x − 2.75)] / [(4 − 2)(4 − 2.75)] = (2/5)(x − 2)(x − 2.75).

Also, f(x0) = f(2) = 1/2, f(x1) = f(2.75) = 4/11, and f(x2) = f(4) = 1/4, so

P(x) = ∑_{k=0}^{2} f(xk)Lk(x)
     = (1/3)(x − 2.75)(x − 4) − (64/165)(x − 2)(x − 4) + (1/10)(x − 2)(x − 2.75)
     = (1/22)x^2 − (35/88)x + 49/44.

(b) An approximation to f(3) = 1/3 (see Figure 2.4) is

f(3) ≈ P(3) = 9/22 − 105/88 + 49/44 = 29/88 ≈ 0.32955.

Recall that in the opening section of this chapter (see Table 2.2) we found that no Taylor polynomial expanded about x0 = 1 could be used to reasonably approximate f(x) = 1/x at x = 3.

The next step is to calculate a remainder term or bound for the error involved in approximating a function by an interpolating polynomial.

Theorem 2.3. Suppose x0, x1, ..., xn are distinct numbers in the interval [a, b] and f ∈ C^(n+1)[a, b]. Then, for each x in [a, b], a number ξ(x) (generally unknown) between x0, x1, ..., xn, and hence in (a, b), exists with

f(x) = P(x) + [f^(n+1)(ξ(x))/(n + 1)!] (x − x0)(x − x1)...(x − xn),    (2.3)


where P(x) is the interpolating polynomial given in Eq. 2.1.

The error formula in Theorem 2.3 is an important theoretical result because Lagrange polynomials are used extensively for deriving numerical differentiation and integration methods. Error bounds for these techniques are obtained from the Lagrange error formula. Note that the error form for the Lagrange polynomial is quite similar to that for the Taylor polynomial. The nth Taylor polynomial about x0 concentrates all the known information at x0 and has an error term of the form

[f^(n+1)(ξ(x))/(n + 1)!] (x − x0)^(n+1).

The Lagrange polynomial of degree n uses information at the distinct numbers x0, x1, ..., xn and, in place of (x − x0)^(n+1), its error formula uses a product of the n + 1 terms (x − x0), (x − x1), ..., (x − xn):

[f^(n+1)(ξ(x))/(n + 1)!] (x − x0)(x − x1)...(x − xn).

Example 2.3. In Example 2.2 we found the second Lagrange polynomial for f(x) = 1/x on [2, 4] using the nodes x0 = 2, x1 = 2.75, and x2 = 4. Determine the error form for this polynomial, and the maximum error when the polynomial is used to approximate f(x) for x ∈ [2, 4].

Solution. Because f(x) = x^(−1), we have f′(x) = −x^(−2), f″(x) = 2x^(−3), and f‴(x) = −6x^(−4).

As a consequence, the second Lagrange polynomial has the error form

[f‴(ξ(x))/3!] (x − x0)(x − x1)(x − x2) = −(ξ(x))^(−4) (x − 2)(x − 2.75)(x − 4),

for ξ(x) in (2, 4). The maximum value of (ξ(x))^(−4) on the interval is 2^(−4) = 1/16. We now need to determine the maximum value on this interval of the absolute value of the polynomial

g(x) = (x − 2)(x − 2.75)(x − 4) = x^3 − (35/4)x^2 + (49/2)x − 22.

Because

d/dx [x^3 − (35/4)x^2 + (49/2)x − 22] = 3x^2 − (35/2)x + 49/2 = (1/2)(3x − 7)(2x − 7),

the critical points occur at x = 7/3, with g(7/3) = 25/108, and x = 7/2, with g(7/2) = −9/16. Hence, the maximum error is

|f‴(ξ(x))/3!| |(x − x0)(x − x1)(x − x2)| ≤ (1/16)|−9/16| = 9/256 ≈ 0.03516.


The next example illustrates how the error formula can be used to prepare a table of data that will ensure a specified interpolation error within a specified bound.

Example 2.4. Suppose a table is to be prepared for the function f(x) = e^x, for x in [0, 1]. Assume the number of decimal places to be given per entry is d ≥ 8 and that the difference between adjacent x-values, the step size, is h. What step size h will ensure that linear interpolation gives an absolute error of at most 10^(−6) for all x in [0, 1]?

Solution. Let x0, x1, ... be the numbers at which f is evaluated, x be in [0, 1], and suppose j satisfies xj ≤ x ≤ xj+1. Eq. (2.3) implies that the error in linear interpolation is

|f(x) − P(x)| = |[f″(ξ)/2!] (x − xj)(x − xj+1)| = [|f″(ξ)|/2] |x − xj| |x − xj+1|.

The step size is h, so xj = jh, xj+1 = (j + 1)h, and

|f(x) − P(x)| ≤ [|f″(ξ)|/2] |(x − jh)(x − (j + 1)h)|.

Hence

|f(x) − P(x)| ≤ [max_{ξ∈[0,1]} e^ξ / 2] max_{xj≤x≤xj+1} |(x − jh)(x − (j + 1)h)|
             ≤ (e/2) max_{xj≤x≤xj+1} |(x − jh)(x − (j + 1)h)|.

Consider the function g(x) = (x − jh)(x − (j + 1)h), for jh ≤ x ≤ (j + 1)h. Because

g′(x) = (x − (j + 1)h) + (x − jh) = 2(x − jh − h/2),

the only critical point for g is at x = jh + h/2, with g(jh + h/2) = (h/2)^2 = h^2/4.

Since g(jh) = 0 and g((j + 1)h) = 0, the maximum value of |g(x)| on [jh, (j + 1)h] must occur at the critical point, which implies that

|f(x) − P(x)| ≤ (e/2) max_{xj≤x≤xj+1} |g(x)| ≤ (e/2)(h^2/4) = eh^2/8.

Consequently, to ensure that the error in linear interpolation is bounded by 10^(−6), it is sufficient for h to be chosen so that eh^2/8 ≤ 10^(−6). This implies that h < 1.72 × 10^(−3).

Because n = (1 − 0)/h must be an integer, a reasonable choice for the step size is h = 0.001.


2.3 Divided Differences

Iterated interpolation was used in the previous section to generate successively higher-degree polynomial approximations at a specific point. Divided-difference methods, introduced in this section, are used to successively generate the polynomials themselves.

Suppose that Pn(x) is the nth Lagrange polynomial that agrees with the function f at the distinct numbers x0, x1, ..., xn. Although this polynomial is unique, there are alternate algebraic representations that are useful in certain situations. The divided differences of f with respect to x0, x1, ..., xn are used to express Pn(x) in the form

Pn(x) = a0 + a1(x − x0) + a2(x − x0)(x − x1) + ... + an(x − x0)...(x − xn−1),    (2.4)

for appropriate constants a0, a1, ..., an. To determine the first of these constants, a0, note that if Pn(x) is written in the form of Eq. (2.4), then evaluating Pn(x) at x0 leaves only the constant term a0; that is,

a0 = Pn(x0) = f(x0).

Similarly, when Pn(x) is evaluated at x1, the only nonzero terms in the evaluation of Pn(x1) are the constant and linear terms,

f(x0) + a1(x1 − x0) = Pn(x1) = f(x1);

so

a1 = [f(x1) − f(x0)] / (x1 − x0).    (2.5)

We now introduce the divided-difference notation, which is related to Aitken's Δ^2 notation.¹ The zeroth divided difference of the function f with respect to xi, denoted f[xi], is simply the value of f at xi:

f[xi] = f(xi).    (2.6)

The remaining divided differences are defined recursively; the first divided difference of f with respect to xi and xi+1 is denoted f[xi, xi+1] and defined as

f[xi, xi+1] = (f[xi+1] − f[xi]) / (xi+1 − xi).    (2.7)

The second divided difference, f[xi, xi+1, xi+2], is defined as

f[xi, xi+1, xi+2] = (f[xi+1, xi+2] − f[xi, xi+1]) / (xi+2 − xi).

¹ See pp. 86-88, [B-F].


Similarly, after the (k − 1)st divided differences,

f[xi, xi+1, ..., xi+k−1]   and   f[xi+1, xi+2, ..., xi+k],

have been determined, the kth divided difference relative to xi, xi+1, ..., xi+k is

f[xi, xi+1, ..., xi+k] = (f[xi+1, xi+2, ..., xi+k] − f[xi, xi+1, ..., xi+k−1]) / (xi+k − xi).    (2.8)

The process ends with the single nth divided difference,

f[x0, x1, ..., xn] = (f[x1, x2, ..., xn] − f[x0, x1, ..., xn−1]) / (xn − x0).

Because of Eq. (2.5) we can write a1 = f[x0, x1], just as a0 can be expressed as a0 = f(x0) = f[x0]. Hence the interpolating polynomial in Eq. (2.4) is

Pn(x) = f[x0] + f[x0, x1](x − x0) + a2(x − x0)(x − x1) + ... + an(x − x0)(x − x1)...(x − xn−1).

As might be expected from the evaluation of a0 and a1, the required constants are

ak = f[x0, x1, ..., xk],

for each k = 0, 1, ..., n. So Pn(x) can be rewritten in a form called Newton's Divided Difference:

Pn(x) = f[x0] + ∑_{k=1}^{n} f[x0, x1, ..., xk](x − x0)...(x − xk−1).    (2.9)

The value of f[x0, x1, ..., xk] is independent of the order of the numbers x0, x1, ..., xk, as shown in Exercise 21.²

The generation of the divided differences is outlined in Table 2.1. Two fourth and one fifth difference can also be determined from these data.

Newton’s Divided-Difference Formula Algorithm

To obtain the divided-difference coefficients of the interpolatory polynomial P onthe (n + 1) distinct numbers x0, x1, ..., xn for the function f :

• INPUT numbers x0, x1, ..., xn; values f(x0), f(x1), ..., f(xn) as F0,0, F1,0, ..., Fn,0.

• OUTPUT the numbers F0,0, F1,1, ..., Fn,n, where

Pn(x) = F0,0 + ∑_{i=1}^{n} Fi,i ∏_{j=0}^{i−1} (x − xj).    (Fi,i is f[x0, x1, ..., xi].)

2see p. [B-F]


x      f(x)     f[xk, xk+1]                              f[xk, xk+1, xk+2]                                      f[xk, xk+1, xk+2, xk+3]

x0     f[x0]
                f[x0, x1] = (f[x1] − f[x0])/(x1 − x0)
x1     f[x1]                                             f[x0, x1, x2] = (f[x1, x2] − f[x0, x1])/(x2 − x0)
                f[x1, x2] = (f[x2] − f[x1])/(x2 − x1)                                                          f[x0, x1, x2, x3] = (f[x1, x2, x3] − f[x0, x1, x2])/(x3 − x0)
x2     f[x2]                                             f[x1, x2, x3] = (f[x2, x3] − f[x1, x2])/(x3 − x1)
                f[x2, x3] = (f[x3] − f[x2])/(x3 − x2)                                                          f[x1, x2, x3, x4] = (f[x2, x3, x4] − f[x1, x2, x3])/(x4 − x1)
x3     f[x3]                                             f[x2, x3, x4] = (f[x3, x4] − f[x2, x3])/(x4 − x2)
                f[x3, x4] = (f[x4] − f[x3])/(x4 − x3)                                                          f[x2, x3, x4, x5] = (f[x3, x4, x5] − f[x2, x3, x4])/(x5 − x2)
x4     f[x4]                                             f[x3, x4, x5] = (f[x4, x5] − f[x3, x4])/(x5 − x3)
                f[x4, x5] = (f[x5] − f[x4])/(x5 − x4)
x5     f[x5]

Table 2.1:

Step 1 For i = 1, 2, ..., n, for j = 1, 2, ..., i, set

Fi,j = (Fi,j−1 − Fi−1,j−1) / (xi − xi−j).    (Fi,j = f[xi−j, ..., xi].)

Step 2 OUTPUT (F0,0, F1,1, ..., Fn,n); STOP.
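The two steps above can be sketched directly in code (the function names are ours; the data are those of Example 2.5 below):

```python
def divided_differences(xs, fs):
    """Build F[i][j] = f[x_{i-j}, ..., x_i] and return the diagonal
    F[i][i] = f[x_0, ..., x_i], the Newton-form coefficients."""
    n = len(xs)
    F = [[0.0] * n for _ in range(n)]
    for i in range(n):
        F[i][0] = fs[i]
    for i in range(1, n):                 # Step 1 of the algorithm
        for j in range(1, i + 1):
            F[i][j] = (F[i][j - 1] - F[i - 1][j - 1]) / (xs[i] - xs[i - j])
    return [F[i][i] for i in range(n)]

def newton_eval(xs, coeffs, x):
    """Evaluate Eq. (2.9) by nested (Horner-like) multiplication."""
    result = coeffs[-1]
    for k in range(len(coeffs) - 2, -1, -1):
        result = result * (x - xs[k]) + coeffs[k]
    return result

xs = [1.0, 1.3, 1.6, 1.9, 2.2]
fs = [0.7651977, 0.6200860, 0.4554022, 0.2818186, 0.1103623]
coeffs = divided_differences(xs, fs)
print(coeffs)                          # the diagonal of the divided-difference table
print(newton_eval(xs, coeffs, 1.5))    # about 0.5118200
```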

The form of the output in the algorithm above can be modified to produce all the divided differences, as shown in Example 2.5.

Example 2.5. Complete the divided difference table for the data given in Table 2.2, and construct the interpolating polynomial that uses all these data.

x      f(x)
1.0    0.7651977
1.3    0.6200860
1.6    0.4554022
1.9    0.2818186
2.2    0.1103623

Table 2.2:

Solution. The first divided difference involving x0 and x1 is

f[x0, x1] = (f[x1] − f[x0]) / (x1 − x0) = (0.6200860 − 0.7651977) / (1.3 − 1.0) = −0.4837057.

The remaining first divided differences are found in a similar manner and are shown in the fourth column of Table 2.3.

i    xi     f[xi]        f[xi−1, xi]    f[xi−2, xi−1, xi]    f[xi−3, ..., xi]    f[xi−4, ..., xi]
0    1.0    0.7651977
                         −0.4837057
1    1.3    0.6200860                   −0.1087339
                         −0.5489460                           0.0658784
2    1.6    0.4554022                   −0.0494433                                0.0018251
                         −0.5786120                           0.0680685
3    1.9    0.2818186                    0.0118183
                         −0.5715210
4    2.2    0.1103623

Table 2.3:

The second divided difference involving x0, x1, and x2 is

f[x0, x1, x2] = (f[x1, x2] − f[x0, x1]) / (x2 − x0) = (−0.5489460 − (−0.4837057)) / (1.6 − 1.0) = −0.1087339.

The remaining second divided differences are shown in the fifth column of Table 2.3. The third divided difference involving x0, x1, x2, and x3 and the fourth divided difference involving all the data points are, respectively,

f[x0, x1, x2, x3] = (f[x1, x2, x3] − f[x0, x1, x2]) / (x3 − x0) = (−0.0494433 − (−0.1087339)) / (1.9 − 1.0) = 0.0658784,

and

f[x0, x1, x2, x3, x4] = (f[x1, x2, x3, x4] − f[x0, x1, x2, x3]) / (x4 − x0) = (0.0680685 − 0.0658784) / (2.2 − 1.0) = 0.0018251.

All the entries are given in Table 2.3. The coefficients of the Newton forward divided-difference form of the interpolating polynomial are along the diagonal in the table. This polynomial is

P4(x) = 0.7651977 − 0.4837057(x − 1.0) − 0.1087339(x − 1.0)(x − 1.3)
        + 0.0658784(x − 1.0)(x − 1.3)(x − 1.6)
        + 0.0018251(x − 1.0)(x − 1.3)(x − 1.6)(x − 1.9).

Notice that the value P4(1.5) = 0.5118200 agrees with the value obtained from the Lagrange form of this polynomial, as it must, because the polynomials are the same.


Theorem 2.4. Suppose that f ∈ C^n[a, b] and x0, x1, ..., xn are distinct numbers in [a, b]. Then a number ξ exists in (a, b) with

f[x0, x1, ..., xn] = f^(n)(ξ) / n!.

Newton's divided-difference formula can be expressed in a simplified form when the nodes are arranged consecutively with equal spacing. In this case, we introduce the notation h = xi+1 − xi, for each i = 0, 1, ..., n − 1, and let x = x0 + sh. Then the difference x − xi is x − xi = (s − i)h, so Eq. (2.9) becomes

Pn(x) = Pn(x0 + sh)
      = f[x0] + sh f[x0, x1] + s(s − 1)h^2 f[x0, x1, x2]
        + ... + s(s − 1)...(s − n + 1)h^n f[x0, x1, ..., xn]
      = f[x0] + ∑_{k=1}^{n} s(s − 1)...(s − k + 1)h^k f[x0, x1, ..., xk].

Using binomial-coefficient notation, C(s, k) = s(s − 1)...(s − k + 1)/k!, we can express Pn(x) compactly as

Pn(x) = Pn(x0 + sh) = f[x0] + ∑_{k=1}^{n} C(s, k) k! h^k f[x0, x1, ..., xk].    (2.10)

2.3.1 Forward Differences

The Newton forward-difference formula is constructed by making use of the forward difference notation Δ introduced with Aitken's Δ^2 method. With this notation,

f[x0, x1] = (f(x1) − f(x0)) / (x1 − x0) = (1/h)(f(x1) − f(x0)) = (1/h) Δf(x0),

f[x0, x1, x2] = (1/(2h)) [(Δf(x1) − Δf(x0)) / h] = (1/(2h^2)) Δ^2 f(x0),

and, in general,

f[x0, x1, ..., xk] = (1/(k! h^k)) Δ^k f(x0).

Since f[x0] = f(x0), Eq. (2.10) has the following form.

Newton Forward-Difference Formula

Pn(x) = f(x0) + ∑_{k=1}^{n} C(s, k) Δ^k f(x0)    (2.11)


2.3.2 Backward Differences

If the interpolating nodes are reordered from last to first as xn, xn−1, ..., x0, we can write the interpolatory formula as

Pn(x) = f [xn] + f [xn, xn−1](x − xn) + f [xn, xn−1, xn−2](x − xn)(x − xn−1)
+ ... + f [xn, ..., x0](x − xn)(x − xn−1)...(x − x1).

If, in addition, the nodes are equally spaced with x = xn + sh and x = xi + (s + n − i)h, then

Pn(x) = Pn(xn + sh) = f [xn] + sh f [xn, xn−1] + s(s + 1)h^2 f [xn, xn−1, xn−2]
+ ... + s(s + 1)...(s + n − 1)h^n f [xn, ..., x0].

This is used to derive a commonly applied formula known as the Newton backward-difference formula. To discuss this formula, we need the following definition.

Definition 2.1. Given the sequence {pn}_{n=0}^{∞}, define the backward difference ∇pn (read "nabla pn") by

∇pn = pn − pn−1, for n ≥ 1.

Higher powers are defined recursively by

∇^k pn = ∇(∇^{k−1} pn), for k ≥ 2.

Definition 2.1 implies that

f [xn, xn−1] = (1/h) ∇f (xn), f [xn, xn−1, xn−2] = (1/(2h^2)) ∇^2 f (xn),

and, in general,

f [xn, xn−1, ..., xn−k] = (1/(k! h^k)) ∇^k f (xn).

Consequently,

Pn(x) = f [xn] + s ∇f (xn) + (s(s + 1)/2) ∇^2 f (xn) + ... + (s(s + 1)...(s + n − 1)/n!) ∇^n f (xn).

If we extend the binomial coefficient notation to include all real values of s by letting

(−s choose k) = (−s)(−s − 1)...(−s − k + 1) / k! = (−1)^k s(s + 1)...(s + k − 1) / k!,

then

Pn(x) = f [xn] + (−1)^1 (−s choose 1) ∇f (xn) + (−1)^2 (−s choose 2) ∇^2 f (xn)
+ ... + (−1)^n (−s choose n) ∇^n f (xn).

This gives the following result.


xk     f (xk)        f [xk, xk+1]   f [xk, ..., xk+2]   f [xk, ..., xk+3]   f [xk, ..., xk+4]
1.0    0.7651977
                     −0.4837057
1.3    0.6200860                    −0.1087339
                     −0.5489460                          0.0658784
1.6    0.4554022                    −0.0494433                               0.0018251
                     −0.5786120                          0.0680685
1.9    0.2818186                     0.0118183
                     −0.5715210
2.2    0.1103623

Table 2.4:

Newton Backward–Difference Formula

Pn(x) = f [xn] + ∑_{k=1}^{n} (−1)^k (−s choose k) ∇^k f (xn) (2.12)

Illustration.

The divided-difference Table 2.4 corresponds to the data in Example 1. Only one interpolating polynomial of degree at most 4 uses these five data points, but we will organize the data points to obtain the best interpolation approximations of degrees 1, 2, and 3. This will give us a sense of the accuracy of the fourth-degree approximation for the given value of x.

If an approximation to f (1.1) is required, the reasonable choice for the nodes would be x0 = 1.0, x1 = 1.3, x2 = 1.6, x3 = 1.9, and x4 = 2.2, since this choice makes the earliest possible use of the data points closest to x = 1.1 and also makes use of the fourth divided difference. This implies that h = 0.3 and s = 1/3, so the Newton forward divided-difference formula is used with the divided differences that have a solid underline in Table 2.4:

P4(1.1) = P4(1.0 + (1/3)(0.3))
= 0.7651977 + (1/3)(0.3)(−0.4837057) + (1/3)(−2/3)(0.3)^2(−0.1087339)
+ (1/3)(−2/3)(−5/3)(0.3)^3(0.0658784) + (1/3)(−2/3)(−5/3)(−8/3)(0.3)^4(0.0018251)
= 0.7196460.
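The computation above can be reproduced programmatically. The following sketch (an addition, not from the text) builds the divided-difference table from the data of Table 2.4 and evaluates the Newton forward form at x = 1.1:

```python
def divided_diff_table(x, f):
    """Full divided-difference table; table[k][i] = f[x_i, ..., x_{i+k}]."""
    n = len(x)
    table = [list(f)]
    for k in range(1, n):
        prev = table[-1]
        table.append([(prev[i + 1] - prev[i]) / (x[i + k] - x[i])
                      for i in range(n - k)])
    return table

def newton_forward_eval(x, f, t):
    """Evaluate the interpolating polynomial in Newton forward form at t."""
    table = divided_diff_table(x, f)
    result = table[0][0]
    product = 1.0
    for k in range(1, len(x)):
        product *= (t - x[k - 1])
        result += table[k][0] * product
    return result

x = [1.0, 1.3, 1.6, 1.9, 2.2]
f = [0.7651977, 0.6200860, 0.4554022, 0.2818186, 0.1103623]
p = newton_forward_eval(x, f, 1.1)
print(round(p, 7))  # ≈ 0.7196460
```

The result agrees with P4(1.1) = 0.7196460 computed by hand above.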

To approximate a value when x is close to the end of the tabulated values, say, x = 2.0, we would again like to make the earliest use of the data points closest to x. This requires using the Newton backward divided-difference formula with


s = −2/3 and the divided differences in Table 2.4 that have a wavy underline. Notice that the fourth divided difference is used in both formulas.

P4(2.0) = P4(2.2 − (2/3)(0.3))
= 0.1103623 − (2/3)(0.3)(−0.5715210) − (2/3)(1/3)(0.3)^2(0.0118183)
− (2/3)(1/3)(4/3)(0.3)^3(0.0680685) − (2/3)(1/3)(4/3)(7/3)(0.3)^4(0.0018251)
= 0.2238754.
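A sketch of the backward form (again an addition, not from the text): feeding the nodes from last to first into the standard Newton coefficient/Horner evaluation reproduces P4(2.0):

```python
def newton_eval(x, f, t):
    """Newton-form interpolation: divided-difference coefficients, then Horner."""
    n = len(x)
    c = list(f)
    for k in range(1, n):
        for i in range(n - 1, k - 1, -1):
            c[i] = (c[i] - c[i - 1]) / (x[i] - x[i - k])
    p = c[-1]
    for i in range(n - 2, -1, -1):
        p = p * (t - x[i]) + c[i]
    return p

# Backward form: list the nodes from last to first, x_n, ..., x_0.
x = [2.2, 1.9, 1.6, 1.3, 1.0]
f = [0.1103623, 0.2818186, 0.4554022, 0.6200860, 0.7651977]
p = newton_eval(x, f, 2.0)
print(round(p, 7))  # ≈ 0.2238754
```

Both node orderings define the same degree-4 polynomial, so the value matches the hand computation.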

Centered Differences

The Newton forward- and backward-difference formulas are not appropriate for approximating f (x) when x lies near the center of the table, because neither will permit the highest-order difference to have x0 close to x. A number of divided-difference formulas are available for this case, each of which has situations when it can be used to maximum advantage. These methods are known as centered-difference formulas. We will consider only one centered-difference formula, Stirling's method. For the centered-difference formulas, we choose x0 near the point being approximated and label the nodes directly below x0 as x1, x2, ... and those directly above as x−1, x−2, .... With this convention, Stirling's formula is given by

Pn(x) = P2m+1(x)
= f [x0] + (sh/2)( f [x−1, x0] + f [x0, x1]) + s^2 h^2 f [x−1, x0, x1]
+ (s(s^2 − 1)h^3/2)( f [x−2, x−1, x0, x1] + f [x−1, x0, x1, x2]) + ...
+ s^2(s^2 − 1)(s^2 − 4)...(s^2 − (m − 1)^2) h^{2m} f [x−m, ..., xm]
+ (s(s^2 − 1)...(s^2 − m^2) h^{2m+1}/2)( f [x−m−1, ..., xm] + f [x−m, ..., xm+1])

if n = 2m + 1 is odd. If n = 2m is even, we use the same formula but delete the last line. The entries used for this formula are underlined in Table 2.5.

Example 2.6. Consider the table of data given in the previous examples. Use Stirling's formula to approximate f (1.5) with x0 = 1.6.

Solution. To apply Stirling's formula we use the underlined entries in the difference Table 2.6. The formula, with h = 0.3, x0 = 1.6, and s = −1/3, becomes

f (1.5) ≈ P4(1.6 + (−1/3)(0.3))
= 0.4554022 + (−1/3)(0.3/2)((−0.5489460) + (−0.5786120))


x      f (x)       First divided    Second divided     Third divided          Fourth divided
                   differences      differences        differences            differences
x−2    f [x−2]
                   f [x−2, x−1]
x−1    f [x−1]                      f [x−2, x−1, x0]
                   f [x−1, x0]                         f [x−2, x−1, x0, x1]
x0     f [x0]                       f [x−1, x0, x1]                           f [x−2, x−1, x0, x1, x2]
                   f [x0, x1]                          f [x−1, x0, x1, x2]
x1     f [x1]                       f [x0, x1, x2]
                   f [x1, x2]
x2     f [x2]

Table 2.5:

x      f (x)       First divided   Second divided   Third divided   Fourth divided
                   differences     differences      differences     differences
1.0    0.7651977
                   −0.4837057
1.3    0.6200860                   −0.1087339
                   −0.5489460                       0.0658784
1.6    0.4554022                   −0.0494433                       0.0018251
                   −0.5786120                       0.0680685
1.9    0.2818186                    0.0118183
                   −0.5715210
2.2    0.1103623

Table 2.6:

+ (−1/3)^2(0.3)^2(−0.0494433) + (1/2)(−1/3)((−1/3)^2 − 1)(0.3)^3(0.0658784 + 0.0680685)
+ (−1/3)^2((−1/3)^2 − 1)(0.3)^4(0.0018251) = 0.5118200.
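As a sketch (an addition, not from the text), Stirling's formula for n = 4 (so m = 2) can be evaluated directly with the underlined entries of Table 2.6 hard-coded:

```python
# Stirling's formula for n = 4 (m = 2) around x0 = 1.6;
# the divided differences below are the underlined entries of Table 2.6.
h, s = 0.3, -1/3
f0 = 0.4554022                     # f[x0]
d1a, d1b = -0.5489460, -0.5786120  # f[x-1, x0], f[x0, x1]
d2 = -0.0494433                    # f[x-1, x0, x1]
d3a, d3b = 0.0658784, 0.0680685    # f[x-2,...,x1], f[x-1,...,x2]
d4 = 0.0018251                     # f[x-2,...,x2]

p = (f0
     + (s * h / 2) * (d1a + d1b)
     + s**2 * h**2 * d2
     + (s * (s**2 - 1) * h**3 / 2) * (d3a + d3b)
     + s**2 * (s**2 - 1) * h**4 * d4)
print(round(p, 7))  # ≈ 0.5118200
```

The result agrees with the hand computation of f(1.5) above.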

Most texts on numerical analysis written before the widespread use of computers have extensive treatments of divided-difference methods. If a more comprehensive treatment of this subject is needed, the book by Hildebrand [Hild] is a particularly good reference.

2.4 Approximation theory

2.4.1 Introduction

Approximation theory involves two types of problems. One arises when a function is given explicitly, but we wish to find a "simpler" type of function, such as a polynomial, for representation. The other problem concerns fitting functions to


xi    yi      xi    yi
1     1.3     6     8.8
2     3.5     7     10.1
3     4.2     8     12.5
4     5.0     9     13.0
5     7.0     10    15.6

Table 2.7:

given data and finding the "best" function in a certain class that can be used to represent the data. We will begin the section with this problem.

2.4.2 Discrete Least Squares Approximation

Consider the problem of estimating the values of a function at non-tabulated points, given the experimental data in Table 2.7.

The plot obtained (with the data points added) is shown in Figure 2.5.

Figure 2.5:

Interpolation requires a function that assumes the value yi at xi for each i = 1, 2, ..., 10.

Figure 2.5 shows a graph of the values in Table 2.7. From this graph, it appears that the actual relationship between x and y is linear. However, it is likely that no line precisely fits the data, because of errors in the data. In this case, it is unreasonable to require that the approximating function agree exactly with the given data. In fact, such a function would introduce oscillations that should not be present.


This polynomial is clearly a poor predictor of information between a number of the data points. A better approach for a problem of this type would be to find the "best" (in some sense) approximating line, even if it does not agree precisely with the data at the points. Let a1xi + a0 denote the ith value on the approximating line and yi the ith given y-value. Then, assuming that there are no errors in the x-values of the data, |yi − (a1xi + a0)| gives a measure for the error at the ith point when we use this approximating line. The problem of finding the equation of the best linear approximation in the absolute sense requires that values of a0 and a1 be found to minimize

E∞(a0, a1) = max_{1≤i≤10} |yi − (a1xi + a0)|.

This is commonly called a minimax problem. Another approach is to minimize the absolute deviation, which requires finding values of a0 and a1 that minimize

E1(a0, a1) = ∑_{i=1}^{10} |yi − (a1xi + a0)|.

To minimize a function of two variables, we need to set its partial derivatives to zero and simultaneously solve the resulting equations. In the case of the absolute deviation, we would need to find a0 and a1 with

0 = ∂/∂a0 ∑_{i=1}^{10} |yi − (a1xi + a0)| and 0 = ∂/∂a1 ∑_{i=1}^{10} |yi − (a1xi + a0)|.

The problem is that the absolute-value function is not differentiable at zero, and we might not be able to find solutions to this pair of equations.

2.4.3 Linear Least Squares

The least squares approach to this problem involves determining the best approximating line when the error involved is the sum of the squares of the differences between the y-values on the approximating line and the given y-values. Hence, constants a0 and a1 must be found that minimize the least squares error:

E(a0, a1) = ∑_{i=1}^{10} [yi − (a1xi + a0)]^2.

The least squares method is the most convenient procedure for determining best linear approximations, but there are also important theoretical considerations that favor it. The minimax approach generally assigns too much weight to a bit of data that is badly in error, whereas the absolute deviation method does not give sufficient weight to a point that is considerably out of line with the approximation. The least squares approach puts substantially more weight on a point that is out of line with the rest of the data, but will not permit that point to completely dominate the approximation. An additional reason for considering the least squares approach involves the study of the statistical distribution of error. The general problem of fitting the best least squares line to a collection of


data involves minimizing the total error,

E ≡ E2(a0, a1) = ∑_{i=1}^{m} [yi − (a1xi + a0)]^2,

with respect to the parameters a0 and a1. For a minimum to occur, we need both ∂E/∂a0 = 0 and ∂E/∂a1 = 0, that is,

0 = ∂E/∂a0 = 2 ∑_{i=1}^{m} (yi − a1xi − a0)(−1) and 0 = ∂E/∂a1 = 2 ∑_{i=1}^{m} (yi − a1xi − a0)(−xi).

These equations simplify to the normal equations:

a0 m + a1 ∑_{i=1}^{m} xi = ∑_{i=1}^{m} yi,

a0 ∑_{i=1}^{m} xi + a1 ∑_{i=1}^{m} xi^2 = ∑_{i=1}^{m} xi yi.

The solution to this system of equations is

a0 = ( ∑_{i=1}^{m} xi^2 ∑_{i=1}^{m} yi − ∑_{i=1}^{m} xi yi ∑_{i=1}^{m} xi ) / ( m ∑_{i=1}^{m} xi^2 − ( ∑_{i=1}^{m} xi )^2 ) (2.13)

and

a1 = ( m ∑_{i=1}^{m} xi yi − ∑_{i=1}^{m} xi ∑_{i=1}^{m} yi ) / ( m ∑_{i=1}^{m} xi^2 − ( ∑_{i=1}^{m} xi )^2 ) (2.14)

Example 2.7. Find the least squares line approximating the data in Table 2.7.

Solution. We first extend the table to include xi^2 and xi yi and sum the columns. This is shown in Table 2.8. The normal equations (2.13) and (2.14) imply that

a0 = (385(81) − 55(572.4)) / (10(385) − (55)^2) = −0.360

and

a1 = (10(572.4) − 55(81)) / (10(385) − (55)^2) = 1.538.

So P(x) = 1.538x − 0.360. The graph of this line and the data points are shown in Figure 2.6. The approximate values given by the least squares technique at the data points are in Table 2.8.
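A small sketch (an addition, not from the text) that evaluates the closed-form normal-equation solutions (2.13) and (2.14) on the data of Table 2.7:

```python
def least_squares_line(xs, ys):
    """Solve normal equations (2.13)/(2.14): a0 = intercept, a1 = slope."""
    m = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    denom = m * sxx - sx * sx
    a0 = (sxx * sy - sxy * sx) / denom
    a1 = (m * sxy - sx * sy) / denom
    return a0, a1

xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
ys = [1.3, 3.5, 4.2, 5.0, 7.0, 8.8, 10.1, 12.5, 13.0, 15.6]
a0, a1 = least_squares_line(xs, ys)
print(round(a0, 3), round(a1, 3))  # ≈ -0.36 and 1.538
```

The computed intercept and slope match the values a0 = −0.360 and a1 = 1.538 derived above.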


xi     yi      xi^2    xi yi    P(xi) = 1.538xi − 0.360
1      1.3     1       1.3      1.18
2      3.5     4       7.0      2.72
3      4.2     9       12.6     4.25
4      5.0     16      20.0     5.79
5      7.0     25      35.0     7.33
6      8.8     36      52.8     8.87
7      10.1    49      70.7     10.41
8      12.5    64      100.0    11.94
9      13.0    81      117.0    13.48
10     15.6    100     156.0    15.02

55     81.0    385     572.4    E = ∑_{i=1}^{10} (yi − P(xi))^2 ≈ 2.34

Table 2.8:

Figure 2.6:


2.4.4 Polynomial Least Squares

The general problem of approximating a set of data, {(xi, yi) | i = 1, ..., m}, with an algebraic polynomial Pn(x) = an x^n + an−1 x^{n−1} + ... + a0, of degree n < m − 1, using the least squares procedure is handled similarly. We choose the constants a0, a1, ..., an to minimize the least squares error E = E(a0, a1, ..., an), where

E = ∑_{i=1}^{m} (yi − Pn(xi))^2

= ∑_{i=1}^{m} yi^2 − 2 ∑_{i=1}^{m} Pn(xi) yi + ∑_{i=1}^{m} (Pn(xi))^2

= ∑_{i=1}^{m} yi^2 − 2 ∑_{i=1}^{m} ( ∑_{j=0}^{n} aj xi^j ) yi + ∑_{i=1}^{m} ( ∑_{j=0}^{n} aj xi^j )^2

= ∑_{i=1}^{m} yi^2 − 2 ∑_{j=0}^{n} aj ( ∑_{i=1}^{m} yi xi^j ) + ∑_{j=0}^{n} ∑_{k=0}^{n} aj ak ( ∑_{i=1}^{m} xi^{j+k} ).

As in the linear case, for E to be minimized it is necessary that ∂E/∂aj = 0, for each j = 0, 1, ..., n. Thus, for each j, we must have

0 = ∂E/∂aj = −2 ∑_{i=1}^{m} yi xi^j + 2 ∑_{k=0}^{n} ak ∑_{i=1}^{m} xi^{j+k}.

This gives n + 1 normal equations:

∑_{k=0}^{n} ak ∑_{i=1}^{m} xi^{j+k} = ∑_{i=1}^{m} yi xi^j (2.15)

for each j = 0, 1, ..., n. It is helpful to write the equations as follows:

a0 ∑_{i=1}^{m} xi^0 + a1 ∑_{i=1}^{m} xi^1 + a2 ∑_{i=1}^{m} xi^2 + ... + an ∑_{i=1}^{m} xi^n = ∑_{i=1}^{m} yi xi^0,

a0 ∑_{i=1}^{m} xi^1 + a1 ∑_{i=1}^{m} xi^2 + a2 ∑_{i=1}^{m} xi^3 + ... + an ∑_{i=1}^{m} xi^{n+1} = ∑_{i=1}^{m} yi xi^1,

...

a0 ∑_{i=1}^{m} xi^n + a1 ∑_{i=1}^{m} xi^{n+1} + a2 ∑_{i=1}^{m} xi^{n+2} + ... + an ∑_{i=1}^{m} xi^{2n} = ∑_{i=1}^{m} yi xi^n.

These normal equations have a unique solution provided that the xi are distinct.3

3Exercise


i     xi      yi
1     0       1.0000
2     0.25    1.2840
3     0.50    1.6487
4     0.75    2.1170
5     1.00    2.7183

Table 2.9:

i            1         2         3         4         5
xi           0         0.25      0.50      0.75      1.00
yi           1.0000    1.2840    1.6487    2.1170    2.7183
P(xi)        1.0051    1.2740    1.6482    2.1279    2.7129
yi − P(xi)   −0.0051   0.0100    0.0004    −0.0109   0.0054

Table 2.10:

Example 2.8. Fit the data in Table 2.9 with the discrete least squares polynomial of degree at most 2.

Solution. For this problem, n = 2, m = 5, and the three normal equations are

5a0 + 2.5a1 + 1.875a2 = 8.7680,
2.5a0 + 1.875a1 + 1.5625a2 = 5.4514,
1.875a0 + 1.5625a1 + 1.3828a2 = 4.4015.

This gives a0 = 1.005075519, a1 = 0.8646758482, and a2 = 0.8431641518.

Thus the least squares polynomial of degree 2 fitting the data in Table 2.9 is P2(x) = 1.0051 + 0.86468x + 0.84316x^2, whose graph is shown in Figure . At the given values of xi we have the approximations shown in Table 2.10.

The total error,

E = ∑_{i=1}^{5} (yi − P(xi))^2 = 2.74 × 10^{−4},

is the least that can be obtained by using a polynomial of degree at most 2.
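As a sketch (an addition, not from the text), the three normal equations stated in the solution can be solved with a small Gaussian elimination routine:

```python
def solve(A, b):
    """Gaussian elimination with partial pivoting for a small linear system."""
    n = len(b)
    A = [row[:] for row in A]
    b = b[:]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, n):
            factor = A[r][col] / A[col][col]
            for c in range(col, n):
                A[r][c] -= factor * A[col][c]
            b[r] -= factor * b[col]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (b[r] - sum(A[r][c] * x[c] for c in range(r + 1, n))) / A[r][r]
    return x

# The normal equations of Example 2.8, with the coefficients as printed.
A = [[5.0,   2.5,    1.875],
     [2.5,   1.875,  1.5625],
     [1.875, 1.5625, 1.3828]]
b = [8.7680, 5.4514, 4.4015]
a0, a1, a2 = solve(A, b)
print(round(a0, 4), round(a1, 4), round(a2, 4))  # ≈ 1.0051 0.8647 0.8432
```

The solution reproduces the coefficients a0, a1, a2 stated in the example.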

At times it is appropriate to assume that the data are exponentially related. This requires the approximating function to be of the form

y = be^{ax} (2.16)

or

y = bx^a (2.17)

for some constants a and b. The difficulty with applying the least squares procedure in a situation of this type comes from attempting to minimize


E = ∑_{i=1}^{m} (yi − be^{axi})^2, in the case of Eq. (2.16), or E = ∑_{i=1}^{m} (yi − bxi^a)^2, in the case of

Eq. (2.17). The normal equations associated with these procedures are obtained from either

0 = ∂E/∂b = 2 ∑_{i=1}^{m} (yi − be^{axi})(−e^{axi}) and 0 = ∂E/∂a = 2 ∑_{i=1}^{m} (yi − be^{axi})(−bxi e^{axi}),

in the case of Eq. (2.16); or

0 = ∂E/∂b = 2 ∑_{i=1}^{m} (yi − bxi^a)(−xi^a) and 0 = ∂E/∂a = 2 ∑_{i=1}^{m} (yi − bxi^a)(−b(ln xi)xi^a),

in the case of Eq. (2.17). No exact solution to either of these systems in a and b can generally be found. The method that is commonly used when the data are suspected to be exponentially related is to consider the logarithm of the approximating equation: ln y = ln b + ax, in the case of Eq. (2.16), and ln y = ln b + a ln x, in the case of Eq. (2.17).

In either case, a linear problem now appears, and solutions for ln b and a can be obtained by appropriately modifying the normal equations (2.13) and (2.14). However, the approximation obtained in this manner is not the least squares approximation for the original problem, and this approximation can in some cases differ significantly from the least squares approximation to the original problem.4

Illustration. Consider the collection of data in the first three columns of Table 2.11.

i     xi      yi      ln yi    xi^2      xi ln yi
1     1.00    5.10    1.629    1.0000    1.629
2     1.25    5.79    1.756    1.5625    2.195
3     1.50    6.53    1.876    2.2500    2.814
4     1.75    7.45    2.008    3.0625    3.514
5     2.00    8.46    2.135    4.0000    4.270

      7.50            9.404    11.875    14.422

Table 2.11:

If xi is graphed against ln yi, the data appear to have a linear relation, so it is reasonable to assume an approximation of the form y = be^{ax}, which implies that ln y = ln b + ax. Extending the table and summing the appropriate columns gives the remaining data in Table 2.11. Using the normal equations (2.13) and (2.14),

a = ((5)(14.422) − (7.5)(9.404)) / ((5)(11.875) − (7.5)^2) = 0.5056

and

4The application in Exercise 13 describes such a problem. This application will be reconsidered as Exercise 11 in Section 10.3, where the exact solution to the exponential least squares problem is approximated by using methods suitable for solving nonlinear systems of equations. [B-D, pp. 652-653]


ln b = ((11.875)(9.404) − (14.422)(7.5)) / ((5)(11.875) − (7.5)^2) = 1.122.

With ln b = 1.122 we have b = e^{1.122} = 3.071, and the approximation takes the form y = 3.071e^{0.5056x}. At the data points this gives the values in Table 2.12. (See Figure 2.7.)

i     xi      yi      3.071e^{0.5056xi}    |yi − 3.071e^{0.5056xi}|
1     1.00    5.10    5.09                 0.01
2     1.25    5.79    5.78                 0.01
3     1.50    6.53    6.56                 0.03
4     1.75    7.45    7.44                 0.01
5     2.00    8.46    8.44                 0.02

Table 2.12:
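The log-linear fit of the illustration can be sketched as follows (an addition, not from the text; small differences in the last digits arise because the text works with sums rounded to three decimals):

```python
from math import exp, log

# Fit y = b*e^(a*x) by fitting ln y = ln b + a*x with the
# linear least squares formulas (2.13)/(2.14).
xs = [1.00, 1.25, 1.50, 1.75, 2.00]
ys = [5.10, 5.79, 6.53, 7.45, 8.46]

lny = [log(y) for y in ys]
m = len(xs)
sx, sy = sum(xs), sum(lny)
sxx = sum(x * x for x in xs)
sxy = sum(x * l for x, l in zip(xs, lny))
denom = m * sxx - sx * sx
a = (m * sxy - sx * sy) / denom       # slope
lnb = (sxx * sy - sxy * sx) / denom   # intercept = ln b
b = exp(lnb)
print(round(a, 3), round(b, 2))  # ≈ 0.506 and 3.07
```

The computed a and b agree with the text's values 0.5056 and 3.071 to the precision of the rounded table.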

Figure 2.7:


Chapter 3

Initial-Value Problems for Ordinary Differential Equations

The motion of a swinging pendulum under certain simplifying assumptions is described by the second-order differential equation

d^2θ/dt^2 + (g/L) sin θ = 0,

where L is the length of the pendulum, g ≈ 32.17 ft/s^2 is the gravitational constant of the earth, and θ is the angle the pendulum makes with the vertical. If, in addition, we specify the position of the pendulum when the motion begins, θ(t0) = θ0, and its velocity at that point, θ′(t0) = θ′0, we have what is called an initial-value problem.

3.1 The Elementary Theory of Initial-Value Problems

Differential equations are used to model problems in science and engineering that involve the change of some variable with respect to another. Most of these problems require the solution of an initial-value problem, that is, the solution to a differential equation that satisfies a given initial condition.

In common real-life situations, the differential equation that models the problem is too complicated to solve exactly, and one of two approaches is taken to approximate the solution. The first approach is to modify the problem by simplifying the differential equation to one that can be solved exactly and then use the solution of the simplified equation to approximate the solution to the original problem. The other approach, which we will examine in this chapter, uses methods for approximating the solution of the original problem. This is the approach that is most commonly taken because the approximation methods give more accurate results and realistic error information.

The methods that we consider in this chapter do not produce a continuous approximation to the solution of the initial-value problem. Rather, approximations are found at certain specified, and often equally spaced, points. Some method of


interpolation, commonly Hermite, is used if intermediate values are needed.

We need some definitions and results from the theory of ordinary differential equations before considering methods for approximating the solutions to initial-value problems.

Definition 3.1. A function f (t, y) is said to satisfy a Lipschitz condition in the variable y on a set D ⊂ R^2 if a constant L > 0 exists with

| f (t, y1) − f (t, y2)| ≤ L|y1 − y2|,

whenever (t, y1) and (t, y2) are in D. The constant L is called a Lipschitz constant for f .

Example 3.1. Show that f (t, y) = t|y| satisfies a Lipschitz condition on the interval D = {(t, y) | 1 ≤ t ≤ 2 and −3 ≤ y ≤ 4}.

Solution. For each pair of points (t, y1) and (t, y2) in D we have

| f (t, y1) − f (t, y2)| = | t|y1| − t|y2| | = |t| · | |y1| − |y2| | ≤ 2|y1 − y2|.

Thus f satisfies a Lipschitz condition on D in the variable y with Lipschitz constant 2. The smallest value possible for the Lipschitz constant for this problem is L = 2, because, for example,

| f (2, 1) − f (2, 0)| = |2 − 0| = 2|1 − 0|.

Definition 3.2. A set D ⊂ R^2 is said to be convex if whenever (t1, y1) and (t2, y2) belong to D, the point ((1 − λ)t1 + λt2, (1 − λ)y1 + λy2) also belongs to D for every λ in [0, 1].

In geometric terms, Definition 3.2 states that a set is convex provided that whenever two points belong to the set, the entire straight-line segment between the points also belongs to the set. (See Figure 3.1.) The sets we consider in this chapter are generally of the form D = {(t, y) | a ≤ t ≤ b, −∞ < y < ∞} for some constants a and b. It is easy to verify that these sets are convex.

Theorem 3.1. Suppose f (t, y) is defined on a convex set D ⊂ R^2. If a constant L > 0 exists with

| ∂f/∂y (t, y) | ≤ L, for all (t, y) ∈ D, (3.1)

then f satisfies a Lipschitz condition on D in the variable y with Lipschitz constant L.

Theorem 3.2. Suppose that D = {(t, y) | a ≤ t ≤ b, −∞ < y < ∞} and that f (t, y) is continuous on D. If f satisfies a Lipschitz condition on D in the variable y, then the initial-value problem

y′(t) = f (t, y), a ≤ t ≤ b, y(a) = α,

has a unique solution y(t) for a ≤ t ≤ b.


Figure 3.1:

Example 3.2. Use Theorem 3.2 to show that there is a unique solution to the initial-value problem

y′ = 1 + t sin(ty), 0 ≤ t ≤ 2, y(0) = 0.

Solution. Holding t constant and applying the Mean Value Theorem to the function

f (t, y) = 1 + t sin(ty),

we find that when y1 < y2, a number ξ in (y1, y2) exists with

( f (t, y2) − f (t, y1)) / (y2 − y1) = ∂f/∂y (t, ξ) = t^2 cos(ξt).

Thus

| f (t, y1) − f (t, y2)| = |t^2 cos(ξt)(y1 − y2)| ≤ 4|y1 − y2|,

and f satisfies a Lipschitz condition in the variable y with Lipschitz constant L = 4. Additionally, f (t, y) is continuous when 0 ≤ t ≤ 2 and −∞ < y < ∞, so Theorem 3.2 implies that a unique solution exists to this initial-value problem.

3.2 Euler’s Method

Euler's method is the most elementary approximation technique for solving initial-value problems. Although it is seldom used in practice, the simplicity of its derivation can be used to illustrate the techniques involved in the construction of some of the more advanced techniques, without the cumbersome algebra that accompanies these constructions.

The object of Euler's method is to obtain approximations to the solution of the initial-value problem

dy/dt = f (t, y), a ≤ t ≤ b, y(a) = α. (3.2)

A continuous approximation to the solution y(t) will not be obtained; instead, approximations to y will be generated at various values, called mesh points, in


the interval [a, b]. Once the approximate solution is obtained at the mesh points, the approximate solution at other points in the interval can be found by interpolation.

We first make the stipulation that the mesh points are equally distributed throughout the interval [a, b]. This condition is ensured by choosing a positive integer N and selecting the mesh points

ti = a + ih, for each i = 0, 1, 2, ..., N.

The common distance between the points, h = (b − a)/N = ti+1 − ti, is called the step size.

We will use Taylor's Theorem to derive Euler's method1. Suppose that y(t), the unique solution to (3.2), has two continuous derivatives on [a, b], so that for each i = 0, 1, 2, ..., N − 1,

y(ti+1) = y(ti) + (ti+1 − ti) y′(ti) + ((ti+1 − ti)^2 / 2) y′′(ξi),

for some number ξi in (ti, ti+1). Because h = ti+1 − ti, we have

y(ti+1) = y(ti) + h y′(ti) + (h^2/2) y′′(ξi),

and, because y(t) satisfies the differential equation (3.2),

y(ti+1) = y(ti) + h f (ti, y(ti)) + (h^2/2) y′′(ξi). (3.3)

Euler's method constructs wi ≈ y(ti), for each i = 1, 2, ..., N, by deleting the remainder term. Thus Euler's method is

w0 = α,
wi+1 = wi + h f (ti, wi), for each i = 0, 1, ..., N − 1. (3.4)

Equation (3.4) is called the difference equation associated with Euler's method. As we will see later in this chapter, the theory and solution of difference equations parallel, in many ways, the theory and solution of differential equations. The following algorithm implements Euler's method.

Euler’s Algorithm

To approximate the solution of the initial-value problem

dy/dt = f (t, y), a ≤ t ≤ b, y(a) = α,

at (N + 1) equally spaced numbers in the interval [a, b]:

1The use of elementary difference methods to approximate the solution to differential equations was one of the numerous mathematical topics first presented to the mathematical public by the most prolific of mathematicians, Leonhard Euler (1707–1783).


• INPUT endpoints a, b; integer N; initial condition α.

• OUTPUT approximation w to y at the (N + 1) values of t.

Step 1 • Set h = (b− a)/N; t = a; w = α.

• OUTPUT (t, w).

Step 2 For i = 1, 2, ..., N do Steps 3, 4.

Step 3 • Set w = w + h f (t, w). (Compute wi.)
• Set t = a + ih. (Compute ti.)

Step 4 OUTPUT (t, w).

Step 5 STOP.
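The steps above translate directly into code. This sketch (an addition, not from the text) uses the IVP of Example 3.3 as a test case:

```python
def euler(f, a, b, alpha, N):
    """Euler's method: return mesh points t_i and approximations w_i."""
    h = (b - a) / N
    t, w = a, alpha
    ts, ws = [t], [w]
    for i in range(1, N + 1):
        w = w + h * f(t, w)   # w_i = w_{i-1} + h f(t_{i-1}, w_{i-1})
        t = a + i * h
        ts.append(t)
        ws.append(w)
    return ts, ws

# IVP of Example 3.3: y' = y - t^2 + 1, y(0) = 0.5 on [0, 2], N = 10.
ts, ws = euler(lambda t, w: w - t * t + 1, 0.0, 2.0, 0.5, 10)
print(round(ws[1], 7), round(ws[-1], 7))  # ≈ 0.8 and 4.8657845
```

The returned values match the wi column of the table in Example 3.3.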

To interpret Euler's method geometrically, note that when wi is a close approximation to y(ti), the assumption that the problem is well posed implies that

f (ti, wi) ≈ y′(ti) = f (ti, y(ti)).

The graph of the function highlighting y(ti) is shown in Figure 3.2. One step in Euler's method appears in Figure 3.3, and a series of steps appears in Figure 3.4.

Figure 3.2:

Example 3.3. Euler's method was used in the first illustration with h = 0.5 to approximate the solution to the initial-value problem

y′ = y − t^2 + 1, 0 ≤ t ≤ 2, y(0) = 0.5.

Use Euler's Algorithm with N = 10 to determine approximations, and compare these with the exact values given by y(t) = (t + 1)^2 − 0.5e^t.


Figure 3.3:

Figure 3.4:

Solution. With N = 10 we have h = 0.2, ti = 0.2i, w0 = 0.5, and

wi+1 = wi + h(wi − ti^2 + 1) = wi + 0.2[wi − 0.04i^2 + 1] = 1.2wi − 0.008i^2 + 0.2,

for i = 0, 1, ..., 9. So

w1 = 1.2(0.5) − 0.008(0)^2 + 0.2 = 0.8; w2 = 1.2(0.8) − 0.008(1)^2 + 0.2 = 1.152;

and so on. Table 3.1 shows the comparison between the approximate values at ti and the actual values.

Note that the error grows slightly as the value of t increases. This controlled error growth is a consequence of the stability of Euler's method, which implies that the error is expected to grow in no worse than a linear manner.


ti     wi          yi = y(ti)   |yi − wi|
0.0    0.5000000   0.5000000    0.0000000
0.2    0.8000000   0.8292986    0.0292986
0.4    1.1520000   1.2140877    0.0620877
0.6    1.5504000   1.6489406    0.0985406
0.8    1.9884800   2.1272295    0.1387495
1.0    2.4581760   2.6408591    0.1826831
1.2    2.9498112   3.1799415    0.2301303
1.4    3.4517734   3.7324000    0.2806266
1.6    3.9501281   4.2834838    0.3333557
1.8    4.4281538   4.8151763    0.3870225
2.0    4.8657845   5.3054720    0.4396874

Table 3.1:

3.2.1 Error Bounds for Euler’s Method

Although Euler's method is not accurate enough to warrant its use in practice, it is sufficiently elementary to analyze the error that is produced from its application. The error analysis for the more accurate methods that we consider in subsequent sections follows the same pattern but is more complicated.

To derive an error bound for Euler's method, we need two computational lemmas.

Lemma 3.1. For all x ≥ −1 and any positive m, we have

0 ≤ (1 + x)^m ≤ e^{mx}.

Lemma 3.2. If s and t are positive real numbers, {ai}_{i=0}^{k} is a sequence satisfying a0 ≥ −t/s, and

ai+1 ≤ (1 + s)ai + t, for each i = 0, 1, 2, ..., k − 1,

then

ai+1 ≤ e^{(i+1)s} (a0 + t/s) − t/s.

Theorem 3.3. Suppose f is continuous and satisfies a Lipschitz condition with constant L on

D = {(t, y) | a ≤ t ≤ b, −∞ < y < ∞},

and that a constant M exists with

|y′′(t)| ≤ M, for all t ∈ [a, b],

where y(t) denotes the unique solution to the initial-value problem

y′ = f (t, y), a ≤ t ≤ b, y(a) = α.


Let w0, w1, . . . , wN be the approximations generated by Euler's method for some positive integer N. Then, for each i = 0, 1, 2, ..., N,

|y(ti) − wi| ≤ (hM / 2L) [e^{L(ti − a)} − 1].

The weakness of Theorem 3.3 lies in the requirement that a bound be known for the second derivative of the solution. Although this condition often prohibits us from obtaining a realistic error bound, it should be noted that if ∂f/∂t and ∂f/∂y both exist, the chain rule for partial differentiation implies that

y′′(t) = dy′/dt (t) = df/dt (t, y(t)) = ∂f/∂t (t, y(t)) + ∂f/∂y (t, y(t)) · f (t, y(t)).

So it is at times possible to obtain an error bound for y′′(t) without explicitly knowing y(t).

Example 3.4. The solution to the initial-value problem

y′ = y − t^2 + 1, 0 ≤ t ≤ 2, y(0) = 0.5,

was approximated in Example 3.3 using Euler's method with h = 0.2. Use the inequality in Theorem 3.3 to find bounds for the approximation errors and compare these to the actual errors.

Solution. Because f(t, y) = y − t² + 1, we have ∂f/∂y = 1 for all y, so L = 1. For this problem, the exact solution is y(t) = (t + 1)² − 0.5e^t, so y″(t) = 2 − 0.5e^t and

|y″(t)| ≤ 0.5e² − 2, for all t ∈ [0, 2].

Using the inequality in the error bound for Euler’s method with h = 0.2, L = 1, and M = 0.5e² − 2 gives

|y_i − w_i| ≤ 0.1(0.5e² − 2)(e^(t_i) − 1).

Hence

|y(0.2) − w_1| ≤ 0.1(0.5e² − 2)(e^0.2 − 1) = 0.03752;

|y(0.4) − w_2| ≤ 0.1(0.5e² − 2)(e^0.4 − 1) = 0.08334;

and so on. Table 5.2 lists the actual errors found in Example 3.3, together with this error bound. Note that even though the true bound for the second derivative of the solution was used, the error bound is considerably larger than the actual error, especially for increasing values of t.
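The bound and the actual Euler errors can also be compared numerically. The following sketch (the function names `euler` and `y_exact` are our own; only the statements of Theorem 3.3 and this example are assumed) runs Euler’s method on this problem and verifies that every actual error stays below the bound 0.1(0.5e² − 2)(e^(t_i) − 1):

```python
import math

def euler(f, a, b, alpha, N):
    """Euler's method: w_{i+1} = w_i + h * f(t_i, w_i)."""
    h = (b - a) / N
    w = alpha
    ws = [w]
    for i in range(N):
        t = a + i * h
        w = w + h * f(t, w)
        ws.append(w)
    return ws

f = lambda t, y: y - t**2 + 1
y_exact = lambda t: (t + 1) ** 2 - 0.5 * math.exp(t)  # exact solution

N, h, L, M = 10, 0.2, 1.0, 0.5 * math.exp(2) - 2
ws = euler(f, 0.0, 2.0, 0.5, N)
for i, w in enumerate(ws):
    t = 0.2 * i
    bound = h * M / (2 * L) * (math.exp(L * t) - 1)
    # Theorem 3.3 guarantees the actual error never exceeds the bound.
    assert abs(y_exact(t) - w) <= bound + 1e-12
```

At t = 0.2 the bound is about 0.0375 while the actual error |0.8292986 − 0.8| is about 0.0293, consistent with the discussion above.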

The principal importance of the error-bound formula given in Theorem 3.3 is that the bound depends linearly on the step size h. Consequently, diminishing the step size should give correspondingly greater accuracy to the approximations.


3.3 Higher-Order Taylor Methods

Since the object of a numerical technique is to determine accurate approximations with minimal effort, we need a means for comparing the efficiency of various approximation methods. The first device we consider is called the local truncation error of the method.

The local truncation error at a specified step measures the amount by which the exact solution to the differential equation fails to satisfy the difference equation being used for the approximation at that step. This might seem like an unlikely way to compare the error of various methods. We really want to know how well the approximations generated by the methods satisfy the differential equation, not the other way around. However, we don’t know the exact solution, so we cannot generally determine this, and the local truncation error will serve quite well to determine not only the local error of a method but the actual approximation error.

Consider the initial value problem

dy/dt = f(t, y), a ≤ t ≤ b, y(a) = α.

3.4 Runge-Kutta Methods

The Taylor methods outlined in the previous section have the desirable property of high-order local truncation error, but the disadvantage of requiring the computation and evaluation of the derivatives of f(t, y). This is a complicated and time-consuming procedure for most problems, so the Taylor methods are seldom used in practice.

Runge-Kutta methods have the high-order local truncation error of the Taylor methods but eliminate the need to compute and evaluate the derivatives of f(t, y). Before presenting the ideas behind their derivation, we need to consider Taylor’s Theorem in two variables. The proof of this result can be found in any standard book on advanced calculus.

Theorem 3.4. Suppose that f(t, y) and all its partial derivatives of order less than or equal to n + 1 are continuous on D = {(t, y) | a ≤ t ≤ b, c ≤ y ≤ d}, and let (t_0, y_0) ∈ D. For every (t, y) ∈ D, there exists ξ between t and t_0 and µ between y and y_0 with

f(t, y) = P_n(t, y) + R_n(t, y),

where

P_n(t, y) = f(t_0, y_0) + [(t − t_0) (∂f/∂t)(t_0, y_0) + (y − y_0) (∂f/∂y)(t_0, y_0)]

+ [((t − t_0)²/2) (∂²f/∂t²)(t_0, y_0) + (t − t_0)(y − y_0) (∂²f/∂t∂y)(t_0, y_0) + ((y − y_0)²/2) (∂²f/∂y²)(t_0, y_0)]

+ ... + (1/n!) ∑_{j=0}^{n} C_n^j (t − t_0)^(n−j) (y − y_0)^j (∂^n f/∂t^(n−j)∂y^j)(t_0, y_0), (3.5)

and

R_n(t, y) = (1/(n + 1)!) ∑_{j=0}^{n+1} C_{n+1}^j (t − t_0)^(n+1−j) (y − y_0)^j (∂^(n+1) f/∂t^(n+1−j)∂y^j)(ξ, µ).

The function P_n(t, y) is called the nth Taylor polynomial in two variables for the function f about (t_0, y_0), and R_n(t, y) is the remainder term associated with P_n(t, y).

3.4.1 Runge-Kutta Methods of Order Two

The first step in deriving a Runge-Kutta method is to determine values for a_1, α_1, and β_1 with the property that a_1 f(t + α_1, y + β_1) approximates

T^(2)(t, y) = f(t, y) + (h/2) f′(t, y),

with error no greater than O(h²), which is the same as the order of the local truncation error for the Taylor method of order two. Since

f′(t, y) = (df/dt)(t, y) = (∂f/∂t)(t, y) + (∂f/∂y)(t, y) · y′(t) and y′(t) = f(t, y),

we have

T^(2)(t, y) = f(t, y) + (h/2)(∂f/∂t)(t, y) + (h/2)(∂f/∂y)(t, y) · f(t, y). (3.6)

Expanding f(t + α_1, y + β_1) in its Taylor polynomial of degree one about (t, y) gives

a_1 f(t + α_1, y + β_1) = a_1 f(t, y) + a_1 α_1 (∂f/∂t)(t, y) + a_1 β_1 (∂f/∂y)(t, y) + a_1 R_1(t + α_1, y + β_1), (3.7)

where

R_1(t + α_1, y + β_1) = (α_1²/2)(∂²f/∂t²)(ξ, µ) + α_1 β_1 (∂²f/∂t∂y)(ξ, µ) + (β_1²/2)(∂²f/∂y²)(ξ, µ), (3.8)

for some ξ between t and t + α_1 and µ between y and y + β_1.


Matching the coefficients of f and its derivatives in Eqs. (3.6) and (3.7) gives the three equations

f(t, y): a_1 = 1;
∂f/∂t (t, y): a_1 α_1 = h/2;
∂f/∂y (t, y): a_1 β_1 = (h/2) f(t, y).

The parameters a_1, α_1, and β_1 are therefore

a_1 = 1, α_1 = h/2, and β_1 = (h/2) f(t, y),

so

T^(2)(t, y) = f(t + h/2, y + (h/2) f(t, y)) − R_1(t + h/2, y + (h/2) f(t, y)),

and from Eq. (3.8),

R_1(t + h/2, y + (h/2) f(t, y)) = (h²/8)(∂²f/∂t²)(ξ, µ) + (h²/4) f(t, y)(∂²f/∂t∂y)(ξ, µ) + (h²/8)(f(t, y))²(∂²f/∂y²)(ξ, µ).

If all the second-order partial derivatives of f are bounded, then

R_1(t + h/2, y + (h/2) f(t, y))

is O(h²). As a consequence, the order of error for this new method is the same as that of the Taylor method of order two.

The difference-equation method resulting from replacing T^(2)(t, y) in Taylor’s method of order two by f(t + h/2, y + (h/2) f(t, y)) is a specific Runge-Kutta method known as the Midpoint method.

Midpoint Method

w_0 = α,

w_{i+1} = w_i + h f(t_i + h/2, w_i + (h/2) f(t_i, w_i)), for i = 0, 1, ..., N − 1.
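The Midpoint method translates directly into code: take an Euler half-step to the interval midpoint, then use the slope there for the full step. A minimal sketch (the function name `midpoint` is our own; the test problem is the running example y′ = y − t² + 1):

```python
def midpoint(f, a, b, alpha, N):
    """Midpoint method: w_{i+1} = w_i + h*f(t_i + h/2, w_i + (h/2)*f(t_i, w_i))."""
    h = (b - a) / N
    w = alpha
    ws = [w]
    for i in range(N):
        t = a + i * h
        # Slope at the midpoint of [t, t+h], reached by an Euler half-step.
        w = w + h * f(t + h / 2, w + (h / 2) * f(t, w))
        ws.append(w)
    return ws

f = lambda t, y: y - t**2 + 1
ws = midpoint(f, 0.0, 2.0, 0.5, 10)
```

The first step gives w_1 = 0.5 + 0.2 f(0.1, 0.65) = 0.828.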

Only three parameters are present in a_1 f(t + α_1, y + β_1), and all are needed in the match of T^(2). So a more complicated form is required to satisfy the conditions for any of the higher-order Taylor methods.

The most appropriate four-parameter form for approximating

T^(3)(t, y) = f(t, y) + (h/2) f′(t, y) + (h²/6) f″(t, y)


is

a_1 f(t, y) + a_2 f(t + α_2, y + δ_2 f(t, y)); (3.9)

and even with this, there is insufficient flexibility to match the term

(h²/6) [∂f/∂y (t, y)]² f(t, y),

resulting from the expansion of (h²/6) f″(t, y). Consequently, the best that can be obtained from using (3.9) are methods with O(h²) local truncation error.

The fact that (3.9) has four parameters, however, gives a flexibility in their choice, so a number of O(h²) methods can be derived. One of the most important is the Modified Euler method, which corresponds to choosing a_1 = a_2 = 1/2 and α_2 = δ_2 = h. It has the following difference-equation form.

Modified Euler Method

w_0 = α,

w_{i+1} = w_i + (h/2)[f(t_i, w_i) + f(t_{i+1}, w_i + h f(t_i, w_i))], for i = 0, 1, ..., N − 1.
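A matching sketch of the Modified Euler method (the function name `modified_euler` is our own, again on the running example): the method averages the slope at the left endpoint with the slope at an Euler-predicted right endpoint.

```python
def modified_euler(f, a, b, alpha, N):
    """Modified Euler: average of slopes at t_i and at an Euler-predicted t_{i+1}."""
    h = (b - a) / N
    w = alpha
    ws = [w]
    for i in range(N):
        t = a + i * h
        predictor = w + h * f(t, w)  # Euler predictor for y(t + h)
        w = w + (h / 2) * (f(t, w) + f(t + h, predictor))
        ws.append(w)
    return ws

f = lambda t, y: y - t**2 + 1
ws = modified_euler(f, 0.0, 2.0, 0.5, 10)
```

The first step gives w_1 = 0.5 + 0.1[f(0, 0.5) + f(0.2, 0.8)] = 0.826.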

Example 3.5. Use the Midpoint method and the Modified Euler method with N = 10, h = 0.2, t_i = 0.2i, and w_0 = 0.5 to approximate the solution to our usual example,

y′ = y − t² + 1, 0 ≤ t ≤ 2, y(0) = 0.5.

Solution. The difference equations produced from the various formulas are

• Midpoint method: w_{i+1} = 1.22 w_i − 0.0088 i² − 0.008 i + 0.218;

• Modified Euler method: w_{i+1} = 1.22 w_i − 0.0088 i² − 0.008 i + 0.216,

for each i = 0, 1, ..., 9. The first two steps of these methods give

• Midpoint method: w_1 = 1.22(0.5) − 0.0088(0)² − 0.008(0) + 0.218 = 0.828;

• Modified Euler method: w_1 = 1.22(0.5) − 0.0088(0)² − 0.008(0) + 0.216 = 0.826,

and

• Midpoint method: w_2 = 1.22(0.828) − 0.0088(1)² − 0.008(1) + 0.218 = 1.21136;

• Modified Euler method: w_2 = 1.22(0.826) − 0.0088(1)² − 0.008(1) + 0.216 = 1.20692,

Table 3.2 lists all the results of the calculations. For this problem, the Midpoint method is superior to the Modified Euler method.


 ti    y(ti)      Midpoint   Error      Modified    Error
                  Method                Euler
 0.0   0.5000000  0.5000000  0          0.5000000   0
 0.2   0.8292986  0.8280000  0.0012986  0.8260000   0.0032986
 0.4   1.2140877  1.2113600  0.0027277  1.2069200   0.0071677
 0.6   1.6489406  1.6446592  0.0042814  1.6372424   0.0116982
 0.8   2.1272295  2.1212842  0.0059453  2.1102357   0.0169938
 1.0   2.6408591  2.6331668  0.0076923  2.6176876   0.0231715
 1.2   3.1799415  3.1704634  0.0094781  3.1495789   0.0303627
 1.4   3.7324000  3.7211654  0.0112346  3.6936862   0.0387138
 1.6   4.2834838  4.2706218  0.0128620  4.2350972   0.0483866
 1.8   4.8151763  4.8009586  0.0142177  4.7556185   0.0595577
 2.0   5.3054720  5.2903695  0.0151025  5.2330546   0.0724173

Table 3.2:

Higher-Order Runge-Kutta Methods

The term T^(3)(t, y) can be approximated with error O(h³) by an expression of the form

f(t + α_1, y + δ_1 f(t + α_2, y + δ_2 f(t, y))),

involving four parameters; the algebra involved in determining α_1, δ_1, α_2, and δ_2 is quite involved. The most common O(h³) method is Heun’s method², given by

w_0 = α,

w_{i+1} = w_i + (h/4)[f(t_i, w_i) + 3 f(t_i + 2h/3, w_i + (2h/3) f(t_i + h/3, w_i + (h/3) f(t_i, w_i)))],

for i = 0, 1, ..., N − 1.
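The nesting in Heun’s formula unwinds into three successive slope evaluations at t, t + h/3, and t + 2h/3. A sketch under the same assumptions as the earlier blocks (the function name `heun3` is our own):

```python
def heun3(f, a, b, alpha, N):
    """Heun's third-order method: slopes at t, t + h/3, and t + 2h/3."""
    h = (b - a) / N
    w = alpha
    ws = [w]
    for i in range(N):
        t = a + i * h
        s1 = f(t, w)                                   # innermost evaluation
        s2 = f(t + h / 3, w + (h / 3) * s1)            # middle evaluation
        s3 = f(t + 2 * h / 3, w + (2 * h / 3) * s2)    # outermost evaluation
        w = w + (h / 4) * (s1 + 3 * s3)
        ws.append(w)
    return ws

f = lambda t, y: y - t**2 + 1
ws = heun3(f, 0.0, 2.0, 0.5, 10)
```

The first step reproduces w_1 ≈ 0.8292444 from the table in Example 3.6.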

Example 3.6. Applying Heun’s method with N = 10, h = 0.2, t_i = 0.2i, and w_0 = 0.5 to approximate the solution to our usual example,

y′ = y − t² + 1, 0 ≤ t ≤ 2, y(0) = 0.5,

gives the values in Table 3.3. Note the decreased error throughout the range over the Midpoint and Modified Euler approximations.

Runge-Kutta methods of order three are not generally used. The most common Runge-Kutta method in use is of order four and, in difference-equation form, is given by the following.

²Karl Heun (1859–1929) was a professor at the Technical University of Karlsruhe. He introduced this technique in a paper published in 1900. [Heu].


 ti    y(ti)      Heun's     Error
                  Method
 0.0   0.5000000  0.5000000  0
 0.2   0.8292986  0.8292444  0.0000542
 0.4   1.2140877  1.2139750  0.0001127
 0.6   1.6489406  1.6487659  0.0001747
 0.8   2.1272295  2.1269905  0.0002390
 1.0   2.6408591  2.6405555  0.0003035
 1.2   3.1799415  3.1795763  0.0003653
 1.4   3.7324000  3.7319803  0.0004197
 1.6   4.2834838  4.2830230  0.0004608
 1.8   4.8151763  4.8146966  0.0004797
 2.0   5.3054720  5.3050072  0.0004648

Table 3.3:

Runge-Kutta Order Four

w_0 = α,

k_1 = h f(t_i, w_i),

k_2 = h f(t_i + h/2, w_i + k_1/2),

k_3 = h f(t_i + h/2, w_i + k_2/2),

k_4 = h f(t_{i+1}, w_i + k_3),

w_{i+1} = w_i + (1/6)(k_1 + 2k_2 + 2k_3 + k_4),

for i = 0, 1, ..., N − 1. This method has local truncation error O(h⁴), provided the solution y(t) has five continuous derivatives. The notation k_1, k_2, k_3, and k_4 is introduced into the method to eliminate the need for successive nesting in the second variable of f(t, y).

Runge-Kutta (Order Four) Algorithm

To approximate the solution of the initial-value problem

dy/dt = f(t, y), a ≤ t ≤ b, y(a) = α,

at (N + 1) equally spaced numbers in the interval [a, b]:

• INPUT endpoints a, b; integer N; initial condition α.


• OUTPUT approximation w to y at the (N + 1) values of t.

Step 1 • Set h = (b − a)/N; t = a; w = α.

• OUTPUT (t, w).

Step 2 For i = 1, 2, ..., N do Steps 3–5.

Step 3 Set

• k_1 = h f(t, w);

• k_2 = h f(t + h/2, w + k_1/2);

• k_3 = h f(t + h/2, w + k_2/2);

• k_4 = h f(t + h, w + k_3).

Step 4 Set

• w = w + (1/6)(k_1 + 2k_2 + 2k_3 + k_4); (Compute w_i.)

• t = a + ih. (Compute t_i.)

Step 5 OUTPUT (t, w).

Step 6 STOP.
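The algorithm above translates directly into code. A minimal sketch (the function name `rk4` is our own; the driver runs the usual example problem):

```python
def rk4(f, a, b, alpha, N):
    """Classical fourth-order Runge-Kutta, following Steps 1-6 above."""
    h = (b - a) / N          # Step 1
    t, w = a, alpha
    ws = [w]
    for i in range(1, N + 1):  # Step 2
        k1 = h * f(t, w)                       # Step 3
        k2 = h * f(t + h / 2, w + k1 / 2)
        k3 = h * f(t + h / 2, w + k2 / 2)
        k4 = h * f(t + h, w + k3)
        w = w + (k1 + 2 * k2 + 2 * k3 + k4) / 6  # Step 4
        t = a + i * h
        ws.append(w)                           # Step 5
    return ws

f = lambda t, y: y - t**2 + 1
ws = rk4(f, 0.0, 2.0, 0.5, 10)
```

The first step reproduces w_1 = 0.8292933 from Example 3.7.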

Example 3.7. Use the Runge-Kutta method of order four with h = 0.2, N = 10, and t_i = 0.2i to obtain approximations to the solution of the initial-value problem

y′ = y − t² + 1, 0 ≤ t ≤ 2, y(0) = 0.5.

Solution. The approximation to y(0.2) is obtained by

w_0 = 0.5;
k_1 = 0.2 f(0, 0.5) = 0.2(1.5) = 0.3;
k_2 = 0.2 f(0.1, 0.65) = 0.328;
k_3 = 0.2 f(0.1, 0.664) = 0.3308;
k_4 = 0.2 f(0.2, 0.8308) = 0.35816;

w_1 = 0.5 + (1/6)(0.3 + 2(0.328) + 2(0.3308) + 0.35816) = 0.8292933.

The remaining results and their errors are listed in Table 3.4.


                   Runge-Kutta
 ti    Exact       Order Four  Error
       yi = y(ti)  wi          |yi − wi|
 0.0   0.5000000   0.5000000   0
 0.2   0.8292986   0.8292933   0.0000053
 0.4   1.2140877   1.2140762   0.0000114
 0.6   1.6489406   1.6489220   0.0000186
 0.8   2.1272295   2.1272027   0.0000269
 1.0   2.6408591   2.6408227   0.0000364
 1.2   3.1799415   3.1798942   0.0000474
 1.4   3.7324000   3.7323401   0.0000599
 1.6   4.2834838   4.2834095   0.0000743
 1.8   4.8151763   4.8150857   0.0000906
 2.0   5.3054720   5.3053630   0.0001089

Table 3.4: