
Root Finding for Nonlinear Equations

S. Natesan
Department of Mathematics
Indian Institute of Technology Guwahati, Guwahati - 781 039, India
email: [email protected]

Contents

1 Introduction
2 The Bisection Method
3 Newton's Method
4 The Secant Method
5 Müller's Method
6 Fixed-Point Iteration Methods
7 Fixed-Point Iteration (Conte & De Boor)
8 Convergence Acceleration for Fixed-Point Iteration
9 Numerical Evaluation of Multiple Roots
  9.1 Newton's Method and Multiple Roots
10 Roots of Polynomials
11 Systems of Nonlinear Equations
  11.1 Fixed-Point Theory
  11.2 Newton's Method for Nonlinear Systems

    1 Introduction

Finding one or more roots of an equation

$$ f(x) = 0 \tag{1.1} $$

is one of the more commonly occurring problems of applied mathematics. In most cases explicit solutions are not available, and we must be satisfied with being able to find a root to any specified degree of accuracy. The numerical methods for finding the roots are called iterative methods.

A second major problem is that of finding one or more roots of a polynomial equation

$$ p(x) \equiv a_0 + a_1 x + a_2 x^2 + \cdots + a_n x^n = 0, \qquad a_n \neq 0. \tag{1.2} $$

The methods of the first problem are often specialized to deal with (1.2).

The third class of problems is the solution of nonlinear systems of equations. These systems are very diverse in form, and the associated numerical analysis is both extensive and sophisticated.


Figure 1: Iterative solution of a − (1/x) = 0.

We begin with iterative methods for solving (1.1) when f(x) is any continuously differentiable real-valued function of a real variable x. The iterative methods for this quite general class of equations will require knowledge of one or more initial guesses x_0 for the desired root α of f(x). An initial guess x_0 can usually be found by using the context in which the problem first arose; otherwise, a simple graph of y = f(x) will often suffice for estimating x_0.

Consider the following problem:

$$ f(x) \equiv a - \frac{1}{x} = 0, \qquad a > 0. \tag{1.3} $$

The root is α = 1/a; let x_0 be an approximate solution of the equation. At the point (x_0, f(x_0)), draw the tangent line to the graph of y = f(x); see Figure 1. Let x_1 be the point at which the tangent line intersects the x-axis. It should be an improved approximation of the root α.

To obtain an equation for x_1, match the slope obtained from the tangent line with the derivative of f(x) at x_0:

$$ f'(x_0) = \frac{f(x_0) - 0}{x_0 - x_1}. $$

Substituting from (1.3) and manipulating, we obtain

$$ x_1 = x_0(2 - a x_0). $$

The general iteration formula is then obtained by repeating the process, with x_1 replacing x_0, and continuing in this way we get

$$ x_{n+1} = x_n(2 - a x_n), \qquad n \geq 0. \tag{1.4} $$

A form more convenient for theoretical purposes is obtained by introducing the scaled residual

$$ r_n = 1 - a x_n. \tag{1.5} $$


Using it,

$$ x_{n+1} = x_n(1 + r_n), \qquad n \geq 0. \tag{1.6} $$

For the error,

$$ e_n = \frac{1}{a} - x_n = \frac{r_n}{a}. \tag{1.7} $$

    We will analyze the convergence of this method, its speed, and its dependence on x0. First,

$$ r_{n+1} = 1 - a x_{n+1} = 1 - a x_n(1 + r_n) = 1 - (1 - r_n)(1 + r_n), $$

i.e.,

$$ r_{n+1} = r_n^2. \tag{1.8} $$

Inductively,

$$ r_n = r_0^{2^n}, \qquad n \geq 0. \tag{1.9} $$

From (1.7), the error e_n converges to zero as n → ∞ if and only if r_n converges to zero. From (1.9), r_n converges to zero if and only if |r_0| < 1, or equivalently,

$$ -1 < 1 - a x_0 < 1, \qquad \text{i.e.,} \qquad 0 < x_0 < \frac{2}{a}. $$

Thus the iteration (1.4) converges to 1/a if and only if the initial guess satisfies 0 < x_0 < 2/a.

To measure the speed of convergence of a general iterative method, we say that a sequence {x_n} converging to α has order of convergence p ≥ 1 if

$$ |\alpha - x_{n+1}| \leq c\,|\alpha - x_n|^p, \qquad n \geq 0, $$

for some constant c > 0. If p = 1, the sequence is said to converge linearly to α. In that case, we require c < 1; the constant c is called the rate of linear convergence of x_n to α.
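To make this concrete, here is a minimal Python sketch of the iteration (1.4); the function name reciprocal and the default iteration count are our own choices, not part of the text.

    def reciprocal(a, x0, n_iter=10):
        """Approximate 1/a by the division-free iteration x_{n+1} = x_n (2 - a x_n)."""
        x = x0
        for _ in range(n_iter):
            x = x * (2.0 - a * x)
        return x

    # Convergence requires 0 < x0 < 2/a; for a = 3 take x0 = 0.5:
    print(reciprocal(3.0, 0.5))  # converges rapidly to 1/3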


    2 The Bisection Method

Assume that f(x) is continuous on an interval [a, b] and that it also satisfies

$$ f(a)\,f(b) < 0. \tag{2.1} $$

By the intermediate-value theorem, the function f(x) must have at least one root in [a, b]. Usually [a, b] is chosen to contain only one root α, but the following algorithm for the bisection method will always converge to some root α in [a, b], because of (2.1).

Algorithm. Bisect(f, a, b, root, ε)

1. Define c := (a + b)/2.
2. If b − c ≤ ε, then accept root := c, and exit.
3. If sign(f(b)) · sign(f(c)) ≤ 0, then a := c; otherwise b := c.
4. Return to Step 1.
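A direct Python transcription of Bisect might look as follows; it is a sketch under the stated assumption f(a)f(b) < 0, and the use of math.copysign for the sign test in Step 3 is our own choice.

    import math

    def bisect(f, a, b, eps):
        """Bisection method: f continuous on [a, b] with f(a) * f(b) < 0."""
        while True:
            c = (a + b) / 2.0                                  # Step 1
            if b - c <= eps:                                   # Step 2
                return c
            if math.copysign(1.0, f(b)) * math.copysign(1.0, f(c)) <= 0:
                a = c                                          # root lies in [c, b]
            else:
                b = c                                          # root lies in [a, c]

    # Example 2.1 below: largest real root of x^6 - x - 1 = 0, starting from [1, 2]
    print(bisect(lambda x: x**6 - x - 1, 1.0, 2.0, 5e-5))      # about 1.13474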

The interval [a, b] is halved in size on every pass through the algorithm. Because of Step 3, [a, b] will always contain a root of f(x). Since a root α lies in [a, b], it must lie within either [a, c] or [c, b]; consequently,

$$ |c - \alpha| \leq b - c = c - a. $$

This is the justification for the test in Step 2. On completion of the algorithm, c will be an approximation to the root with

$$ |c - \alpha| \leq \varepsilon. $$

Example 2.1 Find the largest real root α of

$$ f(x) \equiv x^6 - x - 1 = 0. $$

(Here the exact value is α = 1.13472413840152.)

It is straightforward to show that 1 < α < 2, and we will use this as our initial interval [a, b]. Here we take ε = 5 × 10^{-5}. The answer is c_15 = 1.13474, which is an approximation to α with |α − c_15| ≤ 4 × 10^{-5} (in fact, α − c_15 ≈ −0.000016).

To examine the speed of convergence, let c_n denote the nth value of c in the algorithm. Then it is easy to see that α = lim_{n→∞} c_n and

$$ |\alpha - c_n| \leq \left(\frac{1}{2}\right)^n (b - a), \tag{2.2} $$

where b − a denotes the length of the original interval input to Bisect. Using the variant (2.2) for defining linear convergence, we say that the bisection method converges linearly with a rate of 1/2. The actual error may not decrease by a factor of 1/2 at each step, but the average rate of decrease is 1/2, based on (2.2).

There are several deficiencies in the algorithm Bisect. First, it does not take account of the limits of machine precision.


A practical program would take account of the unit round of the machine, adjusting the given ε if necessary. Second, the method converges very slowly when compared with other methods.

The major advantages of the bisection method are: (i) it is guaranteed to converge, and (ii) a reasonable error bound is available. Methods that at every step give upper and lower bounds on the root α are called enclosure methods.

3 Newton's Method

Assume that an initial estimate x_0 is known for the desired root α of f(x) = 0. Newton's method will produce a sequence of iterates {x_n : n ≥ 1}, which we hope will converge to α. Since x_0 is assumed close to α, approximate the graph of y = f(x) in the vicinity of its root α by constructing its tangent line at (x_0, f(x_0)). Then use the root of this tangent line to approximate α; call this new approximation x_1. Repeat this process, ad infinitum, to obtain a sequence of iterates x_n. As with the example (1.3), this leads to the iteration formula

$$ x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)}, \qquad n \geq 0. \tag{3.1} $$

The process is illustrated in Figure 2 for the iterates x_1 and x_2.

Figure 2: Newton's method.

Newton's method is the best-known procedure for finding the roots of an equation. It has been generalized in many ways for the solution of other, more difficult nonlinear problems, for instance, systems of nonlinear equations.
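A minimal Python sketch of the iteration (3.1); the stopping test on |x_{n+1} − x_n| and the iteration cap are practical additions of our own.

    def newton(f, df, x0, eps=1e-12, itmax=50):
        """Newton's method: x_{n+1} = x_n - f(x_n) / f'(x_n)."""
        x = x0
        for _ in range(itmax):
            x_new = x - f(x) / df(x)
            if abs(x_new - x) <= eps:
                return x_new
            x = x_new
        raise RuntimeError("Newton's method did not converge")

    # Example 2.1 again: f(x) = x^6 - x - 1, f'(x) = 6x^5 - 1, starting at x0 = 1.5
    print(newton(lambda x: x**6 - x - 1, lambda x: 6 * x**5 - 1, 1.5))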


    4 The Secant Method

As with Newton's method, the graph of y = f(x) is approximated by a straight line in the vicinity of the root α. In this case, assume that x_0 and x_1 are two initial estimates of the root α. Approximate the graph of y = f(x) by the secant line determined by (x_0, f(x_0)) and (x_1, f(x_1)). Let its root be denoted by x_2; we hope it will be an improved approximation of α. This is illustrated in Figure 3.

Figure 3: Secant method.

Using the slope formula with the secant line, we have

$$ \frac{f(x_1) - f(x_0)}{x_1 - x_0} = \frac{f(x_1) - 0}{x_1 - x_2}. $$

Solving for x_2,

$$ x_2 = x_1 - f(x_1)\,\frac{x_1 - x_0}{f(x_1) - f(x_0)}. $$

Using x_1 and x_2, repeat this process to obtain x_3, etc. The general formula based on this is

$$ x_{n+1} = x_n - f(x_n)\,\frac{x_n - x_{n-1}}{f(x_n) - f(x_{n-1})}, \qquad n \geq 1. \tag{4.1} $$

This is the secant method. As with Newton's method, it is not guaranteed to converge; but when it does converge, the speed is usually greater than that of the bisection method.
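A Python sketch of (4.1); keeping the previous function value means each step costs only one new evaluation of f. The stopping rule is our own.

    def secant(f, x0, x1, eps=1e-12, itmax=50):
        """Secant method (4.1); retains f(x_{n-1}) so each step needs one f-evaluation."""
        f0, f1 = f(x0), f(x1)
        for _ in range(itmax):
            x2 = x1 - f1 * (x1 - x0) / (f1 - f0)
            if abs(x2 - x1) <= eps:
                return x2
            x0, f0 = x1, f1
            x1, f1 = x2, f(x2)
        raise RuntimeError("secant method did not converge")

    print(secant(lambda x: x**6 - x - 1, 1.0, 2.0))  # about 1.1347241384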

Error analysis. Multiply both sides of (4.1) by −1 and then add α to both sides, obtaining

$$ \alpha - x_{n+1} = \alpha - x_n + f(x_n)\,\frac{x_n - x_{n-1}}{f(x_n) - f(x_{n-1})}. $$

The right-hand side can be manipulated algebraically to obtain the formula

$$ \alpha - x_{n+1} = -(\alpha - x_{n-1})(\alpha - x_n)\,\frac{f[x_{n-1}, x_n, \alpha]}{f[x_{n-1}, x_n]}. \tag{4.2} $$


The quantities f[x_{n-1}, x_n] and f[x_{n-1}, x_n, α] are first- and second-order Newton divided differences, defined by

$$ f[x_{n-1}, x_n] = \frac{f(x_n) - f(x_{n-1})}{x_n - x_{n-1}}, \qquad f[x_{n-1}, x_n, \alpha] = \frac{f[x_n, \alpha] - f[x_{n-1}, x_n]}{\alpha - x_{n-1}}. $$

Using the formulas

$$ f[x_0, x_1] = f'(\xi), \qquad f[x_0, x_1, x_2] = \tfrac{1}{2} f''(\zeta), $$

where ξ is between x_0 and x_1, and ζ is between the minimum and maximum of x_0, x_1, and x_2, (4.2) becomes

$$ \alpha - x_{n+1} = -(\alpha - x_{n-1})(\alpha - x_n)\,\frac{f''(\zeta_n)}{2 f'(\xi_n)}, \tag{4.3} $$

where ξ_n is between x_{n-1} and x_n, and ζ_n is between the minimum and maximum of x_{n-1}, x_n, and α. Using this error formula, we can examine the convergence of the secant method.

Theorem 4.1 Assume that f(x), f'(x), and f''(x) are continuous for all values of x in some interval containing α, and assume f'(α) ≠ 0. Then, if the initial guesses x_0 and x_1 are chosen sufficiently close to α, the iterates x_n of (4.1) will converge to α. The order of convergence will be p = (1 + √5)/2 ≈ 1.62.

Comparison between Newton's and Secant Methods: Newton's method and the secant method are closely related. If the approximation

$$ f'(x_n) \approx \frac{f(x_n) - f(x_{n-1})}{x_n - x_{n-1}} $$

is used in Newton's formula (3.1), we obtain the secant formula (4.1). The conditions for convergence are almost the same, and the error formulae are similar. Nonetheless, there are two major differences:

1. Newton's method requires two function evaluations per iterate, those of f(x_n) and f'(x_n), whereas the secant method requires only one function evaluation per iterate, that of f(x_n) [provided the needed function value f(x_{n-1}) is retained from the last iteration]. Therefore, Newton's method is generally more expensive per iteration.

2. Newton's method converges more rapidly [order p = 2 vs. the secant method's p ≈ 1.62], and consequently it will require fewer iterations to attain a given desired accuracy.

5 Müller's Method

Müller's method is useful for obtaining both real and complex roots of a function, and it is reasonably straightforward to implement on a computer.

Müller's method is a generalization of the approach that led to the secant method. Given three points x_0, x_1, x_2, a quadratic polynomial is constructed that passes through the three points (x_i, f(x_i)), i = 0, 1, 2; one of the roots of this polynomial is used as an improved estimate for a root α of f(x).


The quadratic polynomial is given by

$$ p(x) = f(x_2) + (x - x_2)\,f[x_2, x_1] + (x - x_2)(x - x_1)\,f[x_2, x_1, x_0]. \tag{5.1} $$

To check that p(x_i) = f(x_i), i = 0, 1, 2, just substitute x_i into (5.1) and then reduce the resulting expression using the divided differences. Other formulas for p(x) are available, but the above form is the most convenient for defining Müller's method. The formula (5.1) is called Newton's divided difference form of the interpolation polynomial.

To find the zeros of (5.1), we first rewrite it in the more convenient form

$$ y = f(x_2) + w(x - x_2) + f[x_2, x_1, x_0](x - x_2)^2, $$
$$ w = f[x_2, x_1] + (x_2 - x_1)\,f[x_2, x_1, x_0] = f[x_2, x_1] + f[x_2, x_0] - f[x_0, x_1]. $$

We want to find the smallest value of x − x_2 that satisfies the equation y = 0, thus finding the root of (5.1) that is closest to x_2. The solution is

$$ x - x_2 = \frac{-w \pm \sqrt{w^2 - 4 f(x_2)\,f[x_2, x_1, x_0]}}{2 f[x_2, x_1, x_0]}, $$

with the sign chosen to make the numerator as small as possible. Because of the loss-of-significance errors implicit in this formula, we rationalize the numerator to obtain the new iteration formula

$$ x_3 = x_2 - \frac{2 f(x_2)}{w \pm \sqrt{w^2 - 4 f(x_2)\,f[x_2, x_1, x_0]}}, \tag{5.2} $$

with the sign now chosen to maximize the magnitude of the denominator.

Repeat (5.2) recursively to define a sequence of iterates {x_n : n ≥ 0}. If they converge to a point α, and if f'(α) ≠ 0, then α is a root of f(x). To see this, note that w → f'(α) and f[x_2, x_1, x_0] → f''(α)/2 as n → ∞, so that taking limits in (5.2) gives

$$ \alpha = \alpha - \frac{2 f(\alpha)}{f'(\alpha) \pm \sqrt{[f'(\alpha)]^2 - 2 f(\alpha)\,f''(\alpha)}}, $$

showing that the fraction on the right must be zero. Since f'(α) ≠ 0 by assumption, the method of choosing the sign in the denominator implies that the denominator is nonzero. Then the numerator must be zero, showing f(α) = 0. The assumption f'(α) ≠ 0 says that α is a simple root.

By an argument similar to that used for the secant method, it can be shown that

$$ \lim_{n\to\infty} \frac{|\alpha - x_{n+1}|}{|\alpha - x_n|^p} = \left|\frac{f^{(3)}(\alpha)}{6 f'(\alpha)}\right|^{(p-1)/2}, \qquad p \doteq 1.84, \tag{5.3} $$

provided f(x) ∈ C³(I), where I is a neighborhood of α, and f'(α) ≠ 0. The order p is the positive root of

$$ x^3 - x^2 - x - 1 = 0. $$

With the secant method, real choices of x_0 and x_1 lead to a real value of x_2. But with Müller's method, real choices of x_0, x_1, x_2 can and do lead to complex roots of f(x). This is an important aspect of Müller's method.
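A Python sketch of the iteration (5.2), done in complex arithmetic so that real starting points can lead to complex roots; the stopping rule and the tie-breaking in the sign choice are our own.

    import cmath

    def muller(f, x0, x1, x2, eps=1e-12, itmax=50):
        """Muller's method (5.2) with the sign chosen to maximize |denominator|."""
        for _ in range(itmax):
            f0, f1, f2 = f(x0), f(x1), f(x2)
            d12 = (f2 - f1) / (x2 - x1)          # f[x2, x1]
            d01 = (f1 - f0) / (x1 - x0)          # f[x1, x0]
            d012 = (d12 - d01) / (x2 - x0)       # f[x2, x1, x0]
            w = d12 + (x2 - x1) * d012
            disc = cmath.sqrt(w * w - 4.0 * f2 * d012)
            denom = w + disc if abs(w + disc) >= abs(w - disc) else w - disc
            x3 = x2 - 2.0 * f2 / denom
            if abs(x3 - x2) <= eps:
                return x3
            x0, x1, x2 = x1, x2, x3
        raise RuntimeError("Muller's method did not converge")

    # Real starting points, complex root: x^2 + 1 = 0
    print(muller(lambda x: x * x + 1.0, 0.5, 1.0, 1.5))  # approximately 1j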


    6 Fixed-Point Iteration Methods

We now consider solving an equation x = g(x) for a root α by the iteration

$$ x_{n+1} = g(x_n), \qquad n \geq 0, \tag{6.1} $$

with an initial guess x_0 to α. Newton's method fits this pattern, with

$$ g(x) \equiv x - \frac{f(x)}{f'(x)}. \tag{6.2} $$

Each solution α of x = g(x) is called a fixed point of g. Although we are interested in solving an equation f(x) = 0, there are many ways it can be reformulated as a fixed-point problem.

Example 6.1 Consider the equation x² − a = 0, for some a > 0. This equation can be reformulated as a fixed-point problem in the following ways:

(1) x = x² + x − a, or more generally, x = x + c(x² − a) for some c ≠ 0.

(2) x = a/x.

(3) x = (1/2)(x + a/x).

We give a numerical example with a = 3, x_0 = 2, and α = √3 = 1.732051. The results are given in Table 4. It is natural to ask what makes the various iterative schemes behave the way they do in this example. We will develop a general theory to explain this behavior and to aid in analyzing new iterative methods.

Table 4. Iteration results for x^2 - 3 = 0.
-----------------------------------------------------
  n     case (1) x_n    case (2) x_n    case (3) x_n
-----------------------------------------------------
  0     2.0000e+000     2.0000e+000     2.0000e+000
  1     3.0000e+000     1.5000e+000     1.7500e+000
  2     9.0000e+000     2.0000e+000     1.7321e+000
  3     8.7000e+001     1.5000e+000     1.7321e+000
  4     7.6530e+003     2.0000e+000     1.7321e+000
-----------------------------------------------------
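The three schemes of Example 6.1 are easily reproduced; the following sketch simply tabulates the iterates, as in Table 4.

    def fixed_point(g, x0, n_iter):
        """Generate the fixed-point iterates x_{n+1} = g(x_n)."""
        xs = [x0]
        for _ in range(n_iter):
            xs.append(g(xs[-1]))
        return xs

    a = 3.0
    print(fixed_point(lambda x: x * x + x - a,   2.0, 4))  # case (1): diverges
    print(fixed_point(lambda x: a / x,           2.0, 4))  # case (2): oscillates
    print(fixed_point(lambda x: (x + a / x) / 2, 2.0, 4))  # case (3): converges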

Lemma 6.2 Let g(x) be continuous in the interval [a, b], and assume that a ≤ g(x) ≤ b for every x ∈ [a, b]. (We say that g sends [a, b] into [a, b], and denote this by g([a, b]) ⊆ [a, b].) Then x = g(x) has at least one solution in [a, b].

Proof. Consider the continuous function g(x) − x. At x = a it is nonnegative, and at x = b it is nonpositive. Thus, by the intermediate value theorem, it must have a root in the interval [a, b]. In Figure 4, the roots are the intersection points of y = x and y = g(x).


Figure 4: Plot of the function y = g(x).

Lemma 6.3 Let g(x) be continuous in the interval [a, b], and assume that g([a, b]) ⊆ [a, b]. Furthermore, assume that there is a constant 0 < λ < 1 with

$$ |g(x) - g(y)| \leq \lambda|x - y|, \qquad x, y \in [a, b]. \tag{6.3} $$

Then x = g(x) has a unique solution α in [a, b]. Also, the iterates

$$ x_n = g(x_{n-1}), \qquad n \geq 1, $$

will converge to α for any choice of x_0 in [a, b], and

$$ |\alpha - x_n| \leq \frac{\lambda^n}{1 - \lambda}\,|x_1 - x_0|. \tag{6.4} $$

Proof. Suppose x = g(x) has two solutions α and β in [a, b]. Then

$$ |\alpha - \beta| = |g(\alpha) - g(\beta)| \leq \lambda|\alpha - \beta|, $$

i.e.,

$$ (1 - \lambda)|\alpha - \beta| \leq 0. $$

Since 0 < λ < 1, this implies that α = β. Also, we know by Lemma 6.2 that there is at least one root in [a, b].

To examine the convergence of the iterates x_n, first note that they all remain in [a, b]. To see this, note that the implication

$$ x_n \in [a, b] \implies x_{n+1} = g(x_n) \in [a, b] $$

can be used with mathematical induction to prove x_n ∈ [a, b] for all n. For the convergence,

$$ |\alpha - x_{n+1}| = |g(\alpha) - g(x_n)| \leq \lambda|\alpha - x_n|, \tag{6.5} $$

and by induction,

$$ |\alpha - x_n| \leq \lambda^n|\alpha - x_0|, \qquad n \geq 0. \tag{6.6} $$


As n → ∞, λⁿ → 0; thus x_n → α. To prove the bound (6.4), begin with

$$ |\alpha - x_0| \leq |\alpha - x_1| + |x_1 - x_0| \leq \lambda|\alpha - x_0| + |x_1 - x_0|, $$

where the last step used (6.5). Then, solving for |α − x_0|, we have

$$ |\alpha - x_0| \leq \frac{1}{1 - \lambda}\,|x_1 - x_0|. \tag{6.7} $$

Combining this with (6.6) completes the proof.

The bound (6.5) shows that the sequence {x_n} is linearly convergent, with the rate of convergence bounded by λ. Also, from the proof, we can devise a possibly more accurate error bound than (6.4). Repeating the argument that led to (6.7), we obtain

$$ |\alpha - x_n| \leq \frac{1}{1 - \lambda}\,|x_{n+1} - x_n|. $$

Further, applying (6.5) yields the bound

$$ |\alpha - x_{n+1}| \leq \frac{\lambda}{1 - \lambda}\,|x_{n+1} - x_n|, \qquad n \geq 0. \tag{6.8} $$

When λ is computable, this furnishes a practical error bound in most situations.

If g(x) is differentiable in [a, b], then

$$ g(x) - g(y) = g'(\xi)(x - y), \qquad \xi \text{ between } x \text{ and } y, $$

for all x, y ∈ [a, b]. Define

$$ \lambda = \max_{x \in [a,b]} |g'(x)|. $$

Then

$$ |g(x) - g(y)| \leq \lambda|x - y|, \qquad x, y \in [a, b]. $$

Theorem 6.4 Assume that g(x) is continuously differentiable in [a, b], that g([a, b]) ⊆ [a, b], and that

$$ \lambda = \max_{x \in [a,b]} |g'(x)| < 1. \tag{6.9} $$

Then:

(i) x = g(x) has a unique solution α in [a, b].

(ii) For any choice of x_0 in [a, b], with x_{n+1} = g(x_n), n ≥ 0,

$$ \lim_{n\to\infty} x_n = \alpha. $$

(iii)

$$ |\alpha - x_n| \leq \lambda^n|\alpha - x_0| \leq \frac{\lambda^n}{1 - \lambda}\,|x_1 - x_0|, $$

and

$$ \lim_{n\to\infty} \frac{\alpha - x_{n+1}}{\alpha - x_n} = g'(\alpha). \tag{6.10} $$


Proof. Every result comes from the preceding lemmas, except for the rate of convergence (6.10). For it, use

$$ \alpha - x_{n+1} = g(\alpha) - g(x_n) = g'(\xi_n)(\alpha - x_n), \qquad n \geq 0, \tag{6.11} $$

with ξ_n an unknown point between α and x_n. Since x_n → α, we must have ξ_n → α, and thus

$$ \lim_{n\to\infty} \frac{\alpha - x_{n+1}}{\alpha - x_n} = \lim_{n\to\infty} g'(\xi_n) = g'(\alpha). $$

If g'(α) ≠ 0, then the sequence {x_n} converges to α with order exactly p = 1, i.e., linear convergence.

To see the importance of the assumption (6.9) on the size of g'(x), suppose that |g'(α)| > 1. Then, if we had a sequence of iterates x_{n+1} = g(x_n) and a root α = g(α), we would still have (6.11). If x_n became sufficiently close to α, then |g'(ξ_n)| > 1 and the error |α − x_{n+1}| would be greater than |α − x_n|. Thus, convergence is not possible if |g'(α)| > 1. We graphically portray the computation of the iterates in four cases; see Figures 5 and 6.

Figure 5: Convergent sequences: 0 < g'(α) < 1 and −1 < g'(α) < 0.

Figure 6: Nonconvergent sequences: g'(α) > 1 and g'(α) < −1.

Theorem 6.5 Assume α is a solution of x = g(x), and suppose that g(x) is continuously differentiable in some neighboring interval about α with |g'(α)| < 1. Then the results of Theorem 6.4 are still true, provided x_0 is chosen sufficiently close to α.

Proof. Pick a number λ satisfying |g'(α)| < λ < 1. Then pick an interval I = [α − ε, α + ε] with

$$ \max_{x \in I} |g'(x)| \leq \lambda < 1. $$


We have g(I) ⊆ I, since |α − x| ≤ ε implies

$$ |\alpha - g(x)| = |g(\alpha) - g(x)| = |g'(\xi)|\,|\alpha - x| \leq \lambda|\alpha - x| \leq \varepsilon. $$

Now apply the preceding theorem with [a, b] = [α − ε, α + ε].

We can now verify the condition for Example 6.1 (with a = 3). Calculate g'(α):

(i) g(x) = x² + x − 3: g'(α) = g'(√3) = 2√3 + 1 > 1.

(ii) g(x) = 3/x: g'(√3) = −3/(√3)² = −1.

(iii) g(x) = (1/2)(x + 3/x): g'(x) = (1/2)(1 − 3/x²), so g'(√3) = 0.

    7 Fixed-Point Iteration (Conte & De Boor)

We know that the fixed-point iteration method is a possible method for obtaining a root of the equation

$$ f(x) = 0. \tag{7.1} $$

    In this method, one derives from (7.1) an equation of the form

$$ x = g(x) \tag{7.2} $$

so that any solution of (7.2), i.e., any fixed point of g(x), is a solution of (7.1). For instance, if

$$ f(x) = x^2 - x - 2, \tag{7.3} $$

    then among possible choices for g(x) are the following:

(1) g(x) = x² − 2.

(2) g(x) = √(2 + x).

(3) g(x) = 1 + 2/x.

(4) g(x) = x − (x² − x − 2)/m, for some nonzero constant m.

Each such g(x) is called an iteration function for solving (7.1) [with f(x) given by (7.3)]. Once an iteration function is chosen, one carries out the following algorithm.

Algorithm: Fixed-point iteration. Given an iteration function g(x) and a starting point x_0:

    For n = 0, 1, 2, ..., until satisfied, do:
        Calculate x_{n+1} = g(x_n)

    For this algorithm to be useful, we must prove that:

(i) For the given initial guess x_0, we can calculate successively x_1, x_2, ... .


(ii) The sequence {x_n} converges to some point α.

(iii) The limit α is a fixed point of g(x), i.e., α = g(α).

The real-valued function

$$ g(x) = -\sqrt{x} $$

shows that (i) is not a trivial requirement, for in this case g(x) is defined only for x ≥ 0. Starting with any x_0 > 0, we get x_1 = g(x_0) < 0; hence we cannot calculate x_2. Therefore, we need the following assumption.

Assumption 1. There is an interval I = [a, b] such that, for all x ∈ I, g(x) is defined and g(x) ∈ I; i.e., the function g(x) maps I into itself.

It follows from this assumption, by induction on n, that if x_0 ∈ I, then for all n, x_n ∈ I; hence x_{n+1} = g(x_n) is defined and is in I.

To satisfy (iii), we need the continuity of g(x): if the sequence {x_n} → α as n → ∞, then

$$ \alpha = \lim_{n\to\infty} x_{n+1} = \lim_{n\to\infty} g(x_n) = g\!\left(\lim_{n\to\infty} x_n\right) = g(\alpha). $$

    Assumption 2. The iteration function g(x) is continuous on I = [a, b].

Lemma 7.1 Let Assumptions 1 and 2 hold true. Then the fixed-point problem (7.2) has a fixed point α in I = [a, b].

Proof. If either g(a) = a or g(b) = b, then the claim is true. Otherwise, we have g(a) ≠ a and g(b) ≠ b. But, by Assumption 1, both g(a) and g(b) are in I; hence g(a) > a and g(b) < b. This implies that the function h(x) = g(x) − x satisfies h(a) > 0 and h(b) < 0. By Assumption 2, h(x) is continuous on I; hence, by the intermediate-value theorem for continuous functions, h(x) must vanish in I. Thus, g(x) has a fixed point in I.

For the discussion of (ii) concerning convergence, it is instructive to carry out the iteration graphically. This can be done as follows. Since x_n = g(x_{n-1}), the point (x_{n-1}, x_n) lies on the graph of g(x). To locate (x_n, x_{n+1}) from (x_{n-1}, x_n), draw the straight line through (x_{n-1}, x_n) parallel to the x-axis. This line intersects the line y = x at the point (x_n, x_n). Through this point, draw the straight line parallel to the y-axis. This line intersects the graph y = g(x) at the point (x_n, g(x_n)). But since g(x_n) = x_{n+1}, this is the desired point (x_n, x_{n+1}). In Figures 7 and 8, we have carried out the first few steps of fixed-point iteration for four typical cases. Note that α is a fixed point of g(x) if and only if y = g(x) and y = x intersect at (α, α).

As Figures 7 and 8 show, the fixed-point iteration may well fail to converge, as it does in Figures 7(a) and 8(d). Whether or not the iteration converges [given that g(x) has a fixed point] seems to depend on the slope of g(x). If the slope of g(x) is too large in absolute value near a fixed point α of g(x), then we cannot hope for convergence to that fixed point. We therefore make the following assumption.

Assumption 3. The iteration function g(x) is differentiable on I = [a, b]. Further, there exists a nonnegative constant K < 1 such that

$$ |g'(x)| \leq K, \qquad x \in I. $$

Note that Assumption 3 implies Assumption 2, since a differentiable function is, in particular, continuous.


Figure 7: Fixed-point iterations.

Figure 8: Fixed-point iterations.

Theorem 7.2 Let g(x) be an iteration function satisfying Assumptions 1 and 3. Then g(x) has exactly one fixed point α ∈ I, and starting with any initial approximation x_0 ∈ I, the sequence {x_n} generated by the fixed-point iteration algorithm converges to α.

Proof. We have already proved the existence of a fixed point α of g(x) in I. Let the sequence {x_n} be generated by the fixed-point iteration algorithm. Denote the error in the nth iterate by

$$ e_n = \alpha - x_n, \qquad n = 0, 1, \ldots. $$

Then, since α = g(α) and x_n = g(x_{n-1}), we have

$$ e_n = \alpha - x_n = g(\alpha) - g(x_{n-1}) = g'(\xi_n)\,e_{n-1}, \tag{7.4} $$

for some ξ_n between α and x_{n-1}, by the mean-value theorem for derivatives. Hence, by Assumption 3,

$$ |e_n| \leq K|e_{n-1}|. $$


It follows by induction on n that

$$ |e_n| \leq K|e_{n-1}| \leq K^2|e_{n-2}| \leq \cdots \leq K^n|e_0|. $$

Since 0 ≤ K < 1, we have lim_{n→∞} Kⁿ = 0; therefore

$$ \lim_{n\to\infty} |e_n| = \lim_{n\to\infty} K^n|e_0| = 0, $$

regardless of the initial error e_0. But this says that {x_n} converges to α. It also proves that α is the only fixed point of g(x) in I. For if β is another fixed point of g(x) in I, then with x_0 = β we should have x_1 = g(x_0) = β; hence |e_0| = |e_1| ≤ K|e_0|. Since K < 1, this implies |e_0| = 0, or α = β. This completes the proof.

Corollary 7.3 If g(x) is continuously differentiable in some open interval containing the fixed point α, and if |g'(α)| < 1, then there exists an ε > 0 such that fixed-point iteration with g(x) converges whenever |x_0 − α| ≤ ε.

Proof. Since g'(x) is continuous near α and |g'(α)| < 1, there exists, for any K with |g'(α)| < K < 1, an ε > 0 such that |g'(x)| ≤ K for every |x − α| ≤ ε. Fix one such K with its corresponding ε. Then, for I = [α − ε, α + ε], Assumption 3 is satisfied. As to Assumption 1, let x be any point in I, so that |x − α| ≤ ε. Then, as in the proof of Theorem 7.2,

$$ g(x) - \alpha = g(x) - g(\alpha) = g'(\xi)(x - \alpha) $$

for some point ξ between x and α, hence in I. But then

$$ |g(x) - \alpha| \leq |g'(\xi)|\,|x - \alpha| \leq K\varepsilon < \varepsilon, $$

showing that g(x) is in I if x ∈ I. This verifies Assumption 1, and the conclusion now follows from Theorem 7.2.

Because of the corollary, a fixed point α of g(x) for which |g'(α)| < 1 is often called a point of attraction [for the iteration with g(x)].

We consider again the quadratic function f(x) = x² − x − 2. The zeros of this function are 2 and −1. Suppose that we wish to calculate the root α = 2 by fixed-point iteration. If we use the iteration function g(x) = x² − 2, then for x > 1/2 we have g'(x) > 1. It follows that Assumption 3 is not satisfied for any interval containing α = 2; that is, α = 2 is not a point of attraction. In fact, one can prove that, starting at any point x_0, the sequence {x_n} generated by this fixed-point iteration will converge to α = 2 only if, for some n_0, x_n = 2 for all n ≥ n_0; that is, α = 2 is hit accidentally.

On the other hand, if we choose g(x) = √(2 + x), then

$$ g'(x) = \frac{1}{2\sqrt{2 + x}}. $$

Now x ≥ 0 implies g(x) ≥ 0 and 0 ≤ g'(x) ≤ 1/√8 < 1, while, for example, x ≤ 7 implies that g(x) = √(2 + x) ≤ √(2 + 7) = 3. Hence, with I = [0, 7], both Assumptions 1 and 3 are satisfied, and any x_0 ∈ [0, 7] leads, therefore, to a convergent sequence. Indeed, if we take x_0 = 0, then

    x_1 = √2       = 1.41421
    x_2 = √3.41421 = 1.84775
    x_3 = √3.84775 = 1.96157
    x_4 = √3.96157 = 1.99036
    x_5 = √3.99036 = 1.99759

which clearly converges to the root α = 2.
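These iterates can be reproduced with a few lines of Python (the printing format is our own):

    import math

    x = 0.0
    for n in range(1, 6):
        x = math.sqrt(2.0 + x)   # g(x) = sqrt(2 + x)
        print(n, x)              # x_1, ..., x_5 approach the fixed point 2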

Consider the more realistic example of the transcendental equation

$$ f(x) = x - 2\sin(x) = 0. $$

The most natural rearrangement here is

$$ x = 2\sin(x), $$

so that g(x) = 2 sin(x). An examination of the curves y = g(x) and y = x shows that there is a root α between π/3 and 2π/3. Further,

$$ \text{if } \frac{\pi}{3} \leq x \leq \frac{2\pi}{3}, \text{ then } \sqrt{3} \leq g(x) \leq 2. $$

Hence, if π/3 ≤ a ≤ √3 and 2 ≤ b ≤ 2π/3, then Assumption 1 is satisfied. Finally, g'(x) = 2 cos(x) strictly decreases from 1 to −1 as x increases from π/3 to 2π/3. It follows that Assumption 3 is satisfied whenever π/3 < a ≤ √3 and 2 ≤ b < 2π/3. In conclusion, fixed-point iteration with g(x) = 2 sin(x) converges to the unique solution α of the equation in [π/3, 2π/3] whenever x_0 ∈ (π/3, 2π/3).

    8 Convergence Acceleration for Fixed-Point Iteration

Here, we investigate the rate of convergence of fixed-point iteration and show how information about the rate of convergence can at times be used to accelerate convergence.

We assume that the iteration function g(x) is continuously differentiable and that, starting with some point x_0, the sequence {x_n} generated by fixed-point iteration converges to some point α. This α is then a fixed point of g(x), and we have

$$ e_{n+1} = \alpha - x_{n+1} = g'(\xi_n)\,e_n \tag{8.1} $$

for some ξ_n between α and x_n, n = 1, 2, .... Since lim_{n→∞} x_n = α, it then follows that lim_{n→∞} ξ_n = α; hence

$$ \lim_{n\to\infty} g'(\xi_n) = g'(\alpha), $$

g'(x) being continuous, by assumption. Consequently,

$$ e_{n+1} = g'(\alpha)\,e_n + \varepsilon_n e_n, \tag{8.2} $$

where lim_{n→∞} ε_n = 0. Hence, if g'(α) ≠ 0, then for large enough n,

$$ e_{n+1} \approx g'(\alpha)\,e_n, \tag{8.3} $$

i.e., the error e_{n+1} in the (n + 1)st iterate depends (more or less) linearly on the error e_n in the nth iterate. We therefore say that {x_n} converges linearly to α.

Now, note that we can solve (8.1) for α. For

$$ \alpha - x_{n+1} = g'(\xi_n)(\alpha - x_n) \tag{8.4} $$

gives

$$ (1 - g'(\xi_n))\,\alpha = x_{n+1} - g'(\xi_n)\,x_n = [1 - g'(\xi_n)]\,x_{n+1} + g'(\xi_n)(x_{n+1} - x_n). $$


Therefore,

$$ \alpha = x_{n+1} + \frac{g'(\xi_n)(x_{n+1} - x_n)}{1 - g'(\xi_n)} = x_{n+1} + \frac{x_{n+1} - x_n}{\dfrac{1}{g'(\xi_n)} - 1}. \tag{8.5} $$

Of course, we do not know the number g'(ξ_n). But we do know that the ratio

$$ r_n := \frac{x_n - x_{n-1}}{x_{n+1} - x_n} = \frac{x_n - x_{n-1}}{g(x_n) - g(x_{n-1})} = \frac{1}{g'(\eta_n)} \tag{8.6} $$

for some η_n between x_n and x_{n-1}, by the mean-value theorem for derivatives. For large enough n, therefore, we have

$$ r_n = \frac{1}{g'(\eta_n)} \approx \frac{1}{g'(\alpha)} \approx \frac{1}{g'(\xi_n)}, $$

and then the point

$$ \hat{x}_n := x_{n+1} + \frac{x_{n+1} - x_n}{r_n - 1}, \qquad \text{with } r_n = \frac{x_n - x_{n-1}}{x_{n+1} - x_n}, \tag{8.7} $$

should be a very much better approximation to α than x_n or x_{n+1}.

This can also be seen graphically. In effect, we obtained (8.7) by solving (8.4) for α after replacing g'(ξ_n) by the number g[x_{n-1}, x_n] and calling the solution x̂_n. Thus,

$$ \hat{x}_n - x_{n+1} = g[x_{n-1}, x_n]\,(\hat{x}_n - x_n). $$

Since x_{n+1} = g(x_n), this shows that x̂_n is the fixed point of the straight line

$$ s(x) = g(x_n) + g[x_{n-1}, x_n]\,(x - x_n). $$

This we recognize as the linear interpolant to g(x) at x_{n-1}, x_n. If the slope of g(x) varies only a little between x_{n-1} and α, that is, if g(x) is approximately a straight line between x_{n-1} and α, then the secant s(x) should be a very good approximation to g(x) in that interval; hence the fixed point x̂_n of the secant should be a very good approximation to the fixed point α of g(x); see Figure 9.

Figure 9: Fixed-point iterations.

In practice, we will not be able to prove that any particular x̂_n is close enough to α to make x̂_n a better approximation to α than x_n or x_{n+1}. But we can test the hypothesis that x_n is close enough by checking the ratios r_{n-1}, r_n. If these ratios are approximately constant, we accept the hypothesis that the slope of g(x) varies little in the interval of interest; hence we believe that the secant s(x) is a good enough approximation to g(x) to make x̂_n a very much better approximation to α than x_n. In particular, we then accept |x̂_n − x_n| as a good estimate for the error |e_n|.

Whether or not any particular x̂_n is a better approximation to α than x_n, one can prove that the sequence {x̂_n} converges to α faster than does the original sequence {x_n}; that is,

$$ \hat{x}_n = \alpha + o(e_n). \tag{8.8} $$

This process of deriving from a linearly convergent sequence {x_n} a faster converging sequence {x̂_n} by (8.7) is usually called Aitken's Δ² process. Using the abbreviations

$$ \Delta x_k = x_{k+1} - x_k, \qquad \Delta^2 x_k = \Delta(\Delta x_k) = \Delta x_{k+1} - \Delta x_k, $$

(8.7) can be expressed in the form

$$ \hat{x}_n = x_{n+1} - \frac{(\Delta x_n)^2}{\Delta^2 x_{n-1}}, \tag{8.9} $$

whence the name Δ² process. This process is applicable to any linearly convergent sequence, whether generated by fixed-point iteration or not.

Algorithm: Aitken's Δ² process. Given a sequence {x_n} converging to α, calculate the sequence {x̂_n} by (8.9).

If the sequence {x_n} converges linearly to α, that is, if

$$ \alpha - x_{n+1} = K(\alpha - x_n) + o(\alpha - x_n) \quad \text{for some } K \neq 0, $$

then

$$ \hat{x}_n = \alpha + o(\alpha - x_n). $$

Furthermore, if from a certain k on, the sequence Δx_{k-1}/Δx_k, Δx_k/Δx_{k+1}, ... of difference ratios is approximately constant, then x̂_k can be assumed to be a better approximation to α than x_k. In particular, |x̂_k − x_k| is then a good estimate for the error |α − x_k|.

If, in the case of fixed-point iteration, we decide that a certain x̂_k is a very much better approximation to α than x_k, then it is certainly wasteful to continue generating x_{k+1}, x_{k+2}, etc. It seems more reasonable to start fixed-point iteration afresh with x̂_k as the initial guess. This leads to the following algorithm.

Algorithm: Steffensen iteration. Given the iteration function g(x) and a point y_0:

    For n = 0, 1, 2, ..., until satisfied, do:
        x_0 := y_n
        Calculate x_1 = g(x_0), x_2 = g(x_1)
        Calculate d = Δx_1 = x_2 − x_1, r = Δx_0/d
        Calculate y_{n+1} = x_2 + d/(r − 1)

One step of this algorithm consists of two steps of fixed-point iteration, followed by one application of (8.7) using the three iterates available, to get the starting value for the next step.
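A Python transcription of the Steffensen algorithm above; the stopping rule on successive y values and the iteration cap are our own additions, and the example reuses g(x) = 2 sin(x) from Section 7.

    import math

    def steffensen(g, y0, eps=1e-12, itmax=50):
        """Steffensen iteration: two fixed-point steps, then one Aitken
        extrapolation (8.7) produces the next starting value."""
        y = y0
        for _ in range(itmax):
            x0 = y
            x1 = g(x0)
            x2 = g(x1)
            d = x2 - x1               # d = (Delta x)_1
            r = (x1 - x0) / d         # r = (Delta x)_0 / d
            y_new = x2 + d / (r - 1.0)
            if abs(y_new - y) <= eps:
                return y_new
            y = y_new
        raise RuntimeError("Steffensen iteration did not converge")

    # Fixed point of g(x) = 2 sin(x) in [pi/3, 2 pi/3]:
    print(steffensen(lambda x: 2.0 * math.sin(x), 1.5))  # about 1.8954942670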


9 Numerical Evaluation of Multiple Roots

9.1 Newton's Method and Multiple Roots

Let α be a root of f(x) of multiplicity p > 1; to such a root, Newton's method converges only linearly. Consider instead the iteration function g(x) = x − p f(x)/f'(x), for which g'(α) = 0, so that a Taylor expansion of g(x_n) about α gives

$$ x_{n+1} = g(x_n) = g(\alpha) + (x_n - \alpha)\,g'(\alpha) + \frac{1}{2}(x_n - \alpha)^2 g''(\xi_n) = \alpha + \frac{1}{2}(x_n - \alpha)^2 g''(\xi_n), $$

with ξ_n between x_n and α. Thus,

$$ \alpha - x_{n+1} = -\frac{1}{2}(\alpha - x_n)^2\, g''(\xi_n), $$

showing that the method

$$ x_{n+1} = x_n - p\,\frac{f(x_n)}{f'(x_n)}, \qquad n \geq 0, \tag{9.4} $$

has order of convergence two, the same as the original Newton method for simple roots.

    10 Roots of Polynomials

    Consider the polynomial equation

$$ p(x) \equiv a_0 + a_1 x + \cdots + a_n x^n = 0, \qquad a_n \neq 0. \tag{10.1} $$

Nested multiplication: A very efficient way to evaluate the polynomial p(x) given in (10.1) is to use nested multiplication:

$$ p(x) = a_0 + x(a_1 + x(a_2 + \cdots + x(a_{n-1} + a_n x)\cdots)). \tag{10.2} $$

With formula (10.1), there are n additions and 2n − 1 multiplications; with (10.2), there are n additions and only n multiplications, a considerable saving.

It is convenient to introduce the following auxiliary coefficients. Let b_n = a_n and

$$ b_k = a_k + z\,b_{k+1}, \qquad k = n - 1, n - 2, \ldots, 0. \tag{10.3} $$

By considering (10.3), it is easy to see that

$$ p(z) = b_0. \tag{10.4} $$

Introduce the polynomial

$$ q(x) = b_1 + b_2 x + \cdots + b_n x^{n-1}. \tag{10.5} $$

Then,

$$ b_0 + (x - z)q(x) = b_0 + (x - z)[b_1 + b_2 x + \cdots + b_n x^{n-1}] $$
$$ = (b_0 - b_1 z) + (b_1 - b_2 z)x + \cdots + (b_{n-1} - b_n z)x^{n-1} + b_n x^n $$
$$ = a_0 + a_1 x + \cdots + a_n x^n = p(x), $$

i.e.,

$$ p(x) = b_0 + (x - z)q(x), \tag{10.6} $$

where q(x) is the quotient and b_0 the remainder when p(x) is divided by (x − z). The use of (10.3) to evaluate p(z) and to form the quotient polynomial q(x) is also called Horner's method. If z is a root of p(x), then b_0 = 0 and p(x) = (x − z)q(x). To find additional roots of p(x), we can restrict our search to the roots of q(x). This reduction process is called deflation.
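A sketch of (10.3)-(10.6) in Python: a single Horner pass yields both p(z) = b_0 and the coefficients of the deflated polynomial q(x). The coefficient-list convention a = [a_0, ..., a_n] is our own.

    def horner(a, z):
        """Nested multiplication (10.3) for p(x) = a[0] + a[1] x + ... + a[n] x^n.
        Returns p(z) = b0 and the coefficients [b1, ..., bn] of the quotient q(x)."""
        n = len(a) - 1
        b = [0.0] * (n + 1)
        b[n] = a[n]
        for k in range(n - 1, -1, -1):
            b[k] = a[k] + z * b[k + 1]
        return b[0], b[1:]

    # p(x) = x^2 - x - 2 = (x - 2)(x + 1); dividing by (x - 2):
    p_at_2, q = horner([-2.0, -1.0, 1.0], 2.0)
    print(p_at_2, q)  # 0.0 and [1.0, 1.0], i.e. q(x) = 1 + x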

Newton's Method: If we want to apply Newton's method to find a root of p(x), we must be able to evaluate both p(x) and p'(x) at any point z. From (10.6),

$$ p'(x) = (x - z)q'(x) + q(x), $$


i.e.,

$$ p'(z) = q(z). \tag{10.7} $$

We use (10.5) and (10.7) in the following adaptation of Newton's method to polynomial root finding.

Algorithm Polynew(a, n, x_0, ε, itmax, root, b, ier)

1. Remark: a is the vector of coefficients, itmax the maximum number of iterates to be computed, b the vector of coefficients for the deflated polynomial, and ier an error indicator.
2. itnum := 1.
3. z := x_0, b_n := c := a_n.
4. For k = n − 1, ..., 1: b_k := a_k + z·b_{k+1}, c := b_k + z·c.
5. b_0 := a_0 + z·b_1.
6. If c = 0, then ier := 2 and exit.
7. x_1 := x_0 − b_0/c.
8. If |x_1 − x_0| ≤ ε, then ier := 0, root := x_1, and exit.
9. If itnum = itmax, then ier := 1 and exit.
10. Otherwise, itnum := itnum + 1, x_0 := x_1, and go to Step 3.
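A Python transcription of Polynew; the default tolerance and returning the results as a tuple are our own choices. As in Steps 4-7, one Horner pass gives b_0 = p(z) and c = q(z) = p'(z).

    def polynew(a, x0, eps=1e-12, itmax=50):
        """Newton's method for p(x) = a[0] + a[1] x + ... + a[n] x^n (Polynew).
        Returns (root, b, ier): b holds the deflated coefficients, ier as in the text."""
        n = len(a) - 1
        for itnum in range(1, itmax + 1):
            z = x0
            b = [0.0] * (n + 1)
            b[n] = c = a[n]
            for k in range(n - 1, 0, -1):     # Step 4: b_k = a_k + z b_{k+1}, c = b_k + z c
                b[k] = a[k] + z * b[k + 1]
                c = b[k] + z * c
            b[0] = a[0] + z * b[1]            # Step 5: b_0 = p(z)
            if c == 0.0:                      # Step 6: p'(z) = 0
                return None, b, 2
            x1 = x0 - b[0] / c                # Step 7: Newton step
            if abs(x1 - x0) <= eps:           # Step 8
                return x1, b, 0
            x0 = x1
        return None, None, 1                  # Step 9: itmax reached

    root, b, ier = polynew([-2.0, -1.0, 1.0], 3.0)
    print(root, ier)  # root near 2.0, ier = 0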

    11 Systems of Nonlinear Equations

Here, we consider some numerical methods for solving systems of nonlinear equations. These problems are widespread in applications; for instance, one encounters systems of nonlinear algebraic equations when solving nonlinear differential equations.

    Consider the following two equations:

$$ f_1(x_1, x_2) = 0, \qquad f_2(x_1, x_2) = 0. \tag{11.1} $$

The generalizations to n equations in n variables should be straightforward once the principal ideas have been grasped. Rewrite the equations (11.1) in vector notation:

$$ \mathbf{f}(\mathbf{x}) = \mathbf{0}, \qquad \mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}, \qquad \mathbf{f}(\mathbf{x}) = \begin{bmatrix} f_1(x_1, x_2) \\ f_2(x_1, x_2) \end{bmatrix}. \tag{11.2} $$

The solution of (11.1) can be looked upon as a two-step process:

(1) Find the zero curves in the x_1 x_2-plane of the surfaces z = f_1(x_1, x_2) and z = f_2(x_1, x_2).

(2) Find the points of intersection of these zero curves in the x_1 x_2-plane.

This perspective will be used to generalize Newton's method to solve the system (11.1).


    11.1 Fixed-Point Theory

We generalize some of the fixed-point theory to the system (11.1). Assume that the root-finding problem (11.1) has been reformulated in an equivalent form as

$$ x_1 = g_1(x_1, x_2), \qquad x_2 = g_2(x_1, x_2). \tag{11.3} $$

Denote its solution by

$$ \boldsymbol{\alpha} = \begin{bmatrix} \alpha_1 \\ \alpha_2 \end{bmatrix}. $$

We study the fixed-point iteration

$$ x_{1,n+1} = g_1(x_{1,n}, x_{2,n}), \qquad x_{2,n+1} = g_2(x_{1,n}, x_{2,n}). \tag{11.4} $$

Using vector notation, we rewrite this as

$$ \mathbf{x}_{n+1} = \mathbf{g}(\mathbf{x}_n), \tag{11.5} $$

with

$$ \mathbf{x}_n = \begin{bmatrix} x_{1,n} \\ x_{2,n} \end{bmatrix}, \qquad \mathbf{g}(\mathbf{x}) = \begin{bmatrix} g_1(x_1, x_2) \\ g_2(x_1, x_2) \end{bmatrix}. $$

To analyze the convergence of (11.5), begin by subtracting the two equations in (11.4) from the corresponding equations

$$ \alpha_1 = g_1(\alpha_1, \alpha_2), \qquad \alpha_2 = g_2(\alpha_1, \alpha_2) $$

involving the exact solution α. Apply the mean-value theorem for functions of two variables to these differences to obtain

$$ \alpha_i - x_{i,n+1} = \frac{\partial g_i(\xi^{(i)}_{1,n}, \xi^{(i)}_{2,n})}{\partial x_1}\,(\alpha_1 - x_{1,n}) + \frac{\partial g_i(\xi^{(i)}_{1,n}, \xi^{(i)}_{2,n})}{\partial x_2}\,(\alpha_2 - x_{2,n}), \qquad i = 1, 2. $$

The points ξ^{(i)}_n = (ξ^{(i)}_{1,n}, ξ^{(i)}_{2,n}) lie on the line segment joining α and x_n. In matrix form, these equations become

$$ \begin{bmatrix} \alpha_1 - x_{1,n+1} \\ \alpha_2 - x_{2,n+1} \end{bmatrix} = \begin{bmatrix} \dfrac{\partial g_1(\boldsymbol{\xi}^{(1)}_n)}{\partial x_1} & \dfrac{\partial g_1(\boldsymbol{\xi}^{(1)}_n)}{\partial x_2} \\ \dfrac{\partial g_2(\boldsymbol{\xi}^{(2)}_n)}{\partial x_1} & \dfrac{\partial g_2(\boldsymbol{\xi}^{(2)}_n)}{\partial x_2} \end{bmatrix} \begin{bmatrix} \alpha_1 - x_{1,n} \\ \alpha_2 - x_{2,n} \end{bmatrix} \tag{11.6} $$

Let G_n denote the matrix in (11.6). Then we can rewrite this equation as

$$ \boldsymbol{\alpha} - \mathbf{x}_{n+1} = G_n(\boldsymbol{\alpha} - \mathbf{x}_n). \tag{11.7} $$

It is convenient to introduce the Jacobian matrix for the functions g_1 and g_2:

$$ G(\mathbf{x}) = \begin{bmatrix} \dfrac{\partial g_1(\mathbf{x})}{\partial x_1} & \dfrac{\partial g_1(\mathbf{x})}{\partial x_2} \\ \dfrac{\partial g_2(\mathbf{x})}{\partial x_1} & \dfrac{\partial g_2(\mathbf{x})}{\partial x_2} \end{bmatrix} \tag{11.8} $$

In (11.7), if x_n is close to α, then G_n will be close to G(α), so that the norm of G(α) governs the behavior of the errors. The matrix G(α) plays the crucial role that g'(α) plays in the single-equation fixed-point theory.


Theorem 11.1 Let D be a closed, bounded, and convex set in the plane. Assume that the components of g(x) are continuously differentiable at all points of D, and further assume that

$$ \mathbf{g}(D) \subseteq D, \tag{11.9} $$

$$ \lambda \equiv \max_{\mathbf{x} \in D} \|G(\mathbf{x})\| < 1. \tag{11.10} $$

Then we have the following:

(a) x = g(x) has a unique solution α ∈ D.

(b) For any initial point x_0 ∈ D, the iteration (11.5) will converge to α ∈ D.

(c)

$$ \|\boldsymbol{\alpha} - \mathbf{x}_{n+1}\| \leq \left(\|G(\boldsymbol{\alpha})\| + \epsilon_n\right)\|\boldsymbol{\alpha} - \mathbf{x}_n\|, \tag{11.11} $$

with ε_n → 0 as n → ∞.

Proof. The existence of a fixed point can be shown by proving that the sequence of iterates x_n from (11.5) converges in D.

Suppose α and β are both fixed points of g(x) in D. Then

$$ \boldsymbol{\alpha} - \boldsymbol{\beta} = \mathbf{g}(\boldsymbol{\alpha}) - \mathbf{g}(\boldsymbol{\beta}). \tag{11.12} $$

Apply the mean value theorem to component i, obtaining

$$ g_i(\boldsymbol{\alpha}) - g_i(\boldsymbol{\beta}) = \nabla g_i(\boldsymbol{\xi}^{(i)}) \cdot (\boldsymbol{\alpha} - \boldsymbol{\beta}), \qquad i = 1, 2, \tag{11.13} $$

with

$$ \nabla g_i(\mathbf{x}) = \left[\frac{\partial g_i}{\partial x_1}, \;\frac{\partial g_i}{\partial x_2}\right] $$

and ξ^{(i)} ∈ D on the line segment joining α and β. Since ||G(x)|| < 1 on D, we have from the definition of the norm that

$$ \left|\frac{\partial g_i(\mathbf{x})}{\partial x_1}\right| + \left|\frac{\partial g_i(\mathbf{x})}{\partial x_2}\right| < 1, \qquad \mathbf{x} \in D, \quad i = 1, 2. $$

Combining this with (11.13),

$$ |g_i(\boldsymbol{\alpha}) - g_i(\boldsymbol{\beta})| \leq \lambda\,\|\boldsymbol{\alpha} - \boldsymbol{\beta}\|, \qquad \|\mathbf{g}(\boldsymbol{\alpha}) - \mathbf{g}(\boldsymbol{\beta})\| \leq \lambda\,\|\boldsymbol{\alpha} - \boldsymbol{\beta}\|. \tag{11.14} $$

Combining this with (11.12) yields

$$ \|\boldsymbol{\alpha} - \boldsymbol{\beta}\| \leq \lambda\,\|\boldsymbol{\alpha} - \boldsymbol{\beta}\|, $$

which, since λ < 1, is possible only if α = β, showing the uniqueness of α ∈ D.

(b) Condition (11.9) ensures that all x_n ∈ D if x_0 ∈ D. Next, subtract x_{n+1} = g(x_n) from α = g(α), obtaining

$$ \boldsymbol{\alpha} - \mathbf{x}_{n+1} = \mathbf{g}(\boldsymbol{\alpha}) - \mathbf{g}(\mathbf{x}_n). $$


11.2 Newton's Method for Nonlinear Systems

As with Newton's method for a single equation, there is more than one way of viewing and deriving Newton's method for solving a system of nonlinear equations. We begin with an analytic derivation, and then we give a geometrical perspective.

Apply Taylor's theorem for functions of two variables to each of the equations f_i(x_1, x_2) = 0, expanding f_i(α) about x_0; for i = 1, 2,

$$ 0 = f_i(\boldsymbol{\alpha}) = f_i(\mathbf{x}_0) + (\alpha_1 - x_{1,0})\frac{\partial f_i(\mathbf{x}_0)}{\partial x_1} + (\alpha_2 - x_{2,0})\frac{\partial f_i(\mathbf{x}_0)}{\partial x_2} + \frac{1}{2}\left[(\alpha_1 - x_{1,0})\frac{\partial}{\partial x_1} + (\alpha_2 - x_{2,0})\frac{\partial}{\partial x_2}\right]^2 f_i(\boldsymbol{\xi}^{(i)}), \tag{11.22} $$

with ξ^{(i)} on the line segment joining x_0 and α. If we drop the second-order terms, we obtain the approximation

$$ 0 \doteq f_1(\mathbf{x}_0) + (\alpha_1 - x_{1,0})\frac{\partial f_1(\mathbf{x}_0)}{\partial x_1} + (\alpha_2 - x_{2,0})\frac{\partial f_1(\mathbf{x}_0)}{\partial x_2}, $$
$$ 0 \doteq f_2(\mathbf{x}_0) + (\alpha_1 - x_{1,0})\frac{\partial f_2(\mathbf{x}_0)}{\partial x_1} + (\alpha_2 - x_{2,0})\frac{\partial f_2(\mathbf{x}_0)}{\partial x_2}. \tag{11.23} $$

In matrix form,

$$ \mathbf{0} \doteq \mathbf{f}(\mathbf{x}_0) + F(\mathbf{x}_0)(\boldsymbol{\alpha} - \mathbf{x}_0), \tag{11.24} $$

with F(x_0) the Jacobian matrix of f, given in (11.20) [the matrix with entries ∂f_i/∂x_j].

Solving for α,

$$ \boldsymbol{\alpha} \doteq \mathbf{x}_0 - F(\mathbf{x}_0)^{-1}\mathbf{f}(\mathbf{x}_0) \equiv \mathbf{x}_1. $$

The approximation x_1 should be an improvement on x_0, provided x_0 is chosen sufficiently close to α. This leads to the iteration method first obtained at the end of the last section,

$$ \mathbf{x}_{n+1} = \mathbf{x}_n - F(\mathbf{x}_n)^{-1}\mathbf{f}(\mathbf{x}_n), \qquad n \geq 0. \tag{11.25} $$

This is Newton's method for solving the nonlinear system f(x) = 0. In practice, we do not invert F(x_n), particularly for systems having more than two equations.

Instead, we solve a linear system for a correction term δ_{n+1} to x_n:

$$ F(\mathbf{x}_n)\,\boldsymbol{\delta}_{n+1} = -\mathbf{f}(\mathbf{x}_n), \qquad \mathbf{x}_{n+1} = \mathbf{x}_n + \boldsymbol{\delta}_{n+1}; \tag{11.26} $$

this is more efficient in computation time, requiring only about one-third as many operations as inverting F(x_n).
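A sketch of (11.26) in Python using NumPy's linear solver; the test system (a circle and a hyperbola) is our own illustration, not from the text.

    import numpy as np

    def newton_system(f, F, x0, eps=1e-12, itmax=50):
        """Newton's method for f(x) = 0: solve F(x_n) d = -f(x_n), set
        x_{n+1} = x_n + d, rather than forming F(x_n)^{-1} explicitly."""
        x = np.asarray(x0, dtype=float)
        for _ in range(itmax):
            d = np.linalg.solve(F(x), -f(x))   # correction term delta_{n+1}
            x = x + d
            if np.linalg.norm(d) <= eps:
                return x
        raise RuntimeError("Newton's method did not converge")

    # Example system: x1^2 + x2^2 = 4 and x1 * x2 = 1
    f = lambda x: np.array([x[0]**2 + x[1]**2 - 4.0, x[0] * x[1] - 1.0])
    F = lambda x: np.array([[2.0 * x[0], 2.0 * x[1]], [x[1], x[0]]])
    print(newton_system(f, F, [2.0, 0.5]))     # about [1.93185, 0.51764]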

There is also a geometrical derivation of Newton's method, in analogy with the tangent-line approximation used with single nonlinear equations in Section 3. The graph in space of the equation

$$ z = f_i(\mathbf{x}_0) + (x_1 - x_{1,0})\frac{\partial f_i(\mathbf{x}_0)}{\partial x_1} + (x_2 - x_{2,0})\frac{\partial f_i(\mathbf{x}_0)}{\partial x_2} \equiv p_i(x_1, x_2) $$

is a plane that is tangent to the graph of z = f_i(x_1, x_2) at the point x_0, i = 1, 2. If x_0 is near α, then these tangent planes should be good approximations to the associated surfaces z = f_i(x_1, x_2)


for x = (x_1, x_2) near α. Then the intersection of the zero curves of the tangent planes z = p_i(x_1, x_2) should be a good approximation to the corresponding intersection of the zero curves of the original surfaces z = f_i(x_1, x_2), i = 1, 2, that is, to the root α. This results in the statement (11.23).

Convergence Analysis: For the convergence of Newton's method (11.25), regard it as a fixed-point iteration method with

$$ \mathbf{g}(\mathbf{x}) = \mathbf{x} - F(\mathbf{x})^{-1}\mathbf{f}(\mathbf{x}). \tag{11.27} $$

Also assume that

$$ \det F(\boldsymbol{\alpha}) \neq 0, $$

which is the analogue of assuming that α is a simple root when dealing with a single equation, as in Theorem 3.1. It can then be shown that the Jacobian G(x) of (11.27) is zero at x = α; consequently, the condition (11.18) is easily satisfied.

Corollary 11.2 then implies that x_n converges to α, provided x_0 is chosen sufficiently close to α. In addition, it can be shown that the iteration is quadratically convergent, i.e.,

$$ \|\boldsymbol{\alpha} - \mathbf{x}_{n+1}\| \leq B\,\|\boldsymbol{\alpha} - \mathbf{x}_n\|^2, \qquad n \geq 0, $$

for some constant B > 0.