
MTH 2032

Linear Algebra

Study Guide

Dr. Tony Yee

Department of Mathematics and Information Technology

The Hong Kong Institute of Education

June 23, 2011


Contents

Table of Contents iii

1 Matrix Algebra 1

1.1 Real Life Examples . . . . . . . . . . . 1
1.2 Matrix Addition and Scalar Multiplication . . . . . . . . . . . 3
1.3 Matrix Multiplication . . . . . . . . . . . 4
1.4 Special Matrices . . . . . . . . . . . 7
1.5 Partitioned Matrices . . . . . . . . . . . 8
1.6 Inverse of a Matrix . . . . . . . . . . . 9
1.7 Matrix Invertibility . . . . . . . . . . . 10
1.8 Solve Matrix Equations by Inverse . . . . . . . . . . . 13
1.9 Rank of a Matrix . . . . . . . . . . . 17

1 Matrix Algebra (True or False) 19

1 Matrix Algebra (Worked Examples) 23

2 Linear Equations 41

2.1 Systems of m Equations and n Unknowns . . . . . . . . . . . 41
2.2 Solve Systems of Linear Equations . . . . . . . . . . . 43
2.3 Geometrical Interpretation of Solutions . . . . . . . . . . . 44
2.4 Gaussian Elimination . . . . . . . . . . . 48
2.5 (Reduced) Row Echelon Form . . . . . . . . . . . 55
2.6 Elementary Matrices . . . . . . . . . . . 58
2.7 Existence and Uniqueness . . . . . . . . . . . 58
2.8 Homogeneous Systems . . . . . . . . . . . 61
2.9 Fundamental Solutions . . . . . . . . . . . 62

2 Linear Equations (True or False) 65

2 Linear Equations (Examples done in Class) 71

3 Vector Spaces 125

3.1 Geometry of Euclidean Vectors in R2, R3 . . . . . . . . . . . 125
3.2 Vector Addition and Scalar Multiplication . . . . . . . . . . . 126
3.3 Linear Combination of Vectors . . . . . . . . . . . 127
3.4 Express Solution in Vector Form . . . . . . . . . . . 128
3.5 Find General Solution from RREF . . . . . . . . . . . 128
3.6 Free Variables and Rank . . . . . . . . . . . 130


3.7 Vector Spaces and Subspaces . . . . . . . . . . . 132
3.8 Column Space, Row Space and Null Space . . . . . . . . . . . 136
3.9 Span of Vectors . . . . . . . . . . . 137
3.10 Linear Independence . . . . . . . . . . . 141
3.11 Basis of Euclidean Spaces (Rn) . . . . . . . . . . . 145
3.12 Basis of Subspaces (Column Space, Row Space, Null Space) . . . . . . . . . . . 149

3 Vector Spaces (True or False) 151

3 Vector Spaces (Worked Examples) 157

4 Determinants and Eigenvalues 179

4.1 Find Determinants by Cofactor Expansions . . . . . . . . . . . 179
4.2 Determinants and Inverses . . . . . . . . . . . 179
4.3 Definitions of Eigenvalue and Eigenvector . . . . . . . . . . . 182
4.4 Characteristic Equation . . . . . . . . . . . 183
4.5 Diagonalization . . . . . . . . . . . 187
4.6 Diagonalizability . . . . . . . . . . . 188
4.7 Application of Diagonalization . . . . . . . . . . . 191

4 Determinants and Eigenvalues (True or False) 193

4 Determinants and Eigenvalues (Worked Examples) 197

5 Vector Geometry 219

5.1 Geometric Vectors . . . . . . . . . . . 219
5.2 Dot Product and Inner Product . . . . . . . . . . . 219
5.3 Line and Plane . . . . . . . . . . . 220
5.4 Orthogonal and Orthonormal Set . . . . . . . . . . . 222
5.5 Orthogonality and Linear Independence . . . . . . . . . . . 223
5.6 Orthogonal Matrix . . . . . . . . . . . 225
5.7 Orthogonal Projection . . . . . . . . . . . 226
5.8 Gram–Schmidt Orthogonalization . . . . . . . . . . . 229
5.9 Projection and Approximation (Least Squares Applications) . . . . . . . . . . . 231

5 Vector Geometry (True or False) 235

5 Vector Geometry (Worked Examples) 239

6 Miscellaneous 257

6.1 Cross Product . . . . . . . . . . . 257
6.2 Linear Transformation . . . . . . . . . . . 258
6.3 Linear Operators and Similarity . . . . . . . . . . . 259
6.4 Fourier Transformation . . . . . . . . . . . 260

7 Answers to True or False Questions 261


Chapter 1

Matrix Algebra

Our ability to analyse and solve linear equations will be greatly enhanced when we can perform algebraic operations with matrices. Furthermore, the definitions and theorems in this chapter provide some basic tools for handling the many applications of linear algebra that involve two or more matrices.

1.1 Real Life Examples

Matrices are a very important tool for expressing and discussing problems that arise from real life.

Example 1.1 (Electric network)

In the following electric network,

[Figure 5.1: An electric network]

the currents i1, i2, i3 satisfy the equations

Node P : −i1 +i2 −i3 = 0,

Node Q : i1 −i2 +i3 = 0,

Left loop : 5i1 +2i2 = 20,

Right loop : 2i2 +3i3 = 11.

The above system of equations is characterized by the following rectangular array of numbers

[A b] =
    [ −1   1  −1    0 ]
    [  1  −1   1    0 ]
    [  5   2   0   20 ]
    [  0   2   3   11 ],

called the augmented matrix. 2
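As an aside (not part of the original guide), the augmented matrix can be entered into NumPy and the overdetermined but consistent 4 x 3 system solved by least squares, which returns the exact solution when one exists:

```python
import numpy as np

# Augmented matrix [A b] of the electric network in Example 1.1.
Ab = np.array([
    [-1,  1, -1,  0],
    [ 1, -1,  1,  0],
    [ 5,  2,  0, 20],
    [ 0,  2,  3, 11],
], dtype=float)
A, b = Ab[:, :3], Ab[:, 3]

# The system has 4 equations and 3 unknowns, so use least squares;
# for a consistent system this is the exact solution.
currents, _, rank, _ = np.linalg.lstsq(A, b, rcond=None)
# i1 = 78/31, i2 = 115/31, i3 = 37/31 (the values quoted in Chapter 2)
```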


Example 1.2 (Rotation in plane)

Consider the rotation by an angle θ in the plane about the origin. The rotation maps a point v = (x, y) to another point v′ = (x′, y′).

[Figure 5.2: Rotation by angle θ]

v′ may be expressed in terms of v by the formula

x′ = x cos θ − y sin θ,

y′ = x sin θ + y cos θ.

The formula is characterized by the following rectangular array of numbers

Rθ =
    [ cos θ   −sin θ ]
    [ sin θ    cos θ ],

called the transformation matrix of the rotation. 2
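In code, the transformation matrix acts on a coordinate vector by matrix multiplication. A small NumPy sketch (the function name is ours, not the guide's):

```python
import numpy as np

def rotation_matrix(theta):
    """The transformation matrix R_theta of the rotation by angle theta."""
    return np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])

# Rotating the point (1, 0) by 90 degrees should give (0, 1).
v_rotated = rotation_matrix(np.pi / 2) @ np.array([1.0, 0.0])
```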

Example 1.3 (Spring-mass system)

In the following spring-mass system (where m1, m2 are masses, and k1, k2, k3 are spring constants),

[Figure 5.3: Spring-mass system with k1 = 7, m1 = 10, k2 = 5, m2 = 8, k3 = 4]

the displacement functions x1(t) and x2(t) of the two masses satisfy the following equations of motion

10x1′′ = −7x1 − 5(x1 − x2),

8x2′′ = −4x2 − 5(x2 − x1),

or

x1′′ = −(6/5) x1 + (1/2) x2,

x2′′ = (5/8) x1 − (9/8) x2.

The differential system is characterized by the following array of numbers

    [ −6/5    1/2 ]
    [  5/8   −9/8 ],

called the coefficient matrix. 2


Chapter 2

Linear Equations

2.1 Systems of m Equations and n Unknowns

In this chapter we first review how systems of linear equations involving two variables are solved using the method of substitution learned in secondary school elementary algebra. (Basic properties of linear equations and lines should be reviewed.) Because the method of substitution is not suitable for linear systems involving large numbers of equations and variables, we then turn to a different method of solution involving the concept of an augmented matrix, which arises quite naturally when dealing with larger linear systems. We then study matrices and basic matrix operations in their own right as a new mathematical tool. With these new operations added to our mathematical toolbox, we return to systems of linear equations from a fresh point of view. A new technique, elementary row operations, will play an important role.

Solving systems of linear equations is important in its own right, and such systems appear frequently in practical problems. We briefly review several real-life problems in this section. To establish basic concepts, we first consider a simple example.

Example 2.1 (Systems with two variables)

If 2 adult tickets and 1 child ticket cost $8, and if 1 adult ticket and 3 child tickets cost $9, what is the price of each?

Solution Let x be the price of an adult ticket and y the price of a child ticket. Then

2x + y = 8 and x + 3y = 9.

We now have a system of two linear equations with two variables. It is easy to find ordered pairs (x, y) that satisfy one or the other of these equations. For example, the ordered pair (4, 0) satisfies the first equation, but not the second, and the ordered pair (6, 1) satisfies the second, but not the first. To solve this system, we must find all ordered pairs of real numbers that satisfy both equations at the same time. The set of all such ordered pairs is called the solution set.

The method of substitution works nicely for systems involving two variables. Eliminating one variable from the two equations immediately solves the problem. For example, if we choose to eliminate y, then we may substitute y = 8 − 2x (which follows from the first equation) into the second equation. That gives

x + 3(8 − 2x) = 9.

After solving, we have the solution x = 3 and hence y = 2. 2
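The same system can be written in the form Ax = b and handed to a linear solver; the NumPy check below is our aside, confirming the substitution result:

```python
import numpy as np

# 2x + y = 8 and x + 3y = 9 in matrix form.
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
b = np.array([8.0, 9.0])
x, y = np.linalg.solve(A, b)
# x = 3 (adult ticket), y = 2 (child ticket), as found by substitution
```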

In the following we give another example: The quantity of a product that people are willing to buy during some period of time depends on its price. Generally, the higher the price, the less the demand; the lower the price, the greater the demand. Similarly, the quantity of a product that a supplier is willing to sell


during some period of time also depends on the price. Generally, a supplier will be willing to supply more of a product at higher prices and less of a product at lower prices. The simplest supply and demand model is a linear model where the graphs of a demand equation and a supply equation are straight lines.

Example 2.2 (Supply and demand)

Suppose that we are interested in analyzing the sale of cherries each day in Hong Kong. Using special analytical techniques (regression analysis) and data collected, an analyst arrives at the following price-demand and price-supply models:

p = −2q + 40, (Price-demand equation, consumer)

p = 0.7q + 7.6, (Price-supply equation, supplier)

where q represents the quantity of cherries in thousands of pounds and p represents the price in dollars. For example, we see that consumers will purchase 10 thousand pounds (q = 10) when the price is p = −2(10) + 40 = 20 dollars per pound. On the other hand, suppliers will be willing to supply 17.714 thousand pounds of cherries at 20 dollars per pound (solve 20 = 0.7q + 7.6). Thus, at $20 per pound the suppliers are willing to supply more cherries than consumers are willing to purchase. The supply exceeds the demand at that price and the price will come down. At what price will cherries stabilize for the day? That is, at what price will supply equal demand? This price, if it exists, is called the equilibrium price, and the quantity sold at that price is called the equilibrium quantity. The result could also be interpreted geometrically. The point where the two straight lines (two curves in general) for the price-demand equation and the price-supply equation intersect is called the equilibrium point. How do we find these quantities?

Solution We solve the linear system

p = −2q + 40, (Demand equation)

p = 0.7q + 7.6 (Supply equation)

by using the method of substitution (substituting p = −2q + 40 into the second equation)

−2q + 40 = 0.7q + 7.6,

−2.7q = −32.4,

q = 12 thousand pounds. (equilibrium quantity)

Now substitute q = 12 back into either of the original equations in the system and solve for p (we choose the first equation)

p = −2(12) + 40

= 16.0 dollars per pound. (equilibrium price)

If the price is above the equilibrium price of $16.0 per pound, the supply will exceed the demand and the price will come down. If the price is below the equilibrium price of $16.0 per pound, the demand will exceed the supply and the price will rise. Thus, the price will reach equilibrium at $16.0. At this price, suppliers will supply 12 thousand pounds of cherries and consumers will purchase 12 thousand pounds. 2
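Moving the unknowns to one side turns the equilibrium condition into a 2 x 2 linear system; this NumPy check is our aside, not part of the guide:

```python
import numpy as np

# p = -2q + 40 and p = 0.7q + 7.6, rewritten with unknowns (q, p) on the left:
#    2.0*q + p = 40
#   -0.7*q + p = 7.6
A = np.array([[ 2.0, 1.0],
              [-0.7, 1.0]])
b = np.array([40.0, 7.6])
q, p = np.linalg.solve(A, b)
# q = 12 thousand pounds (equilibrium quantity), p = 16 dollars (equilibrium price)
```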


2.2 Solve Systems of Linear Equations

The substitution method used in the previous examples is an algebraic method that is easy to use and provides exact solutions to a system of two equations with two variables, provided that solutions exist. We now define some terms that we can use to describe the different types of solutions to systems of equations that we will encounter. Solutions may or may not exist. See the following definition.

Definition A system of linear equations is called consistent if it has one or more solutions and inconsistent if no solutions exist. Furthermore, a consistent system is said to be independent if it has exactly one solution (often referred to as the unique solution) and dependent if it has more than one solution (referred to as infinitely many solutions). Two systems of linear equations are equivalent if they have the same solution set.

Referring to the system in Example 2.1, the system is a consistent and independent system with exactly one solution x = 3, y = 2. A natural question may arise: Can a consistent system have exactly two solutions? Exactly three solutions?

The answer is simply NO. For example, by geometrically interpreting a system of two linear equations with two variables, we gain useful information about what to expect in the way of solutions to the system. In general, any two lines in a coordinate plane must intersect in exactly one point, be parallel, or coincide (have identical graphs). In fact there are only three possible types of solutions for systems of two linear equations in two variables. These ideas are illustrated geometrically in the following.

Given a system of linear equations, we will have one of the three possibilities for the number of solutions.

1. Exactly one solution (i.e., a unique solution).

2. No solution.

3. Infinitely many solutions (i.e., non-unique solutions).

Let us first review what solutions to systems of two linear equations with two variables are, together with their geometrical interpretation.


2.3 Geometrical Interpretation of Solutions

The pair x1 = 1, x2 = 3 is a solution of the following system of linear equations

2x1 − x2 = −1,

−x1 + x2 = 2

because if we replace the variables by the given values, then the two equalities hold. Another pair x1 = 2, x2 = 2 is not a solution because the first equality does not hold.

[Figure 1.1: Unique solution]

The two linear equations represent two straight lines in the (two-dimensional) plane. The solution (x1, x2) = (1, 3) of the system of the two equations is the intersection of the two lines. In particular, we find exactly one solution.

The following system of linear equations

2x1 − x2 = −1,

4x1 − 2x2 = 2

represents two different parallel lines. Since the two lines never meet, the system does not have any solution.

[Figure 1.2: No solution]

The following system of linear equations

2x1 − x2 = −1,

4x1 − 2x2 = −2


represents two identical lines. Any point on the line is a solution to the system. In particular, the system has infinitely many solutions.

[Figure 1.3: Infinite number of solutions]
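The three cases above can also be detected numerically by comparing the rank of A with the rank of the augmented matrix [A b], a criterion the guide develops later (Section 2.7, Existence and Uniqueness). A sketch of the idea, with the helper name classify being ours:

```python
import numpy as np

def classify(A, b):
    """Classify a system Ax = b by comparing rank(A) with rank([A b])."""
    rank_A = np.linalg.matrix_rank(A)
    rank_Ab = np.linalg.matrix_rank(np.column_stack([A, b]))
    if rank_A < rank_Ab:
        return "no solution"
    if rank_A == A.shape[1]:
        return "unique solution"
    return "infinitely many solutions"

A = np.array([[2.0, -1.0],
              [4.0, -2.0]])
print(classify(A, np.array([-1.0,  2.0])))   # the parallel lines: no solution
print(classify(A, np.array([-1.0, -2.0])))   # identical lines: infinitely many solutions
```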

The geometrical interpretation of the solutions of linear equations can be extended to more variables and more equations. For example, one linear equation with three variables

a1x1 + a2x2 + a3x3 = b

is represented by a plane in (three-dimensional) space. A system of two linear equations with three variables is represented by two planes. The following are the possible intersections (= solutions to the system) of the two planes.

[Figure 1.4: Intersection of two planes: a line of solutions, no solution, or a plane of solutions]

In Figure 1.4, we observe that a system of 2 linear equations with 3 variables cannot have a unique solution. In that case, if the system is consistent, it must have infinitely many solutions. In the following we also give some examples of linear systems with more than two variables. Just remark that no professional knowledge in chemistry or physics is needed for this elementary course; the following two examples are only used to illustrate the wide applicability of linear systems to real-life problems.

Example 2.3 (Electric network)

In the following electric network,

[Figure 1.5: An electric network]


the currents i1, i2, i3 satisfy the equations

Node P : −i1 +i2 −i3 = 0,

Node Q : i1 −i2 +i3 = 0,

Left loop : 5i1 +2i2 = 20,

Right loop : 2i2 +3i3 = 11.

2

Example 2.4 (Linear equations in four variables)

In order to balance the following chemical reaction

x1 CH4 + x2 O2 −→ y1 CO2 + y2 H2O,

the number of molecules x1, x2, y1, y2 must satisfy

x1 −y1 = 0,

4x1 −2y2 = 0,

2x2 −2y1 −y2 = 0.

We remind you that there is no geometrical meaning for a system with more than three variables. 2
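Since the balancing system is homogeneous, its solutions form the null space of the coefficient matrix, which can be read off from the singular value decomposition. This NumPy sketch is our aside, not part of the guide:

```python
import numpy as np

# Coefficient matrix of the homogeneous system in (x1, x2, y1, y2).
A = np.array([[1, 0, -1,  0],
              [4, 0,  0, -2],
              [0, 2, -2, -1]], dtype=float)

# A has rank 3, so its null space is one-dimensional; the last
# right-singular vector of the SVD spans it.
v = np.linalg.svd(A)[2][-1]
coeffs = v / v[0]          # scale so that x1 = 1
# coeffs = [1, 2, 1, 2]:  CH4 + 2 O2 -> CO2 + 2 H2O
```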

Referring to the systems in Examples 2.3–2.4, the first system (four equations with three variables) is a consistent and independent system with the unique solution i1 = 78/31, i2 = 115/31, i3 = 37/31. The second system (three equations with four variables) is a consistent and dependent system with an infinite number of solutions. Remark that for these two examples the method of substitution is not as efficient as for a system of two equations with two variables. Students may try it out. Indeed, we compute the solutions by a more general and efficient method called Gaussian elimination. This method will be discussed in Section 2.4 (page 48). But before we go into the details of this new method, we need to borrow some terminology from matrices and express the problem from another point of view.

Let us recall that a linear equation is

a1x1 + a2x2 + · · · + anxn = b.

We call a1, a2, · · · , an the coefficients of the equation, and x1, x2, · · · , xn the variables of the equation.

A system of linear equations is a list of linear equations. For example, a system of four equations with three variables (we say a “four by three system”) is

a11x1 + a12x2 + a13x3 = b1,
a21x1 + a22x2 + a23x3 = b2,
a31x1 + a32x2 + a33x3 = b3,
a41x1 + a42x2 + a43x3 = b4.

(2.1)

The system is completely characterized (besides the notation for the variables) by the following rectangular array of numbers, called the coefficient matrix

A =
    [ a11  a12  a13 ]
    [ a21  a22  a23 ]
    [ a31  a32  a33 ]
    [ a41  a42  a43 ]


and the right side

b =
    [ b1 ]
    [ b2 ]
    [ b3 ]
    [ b4 ].

The system also corresponds to the augmented matrix

[A b] =
    [ a11  a12  a13  b1 ]
    [ a21  a22  a23  b2 ]
    [ a31  a32  a33  b3 ]
    [ a41  a42  a43  b4 ].

We emphasize the following correspondence

columns of A ⇐⇒ variables,

rows of [A b] ⇐⇒ equations.    (2.2)

Example 2.5 (Augmented matrix)

The augmented matrices of the systems in Examples 2.3 and 2.4 are respectively

    [ −1   1  −1    0 ]
    [  1  −1   1    0 ]
    [  5   2   0   20 ]
    [  0   2   3   11 ]

and

    [ 1  0  −1   0  0 ]
    [ 4  0   0  −2  0 ]
    [ 0  2  −2  −1  0 ].

2

In terms of matrices, the system (2.1) is equivalent to simply Ax = b, where x = (x1, x2, x3). Next we review some basic operations on matrices which will be useful in our later discussion of solving the general system Ax = b by Gaussian elimination. In Chapter 1 we confined our attention to the world of matrices; now we will focus on the techniques of solving systems of linear equations.


2.4 Gaussian Elimination

The method of substitution may work well for systems involving two variables. However, it is not easily extended to larger systems. In fact, the most efficient and practical way of solving systems of linear equations is through an algorithm called Gaussian elimination. The idea is to simplify the system by creating as many zeros as possible among the coefficients. The solution may then be easily deduced from the simplified system. Gaussian elimination is probably the most important method of solution. It readily generalizes to larger systems and forms the basis for computer-based solution methods. As a matter of fact, all the important computer packages nowadays for solving systems of linear equations are based on Gaussian elimination.

If you have learned before to use Cramer's rule to solve systems of linear equations, then you should forget this old method as soon as possible. Cramer's rule is a limited method because of the following:

1. The computation of determinants is much more complicated than elimination.

2. Cramer’s rule cannot solve underdetermined (such as 2 equations, 3 variables) or overdetermined (such as 3 equations, 2 variables) systems.

Gaussian elimination has no such limitations and indeed it works for any linear system of m equations with n variables, where m < n, m = n, m > n are all permissible.

Now, with some knowledge of matrix notation and operations (details can be found in Chapter 1) we are going to introduce the method of Gaussian elimination formally. But, first of all, we would also like to give you some insight into how and why the method works. Let us read a very simple example.

Example 2.6 (Equivalent equations)

To solve an equation such as 2x − 5 = 3, we perform permissible operations on the equation until we reach an equivalent equation whose solution is obvious.

2x − 5 = 3,

2x − 5 + 5 = 3 + 5,

2x = 8,

2x/2 = 8/2,

x = 4.

Recall that we added 5 to both sides in the second step and divided both sides by 2 in the fourth step. These two operations simply convert the equation 2x − 5 = 3 to an equivalent equation x = 4. The solution follows as we wish. 2

The following theorem indicates that we can solve systems of linear equations in a similar manner.

Theorem 2.4.1 (Three operations that produce equivalent systems) A system of linear equations is transformed into an equivalent system if

(A) Two equations are interchanged.

(B) An equation is multiplied by a nonzero constant.

(C) A constant multiple of one equation is added to another equation.
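The three operations translate directly into code. In the sketch below (the function names are ours) each operation acts in place on a NumPy matrix whose rows play the role of equations:

```python
import numpy as np

def interchange(M, i, j):
    """(A) interchange rows i and j (0-based indices)."""
    M[[i, j]] = M[[j, i]]

def scale(M, i, r):
    """(B) multiply row i by a nonzero constant r."""
    assert r != 0
    M[i] = r * M[i]

def add_multiple(M, i, r, j):
    """(C) add r times row i to row j."""
    M[j] = M[j] + r * M[i]

# The augmented matrix of the system in Example 2.7, and the
# operation 2R2 + R1 (add twice the second row to the first):
M = np.array([[ 2.0, -1.0, -1.0],
              [-1.0,  1.0,  2.0]])
add_multiple(M, 1, 2.0, 0)     # the first row becomes [0, 1, 3]
```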


Any one of the three operations in Theorem 2.4.1 can be used to produce an equivalent system. The basic principle here is to transform a given (complicated) system into an equivalent but “simpler” system through a series of operations. Based on this principle we have to eliminate as many variables as possible.

A system Ax = b of linear equations is completely characterized by its augmented matrix [A b], as we mentioned in Section 2.3 (page 46). Gaussian elimination can then be considered as a manipulation of the augmented matrix. In the following examples, we indicate the relation by writing systems and augmented matrices side by side. Theorem 2.4.1 is used in the examples in the sense that rows of [A b] correspond to equations.

Example 2.7 (2 × 2 system with unique solution)

Solve the system of linear equations

2x1 − x2 = −1,
−x1 + x2 = 2.

Solution Let us solve the system

2x1 − x2 = −1,
−x1 + x2 = 2.

    [  2  −1  −1 ]
    [ −1   1   2 ]

We do the row operation 2R2 + R1 (add twice the second equation/row to the first equation/row) to obtain

x2 = 3,
−x1 + x2 = 2.

    [  0  1  3 ]
    [ −1  1  2 ]

We may further do R1 ↔ R2 (exchange the first and the second equations/rows) so that the equations are listed from the most complicated to the simplest

−x1 + x2 = 2,
x2 = 3.

    [ −1  1  2 ]
    [  0  1  3 ]

Substituting x2 = 3 into the first equation, we get x1 = x2 − 2 = 1. 2

Example 2.8 (3 × 3 system with unique solution)

Solve the system of linear equations

x1 +x2 +x3 = 1,

2x1 +3x2 +2x3 = 1,

3x1 +8x2 +2x3 = 2.

Solution Let us solve the system

x1 +x2 +x3 = 1,

2x1 +3x2 +2x3 = 1,

3x1 +8x2 +2x3 = 2.

    [ 1  1  1  1 ]
    [ 2  3  2  1 ]
    [ 3  8  2  2 ]

We do −2R1 + R2 and −3R1 + R3 to get

x1 +x2 +x3 = 1,

x2 = −1,

5x2 −x3 = −1.

    [ 1  1   1   1 ]
    [ 0  1   0  −1 ]
    [ 0  5  −1  −1 ]


Then we do −5R2 + R3 to get

x1 +x2 +x3 = 1,

x2 = −1,

−x3 = 4.

    [ 1  1   1   1 ]
    [ 0  1   0  −1 ]
    [ 0  0  −1   4 ]

This corresponds to x2 = −1 and x3 = −4. Substituting them into the first equation, we have x1 = 6. 2

Example 2.9 (3 × 3 system without solution)

Solve the system of linear equations

x1 +x2 +x3 = 2,

x1 +3x2 −x3 = 4,

x2 −x3 = 2.

Solution We simplify the augmented matrix as follows

    [ 1  1   1  2 ]
    [ 1  3  −1  4 ]
    [ 0  1  −1  2 ]

−R1 + R2:

    [ 1  1   1  2 ]
    [ 0  2  −2  2 ]
    [ 0  1  −1  2 ]

(1/2)R2:

    [ 1  1   1  2 ]
    [ 0  1  −1  1 ]
    [ 0  1  −1  2 ]

−R2 + R3:

    [ 1  1   1  2 ]
    [ 0  1  −1  1 ]
    [ 0  0   0  1 ].

Interpreted as a system of linear equations, we have

x1 +x2 +x3 = 2,

x2 −x3 = 1,

0 = 1.

The last equation is a contradiction which implies that the system has no solution. 2

Example 2.10 (3 × 3 system with non-unique solutions)

Solve the system of linear equations

x1 −x2 +3x3 = 3,

3x1 +x2 +x3 = 5,

x2 −2x3 = −1.

Solution We simplify the augmented matrix as follows

    [ 1  −1   3   3 ]
    [ 3   1   1   5 ]
    [ 0   1  −2  −1 ]

−3R1 + R2:

    [ 1  −1   3   3 ]
    [ 0   4  −8  −4 ]
    [ 0   1  −2  −1 ]

−4R3 + R2:

    [ 1  −1   3   3 ]
    [ 0   0   0   0 ]
    [ 0   1  −2  −1 ]

R2 ↔ R3:

    [ 1  −1   3   3 ]
    [ 0   1  −2  −1 ]
    [ 0   0   0   0 ].

Interpreted as a system of linear equations, we have

x1 −x2 +3x3 = 3,

x2 −2x3 = −1,

0 = 0.


The last equation is redundant. From the second equation, we have x2 = 2x3 − 1. Substituting this into the first equation, we have x1 = (2x3 − 1) − 3x3 + 3 = −x3 + 2. The system has non-unique solutions

(x1, x2, x3) = (−r + 2, 2r − 1, r), for any real number r.

2
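A parametrized solution set like this is easy to spot-check: substitute the general solution back into every equation for several values of the parameter r. A small sketch (the helper name is ours):

```python
# Verify the general solution of Example 2.10: for every real r, the triple
# (x1, x2, x3) = (-r + 2, 2r - 1, r) should satisfy all three equations.
def satisfies_example_2_10(r):
    x1, x2, x3 = -r + 2, 2 * r - 1, r
    return (x1 - x2 + 3 * x3 == 3
            and 3 * x1 + x2 + x3 == 5
            and x2 - 2 * x3 == -1)

# spot-check a few parameter values
print(all(satisfies_example_2_10(r) for r in (-3, 0, 1.5, 10)))   # prints True
```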


To summarize, Gaussian elimination is carried out through the following three operations.

Table 2.1: Three (row) operations used in the Gaussian elimination.

notation     description
rRi + Rj     adding r times the i-th equation/row to the j-th equation/row
rRi          multiplying the i-th equation/row by a constant r ≠ 0
Ri ↔ Rj      exchanging the i-th equation/row with the j-th equation/row

When applied to matrices, the three operations are also called row operations.

The three operations do not change the solutions of the system of equations. By making use of the three operations, the system may eventually be simplified to an “upper triangular shape”, with all entries below the diagonal equal to zero.

Then one may solve the system by backward substitution (i.e., solving the system from the bottom up, finding the values of more and more variables).
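The whole procedure, elimination to upper triangular shape followed by backward substitution, can be sketched in a few lines. The version below (our illustration, not the guide's own algorithm) uses exact fractions and handles only square systems with a unique solution:

```python
from fractions import Fraction

def solve_by_elimination(rows):
    """Gaussian elimination with backward substitution, applied to the
    augmented matrix [A b] of an n x n system with a unique solution."""
    M = [[Fraction(x) for x in row] for row in rows]
    n = len(M)
    for k in range(n):
        # operation (A): bring a row with a nonzero pivot into position k
        p = next(i for i in range(k, n) if M[i][k] != 0)
        M[k], M[p] = M[p], M[k]
        # operation (C): create zeros below the pivot
        for i in range(k + 1, n):
            r = M[i][k] / M[k][k]
            M[i] = [a - r * b for a, b in zip(M[i], M[k])]
    # backward substitution, from the bottom up
    x = [Fraction(0)] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

# The system of Example 2.8, whose solution is x = (6, -1, -4):
solution = solve_by_elimination([[1, 1, 1, 1], [2, 3, 2, 1], [3, 8, 2, 2]])
```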

Example 2.11 (4 × 3 system with unique solution)

Solve the system of linear equations

x1 −x2 +x3 = 0,

−x1 +x2 −x3 = 0,

10x1 +4x2 = 15,

2x2 +3x3 = 11.

Solution Let us solve the system. We simplify the augmented matrix as follows

    [  1  −1   1   0 ]
    [ −1   1  −1   0 ]
    [ 10   4   0  15 ]
    [  0   2   3  11 ]

−→

    [ 1  −1    1   0 ]
    [ 0  14  −10  15 ]
    [ 0   2    3  11 ]
    [ 0   0    0   0 ]

−→

    [ 1  −1    1    0 ]
    [ 0   2    3   11 ]
    [ 0   0  −31  −62 ]
    [ 0   0    0    0 ]

−→

    [ 1  −1  1   0 ]
    [ 0   2  3  11 ]
    [ 0   0  1   2 ]
    [ 0   0  0   0 ].

The last row is redundant while the third row implies that x3 = 2. Substituting this value into the second row, we have x2 = 5/2. Substituting the values of x2 and x3 into the first row, we have x1 = 1/2. Hence the system has the unique solution

(x1, x2, x3) = (1/2, 5/2, 2).


2

Having discussed how to simplify a system of linear equations, we may ask: how far can a system be simplified? The answer to this question tells us a lot about the solutions of the system, particularly the existence and uniqueness of solutions.

• Row Echelon Form

In Examples 2.7, 2.8, 2.9, and 2.10, the systems (or rather, the augmented matrices [A b]) are reduced to the following shapes

    [ ■ ∗ ]    [ ■ ∗ ∗ ∗ ]    [ ■ ∗ ∗ ∗ ]    [ ■ ∗ ∗ ∗ ]
    [ 0 ■ ],   [ 0 ■ ∗ ∗ ],   [ 0 ■ ∗ ∗ ],   [ 0 ■ ∗ ∗ ],      (2.3)
               [ 0 0 ■ ∗ ]    [ 0 0 0 ■ ]    [ 0 0 0 0 ]

where ■ are nonzero numbers, and ∗ are any numbers. These are the simplest shapes you can get by the three row operations. These are called the row echelon forms of [A b]. Generally speaking, a matrix is said to be in row echelon form if the following two conditions hold (where a leading nonzero entry of a row of the matrix is the first nonzero element in the row counted from the left):

(1) All zero rows, if any, are at the bottom of the matrix.

(2) Each leading nonzero entry in a row is to the right of the leading nonzero entry in the preceding row.
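Conditions (1) and (2) can be checked mechanically. The helper below is our own sketch (not from the guide); it scans the leading nonzero entries from top to bottom.

```python
def leading_index(row):
    """Column of the leading nonzero entry of a row, or None for a zero row."""
    return next((j for j, a in enumerate(row) if a != 0), None)

def is_row_echelon(M):
    """Conditions (1) and (2): zero rows at the bottom, and each leading
    entry strictly to the right of the leading entry of the row above."""
    prev = -1
    zero_row_seen = False
    for row in M:
        lead = leading_index(row)
        if lead is None:
            zero_row_seen = True          # from here on, only zero rows allowed
        elif zero_row_seen or lead <= prev:
            return False
        else:
            prev = lead
    return True

# the two matrices of Example 2.15 below are not in row echelon form:
print(is_row_echelon([[1, 0, 1, 0], [0, 0, 2, 1], [0, -1, 0, 0]]))  # False
print(is_row_echelon([[1, 0, 1, 0], [0, 0, 2, 1], [0, 0, -1, 0]]))  # False
```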

Example 2.12 (Row echelon form)

The following are all the possible row echelon forms of 2 × 2 matrices (⇐⇒ simplest shapes of 2 linear equations with 1 variable).

    [ ■ ∗ ]    [ ■ ∗ ]    [ 0 ■ ]    [ 0 0 ]
    [ 0 ■ ],   [ 0 0 ],   [ 0 0 ],   [ 0 0 ].

The following are all the possible row echelon forms of 3 × 3 matrices (⇐⇒ simplest shapes of 3 linear equations with 2 variables).

    [ ■ ∗ ∗ ]    [ ■ ∗ ∗ ]    [ ■ ∗ ∗ ]    [ 0 ■ ∗ ]
    [ 0 ■ ∗ ],   [ 0 ■ ∗ ],   [ 0 0 ■ ],   [ 0 0 ■ ],
    [ 0 0 ■ ]    [ 0 0 0 ]    [ 0 0 0 ]    [ 0 0 0 ]

    [ ■ ∗ ∗ ]    [ 0 ■ ∗ ]    [ 0 0 ■ ]    [ 0 0 0 ]
    [ 0 0 0 ],   [ 0 0 0 ],   [ 0 0 0 ],   [ 0 0 0 ].
    [ 0 0 0 ]    [ 0 0 0 ]    [ 0 0 0 ]    [ 0 0 0 ]

2

In a row echelon form, the entry denoted by ■ is located further and further to the right as one goes from the top to the bottom. We call these entries the pivots of the matrix, and the columns containing the pivots the pivot columns of the matrix.

Example 2.13 (Row echelon form)

The following is the row echelon form for the augmented matrix in Example 2.9

    [ ■ ∗ ∗ ∗ ]
    [ 0 ■ ∗ ∗ ]
    [ 0 0 0 ■ ].

The pivots are the (1, 1)-, (2, 2)-, and (3, 4)-entries. The pivot columns are the first, second, and the fourth columns. 2


Example 2.14 (Row echelon form)

The following is a more complicated row echelon form

    [ 0 ■ ∗ ∗ ∗ ∗ ∗ ]
    [ 0 0 0 ■ ∗ ∗ ∗ ]
    [ 0 0 0 0 ■ ∗ ∗ ]
    [ 0 0 0 0 0 0 ■ ]
    [ 0 0 0 0 0 0 0 ].

The pivots are the (1, 2)-, (2, 4)-, (3, 5)-, and (4, 7)-entries. The pivot columns are the second, fourth, fifth, and the seventh columns. 2

Example 2.15 (Non-row echelon form)

The matrix

    [ 1  0 1 0 ]
    [ 0  0 2 1 ]
    [ 0 −1 0 0 ]

is not in row echelon form because the ■ for the second row is after the ■ for the third row. If we exchange the second and the third rows (so that the matrix is further simplified), then we do get a row echelon form.

Similarly, the matrix

    [ 1 0  1 0 ]
    [ 0 0  2 1 ]
    [ 0 0 −1 0 ]

is not in row echelon form because the ■ for the second row is not before the ■ for the third row. Exchanging rows will never give us a row echelon form. We have to use row operations such as 2R3 + R2 and R2 ↔ R3 (or simply the single operation (1/2)R2 + R3) to further simplify the matrix in order to get a row echelon form. 2


2.5 (Reduced) Row Echelon Form

Row echelon form is a rather simple shape one can reach by performing row operations. What is the ultimate, simplest matrix one can get?

Example 2.16 (Reduced row echelon form)

The system in Example 2.8 (page 49) has been simplified to the following row echelon form

    [ 1 1  1  1 ]
    [ 0 1  0 −1 ]
    [ 0 0 −1  4 ].

We reduce the coefficient of the (3, 3)-pivot to 1 by doing −R3

    [ 1 1 1  1 ]
    [ 0 1 0 −1 ]
    [ 0 0 1 −4 ].

We cancel the entries above the (3, 3)-pivot by doing −R3 + R1

    [ 1 1 0  5 ]
    [ 0 1 0 −1 ]
    [ 0 0 1 −4 ].

We cancel the entries above the (2, 2)-pivot by doing −R2 + R1

    [ 1 0 0  6 ]
    [ 0 1 0 −1 ]
    [ 0 0 1 −4 ].

Interpreting the above (reduced) row echelon form as a system of linear equations, we have

    x1 = 6,    x2 = −1,    x3 = −4,

which immediately gives the unique solution of the original system. 2

Example 2.17 (Reduced row echelon form)

The system in Example 2.10 has been simplified to the following row echelon form

    [ 1 −1  3  3 ]
    [ 0  1 −2 −1 ]
    [ 0  0  0  0 ].

We use the (2, 2)-pivot to cancel the entry above it by doing R2 + R1

    [ 1 0  1  2 ]
    [ 0 1 −2 −1 ]
    [ 0 0  0  0 ].

Interpreting the above (reduced) row echelon form as a system of linear equations, we have

    x1 + x3 = 2,
    x2 − 2x3 = −1,
    0 = 0,

which gives the non-unique solutions of the system x1 = −r + 2, x2 = 2r − 1, x3 = r, with r arbitrary. 2


In similar ways, we may further reduce the row echelon forms (2.3) to the following

    [ 1 0 ]    [ 1 0 0 ∗ ]    [ 1 0 ∗ 0 ]    [ 1 0 ∗ ∗ ]
    [ 0 1 ],   [ 0 1 0 ∗ ],   [ 0 1 ∗ 0 ],   [ 0 1 ∗ ∗ ],
               [ 0 0 1 ∗ ]    [ 0 0 0 1 ]    [ 0 0 0 0 ]

where the ■ are reduced to 1, the entries above ■ are reduced to 0, and ∗ are again any numbers. These are the simplest matrices you can get by the three row operations, and we call them the reduced row echelon forms of [A b].


Generally speaking, a matrix is said to be in reduced row echelon form if it is in row echelon form, that is, if it satisfies the two conditions (1) and (2) (on page 53), and if it satisfies the following two additional conditions:

(3) Each pivot (leading nonzero entry) is equal to 1.

(4) Each pivot is the only nonzero entry in its column.

By comparing Examples 2.16 and 2.17 with Examples 2.8 and 2.10, we observe that the advantage of reduced row echelon form is that we can write down the solutions directly, without any backward substitution.

    Reduced row echelon form of [A b] ⇐⇒ solution of Ax = b.

This is called the method of reduction.
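The method of reduction can be automated. The following Gauss-Jordan sketch is our own code (not from the guide); it reaches the reduced row echelon form using exactly the three row operations, with exact `Fraction` arithmetic.

```python
from fractions import Fraction

def rref(M):
    """Reduce M to reduced row echelon form by Gauss-Jordan elimination,
    using only the three row operations and exact Fraction arithmetic."""
    M = [[Fraction(a) for a in row] for row in M]
    pivot_row = 0
    for col in range(len(M[0])):
        # find a usable pivot at or below pivot_row
        r = next((i for i in range(pivot_row, len(M)) if M[i][col] != 0), None)
        if r is None:
            continue                                    # no pivot in this column
        M[pivot_row], M[r] = M[r], M[pivot_row]         # Ri <-> Rj
        p = M[pivot_row][col]
        M[pivot_row] = [a / p for a in M[pivot_row]]    # (1/p)Ri
        for i in range(len(M)):                         # rRi + Rj
            if i != pivot_row and M[i][col] != 0:
                f = M[i][col]
                M[i] = [a - f * b for a, b in zip(M[i], M[pivot_row])]
        pivot_row += 1
    return M

# the row echelon form of Example 2.16 reduces to the RREF found there:
R = rref([[1, 1, 1, 1], [0, 1, 0, -1], [0, 0, -1, 4]])
print([[int(a) for a in row] for row in R])   # [[1, 0, 0, 6], [0, 1, 0, -1], [0, 0, 1, -4]]
```

Each inner step is one of the three operations from Table 2.1, so the solution set is preserved throughout.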

Example 2.18 (Reduced row echelon form)

The following are all the possible reduced row echelon forms of 2 × 2 matrices.

    [ 1 0 ]    [ 1 ∗ ]    [ 0 1 ]    [ 0 0 ]
    [ 0 1 ],   [ 0 0 ],   [ 0 0 ],   [ 0 0 ].

The following are all the possible reduced row echelon forms of 3 × 3 matrices.

    [ 1 0 0 ]    [ 1 0 ∗ ]    [ 1 ∗ 0 ]    [ 0 1 0 ]
    [ 0 1 0 ],   [ 0 1 ∗ ],   [ 0 0 1 ],   [ 0 0 1 ],
    [ 0 0 1 ]    [ 0 0 0 ]    [ 0 0 0 ]    [ 0 0 0 ]

    [ 1 ∗ ∗ ]    [ 0 1 ∗ ]    [ 0 0 1 ]    [ 0 0 0 ]
    [ 0 0 0 ],   [ 0 0 0 ],   [ 0 0 0 ],   [ 0 0 0 ].
    [ 0 0 0 ]    [ 0 0 0 ]    [ 0 0 0 ]    [ 0 0 0 ]

2

Example 2.19 (Reduced row echelon form)

The following are the reduced row echelon forms of the matrices in Examples 2.13 and 2.14 (page 53), respectively.

    [ 1 0 ∗ 0 ]    [ 0 1 ∗ 0 0 ∗ 0 ]
    [ 0 1 ∗ 0 ],   [ 0 0 0 1 0 ∗ 0 ]
    [ 0 0 0 1 ]    [ 0 0 0 0 1 ∗ 0 ]
                   [ 0 0 0 0 0 0 1 ]
                   [ 0 0 0 0 0 0 0 ].

2


2.6 Elementary Matrices

2.7 Existence and Uniqueness

In solving a specific system Ax = b of linear equations, we have to first clarify

Existence: Does the system have solutions?

If the answer is no, then the system has no solution. If the answer is yes, then we may further clarify

Uniqueness: Does the system have exactly one solution, or more than one?

If there is exactly one solution, then the system has a unique solution. If there are many, then the system has non-unique solutions.

Thus a basic question about a system Ax = b is which of the three possibilities (no solution, unique solution, non-unique solutions) holds. This question can be answered from the row echelon form of the augmented matrix [A b]. Furthermore, the solution (if any) of the system is immediately given by the reduced row echelon form of [A b].

Example 2.20 (Row echelon form vs. existence and uniqueness)

Let us recall again the row echelon forms of the systems of three linear equations with three variables in Examples 2.8, 2.9 and 2.10 (page 49). They are respectively

    [ 1 1  1  1 ]      [ 1 1  1 2 ]      [ 1 −1  3  3 ]
    [ 0 1  0 −1 ],     [ 0 1 −1 1 ],     [ 0  1 −2 −1 ].
    [ 0 0 −1  4 ]      [ 0 0  0 1 ]      [ 0  0  0  0 ]

The corresponding reduced row echelon forms are

    [ 1 0 0  6 ]      [ 1 0  2 0 ]      [ 1 0  1  2 ]
    [ 0 1 0 −1 ],     [ 0 1 −1 0 ],     [ 0 1 −2 −1 ].      (2.4)
    [ 0 0 1 −4 ]      [ 0 0  0 1 ]      [ 0 0  0  0 ]

The corresponding systems of linear equations and their solutions are

    x1 = 6,  x2 = −1,  x3 = −4   =⇒  unique solution: x1 = 6, x2 = −1, x3 = −4,

    x1 + 2x3 = 0,  x2 − x3 = 0,  0 = 1   =⇒  no solution (“0 = 1” is a contradiction),

    x1 + x3 = 2,  x2 − 2x3 = −1,  0 = 0   =⇒  non-unique solutions: x1 = −r + 2, x2 = 2r − 1, x3 = r.

2


In general, given a system Ax = b of linear equations, we first find the row echelon form of the augmented matrix [A b], from which we can determine which columns of [A b] are pivot and which are not.

Theorem 2.7.1 The existence and uniqueness of the solutions of Ax = b is determined by the pivot columns of [A b] as follows.

    Pivot columns of [A b]                        Existence and uniqueness for Ax = b

    1. b pivot                                    no solution
    2. b not pivot, all A-columns pivot           unique solution
    3. b not pivot, not all A-columns pivot       non-unique solutions

Let us explain Example 2.20 in more detail based on the above theorem. By (2.4) we circle (here, mark with parentheses) the pivots in the reduced row echelon forms of the corresponding augmented matrices:

    [ (1)  0   0   6 ]      [ (1)  0   2   0  ]      [ (1)  0   1   2 ]
    [  0  (1)  0  −1 ],     [  0  (1) −1   0  ],     [  0  (1) −2  −1 ].
    [  0   0  (1) −4 ]      [  0   0   0  (1) ]      [  0   0   0   0 ]

For the first matrix above, since the last column (i.e., the b-column) is not a pivot column, the corresponding system has solutions. Meanwhile, since the first, second, and third columns (i.e., all the A-columns) are pivot columns, the solution is unique. Therefore the system has a single solution. For the second matrix above, since the last column (i.e., the b-column) is a pivot column, the system has no solution. For the third matrix above, since the last column (i.e., the b-column) is not a pivot column and the third column is also not a pivot column (i.e., not all A-columns are pivot columns), the system has non-unique solutions.
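Theorem 2.7.1 translates directly into code. This sketch (ours, not from the guide) reads the pivot columns off a reduced row echelon form of [A b] and returns one of the three verdicts:

```python
def classify(R):
    """Classify A x = b from a (reduced) row echelon form R of [A b]."""
    last = len(R[0]) - 1                      # index of the b-column
    pivots = set()
    for row in R:
        lead = next((j for j, a in enumerate(row) if a != 0), None)
        if lead is not None:
            pivots.add(lead)
    if last in pivots:
        return "no solution"                  # case 1: b is a pivot column
    if pivots == set(range(last)):
        return "unique solution"              # case 2: every A-column is pivot
    return "non-unique solutions"             # case 3

# the three matrices of (2.4):
print(classify([[1, 0, 0, 6], [0, 1, 0, -1], [0, 0, 1, -4]]))   # unique solution
print(classify([[1, 0, 2, 0], [0, 1, -1, 0], [0, 0, 0, 1]]))    # no solution
print(classify([[1, 0, 1, 2], [0, 1, -2, -1], [0, 0, 0, 0]]))   # non-unique solutions
```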

Example 2.21 (Determine existence and uniqueness)

For which b1, b2, b3 does the following system of linear equations have solutions?

x1 +4x2 = b1,

2x1 +5x2 = b2,

3x1 +6x2 = b3.

Solution We do the row operations

    [ 1 4 b1 ]
    [ 2 5 b2 ]
    [ 3 6 b3 ]

  −2R1 + R2, −3R1 + R3 −→

    [ 1  4  b1 ]
    [ 0 −3  b2 − 2b1 ]
    [ 0 −6  b3 − 3b1 ]

  −2R2 + R3 −→

    [ 1  4  b1 ]
    [ 0 −3  b2 − 2b1 ]
    [ 0  0  b1 − 2b2 + b3 ].

Thus we conclude that

the system has solutions ⇐⇒ the last column is not pivot ⇐⇒ b1 − 2b2 + b3 = 0.

2
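The criterion can be confirmed numerically. In this sketch (our own check, not from the guide), we solve the first two equations for x1, x2 (their 2 × 2 coefficient block is invertible) and then test the third equation; the result agrees with b1 − 2b2 + b3 = 0 on every sample right-hand side.

```python
from fractions import Fraction

def consistent(b1, b2, b3):
    """Example 2.21: solve the first two equations, then test the third."""
    b1, b2, b3 = Fraction(b1), Fraction(b2), Fraction(b3)
    x2 = (2 * b1 - b2) / 3        # from the eliminated row: -3 x2 = b2 - 2 b1
    x1 = b1 - 4 * x2              # back substitution into x1 + 4 x2 = b1
    return 3 * x1 + 6 * x2 == b3  # does the third equation hold?

for b in [(1, 2, 3), (0, 1, 2), (1, 0, 0), (5, 4, 3)]:
    assert consistent(*b) == (b[0] - 2 * b[1] + b[2] == 0)
print("criterion confirmed on all samples")
```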

Example 2.22 (Determine existence and uniqueness)


For which values of a does the following system of linear equations

x1 +x2 −x3 = −2,

x1 −ax3 = −1,

x1 +ax2 = −1

have no solution, a unique solution, or non-unique solutions?

Solution We do the row operations

    [ 1 1 −1 −2 ]
    [ 1 0 −a −1 ]
    [ 1 a  0 −1 ]

  −R1 + R2, −R1 + R3 −→

    [ 1    1    −1    −2 ]
    [ 0   −1   1 − a   1 ]
    [ 0  a − 1   1     1 ]

  (a − 1)R2 + R3 −→

    [ 1  1    −1      −2 ]
    [ 0 −1   1 − a     1 ]
    [ 0  0  2a − a²    a ].

Thus we conclude that if

1. 2a − a² ≠ 0 (⇐⇒ a ≠ 0, 2), then we have

    [ 1 1 −1 −2 ]        [ (1)   1       −1      −2 ]
    [ 1 0 −a −1 ]  −→    [  0  (−1)     1 − a     1 ]
    [ 1 a  0 −1 ]        [  0    0   (2a − a²)    a ].

This shows that all except the last column are pivot and hence the system has a unique solution.

2. a = 0, then we have

    [ 1 1 −1 −2 ]        [ (1)   1   −1  −2 ]
    [ 1 0 −a −1 ]  −→    [  0  (−1)   1   1 ]
    [ 1 a  0 −1 ]        [  0    0    0   0 ].

This shows that the third and the last columns are not pivot and hence the system has non-unique solutions.

3. a = 2, then we have

    [ 1 1 −1 −2 ]        [ (1)   1   −1   −2  ]
    [ 1 0 −a −1 ]  −→    [  0  (−1)  −1    1  ]
    [ 1 a  0 −1 ]        [  0    0    0   (2) ].

This shows that the last column is pivot and hence the system has no solution. 2


2.8 Homogeneous Systems

A linear system is called homogeneous if all the constant terms are zero, like

x1 + 2x2 − x3 = 0,

2x1 − x2 + 5x3 = 0,

3x1 + 3x2 − x3 = 0.

Using matrix-vector notation, we can rewrite the above homogeneous linear system as Ax = 0, where

        [ 1  2 −1 ]        [ x1 ]        [ 0 ]
    A = [ 2 −1  5 ],   x = [ x2 ],   0 = [ 0 ].
        [ 3  3 −1 ]        [ x3 ]        [ 0 ]

In general, when A is an m × n coefficient matrix, x will be a vector of size n, and 0 will be a vector of size m (called the zero vector of size m). We will use the same notation 0 for zero vectors of different sizes, as the actual size of the vector is usually clear from the context.

A homogeneous system is always consistent, as one can easily find a solution for it, namely

    x1 = x2 = · · · = xn = 0.

Such a solution x = 0 (note that 0 is now an n-vector) will be called the zero solution. Because it is trivial to find this solution, we sometimes call it the trivial solution (of the homogeneous system). Therefore, given a homogeneous system, we are interested in whether it has a non-trivial solution, i.e., a solution in which not all the variables are zero.

Recall that the solution set of a linear system falls into three possible cases:

(i) No solution exists: this case cannot happen for a homogeneous system.

(ii) A unique solution: this case happens when all the variables are basic variables, and the unique solution must be the zero solution.

(iii) Infinitely many solutions: this case happens when there is at least one free variable.

So non-trivial solutions exist only in case (iii). Recall that this happens when the rank of the coefficient matrix A is smaller than the total number of variables. In particular, this is always the case in the following theorem.

Theorem 2.8.1 If a homogeneous system Ax = 0 has more unknowns than equations, then it has non-trivial solutions.

Proof: Let the size of the coefficient matrix A be m × n (m rows and n columns), where m < n. Recall the fact that we cannot have two pivot positions sitting in the same row, so A has at most m pivot positions, i.e., rank A ≤ m. Thus,

    number of free variables = n − rank A ≥ n − m > 0.

So there will be non-trivial solutions.

Remark. In case m ≥ n, no corresponding result is available (the system may or may not have non-trivial solutions). One needs to check the existence of free variables directly instead.
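A concrete instance of the theorem, with a 2 × 3 matrix of our own choosing (not from the guide): fix the free variable x3 = 1 and solve the remaining 2 × 2 system by Cramer's rule; the result is a non-trivial solution of Ax = 0.

```python
from fractions import Fraction

# m = 2 equations, n = 3 unknowns, so a non-trivial solution must exist.
A = [[1, 2, -1], [2, -1, 5]]

x3 = Fraction(1)                      # choose the free variable
# remaining system:  x1 + 2 x2 = x3,   2 x1 - x2 = -5 x3
a, b, c, d = 1, 2, 2, -1              # the 2 x 2 coefficient block
r1, r2 = x3, -5 * x3                  # right-hand sides
det = a * d - b * c                   # = -5, nonzero
x1 = (r1 * d - b * r2) / det          # Cramer's rule
x2 = (a * r2 - r1 * c) / det

x = [x1, x2, x3]
assert any(v != 0 for v in x)                                        # non-trivial
assert all(sum(ai * vi for ai, vi in zip(row, x)) == 0 for row in A) # solves A x = 0
print(x)   # [Fraction(-9, 5), Fraction(7, 5), Fraction(1, 1)]
```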


2.9 Fundamental solutions

The augmented matrix of a homogeneous system is of the form [A 0]. Performing row operations on the augmented matrix will not change the constant terms, so the resulting augmented matrix still represents a homogeneous system. In the reduced row echelon form (RREF), we can solve the system by assigning free parameters to the free variables, and then expressing the basic variables in terms of the free parameters. Let us consider a concrete example.

Example 2.23 (Homogeneous system)

Consider the following homogeneous system:

x1 +x2 −2x3 +3x4 +4x5 = 0,

2x1 +x2 −3x3 +8x4 +5x5 = 0,

x1 +x2 −2x3 +2x4 +2x5 = 0,

3x1 +2x2 −5x3 +10x4 +7x5 = 0.

Solution First we transform the augmented matrix to its RREF.

    [A 0] =

    [ 1 1 −2  3 4 0 ]
    [ 2 1 −3  8 5 0 ]
    [ 1 1 −2  2 2 0 ]
    [ 3 2 −5 10 7 0 ]

  −2R1 + R2, −R1 + R3, −3R1 + R4 −→

    [ 1  1 −2  3  4 0 ]
    [ 0 −1  1  2 −3 0 ]
    [ 0  0  0 −1 −2 0 ]
    [ 0 −1  1  1 −5 0 ]

  −R2 + R4 −→

    [ 1  1 −2  3  4 0 ]
    [ 0 −1  1  2 −3 0 ]
    [ 0  0  0 −1 −2 0 ]
    [ 0  0  0 −1 −2 0 ]

  −R3 + R4, −R3 −→

    [ 1  1 −2  3  4 0 ]
    [ 0 −1  1  2 −3 0 ]
    [ 0  0  0  1  2 0 ]
    [ 0  0  0  0  0 0 ]

  R2 + R1, −R2 −→

    [ 1 0 −1  5 1 0 ]
    [ 0 1 −1 −2 3 0 ]
    [ 0 0  0  1 2 0 ]
    [ 0 0  0  0 0 0 ]

  −5R3 + R1, 2R3 + R2 −→

    [ (1)  0  −1   0  −9 0 ]
    [  0  (1) −1   0   7 0 ]
    [  0   0   0  (1)  2 0 ]
    [  0   0   0   0   0 0 ].

Set x3 = s, x5 = t, then the solution of the system Ax = 0 can be represented as

x1 = s + 9t,

x2 = s − 7t,

x3 = s,

x4 = −2t,

x5 = t.

We may also write the solution in vector form:

        [ x1 ]       [ 1 ]       [  9 ]
        [ x2 ]       [ 1 ]       [ −7 ]
    x = [ x3 ]  =  s [ 1 ]  +  t [  0 ],   where s, t ∈ R.
        [ x4 ]       [ 0 ]       [ −2 ]
        [ x5 ]       [ 0 ]       [  1 ]

2

The variables can be expressed using s, t only, because the constant terms are all zeros. Here we observe two specific solutions:

         [ 1 ]                                 [  9 ]
         [ 1 ]                                 [ −7 ]
    x1 = [ 1 ]  (choose s = 1, t = 0),    x2 = [  0 ]  (choose s = 0, t = 1).
         [ 0 ]                                 [ −2 ]
         [ 0 ]                                 [  1 ]
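Both of these solutions can be checked against the original coefficient matrix (a quick verification of our own, not part of the guide):

```python
A = [[1, 1, -2, 3, 4],
     [2, 1, -3, 8, 5],
     [1, 1, -2, 2, 2],
     [3, 2, -5, 10, 7]]

f1 = [1, 1, 1, 0, 0]     # choose s = 1, t = 0
f2 = [9, -7, 0, -2, 1]   # choose s = 0, t = 1

for f in (f1, f2):
    # every equation of A x = 0 is satisfied exactly
    assert all(sum(a * v for a, v in zip(row, f)) == 0 for row in A)
print("both fundamental solutions satisfy A x = 0")
```

Any linear combination s·f1 + t·f2 then also satisfies Ax = 0, since the equations are homogeneous.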


We will call them fundamental solutions of the homogeneous system, as all other solutions of the system can be expressed in terms of them.

The results of the above example can be generalized to the following theorem.

Theorem 2.9.1 Let Ax = 0 be a homogeneous system with k free variables. Set s1, s2, · · · , sk to bethe corresponding free parameters. Then any solution of Ax = 0 can be expressed as

x = s1x1 + s2x2 + · · · + skxk,

where xi is the solution vector associated to the free parameter si (i.e., set si = 1 and all other sj = 0).

3 Definition The above solution vectors x1, x2, · · · , xk are said to form a set of fundamental solutions of the linear system produced by the row reduction algorithm. In case Ax = 0 has no free variable, we say that the system has no fundamental solution (but it still has the unique zero solution).

4 Definition An expression s1x1 + s2x2 + · · · + skxk (where s1, s2, · · · , sk are numbers) is called a linear combination of x1, x2, · · · , xk.

Example 2.24 (Fundamental solutions)

Find the fundamental solutions, if any, of the homogeneous systems with the following coefficient matrices.

    (i)   [ 1 1 −2 ]      (ii)   [ 1  2 −1 ]      (iii)   [ 1 2 0 1 0 0 ]
          [ 2 2  3 ],            [ 2 −1 −5 ],             [ 0 0 1 2 0 0 ]
                                 [ 3  1  1 ]              [ 0 0 0 0 1 0 ]
                                                          [ 0 0 0 0 0 0 ].

Solution We first note that in Example 2.23 we applied the row operations to the augmented matrix. If you read the computations carefully, you will see that the zero vector on the right side plays no role in the analysis. So we reach the same conclusion if we perform the row operations on the coefficient matrix only. For this example (and from now on), since the constant terms of a homogeneous system are all zeros, we work directly on the coefficient matrix instead of the augmented matrix (but keep in mind that the last column now corresponds to a variable). For (i),

    [ 1 1 −2 ]      [ 1 1 −2 ]      [ 1 1 0 ]
    [ 2 2  3 ]  →   [ 0 0  7 ]  →   [ 0 0 1 ].

Therefore, x1, x3 are basic variables, x2 is a free variable. Set x2 = s, then

    [ x1 ]   [ −s ]       [ −1 ]
    [ x2 ] = [  s ]  =  s [  1 ],   where s ∈ R.
    [ x3 ]   [  0 ]       [  0 ]

So there is one fundamental solution of the homogeneous system, namely the vector (−1, 1, 0). For (ii),

    [ 1  2 −1 ]      [ 1  2 −1 ]      [ 1  2 −1 ]
    [ 2 −1 −5 ]  →   [ 0 −5 −3 ]  →   [ 0 −5 −3 ].
    [ 3  1  1 ]      [ 0 −5  4 ]      [ 0  0  7 ]


We observe that all the variables of the system are basic variables, so there is no fundamental solution. For (iii), the coefficient matrix is already in RREF. Therefore, x1, x3, x5 are basic variables, and x2, x4, x6 are free variables. Set x2 = s, x4 = t, x6 = u; then

    [ x1 ]   [ −2s − t ]       [ −2 ]       [ −1 ]       [ 0 ]
    [ x2 ]   [    s    ]       [  1 ]       [  0 ]       [ 0 ]
    [ x3 ]   [   −2t   ]       [  0 ]       [ −2 ]       [ 0 ]
    [ x4 ] = [    t    ]  =  s [  0 ]  +  t [  1 ]  +  u [ 0 ],   where s, t, u ∈ R.
    [ x5 ]   [    0    ]       [  0 ]       [  0 ]       [ 0 ]
    [ x6 ]   [    u    ]       [  0 ]       [  0 ]       [ 1 ]

The above three vectors form a set of fundamental solutions of the homogeneous system. 2
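Again the answer is easy to verify (our own check, not part of the guide): each of the three vectors, and hence every linear combination of them, satisfies Ax = 0 for the matrix in (iii).

```python
A = [[1, 2, 0, 1, 0, 0],
     [0, 0, 1, 2, 0, 0],
     [0, 0, 0, 0, 1, 0],
     [0, 0, 0, 0, 0, 0]]

fundamental = [
    [-2, 1, 0, 0, 0, 0],    # s = 1, t = 0, u = 0
    [-1, 0, -2, 1, 0, 0],   # s = 0, t = 1, u = 0
    [0, 0, 0, 0, 0, 1],     # s = 0, t = 0, u = 1
]

for f in fundamental:
    assert all(sum(a * v for a, v in zip(row, f)) == 0 for row in A)

# an arbitrary linear combination, e.g. s = 2, t = -1, u = 3, is also a solution
combo = [2 * a - b + 3 * c for a, b, c in zip(*fundamental)]
assert all(sum(a * v for a, v in zip(row, combo)) == 0 for row in A)
print("all fundamental solutions verified")
```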


Chapter 2

Linear Equations (True or False)

2.1 If A is a square matrix and A² = I, then A = I or A = −I. 2

2.2 If AB = O, then A = O or B = O. 2

2.3 If A, B, C are square and ABC = O, then one of them is O. 2

2.4 If AB = AC, then B = C. 2

2.5 If A is nonzero and AB = AC, then B = C. 2

2.6 The square of a nonzero square matrix must be a nonzero matrix. 2

2.7 If AB = BA, then (A + B)³ = A³ + 3A²B + 3AB² + B³. 2

2.8 An invertible matrix must be a square matrix. 2

2.9 A non-square matrix can never be invertible. 2

2.10 If A has a zero row or a zero column, then A is not invertible. 2

2.11 If A is a square matrix which has no zero rows, then A is invertible. 2


2.12 Let A, B be invertible matrices of same size. Then AB is also invertible. 2

2.13 Let A, B be invertible matrices of same size. Then A + B is also invertible. 2

2.14 If AB is equal to the identity matrix, then A must be an invertible matrix. 2

2.15 A, B are square matrices. If AB = I, then BA = I. Hence, A is invertible. 2

2.16 For a square matrix A, AAᵗ = I if and only if AᵗA = I. 2

2.17 If AB is invertible, then BA is invertible. 2

2.18 If A² ≠ O, then A is invertible. 2

2.19 If A is invertible, then A² ≠ O. 2

2.20 If A is a square matrix and A² + 7A − I = O, then A is invertible. 2

2.21 A symmetric matrix must be a square matrix. 2

2.22 If A is symmetric, so are A⁻¹ (if it exists) and A³. 2

2.23 If B = AᵗA, then 2B is symmetric. 2

2.24 If A is symmetric, so is f(A), for any polynomial f(x). 2

2.25 det (A + B) = detA + detB. 2

2.26 det (kA) = k · detA, for any integer k. 2


2.27 detAᵗ = (−1) detA. 2

2.28 Three elementary row operations do not change the determinant of a square matrix. 2

2.29 A row replacement operation does not change the determinant of a matrix. 2

2.30 If A is row equivalent to B, then detA = detB. 2

2.31 If A is row equivalent to B, then detA and detB are either both zero or both nonzero. 2

2.32 The determinant of a triangular matrix is the sum of the entries on the main diagonal. 2

2.33 Let A be a square matrix without zero rows and columns. Then A must be row equivalent to the identity matrix of the same size. 2

2.34 If A, B are nonzero square matrices and A is row equivalent to B, then both A, B are invertible. 2

2.35 If A is an invertible matrix and B is row equivalent to A, then B is also invertible. 2

2.36 rank (cA) = rankA, for any scalar c. 2

2.37 If A is an m × n matrix with m < n, then rank A = m. 2

2.38 If A is an m × n matrix with m > n, then rank A = m. 2

2.39 An n × n matrix having rank n is invertible. 2

2.40 If m < n, then the system Am×nx = 0 always has a nontrivial solution. 2

2.41 If A is an m × n matrix with m > n, then Ax = 0 always has a nontrivial solution. 2


2.42 If rank Am×n = m with m < n, then Ax = b has solutions for all b. 2

2.43 If rank Am×n = m with m < n, then Ax = 0 has only trivial solution. 2

2.44 If rankA7×5 = 5, then the solution of Ax = 0 is unique. 2

2.45 If rankA7×5 = 5, then Ax = b has solutions for all b. 2

2.46 If Am×nx = b, with m < n, has solutions for all b, then rank A = m. 2

2.47 If Am×nx = b, with m < n, has solutions for all b, then rank A = n. 2

2.48 If Am×nx = b, with m < n, has solutions for all b, then Aᵗx = 0 has only trivial solution. 2

2.49 If Am×nx = b has solutions for all b ∈ Rm, then m ≤ n. 2

2.50 If Am×nx = 0 has only trivial solution, then m > n. 2

2.51 If Am×nx = b has solutions for all b ∈ Rm, then the rank of A is m. 2

2.52 If Am×nx = b has solutions for all b ∈ Rm, then the rank of A is n. 2

2.53 If Am×nx = 0 has only zero solution, then the rank of A is m. 2

2.54 If A3×5x = b has solutions for all b ∈ R3, then Aᵗx = 0 has only trivial solution. 2

2.55 If A and B have the same rank, then after finitely many elementary row or column operations we can turn A into B. 2

2.56 Let A be an n × n matrix without zero rows and columns. If the equation Ax = b has a solution for a nonzero vector b ∈ Rn, then the solution is unique. 2


2.57 Let A be an n × n matrix. If the equation Ax = b has a unique solution for a given nonzero vector b, then detA ≠ 0. 2

2.58 Let A be an m × n matrix. If the equation Ax = b has a unique solution for a nonzero vector b ∈ Rm, then the homogeneous equation Ax = 0 has only trivial solution. 2

2.59 Let A be an m × n matrix. The equation Ax = 0 has only trivial solution if A has a pivot position on each row. 2

2.60 Let A be an m × n matrix. If Ax = 0 has only trivial solution, then the rank of A is n. 2

2.61 Let A be an n × n matrix. The equation Ax = 0 has a nontrivial solution if and only if detA = 0. 2

2.62 Let A be an n × n matrix. If the equation Ax = 0 has only trivial solution, then A is row equivalent to the n × n identity matrix. 2

2.63 Let A be an n × n matrix. The equation Ax = 0 has only trivial solution if and only if detA ≠ 0. 2

2.64 Let A be a 3 × 5 matrix. If the general solution of Ax = 0 has two free parameters, then the rank of A is 2. 2

2.65 If A is an 11 × 17 matrix, and the general solution of Ax = 0 has 8 free variables, then rank A = 9. 2

2.66 If Ax = b has solutions for all b, then the rank of A is the number of columns of A. 2

2.67 If Ax = b has solutions for all b, then the rank of A is the number of rows of A. 2

2.68 Suppose A is a non-square matrix and Ax = b has solutions for all b. Then, the number of rows ofA must be larger than the number of columns of A. 2

2.69 If rank [A b] > rank A, then Ax = b has no solution. 2


2.70 Let A be a 3 × 4 matrix. If the first, second, fifth columns of [A b] are pivot, then Ax = b has non-unique solutions. 2

2.71 Let A be a square matrix. Then Ax = b has solutions for all b if Ax = 0 has only trivial solution. 2

2.72 The equation Ax = 0 has only trivial solution if A is an m × n matrix with m > n and does not contain a zero row. 2

2.73 Let A be a square matrix. Then Ax = b has a unique solution if Ax = 0 has only trivial solution. 2

2.74 Let A be a 4 × 3 matrix. If [A b] is invertible, then Ax = b has no solution. 2

2.75 Let A be a square matrix. If Ax = b has no solution for some b, then Ax = 0 has a nontrivial solution. 2

2.76 The equation Ax = b is consistent if the augmented matrix [A b] has a pivot position in every row. 2

2.77 Let A be an m × n matrix, B an n × m matrix. Suppose n < m. One can always find A and B such that AB is invertible. 2


Chapter 2

Linear Equations (Examples done in Class)

Example 2.1 (Augmented matrix)

Write down the augmented matrix for the system of linear equations.

    x1 − x3 = 5,
    x1 + x2 + x4 = 1,
    x1 − 3x4 = 2.

Suggested Solution


Example 2.2 (Augmented matrix)

Suppose ax + by = 1 represents a straight line which passes through the points (1, 2) and (3, 4). Write downthe augmented matrix for the corresponding system of linear equations for a, b.

Suggested Solution


Example 2.3 (Row echelon form)

Which of the following matrices are in row echelon form?

    A = [ 3 4 ]      B = [ 1 1 1 ]      C = [ 0 3 1 ]
        [ 0 0 ],         [ 0 0 0 ],         [ 0 0 3 ]
                         [ 1 0 0 ]          [ 0 0 0 ],

    P = [ 0 1 1 0 ]      Q = [ 0 0 0 ]      R = [ 1 0 0 ]
        [ 0 0 1 0 ],         [ 0 0 0 ],         [ 1 1 0 ]
        [ 0 0 0 0 ]          [ 0 0 0 ]          [ 0 0 1 ],

    X = [ −1 0 1 ]      Y = [ 1 0 −1 ]      Z = [ 0 1 −1 0 ]
        [  0 1 0 ],         [ 0 1 −1 ],         [ 0 0 −1 0 ]
                                                [ 0 0  0 1 ].

Suggested Solution


Example 2.4 (Gaussian elimination – unique solution)

Solve the system of linear equations

x1 +x2 −2x3 = 1,

2x1 +3x2 −2x3 = −2,

3x1 −11x3 = 4.

Suggested Solution


Example 2.5 (Gaussian elimination – unique solution)

Solve the system of linear equations

x1 +2x2 = 7,

2x1 −3x2 = −7,

3x1 −5x2 = −12.

Suggested Solution


Example 2.6 (Gaussian elimination – no solution)

Solve the system of linear equations

x +2y −3z = 0,

−2x −4y +6z = 1.

Suggested Solution


Example 2.7 (Gaussian elimination – no solution)

Solve the system of linear equations

x1 −x2 = 2,

3x1 −x2 +x3 = 8,

x1 +x2 +x3 = 3,

x1 −3x2 −x3 = 0.

Suggested Solution


Example 2.8 (Gaussian elimination – non-unique solutions)

Solve the system of linear equations

x1 −2x2 +3x3 = 2,

2x1 −4x2 +3x3 = −2.

Suggested Solution


Example 2.9 (Gaussian elimination – non-unique solutions)

Solve the system of linear equations

x1 −2x2 +2x3 = 0,

3x1 −6x2 +7x3 = −1,

2x1 −4x2 +3x3 = 1.

Suggested Solution


Example 2.10 (Reduced row echelon form)

Determine whether the matrices are in reduced row echelon form

    A = [ 1 0 0 3 ]      B = [ 1 1 ]      C = [ 0 0 0 0 ]      D = [ 0 1 0 ]
        [ 0 0 1 2 ],         [ 0 1 ]          [ 0 1 0 0 ]          [ 0 0 1 ]
                             [ 0 0 ]          [ 0 0 1 0 ]          [ 0 0 0 ].
                             [ 0 0 ],         [ 0 0 0 0 ],

Suggested Solution


Example 2.11 (Reduced row echelon form)

Find the reduced row echelon form of the matrix

    [ 1 1  1  6 ]
    [ 3 2 −1  4 ]
    [ 3 1  2 11 ].

Suggested Solution


Example 2.12 (Reduced row echelon form)

Find the reduced row echelon form of the matrix

    [ 0 0  4 0 ]
    [ 2 2 −2 5 ]
    [ 5 5 −1 5 ].

Remark In the following Examples 2.13–2.20, we shall use the method of reduction to solve systems of linear equations. We should emphasize that the method of reduction is just Gaussian elimination plus some further row eliminations, carried on until the augmented matrix is in its reduced row echelon form. The advantage of the reduced row echelon form is that we can write down the solutions directly, with no backward substitution needed.

Suggested Solution


Example 2.13 (Method of reduction – unique solution)

Solve the system of linear equations

x1 +2x3 = 1,

−2x1 +2x2 −x3 = 3,

x1 +x2 +4x3 = 2.

Suggested Solution


Example 2.14 (Method of reduction – unique solution)

Solve the system of linear equations

2x1 −x2 = −1,

−x1 +x2 = 1,

2x1 −3x2 = −3.

Suggested Solution


Example 2.15 (Method of reduction – unique solution)

Solve the system of linear equations

x +y −2z = −1,

2x −4z = 8,

3x +y +z = 0,

x −2y −2z = 14.

Suggested Solution


Example 2.16 (Method of reduction – unique solutions)

Solve the system of linear equations

x +y +2z = 4a,

x +2y +z = 4b,

2x +y +z = 4c.

Suggested Solution


Example 2.17 (Method of reduction – non-unique solutions)

Solve the system of linear equations

x1 −x2 +2x3 = 1,

2x1 +2x3 = 1,

x1 −3x2 +4x3 = 2.

Suggested Solution


Example 2.18 (Method of reduction – non-unique solutions)

Solve the system of linear equations

x1 +2x2 +3x3 +4x4 = 5,

2x1 +3x2 +4x3 +5x4 = 6,

3x1 +4x2 +5x3 +6x4 = 7.

Suggested Solution


Example 2.19 (Method of reduction – non-unique solutions)

Solve the system of linear equations

x1 +x2 +5x4 = 1,

x1 +x3 +2x4 = 1,

x1 −3x2 +4x3 −7x4 = 1,

x2 −x3 +3x4 = 0.

Suggested Solution


Example 2.20 (Method of reduction – non-unique solutions)

Solve the system of linear equations

x1 +2x3 +x4 +4x5 = 1,

x2 +x3 −3x4 = −2,

4x1 −3x2 +5x3 +13x4 +16x5 = 10,

x1 +2x2 +4x3 −5x4 +4x5 = −3.

Suggested Solution


Example 2.21 (Existence and uniqueness)

Find the value of t for which the following system has a solution, and solve the system for this value of t by the method of reduction

x +y = 1,

tx +y = t,

(1 + t) x +2y = 3.

Suggested Solution


Example 2.22 (Existence and uniqueness)

Find the value of h for which the following system has a solution, and solve the system for this value of h by the method of reduction

x1 +x2 +x3 = 2,

2x1 +3x2 +2x3 = 5,

x1 +x2 +(h^2 − 5) x3 = h.

Suggested Solution


Example 2.23 (Existence and uniqueness)

For which values of a and b does the system have (1) no solution, (2) a unique solution, (3) infinitely many solutions?

x −2y +3z = 4,

2x −3y +az = 5,

3x −4y +5z = b.

Suggested Solution


Example 2.24 (Existence and uniqueness)

For which values of a does the system have (1) no solution, (2) a unique solution, (3) infinitely many solutions?

x +2y −3z = 4,

x +y −z = 2,

4x +5y +(a^2 − 22) z = a + 6.

Suggested Solution


Example 2.25 (Homogeneous system)

Solve the following homogeneous system

x +y +z = 0,

5x −2y −9z = 0,

3x −2y −7z = 0.

Suggested Solution


Example 2.26 (Homogeneous system)

Solve the following homogeneous system

2x +3y = 0,

2x +y = 0.

Suggested Solution


Example 2.27 (Homogeneous system)

For which values of λ does the system have a nontrivial solution?

x + (λ − 3) y = 0,

(λ − 3) x + y = 0.

Suggested Solution


Example 2.28 (Homogeneous system)

Without performing any computation, determine whether the system below has a nontrivial solution.

2x − 4y + 7z + 4v − 5w = 0,

9x + 3y + 2z − 7v + 2w = 0,

5x + 2y − 3z + 2v + 3w = 0,

6x − 5y + 4z − 3v − 2w = 0.

Suggested Solution


Example 2.29 (Pivots and rank)

Find the rank of the matrix [1 2 3; 3 2 1].

Suggested Solution


Example 2.30 (Pivots and rank)

Find the rank of the matrix

[−1 1 0; −3 1 −1; 1 −3 −4].

Suggested Solution


Example 2.31 (Pivots and rank)

Find the reduced row echelon form of the following matrix and then determine the rank of the matrix

[2 1 1; 1 2 3; 3 1 2].

Suggested Solution


Example 2.32 (Pivots and rank)

Find the rank of the matrix

[1 2 0 1 4; 2 4 1 2 −1; 3 6 1 3 3].

Suggested Solution


Example 2.33 (Pivots and rank)

Find the rank of the matrix

[1 2 1 3; 2 1 −4 −5; 1 1 0 0; 0 0 1 1].

Suggested Solution


Example 2.34 (Pivots and rank)

Find the rank of the matrix

[4 2 6; −1 3 1; −8 0 8; −3 2 3].

Suggested Solution


Example 2.35 (Pivots and rank)

If a, b, c are distinct, find the rank of the matrix

[1 1 1; a b c; a^2 b^2 c^2].

Suggested Solution


Example 2.36 (Pivots and rank)

Find the rank of the matrix

[1 1 1 1 1; 1 0 0 h 1; 0 1 h 0 1].

Suggested Solution


Example 2.37 (Pivots and rank)

Without performing any row operations, write down the rank of the matrix

A = [1 0 0; 3 0 0; 0 2 0; 4 0 1].

Remark Based on Examples 2.29–2.37, we observe that one can determine the pivots of a given matrix (and hence the rank of the matrix) even though the matrix is not yet fully reduced. Practically, all we need to do is perform row operations until the matrix is in row echelon form (i.e., upper triangular shape). In Chapter 3, we shall use pivots to determine the linear independence of given vectors. This will be a simple yet efficient method when the number of given vectors is more than two. In the case of only two vectors, you just need to decide whether one vector is a scalar multiple of the other or not. If the answer is yes, the two vectors are linearly dependent.
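As a cross-check of this remark, counting pivots can be done mechanically by forward elimination alone, with no need for the reduced form. The sketch below is our own illustration (not part of the guide); the function name `rank` is an assumption, and exact `Fraction` arithmetic avoids floating-point pitfalls.

```python
from fractions import Fraction

def rank(rows):
    """Count the pivots of a matrix by forward elimination only;
    row echelon form is enough, so no back-elimination is done."""
    M = [[Fraction(x) for x in row] for row in rows]
    nrows, ncols = len(M), len(M[0])
    pivots = 0
    for col in range(ncols):
        # look for a nonzero entry in this column at or below the pivot row
        pivot = next((r for r in range(pivots, nrows) if M[r][col] != 0), None)
        if pivot is None:
            continue
        M[pivots], M[pivot] = M[pivot], M[pivots]
        # clear the entries below the pivot
        for r in range(pivots + 1, nrows):
            if M[r][col] != 0:
                factor = M[r][col] / M[pivots][col]
                M[r] = [a - factor * b for a, b in zip(M[r], M[pivots])]
        pivots += 1
    return pivots

# The matrix of Example 2.37; by inspection its rank is 3, and the sketch agrees.
A = [[1, 0, 0], [3, 0, 0], [0, 2, 0], [4, 0, 1]]
print(rank(A))  # 3
```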

Suggested Solution


Example 2.38 (Matrix transformation)

Find a suitable matrix that represents the following matrix transformation.

T : R^3 → R^2,   [x1; x2; x3] ↦ [x1 − 2x2 + x3; 3x1 + 2x2 − x3].

Definition Let e1, e2, · · · , en be the following vectors in R^n:

e1 = [1; 0; · · · ; 0],   e2 = [0; 1; · · · ; 0],   · · · ,   en = [0; 0; · · · ; 1].

Then it is easy to see that every vector x in R^n has a unique representation as a linear combination of e1, e2, · · · , en:

[x1; x2; · · · ; xn] = x1 e1 + x2 e2 + · · · + xn en.

Theorem Let T : R^n → R^m be a matrix transformation. Then the matrix A that gives T(x) = Ax for every x ∈ R^n is given by the following formula:

A = [T(e1) T(e2) · · · T(en)].

The matrix A determined in this theorem is called the standard matrix of T.
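The theorem translates directly into a procedure: apply T to each standard basis vector and use the results as the columns of A. The following is a small sketch of our own (not from the guide), applied to the transformation of Example 2.38; the helper name `standard_matrix` is an assumption.

```python
def standard_matrix(T, n):
    """Build the standard matrix of T : R^n -> R^m; column j is T(ej)."""
    columns = [T([1 if i == j else 0 for i in range(n)]) for j in range(n)]
    m = len(columns[0])
    # transpose the list of columns into a list of rows
    return [[columns[j][i] for j in range(n)] for i in range(m)]

# The transformation of Example 2.38: T(x1, x2, x3) = (x1 - 2x2 + x3, 3x1 + 2x2 - x3)
def T(x):
    x1, x2, x3 = x
    return [x1 - 2*x2 + x3, 3*x1 + 2*x2 - x3]

A = standard_matrix(T, 3)
print(A)  # [[1, -2, 1], [3, 2, -1]]
```

Multiplying A by any x then reproduces T(x), which is exactly the content of the theorem.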

Suggested Solution


Example 2.39 (Matrix transformation)

Find the standard matrix A of the matrix transformation T : R^2 → R^3 that satisfies

T([x1 + x2; x1 − x2]) = [3x1 − x2; −x1; x2]

for every x1, x2 ∈ R.

Suggested Solution


Suggested Solutions for Reference

Example 2.1 (Augmented matrix)

Write down the augmented matrix for the system of linear equations.

x1 −x3 = 5,

x1 +x2 +x4 = 1,

x1 −3x4 = 2.

Solution The augmented matrix is

[1 0 −1 0 5; 1 1 0 1 1; 1 0 0 −3 2]. □

Example 2.2 (Augmented matrix)

Suppose ax + by = 1 represents a straight line which passes through the points (1, 2) and (3, 4). Write down the augmented matrix for the corresponding system of linear equations for a, b.

Solution For ax + by = 1 to pass through (1, 2) and (3, 4), we have a · 1 + b · 2 = 1 and a · 3 + b · 4 = 1.

Then we have two linear equations in the two variables a, b. The augmented matrix is [1 2 1; 3 4 1]. □

Example 2.3 (Row echelon form)

Which of the following matrices are in row echelon form?

A = [3 4; 0 0],   B = [1 1 1; 0 0 0; 1 0 0],   C = [0 3 1; 0 0 3; 0 0 0],

P = [0 1 1 0; 0 0 1 0; 0 0 0 0],   Q = [0 0 0; 0 0 0; 0 0 0],   R = [1 0 0; 1 1 0; 0 0 1],

X = [−1 0 1; 0 1 0],   Y = [1 0 −1; 0 1 −1],   Z = [0 1 −1 0; 0 0 −1 0; 0 0 0 1].

Solution Matrices A, C, P, Q, X, Y, Z are in row echelon form. Matrix B is not in row echelon form because its second row, which consists of all zero entries, is not at the bottom of the matrix. Matrix R is not in row echelon form because the (2, 1)-entry is not zero. □

Example 2.4 (Gaussian elimination – unique solution)

Solve the system of linear equations

x1 +x2 −2x3 = 1,

2x1 +3x2 −2x3 = −2,

3x1 −11x3 = 4.


Solution We simplify the augmented matrix as follows

[1 1 −2 1; 2 3 −2 −2; 3 0 −11 4] →(−2R1+R2, −3R1+R3) [1 1 −2 1; 0 1 2 −4; 0 −3 −5 1] →(3R2+R3) [1 1 −2 1; 0 1 2 −4; 0 0 1 −11].

Interpreting the last matrix as a system of linear equations, we have

x1 +x2 −2x3 = 1,

x2 +2x3 = −4,

x3 = −11.

By backward substitution, starting from the third equation, we have x3 = −11. Substituting this value into the second equation, we have x2 = 18. Substituting the values of x2 and x3 into the first equation, we have x1 = −39. We conclude that the system has a unique solution (x1, x2, x3) = (−39, 18, −11). □
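As a quick sanity check (our own addition, not part of the original solution), the claimed solution can be substituted back into the three equations of the system:

```python
# Substitute (x1, x2, x3) = (-39, 18, -11) back into the original system;
# the three right-hand sides 1, -2, 4 should be reproduced.
x1, x2, x3 = -39, 18, -11
print(x1 + x2 - 2*x3)        # 1
print(2*x1 + 3*x2 - 2*x3)    # -2
print(3*x1 - 11*x3)          # 4
```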

Example 2.5 (Gaussian elimination – unique solution)

Solve the system of linear equations

x1 +2x2 = 7,

2x1 −3x2 = −7,

3x1 −5x2 = −12.

Solution We simplify the augmented matrix as follows

[1 2 7; 2 −3 −7; 3 −5 −12] →(−2R1+R2, −3R1+R3) [1 2 7; 0 −7 −21; 0 −11 −33] →(−(1/7)R2) [1 2 7; 0 1 3; 0 −11 −33] →(11R2+R3) [1 2 7; 0 1 3; 0 0 0].

Interpreting the last matrix as a system of linear equations, we have

x1 +2x2 = 7,

x2 = 3,

0 = 0.

Although the last equation is redundant, we have x2 = 3 from the second equation. Substituting it into the first equation, we have x1 = 1. We conclude that the system has a unique solution (x1, x2) = (1, 3). □

Example 2.6 (Gaussian elimination – no solution)

Solve the system of linear equations

x +2y −3z = 0,

−2x −4y +6z = 1.

Solution We simplify the augmented matrix as follows

[1 2 −3 0; −2 −4 6 1] →(2R1+R2) [1 2 −3 0; 0 0 0 1].

Interpreting the last matrix as a system of linear equations, we have

x +2y −3z = 0,

0 = 1.

The equation “0 = 1” is a contradiction, which simply means that the system has no solution. □


Example 2.7 (Gaussian elimination – no solution)

Solve the system of linear equations

x1 −x2 = 2,

3x1 −x2 +x3 = 8,

x1 +x2 +x3 = 3,

x1 −3x2 −x3 = 0.

Solution We simplify the augmented matrix as follows

[1 −1 0 2; 3 −1 1 8; 1 1 1 3; 1 −3 −1 0] →(−3R1+R2, −R1+R3, −R1+R4) [1 −1 0 2; 0 2 1 2; 0 2 1 1; 0 −2 −1 −2] →(−R2+R3, R2+R4) [1 −1 0 2; 0 2 1 2; 0 0 0 −1; 0 0 0 0].

The third row of the last matrix is equivalent to the equation 0 = −1. This is a contradiction, which means that the system has no solution. □

Example 2.8 (Gaussian elimination – non-unique solutions)

Solve the system of linear equations

x1 −2x2 +3x3 = 2,

2x1 −4x2 +3x3 = −2.

Solution We simplify the augmented matrix as follows

[1 −2 3 2; 2 −4 3 −2] →(−2R1+R2) [1 −2 3 2; 0 0 −3 −6] →(−(1/3)R2) [1 −2 3 2; 0 0 1 2].

Interpreting the last matrix as a system of linear equations, we have

x1 −2x2 +3x3 = 2,

x3 = 2.

From the second equation, we have x3 = 2. Substituting this value into the first equation, we can find x1 = 2x2 − 4. The system has the non-unique solutions (x1, x2, x3) = (2r − 4, r, 2), with r arbitrary. □
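The parametric family can be confirmed by substituting it back into both equations for a few values of r (a check we add for illustration; it is not in the original solution):

```python
# Verify that (x1, x2, x3) = (2r - 4, r, 2) satisfies both equations for any r.
for r in [-3, 0, 1, 10]:
    x1, x2, x3 = 2*r - 4, r, 2
    assert x1 - 2*x2 + 3*x3 == 2
    assert 2*x1 - 4*x2 + 3*x3 == -2
print("ok")
```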

Example 2.9 (Gaussian elimination – non-unique solutions)

Solve the system of linear equations

x1 −2x2 +2x3 = 0,

3x1 −6x2 +7x3 = −1,

2x1 −4x2 +3x3 = 1.

Solution We simplify the augmented matrix as follows

[1 −2 2 0; 3 −6 7 −1; 2 −4 3 1] →(−3R1+R2, −2R1+R3) [1 −2 2 0; 0 0 1 −1; 0 0 −1 1] →(R2+R3) [1 −2 2 0; 0 0 1 −1; 0 0 0 0].

Interpreting the last matrix as a system of linear equations, we have

x1 −2x2 +2x3 = 0,

x3 = −1,

0 = 0.

The last equation is redundant, so we may neglect it without harm. From the second equation, we have x3 = −1. Substituting this value into the first equation, we have x1 = 2x2 + 2. The system has the non-unique solutions (x1, x2, x3) = (2r + 2, r, −1), with r arbitrary. □


Example 2.10 (Reduced row echelon form)

Determine whether the matrices are in reduced row echelon form

A = [1 0 0 3; 0 0 1 2],   B = [1 1; 0 1; 0 0; 0 0],   C = [0 0 0 0; 0 1 0 0; 0 0 1 0; 0 0 0 0],   D = [0 1 0; 0 0 1; 0 0 0].

Solution The matrices A and D are reduced, while B and C are not yet reduced. The matrix B is not reduced because the entry above the (2, 2)-entry is not reduced to 0. The matrix C is not reduced because its first row, which consists entirely of zeros, is not at the bottom of the matrix. □

Example 2.11 (Reduced row echelon form)

Find the reduced row echelon form of the matrix

[1 1 1 6; 3 2 −1 4; 3 1 2 11].

Solution By row operations,

[1 1 1 6; 3 2 −1 4; 3 1 2 11] →(−3R1+R2, −3R1+R3) [1 1 1 6; 0 −1 −4 −14; 0 −2 −1 −7] →(R2+R1, −2R2+R3) [1 0 −3 −8; 0 −1 −4 −14; 0 0 7 21] →((1/7)R3) [1 0 −3 −8; 0 −1 −4 −14; 0 0 1 3] →(3R3+R1, 4R3+R2) [1 0 0 1; 0 −1 0 −2; 0 0 1 3] →(−R2) [1 0 0 1; 0 1 0 2; 0 0 1 3]. □

Example 2.12 (Reduced row echelon form)

Find the reduced row echelon form of the matrix

[0 0 4 0; 2 2 −2 5; 5 5 −1 5].

Solution By row operations,

[0 0 4 0; 2 2 −2 5; 5 5 −1 5] →((1/4)R1, −(5/2)R2+R3) [0 0 1 0; 2 2 −2 5; 0 0 4 −15/2] →(2R1+R2, −4R1+R3) [0 0 1 0; 2 2 0 5; 0 0 0 −15/2] →((2/3)R3+R2, −(2/15)R3) [0 0 1 0; 2 2 0 0; 0 0 0 1] →((1/2)R2, R1↔R2) [1 1 0 0; 0 0 1 0; 0 0 0 1]. □

Remark In the following Examples 2.13–2.20, we shall use the method of reduction to solve systems of linear equations. We should emphasize that the method of reduction is just Gaussian elimination followed by further row operations, continued until the augmented matrix is in its reduced row echelon form. The advantage of the reduced row echelon form is that we can write down the solutions directly; no backward substitution is needed.
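The full reduction described in the remark can be sketched in code. Below is a minimal illustration of our own (not from the guide), using exact `Fraction` arithmetic; the function name `rref` and its structure are our own choices. Applied to the augmented matrix of Example 2.13, the solution can be read off the last column of the result.

```python
from fractions import Fraction

def rref(rows):
    """Reduce a matrix (list of rows) to reduced row echelon form:
    Gaussian elimination plus back-elimination above each pivot."""
    M = [[Fraction(x) for x in row] for row in rows]
    nrows, ncols = len(M), len(M[0])
    pivot_row = 0
    for col in range(ncols):
        # find a row with a nonzero entry in this column
        pivot = next((r for r in range(pivot_row, nrows) if M[r][col] != 0), None)
        if pivot is None:
            continue
        M[pivot_row], M[pivot] = M[pivot], M[pivot_row]
        # scale the pivot row so the pivot entry becomes 1
        p = M[pivot_row][col]
        M[pivot_row] = [x / p for x in M[pivot_row]]
        # eliminate the pivot column from every other row
        for r in range(nrows):
            if r != pivot_row and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [a - factor * b for a, b in zip(M[r], M[pivot_row])]
        pivot_row += 1
    return M

# Augmented matrix of Example 2.13
A = [[1, 0, 2, 1], [-2, 2, -1, 3], [1, 1, 4, 2]]
R = rref(A)
solution = [row[-1] for row in R]  # equals [7, 7, -3], matching Example 2.13
```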


Example 2.13 (Method of reduction – unique solution)

Solve the system of linear equations

x1 +2x3 = 1,

−2x1 +2x2 −x3 = 3,

x1 +x2 +4x3 = 2.

Solution We simplify the augmented matrix as follows

[1 0 2 1; −2 2 −1 3; 1 1 4 2] →(2R1+R2, −R1+R3) [1 0 2 1; 0 2 3 5; 0 1 2 1] →(−2R3+R2, R2↔R3) [1 0 2 1; 0 1 2 1; 0 0 −1 3] →(2R3+R1, 2R3+R2, −R3) [1 0 0 7; 0 1 0 7; 0 0 1 −3].

Interpreting the last matrix as a system of linear equations, we have

x1 = 7,

x2 = 7,

x3 = −3,

which immediately gives the unique solution (x1, x2, x3) = (7, 7, −3). □

Example 2.14 (Method of reduction – unique solution)

Solve the system of linear equations

2x1 −x2 = −1,

−x1 +x2 = 1,

2x1 −3x2 = −3.

Solution We simplify the augmented matrix as follows

[2 −1 −1; −1 1 1; 2 −3 −3] →(2R2+R1, 2R2+R3) [0 1 1; −1 1 1; 0 −1 −1] →(−R1+R2, R1+R3) [0 1 1; −1 0 0; 0 0 0] →(−R2, R1↔R2) [1 0 0; 0 1 1; 0 0 0],

which gives the unique solution (x1, x2) = (0, 1). □

Example 2.15 (Method of reduction – unique solution)

Solve the system of linear equations

x +y −2z = −1,

2x −4z = 8,

3x +y +z = 0,

x −2y −2z = 14.

Solution We simplify the augmented matrix as follows

[1 1 −2 −1; 2 0 −4 8; 3 1 1 0; 1 −2 −2 14] →(−2R1+R2, −3R1+R3, −R1+R4) [1 1 −2 −1; 0 −2 0 10; 0 −2 7 3; 0 −3 0 15] →(−(1/2)R2) [1 1 −2 −1; 0 1 0 −5; 0 −2 7 3; 0 −3 0 15] →(−R2+R1, 2R2+R3, 3R2+R4) [1 0 −2 4; 0 1 0 −5; 0 0 7 −7; 0 0 0 0] →((1/7)R3, 2R3+R1) [1 0 0 2; 0 1 0 −5; 0 0 1 −1; 0 0 0 0],

which gives the unique solution (x, y, z) = (2, −5, −1). □


Example 2.16 (Method of reduction – unique solutions)

Solve the system of linear equations

x +y +2z = 4a,

x +2y +z = 4b,

2x +y +z = 4c.

Solution We simplify the augmented matrix as follows

[1 1 2 4a; 1 2 1 4b; 2 1 1 4c] →(−R1+R2, −2R1+R3) [1 1 2 4a; 0 1 −1 −4a+4b; 0 −1 −3 −8a+4c] →(−R2+R1, R2+R3) [1 0 3 8a−4b; 0 1 −1 −4a+4b; 0 0 −4 −12a+4b+4c] →(−(1/4)R3) [1 0 3 8a−4b; 0 1 −1 −4a+4b; 0 0 1 3a−b−c] →(−3R3+R1, R3+R2) [1 0 0 −a−b+3c; 0 1 0 −a+3b−c; 0 0 1 3a−b−c].

Interpreting the last matrix as a system of linear equations, we have

x = −a − b + 3c,

y = −a + 3b − c,

z = 3a − b − c,

which gives the unique solution (x, y, z) = (−a − b + 3c, −a + 3b − c, 3a − b − c). □
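The symbolic answer can be double-checked by substituting the three formulas back into the original system for sample values of a, b, c (an added check, not part of the guide):

```python
# Check x = -a - b + 3c, y = -a + 3b - c, z = 3a - b - c against the system
# x + y + 2z = 4a, x + 2y + z = 4b, 2x + y + z = 4c for several (a, b, c).
for (a, b, c) in [(1, 2, 3), (0, 0, 1), (-2, 5, 7)]:
    x = -a - b + 3*c
    y = -a + 3*b - c
    z = 3*a - b - c
    assert x + y + 2*z == 4*a
    assert x + 2*y + z == 4*b
    assert 2*x + y + z == 4*c
print("ok")
```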

Example 2.17 (Method of reduction – non-unique solutions)

Solve the system of linear equations

x1 −x2 +2x3 = 1,

2x1 +2x3 = 1,

x1 −3x2 +4x3 = 2.

Solution We simplify the augmented matrix as follows

[1 −1 2 1; 2 0 2 1; 1 −3 4 2] →(−2R1+R2, −R1+R3) [1 −1 2 1; 0 2 −2 −1; 0 −2 2 1] →(R2+R3) [1 −1 2 1; 0 2 −2 −1; 0 0 0 0] →((1/2)R2) [1 −1 2 1; 0 1 −1 −1/2; 0 0 0 0] →(R2+R1) [1 0 1 1/2; 0 1 −1 −1/2; 0 0 0 0].

Interpreting the last matrix as a system of linear equations, we have

x1 +x3 = 1/2,

x2 −x3 = −1/2,

0 = 0,

which gives the non-unique solutions (x1, x2, x3) = (−r + 1/2, r − 1/2, r), with r arbitrary. □


Example 2.18 (Method of reduction – non-unique solutions)

Solve the system of linear equations

x1 +2x2 +3x3 +4x4 = 5,

2x1 +3x2 +4x3 +5x4 = 6,

3x1 +4x2 +5x3 +6x4 = 7.

Solution We simplify the augmented matrix as follows

[1 2 3 4 5; 2 3 4 5 6; 3 4 5 6 7] →(−2R1+R2, −3R1+R3) [1 2 3 4 5; 0 −1 −2 −3 −4; 0 −2 −4 −6 −8] →(2R2+R1, −2R2+R3, −R2) [1 0 −1 −2 −3; 0 1 2 3 4; 0 0 0 0 0],

which gives the non-unique solutions (x1, x2, x3, x4) = (r + 2s − 3, −2r − 3s + 4, r, s), with r, s arbitrary. □

Example 2.19 (Method of reduction – non-unique solutions)

Solve the system of linear equations

x1 +x2 +5x4 = 1,

x1 +x3 +2x4 = 1,

x1 −3x2 +4x3 −7x4 = 1,

x2 −x3 +3x4 = 0.

Solution We simplify the augmented matrix as follows

[1 1 0 5 1; 1 0 1 2 1; 1 −3 4 −7 1; 0 1 −1 3 0] →(−R1+R2, −R1+R3) [1 1 0 5 1; 0 −1 1 −3 0; 0 −4 4 −12 0; 0 1 −1 3 0] →(R2+R1, −4R2+R3, R2+R4, −R2) [1 0 1 2 1; 0 1 −1 3 0; 0 0 0 0 0; 0 0 0 0 0],

which gives the non-unique solutions (x1, x2, x3, x4) = (−r − 2s + 1, r − 3s, r, s), with r, s arbitrary. □

Example 2.20 (Method of reduction – non-unique solutions)

Solve the system of linear equations

x1 +2x3 +x4 +4x5 = 1,

x2 +x3 −3x4 = −2,

4x1 −3x2 +5x3 +13x4 +16x5 = 10,

x1 +2x2 +4x3 −5x4 +4x5 = −3.

Solution We simplify the augmented matrix as follows

[1 0 2 1 4 1; 0 1 1 −3 0 −2; 4 −3 5 13 16 10; 1 2 4 −5 4 −3] →(−4R1+R3, −R1+R4) [1 0 2 1 4 1; 0 1 1 −3 0 −2; 0 −3 −3 9 0 6; 0 2 2 −6 0 −4] →(3R2+R3, −2R2+R4) [1 0 2 1 4 1; 0 1 1 −3 0 −2; 0 0 0 0 0 0; 0 0 0 0 0 0],

which gives the non-unique solutions (x1, x2, x3, x4, x5) = (−2r − s − 4t + 1, −r + 3s − 2, r, s, t), with r, s, t arbitrary. □


Example 2.21 (Existence and uniqueness)

Find the value of t for which the following system has a solution, and solve the system for this value of t by the method of reduction

x +y = 1,

tx +y = t,

(1 + t) x +2y = 3.

Solution We simplify the augmented matrix as follows

[1 1 1; t 1 t; 1+t 2 3] →(−tR1+R2, −(1+t)R1+R3) [1 1 1; 0 1−t 0; 0 1−t 2−t] →((1/(1−t))R2, t ≠ 1) [1 1 1; 0 1 0; 0 1−t 2−t] →(−R2+R1, −(1−t)R2+R3) [1 0 1; 0 1 0; 0 0 2−t].

Hence the system has a solution only when t = 2, and the solution is given by (x, y) = (1, 0). Note that the system has no solution when t = 1 because there is a contradiction: x + y = 1 and x + y = 3/2. □

Example 2.22 (Existence and uniqueness)

Find the value of h for which the following system has a solution, and solve the system for this value of h by the method of reduction

x1 +x2 +x3 = 2,

2x1 +3x2 +2x3 = 5,

x1 +x2 +(h^2 − 5) x3 = h.

Solution We simplify the augmented matrix as follows

[1 1 1 2; 2 3 2 5; 1 1 h^2−5 h] →(−2R1+R2, −R1+R3) [1 1 1 2; 0 1 0 1; 0 0 h^2−6 h−2] →(−R2+R1) [1 0 1 1; 0 1 0 1; 0 0 h^2−6 h−2].

Hence the system has

(1) no solution when h^2 − 6 = 0, i.e. h = ±√6, because the last equation is then equivalent to

0 = ±√6 − 2,

which is a contradiction;

(2) a unique solution when h^2 − 6 ≠ 0, i.e. h ≠ ±√6. In this case, the last matrix can be reduced further:

[1 0 1 1; 0 1 0 1; 0 0 h^2−6 h−2] →((1/(h^2−6))R3) [1 0 1 1; 0 1 0 1; 0 0 1 (h−2)/(h^2−6)] →(−R3+R1) [1 0 0 (h^2−h−4)/(h^2−6); 0 1 0 1; 0 0 1 (h−2)/(h^2−6)],

which gives the unique solution (x1, x2, x3) = ((h^2 − h − 4)/(h^2 − 6), 1, (h − 2)/(h^2 − 6)).

□


Example 2.23 (Existence and uniqueness)

For which values of a and b does the system have (1) no solution, (2) a unique solution, (3) infinitely many solutions?

x −2y +3z = 4,

2x −3y +az = 5,

3x −4y +5z = b.

Solution We simplify the augmented matrix as follows

[1 −2 3 4; 2 −3 a 5; 3 −4 5 b] →(−2R1+R2, −3R1+R3) [1 −2 3 4; 0 1 a−6 −3; 0 2 −4 b−12] →(2R2+R1, −2R2+R3) [1 0 2(a−6)+3 −2; 0 1 a−6 −3; 0 0 −2(a−6)−4 b−6] = [1 0 2a−9 −2; 0 1 a−6 −3; 0 0 −2a+8 b−6].

The system has

(1) no solution when −2a + 8 = 0 and b − 6 ≠ 0 (i.e., when a = 4 and b ≠ 6);

(2) a unique solution when −2a + 8 ≠ 0 (i.e., when a ≠ 4). In this case the unique solution is given by (x, y, z) = ((9 − 2a) z0 − 2, (6 − a) z0 − 3, z0), where z0 = (b − 6)/(8 − 2a). By a unique solution here we mean there is only one solution for each given pair of fixed values of (a, b);

(3) infinitely many solutions when −2a + 8 = 0 and b − 6 = 0 (i.e., when a = 4 and b = 6). In this case the non-unique solutions are (x, y, z) = (r − 2, 2r − 3, r), with r arbitrary.

□

Example 2.24 (Existence and uniqueness)

For which values of a does the system have (1) no solution, (2) a unique solution, (3) infinitely many solutions?

x +2y −3z = 4,

x +y −z = 2,

4x +5y +(a^2 − 22) z = a + 6.

Solution We simplify the augmented matrix as follows

[1 2 −3 4; 1 1 −1 2; 4 5 a^2−22 a+6] →(−R1+R2, −4R1+R3) [1 2 −3 4; 0 −1 2 −2; 0 −3 a^2−10 a−10] →(−3R2+R3, −R2) [1 2 −3 4; 0 1 −2 2; 0 0 a^2−16 a−4] →(−2R2+R1) [1 0 1 0; 0 1 −2 2; 0 0 (a−4)(a+4) a−4].

The system has

(1) no solution when (a − 4)(a + 4) = 0 and a − 4 ≠ 0 (i.e., when a = −4);

(2) a unique solution when (a − 4)(a + 4) ≠ 0 (i.e., when a ≠ ±4). In this case the unique solution is given by (x, y, z) = (−z0, 2z0 + 2, z0), where z0 = 1/(a + 4);

(3) infinitely many solutions when (a − 4)(a + 4) = 0 and a − 4 = 0 (i.e., when a = 4). In this case the non-unique solutions are (x, y, z) = (−r, 2r + 2, r), with r arbitrary.

□


Example 2.25 (Homogeneous system)

Solve the following homogeneous system

x +y +z = 0,

5x −2y −9z = 0,

3x −2y −7z = 0.

Solution By a homogeneous system we mean that the right side of the system is all zero. For such systems, there always exists the zero solution, which is also called the trivial solution. Therefore, the really interesting question for homogeneous systems is: does there exist any nontrivial solution?

As usual, we simplify the augmented matrix as follows

[1 1 1 0; 5 −2 −9 0; 3 −2 −7 0] →(−5R1+R2, −3R1+R3) [1 1 1 0; 0 −7 −14 0; 0 −5 −10 0] →(−(1/7)R2, −(1/5)R3) [1 1 1 0; 0 1 2 0; 0 1 2 0] →(−R2+R1, −R2+R3) [1 0 −1 0; 0 1 2 0; 0 0 0 0],

which gives the nontrivial solutions (x, y, z) = (r, −2r, r), with r arbitrary. □
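Substituting the one-parameter family back into the three equations confirms it (an added check, not part of the original solution):

```python
# Verify that (x, y, z) = (r, -2r, r) solves the homogeneous system for any r.
for r in [1, -4, 7]:
    x, y, z = r, -2*r, r
    assert x + y + z == 0
    assert 5*x - 2*y - 9*z == 0
    assert 3*x - 2*y - 7*z == 0
print("ok")
```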

Example 2.26 (Homogeneous system)

Solve the following homogeneous system

2x +3y = 0,

2x +y = 0.

Solution By the row operations in Example 2.25 we see that the right side (with all zeros) makes no real contribution. Therefore we simplify only the coefficient matrix as follows

[2 3; 2 1] →(−R1+R2) [2 3; 0 −2] →(−(1/2)R2) [2 3; 0 1] →(−3R2+R1, (1/2)R1) [1 0; 0 1].

The system has no nontrivial solution. □

Example 2.27 (Homogeneous system)

For which values of λ does the system have a nontrivial solution?

x + (λ − 3) y = 0,

(λ − 3) x + y = 0.

Solution We simplify the coefficient matrix as follows

[1 λ−3; λ−3 1] →(−(λ−3)R1+R2) [1 λ−3; 0 1−(λ−3)^2].

The system has a nontrivial solution when 1 − (λ − 3)^2 = 0 ⟺ λ^2 − 6λ + 8 = 0 ⟺ λ = 2 or λ = 4. □

Example 2.28 (Homogeneous system)

Without performing any computation, determine whether the system below has a nontrivial solution.

2x − 4y + 7z + 4v − 5w = 0,

9x + 3y + 2z − 7v + 2w = 0,

5x + 2y − 3z + 2v + 3w = 0,

6x − 5y + 4z − 3v − 2w = 0.

Solution We have only four equations in five variables (a so-called underdetermined system), and after row operations we might end up with even fewer than four equations. Thus the homogeneous system must have a nontrivial solution. □


Example 2.29 (Pivots and rank)

Find the rank of the matrix [1 2 3; 3 2 1].

Solution By row operations,

[1 2 3; 3 2 1] →(−3R1+R2) [1 2 3; 0 −4 −8].

By a pivot of a matrix we mean the location of the first (from left to right) nonzero entry of each row in its row echelon form. In this example, the first nonzero entry of the first row is 1, located at the (1, 1)-position. Similarly, the first nonzero entry of the second row is −4, located at the (2, 2)-position. The rank of the matrix (i.e., the total number of pivots) is therefore 2. □

Example 2.30 (Pivots and rank)

Find the rank of the matrix

[−1 1 0; −3 1 −1; 1 −3 −4].

Solution By row operations,

[−1 1 0; −3 1 −1; 1 −3 −4] →(−3R1+R2, R1+R3) [−1 1 0; 0 −2 −1; 0 −2 −4] →(−R2+R3) [−1 1 0; 0 −2 −1; 0 0 −3].

The (1, 1)-, (2, 2)-, (3, 3)-entries are pivots. The rank of the matrix is 3. □

Example 2.31 (Pivots and rank)

Find the reduced row echelon form of the following matrix and then determine the rank of the matrix

[ 2  1  1 ]
[ 1  2  3 ]
[ 3  1  2 ].

Solution By row operations,

[ 2  1  1 ]
[ 1  2  3 ]
[ 3  1  2 ]

R1 ↔ R2 →

[ 1  2  3 ]
[ 2  1  1 ]
[ 3  1  2 ]

−2R1+R2, −3R1+R3 →

[ 1  2  3 ]
[ 0 −3 −5 ]
[ 0 −5 −7 ]

−(1/3)R2 →

[ 1  2  3   ]
[ 0  1  5/3 ]
[ 0 −5 −7   ]

−2R2+R1, 5R2+R3 →

[ 1  0 −1/3 ]
[ 0  1  5/3 ]
[ 0  0  4/3 ]

(3/4)R3 →

[ 1  0 −1/3 ]
[ 0  1  5/3 ]
[ 0  0  1   ]

(1/3)R3+R1, −(5/3)R3+R2 →

[ 1  0  0 ]
[ 0  1  0 ]
[ 0  0  1 ].

The last matrix is the reduced row echelon form. The pivots are the (1, 1)-, (2, 2)-, and (3, 3)-entries, and the rank of the matrix is 3. This example reveals that one can determine the rank of a matrix as soon as the matrix is in either row echelon form or reduced row echelon form. 2
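The full Gauss–Jordan reduction can be sketched the same way. A plain-Python version (our own `rref` helper, exact rational arithmetic) reproduces the reduced row echelon form of this example:

```python
from fractions import Fraction

def rref(rows):
    """Reduced row echelon form over the rationals (Gauss-Jordan elimination)."""
    m = [[Fraction(x) for x in row] for row in rows]
    pivot_row = 0
    for col in range(len(m[0])):
        hit = next((r for r in range(pivot_row, len(m)) if m[r][col] != 0), None)
        if hit is None:
            continue
        m[pivot_row], m[hit] = m[hit], m[pivot_row]
        m[pivot_row] = [x / m[pivot_row][col] for x in m[pivot_row]]  # scale pivot to 1
        for r in range(len(m)):                                      # clear the rest of the column
            if r != pivot_row and m[r][col] != 0:
                f = m[r][col]
                m[r] = [a - f * p for a, p in zip(m[r], m[pivot_row])]
        pivot_row += 1
    return m

result = rref([[2, 1, 1], [1, 2, 3], [3, 1, 2]])
print(result)  # the 3x3 identity matrix, so the rank is 3
```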


Example 2.32 (Pivots and rank)

Find the rank of the matrix

[ 1  2  0  1  4 ]
[ 2  4  1  2 −1 ]
[ 3  6  1  3  3 ].

Solution By row operations,

[ 1  2  0  1  4 ]
[ 2  4  1  2 −1 ]
[ 3  6  1  3  3 ]

−2R1+R2, −3R1+R3 →

[ 1  2  0  1  4 ]
[ 0  0  1  0 −9 ]
[ 0  0  1  0 −9 ]

−R2+R3 →

[ 1  2  0  1  4 ]
[ 0  0  1  0 −9 ]
[ 0  0  0  0  0 ].

The pivots are the (1, 1)-, (2, 3)-entries and the rank of the matrix is 2. 2

Example 2.33 (Pivots and rank)

Find the rank of the matrix

[ 1  2  1  3 ]
[ 2  1 −4 −5 ]
[ 1  1  0  0 ]
[ 0  0  1  1 ].

Solution By row operations,

[ 1  2  1  3 ]
[ 2  1 −4 −5 ]
[ 1  1  0  0 ]
[ 0  0  1  1 ]

−2R1+R2, −R1+R3 →

[ 1  2  1   3 ]
[ 0 −3 −6 −11 ]
[ 0 −1 −1  −3 ]
[ 0  0  1   1 ]

2R3+R1, −3R3+R2 →

[ 1  0 −1 −3 ]
[ 0  0 −3 −2 ]
[ 0 −1 −1 −3 ]
[ 0  0  1  1 ]

R4+R1, 3R4+R2, R4+R3 →

[ 1  0  0 −2 ]
[ 0  0  0  1 ]
[ 0 −1  0 −2 ]
[ 0  0  1  1 ]

R2 ↔ R3, R3 ↔ R4 →

[ 1  0  0 −2 ]
[ 0 −1  0 −2 ]
[ 0  0  1  1 ]
[ 0  0  0  1 ].

The pivots are the (1, 1)-, (2, 2)-, (3, 3)-, (4, 4)-entries and the rank of the matrix is 4. 2

Example 2.34 (Pivots and rank)

Find the rank of the matrix

[  4  2  6 ]
[ −1  3  1 ]
[ −8  0  8 ]
[ −3  2  3 ].

Solution By row operations,

[  4  2  6 ]
[ −1  3  1 ]
[ −8  0  8 ]
[ −3  2  3 ]

4R2+R1, −8R2+R3, −3R2+R4 →

[  0  14  10 ]
[ −1   3   1 ]
[  0 −24   0 ]
[  0  −7   0 ]

(1/2)R1, −(1/24)R3, −(1/7)R4 →

[  0  7  5 ]
[ −1  3  1 ]
[  0  1  0 ]
[  0  1  0 ]

−7R3+R1, −R3+R4 →

[  0  0  5 ]
[ −1  3  1 ]
[  0  1  0 ]
[  0  0  0 ]

R1 ↔ R2, R2 ↔ R3 →

[ −1  3  1 ]
[  0  1  0 ]
[  0  0  5 ]
[  0  0  0 ].

The pivots are the (1, 1)-, (2, 2)-, (3, 3)-entries and the rank of the matrix is 3. 2


Example 2.35 (Pivots and rank)

If a, b, c are distinct, find the rank of the matrix

[ 1   1   1  ]
[ a   b   c  ]
[ a²  b²  c² ].

Solution By row operations,

[ 1   1   1  ]
[ a   b   c  ]
[ a²  b²  c² ]

−aR2+R3, −aR1+R2 →

[ 1     1         1      ]
[ 0   b − a     c − a    ]
[ 0  b(b − a)  c(c − a)  ]

−bR2+R3 →

[ 1    1          1          ]
[ 0  b − a      c − a        ]
[ 0    0   (c − a)(c − b)    ].

Since a, b, c are distinct, the (1, 1)-, (2, 2)-, (3, 3)-entries are pivots. The rank of the matrix is 3. 2

Example 2.36 (Pivots and rank)

Find the rank of the matrix

[ 1  1  1  1  1 ]
[ 1  0  0  h  1 ]
[ 0  1  h  0  1 ].

Solution By row operations,

[ 1  1  1  1  1 ]
[ 1  0  0  h  1 ]
[ 0  1  h  0  1 ]

−R1+R2 →

[ 1  1   1    1   1 ]
[ 0 −1  −1  h − 1  0 ]
[ 0  1   h    0   1 ]

R2+R3 →

[ 1  1    1      1    1 ]
[ 0 −1   −1   h − 1   0 ]
[ 0  0  h − 1  h − 1  1 ].

When h = 1, the pivots are the (1, 1)-, (2, 2)-, (3, 5)-entries and the rank of the matrix is 3. When h ≠ 1, the pivots are the (1, 1)-, (2, 2)-, (3, 3)-entries and again the rank of the matrix is 3. 2

Example 2.37 (Pivots and rank)

Without performing any row operations, write down the rank of the matrix

A = [ 1  0  0 ]
    [ 3  0  0 ]
    [ 0  2  0 ]
    [ 4  0  1 ].

Solution Note that

At = [ 1  3  0  4 ]
     [ 0  0  2  0 ]
     [ 0  0  0  1 ]

is already in row echelon form. Thus, rank A = rank At = 3. 2

Remark Based on Examples 2.29–2.37, we observe that one can determine the pivots of a given matrix (and hence the rank of the matrix) even though the matrix is not yet fully reduced. Practically, all we need to do is perform row operations until the matrix is in row echelon form (i.e., of upper triangular shape). In Chapter 3, we shall use pivots to determine the linear independence of given vectors. This is a simple yet efficient method when the number of given vectors is more than two. In the case of only two vectors, you just need to decide whether one vector is a scalar multiple of the other or not. If the answer is yes, the two vectors are linearly dependent.
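For the two-vector case mentioned in the remark, the scalar-multiple test is easy to mechanize. A small sketch in plain Python (the `parallel` helper is ours; exact arithmetic with `fractions.Fraction`):

```python
from fractions import Fraction

def parallel(u, v):
    """True if one of u, v is a scalar multiple of the other (two dependent vectors)."""
    if all(x == 0 for x in u) or all(x == 0 for x in v):
        return True                       # the zero vector is a multiple of anything
    i = next(k for k, x in enumerate(u) if x != 0)   # first nonzero entry of u
    if v[i] == 0:
        return False                      # a nonzero v could not be a multiple of u
    c = Fraction(v[i], u[i])              # the only candidate ratio
    return all(Fraction(y) == c * x for x, y in zip(u, v))

print(parallel((1, -2, 4), (-3, 6, -12)))  # True:  second vector = -3 * first
print(parallel((1, 2, 3), (2, 3, 4)))      # False: not a scalar multiple
```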


Example 2.38 (Matrix transformation)

Find a suitable matrix that represents the following matrix transformation.

T : R3 → R2,   (x1, x2, x3) ↦ ( x1 − 2x2 + x3, 3x1 + 2x2 − x3 ).

Solution The size of the matrix A should be 2 × 3. If we compare the given transformation with the matrix transformation given by a generic 2 × 3 matrix:

[ a  b  c ] [ x1 ]   [ ax1 + bx2 + cx3 ]
[ d  e  f ] [ x2 ] = [ dx1 + ex2 + fx3 ],
            [ x3 ]

we immediately identify that

A = [ 1 −2  1 ]
    [ 3  2 −1 ].  2

Definition Let e1, e2, · · · , en be the following vectors in Rn:

e1 = (1, 0, · · · , 0),   e2 = (0, 1, · · · , 0),   · · · ,   en = (0, 0, · · · , 1).

Then it is easy to see that every vector x in Rn has a unique representation as a linear combination of e1, e2, · · · , en:

(x1, x2, · · · , xn) = x1e1 + x2e2 + · · · + xnen.

Theorem Let T : Rn → Rm be a matrix transformation. Then the matrix A that gives T(x) = Ax for every x ∈ Rn is given by the following formula:

A = [ T(e1)  T(e2)  · · ·  T(en) ].

The matrix A determined in this theorem is called the standard matrix of T .

Example 2.39 (Matrix transformation)

Find the standard matrix A of the matrix transformation T : R2 → R3 that satisfies

T( x1 + x2, x1 − x2 ) = ( 3x1 − x2, −x1, x2 )

for every x1, x2 ∈ R.

Solution By the above theorem, we need to compute [ T(e1)  T(e2) ]. To get e1, we can choose x1 = x2 = 1/2; and to get e2, we can choose x1 = 1/2, x2 = −1/2:

T(e1) = T( 1/2 + 1/2, 1/2 − 1/2 ) = ( 3·(1/2) − 1/2, −1/2, 1/2 ) = ( 1, −1/2, 1/2 ),

T(e2) = T( 1/2 + (−1/2), 1/2 − (−1/2) ) = ( 3·(1/2) − (−1/2), −1/2, −1/2 ) = ( 2, −1/2, −1/2 ).

So the standard matrix of T is

A = [  1     2   ]
    [ −1/2  −1/2 ]
    [  1/2  −1/2 ].  2


Chapter 3

Vector Spaces

It is rare that an object of our real world can be described by a single variable. We often need many variables, and sometimes even infinitely many.

Depending on the situation, sometimes it is preferable to think of individual variables, and sometimes it is more convenient to put many variables together and think of one big "multivariable". The advantage of the multivariable viewpoint is the old wisdom: whole > sum of parts.

3.1 Geometry of Euclidean vectors in R2, R3

An n-component multivariable x = (x1, x2, · · · , xn) is called an n-dimensional Euclidean vector. The components x1, x2, · · · , xn are called the coordinates of the vector x. The collection Rn of all n-dimensional Euclidean vectors is called the n-dimensional Euclidean space. We indicate that x is an n-dimensional Euclidean vector by writing x ∈ Rn.

We may visualize low dimensional Euclidean spaces as follows.

R0 : a single point;

R1 : a straight line with origin;

R2 : a plane with origin;

R3 : our living world with a choice of reference point as origin.

The origin corresponds to the zero vector 0 = (0, 0, · · · , 0). We may also imagine higher dimensional Euclidean spaces by analogy.

A Euclidean vector is represented either as a point in the Euclidean space or as an arrow from the origin to the point.

[Figure omitted: the vectors (−2, 1), (1, −1), (2, 2) in R2, drawn first as points and then as arrows from the origin 0.]

Figure 4.1: Vectors in R2


In the real world, one often has to deal with arrows not starting from the reference point (i.e., the origin). In linear algebra, we get around the problem by parallel translation of the arrows, so that their starting points become the origin. Throughout this course, any two arrows are considered the same vector if one can be moved to the other by parallel translation.

[Figure omitted: an arrow v with an arbitrary starting point, moved by parallel translation to start at the origin 0.]

Figure 4.2: Parallel translation to the origin

Because of this subtle difference between the theory and the real world, occasionally some adjustment is necessary when applying linear algebra to real world problems.

3.2 Vector Addition and Scalar Multiplication

Euclidean vectors of the same dimension may be added and scalar multiplied in the natural way:

(x1, x2, · · · , xn) + (y1, y2, · · · , yn) = (x1 + y1, x2 + y2, · · · , xn + yn),

c(x1, x2, · · · , xn) = (cx1, cx2, · · · , cxn).

The scalar multiplication is easily visualized as the stretching and shrinking of vectors. The addition is visualized with the help of parallelograms.

[Figure omitted: the scalar multiples −u, (1/2)u, u, 2u along a line through the origin, and the parallelogram formed by u, v and u + v.]

Figure 4.3: Addition and scalar multiplication

Figure 4.3 indicates that the operations on vectors have physical meaning in the real world. For example, suppose vectors x and y represent two forces applied to the same point of an object. Then the combined effect of the two forces on the object is a force represented by the vector x + y.

By repeatedly making use of the two operations, we have linear combinations of vectors.

Example 3.1 (Linear combination of Euclidean vectors)

Let x = (1, −2, −3, 5), y = (0, −1, 2, −4), z = (−1, 3, 1, −1).

Then −x + 2y − 3z = (2, −9, 4, −10), 2x + y − 2z = (4, −11, −6, 8).

2
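Arithmetic checks like the one in Example 3.1 are easy to mechanize; a minimal sketch in plain Python (the `comb` helper is ours):

```python
def comb(*terms):
    """Linear combination of equal-length tuples: comb((c1, v1), (c2, v2), ...)."""
    n = len(terms[0][1])
    return tuple(sum(c * v[i] for c, v in terms) for i in range(n))

x = (1, -2, -3, 5)
y = (0, -1, 2, -4)
z = (-1, 3, 1, -1)

print(comb((-1, x), (2, y), (-3, z)))  # (2, -9, 4, -10)
print(comb((2, x), (1, y), (-2, z)))   # (4, -11, -6, 8)
```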

The following figure indicates all the linear combinations of two vectors.


[Figure omitted: the plane of all linear combinations of u and v, marked with the points −u − v, −v, u − v, 2u − v, −u, 0, u, u − (1/2)v, 2u, −u + v, v, (1/2)u + (1/2)v, u + v, 2u + v, 2v, u + 2v.]

Figure 4.4: All linear combinations of two vectors

3.3 Linear Combination of Vectors

In linear algebra, especially when vectors are mixed with matrices, it is often very convenient to identify an n-dimensional Euclidean vector with an n × 1 matrix:

x = (x1, x2, · · · , xn) = [ x1 ]
                           [ x2 ]
                           [ ⋮  ]
                           [ xn ].

When written in such a way, x is called a column vector. The addition and scalar multiplication of column vectors are the same as the matrix operations. In particular, the operations on vectors have similar properties to the operations on matrices.

Occasionally, we need to identify an n-dimensional Euclidean vector with a 1 × n matrix:

x = [ x1  x2  · · ·  xn ].

When written in such a way, x is called a row vector. Again the operations on vectors are the same as the operations on matrices.

In this course, we will denote a vector in any one of the three ways, whichever is convenient. It is often clear from the context which way we use. The only rule you need to stick to is that in an expression involving both matrices and vectors, a vector is almost always a column vector.

Therefore, for

v1 = (a11, a21, a31, a41), v2 = (a12, a22, a32, a42), v3 = (a13, a23, a33, a43),

we write

[ v1  v2  v3 ] = [ a11  a12  a13 ]
                 [ a21  a22  a23 ]
                 [ a31  a32  a33 ]
                 [ a41  a42  a43 ]

and

[ v1 ]   [ a11  a21  a31  a41 ]
[ v2 ] = [ a12  a22  a32  a42 ]
[ v3 ]   [ a13  a23  a33  a43 ].


3.4 Express Solution in Vector Form

In Chapter 2, we treated the variables of a system of linear equations individually. Now we put all the variables together in a vector and treat them as one vector variable. By doing this, we can uncover the structure of the solutions of a system of linear equations.

Example 3.2 (General solution)

In Example 2.10 (page 50), the augmented matrix has been simplified to

[ A  b ] = [ 1 −1  3  3 ]
           [ 0  1 −2 −1 ]
           [ 0  0  0  0 ].

The general solution of the system is the vector

x = (x1, x2, x3) = (−x3 + 2, 2x3 − 1, x3) = (2, −1, 0) + x3 (−1, 2, 1),

where x3 can be any number. Geometrically, x3 (−1, 2, 1) represents a line through the origin in the direction (−1, 2, 1). If we shift the line by the vector (2, −1, 0), then we get the general solution of the system. Remark that a line through the origin is a so-called "1-dimensional linear subspace". Informally, a linear subspace is an "infinite flat part through the origin". 2

[Figure omitted: the line span (−1, 2, 1) through the origin in (x1, x2, x3)-space, shifted by the vector (2, −1, 0) to give the line of solutions.]

Figure 4.5: General solution = shift of line

3.5 Find General Solution from RREF

From the row echelon form of the augmented matrix above, we observe the correspondence between the pivot columns and the free variables of the system. In fact, the locations of the nonpivot columns of A tell which variables are actually free. The general solution can be obtained by writing the nonfree variables in terms of the free variables (x1, x2 in terms of x3 for this example).

2

Example 3.3 (General solution from reduced row echelon form)

Suppose the augmented matrix of a system of linear equations is reduced to the row echelon form

[ 0  ■  ∗  ∗  ∗  ∗  ∗ ]
[ 0  0  0  ■  ∗  ∗  ∗ ]
[ 0  0  0  0  ■  ∗  ∗ ]
[ 0  0  0  0  0  0  0 ]
[ 0  0  0  0  0  0  0 ],

where each ■ denotes a nonzero pivot entry and each ∗ denotes an arbitrary entry (the pivot positions match the reduced row echelon form given below).


Table 3.1: Pivot columns compared with free variables

Column      | 1   | 2   | 3
Pivot       | yes | yes | no
Variable    | x1  | x2  | x3
Free choice | no  | no  | yes

The reduced row echelon form is

[ 0  1  a1  0  0  a2  d1 ]
[ 0  0  0   1  0  a3  d2 ]
[ 0  0  0   0  1  a4  d3 ]
[ 0  0  0   0  0  0   0  ]
[ 0  0  0   0  0  0   0  ].

Interpreting the last matrix as a system of linear equations, we have

x2 +a1x3 +a2x6 = d1,

x4 +a3x6 = d2,

x5 +a4x6 = d3.

This gives the general solution

x2 = d1 − a1x3 − a2x6,

x4 = d2 − a3x6,

x5 = d3 − a4x6,

where x1, x3, x6 are arbitrary. In (column) vector notation, the general solution is

x = (x1, d1 − a1x3 − a2x6, x3, d2 − a3x6, d3 − a4x6, x6)
  = (0, d1, 0, d2, d3, 0) + x1 (1, 0, 0, 0, 0, 0) + x3 (0, −a1, 1, 0, 0, 0) + x6 (0, −a2, 0, −a3, −a4, 1).

The general solution is obtained by shifting a “3-dimensional linear subspace”.

We note the following correspondence.

Table 3.2: Pivot columns compared with free variables

Column      | 1   | 2   | 3   | 4   | 5   | 6
Pivot       | no  | yes | no  | yes | yes | no
Variable    | x1  | x2  | x3  | x4  | x5  | x6
Free choice | yes | no  | yes | no  | no  | yes

2
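The pivot/free correspondence of Example 3.3 can be read off programmatically. The sketch below instantiates its reduced row echelon form with assumed sample values a1 = 2, a2 = −1, a3 = 4, a4 = 3 and d1 = 1, d2 = 0, d3 = 5 (chosen for illustration only, not taken from the text):

```python
# Assumed concrete values for the symbolic entries (illustration only).
a1, a2, a3, a4 = 2, -1, 4, 3
d1, d2, d3 = 1, 0, 5

rref = [
    (0, 1, a1, 0, 0, a2, d1),
    (0, 0, 0, 1, 0, a3, d2),
    (0, 0, 0, 0, 1, a4, d3),
    (0, 0, 0, 0, 0, 0, 0),
    (0, 0, 0, 0, 0, 0, 0),
]

# Pivot columns: the column of the first nonzero entry of each nonzero row.
pivot_cols = {next(j for j, v in enumerate(row) if v != 0)
              for row in rref if any(row[:-1])}
free_cols = [j for j in range(6) if j not in pivot_cols]   # columns of A only
print(sorted(pivot_cols), free_cols)  # [1, 3, 4] [0, 2, 5]  (0-indexed)
```

With 0-indexing, pivot columns 1, 3, 4 and free columns 0, 2, 5 correspond exactly to columns 2, 4, 5 and free variables x1, x3, x6 in Table 3.2.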


3.6 Free Variables and Rank

From the examples above, we see that some variables can be chosen arbitrarily (free variables), and once these variables are chosen, the other variables are determined (nonfree variables). Moreover, we have the following correspondence:

pivot columns of A ⇐⇒ nonfree variables,
nonpivot columns of A ⇐⇒ free variables.    (3.1)

This further implies

number of nonfree variables = rank A,
number of free variables = n − number of nonfree variables = n − rank A.    (3.2)

Example 3.4 (Free variables and rank)

Suppose Ax = b is a consistent system of 4 linear equations in 7 variables. Since the rank of A is no bigger than 4 (i.e., no bigger than the number of rows of A), the number of free variables is at least 7 − 4 = 3. In other words, the solution can never be unique. For example, if the rank of A is 2, then the number of free variables in the general solution is 7 − 2 = 5. In either case the system has infinitely many solutions.

2

When expressed as a vector, the general solution is of the form

x = u0 + c1u1 + c2u2 + · · · + ckuk, arbitrary c1, c2, · · · , ck, (3.3)

where c1, c2, · · · , ck are simply the free variables. In particular, by (3.2), we have

k = number of free variables = n − rankA.

The general solution (3.3) has two parts:

x = xp + xh,

where

xp = u0, xh = c1u1 + c2u2 + · · · + ckuk.

Here xp is one particular solution of Ax = b because it comes from setting c1 = c2 = · · · = ck = 0 (all free variables vanish). On the other hand,

Axh = A(x − xp) = Ax − Axp = b − b = 0.


Therefore, xh is a solution of the so-called homogeneous equation

Ax = 0.

Since we are interested in knowing the general solution (all the solutions) of the above homogeneous equation, we introduce the following definition.

5 Definition The null space of a matrix A is

NulA = { x : Ax = 0 } = { all solutions of Ax = 0 }.

Note that if A is an m × n matrix, then NulA is in Rn.

The next theorem formalizes the phenomenon seen in Examples 3.2 and 3.3: the general solution of Ax = b (if solutions exist) is obtained by shifting NulA. In fact, the amount of shifting is given by any particular solution of Ax = b.

Theorem 3.6.1 Suppose xp is one particular solution of a system Ax = b of linear equations. Then the general solution of Ax = b is

x = xp + xh,   xh ∈ NulA.

We may think of xp as the existence part of the system Ax = b, because the assumption on xp means that Ax = b has solutions. We may think of xh as the uniqueness part of the system Ax = b, because it is the "freedom / variation" from your favorite solution xp (and the number k is the "degree of freedom"). Thus, if Ax = b has solutions, then

Ax = b has a unique solution
⇐⇒ no freedom / variation
⇐⇒ NulA = {0}
⇐⇒ Ax = 0 has only the trivial solution x = 0
⇐⇒ all columns of A are pivot columns.    (3.4)

In particular, the uniqueness is independent of the right side b.

Finally, we remark that in the real world, which variable is considered free depends on your viewpoint. Consider the equation x1 − x2 = 1 for example. If you write the solution as x1 = 1 + x2 for arbitrary x2, then you are taking x2 as the free variable. If you write the solution as x2 = −1 + x1 for arbitrary x1, then you are taking x1 as the free variable. In our examples above (and in subsequent examples), we usually write xi in terms of xj, j > i.


3.7 Vector Spaces and Subspaces

The algebraic properties of vector operations in Rn turn out to be quite general, in the sense that many algebraic systems share the same kind of structure. In other words, we can handle the objects in such an algebraic system as if they were vectors in Rn. This is the concept of a "vector space" or "linear space".

Vector Space. A vector space is a (non-empty) collection of algebraic objects in which we can perform "vector-like" operations among the objects. These "vector-like" operations have properties similar to the vector addition and scalar multiplication in Rn.

6 Definition A vector space consists of a non-empty collection V of algebraic objects, called vectors, and two operations defined on V, called vector addition and scalar multiplication, such that the following conditions hold for any u, v, w in V and any numbers c and d:

(i) The sum u + v is always in V (i.e., vector addition is closed in V ).

(ii) u + v = v + u.

(iii) (u + v) + w = u + (v + w).

(iv) There is a zero vector 0 in V such that u + 0 = u.

(v) Each u has a negative vector −u such that u + (−u) = 0.

(vi) The scalar multiple cu is always in V (i.e., scalar multiplication is closed in V ).

(vii) c(u + v) = cu + cv.

(viii) (c + d)u = cu + du.

(ix) c(du) = (cd)u.

(x) 1u = u.

Remark that the zero vector 0 is a notation only. Its actual form changes with the vector space.

Theorem 3.7.1 Let V be a vector space. Then

(1) the zero vector 0 is uniquely defined;

(2) for each vector u, its negative vector −u is uniquely defined;

(3) 0u = 0;

(4) c0 = 0;

(5) −u = (−1)u.

Proof. Since the rules for vector addition and scalar multiplication are not explicitly given (i.e., there is no way to compute with the vectors directly), we can only use the properties in the definition to verify the identities.


(1) Suppose that w is another “zero vector”, then

0 = 0 + w = w.

(2) Suppose that w is another “negative vector” of u, then

−u = −u + 0 = −u + (u + w) = (−u + u) + w = 0 + w = w.

(3) First, by the well-known property of number “0” and property (viii), we have

0u = (0 + 0)u = 0u + 0u.

Now, by adding the negative vector of 0u to both sides of the above equality, we have

−0u + 0u = −0u + (0u + 0u),

0 = (−0u + 0u) + 0u = 0 + 0u = 0u.

(4) Similar to (3), by the property of zero vector “0” and property (vii), we have

c0 = c(0 + 0) = c0 + c0.

When we add the negative vector of c0 to both sides, we obtain the result.

(5) Consider u + (−1)u:

u + (−1)u = 1u + (−1)u = (1 + (−1))u = 0u = 0.

2

Examples of vector spaces. Here we list some common examples of vector spaces. The verifications of the properties (i)–(x) are omitted, as they are just straightforward checks using the given "vector addition" and "scalar multiplication".

(1) V = {0}, called the zero vector space.

All the operations give 0 as a result. So the ten conditions are obviously satisfied.

(2) Rn with the component-wise vector addition and scalar multiplication defined above.

(3) Pn, the collection of all polynomials of degree at most n.

p(t) = a0 + a1t + a2t² + · · · + antⁿ.

The vector addition and scalar multiplication are the usual polynomial operations:

p(t) + g(t) = (a0 + b0) + (a1 + b1)t + (a2 + b2)t² + · · · + (an + bn)tⁿ,

cp(t) = (ca0) + (ca1)t + (ca2)t² + · · · + (can)tⁿ,

where g(t) = b0 + b1t + b2t² + · · · + bntⁿ.

The zero vector is the zero polynomial.

(4) P, the collection of all polynomials.

(5) The collection of all real-valued functions f : D → R defined on a set D. The vector addition of two functions (f + g) and the scalar multiplication of a function by a number (cf) are given by

(f + g)(x) := f(x) + g(x),

(cf)(x) := c · f(x)

for all x in D. The zero vector in this case is the zero function.

(6) The collection of all m × n matrices with matrix addition and scalar multiplication. The zero matrix Om×n is the zero vector of this vector space.


Vector Subspaces. In general, we are usually interested in a certain sub-collection of vectors in a vector space. But to maintain a "linear structure" among this sub-collection of vectors, we naturally require that this sub-collection of vectors forms a vector space. This is the concept of a vector subspace.

7 Definition A subspace H of a vector space V is a non-empty sub-collection of vectors in V such that H itself forms a vector space under the same vector addition and scalar multiplication induced from V.

Remark that the operations of a subspace must be the same as the original vector space.

The above is a conceptual definition. However, the conditions (ii), (iii), (vii), (viii), (ix), (x) for a vector space are automatically satisfied by vectors in H (since they are valid for all vectors in V). And because −u = (−1)u, the checking of condition (v) for H can be included in the checking of condition (vi) for H. Thus we only need to check conditions (i), (iv), (vi).

Therefore, we have another equivalent technical definition for subspace.

8 Definition (Alternative) A subset H of a vector space V is a subspace of V if

(a) the zero vector of V , 0, is in H.

(b) the sum of any two vectors in H is again a vector in H.

(c) the scalar multiple of a vector in H is again a vector in H.

In fact, the checkings of (b) and (c) can be combined into a single condition.

Theorem 3.7.2 A non-empty sub-collection H of vectors from a vector space V is a subspace of V if and only if:

For any u, v ∈ H and any numbers c, d, the vector cu + dv is always in H.

Proof. We check conditions (a), (b), (c) for “alternative definition”.

(a) Since H is non-empty, we may pick a vector u = v ∈ H. Choosing c = d = 0, the following vector will be in H by assumption:

0u + 0v = 0 + 0 = 0.

(b) Let u, v be any two vectors in H. Choosing c = d = 1, the following vector will be in H by assumption:

1u + 1v = u + v.


(c) Let u be a vector in H and let k be any number. Choosing v = u and c = k, d = 0, the following vector will be in H by assumption:

ku + 0v = ku + 0 = ku.

2

Examples of subspaces and non-subspaces.

(1) H = V is a subspace of V itself, of course. We will say that H is a proper subspace of V if H is a subspace of V and H ≠ V.

(2) {0}, called the zero subspace.

(3) Pn is a subspace of Pm if n ≤ m. Every Pn is a subspace of P.

(4) R2 is not a subspace of R3, because a vector in R2 has only two entries while a vector in R3 has three entries. Actually, R2 is not even a sub-collection of vectors in R3.

(5) The following subset of R3 is a subspace of R3:

H = { (s, t, 0) : s, t in R }.

Note that it is a plane (the xy-plane) in the xyz-space passing through the origin.

(6) The following subset is not a subspace of R3:

H = { (s, t, 1) : s, t in R },

because the zero vector is not in H. Note that it is a plane not passing through the origin.

Example 3.5 (Vector spaces)

Let u1 and u2 be two vectors in V. Then the following subset of V:

H = { k1u1 + k2u2 : k1, k2 ∈ R }

is a subspace of V.

Solution We check the conditions (a), (b), (c) for “alternative definition”.

(a) Choose k1 = k2 = 0; then H contains the vector 0u1 + 0u2 = 0.

(b) Let v1, v2 be two vectors in H. By the definition of H, there are suitable numbers k11, k12, k21, k22 such that

v1 = k11u1 + k12u2, v2 = k21u1 + k22u2.

Then the sum v1 + v2 can be expressed as

v1 + v2 = (k11 + k21)u1 + (k12 + k22)u2,

and hence it is also in H.

(c) Let v be a vector in H and let k be a number. By definition, there are numbers k1, k2 such that v = k1u1 + k2u2. Then

kv = k(k1u1 + k2u2) = (kk1)u1 + (kk2)u2,

and hence it is also in H.

This completes the checking of conditions (a), (b), (c). So, H is a subspace of V . 2


3.8 Column Space, Row Space and Null Space

In this and the next section, we study the system Ax = b of linear equations by partitioning A into columns and viewing x as many individual variables. Then the equation Ax = b becomes a relation between the column vectors of A.

Let us consider the system Ax = b of four equations and three variables in Section 2.1 (page 41) as an example. We think of A as consisting of three column vectors

A = [ v1  v2  v3 ],

where vi = (a1i, a2i, a3i, a4i) are the three column vectors. Then the left side of the equation becomes

Ax = [ v1  v2  v3 ] x = x1v1 + x2v2 + x3v3,   where x = (x1, x2, x3).

Therefore,

Ax = b has solutions ⇐⇒ x1v1 + x2v2 + x3v3 = b for some x1, x2, x3
                      ⇐⇒ b is a linear combination of the columns of A.    (3.5)

This leads to the following definition of span.

9 Definition The span of vectors v1, v2, · · · , vk in a Euclidean space Rn is the collection of all their linear combinations:

span (v1, v2, · · · , vk) = { c1v1 + c2v2 + · · · + ckvk : all numbers c1, c2, · · · , ck }.

Note that the span is always a subspace of Rn.

Note that in the definition, if we construct A = [ v1  v2  · · ·  vk ], then A is an n × k matrix. From the viewpoint of the matrix A, we may give different names to the span. In fact, we can think of either the column partition of A or the row partition of A. These respectively introduce the definitions of the so-called column space of A and row space of A.

10 Definition The column space of a matrix A, denoted ColA, is the span of the column vectors of A. The row space of a matrix A, denoted RowA, is the span of the row vectors of A.

Example 3.6 (Column space)

Find a matrix A such that W = ColA if

W = { (6a − b, a + b, −7a) : a, b ∈ R }.

Note that

(6a − b, a + b, −7a) = a (6, 1, −7) + b (−1, 1, 0)   =⇒   A = [  6 −1 ]
                                                              [  1  1 ]
                                                              [ −7  0 ].

2
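The identity W = ColA in Example 3.6 can be spot-checked by verifying that a typical element of W is the combination a·(column 1) + b·(column 2). A plain-Python sketch with exact arithmetic (helper names are ours):

```python
from fractions import Fraction

# Columns of the matrix A found in Example 3.6.
c1 = (6, 1, -7)
c2 = (-1, 1, 0)

def w(a, b):
    """A typical element of W."""
    return (6 * a - b, a + b, -7 * a)

# Every element of W is the combination a*c1 + b*c2, i.e. W = Col A.
for a, b in [(Fraction(1), Fraction(2)), (Fraction(-3, 2), Fraction(5))]:
    combo = tuple(a * x + b * y for x, y in zip(c1, c2))
    assert combo == w(a, b)
print("W = ColA for the matrix with columns (6, 1, -7) and (-1, 1, 0)")
```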

Note that if A is n × k, then ColA will be a subspace of Rn, and RowA will be a subspace of Rk. Moreover, (3.5) becomes

Ax = b has solutions ⇐⇒ b ∈ ColA.

Finally, we note the obvious fact that

RowA = ColAt.    (3.6)


3.9 Span of Vectors

• Span of one or two vectors

We build up intuitive feelings about the span from the simplest examples.

Example 3.7 (Span of one vector)

The span of one vector v is

span (v) = { cv : all numbers c }.

This is all the scalar multiples of v. From the following figure we see that

span (v) = the line in the direction of v, if v ≠ 0,
           the origin, if v = 0.

[Figure omitted: if v ≠ 0 then span (v) is a line through the origin; if v = 0 then span (v) = {0}, the origin.]

Figure 4.6: Span of one vector

2

Example 3.8 (Span of two vectors)

The span of two vectors v, w is

span (v, w) = { cv + dw : all numbers c and d }.

From the following figure we see that

span (v, w) = a plane, if v ∦ w,
              a line, if v ‖ w and not both 0,
              the origin, if v = w = 0,

where v ‖ w means one vector is a scalar multiple of the other (the two vectors are parallel).

[Figure omitted: the three cases — span (v, w) is a plane (v, w not parallel), a line (v, w parallel, not both zero), or {0} (v = w = 0).]

Figure 4.7: Span of two vectors


2

Thus we see that the span is all the places one can reach following the given directions.

We also see that a span could be a point, a line, a plane, etc. in the Euclidean space, containing the origin. These are 0-, 1-, 2-, etc. dimensional "linear subspaces".

• Determine span

As explained at the beginning of Section 3.8 (page 136), whether a point b is in the span of the column vectors of A is the same as whether the system of linear equations Ax = b has solutions. We give an illustrative example in the following.

Example 3.9 (Span vs. existence of solutions)

Since the vectors

v = (1, 2, 3),   w = (4, 5, 6)

are not parallel, their span is a plane in R3. We would like to decide whether the vector b = (1, 1, 1) is in this plane. By definition,

b ∈ span (v, w)

⇐⇒ b = (1, 1, 1) = cv + dw = (c + 4d, 2c + 5d, 3c + 6d) for some c and d

⇐⇒ the following system (with variables c, d) has solutions:

c + 4d = 1,
2c + 5d = 1,
3c + 6d = 1.

Thus we apply row operations to the augmented matrix

[ v  w  b ] = [ 1  4  1 ]
              [ 2  5  1 ]
              [ 3  6  1 ]

−→

[ 1  4  1 ]
[ 0 −3 −1 ]
[ 0  0  0 ].

Since the last column is not a pivot column, the system has solutions. We conclude that b ∈ span (v, w).

To actually find c and d, we need to continue the row operations to get a reduced row echelon form

[ v  w  b ] −→ [ 1  0  −1/3 ]
               [ 0  1   1/3 ]
               [ 0  0   0   ].

From this we find the solution c = −1/3 and d = 1/3. This means that b = (−1/3)v + (1/3)w.

More generally, we may consider an arbitrary b = (b1, b2, b3) and ask for the condition for b ∈ span (v, w). The problem is reduced to whether the system of linear equations in Example 2.21 (page 59) has solutions. From that example we conclude that

b ∈ span (v, w) ⇐⇒ b1 − 2b2 + b3 = 0.

2
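The coefficients c = −1/3 and d = 1/3 found in Example 3.9 are easy to confirm directly (plain Python sketch with exact fractions):

```python
from fractions import Fraction as F

v = (1, 2, 3)
w = (4, 5, 6)
b = (1, 1, 1)

c, d = F(-1, 3), F(1, 3)   # coefficients read off the reduced row echelon form
assert tuple(c * vi + d * wi for vi, wi in zip(v, w)) == b
print("b = (-1/3)v + (1/3)w, so b lies in span(v, w)")

# The general membership condition b1 - 2*b2 + b3 == 0 also holds for this b:
assert b[0] - 2 * b[1] + b[2] == 0
```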


Example 3.10 (Determine span by row operations)

We would like to determine whether the vectors b = (1, 2, 1) and c = (1, 0, −1) are in the span of

v1 = (1, 2, 3),  v2 = (2, 3, 4),  v3 = (3, 4, 5),  v4 = (4, 5, 6),  v5 = (5, 6, 7).

We thus do the following row operations

[v1 v2 v3 v4 v5 b] −→
  [ 1  2  3  4  5  1 ]
  [ 0 −1 −2 −3 −4  0 ]
  [ 0  0  0  0  0 −2 ]

and

[v1 v2 v3 v4 v5 c] −→
  [ 1  2  3  4  5  1 ]
  [ 0 −1 −2 −3 −4 −2 ]
  [ 0  0  0  0  0  0 ].

Therefore b is not in the span, while c is in the span.

In particular, the row operations above also give us the following row operations

[v1 v2 c] −→
  [ 1  2  1 ]
  [ 0 −1 −2 ]
  [ 0  0  0 ].

Therefore the vector c also lies in the plane spanned by v1 and v2.

On the other hand, c is not in the straight line spanned by either v1 or v2, because c is clearly not parallel to either vector.

Finally, as in the last example, we may continue the row operations to get a reduced row echelon form

[v1 v2 v3 v4 v5 c] −→
  [ 1 0 −1 −2 −3 −3 ]
  [ 0 1  2  3  4  2 ]
  [ 0 0  0  0  0  0 ].

There are many solutions to the corresponding system of linear equations. By picking the obvious solution

x1 = −3, x2 = 2, x3 = x4 = x5 = 0,

we find that

c = −3v1 + 2v2.

□
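The two span tests of Example 3.10 amount to a rank comparison: a vector lies in the column span exactly when appending it does not raise the rank. A hedged SymPy sketch (the helper name in_span is mine, not the guide's):

```python
from sympy import Matrix

# The five spanning vectors of Example 3.10
v1, v2, v3 = Matrix([1, 2, 3]), Matrix([2, 3, 4]), Matrix([3, 4, 5])
v4, v5 = Matrix([4, 5, 6]), Matrix([5, 6, 7])
A = Matrix.hstack(v1, v2, v3, v4, v5)

def in_span(A, target):
    # target is in the span of the columns of A iff augmenting by it
    # does not raise the rank (i.e. its column is not pivotal)
    return Matrix.hstack(A, target).rank() == A.rank()

print(in_span(A, Matrix([1, 2, 1])))    # False: b is not in the span
print(in_span(A, Matrix([1, 0, -1])))   # True:  c is in the span
```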


3. Vector Spaces

Example 3.11 (Minimal spanning vectors)

Consider the following five vectors

v1 = (2, −2, 4, −2), v2 = (−3, 3, −6, 3), v3 = (6, −3, 9, 3), v4 = (2, −3, 5, −4), v5 = (5, −4, 9, 1).

To find the condition for b = (b1, b2, b3, b4) ∈ span (v1, v2, v3, v4, v5), we do the following row operations

[v1 v2 v3 v4 v5 b] −→
  [ 2 −3 6  2 5 | b1 ]
  [ 0  0 3 −1 1 | b1 + b2 ]
  [ 0  0 0  1 3 | −2b1 − 3b2 + b4 ]
  [ 0  0 0  0 0 | −b1 + b2 + b3 ].

Thus we see the condition is −b1 + b2 + b3 = 0.

If we pick only the vectors in pivot columns, then the same row operations give

[v1 v3 v4 b] −→
  [ 2 6  2 | b1 ]
  [ 0 3 −1 | b1 + b2 ]
  [ 0 0  1 | −2b1 − 3b2 + b4 ]
  [ 0 0  0 | −b1 + b2 + b3 ].

Therefore the condition for b ∈ span (v1,v3,v4) is still

−b1 + b2 + b3 = 0.

In particular, we have span (v1, v2, v3, v4, v5) = span (v1, v3, v4).

If we take away one more vector, say v3, then the same row operations give

[v1 v4 b] −→
  [ 2  2 | b1 ]
  [ 0 −1 | b1 + b2 ]
  [ 0  1 | −2b1 − 3b2 + b4 ]
  [ 0  0 | −b1 + b2 + b3 ]
−→
  [ 2  2 | b1 ]
  [ 0 −1 | b1 + b2 ]
  [ 0  0 | −b1 − 2b2 + b4 ]
  [ 0  0 | −b1 + b2 + b3 ].

Therefore the condition for b ∈ span (v1,v4) is

−b1 − 2b2 + b4 = 0 and −b1 + b2 + b3 = 0.

In particular, we have span (v1, v2, v3, v4, v5) ≠ span (v1, v4).

For a similar reason, both span (v1, v3) and span (v3, v4) are not equal to span (v1, v2, v3, v4, v5). □

The example above indicates the following fact: From given vectors, the “pivot collection” of vectors (determined after finding a row echelon form) has the same span. Moreover, if we further reduce the collection, even by deleting one vector, then we can no longer get the same span. Therefore the “pivot collection” is a minimal collection of spanning vectors from the given vectors.
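The “pivot collection” can be found mechanically: SymPy's rref() reports the pivot columns directly. A sketch using the five vectors of Example 3.11 (tool choice is mine):

```python
from sympy import Matrix

# The five vectors of Example 3.11
vs = [Matrix([2, -2, 4, -2]), Matrix([-3, 3, -6, 3]), Matrix([6, -3, 9, 3]),
      Matrix([2, -3, 5, -4]), Matrix([5, -4, 9, 1])]
A = Matrix.hstack(*vs)

# rref() reports the pivot column indices; those columns are the "pivot collection"
_, pivots = A.rref()
minimal = [vs[j] for j in pivots]

print(pivots)   # (0, 2, 3): the pivot collection is v1, v3, v4
# Same span: dropping the non-pivot columns does not change the rank
print(Matrix.hstack(*minimal).rank() == A.rank())   # True
```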


3.10 Linear Independence

Let A = [v1 v2 · · · vk]. The existence of solutions for Ax = b has been related to the span, or more specifically the column space of A. Assuming the existence, (3.4) tells us the uniqueness of the solution is equivalent to Ax = 0 =⇒ x = 0. By Ax = x1v1 + x2v2 + · · · + xkvk, the criterion becomes

x1v1 + x2v2 + · · · + xkvk = 0 =⇒ x1 = x2 = · · · = xk = 0.

This leads to the definition of linear independence of Euclidean vectors.

11 Definition If vectors v1, v2, · · · , vk in a Euclidean space Rn satisfy

c1v1 + c2v2 + · · · + ckvk = 0 =⇒ c1 = c2 = · · · = ck = 0,

then we call the vectors linearly independent.

• Independence of one or two vectors

We build up intuitive feelings about the linear independence from the simplest examples.

Example 3.12 (Linear independence of one vector)

The linear independence of one vector v means

cv = 0 =⇒ c = 0.

This is true ⇐⇒ v ≠ 0. Therefore,

v is linearly independent ⇐⇒ v ≠ 0.

□

Example 3.13 (Linear independence of two vectors)

The linear independence of two vectors v, w means

cv + dw = 0 =⇒ c = d = 0.

To see what this means, we consider the opposite (linear dependence, see definition on page 142)

cv + dw = 0 for some numbers c, d, not all zero.

If c ≠ 0, then v = (−d/c)w is a scalar multiple of w. If d ≠ 0, then w = (−c/d)v is a scalar multiple of v. Either case means v ‖ w (parallel). Therefore,

v and w are linearly independent ⇐⇒ v ∦ w.

□

Figure 4.8: (In)Dependence of two vectors


• Linear dependence

Example 3.13 shows the possibility of gaining insight into linear independence through the opposite case. We first give a formal definition of linear dependence of Euclidean vectors.

12 Definition If we can find numbers c1, c2, · · · , ck, not all zero, such that

c1v1 + c2v2 + · · · + ckvk = 0,

then we call the vectors v1, v2, · · · , vk ∈ Rn linearly dependent.

Let us assume c1 ≠ 0 in the definition. Then from

c1v1 + c2v2 + · · · + ckvk = 0, c1 ≠ 0,

we get

v1 = (−c2/c1)v2 + · · · + (−ck/c1)vk.

This leads to

several vectors are linearly dependent ⇐⇒ one vector is a linear combination of the others.

The following figure illustrates the geometric interpretation of linear (in)dependence of 3 vectors.

Figure 4.9: (In)Dependence of three vectors

As an example, in the left figure above we have v1, v2, v3 spanning a plane S. In fact, the vectors v2 and v3 already give us enough directions to produce the plane:

span (v2, v3) = S.

Moreover, v1 = cv2 + dv3 ∈ S shows that v1 does not contribute any “new direction” to produce S. In other words, v1 is a “wasted direction”. The phenomenon we observe here is indeed a general one. For several vectors, we have the correspondence

linearly dependent ⇐⇒ some direction wasted;

linearly independent ⇐⇒ no wasted direction.


• Determine independence

The method for determining linear independence is based on the following, in which A = [v1 v2 · · · vk]:

v1, v2, · · · , vk are linearly independent
⇐⇒ Ax = 0 has only the trivial solution x = 0
⇐⇒ Ax = b has a unique solution, provided that the system has solutions   (by (3.4))
⇐⇒ all columns in the row echelon form of A are pivot   (by Thm 2.7.1)
⇐⇒ rank A = k.   (by (1.4))      (3.7)
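Criterion (3.7) translates directly into a rank computation. A small SymPy sketch (the function name is my own, and the tool choice is mine):

```python
from sympy import Matrix

def linearly_independent(*vectors):
    # Criterion (3.7): v1, ..., vk are independent iff rank [v1 ... vk] = k
    return Matrix.hstack(*vectors).rank() == len(vectors)

print(linearly_independent(Matrix([1, 3, 0]), Matrix([-1, 1, 1])))   # True
print(linearly_independent(Matrix([1, 2, 3]), Matrix([2, 4, 6])))    # False (parallel)
```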

Example 3.14 (Determine linear independence by row operations)

Determine whether

v1 = (1, 3, 0), v2 = (−1, 1, 1), v3 = (3, 1, −2)

are linearly dependent or not.

Solution We do row operations

[v1 v2 v3] =
  [ 1 −1  3 ]
  [ 3  1  1 ]
  [ 0  1 −2 ]
−→
  [ 1 −1  3 ]
  [ 0  1 −2 ]
  [ 0  0  0 ].      (3.8)

Since the last column is not pivot, the three vectors are linearly dependent. Moreover, (3.8) also gives

[v1 v2] −→
  [ 1 −1 ]
  [ 0  1 ]
  [ 0  0 ]

in which all columns are pivot. Therefore,

v1, v2 are linearly independent,

and v3 should be considered as “wasted”. This suggests

v3 is a linear combination of v1, v2.

To actually express v3 as a linear combination of v1, v2, we may further do row operations on [v1 v2 v3]. Thus we continue (3.8) and find the reduced row echelon form

[v1 v2 v3] =
  [ 1 −1  3 ]
  [ 3  1  1 ]
  [ 0  1 −2 ]
−→
  [ 1 0  1 ]
  [ 0 1 −2 ]
  [ 0 0  0 ].


From the reduced row echelon form, we conclude that

v3 = v1 − 2v2.

□
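Example 3.14 can be replayed in SymPy; the reduced row echelon form exposes both the dependence and the coefficients in v3 = v1 − 2v2 (tool choice is mine):

```python
from sympy import Matrix

# Vectors of Example 3.14
v1, v2, v3 = Matrix([1, 3, 0]), Matrix([-1, 1, 1]), Matrix([3, 1, -2])

R, pivots = Matrix.hstack(v1, v2, v3).rref()
print(pivots)              # (0, 1): only v1, v2 are pivotal, so v3 is "wasted"
print(R[0, 2], R[1, 2])    # 1 -2: the coefficients in v3 = v1 - 2*v2
assert v3 == v1 - 2*v2
```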

Example 3.15 (Maximal independent vectors)

Show that the following vectors are linearly dependent. Then find the maximal number of linearly independent vectors from the five.

v1 = (2, −2, 4, −2), v2 = (−3, 3, −6, 3), v3 = (6, −3, 9, 3), v4 = (2, −3, 5, −4), v5 = (5, −4, 9, 1).

Solution We do row operations

[v1 v2 v3 v4 v5] −→
  [ 2 −3 6  2 5 ]
  [ 0  0 3 −1 1 ]
  [ 0  0 0  1 3 ]
  [ 0  0 0  0 0 ]

which shows that the five vectors are linearly dependent. If we pick only the vectors of the pivot columns, then the same row operations give

[v1 v3 v4] −→
  [ 2 6  2 ]
  [ 0 3 −1 ]
  [ 0 0  1 ]
  [ 0 0  0 ]

in which all columns are pivot.

Therefore, v1, v3, v4 are linearly independent.

If we try to make the collection a little bigger, say v1, v2, v3, v4, then we have row operations

[v1 v2 v3 v4] −→
  [ 2 −3 6  2 ]
  [ 0  0 3 −1 ]
  [ 0  0 0  1 ]
  [ 0  0 0  0 ]

which shows that v1, v2, v3, v4 are linearly dependent.

For a similar reason, v1, v3, v4, v5 are linearly dependent.

□

The two examples above indicate the following fact: From given vectors, the “pivot collection” of vectors (determined after finding a row echelon form) is linearly independent. Moreover, if we enlarge the collection, even by adding one vector, then we get linear dependence. Therefore, the “pivot collection” is a set of maximal independent vectors from the collection.


3.11 Basis of Euclidean Spaces (Rn)

Our living world is three dimensional because it can be described in exactly three directions. Take the right, front, and up directions for example. Following the three directions (and the opposite directions), we can go to anywhere in the world. Mathematically this means that the three directions span R3. On the other hand, all three directions are necessary. Following any two will only get us anywhere on a plane. Mathematically this means that the three directions are linearly independent.

13 Definition The vectors v1,v2, · · · ,vk ∈ Rn form a basis of Rn, if the following are satisfied.

1. v1, v2, · · · , vk span Rn, i.e., any vector of Rn is in the span of v1, v2, · · · , vk.

2. v1, v2, · · · , vk are linearly independent.

Example 3.16 (Standard basis of R3)

Show that the following vectors

e1 = (1, 0, 0), e2 = (0, 1, 0), e3 = (0, 0, 1)

form a basis of R3.

Solution e1, e2, e3 span R3 because any vector

(x1, x2, x3) = x1e1 + x2e2 + x3e3 (3.9)

is a linear combination of the three vectors. Moreover, e1, e2, e3 are linearly independent because

x1e1 + x2e2 + x3e3 = 0 =⇒ (by (3.9)) (x1, x2, x3) = 0 = (0, 0, 0) =⇒ x1 = x2 = x3 = 0.

□

In general, Rn has the following basis

e1 = (1, 0, . . . , 0), e2 = (0, 1, . . . , 0), · · · , en = (0, 0, . . . , 1),

which is called the standard basis of Rn.

Example 3.17 (Determine basis of R3)

Check whether the following vectors

v1 = (1, 2, 3), v2 = (1, 3, 8), v3 = (1, 2, 2)

form a basis of R3.


Solution To see whether the vectors form a basis of R3, we check the two properties. First, for any b = (b1, b2, b3), we have the row operations

[v1 v2 v3 b] =
  [ 1 1 1 | b1 ]
  [ 2 3 2 | b2 ]
  [ 3 8 2 | b3 ]
−→
  [ 1 1  1 | b1 ]
  [ 0 1  0 | −2b1 + b2 ]
  [ 0 0 −1 | 7b1 − 5b2 + b3 ].      (3.10)

Then we see that any vector b is in the span of v1, v2, v3.

Secondly, if we ignore the last column in (3.10), then the same row operations tell us that all columns in [v1 v2 v3] are pivot columns. Therefore the vectors

v1, v2, v3 are linearly independent.

Thus we conclude that v1, v2, v3 form a basis of R3.

□

The example above can be easily generalized to the following fact: If we have the vectors v1, v2, v3 in R3 such that row operations give us

[v1 v2 v3] −→
  [ • ∗ ∗ ]
  [ 0 • ∗ ]
  [ 0 0 • ]

(each • a nonzero pivot entry, ∗ arbitrary), then v1, v2, v3 form a basis of R3.

In fact, we have the following general theorem for the basis of Rn.

Theorem 3.11.1 A basis of Rn must contain n vectors. Moreover, for n vectors

v1, v2, · · · ,vn ∈ Rn

and the corresponding (square) matrix

A = [v1 v2 · · · vn],

the following are equivalent:

1. v1, v2, · · · , vn is a basis of Rn.

2. v1, v2, · · · , vn span Rn.

3. v1, v2, · · · , vn are linearly independent.

4. A is invertible.

5. Ax = b has a unique solution for all b.

6. Ax = b has solutions for all b.

7. Ax = 0 has only the trivial solution x = 0.

8. All columns of A are pivot.


Note that (3.5) and (3.7) tell us items 1, 2, 3 and 5, 6, 7 are equivalent statements under different viewpoints (vectors vs. systems of equations).

Item 4 is the basis from the viewpoint of matrices. Item 8 provides the practical method for determining whether given vectors form a basis. We also recall that item 7 means Ax = b has a unique solution, provided that the system has solutions (see (3.4)).

Also note that item 1 is item 2 plus item 3, and item 5 is item 6 plus item 7. Alternatively, items 2 and 3 are the two aspects of item 1, and items 6 and 7 are the two aspects of item 5.

In general, there is no reason that one aspect of an object/concept/property should imply the other aspect(s). However, the theorem points out the following important principle in linear algebra: If the numbers match (n vectors in Rn, a square matrix, or number of equations = number of variables), then one aspect implies the other aspect and subsequently implies everything.
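For square matrices, item 4 of Theorem 3.11.1 gives the quickest computational test: n vectors in Rn form a basis exactly when their matrix is invertible. A SymPy sketch (the helper name is mine; the guide prescribes no software):

```python
from sympy import Matrix

def is_basis_of_Rn(*vectors):
    # Theorem 3.11.1: n vectors in R^n form a basis iff the square matrix
    # having them as columns is invertible (equivalently, nonzero determinant)
    A = Matrix.hstack(*vectors)
    return A.rows == A.cols and A.det() != 0

# Example 3.17's vectors do form a basis of R^3 ...
print(is_basis_of_Rn(Matrix([1, 2, 3]), Matrix([1, 3, 8]), Matrix([1, 2, 2])))  # True
# ... while three dependent vectors do not
print(is_basis_of_Rn(Matrix([1, 2, 3]), Matrix([4, 5, 6]), Matrix([5, 7, 9])))  # False
```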

Example 3.18 (Basis of R2)

Since the vectors (1, 3) and (6, −11) are not parallel, they must be linearly independent. Therefore by Theorem 3.11.1, these two linearly independent vectors must be a basis of R2. In general, it follows from the theorem that a basis of R2 is the same as a pair of nonparallel vectors in R2. □

To check whether some explicitly given vectors form a basis of Rn, the first thing to check is whether the number of vectors is n. As a matter of fact, a basis of Rn must contain exactly n vectors; more or fewer vectors are not permissible.

Example 3.19 (Non-basis of Rn)

The vectors (1, 3, 5), (−2, 4, 1) cannot be a basis of R3 because any basis of R3 must contain exactly three vectors. For a similar reason, the vectors (1, 0, 3, 5), (2, −1, 4, 6), (3, 1, −2, −3), (1, −1, −4, 2), (3, −2, 5, −4) cannot be a basis of R4. □

When the number is correct, we then need to further apply row operations to find out whether the vectors are linearly dependent or not.

Example 3.20 (Determine basis of Rn by row operations)

To determine whether

v1 = (1, 2, 1, 2), v2 = (2, 1, 2, 1), v3 = (1, 0, 2, 0), v4 = (0, 2, 0, 1)

form a basis of R4, we first observe that the number is correct and then carry out the row operations

[v1 v2 v3 v4] −→
  [ 1  2  1 0 ]
  [ 0 −3 −2 2 ]
  [ 0  0  1 0 ]
  [ 0  0  0 1 ].


This tells us that v1, v2, v3, v4 are linearly independent.

By Theorem 3.11.1, we conclude that the vectors v1, v2, v3, v4 form a basis of R4.

If we change v4 to v5 = (0, 1, 0, 1), then we have the row operations

[v1 v2 v3 v5] −→
  [ 1  2  1 0 ]
  [ 0 −3 −2 1 ]
  [ 0  0  1 0 ]
  [ 0  0  0 0 ].

This shows that v1, v2, v3, v5 are linearly dependent, so that they cannot be a basis of R4. □
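Both cases of Example 3.20 can be confirmed by a rank computation, since four vectors in R4 form a basis exactly when their matrix has rank 4 (a SymPy sketch; tool choice is mine):

```python
from sympy import Matrix

v1, v2, v3 = Matrix([1, 2, 1, 2]), Matrix([2, 1, 2, 1]), Matrix([1, 0, 2, 0])
v4, v5 = Matrix([0, 2, 0, 1]), Matrix([0, 1, 0, 1])

# Four vectors in R^4 form a basis exactly when their matrix has rank 4
print(Matrix.hstack(v1, v2, v3, v4).rank())   # 4 -> a basis of R^4
print(Matrix.hstack(v1, v2, v3, v5).rank())   # 3 -> dependent, not a basis
```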

The following two examples show the relation between basis and other concepts.

Example 3.21 (Basis vs. invertibility)

From Example 1.13 (page 10), the invertibility of A implies that (1, 2, 3), (1, 3, 8), (1, 2, 2) form a basis of R3. Since At and A−1 are also invertible, we see that

(1, 1, 1), (2, 3, 2), (3, 8, 2) and (10,−2,−7), (−6, 1, 5), (1, 0,−1) are another two bases of R3.

□

Example 3.22 (Condition for basis)

Consider the row operations in Example 2.21 (page 59). If b1 − 2b2 + b3 ≠ 0, then all the columns in the 3 × 3 matrix are pivot, so that the columns form a basis of R3. For example,

(1, 2, 3), (4, 5, 6), (1, 0, 0) is a basis of R3.

By the same reason, the computation in Example 2.22 (page 59) tells us that

(1, 1, 1), (1, 0, a), (−1, −a, 0) is a basis of R3 exactly when a ≠ 0, 2.

□

Theorem 3.11.1 (page 146) illustrates a major theme in the theory of linear algebra. One may often look at a phenomenon from different angles. Computationally, everything so far is done through row operations. However, new viewpoints lead us to new concepts and new insights.

The theorem may be proved by looking at the shape of the row echelon forms. The details are omitted.


3.12 Basis of Subspaces (Column Space, Row Space, Null Space)


Chapter 3

Vector Spaces (True or False)

3.1 If v1, v2, · · · , vn and w1, w2, · · · , wn are linearly independent, then v1 + w1, v2 + w2, · · · , vn + wn is linearly independent.

3.2 If v1, v2, · · · , vn and w1, w2, · · · , wn are linearly dependent, then v1 + w1, v2 + w2, · · · , vn + wn is linearly dependent.

3.3 If v1, v2, · · · , vn is linearly independent, and vn, vn+1, · · · , vn+k is linearly independent, then v1, v2, · · · , vn+k is linearly independent.

3.4 If v1,v2, · · · ,vn+k is linearly independent, then v1,v2, · · · ,vn is linearly independent.

3.5 If v1,v2, · · · ,vn+k is linearly dependent, then v1,v2, · · · ,vn is linearly dependent.

3.6 If any two of v1, v2, v3 are linearly independent, then the three vectors v1, v2, v3 are linearly independent.

3.7 If the three vectors v1, v2, v3 are linearly independent, then any two of v1, v2, v3 are linearly independent.

3.8 Let S be a set of k vectors in Rn. If k < n, then S is linearly independent.

3.9 If rank Am×n = m, then the columns of A are linearly independent.


3.10 If rank Am×n = n, then the columns of A are linearly independent.

3.11 If rank Am×n = m, then the rows of A are linearly independent.

3.12 If rank Am×n = n, then the rows of A are linearly independent.

3.13 The columns of a matrix A are linearly independent if the equation Ax = 0 has the trivial solution.

3.14 The columns of any 4 × 5 matrix are linearly independent.

3.15 If rank Am×n = m, then the columns of A span Rm.

3.16 If rank Am×n = n, then the columns of A span Rm.

3.17 If b is in the span of v1,v2, · · · ,vn+1, then b is in the span of v1,v2, · · · ,vn.

3.18 If b is in the span of v1,v2, · · · ,vn, then b is in the span of v1,v2, · · · ,vn+1.

3.19 If v1, v2, · · · , vn are in Rm, then span (v1, v2, · · · , vn) is the same as the column space of the matrix [v1 v2 · · · vn].

3.20 The row space of A is the same as the column space of At.

3.21 If v1, v2, · · · , vn span Rn, then v1,v2, · · · ,vn is linearly independent.

3.22 If v1, v2, · · · , vn span Rn, then v1,v2, · · · ,vn is a basis of Rn.

3.23 If columns of A5×3 are linearly independent, then rows of A span R3.

3.24 If v1, v2, · · · , vn span V, then dim V ≤ n.


3.25 If v1, v2, · · · , vn span V, then v1, v2, · · · , vn, vn+1, · · · , vn+k span V.

3.26 If v1, v2, · · · , vn do not span V, then v1, v2, · · · , vn, vn+1 do not span V.

3.27 If v1, v2, · · · , vn span V, then v1,v2, · · · ,vn is a basis of V.

3.28 If v1, v2, · · · , vn is a basis of Rn and An×n is invertible, then Av1, Av2, · · · , Avn is also a basis of Rn.

3.29 The functions 1, sin t, cos t are linearly independent.

3.30 The functions 1, sin2 t, cos2 t are linearly independent.

3.31 A homogeneous equation is always consistent.

3.32 If x is a nontrivial solution of Ax = 0, then every entry in x is nonzero.

3.33 The set of all solutions of a system of m homogeneous equations in n unknowns is a subspace of Rm.

3.34 The columns of an invertible n × n matrix form a basis for Rn.

3.35 If rankA7×5 = 5, then the dimension of the null space of A is 0.

3.36 The dimension of NulA is the number of variables in the equation Ax = b.

3.37 The dimension of ColA is the number of pivot columns of A.

3.38 The dimensions of ColA and NulA add up to the number of columns of A.

3.39 If Ax = 0 has only trivial solution, then the columns of A form a basis of ColA.


3.40 If Ax = 0 has only trivial solution, then the rank of A is the number of columns of A.

3.41 If Ax = 0 has only trivial solution, then the rank of A is the number of rows of A.

3.42 If the columns of an m × n matrix A span Rm, then the equation Ax = b is consistent for each b in Rm.

3.43 If A is an m × n matrix whose columns do not span Rm, then the equation Ax = b is inconsistent for some b in Rm.

3.44 If A is an m × n matrix and if the equation Ax = b is inconsistent for some b in Rm, then A cannot have a pivot position in every row.

3.45 If Ax = b has solutions for all b, then the columns of A form a basis of ColA.

3.46 If B is a row echelon form of a matrix A, then the pivot columns of B form a basis for ColA.

3.47 If Ax = b has solutions for all b, then the rank of A is the number of columns of A.

3.48 If Ax = b has solutions for all b, then the rank of A is the number of rows of A.

3.49 If [A b] is an invertible 4 × 4 matrix, then b is in the column space of A.

3.50 If [A b] is a 3 × 5 matrix with rank A = 3, then b is in the column space of A.

3.51 If [A b] is a 4 × 4 matrix with rank [A b] = rank A, then b is in the column space of A.

3.52 If the vectors v1, v2, v3, v4 in R4 are linearly dependent, then the 4 × 4 matrix A = [v1 v2 v3 v4] is not invertible.

3.53 If the 4 × 4 matrix A = [v1 v2 v3 v4] is not invertible, then the vectors v1, v2, v3, v4 in R4 are linearly dependent.


3.54 If A is a 3 × 3 matrix with rank A = 3, then Col A = R3.

3.55 If A is a 3 × 3 matrix with rank A = 3, then Row A = R3.

3.56 If v1 and v2 are in R4 and v1 is not a scalar multiple of v2, then v1,v2 is linearly independent.

3.57 If v1, v2, v3, v4 are vectors in R4 and v1, v2, v3 is linearly independent, then v1, v2, v3, v4 is linearly independent.

3.58 If v1, v2, v3, v4 are linearly independent vectors in R4, then v1,v2,v3 is linearly independent.

3.59 If V has a basis of n vectors, then every basis of V must consist of exactly n vectors.

3.60 The span of any vector in R3 is a line.

3.61 The span of any two nonzero vectors in R3 is a plane.


Chapter 3

Vector Spaces (Worked Examples)

Example 3.1 (Linear combination of vectors)

Consider the vectors v1 = (1, 2, −1), v2 = (2, 3, −1), v3 = (3, 1, 1). Is b = (2, 7, −4) a linear combination of v1, v2, v3?

Solution By a linear combination, we mean

c1v1 + c2v2 + c3v3 = b for some c1, c2, c3

⇐⇒ c1 (1, 2, −1) + c2 (2, 3, −1) + c3 (3, 1, 1) = (2, 7, −4)

⇐⇒ the following system has solutions:
  c1 + 2c2 + 3c3 = 2,
  2c1 + 3c2 + c3 = 7,
  −c1 − c2 + c3 = −4.

Therefore we need to check whether the above system of linear equations has solutions for (c1, c2, c3) or not. To do this, we simplify the augmented matrix as follows:

  [  1  2 3  2 ]
  [  2  3 1  7 ]
  [ −1 −1 1 −4 ]
−→ (−2R1+R2, R1+R3)
  [ 1  2  3  2 ]
  [ 0 −1 −5  3 ]
  [ 0  1  4 −2 ]
−→ (2R2+R1; R2+R3, −R2)
  [ 1 0 −7  8 ]
  [ 0 1  5 −3 ]
  [ 0 0 −1  1 ]
−→ (−7R3+R1; 5R3+R2, −R3)
  [ 1 0 0  1 ]
  [ 0 1 0  2 ]
  [ 0 0 1 −1 ].

Since the b-column is nonpivot, the system has solutions for (c1, c2, c3). In fact, the reduced row echelon form implies that we have

b = (1) · v1 + (2) · v2 + (−1) · v3 = v1 + 2v2 − v3.

Remark. In fact we may verify that the above expression of b in terms of v1, v2, v3 is correct:

v1 + 2v2 − v3 = (1, 2, −1) + 2 (2, 3, −1) − (3, 1, 1) = (1 + 4 − 3, 2 + 6 − 1, −1 − 2 − 1) = (2, 7, −4) = b.


□
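Since the coefficient matrix in Example 3.1 turns out to be invertible, the coefficients can also be obtained by solving the square system directly. A SymPy sketch (tool choice is mine; the guide prescribes no software):

```python
from sympy import Matrix

v1, v2, v3 = Matrix([1, 2, -1]), Matrix([2, 3, -1]), Matrix([3, 1, 1])
b = Matrix([2, 7, -4])

A = Matrix.hstack(v1, v2, v3)
c = A.solve(b)     # valid here because A is invertible
print(list(c))     # [1, 2, -1]:  b = v1 + 2*v2 - v3
assert v1 + 2*v2 - v3 == b
```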

Example 3.2 (Linear combination of vectors)

Consider the vectors v1 = (1, 2, −1), v2 = (2, 3, −1), v3 = (3, 1, 2). Is b = (2, 7, −4) a linear combination of v1, v2, v3?

Solution By a linear combination, we mean

c1v1 + c2v2 + c3v3 = b for some c1, c2, c3

⇐⇒ c1 (1, 2, −1) + c2 (2, 3, −1) + c3 (3, 1, 2) = (2, 7, −4)

⇐⇒ the following system has solutions:
  c1 + 2c2 + 3c3 = 2,
  2c1 + 3c2 + c3 = 7,
  −c1 − c2 + 2c3 = −4.

Therefore we need to check whether the above system of linear equations has solutions for (c1, c2, c3) or not. To do this, we simplify the augmented matrix as follows:

  [  1  2 3  2 ]
  [  2  3 1  7 ]
  [ −1 −1 2 −4 ]
−→ (−2R1+R2, R1+R3)
  [ 1  2  3  2 ]
  [ 0 −1 −5  3 ]
  [ 0  1  5 −2 ]
−→ (R2+R3)
  [ 1  2  3  2 ]
  [ 0 −1 −5  3 ]
  [ 0  0  0  1 ].

Since the b-column is pivot, the system has no solution for (c1, c2, c3). Thus, b = (2, 7, −4) is not a linear combination of v1, v2, v3. In other words, b ≠ c1v1 + c2v2 + c3v3 for any c1, c2, c3. □

Example 3.3 (Linear combination of vectors)

Consider the vectors v1 = (1, 2, −1), v2 = (2, 3, −1), v3 = (3, 1, 2). Is b = (2, 7, −5) a linear combination of v1, v2, v3?

Solution By a linear combination, we mean

c1v1 + c2v2 + c3v3 = b for some c1, c2, c3

⇐⇒ c1 (1, 2, −1) + c2 (2, 3, −1) + c3 (3, 1, 2) = (2, 7, −5)

⇐⇒ the following system has solutions:
  c1 + 2c2 + 3c3 = 2,
  2c1 + 3c2 + c3 = 7,
  −c1 − c2 + 2c3 = −5.

Therefore we need to check whether the above system of linear equations has solutions for (c1, c2, c3) or not. To do this, we simplify the augmented matrix as follows:

  [  1  2 3  2 ]
  [  2  3 1  7 ]
  [ −1 −1 2 −5 ]
−→ (−2R1+R2, R1+R3)
  [ 1  2  3  2 ]
  [ 0 −1 −5  3 ]
  [ 0  1  5 −3 ]
−→ (2R2+R1; R2+R3, −R2)
  [ 1 0 −7  8 ]
  [ 0 1  5 −3 ]
  [ 0 0  0  0 ].

Since the b-column is nonpivot, the system has solutions for (c1, c2, c3). In fact, the reduced row echelon form implies that there are (infinitely) many solutions for (c1, c2, c3). Equivalently, there are (infinitely) many ways that b can be expressed as a linear combination of v1, v2, v3. In particular, we may choose c3 = 0; then

b = (8) · v1 + (−3) · v2 + (0) · v3 = 8v1 − 3v2.

□


Example 3.4 (General solution in vector form)

Solve the following system of linear equations in vector form. Then indicate the number of free variables, and the rank of the coefficient matrix.

  x1 + 2x3 = 1,
  −2x1 + 2x2 + x3 = 1,
  x1 + x2 + 4x3 = 2.

Solution We simplify the augmented matrix as follows:

  [  1 0 2 1 ]
  [ −2 2 1 1 ]
  [  1 1 4 2 ]
−→ (2R1+R2, −R1+R3)
  [ 1 0 2 1 ]
  [ 0 2 5 3 ]
  [ 0 1 2 1 ]
−→ (−2R3+R2, R2↔R3)
  [ 1 0 2 1 ]
  [ 0 1 2 1 ]
  [ 0 0 1 1 ]
−→ (−2R3+R1, −2R3+R2)
  [ 1 0 0 −1 ]
  [ 0 1 0 −1 ]
  [ 0 0 1  1 ].

Therefore the system has the solution x1 = −1, x2 = −1, x3 = 1, or in vector form,

(x1, x2, x3) = (−1, −1, 1).

Since the solution is unique, the number of free variables is 0. Also, from the above reduced row echelon form of the augmented matrix, we see that the rank of the coefficient matrix is 3. □

Example 3.5 (General solution in vector form)

Solve the following system of linear equations in vector form. Then indicate the number of free variables, and the rank of the coefficient matrix.

  x1 − 2x2 + 2x3 = 0,
  3x1 − 6x2 + 7x3 = −1,
  2x1 − 4x2 + 3x3 = 1.

Solution We simplify the augmented matrix as follows:

  [ 1 −2 2  0 ]
  [ 3 −6 7 −1 ]
  [ 2 −4 3  1 ]
−→ (−3R1+R2, −2R1+R3)
  [ 1 −2  2  0 ]
  [ 0  0  1 −1 ]
  [ 0  0 −1  1 ]
−→ (−2R2+R1, R2+R3)
  [ 1 −2 0  2 ]
  [ 0  0 1 −1 ]
  [ 0  0 0  0 ].

.

Therefore the system has the solutions x1 = 2 + 2x2, x3 = −1, with x2 arbitrary, or in vector form,

(x1, x2, x3) = (2, 0, −1) + x2 (2, 1, 0).

Since the general solution contains a free variable x2, the solution is non-unique. Here the number of free variables is 1. Also, from the above reduced row echelon form of the augmented matrix, we see that the rank of the coefficient matrix is 2. □


Example 3.6 (General solution in vector form)

Solve the following system of linear equations in vector form. Then indicate the number of free variables, and the rank of the coefficient matrix.

  x1 + 2x2 − x3 = 2,
  2x1 + 4x2 − x3 = 3.

Solution We simplify the augmented matrix as follows:

  [ 1 2 −1 2 ]
  [ 2 4 −1 3 ]
−→ (−2R1+R2)
  [ 1 2 −1  2 ]
  [ 0 0  1 −1 ]
−→ (R2+R1)
  [ 1 2 0  1 ]
  [ 0 0 1 −1 ].

Therefore the system has the solutions x1 = 1 − 2x2, x3 = −1, with x2 arbitrary, or in vector form,

(x1, x2, x3) = (1, 0, −1) + x2 (−2, 1, 0).

Since the general solution contains a free variable x2, the solution is non-unique. Here the number of free variables is 1. Also, from the above reduced row echelon form of the augmented matrix, we see that the rank of the coefficient matrix is 2. □

Example 3.7 (General solution in vector form)

Solve the following system of linear equations in vector form. Then indicate the number of free variables, and the rank of the coefficient matrix.

  x1 + 2x2 + 3x3 + 4x4 = 5,
  2x1 + 3x2 + 4x3 + 5x4 = 6,
  3x1 + 4x2 + 5x3 + 6x4 = 7.

Solution We simplify the augmented matrix as follows:

  [ 1 2 3 4 5 ]
  [ 2 3 4 5 6 ]
  [ 3 4 5 6 7 ]
−→ (−2R1+R2, −3R1+R3)
  [ 1  2  3  4  5 ]
  [ 0 −1 −2 −3 −4 ]
  [ 0 −2 −4 −6 −8 ]
−→ (2R2+R1; −2R2+R3, −R2)
  [ 1 0 −1 −2 −3 ]
  [ 0 1  2  3  4 ]
  [ 0 0  0  0  0 ].

Therefore the system has the solutions x1 = −3 + x3 + 2x4, x2 = 4 − 2x3 − 3x4, with x3, x4 arbitrary, or in vector form,

(x1, x2, x3, x4) = (−3, 4, 0, 0) + x3 (1, −2, 1, 0) + x4 (2, −3, 0, 1).

Since the general solution contains two free variables x3 and x4, the solution is non-unique. Here the number of free variables is 2. Also, from the above reduced row echelon form of the augmented matrix, we see that the rank of the coefficient matrix is 2. □


Remark. In Examples 3.4–3.7 (page 159), we observe the following formula:

number of variables = number of free variables + number of pivots in A
                    = number of free variables + rank A,

where A is the coefficient matrix. In particular, the formula becomes “3 = 0 + 3” for Example 3.4, “3 = 1 + 2” for both Examples 3.5 and 3.6, and “4 = 2 + 2” for Example 3.7. In fact, this is generally true for all systems of linear equations.

Besides, we observe another fact that the general solution of Ax = b (if exist) can always be written as

x = xp + xh,

where xp is a particular solution of Ax = b and xh is the general solution of the corresponding homoge-neous system Ax = 0. Here xh = x1v1 + x2v2 + · · ·+ xkvk, where v1,v2, · · · ,vk are the fundamentalsolutions of Ax = 0, xj ’s are the free variables, and k is the number of free variables and is given by

k = number of variables − rankA.
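As a numerical cross-check (not part of the original guide), the formula can be verified in pure Python. The helper `rank` below is a hypothetical exact-arithmetic row reducer, a minimal sketch using `fractions.Fraction` to avoid rounding error; it counts pivots, and the number of free variables is then the number of variables minus the rank.

```python
from fractions import Fraction

def rank(rows):
    """Count pivots after exact Gaussian elimination."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0  # index of the next pivot row
    for c in range(len(m[0])):
        piv = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if piv is None:
            continue  # no pivot in this column -> a free variable
        m[r], m[piv] = m[piv], m[r]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c] / m[r][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

# Coefficient matrix of Example 3.7: 4 variables, rank 2, hence 2 free variables
A = [[1, 2, 3, 4], [2, 3, 4, 5], [3, 4, 5, 6]]
free_vars = 4 - rank(A)
print(rank(A), free_vars)  # 2 2
```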

Example 3.8 (Free variables and rank)

Suppose Ax = b is a consistent system of 5 linear equations in 7 variables. If the rank of A is 3, how many free variables are there in the general solution?

Solution By the fact that

number of variables = number of free variables + rankA,

the general solution has 7 − 3 = 4 free variables. □

Example 3.9 (Free variables and rank)

Suppose Ax = b is a consistent system of 5 linear equations in 7 variables. At least how many free variables are there in the general solution?

Solution Since A is a 5 × 7 matrix, there are at most 5 pivots of A, i.e., rankA ≤ 5. By the fact that

number of variables = number of free variables + rankA,

then the number of free variables = 7 − rankA ≥ 7 − 5 = 2, i.e., the general solution has at least 2 free variables. □

Example 3.10 (Free variables and rank)

Suppose Ax = b is a consistent system of 4 linear equations in 8 variables. If there are ≥ 4 free variables in the general solution, what is the rank of A?

Solution By the fact that

number of variables = number of free variables + rankA,

then rankA = 8 − number of free variables ≤ 8 − 4 = 4, i.e., the rank of A must be no larger than 4. □

Example 3.11 (Free variables and rank)

If Am×nx = 0 has only trivial solution, then rankA = m. True or false?

Solution False. Am×nx = 0 has only the trivial solution =⇒ no free variables =⇒ all columns of A are pivot =⇒ rankA = n (n ≤ m). Counterexample: the 3 × 2 matrix A having (1, 0, 0), (0, 1, 0) as columns. □

161


3. Vector Spaces (Worked Examples)

Example 3.12 (Span)

What do the following vectors span? a plane? a straight line? or a point?

(a) (0, 0, 0). (b) (1, 2, 4), (4, 2, 1). (c) (1,−1, 1,−1, 1), (−1, 1,−1, 1,−1).

Solution

(a) v = (0, 0, 0). Then span(v) = {cv : c is any number} = {(0, 0, 0)},

and hence v spans a point.

(b) v1 = (1, 2, 4), v2 = (4, 2, 1). Then v1 ∦ v2 and hence v1, v2 span a plane.

(c) v1 = (1,−1, 1,−1, 1), v2 = (−1, 1,−1, 1,−1). Then v1 = −v2 and v1 ‖ v2 and hence

span (v1,v2) = span (v1).

Thus, v1, v2 span a line. 2

Example 3.13 (Span)

Determine whether the vector b = (1, 0, 3) is in the straight line spanned by v = (2, 0, 6).

Solution There exists a constant c = 1/2 such that

b = cv.

By definition, b ∈ span (v). In other words, the vector b is in the straight line spanned by v, or

b ∈ span (v).

□

Example 3.14 (Span)

Determine whether the vector b = (1, 0,−2, 5) is in the straight line spanned by v = (1,−2, 5, 0).

Solution Let us assume that b ∈ span (v). Then

b = cv for some c ⇐⇒ (1, 0,−2, 5) = (c,−2c, 5c, 0).

The contradiction 5 = 0 indicates that, in fact,

b 6∈ span (v).

□

Example 3.15 (Span)

Determine whether the vector b = (0, 1, −3, 5) is in the plane spanned by the two vectors v1 = (0, −1, 3, −5), v2 = (−1, 3, 5, 0).

Solution By definition,

b ∈ span (v1,v2) ⇐⇒ b = x1v1 + x2v2 for some x1, x2

⇐⇒ the system

    −x2 = 0,
    −x1 + 3x2 = 1,
    3x1 + 5x2 = −3,
    −5x1 = 5

has solutions.

Obviously the system has a unique solution (x1, x2) = (−1, 0). Hence,

b ∈ span (v1,v2).

□

162


Example 3.16 (Span)

Determine whether the vector b = (1, 0, −2, 4) is in the plane spanned by the two vectors v1 = (1, −2, 4, 0), v2 = (1, −7, 2, 3).

Solution By definition,

b ∈ span (v1,v2) ⇐⇒ b = x1v1 + x2v2 for some x1, x2

⇐⇒ the system

    x1 + x2 = 1,
    −2x1 − 7x2 = 0,
    4x1 + 2x2 = −2,
    3x2 = 4

has solutions.

Therefore we need to check whether the above system of linear equations has solutions for (x1, x2) or not. To do this, we simplify the augmented matrix as follows:

    [  1  1 |  1 ]
    [ -2 -7 |  0 ]
    [  4  2 | -2 ]
    [  0  3 |  4 ]

  --( 2R1+R2, -4R1+R3 )-->

    [ 1  1 |  1 ]
    [ 0 -5 |  2 ]
    [ 0 -2 | -6 ]
    [ 0  3 |  4 ]

  --( -(1/2)R3 )-->

    [ 1  1 | 1 ]
    [ 0 -5 | 2 ]
    [ 0  1 | 3 ]
    [ 0  3 | 4 ]

  --( -R3+R1, 5R3+R2, -3R3+R4 )-->

    [ 1  0 | -2 ]
    [ 0  0 | 17 ]
    [ 0  1 |  3 ]
    [ 0  0 | -5 ]

  --( (17/5)R4+R2, Ri↔Rj )-->

    [ 1  0 | -2 ]
    [ 0  1 |  3 ]
    [ 0  0 | -5 ]
    [ 0  0 |  0 ].

Since the b-column is pivot, the system has no solution for (x1, x2). Hence,

b 6∈ span (v1,v2).

□
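The consistency test used in Examples 3.13–3.16 can be automated: b lies in the span exactly when appending the b-column does not raise the rank. The helpers below (`rank`, `in_span`) are hypothetical names in a minimal stdlib-Python sketch, not part of the guide.

```python
from fractions import Fraction

def rank(rows):
    """Pivot count via exact Gaussian elimination."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0
    for c in range(len(m[0])):
        piv = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if piv is None:
            continue
        m[r], m[piv] = m[piv], m[r]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c] / m[r][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

def in_span(b, vectors):
    """b is in span(vectors) iff the b-column does not raise the rank."""
    coeff = [list(row) for row in zip(*vectors)]  # vectors become columns
    aug = [row + [bi] for row, bi in zip(coeff, b)]
    return rank(aug) == rank(coeff)

# Example 3.15: b is in the span; Example 3.16: b is not
print(in_span((0, 1, -3, 5), [(0, -1, 3, -5), (-1, 3, 5, 0)]))  # True
print(in_span((1, 0, -2, 4), [(1, -2, 4, 0), (1, -7, 2, 3)]))   # False
```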

Example 3.17 (Span)

Let v1 = (1, 0, 2, 0), v2 = (2, −1, 5, −2), v3 = (0, 1, −1, 2), v4 = (1, −2, 4, −4), and b = (1, 1, 1, 2). Show that b is in the span of v1, v2, v3, v4. Then show that b is in fact in the span of two of the four vectors.

Solution We simplify the matrix [v1 v2 v3 v4 b] as follows:

    [ 1  2  0  1  1 ]
    [ 0 -1  1 -2  1 ]
    [ 2  5 -1  4  1 ]
    [ 0 -2  2 -4  2 ]

  --( -2R1+R3 )-->

    [ 1  2  0  1  1 ]
    [ 0 -1  1 -2  1 ]
    [ 0  1 -1  2 -1 ]
    [ 0 -2  2 -4  2 ]

  --( R2+R3, -2R2+R4 )-->

    [ 1  2  0  1  1 ]
    [ 0 -1  1 -2  1 ]
    [ 0  0  0  0  0 ]
    [ 0  0  0  0  0 ].

Since the b-column is nonpivot, the system has solutions. Hence, b ∈ span (v1,v2,v3,v4).

In fact, the row operations above show that only the first two columns (i.e., v1, v2) are pivot. This implies that if we simplify the 4 × 3 matrix [v1 v2 b], the b-column will still be nonpivot, and hence b ∈ span (v1,v2). The vector b is in the span of the two vectors v1, v2. We may continue the row operations to get a reduced row echelon form:

    [ 1  2  1 ]
    [ 0 -1  1 ]
    [ 2  5  1 ]
    [ 0 -2  2 ]

  -->

    [ 1  2  1 ]
    [ 0 -1  1 ]
    [ 0  0  0 ]
    [ 0  0  0 ]

  --( 2R2+R1, -R2 )-->

    [ 1  0  3 ]
    [ 0  1 -1 ]
    [ 0  0  0 ]
    [ 0  0  0 ].

Hence b = 3v1 − v2 ∈ span (v1,v2). □

163


Example 3.18 (Row space and column space)

Determine whether the vector b = (3, 5) is in the column space of

    A = [  4  7 ]
        [ -2 -3 ].

Solution Recall the fact that

b ∈ ColA ⇐⇒ b is a linear combination of the columns of A

⇐⇒ Ax = b has solutions.

We simplify the augmented matrix [A b] as follows:

    [  4  7 | 3 ]
    [ -2 -3 | 5 ]

  --( 2R2+R1 )-->

    [  0  1 | 13 ]
    [ -2 -3 |  5 ]

  --( 3R1+R2 )-->

    [  0  1 | 13 ]
    [ -2  0 | 44 ]

  --( -(1/2)R2, R2↔R1 )-->

    [ 1  0 | -22 ]
    [ 0  1 |  13 ].

Since the last column (i.e., b-column) is nonpivot, the system Ax = b has solutions. Hence,

b ∈ ColA.

□

Example 3.19 (Row space and column space)

Determine whether the vector b = (1, 2) is in the column space of

    A = [  1  2 ]
        [ -1 -2 ].

Solution We simplify the augmented matrix [A b] as follows:

    [  1  2 | 1 ]
    [ -1 -2 | 2 ]

  --( R1+R2 )-->

    [ 1  2 | 1 ]
    [ 0  0 | 3 ].

Since the last column (i.e., the b-column) is pivot, the system Ax = b has no solution, so b is not a linear combination of the columns of A. Hence,

b 6∈ ColA.

□

Example 3.20 (Row space and column space)

Determine whether the vector b = (1, 0, −2, 4) is in the column space of

    A = [ 1  2  3  3 ]
        [ 2 -3 -1 -2 ]
        [ 0  2  2  1 ]
        [ 2 -2  0  1 ].

Solution We simplify the augmented matrix [A b] as follows:

    [ 1  2  3  3 |  1 ]
    [ 2 -3 -1 -2 |  0 ]
    [ 0  2  2  1 | -2 ]
    [ 2 -2  0  1 |  4 ]

  --( -2R1+R2, -2R1+R4 )-->

    [ 1  2  3  3 |  1 ]
    [ 0 -7 -7 -8 | -2 ]
    [ 0  2  2  1 | -2 ]
    [ 0 -6 -6 -5 |  2 ]

  --( (1/2)R3, R2↔R3 )-->

    [ 1  2  3   3  |  1 ]
    [ 0  1  1  1/2 | -1 ]
    [ 0 -7 -7  -8  | -2 ]
    [ 0 -6 -6  -5  |  2 ]

  --( 7R2+R3, 6R2+R4 )-->

    [ 1  2  3   3   |  1 ]
    [ 0  1  1  1/2  | -1 ]
    [ 0  0  0 -9/2  | -9 ]
    [ 0  0  0  -2   | -4 ]

  --( -(4/9)R3+R4, -(2/9)R3 )-->

    [ 1  2  3   3  |  1 ]
    [ 0  1  1  1/2 | -1 ]
    [ 0  0  0   1  |  2 ]
    [ 0  0  0   0  |  0 ].

Since the last column (i.e., b-column) is nonpivot, the system Ax = b has solutions. Hence,

b ∈ ColA.

□

164


Example 3.21 (Row space and column space)

Find the condition for the vector b = (b1, b2, b3) to be in the column space of

    A = [ 4  2 ]
        [ 2 -2 ]
        [ 1 -7 ].

Solution We simplify the augmented matrix [A b] as follows:

    [ 4  2 | b1 ]
    [ 2 -2 | b2 ]
    [ 1 -7 | b3 ]

  --( -4R3+R1, -2R3+R2 )-->

    [ 0 30 | b1 - 4b3 ]
    [ 0 12 | b2 - 2b3 ]
    [ 1 -7 | b3 ]

  --( -(5/2)R2+R1 )-->

    [ 0  0 | b1 - (5/2)b2 + b3 ]
    [ 0 12 | b2 - 2b3 ]
    [ 1 -7 | b3 ]

  --( R1↔R3 )-->

    [ 1 -7 | b3 ]
    [ 0 12 | b2 - 2b3 ]
    [ 0  0 | b1 - (5/2)b2 + b3 ].

Therefore the condition for b to be in the column space of A is given by

    b1 - (5/2)b2 + b3 = 0. □

Example 3.22 (Row space and column space)

Find the condition for the vector b = (b1, b2, b3, b4) to be in the column space of

    A = [ 1  2  3  3 ]
        [ 2 -3 -1  2 ]
        [ 0  2  2  1 ]
        [ 2 -2  0  1 ].

Find a minimal set of vectors which spans the column space of A.

Solution We simplify the augmented matrix [A b] as follows:

    [ 1  2  3  3 | b1 ]
    [ 2 -3 -1  2 | b2 ]
    [ 0  2  2  1 | b3 ]
    [ 2 -2  0  1 | b4 ]

  --( -2R1+R2, -2R1+R4 )-->

    [ 1  2  3  3 | b1 ]
    [ 0 -7 -7 -4 | -2b1 + b2 ]
    [ 0  2  2  1 | b3 ]
    [ 0 -6 -6 -5 | -2b1 + b4 ]

  --( (7/2)R3+R2, 3R3+R4 )-->

    [ 1  2  3   3  | b1 ]
    [ 0  0  0 -1/2 | -2b1 + b2 + (7/2)b3 ]
    [ 0  2  2   1  | b3 ]
    [ 0  0  0  -2  | -2b1 + 3b3 + b4 ]

  --( -(1/4)R4+R2, Ri↔Rj )-->

    [ 1  2  3  3 | b1 ]
    [ 0  2  2  1 | b3 ]
    [ 0  0  0 -2 | -2b1 + 3b3 + b4 ]
    [ 0  0  0  0 | -(1/4)(6b1 - 4b2 - 11b3 + b4) ].

Therefore the condition for b to be in the column space of A is given by

6b1 − 4b2 − 11b3 + b4 = 0.

In fact, the row operations above show that only columns 1, 2, 4 of A (say v1, v2, v4) are pivot. This implies that a minimal set of vectors which spans the column space of A is

v1, v2, v4.

Similarly, the row operations above show that the row space of A is spanned by the three nonzero rows of the echelon form. This implies that a minimal set of vectors which spans the row space of A is

(1, 2, 3, 3), (0, 2, 2, 1), (0, 0, 0, −2).

□

165


Example 3.23 (Linear independence)

Determine whether the following vectors are linearly dependent or not.

v1 = (1, 2, 4),    v2 = (4, 2, 1).

Solution Since v1, v2 are nonparallel vectors, they are linearly independent. 2

Example 3.24 (Linear independence)

Determine whether the following vectors are linearly dependent or not.

v1 = (−1, 1, −1, 1, −1),    v2 = (1, −1, 1, −1, 1).

Solution Since v1 = −1 · v2, the vectors v1, v2 are parallel and hence linearly dependent. 2

Example 3.25 (Linear independence)

Determine whether the following vectors are linearly dependent or not.

v1 = (1, 2, 1),    v2 = (2, 1, 2),    v3 = (1, 1, 2).

Solution In the case of more than two vectors, we may make use of the following fact:

    v1, v2, · · · , vn are linearly independent ⇐⇒ rank[v1 v2 · · · vn] = n
                                                ⇐⇒ all columns are pivot.

We simplify the matrix [v1 v2 v3] as follows:

    [ 1  2  1 ]
    [ 2  1  1 ]
    [ 1  2  2 ]

  --( -2R1+R2, -R1+R3 )-->

    [ 1  2  1 ]
    [ 0 -3 -1 ]
    [ 0  0  1 ].

Since all columns are pivot, the three vectors v1, v2, v3 are linearly independent. 2

Example 3.26 (Linear independence)

Determine whether the following vectors are linearly dependent or not.

v1 = (1, 2, 3),    v2 = (−1, 1, 2),    v3 = (−1, 7, 12).

Solution We simplify the matrix [v1 v2 v3] as follows:

    [ 1 -1 -1 ]
    [ 2  1  7 ]
    [ 3  2 12 ]

  --( -2R1+R2, -3R1+R3 )-->

    [ 1 -1 -1 ]
    [ 0  3  9 ]
    [ 0  5 15 ]

  --( (1/3)R2, (1/5)R3 )-->

    [ 1 -1 -1 ]
    [ 0  1  3 ]
    [ 0  1  3 ]

  --( -R2+R3 )-->

    [ 1 -1 -1 ]
    [ 0  1  3 ]
    [ 0  0  0 ].

Since the last column is nonpivot, the three vectors v1, v2, v3 are linearly dependent. 2

166


Example 3.27 (Linear independence)

Determine whether the following vectors are linearly dependent or not. For linearly dependent vectors, write one vector as a linear combination of the others.

v1 = (1, 2, 4),    v2 = (4, 2, 1),    v3 = (2, −2, −7),    v4 = (3, 0, −3).

Solution We simplify the matrix [v1 v2 v3 v4] as follows:

    [ 1  4  2  3 ]
    [ 2  2 -2  0 ]
    [ 4  1 -7 -3 ]

  --( -2R1+R2, -4R1+R3 )-->

    [ 1   4   2   3 ]
    [ 0  -6  -6  -6 ]
    [ 0 -15 -15 -15 ]

  --( -(1/6)R2 )-->

    [ 1   4   2   3 ]
    [ 0   1   1   1 ]
    [ 0 -15 -15 -15 ]

  --( -4R2+R1, 15R2+R3 )-->

    [ 1  0 -2 -1 ]
    [ 0  1  1  1 ]
    [ 0  0  0  0 ].

Since not all columns of the matrix [v1 v2 v3 v4] are pivot, the four vectors v1, v2, v3, v4 are linearly dependent. In fact, since only the first two columns are pivot, we call {v1, v2} a maximal set of independent vectors. From the same row operations above,

    [v1 v2 v3]  -->  [ 1  0 -2 ]        [v1 v2 v4]  -->  [ 1  0 -1 ]
                     [ 0  1  1 ]                         [ 0  1  1 ]
                     [ 0  0  0 ],                        [ 0  0  0 ],

we know that if even one vector is added to the set {v1, v2}, the enlarged set will become linearly dependent. In particular, we have

    v3 = −2v1 + v2,    v4 = −v1 + v2.

□
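The two linear combinations found above can be checked by direct componentwise arithmetic. The helper `lincomb` is a hypothetical name for this small illustrative sketch, not part of the guide.

```python
def lincomb(coeffs, vectors):
    """Componentwise sum of c_i * v_i."""
    return tuple(sum(c * x for c, x in zip(coeffs, comps))
                 for comps in zip(*vectors))

v1, v2 = (1, 2, 4), (4, 2, 1)
v3, v4 = (2, -2, -7), (3, 0, -3)
print(lincomb((-2, 1), (v1, v2)) == v3)  # True: v3 = -2*v1 + v2
print(lincomb((-1, 1), (v1, v2)) == v4)  # True: v4 = -v1 + v2
```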

Example 3.28 (Linear independence)

Determine whether the following vectors are linearly dependent or not. For linearly dependent vectors, write one vector as a linear combination of the others.

v1 = (1, 1, 2, 2),    v2 = (−1, 0, 1, −1),    v3 = (3, 4, 10, 4),    v4 = (0, −3, −2, 1).

Solution We simplify the matrix [v1 v2 v3 v4] as follows:

    [ 1 -1  3  0 ]
    [ 1  0  4 -3 ]
    [ 2  1 10 -2 ]
    [ 2 -1  4  1 ]

  --( -R1+R2, -2R1+R3, -2R1+R4 )-->

    [ 1 -1  3  0 ]
    [ 0  1  1 -3 ]
    [ 0  3  4 -2 ]
    [ 0  1 -2  1 ]

  --( -3R2+R3, -R2+R4 )-->

    [ 1 -1  3  0 ]
    [ 0  1  1 -3 ]
    [ 0  0  1  7 ]
    [ 0  0 -3  4 ]

  --( 3R3+R4 )-->

    [ 1 -1  3  0 ]
    [ 0  1  1 -3 ]
    [ 0  0  1  7 ]
    [ 0  0  0 25 ].

Since all columns are pivot, the four vectors v1, v2, v3, v4 are linearly independent. 2

167


Example 3.29 (Linear independence)

Determine whether the following vectors are linearly dependent or not. For linearly dependent vectors, write one vector as a linear combination of the others.

v1 = (1, 2, 1, 2),    v2 = (−1, 1, 0, −1),    v3 = (3, −2, 4, 4),    v4 = (0, −2, −3, 1).

Solution We simplify the matrix [v1 v2 v3 v4] as follows:

    [ 1 -1  3  0 ]
    [ 2  1 -2 -2 ]
    [ 1  0  4 -3 ]
    [ 2 -1  4  1 ]

  --( -2R1+R2, -R1+R3, -2R1+R4 )-->

    [ 1 -1  3  0 ]
    [ 0  3 -8 -2 ]
    [ 0  1  1 -3 ]
    [ 0  1 -2  1 ]

  --( -3R3+R2, -R3+R4 )-->

    [ 1 -1   3  0 ]
    [ 0  0 -11  7 ]
    [ 0  1   1 -3 ]
    [ 0  0  -3  4 ]

  --( R2↔R3, -(1/3)R4 )-->

    [ 1 -1   3    0  ]
    [ 0  1   1   -3  ]
    [ 0  0 -11    7  ]
    [ 0  0   1  -4/3 ]

  --( 11R4+R3 )-->

    [ 1 -1  3    0   ]
    [ 0  1  1   -3   ]
    [ 0  0  0  -23/3 ]
    [ 0  0  1   -4/3 ]

  --( R3↔R4 )-->

    [ 1 -1  3    0   ]
    [ 0  1  1   -3   ]
    [ 0  0  1   -4/3 ]
    [ 0  0  0  -23/3 ].

Since all columns are pivot, the vectors v1, v2, v3, v4 are linearly independent. 2

Example 3.30 (Linear independence)

Determine whether the following vectors are linearly dependent or not. For linearly dependent vectors, write one vector as a linear combination of the others.

v1 = (1, 0, 2, 0),    v2 = (2, −1, 5, 1),    v3 = (0, 1, −1, 4),    v4 = (−1, 3, −5, 7).

Solution We simplify the matrix [v1 v2 v3 v4] as follows:

    [ 1  2  0 -1 ]
    [ 0 -1  1  3 ]
    [ 2  5 -1 -5 ]
    [ 0  1  4  7 ]

  --( -2R1+R3 )-->

    [ 1  2  0 -1 ]
    [ 0 -1  1  3 ]
    [ 0  1 -1 -3 ]
    [ 0  1  4  7 ]

  --( 2R2+R1, R2+R3, R2+R4 )-->

    [ 1  0  2  5 ]
    [ 0 -1  1  3 ]
    [ 0  0  0  0 ]
    [ 0  0  5 10 ]

  --( -R2, (1/5)R4, R3↔R4 )-->

    [ 1  0  2  5 ]
    [ 0  1 -1 -3 ]
    [ 0  0  1  2 ]
    [ 0  0  0  0 ]

  --( -2R3+R1, R3+R2 )-->

    [ 1  0  0  1 ]
    [ 0  1  0 -1 ]
    [ 0  0  1  2 ]
    [ 0  0  0  0 ].

Since not all columns are pivot, the four vectors v1, v2, v3, v4 are linearly dependent. However, among them, the three vectors v1, v2, v3 are linearly independent and they form a maximal set of linearly independent vectors. To express v4 as a linear combination of the others, we have

    v4 = v1 − v2 + 2v3. □

168


Example 3.31 (Linear independence)

Determine whether the following vectors are linearly dependent or not. For linearly dependent vectors, write one vector as a linear combination of the others.

v1 = (1, 0, 2, 0),    v2 = (2, −1, 5, −2),    v3 = (0, 1, −1, 2),    v4 = (1, −2, 4, −4).

Solution We simplify the matrix [v1 v2 v3 v4] as follows:

    [ 1  2  0  1 ]
    [ 0 -1  1 -2 ]
    [ 2  5 -1  4 ]
    [ 0 -2  2 -4 ]

  --( -2R1+R3 )-->

    [ 1  2  0  1 ]
    [ 0 -1  1 -2 ]
    [ 0  1 -1  2 ]
    [ 0 -2  2 -4 ]

  --( 2R2+R1, R2+R3, -2R2+R4 )-->

    [ 1  0  2 -3 ]
    [ 0 -1  1 -2 ]
    [ 0  0  0  0 ]
    [ 0  0  0  0 ]

  --( -R2 )-->

    [ 1  0  2 -3 ]
    [ 0  1 -1  2 ]
    [ 0  0  0  0 ]
    [ 0  0  0  0 ].

Since the last two columns are nonpivot, the four vectors v1, v2, v3, v4 are linearly dependent. However, among them, the two vectors v1, v2 are linearly independent and they form a maximal set of linearly independent vectors. To express v3 (resp. v4) as a linear combination of v1, v2, we have

    v3 = 2v1 − v2,    v4 = −3v1 + 2v2. □

Example 3.32 (Linear independence)

Show that the following vectors are linearly dependent. Then find a maximal number of linearly independent vectors from the five.

v1 = (0, 1, 2, −1),    v2 = (0, −2, −4, 2),    v3 = (1, 4, −3, 1),    v4 = (2, 3, 1, 1),    v5 = (−1, 3, 0, 2).

Solution We simplify the matrix [v1 v2 v3 v4 v5] as follows:

    [  0  0  1  2 -1 ]
    [  1 -2  4  3  3 ]
    [  2 -4 -3  1  0 ]
    [ -1  2  1  1  2 ]

  --( -2R2+R3, R2+R4 )-->

    [ 0  0   1  2 -1 ]
    [ 1 -2   4  3  3 ]
    [ 0  0 -11 -5 -6 ]
    [ 0  0   5  4  5 ]

  --( 11R1+R3, -5R1+R4 )-->

    [ 0  0  1   2  -1 ]
    [ 1 -2  4   3   3 ]
    [ 0  0  0  17 -17 ]
    [ 0  0  0  -6  10 ]

  --( (1/17)R3 )-->

    [ 0  0  1  2 -1 ]
    [ 1 -2  4  3  3 ]
    [ 0  0  0  1 -1 ]
    [ 0  0  0 -6 10 ]

  --( R1↔R2, 6R3+R4 )-->

    [ 1 -2  4  3  3 ]
    [ 0  0  1  2 -1 ]
    [ 0  0  0  1 -1 ]
    [ 0  0  0  0  4 ].

Since the second column is nonpivot, the five vectors v1, v2, v3, v4, v5 are linearly dependent. In fact, one may observe at the outset that v2 = −2v1, which already implies that v1, v2, v3, v4, v5 are linearly dependent. However, among them, the four vectors v1, v3, v4, v5 are linearly independent and they form a maximal set of linearly independent vectors. □

169


Example 3.33 (Linear independence)

Show that 9 vectors in R6 must be linearly dependent.

Solution Suppose the 9 (column) vectors v1, v2, · · · , v9 ∈ R6 are given. Then we may construct the 6 × 9 partitioned matrix A = [v1 v2 · · · v9]. Although we do not know exactly what the given vectors are, we do know that rankA ≤ 6 (limited by the number of rows of A). It follows that rankA < 9, i.e., not all columns of A are pivot columns, and we conclude that v1, v2, · · · , v9 must be linearly dependent. □

Example 3.34 (Linear independence)

Let

    A = [  1  1  2 ]
        [ -1 -1 -2 ]
        [  3  2  3 ]
        [ -1  0  1 ].

Are the rows of A linearly independent? If not, write some rows as linear combinations of the others.

Solution We simplify the transpose matrix At as follows:

    [ 1 -1  3 -1 ]
    [ 1 -1  2  0 ]
    [ 2 -2  3  1 ]

  --( -R1+R2, -2R1+R3 )-->

    [ 1 -1  3 -1 ]
    [ 0  0 -1  1 ]
    [ 0  0 -3  3 ]

  --( 3R2+R1, -3R2+R3, -R2 )-->

    [ 1 -1  0  2 ]
    [ 0  0  1 -1 ]
    [ 0  0  0  0 ].

Since not all columns of At are pivot, the four column vectors of At are linearly dependent. Equivalently, the four row vectors of A are linearly dependent. In fact, row 1 and row 3 form a maximal independent set of vectors. We also have

    row 2 = −row 1,    row 4 = 2 row 1 − row 3. □

Example 3.35 (Linear independence)

Let

    S = span((1, 0, 1, 0, 1), (1, 1, −1, 0, −2), (2, 1, 0, 0, −1), (4, 1, 2, 0, 1)).

(a) What is the dimension of S, i.e., the number of linearly independent vectors that span S?

(b) Determine whether the vector (k, 1 − k, 3k − 2, 0, 4k − 3) is in S for all constant k.

Solution

(a) We do row operations (writing the four vectors as columns):

    [ 1  1  2  4 ]
    [ 0  1  1  1 ]
    [ 1 -1  0  2 ]
    [ 0  0  0  0 ]
    [ 1 -2 -1  1 ]

  --( -R1+R3, -R1+R5 )-->

    [ 1  1  2  4 ]
    [ 0  1  1  1 ]
    [ 0 -2 -2 -2 ]
    [ 0  0  0  0 ]
    [ 0 -3 -3 -3 ]

  --( -R2+R1, 2R2+R3, 3R2+R5 )-->

    [ 1  0  1  3 ]
    [ 0  1  1  1 ]
    [ 0  0  0  0 ]
    [ 0  0  0  0 ]
    [ 0  0  0  0 ].

The first two columns form a maximal independent set that spans S. The dimension of S is 2.

(b) The given vector (k, 1− k, 3k − 2, 0, 4k − 3) is in S if and only if there exist constants a, b such that

a(1, 0, 1, 0, 1) + b(1, 1,−1, 0,−2) = (k, 1 − k, 3k − 2, 0, 4k − 3).

This is a system of linear equations for (a, b). In fact, for each constant k, the system has a unique solution a = 2k − 1, b = 1 − k. Thus, for every constant k,

(k, 1 − k, 3k − 2, 0, 4k − 3) ∈ S.

□
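The claimed solution a = 2k − 1, b = 1 − k of part (b) can be spot-checked for several values of k. The helper `combo` is a hypothetical name used only in this sketch; the two spanning vectors are the independent ones found in part (a).

```python
def combo(a, b):
    """a*(1,0,1,0,1) + b*(1,1,-1,0,-2), the two spanning vectors of S."""
    u, w = (1, 0, 1, 0, 1), (1, 1, -1, 0, -2)
    return tuple(a * x + b * y for x, y in zip(u, w))

# For each sampled k, the coefficients a = 2k-1, b = 1-k reproduce the vector
for k in (-3, 0, 1, 7):
    assert combo(2 * k - 1, 1 - k) == (k, 1 - k, 3 * k - 2, 0, 4 * k - 3)
print("(k, 1-k, 3k-2, 0, 4k-3) lies in S for every sampled k")
```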

170


Example 3.36 (Basis of Rn)

Consider vectors in the Euclidean space R3. Find an example of two bases {v1, v2, v3}, {w1, w2, w3}, such that {v1 + w1, v2 + w2, v3 + w3} is not a basis.

Solution Let {e1, e2, e3} = {(1, 0, 0), (0, 1, 0), (0, 0, 1)} be the standard basis of R3. Take

    {v1, v2, v3} = {e1, e2, e3} and {w1, w2, w3} = {−e1, −e2, −e3}.

Then {v1, v2, v3} and {w1, w2, w3} are bases of R3. But then {v1 + w1, v2 + w2, v3 + w3} = {0, 0, 0} is of course linearly dependent and hence it is not a basis. □

Example 3.37 (Basis of Rn)

Consider vectors in the Euclidean space R3. Find an example of two non-bases {v1, v2, v3}, {w1, w2, w3}, such that {v1 + w1, v2 + w2, v3 + w3} is a basis.

Solution Let {e1, e2, e3} be the standard basis of R3. Take

    {v1, v2, v3} = {e1, e2, 0} and {w1, w2, w3} = {0, 0, e3}.

Then {v1, v2, v3} is not a basis (v1, v2, v3 being dependent vectors) because

    0 · e1 + 0 · e2 + 1 · 0 = 0,

and {w1, w2, w3} is also not a basis (w1, w2, w3 being dependent vectors) because

    1 · 0 + 1 · 0 + 0 · e3 = 0.

But then {v1 + w1, v2 + w2, v3 + w3} = {e1, e2, e3} is of course a basis of R3. □

Example 3.38 (Basis of Rn)

Determine whether the vectors (1, 0), (1, 2) form a basis of the Euclidean space R2.

Solution The two given vectors in R2 are nonparallel and hence linearly independent. Two linearly independent vectors in R2 naturally form a basis of R2. In particular, (1, 0), (1, 2) form a basis of R2. □

Example 3.39 (Basis of Rn)

Determine whether the vectors (1, 0, 0), (1, 2, 0) form a basis of the Euclidean space R2.

Solution Definitely not. A basis of R2 must be a subset of R2. In other words, a basis of R2 must contain vectors in R2 only. Thus the two given vectors in R3 cannot form a basis of R2. □

Example 3.40 (Basis of Rn)

Determine whether the vectors (1, 2), (3, 4), (5, 6) form a basis of the Euclidean space R2.

Solution Three vectors in R2 must be linearly dependent. Therefore, (1, 2), (3, 4), (5, 6) do not form a basis for R2. A basis of R2 must consist of two linearly independent vectors in R2. □

171


Example 3.41 (Basis of Rn)

Determine whether the vectors (1, 2, 3), (4, 5, 6) form a basis of the Euclidean space R3.

Solution In general, a basis of Rn must contain exactly n (linearly independent) vectors in Rn. Hence, the two vectors (1, 2, 3), (4, 5, 6) do not form a basis of R3. □

Example 3.42 (Basis of Rn)

Determine whether the vectors (1.27, 4.36, −5.28, 3.72), (−0.35, 3.29, 6.54, −4.31), (2.51, −7.22, 1.98, 6.48) form a basis of the Euclidean space R4.

Solution Three vectors cannot span R4. The vectors do not form a basis of R4. 2

Example 3.43 (Basis of Rn)

Determine whether the vectors

    v1 = (1, 2, 1),    v2 = (2, 1, 2),    v3 = (1, 1, 2)

form a basis of the Euclidean space R3.

Solution By the fact that

    v1, v2, · · · , vn ∈ Rn form a basis of Rn ⇐⇒ rank[v1 v2 · · · vn] = n,

we need to apply row operations to the matrix [v1 v2 v3] and find its rank. Then

    [ 1  2  1 ]
    [ 2  1  1 ]
    [ 1  2  2 ]

  --( -2R1+R2, -R1+R3 )-->

    [ 1  2  1 ]
    [ 0 -3 -1 ]
    [ 0  0  1 ].

Therefore rank[v1 v2 v3] = 3. Hence, the vectors v1, v2, v3 form a basis of R3. □
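The rank criterion above is easy to automate. In the sketch below (hypothetical helpers `rank` and `is_basis`, exact rational arithmetic via the stdlib), n vectors are tested for being a basis of R^n; since row rank equals column rank, it does not matter whether the vectors are written as rows or columns.

```python
from fractions import Fraction

def rank(rows):
    """Pivot count via exact Gaussian elimination."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0
    for c in range(len(m[0])):
        piv = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if piv is None:
            continue
        m[r], m[piv] = m[piv], m[r]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c] / m[r][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

def is_basis(vectors):
    """n vectors form a basis of R^n iff the n x n matrix has full rank n."""
    n = len(vectors[0])
    return len(vectors) == n and rank([list(v) for v in vectors]) == n

print(is_basis([(1, 2, 1), (2, 1, 2), (1, 1, 2)]))  # True  (Example 3.43)
print(is_basis([(1, 2, 1), (2, 1, 2), (1, 1, 1)]))  # False (Example 3.44)
```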

Example 3.44 (Basis of Rn)

Determine whether the vectors

    v1 = (1, 2, 1),    v2 = (2, 1, 2),    v3 = (1, 1, 1)

form a basis of the Euclidean space R3.

Solution We apply row operations to the matrix [v1 v2 v3] and find its rank. Then

    [ 1  2  1 ]
    [ 2  1  1 ]
    [ 1  2  1 ]

  --( -2R1+R2, -R1+R3 )-->

    [ 1  2  1 ]
    [ 0 -3 -1 ]
    [ 0  0  0 ]

  --( -(1/3)R2 )-->

    [ 1  2   1  ]
    [ 0  1  1/3 ]
    [ 0  0   0  ]

  --( -2R2+R1 )-->

    [ 1  0  1/3 ]
    [ 0  1  1/3 ]
    [ 0  0   0  ].

Therefore only the first two columns are pivot, and rank[v1 v2 v3] = 2 ≠ 3. Hence, v1, v2, v3 do not form a basis of R3. In fact, the vectors v1, v2, v3 are linearly dependent because

    v3 = (1/3)v1 + (1/3)v2. □

172


Example 3.45 (Basis of Rn)

Let

    v1 = (2, 1, −1),    v2 = (9, 5, −4),    v3 = (−1, −1, −1),    v4 = (−3, −2, 1).

Which subsets of {v1, v2, v3, v4} are bases of R3?

Solution We do row operations on the matrix [v1 v2 v3 v4]. Then

    [  2  9 -1 -3 ]
    [  1  5 -1 -2 ]
    [ -1 -4 -1  1 ]

  --( -2R2+R1, R2+R3 )-->

    [ 0 -1  1  1 ]
    [ 1  5 -1 -2 ]
    [ 0  1 -2 -1 ]

  --( R3+R1, -5R3+R2 )-->

    [ 0  0 -1  0 ]
    [ 1  0  9  3 ]
    [ 0  1 -2 -1 ]

  --( 9R1+R2, -2R1+R3, -R1, Ri↔Rj )-->

    [ 1  0  0  3 ]
    [ 0  1  0 -1 ]
    [ 0  0  1  0 ].

From the above row operations, we observe the following.

(1) We can reduce the matrix [v1 v2 v3] to a reduced row echelon form in which all columns are pivot:

    [v1 v2 v3]  -->  [ 1  0  0 ]
                     [ 0  1  0 ]
                     [ 0  0  1 ].

(2) We can reduce the matrix [v1 v2 v4] to a reduced row echelon form in which a column is nonpivot:

    [v1 v2 v4]  -->  [ 1  0  3 ]
                     [ 0  1 -1 ]
                     [ 0  0  0 ].

(3) We can reduce the matrix [v1 v4 v3] to a reduced row echelon form in which all columns are pivot:

    [v1 v4 v3]  -->  [ 1  3  0 ]       [ 1  0  0 ]
                     [ 0 -1  0 ]  -->  [ 0  1  0 ]
                     [ 0  0  1 ]       [ 0  0  1 ].

(4) We can reduce the matrix [v4 v2 v3] to a reduced row echelon form in which all columns are pivot:

    [v4 v2 v3]  -->  [  3  0  0 ]       [ 1  0  0 ]
                     [ -1  1  0 ]  -->  [ 0  1  0 ]
                     [  0  0  1 ]       [ 0  0  1 ].

In conclusion, by combining the cases (1)–(4), we deduce that

    {v1, v2, v3},    {v1, v3, v4},    {v2, v3, v4}

are bases of R3. No other subsets of {v1, v2, v3, v4} form bases of R3. □

173


Example 3.46 (Basis of Rn)

Find the condition on a such that the vectors

    v1 = (1, −2, 1),    v2 = (2, −3, a),    v3 = (a, 3, −1)

form a basis of R3.

Solution We apply row operations to the matrix [v1 v2 v3]. Then

    [  1  2  a ]
    [ -2 -3  3 ]
    [  1  a -1 ]

  --( 2R1+R2, -R1+R3 )-->

    [ 1    2       a    ]
    [ 0    1    2a + 3  ]
    [ 0  a - 2  -1 - a  ]

  --( (2-a)R2+R3 )-->

    [ 1  2      a     ]
    [ 0  1   2a + 3   ]
    [ 0  0  5 - 2a^2  ].

For v1, v2, v3 to form a basis of R3, all columns of the last matrix must be pivot. Therefore, the condition on a such that the vectors form a basis of R3 is 5 − 2a^2 ≠ 0, or

    a ≠ ±√(5/2). □
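The pivot condition 5 − 2a^2 ≠ 0 can be cross-checked with the 3 × 3 determinant formula developed in Chapter 4 (connecting the two chapters is my addition, not the guide's method): det[v1 v2 v3] works out to 5 − 2a^2, so the basis fails exactly at a = ±√(5/2). The helpers `det3` and `M` are hypothetical names for this sketch.

```python
def det3(m):
    """3x3 determinant by cofactor expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def M(a):
    # the matrix [v1 v2 v3] of Example 3.46 (columns are v1, v2, v3)
    return [[1, 2, a], [-2, -3, 3], [1, a, -1]]

print(det3(M(0)), det3(M(1)))            # 5 3, matching 5 - 2a^2
print(abs(det3(M(2.5 ** 0.5))) < 1e-9)   # True: a = sqrt(5/2) breaks the basis
```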

Example 3.47 (Basis of Rn)

Extend the vectors

v1 = (2, 1, −2, 1),    v2 = (1, 2, 0, 1)

to form a basis of R4.

Solution We add the four standard vectors e1, e2, e3, e4 (the standard basis of R4) to v1, v2 and later eliminate two of them, so that the remaining four form a basis of R4. Consider

e1 = (1, 0, 0, 0),    e2 = (0, 1, 0, 0),    e3 = (0, 0, 1, 0),    e4 = (0, 0, 0, 1).

We apply row operations to the matrix [v1 v2 e1 e2 e3 e4]. Then

    [  2  1  1  0  0  0 ]
    [  1  2  0  1  0  0 ]
    [ -2  0  0  0  1  0 ]
    [  1  1  0  0  0  1 ]

  --( -2R2+R1, 2R2+R3, -R2+R4 )-->

    [ 0 -3  1 -2  0  0 ]
    [ 1  2  0  1  0  0 ]
    [ 0  4  0  2  1  0 ]
    [ 0 -1  0 -1  0  1 ]

  --( -3R4+R1, 4R4+R3 )-->

    [ 0  0  1  1  0 -3 ]
    [ 1  2  0  1  0  0 ]
    [ 0  0  0 -2  1  4 ]
    [ 0 -1  0 -1  0  1 ]

  --( Ri↔Rj )-->

    [ 1  2  0  1  0  0 ]
    [ 0 -1  0 -1  0  1 ]
    [ 0  0  1  1  0 -3 ]
    [ 0  0  0 -2  1  4 ].

Since the first four columns are pivot, we conclude that

    v1 = (2, 1, −2, 1),    v2 = (1, 2, 0, 1),    e1 = (1, 0, 0, 0),    e2 = (0, 1, 0, 0)

form a basis of R4. □

174


Example 3.48 (Basis of Rn)

Consider

    A = [  0  0  1  2 ]
        [ -2  2 -1  4 ]
        [  1 -1  1 -1 ],    b = (1, h, 1).

(a) For what value(s) of h, does the system Ax = b have solutions?

(b) Find the general solution of the homogeneous system Ax = 0.

(c) Can you find three columns of A that form a basis of R3? Explain.

Solution

(a) We simplify the augmented matrix [A b] as follows:

    [  0  0  1  2 | 1 ]
    [ -2  2 -1  4 | h ]
    [  1 -1  1 -1 | 1 ]

  --( 2R3+R2 )-->

    [ 0  0  1  2 | 1 ]
    [ 0  0  1  2 | h + 2 ]
    [ 1 -1  1 -1 | 1 ]

  --( -R1+R2, -R1+R3 )-->

    [ 0  0  1  2 | 1 ]
    [ 0  0  0  0 | h + 1 ]
    [ 1 -1  0 -3 | 0 ]

  --( Ri↔Rj )-->

    [ 1 -1  0 -3 | 0 ]
    [ 0  0  1  2 | 1 ]
    [ 0  0  0  0 | h + 1 ].

Hence the system has solutions only if h + 1 = 0, or

h = −1.

(b) By (a) we apply the same row operations to the coefficient matrix A:

    [  0  0  1  2 ]             [ 1 -1  0 -3 ]
    [ -2  2 -1  4 ]    -->      [ 0  0  1  2 ]
    [  1 -1  1 -1 ]             [ 0  0  0  0 ].

We can now determine the general solution of the homogeneous system Ax = 0. Let x = (x1, x2, x3, x4). Then the general solution of Ax = 0 is given by x1 = x2 + 3x4, x3 = −2x4, where x2 and x4 are free variables. In vector form, the general solution is

    [ x1 ]   [ x2 + 3x4 ]        [ 1 ]        [  3 ]
    [ x2 ] = [    x2    ]  = x2  [ 1 ]  + x4  [  0 ]
    [ x3 ]   [  -2x4    ]        [ 0 ]        [ -2 ]
    [ x4 ]   [    x4    ]        [ 0 ]        [  1 ],

where x2 and x4 are arbitrary.

(c) By the reduced row echelon form of the coefficient matrix A in (b), we know that only the first and the third columns of A are pivot and hence rankA = 2. Therefore, the columns of A do not span R3, and hence we cannot find three columns of A that form a basis of R3.

□

175


Example 3.49 (Bases for Subspaces)

Find bases for ColA and RowA, where

    A = [ 1  1  5  1  4 ]
        [ 2 -1  1  2  2 ]
        [ 3  0  6  0 -3 ].

Solution We apply row operations to A as follows:

    [ 1  1  5  1  4 ]
    [ 2 -1  1  2  2 ]
    [ 3  0  6  0 -3 ]

  --( -2R1+R2, -3R1+R3 )-->

    [ 1  1  5  1   4 ]
    [ 0 -3 -9  0  -6 ]
    [ 0 -3 -9 -3 -15 ]

  --( -R2+R3 )-->

    [ 1  1  5  1  4 ]
    [ 0 -3 -9  0 -6 ]
    [ 0  0  0 -3 -9 ]

  --( -(1/3)R2, -(1/3)R3 )-->

    [ 1  1  5  1  4 ]
    [ 0  1  3  0  2 ]
    [ 0  0  0  1  3 ]

  --( -R2+R1 )-->

    [ 1  0  2  1  2 ]
    [ 0  1  3  0  2 ]
    [ 0  0  0  1  3 ]

  --( -R3+R1 )-->

    [ 1  0  2  0 -1 ]
    [ 0  1  3  0  2 ]
    [ 0  0  0  1  3 ].

Thus, all three rows of A are pivot rows, while the first, the second, and the fourth columns are pivot columns. Hence, the row vectors

    r1 = (1, 1, 5, 1, 4),    r2 = (2, −1, 1, 2, 2),    r3 = (3, 0, 6, 0, −3)

form a basis for RowA. Besides, the column vectors

    v1 = (1, 2, 3),    v2 = (1, −1, 0),    v4 = (1, 2, 0)

form a basis for ColA. □

Example 3.50 (Bases for Subspaces)

Find a basis of

    span((1, 0, 3, 1, 2), (2, −1, 1, 3, 5), (−1, 1, 2, −1, 2), (0, 2, 10, 1, 13)).

What is its dimension?

Solution By definition, a subspace S of the Euclidean space Rn is said to be of finite dimension k, or k-dimensional, written dim S = k, if S has a basis with k elements.

Let S be the given span of four vectors in R5. We first need to find a basis of S. By row operations (writing the four vectors as columns),

    [ 1  2 -1  0 ]
    [ 0 -1  1  2 ]
    [ 3  1  2 10 ]
    [ 1  3 -1  1 ]
    [ 2  5  2 13 ]

  --( -3R1+R3, -R1+R4, -2R1+R5 )-->

    [ 1  2 -1  0 ]
    [ 0 -1  1  2 ]
    [ 0 -5  5 10 ]
    [ 0  1  0  1 ]
    [ 0  1  4 13 ]

  --( -5R2+R3, R2+R4, R2+R5 )-->

    [ 1  2 -1  0 ]
    [ 0 -1  1  2 ]
    [ 0  0  0  0 ]
    [ 0  0  1  3 ]
    [ 0  0  5 15 ]

  --( -5R4+R5, R3↔R4 )-->

    [ 1  2 -1  0 ]
    [ 0 -1  1  2 ]
    [ 0  0  1  3 ]
    [ 0  0  0  0 ]
    [ 0  0  0  0 ].

Therefore the vectors

    (1, 0, 3, 1, 2),    (2, −1, 1, 3, 5),    (−1, 1, 2, −1, 2)

form a basis of the span. Accordingly, the dimension of the span is 3, or dim S = 3. □

176


Example 3.51 (Bases for Subspaces)

Let

    A = [ 1  2  0 -1 ]
        [ 1  3  1  1 ]
        [ 2  5  1  0 ]
        [ 3  6  0  0 ]
        [ 1  5  3  5 ].

Find (1) the rank of A, (2) the dimension of ColA, and (3) the dimension of NulA.

Solution We apply row operations to A as follows:

    [ 1  2  0 -1 ]
    [ 1  3  1  1 ]
    [ 2  5  1  0 ]
    [ 3  6  0  0 ]
    [ 1  5  3  5 ]

  --( -R1+R2, -2R1+R3, -3R1+R4, -R1+R5 )-->

    [ 1  2  0 -1 ]
    [ 0  1  1  2 ]
    [ 0  1  1  2 ]
    [ 0  0  0  3 ]
    [ 0  3  3  6 ]

  --( -2R2+R1, -R2+R3, -3R2+R5 )-->

    [ 1  0 -2 -5 ]
    [ 0  1  1  2 ]
    [ 0  0  0  0 ]
    [ 0  0  0  3 ]
    [ 0  0  0  0 ]

  --( (1/3)R4, R3↔R4 )-->

    [ 1  0 -2 -5 ]
    [ 0  1  1  2 ]
    [ 0  0  0  1 ]
    [ 0  0  0  0 ]
    [ 0  0  0  0 ]

  --( 5R3+R1, -2R3+R2 )-->

    [ 1  0 -2  0 ]
    [ 0  1  1  0 ]
    [ 0  0  0  1 ]
    [ 0  0  0  0 ]
    [ 0  0  0  0 ].

(1) Columns 1, 2, 4 of A are pivot; the rank of A is therefore 3, or

rankA = 3.

(2) For the dimension of a subspace such as ColA, refer to the definition in Example 3.50 (page 176). Now, by the row operations above, we know that the following columns of A: (1, 1, 2, 3, 1), (2, 3, 5, 6, 5), (−1, 1, 0, 0, 5) form a basis of ColA. Hence, the dimension of ColA is 3, or

dim ColA = 3.

(3) We need to solve the homogeneous system Ax = 0, where x = (x1, x2, x3, x4). By the above row operations, we know that the homogeneous system has the general solution

    [ x1 ]   [ 2x3 ]        [  2 ]
    [ x2 ] = [ -x3 ]  = x3  [ -1 ]
    [ x3 ]   [  x3 ]        [  1 ]
    [ x4 ]   [  0  ]        [  0 ].

Thus the vector (2,−1, 1, 0) forms a basis of NulA. Hence, the dimension of NulA is 1, or

dim NulA = 1.

Remark. Here we just mention without proof the following fact. Let A be an m × n matrix. Then

rankA = dim ColA,

rankA + dim NulA = n.

□
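The rank-nullity fact just stated (rankA + dim NulA = n) can be verified for the matrix of Example 3.51. The `rank` helper below is the same hypothetical exact-arithmetic sketch used earlier, not part of the guide.

```python
from fractions import Fraction

def rank(rows):
    """Pivot count via exact Gaussian elimination."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0
    for c in range(len(m[0])):
        piv = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if piv is None:
            continue
        m[r], m[piv] = m[piv], m[r]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c] / m[r][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

A = [[1, 2, 0, -1],
     [1, 3, 1, 1],
     [2, 5, 1, 0],
     [3, 6, 0, 0],
     [1, 5, 3, 5]]
n = len(A[0])                  # number of columns (variables)
print(rank(A), n - rank(A))    # rank A = 3, dim Nul A = 4 - 3 = 1
```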

177


Example 3.52 (Bases for Subspaces)

Let

    A = [ 1  1  2  1  0 ]
        [ 0  2  2  4  2 ]
        [ 8  2 10  0  2 ]
        [ 6  3  9  2  1 ].

(a) Find the row echelon form of A. Then find a subset of the set of columns of A which forms a basis of ColA.

(b) Find a subset of the set of rows of A, which forms a basis of RowA.

Solution

(a) We apply row operations to A as follows:

    [ 1  1  2  1  0 ]
    [ 0  2  2  4  2 ]
    [ 8  2 10  0  2 ]
    [ 6  3  9  2  1 ]

  --( -8R1+R3, -6R1+R4 )-->

    [ 1  1  2  1  0 ]
    [ 0  2  2  4  2 ]
    [ 0 -6 -6 -8  2 ]
    [ 0 -3 -3 -4  1 ]

  --( (1/2)R2 )-->

    [ 1  1  2  1  0 ]
    [ 0  1  1  2  1 ]
    [ 0 -6 -6 -8  2 ]
    [ 0 -3 -3 -4  1 ]

  --( -R2+R1, 6R2+R3, 3R2+R4 )-->

    [ 1  0  1 -1 -1 ]
    [ 0  1  1  2  1 ]
    [ 0  0  0  4  8 ]
    [ 0  0  0  2  4 ]

  --( -(1/2)R3+R4, (1/4)R3 )-->

    [ 1  0  1 -1 -1 ]
    [ 0  1  1  2  1 ]
    [ 0  0  0  1  2 ]
    [ 0  0  0  0  0 ].

The following columns of A:

    (1, 0, 8, 6),    (1, 2, 2, 3),    (1, 4, 0, 2)

form a basis of ColA.

(b) We apply row operations to the transpose matrix At as follows:

    [ 1  0  8  6 ]
    [ 1  2  2  3 ]
    [ 2  2 10  9 ]
    [ 1  4  0  2 ]
    [ 0  2  2  1 ]

  --( -R1+R2, -2R1+R3, -R1+R4 )-->

    [ 1  0  8  6 ]
    [ 0  2 -6 -3 ]
    [ 0  2 -6 -3 ]
    [ 0  4 -8 -4 ]
    [ 0  2  2  1 ]

  --( -R2+R3, -2R2+R4, -R2+R5 )-->

    [ 1  0  8  6 ]
    [ 0  2 -6 -3 ]
    [ 0  0  0  0 ]
    [ 0  0  4  2 ]
    [ 0  0  8  4 ]

  --( -2R4+R5, (1/2)R2, Ri↔Rj )-->

    [ 1  0  8    6  ]
    [ 0  1 -3 -3/2  ]
    [ 0  0  4    2  ]
    [ 0  0  0    0  ]
    [ 0  0  0    0  ]

  --( (1/4)R3 )-->

    [ 1  0  8    6  ]
    [ 0  1 -3 -3/2  ]
    [ 0  0  1   1/2 ]
    [ 0  0  0    0  ]
    [ 0  0  0    0  ].

The following rows of A:

(1, 1, 2, 1, 0), (0, 2, 2, 4, 2), (8, 2, 10, 0, 2)

form a basis of RowA.

Remark. Alternatively, we may make use of the same row operations in (a) and immediately conclude that the first three rows of A are pivot rows and hence the first three rows of A form a basis of RowA. Generally, we have two methods for finding a basis of RowA.

□
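A quick numerical cross-check of Example 3.52 can be sketched with NumPy (an assumption; the guide itself works by hand). The rank found in the row echelon form, the equality of row rank and column rank, and the independence of the pivot columns 1, 2, 4 are all visible from `matrix_rank`:

```python
import numpy as np

# The 4x5 matrix A of Example 3.52.
A = np.array([[1, 1, 2, 1, 0],
              [0, 2, 2, 4, 2],
              [8, 2, 10, 0, 2],
              [6, 3, 9, 2, 1]])

rank_A = np.linalg.matrix_rank(A)       # number of pivots in the REF
rank_At = np.linalg.matrix_rank(A.T)    # row rank equals column rank
rank_pivots = np.linalg.matrix_rank(A[:, [0, 1, 3]])  # pivot columns 1, 2, 4
print(rank_A, rank_At, rank_pivots)     # -> 3 3 3
```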


Chapter 4

Determinants and Eigenvalues

4.1 Find Determinants by Cofactor Expansions

4.2 Determinants and Inverses

By Theorem 1.6.1 (page 10) we have seen the formula for calculating the determinant of a given 2 × 2 matrix. Let us discuss the details of determinants in this section.

For a 2 × 2 matrix
$$A = \begin{bmatrix} a & b \\ c & d \end{bmatrix},$$
the number ad − bc is called the determinant of the matrix. That is, detA = ad − bc. Theorem 1.6.1 says that

$$A_{2\times 2} \text{ is invertible} \iff \det A \neq 0.$$

Example 4.1 (Invertible 2 × 2 matrix)

Consider the matrix

$$A = \begin{bmatrix} -2 & 6 \\ -2 & 5 \end{bmatrix}.$$

Then its determinant is

detA = (−2)(5) − 6(−2) = 2.

The given matrix A is invertible. □


Example 4.2 (Non-invertible 2 × 2 matrix)

Consider the matrix

$$A = \begin{bmatrix} 4 & -2 \\ -2 & 1 \end{bmatrix}.$$

Then its determinant is detA = (4)(1) − (−2)(−2) = 0.

The given matrix A is not invertible. □
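Examples 4.1 and 4.2 can be reproduced with NumPy's determinant routine (an assumption; the guide computes by hand). Rounding removes floating-point noise from the LU-based computation:

```python
import numpy as np

A = np.array([[-2, 6], [-2, 5]])   # Example 4.1
B = np.array([[4, -2], [-2, 1]])   # Example 4.2

det_A = round(np.linalg.det(A))    # ad - bc = (-2)(5) - 6(-2) = 2
det_B = round(np.linalg.det(B))    # (4)(1) - (-2)(-2) = 0
print(det_A, det_B)                # -> 2 0
```

A nonzero result confirms invertibility of A; the zero result confirms B is singular.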

For a 3 × 3 matrix

$$A = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix},$$

the determinant is

$$\det A = a_{11}a_{22}a_{33} + a_{12}a_{23}a_{31} + a_{13}a_{21}a_{32} - a_{13}a_{22}a_{31} - a_{11}a_{23}a_{32} - a_{12}a_{21}a_{33},$$

and we have: A (3 × 3) is invertible ⇐⇒ detA ≠ 0.

The fact also holds for any square matrix A, with a much more complicated definition of the determinant. Later in this chapter, we shall present a method of finding eigenvalues of a matrix using a determinant equation.

λ is an eigenvalue of A ⇐⇒ det (A − λI) = 0.

Moreover, the following properties may help you compute determinants (see Table 2.1 on page 52 for the three row operations).


1. $A \xrightarrow{\, rR_i + R_j \,} B \implies \det B = \det A$.

2. $A \xrightarrow{\, rR_i \,} B \implies \det B = r \cdot \det A$.

3. $A \xrightarrow{\, R_i \leftrightarrow R_j \,} B \implies \det B = -\det A$.

4. $\det A = \det A^t$.

At is the transpose of A, for which rows and columns are interchanged. Thus, the last property implies that there is no difference between rows and columns in computing the determinant. In particular, if we carry out column operations similar to row operations, then we have properties similar to the first three.
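The four properties can be checked numerically on a concrete matrix. The sketch below uses NumPy (an assumption) and the matrix of the next example:

```python
import numpy as np

A = np.array([[1., 3., -3.], [-3., 7., -3.], [-6., 6., -2.]])
dA = np.linalg.det(A)

B = A.copy(); B[0] += 4 * B[2]      # rRi + Rj: determinant unchanged
C = A.copy(); C[1] *= 5.0           # rRi: determinant scaled by r = 5
D = A[[1, 0, 2]]                    # Ri <-> Rj: determinant negated

ok = (np.isclose(np.linalg.det(B), dA)
      and np.isclose(np.linalg.det(C), 5 * dA)
      and np.isclose(np.linalg.det(D), -dA)
      and np.isclose(np.linalg.det(A.T), dA))   # det A = det A^t
print(ok)                            # -> True
```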

Example 4.3 (3 × 3 matrix)

Consider the matrix

$$A = \begin{bmatrix} 1 & 3 & -3 \\ -3 & 7 & -3 \\ -6 & 6 & -2 \end{bmatrix}.$$

To compute detA, we do −R2 + R1 and −2R2 + R3 to get

$$\det A = \det \begin{bmatrix} 4 & -4 & 0 \\ -3 & 7 & -3 \\ 0 & -8 & 4 \end{bmatrix}.$$

Then we do C1 + C2 and 2C3 + C2 (C for column) to get

$$\det A = \det \begin{bmatrix} 4 & 0 & 0 \\ -3 & -2 & -3 \\ 0 & 0 & 4 \end{bmatrix} = (4)(-2)(4) = -32.$$

□

Example 4.4 (3 × 3 matrix)

Consider the matrix

$$A = \begin{bmatrix} 3 & 1 & 2 \\ 3 & 1 & 3 \\ 1 & -1 & 2 \end{bmatrix}.$$

.

To compute detA, we do C1 + C2 and −C1 + C3 to get

$$\det A = \det \begin{bmatrix} 3 & 4 & -1 \\ 3 & 4 & 0 \\ 1 & 0 & 1 \end{bmatrix}.$$

.

Then we do R3 + R1 and −R2 + R1 to get

$$\det A = \det \begin{bmatrix} 1 & 0 & 0 \\ 3 & 4 & 0 \\ 1 & 0 & 1 \end{bmatrix} = (1)(4)(1) = 4.$$

□
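The two determinants just computed by row and column operations can be double-checked with NumPy (an assumption; not part of the guide's method):

```python
import numpy as np

A = np.array([[1, 3, -3], [-3, 7, -3], [-6, 6, -2]])   # Example 4.3
B = np.array([[3, 1, 2], [3, 1, 3], [1, -1, 2]])       # Example 4.4

det_A = round(np.linalg.det(A))
det_B = round(np.linalg.det(B))
print(det_A, det_B)        # -> -32 4
```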


In many applications of matrices to physical problems it is important to know which vectors undergo no change in direction under a mapping defined by a matrix A.

4.3 Definitions of Eigenvalue and Eigenvector

For example, suppose A represents the mapping in two dimensions defined as follows:

[Figure: each vector in the plane is projected onto the line y = 2x.]

Clearly, as a result of this mapping every vector undergoes a change in direction except those vectors which lie along the line y = 2x (these vectors map onto themselves; that is, v 7→ v) and those which are perpendicular to the line y = 2x, that is, those which lie along the line y = −x/2 (these vectors map onto the zero vector; that is, v 7→ 0v = 0).

Vectors such as these, which undergo no change in direction under a linear mapping, are called eigenvectors of the mapping.

Thus an eigenvector of a matrix (linear mapping) is simply a vector which maps onto a scalar multiple of itself. Since the zero vector always maps onto itself, we are not interested in such vectors.

14 Definition If Av = λv and v ≠ 0, then v is called an eigenvector of A with the scalar λ.

The scalar λ, which gives a measure of how v is “stretched”, is called a corresponding eigenvalue. We usually speak of v as an eigenvector corresponding to the eigenvalue λ.


Note that eigenvalues and eigenvectors always occur in pairs. We cannot have an eigenvalue without an eigenvector, and we cannot have an eigenvector without an eigenvalue.

Example 4.5 (Eigenvalue and eigenvector)

Suppose
$$A = \begin{bmatrix} 6 & 16 \\ -1 & -4 \end{bmatrix}, \qquad v = \begin{bmatrix} -8 \\ 1 \end{bmatrix}.$$
Then v is an eigenvector with corresponding eigenvalue λ = 4 because
$$Av = \begin{bmatrix} 6 & 16 \\ -1 & -4 \end{bmatrix} \begin{bmatrix} -8 \\ 1 \end{bmatrix} = \begin{bmatrix} -32 \\ 4 \end{bmatrix} = 4 \begin{bmatrix} -8 \\ 1 \end{bmatrix} = \lambda v.$$

□
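The defining relation Av = λv in Example 4.5 is a one-line check in NumPy (an assumption; the example itself is a hand computation):

```python
import numpy as np

A = np.array([[6, 16], [-1, -4]])
v = np.array([-8, 1])

Av = A @ v
print(Av)                          # -> [-32   4]
print(np.array_equal(Av, 4 * v))   # -> True, so Av = 4v and lambda = 4
```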

What we need to do next is figure out just how we can determine the eigenvalues and eigenvectors for a given matrix. We will start with finding the eigenvalues of a matrix, and once we have those we will be able to find the eigenvectors corresponding to each eigenvalue.

4.4 Characteristic Equation

Suppose A is a 2 × 2 matrix which represents a linear mapping of 2-dimensional space onto itself. Then v is an eigenvector of A if

Av = λv =⇒ (A − λI)v = 0.

This matrix equation represents a homogeneous system of 2 equations in 2 unknowns. Trivially v = 0 is a solution, but we are seeking nonzero solutions. The system will have nontrivial solutions if and only if the coefficient matrix A − λI is not invertible; that is, if det(A − λI) = 0. For this 2 × 2 matrix, we get a polynomial (of degree 2) in λ which can be solved to obtain two eigenvalues.

In summary, the method for finding eigenvalues of a given square matrix A is based on the last conditionof the following.

λ is an eigenvalue of A ⇐⇒ Av = λv for some v ≠ 0

⇐⇒ (A − λI)v = 0 has nontrivial solutions

⇐⇒ A − λI is not invertible

⇐⇒ det (A − λI) = 0.

The equation det(A − λI) = 0

is called the characteristic equation of the matrix A. In general, if A is an n × n matrix, the characteristic equation is a polynomial equation of degree n in λ. Also, from the Fundamental Theorem of Algebra we know that there will be exactly n eigenvalues (possibly including repeats) for an n × n matrix A. Note that because the Fundamental Theorem of Algebra does allow for the possibility of repeated eigenvalues, there will be at most n distinct eigenvalues for an n × n matrix. Because an eigenvalue can repeat itself in the list of all eigenvalues, we would like a way to differentiate between eigenvalues that repeat and eigenvalues that don't repeat.

Suppose A is an n × n matrix and that λ1, λ2, · · · , λn is the complete list of all the eigenvalues of A, including repeats. If λ occurs exactly once in this list then we call λ a simple eigenvalue. If λ occurs k ≥ 2 times in the list we say that λ has multiplicity k.

Once the eigenvalues λ are found, we can find the corresponding eigenvectors of the matrix A. This can be done by first reducing the coefficient matrix A − λI to its row echelon form and then finding any nontrivial solutions x of the homogeneous equation (A − λI)x = 0. We carry out the steps in detail for 2 × 2 and 3 × 3 matrices.
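The whole pipeline — characteristic polynomial, then eigenvalues as its roots — can be sketched in NumPy (an assumption; the guide solves the equation by hand), using the matrix of Example 4.6 below:

```python
import numpy as np

A = np.array([[-2, 6], [-2, 5]])

# Coefficients of the characteristic polynomial det(lambda*I - A):
# lambda^2 - 3*lambda + 2 = (lambda - 1)(lambda - 2).
coeffs = np.poly(A)
eigs = np.sort(np.linalg.eigvals(A))
print(np.round(coeffs, 6))   # -> [ 1. -3.  2.]
print(np.round(eigs, 6))     # -> [1. 2.]
```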


• 2 × 2 case

In the 2 × 2 case,
$$A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}, \qquad A - \lambda I = \begin{bmatrix} a-\lambda & b \\ c & d-\lambda \end{bmatrix}.$$

By Theorem 1.6.1 (page 10), the condition for A − λI to be not invertible is det (A − λI) = 0, or

(a − λ)(d − λ) − bc = 0.

This is the condition for λ to be an eigenvalue of A and is called the characteristic equation of A.

Example 4.6 (Eigenvalues and eigenvectors of 2 × 2 matrix)

Consider the matrix

$$A = \begin{bmatrix} -2 & 6 \\ -2 & 5 \end{bmatrix}.$$

The characteristic equation is

$$(-2 - \lambda)(5 - \lambda) - 6(-2) = 2 - 3\lambda + \lambda^2 = (1 - \lambda)(2 - \lambda) = 0$$

which gives the distinct real eigenvalues λ1 = 1, λ2 = 2.

· For λ1 = 1,

$$A - 1 \cdot I = \begin{bmatrix} -3 & 6 \\ -2 & 4 \end{bmatrix} \longrightarrow \begin{bmatrix} 1 & -2 \\ 0 & 0 \end{bmatrix}.$$

By solving (A − 1 · I)x = 0, where x = (x1, x2), we have x1 − 2x2 = 0, or x1 = 2x2. Thus,

$$x = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 2x_2 \\ x_2 \end{bmatrix} = x_2 \begin{bmatrix} 2 \\ 1 \end{bmatrix},$$

we get the eigenvector v1 = (2, 1) corresponding to the eigenvalue λ1 = 1.

· For λ2 = 2,

$$A - 2 \cdot I = \begin{bmatrix} -4 & 6 \\ -2 & 3 \end{bmatrix} \longrightarrow \begin{bmatrix} 2 & -3 \\ 0 & 0 \end{bmatrix}.$$

By solving (A − 2 · I)x = 0, where x = (x1, x2), we have 2x1 = 3x2. Thus,

$$x = \begin{bmatrix} (3/2)x_2 \\ x_2 \end{bmatrix} = \frac{x_2}{2} \begin{bmatrix} 3 \\ 2 \end{bmatrix},$$

we get the eigenvector v2 = (3, 2) corresponding to the eigenvalue λ2 = 2. □

Example 4.7 (Eigenvalues and eigenvectors of 2 × 2 matrix)

Consider the matrix

$$A = \begin{bmatrix} 4 & -4 \\ 1 & 0 \end{bmatrix}.$$

The characteristic equation is (4 − λ)(−λ) − (−4) = (2 − λ)² = 0,

which gives the repeated eigenvalue λ = 2.

$$A - 2 \cdot I = \begin{bmatrix} 2 & -4 \\ 1 & -2 \end{bmatrix} \longrightarrow \begin{bmatrix} 1 & -2 \\ 0 & 0 \end{bmatrix}.$$

By solving (A − 2 · I)x = 0, we get only one eigenvector v = (2, 1); all other eigenvectors are nonzero multiples of v. This time we cannot find enough linearly independent eigenvectors (that is, two nonparallel eigenvectors). □
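A numerical eigensolver makes the deficiency of Example 4.7 visible. In the NumPy sketch below (NumPy is an assumption), the repeated eigenvalue 2 comes with two eigenvector columns that are numerically parallel to (2, 1), so no basis of eigenvectors exists:

```python
import numpy as np

A = np.array([[4, -4], [1, 0]])
w, V = np.linalg.eig(A)
print(np.round(w.real, 6))     # -> [2. 2.]  (lambda = 2 repeated)

# Both eigenvector columns are (numerically) parallel, so the
# eigenvector matrix has rank 1, not 2.
print(np.linalg.matrix_rank(V, tol=1e-6))   # -> 1
```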


• 3 × 3 case

In Section 4.1 (page 179) we introduced a method using elementary row (and/or column) operations to compute the determinant of a given matrix. In particular, this works well for many 3 × 3 matrices. Now, we are going to apply this technique to finding eigenvalues of some 3 × 3 matrices.

Example 4.8 (Eigenvalues and eigenvectors of 3 × 3 matrix)

Consider the matrix

$$A = \begin{bmatrix} 1 & 3 & -3 \\ -3 & 7 & -3 \\ -6 & 6 & -2 \end{bmatrix}.$$

To compute

$$\det(A - \lambda I) = \det \begin{bmatrix} 1-\lambda & 3 & -3 \\ -3 & 7-\lambda & -3 \\ -6 & 6 & -2-\lambda \end{bmatrix},$$

we do −R2 + R1 and −2R2 + R3 to get

$$\det(A - \lambda I) = \det \begin{bmatrix} 4-\lambda & -4+\lambda & 0 \\ -3 & 7-\lambda & -3 \\ 0 & -8+2\lambda & 4-\lambda \end{bmatrix}.$$

Then we do C1 + C2 and 2C3 + C2 (C for column) to get

$$\det(A - \lambda I) = \det \begin{bmatrix} 4-\lambda & 0 & 0 \\ -3 & -2-\lambda & -3 \\ 0 & 0 & 4-\lambda \end{bmatrix} = (4-\lambda)^2(-2-\lambda).$$

Therefore the characteristic equation is

(4 − λ)2(−2 − λ) = 0.

From this we get two eigenvalues λ1 = 4, λ2 = −2 (λ1 being a repeated eigenvalue).

· For λ1 = 4,

$$A - 4 \cdot I = \begin{bmatrix} -3 & 3 & -3 \\ -3 & 3 & -3 \\ -6 & 6 & -6 \end{bmatrix} \longrightarrow \begin{bmatrix} 1 & -1 & 1 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}.$$

By solving (A − 4 · I)x = 0, where x = (x1, x2, x3), we have x1 = x2 − x3. Thus,

$$x = \begin{bmatrix} x_2 - x_3 \\ x_2 \\ x_3 \end{bmatrix} = x_2 \begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix} + x_3 \begin{bmatrix} -1 \\ 0 \\ 1 \end{bmatrix},$$

we get two eigenvectors v1 = (1, 1, 0), v2 = (−1, 0, 1).

· For λ2 = −2,

$$A - (-2) \cdot I = \begin{bmatrix} 3 & 3 & -3 \\ -3 & 9 & -3 \\ -6 & 6 & 0 \end{bmatrix} \longrightarrow \begin{bmatrix} 1 & 0 & -1/2 \\ 0 & 1 & -1/2 \\ 0 & 0 & 0 \end{bmatrix}.$$

By solving (A − (−2) · I)x = 0, we get the eigenvector v3 = (1, 1, 2). □
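All three eigenpairs of Example 4.8 satisfy Av = λv exactly, which a short NumPy check confirms (NumPy is an assumption; the example is worked by hand):

```python
import numpy as np

A = np.array([[1, 3, -3], [-3, 7, -3], [-6, 6, -2]])

# Verify the eigenpairs found above: lambda = 4 (twice) and lambda = -2.
pairs = [(4, [1, 1, 0]), (4, [-1, 0, 1]), (-2, [1, 1, 2])]
ok = all(np.array_equal(A @ np.array(v), lam * np.array(v)) for lam, v in pairs)
print(ok)    # -> True

# The numerical eigenvalues agree with the characteristic equation.
print(np.sort(np.round(np.linalg.eigvals(A).real, 6)))   # -> [-2.  4.  4.]
```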


Example 4.9 (Eigenvalues and eigenvectors of 3 × 3 matrix)

Consider the matrix

$$A = \begin{bmatrix} 3 & 1 & 2 \\ 3 & 1 & 3 \\ 1 & -1 & 2 \end{bmatrix}.$$

To compute

$$\det(A - \lambda I) = \det \begin{bmatrix} 3-\lambda & 1 & 2 \\ 3 & 1-\lambda & 3 \\ 1 & -1 & 2-\lambda \end{bmatrix},$$

,

we do C1 + C2 and −C1 + C3 to get

$$\det(A - \lambda I) = \det \begin{bmatrix} 3-\lambda & 4-\lambda & -1+\lambda \\ 3 & 4-\lambda & 0 \\ 1 & 0 & 1-\lambda \end{bmatrix}.$$

.

Then we do R3 + R1 and −R2 + R1 to get

$$\det(A - \lambda I) = \det \begin{bmatrix} 1-\lambda & 0 & 0 \\ 3 & 4-\lambda & 0 \\ 1 & 0 & 1-\lambda \end{bmatrix} = (1-\lambda)^2(4-\lambda).$$

Remark that the last matrix above is a 3 × 3 triangular matrix, so the determinant is given by the product of the diagonal entries. Therefore the characteristic equation is

(1 − λ)2(4 − λ) = 0.

From this we get two eigenvalues (λ1 being a repeated eigenvalue):

λ1 = 1, λ2 = 4.

· For λ1 = 1,

$$A - 1 \cdot I = \begin{bmatrix} 2 & 1 & 2 \\ 3 & 0 & 3 \\ 1 & -1 & 1 \end{bmatrix} \longrightarrow \begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix}.$$

By solving (A − 1 · I)x = 0, where x = (x1, x2, x3), we have x1 + x3 = 0, x2 = 0. Thus,

$$x = \begin{bmatrix} -x_3 \\ 0 \\ x_3 \end{bmatrix} = -x_3 \begin{bmatrix} 1 \\ 0 \\ -1 \end{bmatrix},$$

we get only one eigenvector v1 = (1, 0,−1).

· For λ2 = 4,

$$A - 4 \cdot I = \begin{bmatrix} -1 & 1 & 2 \\ 3 & -3 & 3 \\ 1 & -1 & -2 \end{bmatrix} \longrightarrow \begin{bmatrix} 1 & -1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{bmatrix}.$$

By solving (A − 4 · I)x = 0, where x = (x1, x2, x3), we have x1 − x2 = 0, x3 = 0. Thus,

$$x = \begin{bmatrix} x_2 \\ x_2 \\ 0 \end{bmatrix} = x_2 \begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix},$$

we get the eigenvector v2 = (1, 1, 0). □


4.5 Diagonalization

In this section we are going to take a look at a special kind of matrix. We will start out with the following definition.

15 Definition Suppose that A is a square matrix, and further suppose that there exists an invertible matrix P (of the same size as A) such that P−1AP is a diagonal matrix. In such a case we call A diagonalizable and say that P diagonalizes A.

If A is diagonalizable, we may write a diagonalization of A such that

P−1AP = D, or explicitly, A = PDP−1.

This very special representation of A is summarized in the following.

16 Definition A diagonalization of an n × n (square) matrix A is
$$A = PDP^{-1}, \quad P \text{ invertible}, \quad D = \begin{bmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_n \end{bmatrix} \text{ diagonal}.$$

Think about how we can find the invertible matrix P and the diagonal matrix D in the diagonalization of A. In fact, we may consider the column partition of P, i.e., $P = \begin{bmatrix} v_1 & v_2 & \cdots & v_n \end{bmatrix}$. Then

P is invertible ⇐⇒ v1, v2, · · · , vn are n linearly independent eigenvectors of A.

Moreover, for invertible P and diagonal D = diag (λ1, λ2, · · · , λn),

$$A = PDP^{-1} \iff AP = PD \iff \begin{bmatrix} Av_1 & Av_2 & \cdots & Av_n \end{bmatrix} = \begin{bmatrix} \lambda_1 v_1 & \lambda_2 v_2 & \cdots & \lambda_n v_n \end{bmatrix} \iff Av_i = \lambda_i v_i, \quad i = 1, 2, \cdots, n.$$

The above equivalence illustrates that if we know how to find such λi and corresponding vi, then we know how to define the matrices P and D. In fact, we have special names for such λi and vi in linear algebra. Recall that vi is called an eigenvector of A corresponding to the eigenvalue λi.

The above connection will not only tell us when a matrix is diagonalizable, but also how to construct the invertible P when A is diagonalizable. We summarize it in the following theorem.

Theorem 4.5.1 The following are equivalent for an n × n matrix A.

1. A is diagonalizable.

2. A has n linearly independent eigenvectors.


Example 4.10 (Diagonalization of 2 × 2 matrix)

Consider the matrix
$$A = \begin{bmatrix} -2 & 6 \\ -2 & 5 \end{bmatrix}.$$
By direct computation, we have

$$A \begin{bmatrix} 2 \\ 1 \end{bmatrix} = \begin{bmatrix} 2 \\ 1 \end{bmatrix}, \qquad A \begin{bmatrix} 3 \\ 2 \end{bmatrix} = \begin{bmatrix} 6 \\ 4 \end{bmatrix} = 2 \begin{bmatrix} 3 \\ 2 \end{bmatrix}.$$

Therefore, v1 = (2, 1) is an eigenvector of A with eigenvalue λ1 = 1, and v2 = (3, 2) is an eigenvector of A with eigenvalue λ2 = 2. From these we can define the matrices

$$P = \begin{bmatrix} v_1 & v_2 \end{bmatrix} = \begin{bmatrix} 2 & 3 \\ 1 & 2 \end{bmatrix}, \qquad D = \begin{bmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 2 \end{bmatrix}.$$

Since v1 and v2 are linearly independent (nonparallel) eigenvectors of A, A has the following diagonalization

$$A = PDP^{-1} = \begin{bmatrix} 2 & 3 \\ 1 & 2 \end{bmatrix} \begin{bmatrix} 1 & 0 \\ 0 & 2 \end{bmatrix} \begin{bmatrix} 2 & 3 \\ 1 & 2 \end{bmatrix}^{-1}.$$

□
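The diagonalization of Example 4.10 can be verified by multiplying it back out. A minimal NumPy sketch (NumPy is an assumption):

```python
import numpy as np

A = np.array([[-2, 6], [-2, 5]])
P = np.array([[2, 3], [1, 2]])   # columns are the eigenvectors v1, v2
D = np.diag([1, 2])              # corresponding eigenvalues on the diagonal

# P D P^{-1} reproduces A exactly (up to floating-point error).
print(np.allclose(P @ D @ np.linalg.inv(P), A))   # -> True
```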

Example 4.11 (Diagonalizable 3 × 3 matrix)

Consider the matrix

$$A = \begin{bmatrix} 1 & 3 & -3 \\ -3 & 7 & -3 \\ -6 & 6 & -2 \end{bmatrix}$$
in Example 4.8 (page 185). Since v1, v2, v3 are linearly independent eigenvectors of A, we obtain a diagonalization of A with

$$P = \begin{bmatrix} v_1 & v_2 & v_3 \end{bmatrix} = \begin{bmatrix} 1 & -1 & 1 \\ 1 & 0 & 1 \\ 0 & 1 & 2 \end{bmatrix}, \qquad D = \begin{bmatrix} \lambda_1 & 0 & 0 \\ 0 & \lambda_1 & 0 \\ 0 & 0 & \lambda_2 \end{bmatrix} = \begin{bmatrix} 4 & 0 & 0 \\ 0 & 4 & 0 \\ 0 & 0 & -2 \end{bmatrix}.$$

The diagonalization of A is

$$A = PDP^{-1} = \begin{bmatrix} 1 & -1 & 1 \\ 1 & 0 & 1 \\ 0 & 1 & 2 \end{bmatrix} \begin{bmatrix} 4 & 0 & 0 \\ 0 & 4 & 0 \\ 0 & 0 & -2 \end{bmatrix} \begin{bmatrix} 1 & -1 & 1 \\ 1 & 0 & 1 \\ 0 & 1 & 2 \end{bmatrix}^{-1}. \tag{4.1}$$

□
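The 3 × 3 diagonalization (4.1) can be checked the same way as the 2 × 2 case, again with NumPy as an assumed tool:

```python
import numpy as np

A = np.array([[1, 3, -3], [-3, 7, -3], [-6, 6, -2]])
P = np.array([[1, -1, 1], [1, 0, 1], [0, 1, 2]])   # columns v1, v2, v3
D = np.diag([4, 4, -2])                            # matching eigenvalues

print(np.allclose(P @ D @ np.linalg.inv(P), A))    # -> True
```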

4.6 Diagonalizability

In the following we give the general procedure for finding a diagonalization of a given n × n matrix.

1. Find eigenvalues. Theoretically, this involves computing a determinant. We have discussed it in detail in Section 4.4 (page 183).

2. For each eigenvalue, find as many linearly independent eigenvectors as you can.

3. If all the vectors you find in the second step, for all corresponding eigenvalues, combine to form a set of n linearly independent vectors, then we get a diagonalization.


Example 4.12 (Diagonalization of 3 × 3 matrix)

Consider again the matrix

$$A = \begin{bmatrix} 1 & 3 & -3 \\ -3 & 7 & -3 \\ -6 & 6 & -2 \end{bmatrix}.$$

We mentioned in the last example that the eigenvectors v1, v2, v3 of A are linearly independent. In the following we are going to go through this in more detail. By applying row operations, we have

$$P = \begin{bmatrix} v_1 & v_2 & v_3 \end{bmatrix} = \begin{bmatrix} 1 & -1 & 1 \\ 1 & 0 & 1 \\ 0 & 1 & 2 \end{bmatrix} \longrightarrow \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}.$$

In other words, P is row equivalent to the identity matrix I. That is to say, all columns of P are pivot columns, and hence the three column vectors v1, v2, v3 in R3 are linearly independent. Note that v1, v2, v3 form the three columns of P, and corresponding eigenvalues form the diagonal of D.

Alternatively, if we choose eigenvectors in the order v2, v3, v1, then we get a diagonalization different from (4.1) in Example 4.11.

$$A = \begin{bmatrix} -1 & 1 & 1 \\ 0 & 1 & 1 \\ 1 & 2 & 0 \end{bmatrix} \begin{bmatrix} 4 & 0 & 0 \\ 0 & -2 & 0 \\ 0 & 0 & 4 \end{bmatrix} \begin{bmatrix} -1 & 1 & 1 \\ 0 & 1 & 1 \\ 1 & 2 & 0 \end{bmatrix}^{-1}.$$

□

Sometimes, in the last step we do not have enough linearly independent eigenvectors. In this case, the matrix is not diagonalizable.

Example 4.13 (Non-diagonalizable 3 × 3 matrix)

Consider the matrix

$$A = \begin{bmatrix} 3 & 1 & 2 \\ 3 & 1 & 3 \\ 1 & -1 & 2 \end{bmatrix}$$
in Example 4.9 (page 186).

In Example 4.11, the repeated eigenvalue gives two linearly independent eigenvectors, so that there exist enough (three in total) linearly independent eigenvectors. However, for this example, we can only find two linearly independent eigenvectors of A in total, and thus v1, v2 in R3 cannot form a set of 3 linearly independent vectors. Hence

A is not diagonalizable.

□

The characteristic equation is a polynomial equation. As solutions of such an equation, the eigenvalues may be complex. We may carry out all the steps as before, using complex numbers whenever needed. However, we should emphasize that complex eigenvalues are not in the syllabus of MTH 2032, and the discussion of complex diagonalization here is only for reference.


Example 4.14 (Complex diagonalization of 2 × 2 matrix)

Consider the matrix

$$A = \begin{bmatrix} 3 & -5 \\ 2 & -3 \end{bmatrix}.$$

The characteristic equation is

$$(3 - \lambda)(-3 - \lambda) - 2(-5) = \lambda^2 + 1 = 0$$

which gives two complex conjugate eigenvalues λ1 = i, λ2 = −i.

· For λ1 = i,

$$A - i \cdot I = \begin{bmatrix} 3-i & -5 \\ 2 & -3-i \end{bmatrix}.$$

In the system (A − i · I)x = 0, x = (x1, x2), the two equations

(3 − i) x1 − 5x2 = 0,

2x1 − (3 + i) x2 = 0

are equivalent (multiplying the second equation by (3 − i)/2 gives you the first equation). The solution of the second equation is

$$x_1 = \frac{3+i}{2}\, x_2, \quad \text{for arbitrary (complex number) } x_2.$$

In vector form, the solution is $x = \frac{x_2}{2}(3 + i,\, 2)$. Therefore, we get the eigenvector v1 = (3 + i, 2) for λ1.

· For λ2 = −i,

$$A - (-i) \cdot I = \begin{bmatrix} 3+i & -5 \\ 2 & -3+i \end{bmatrix}.$$

In the system (A − (−i) · I)x = 0, x = (x1, x2), the two equations

(3 + i) x1 − 5x2 = 0,

2x1 − (3 − i) x2 = 0

are equivalent. The solution of the second equation is

$$x_1 = \frac{3-i}{2}\, x_2, \quad \text{for arbitrary (complex number) } x_2.$$

In vector form, the solution is $x = \frac{x_2}{2}(3 - i,\, 2)$. Therefore, we get the eigenvector v2 = (3 − i, 2) for λ2.

Since v1, v2 in the complex Euclidean space C2 are linearly independent, we get the diagonalization

$$A = \begin{bmatrix} 3+i & 3-i \\ 2 & 2 \end{bmatrix} \begin{bmatrix} i & 0 \\ 0 & -i \end{bmatrix} \begin{bmatrix} 3+i & 3-i \\ 2 & 2 \end{bmatrix}^{-1}.$$

□

In Example 4.14, we observe that if a matrix has a pair of complex conjugate eigenvalues (say λ1,2 = α ± iβ), then the corresponding eigenvectors are again complex conjugate, such as v1,2 = a ± ib. In other words, we only need to find the corresponding eigenvector for one complex eigenvalue; the eigenvector for the other complex conjugate eigenvalue follows immediately by simply changing

i 7→ −i.


4.7 Application of Diagonalization

In this section, we will describe one simple application of diagonalization of a matrix A.

Computation of Matrix Powers

When A is diagonalizable, we can make use of the diagonalization A = PDP−1 to compute A2 as

$$A^2 = (PDP^{-1})(PDP^{-1}) = PD(P^{-1}P)DP^{-1} = PDDP^{-1} = PD^2P^{-1}.$$

So it is easy to outline a technique for calculating powers A2, A3, · · · , Ak.

$$A^k = (PDP^{-1})(PDP^{-1}) \cdots (PDP^{-1}) = PD^kP^{-1},$$

where k can be any positive integer. Here the k-th power of D = diag(λ1, λ2, · · · , λn) is still a diagonal matrix such that

$$D^k = \begin{bmatrix} \lambda_1^k & 0 & \cdots & 0 \\ 0 & \lambda_2^k & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_n^k \end{bmatrix}.$$

Therefore we can obtain an expression for Ak by computing PDkP−1.

Example 4.15 (Matrix powers)

Given that
$$A = \begin{bmatrix} 0 & 2 \\ -3 & 5 \end{bmatrix}.$$
Find $A^k$.

Solution A has the following diagonalization (why?)

$$A = PDP^{-1} = \begin{bmatrix} 1 & 2 \\ 1 & 3 \end{bmatrix} \begin{bmatrix} 2 & 0 \\ 0 & 3 \end{bmatrix} \begin{bmatrix} 3 & -2 \\ -1 & 1 \end{bmatrix}.$$

Hence,

$$A^k = PD^kP^{-1} = \begin{bmatrix} 1 & 2 \\ 1 & 3 \end{bmatrix} \begin{bmatrix} 2^k & 0 \\ 0 & 3^k \end{bmatrix} \begin{bmatrix} 3 & -2 \\ -1 & 1 \end{bmatrix} = \begin{bmatrix} 3 \cdot 2^k - 2 \cdot 3^k & -2^{k+1} + 2 \cdot 3^k \\ 3 \cdot 2^k - 3^{k+1} & -2^{k+1} + 3^{k+1} \end{bmatrix}.$$

□
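The closed form of Example 4.15 can be checked against repeated matrix multiplication for a sample exponent. A NumPy sketch (NumPy is an assumption), using k = 6:

```python
import numpy as np

A = np.array([[0, 2], [-3, 5]])
P = np.array([[1, 2], [1, 3]])

k = 6
Ak = P @ np.diag([2**k, 3**k]) @ np.linalg.inv(P)   # P D^k P^{-1}
print(np.allclose(Ak, np.linalg.matrix_power(A, k)))   # -> True

# The closed form obtained above agrees entry by entry.
closed = np.array([[3 * 2**k - 2 * 3**k, -2**(k + 1) + 2 * 3**k],
                   [3 * 2**k - 3**(k + 1), -2**(k + 1) + 3**(k + 1)]])
print(np.allclose(Ak, closed))                          # -> True
```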

Example 4.16 (Matrix powers)

Given that
$$A = \begin{bmatrix} 5 & -6 & -6 \\ -1 & 4 & 2 \\ 3 & -6 & -4 \end{bmatrix}.$$
Find $A^k$.

Solution A has the following diagonalization (why?)

$$A = PDP^{-1} = \begin{bmatrix} 3 & 2 & 2 \\ -1 & 0 & 1 \\ 3 & 1 & 0 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 2 \end{bmatrix} \begin{bmatrix} -1 & 2 & 2 \\ 3 & -6 & -5 \\ -1 & 3 & 2 \end{bmatrix}.$$

Hence,

$$A^k = \begin{bmatrix} 3 & 2 & 2 \\ -1 & 0 & 1 \\ 3 & 1 & 0 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 \\ 0 & 2^k & 0 \\ 0 & 0 & 2^k \end{bmatrix} \begin{bmatrix} -1 & 2 & 2 \\ 3 & -6 & -5 \\ -1 & 3 & 2 \end{bmatrix} = \begin{bmatrix} -3 + 2^{k+2} & 6 - 3 \cdot 2^{k+1} & 6 - 5 \cdot 2^{k+1} + 2^{k+2} \\ 1 - 2^k & -2 + 3 \cdot 2^k & -2 + 2^{k+1} \\ -3 + 3 \cdot 2^k & 6 - 3 \cdot 2^{k+1} & 6 - 5 \cdot 2^k \end{bmatrix}.$$

□


Chapter 4

Determinants and Eigenvalues (True or False)

4.1 If A is an n × n matrix, then A can have at most n eigenvalues.

4.2 Any square matrix has at least one eigenvector.

4.3 If v is an eigenvector, then v ≠ 0.

4.4 If λ is an eigenvalue, then λ ≠ 0.

4.5 If all eigenvalues of A are zero, then A = O.

4.6 If all eigenvalues of A are 1, then A = I.

4.7 If A ≠ I, then 1 is not an eigenvalue of A.

4.8 If A ≠ O, then 0 is not an eigenvalue of A.

4.9 If A2 = O, then 0 may not be the only eigenvalue of A.

4.10 If λ is an eigenvalue of A, then λ is also an eigenvalue of At.


4.11 If λ is an eigenvalue of A, then λ is also an eigenvalue of A2.

4.12 If λ is an eigenvalue of A and B, then λ is also an eigenvalue of A + B.

4.13 If λ is an eigenvalue of A and B, then λ is also an eigenvalue of AB.

4.14 If λ is an eigenvalue of A and µ is an eigenvalue of B, then λµ is an eigenvalue of AB.

4.15 If λ ≠ 0 is an eigenvalue of A, then λ−1 is an eigenvalue of A−1 (if it exists).

4.16 Suppose A is invertible. If λ is an eigenvalue of A, then λ−1 is an eigenvalue of A−1.

4.17 If u and v are eigenvectors of A, then u + v is an eigenvector of A.

4.18 If u is an eigenvector of A and B, then u is an eigenvector of A + B.

4.19 If u is an eigenvector of A and B, then u is an eigenvector of AB.

4.20 If u and v are eigenvectors of A and B, then u + v is an eigenvector of A + B.

4.21 If v is an eigenvector of A2, then v is an eigenvector of A.

4.22 If v is an eigenvector of A, then v is also an eigenvector of A2.

4.23 If v is an eigenvector of A, then v is also an eigenvector of At.

4.24 If v is an eigenvector of A, then v is also an eigenvector of 2A.

4.25 If v is an eigenvector of A, then 2v is again an eigenvector of A.


4.26 Suppose v1, v2 are eigenvectors of A corresponding to eigenvalues λ1, λ2, respectively. Then v1 + v2 is an eigenvector of A corresponding to eigenvalue λ1 + λ2.

4.27 If v is an eigenvector of A and B, then v is an eigenvector of A2 + 3AB.

4.28 If 0 is an eigenvalue of A, then A is not invertible.

4.29 If A is invertible, then 0 is never an eigenvalue of A.

4.30 If all eigenvalues of A are nonzero, then A is invertible.

4.31 Distinct eigenvectors are linearly independent.

4.32 Suppose u and v are eigenvectors of A with eigenvalues λ and µ. If λ ≠ µ, then u and v are linearly independent.

4.33 If An×n has n distinct eigenvalues, then there is a basis of eigenvectors.

4.34 All real symmetric matrices are diagonalizable.

4.35 For any real matrix A, AtA is always diagonalizable.

4.36 All diagonalizable matrices are symmetric.

4.37 Any diagonal matrix is diagonalizable.

4.38 Any 1 × 1 matrix is diagonalizable.

4.39 If A is invertible and diagonalizable, so is A−1.

4.40 If A is diagonalizable, then At is also diagonalizable.


4.41 If A is diagonalizable, then A3 is also diagonalizable.

4.42 If A is diagonalizable, then A2 is also diagonalizable.

4.43 If A and B are diagonalizable, then AB is also diagonalizable.

4.44 If A3 is diagonalizable, then A is diagonalizable.

4.45 If A2 is diagonalizable, then A is also diagonalizable.

4.46 If An×n has n distinct eigenvalues, then A is diagonalizable.

4.47 If An×n is diagonalizable, then A must have n distinct eigenvalues.

4.48 If An×n has fewer than n distinct eigenvalues, then A is not diagonalizable.

4.49 If A and B have the same eigenvalues, then A is diagonalizable =⇒ B is also diagonalizable.

4.50 If A is invertible and diagonalizable, and B is not diagonalizable, then AB is not diagonalizable.

4.51 If A is diagonalizable, then A is invertible.

4.52 If A is invertible, then A is diagonalizable.


Chapter 4

Determinants and Eigenvalues (Worked Examples)

Example 4.1 (What is a diagonalization?)

Compute the matrices A1, A2, A3 and verify that they are all equal after multiplications.

$$A_1 = \begin{bmatrix} 1 & -1 & 1 \\ 1 & 1 & -1 \\ -1 & 1 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 3 \end{bmatrix} \begin{bmatrix} 1 & -1 & 1 \\ 1 & 1 & -1 \\ -1 & 1 & 1 \end{bmatrix}^{-1},$$

$$A_2 = \begin{bmatrix} -1 & 1 & 1 \\ 1 & 1 & -1 \\ 1 & -1 & 1 \end{bmatrix} \begin{bmatrix} 2 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 3 \end{bmatrix} \begin{bmatrix} -1 & 1 & 1 \\ 1 & 1 & -1 \\ 1 & -1 & 1 \end{bmatrix}^{-1},$$

$$A_3 = \begin{bmatrix} 1 & -1 & 1 \\ -1 & 1 & 1 \\ 1 & 1 & -1 \end{bmatrix} \begin{bmatrix} 3 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & -1 & 1 \\ -1 & 1 & 1 \\ 1 & 1 & -1 \end{bmatrix}^{-1}.$$

Solution We compute A1 only. By doing row operations, we have $\begin{bmatrix} P & I \end{bmatrix} \longrightarrow \begin{bmatrix} I & P^{-1} \end{bmatrix}$, which gives

$$P^{-1} = \begin{bmatrix} 1/2 & 1/2 & 0 \\ 0 & 1/2 & 1/2 \\ 1/2 & 0 & 1/2 \end{bmatrix}, \quad \text{where } P = \begin{bmatrix} 1 & -1 & 1 \\ 1 & 1 & -1 \\ -1 & 1 & 1 \end{bmatrix}.$$

Hence,

$$A_1 = \begin{bmatrix} 1 & -2 & 3 \\ 1 & 2 & -3 \\ -1 & 2 & 3 \end{bmatrix} \begin{bmatrix} 1/2 & 1/2 & 0 \\ 0 & 1/2 & 1/2 \\ 1/2 & 0 & 1/2 \end{bmatrix} = \begin{bmatrix} 2 & -1/2 & 1/2 \\ -1 & 3/2 & -1/2 \\ 1 & 1/2 & 5/2 \end{bmatrix}.$$

Similarly, we can compute A2, A3 and finally verify that they all give the same matrix.

$$A_1 = A_2 = A_3 = \begin{bmatrix} 2 & -1/2 & 1/2 \\ -1 & 3/2 & -1/2 \\ 1 & 1/2 & 5/2 \end{bmatrix}.$$

□
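All three products of Example 4.1 can be multiplied out numerically instead of by hand. A NumPy sketch (NumPy is an assumption; the helper `diagonalization` is a name introduced here for illustration):

```python
import numpy as np

def diagonalization(P, eigenvalues):
    """Return P * diag(eigenvalues) * P^{-1}."""
    P = np.array(P, dtype=float)
    return P @ np.diag(eigenvalues) @ np.linalg.inv(P)

A1 = diagonalization([[1, -1, 1], [1, 1, -1], [-1, 1, 1]], [1, 2, 3])
A2 = diagonalization([[-1, 1, 1], [1, 1, -1], [1, -1, 1]], [2, 1, 3])
A3 = diagonalization([[1, -1, 1], [-1, 1, 1], [1, 1, -1]], [3, 2, 1])

expected = np.array([[2, -0.5, 0.5], [-1, 1.5, -0.5], [1, 0.5, 2.5]])
print(np.allclose(A1, A2) and np.allclose(A2, A3) and np.allclose(A1, expected))  # -> True
```

The three products agree because each pairs the same eigenvectors with the same eigenvalues, merely in a different order.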

Remark. From the matrix point of view, the above result is not coincidental. The reason behind it is called the diagonalizability of a matrix. Suppose an n × n matrix A is given. The matrix is said to be diagonalizable if there exists an invertible matrix P such that D = P−1AP is diagonal. In other words,


PDP−1 is said to be a diagonalization of A if there exist an invertible matrix P and a diagonal matrix D. Therefore the three given products of matrices are indeed three different diagonalizations of a matrix A. Such a diagonalized representation of A turns out to have some nice applications in linear algebra. However, first things first, we should learn an algorithm for finding an invertible matrix P that can turn A into a diagonal matrix D, provided that A is diagonalizable. The main ingredients here are eigenvalues and eigenvectors.

In the following we first give a formal definition of the eigenvalues and eigenvectors of a given square matrix.

Definition. Let A be any square matrix. A scalar λ is called an eigenvalue of A if there exists a nonzero (column) vector v such that

Av = λv.

Any nonzero vector satisfying this relation is called an eigenvector of A belonging to the eigenvalue λ.

Based on the discussion in Section 8.2 (Lecture Notes, page 165) and the above definition, we have the following theorem, which serves as an algorithm for constructing diagonalizations of any diagonalizable matrix.

Theorem. An n × n matrix A is diagonalizable to a diagonal matrix D if and only if A has n linearly independent eigenvectors (that naturally form a basis of Rn). In this case, the diagonal entries of D are the corresponding eigenvalues of A and D = P−1AP, where P is the matrix whose columns are the eigenvectors.

Example 4.2 (Diagonalization)

Verify that the vectors v1 = (2, 0), v2 = (1,−3) form a basis of eigenvectors of the matrix

$$A = \begin{bmatrix} 2 & 3 \\ 0 & -7 \end{bmatrix}.$$

Then find a diagonalization of the matrix.

Solution By direct computations, we have

$$Av_1 = \begin{bmatrix} 2 & 3 \\ 0 & -7 \end{bmatrix} \begin{bmatrix} 2 \\ 0 \end{bmatrix} = \begin{bmatrix} 4 \\ 0 \end{bmatrix} = 2 \cdot \begin{bmatrix} 2 \\ 0 \end{bmatrix} = 2v_1,$$

$$Av_2 = \begin{bmatrix} 2 & 3 \\ 0 & -7 \end{bmatrix} \begin{bmatrix} 1 \\ -3 \end{bmatrix} = \begin{bmatrix} -7 \\ 21 \end{bmatrix} = -7 \cdot \begin{bmatrix} 1 \\ -3 \end{bmatrix} = -7v_2.$$

By definition, v1 is an eigenvector of A with eigenvalue 2, and v2 is an eigenvector of A with eigenvalue −7. Observe that v1 and v2 are nonparallel; they are indeed linearly independent eigenvectors of A which form a basis of R2. By the fact that a square matrix A is diagonalizable if and only if we can find a basis consisting of eigenvectors of A, we know that A is diagonalizable. In particular, A has a diagonalization

$$\begin{bmatrix} 2 & 3 \\ 0 & -7 \end{bmatrix} = \begin{bmatrix} 2 & 1 \\ 0 & -3 \end{bmatrix} \begin{bmatrix} 2 & 0 \\ 0 & -7 \end{bmatrix} \begin{bmatrix} 2 & 1 \\ 0 & -3 \end{bmatrix}^{-1}.$$

Remark that if we choose eigenvectors in the order v2, v1, then we get a different diagonalization of A:

$$\begin{bmatrix} 2 & 3 \\ 0 & -7 \end{bmatrix} = \begin{bmatrix} 1 & 2 \\ -3 & 0 \end{bmatrix} \begin{bmatrix} -7 & 0 \\ 0 & 2 \end{bmatrix} \begin{bmatrix} 1 & 2 \\ -3 & 0 \end{bmatrix}^{-1}.$$

□


Example 4.3 (Diagonalization)

Verify that the vectors v1 = (−1, 1, 2), v2 = (−2, 1, 4), v3 = (−1, 1, 4) form a basis of eigenvectors of the matrix

$$A = \begin{bmatrix} 1 & 2 & -1 \\ 1 & 0 & 1 \\ 4 & -4 & 5 \end{bmatrix}.$$

Then find a diagonalization of the matrix.

Solution By direct computations, we have

$$Av_1 = \begin{bmatrix} 1 & 2 & -1 \\ 1 & 0 & 1 \\ 4 & -4 & 5 \end{bmatrix} \begin{bmatrix} -1 \\ 1 \\ 2 \end{bmatrix} = \begin{bmatrix} -1 \\ 1 \\ 2 \end{bmatrix} = 1 \cdot v_1,$$

$$Av_2 = \begin{bmatrix} 1 & 2 & -1 \\ 1 & 0 & 1 \\ 4 & -4 & 5 \end{bmatrix} \begin{bmatrix} -2 \\ 1 \\ 4 \end{bmatrix} = \begin{bmatrix} -4 \\ 2 \\ 8 \end{bmatrix} = 2 \cdot v_2,$$

$$Av_3 = \begin{bmatrix} 1 & 2 & -1 \\ 1 & 0 & 1 \\ 4 & -4 & 5 \end{bmatrix} \begin{bmatrix} -1 \\ 1 \\ 4 \end{bmatrix} = \begin{bmatrix} -3 \\ 3 \\ 12 \end{bmatrix} = 3 \cdot v_3.$$

Therefore, v1 is an eigenvector of A with eigenvalue λ1 = 1, v2 is an eigenvector of A with eigenvalue λ2 = 2, and v3 is an eigenvector of A with eigenvalue λ3 = 3. Thus we may construct

$$P = \begin{bmatrix} v_1 & v_2 & v_3 \end{bmatrix} = \begin{bmatrix} -1 & -2 & -1 \\ 1 & 1 & 1 \\ 2 & 4 & 4 \end{bmatrix}.$$

By doing elementary row operations, we can deduce that P −→ I. From this reduced row echelon form of P, we have the following observations.

4.1 P is row equivalent to the identity matrix I and hence all columns of P are pivot.

4.2 The eigenvectors v1, v2, v3 are linearly independent and hence form a basis of R3.

4.3 P has nonzero determinant and hence P is invertible, i.e., P−1 exists.

Once we know how to construct the matrix P, the diagonal matrix D follows immediately. The diagonalentries of D are taken to be exactly the eigenvalues of A. For constructing P =

[v1 v2 v3

], we should

be careful the order of vectors and should write the diagonal matrix D as follows

Ddef.= diag (λ1, λ2, λ3) =

1 0 00 2 00 0 3

.

In particular, A has a diagonalization

1 2 −11 0 14 −4 5

=

−1 −2 −11 1 12 4 4

1 0 00 2 00 0 3

−1 −2 −11 1 12 4 4

−1

.

Just similar to Example 4.2, the matrix A can have different diagonalizations. This really depends onyour choice of orderings. For instance, you should have D = diag (λ2, λ1, λ3) if you choose to constructP =

[v2 v1 v3

]. 2
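The 3 × 3 case of Example 4.3 can be verified the same way; a minimal sketch in plain Python (the `matmul` helper is my own). Checking AP = PD confirms A vⱼ = λⱼ vⱼ for all three columns at once.

```python
def matmul(M, N):
    # multiply two matrices given as lists of rows
    return [[sum(M[i][k] * N[k][j] for k in range(len(N)))
             for j in range(len(N[0]))] for i in range(len(M))]

A = [[1, 2, -1], [1, 0, 1], [4, -4, 5]]
P = [[-1, -2, -1], [1, 1, 1], [2, 4, 4]]   # columns v1, v2, v3
D = [[1, 0, 0], [0, 2, 0], [0, 0, 3]]

# A P = P D encodes A v_j = lambda_j v_j for every column simultaneously
assert matmul(A, P) == matmul(P, D)
```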


4. Determinants and Eigenvalues (Worked Examples)

Example 4.4 (Eigenvalue and eigenvector)

Suppose u and v are eigenvectors of A with eigenvalues λ and µ respectively. If λ ≠ µ, then u and v are linearly independent.

Solution We need to prove that c1u + c2v = 0 =⇒ c1 = c2 = 0. Suppose

c1u + c2v = 0. (4.1)

Multiplying (4.1) by the matrix A gives

c1 Au + c2 Av = A0 = 0.

Since u and v are eigenvectors, we have

c1 λu + c2 µv = 0. (4.2)

Multiplying the equation (4.1) by λ and subtracting the result from (4.2) gives

c2 (µ − λ)v = 0.

Since v ≠ 0 and λ ≠ µ, we get c2 = 0. It then follows from (4.1) and the fact u ≠ 0 that c1 = 0. □

Remark. The result of Example 4.4 can be generalized to any n × n matrix A by induction on n. That is to say, in general, if v1, v2, · · · , vn are eigenvectors of A with distinct eigenvalues, then v1, v2, · · · , vn are linearly independent. The immediate consequence of this fact is that the eigenvectors form a basis of Rⁿ. In this case, we are able to construct the matrix P = [v1 v2 · · · vn], and it is evident that P is invertible (all columns of P are pivot columns =⇒ P is row equivalent to the identity matrix =⇒ det P ≠ 0). Recalling the fact that a square matrix A is diagonalizable if and only if we can find a basis of Rⁿ consisting of eigenvectors of A, we conclude that

if an n × n matrix has n distinct eigenvalues, then the matrix is diagonalizable.

Example 4.5 (Eigenvalue and eigenvector)

Show that if an n × n matrix has n distinct eigenvalues, then the matrix is diagonalizable.

Solution If A is an n × n matrix with n distinct eigenvalues (say λ1, λ2, · · · , λn), then each eigenvalue has at least one eigenvector. Picking one eigenvector for each eigenvalue, and using the fact that eigenvectors for distinct eigenvalues are linearly independent, we obtain a linearly independent set B of n eigenvectors (say v1, v2, · · · , vn). Hence B is a basis of Rⁿ. Therefore, A is diagonalizable. That is, there exist an invertible matrix P = [v1 v2 · · · vn] and a diagonal matrix D = diag(λ1, λ2, · · · , λn) such that P⁻¹AP = D. □

Remark. Note that λ is an eigenvalue of an n × n matrix A if and only if Av = λv for some v ≠ 0; equivalently, (A − λI)x = 0 has a nontrivial solution x. This in turn is equivalent to (A − λI) not being invertible, which holds if and only if the determinant of (A − λI) is zero. Hence, in order to hunt for the eigenvalues of a given square matrix A, we need to solve the following algebraic equation for λ:

det(A − λI) = 0.

This is often called the characteristic equation of A. The characteristic equation is a polynomial equation in λ of degree n. Thus, an n × n matrix A can have at most n eigenvalues, since a polynomial equation of degree n can have at most n roots. Students should have the knowledge of evaluating determinants of 2 × 2 as well as 3 × 3 matrices.


Example 4.6 (Characteristic equation)

Solve the characteristic equation of

A = \begin{bmatrix} 1 & 3 \\ 4 & 5 \end{bmatrix}.

Solution The characteristic equation of A is det(A − λI) = 0, or

\det \begin{bmatrix} 1-\lambda & 3 \\ 4 & 5-\lambda \end{bmatrix} = (1-\lambda)(5-\lambda) - 4 \times 3 = \lambda^2 - 6\lambda - 7 = 0.

The above quadratic equation in λ can easily be factorized into (λ + 1)(λ − 7) = 0, which gives two distinct real eigenvalues

λ1 = −1, λ2 = 7. □
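For a 2 × 2 matrix, the characteristic polynomial is λ² − (trace)λ + det, so the eigenvalues follow from the quadratic formula. A minimal sketch checking Example 4.6 (plain Python, variable names my own):

```python
import math

A = [[1, 3], [4, 5]]
trace = A[0][0] + A[1][1]                      # 6
det = A[0][0] * A[1][1] - A[0][1] * A[1][0]    # 5 - 12 = -7
# characteristic polynomial: lambda^2 - trace*lambda + det
disc = trace * trace - 4 * det                 # 36 + 28 = 64
lam1 = (trace + math.sqrt(disc)) / 2
lam2 = (trace - math.sqrt(disc)) / 2
assert (lam1, lam2) == (7.0, -1.0)
```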

Example 4.7 (Characteristic equation)

Find the eigenvalues of the matrices

A = \begin{bmatrix} 2 & 0 \\ a & 3 \end{bmatrix}, \qquad B = \begin{bmatrix} 1 & a & d & f \\ 0 & 2 & b & e \\ 0 & 0 & 3 & c \\ 0 & 0 & 0 & 4 \end{bmatrix}.

Solution The characteristic equation of A is det(A − λI) = 0, or

\det(A - \lambda I) = \det \begin{bmatrix} 2-\lambda & 0 \\ a & 3-\lambda \end{bmatrix} = (2-\lambda)(3-\lambda) = 0,

which gives two distinct real eigenvalues

λ1 = 2, λ2 = 3.

The characteristic equation of B is det(B − λI) = 0, or

\det(B - \lambda I) = \det \begin{bmatrix} 1-\lambda & a & d & f \\ 0 & 2-\lambda & b & e \\ 0 & 0 & 3-\lambda & c \\ 0 & 0 & 0 & 4-\lambda \end{bmatrix} = (1-\lambda)(2-\lambda)(3-\lambda)(4-\lambda) = 0,

which gives four distinct real eigenvalues

λ1 = 1, λ2 = 2, λ3 = 3, λ4 = 4. □

Remark. (i) By Example 4.6, the characteristic equation of A is λ² − 6λ − 7 = 0. Then A² − 6A − 7I = O. This expression is obtained by replacing λ by A in the characteristic equation. Try to prove it for any matrix A which is diagonalizable. (ii) From Example 4.7, we observe that the eigenvalues of a triangular matrix are given by its diagonal entries. In fact, this is true for all upper-triangular as well as lower-triangular matrices. For example, if

A = \begin{bmatrix} 5 & -2 & 6 & -1 \\ 0 & 3 & 8 & 0 \\ 0 & 0 & -3 & 4 \\ 0 & 0 & 0 & 1 \end{bmatrix},

then we immediately know that A has four distinct eigenvalues 1, ±3, 5. Since the four eigenvalues of A are distinct, A is diagonalizable.


Example 4.8 (Distinct real eigenvalues of 2 × 2 matrix)

Find eigenvalues and eigenvectors of the matrix

A = \begin{bmatrix} 1 & 2 \\ 4 & 3 \end{bmatrix}.

Write down a diagonalization of A if it is diagonalizable.

Solution The characteristic equation of A is det(A − λI) = 0, or

(1 − λ)(3 − λ) − 8 = (λ + 1)(λ − 5) = 0,

which gives the distinct real eigenvalues λ1 = −1, λ2 = 5.

· For λ1 = −1,

A − (−1)·I = \begin{bmatrix} 2 & 2 \\ 4 & 4 \end{bmatrix} \longrightarrow \begin{bmatrix} 1 & 1 \\ 0 & 0 \end{bmatrix}.

Solving (A − (−1)·I)x = 0, x = (x1, x2), we have x1 = −x2. Thus

x = (−x2, x2) = x2 (−1, 1),

and we get the eigenvector v1 = (−1, 1).

· For λ2 = 5,

A − 5·I = \begin{bmatrix} -4 & 2 \\ 4 & -2 \end{bmatrix} \longrightarrow \begin{bmatrix} 2 & -1 \\ 0 & 0 \end{bmatrix}.

Solving (A − 5·I)x = 0, x = (x1, x2), we have 2x1 = x2. Thus

x = (x1, 2x1) = x1 (1, 2),

and we get the eigenvector v2 = (1, 2).

Since the eigenvalues λ1, λ2 are distinct, the corresponding eigenvectors v1, v2 are linearly independent and form a basis of R², and we get a diagonalization of A:

A = \begin{bmatrix} -1 & 1 \\ 1 & 2 \end{bmatrix} \begin{bmatrix} -1 & 0 \\ 0 & 5 \end{bmatrix} \begin{bmatrix} -1 & 1 \\ 1 & 2 \end{bmatrix}^{-1}. □

Example 4.9 (Distinct real eigenvalues of 2 × 2 matrix)

Find eigenvalues and eigenvectors of the matrix

A = \begin{bmatrix} 3 & 4 \\ 1 & 3 \end{bmatrix}.

Write down a diagonalization of A if it is diagonalizable.

Solution The characteristic equation of A is det(A − λI) = 0, or

(3 − λ)(3 − λ) − 4 = (1 − λ)(5 − λ) = 0,

which gives the distinct real eigenvalues λ1 = 1, λ2 = 5.

· For λ1 = 1,

A − 1·I = \begin{bmatrix} 2 & 4 \\ 1 & 2 \end{bmatrix} \longrightarrow \begin{bmatrix} 1 & 2 \\ 0 & 0 \end{bmatrix}.

Solving (A − 1·I)x = 0, x = (x1, x2), we have x1 = −2x2. Thus

x = (−2x2, x2) = x2 (−2, 1),

and we get the eigenvector v1 = (−2, 1).

· For λ2 = 5,

A − 5·I = \begin{bmatrix} -2 & 4 \\ 1 & -2 \end{bmatrix} \longrightarrow \begin{bmatrix} 1 & -2 \\ 0 & 0 \end{bmatrix}.

Solving (A − 5·I)x = 0, x = (x1, x2), we have x1 = 2x2. Thus

x = (2x2, x2) = x2 (2, 1),

and we get the eigenvector v2 = (2, 1).

Since the eigenvalues λ1, λ2 are distinct, the corresponding eigenvectors v1, v2 are linearly independent and form a basis of R², and we get a diagonalization of A:

A = \begin{bmatrix} -2 & 2 \\ 1 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 \\ 0 & 5 \end{bmatrix} \begin{bmatrix} -2 & 2 \\ 1 & 1 \end{bmatrix}^{-1}. □

Example 4.10 (Repeated eigenvalue of 2 × 2 matrix)

Find eigenvalues and eigenvectors of the matrix

A = \begin{bmatrix} 2 & -1 \\ 1 & 4 \end{bmatrix}.

Write down a diagonalization of A if it is diagonalizable.

Solution The characteristic equation of A is det(A − λI) = 0, or

(2 − λ)(4 − λ) − (−1) = (λ − 3)² = 0,

which gives a repeated eigenvalue λ = 3.

· For λ = 3,

A − 3·I = \begin{bmatrix} -1 & -1 \\ 1 & 1 \end{bmatrix} \longrightarrow \begin{bmatrix} 1 & 1 \\ 0 & 0 \end{bmatrix}.

Solving (A − 3·I)x = 0, x = (x1, x2), we have x1 = −x2. Thus

x = (−x2, x2) = x2 (−1, 1),

and we get the eigenvector v = (−1, 1).

Note that the eigenvector v forms a basis for Nul(A − 3·I), but it does not form a basis for the Euclidean space R² (in fact, a basis for R² must contain exactly two linearly independent vectors in R²). Hence, A is not diagonalizable. □
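The failure in Example 4.10 can be seen numerically: λ = 3 is a root of the characteristic equation, yet A − 3I is a nonzero matrix, so its null space is only one-dimensional. A minimal sketch in plain Python:

```python
A = [[2, -1], [1, 4]]
lam = 3
M = [[A[0][0] - lam, A[0][1]],
     [A[1][0], A[1][1] - lam]]                 # A - 3I = [[-1, -1], [1, 1]]

# det(A - 3I) = 0 confirms that 3 is an eigenvalue
assert M[0][0] * M[1][1] - M[0][1] * M[1][0] == 0
# but A - 3I is not the zero matrix, so its rank is 1 and its null space is
# one-dimensional: only one independent eigenvector, hence no basis of R^2
assert any(entry != 0 for row in M for entry in row)
```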


Example 4.11 (Distinct real eigenvalues of 3 × 3 matrix)

Find eigenvalues and eigenvectors of the matrix

A = \begin{bmatrix} 1 & 3 & 3 \\ 0 & 2 & 4 \\ 0 & 0 & 3 \end{bmatrix}.

Write down a diagonalization of A if it is diagonalizable.

Solution The characteristic equation of A is det(A − λI) = 0, or

\det \begin{bmatrix} 1-\lambda & 3 & 3 \\ 0 & 2-\lambda & 4 \\ 0 & 0 & 3-\lambda \end{bmatrix} = (1-\lambda)(2-\lambda)(3-\lambda) = 0,

which gives the distinct real eigenvalues λ1 = 1, λ2 = 2, λ3 = 3.

· For λ1 = 1,

A − 1·I = \begin{bmatrix} 0 & 3 & 3 \\ 0 & 1 & 4 \\ 0 & 0 & 2 \end{bmatrix} \longrightarrow \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{bmatrix}.

Solving (A − 1·I)x = 0, x = (x1, x2, x3), we have x2 = x3 = 0. Thus x = (x1, 0, 0) = x1 (1, 0, 0), and we get the eigenvector v1 = (1, 0, 0).

· For λ2 = 2,

A − 2·I = \begin{bmatrix} -1 & 3 & 3 \\ 0 & 0 & 4 \\ 0 & 0 & 1 \end{bmatrix} \longrightarrow \begin{bmatrix} 1 & -3 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{bmatrix}.

Solving (A − 2·I)x = 0, x = (x1, x2, x3), we have x1 = 3x2 and x3 = 0. Thus x = (3x2, x2, 0) = x2 (3, 1, 0), and we get the eigenvector v2 = (3, 1, 0).

· For λ3 = 3,

A − 3·I = \begin{bmatrix} -2 & 3 & 3 \\ 0 & -1 & 4 \\ 0 & 0 & 0 \end{bmatrix} \longrightarrow \begin{bmatrix} 2 & 0 & -15 \\ 0 & 1 & -4 \\ 0 & 0 & 0 \end{bmatrix}.

Solving (A − 3·I)x = 0, x = (x1, x2, x3), we have 2x1 = 15x3 and x2 = 4x3. Thus x = (\tfrac{15}{2}x3, 4x3, x3) = \tfrac{1}{2}x3 (15, 8, 2), and we get the eigenvector v3 = (15, 8, 2).

Thus, we get a diagonalization of A:

A = \begin{bmatrix} 1 & 3 & 15 \\ 0 & 1 & 8 \\ 0 & 0 & 2 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 3 \end{bmatrix} \begin{bmatrix} 1 & 3 & 15 \\ 0 & 1 & 8 \\ 0 & 0 & 2 \end{bmatrix}^{-1}. □
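As in the 2 × 2 cases, the diagonalization in Example 4.11 can be confirmed in exact integer arithmetic by checking AP = PD (a sketch; the `matmul` helper is my own):

```python
def matmul(M, N):
    return [[sum(M[i][k] * N[k][j] for k in range(len(N)))
             for j in range(len(N[0]))] for i in range(len(M))]

A = [[1, 3, 3], [0, 2, 4], [0, 0, 3]]
P = [[1, 3, 15], [0, 1, 8], [0, 0, 2]]     # columns (1,0,0), (3,1,0), (15,8,2)
D = [[1, 0, 0], [0, 2, 0], [0, 0, 3]]
assert matmul(A, P) == matmul(P, D)        # verifies all three eigenpairs
```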

Remark. In Example 4.11, A is an upper-triangular matrix, so we can determine the characteristic equation quickly by just taking the product of the main diagonal entries of (A − λI). In general, we need to apply row/column operations to simplify the matrix (A − λI) until the simplified matrix is easy enough to evaluate its determinant.


Example 4.12 (Distinct real eigenvalues of 3 × 3 matrix)

Find eigenvalues and eigenvectors of the matrix

A = \begin{bmatrix} 1 & -2 & 0 \\ -2 & 3 & 0 \\ 0 & 0 & 2 \end{bmatrix}.

Write down a diagonalization of A if it is diagonalizable.

Solution The characteristic equation of A is det(A − λI) = 0, or

\det \begin{bmatrix} 1-\lambda & -2 & 0 \\ -2 & 3-\lambda & 0 \\ 0 & 0 & 2-\lambda \end{bmatrix} = (2-\lambda)\left[(\lambda-2)^2 - 5\right] = 0,

which gives the distinct real eigenvalues λ1 = 2, λ2 = 2 + √5, λ3 = 2 − √5.

· For λ1 = 2,

A − 2·I = \begin{bmatrix} -1 & -2 & 0 \\ -2 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix} \longrightarrow \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix}.

Solving (A − 2·I)x = 0, x = (x1, x2, x3), we have x1 = x2 = 0. Thus x = (0, 0, x3) = x3 (0, 0, 1), and we get the eigenvector v1 = (0, 0, 1).

· For λ2 = 2 + √5,

A − (2+\sqrt{5})·I = \begin{bmatrix} -1-\sqrt{5} & -2 & 0 \\ -2 & 1-\sqrt{5} & 0 \\ 0 & 0 & -\sqrt{5} \end{bmatrix} \longrightarrow \begin{bmatrix} 1 & \tfrac{1}{2}(-1+\sqrt{5}) & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{bmatrix}.

Solving (A − (2+√5)·I)x = 0, x = (x1, x2, x3), we have x1 = ½(1 − √5)x2 and x3 = 0. Thus

x = (½(1 − √5)x2, x2, 0) = ½x2 (1 − √5, 2, 0),

and we get the eigenvector v2 = (1 − √5, 2, 0).

· For λ3 = 2 − √5,

A − (2-\sqrt{5})·I = \begin{bmatrix} -1+\sqrt{5} & -2 & 0 \\ -2 & 1+\sqrt{5} & 0 \\ 0 & 0 & \sqrt{5} \end{bmatrix} \longrightarrow \begin{bmatrix} 1 & \tfrac{1}{2}(-1-\sqrt{5}) & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{bmatrix}.

Solving (A − (2−√5)·I)x = 0, x = (x1, x2, x3), we have x1 = ½(1 + √5)x2 and x3 = 0. Thus

x = (½(1 + √5)x2, x2, 0) = ½x2 (1 + √5, 2, 0),

and we get the eigenvector v3 = (1 + √5, 2, 0).

Since the eigenvalues λ1, λ2, λ3 are distinct, the corresponding eigenvectors v1, v2, v3 are linearly independent and form a basis of R³, and we get a diagonalization of A:

A = \begin{bmatrix} 0 & 1-\sqrt{5} & 1+\sqrt{5} \\ 0 & 2 & 2 \\ 1 & 0 & 0 \end{bmatrix} \begin{bmatrix} 2 & 0 & 0 \\ 0 & 2+\sqrt{5} & 0 \\ 0 & 0 & 2-\sqrt{5} \end{bmatrix} \begin{bmatrix} 0 & 1-\sqrt{5} & 1+\sqrt{5} \\ 0 & 2 & 2 \\ 1 & 0 & 0 \end{bmatrix}^{-1}. □
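With irrational eigenvalues, a numerical check needs floating-point tolerance rather than exact equality. A minimal sketch verifying all three eigenpairs of Example 4.12 (plain Python; helper and tolerance are my own choices):

```python
import math

def matvec(M, v):
    return [sum(M[i][j] * v[j] for j in range(3)) for i in range(3)]

A = [[1, -2, 0], [-2, 3, 0], [0, 0, 2]]
s = math.sqrt(5.0)
for lam, v in [(2.0, [0.0, 0.0, 1.0]),
               (2.0 + s, [1.0 - s, 2.0, 0.0]),
               (2.0 - s, [1.0 + s, 2.0, 0.0])]:
    Av = matvec(A, v)
    # A v should equal lam * v up to floating-point rounding
    assert all(abs(Av[i] - lam * v[i]) < 1e-12 for i in range(3))
```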


Example 4.13 (Distinct real eigenvalues of 3 × 3 matrix)

Find eigenvalues and eigenvectors of the matrix

A = \begin{bmatrix} 4 & -2 & -1 \\ 2 & 0 & -2 \\ 1 & -1 & 2 \end{bmatrix}.

Write down a diagonalization of A if it is diagonalizable.

Solution The characteristic equation of A is det(A − λI) = 0, or

\det \begin{bmatrix} 4-\lambda & -2 & -1 \\ 2 & -\lambda & -2 \\ 1 & -1 & 2-\lambda \end{bmatrix}
= \det \begin{bmatrix} 4-\lambda & 2-\lambda & 3-\lambda \\ 2 & 2-\lambda & 0 \\ 1 & 0 & 3-\lambda \end{bmatrix} \quad (C2 → C2 + C1, C3 → C3 + C1)
= \det \begin{bmatrix} 1-\lambda & 0 & 0 \\ 2 & 2-\lambda & 0 \\ 1 & 0 & 3-\lambda \end{bmatrix} \quad (R1 → R1 − R2 − R3)
= (1-\lambda)(2-\lambda)(3-\lambda) = 0,

which gives the distinct real eigenvalues λ1 = 1, λ2 = 2, λ3 = 3.

· For λ1 = 1,

A − 1·I = \begin{bmatrix} 3 & -2 & -1 \\ 2 & -1 & -2 \\ 1 & -1 & 1 \end{bmatrix} \longrightarrow \begin{bmatrix} 1 & 0 & -3 \\ 0 & 1 & -4 \\ 0 & 0 & 0 \end{bmatrix}.

Solving (A − 1·I)x = 0, x = (x1, x2, x3), we have x1 = 3x3 and x2 = 4x3. Thus

x = (3x3, 4x3, x3) = x3 (3, 4, 1),

and we get the eigenvector v1 = (3, 4, 1).

· For λ2 = 2,

A − 2·I = \begin{bmatrix} 2 & -2 & -1 \\ 2 & -2 & -2 \\ 1 & -1 & 0 \end{bmatrix} \longrightarrow \begin{bmatrix} 1 & -1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{bmatrix}.

Solving (A − 2·I)x = 0, x = (x1, x2, x3), we have x1 = x2 and x3 = 0. Thus

x = (x2, x2, 0) = x2 (1, 1, 0),

and we get the eigenvector v2 = (1, 1, 0).

· For λ3 = 3,

A − 3·I = \begin{bmatrix} 1 & -2 & -1 \\ 2 & -3 & -2 \\ 1 & -1 & -1 \end{bmatrix} \longrightarrow \begin{bmatrix} 1 & 0 & -1 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix}.

Solving (A − 3·I)x = 0, x = (x1, x2, x3), we have x1 = x3 and x2 = 0. Thus

x = (x3, 0, x3) = x3 (1, 0, 1),

and we get the eigenvector v3 = (1, 0, 1).

Since the eigenvalues λ1, λ2, λ3 are distinct, the corresponding eigenvectors v1, v2, v3 are linearly independent and form a basis of R³, and we get a diagonalization of A:

A = \begin{bmatrix} 3 & 1 & 1 \\ 4 & 1 & 0 \\ 1 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 3 \end{bmatrix} \begin{bmatrix} 3 & 1 & 1 \\ 4 & 1 & 0 \\ 1 & 0 & 1 \end{bmatrix}^{-1}. □

Example 4.14 (Distinct real eigenvalues of 3 × 3 matrix)

Find eigenvalues and eigenvectors of the matrix

A = \begin{bmatrix} 2 & -1 & -1 \\ 1 & 4 & 1 \\ -3 & 1 & 0 \end{bmatrix}.

Write down a diagonalization of A if it is diagonalizable.

Solution The characteristic equation of A is det(A − λI) = 0, or

\det \begin{bmatrix} 2-\lambda & -1 & -1 \\ 1 & 4-\lambda & 1 \\ -3 & 1 & -\lambda \end{bmatrix}
= \det \begin{bmatrix} 3-\lambda & 0 & -1 \\ 0 & 3-\lambda & 1 \\ -3+\lambda & 1+\lambda & -\lambda \end{bmatrix} \quad (C1 → C1 − C3, C2 → C2 − C3)
= \det \begin{bmatrix} 3-\lambda & 0 & -1 \\ 0 & 3-\lambda & 1 \\ 0 & 1+\lambda & -1-\lambda \end{bmatrix} \quad (R3 → R3 + R1)
= \det \begin{bmatrix} 3-\lambda & -1 & -1 \\ 0 & 4-\lambda & 1 \\ 0 & 0 & -1-\lambda \end{bmatrix} \quad (C2 → C2 + C3)
= (3-\lambda)(4-\lambda)(-1-\lambda) = 0,

which gives the distinct real eigenvalues λ1 = 3, λ2 = 4, λ3 = −1.

· For λ1 = 3,

A − 3·I = \begin{bmatrix} -1 & -1 & -1 \\ 1 & 1 & 1 \\ -3 & 1 & -3 \end{bmatrix} \longrightarrow \begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix}.

Solving (A − 3·I)x = 0, x = (x1, x2, x3), we have x1 = −x3 and x2 = 0. Thus

x = (−x3, 0, x3) = −x3 (1, 0, −1),

and we get the eigenvector v1 = (1, 0, −1).

· For λ2 = 4,

A − 4·I = \begin{bmatrix} -2 & -1 & -1 \\ 1 & 0 & 1 \\ -3 & 1 & -4 \end{bmatrix} \longrightarrow \begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & -1 \\ 0 & 0 & 0 \end{bmatrix}.

Solving (A − 4·I)x = 0, x = (x1, x2, x3), we have x1 = −x3 and x2 = x3. Thus

x = (−x3, x3, x3) = −x3 (1, −1, −1),

and we get the eigenvector v2 = (1, −1, −1).

· For λ3 = −1,

A − (−1)·I = \begin{bmatrix} 3 & -1 & -1 \\ 1 & 5 & 1 \\ -3 & 1 & 1 \end{bmatrix} \longrightarrow \begin{bmatrix} 1 & 0 & -1/4 \\ 0 & 1 & 1/4 \\ 0 & 0 & 0 \end{bmatrix}.

Solving (A − (−1)·I)x = 0, x = (x1, x2, x3), we have x1 = x3/4 and x2 = −x3/4. Thus

x = (¼x3, −¼x3, x3) = ¼x3 (1, −1, 4),

and we get the eigenvector v3 = (1, −1, 4).

Since the eigenvalues λ1, λ2, λ3 are distinct, the corresponding eigenvectors v1, v2, v3 are linearly independent and form a basis of R³, and we get a diagonalization of A:

A = \begin{bmatrix} 1 & 1 & 1 \\ 0 & -1 & -1 \\ -1 & -1 & 4 \end{bmatrix} \begin{bmatrix} 3 & 0 & 0 \\ 0 & 4 & 0 \\ 0 & 0 & -1 \end{bmatrix} \begin{bmatrix} 1 & 1 & 1 \\ 0 & -1 & -1 \\ -1 & -1 & 4 \end{bmatrix}^{-1}. □

Example 4.15 (Distinct real eigenvalues of 3 × 3 matrix)

Find eigenvalues of the matrix

A = \begin{bmatrix} 1 & 1 & -3 \\ -1 & 3 & -6 \\ -1 & 1 & 6 \end{bmatrix}.

Solution The characteristic equation of A is det(A − λI) = 0, or

\det \begin{bmatrix} 1-\lambda & 1 & -3 \\ -1 & 3-\lambda & -6 \\ -1 & 1 & 6-\lambda \end{bmatrix}
= \det \begin{bmatrix} 2-\lambda & 1 & 0 \\ 2-\lambda & 3-\lambda & 3-3\lambda \\ 0 & 1 & 9-\lambda \end{bmatrix} \quad (C1 → C1 + C2, C3 → C3 + 3C2)
= \det \begin{bmatrix} 2-\lambda & 1 & 0 \\ 0 & 2-\lambda & 3-3\lambda \\ 0 & 1 & 9-\lambda \end{bmatrix} \quad (R2 → R2 − R1)
= (2-\lambda)\left[(2-\lambda)(9-\lambda) - (3-3\lambda)\right]
= (2-\lambda)(3-\lambda)(5-\lambda) = 0,

which gives the distinct real eigenvalues

λ1 = 2, λ2 = 3, λ3 = 5. □


Example 4.16 (Repeated eigenvalue of 3 × 3 matrix)

Find eigenvalues and eigenvectors of the matrix

A = \begin{bmatrix} 0 & 3 & 1 \\ -4 & 8 & 2 \\ -4 & 6 & 4 \end{bmatrix}.

Write down a diagonalization of A if it is diagonalizable.

Solution The characteristic equation of A is det(A − λI) = 0, or

\det \begin{bmatrix} -\lambda & 3 & 1 \\ -4 & 8-\lambda & 2 \\ -4 & 6 & 4-\lambda \end{bmatrix}
= \det \begin{bmatrix} -\lambda & 3 & 1 \\ -4 & 8-\lambda & 2 \\ 0 & -2+\lambda & 2-\lambda \end{bmatrix} \quad (R3 → R3 − R2)
= \det \begin{bmatrix} -\lambda & 4 & 1 \\ -4 & 10-\lambda & 2 \\ 0 & 0 & 2-\lambda \end{bmatrix} \quad (C2 → C2 + C3)
= (2-\lambda)\left[(-\lambda)(10-\lambda) - (-16)\right]
= (2-\lambda)^2(8-\lambda) = 0,

which gives the eigenvalues λ1 = 2, λ2 = 8 (λ1 being a repeated eigenvalue).

· For λ1 = 2,

A − 2·I = \begin{bmatrix} -2 & 3 & 1 \\ -4 & 6 & 2 \\ -4 & 6 & 2 \end{bmatrix} \longrightarrow \begin{bmatrix} -2 & 3 & 1 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}.

Solving (A − 2·I)x = 0, x = (x1, x2, x3), we have −2x1 + 3x2 + x3 = 0. Thus

x = (x1, x2, 2x1 − 3x2) = x1 (1, 0, 2) + x2 (0, 1, −3),

and we get the two eigenvectors v1 = (1, 0, 2), v2 = (0, 1, −3).

· For λ2 = 8,

A − 8·I = \begin{bmatrix} -8 & 3 & 1 \\ -4 & 0 & 2 \\ -4 & 6 & -4 \end{bmatrix} \longrightarrow \begin{bmatrix} 1 & 0 & -1/2 \\ 0 & 1 & -1 \\ 0 & 0 & 0 \end{bmatrix}.

Solving (A − 8·I)x = 0, x = (x1, x2, x3), we have x1 = x3/2 and x2 = x3. Thus

x = (½x3, x3, x3) = ½x3 (1, 2, 2),

and we get the eigenvector v3 = (1, 2, 2).

Since v1, v2, v3 form a basis of the Euclidean space R³, we get a diagonalization of A:

A = \begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & 2 \\ 2 & -3 & 2 \end{bmatrix} \begin{bmatrix} 2 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 8 \end{bmatrix} \begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & 2 \\ 2 & -3 & 2 \end{bmatrix}^{-1}. □
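Even with a repeated eigenvalue, Example 4.16 still produces an invertible P. A minimal sketch checking AP = PD in integers and confirming det P ≠ 0 by cofactor expansion (helper names are my own):

```python
def matmul(M, N):
    return [[sum(M[i][k] * N[k][j] for k in range(len(N)))
             for j in range(len(N[0]))] for i in range(len(M))]

A = [[0, 3, 1], [-4, 8, 2], [-4, 6, 4]]
P = [[1, 0, 1], [0, 1, 2], [2, -3, 2]]     # columns v1, v2, v3
D = [[2, 0, 0], [0, 2, 0], [0, 0, 8]]
assert matmul(A, P) == matmul(P, D)

# P is invertible: its determinant is nonzero (cofactor expansion along row 1)
det = (P[0][0] * (P[1][1] * P[2][2] - P[1][2] * P[2][1])
       - P[0][1] * (P[1][0] * P[2][2] - P[1][2] * P[2][0])
       + P[0][2] * (P[1][0] * P[2][1] - P[1][1] * P[2][0]))
assert det != 0
```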


Example 4.17 (Repeated eigenvalue of 3 × 3 matrix)

Find eigenvalues and eigenvectors of the matrix

A = \begin{bmatrix} 1 & 3 & -3 \\ -3 & 7 & -3 \\ -6 & 6 & -2 \end{bmatrix}.

Write down a diagonalization of A if it is diagonalizable.

Solution The characteristic equation of A is det(A − λI) = 0, or

\det \begin{bmatrix} 1-\lambda & 3 & -3 \\ -3 & 7-\lambda & -3 \\ -6 & 6 & -2-\lambda \end{bmatrix}
= \det \begin{bmatrix} 1-\lambda & 3 & -3 \\ -4+\lambda & 4-\lambda & 0 \\ -6 & 6 & -2-\lambda \end{bmatrix} \quad (R2 → R2 − R1)
= \det \begin{bmatrix} 1-\lambda & 4-\lambda & -3 \\ -4+\lambda & 0 & 0 \\ -6 & 0 & -2-\lambda \end{bmatrix} \quad (C2 → C2 + C1)
= (-2-\lambda)(4-\lambda)^2 = 0,

which gives the eigenvalues λ1 = −2, λ2 = 4 (λ2 being a repeated eigenvalue).

· For λ1 = −2,

A − (−2)·I = \begin{bmatrix} 3 & 3 & -3 \\ -3 & 9 & -3 \\ -6 & 6 & 0 \end{bmatrix} \longrightarrow \begin{bmatrix} 1 & 0 & -1/2 \\ 0 & 1 & -1/2 \\ 0 & 0 & 0 \end{bmatrix}.

Solving (A − (−2)·I)x = 0, x = (x1, x2, x3), we have x1 = x3/2 and x2 = x3/2. Thus

x = (½x3, ½x3, x3) = ½x3 (1, 1, 2),

and we get the eigenvector v1 = (1, 1, 2).

· For λ2 = 4,

A − 4·I = \begin{bmatrix} -3 & 3 & -3 \\ -3 & 3 & -3 \\ -6 & 6 & -6 \end{bmatrix} \longrightarrow \begin{bmatrix} 1 & -1 & 1 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}.

Solving (A − 4·I)x = 0, x = (x1, x2, x3), we have x1 = x2 − x3. Thus

x = (x2 − x3, x2, x3) = x2 (1, 1, 0) + x3 (−1, 0, 1),

and we get the two eigenvectors v2 = (1, 1, 0), v3 = (−1, 0, 1).

Since v1, v2, v3 form a basis of the Euclidean space R³, we get a diagonalization of A:

A = \begin{bmatrix} 1 & 1 & -1 \\ 1 & 1 & 0 \\ 2 & 0 & 1 \end{bmatrix} \begin{bmatrix} -2 & 0 & 0 \\ 0 & 4 & 0 \\ 0 & 0 & 4 \end{bmatrix} \begin{bmatrix} 1 & 1 & -1 \\ 1 & 1 & 0 \\ 2 & 0 & 1 \end{bmatrix}^{-1}. □


Example 4.18 (Repeated eigenvalue of 3 × 3 matrix)

Find eigenvalues and eigenvectors of the matrix

A = \begin{bmatrix} 0 & -2 & 2 \\ -3 & -1 & 3 \\ -2 & -2 & 4 \end{bmatrix}.

Write down a diagonalization of A if it is diagonalizable.

Solution The characteristic equation of A is det(A − λI) = 0, or

\det \begin{bmatrix} -\lambda & -2 & 2 \\ -3 & -1-\lambda & 3 \\ -2 & -2 & 4-\lambda \end{bmatrix}
= \det \begin{bmatrix} 2-\lambda & -2 & 0 \\ -2+\lambda & -1-\lambda & 2-\lambda \\ 0 & -2 & 2-\lambda \end{bmatrix} \quad (C1 → C1 − C2, C3 → C3 + C2)
= \det \begin{bmatrix} 2-\lambda & -2 & 0 \\ 0 & -3-\lambda & 2-\lambda \\ 0 & -2 & 2-\lambda \end{bmatrix} \quad (R2 → R2 + R1)
= (2-\lambda)\left[(-3-\lambda)(2-\lambda) - (2-\lambda)(-2)\right]
= -(2-\lambda)^2(1+\lambda) = 0,

which gives the eigenvalues λ1 = 2, λ2 = −1 (λ1 being a repeated eigenvalue).

· For λ1 = 2,

A − 2·I = \begin{bmatrix} -2 & -2 & 2 \\ -3 & -3 & 3 \\ -2 & -2 & 2 \end{bmatrix} \longrightarrow \begin{bmatrix} 1 & 1 & -1 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}.

Solving (A − 2·I)x = 0, x = (x1, x2, x3), we have x1 + x2 − x3 = 0. Thus

x = (x1, x2, x1 + x2) = x1 (1, 0, 1) + x2 (0, 1, 1),

and we get the two eigenvectors v1 = (1, 0, 1), v2 = (0, 1, 1).

· For λ2 = −1,

A − (−1)·I = \begin{bmatrix} 1 & -2 & 2 \\ -3 & 0 & 3 \\ -2 & -2 & 5 \end{bmatrix} \longrightarrow \begin{bmatrix} 1 & 0 & -1 \\ 0 & 1 & -3/2 \\ 0 & 0 & 0 \end{bmatrix}.

Solving (A − (−1)·I)x = 0, x = (x1, x2, x3), we have x1 = x3 and x2 = 3x3/2. Thus

x = (x3, \tfrac{3}{2}x3, x3) = ½x3 (2, 3, 2),

and we get the eigenvector v3 = (2, 3, 2).

Since v1, v2, v3 form a basis of the Euclidean space R³, we get a diagonalization of A:

A = \begin{bmatrix} 1 & 0 & 2 \\ 0 & 1 & 3 \\ 1 & 1 & 2 \end{bmatrix} \begin{bmatrix} 2 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & -1 \end{bmatrix} \begin{bmatrix} 1 & 0 & 2 \\ 0 & 1 & 3 \\ 1 & 1 & 2 \end{bmatrix}^{-1}. □


Example 4.19 (Repeated eigenvalue of 3 × 3 matrix)

Find eigenvalues and eigenvectors of the matrix

A = \begin{bmatrix} 1 & 3 & 2 \\ 0 & 3 & -1 \\ 0 & 1 & 1 \end{bmatrix}.

Write down a diagonalization of A if it is diagonalizable.

Solution The characteristic equation of A is det(A − λI) = 0, or

\det \begin{bmatrix} 1-\lambda & 3 & 2 \\ 0 & 3-\lambda & -1 \\ 0 & 1 & 1-\lambda \end{bmatrix} = (1-\lambda)\left[(3-\lambda)(1-\lambda) - (-1)\right] = (1-\lambda)(2-\lambda)^2 = 0,

which gives the eigenvalues λ1 = 1, λ2 = 2 (λ2 being a repeated eigenvalue).

· For λ1 = 1,

A − 1·I = \begin{bmatrix} 0 & 3 & 2 \\ 0 & 2 & -1 \\ 0 & 1 & 0 \end{bmatrix} \longrightarrow \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{bmatrix}.

Solving (A − 1·I)x = 0, x = (x1, x2, x3), we have x2 = x3 = 0. Thus

x = (x1, 0, 0) = x1 (1, 0, 0),

and we get the eigenvector v1 = (1, 0, 0).

· For λ2 = 2,

A − 2·I = \begin{bmatrix} -1 & 3 & 2 \\ 0 & 1 & -1 \\ 0 & 1 & -1 \end{bmatrix} \longrightarrow \begin{bmatrix} 1 & 0 & -5 \\ 0 & 1 & -1 \\ 0 & 0 & 0 \end{bmatrix}.

Solving (A − 2·I)x = 0, x = (x1, x2, x3), we have x1 = 5x3 and x2 = x3. Thus

x = (5x3, x3, x3) = x3 (5, 1, 1),

and we get the eigenvector v2 = (5, 1, 1).

Note that the eigenvector v1 forms a basis for Nul(A − 1·I), and v2 forms a basis for Nul(A − 2·I), but together they do not form a basis for the Euclidean space R³ (in fact, a basis for R³ must contain exactly three linearly independent vectors in R³). Hence, A is not diagonalizable. □

Remark. In Examples 4.16–4.19, we observe that if a 3 × 3 matrix A has a repeated eigenvalue, then A may or may not be diagonalizable. In Examples 4.16–4.18, the matrices are diagonalizable since there are two linearly independent eigenvectors associated with the repeated eigenvalue. Although there are only two distinct eigenvalues in total, there are enough linearly independent eigenvectors (say v1, v2, v3) to form an invertible matrix P = [v1 v2 v3] that makes P⁻¹AP diagonal. In Example 4.19, the matrix is not diagonalizable since the repeated eigenvalue introduces only one linearly independent eigenvector. Thus, in total, there are only two linearly independent eigenvectors, which do not form a basis of R³; this is also insufficient to form the matrix P. In this case, A is not diagonalizable.
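The eigenvector count behind this remark can be automated: for each eigenvalue λ, the number of independent eigenvectors is the nullity 3 − rank(A − λI). A sketch in plain Python (the `rank` helper, implementing Gaussian elimination, is my own, not from the text):

```python
def rank(M, eps=1e-12):
    # rank by Gaussian elimination on a copy of M
    M = [row[:] for row in M]
    r = 0
    for c in range(len(M[0])):
        piv = next((i for i in range(r, len(M)) if abs(M[i][c]) > eps), None)
        if piv is None:
            continue
        M[r], M[piv] = M[piv], M[r]
        for i in range(len(M)):
            if i != r:
                f = M[i][c] / M[r][c]
                M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        r += 1
    return r

def shifted(M, lam):
    return [[M[i][j] - (lam if i == j else 0) for j in range(3)] for i in range(3)]

A = [[1, 3, 2], [0, 3, -1], [0, 1, 1]]      # Example 4.19
assert 3 - rank(shifted(A, 1)) == 1         # one eigenvector for lambda = 1
assert 3 - rank(shifted(A, 2)) == 1         # only one, despite multiplicity 2

B = [[0, 3, 1], [-4, 8, 2], [-4, 6, 4]]     # Example 4.16, for contrast
assert 3 - rank(shifted(B, 2)) == 2         # two eigenvectors for lambda = 2
```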


Example 4.20 (Repeated eigenvalue of 3 × 3 matrix)

Find eigenvalues and eigenvectors of the matrix

A = \begin{bmatrix} 1 & -1 & 3 \\ 0 & -1 & 2 \\ 0 & -2 & 3 \end{bmatrix}.

Write down a diagonalization of A if it is diagonalizable.

Solution The characteristic equation of A is det(A − λI) = 0, or

\det \begin{bmatrix} 1-\lambda & -1 & 3 \\ 0 & -1-\lambda & 2 \\ 0 & -2 & 3-\lambda \end{bmatrix} = (1-\lambda)\left[(-1-\lambda)(3-\lambda) - (-4)\right] = (1-\lambda)^3 = 0,

which gives the unique eigenvalue λ = 1 (λ being a repeated eigenvalue of multiplicity 3).

· For λ = 1,

A − 1·I = \begin{bmatrix} 0 & -1 & 3 \\ 0 & -2 & 2 \\ 0 & -2 & 2 \end{bmatrix} \longrightarrow \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{bmatrix}.

Solving (A − 1·I)x = 0, x = (x1, x2, x3), we have x2 = x3 = 0. Thus

x = (x1, 0, 0) = x1 (1, 0, 0), (4.3)

and we get only one eigenvector v = (1, 0, 0).

Note that the eigenvector v forms a basis for Nul(A − 1·I), but it does not form a basis for the Euclidean space R³ (in fact, a basis for R³ must contain exactly three linearly independent vectors in R³). Hence, A is not diagonalizable. □

Remark. Unlike Examples 4.16–4.19, the 3 × 3 matrix A in Example 4.20 has all three eigenvalues repeated. In this case, A must be non-diagonalizable. Recall that a 3 × 3 matrix A is diagonalizable only if the eigenvectors of A can form a basis of R³; that is, there must exist three linearly independent eigenvectors of A. By (4.3), we observe that this is possible only if the general solution of the equation (A − λI)x = 0 (λ being the repeated eigenvalue) has 3 free variables (i.e., x1, x2, x3), so that

x = x1v1 + x2v2 + x3v3.

These three vectors v1, v2, v3 would act as the three linearly independent eigenvectors of A. However, this is impossible when A − λI ≠ O: having 3 free variables means having 3 nonpivot columns of A − λI, which is possible only if it is the zero matrix. In summary, any 3 × 3 non-diagonal matrix having all three eigenvalues repeated will certainly be non-diagonalizable.

Let us recall that the situation is similar for 2 × 2 matrices. In Example 4.10, the 2 × 2 matrix A having a repeated eigenvalue is non-diagonalizable. The reason behind is the same: it is impossible for (A − λI)x = 0 (λ being the repeated eigenvalue, and A − λI ≠ O) to have a general solution with 2 free variables. Equivalently, A does not have enough linearly independent eigenvectors to form a basis of R². Thus there is no way to construct the invertible matrix P, and hence A must be non-diagonalizable. We conclude that

if an n × n (non-diagonal) matrix has all n eigenvalues repeated, then the matrix is not diagonalizable.


Example 4.21 (Diagonalizability)

Show that if A is diagonalizable, then Aᵗ is also diagonalizable.

Solution If A is diagonalizable, then there exist an invertible matrix P and a diagonal matrix D such that A = PDP⁻¹. Then

Aᵗ = (PDP⁻¹)ᵗ = (P⁻¹)ᵗ Dᵗ Pᵗ = (Pᵗ)⁻¹ Dᵗ Pᵗ = QDQ⁻¹,

where the new matrix Q = (Pᵗ)⁻¹ is invertible (det Q = det (Pᵗ)⁻¹ = 1/det Pᵗ = 1/det P ≠ 0) and Dᵗ = D is again diagonal. Therefore, Aᵗ has a diagonalization and hence Aᵗ is diagonalizable. □

Example 4.22 (Diagonalizability)

For what value(s) of a is the following matrix diagonalizable?

A = \begin{bmatrix} 1 & -a \\ a & 3 \end{bmatrix}.

Solution The characteristic equation of A is det(A − λI) = λ² − 4λ + (a² + 3) = 0. Then

A is diagonalizable ⇐⇒ A has two distinct eigenvalues
⇐⇒ (−4)² − 4(a² + 3) ≠ 0 ⇐⇒ a² ≠ 1.

That is,

a ≠ ±1.

(When a = ±1, the only eigenvalue is the repeated root λ = 2 and A − 2I ≠ O, so A is not diagonalizable; when a² > 1, the two distinct eigenvalues are complex, and A is diagonalizable over the complex numbers.) □
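The discriminant of the characteristic polynomial decides the three regimes above. A minimal sketch (plain Python; `disc` is my own name):

```python
# characteristic polynomial of A = [[1, -a], [a, 3]]: l^2 - 4l + (a^2 + 3)
def disc(a):
    return (-4) ** 2 - 4 * (a * a + 3)      # = 4 - 4 a^2

assert disc(1) == 0 and disc(-1) == 0       # repeated root lambda = 2
assert disc(0) > 0                          # two distinct real eigenvalues
assert disc(2) < 0                          # two distinct complex eigenvalues
```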

Example 4.23 (Non-diagonalizability)

Let A be a nonzero nilpotent matrix, i.e.,

Aᵐ = O but Aᵐ⁻¹ ≠ O for some m.

Show that A is not diagonalizable.

Solution Suppose, on the contrary, that A is diagonalizable. Then A = PDP⁻¹, so

Aᵐ = (PDP⁻¹)(PDP⁻¹) · · · (PDP⁻¹) = PDᵐP⁻¹.

Since Aᵐ = O, we get Dᵐ = O. But D is diagonal, so Dᵐ = O implies D = O. This shows that A = PDP⁻¹ = O, contradicting the assumption that A is nonzero. This contradiction implies that A is indeed not diagonalizable. □
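A concrete instance of Example 4.23, sketched in plain Python: the standard 2 × 2 nilpotent matrix squares to zero, yet is nonzero, so by the argument above it cannot be diagonalized.

```python
def matmul(M, N):
    return [[sum(M[i][k] * N[k][j] for k in range(len(N)))
             for j in range(len(N[0]))] for i in range(len(M))]

A = [[0, 1], [0, 0]]                        # nonzero with A^2 = O
assert matmul(A, A) == [[0, 0], [0, 0]]
assert A != [[0, 0], [0, 0]]
# its only eigenvalue is 0, and A - 0*I = A has rank 1, so the null space is
# one-dimensional: not enough eigenvectors for a basis of R^2
```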

Example 4.24 (Non-diagonalizability)

Consider an n × n upper-triangular matrix A of the form

A = \begin{bmatrix} a & * & \cdots & * \\ 0 & a & \cdots & * \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & a \end{bmatrix},

where A − aI ≠ O. Show that A is not diagonalizable.

Solution The characteristic equation of A is det(A − λI) = (a − λ)ⁿ = 0. So a is the only eigenvalue of A. Since A − aI ≠ O, rank(A − aI) > 0 and

dim Nul(A − aI) = n − rank(A − aI) < n.

Thus any basis of Nul(A − aI) has fewer than n vectors. It can never be a basis for the whole Euclidean space Rⁿ. In other words, the eigenvectors of A do not form a basis of Rⁿ. Hence, A is not diagonalizable. □


As we mentioned at the beginning of this chapter, there are some nice applications of matrix diagonalization. We shall illustrate two simple applications in the following:

Application 1 Find the formula for Aᵐ, where m can be any (large) integer.

Application 2 Solve recurrence relations in which the initial values x0, y0 are given:

xn+1 = a xn + b yn,
yn+1 = c xn + d yn.     (4.4)

Example 4.25 (Application 1 of diagonalization)

The first application is to find the formula for Aᵐ, where m is a positive integer:

A is diagonalizable =⇒ A = PDP⁻¹ =⇒ Aᵐ = (PDP⁻¹)(PDP⁻¹) · · · (PDP⁻¹) = PDᵐP⁻¹.

Here the crucial point is that the evaluation of Dᵐ is much easier than that of Aᵐ; knowing Dᵐ means Aᵐ can be determined accordingly. For example, suppose A is a 2 × 2 matrix. Then

A^m = P \begin{bmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{bmatrix}^m P^{-1} = P \begin{bmatrix} \lambda_1^m & 0 \\ 0 & \lambda_2^m \end{bmatrix} P^{-1}. □

Example 4.26 (Application 2 of diagonalization)

The second application is to solve recurrence relations of the form (4.4). The recurrence relations can be written in matrix form

\mathbf{x}_{n+1} = A\mathbf{x}_n, \quad \text{where } \mathbf{x}_n = \begin{bmatrix} x_n \\ y_n \end{bmatrix}, \quad A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}.

Then xn = Axn−1 = A²xn−2 = · · · = Aⁿx0, where x0 = (x0, y0). It follows that

\mathbf{x}_n = A^n \mathbf{x}_0 = PD^nP^{-1}\mathbf{x}_0
= \begin{bmatrix} v_1 & v_2 \end{bmatrix} \begin{bmatrix} \lambda_1^n & 0 \\ 0 & \lambda_2^n \end{bmatrix} \begin{bmatrix} \alpha \\ \beta \end{bmatrix}, \quad \text{where } \begin{bmatrix} \alpha \\ \beta \end{bmatrix} = P^{-1} \begin{bmatrix} x_0 \\ y_0 \end{bmatrix},
= \begin{bmatrix} \lambda_1^n v_1 & \lambda_2^n v_2 \end{bmatrix} \begin{bmatrix} \alpha \\ \beta \end{bmatrix},

so that

\begin{bmatrix} x_n \\ y_n \end{bmatrix} = \alpha \lambda_1^n v_1 + \beta \lambda_2^n v_2. □

Example 4.27 (Formula for Aⁿ)

Find the explicit formula for Aⁿ (n a positive integer), where

A = \begin{bmatrix} 2 & -3 \\ 4 & -5 \end{bmatrix}.

Solution It is known that the matrix A has distinct real eigenvalues λ1 = −1, λ2 = −2, with corresponding eigenvectors v1 = (1, 1), v2 = (3, 4). Thus if we construct

P = [v1 v2] = \begin{bmatrix} 1 & 3 \\ 1 & 4 \end{bmatrix}, \quad \text{then } P^{-1}AP = \begin{bmatrix} -1 & 0 \\ 0 & -2 \end{bmatrix}.

Hence,

A^n = \begin{bmatrix} 1 & 3 \\ 1 & 4 \end{bmatrix} \begin{bmatrix} (-1)^n & 0 \\ 0 & (-2)^n \end{bmatrix} \begin{bmatrix} 1 & 3 \\ 1 & 4 \end{bmatrix}^{-1}
= (-1)^n \begin{bmatrix} 1 & 3 \\ 1 & 4 \end{bmatrix} \begin{bmatrix} 1 & 0 \\ 0 & 2^n \end{bmatrix} \begin{bmatrix} 4 & -3 \\ -1 & 1 \end{bmatrix}
= (-1)^n \begin{bmatrix} 1 & 3 \cdot 2^n \\ 1 & 4 \cdot 2^n \end{bmatrix} \begin{bmatrix} 4 & -3 \\ -1 & 1 \end{bmatrix}
= (-1)^n \begin{bmatrix} 4 - 3 \cdot 2^n & -3 + 3 \cdot 2^n \\ 4 - 4 \cdot 2^n & -3 + 4 \cdot 2^n \end{bmatrix}. □
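The closed form derived in Example 4.27 can be cross-checked against repeated matrix multiplication in exact integer arithmetic (a sketch; helper names are my own):

```python
def matmul(M, N):
    return [[sum(M[i][k] * N[k][j] for k in range(len(N)))
             for j in range(len(N[0]))] for i in range(len(M))]

A = [[2, -3], [4, -5]]

def closed_form(n):
    # the formula derived in Example 4.27
    s, p = (-1) ** n, 2 ** n
    return [[s * (4 - 3 * p), s * (-3 + 3 * p)],
            [s * (4 - 4 * p), s * (-3 + 4 * p)]]

power = [[1, 0], [0, 1]]                    # A^0 = I
for n in range(7):
    assert power == closed_form(n)
    power = matmul(power, A)
```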

Example 4.28 (Formula for Aⁿ)

We investigated in Example 4.27 a general method for finding the formula for Aⁿ, where n is a positive integer. Use the same formula for Aⁿ to verify whether it is also valid for the case n = −1. Do you think this is coincidental, or should it be true in general?

Solution By row operations,

[A \mid I] = \begin{bmatrix} 2 & -3 & 1 & 0 \\ 4 & -5 & 0 & 1 \end{bmatrix} \longrightarrow \begin{bmatrix} 1 & 0 & -5/2 & 3/2 \\ 0 & 1 & -2 & 1 \end{bmatrix} = [I \mid A^{-1}],

so the inverse of A is given by

A^{-1} = \begin{bmatrix} -5/2 & 3/2 \\ -2 & 1 \end{bmatrix}.

Recalling the formula for Aⁿ and taking n = −1, we obtain

(-1)^{-1} \begin{bmatrix} 4 - 3 \cdot 2^{-1} & -3 + 3 \cdot 2^{-1} \\ 4 - 4 \cdot 2^{-1} & -3 + 4 \cdot 2^{-1} \end{bmatrix} = - \begin{bmatrix} 5/2 & -3/2 \\ 2 & -1 \end{bmatrix} = A^{-1}.

Obviously, the formula is valid for the case n = −1. In fact, the formula is valid in general when n is any integer (positive, negative, or zero). □

Remark. We note that if det D ≠ 0 and D is a diagonal matrix, then the inverse of D is the diagonal matrix whose diagonal entries are the inverses of the diagonal entries of D (none of these is zero, since det D ≠ 0). Also, for integer n > 0,

Aⁿ = PDⁿP⁻¹,

so, inverting, we get

A⁻ⁿ = P(D⁻¹)ⁿP⁻¹.

Thus our formula works even for n < 0, provided D is invertible (⇐⇒ A is invertible).


Example 4.29 (Recurrence relations)

Solve the system of recurrence relations

x_{n+1} = 2x_n − 3y_n,
y_{n+1} = 4x_n − 5y_n,

given that x_0 = 0 and y_0 = −1.

Solution Let A be the coefficient matrix. Then the given recurrence relations become x_{n+1} = Ax_n, where x_n = (x_n, y_n). We know that the matrix A has distinct real eigenvalues λ_1 = −1, λ_2 = −2. The corresponding eigenvectors are v_1 = (1, 1), v_2 = (3, 4). Thus if we construct

\[
P = \begin{bmatrix} v_1 & v_2 \end{bmatrix} = \begin{bmatrix} 1 & 3 \\ 1 & 4 \end{bmatrix},
\quad\text{then}\quad
D = \begin{bmatrix} -1 & 0 \\ 0 & -2 \end{bmatrix}.
\]

Hence,

\[
x_n = A^n x_0 = PD^nP^{-1}x_0
= \begin{bmatrix} 1 & 3 \\ 1 & 4 \end{bmatrix}
\begin{bmatrix} (-1)^n & 0 \\ 0 & (-2)^n \end{bmatrix}
\begin{bmatrix} 4 & -3 \\ -1 & 1 \end{bmatrix}
\begin{bmatrix} 0 \\ -1 \end{bmatrix}
= \begin{bmatrix} (-1)^n & 3\times(-2)^n \\ (-1)^n & 4\times(-2)^n \end{bmatrix}
\begin{bmatrix} 3 \\ -1 \end{bmatrix},
\]

so

\[
\begin{bmatrix} x_n \\ y_n \end{bmatrix}
= \begin{bmatrix} 3\times(-1)^n - 3\times(-2)^n \\ 3\times(-1)^n - 4\times(-2)^n \end{bmatrix}. □
\]
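The closed form can be sanity-checked against the recurrence itself; a minimal sketch in plain Python:

```python
def closed_form(n):
    """x_n = 3(-1)^n - 3(-2)^n,  y_n = 3(-1)^n - 4(-2)^n."""
    return (3 * (-1) ** n - 3 * (-2) ** n,
            3 * (-1) ** n - 4 * (-2) ** n)

x, y = 0, -1                              # initial values x_0 = 0, y_0 = -1
for n in range(20):
    assert (x, y) == closed_form(n)       # closed form matches the iteration
    x, y = 2 * x - 3 * y, 4 * x - 5 * y   # the recurrence itself
```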

Example 4.30 (Recurrence relations)

Solve the system of recurrence relations

x_{n+1} = 2x_n − y_n − 1,
y_{n+1} = −x_n + 2y_n + 2,

given that x_0 = 0 and y_0 = −1.

Solution The relations can be written in matrix form as x_{n+1} = Ax_n + b, where

\[
A = \begin{bmatrix} 2 & -1 \\ -1 & 2 \end{bmatrix}
\quad\text{and}\quad
b = \begin{bmatrix} -1 \\ 2 \end{bmatrix}.
\]

It is then easy to prove by induction on n,

\[
x_n = A^n x_0 + (A^{n-1} + \cdots + A + I)\, b. \tag{4.5}
\]

Also it can be verified that

\[
A^n = \frac{1}{2}\begin{bmatrix} 1+3^n & 1-3^n \\ 1-3^n & 1+3^n \end{bmatrix}
= \frac{1}{2}U + \frac{3^n}{2}V,
\]

where U = \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix} and V = \begin{bmatrix} 1 & -1 \\ -1 & 1 \end{bmatrix}. Hence,

\[
A^{n-1} + \cdots + A + I
= \frac{n}{2}U + \frac{3^{n-1} + \cdots + 3 + 1}{2}V
= \frac{n}{2}U + \frac{3^n - 1}{4}V.
\]

Thus (4.5) gives

\[
x_n = \left(\frac{1}{2}U + \frac{3^n}{2}V\right)\begin{bmatrix} 0 \\ -1 \end{bmatrix}
+ \left(\frac{n}{2}U + \frac{3^n - 1}{4}V\right)\begin{bmatrix} -1 \\ 2 \end{bmatrix},
\]

which simplifies to

\[
\begin{bmatrix} x_n \\ y_n \end{bmatrix}
= \frac{1}{4}\begin{bmatrix} 2n + 1 - 3^n \\ 2n - 5 + 3^n \end{bmatrix}. □
\]
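Again the closed form can be checked against the affine recurrence directly; a short sketch using exact `Fraction` arithmetic for the division by 4:

```python
from fractions import Fraction as F

def closed_form(n):
    """[x_n, y_n] = (1/4) [2n + 1 - 3^n,  2n - 5 + 3^n]."""
    return F(2 * n + 1 - 3 ** n, 4), F(2 * n - 5 + 3 ** n, 4)

x, y = 0, -1                              # x_0 = 0, y_0 = -1
for n in range(15):
    assert (x, y) == closed_form(n)
    x, y = 2 * x - y - 1, -x + 2 * y + 2  # the affine recurrence
```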


Example 4.31 (Fibonacci numbers)

The Fibonacci numbers are 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, ⋯. Equivalently, the numbers can be defined by the following recurrence relation

a_{k+2} = a_{k+1} + a_k,  a_0 = 1, a_1 = 1.

Find the formula for a_k.

Solution Let x_k = (a_{k+1}, a_k). Then

\[
x_{k+1} = \begin{bmatrix} a_{k+2} \\ a_{k+1} \end{bmatrix}
= \begin{bmatrix} a_{k+1} + a_k \\ a_{k+1} \end{bmatrix}
= \begin{bmatrix} 1 & 1 \\ 1 & 0 \end{bmatrix}
\begin{bmatrix} a_{k+1} \\ a_k \end{bmatrix}
= A x_k,
\quad\text{where}\quad
A = \begin{bmatrix} 1 & 1 \\ 1 & 0 \end{bmatrix}.
\]

Hence, for k = 0, 1, 2, ⋯,

\[
x_k = A x_{k-1} = A^2 x_{k-2} = \cdots = A^k x_0,
\quad\text{where}\quad
x_0 = \begin{bmatrix} a_1 \\ a_0 \end{bmatrix} = \begin{bmatrix} 1 \\ 1 \end{bmatrix}.
\]

Thus we need to find the formula for A^k. The characteristic equation of A is det (A − λI) = 0, or

\[
\det \begin{bmatrix} 1-\lambda & 1 \\ 1 & -\lambda \end{bmatrix}
= -\lambda\,(1 - \lambda) - 1 = \lambda^2 - \lambda - 1 = 0,
\]

which gives the distinct real eigenvalues λ_1 = (1+√5)/2, λ_2 = (1−√5)/2.

· For λ_1 = (1+√5)/2,

\[
A - \frac{1+\sqrt{5}}{2}\, I
= \begin{bmatrix} \frac{1-\sqrt{5}}{2} & 1 \\ 1 & -\frac{1+\sqrt{5}}{2} \end{bmatrix}
\longrightarrow
\begin{bmatrix} 1 & -\frac{1+\sqrt{5}}{2} \\ 0 & 0 \end{bmatrix}.
\]

By solving (A − ((1+√5)/2) I) x = 0, x = (x_1, x_2), we have x_1 = ((1+√5)/2) x_2. Thus,

\[
x = \left(\frac{1+\sqrt{5}}{2}\,x_2,\; x_2\right) = x_2 \left(\frac{1+\sqrt{5}}{2},\; 1\right),
\]

and we get the eigenvector v_1 = ((1+√5)/2, 1).

· For λ_2 = (1−√5)/2, similarly, we get the eigenvector v_2 = ((1−√5)/2, 1).

Thus we have the formula for A^k,

\[
A^k = PD^kP^{-1}
= \begin{bmatrix} v_1 & v_2 \end{bmatrix}
\begin{bmatrix} \lambda_1^k & 0 \\ 0 & \lambda_2^k \end{bmatrix}
\begin{bmatrix} v_1 & v_2 \end{bmatrix}^{-1},
\quad\text{where}\quad
\begin{bmatrix} v_1 & v_2 \end{bmatrix}^{-1}
= \frac{1}{\sqrt{5}}
\begin{bmatrix} 1 & -\frac{1-\sqrt{5}}{2} \\ -1 & \frac{1+\sqrt{5}}{2} \end{bmatrix}.
\]

It follows that

\[
x_k = A^k x_0
= \begin{bmatrix} v_1 & v_2 \end{bmatrix}
\begin{bmatrix} \lambda_1^k & 0 \\ 0 & \lambda_2^k \end{bmatrix}
\cdot \frac{1}{\sqrt{5}}
\begin{bmatrix} 1 & -\frac{1-\sqrt{5}}{2} \\ -1 & \frac{1+\sqrt{5}}{2} \end{bmatrix}
\begin{bmatrix} 1 \\ 1 \end{bmatrix}
= \frac{1}{\sqrt{5}}
\begin{bmatrix} \lambda_1^k v_1 & \lambda_2^k v_2 \end{bmatrix}
\begin{bmatrix} \frac{1+\sqrt{5}}{2} \\ -\frac{1-\sqrt{5}}{2} \end{bmatrix}
= \frac{1}{2\sqrt{5}}
\left[ \lambda_1^k\,(1+\sqrt{5})\,v_1 - \lambda_2^k\,(1-\sqrt{5})\,v_2 \right].
\]

Therefore, the formula for a_k is given by the second component of x_k, i.e.,

\[
a_k = \frac{1}{\sqrt{5}}
\left[ \left(\frac{1+\sqrt{5}}{2}\right)^{k+1} - \left(\frac{1-\sqrt{5}}{2}\right)^{k+1} \right]. □
\]
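The Binet-type formula just derived can be checked against the recurrence numerically; a minimal sketch (rounding absorbs the tiny floating-point error, since the second term tends to zero):

```python
from math import sqrt

def fib(k):
    """a_k = (phi^(k+1) - psi^(k+1)) / sqrt(5), with the convention a_0 = a_1 = 1."""
    phi = (1 + sqrt(5)) / 2
    psi = (1 - sqrt(5)) / 2
    return round((phi ** (k + 1) - psi ** (k + 1)) / sqrt(5))

a, b = 1, 1                 # a_0, a_1
for k in range(30):
    assert fib(k) == a      # formula agrees with the recurrence
    a, b = b, a + b
```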


Chapter 5

Vector Geometry

5.1 Geometric Vectors

5.2 Dot Product and Inner Product

The definition of a vector space V involves an arbitrary field K. In this chapter we restrict K to be the real field R and we call V a real vector space. In the following we place an additional structure on a vector space V to obtain an inner product space. We begin with a definition.

17 Definition Let V be a real vector space. Suppose to each pair of vectors u, v ∈ V there is assigned a scalar 〈u,v〉 ∈ R. This mapping is called an inner product in V if it satisfies the following axioms:

(i) 〈au1 + bu2, v〉 = a 〈u1,v〉 + b 〈u2,v〉.

(ii) 〈u,v〉 = 〈v,u〉.

(iii) 〈u,u〉 ≥ 0; and 〈u,u〉 = 0 if and only if u = 0.

The vector space V with an inner product is called an inner product space.

Observe that 〈u,u〉 is always real, and so the inequality relation in (iii) makes sense. We also use the notation

‖u‖ = √〈u,u〉.

This non-negative real number ‖u‖ is called the norm or length of u.

A real inner product space is sometimes called a Euclidean space.

Consider the dot product of two vectors in Rn:

x · y = (x1, x2, · · · , xn) · (y1, y2, · · · , yn) = x1y1 + x2y2 + · · · + xnyn.

This is an inner product on R^n, and R^n with this inner product is usually referred to as Euclidean n-space. Although there are many different ways to define an inner product on R^n, we shall assume this inner product on R^n unless otherwise stated.


5.3 Line and Plane

We emphasize that the vectors x and y must have the same “length” (dimension of the Euclidean space). For instance, it is meaningless to talk about the dot product of a vector in R^2 and a vector in R^3.

Example 5.1 (Dot product interpretation of linear equation)

The linear equation

2x_1 + 3x_2 − x_3 = 1

may be interpreted as

(2, 3, −1) · (x_1, x_2, x_3) = 1.

Therefore a solution of the equation can be considered as a vector x = (x_1, x_2, x_3) such that its dot product with (2, 3, −1) is 1.

Similarly, a system of linear equations can be considered as constraints on several dot products. For example, the system in Example 2.7 (page 49) means the dot products of x = (x_1, x_2) with (2, −1) and (−1, 1) are respectively −1 and 2. □

The multiplication AB of two matrices may be considered as the dot products between all the rows of A and all the columns of B. For instance, from Example 1.8 (page 4), we have

\[
\begin{bmatrix} w_1 \\ w_2 \\ w_3 \end{bmatrix}
\begin{bmatrix} v_1 & v_2 \end{bmatrix}
= \begin{bmatrix}
w_1 \cdot v_1 & w_1 \cdot v_2 \\
w_2 \cdot v_1 & w_2 \cdot v_2 \\
w_3 \cdot v_1 & w_3 \cdot v_2
\end{bmatrix}.
\]

The dot product has the following properties:

x · y = y · x,  (x + y) · z = x · z + y · z,  (cx) · y = c (x · y) = x · (cy).

Moreover,

x · x ≥ 0 and x · x = 0 ⇐⇒ x = 0.

Many geometrical quantities can be deduced from the dot product. First of all, from the Pythagorean theorem we have the length (called norm in mathematics)

‖x‖ = √(x_1² + x_2² + ⋯ + x_n²) = √(x · x).

Figure 5.1: Norm in R^2 (for x = (x_1, x_2), ‖x‖ = √(x_1² + x_2²)).


The norm has the following properties:

1. ‖cx‖ = |c| ‖x‖.

2. Schwarz inequality: |x · y| ≤ ‖x‖ ‖y‖.

3. Triangle inequality: ‖x + y‖ ≤ ‖x‖ + ‖y‖.

The distance between two points x and y is ‖x − y‖. The angle θ between two nonzero vectors x, y is given by

cos θ = (x · y) / (‖x‖ ‖y‖).  (5.1)

The area of the parallelogram spanned by x and y is

A = ‖x‖ ‖y‖ |sin θ| = ‖x‖ ‖y‖ √(1 − cos² θ) = √(‖x‖² ‖y‖² − (x · y)²).

Note that the term inside the square root is nonnegative by the Schwarz inequality.

Example 5.2 (Dot product and geometry)

Let

x = (1, 2, 3), y = (0, 1, 4).

Then

dot product: x · y = 1 · 0 + 2 · 1 + 3 · 4 = 14.
norm: ‖x‖ = √(1² + 2² + 3²) = √14.
norm: ‖y‖ = √(0² + 1² + 4²) = √17.
distance: ‖x − y‖ = √((1 − 0)² + (2 − 1)² + (3 − 4)²) = √3.
angle: θ = cos⁻¹(14 / (√14 √17)) ≈ 24.84°. □
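The quantities in Example 5.2 are easy to reproduce with a few lines of plain Python (a sketch, not part of the guide):

```python
from math import sqrt, acos, degrees

def dot(x, y):
    assert len(x) == len(y)   # vectors must have the same dimension
    return sum(a * b for a, b in zip(x, y))

def norm(x):
    return sqrt(dot(x, x))

x, y = (1, 2, 3), (0, 1, 4)
diff = [a - b for a, b in zip(x, y)]

assert dot(x, y) == 14
assert abs(norm(x) ** 2 - 14) < 1e-9          # ‖x‖ = √14
assert abs(norm(y) ** 2 - 17) < 1e-9          # ‖y‖ = √17
assert abs(norm(diff) ** 2 - 3) < 1e-9        # distance = √3

theta = degrees(acos(dot(x, y) / (norm(x) * norm(y))))
assert abs(theta - 24.84) < 0.05              # angle in degrees
```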

Two vectors x and y in R^n are orthogonal (or perpendicular), denoted x ⊥ y, if the angle between them is 90°. From (5.1), this means

x ⊥ y ⇐⇒ x · y = 0.


For orthogonal vectors we have the following famous Pythagorean theorem.

x ⊥ y =⇒ ‖x + y‖² = ‖x‖² + ‖y‖².  (5.2)

The proof is quite simple:

‖x + y‖² = (x + y) · (x + y) = x · x + x · y + y · x + y · y =* x · x + y · y = ‖x‖² + ‖y‖².

The condition x ⊥ y is used for the equality labeled ∗.

Example 5.3 (Orthogonality between two vectors)

The two vectors in Example 5.2 are not orthogonal. The vectors (1, 2, 3) and (1, 1, −1) are orthogonal:

(1, 2, 3) · (1, 1, −1) = 1 · 1 + 2 · 1 + 3 · (−1) = 0.

In fact, a vector x = (x_1, x_2, x_3) is orthogonal to (1, 2, 3) means x_1 + 2x_2 + 3x_3 = 0. All such vectors (solutions of the homogeneous equation) form a plane in R^3 (see Figure 5.2). □

Figure 5.2: Plane orthogonal to u = (1, 2, 3).

5.4 Orthogonal and Orthonormal Set

A set v_1, v_2, ⋯, v_k of vectors is called an orthogonal set if they are pairwise orthogonal:

v_i · v_j = 0, for i ≠ j.

The set is said to be orthonormal if it further satisfies

‖v_i‖ = 1, for every i.

This means all vectors also have norm 1. If no vector in an orthogonal set v_1, v_2, ⋯, v_k is zero, then by dividing by the corresponding lengths,

v_1/‖v_1‖, v_2/‖v_2‖, ⋯, v_k/‖v_k‖

becomes an orthonormal set.


Example 5.4 (Orthogonal and orthonormal set)

The vectors

v_1 = (1, −2, −2), v_2 = (−2, 1, −2), v_3 = (−2, −2, 1)

form an orthogonal set because

v_1 · v_2 = 1 · (−2) + (−2) · 1 + (−2) · (−2) = 0,
v_1 · v_3 = 1 · (−2) + (−2) · (−2) + (−2) · 1 = 0,
v_2 · v_3 = (−2) · (−2) + 1 · (−2) + (−2) · 1 = 0.

To make the set orthonormal, we need to scale the vectors to length 1. Since ‖v_1‖ = ‖v_2‖ = ‖v_3‖ = 3, we get an orthonormal set that consists of

v_1/‖v_1‖ = (1/3, −2/3, −2/3),  v_2/‖v_2‖ = (−2/3, 1/3, −2/3),  v_3/‖v_3‖ = (−2/3, −2/3, 1/3). □

Example 5.5 (Complete orthogonal set)

We have seen in Example 5.3 that (1, 2, 3) and (1, 1, −1) are orthogonal. We would like to find another vector v so that (1, 2, 3), (1, 1, −1), and v form an orthogonal set. Let v = (x_1, x_2, x_3). Then

v ⊥ (1, 2, 3) ⇐⇒ v · (1, 2, 3) = x_1 + 2x_2 + 3x_3 = 0,
v ⊥ (1, 1, −1) ⇐⇒ v · (1, 1, −1) = x_1 + x_2 − x_3 = 0.

Therefore the problem boils down to solving this homogeneous system of two equations. The general solution is v = x_3 (5, −4, 1). For example, (1, 2, 3), (1, 1, −1), (5, −4, 1) form an orthogonal set. □
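In R^3 specifically, one quick alternative to solving the homogeneous system is the cross product (not covered in this guide), which is automatically orthogonal to both given vectors; a sketch:

```python
def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def cross(u, v):
    """In R^3 the cross product u x v is orthogonal to both u and v."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

u, w = (1, 2, 3), (1, 1, -1)
v = cross(u, w)
print(v)                                   # (-5, 4, -1), a multiple of (5, -4, 1)
assert dot(v, u) == 0 and dot(v, w) == 0   # v completes the orthogonal set
```

The result (−5, 4, −1) is a scalar multiple of the (5, −4, 1) found above, consistent with the one-parameter family of solutions v = x_3 (5, −4, 1).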

5.5 Orthogonality and Linear Independence

Theorem 5.5.1 Given an orthogonal set v_1, v_2, ⋯, v_k of nonzero vectors, we have

1. x = c_1 v_1 + c_2 v_2 + ⋯ + c_k v_k =⇒ c_i = (x · v_i)/(v_i · v_i).

2. v_1, v_2, ⋯, v_k are linearly independent.

Let us take an orthogonal set v_1, v_2, v_3 of three nonzero vectors for example. If

x = c_1 v_1 + c_2 v_2 + c_3 v_3,

then by taking the dot product with v_1, we get

x · v_1 = c_1 v_1 · v_1 + c_2 v_2 · v_1 + c_3 v_3 · v_1.

Since the set is orthogonal, we have v_2 · v_1 = v_3 · v_1 = 0, so that x · v_1 = c_1 (v_1 · v_1). However,

v_1 ≠ 0 =⇒ v_1 · v_1 = ‖v_1‖² ≠ 0.

Therefore we conclude that c_1 = (x · v_1)/(v_1 · v_1). By a similar argument, we get the formulae for c_2 and c_3.

By taking x = 0 in the first statement, we get the second statement.


Example 5.6 (Orthogonal basis)

Consider the orthogonal set v_1, v_2, v_3 in Example 5.4. By Theorem 5.5.1 (page 223), they must be linearly independent and consequently must be a basis of R^3. Moreover, the theorem also tells us how to express a vector as a linear combination of v_1, v_2, v_3. For example, for x = (1, 0, 0), we have

x · v_1 = 1, x · v_2 = −2, x · v_3 = −2,
v_1 · v_1 = 9, v_2 · v_2 = 9, v_3 · v_3 = 9.

Therefore,

x = (1/9) v_1 − (2/9) v_2 − (2/9) v_3. □
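The coefficient formula c_i = (x · v_i)/(v_i · v_i) from Theorem 5.5.1 is easy to mechanize; a sketch with exact fractions, checked against Example 5.6:

```python
from fractions import Fraction as F

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def coefficients(x, basis):
    """c_i = (x . v_i) / (v_i . v_i) for an orthogonal basis v_1, ..., v_k."""
    return [F(dot(x, v), dot(v, v)) for v in basis]

basis = [(1, -2, -2), (-2, 1, -2), (-2, -2, 1)]
c = coefficients((1, 0, 0), basis)
assert c == [F(1, 9), F(-2, 9), F(-2, 9)]

# reconstruct x from the coefficients: x = c1 v1 + c2 v2 + c3 v3
x = [sum(ci * v[j] for ci, v in zip(c, basis)) for j in range(3)]
assert x == [1, 0, 0]
```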

The example above suggests the following important consequence of Theorem 5.5.1.

Theorem 5.5.2 An orthogonal set of n nonzero vectors in Rn is a basis of Rn.

Thus we define an orthogonal basis to be a basis consisting of orthogonal vectors. Then the theorem above tells us that

orthogonal basis of R^n ⇐⇒ n nonzero and orthogonal vectors in R^n.

Furthermore, if the vectors have norm 1, then we call the basis an orthonormal basis.

Example 5.7 (Orthogonal and orthonormal basis)

From Example 5.4 (page 223) we see that

(1, −2, −2), (−2, 1, −2), (−2, −2, 1)

is an orthogonal basis of R^3, and

(1/3, −2/3, −2/3), (−2/3, 1/3, −2/3), (−2/3, −2/3, 1/3)  (5.3)

is an orthonormal basis of R^3. From Example 5.5, we see that

(1, 2, 3), (1, 1, −1), (5, −4, 1)  (5.4)

is an orthogonal basis of R^3. □

Example 5.8 (Standard basis is orthonormal)

The standard basis

e_1 = (1, 0, 0), e_2 = (0, 1, 0), e_3 = (0, 0, 1)

is an orthonormal basis of R^3. In general, the standard basis e_1, e_2, ⋯, e_n is an orthonormal basis of R^n. □


5.6 Orthogonal Matrix

Orthogonal and orthonormal bases may be understood from the matrix viewpoint. Take three vectors v_1, v_2, v_3 in R^3 for example. We have the corresponding matrix U = [v_1 v_2 v_3] and (see the discussion after Example 5.1, page 220)

\[
U^t U =
\begin{bmatrix} v_1^t \\ v_2^t \\ v_3^t \end{bmatrix}
\begin{bmatrix} v_1 & v_2 & v_3 \end{bmatrix}
= \begin{bmatrix}
v_1 \cdot v_1 & v_1 \cdot v_2 & v_1 \cdot v_3 \\
v_2 \cdot v_1 & v_2 \cdot v_2 & v_2 \cdot v_3 \\
v_3 \cdot v_1 & v_3 \cdot v_2 & v_3 \cdot v_3
\end{bmatrix}.
\]

Therefore,

v1, v2, v3 orthogonal ⇐⇒ UtU is diagonal,

v1, v2, v3 orthonormal ⇐⇒ UtU = I.

These can be easily generalized to the case of more vectors. In particular, we have the following matrix interpretation of orthonormal bases.

Theorem 5.6.1 The following are equivalent for n × n (square) matrix U.

1. The columns of U form an orthonormal basis.

2. UtU = I.

3. U is invertible, and U−1 = Ut.

Such matrices are called orthogonal matrices, although “orthonormal matrices” appears to be a more appropriate name.

Example 5.9 (Orthogonal matrix)

The orthonormal basis (5.3) corresponds to the orthogonal matrix

\[
\frac{1}{3}
\begin{bmatrix} 1 & -2 & -2 \\ -2 & 1 & -2 \\ -2 & -2 & 1 \end{bmatrix}.
\]

If we make the orthogonal basis (5.4) into an orthonormal one (by dividing by the norms), then we get another orthogonal matrix

\[
\begin{bmatrix}
1/\sqrt{14} & 1/\sqrt{3} & 5/\sqrt{42} \\
2/\sqrt{14} & 1/\sqrt{3} & -4/\sqrt{42} \\
3/\sqrt{14} & -1/\sqrt{3} & 1/\sqrt{42}
\end{bmatrix}.
\]

We further note that by the third property in Theorem 5.6.1, we have

\[
U^{-1} = U^t =
\begin{bmatrix}
1/\sqrt{14} & 2/\sqrt{14} & 3/\sqrt{14} \\
1/\sqrt{3} & 1/\sqrt{3} & -1/\sqrt{3} \\
5/\sqrt{42} & -4/\sqrt{42} & 1/\sqrt{42}
\end{bmatrix}. □
\]
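The defining property U^tU = I of Example 5.9's second matrix can be verified numerically; a sketch in plain Python (floating-point, so we test up to a tolerance):

```python
from math import sqrt, isclose

U = [[1 / sqrt(14), 1 / sqrt(3), 5 / sqrt(42)],
     [2 / sqrt(14), 1 / sqrt(3), -4 / sqrt(42)],
     [3 / sqrt(14), -1 / sqrt(3), 1 / sqrt(42)]]

def transpose(M):
    return [list(row) for row in zip(*M)]

def mul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

P = mul(transpose(U), U)   # should be (numerically) the 3x3 identity
for i in range(3):
    for j in range(3):
        assert isclose(P[i][j], 1.0 if i == j else 0.0, abs_tol=1e-12)
```

Each diagonal entry is a column's squared norm (1) and each off-diagonal entry is a dot product of two distinct columns (0), exactly the orthonormality conditions.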


5.7 Orthogonal Projection

For a vector x = (x_1, x_2) in R^2, the two coordinates x_1 and x_2 can be considered as orthogonal projections onto the e_1 (representing the x_1-axis) and e_2 (representing the x_2-axis) directions. In general, we may consider a span (a “multidirection”)

S = span (v_1, v_2, ⋯, v_k)

in R^n and a vector x ∈ R^n. Suppose we express x as

x = y + z,  y ∈ S,  z ⊥ any vector in S.  (5.5)

Then we call y the orthogonal projection of x onto S. We denote

y = proj_S x.

Clearly,

x ∈ S ⇐⇒ x = proj_S x.  (5.6)

Theorem 5.7.1 Suppose S is the span of an orthogonal set v_1, v_2, ⋯, v_k of nonzero vectors. Then

proj_S x = (x · v_1)/(v_1 · v_1) v_1 + (x · v_2)/(v_2 · v_2) v_2 + ⋯ + (x · v_k)/(v_k · v_k) v_k.  (5.7)

Figure 5.3: Orthogonal projection (left: projection to a line, y = proj_span(v) x; right: projection to a plane, y = proj_S x).

We remark that the formula for proj_S x is the same as the one given in Theorem 5.5.1 (page 223). In that theorem we are in the situation that x ∈ S (which implies x = proj_S x by (5.6)). Therefore we may think of (5.7) as a generalization of the formula in Theorem 5.5.1.

The proof of Theorem 5.7.1 is also similar. Again consider an orthogonal set v_1, v_2, v_3 of three nonzero vectors for example. Since y ∈ S, we have

y = c_1 v_1 + c_2 v_2 + c_3 v_3.


Then taking the dot product of

x = y + z = c_1 v_1 + c_2 v_2 + c_3 v_3 + z

with v_1, we get

x · v_1 = c_1 v_1 · v_1 + c_2 v_2 · v_1 + c_3 v_3 · v_1 + z · v_1.

Since the set is orthogonal, we have v_2 · v_1 = v_3 · v_1 = 0. Since z is orthogonal to any vector in S, including v_1, we have z · v_1 = 0. Therefore,

x · v_1 = c_1 v_1 · v_1,

and the formula for c_1 follows. We can get the formulae for the other coefficients similarly.

Example 5.10 (Orthogonal projection onto a line)

The orthogonal projection of x onto the line spanned by a nonzero vector v is

proj_span(v) x = (x · v)/(v · v) v.

For example, we have

proj_span(1, 2, −4) (1, 1, 0) = [(1, 1, 0) · (1, 2, −4)] / [(1, 2, −4) · (1, 2, −4)] (1, 2, −4) = (3/21)(1, 2, −4) = (1/7, 2/7, −4/7),

proj_span(1, 2, 1, 2) (2, 1, 2, 2) = [(2, 1, 2, 2) · (1, 2, 1, 2)] / [(1, 2, 1, 2) · (1, 2, 1, 2)] (1, 2, 1, 2) = (1, 2, 1, 2). □

Example 5.11 (Orthogonal projection onto a plane)

Consider the orthogonal vectors v_1 = (1, 2, 3), v_2 = (1, 1, −1) in Example 5.3 (page 222). For x = (0, 1, 4), y = (4, 0, 1), we have

x · v_1 = 14, y · v_1 = 7, v_1 · v_1 = 14,
x · v_2 = −3, y · v_2 = 3, v_2 · v_2 = 3.

Therefore the orthogonal projections onto the span of v_1, v_2 are

proj_span(v_1, v_2) x = v_1 − v_2 = (0, 1, 4),
proj_span(v_1, v_2) y = (1/2) v_1 + v_2 = (3/2, 2, 1/2).

Note that since (0, 1, 4) = x and (3/2, 2, 1/2) ≠ y, we see that x is in the span, while y is not in the span. □
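Formula (5.7) translates directly into code; a sketch with exact fractions, reproducing the two projections in Example 5.11:

```python
from fractions import Fraction as F

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def proj(x, vs):
    """Orthogonal projection of x onto span(vs), vs an orthogonal set (eq. 5.7)."""
    p = [F(0)] * len(x)
    for v in vs:
        c = F(dot(x, v), dot(v, v))              # coefficient (x.v)/(v.v)
        p = [pi + c * vi for pi, vi in zip(p, v)]
    return p

v1, v2 = (1, 2, 3), (1, 1, -1)
assert proj((0, 1, 4), [v1, v2]) == [0, 1, 4]              # x is in the span
assert proj((4, 0, 1), [v1, v2]) == [F(3, 2), 2, F(1, 2)]  # y is not
```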


The orthogonal projection has the following useful interpretation.

Theorem 5.7.2 projSx is the vector on S that is closest to x.

In other words, the orthogonal projection is the best approximation of x by a vector in S. Thus many optimization problems can be interpreted as finding orthogonal projections.

To prove the theorem, we compare any vector w in S with the orthogonal projection y = proj_S x, which is characterized by (5.5). Since z is orthogonal to any vector in S, including y and w, we have

z · (y − w) = z · y − z · w = 0 − 0 = 0.

Therefore,

‖x − w‖² = ‖y + z − w‖² = ‖(y − w) + z‖² =* ‖y − w‖² + ‖z‖² ≥ ‖z‖² = ‖x − y‖²,

where the equality ∗ follows from the Pythagorean theorem (5.2).

Because of the theorem, we define the distance from x to S to be

dist (x, S) = ‖z‖ = ‖x − projS x‖.

Example 5.12 (Distance and orthogonal projection)

In Example 5.11 (page 227), the distance between y and span (v_1, v_2) is

dist (y, span (v_1, v_2)) = ‖(4, 0, 1) − (3/2, 2, 1/2)‖ = √42 / 2.

On the other hand, the distance between x and span (v_1, v_2) is

dist (x, span (v_1, v_2)) = ‖(0, 1, 4) − (0, 1, 4)‖ = 0.

This is not surprising because x is already inside span (v_1, v_2). □


5.8 Gram–Schmidt Orthogonalization

We have seen how useful orthogonal sets and orthogonal bases are. However, it often happens that a given set of vectors is not orthogonal, so we need an algorithm for turning any set into an orthogonal one. A process of this kind is called orthogonalization.

Example 5.13 (Orthogonalizations for two vectors)

The vectors x = (1, 2, 3), y = (0, 1, 4) in Example 5.2 (page 221) are not orthogonal. To get two orthogonal vectors from them, we choose the first vector v_1 = x = (1, 2, 3). As for the second vector, we “shift” y until it becomes orthogonal to x. From the picture we see that the shifting is

v_2 = y − proj_span(v_1) y = (0, 1, 4) − [(0 · 1 + 1 · 2 + 4 · 3)/(1² + 2² + 3²)] (1, 2, 3) = (−1, −1, 1).

Therefore, v_1, v_2 form an orthogonal set. In fact, v_1, v_2 differ from the orthogonal set in Example 5.3 (page 222) by multiplication by some nonzero scalars. Such multiplications do not change orthogonality. □

Figure 5.4: Gram–Schmidt orthogonalization.

The idea of shifting the vectors to become orthogonal ones can be applied to the case of many vectors. For a given (linearly independent) set of vectors u_1, u_2, ⋯, u_k, we deduce the following orthogonal set:

v_1 = u_1,
v_2 = u_2 − proj_span(v_1) u_2 =* u_2 − (u_2 · v_1)/(v_1 · v_1) v_1,
v_3 = u_3 − proj_span(v_1, v_2) u_3 =* u_3 − (u_3 · v_1)/(v_1 · v_1) v_1 − (u_3 · v_2)/(v_2 · v_2) v_2,
⋮  (5.8)

where the equalities labeled ∗ follow from the fact that (inductively) v_1, v_2, ⋯, v_i is already orthogonal, so that the formula (5.7) in Theorem 5.7.1 (page 226) can be applied.


The above algorithm (5.8) for producing an orthogonal set

v_1, v_2, ⋯, v_k

from any given (linearly independent) set

u_1, u_2, ⋯, u_k

is called the Gram–Schmidt process. The linear independence is needed to make sure that none of the v_i obtained can become zero, so that the process can continue until the end.

From the geometric meaning of the process, we see that

span (u_1, u_2, ⋯, u_k) = span (v_1, v_2, ⋯, v_k).  (5.9)

In fact, we have

span (u_1, u_2, ⋯, u_i) = span (v_1, v_2, ⋯, v_i), for 1 ≤ i ≤ k.

Example 5.14 (Gram–Schmidt orthogonalization)

Consider the three vectors u_1 = (1, 1, −1, 0), u_2 = (0, 2, −1, 1), u_3 = (−1, 0, 2, 2). The Gram–Schmidt process gives us the following orthogonal set:

v_1 = u_1 = (1, 1, −1, 0),

v_2 = u_2 − proj_span(v_1) u_2 = (0, 2, −1, 1) − (3/3)(1, 1, −1, 0) = (−1, 1, 0, 1),

v_3 = u_3 − proj_span(v_1, v_2) u_3 = (−1, 0, 2, 2) − (−3/3)(1, 1, −1, 0) − (3/3)(−1, 1, 0, 1) = (1, 0, 1, 1).

We may then use this orthogonal set to compute the orthogonal projection onto (see (5.9))

S = span (u_1, u_2, u_3) = span (v_1, v_2, v_3).

For example, for y = (1, 1, 1, 1), we have

proj_S y = (1/3)(1, 1, −1, 0) + (1/3)(−1, 1, 0, 1) + (3/3)(1, 0, 1, 1) = (1, 2/3, 2/3, 4/3).

The distance from y to S is

dist (y, S) = ‖(1, 1, 1, 1) − (1, 2/3, 2/3, 4/3)‖ = 1/√3. □
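The steps of (5.8) can be sketched as a short routine; exact fractions keep the arithmetic identical to the hand computation in Example 5.14:

```python
from fractions import Fraction as F

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def gram_schmidt(us):
    """Turn a linearly independent list of vectors into an orthogonal list (5.8)."""
    vs = []
    for u in us:
        v = [F(ui) for ui in u]
        for w in vs:                      # subtract the projection onto each earlier v
            c = dot(u, w) / dot(w, w)     # Fraction arithmetic, so this stays exact
            v = [vi - c * wi for vi, wi in zip(v, w)]
        vs.append(v)
    return vs

us = [(1, 1, -1, 0), (0, 2, -1, 1), (-1, 0, 2, 2)]
vs = gram_schmidt(us)
assert vs == [[1, 1, -1, 0], [-1, 1, 0, 1], [1, 0, 1, 1]]
# pairwise orthogonality of the result
assert all(dot(vs[i], vs[j]) == 0 for i in range(3) for j in range(i + 1, 3))
```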


5.9 Projection and Approximation (Least Squares Applications)

We often encounter a problem

Ax = b

that may have no solution. When a solution is demanded and none exists, the best one can do is to find an x that makes Ax as close as possible to b.

Think of Ax as an approximation to b. The smaller the distance between b and Ax, given by ‖b − Ax‖, the better the approximation. The general least-squares problem is to find an x that makes ‖b − Ax‖ as small as possible. The adjective “least-squares” arises from the fact that ‖b − Ax‖ is the square root of a sum of squares.

18 Definition If A is m × n and b is in R^m, a least-squares solution of Ax = b is an x̂ in R^n such that

‖b − Ax̂‖ ≤ ‖b − Ax‖

for all x in R^n.

The most important aspect of the least-squares problem is that no matter what x we select, the vector Ax will necessarily be in the column space (why?), Col A. So we seek an x that makes Ax the closest point in Col A to b. (Of course, if b happens to be in Col A, then b is Ax for some x, and such an x is a “least-squares solution”.)

• Solution of the general least-squares problem

Given A and b as above, apply the Best Approximation Theorem (page 228) to the subspace Col A. Let

b̂ = proj_Col A b.

Because b̂ is in the column space of A, the equation Ax = b̂ is consistent, and there is an x̂ in R^n such that

Ax̂ = b̂.  (5.10)

Since b̂ is the closest point in Col A to b, a vector x̂ is a least-squares solution of Ax = b if and only if x̂ satisfies (5.10). Such an x̂ in R^n is a list of weights that will build b̂ out of the columns of A. There are many solutions of (5.10) if the equation has free variables.

Suppose x̂ satisfies Ax̂ = b̂. By the Orthogonal Decomposition Theorem (page 226), the projection b̂ has the property that b − b̂ is orthogonal to Col A, so

b − Ax̂ is orthogonal to each column of A.

If a_j is any column of A, then

a_j · (b − Ax̂) = 0, and a_j^t (b − Ax̂) = 0.


Since each a_j^t is a row of A^t,

A^t (b − Ax̂) = 0.  (5.11)

Thus, A^t b − A^t A x̂ = 0 and hence

A^t A x̂ = A^t b.

These calculations show that each least-squares solution of Ax = b satisfies the equation

A^t A x = A^t b.  (5.12)

The matrix equation (5.12) represents a system of equations called the normal equations for Ax = b. A solution of (5.12) is often denoted by x̂.

Theorem 5.9.1 The set of least-squares solutions of Ax = b coincides with the non-empty set of solutions of the normal equations A^t A x = A^t b.

Proof. As shown above, the set of least-squares solutions is nonempty and each least-squares solution x̂ satisfies the normal equations. Conversely, suppose x̂ satisfies A^t A x̂ = A^t b. Then x̂ satisfies (5.11) above, which shows that b − Ax̂ is orthogonal to the rows of A^t and hence is orthogonal to the columns of A. Since the columns of A span Col A, the vector b − Ax̂ is orthogonal to all of Col A. Hence the equation

b = Ax̂ + (b − Ax̂)

is a decomposition of b into the sum of a vector in Col A and a vector orthogonal to Col A. By the uniqueness of the orthogonal decomposition, Ax̂ must be the orthogonal projection of b onto Col A. That is, Ax̂ = b̂, and x̂ is a least-squares solution.

Example 5.15 (Least-squares solution)

Find a least-squares solution of the inconsistent system Ax = b for

\[
A = \begin{bmatrix} 4 & 0 \\ 0 & 2 \\ 1 & 1 \end{bmatrix},
\quad
b = \begin{bmatrix} 2 \\ 0 \\ 11 \end{bmatrix}.
\]

Solution To use (5.12), compute

\[
A^t A = \begin{bmatrix} 4 & 0 & 1 \\ 0 & 2 & 1 \end{bmatrix}
\begin{bmatrix} 4 & 0 \\ 0 & 2 \\ 1 & 1 \end{bmatrix}
= \begin{bmatrix} 17 & 1 \\ 1 & 5 \end{bmatrix},
\qquad
A^t b = \begin{bmatrix} 4 & 0 & 1 \\ 0 & 2 & 1 \end{bmatrix}
\begin{bmatrix} 2 \\ 0 \\ 11 \end{bmatrix}
= \begin{bmatrix} 19 \\ 11 \end{bmatrix}.
\]

Then the equation A^t A x = A^t b becomes

\[
\begin{bmatrix} 17 & 1 \\ 1 & 5 \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}
= \begin{bmatrix} 19 \\ 11 \end{bmatrix}.
\]


Row operations can be used to solve this system, but since A^t A is invertible and 2 × 2, it is probably easier to compute

\[
(A^t A)^{-1} = \frac{1}{84}\begin{bmatrix} 5 & -1 \\ -1 & 17 \end{bmatrix}
\]

and then to solve A^t A x̂ = A^t b as

\[
\hat{x} = (A^t A)^{-1} A^t b
= \frac{1}{84}\begin{bmatrix} 5 & -1 \\ -1 & 17 \end{bmatrix}
\begin{bmatrix} 19 \\ 11 \end{bmatrix}
= \frac{1}{84}\begin{bmatrix} 84 \\ 168 \end{bmatrix}
= \begin{bmatrix} 1 \\ 2 \end{bmatrix}. □
\]
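The normal-equations computation of Example 5.15 can be sketched end to end in plain Python; exact fractions avoid any rounding in the 2 × 2 inverse:

```python
from fractions import Fraction as F

def transpose(M):
    return [list(r) for r in zip(*M)]

def mul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

A = [[4, 0], [0, 2], [1, 1]]
b = [[2], [0], [11]]

At = transpose(A)
AtA = mul(At, A)                 # the normal-equations matrix
Atb = mul(At, b)
assert AtA == [[17, 1], [1, 5]] and Atb == [[19], [11]]

# solve AtA x = Atb via the explicit 2x2 inverse
[[p, q], [r, s]] = AtA
det = F(p * s - q * r)
inv = [[s / det, -q / det], [-r / det, p / det]]
x_hat = mul(inv, Atb)
assert x_hat == [[1], [2]]       # the least-squares solution found above
```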

In many calculations, A^t A is invertible, but this is not always the case. The next example involves a matrix of the sort that appears in what are called analysis of variance problems in statistics.

Example 5.16 (Least-squares solution)

Find a least-squares solution of Ax = b for

\[
A = \begin{bmatrix}
1 & 1 & 0 & 0 \\
1 & 1 & 0 & 0 \\
1 & 0 & 1 & 0 \\
1 & 0 & 1 & 0 \\
1 & 0 & 0 & 1 \\
1 & 0 & 0 & 1
\end{bmatrix},
\quad
b = \begin{bmatrix} -3 \\ -1 \\ 0 \\ 2 \\ 5 \\ 1 \end{bmatrix}.
\]

Solution Compute

\[
A^t A = \begin{bmatrix}
1 & 1 & 1 & 1 & 1 & 1 \\
1 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 1
\end{bmatrix} A
= \begin{bmatrix}
6 & 2 & 2 & 2 \\
2 & 2 & 0 & 0 \\
2 & 0 & 2 & 0 \\
2 & 0 & 0 & 2
\end{bmatrix},
\qquad
A^t b = \begin{bmatrix} 4 \\ -4 \\ 2 \\ 6 \end{bmatrix}.
\]

The augmented matrix for A^t A x = A^t b is

\[
\left[\begin{array}{cccc|c}
6 & 2 & 2 & 2 & 4 \\
2 & 2 & 0 & 0 & -4 \\
2 & 0 & 2 & 0 & 2 \\
2 & 0 & 0 & 2 & 6
\end{array}\right]
\longrightarrow
\left[\begin{array}{cccc|c}
1 & 0 & 0 & 1 & 3 \\
0 & 1 & 0 & -1 & -5 \\
0 & 0 & 1 & -1 & -2 \\
0 & 0 & 0 & 0 & 0
\end{array}\right].
\]

The general solution is x_1 = 3 − x_4, x_2 = −5 + x_4, x_3 = −2 + x_4, and x_4 is free. So the general least-squares solution of Ax = b has the form

\[
\hat{x} = \begin{bmatrix} 3 \\ -5 \\ -2 \\ 0 \end{bmatrix}
+ x_4 \begin{bmatrix} -1 \\ 1 \\ 1 \\ 1 \end{bmatrix}. □
\]


The next theorem gives a useful criterion for determining when there is only one least-squares solution of Ax = b. (Of course, the orthogonal projection b̂ is always unique.)

Theorem 5.9.2 The matrix A^t A is invertible if and only if the columns of A are linearly independent. In this case, the equation Ax = b has only one least-squares solution x̂, and it is given by

x̂ = (A^t A)^{-1} A^t b.  (5.13)

Formula (5.13) for x̂ is useful mainly for theoretical purposes and for hand calculations when A^t A is a small invertible matrix. When Ax̂ is used as an approximation to b, the distance from b to Ax̂ is called the least-squares error of this approximation.

Example 5.17 (Least-squares solution)

Given A and b as in Example 5.15 (page 232), determine the least-squares error in the least-squares solution of Ax = b.

Solution From Example 5.15,

\[
b = \begin{bmatrix} 2 \\ 0 \\ 11 \end{bmatrix}
\quad\text{and}\quad
A\hat{x} = \begin{bmatrix} 4 & 0 \\ 0 & 2 \\ 1 & 1 \end{bmatrix}
\begin{bmatrix} 1 \\ 2 \end{bmatrix}
= \begin{bmatrix} 4 \\ 4 \\ 3 \end{bmatrix}.
\]

Hence,

\[
b - A\hat{x} = \begin{bmatrix} 2 \\ 0 \\ 11 \end{bmatrix}
- \begin{bmatrix} 4 \\ 4 \\ 3 \end{bmatrix}
= \begin{bmatrix} -2 \\ -4 \\ 8 \end{bmatrix}
\]

and

‖b − Ax̂‖ = √((−2)² + (−4)² + 8²) = √84.

The least-squares error is √84. For any x in R^2, the distance between b and the vector Ax is at least √84. □

• Linear regression problems

In the analysis of experimental data, we usually try to find a straight line that best fits a set of data.

We try to minimize the difference between the data points and the approximating straight line, usually in the least-squares sense, i.e., by minimizing the sum of the squares of the vertical differences between the data points and the line.

We will see some examples in Worked Examples including

(i) Data fitting problems using a straight line.

(ii) Data fitting problems using a polynomial curve.

(iii) Data fitting problems using a general curve.
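The straight-line case (i) reduces directly to the normal equations of this section: stack the rows (1, t_i) into a design matrix A and solve A^t A x = A^t b for the intercept and slope. A sketch with made-up data points (the values below are purely illustrative, not from the guide):

```python
from fractions import Fraction as F

def transpose(M):
    return [list(r) for r in zip(*M)]

def mul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

# made-up data points (t_i, y_i); we fit y = c0 + c1 t
data = [(0, 1), (1, 3), (2, 4), (3, 6)]
A = [[1, t] for t, _ in data]        # design matrix: columns 1 and t
b = [[y] for _, y in data]

At = transpose(A)
AtA, Atb = mul(At, A), mul(At, b)    # the normal equations AtA x = Atb
[[p, q], [r, s]] = AtA
det = F(p * s - q * r)
inv = [[s / det, -q / det], [-r / det, p / det]]
(c0,), (c1,) = mul(inv, Atb)         # intercept and slope of the best-fit line
assert (c0, c1) == (F(11, 10), F(8, 5))
```

Fitting a polynomial or a general curve, cases (ii) and (iii), changes only the columns of the design matrix.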


Chapter 5

Vector Geometry (True or False)

5.1 Let u, v, w be vectors in R3. If u ⊥ v, v ⊥ w, then u ⊥ w.

5.2 If u1 ⊥ v1 and u2 ⊥ v2, then (u1 + u2) ⊥ (v1 + v2).

5.3 If u ⊥ v and u ⊥ w, then u ⊥ (v + w).

5.4 If u ⊥ w and v ⊥ w, then (u + v) ⊥ w.

5.5 If u ⊥ w and v ⊥ w, then (2u − 3v) ⊥ w.

5.6 If u ⊥ v, then for any scalars a and b, au ⊥ bv.

5.7 If u ⊥ u, then u = 0.

5.8 If u ∈ V is orthogonal to all vectors in V, then u = 0.

5.9 If u1, u2, · · · , un and v1, v2, · · · , vn are orthogonal sets, then u1 + v1, u2 + v2, · · · , un + vn is also an orthogonal set.

5.10 If u1,u2, · · · ,un is orthogonal, then u1,u2, · · · ,un,0 is also orthogonal.


5.11 Every linearly independent set in Rn is an orthogonal set.

5.12 Every orthogonal set in Rn is linearly independent.

5.13 If v1,v2, · · · ,vn,vn+1, · · · ,vn+m is orthonormal, then v1,v2, · · · ,vn is also orthonormal.

5.14 If r is any number, then ‖ru − rv‖ = r‖u − v‖.

5.15 If ‖u − v‖2 = ‖u‖2 + ‖v‖2, then u ⊥ v.

5.16 If u ⊥ v, then ‖u − v‖2 = ‖u‖2 + ‖v‖2.

5.17 If the columns of a square matrix A are orthogonal, then the rows of A are also orthogonal.

5.18 If the columns of a square matrix A are orthonormal, then the rows of A are also orthonormal.

5.19 If the rows of a square matrix A are orthonormal, then the columns of A are also orthonormal.

5.20 If no two of u, v, w are parallel, then u, v, w are linearly independent.

5.21 If vectors v1,v2, · · · ,vk are orthogonal, then they are linearly independent.

5.22 If vectors v1,v2, · · · ,vk are orthonormal, then they are linearly independent.

5.23 If 2u, −3v, 5w is an orthogonal basis of R3, then u,v,w is also an orthogonal basis.

5.24 If u,v,w is an orthogonal basis of R3, then 2u, −3v, 5w is also an orthogonal basis.

5.25 A matrix with orthonormal columns is an orthogonal matrix.


5.26 Any orthogonal matrix is invertible.

5.27 If U is an orthogonal matrix, then U must be square.

5.28 If U is an orthogonal matrix, then so is U2.

5.29 If U and V are orthogonal matrices of same size, then so is UV.

5.30 If U is an orthogonal matrix, then so is −U.

5.31 If U is an orthogonal matrix, then so is Ut.

5.32 If U is an orthogonal matrix, then so is U−1.

5.33 If U and V are orthogonal matrices of same size, then so is U + V.

5.34 If U and V are orthogonal matrices of same size, then so is UV−1.

5.35 If U and V are orthogonal matrices of same size, then U2V3 is an orthogonal matrix.

5.36 If the columns of an n × n matrix U form an orthonormal basis of Rn, then UtU is diagonal.

5.37 If the rows of an n × n matrix U form an orthonormal basis of Rn, then UtU is diagonal.

5.38 If UtU is diagonal, then the columns of U are linearly independent.

5.39 If UtU is diagonal for an n × n matrix U, then the columns of U form an orthogonal basis of Rn.

5.40 projW (u + v) = projW u + projW v.


5.41 projW (2u + 3v) = 2 projW u + 3 projW v.

5.42 If z is orthogonal to u1 and to u2 and if W = span{u1, u2}, then z must be in W⊥.

5.43 If the orthogonal projection of u onto S is x, then the orthogonal projection of 2u onto S⊥ is 2u − 2x.

5.44 If the orthogonal projections of u and v onto S are x and y, then the orthogonal projection of u + v onto S is x + y.

5.45 projspan{u1,u2} v = projspan{u1} v + projspan{u2} v.

5.46 If u ∈ W, then the orthogonal projection of u onto W is u.

5.47 If the orthogonal projection of y onto W⊥ is y, then the orthogonal projection of 10y onto W⊥ is 10y.

5.48 Vectors in ColA are orthogonal to vectors in NulA.

5.49 Vectors in RowA are orthogonal to vectors in NulA.

5.50 Vectors in ColA are orthogonal to vectors in RowA.


Chapter 5

Vector Geometry (Worked Examples)

Example 5.1 (Dot product)

Find the dot product of x = (2, 2, 1) and y = (2, 5,−3).

Solution The dot product (or inner product or scalar product) of x and y is given by

x · y = (2, 2, 1) · (2, 5,−3) = 2 · 2 + 2 · 5 + 1 · (−3) = 11.

That is, x · y is obtained by multiplying corresponding components and adding the resulting products. The vectors x and y are said to be orthogonal (or perpendicular) if their dot product is zero, that is, if x · y = 0. Therefore, for this example, the two given vectors x and y are not orthogonal. □
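The computation in Example 5.1 is easy to mechanize. Below is a minimal Python sketch; the helper names `dot` and `is_orthogonal` are our own choice, not from the study guide:

```python
def dot(x, y):
    """Dot product: multiply corresponding components and add the products."""
    return sum(a * b for a, b in zip(x, y))

def is_orthogonal(x, y):
    """Two vectors are orthogonal exactly when their dot product is zero."""
    return dot(x, y) == 0

x = (2, 2, 1)
y = (2, 5, -3)
print(dot(x, y))            # 2*2 + 2*5 + 1*(-3) = 11
print(is_orthogonal(x, y))  # False, since the dot product is 11, not 0
```

The helpers work for vectors of any dimension, as long as both tuples have the same length.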

Example 5.2 (Norm)

For x = (1,−2, 3,−4, 5), find the norm ‖x‖.

Solution The norm of x is given by

‖x‖ = √(x · x) = √(1² + (−2)² + 3² + (−4)² + 5²) = √55.

The norm (or length) of a vector x in Rn, denoted by ‖x‖, is defined to be the nonnegative square root of x · x. In particular, if x = (x1, x2, · · · , xn), then ‖x‖ = √(x1² + x2² + · · · + xn²). That is, ‖x‖ is the square root of the sum of the squares of the components of x. Thus, ‖x‖ ≥ 0, and ‖x‖ = 0 if and only if x = 0. □

Example 5.3 (Normalize a vector)

Find the rescaled vector x/‖x‖, where x = (2, 2, 1).

Solution By finding the norm ‖x‖ = √(2² + 2² + 1²) = 3, we could normalize x to the following unit vector

x/‖x‖ = (1/3)(2, 2, 1) = (2/3, 2/3, 1/3).

Verify that the norm of the above rescaled vector is √((2/3)² + (2/3)² + (1/3)²) = 1. In general, a vector x is called a unit vector if ‖x‖ = 1 or, equivalently, if x · x = 1. For any nonzero vector x in Rn, the vector x̂ = (1/‖x‖)x = x/‖x‖ is the unique unit vector in the same direction as x. The process of finding x̂ from x is called normalizing x. □
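Continuing in the same spirit, the norm and the normalization step of Examples 5.2 and 5.3 can be sketched as follows; `norm` and `normalize` are hypothetical helper names of our own:

```python
import math

def norm(x):
    """Euclidean norm: the nonnegative square root of x . x."""
    return math.sqrt(sum(a * a for a in x))

def normalize(x):
    """Rescale a nonzero vector to the unit vector in the same direction."""
    n = norm(x)
    return tuple(a / n for a in x)

print(norm((1, -2, 3, -4, 5)))     # sqrt(55) ~ 7.4162, as in Example 5.2
print(normalize((2, 2, 1)))        # (2/3, 2/3, 1/3) as floats, as in Example 5.3
print(norm(normalize((2, 2, 1))))  # 1.0 up to rounding: the result is a unit vector
```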

Example 5.4 (Schwarz inequality)

Prove that |x · y| ≤ ‖x‖ ‖y‖.

Solution For any real number t, we have

0 ≤ (tx + y) · (tx + y) = t²(x · x) + 2t(x · y) + (y · y) = ‖x‖² t² + 2(x · y) t + ‖y‖².

Let a = ‖x‖², b = 2(x · y), c = ‖y‖². Then, for every value of t, at² + bt + c ≥ 0. This means that the quadratic polynomial cannot have two distinct real roots, which implies that the discriminant D = b² − 4ac ≤ 0. Equivalently, b² ≤ 4ac. Thus,

4(x · y)² ≤ 4‖x‖² ‖y‖².

Dividing by 4 and taking the square root of both sides gives the inequality. □

Remark. The angle θ between nonzero vectors x and y in Rn is defined by

cos θ = (x · y) / (‖x‖ ‖y‖).

This definition is well-defined, since, by the Schwarz inequality, −1 ≤ (x · y)/(‖x‖ ‖y‖) ≤ 1. Thus, −1 ≤ cos θ ≤ 1, and so the angle exists and is unique. Note that if x · y = 0, then θ = π/2. This agrees with our previous definition of orthogonality.

Example 5.5 (Angle between vectors)

Consider the vectors x = (2, 3, 5) and y = (1,−4, 3) in R3. Find cos θ, where θ is the angle between them.

Solution The angle θ between x and y is given by

cos θ = (x · y) / (‖x‖ ‖y‖) = 5 / (√38 √26).

Note that θ is an acute angle, since cos θ is positive. □
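The angle formula lends itself to a quick numerical check of Example 5.5. A sketch, with `cos_angle` a name of our own choosing:

```python
import math

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def cos_angle(x, y):
    """cos(theta) = (x . y) / (||x|| ||y||), for nonzero vectors x and y."""
    return dot(x, y) / (math.sqrt(dot(x, x)) * math.sqrt(dot(y, y)))

c = cos_angle((2, 3, 5), (1, -4, 3))
print(c)      # 5 / (sqrt(38) * sqrt(26)) ~ 0.159
print(c > 0)  # True: the angle is acute
```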

Example 5.6 (Minkowski’s inequality)

Prove that ‖x + y‖ ≤ ‖x‖ + ‖y‖.

Solution By the Schwarz inequality and other properties of dot product, we have

‖x + y‖² = (x + y) · (x + y) = (x · x) + 2(x · y) + (y · y) ≤ ‖x‖² + 2‖x‖ ‖y‖ + ‖y‖² = (‖x‖ + ‖y‖)².

Taking the square root of both sides gives the inequality. □

Remark. Minkowski’s inequality is often known as the triangle inequality, because if we view x + y as one side of the triangle formed with sides x and y, then the inequality says that the length of one side of a triangle cannot be greater than the sum of the lengths of the other two sides.
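Examples do not prove either inequality, but a numeric spot-check can catch algebra slips. The sketch below verifies both the Schwarz and Minkowski inequalities for one sample pair of vectors (chosen arbitrarily for illustration):

```python
import math

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def norm(x):
    return math.sqrt(dot(x, x))

x, y = (1, -2, 3), (4, 0, -1)
s = tuple(a + b for a, b in zip(x, y))   # the vector x + y

schwarz_ok = abs(dot(x, y)) <= norm(x) * norm(y)
triangle_ok = norm(s) <= norm(x) + norm(y)
print(schwarz_ok, triangle_ok)   # True True
```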


Example 5.7 (Orthogonal / Orthonormal set)

Determine whether the vectors u1 = (1, 3, 4), u2 = (−1,−1, 1) form an orthogonal set. If they do, further make them orthonormal.

Solution Since u1 · u2 = −1 − 3 + 4 = 0, we know that the vectors u1 and u2 are orthogonal. In other words, u1, u2 form an orthogonal set. Then by the norms ‖u1‖ = √26, ‖u2‖ = √3, we further obtain an orthonormal set v1, v2, where

v1 = u1/‖u1‖ = (1, 3, 4)/√26 = (1/√26, 3/√26, 4/√26),
v2 = u2/‖u2‖ = (−1,−1, 1)/√3 = (−1/√3, −1/√3, 1/√3). □

Remark. Generally speaking, vectors v1, v2, · · · , vk in Rn are said to form an orthogonal set of vectors if each pair of vectors is orthogonal, that is, vi · vj = 0 for i ≠ j. Stronger than that, vectors v1, v2, · · · , vk in Rn are said to form an orthonormal set of vectors if the vectors are unit vectors and they form an orthogonal set, that is,

vi · vj = 0 if i ≠ j, and vi · vj = 1 if i = j.

Normalizing an orthogonal set refers to multiplying each vector in the set by the reciprocal of its length in order to transform the set into an orthonormal set of vectors. Of course, we have assumed that there is no zero vector in the orthogonal set; otherwise, division by zero would occur.
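The pairwise conditions above can be checked mechanically. A sketch, with helper names `is_orthogonal_set` and `is_orthonormal_set` of our own choosing:

```python
import math
from itertools import combinations

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def is_orthogonal_set(vectors, tol=1e-12):
    """Every pair of distinct vectors must have (near-)zero dot product."""
    return all(abs(dot(u, v)) <= tol for u, v in combinations(vectors, 2))

def is_orthonormal_set(vectors, tol=1e-12):
    """An orthogonal set in which every vector is a unit vector."""
    return (is_orthogonal_set(vectors, tol)
            and all(abs(dot(v, v) - 1) <= tol for v in vectors))

u1, u2 = (1, 3, 4), (-1, -1, 1)
print(is_orthogonal_set([u1, u2]))    # True, as in Example 5.7
print(is_orthonormal_set([u1, u2]))   # False: these are not unit vectors
v1 = tuple(a / math.sqrt(26) for a in u1)
v2 = tuple(a / math.sqrt(3) for a in u2)
print(is_orthonormal_set([v1, v2]))   # True after normalizing
```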

Example 5.8 (Orthogonal / Orthonormal set)

Determine all values of k so that the two vectors (1, 2,−3) and (k2, 1, k) are orthogonal.

Solution Two vectors are orthogonal if and only if their dot product is zero. Therefore, by

(1, 2,−3) · (k², 1, k) = 0 ⇐⇒ k² − 3k + 2 = 0 ⇐⇒ (k − 1)(k − 2) = 0,

the two vectors are orthogonal for

k = 1 or k = 2. □

Remark. Now think about how to verify whether three or more vectors form an orthogonal set. In fact, for more than two vectors, we should be more careful in the verification process. The vectors v1, v2, · · · , vk are said to form an orthogonal set if they are pairwise orthogonal (or mutually orthogonal). That is,

v1 · v2 = 0, v1 · v3 = 0, v1 · v4 = 0, · · · , v1 · vk = 0,
v2 · v3 = 0, v2 · v4 = 0, · · · , v2 · vk = 0,
v3 · v4 = 0, · · · , v3 · vk = 0,
...
vk−1 · vk = 0.

We emphasize that among the vectors of an orthogonal set, it is permissible that some vj’s are zero vectors. This is one important difference between an orthogonal set and an orthonormal set: in an orthonormal set, all vectors are unit vectors, so none of them is zero.


Example 5.9 (Orthogonal / Orthonormal set)

Let v1 = (1, 2, 5), v2 = (1, 2,−1), v3 = (1,−3, 1). Do v1, v2, v3 form an orthogonal set? If yes, rescale the vectors to make them orthonormal.

Solution It could be verified that

v1 · v2 = 1 + 4 − 5 = 0,

v1 · v3 = 1 − 6 + 5 = 0,

v2 · v3 = 1 − 6 − 1 = −6.

Since v2 · v3 ≠ 0, the vectors v1, v2, v3 are not orthogonal. □

Example 5.10 (Orthogonal / Orthonormal set)

Let v1 = (1, 2, 1), v2 = (−1, 1,−1), v3 = (1, 0,−1). Do v1, v2, v3 form an orthogonal set? If yes, rescale the vectors to make them orthonormal.

Solution It could be verified that

v1 · v2 = −1 + 2 − 1 = 0,

v1 · v3 = 1 + 0 − 1 = 0,

v2 · v3 = −1 + 0 + 1 = 0.

Hence, v1, v2, v3 form an orthogonal set. By the norms ‖v1‖ = √6, ‖v2‖ = √3, ‖v3‖ = √2, we obtain an orthonormal set consisting of the three unit vectors

v1/‖v1‖ = (1/√6, 2/√6, 1/√6), v2/‖v2‖ = (−1/√3, 1/√3, −1/√3), v3/‖v3‖ = (1/√2, 0, −1/√2). □

Remark. By a basis of Rn we mean a set of n vectors in Rn which are linearly independent. In addition,if the n vectors in Rn form an orthogonal set, then we call it an orthogonal basis of Rn. An orthogonal basisof Rn can always be normalized to form an orthonormal basis of Rn.

Example 5.11 (Orthogonal basis)

Show that the standard basis of Rn is orthonormal for every n.

Solution We consider n = 3 only. Let {e1, e2, e3} = {(1, 0, 0), (0, 1, 0), (0, 0, 1)} be the standard basis of R3. It is clear that

e1 · e2 = e1 · e3 = e2 · e3 = 0,
e1 · e1 = e2 · e2 = e3 · e3 = 1.

Namely, {e1, e2, e3} is an orthonormal basis of R3. More generally, the standard basis of Rn is orthonormal for every n. □


Example 5.12 (Orthogonal basis)

Show that v1 = (1, 3,−1), v2 = (1,−1,−2) are orthogonal. Find a third vector v3 so that v1, v2, v3 form an orthogonal basis of R3.

Solution Verify that v1 · v2 = 1 − 3 + 2 = 0, so v1 and v2 are orthogonal. We need an additional vector v = (x, y, z) such that v is orthogonal to both v1 and v2. That is, v · v1 = 0 and v · v2 = 0. This yields the homogeneous system

x + 3y − z = 0,
x − y − 2z = 0.

Let A be the coefficient matrix of the system. By doing row operations,

A =
[ 1  3 −1 ]      [ 1 0 −7/4 ]
[ 1 −1 −2 ]  −→  [ 0 1  1/4 ].

Here only the third column is nonpivot and z is the only free variable. The general solution is given by

(x, y, z) = (7z/4, −z/4, z) = (z/4)(7,−1, 4).

Thus we may take

v3 = (7,−1, 4).

Now, as we want, the three (nonzero) vectors v1, v2, v3 form an orthogonal set. Furthermore, it follows from Theorem 10.2.1 (Lecture Notes, page 191) that v1, v2, v3 are linearly independent. Accordingly, v1, v2, v3 form an orthogonal basis of R3. □

Remark. We may interpret Theorem 10.2.1 (Lecture Notes, page 191) as follows.

Suppose S is an orthogonal set of nonzero vectors. Then S is linearly independent. (5.1)

Proof. Suppose S = {v1, v2, · · · , vk} and suppose

c1v1 + c2v2 + · · · + ckvk = 0. (5.2)

Taking the dot product of (5.2) with v1, we get

0 = 0 · v1 = (c1v1 + c2v2 + · · · + ckvk) · v1
  = c1 v1 · v1 + c2 v2 · v1 + · · · + ck vk · v1
  = c1 v1 · v1 + c2 · 0 + · · · + ck · 0
  = c1 v1 · v1.

Since v1 ≠ 0, we have v1 · v1 ≠ 0. Thus c1 = 0. Similarly, for i = 2, · · · , k, taking the dot product of (5.2) with vi,

0 = 0 · vi = (c1v1 + c2v2 + · · · + ckvk) · vi
  = c1 v1 · vi + · · · + ci vi · vi + · · · + ck vk · vi
  = ci vi · vi.

But vi · vi ≠ 0, and hence ci = 0. Thus S is linearly independent.


Example 5.13 (Orthogonal basis)

Show that the columns of the following matrix U form an orthogonal basis of R3. Hence find U−1 and express (1, 2, 3) as a linear combination of the columns of U.

U =
[ 1  2 −2 ]
[ 0  1  5 ]
[ 2 −1  1 ].

Solution By direct multiplication of matrices, we have

UtU =
[  1  0  2 ] [ 1  2 −2 ]   [ 5 0  0 ]
[  2  1 −1 ] [ 0  1  5 ] = [ 0 6  0 ],
[ −2  5  1 ] [ 2 −1  1 ]   [ 0 0 30 ]

and we denote the last diagonal matrix as D := diag(5, 6, 30). That is, UtU = D. Now we recall the fact

UtU is diagonal ⇐⇒ columns of U are orthogonal (5.3)

that the three columns of U form an orthogonal set. Since none of the columns are zero, by (5.1) the three columns of U are linearly independent and hence form an orthogonal basis of R3. Since det D ≠ 0, D is invertible, and D−1UtU = I implies that U is invertible with inverse

U−1 = D−1Ut =
[ 1/5  0    0   ] [  1  0  2 ]   [  1/5    0    2/5  ]
[ 0   1/6   0   ] [  2  1 −1 ] = [  1/3   1/6  −1/6  ].
[ 0    0   1/30 ] [ −2  5  1 ]   [ −1/15  1/6   1/30 ]

Let v1 = (1, 0, 2), v2 = (2, 1,−1), v3 = (−2, 5, 1) be the three columns of U. That is, U = [v1 v2 v3]. By Theorem 10.2.1 (Lecture Notes, page 191), since v1, v2, v3 form an orthogonal basis of R3, we can express x = (1, 2, 3) as a linear combination of v1, v2, v3. We first find

x · v1 = 7, x · v2 = 1, x · v3 = 11,
v1 · v1 = 5, v2 · v2 = 6, v3 · v3 = 30.

Therefore,

x = (x · v1)/(v1 · v1) v1 + (x · v2)/(v2 · v2) v2 + (x · v3)/(v3 · v3) v3 = (7/5)v1 + (1/6)v2 + (11/30)v3. □
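The coefficient formula used above, ci = (x · vi)/(vi · vi), is easy to mechanize. The sketch below recomputes the coordinates of x = (1, 2, 3) with respect to the orthogonal basis formed by the columns of U; the helper name `coords_in_orthogonal_basis` is our own:

```python
def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def coords_in_orthogonal_basis(x, basis):
    """c_i = (x . v_i) / (v_i . v_i) for an orthogonal basis of nonzero vectors."""
    return [dot(x, v) / dot(v, v) for v in basis]

v1, v2, v3 = (1, 0, 2), (2, 1, -1), (-2, 5, 1)   # columns of U in Example 5.13
x = (1, 2, 3)
c = coords_in_orthogonal_basis(x, [v1, v2, v3])
print(c)   # the coefficients 7/5, 1/6, 11/30, as found above

# Rebuild x from the coefficients as a consistency check.
rebuilt = tuple(sum(ci * v[j] for ci, v in zip(c, (v1, v2, v3)))
                for j in range(3))
print(rebuilt)   # approximately (1.0, 2.0, 3.0), up to rounding
```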

Remark. (i) We note that the matrix U in (5.3) could be non-square, and even some columns of U could be zero vectors. However, if all columns of U are unit vectors, we have a stronger version of (5.3):

UtU = I (the identity matrix) ⇐⇒ columns of U are orthonormal. (5.4)

In fact, (5.3) (resp. (5.4)) can be used to verify whether the columns of a given matrix U form an orthogonal set (resp. an orthonormal set). If U is square and none of its columns are zero, then by (5.1) one can further verify whether the columns of U form an orthogonal basis (resp. an orthonormal basis) of Rn. In this case, since U is square, UtU = I ⇐⇒ UUt = I ⇐⇒ U is invertible and U−1 = Ut. A square matrix U satisfying U−1 = Ut is called an orthogonal matrix.

(ii) By Theorem 10.2.1 (Lecture Notes, page 191), if we write x = c1v1 + c2v2 + · · · + ckvk, the scalars c1, c2, · · · , ck can be formally determined provided that v1, v2, · · · , vk are nonzero and form an orthogonal set. In Example 5.13, the three columns of U satisfy this requirement since they form an orthogonal basis of R3, as indeed the question required us to show.


Example 5.14 (Orthogonal basis)

Show that the columns of the following matrix U form an orthogonal basis of R4. Then express (1, 0, 0, 0) and (0, 1, 2, 3) as linear combinations of the columns of U.

U =
[ 1 −1 −1  2 ]
[ 1  1 −2 −1 ]
[ 1  1  2  1 ]
[ 1 −1  1 −2 ].

Solution The columns of U are orthogonal because

UtU =
[  1  1  1  1 ] [ 1 −1 −1  2 ]   [ 4 0  0  0 ]
[ −1  1  1 −1 ] [ 1  1 −2 −1 ]   [ 0 4  0  0 ]
[ −1 −2  2  1 ] [ 1  1  2  1 ] = [ 0 0 10  0 ]
[  2 −1  1 −2 ] [ 1 −1  1 −2 ]   [ 0 0  0 10 ]

is a diagonal matrix. The four nonzero columns of U form an orthogonal basis of R4. Let x = (1, 0, 0, 0), y = (0, 1, 2, 3) and write U = [u1 u2 u3 u4], where uj is the j-th column of U. Then

x = (x · u1)/(u1 · u1) u1 + (x · u2)/(u2 · u2) u2 + (x · u3)/(u3 · u3) u3 + (x · u4)/(u4 · u4) u4 = (1/4)u1 − (1/4)u2 − (1/10)u3 + (1/5)u4,

y = (y · u1)/(u1 · u1) u1 + (y · u2)/(u2 · u2) u2 + (y · u3)/(u3 · u3) u3 + (y · u4)/(u4 · u4) u4 = (3/2)u1 + 0u2 + (1/2)u3 − (1/2)u4. □

Remark. As we mentioned in the previous remark (after Example 5.13), if U is an n × n real matrix, then the following are equivalent:

1. U is an orthogonal matrix.

2. The columns of U form an orthonormal basis of Rn.

3. UtU = I.

4. U is invertible, and U−1 = Ut.

In fact, since U is square, we further have

U is an orthogonal matrix ⇐⇒ columns of U form an orthonormal basis of Rn
⇐⇒ UtU = I
⇐⇒ UUt = I
⇐⇒ rows of U form an orthonormal basis of Rn.

In the above, UtU = I ⇐⇒ UUt = I follows from the fact that for any square matrices A, B, AB = I =⇒ BA = I (Review Notes for Linear Algebra – True or False, 5.12). Then UUt = I ⇐⇒ (Ut)tUt = I ⇐⇒ columns of Ut form an orthonormal basis of Rn ⇐⇒ rows of U form an orthonormal basis of Rn. So next time you see the keyword “orthogonal matrix”, you may recall any of the above equivalent statements as necessary.
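These equivalences can be probed numerically. The sketch below checks UtU = I and UUt = I for a 2 × 2 rotation matrix of the form appearing in Example 5.15(b), using tiny hand-rolled matrix helpers (our own names, not a library API):

```python
import math

def transpose(M):
    return [list(row) for row in zip(*M)]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def is_identity(M, tol=1e-12):
    n = len(M)
    return all(abs(M[i][j] - (1 if i == j else 0)) <= tol
               for i in range(n) for j in range(n))

t = 0.7   # any angle works
P = [[math.cos(t), math.sin(t)],
     [-math.sin(t), math.cos(t)]]
print(is_identity(matmul(transpose(P), P)))   # True: PtP = I
print(is_identity(matmul(P, transpose(P))))   # True: PPt = I too, since P is square
```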


Example 5.15 (Orthogonal matrix)

(a) Let

P =
[ 1/√3   1/√3   1/√3 ]
[  0     1/√2  −1/√2 ]
[ 2/√6  −1/√6  −1/√6 ].

The columns (as well as the rows) of P are orthogonal to each other and are unit vectors. Thus P is an orthogonal matrix.

(b) Let P be a 2 × 2 orthogonal matrix. Then, for some real number θ, we have

P =
[  cos θ  sin θ ]
[ −sin θ  cos θ ]

or

P =
[ cos θ   sin θ ]
[ sin θ  −cos θ ]. □

Example 5.16 (Orthogonal matrix)

Prove that the product of two orthogonal matrices is again orthogonal.

Solution Suppose P and Q are orthogonal matrices. It follows that P−1 = Pt and Q−1 = Qt. Then

(PQ)−1 = Q−1P−1 = QtPt = (PQ)t,

which implies that PQ is again orthogonal. □

Example 5.17 (Orthogonal matrix)

Prove that the determinant of an orthogonal matrix is ±1.

Solution Suppose P is an orthogonal matrix. Then P−1 = Pt. It follows that

1 = det I = det (PP−1) = det (PPt) = detP · detPt.

But recall that det Pt = det P, and hence (det P)² = 1. Therefore,

det P = ±1. □

Example 5.18 (Orthogonal matrix)

Let U be a square matrix. Show that if the columns of U are orthonormal, then the rows of U are also orthonormal. Give an example of a square matrix A such that the columns of A are orthogonal, but the rows of A are not.

Solution By (5.4), if the columns of U are orthonormal, then UtU = I. Given that U is a square matrix, U is indeed an orthogonal matrix and U−1 = Ut. It follows that

(Ut)t (Ut) = UUt = UU−1 = I.

The above implies that the columns of Ut are orthonormal. Equivalently, the rows of U are orthonormal.

Take

A =
[  2  1/2 ]
[ −1   1  ].

Then the columns of A are orthogonal but the rows are not. □


Example 5.19 (Orthogonal projection)

Find the orthogonal projection of x = (4,−4, 3) onto the line spanned by v = (5,−1, 2). Then find the distance from x to the line.

Solution Let S be the subspace (line) of R3 spanned by v. That is, S = span(v) = {kv : k any number}. Then the orthogonal projection of x onto S is

projS x = (x · v)/(v · v) v = (30/30) v = (5,−1, 2).

Since projS x ≠ x, we see that x is not in the span S. The distance from x to S is

dist(x, S) = ‖x − projS x‖ = ‖(4,−4, 3) − (5,−1, 2)‖ = ‖(−1,−3, 1)‖ = √11. □

Remark. (i) Recall from Theorem 10.3.1 (Lecture Notes, page 195) that the orthogonal projection of x onto a subspace S of Rn can be formally determined provided that an orthogonal basis of S is known. If the subspace S is given as a span of some vectors (say v1, v2, · · · , vk), then you are required to prove that the vectors v1, v2, · · · , vk are orthogonal (and none of them are zero), so that they are linearly independent by (5.1) and hence form an orthogonal basis for S. In general, if you are required to find the orthogonal projection of a vector x onto some subspace of Rn (for example, the null space, row space, or column space of some matrix), you should likewise first find an orthogonal basis of the subspace and then use Theorem 10.3.1 (Lecture Notes, page 195) to construct the orthogonal projection.
(ii) If projS x ≠ x, then x ∉ S. The (shortest) distance from x to S is given by the norm of the orthogonal projection of x onto the orthogonal complement of S. That is, dist(x, S) = ‖z‖ = ‖x − y‖ = ‖x − projS x‖.
(iii) We just used the keyword “orthogonal complement”, so let us give a formal definition of this concept. Let S be a subspace of Rn. The orthogonal complement of S, denoted by S⊥, consists of those vectors in Rn that are orthogonal to every vector y in S, that is,

S⊥ = {z ∈ Rn : z · y = 0 for every y ∈ S}.

For example, suppose u is the orthogonal projection of v onto W⊥; then what is the orthogonal projection of −v onto W? Is it v + u, v − u, −v − u, or −v + u? In fact, the projection of −v onto W⊥ is −u. Therefore the projection of −v onto W is (−v) − (−u) = u − v.

Example 5.20 (Orthogonal projection)

Find the orthogonal projection of x = (1,−2,−7) onto the plane spanned by the orthogonal vectors

v = (1, 1,−1), w = (1,−2,−1).

Then find the distance from x to the plane.

Solution By v · w = 1 − 2 + 1 = 0, we know that v and w are indeed orthogonal. Let S be the subspace of R3 spanned by v and w. Since both v and w are nonzero, by (5.1) they are linearly independent and hence form an orthogonal basis for S. That is, S = span(v, w) is a plane in R3. By Theorem 10.3.1 (Lecture Notes, page 195), the orthogonal projection of x onto S is

projS x = (x · v)/(v · v) v + (x · w)/(w · w) w = 2v + 2w = (4,−2,−4).

Since projS x ≠ x, we see that x is not in the plane S. The distance from x to the plane S is

dist(x, S) = ‖x − projS x‖ = ‖(1,−2,−7) − (4,−2,−4)‖ = ‖(−3, 0,−3)‖ = 3√2. □
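The projection formula of Theorem 10.3.1 can be sketched directly in code. The helper `project` below (a name of our own choosing) assumes the basis passed in is an orthogonal set of nonzero vectors, exactly as the theorem requires:

```python
def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def project(x, basis):
    """Orthogonal projection of x onto span(basis), for an orthogonal basis."""
    out = [0.0] * len(x)
    for v in basis:
        c = dot(x, v) / dot(v, v)   # component of x along v
        for j in range(len(x)):
            out[j] += c * v[j]
    return tuple(out)

x = (1, -2, -7)
v, w = (1, 1, -1), (1, -2, -1)
p = project(x, [v, w])
print(p)   # (4.0, -2.0, -4.0), as in Example 5.20

diff = [a - b for a, b in zip(x, p)]
d = dot(diff, diff) ** 0.5
print(d)   # the distance 3*sqrt(2) ~ 4.243
```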


Example 5.21 (Orthogonal projection)

Find the orthogonal projection of x = (2, 1, 3,−2) onto the plane spanned by orthogonal vectors

v = (1, 1, 1, 1), w = (2,−1, 1,−2).

Then find the distance from x to the plane.

Solution By v · w = 2 − 1 + 1 − 2 = 0, we know that v and w are indeed orthogonal. Let S be the subspace of R4 spanned by v and w. Since both v and w are nonzero, by (5.1) they are linearly independent and hence form an orthogonal basis for S. That is, S = span(v, w) is a plane in R4. By Theorem 10.3.1 (Lecture Notes, page 195), the orthogonal projection of x onto S is

projS x = (x · v)/(v · v) v + (x · w)/(w · w) w = (4/4)v + (10/10)w = (3, 0, 2,−1).

Since projS x ≠ x, we see that x is not in the plane S. The distance from x to the plane S is

dist(x, S) = ‖x − projS x‖ = ‖(2, 1, 3,−2) − (3, 0, 2,−1)‖ = ‖(−1, 1, 1,−1)‖ = 2. □

Example 5.22 (Orthogonal projection)

Find the orthogonal projection of x = (3, 4,−2) onto the subspace of R3 which has the orthonormal basis

v1 = (1/3)(2, 1, 2), v2 = (1/√18)(1,−4, 1).

Then find the distance from x to the subspace of R3.

Solution Let S be the subspace of R3 spanned by v1 and v2. Recall from Theorem 10.3.1 (Lecture Notes, page 195) that we need an orthogonal basis for S before we can write down the orthogonal projection. For this example, we are given an orthonormal basis, which is in particular an orthogonal basis. For easier hand calculations, we may take

u1 = (2, 1, 2), u2 = (1,−4, 1)

as the orthogonal basis for S. That is, S = span(u1, u2) is a plane in R3. We then use u1, u2 to construct the orthogonal projection. The orthogonal projection of x onto S is

projS x = (x · u1)/(u1 · u1) u1 + (x · u2)/(u2 · u2) u2 = (6/9)u1 + (−15/18)u2 = (1/2, 4, 1/2).

Since projS x ≠ x, we see that x is not in the span S. The distance from x to S is

dist(x, S) = ‖x − projS x‖ = ‖(3, 4,−2) − (1/2, 4, 1/2)‖ = √(25/2) = 5√2/2. □

Remark. In Examples 5.19–5.22, each subspace S of Rn is given as the span of some vectors in Rn that are already pairwise orthogonal. However, in general, vectors in Rn are not necessarily orthogonal, and you need to first “orthogonalize” them to make them usable. We will see an example of this kind later (Example 5.27).


Example 5.23 (Orthogonal diagonalization)

Orthogonally diagonalize the symmetric matrix

A =
[  1  0 −1 ]
[  0  1  1 ]
[ −1  1  2 ].

That is, find an orthogonal matrix Q and diagonal matrix D so that Q−1AQ = D.

Solution Recall the fact that all real symmetric matrices are diagonalizable. Hence, A is diagonalizable and has a diagonalization A = PDP−1 for some invertible P and diagonal D. Normally, P is constructed as a column-partitioned matrix with eigenvectors of A as its columns. That is, P = [v1 v2 v3], where v1, v2, v3 are eigenvectors of A. Now back to our problem: we need to orthogonally diagonalize A. The keyword here is “orthogonally”, so we need to find an invertible matrix that is also an orthogonal matrix. Here we use Q (instead of P) to denote this matrix, for its orthogonal property. We thus need to guarantee that the eigenvectors v1, v2, v3 of A form an orthonormal set (pairwise orthogonal unit vectors).

The characteristic equation of A is det(A − λI) = 0, or

det [ 1−λ   0    −1  ]
    [  0   1−λ    1  ]  = λ(1 − λ)(λ − 3) = 0,
    [ −1    1   2−λ  ]

which gives the distinct real eigenvalues λ1 = 0, λ2 = 1, λ3 = 3.

· For λ1 = 0,

A − 0 · I =
[  1  0 −1 ]      [ 1 0 −1 ]
[  0  1  1 ]  −→  [ 0 1  1 ]
[ −1  1  2 ]      [ 0 0  0 ].

By solving (A − 0 · I)x = 0, x = (x1, x2, x3), we have x1 = x3 and x2 = −x3. Thus,

x = (x3, −x3, x3) = x3 (1,−1, 1),

and we get the eigenvector v1 = (1,−1, 1).

· For λ2 = 1,

A − 1 · I =
[  0  0 −1 ]      [ 1 −1 0 ]
[  0  0  1 ]  −→  [ 0  0 1 ]
[ −1  1  1 ]      [ 0  0 0 ].

By solving (A − 1 · I)x = 0, x = (x1, x2, x3), we have x1 = x2 and x3 = 0. Thus,

x = (x2, x2, 0) = x2 (1, 1, 0),

and we get the eigenvector v2 = (1, 1, 0).

· For λ3 = 3,

A − 3 · I =
[ −2  0 −1 ]      [ 1 0  1/2 ]
[  0 −2  1 ]  −→  [ 0 1 −1/2 ]
[ −1  1 −1 ]      [ 0 0   0  ].

By solving (A − 3 · I)x = 0, x = (x1, x2, x3), we have x1 = −x3/2 and x2 = x3/2. Thus,

x = (−x3/2, x3/2, x3) = (x3/2)(−1, 1, 2),

and we get the eigenvector v3 = (−1, 1, 2).


Verify that

v1 · v2 = 1 − 1 + 0 = 0, v1 · v3 = −1 − 1 + 2 = 0, v2 · v3 = −1 + 1 + 0 = 0.

“Luckily”, v1, v2, v3 are pairwise orthogonal and hence form an orthogonal set. Since eigenvectors must be nonzero, v1, v2, v3 are linearly independent by (5.1) and hence form an orthogonal basis of R3. Obviously, v1, v2, v3 are not unit vectors, so we need to first normalize them.

By ‖v1‖ = √3, ‖v2‖ = √2, ‖v3‖ = √6, we further obtain an orthonormal basis of R3 consisting of

x1 = v1/‖v1‖ = (1/√3)(1,−1, 1), x2 = v2/‖v2‖ = (1/√2)(1, 1, 0), x3 = v3/‖v3‖ = (1/√6)(−1, 1, 2).

Thus if we construct Q = [x1 x2 x3], we can use it to diagonalize A such that Q−1AQ = D, where

Q =
[  1/√3  1/√2  −1/√6 ]
[ −1/√3  1/√2   1/√6 ]
[  1/√3   0     2/√6 ],

D =
[ 0 0 0 ]
[ 0 1 0 ]
[ 0 0 3 ].

Here we emphasize that Q is an orthogonal matrix: (1) it satisfies Q−1 = Qt, and (2) its columns form an orthonormal basis of R3. Please review the equivalent statements in the remark following Example 5.14. □
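As a sanity check on this diagonalization, the sketch below verifies numerically that Q is orthogonal (QtQ = I) and that QtAQ = D, using small hand-rolled matrix helpers (our own names, not a library API):

```python
import math

def transpose(M):
    return [list(row) for row in zip(*M)]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def close(M, N, tol=1e-9):
    return all(abs(a - b) <= tol
               for ra, rb in zip(M, N) for a, b in zip(ra, rb))

A = [[1, 0, -1], [0, 1, 1], [-1, 1, 2]]
s3, s2, s6 = math.sqrt(3), math.sqrt(2), math.sqrt(6)
Q = [[1/s3, 1/s2, -1/s6],
     [-1/s3, 1/s2, 1/s6],
     [1/s3, 0, 2/s6]]
D = [[0, 0, 0], [0, 1, 0], [0, 0, 3]]
I = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]

print(close(matmul(transpose(Q), Q), I))              # True: Qt = Q^-1
print(close(matmul(matmul(transpose(Q), A), Q), D))   # True: QtAQ = D
```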

Remark. For Example 5.23, the three eigenvectors of A are already orthogonal by our construction. However, in many cases the eigenvectors are not orthogonal, and we then need an algorithm to convert them to orthogonal vectors. One method for this purpose is the Gram–Schmidt orthogonalization process, whose details we illustrate in the following. Suppose {u1, u2, · · · , un} is a basis of a subspace V. One can use this basis to construct an orthogonal basis {v1, v2, · · · , vn} of V as follows. Set

v1 = u1,
v2 = u2 − (u2 · v1)/(v1 · v1) v1,
v3 = u3 − (u3 · v1)/(v1 · v1) v1 − (u3 · v2)/(v2 · v2) v2,
...
vn = un − (un · v1)/(v1 · v1) v1 − (un · v2)/(v2 · v2) v2 − · · · − (un · vn−1)/(vn−1 · vn−1) vn−1.

In other words, for k = 2, 3, · · · , n, we define

vk = uk − ck1 v1 − ck2 v2 − · · · − ck,k−1 vk−1,

where cki = (uk · vi)/(vi · vi) is the component of uk along vi. In fact, each vk is orthogonal to the preceding v’s. Thus v1, v2, · · · , vn form an orthogonal basis for V as claimed. Normalizing each vk will then yield an orthonormal basis for V. The above construction is the so-called Gram–Schmidt process. We have some remarks on this process. (1) Each vector vk is a linear combination of uk and the preceding v’s. Hence one can easily show, by induction, that each vk is a linear combination of u1, u2, · · · , un. This accounts for span(v1, v2, v3) = span(u1, u2, u3). (2) Since taking multiples of vectors does not affect orthogonality, it may be simpler in hand calculations to clear fractions in any new vk, by multiplying vk by an appropriate scalar, before obtaining the next vk+1. (3) Suppose w1, w2, · · · , wm are linearly independent, so they form a basis for W = span(w1, w2, · · · , wm). Applying the Gram–Schmidt process to the w’s yields an orthogonal basis for W.


Example 5.24 (Gram–Schmidt Orthogonalization)

Apply Gram–Schmidt process to the following vectors to produce an orthogonal set.

u1 = (1, 0, 2), u2 = (1,−3, 7).

Solution Apply the Gram–Schmidt process:

v1 = u1 = (1, 0, 2),
v2 = u2 − (u2 · v1)/(v1 · v1) v1 = (1,−3, 7) − (15/5)(1, 0, 2) = (−2,−3, 1).

Now v1, v2 form an orthogonal set. Also, span(v1, v2) = span(u1, u2). □

Example 5.25 (Gram–Schmidt Orthogonalization)

Find an orthonormal basis for the subspace of R4 spanned by

u1 = (0, 2, 1, 0), u2 = (1,−1, 0, 0), u3 = (1, 2, 0,−1).

Solution Let S be the subspace of R4 spanned by u1, u2, u3. We find that u1, u2, u3 are linearly independent because all three columns of [u1 u2 u3] are pivot:

[u1 u2 u3] =
[ 0  1  1 ]      [ 1 0 0 ]
[ 2 −1  2 ]  −→  [ 0 1 0 ]
[ 1  0  0 ]      [ 0 0 1 ]
[ 0  0 −1 ]      [ 0 0 0 ].

Thus u1, u2, u3 indeed form a basis for S. However, u1, u2, u3 are not pairwise orthogonal (u1 · u2 ≠ 0). We then use the Gram–Schmidt process to obtain an orthogonal set {v1, v2, v3} from {u1, u2, u3}. We set v1 = u1 = (0, 2, 1, 0), and from

u2 − (u2 · v1)/(v1 · v1) v1 = (1,−1, 0, 0) − (−2/5)(0, 2, 1, 0) = (1, −1/5, 2/5, 0) = (1/5)(5,−1, 2, 0),

we may take v2 = (5,−1, 2, 0), and from

u3 − (u3 · v1)/(v1 · v1) v1 − (u3 · v2)/(v2 · v2) v2 = (1, 2, 0,−1) − (4/5)(0, 2, 1, 0) − (3/30)(5,−1, 2, 0) = (1/2, 1/2, −1, −1),

we may take v3 = (1, 1,−2,−2).

Now the three vectors v1, v2, v3 form an orthogonal set. Since none of them are zero, v1, v2, v3 form an orthogonal basis for the subspace S such that S = span(u1, u2, u3) = span(v1, v2, v3). Finally, we normalize them to obtain an orthonormal basis for S, that is,

(0, 2/√5, 1/√5, 0), (5/√30, −1/√30, 2/√30, 0), (1/√10, 1/√10, −2/√10, −2/√10). □

Example 5.26 (Gram–Schmidt Orthogonalization)

Find an orthonormal set from

u1 = (1, 2, 1), u2 = (1, 3, 1), u3 = (2, 2, 1).

Solution We note that in some cases we do not even need the Gram–Schmidt process to generate an orthogonal set. For this example, u1, u2, u3 are linearly independent and form a basis of R3. In particular, u1, u2, u3 span R3. So one simple orthonormal set that spans R3 is the standard basis, i.e.,

{e1, e2, e3} = {(1, 0, 0), (0, 1, 0), (0, 0, 1)}. □


Example 5.27 (Gram–Schmidt Orthogonalization)

Consider the subspace W spanned by

u1 = (1, 0, 2, 0), u2 = (1,−2, 2,−4), u3 = (−1, 1, 3, 2).

Find the orthogonal projection of (1, 1, 0, 0) onto W. Then find the distance from the vector to W.

Solution By row operations we may reduce the matrix[u1 u2 u3

]and find that all columns are pivot,

we know that u1, u2, u3 are linearly independent. Since they span W, they indeed form a basis for W.Apply Gram–Schmidt process,

v1 = u1 = (1, 0, 2, 0),

v2 = u2 − [(u2 · v1)/(v1 · v1)] v1 = u2 − (5/5) v1 = (0, −2, 0, −4),

v3 = u3 − [(u3 · v1)/(v1 · v1)] v1 − [(u3 · v2)/(v2 · v2)] v2 = u3 − (5/5) v1 − (−10/20) v2 = (−2, 0, 1, 0).

Then W = span (v1,v2,v3). Let x = (1, 1, 0, 0). Then the orthogonal projection of x onto W is

projW x = [(x · v1)/(v1 · v1)] v1 + [(x · v2)/(v2 · v2)] v2 + [(x · v3)/(v3 · v3)] v3 = (1/5) v1 + (−2/20) v2 + (−2/5) v3 = (1, 1/5, 0, 2/5).

The distance from x to W is

dist(x, W) = ‖x − projW x‖ = ‖(1, 1, 0, 0) − (1, 1/5, 0, 2/5)‖ = ‖(0, 4/5, 0, −2/5)‖ = 2/√5. □
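The projection-and-distance computation in this example can be sketched as follows, using the orthogonal basis found above; the variable names are illustrative, not from the text.

```python
from fractions import Fraction
from math import sqrt

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# Orthogonal basis of W produced by Gram-Schmidt in the example
v1, v2, v3 = (1, 0, 2, 0), (0, -2, 0, -4), (-2, 0, 1, 0)
x = (1, 1, 0, 0)

# proj_W x = sum over the orthogonal basis of (x.v / v.v) v
proj = [Fraction(0)] * 4
for v in (v1, v2, v3):
    c = Fraction(dot(x, v), dot(v, v))
    proj = [p + c * vi for p, vi in zip(proj, v)]

diff = [xi - pi for xi, pi in zip(x, proj)]
dist = sqrt(dot(diff, diff))                 # distance from x to W
assert proj == [1, Fraction(1, 5), 0, Fraction(2, 5)]
assert abs(dist - 2 / sqrt(5)) < 1e-12
```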

Example 5.28 (Gram–Schmidt Orthogonalization)

Verify that the following vectors

v1 = (1, 2,−1, 1), v2 = (−1, 1, 0,−1)

are orthogonal to each other. Extend v1, v2 to an orthogonal basis of R4.

Solution Verify that v1 · v2 = −1 + 2 + 0 − 1 = 0, so v1 and v2 are orthogonal. We need two linearly independent vectors u3, u4 which are orthogonal to both v1 and v2. Let u = (x, y, z, w) be such that u · v1 = 0 and u · v2 = 0. This yields a homogeneous system. Let A be the coefficient matrix of the system. By doing row operations,

A = [  1  2  −1   1 ]  −→  [ 1  0  −1/3  1 ]
    [ −1  1   0  −1 ]       [ 0  1  −1/3  0 ].

Here columns 3 and 4 are nonpivot and hence z and w are free variables. The general solution is given by

(x, y, z, w) = (z/3 − w, z/3, z, w) = (z/3)(1, 1, 3, 0) + w (−1, 0, 0, 1).

Thus we may take u3 = (1, 1, 3, 0), u4 = (−1, 0, 0, 1).

Since u3 and u4 are not orthogonal (u3 · u4 ≠ 0), we use the Gram–Schmidt process to obtain

v3 = u3 = (1, 1, 3, 0),

u4 − [(u4 · v3)/(v3 · v3)] v3 = (−1, 0, 0, 1) − (−1/11)(1, 1, 3, 0) = (1/11)(−10, 1, 3, 11), take v4 = (−10, 1, 3, 11).

Then the vectors v1, v2, v3, v4 form an orthogonal basis of R4. □


Example 5.29 (Gram–Schmidt Orthogonalization)

Orthogonally diagonalize the symmetric matrix

A = [  0  2 −1 ]
    [  2  3 −2 ]
    [ −1 −2  0 ].

That is, find an orthogonal matrix Q and diagonal matrix D so that Q−1AQ = D.

Solution The characteristic equation of A is det(A − λI) = 0, or

det [ −λ     2   −1 ]
    [  2  3 − λ  −2 ]  =  (5 − λ)(λ + 1)² = 0,
    [ −1    −2   −λ ]

which gives the eigenvalues λ1 = 5, λ2 = −1 (λ2 being a repeated eigenvalue).

· For λ1 = 5,

A − 5 · I = [ −5  2 −1 ]       [ 1 0 1 ]
            [  2 −2 −2 ]  −→   [ 0 1 2 ]
            [ −1 −2 −5 ]       [ 0 0 0 ].

By solving (A − 5 · I)x = 0, x = (x1, x2, x3), we have x1 = −x3 and x2 = −2x3. Thus,

x = (−x3, −2x3, x3) = x3 (−1, −2, 1),

and we get the eigenvector u1 = (−1, −2, 1).

· For λ2 = −1,

A − (−1) · I = [  1  2 −1 ]       [ 1 2 −1 ]
               [  2  4 −2 ]  −→   [ 0 0  0 ]
               [ −1 −2  1 ]       [ 0 0  0 ].

By solving (A − (−1) · I)x = 0, x = (x1, x2, x3), we have x1 = −2x2 + x3. Thus,

x = (−2x2 + x3, x2, x3) = x2 (−2, 1, 0) + x3 (1, 0, 1),

and we get the two eigenvectors u2 = (−2, 1, 0), u3 = (1, 0, 1).

Verify that

u1 · u2 = 2 − 2 + 0 = 0,
u1 · u3 = −1 + 0 + 1 = 0,
u2 · u3 = −2 + 0 + 0 = −2.

Since u2 · u3 ≠ 0, the vectors u1, u2, u3 are not orthogonal. We then apply the Gram–Schmidt process to these last two vectors. Therefore, we take

v1 = u1 = (−1, −2, 1),

v2 = u2 = (−2, 1, 0),

and from

u3 − [(u3 · v2)/(v2 · v2)] v2 = (1, 0, 1) − (−2/5)(−2, 1, 0) = (1/5, 2/5, 1) = (1/5)(1, 2, 5), take v3 = (1, 2, 5).

Now, as we want, the vectors v1, v2, v3 form an orthogonal set. Since none of them is zero, v1, v2, v3 are linearly independent by (5.1) and hence they form an orthogonal basis of R3.


By ‖v1‖ = √6, ‖v2‖ = √5, ‖v3‖ = √30, we further obtain an orthonormal basis of R3 consisting of

x1 = v1/‖v1‖ = (1/√6)(−1, −2, 1), x2 = v2/‖v2‖ = (1/√5)(−2, 1, 0), x3 = v3/‖v3‖ = (1/√30)(1, 2, 5).

Thus if we construct Q = [x1 x2 x3], we can use it to diagonalize A such that

Q−1AQ = D,

where

Q = [ −1/√6  −2/√5  1/√30 ]         [ 5  0  0 ]
    [ −2/√6   1/√5  2/√30 ],   D =  [ 0 −1  0 ]
    [  1/√6    0    5/√30 ]         [ 0  0 −1 ].

Here we emphasize that Q is an orthogonal matrix: (1) it satisfies Q−1 = Qt; (2) its columns form an orthonormal basis of R3. Thus, we also have

QtAQ = D. □
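The claim QtAQ = D can be checked numerically in floating point; this sketch hard-codes the Q and A from the example (helper names are illustrative).

```python
from math import sqrt, isclose

A = [[0, 2, -1], [2, 3, -2], [-1, -2, 0]]
s6, s5, s30 = sqrt(6), sqrt(5), sqrt(30)
Q = [[-1/s6, -2/s5, 1/s30],
     [-2/s6,  1/s5, 2/s30],
     [ 1/s6,   0.0, 5/s30]]

def matmul(X, Y):
    # Plain 3x3 matrix product
    return [[sum(X[i][k] * Y[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

Qt = [list(col) for col in zip(*Q)]          # Q is orthogonal, so Qt = Q^-1
D = matmul(matmul(Qt, A), Q)
for i in range(3):
    for j in range(3):
        expect = [5.0, -1.0, -1.0][i] if i == j else 0.0
        assert isclose(D[i][j], expect, abs_tol=1e-9)
```

The off-diagonal entries come out as rounding noise of order 1e-16, which is why an absolute tolerance is used in the comparison.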

Example 5.30 (Gram–Schmidt Orthogonalization)

In R3, find the distance from the point (1, 1, 1) to the plane

x1 + x2 + 2x3 = 0.

Solution Any vector in the subspace (plane) can be expressed as

(x1, x2, x3) = (−x2 − 2x3, x2, x3) = x2 (−1, 1, 0) + x3 (−2, 0, 1).

Hence, u1 = (−1, 1, 0) and u2 = (−2, 0, 1) form a basis of the subspace. Applying the Gram–Schmidt process,

v1 = u1 = (−1, 1, 0),

v2 = u2 − [(u2 · v1)/(v1 · v1)] v1 = (−2, 0, 1) − (2/2)(−1, 1, 0) = (−1, −1, 1).

Thus the orthogonal projection of the vector x = (1, 1, 1) onto the subspace is

[(x · v1)/(v1 · v1)] v1 + [(x · v2)/(v2 · v2)] v2 = (0/2)(−1, 1, 0) + (−1/3)(−1, −1, 1) = (1/3)(1, 1, −1).

Thus the distance from x to the plane is

‖(1, 1, 1) − (1/3)(1, 1, −1)‖ = ‖(2/3, 2/3, 4/3)‖ = 2√6/3.

Alternatively, by the equation x1 + x2 + 2x3 = 0, we know that the vector z = (1, 1, 2) is perpendicular to the plane. Let W = span(z). Then the distance is given by

‖projW x‖ = ‖[(x · z)/(z · z)] z‖ = ‖(4/6)(1, 1, 2)‖ = 2√6/3. □
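The second method reduces to a one-line formula, distance = |x · z| / ‖z‖, which can be sketched directly:

```python
from math import sqrt

# Distance from the point x to the plane x1 + x2 + 2*x3 = 0 via its normal z
x, z = (1, 1, 1), (1, 1, 2)
dot = lambda a, b: sum(p * q for p, q in zip(a, b))
dist = abs(dot(x, z)) / sqrt(dot(z, z))      # |x.z| / ||z||
assert abs(dist - 2 * sqrt(6) / 3) < 1e-12
```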


Example 5.31 (Data fitting problem using a straight line)

Find a straight line y = c + mx that best fits the following set of data on the xy-plane.

(2, 1), (5, 2), (7, 3), (8, 3).

Solution If we could find a straight line y = c + mx passing through all the points, it would of course “best fit” the data. However, the corresponding system admits no solution:

c + 2m = 1, c + 5m = 2, c + 7m = 3, c + 8m = 3,

or in matrix form

[ 1 2 ]            [ 1 ]
[ 1 5 ] [ c ]  =   [ 2 ]
[ 1 7 ] [ m ]      [ 3 ]
[ 1 8 ]            [ 3 ],

which is inconsistent.

In this example, the sum of the squared differences is

(c + 2m − 1)² + (c + 5m − 2)² + (c + 7m − 3)² + (c + 8m − 3)².

Note that a vector in Col A has the form b0 = (c + 2m, c + 5m, c + 7m, c + 8m), so the sum of squared differences is exactly ‖b0 − b‖², and therefore the technique of the normal equation gives us the best approximate solution. Now, by direct computations, we have

AtA = [  4   22 ],   At b = [  9 ]
      [ 22  142 ]           [ 57 ],

and the corresponding normal equation AtAx = At b has a unique solution (2/7, 5/14), so the straight line that best fits the given set of data is

y = 2/7 + (5/14) x.

Example 5.32 (Data fitting problem using a polynomial curve)

Find a polynomial of degree at most 2 that best fits the following set of data on the xy-plane.

(2, 1), (5, 2), (7, 3), (8, 3).

Solution A general polynomial of degree at most 2 can be represented in the form

y = a0 · 1 + a1 · x + a2 · x².

If such a polynomial curve could pass through all four points, it would of course “best fit” the data. However, the corresponding system admits no solution:

a0 · 1 + a1 · 2 + a2 · 2² = 1,
a0 · 1 + a1 · 5 + a2 · 5² = 2,
a0 · 1 + a1 · 7 + a2 · 7² = 3,
a0 · 1 + a1 · 8 + a2 · 8² = 3,

or in matrix form

[ 1 2 2² ]             [ 1 ]
[ 1 5 5² ] [ a0 ]      [ 2 ]
[ 1 7 7² ] [ a1 ]  =   [ 3 ]
[ 1 8 8² ] [ a2 ]      [ 3 ],

which is inconsistent.

Again, the technique of the normal equation will help. By direct computations, we have

AtA = [   4   22   142 ]            [   9 ]
      [  22  142   988 ],   At b =  [  57 ]
      [ 142  988  7138 ]            [ 393 ].

We note that AtA is invertible, so the solution of the normal equation is

(AtA)−1 At b = ( 19/132, 19/44, −1/132 ).


This shows that the polynomial of degree at most 2 that best fits the data is

y = 19/132 + (19/44) x − (1/132) x².

Example 5.33 (Data fitting problem using a general curve)

Find a curve of the form y = a0 + a1 sin x + a2 sin 2x that best fits the following set of data.

(π/6, 1), (π/4, 2), (π/3, 3), (π/2, 3).

Solution The system we are considering is again an inconsistent system:

a0 · 1 + a1 · sin(π/6) + a2 · sin(π/3) = 1,
a0 · 1 + a1 · sin(π/4) + a2 · sin(π/2) = 2,
a0 · 1 + a1 · sin(π/3) + a2 · sin(2π/3) = 3,
a0 · 1 + a1 · sin(π/2) + a2 · sin(π) = 3.

Its coefficient matrix is

A = [ 1  sin(π/6)  sin(π/3)  ]
    [ 1  sin(π/4)  sin(π/2)  ]
    [ 1  sin(π/3)  sin(2π/3) ]
    [ 1  sin(π/2)  sin(π)    ].

By direct computations, we have

AtA = [          4           (1/2)(3 + √2 + √3)        1 + √3         ]
      [ (1/2)(3 + √2 + √3)          5/2          (1/4)(3 + 2√2 + √3) ]
      [       1 + √3         (1/4)(3 + 2√2 + √3)        5/2          ],

At b = [         9          ]
       [ 7/2 + √2 + 3√3/2  ]
       [      2 + 2√3       ].

As we are looking for an approximate solution, exact calculation is not necessary. An approximate solution of the normal equation is

a0 ≈ −2.29169, a1 ≈ 5.31308, a2 ≈ 0.673095.

Then the best fitting curve is

y = (−2.29169) + (5.31308) sin x + (0.673095) sin 2x.
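Since only a numerical answer is needed here, the whole fit can be sketched in floating point; `solve3` is an illustrative helper (no pivoting, which is safe because AtA is positive definite).

```python
from math import sin, pi

pts = [(pi / 6, 1), (pi / 4, 2), (pi / 3, 3), (pi / 2, 3)]
A = [[1.0, sin(x), sin(2 * x)] for x, _ in pts]   # columns: 1, sin x, sin 2x
b = [float(y) for _, y in pts]
AtA = [[sum(A[k][i] * A[k][j] for k in range(4)) for j in range(3)] for i in range(3)]
Atb = [sum(A[k][i] * b[k] for k in range(4)) for i in range(3)]

def solve3(M, rhs):
    # Gauss-Jordan elimination on the 3x3 normal equations
    M = [row[:] + [r] for row, r in zip(M, rhs)]
    for i in range(3):
        M[i] = [v / M[i][i] for v in M[i]]
        for r in range(3):
            if r != i:
                f = M[r][i]
                M[r] = [a - f * c for a, c in zip(M[r], M[i])]
    return [row[3] for row in M]

a0, a1, a2 = solve3(AtA, Atb)
# Should agree with the text's approximate solution to a few decimal places
assert abs(a0 + 2.29169) < 1e-3
assert abs(a1 - 5.31308) < 1e-3
assert abs(a2 - 0.673095) < 1e-3
```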


Chapter 6

Miscellaneous

6.1 Cross Product


6.2 Linear Transformation


6.3 Linear Operators and Similarity



6.4 Fourier Transformation


Chapter 7

Answers to True or False Questions

1.1 False. A = [0 1; 1 0].

1.2 False. A = [1 0 0; 0 0 0], B = [0 0; 0 0; 0 1].

1.3 False. A = B = C = [0 0; 1 0].

1.4 False. Choose B ≠ C and A = O.

1.5 False. A = [1 0; 0 0], B = [0 0; 0 0], C = [0 0; 0 1].

1.6 False. [0 1; 0 0]² = [0 0; 0 0].

1.7 True. (A+B)³ = (A+B)(A² + AB + BA + B²) = A³ + A²B + ABA + AB² + BA² + BAB + B²A + B³ = A³ + A²B + A²B + AB² + A²B + AB² + AB² + B³ = A³ + 3A²B + 3AB² + B³.
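The expansion in 1.7 relies on the hypothesis AB = BA. A quick sanity check with a pair of commuting matrices, using the illustrative choice B = A + 5I:

```python
def matmul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def matadd(X, Y):
    return [[a + b for a, b in zip(r, s)] for r, s in zip(X, Y)]

def scal(c, X):
    return [[c * v for v in row] for row in X]

A = [[1, 2], [3, 4]]
B = matadd(A, [[5, 0], [0, 5]])              # B = A + 5I commutes with A
assert matmul(A, B) == matmul(B, A)

S = matadd(A, B)
lhs = matmul(matmul(S, S), S)                # (A + B)^3
A2, B2 = matmul(A, A), matmul(B, B)
rhs = matadd(matadd(matmul(A2, A), scal(3, matmul(A2, B))),
             matadd(scal(3, matmul(A, B2)), matmul(B2, B)))
assert lhs == rhs                            # A^3 + 3A^2B + 3AB^2 + B^3
```

With a non-commuting pair the same check fails, which is exactly why the cross terms ABA, BAB, etc. cannot be collected in general.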

1.8 True. A is invertible ⇐⇒ AB = BA = I for some B. If A is m × n and B is n × m, then we must have m = n.

1.9 True. Equivalent to 1.8.

1.10 True. A has zero determinant and hence not invertible.

1.11 False. A may have a zero column. Then A is not invertible.

1.12 True. AB is defined and square. det(AB) = (detA)(detB) ≠ 0.

1.13 False. Let A be invertible. Choose B = −A so that B is invertible. But then A + B = O is not invertible.


1.14 False. Choose A, B non-square such as A = [1 0 0; 0 0 1], B = [1 0; 0 0; 0 1].

1.15 True. We prove it by contradiction. In the following proof we need so-called elementary matrices Ei, for instance [1 0 0; 0 0 1; 0 1 0], [1 0 0; 0 −6 0; 0 0 1], [1 0 0; 0 1 0; −4 0 1]. In this case, these three matrices correspond to the elementary row operations R2 ↔ R3, −6R2, −4R1 + R3, respectively. In general, each row operation has a corresponding elementary matrix, which is moreover invertible. So, if A is row equivalent to B by doing some row operations to A, then we also mean B = Es · · · E2E1A for some invertible E1, E2, · · · , Es.

Proof: Suppose the square matrix A is not invertible. Then A is not row equivalent to I =⇒ the row echelon form of A must have a zero row =⇒ Es · · · E2E1A has a zero row =⇒ (Es · · · E2E1A)B = Es · · · E2E1(AB) = Es · · · E2E1 is an invertible matrix that has a zero row =⇒ a contradiction. This proves A must be invertible and hence BA = I.

1.16 True. If A, B are square matrices, then by 1.15, AB = I ⇐⇒ BA = I.

1.17 False. Take A, B in 1.14. Then AB = I but BA has a zero row and hence not invertible.

1.18 False. A = [2 4; 1 2]. A² ≠ O but A is not invertible (detA = 0).

1.19 True. Prove by contradiction. Suppose A² = O. Then detA² = detO = 0. Because detA² = (detA)², we get detA = 0 and A is not invertible.

1.20 True. A is square and A (A + 7I) = I =⇒ A−1 = A + 7I, by 1.15.

1.21 True. If A is an m × n matrix, then At is n × m. If, furthermore, A is symmetric, then by definition At = A, which implies that m = n. Otherwise, they won't have the same size.

1.22 True. A symmetric ⇐⇒ At = A. Then (A−1)t = (At)−1 = A−1 and (A3)t = (At)3 = A3.

1.23 True. (2B)t = 2Bt = 2(AtA)t = 2At(At)t = 2AtA = 2B.

1.24 True. By definition, A symmetric ⇐⇒ At = A. In general we also have (A+B)t = At + Bt and (cA)t = cAt. Hence, if A is symmetric, then for any polynomial f(x), the transpose of f(A) must equal f(A) itself. Hence, f(A) is symmetric.

1.25 False. A = [1 0; 0 1], B = [−1 0; 0 −1].

1.26 False. A = I. Then detA = 1, det(2A) = det [2 0; 0 2] = 4 ≠ 2 · detA.

1.27 False. A = I. Then At = A.

1.28 False. (1) If B is obtained from A by interchanging any two rows of A, then detB = −detA. (2) If B is obtained from A by multiplying a row of A by k, then detB = k · detA.


1.29 True. Let A be a square matrix. (1) If a multiple of one row of A is added to another row to produce a matrix B, then detB = detA. (2) If two rows of A are interchanged to produce B, then detB = −detA. (3) If one row of A is multiplied by k to produce B, then detB = k · detA.

1.30 False. A = [1 2; 3 4]. Then A → I, and detA = −2 ≠ 1.

1.31 True. If B is obtained from A by any one of the three elementary row operations, then detB = k · detA, where k ≠ 0.

1.32 False. A = [1 1; 0 1]. Then detA = 1. The correct statement should be: The determinant of a (lower or upper) triangular matrix is the product of the entries on the diagonal.

1.33 False. A = [1 1; 1 1] → [1 1; 0 0].

1.34 False. A = [1 1; 1 1], B = [1 1; 0 0].

1.35 True. 1.31 shows that, if A is row equivalent to B, then their determinants are both zero or both nonzero. Thus, A is invertible =⇒ detA ≠ 0 =⇒ detB ≠ 0 =⇒ B is invertible.

2.1 False. Take c = 0 and A = I. The statement is true for c ≠ 0.

2.2 False. A = [1 0 0; 0 0 0] is 2 × 3 but rankA = 1.

2.3 False. A = [1 0; 0 0; 0 0] is 3 × 2 but rankA = 1.

2.4 True. An×n has rank n =⇒ A has full rank and A → In =⇒ detA 6= 0 =⇒ A is invertible.

2.5 True. Number of pivots ≤ number of rows < number of columns =⇒ at least one column is nonpivot =⇒ at least one free variable must exist in the general solution of Ax = 0 =⇒ nontrivial solution.

2.6 False. A = [1 0; 0 1; 0 0]. Then Ax = 0 has only the trivial solution.

2.7 True. rankA = m =⇒ all rows of A are pivot =⇒ b is nonpivot =⇒ Ax = b has solutions for all b.

2.8 False. A = [1 0 0; 0 0 1]. Then Ax = 0 has a nontrivial solution x = (0, y, 0).

2.9 True. rankA = 5 =⇒ all columns of A are pivot =⇒ no nonpivot column of A =⇒ no free variable =⇒ Ax = 0 has only the trivial solution =⇒ unique.

2.10 False. Suppose A is a 7 × 5 matrix with rankA = 5 and a zero row at the bottom. Also suppose b = (0, 0, 0, 0, 0, 0, 1). Then Ax = b has no solution.

2.11 True. Ax = b has solutions for all b =⇒ all rows of A are pivot =⇒ rankA = m.


2.12 False. A = [1 0 0; 0 0 1]. Then Ax = b has solutions for all b, but rankA = 2.

2.13 True. Ax = b has solutions for all b =⇒ all rows of A are pivot =⇒ all columns of At are pivot =⇒ no nonpivot column of At =⇒ no free variable =⇒ Atx = 0 has only the trivial solution.

2.14 True. Ax = b has solutions for all b =⇒ all rows of A are pivot =⇒ rankA = m ≤ n.

2.15 True. Ax = 0 has only the trivial solution =⇒ all columns of A are pivot =⇒ rankA = n ≤ m.

2.16 True. Ax = b has solutions for all b =⇒ all rows of A are pivot =⇒ rankA = m.

2.17 False. A = [1 0 0; 0 0 1]. Then all rows of A are pivot =⇒ Ax = b has solutions for all b. But rankA = 2.

2.18 False. A = [1 0; 0 1; 0 0]. Then all columns of A are pivot =⇒ Ax = 0 has only the zero solution. But rankA = 2.

2.19 True. Refer to 2.13.

2.20 False. A = [0 1], B = [1; 0].

2.21 False. The row echelon form of A may have a zero row =⇒ rankA < n =⇒ number of pivot columns of A < n =⇒ number of nonpivot columns of A ≥ 1 =⇒ number of free variables ≥ 1 =⇒ solution non-unique.

2.22 True. Solution unique =⇒ no free variable =⇒ no nonpivot column of A =⇒ all columns of A are pivot =⇒ rankA = n =⇒ A invertible =⇒ A row equivalent to I =⇒ detA ≠ 0 (use 1.31 with det I ≠ 0).

2.23 True. Solution unique =⇒ no free variable =⇒ no nonpivot column of A =⇒ all columns of A are pivot =⇒ Ax = 0 has only the trivial solution.

2.24 False. All rows of A are pivot does not imply all columns of A are pivot. Consider A = [1 0 0; 0 1 0].

2.25 True. Trivial solution =⇒ no free variable =⇒ all columns of A are pivot =⇒ rankA = n.

2.26 True. Nontrivial solution ⇐⇒ number of free variables ≥ 1 ⇐⇒ number of nonpivot columns of A ≥ 1 ⇐⇒ number of pivot columns of A < n ⇐⇒ rankA < n ⇐⇒ the row echelon form of A has a zero row ⇐⇒ A is row equivalent to a matrix (say B) that contains a zero row ⇐⇒ detA = 0 (since detB = 0).

2.27 True. Only trivial solution ⇐⇒ no free variable ⇐⇒ no nonpivot column of A ⇐⇒ all columns of A are pivot ⇐⇒ rankA = n (A is square) ⇐⇒ A is row equivalent to I.

2.28 True. By 2.27, A is row equivalent to I. It follows from 1.31 that detA ≠ 0 (since det I ≠ 0).

2.29 False. Number of free variables = n − rankA = 5 − 2 = 3 ≠ 2.


2.30 True. Number of variables = number of free variables + rankA =⇒ rankA = 17 − 8 = 9.

2.31 False. All rows of A are pivot does not imply all columns of A are pivot. Consider A −→ [1 0 0 0; 0 1 0 0; 0 0 0 1].

2.32 True. Refer to 2.16.

2.33 False. A = [1 0 0; 0 1 0] is 2 × 3. All rows of A are pivot =⇒ Ax = b has solutions for all b.

2.34 True. rank [A b] > rankA =⇒ rank [A b] = rankA + 1 =⇒ b is pivot =⇒ Ax = b has no solution.

2.35 False. The 5th column of the 3 × 5 matrix [A b] is pivot =⇒ b is pivot =⇒ Ax = b has no solution.

2.36 True. Ax = 0 has only the trivial solution =⇒ no free variable =⇒ no nonpivot column of A =⇒ all columns of A are pivot =⇒ all rows of A are pivot (∵ A square) =⇒ Ax = b has solutions for all b.

2.37 False. A = [1 0; 1 0; 1 0] −→ [1 0; 0 0; 0 0]. Then Ax = 0 has a nontrivial solution x = (0, y).

2.38 True. Only trivial solution =⇒ no free variable =⇒ all columns of A are pivot =⇒ all rows of A are pivot =⇒ A invertible =⇒ Ax = b has a unique solution (which is x = A−1b).

2.39 True. [A b] invertible =⇒ [A b] is row equivalent to I =⇒ b is pivot =⇒ Ax = b has no solution.

2.40 True. Ax = b has no solution for some b =⇒ not all rows of A are pivot =⇒ rankA < number of rows of A = number of columns of A =⇒ not all columns of A are pivot =⇒ at least one free variable =⇒ Ax = 0 has a nontrivial solution.

2.41 False. [A b] = [1 0 0; 0 0 1], in which b is pivot.

2.42 False. There always exists a nonzero vector x such that Bx = 0. Hence ABx = 0, and AB cannot be invertible.

3.1 False. Let e1, e2 be the standard basis of R2. Take {v1, v2} = {e1, e2} and {w1, w2} = {−e1, −e2}. Then {v1, v2} and {w1, w2} are linearly independent. But then {v1 + w1, v2 + w2} = {0, 0} is of course linearly dependent.

3.2 False. Take {v1, v2} = {e1, 0} and {w1, w2} = {0, e2}. Then {v1, v2} is linearly dependent because 0 · e1 + 1 · 0 = 0, and {w1, w2} is linearly dependent because 1 · 0 + 0 · e2 = 0. But then {v1 + w1, v2 + w2} = {e1, e2} is of course linearly independent.

3.3 False. Take {v1, v2} = {(1, 0), (0, 1)} and {v2, v3} = {(0, 1), (1, 1)}. Then v3 = v1 + v2.

265

Page 232: Linear Algebra Study Guide - EdUHKMTH 2032 Linear Algebra Study Guide Dr. Tony Yee Department of Mathematics and Information Technology The Hong Kong Institute of Education June 23,

7. Answers to True or False Questions

3.4 True. If any nonempty subset of S = {v1, v2, · · · , vn+k} is linearly dependent, then S must also be linearly dependent.

3.5 False. {(0, 1), (1, 0), (1, 1)} is linearly dependent but {(0, 1), (1, 0)} is linearly independent.

3.6 False. Take v1 = (1, 0), v2 = (0, 1), v3 = (1, 1). Then three vectors in R2 must be linearly dependent.

3.7 True. Follows from 3.4.

3.8 False. Take v1 = (1, 1, 1), v2 = (2, 2, 2) in R3. Then S = {v1, v2} is linearly dependent.

3.9 False. A = [1 0 0; 0 1 0] with rankA = 2. A has a zero column =⇒ not all columns of A are pivot =⇒ columns of A are linearly dependent: 0 · c1 + 0 · c2 + 1 · c3 = 0, where the cj are the columns of A.

3.10 True. rank A = n =⇒ all columns of A are pivot =⇒ columns of A are linearly independent.

3.11 True. rank A = m =⇒ all rows of A are pivot =⇒ rows of A are linearly independent.

3.12 False. A = [1 0; 0 1; 0 0] with rankA = 2. A has a zero row =⇒ not all rows of A are pivot =⇒ rows of A are linearly dependent.

3.13 False. The columns of a matrix A are linearly independent if and only if the equation Ax = 0 has only the trivial solution.

3.14 False. If a set contains more vectors than there are entries in each vector, then the set is linearly dependent. That is, any set {v1, v2, · · · , vm} in Rn is linearly dependent if m > n.

3.15 True. rankA = m =⇒ all rows of A are pivot =⇒ Ax = b has solutions for all b ∈ Rm =⇒ b is a linear combination of columns of A for all b =⇒ columns of A span Rm.

3.16 False. A = [1 0; 0 1; 0 0] with rankA = 2. We can find b = (0, 0, 1) ∈ R3 such that b is not a linear combination of (1, 0, 0), (0, 1, 0) =⇒ columns of A do not span R3.

3.17 False. Let e1, e2 be the standard basis of R2. Then (1, 1) ∈ span{e1, e2} but (1, 1) ∉ span{e1}.

3.18 True. b = a1v1 + · · · + anvn for some ai. Then b = a1v1 + · · · + anvn + 0vn+1.

3.19 True. If A = [v1 v2 · · · vn], with the columns in Rm, then ColA is the same as span{v1, v2, · · · , vn}. The column space of an m × n matrix is a subspace of Rm.

3.20 True. If A is an m × n matrix, each row of A has n entries and thus can be identified with a vector in Rn. The set of all linear combinations of the row vectors is called the row space of A and is denoted by RowA. Each row has n entries, so RowA is a subspace of Rn. Since the rows of A can be identified with the columns of At, we could also write ColAt in place of RowA.


3.21 True. Take A = [v1 v2 · · · vn]. Then A is square. Now, v1, v2, · · · , vn span Rn =⇒ all rows of A are pivot =⇒ rankA = n =⇒ all columns of A are pivot =⇒ {v1, v2, · · · , vn} is linearly independent.

3.22 True. Follows from 3.21.

3.23 True. Columns of A are linearly independent =⇒ all columns of A are pivot =⇒ all rows of At are pivot =⇒ rankAt = 3. Then for any b ∈ R3, the rank of the 3 × 6 matrix [At b] = rankAt = 3 =⇒ b is a linear combination of columns of At for any b ∈ R3 =⇒ b is a linear combination of rows of A for any b ∈ R3 =⇒ rows of A span R3.

3.24 True. If {v1, v2, · · · , vn} is linearly independent, then {v1, v2, · · · , vn} is a basis of the span V. Thus, dim V = number of vectors in a basis of V = n. If {v1, v2, · · · , vn} is linearly dependent, then dim V < n.

3.25 True. For b ∈ V, there exist scalars c1, c2, · · · , cn such that b = c1v1 + c2v2 + · · · + cnvn = c1v1 + c2v2 + · · · + cnvn + 0vn+1 + · · · + 0vn+k =⇒ b can be expressed as a linear combination of v1, v2, · · · , vn, vn+1, · · · , vn+k. In other words, v1, v2, · · · , vn, vn+1, · · · , vn+k span V.

3.26 False. v1 = (1, 0, 0), v2 = (0, 1, 0) do not span R3, but v1 = (1, 0, 0), v2 = (0, 1, 0), v3 = (0, 0, 1) span R3.

3.27 False. The vectors can be linearly dependent. Let e1, e2, e3 be the standard basis of R3. Then e1, e2, e1 + e2 span a plane in R3 but {e1, e2, e1 + e2} is not a basis (the vectors being linearly dependent).

3.28 True. Let B = [Av1 · · · Avn] = A [v1 · · · vn] = AT, where T = [v1 · · · vn] is n × n. By the given, {v1, · · · , vn} is a basis of Rn =⇒ all rows as well as all columns of T are pivot =⇒ T is row equivalent to In =⇒ T is invertible. Hence, B is also invertible. It follows that, for Av1, · · · , Avn ∈ Rn, {Av1, Av2, · · · , Avn} is a basis of Rn.

3.29 True. Suppose c1 + c2 sin t + c3 cos t = 0. Then t = 0 =⇒ c1 + c3 = 0; t = π/2 =⇒ c1 + c2 = 0; t = π =⇒ c1 − c3 = 0. Hence, c1 = c2 = c3 = 0. By definition, the functions are linearly independent.

3.30 False. A linear dependence relation is: 1 − sin²t − cos²t = 0.

3.31 True. Since A0 = 0, the equation Ax = 0 always has the zero solution x = 0.

3.32 False. [1 0 0; 0 0 1] x = 0 has a nontrivial solution x = (0, 1, 0).

3.33 False. The null space of an m × n matrix A is a subspace of Rn. Equivalently, the set of all solutions of a system Ax = 0 of m homogeneous linear equations in n unknowns is a subspace of Rn.

3.34 True. The columns of an invertible n × n matrix form a basis for Rn because they are linearly independent (∵ all columns of A are pivot) and span Rn (∵ all rows of A are pivot).

3.35 True. rankA = 5 =⇒ all columns of A are pivot =⇒ no nonpivot column of A =⇒ no free variable =⇒ Ax = 0 has only the trivial solution =⇒ NulA def.= {x : Ax = 0} = {0} =⇒ NulA has no basis =⇒ dim(NulA) = 0.


3.36 False. By definition, the dimension of NulA is the number of vectors in a basis of NulA. The number of vectors in a basis of NulA is equal to the number of nonpivot columns of A, which must be strictly smaller than the number of columns of A (in other words, strictly smaller than the number of variables).

3.37 True. Since the pivot columns of A form a basis for ColA, the dimension of ColA is just the number of pivot columns of A, that is, the rank of A.

3.38 True. The nonpivot columns of A correspond to the free variables in Ax = 0. Thus the dimension of NulA is the number of nonpivot columns of A. Since the number of pivot columns plus the number of nonpivot columns of A is exactly the number of columns, the dimensions of ColA and NulA have the useful connection: If a matrix A has n columns, then rankA + dim NulA = n.

3.39 True. Ax = 0 has only the trivial solution =⇒ no free variable =⇒ no nonpivot column of A =⇒ all columns of A are pivot =⇒ columns of A are linearly independent =⇒ columns of A form a basis of ColA.

3.40 True. Ax = 0 has only trivial solution =⇒ all columns of A are pivot =⇒ rankA = number of columns of A.

3.41 False. A = [1 0; 0 1; 0 0]. Then Ax = 0 has only the trivial solution but rankA = 2 ≠ 3.

3.42 True. Columns of A span Rm =⇒ b is a linear combination of columns of A for all b ∈ Rm =⇒ Ax = b has solutions for all b ∈ Rm.

3.43 True. Columns of A do not span Rm =⇒ b cannot be a linear combination of columns of A for some b ∈ Rm =⇒ Ax = b has no solution for some b ∈ Rm.

3.44 True. Ax = b has no solution for some b in Rm =⇒ b is pivot in [A b] for some b in Rm =⇒ not all rows of A are pivot.

3.45 False. A = [1 0 1; 0 1 1] and b = (b1, b2). Then Ax = b has solutions for all b but columns of A are linearly dependent (column 3 = column 1 + column 2). Thus, columns of A do not form a basis of ColA.

3.46 False. A = [1 1 0; 1 1 0; 0 0 1] −→ [1 1 0; 0 0 1; 0 0 0] = B. However, the pivot columns (the first and third) of B do not span ColA because, for example, (0, 0, 1) ∈ ColA cannot be a linear combination of the pivot columns of B.

3.47 False. Ax = b has solutions for all b =⇒ all rows of A are pivot =⇒ rankA = number of rows of A, which does not imply rankA = number of columns of A. Take A = [1 0 0; 0 1 0].

3.48 True. Ax = b has solutions for all b =⇒ all rows of A are pivot =⇒ rankA = number of rows of A.

3.49 False. [A b] is invertible =⇒ [A b] is row equivalent to I =⇒ b is pivot =⇒ Ax = b has no solution =⇒ b is not a linear combination of columns of A =⇒ b ∉ ColA.

3.50 True. rankA = 3 =⇒ all rows of A are pivot =⇒ Ax = b has solutions for all b =⇒ b ∈ ColA.


3.51 True. rank [A b] = rankA =⇒ b is nonpivot =⇒ Ax = b has solutions =⇒ b ∈ ColA.

3.52 True. Columns of A are linearly dependent =⇒ not all columns of A are pivot =⇒ rankA < 4 =⇒ row echelon form of A has a zero row =⇒ detA = 0 (by 1.31) =⇒ A is not invertible.

3.53 True. A is not invertible =⇒ A is not row equivalent to I =⇒ row echelon form of A has a zero row =⇒ rankA < 4 =⇒ not all columns of A are pivot =⇒ columns of A are linearly dependent.

3.54 True. rankA = 3 =⇒ all rows of A are pivot =⇒ columns of A span R3 =⇒ b ∈ ColA for all b ∈ R3 =⇒ ColA ⊇ R3 =⇒ ColA = R3.

3.55 True. rankA = 3 =⇒ rankAt = 3 =⇒ ColAt = R3 (by 3.54) =⇒ RowA = R3.

3.56 False. Choose v1 6= 0 and v2 = 0.

3.57 False. Choose v4 = −v3.

3.58 True. Refer to 3.4.

3.59 True. Let B = {b1, b2, · · · , bn} be a basis of V. Suppose that B1 = {d1, d2, · · · , dm} is another basis of V. Recall that a basis is a maximal independent set as well as a minimal spanning set. Now, since B1 is linearly independent, we have m ≤ n. On the other hand, B1 is a basis and B is linearly independent, so n ≤ m as well. Thus, m = n.

3.60 False. v = 0 =⇒ span{v} = {0} contains the origin only.

3.61 False. Nonzero u, v in R3 with u ‖ v =⇒ span{u, v} = span{u} = a line.

4.1 True. The characteristic equation det(A − λI) = 0 is indeed a polynomial equation of degree n, which has at most n roots (eigenvalues).

4.2 True. Any polynomial equation has at least one root.

4.3 True. By definition, an eigenvector v satisfying Av = λv must be a nonzero vector.

4.4 False. It is permissible to have a zero eigenvalue. [1 0; 0 0] has eigenvalues 0, 1.

4.5 False. A = [0 1; 0 0] has repeated eigenvalue 0.

4.6 False. A = [1 1; 0 1] has repeated eigenvalue 1.

4.7 False. A = [1 0; 0 0] has eigenvalues 0, 1.


4.8 False. A = [0 1; 0 0] has repeated eigenvalue 0.

4.9 False. The correct statement should be: If A² = O, then 0 is the only eigenvalue of A. Let λ be an eigenvalue of A with eigenvector v. Then Av = λv and A²v = A(λv) = λ²v. Since A² = O and v ≠ 0, λ = 0 is the only eigenvalue of A.

4.10 True. By det(At − λI) = det((A − λI)t) = det(A − λI), A and At must have the same eigenvalues since they have exactly the same characteristic polynomial.

4.11 False. A = [0 1; 1 0] has eigenvalues ±1. But −1 is not an eigenvalue of A² = I.

4.12 False. A = [1 0; 0 −1], B = [−1 0; 0 1]. Then 1 is an eigenvalue of A and B but 1 is not an eigenvalue of A + B = O.

4.13 False. Take A, B in 4.12. Then 1 is not an eigenvalue of AB = [−1 0; 0 −1].

4.14 False. Take A, B in 4.12. Then −1 is an eigenvalue of A and −1 is an eigenvalue of B, but 1 = (−1)(−1) is not an eigenvalue of AB.

4.15 True. Av = λv =⇒ A−1 · Av = A−1 · λv =⇒ λ−1v = A−1v =⇒ λ−1 is an eigenvalue of A−1 with the same eigenvector.

4.16 True. Av = λv =⇒ A−1 · Av = A−1 · λv =⇒ v = λ A−1v. Since A is invertible, by 4.29, 0 is never an eigenvalue. Thus, λ ≠ 0. Hence, λ−1v = A−1v =⇒ λ−1 is an eigenvalue of A−1 with the same eigenvector.

4.17 False. A = [0 1; 0 0], u = (1, 0), v = (−1, 0).

4.18 True. If u is an eigenvector of A and B, then Au = λu and Bu = µu, for some λ, µ =⇒ (A+B)u = (λ+µ)u.

4.19 True. If u is an eigenvector of A and B, then Au = λu and Bu = µu, for some λ, µ =⇒ (AB)u = A(Bu) = A(µu) = µ(Au) = (µλ)u.

4.20 False. A = B = [1 0; 1 1], u = (0, 1) and v = (0, −1). But u + v = 0 can never be an eigenvector.

4.21 False. A = [0 1; 1 0] and v = (0, 1).

4.22 True. v is an eigenvector of A =⇒ Av = λv for some λ =⇒ A2v = A(Av) = A(λv) = λ(Av) = λ2v.

4.23 False. A = [0 1; 0 0] and v = (1, 0).

4.24 True. (2A)v = 2Av = 2λv = (2λ)v.



4.25 True. A(2v) = 2Av = 2λv = λ(2v).

4.26 False. A = [1 0; 0 −1] has eigenvalues ±1, but 0 = −1 + 1 is not an eigenvalue of A.

4.27 True. (A2 + 3AB)v = A(Av) + 3A(Bv) = A(λv) + 3A(µv) = λ(Av) + 3µ(Av) = λ(λv) + 3µ(λv) = (λ2 + 3λµ)v.

4.28 True. Suppose 0 is an eigenvalue of An×n. Then 0 is a root of the characteristic equation =⇒ det (A − 0 In) = 0 =⇒ det A = 0 =⇒ A is not invertible.

4.29 True. Suppose A is invertible. Then det A ≠ 0 =⇒ λ = 0 does not satisfy the characteristic equation det (A − λI) = 0 =⇒ 0 is not an eigenvalue of A.

4.30 True. By contraposition, suppose An×n is not invertible. Then det A = 0 =⇒ det (A − λIn) = 0 for λ = 0 =⇒ λ = 0 is a root of the characteristic equation =⇒ 0 is an eigenvalue of A.
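The equivalence in 4.28–4.30 (0 is an eigenvalue ⟺ A is not invertible) can be illustrated with a small Python sketch (the singular matrix below is chosen here, not taken from the guide):

```python
# For a 2x2 matrix, det(A - t I) = t^2 - tr(A) t + det(A), so t = 0 is a
# root exactly when det(A) = 0, i.e. exactly when A is not invertible.
A = [[1, 2], [2, 4]]                         # rank 1, hence singular
trace = A[0][0] + A[1][1]                    # 5
det = A[0][0] * A[1][1] - A[0][1] * A[1][0]  # 0

def char_poly(t):
    """Characteristic polynomial det(A - t I) of the 2x2 matrix A."""
    return t * t - trace * t + det

print(det)           # 0 -> A is not invertible
print(char_poly(0))  # 0 -> 0 is an eigenvalue
```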

4.31 False. By 4.25, v is an eigenvector of A =⇒ 2v is an eigenvector of A.

4.32 True. We need to prove that c1u + c2v = 0 =⇒ c1 = c2 = 0. Proof: applying A gives c1 Au + c2 Av = A0 = 0. Since u and v are eigenvectors, c1 λu + c2 µv = 0. Since λ ≠ µ, λ and µ cannot both be zero; without loss of generality, suppose λ ≠ 0. Multiplying c1u + c2v = 0 by λ and subtracting gives c2 (µ − λ)v = 0. Since v ≠ 0 and λ ≠ µ, c2 must be zero and hence c1 must also be zero (u ≠ 0).

4.33 True. A has n distinct eigenvalues =⇒ the corresponding n eigenvectors are linearly independent =⇒ the n eigenvectors form a basis of Rn.

4.34 True. For example, if A = [1 2 3; 2 4 −1; 3 −1 5], then the characteristic equation det (A − λI) = −49 − 15λ + 10λ2 − λ3 = 0 (constant term det A = −49) has no repeated root. Hence all eigenvalues of A are distinct and A is diagonalizable.

4.35 True. For any real A, the matrix AtA is real symmetric: (AtA)t = At(At)t = AtA. It follows from 4.34 that AtA is diagonalizable.

4.36 False. [1 1; 0 2] is diagonalizable but not symmetric.

4.37 True. Diagonal matrix D always has a diagonalization: D = IDI−1, where I is an identity matrix.

4.38 True. A = [a] is always diagonal. Then A = IAI−1 is a diagonalization of A.

4.39 True. A is diagonalizable =⇒ ∃ invertible P and diagonal D such that P−1AP = D =⇒ P−1A−1P = D−1, which is diagonal. Also remark that, for example, if D = diag (λ1, λ2, λ3) then D−1 = diag (1/λ1, 1/λ2, 1/λ3).

4.40 True. A is diagonalizable =⇒ ∃ invertible P and diagonal D such that P−1AP = D =⇒ PtAt(Pt)−1 = Dt = D.

4.41 True. A is diagonalizable =⇒ ∃ invertible P and diagonal D such that P−1AP = D =⇒ P−1A3P = D3, which is diagonal. Also remark that, for example, if D = diag (λ1, λ2, λ3) then D3 = diag (λ1³, λ2³, λ3³).




4.42 True. The proof is similar to 4.41. In fact, if A is diagonalizable, then An is also diagonalizable, for n = ±1, ±2, · · · .

4.43 False. A = [−1 0; 0 1], B = [−1 1; 0 1] are diagonalizable. But AB = [1 −1; 0 1] is not.

4.44 False. A = [0 1; 0 0] has repeated eigenvalue λ = 0 =⇒ A is not diagonalizable. But A3 = O is diagonal and hence diagonalizable.

4.45 False. Take A in 4.44.

4.46 True. Distinct eigenvalues λ1, λ2, · · · , λn =⇒ corresponding eigenvectors v1, v2, · · · , vn linearly independent =⇒ P = [v1 v2 · · · vn] is invertible =⇒ A = PDP−1 is a diagonalization, where D = diag (λ1, λ2, · · · , λn).

4.47 False. A = [0 1 1; 1 0 1; 1 1 0] = [1 0 1; 0 1 1; −1 −1 1] [−1 0 0; 0 −1 0; 0 0 2] [1 0 1; 0 1 1; −1 −1 1]−1.
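The diagonalization in 4.47 can be verified without computing P−1, since A = PDP−1 is equivalent to AP = PD. A Python sketch (matrices copied from the answer above):

```python
# Verify A P = P D for A = [0 1 1; 1 0 1; 1 1 0],
# P = [1 0 1; 0 1 1; -1 -1 1], D = diag(-1, -1, 2).
A = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
P = [[1, 0, 1], [0, 1, 1], [-1, -1, 1]]
D = [[-1, 0, 0], [0, -1, 0], [0, 0, 2]]

def matmul(X, Y):
    """Product of two square nested-list matrices."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

print(matmul(A, P) == matmul(P, D))  # True
```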

4.48 False. A may have repeated eigenvalues but still enough eigenvectors to form a diagonalization. Take the 3 × 3 matrix A in 4.47. Then A has fewer than 3 distinct eigenvalues (−1 is repeated) but it is still diagonalizable.

4.49 False. A = [1 0; 0 1], B = [1 1; 0 1] have the same eigenvalues (repeated eigenvalue 1). But here only A is diagonalizable (A is diagonal, and by 4.37).

4.50 False. A = [1 0; 0 −1], B = [1 1; 0 1]. Then A is invertible (det A ≠ 0) and diagonalizable (A is diagonal, and by 4.37), B is not diagonalizable (a 2×2 matrix with repeated eigenvalue 1 and only one independent eigenvector), but AB = [1 1; 0 −1] is diagonalizable (AB is upper-triangular with distinct diagonal entries).

4.51 False. A = [1 0; 0 0] is diagonal.

4.52 False. A = [1 1; 0 1] has repeated eigenvalue 1.

5.1 False. Take u = e1, v = e2, w = 2e1, where e1, e2, e3 denote the standard basis of R3.

5.2 False. (u1 + u2) · (v1 + v2) = u1 · v1 + u1 · v2 + u2 · v1 + u2 · v2 = 0 + u1 · v2 + u2 · v1 + 0 ≠ 0, in general.

5.3 True. u ⊥ v and u ⊥ w =⇒ u · v = 0 and u · w = 0 =⇒ u · (v + w) = 0 =⇒ u ⊥ (v + w).

5.4 True. (u + v) · w = u · w + v · w = 0 + 0 = 0.

5.5 True. (2u − 3v) · w = 2u · w − 3v · w = 2 · 0 − 3 · 0 = 0.



5.6 True. au · bv = abu · v = ab · 0 = 0.

5.7 True. u · u = 0 =⇒ ‖u‖2 = 0 =⇒ ‖u‖ = √(x1² + x2² + · · · + xn²) = 0 =⇒ x1 = x2 = · · · = xn = 0 =⇒ u = 0.

5.8 True. u ∈ V is orthogonal to all vectors in V, so in particular, u is orthogonal to a1, · · · , an, where {a1, · · · , an} is an orthogonal basis of V. Then u = c1a1 + · · · + cnan for some ci. From u · a1 = (c1a1 + · · · + cnan) · a1 = c1‖a1‖2 = 0 =⇒ c1 = 0. Similarly, c2 = c3 = · · · = cn = 0. Hence, u = 0.

5.9 False. Take u1 = e1, u2 = e2 and v1 = e2, v2 = e1.

5.10 True. 0 · ui = 0, for all i =⇒ the new set is still pairwise orthogonal.

5.11 False. (1, 0), (1, 1).

5.12 False. (1, 0), (0, 0).

5.13 True. If W = {v1, v2, · · · , vn, vn+1, · · · , vn+m} is orthonormal, then the vectors are mutually orthogonal and ‖vi‖ = 1 for all i. Now {v1, v2, · · · , vn} is only a subset of W, so it must be orthonormal.

5.14 False. r can be a negative number.

5.15 True. ‖u − v‖2 = (u − v) · (u − v) = u · u − u · v − v · u + v · v = ‖u‖2 + ‖v‖2 − 2 u · v = ‖u‖2 + ‖v‖2 =⇒ u · v = 0 =⇒ u ⊥ v.

5.16 True. u ⊥ v =⇒ u · v = v · u = 0 =⇒ ‖u − v‖2 = (u − v) · (u − v) = u · u − u · v − v · u + v · v = u · u + v · v = ‖u‖2 + ‖v‖2.
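A quick numerical illustration of the Pythagorean identity in 5.15/5.16 (a Python sketch; the orthogonal pair below is chosen here, not taken from the guide):

```python
# For orthogonal u, v the identity ||u - v||^2 = ||u||^2 + ||v||^2 holds.
u = [1, 2, 0]
v = [-2, 1, 0]  # u . v = -2 + 2 + 0 = 0

dot = lambda a, b: sum(x * y for x, y in zip(a, b))
diff = [x - y for x, y in zip(u, v)]

print(dot(u, v))                               # 0
print(dot(diff, diff), dot(u, u) + dot(v, v))  # 10 10
```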

5.17 False. A = [1 0; 1 0], of which the columns are orthogonal but the rows are not.

5.18 True. Columns of A are orthonormal =⇒ AtA = I =⇒ AAt = I (by 1.15) =⇒ (At)t(At) = I =⇒ columns of At are orthonormal =⇒ rows of A are also orthonormal.

5.19 True. Rows of A are orthonormal =⇒ columns of At are orthonormal =⇒ rows of At are orthonormal (by 5.18) =⇒ columns of A are orthonormal.

5.20 False. u = (1, 0), v = (0, 1), w = (1, 1). Then w = u + v.

5.21 False. v1 = (1, 0, 0), v2 = (0, 1, 0), v3 = (0, 0, 0). Then 0 · v1 + 0 · v2 + 1 · v3 = 0.

5.22 True. If v1, v2, · · · , vk are orthonormal, then {v1, v2, · · · , vk} is an orthogonal set of nonzero vectors. Hence, a1v1 + · · · + akvk = 0 =⇒ (a1v1 + · · · + akvk) · v1 = a1‖v1‖2 = 0 =⇒ a1 = 0. Similarly, a2 = a3 = · · · = ak = 0. By definition, {v1, v2, · · · , vk} is a linearly independent set.

5.23 True. Note that for α, β ≠ 0, (αa) · (βb) = (αβ) a · b. So αa ⊥ βb =⇒ a ⊥ b. Hence, {u, v, w} is also an orthogonal basis.




5.24 True. u, v, w are nonzero and orthogonal. Then (2u) · (−3v) = −6 u · v = 0, (2u) · (5w) = 10 u · w = 0, (−3v) · (5w) = −15 v · w = 0.

5.25 False. By definition, an orthogonal matrix is a square invertible matrix U such that U−1 = Ut. In fact, any square matrix with orthonormal columns is an orthogonal matrix.

5.26 True. By definition.

5.27 True. U orthogonal =⇒ columns of U form a basis of Rn =⇒ number of columns of U = dimension of Rn = n = number of rows of U.

5.28 True. U is an orthogonal matrix =⇒ U−1 = Ut. Let V = U2. Then V−1 = (U2)−1 = (U−1)2 = (Ut)2 = (U2)t = Vt =⇒ V is an orthogonal matrix.

5.29 True. (UV)t(UV) = (VtUt)(UV) = Vt(UtU)V = VtV = I.

5.30 True. (−U)t(−U) = (−Ut)(−U) = UtU = I.

5.31 True. (Ut)t(Ut) = UUt = I.

5.32 True. (U−1)t(U−1) = (Ut)−1U−1 = (UUt)−1 = I−1 = I.

5.33 False. Let U be orthogonal and choose V = −U. Then V is orthogonal but U + V = O is not.

5.34 True. (UV−1)t(UV−1) = (V−1)t(UtU)V−1 = (Vt)−1V−1 = (VVt)−1 = I−1 = I.

5.35 True. (U2V3)−1 = (V−1)3(U−1)2 = (Vt)3(Ut)2 = (U2V3)t =⇒ U2V3 is an orthogonal matrix.

5.36 True. Write Un×n = [v1 v2 · · · vn], vi ∈ Rn. Then the (i, j) entry of UtU is vi · vj:

UtU = [v1·v1 v1·v2 · · · v1·vn; v2·v1 v2·v2 · · · v2·vn; · · · ; vn·v1 vn·v2 · · · vn·vn].

If {v1, · · · , vn} is an orthonormal set, then vi · vj = 1 if i = j and vi · vj = 0 if i ≠ j. Hence UtU = In, which is diagonal.
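The computation in 5.36 can be checked on a concrete orthogonal matrix (a Python sketch; the matrix U below is chosen here, not taken from the guide):

```python
import math

# U has orthonormal columns, so U^t U should be the identity (up to
# floating-point rounding).
s = 1.0 / math.sqrt(2.0)
U = [[s, s], [s, -s]]
Ut = [[U[j][i] for j in range(2)] for i in range(2)]  # transpose

UtU = [[sum(Ut[i][k] * U[k][j] for k in range(2)) for j in range(2)]
       for i in range(2)]
I = [[1.0, 0.0], [0.0, 1.0]]
print(all(abs(UtU[i][j] - I[i][j]) < 1e-12
          for i in range(2) for j in range(2)))  # True
```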

5.37 True. By 5.19 and 5.36.

5.38 False. U = [0 1; 0 1]. Then UtU = [0 0; 0 2], but the columns of U are linearly dependent.

5.39 False. U = [0 1; 0 1]. Then UtU = [0 0; 0 2] is diagonal but the columns of U are linearly dependent (hence they cannot form a basis of R2, although they can still form an orthogonal set).

5.40 True. If W = span {a1, · · · , ak}, where {a1, · · · , ak} is an orthonormal basis of W, then projW y = Σ_{i=1}^{k} ((y · ai)/(ai · ai)) ai. So projW (u + v) = Σ_{i=1}^{k} (((u + v) · ai)/(ai · ai)) ai = Σ_{i=1}^{k} ((u · ai)/(ai · ai)) ai + Σ_{i=1}^{k} ((v · ai)/(ai · ai)) ai = projW u + projW v.



5.41 True. By 5.40, we have projW (2u + 3v) = projW (2u) + projW (3v). So we only need to show projW (λy) = λ projW y. Now, projW (λy) = Σ_{i=1}^{k} (((λy) · ai)/(ai · ai)) ai = λ Σ_{i=1}^{k} ((y · ai)/(ai · ai)) ai = λ projW y.
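The linearity of orthogonal projection used in 5.40/5.41 can be illustrated numerically (a Python sketch; the subspace and vectors below are chosen here, not taken from the guide):

```python
# Project onto W = span{a1, a2} via proj_W y = sum_i ((y.ai)/(ai.ai)) ai
# and check that proj_W(2u + 3v) = 2 proj_W u + 3 proj_W v.
a = [[1, 0, 0], [0, 1, 0]]  # orthogonal basis of the xy-plane in R^3

def dot(x, y):
    return sum(p * q for p, q in zip(x, y))

def proj(y):
    """Orthogonal projection of y onto span(a)."""
    out = [0.0, 0.0, 0.0]
    for ai in a:
        c = dot(y, ai) / dot(ai, ai)
        out = [o + c * p for o, p in zip(out, ai)]
    return out

u, v = [1, 2, 3], [4, 5, 6]
lhs = proj([2 * x + 3 * y for x, y in zip(u, v)])  # proj(2u + 3v)
rhs = [2 * p + 3 * q for p, q in zip(proj(u), proj(v))]
print(lhs == rhs)  # True
```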

5.42 True. z · u1 = 0 and z · u2 = 0 =⇒ z · (c1u1 + c2u2) = 0 for any c1 and c2. Thus, z ∈ W⊥.

5.43 True. projS (2u) = 2 projS u = 2x =⇒ projS⊥ (2u) = 2u − projS (2u) = 2u − 2x.

5.44 True. By 5.40, projS (u + v) = projS u + projS v = x + y.

5.45 False. Choose u1, u2 linearly dependent. Then span {u1, u2} = span {u1}.

5.46 True. Let {a1, · · · , an} be the orthonormal basis of W. Then u ∈ W =⇒ u = Σ_{j=1}^{n} λj aj. So u · ak = Σ_{j=1}^{n} λj (aj · ak) = λk =⇒ proj_{ak} u = ((u · ak)/‖ak‖2) ak = λk ak =⇒ projW u = Σ_{k=1}^{n} λk ak = u.

5.47 True. Suppose W has orthogonal basis {v1, v2, · · · , vk}. Then 10y − ((10y · v1)/(v1 · v1)) v1 − ((10y · v2)/(v2 · v2)) v2 − · · · − ((10y · vk)/(vk · vk)) vk = 10 (y − ((y · v1)/(v1 · v1)) v1 − ((y · v2)/(v2 · v2)) v2 − · · · − ((y · vk)/(vk · vk)) vk) = 10 (y − projW y) = 10 projW⊥ y.

5.48 False. A = [1 0 0; 0 0 1]. Then Col A ⊂ R2 but Nul A ⊂ R3. That is, the column space and the null space are subspaces of different Euclidean spaces. The dot product of one vector in R2 with another vector in R3 is meaningless.

5.49 True. Consider the row partition of A in which the rj are the row vectors of A. Then Am×n x = [r1; r2; · · · ; rm] x = [r1 · x; r2 · x; · · · ; rm · x]m×1 = 0 gives rj · x = 0 for all j =⇒ rj ⊥ x for all j.
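The row-orthogonality argument in 5.49 can be checked on the matrix from 5.48 (a Python sketch; the null-space vector x is chosen here):

```python
# Any x with A x = 0 is orthogonal to every row of A.
A = [[1, 0, 0], [0, 0, 1]]
x = [0, 1, 0]  # A x = 0, so x is in Nul A

dot = lambda r, y: sum(p * q for p, q in zip(r, y))
Ax = [dot(row, x) for row in A]
print(Ax)                                  # [0, 0]
print(all(dot(row, x) == 0 for row in A))  # True
```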

5.50 False. Take A in 5.48. Then Col A ⊂ R2 but Row A ⊂ R3. That is, the column space and the row space are subspaces of different Euclidean spaces. The dot product of one vector in R2 with another vector in R3 is meaningless.



Index

Addition of Matrices, 3
Augmented matrix, 1
Basis, 145, 148
Characteristic equation, 183
Column space, 136
Determinant(s), 179
Determine independence, 143
Diagonalizable matrix, 187
Dot product, 219
Eigenvalues, 188
Eigenvectors, 188
Elementary matrix, 58
Fundamental solutions, 62
Gaussian elimination, 46
Gram-Schmidt orthogonalization, 229
Homogeneous system, 61
Identity matrix, 5
Inconsistent, 43
Inner product, 219
Inverse, 9
Invertible matrix, 10
Linear combination, 4, 142
Linearly dependent, 142
Linearly independent, 141, 143
Matrix multiplication, 4
Maximal independent vectors, 144
Minimal spanning vectors, 140
Norm, 219
Normal equations, 232
Null space, 131, 149
Number of pivots, 17
Orthogonal projection, 228
Orthogonal set, 222
Orthogonal vectors, 221
Orthogonality, 222
Orthonormal basis, 224
Orthonormal set, 222
Partitioned matrix, 8, 15, 16
Pivot, 12
Pivot collection, 144
Pythagorean theorem, 222
Rank, 17
Reduced row echelon form, 12
Row space, 136
Scalar multiplication, 15
Span, 136
Standard basis, 145
Subspace, 128, 129, 132
Symmetric matrix, 14
Transpose, 14, 16
Trivial solution, 61
Vector space, 132
Vectors, 125
Zero matrix, 3
Zero vector, 125
